  1. May 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank both the Editors and the reviewers for their valuable time, careful reading, and constructive comments. The comments have been highly valuable for improving the quality of our study and for guiding the direction of our present and future research. In the revised manuscript, we have incorporated the necessary changes, including additional experimental data as suggested; please find below our detailed point-by-point response to the reviewers’ comments and the changes we have made in the manuscript.

      Reviewer #1 (Public Review):

      In this work, the authors have explored how treating C. albicans fungal cells with EDTA affects their growth and virulence potential. They then explore the use of EDTA-treated yeast as a whole-cell vaccine in a mouse model of systemic infection. In general, the results of the paper are unsurprising. Treating yeast cells with EDTA affects their growth and the addition of metals rescues the phenotype. Because of the significant growth defects of the cells, they don't infect mice and you see reduced virulence. Injection with these cells effectively immunises the mice, in the same way that heat-killed yeast cells would. The data is fairly sound and mostly well-presented, and the paper is easy to follow. However, I feel the data is an incremental advance at best, and the immune analysis in the paper is very basic and descriptive.

      Strengths:

      Detailed analysis of EDTA-treated yeast cells

      Weaknesses:

      • Basic immune data with little advance in knowledge.

      • No comparison between their whole-cell vaccine and others tried in the field.

      • The data is largely unsurprising and not novel.

      Reply: Thank you so much for appreciating our effort to generate a whole-cell anti-fungal vaccine by treating C. albicans cells with EDTA. We also appreciate your comment that the manuscript is sound and well-presented. However, we are afraid that the respected reviewer assumed the CAET cells to be dead cells, whereas they only divide more slowly than the untreated cells. In the revised manuscript, we have presented additional evidence to show that CAET are live cells (Supp. Fig. 2) and, based on the new data, we expect a positive change in the reviewer’s opinion. Since CAET is a live strain, the data presented here is novel.

      Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat with limited drug options. With the increasing concern of drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied the metal metabolism in Candida albicans by testing some chelators, including EDTA, to block the metal acquisition and metabolism by the fungus. Interestingly, they found EDTA-treated yeast cells grew poorly in vitro and were non-pathogenic in vivo in a murine model. Mice immunized by EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis to survey the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune response in response to CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data both on in vitro phenotype, gene expression profile, and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data is somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and antibody response may be investigated.

      Reply: Thank you very much for appreciating our work and finding our strain to be a live whole-cell anti-fungal vaccine strain with translational potential. Since the current study focused on the identification and detailed characterizations of a non-genetically modified live-attenuated strain and determination of its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. In a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new as many other papers have examined this effect. The novelty here is the use of this EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Figure 5 and Figure 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. The methodology used is also an issue. As it stands, the impact is minor.

      Reply: Thank you so much for appreciating our efforts to develop a novel vaccine against fungal infections. We are extremely sorry for the lack of clarity in our writing related to Figs. 5 and 6. We have now modified the text and hope that the respected reviewer will find these figures convincing.

      Recommendations for the authors:

      Although the reviewers recognize the importance of the manuscript, they would like to see: 1) comparisons between their whole-cell vaccine and others tried in the field, 2) an investigation of the immune response in infected tissues and antibody response, and 3) more controls in Figures 5 and 6, and a time-dependent effect on the colony-forming units of their vaccine formulation. Please address the questions and submit a revised version together with a rebuttal letter addressing, point by point, the issues raised by each reviewer.

      Reply: (1) We are afraid that a comparative study of live and heat-killed cell vaccines would obscure the information presented here. This is the only non-genetically modified antifungal vaccine candidate; therefore, a comparison with a dead strain is unwarranted at present. We have now added supporting data to confirm that the survivability of C. albicans cells was unaffected at 6 hr of EDTA treatment (CAET, Supp. Fig. S2). (2) Since the current study focused on the identification and detailed characterization of a non-genetically modified live-attenuated strain and its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. However, in a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice. (3) The results of Figs 5 and 6 were misinterpreted by the respected reviewer; please see the explanation below.

      Reviewer #1 (Recommendations For The Authors):

      Some specific comments/suggestions for the authors: (1) What was the viability of the yeast after EDTA treatment? Is the delayed growth response because many cells died and it takes a while for remaining viable cells to catch up? This is important to know because it may mean the dose given to mice is substantially different and that should be accounted for. Some PI staining of the cells after treatment would help.

      Reply: The growth curve assays (Fig. 1A and 1E) were initiated with each culture at O.D.600nm = 0.5 (~10^7 cells/mL), and the analyses suggested that the EDTA-treated C. albicans cells grew more slowly than the untreated cells. Fig. 1B and 1F further demonstrated that EDTA has minimal effect on the survival of the strain up to 8 hrs post-exposure. The proportional increase in cell number, with and without metal chelators, remained almost the same over this duration (0 – 8 hrs). Therefore, for subsequent analyses, a 6 hr treatment was selected and the treated cells were designated CAET; these were actively dividing live cells, albeit dividing more slowly than untreated cells. As suggested, and to strengthen our finding, time-dependent SYTOX Green and propidium iodide staining of C. albicans cells with and without EDTA treatment was carried out and analysed by flow cytometry and microscopy, respectively. Both analyses revealed that the percentage of dead cells remained the same with and without EDTA treatment for up to 12 hrs. The new data has now been added to the revised version of the manuscript as Supplementary figure 2.

      Author response image 1.

      (2) In line with the above, what was the viability of the CAET cells after 3h in media? In the macrophage in vitro experiments, how do you know the reduced viability of the CAET cells is macrophage-specific? Did you run a control of CAET cells in media on their own to determine how CFU changed in macrophage-free conditions? Is the proliferation rates of untreated and CAET cells different? That would affect CFSE labelling and results. These experiments would work better with a GFP-expressing C. albicans strain, which is widely available. In the images in Figure 4c, it looks like there are more hyphae in CAET than untreated - was hyphal induction checked/measured? That's important to know because more hyphae usually means more clumping and this can affect CFU counts (giving the impression of less CFU when actually there is more). Because of all the issues above, I'm not fully convinced by the uptake/killing data.

      Reply: As explained in response 1, we used actively dividing WT and CAET cells, and equal numbers of these cells were CFSE labelled. As can be seen in Fig. 4A, the rate of phagocytosis was the same at 1 hr of pre-culture, but at the subsequent time points the double-positive cells were reduced in the case of CAET cells, which is due to fungal killing by macrophages. Fungal cells were released from the macrophages by warm water treatment and the CFU was determined. Fig. 4B suggested that at 1 hr of co-culture, the CFU of both fungal cells (WT and CAET) were the same and that fungal clearance was observed at later time points. Thus, the reduced viability of CAET cells was macrophage-specific. EDTA has minimal effect on hyphal transition with or without serum, and the new data has now been provided in the revised version (Supplementary Fig. 3).

      Author response image 2.

      (3) Pooled data should be shown for all animal experiments.

      Reply: Thank you for the suggestion. Wherever it was meaningful, pooled data for the animal experiments have now been provided.

      (4) Immune cell counts/analysis in the kidney and bone marrow would be hugely helpful and more relevant to understanding immune responses following immunisation/infection. I think a more interesting analysis for the authors to consider would be to immunise with heat-killed yeast vs EDTA-treated yeast and see if there is a qualitative difference or better protection, i.e. is the EDTA-treated whole-cell vaccine superior to the heat-killed version? That is a better question to address. As it stands, the data in the paper is not surprising.

      Reply: The studies on cellular and molecular mechanisms underlying protective immunity in CAET-vaccinated mice are in progress as a separate study. This study mostly focused on the identification and detailed characterization of a non-genetically modified live-attenuated strain and its safety and efficacy as a potential vaccine candidate in a preclinical model. We are afraid that a comparison of a live cell (CAET) with a dead cell (heat-killed) will dilute the content of the manuscript and will not be meaningful. It is well accepted that the heat-killed C. albicans strain provides only partial, short-lived protection against re-challenge (PMIDs: 12146759 and 9916097); thus, it does not warrant any comparison with CAET.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a highly interesting study. I have the following specific comments for clarification.

      (1) In the introduction, the authors mentioned other anti-candida vaccines that are mostly effective against Candida infection by inducing neutralizing antibodies. However, in their CAET vaccine candidate, they only checked the cellular immunity in blood and found a balanced immune response (both pro- and anti-inflammatory responses are induced). How about the antibody production in these mice? It is a bit surprising that both untreated Candida infection and CAET Candida infection produced similar immune activation based on Figure 6, yet the CAET immunization provides protection. Some innate cell recruitment is higher in untreated Ca infection than the CAET infected mice (Figure 5F). The overall results on immune response characterization did not seem to explain why the CAET infection led to host protection while untreated Ca infection cannot. Characterizing infected tissue immune cell differentiation and cytokine production may offer some additional insights.

      Reply: We agree with you that in this manuscript we have not provided any mechanistic study on the protective immunity in CAET-vaccinated mice. This will be demonstrated in a subsequent study.

      (2) In Figure 5, some critical data seem to be missing in panels B and C. The CFU and histopathological images for CAET-treated mice challenged by Ca should also be shown there for comparison. Although they did show some data in Figure 5E and Figure S4, it is necessary to have that data in 5B and 5C from the same experiment. Figure S4 is a very busy figure and the images are quite small. It may be necessary to use arrows to point out what information authors want to emphasize.

      Reply: Fig. 5B and 5C showed the data for mice that succumbed to infection. Since the other mice (saline control groups, CAET infected, CAET vaccinated, and re-challenged groups) survived, they were not sacrificed; therefore, the CFU data was not collected. In addition, we wanted to see the longevity of these surviving mice, and after 1 year of observation they were handed over to the animal house for clearance as per the institutional guidelines. However, Figure 5E and Figure S4 (now Fig. S6) included all the mice groups, as they were sacrificed at various time points irrespective of humane end points. As suggested, Fig. S6 has now been modified and fungal cells are denoted by yellow arrows.

      (3) EDTA-treated yeast cells showed poor growth but also had thicker cell walls with high chitin, glucan, and mannan levels. What leads to its clearance in vivo remains unclear, as usually, cells with thick cell wall structures and low metabolism are more resistant to stress, e.g., dormant cells. Macrophages were shown to contribute to CAET killing in a phagocytosis assay (Figure 4). Checking cytokines produced by macrophages during co-incubation may offer some insights. In all, additional discussion on what caused in vivo clearance would be helpful.

      Reply: A mechanistic study of the protective immune responses to CAET will be presented separately. As suggested, the 3rd paragraph of the discussion section now contains additional information emphasising the in vivo clearance of CAET cells.

      (4) Long paragraphs in the discussion section could be divided into a bigger number of shorter paragraphs.

      Reply: Thank you for the suggestion, it has now been modified in the revised version (7 short paragraphs). To make it more comprehensive, some of the content has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) It is unclear how many cells were treated with 250 micromolar of EDTA for 6 hours before preparing the inoculum. It seems that only the OD was measured before adding EDTA. This is not a very rigorous and reproducible method.

      Reply: In this manuscript, we have repeatedly used the same protocol to generate CAET cells for various analyses. A culture at O.D.600nm = 0.5 is equivalent to 10^7 C. albicans cells per mL, and this information has now been added to the revised manuscript.

      (2) Upon treatment with 250 micromolar of EDTA, cells were harvested and counted to prepare the inoculum (5x10^5) for injecting it in mice. However, it appears that CFU of the inoculum was not done. Based on data shown in Fig. 1B, 250 micromolar of EDTA does inhibit Candida cell replication. Thus, the authors may have counted dead cells and, thus, injected dead cells together with live cells for the CAET inoculum. Thus, mice receiving this inoculum may have been infected (and vaccinated) with a lower number of live Candida cells.

      Reply: Please see a similar response to reviewer #1. EDTA has minimal effect on the survival of C. albicans cells at 6 hr (also see supp. Fig. S2). We have already mentioned the CFU analysis of untreated and CAET cells in the methodology section related to inoculum preparation.

      (3) It is unclear if 6 hours of treatment with 250 micromolar of EDTA is enough to induce a block of Candida cell replication. In Figure 1B, the authors treated for 24h. The authors are encouraged to wash the cells after 6 hours of treatment and see if their cell division will recover upon removal of EDTA.

      Reply: Thank you for the suggestion. At 6 hr of treatment, the survivability of C. albicans cells was unaffected by EDTA exposure. PI and SYTOX Green staining confirmed this (Supp. Fig. 2). Additionally, as suggested, a rescue experiment was carried out by exogenous addition of divalent metals after 6 hr of EDTA treatment, followed by growth/CFU analyses. A modified Fig. 1A and B with the new data has been provided.

      (4) The data shown in Figure 5A is extremely exciting. However, the number of mice in each group (n=6) is too low. Normally, 10 mice per group are used for virulence studies unless the authors provide a power analysis that 6 mice per group will be sufficient. Also, CFU data were only provided for Ca and saline-Ca groups (Fig. 5B) and not for the other groups. CFU data should be provided for all mice.

      Reply: Thank you for the suggestion. A statistical analysis of Fig. 5A has been provided in the revised version. The rationale for not including all mice groups in Fig. 5B is explained in our response to reviewer #2.

      (5) It is unclear how the authors differentiate between CFU arising from CAET or from WT Candida.

      Reply: Since Fig. 5E demonstrated that no CAET cells were detected in the kidney beyond 10 days after inoculation, the fungal cells detected on the 3rd and 7th days in the re-challenged mice group (1CAET 2 Ca) were from the later-inoculated cells (brown colour).

      (6) Figure 5E: it is unclear if a 1 saline-2 saline (Figure legend) or if 1 saline-2 Ca (text) group was included. If the latter, where are the CFU? It is impossible that 1 saline-2 Ca mice have no CFU.

      Reply: Thank you so much for pointing this out. The legend has now been modified to include 1saline-2saline and 1CAET-2Ca.

      (7) It seems that CFU is significantly present in the kidney in the 1 CAET - 2 Ca group at day 7 but not at day 3. How is this possible? This is an extremely invasive model of infection, and the authors are challenging intravenously 500,000 live Candida cells. If by the 3rd day, the authors detect no CFU, then how is it possible that CFUs are arising on day 7?

      Reply: We do detect fungal cells on the 3rd day in the 1CAET 2 WT mice group (~2000 cells), albeit far fewer than on the 7th day (~11200 cells). A Log10 scale graph has now been provided for better representation.
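
      To illustrate why a Log10 axis makes the day 3 signal visible next to the day 7 signal, the conversion can be sketched as follows. This is a minimal sketch using only the two approximate counts quoted above; the variable names are hypothetical and no other data from the study is assumed.

```python
import math

# Approximate fungal cell counts quoted in the reply above (re-challenged group).
cfu_day3 = 2000
cfu_day7 = 11200

# Log10 transform, as used for a log-scale axis.
log_day3 = math.log10(cfu_day3)
log_day7 = math.log10(cfu_day7)

# Fold change on the linear scale and the corresponding gap on the log scale.
fold_change = cfu_day7 / cfu_day3
log_gap = log_day7 - log_day3

print(f"day 3: {log_day3:.2f} log10, day 7: {log_day7:.2f} log10")
print(f"fold change: {fold_change:.1f}x, log10 gap: {log_gap:.2f}")
```

      On a linear axis scaled to the day 7 value, the day 3 bar is under a fifth of the height; on the log axis the two values sit less than one unit apart.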

      (8) Most importantly, if the authors are not detecting CFU at day 3, then earlier time points (e.g. day 2, day 1, or even 12 hours post-challenge) must be analyzed. The authors should show that CFU from the organs is decreasing in a time-dependent manner. Also, all CFU should be shown as Log10.

      Reply: Please see the previous response.

      (9) Fig. 6: because it is unclear if the mice were challenged with the same inoculum of live Candida cells (untreated and treated with EDTA), the different cytokine profiles between the two groups could be simply due to the different inoculum sizes and not to the effect of EDTA on Ca.

      Reply: Please see the previous response, as also given to Reviewer 1.

    1. Author response:

      Reviewer #1

      […] it seems that the readout units are not operating in continuous time, and that interval discrimination relies in part on external information. Specifically, the readout units only look at the spike counts during the window delta_t_w.

      In the first version of the review, the reviewer implied that each readout unit only receives input during a small window around the interval it represents. However, this is not the case. The small window that is depicted in Fig. 16 is a sliding window that is used to compute the states (i.e., an estimate of the instantaneous firing rate) at each point in time. The fact that the readout units indeed do operate in continuous time is apparent from Fig. 2A, showing the activity of all output units as a function of time: There is gradually changing activity with a peak at the represented interval. If each unit would only receive input during a window of a couple milliseconds, there would be a single peak of activity at the represented interval, and near-zero activity at any other time.
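
      The distinction between a sliding window and a single fixed window can be sketched in a few lines. This is an illustrative sketch only, not the model code: the function name, the trailing-window convention, the 10 ms window length, and the spike times are all assumptions made for the example.

```python
from bisect import bisect_right

def sliding_window_rate(spike_times_ms, t_eval_ms, window_ms):
    """Estimate the firing rate (Hz) at each evaluation time from the spikes
    that fall in the trailing window (t - window_ms, t]."""
    spikes = sorted(spike_times_ms)
    rates = []
    for t in t_eval_ms:
        # Number of spikes s with t - window_ms < s <= t.
        n = bisect_right(spikes, t) - bisect_right(spikes, t - window_ms)
        rates.append(n * 1000.0 / window_ms)  # spikes per window -> spikes per second
    return rates

# Hypothetical spike train (ms); the window slides along the whole trial,
# so a state estimate exists at every evaluation time, not at one interval only.
spikes = [1.0, 3.0, 4.0, 12.0, 13.0, 27.0]
times = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
rates = sliding_window_rate(spikes, times, window_ms=10.0)
print(rates)
```

      Because the estimate is recomputed at every time step, a readout driven by these states receives input throughout the trial, which is the point made above.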

      This misunderstanding has been cleared up in the current version of the review (see last paragraph of review #1).

      Stimulus onset occurs at 1500 ms in order to allow the network to stabilize. Ideally, this value should be randomized across trials to ensure performance generalizes across initial states.

      This is a valid point which we will address in the revision. However, we note that experimentation with different onset values did not change the dynamics of the network systematically in previous studies (i.e., Hass et al., 2022).

      Why does StDev saturate? Is that because subjective time saturates as well?

      Indeed, the two phenomena are closely related. In section “Deviations from the scalar property and the origin on Vierordt’s law”, we discuss that both are caused by the broadening of the tuning curves of the readout units (Fig 1A) as the longest time constants of the network are exceeded.

      In the discussion, it would be nice to explain that dopaminergic modulation of subjective timing is not as universally observed as the linear psychophysical law or the scalar property, and I believe somewhat controversial (e.g., Ward, ..., Balsam, 2009).

      We are thankful for this advice and will adapt the discussion accordingly in the revision. Still, we note that dopaminergic modulation of subjective timing is one of the more robust effects observed in several time perception experiments.

      Reviewer #2:

      (1) Lack of Empirical Data: […] The paper would benefit from quantitative and qualitative simulations of results from specific, large-sample studies to anchor the model's predictions in concrete empirical evidence.

      While it is correct that this study does not attempt to replicate a concrete empirical study, we note that we do compare the model's results with specific studies wherever possible. The comparison is done on the level of parameters of functional relationships: For the linear psychophysical law, we compare the slope and the indifference point of the model with those from experimental studies. For the scalar property, we compare the Weber fraction of the model to those computed from experiments. For dopaminergic modulation of subjective duration, no direct comparison with experimental data is possible, as the levels of modulation are estimated from in vitro experiments and cannot be directly compared with modulations in vivo. However, we discuss a range of qualitative observations in experiments that are reproduced (and explained) by the model.

      The above arguments notwithstanding, one can discuss whether the presentation of the experimental results and the comparison with the simulations is appropriate, and we do plan to extend this presentation in a revision.

      (2) Methodological Ambiguities: The training and testing procedures lack robust checks for generalization, leading to potential overfitting issues.

      It is correct that formal checks for generalization, such as cross-validation protocols, are missing, and we will include them in the revision. However, as we obtained a mechanistic understanding of how the model tells time, we are confident that our results are not due to overfitting.

      (3) Inadequate Visualization of Empirical Data: References to empirical data are vague and not directly visualized alongside model outputs. Future iterations should include empirical data, not general trends from psychophysics, in figures for a clear comparison.

      As mentioned above, the comparison between simulation and empirical data will be extended in a revision. However, we argue that the “general trends”, namely adherence of the model to the often-reported psychophysical regularities, are of greater importance compared to the replication of, e.g. one specific slope of the linear psychophysical law, which does vary a lot between experiments.

      (4) Limitations in Model Scope and Dynamics: […] Expanding the model limitations to consider isochronous pulse processing and the emergence of limit-cycle behaviors after prolonged stimulation would provide a more comprehensive understanding of the model's capabilities and limitations.

      The current research focuses on the estimation of a single duration rather than the processing of sequences of durations. Sequence processing is a vast field, and it has been argued that it comprises different mechanisms compared to duration estimation. Thus, we feel that including sequence processing would be beyond the scope of the already quite extensive paper. However, we will discuss a possible extension of the model to sequence processing in the revision.

      Additionally, the justification for using \(N_{Poisson}\) as a proxy for more connections is unclear and warrants a more direct approach.

      We considered different means to vary the noise input into the network, including changes in the number of connections. We ultimately chose to vary the firing rate of a fixed number of Poisson input neurons. As the sum of the firing rates of N independent Poisson neurons with the same f is simply N*f and the synaptic contributions from each spike also linearly add up, this is equivalent to adding more Poisson neurons and thus, more connections.
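
      The equivalence argued above rests on the superposition property of independent Poisson processes: pooled input from N neurons firing at rate f is itself Poisson with rate N*f. A small simulation can illustrate this. It is a sketch under stated assumptions, not the network code: the function name and the values of N, f, the scaling factor k, and the duration are hypothetical, and synaptic summation is taken to be linear as stated in the text.

```python
import random

def pooled_spike_count(n_neurons, rate_hz, duration_s, rng):
    """Total number of spikes emitted by n independent homogeneous Poisson neurons."""
    total = 0
    for _ in range(n_neurons):
        t = rng.expovariate(rate_hz)  # time of the first spike
        while t < duration_s:
            total += 1
            t += rng.expovariate(rate_hz)  # exponential inter-spike interval
    return total

rng = random.Random(0)
N, f, k, T = 10, 5.0, 2, 100.0  # hypothetical values: neurons, Hz, scaling, seconds

count_scaled_rate = pooled_spike_count(N, k * f, T, rng)   # same N, firing rate k*f
count_more_neurons = pooled_spike_count(k * N, f, T, rng)  # k*N neurons, firing rate f
expected = N * k * f * T                                   # both should be near N*k*f*T

print(count_scaled_rate, count_more_neurons, expected)
```

      Both pooled counts should land close to the same expected value, which is why raising the rate of a fixed pool and enlarging the pool at a fixed rate deliver statistically equivalent summed input.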

      (5) Omissions and Redundancies: Certain omissions, such as the lack of a condition in Figure 7A or missing references to relevant models and reviews, detract from the paper's thoroughness.

      The reviewer refers to a condition where everything is ablated except NMDA. We will include such a condition in the revision. Regarding missing references, the reviewer requests including references that focus on sequence processing. While the focus of the current work is on estimating a single duration rather than a sequence of durations (see above), we will include a review on this topic as an outlook on this possible extension of the model.

      Moreover, some statements and terms like "internal clock" are used without a clear mechanistic definition within the model.

      We are thankful for this advice and will adapt the revision accordingly.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach as mentioned by the editor. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to #3 below

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below.

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And this is the third goal of this present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
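      The commonality analysis described above can be illustrated with a minimal sketch. It assumes ordinary least squares with in-sample R2 and simulated data; the variable names (`brain_age`, `brain_cog`, `fluid`) and the helper functions are hypothetical and are not the authors' code. The unique effect of a regressor is the drop in R2 when it is removed from the full model, and the common effects follow from inclusion-exclusion over all subsets of regressors.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    if X.shape[1] == 0:
        return 0.0
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def subsets(items):
    """Yield every subset of `items`, including the empty one, as tuples."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def commonality(X, y, names):
    """Return all commonality coefficients and the full-model R^2."""
    idx = range(len(names))
    r2 = {frozenset(s): r_squared(X[:, list(s)], y) for s in subsets(idx)}
    coefs = {}
    for s in subsets(idx):
        if not s:
            continue
        rest = frozenset(idx) - frozenset(s)
        # Inclusion-exclusion over subsets t of s, evaluated on rest ∪ t.
        value = sum((-1) ** (len(t) + 1) * r2[rest | frozenset(t)]
                    for t in subsets(s))
        coefs[" & ".join(names[i] for i in s)] = value
    return coefs, r2[frozenset(idx)]


# Simulated example (purely illustrative numbers).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
brain_age = age + 0.5 * rng.normal(size=n)
extra = rng.normal(size=n)                      # brain signal unrelated to age
brain_cog = -0.6 * age + 0.6 * extra + 0.5 * rng.normal(size=n)
fluid = -0.5 * age + 0.5 * extra + rng.normal(size=n)

names = ["age", "brain_age", "brain_cog"]
X = np.column_stack([age, brain_age, brain_cog])
coefs, full_r2 = commonality(X, fluid, names)
unique_brain_cog = coefs["brain_cog"]
```

      By construction the seven coefficients sum to the R2 of the full model, so each one can be read as a share of the variance explained.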

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation.

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally.

      Thank you for your comments on this issue.

      We now discuss the broader considerations in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021), and

      (3) suggested solutions we and others made to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of Brain Age in the case-control setting.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used to ensure that these models are not overfit.

      Thank you for allowing us an opportunity to clarify our stacked model. We made additional clarification to make this clearer (see below). We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models.

      From Methods: “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. As in the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We, then, re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. As in the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.
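      The three steps can be sketched in scikit-learn as follows. This is a minimal illustration under stated assumptions rather than the authors' pipeline: the data are simulated, only three hypothetical sets of features (`set_a`, `set_b`, `set_c`) and one stacked model are used instead of 18 and eight, and the hyperparameter grid is much smaller than the one described in the Methods. The point it shows is that the outer-fold test set is touched only in step 3.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

# Simulated data: three sets of features that share one latent signal.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
feature_sets = {
    name: signal[:, None] * rng.normal(size=(1, 10)) + rng.normal(size=(n, 10))
    for name in ("set_a", "set_b", "set_c")
}
y = signal + 0.5 * rng.normal(size=n)

# Reduced grid for speed (an assumption made for this sketch).
grid = {"alpha": np.logspace(-1, 2, 5), "l1_ratio": [0.1, 0.5, 1.0]}


def tune(X, y, seed):
    """Inner five-fold CV with grid search; returns the model refitted on X."""
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10_000), grid,
                          cv=inner, scoring="r2")
    return search.fit(X, y).best_estimator_


outer = KFold(n_splits=5, shuffle=True, random_state=0)
stacked_pred = np.empty(n)
for train, test in outer.split(np.arange(n)):
    # Step 1: tune one model per set of features on the outer-fold training set.
    base = {k: tune(X[train], y[train], seed=1) for k, X in feature_sets.items()}

    # Step 2: predicted values on the same training set become the features
    # of the stacked model, tuned with newly divided inner folds.
    z_train = np.column_stack(
        [base[k].predict(feature_sets[k][train]) for k in feature_sets])
    stack = tune(z_train, y[train], seed=2)

    # Step 3: apply the already tuned models to the outer-fold test set.
    z_test = np.column_stack(
        [base[k].predict(feature_sets[k][test]) for k in feature_sets])
    stacked_pred[test] = stack.predict(z_test)
```

      After the loop, `stacked_pred` holds one out-of-sample prediction per participant, which corresponds to what the Methods call Brain Age or Brain Cognition depending on the target.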

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
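      The distinction between the sum-of-squares R2 and Pearson’s r matters because a model can be perfectly correlated with the target and still be biased. The toy example below, with made-up numbers and hypothetical function names, computes the three metrics for predictions that are shifted by a constant.

```python
import numpy as np


def pearson_r(observed, predicted):
    """Pearson's correlation between observed and predicted values."""
    return np.corrcoef(observed, predicted)[0, 1]


def r2_sum_of_squares(observed, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return np.mean(np.abs(observed - predicted))


observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = observed + 1.0  # perfectly correlated, but off by a constant

r = pearson_r(observed, predicted)            # 1.0
r2 = r2_sum_of_squares(observed, predicted)   # 1 - 5/10 = 0.5
mae = mean_absolute_error(observed, predicted)  # 1.0
```

      Squaring Pearson’s r would report a perfect fit here, whereas the sum-of-squares definition penalises the constant offset.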

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found that Spearman’s ρ varied dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.
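      The pairwise computation can be sketched as follows, assuming one coefficient vector per outer fold for a given set of features. The coefficients here are simulated (a shared vector plus fold-specific noise) and the variable names are hypothetical. Five folds give 5 × 4 / 2 = 10 pairs.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

# Simulated Elastic Net coefficients for one set of features in five outer folds.
rng = np.random.default_rng(0)
shared = rng.normal(size=50)
fold_coefs = [shared + 0.5 * rng.normal(size=50) for _ in range(5)]

# Spearman's rho for every pair of folds; [0] selects the correlation itself.
rhos = [spearmanr(a, b)[0] for a, b in combinations(fold_coefs, 2)]
rho_range = (min(rhos), max(rhos))
```

      The resulting range is the kind of summary reported above for the age-prediction and cognition-prediction models.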

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Thank you for the opportunity to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009).

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.”

      Second, for MRI processing procedures, we included the following statements.

      From Methods: “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 18 sets of features.”

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linear detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned from potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high pass cutoff at 200s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Glasser et al., 2016) for cortical surface regions and FreeSurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.”

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” as the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors on FSL, and regressed them in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a highpass at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations of each pair of 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) of 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.
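      The FC feature construction can be sketched as follows, assuming parcellated time series of shape (time points × 379 regions) per participant. The data are random noise and the sample sizes, split and variable names are hypothetical; only the order of operations follows the description above: pairwise Pearson’s correlations (379 × 378 / 2 = 71,631 unique pairs), r-to-z transformation, then a 75-component PCA fitted on the training set only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_participants, n_timepoints, n_regions = 100, 40, 379

# Indices of the upper triangle, excluding the diagonal: 71,631 region pairs.
upper = np.triu_indices(n_regions, k=1)


def fc_vector(time_series):
    """Pearson's correlations between regions, Fisher r-to-z transformed."""
    corr = np.corrcoef(time_series, rowvar=False)  # regions x regions
    return np.arctanh(corr[upper])


fc = np.stack([fc_vector(rng.normal(size=(n_timepoints, n_regions)))
               for _ in range(n_participants)])

# Fit the PCA on the training participants only, then apply the same
# definition to the test participants to avoid data leakage.
train, test = np.arange(80), np.arange(80, 100)
pca = PCA(n_components=75).fit(fc[train])
train_scores = pca.transform(fc[train])
test_scores = pca.transform(fc[test])
```

      In a nested cross-validation setting the `fit` call would be repeated inside every training fold, so that no test participant contributes to the component definition.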

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC)

      Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC occurred during the resting (as opposed to task-performing) period. HCP-A collected Rest FC from four 6.42-min (488 frames) runs across two days, leading to 26-min long data (Harms et al., 2018). On each day, the study scanned two runs of Rest FC, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file that was pre-processed and concatenated across the four runs. We applied the same computations (i.e., highpass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for the Task FC.

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established pre-processing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par, if not better than, many non-linear and more-complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us the ability to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the feature’s coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l1 ratio=0) or absolute (known as ‘Lasso’; l1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      $$\min_{\beta}\; \frac{1}{2\,n_{\text{samples}}}\,\lVert y - X\beta \rVert_2^2 \;+\; \alpha \times l_1\text{-ratio} \times \lVert \beta \rVert_1 \;+\; 0.5 \times \alpha \times (1 - l_1\text{-ratio}) \times \lVert \beta \rVert_2^2$$

      where X is the features, y is the target, β is the coefficient and n_samples is the number of observations. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l_1-ratio using 25 numbers in linear space, ranging from 0 to 1.
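      To make the parameterisation concrete, the sketch below writes out the scikit-learn Elastic Net objective in NumPy and builds the two hyperparameter grids as described. The simulated data, the single (α, l1 ratio) pair used for the fit and the function name `elastic_net_objective` are assumptions made for illustration. As a sanity check, the fitted coefficients should give a lower objective value than a perturbed copy of them.

```python
import numpy as np
from sklearn.linear_model import ElasticNet


def elastic_net_objective(X, y, beta, intercept, alpha, l1_ratio):
    """Objective minimised by sklearn.linear_model.ElasticNet."""
    n_samples = X.shape[0]
    resid = y - (X @ beta + intercept)
    return ((resid @ resid) / (2 * n_samples)
            + alpha * l1_ratio * np.abs(beta).sum()
            + 0.5 * alpha * (1 - l1_ratio) * (beta @ beta))


# Hyperparameter grids as described in the text.
alphas = np.logspace(-1, 2, 70)     # 70 values in log space from .1 to 100
l1_ratios = np.linspace(0, 1, 25)   # 25 values in linear space from 0 to 1

# Simulated data and one illustrative hyperparameter pair.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.0, 2.0]) + rng.normal(size=200)
alpha, l1_ratio = 0.1, 0.5

model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                   tol=1e-8, max_iter=100_000).fit(X, y)
fitted_obj = elastic_net_objective(X, y, model.coef_, model.intercept_,
                                   alpha, l1_ratio)
perturbed_obj = elastic_net_objective(X, y, model.coef_ + 0.1,
                                      model.intercept_, alpha, l1_ratio)
```

      With this parameterisation, α scales the overall penalty and the l1 ratio splits it between the absolute (Lasso) and squared (Ridge) terms.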

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross validation, different outer folds may have different degrees of ‘α’ and ‘l1 ratio’, so that the final coefficients differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using Brainspace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we, first, multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with Elastic Net coefficients and, then, summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
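      The back-projection from principal components to the original features can be sketched as follows. The example is scaled down (300 features and 10 components instead of 71,631 and 75), uses simulated data and fixed hyperparameters, and the variable names are hypothetical; it follows the literal description of multiplying the absolute values of `components_` by the Elastic Net coefficients and summing across components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet

# Simulated stand-in for FC indices (participants x features).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))
y = X[:, :5].sum(axis=1) + rng.normal(size=120)

# Refit on the full dataset: PCA first, then Elastic Net on the PCA scores.
pca = PCA(n_components=10).fit(X)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(pca.transform(X), y)

# components_ has shape (n_components, n_features). Weight each component's
# absolute loadings by its Elastic Net coefficient, then sum over components.
importance = (np.abs(pca.components_) * model.coef_[:, None]).sum(axis=0)
```

      The result has one value per original feature, which is the form needed to plot feature importance for each ROI pair.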

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x


      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Public Reviews:

      Reviewer 1 (Public Review):

      In this paper, the authors evaluate the utility of brain-age-derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain-age-derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ("brain-cognition") as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      (1) I thank the authors for addressing many of my concerns with this revision. However, I do not feel they have addressed them all. In particular I think the authors could do more to address the concern I raised about the instability of the regression coefficients and about providing enough detail to determine that the stacked regression models do not overfit.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 and #2 (see below).

      (2) In considering my responses to the authors' revision, I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular, the regression model used to predict fluid cognition will, by construction, explain more variance in cognition than a brain age model that is trained to predict age. To be fair, these conceptual problems are more widespread than this paper alone, so I do not believe the authors should be penalised for that. However, I would recommend making these concerns more explicit in the manuscript.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 2 (Public Review):

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration.

      The study employs suitable methods and data to address the research questions, and the methods and results sections are generally clear and easy to follow.

      I appreciate the authors' efforts in significantly improving the paper, including some considerable changes, from the original submission. While not all reviewer points were tackled, the majority of them were adequately addressed. These include additional analyses, more clarity in the methods and a much richer and nuanced discussion. While recognising the merits of the revised paper, I have a few additional comments.

      (1) Perhaps it would help the reader to note that it might be expected for brain-cognition to account for a significantly larger variance (11%) in fluid cognition, in contrast to brain-age. This stems from the fact that the authors specifically trained brain-cognition to predict fluid cognition, the very variable under consideration. In line with this, the authors later recommend that researchers considering the use of brain-age should evaluate its utility using a regression approach. The latter involves including a brain index (e.g. brain-cognition) previously trained to predict the regression's target variable (e.g. fluid cognition) alongside a brain-age index (e.g., corrected brain-age gap). If the target-trained brain index outperforms the brain-age metric, it suggests that relying solely on brain-age might not be the optimal choice. Although not necessarily the case, is it surprising for the target-trained brain index to demonstrate better performance than brain-age? This harks back to the broader point raised in the initial review: while brain-age may prove useful (though sometimes with modest effect sizes) across diverse outcomes as a generally applicable metric, a brain index tailored for predicting a specific outcome, such as brain-cognition in this case, might capture a considerably larger share of variance in that specific context but could lack broader applicability. The latter aspect needs to be empirically assessed.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (please see our responses to Reviewer 1 Recommendations For The Authors #3 below).

      Briefly, as in our 2nd revision, we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      (2) Furthermore, the discussion pertaining to training brain-age models on healthy populations for subsequent testing on individuals with neurological or psychological disorders seems somewhat one-sided within the broader debate. This one-sidedness might potentially confuse readers. It is worth noting that the choice to employ healthy participants in the training model is likely deliberate, serving as a norm against which atypical populations are compared. To provide a more comprehensive understanding, referencing Tim Hahn's counterargument to Bashyam's perspective could offer a more complete view (https://academic.oup.com/brain/article/144/3/e31/6214475?login=false).

      Thank you Reviewer 2 for bringing up this issue. We have now revised the paragraph in question and added nuances on the usage of Brain Age for normative vs. case-control studies. We also cited Tim Hahn’s article that explained the conceptual foundation of the use of Brain Age in case-control studies. Please see below. Additionally, we also made a statement about our study not being able to address issues about the case-control studies directly in the newly written conclusion (see Reviewer 3 Recommendations for the Authors #3).

      Discussion:

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      (3) Overall, this paper makes a significant contribution to the field of brain-age and related brain indices and their utility.

      Thank you for the encouragement.

      Reviewer 3 (Public Review):

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" This question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age.

      (1) Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain-age metrics.

      Thank you Reviewer 3 for the comment. We addressed them in our response to Reviewer 3 Recommendations For The Authors #1-3 (see below).

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) I do not feel the authors have fully addressed the concern I raised about the stacked regression models. Despite the new figure, it is still not entirely clear what the authors are using as the training set in the final step. To be clear, the problem occurs because of the parameters, not the hyperparameters (which the authors now state that they are optimising via nested grid search). In other words, given a regression model y = X*beta, if the X are taken to be predictions from a lower level regression model, then they contain information that is derived from both the training set and the test set for the model that this was trained on. If the split is the same (i.e. the predictions are derived on the same test set as is being used at the second level), then this can lead to overfitting. It is not clear to me whether the authors have done this or not. Please provide additional detail to clarify this point.

      Thank you for allowing us an opportunity to clarify our stacked model. We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models. We made additional clarification to make this clearer (see below). Let us explain what we did and provide the rationales below.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We, then, re-divided this outer-fold training set into new five inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
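
      As a small illustration of the three performance measures named above, the sketch below computes Pearson’s r, the sum-of-squares R2 and MAE with NumPy. The function name and the example values are hypothetical and only serve to show that the sum-of-squares R2 is not the same quantity as the squared Pearson’s r.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return Pearson's r, sum-of-squares R2 and MAE for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Pearson's r: linear association, insensitive to scale and offset errors.
    pearson_r = np.corrcoef(y_true, y_pred)[0, 1]

    # R2 = 1 - (sum of squares residuals / total sum of squares).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Mean absolute error, in the units of the target.
    mae = np.mean(np.abs(y_true - y_pred))
    return {"pearson_r": pearson_r, "r2": r2, "mae": mae}

# Hypothetical observed and predicted values.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
```

      For these example values, R2 is 0.85 and MAE is 0.375, while the squared Pearson’s r is slightly higher than R2 because correlation ignores systematic offsets in the predictions.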

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models.

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
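
      The three steps of the nested cross-validation described in the Methods excerpt can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are simulated, only three hypothetical feature sets stand in for the 18 in the text, the hyperparameter grid is made up, and the sketch relies on scikit-learn's default of refitting the best model on the whole outer-fold training set, which the text does not specify.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n = 150
target = rng.normal(size=n)  # simulated stand-in for age or fluid cognition

# Three hypothetical feature sets (the text describes 18), each a noisy view of the target.
feature_sets = {
    name: target[:, None] * np.linspace(0.5, 1.0, p) + rng.normal(size=(n, p))
    for name, p in [("set_a", 10), ("set_b", 15), ("set_c", 8)]
}
param_grid = {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}  # hypothetical grid

def tune(X, y, seed):
    """Grid search over five inner folds, selecting by mean validation R2."""
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=inner, scoring="r2")
    return search.fit(X, y).best_estimator_  # refit on all data passed in

outer = KFold(n_splits=5, shuffle=True, random_state=0)
stacked_pred = np.empty(n)

for train_idx, test_idx in outer.split(target):
    y_train = target[train_idx]

    # Step 1: tune one model per feature set, using the outer-fold training set only.
    base = {k: tune(X[train_idx], y_train, seed=1) for k, X in feature_sets.items()}

    # Step 2: predict the same training participants with the tuned models,
    # then tune the stacked model on those predictions with new inner folds.
    train_preds = np.column_stack(
        [base[k].predict(feature_sets[k][train_idx]) for k in feature_sets]
    )
    stacker = tune(train_preds, y_train, seed=2)

    # Step 3: apply the already tuned models to the untouched outer-fold test set.
    test_preds = np.column_stack(
        [base[k].predict(feature_sets[k][test_idx]) for k in feature_sets]
    )
    stacked_pred[test_idx] = stacker.predict(test_preds)
```

      In this sketch the outer-fold test indices are used only in the third step, so neither the base models nor the stacked model are fitted or tuned on the participants they are evaluated on.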

      (2) I also do not feel the authors have fully addressed the concern I raised about stability of the regression coefficients over splits of the data. I wanted to see the regression coefficients, not the predictions. The predictions can be stable when the coefficients are not.

      The focus of this article is on the predictions. Still, as pointed out by Reviewer 1, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.
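
      The pairwise rank-stability computation described above can be sketched as follows. The coefficient matrix is simulated and the noise level is an arbitrary assumption; only the pairing logic (five outer folds giving 10 pairs) follows the text.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_folds, n_features = 5, 19  # 19 mirrors the subcortical-volume example; values are simulated

# Simulated Elastic Net coefficients: one row per outer fold,
# sharing a common pattern plus fold-specific noise.
shared_pattern = rng.normal(size=n_features)
fold_coefs = shared_pattern + rng.normal(scale=0.5, size=(n_folds, n_features))

# Correlate the coefficient ranks for every pair of outer folds.
pairwise_rho = []
for i, j in combinations(range(n_folds), 2):
    rho, _ = spearmanr(fold_coefs[i], fold_coefs[j])
    pairwise_rho.append(rho)

mean_rho = float(np.mean(pairwise_rho))  # analogous to the per-model mean in the figure
```

      With five folds, `combinations` yields exactly 10 fold pairs, matching the 10 dots per prediction model in the figure.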

      (3) I also must say that I agree with Reviewer 3 about the limitations of the brain-age and brain-cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain-age model that is trained to predict age. This suffers from the same problem the authors raise with brain-age and I agree that this would probably disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain-age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain-cognition.

      Thank you so much for raising this point. Reviewer 2 (Public Review #1) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
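      As a rough check, the “one-third” figure follows from the two percentages quoted above, taking the upper-end unique effect of Brain Cognition (around 11%) relative to the variation explained by Brain Age and chronological age together (around 32%). The notation below is shorthand introduced here for illustration:

      $$\frac{\Delta R^2_{\text{Brain Cognition}}}{R^2_{\text{chronological age, Brain Age index}}} \approx \frac{0.11}{0.32} \approx 0.34$$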

      Reviewer #3 (Recommendations For The Authors):

      Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to: 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain age metrics.

      (1) I understand your point here. I think the distinction is that it is fine to build predictive models, but then there is no need to go through this intermediate step of "brain-cognition". Just say that brain features can predict cognition XX well, and brain-age (or some related metric) can predict cognition YY well. It creates a confusing framework for the reader that can lead them to believe that "brain-cognition" is not just a predicted value of fluid cognition from a model using brain features to predict cognition. While you clearly state that that is in fact what it is in the text, which is a huge improvement, I do not see what is added by going through brain-cognition instead of simply just obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa, depending on the question. Please do this analysis, and either compare and contrast it with going through "brain-cognition" in your paper, or switch to this analysis, as it more directly addresses the question of the incremental predictive utility of brain-age above and beyond brain features.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 2 (Public Review #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see our responses to Reviewer 1 Recommendations For The Authors #3 above).

      Briefly, as in our 2nd revision, we made it explicitly clear that we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. And, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      We have thought about changing the name Brain Cognition into something along the lines of “predicted values of prediction models predicting fluid cognition based on brain MRI.” However, this made the manuscript hard to follow, especially with the commonality analyses. For instance, the sentence, “Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition” would become “Here, we tested predicted values of prediction models predicting fluid cognition based on brain MRI unique effects in multiple regression models with a Brain Age index, chronological age and predicted values of prediction models predicting fluid cognition based on brain MRI as regressors to explain fluid cognition.” We believe, given our additional explanation (see our responses to Reviewer 1 Recommendations For The Authors #3 above), readers should understand what Brain Cognition is, and that we did not intend to compare Brain Age and Brain Cognition directly.

      As for the suggested analysis, “obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa,” we have already done this in the form of commonality analysis (Nimon et al., 2008) (see Figure 7 below). That is, to obtain the unique and common effects of the regressors, we need to look at all of the possible changes in R2 when each possible subset of regressors is excluded or included (see equations 12 and 13 below).

      From Methods:

      “Similar to the above multiple regression model, we had chronological age, each Brain Age index and Brain Cognition as the regressors for fluid cognition:

      $$\text{Fluid Cognition}_i = \beta_0 + \beta_1\,\text{Chronological Age}_i + \beta_2\,\text{Brain Age Index}_{i,j} + \beta_3\,\text{Brain Cognition}_i + \varepsilon_i, \tag{12}$$

      Applying the commonality analysis here allowed us, first, to investigate the additive, unique effects of Brain Cognition, over and above chronological age and Brain Age indices. More importantly, the commonality analysis also enabled us to test the common, shared effects that Brain Cognition had with chronological age and Brain Age indices in explaining fluid cognition. We calculated the commonality analysis as follows (Nimon et al., 2017):

      $$\begin{aligned}
      \text{Unique Effect}_{\text{chronological age}} &= \Delta R^2_{\text{chronological age}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Age index}} &= \Delta R^2_{\text{Brain Age index}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Cognition}} &= \Delta R^2_{\text{Brain Cognition}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index}} &= R^2_{\text{chronological age, Brain Cognition}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{chronological age}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age}} + R^2_{\text{Brain Age index}} + R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} - R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} + R^2_{\text{chronological age, Brain Age index, Brain Cognition}}
      \end{aligned} \tag{13}$$”
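      The decomposition can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not the analysis code from the study: the function name, the shorthand keys (A = chronological age, B = Brain Age index, C = Brain Cognition) and the R² values are hypothetical. It relies on the standard property of a three-regressor commonality analysis that the seven components add up to the R² of the full model, which requires the full-model R² to enter the three-way common effect with a positive sign.

```python
def commonality_three(r2):
    """Unique and common effects for three regressors A, B and C.

    r2 maps a string naming a regressor subset ("A", "AB", "ABC", ...)
    to the R-squared of the regression that uses only that subset.
    """
    full = r2["ABC"]
    return {
        # Unique effect: drop in R-squared when one regressor is removed.
        "unique_A": full - r2["BC"],
        "unique_B": full - r2["AC"],
        "unique_C": full - r2["AB"],
        # Two-way common effects: variation shared by exactly two regressors.
        "common_AB": r2["AC"] + r2["BC"] - r2["C"] - full,
        "common_AC": r2["AB"] + r2["BC"] - r2["B"] - full,
        "common_BC": r2["AB"] + r2["AC"] - r2["A"] - full,
        # Three-way common effect: variation shared by all three regressors.
        "common_ABC": (r2["A"] + r2["B"] + r2["C"]
                       - r2["AB"] - r2["AC"] - r2["BC"] + full),
    }


# Hypothetical R-squared values for every subset (not results from the study).
demo_r2 = {
    "A": 0.32, "B": 0.20, "C": 0.25,
    "AB": 0.33, "AC": 0.42, "BC": 0.30,
    "ABC": 0.43,
}
effects = commonality_three(demo_r2)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
print("sum of components:", round(sum(effects.values()), 3))
```

      Checking that the components sum to the full-model R² is a quick sanity test for any implementation of these formulas. Individual common effects can be negative (for example, with suppressor variables), so only the total is constrained.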

      (2) I agree that the solution is not to exclude age as a covariate, and that there is a big difference between inevitable and obvious. I simply think a further discussion of the inevitability of the results would be clarifying for the readers. There is a big opportunity in the brain-age literature to be as direct as possible about why you are finding what you are finding. People need to know not only what you found, but why you found what you found.

      Thank you. We agree that we need to make this point more explicit and direct. In the revised manuscript, we added statements to both the Introduction and Discussion (see below) about the tight relationship between Brain Age and chronological age by design, which makes the small unique effects of Brain Age inevitable.

      Introduction:

      “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      Discussion:

      “First, Brain Age itself did not add much more information to help us capture fluid cognition than what we had already known from a person’s chronological age. This can clearly be seen from the small unique effects of Brain Age indices in the multiple regression models having Brain Age and chronological age as the regressors. While the unique effects of some Brain Age indices from certain age-prediction models were statistically significant, they were all relatively small. Without Brain Age indices, chronological age by itself already explained around 32% of the variation in fluid cognition. Including Brain Age indices only added around 1.6% at best. We believe the small unique effects of Brain Age were inevitable because, by design, Brain Age is closely tied to chronological age. Therefore, chronological age and Brain Age captured mostly the same variation in fluid cognition.

      Investigating the simple regression models and the commonality analysis between each Brain Age index and chronological age provided additional insights….”

      (3) I believe it is very important to critically examine the use of brain-age and related metrics. As part of this process, I think we should be asking ourselves the following questions (among others): Why go through age prediction? Wouldn't the predictions of cognition (or another variable) using the same set of brain features always be as good or better? You still have not justified the use of brain-age. As I said before, if you are going to continue to recommend the use of brain-age, you need a very strong argument for why you are recommending this. What does it truly add? Otherwise, temper your statements to indicate possible better paths forward.

      Thank you, Reviewer 3, for making an argument against the use of Brain Age. We largely agree with you. However, our work only focuses on one phenotype, fluid cognition, and on the normative situation (i.e., not having a case vs control group). As Reviewer 2 pointed out, Brain Age might still have utility in other cases not studied here. Still, future studies that focus on other phenotypes may consider using our approach as a template to test the utility of Brain Age in other situations. We added a concluding statement to reflect this.

      From Discussion:

      “Altogether, we examined the utility of Brain Age as a biomarker for fluid cognition. Here are the three conclusions. First, Brain Age failed to add substantially more information over and above chronological age. Second, a higher ability to predict chronological age did not correspond to a higher utility to capture fluid cognition. Third, Brain Age missed up to around one-third of the variation in fluid cognition that could have been explained by brain MRI. Yet, given our focus on fluid cognition, future empirical research is needed to test the utility of Brain Age on other phenotypes, especially when Brain Age is used for anomaly detection in case-control studies (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We hope that future studies may consider applying our approach (i.e., using the commonality analysis that includes predicted values from a model that directly predicts the phenotype of interest) to test the utility of Brain Age as a biomarker for other phenotypes.”

      References

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper provides useful information about how the ionome of Arabidopsis thaliana adapts to very high CO2-levels, backed up by solid evidence and carefully designed studies. However, the broader claims of the paper about climate change and food security - heavily emphasized in the abstract, introduction, and discussion - are inappropriate, as there is no direct link to the presented work.

      We sincerely thank you for the work you have done in reviewing our manuscript. We very much appreciate your overall positive assessment of the experimental work as a whole, its value and robustness.

      In this revised version, we took on board the majority of your suggestions and your comments. In particular, we understood your critical point about overstating our objectives, which might in turn seem uncorrelated with our results. We fully agree with the comments that have been made on this point. Consequently, we have made substantial modifications and corrections in order to clarify our objectives and their implications: exploring in depth the natural variation of the shoot ionome response to elevated CO2, and generating a valuable resource allowing a better understanding of the genetic and molecular mechanisms involved in the regulation of plant mineral nutrition by the elevation of atmospheric CO2.

      We also made modifications in response to the other suggestions, including a clarification of the functional experiments carried out around the function of TIP2;2 in response to elevated CO2. Figure 7 now comprises the comparison between both ambient and elevated CO2 conditions, which is much more informative than what appeared in the previous version.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study's abstract, introduction, and conclusions are not supported by the methods and results conducted. In fact, the results presented suggest that Arabidopsis could easily adapt to an extremely high CO2 environment.

      We understand the reviewer’s comment. Although our work is considered useful, robust and well designed, we agree with the reviewer's point. We have certainly overemphasized the significance of our work to address the issue of food security in response to rising atmospheric CO2, at the expense of the factual description of the results of our fundamental study of the mechanisms at the interface between CO2 and mineral nutrition. We have clarified this focus by modifying the text of the introduction, objectives and discussion. We hope that these modifications will enable readers to better appreciate the core of this work.

      Regarding the last part of the comment, our results do suggest that genetic variation could allow adaptation to rising atmospheric CO2, and our study does indeed aim to identify the extent and basis of this genetic variation.

      This study offers good evidence pointing to a genetic basis for Arabidopsis thaliana's response to elevated CO2 (eCO2) levels and its subsequent impact on the leaf ionome. The natural variation analyses in the study support the hypothesis that genetic factors, rather than local adaptation, guide the influence of eCO2 on the ionome of rosette leaves in Arabidopsis. However, the manuscript's claim regarding its role in "the development of biofortified crops adapted to a high-CO2 world" (line 23) is overstated, especially given the absence of any analysis on the influence of eCO2 on the seed ionome and Arabidopsis is a poor model for harvest index for any crop. The manuscript, in its current form, necessitates massive revisions, particularly in clarifying its broader implications and in providing more substantial evidence for some of its assertions.

      We thank the reviewer for this comment and for the positive appreciation of our identification of a genetic basis for Arabidopsis thaliana's response to elevated CO2 and its subsequent impact on the leaf ionome. Nevertheless, it is true that the study of the leaf ionome is far from being able to lead to the development of biofortified plants. Some papers have described the nutrient harvest index in Arabidopsis as a potential indicator of nutrient use efficiency (for instance, Masclaux-Daubresse and Chardon, Journal of Experimental Botany 2011 or Aranjuelo et al., Journal of Experimental Botany 2013). However, as we did not include any seed ionome data in the paper, we added clear mentions that our analyses were made on leaves (lines 56/57/250/319) and a comment in the discussion section to address this limitation (lines 325-328).

      Major Drawbacks and Questions:

      (1) Evidence for the Central Premise:

      The foundational premise of the study is the assertion that rising atmospheric CO2 levels result in a decline in plant mineral content. This phenomenon is primarily observed in C3 plants, with C4 plants seemingly less affected. The evidence provided on this topic is scant and, in some instances, contradicts the authors' own references. The potential reduction of certain minerals, especially in grains, can be debated. For instance, reduced nitrogen (N) and phosphorus (P) content in grains might not necessarily be detrimental for human and animal consumption. In fact, it could potentially mitigate issues like nitrogen emissions and phosphorus leaching. Labeling this as a "major threat to food security" (line 30) is exaggerated. While the case for microelements might be more compelling, the introduction fails to articulate this adequately. Furthermore, the introduction lacks any discussion on how eCO2 might influence nutrient allocation to grains, which would be crucial in substantiating the claim that eCO2 poses a threat to food security. A more comprehensive introduction that clearly delineates the adverse effects of eCO2 and its implications for food security would greatly enhance the manuscript.

      We partially agree with this comment. The decline in mineral status of C3 plants under conditions of elevated atmospheric CO2 has been widely described in the literature, and specifically documented for the cereal grains. While there are variations in this effect (depending on species, ecotype, cultivar), there is no debate about its acceptance. Here are just a few of the many works describing this effect, both on a global scale and at the level of the individual plant (Cotrufo MF (1998) Elevated CO2 reduces the nitrogen concentration of plant tissues. Global Change Biology 4: 43-54; Loladze I (2014) Hidden shift of the ionome of plants exposed to elevated CO(2) depletes minerals at the base of human nutrition. eLife 3: e02245; Myers SS (2014) Increasing CO2 threatens human nutrition. Nature 510: 139-142; Poorter H (1997) The effect of elevated CO2 on the chemical composition and construction costs of leaves of 27 C3 species. Plant, Cell & Environment 20: 472-482; Soares JC (2019) Preserving the nutritional quality of crop plants under a changing climate: importance and strategies. Plant and Soil 443: 1-26; Stitt M (1999) The interaction between elevated carbon dioxide and nitrogen nutrition: the physiological and molecular background. Plant, Cell & Environment 22: 583-621; Uddling J (2018) Crop quality under rising atmospheric CO2. Curr Opin Plant Biol 45: 262-267).

      In addition to this, the threat to food security posed by this alteration in plant mineral status has also been well described in the literature by several modeling approaches (Beach RH (2019) Combining the effects of increased atmospheric carbon dioxide on protein, iron, and zinc availability and projected climate change on global diets: a modelling study. Lancet Planet Health 3: e307-e317; Ebi KL (2019) Elevated atmospheric CO(2) concentrations and climate change will affect our food's quality and quantity. Lancet Planet Health 3: e283-e284; Medek DE (2017) Estimated Effects of Future Atmospheric CO2 Concentrations on Protein Intake and the Risk of Protein Deficiency by Country and Region. Environ Health Perspect 125: 087002; Smith MR (2018) Impact of anthropogenic CO2 emissions on global human nutrition. Nature Climate Change 8: 834-839; Weyant C (2018) Anticipated burden and mitigation of carbon-dioxide-induced nutritional deficiencies and related diseases: A simulation modeling study. PLoS Med 15: e1002586; Zhu C (2018) Carbon dioxide (CO2) levels this century will alter the protein, micronutrients, and vitamin content of rice grains with potential health consequences for the poorest rice-dependent countries. Sci Adv 4: eaaq1012). To reinforce this point, we have added a sentence and references (lines 30-33). Nevertheless, we understand the reviewer's comment on the nuance to be given to the intensity of this potential threat. We have therefore modified the text, replacing "major threat" by "significant threat" (lines 3 and 29).

      We would also like to answer the reviewer’s comment on the potential environmental benefit associated with reduced N and P content in grains (mitigation of N emissions and P leaching). Indeed, if this reduced N and P content results from a lowered use efficiency of soil nutrients by plants, as suggested by several studies (Bloom 2010, Cassan 2023, Gojon 2023 and references therein), this may on the contrary favor N oxide emissions and P leaching from the soil.

      (2) Exaggerated Concerns:

      The paper begins with the concern that carbon fertilization will lead to carbon dilution in our foods. While we indeed face numerous genuine threats in the coming decades, this particular issue is manageable. The increase in CO2 alone offers many opportunities for boosting yield. However, the heightened heat and increased evapotranspiration will pose massive challenges in many environments.

      While there are indeed multiple threats that we are facing in the coming decades, we don't fully agree with this comment. At present, there is no evidence that the negative effect of CO2 on plant mineral content will be manageable. Furthermore, there is compelling evidence that altered mineral nutrition and mineral status of plants will be an important factor limiting the high CO2-induced increase in yield, as will heat or increased evapotranspiration (see for instance Coskun et al (2016) Nutrient constraints on terrestrial carbon fixation: The role of Nitrogen. J. Plant Physiol. 203: 95-109; Jiang M (2020) Low phosphorus supply constrains plant responses to elevated CO2: A meta-analysis. Glob Chang Biol 26: 5856-5873; Reich PB (2006) Nitrogen limitation constrains sustainability of ecosystem response to CO2. Nature 440: 922-925). Thus, although we do not negate the crucial importance of heat and water stress, we believe it is relevant to study the basic mechanisms responsible for the negative effect of CO2 on plant mineral composition.

      Figure 4 in fact suggests that 43% of the REGMAP panel (cluster 3) is already pre-adapted to very high CO2 levels. This suggests annual species could adapt very rapidly.

      We agree with the reviewer. However, this suggests that genetic variation exists in some ecotypes to support adaptation to elevated CO2. The purpose of this work is indeed to identify this genetic variation, in order to characterize the underlying mechanisms.

      (3) Assumptions on CO2 Levels:

      The assumption of 900ppm seems to be based on a very extreme climate change scenario. Most people believe we will overshoot the 1.5°C scenario, however, it seems plausible that 2.5 to 3°C scenarios are more likely. This would correspond to around 500ppm of CO2. https://www.nature.com/articles/s41597-022-01196-7/tables/4

      We agree with the reviewer that the CO2 concentration we used corresponds to a high value in the IPCC projections. That said, this value is currently considered very plausible: the following figure (from Smith and Myers (2018) Nature Climate Change) shows that current CO2 emissions align with the IPCC's most extreme model (RCP 8.5), which would result in a CO2 concentration of around 900 ppm in 2100. Furthermore, nothing in the 6th IPCC report allows the 4°C scenario to be excluded.

      Author response image 1.

      (4) Focus on Real Challenges:

      We have numerous real challenges, such as extreme heat and inconsistent rainfall, to address in the context of climate change. However, testing under extreme CO2 conditions and then asserting that carbon dilution will negatively impact nutrition is exaggerated.

      While we fully agree that several threats linked to climate change exist, and all deserve to be studied, we find it questionable to consider that the potential effect of high CO2 on the mineral nutrition of plants is not a real challenge. The mineral nutrition of plants is already a current major environmental challenge. This perspective seems to reflect the reviewer's personal opinion rather than an analysis of our work.

      In contrast, the FACE experiments are fundamental and are conducted at more realistic eCO2 levels. Understanding the interaction between a 20% increase in CO2 and new precipitation patterns is key for global carbon flux prediction.

      Again, we do not fully understand this comment, as the aim of our study was not to perform a global carbon flux prediction, but to unravel genes and mechanisms underlying the negative effect of elevated CO2 on the nutrient content of Arabidopsis rosettes. However, we agree with the reviewer that FACE facilities are useful for exploring the CO2 response in more natural environments, and we highlight the fact that the decrease in mineral status of C3 plants has been widely documented in FACE studies. FACE facilities do not, however, allow fully controlled experiments (temperature, rainfall, wind and light intensity are not controllable in FACE), which are needed to disentangle the mechanisms by which elevated CO2 regulates the signaling pathways associated with plant mineral composition. In the longer term, studying the mechanisms we have identified in a more global context of climate change could be highly relevant.

      As I look at the literature on commercial greenhouse tomato production, 1000ppm of eCO2 is common, but it also looks like the breeders and growers have already solved for flavor and nutrition under these conditions.

      Indeed, tomato is often cultivated in CO2-enriched greenhouses at 1000 ppm. According to the literature, this results in a 20-25% reduction in vitamin C or lycopene, and requires a significantly higher nitrogen and water intake to reach expected sugar levels (Doddrell H (2023) Horticulture Research). In addition, the negative effect of elevated CO2 on tomato nutrient content seems to have significant repercussions on nutrition-health properties (Boufeldja (2023), Molecules).

      Conclusion:

      While the study provides valuable insights into the genetic underpinnings of Arabidopsis thaliana's response to elevated CO2 levels, it requires an entirely revised writeup, especially in its abstract, broader claims and implications. The manuscript would benefit from a more thorough introduction, a clearer definition of its scope, and a clear focus on the limits of this study.

      We thank the reviewer for the comments made on our manuscript. In addition to the responses that we provide to these comments, we have modified the main text of the introduction, objectives and discussion to take these comments into consideration. We believe that this will significantly improve the manuscript.

      Reviewer #2 (Public Review):

      Strengths:

      The authors have conducted a large, well-designed experiment to test the response to eCO2. Overall, the experimental design is sound and appropriate for the questions about how a change in CO2 affects the ionome of Arabidopsis. Most of the conclusions in this area are well supported by the data that the authors present.

      We thank the reviewer for this positive appreciation.

      Weakness:

      While the authors have done good experiments, it is a big stretch from Arabidopsis grown in an arbitrary concentration of CO2 to relevance to human and animal nutrition in future climates. Arabidopsis is a great model plant, but its leaves are not generally eaten by humans or animals.

      We agree with the reviewer’s comment. We recognize that implying a direct contribution of our work to human nutrition in future climates is overstated, as also mentioned by Reviewer 1. This was not an intentional overstatement, as we have always been convinced that our work contributes to the understanding of the basic mechanisms involved in the negative regulation of plant mineral nutrition by high CO2. We have significantly modified the text to correct any misunderstanding of our work’s implications.

      The authors don't justify their choice of a CO2 concentration. Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed. And CO2 is just one of the variables that plants will have to contend with in future climates, other variables will also affect elemental concentrations.

      We agree with this comment. We added a justification of the high CO2 concentration used in this work in the Material and Methods section (lines 343-344). The explanation of this choice is also given in our response to Reviewer 1’s point 3.

      Given these concerns, I think the emphasis on biofortification for future climates is unwarranted for this study.

      Again, we agree with this comment and we have significantly modified the text to correct any misunderstanding of our work’s implications.

      Additionally, I have trouble with these conclusions:

      -Abstract "Finally, we demonstrate that manipulating the function of one of these genes can mitigate the negative effect of elevated CO2 on the plant mineral composition."

      -Discussion "Consistent with these results, we show that manipulating TIP2;2 expressions with a knock-out mutant can modulate the Zn loss observed under high CO2."

      The authors have not included the data to support this conclusion as stated. They have shown that this mutant increases the Zn content of the leaves when compared to WT but have not demonstrated that this response is different than in ambient CO2. This is an important distinction: one way to ameliorate the reduction of nutrients due to eCO2 is to try to identify genes that are involved in the mechanism of eCO2-induced reduction. Another way is to increase the concentration of nutrients so that the eCO2-induced reduction is not as important (i.e. a 10% reduction in Zn due to eCO2 is not as important if you have increased the baseline Zn concentration by 20%). The authors identified tip2 as a target from the GWAS on difference, but their validation experiment only looks at eCO2.

      We thank the reviewer for this comment, and we agree with it. It is much more interesting, especially in the context of this paper, to analyze the function of a candidate gene not only in elevated CO2, but in both ambient and elevated CO2. Therefore, we added in Figure 7 data for the expression of TIP2;2 in contrasting haplotypes under ambient CO2, in comparison to those already presented under elevated CO2 (now Fig. 7C and 7D). This showed that TIP2;2 expression is lower in haplotype 0 also under ambient CO2. We also added in Figure 7 (Fig. 7E) the Zn level in WT and tip2;2-1 mutant under ambient CO2, in comparison to those already presented under elevated CO2. This showed that the tip2;2-1 mutant line did not present any decrease in Zn shoot content in response to elevated CO2, in contrast to what is observed for the WT.

      We have added comments on these new results in the Results section (lines 233-242) and in the Discussion section (lines 310-314).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer Comments on the Article's Approach to Ionome Analysis

      (1) Omission of Phosphorus from the Ionome:

      It's surprising that phosphorus (P) was not measured in the ionome. After nitrogen (N), P is often the most limiting mineral for plant development and yield, making it a significant component of the ionome. Why did the authors omit this crucial element?

      We agree with the reviewer that P is an important mineral for plant growth. The absence of data related to P content is due to feasibility constraints rather than oversight. The MP-AES instrument we used to analyze the ionome (except N and C, which we obtained from an Elementar Analyzer) would have required an extra step and an extra analysis to obtain data for macronutrients such as P or K. In the context of this large-scale experiment, we had to compromise and proceed without these data.

      (2) Relationship Between Leaf Ionome and Seed:

      The manuscript lacks evidence demonstrating the relationship between the leaf ionome and the seed. This connection is vital to establish the study's aims as outlined in lines 20-24. If the central argument is that eCO2 threatens food security, it's essential for the authors to either:

      • Provide evidence that eCO2 induces changes in the ionome profiles of seeds.

      • Show that changes in the rosette leaf ionome lead to alterations in seed ionome profiles.

      We agree with the reviewer. Although we know that the seed ionome composition of Arabidopsis model accessions such as Columbia is indeed negatively affected by eCO2, we do not provide data that support some of the terms used in lines 20-24. The correspondence between leaf and seed ionome in natural populations under eCO2 is certainly a question that we will address next. Therefore, to align our stated objectives with our data, we have modified the sentence in lines 20-24. We also added a comment on this point in the Discussion section (lines 324-328).

      (3) Analysis of Ionome in Rosette Leaves:

      Why did the authors choose to analyze the ionome specifically in rosette leaves? Is there a known correlation between the ionome profile in rosette leaves and seeds?

      See our answer to the above comment.

      (4) Experimental Design Comments:

      • The layout of the accession growouts, the methods of randomization, blocking, and controls/checks should be detailed.

      • Were BLUEs (Best Linear Unbiased Estimators) or BLUPs (Best Linear Unbiased Predictors) employed to account for experimental design conditions? If not, it's recommended that they be used.

      We thank the reviewer for this comment. A note on replicates has been added in the Method/Plant Material section. Concerning the BLUEs/BLUPs, although I am not familiar with their use, I do not think that these approaches are relevant in our experimental design. Indeed, we pooled 3 to 5 replicates for each accession to measure the ionome (as mentioned in the Method/Ionome analysis section – we realized this was perhaps not clear enough, and thus we reinforced this point in this section). Therefore, we do not have the variance data required to perform BLUEs/BLUPs.

      (5) Carbon Dilution Effect:

      The statement, "The first component of the PCA described a clear antagonistic trend between C content and the change of other mineral elements (Fig. 3B)..." suggests a well-understood carbon dilution effect. These results are anticipated and align with existing knowledge.

      We thank the reviewer for this comment. However, this sentence does not relate to the biomass dilution hypothesis referred to by the reviewer. Indeed, the composition of each mineral (C and others) is expressed as a percentage of biomass, not as an absolute value. Therefore, this reflects more a probable effect of the increase in carbon compounds (notably soluble sugars), which could influence mineral composition.

      (6) Heritability Estimates:

      The authors should report both the broad-sense heritability and an estimate of heritability based on a GRM or Kinship matrix.

      We thank the reviewer for this suggestion. We are skeptical of using a kinship matrix to estimate heritability in our study. Estimating narrow-sense heritability using a kinship matrix is conceptually based on the infinitesimal model of Fisher, thereby meaning that phenotypic variation is driven by hundreds to thousands of QTLs with small effects. If this is the case, GWAS conducted on several hundred (or even thousands) of genotypes will not be powerful enough to detect such QTLs. Accordingly, estimates of broad-sense heritability based on estimates of variance components can drastically differ from estimates of narrow-sense heritability based on the use of a kinship matrix, as illustrated in the study of Bergelson et al. (2019 Scientific Reports).

      (7) Application of the Breeder's Equation:

      It would be beneficial if the authors applied the breeder's equation to estimate the species' potential rate of response. Based on the allele frequency of the adapted cluster 3 (69 ecotypes or 43% frequency of Figure 3B), it seems plausible that the populations could adapt within 23 generations.

      We thank the reviewer for this suggestion. Indeed, it would be really interesting to test whether sub-populations could adapt in comparison with others, and over what period of time. It is nevertheless not possible to do so using the Breeder’s equation in our case, as this requires fitness data under conditions of ambient or elevated CO2 (i.e. production of seeds) to be applied, and we do not have these data at the level of the whole population.

      (8) Overall Quality:

      In general, the authors have executed a high-quality ionome mapping experiment. However, the abstract, introduction, and discussion should be entirely rewritten and reframed.

      We thank the reviewer for the positive evaluation of our experiment. As previously mentioned, we are for the most part in agreement with the comments made about the need to align our stated objectives with our experimental data and conclusions. To do so, we have rewritten part of the abstract, introduction and discussion. The details of these modifications are described in the responses made to each comment.

      Here's a line-by-line list of suggestions on writing:

      Line 30 would read better with a comma after thus (or by replacing thus with therefore and then a comma at the start of the sentence).

      Line 33 nevertheless would read better in between commas.

      Lines 45 - 48 sentence is too long, could probably divide it into two.

      Lines 90 - 94 are hard to interpret, recommend rephrasing for clarity.

      Line 130 - keep verbs in the past tense for consistency (ran instead of run).

      Line 194 - what do the authors mean by crossed? I'm inferring they looked at the intersection of DEGs with the list of genes identified by GWA mapping, probably should use a more concise word.

      There's a concurrent use of the adjective strong (Lines 80, 142, 144, 197, 245). I would advise using a more concise adjective or avoiding its use to let the reader form their own opinion on the data.

      Lines 174-176 the cited reference (No. 15) is incorrect. The study by Katz et al. (2022) does not provide information on the role of ZIF1 in zinc sequestration mechanisms under elevated CO2 conditions.

      We thank the reviewer for these detailed recommendations. We have corrected or rephrased the text according to these suggestions.

      Reviewer #2 (Recommendations For The Authors):

      Technical points:

      900 ppm as elevated CO2: Given the importance of the parameter for the experiment, the rationale for selection 900 ppm as elevated CO2 compared to any other concentration should be addressed.

      We acknowledge the reviewer's point and have addressed related aspects earlier in our response. In line with this, we have included a justification for this particular parameter in the Method section.

      The authors do not mention what genotype was used for their root/shoot RNAseq experiment.

      We thank the reviewer for this comment, and indeed, this information was not mentioned. This is now done, in the Method section.

      Line 125: Spelling error "REGMPA".

      This has been corrected.

      Line 338: Removal of outlier observations - "Prior to GWAS and multivariate analyses such as PCA or clustering, mineral composition measures were pre-processed to remove technical outliers". The authors should mention the exact number of outliers that were removed and what the explicit criteria were for removal.

      The number of outliers removed from each dataset is now indicated in Supplemental Table 7 (this is cited in the Method section). The explicit criterion used for this analysis is stated in the corresponding Method section: “the values positioned more than 5 median absolute deviations away from the median were removed from the dataset”.
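The quoted criterion can be sketched in a few lines of Python. This is only an illustration of a median-absolute-deviation filter, not the authors' actual script: the function name, the toy Zn values, and the use of the unscaled MAD (no 1.4826 consistency factor) are assumptions.

```python
import numpy as np

def remove_mad_outliers(values, n_mads=5.0):
    """Drop values lying more than n_mads median absolute deviations from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Unscaled MAD; whether the original analysis rescaled it is not stated (assumption).
    mad = np.median(np.abs(values - median))
    keep = np.abs(values - median) <= n_mads * mad
    return values[keep], int((~keep).sum())

# Hypothetical mineral measurements with one obvious technical outlier.
zn = [52.1, 49.8, 50.5, 51.2, 48.9, 50.0, 120.0]
cleaned, n_removed = remove_mad_outliers(zn)
print(cleaned, n_removed)
```

With these toy values the median is 50.5 and the MAD is 0.7, so only the value 120.0 falls outside the 5-MAD window.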

      Line 379: "Lowly expressed genes with an average value across conditions under 25 reads were excluded from the analysis". Providing information about the number of the lowly expressed genes that were removed from the analysis can help with the interpretation of the likelihood of the candidates selected being correct.

      This is a standard procedure in RNAseq analysis. It avoids many false positives in the differential analysis of gene expression based on ratios (where a very small number in the denominator can lead to a very high variation in expression, of no real significance). For information, this step led to the removal of 11607 and 10121 genes for the shoot and root datasets.
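A minimal sketch of such a low-expression filter is shown below, assuming a genes-by-samples count matrix whose columns are averaged to obtain the "average value across conditions". The function name, gene identifiers, and counts are hypothetical and only illustrate the 25-read threshold described above.

```python
import numpy as np

def filter_low_expression(counts, gene_ids, min_mean_reads=25):
    """Keep genes whose mean read count across columns reaches the threshold."""
    counts = np.asarray(counts, dtype=float)
    mean_reads = counts.mean(axis=1)
    keep = mean_reads >= min_mean_reads  # genes under 25 reads on average are excluded
    kept_ids = [g for g, k in zip(gene_ids, keep) if k]
    return counts[keep], kept_ids, int((~keep).sum())

# Hypothetical toy matrix: rows are genes, columns are samples/conditions.
gene_ids = ["geneA", "geneB", "geneC"]
counts = [
    [2, 4, 1, 3],          # a 1 -> 4 read change looks like a 4-fold ratio, but is noise
    [300, 280, 150, 160],  # well-supported expression change
    [20, 30, 24, 25],      # mean of 24.75 reads, just under the threshold
]
filtered, kept_ids, n_removed = filter_low_expression(counts, gene_ids)
print(kept_ids, n_removed)
```

The first row shows why the filter matters: a ratio computed on a handful of reads can be large without carrying real biological signal.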

      Line 384: It's not clear how many biological replicates were used.

      This has been corrected.

      Additional comment: We have also become aware of a confusion concerning one of the candidate genes located close to GWA peaks: line 180 of the first version, we mentioned CAX1 (AT1G16380) for its role on nutrient deficiency response. There are actually two genes annotated as CAX1 in TAIR (both are cation exchangers), but the one involved in nutrient deficiency response is AT2G38170. We therefore removed the sentence mentioning AT1G16380/CAX1 as a potential candidate gene.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments and suggestions. We have prepared a revised manuscript with updated quantification of theta cycle skipping, new statistical comparisons of the difference between the two behavioral tasks, and general improvements to the text and figures.

      Reviewer #1 (Public Review):

      Summary

      The authors provide very compelling evidence that the lateral septum (LS) engages in theta cycle skipping.

      Strengths

      The data and analysis are highly compelling regarding the existence of cycle skipping.

      Weaknesses

      The manuscript falls short in describing the behavioral or physiological importance of the witnessed theta cycle skipping, and there is a lack of attention to detail with some of the findings and figures:

      More/any description is needed in the article text to explain the switching task and the behavioral paradigm generally. This should be moved from only being in methods as it is essential for understanding the study.

      Following this suggestion, we have expanded the description of the behavioral tasks in the Results section.

      An explanation is needed as to how a cell can be theta skipping if it is not theta rhythmic.

      A cell that is purely theta skipping (i.e., always fires on alternating theta cycles and never on adjacent theta cycles) will only have enhanced power at half theta frequency and not at theta frequency. Such a cell will therefore not be considered theta rhythmic in our analysis. Note, however, that there is a large overlap between theta rhythmic and theta skipping cell populations in our data (Figure 3 - figure supplement 2), indicating that most cells are not purely theta skipping.
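The argument can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes an 8 Hz theta rhythm, models firing as Gaussian rate bumps of arbitrary width (`sigma`), and compares spectral power at theta frequency and at half theta frequency for a cell that fires on every cycle and for one that fires on alternating cycles. How much residual power remains at theta frequency for the skipping cell depends on the assumed burst shape.

```python
import numpy as np

def burst_rate(t, centers, sigma=0.04):
    """Sum of Gaussian firing-rate bumps (arbitrary units) centered at the given times."""
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / sigma) ** 2).sum(axis=1)

def power_at(rate, dt, freq):
    """Spectral power of the mean-subtracted rate at the FFT bin closest to freq (Hz)."""
    spectrum = np.abs(np.fft.rfft(rate - rate.mean())) ** 2
    freqs = np.fft.rfftfreq(rate.size, d=dt)
    return spectrum[np.argmin(np.abs(freqs - freq))]

dt, n_samples, theta_hz = 0.001, 20000, 8.0   # 20 s at 1 ms resolution (assumed values)
t = np.arange(n_samples) * dt
cycle_starts = np.arange(0.0, n_samples * dt, 1.0 / theta_hz)

every_cycle = burst_rate(t, cycle_starts)      # fires on each theta cycle
skipping = burst_rate(t, cycle_starts[::2])    # fires on alternating theta cycles

for name, rate in [("every cycle", every_cycle), ("skipping", skipping)]:
    print(name, power_at(rate, dt, theta_hz / 2), power_at(rate, dt, theta_hz))
```

In this idealized setting the skipping cell has its dominant peak at half theta frequency, while the cell firing on every cycle has essentially no power there.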

      The most interesting result, in my opinion, is the last paragraph of the entire results section, where there is more switching in the alternation task, but the reader is kind of left hanging as to how this relates to other findings. How does this relate to differences in decoding of relative arms (the correct or incorrect arm) during those theta cycles or to the animal's actual choice? Similarly, how does it relate to the animal's actual choice? Is this phenomenon actually behaviorally or physiologically meaningful at all? Does it contribute at all to any sort of planning or decision-making?

      We agree that the difference between the two behavioral tasks is very interesting. It may provide clues about the mechanisms that control the cycle-by-cycle expression of possible future paths and the potential impact of goal-directed planning and (recent) experience. In the revised manuscript, we have expanded the analysis of the differences in theta-cycle dynamics between the two behavioral tasks. First, we confirm the difference through a new quantification and statistical comparison. Second, we performed additional analyses to explore the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal (Figure 11 – figure supplements 2 and 3), but this did not appear to be the case. However, these results provide a starting point for future studies to clarify the task dependence of the theta-cycle dynamics of spatial representations and to address the important question of behavioral/physiological relevance.

      The authors state that there is more cycle skipping in the alternation task than in the switching task, and that this switching occurs in the lead-up to the choice point. Then they say there is a higher peak at ~125 in the alternation task, which is consistent. However, in the final sentence, the authors note that "This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to reward." Doesn't either arm potentially lead to a reward (but different amounts) in the switching task, not the alternation task? Yet switching is stronger in the alternation task, which is not constant and contradicts this last sentence.

      The reviewer is correct that both choices lead to (different amounts of) reward in the switching task. As written, the sentence that the reviewer refers to is indeed not accurate and we have rephrased it to: “This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to a desirable high-value reward.”.

      Additionally, regarding the same sentence - "representations of the goal arms alternate more strongly ahead of the choice point when the animals performed a task in which either goal arm potentially leads to reward." - is this actually what is going on? Is there any reason at all to think this has anything to do with reward versus just a navigational choice?

      We appreciate the reviewer’s feedback and acknowledge that our statement needs clarification. At the choice point in the Y-maze there are two physical future paths available to the animal (disregarding the path that the animal took to reach the choice point) – we assume this is what the reviewer refers to as “a navigational choice”. One hypothesis could be that alternation of goal arm representations is present whenever there are multiple future paths available, irrespective of the animal’s (learned) preference to visit one or the other goal arm. However, the reduced alternation of goal arm representations in the switching task that we report suggests that the animal’s recent history of goal arm visits and reward expectations likely do influence the theta-cycle representations ahead of the choice point. We have expanded our analysis to test whether theta cycle dynamics differ for trials before and after a switch in reward contingency in the switching task, but there was no statistical difference in our data. We have rewritten and expanded this part of the results to make our point more clearly.

      Similarly, the authors mention several times that the LS links the HPC to 'reward' regions in the brain, and it has been found that the LS represents rewarded locations comparatively more than the hippocampus. How does this relate to their finding?

      Indeed, Wirtshafter and Wilson (2020) reported that lateral septum cells are more likely to have a place field close to a reward site than elsewhere in their double-sided T-maze. It is possible that this indicates a shift towards reward or value representations in the lateral septum. In our study we did not look at reward-biased cells and whether they are more or less likely to engage in theta cycle skipping. This could be a topic for future analyses. It should be noted that the study by Wirtshafter and Wilson (2020) reports that a reward bias was predominantly present for place fields in the direction of travel away from the reward site. These reward-proximate LS cells may thus contribute to theta-cycle skipping in the inbound direction, but it is not clear if these cells would be active during theta sweeps when approaching the choice point in the outbound direction.

      Reviewer #2 (Public Review)

      Summary

      Recent evidence indicates that cells of the navigation system representing different directions and whole spatial routes fire in a rhythmic alternation during 5-10 Hz (theta) network oscillation (Brandon et al., 2013, Kay et al., 2020). This phenomenon of theta cycle skipping was also reported in broader circuitry connecting the navigation system with the cognitive control regions (Jankowski et al., 2014, Tang et al., 2021). Yet nothing was known about the translation of these temporally separate representations to midbrain regions involved in reward processing as well as the hypothalamic regions, which integrate metabolic, visceral, and sensory signals with the descending signals from the forebrain to ensure adaptive control of innate behaviors (Carus-Cadavieco et al., 2017). The present work aimed to investigate theta cycle skipping and alternating representations of trajectories in the lateral septum, neurons of which receive inputs from a large number of CA1 and nearly all CA3 pyramidal cells (Risold and Swanson, 1995). While spatial firing has been reported in the lateral septum before (Leutgeb and Mizumori, 2002, Wirtshafter and Wilson, 2019), its dynamic aspects have remained elusive. The present study replicates the previous findings of theta-rhythmic neuronal activity in the lateral septum and reports a temporal alternation of spatial representations in this region, thus filling an important knowledge gap and significantly extending the understanding of the processing of spatial information in the brain. The lateral septum thus propagates the representations of alternative spatial behaviors to its efferent regions. The results can instruct further research of neural mechanisms supporting learning during goal-oriented navigation and decision-making in the behaviourally crucial circuits entailing the lateral septum.

      Strengths

      To this end, cutting-edge approaches for high-density monitoring of neuronal activity in freely behaving rodents and neural decoding were applied. Strengths of this work include comparisons of different anatomically and probably functionally distinct compartments of the lateral septum, innervated by different hippocampal domains and projecting to different parts of the hypothalamus; large neuronal datasets including many sessions with simultaneously recorded neurons; consequently, the rhythmic aspects of the spatial code could be directly revealed from the analysis of multiple spike trains, which were also used for decoding of spatial trajectories; and comparisons of the spatial coding between the two differently reinforced tasks.

      Weaknesses

      Possible in principle, with the present data across sessions, longitudinal analysis of the spatial coding during learning the task was not performed. Without using perturbation techniques, the present approach could not identify the aspects of the spatial code actually influencing the generation of behaviors by downstream regions.

      Reviewer #3 (Public Review)

      Summary

      Bzymek and Kloosterman carried out a complex experiment to determine the temporal spike dynamics of cells in the dorsal and intermediate lateral septum during the performance of a Y-maze spatial task. In this descriptive study, the authors aim to determine if inputting spatial and temporal dynamics of hippocampal cells carry over to the lateral septum, thereby presenting the possibility that this information could then be conveyed to other interconnected subcortical circuits. The authors are successful in these aims, demonstrating that the phenomenon of theta cycle skipping is present in cells of the lateral septum. This finding is a significant contribution to the field as it indicates the phenomenon is present in neocortex, hippocampus, and the subcortical hub of the lateral septal circuit. In effect, this discovery closes the circuit loop on theta cycle skipping between the interconnected regions of the entorhinal cortex, hippocampus, and lateral septum. Moreover, the authors make 2 additional findings: 1) There are differences in the degree of theta modulation and theta cycle skipping as a function of depth, between the dorsal and intermediate lateral septum; and 2) The significant proportion of lateral septum cells that exhibit theta cycle skipping, predominantly do so during 'non-local' spatial processing.

      Strengths

      The major strength of the study lies in its design, with 2 behavioral tasks within the Y-maze and a battery of established analyses drawn from prior studies that have established spatial and temporal firing patterns of entorhinal and hippocampal cells during these tasks. Primary among these analyses, is the ability to decode the animal's position relative to locations of increased spatial cognitive demand, such as the choice point before the goal arms. The presence of theta cycle skipping cells in the lateral septum is robust and has significant implications for the ability to dissect the generation and transfer of spatial routes to goals within and between the neocortex and subcortical neural circuits.

      Weaknesses

      There are no major discernable weaknesses in the study, yet the scope and mechanism of the theta cycle phenomenon remain to be placed in the context of other phenomena indicative of spatial processing independent of the animal's current position. An example of this would be the ensemble-level 'scan ahead' activity of hippocampal place cells (Gupta et al., 2012; Johnson & Redish, 2007). Given the extensive analytical demands of the study, it is understandable that the authors chose to limit the analyses to the spatial and burst firing dynamics of the septal cells rather than the phasic firing of septal action potentials relative to local theta oscillations or CA1 theta oscillations. Yet, one would ideally be able to link, rather than parse the phenomena of temporal dynamics. For example, Tingley et al recently showed that there was significant phase coding of action potentials in lateral septum cells relative to spatial location (Tingley & Buzsaki, 2018). This begs the question as to whether the non-uniform distribution of septal cell activity within the Y-maze may have a phasic firing component, as well as a theta cycle skipping component. If so, these phenomena could represent another means of information transfer within the spatial circuit during cognitive demands. Alternatively, these phenomena could be part of the same process, ultimately representing the coherent input of information from one region to another. Future experiments will therefore have to sort out whether theta cycle skipping, is a feature of either rate or phase coding, or perhaps both, depending on circuit and cognitive demands.

      The authors have achieved their aims of describing the temporal dynamics of the lateral septum, at both the dorsal extreme and the intermediate region. All conclusions are warranted.

      Reviewer #1 (Recommendations For The Authors)

      The text states: "We found that 39.7% of cells in the LSD and 32.4% of cells in LSI had significantly higher CSI values than expected by chance on at least one of the trajectories." The text in the supplemental figure indicates a p-value of 0.05 was used to determine significance. However, four trajectory categories are being examined so a Bonferroni correction should be used (significance at p<0.0125).

      Indeed, a p-value correction for multiple tests should be performed when determining theta cycle skipping behavior for each of the four trajectories. We thank the reviewer for pointing out this oversight. We have implemented a Holm-Sidak p-value correction for the number of tested trajectories per cell (excluding trajectories with insufficient spikes). As a consequence, the number of cells with significant cycle-skipping activity decreased, but overall the results have not changed.
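
      For readers unfamiliar with the step-down procedure named above, the following is a minimal sketch of a Holm-Sidak adjustment for the per-cell trajectory tests. It is not the authors' analysis code: the function name, the example p-values, and the 0.05 threshold applied to the adjusted values are illustrative assumptions. The number of tests is simply the length of the list, so trajectories excluded for insufficient spikes would be left out before calling it.

```python
def holm_sidak(p_values, alpha=0.05):
    """Step-down Holm-Sidak adjustment (illustrative helper, not from the paper).

    Returns (adjusted_p, reject), both in the original order of p_values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment over the (m - rank) hypotheses still under test
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Adjusted p-values must not decrease along the sorted order
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    reject = [a <= alpha for a in adjusted]
    return adjusted, reject


# Hypothetical p-values for one cell tested on four trajectories
p_cell = [0.010, 0.040, 0.300, 0.020]
adjusted, reject = holm_sidak(p_cell)
```

      With these made-up values only the smallest p-value remains significant after adjustment, which illustrates why the count of significant cells can drop relative to uncorrected testing.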

      Figure 4 is very confusing as raster plots are displayed for multiple animals but it is unclear which animal the LFP refers to? The bottom of the plot is also referenced twice in the figure caption.

      We apologize for the confusion. We have removed this figure in the revised manuscript, as it was not necessary to make the point about the spatial distribution of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2) and we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells in Figure 5A.

      Figure 6 has, I think, an incorrect caption or figure. Only A and B are marked in the figure but A-G are mentioned in the caption but do not appear to correspond to anything in the figure.

      Indeed, the caption was outdated. This has now been corrected.

      Figure 8 is also confusing for several reasons: how is the probability scale on the right related to multiple semi-separate (top and middle) figures? In the top and bottom figures, it is not clear what the right and left sides refer to. It is also unclear why a probability of 0.25 is used for position (seems potentially low). The caption also mentions Figure A but there are no lettered "sub" figures in Figure 8.

      The color bar on the right applies to both the top plot (directional decoding) and the middle plot (positional decoding). However, the maximum probability that is represented by black differs between the top and middle plots. We acknowledge that a shared color bar may lead to confusion and we have given each of the plots a separate color bar.

      As for the maximum probability of 0.25 for position: this was a typo in the legend. The correct maximum value is 0.5. In general, the posterior probability will be distributed over multiple (often neighboring) spatial bins, and the distribution of maximum probabilities will depend on the number of spatial bins, the level of spatial smoothing in the decoding algorithm, and the amount of decodable information in the data. It would be more appropriate to consider the integrated probability over a small section of the maze, rather than the peak probability that is assigned to a single 5 cm bin. Also, note that a posterior probability of 0.5 is many times higher than the probability associated with a uniform distribution, which is in our case.
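
      The distinction between peak and integrated posterior probability can be sketched as follows. The eight-bin posterior and the one-bin half-width are invented for illustration and do not reflect the bin count or decoder settings used in the study.

```python
def uniform_probability(n_bins):
    """Per-bin probability under a uniform distribution over n_bins."""
    return 1.0 / n_bins


def integrated_probability(posterior, center, half_width):
    """Sum the posterior over bins within half_width of the center bin."""
    lo = max(0, center - half_width)
    hi = min(len(posterior), center + half_width + 1)
    return sum(posterior[lo:hi])


# Hypothetical posterior over 8 spatial bins (sums to 1)
posterior = [0.0, 0.05, 0.2, 0.5, 0.2, 0.05, 0.0, 0.0]

peak_bin = max(range(len(posterior)), key=lambda i: posterior[i])
peak = posterior[peak_bin]                        # probability in the single best bin
mass = integrated_probability(posterior, peak_bin, half_width=1)
ratio = peak / uniform_probability(len(posterior))  # peak relative to chance
```

      In this toy case the peak bin holds half of the probability, while the peak bin together with its two neighbors holds most of it, so the integrated value gives a fairer picture of how concentrated the estimate is.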

      The left and right sides of the plots represent two different journeys that the animal ran. On the left an outbound journey is shown, and on the right an inbound journey. We have improved the figure and the description in the legend to make this clearer.

      The reviewer is correct that there are no panels in Figure 8 and we have corrected the legend.

      Some minor concerns

      The introduction states that "a few studies have reported place cell-like activity in the lateral septum (Tingley and Buzsaki, 2018; Wirtshafter and Wilson, 2020, 2019)." However, notably and controversially, the Tingley study is one of the few studies to find NO place cell activity in the lateral septum. This is sort of mentioned later but the citation in this location should be removed.

      The reviewer is correct, Tingley and Buzsaki reported a spatial phase code but no spatial rate code. We have removed the citation.

      Stronger position/direction coding in the dLS consistent with prior studies and they should be cited in text (not a novel finding).

      Thank you for pointing out this omission. Indeed, a stronger spatial coding in the dorsal lateral septum has been reported before, for example by Van der Veldt et al. (2021). We now cite this paper when discussing these findings.

      Why is the alternation task administered for 30m but the switching task for 45m?

      The reason is that rats received a larger reward in the switching task (in the high-reward goal arm) and took longer to complete trials on average. To obtain a more-or-less similar number of trials per session in both tasks, we extended the duration of switching task sessions to 45 minutes. We have added this explanation to the text.

      Regarding the percentage of spatially modulated cells in the discussion, it is also worth pointing out that bits/sec information is consistent with previous studies.

      Thank you for the suggestion. We now point out that the spatial information in our data is consistent with previous studies.

      Reviewer #2 (Recommendations For The Authors)

      While the results of the study are robust and timely, further details of behavioural training, additional quantitative comparisons, and improvements in the data presentation would make the study more comprehensible and complete.

      Major comments

      (1) I could not fully comprehend the behavioural protocols. They require a clearer explanation of both the specific rationale of the two tasks as well as a more detailed presentation of the protocols. Specifically:

      (1.1) In the alternation task, were the arms baited in a random succession? How many trials were applied per session? Fig 1D: how could animals reach high choice accuracy if the baiting was random?

      We used a continuous version of the alternation task, in which the animals were rewarded for left→home→right and right→home→left visit sequences. In addition, animals were always rewarded on inbound journeys. There was no random baiting of goal arms. Perhaps the confusion stems from our use of the word “trial” to refer to a completed lap (i.e., a pair of outbound/inbound journeys). On average, animals performed 54 such trials per 30-minute session in the alternation task. We have expanded the description of the behavioral tasks in the Results and further clarified these points in the Methods section.

      (1.2) Were they rewarded for correct inbound trials? If there was no reward, why were they considered correct?

      Yes, rats received a reward at the home platform for correct inbound trials. We have now explicitly stated this in the text.

      (1.3) In the switch alternation protocol, for how many trials was one arm kept more rewarding than the other, and how many trials followed after the rewarding value switch?

      A switch was triggered when rats (of their own volition) visited the high-reward goal arm eight times in a row. Following a switch, the animals could complete as many trials as necessary until they visited the new high-reward goal arm in eight consecutive trials, which triggered another switch. As can be seen in Figure 1D, at the population level, animals needed ~13 trials to fully commit to the high-reward goal arm following a switch. We have further clarified the switching task protocol in the Results and Methods sections.
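
      The switch rule described in this reply can be written as a short sketch. This is only an illustration of the stated rule (eight consecutive visits to the current high-reward arm trigger a switch); the function name, the 'L'/'R' encoding, the starting arm, and the example visit sequence are assumptions, not the task-control code.

```python
def find_switches(visits, high_arm="L", run_length=8):
    """Return the trial indices at which a reward-value switch is triggered.

    visits: sequence of 'L'/'R' goal-arm choices, one entry per trial.
    high_arm: arm that carries the large reward at the start (assumed).
    """
    other = {"L": "R", "R": "L"}
    streak = 0
    switch_trials = []
    for trial, arm in enumerate(visits):
        # Count consecutive visits to the current high-reward arm
        streak = streak + 1 if arm == high_arm else 0
        if streak == run_length:
            switch_trials.append(trial)
            high_arm = other[high_arm]  # large reward moves to the other arm
            streak = 0                  # counting restarts for the new arm
    return switch_trials


# Hypothetical session: sampling, then a bias to L, a switch, then a bias to R
visits = list("LRLLLLLLLL" + "LLRLRRRRRRRR")
switch_trials = find_switches(visits)
```

      Because the rule depends only on the animal's own choices, the number of switches per session is not fixed in advance.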

      (1.4) What does the phrase "the opposite arm (as 8 consecutive visits)" exactly mean? Sounds like 8 consecutive visits signalled that the arm was rewarded (as if it were not predefined in the protocol).

      The task is self-paced and the animals initially visit both goal arms, before developing a bias for the high-reward goal arm. A switch of reward size was triggered as soon as the animal visited the high-reward goal arm for eight consecutive trials. We have rewritten the description of the switching task protocol, including this sentence, which hopefully clarifies the procedure.

      (1.5) P. 15, 1st paragraph, Theta cycle skipping and alternation of spatial representations is more prominent in the alternation task. Why in the switching task, did rats visit the left and right arms approximately equally often if one was more rewarding than the other? How many switches were applied per recording session, and how many trials were there in total?

      Both the left and right goal arms were sampled more or less equally by the animals because both goal arms at various times were associated with a large reward following switches in reward values during sessions. The number of switches per session varied from 1 to 3. Sampling of both goal arms was also evident at the beginning of each session and following each reward value switch, before animals switched their behavior to the (new) highly rewarded goal arm. In Table 1, we have now listed the number of trials and the number of reward-value switches for all sessions.

      (1.6) Is the goal arm in figures the rewarded/highly rewarded arm only or are non-baited arms also considered here?

      Both left and right arms are considered goal arms and were included in the analyses, irrespective of the reward that was received (or not received).

      (2) The spatial navigation-centred behavioural study design and the interpretation of results highlight the importance of the dorsal hippocampal input to the LS. Yet, the recorded LSI cells are innervated by intermediate and ventral aspects of the hippocampus, and LS receives inputs from the amygdala and the prefrontal cortex, which together may bring about reward- and reward-prediction-related aspects (crucial for the adaptive behaviours regulated by the LS) in the firing of LS cells during spatial navigation. Does success or failure to acquire reward in a trial modify spatial coding and cycle skipping of LSD vs. LSI cells in ensuing inbound and outbound trials?

      This is an excellent question and given the length of the current manuscript, we think that exploration of this question is best left for a future extension of our study.

      A related question: in Figure 10, it is interesting that cycle skipping is prominent in the goal arm for outbound switching trials and inbound trials of both tasks. Could it be analytically explained by task contingencies and behaviour (e.g. correct/incorrect trial, learning dynamics, running speed, or acceleration)?

      Our observation of cycle skipping at the single-cell level in the goal arms is somewhat surprising and, we agree with the reviewer, potentially interesting. However, it was not accompanied by alternation of representations at the population level. Given the current focus and length of the manuscript, we think further investigation of cycle skipping in the goal arm is better left for future analyses.

      (3) Regarding possible cellular and circuit mechanisms of cycle skipping and their relation to the alternating representations in the LS. Recent history of spiking influences the discharge probability; e.g. complex spike bursts in the hippocampus are associated with a post-burst delay of spiking. In LS, cycle skipping was characteristic for LS cells with high firing rates and was not uniformly present in all trajectories and arms. The authors propose that cycle skipping can be more pronounced in epochs of reduced firing, yet the opposite seems also possible - this phenomenon can be due to an intermittently increased drive onto some LS cells. Was there a systematic relationship between cycle skipping in a given cell and the concurrent firing rate or a recent discharge with short interspike intervals?

      In our discussion, we tried to explain the presence of theta cycle skipping in the goal arms at the single-cell level without corresponding alternation dynamics at the population level. We mentioned the possibility of a decrease in excitatory drive. As the reviewer suggests, an increase in excitatory drive combined with post-burst suppression or delay of spiking is an alternative explanation. We analyzed the spatial tuning of cells with theta cycle skipping and found that, on average, these cells have a higher firing rate in the goal arm than the stem of the maze in both outbound and inbound run directions (Figure 5 – figure supplement 1). In contrast, cells that do not display theta cycle skipping do not show increased firing in the goal arm. These results are more consistent with the reviewer’s suggested mechanism and we have updated the discussion accordingly.

      (4) Were the differences between the theta modulation (cycle skipping) of local vs. non-local representations (P.14, line 10-12, "In contrast...", Figure 9A) and between alternation vs. switching tasks (Figure 10 C,D) significantly different?

      We have added quantification and statistical comparisons for the auto- and cross-correlations of the local/non-local representations. The results indeed show significantly stronger theta cycle skipping of the non-local representations as compared to the local representations (Figure 10 - figure supplement 1A), a stronger alternation of non-local representations in the outbound direction (Figure 10 - figure supplement 1B), and significant differences between the two tasks (Figure 11E,F).

      (5) Regarding the possibility of prospective coding in LS, is the accurate coding of run direction not consistent with prospective coding? Can the direction be decoded from the neural activity in the start arm? Are the cycling representations of the upcoming arms near the choice point equally likely or preferential for the then-selected arm?

      The coding of run direction (outbound or inbound) is distinct from the prospective/retrospective coding of the goal arm. As implemented, the directional decoding model does not differentiate between the two goal arms, and accurate decoding of direction with this model cannot inform us whether or not there is prospective (or retrospective) coding. To address the reviewer’s comments, we performed two additional analyses. First, we analyzed the directional (outbound/inbound) decoding performance as a function of location in the maze (Figure 6 - figure supplement 3E). The results show that directional decoding performance is high in both stem and goal arms. Second, we analyzed how well we can predict the trajectory type (i.e., to/from the left or right goal arm) as a function of location in the maze, and separately for outbound and inbound trajectories (Figure 6 - figure supplement 3C,D). The results show that on outbound journeys, decoding the future goal arm is close to chance when the animals are running along the stem. The decoding performance goes up around the choice point and reaches the highest level when animals are in the goal arm.

      (6) Figure 10 seems to show the same or similar data as Figures 5 (A,B) and 9 (C,D).

      Figure 10 (Figure 11 in the revised manuscript) re-analyzes the same data as presented in Figures 5 and 9, but separates the experimental sessions according to the behavioral task. We now explicitly state this.

      Minor comments

      (1) If cycle skipping in the periodicity of non-local representations was more prominent in alternation than in the switching task, one might expect them to be also prominent in early trials of the switching task, when the preference of a more rewarding arm is not yet established. Was this the case?

      The reviewer makes an interesting suggestion. Indeed, if theta cycle skipping and the alternation of non-local representations reflect that there are multiple paths that the animal is considering, one may predict that the theta skipping dynamics are similar between the two tasks in early trials (as the reviewer suggests). Similarly, one may predict that in the switching task, the alternation of non-local representations is weaker immediately before a reward contingency switch (when the animal has developed a bias towards the goal arm with a large reward) as compared to after the switch.

      We have now quantified the theta cycle dynamics of spatial representations in the early trials in each session of both tasks (Figure 11 - figure supplement 2) and in the trials before and after each switch in the switching task (Figure 11 - figure supplement 3).

      The results of the early trial analysis indicate stronger alternation of non-local representations in the alternation task than in the switching task (consistent with the whole session analysis), which is contrary to the prediction.

      The pre-/post-switch analysis did not reveal a significant difference between the trials before and after a reward contingency switch. If anything, there was a trend towards stronger theta cycle skipping/alternation in the trials before a switch, which would be opposite to the prediction.

      These results do not appear to support the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal. We have updated the text to incorporate these new data and discuss the implications.

      (2) Summary: sounds like the encoding of spatial information and its readout in the efferent regions are equally well established.

      Thank you for pointing this out.

      (3) Summary: "motivation and reward processing centers such as the ventral tegmental area." How about also mentioning here the hypothalamus, which is a more prominent output of the lateral septum than the VTA?

      We have now also mentioned the hypothalamus.

      (4) "lateral septum may contribute to the hippocampal theta" - readers not familiar with details of the medial vs. lateral septum research may misinterpret the modest role of LS in theta compared to MS.

      We have added “in addition to the strong theta drive originating from the medial septum” to make clear that the lateral septum has a modest role in hippocampal theta generation.

      (5) "(Tingley and Buzsáki, 2018) found a lack of spatial rate coding in the lateral septum and instead reported a place coding by specific phases of the hippocampal theta rhythm (Rizzi-Wise and Wang, 2021) " needs rephrasing.

      Thank you, we have rephrased the sentence.

      (6) Figure 4 is a bit hard to generalize. The authors may additionally consider a sorted raster presentation of the dataset in this main figure.

      We have removed this figure in the revised manuscript, as it was not necessary to make the point about the location of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2), and, following the reviewer’s suggestion, we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells (Figure 5A).

      (7) It would help if legends of Figure 5 (and related supplementary figures) state in which of the two tasks the data was acquired, as it is done for Figure 10.

      Thank you for the suggestion. The legends of Figure 4A,B (formerly Figure 5 – supplemental figures 1 and 2) and Figure 5 now include in which behavioral task the data was acquired.

      (8) Page 10, "Spatial coding...", 1st Citing the initial report by Leutgeb and Mizumori would be appropriate here too.

      The reviewer is correct. We have added the citation.

      (9) The legend in Figure 6 (panels A-G) does not match the figure (only panels A,B). What is shown in Fig. 6B, the legend does not seem to fully match.

      Indeed, the legend was outdated. This has now been corrected.

      (10) 7 suppl., if extended to enable comparisons, could be a main figure. Presently, Figure 7C does not account for the confounding effect of population size and is therefore difficult to interpret without complex comparisons with the Supplementary Figure which is revealing per se.

      We thank the reviewer for their suggestion. We have changed Figure 7 such that it only shows the analysis of decoding performed with all LSD and LSI cells. Figure 7 – supplemental figure 1 has been transformed into main Figure 8, with the addition of a panel to show a statistical comparison between decoding performance in LSD and LSI with a fixed number of cells.

      (11) 14, line 10 there is no Figure 8A

      This has been corrected.

      (12) 15 paragraph 1, is the model discussed here the one from Kay et al.?

      From Kay et al. (2020) and also Wang et al. (2020). We have added the citations.

      (13) Figure 5 - Figure Supplement 1 presents a nice analysis that, in my view, can merit a main figure. I could not find the description of the colour code in CSI panels, does grey/red refer to non/significant points?

      Indeed, grey/red refers to non-significant points and significant points respectively. We have clarified the color code in the figure legend. Following the reviewer’s suggestion, we have made Figure 5 Supplement 1 and 2 a main figure (Figure 4).

      (14) Figure 5 - Figure Supplement 2. Half of the cells (255 and 549) seem not to be representative of the typically high CSI in the goal arm in left and right inbound trials combined (Figure 5 A). Were the changes in CSI in the right and left inbound trials similar enough to be combined in Fig 5A? Otherwise, considering left and right inbound runs separately and trying to explain where the differences come from would seem to make sense.

      Figure 5 – figure supplement 2 is now part of the new main Figure 4. Originally, the examples were from a single session and the same cells as shown in the old Figure 4. However, since the old Figure 4 has been removed, we have selected examples from different sessions and both left/right trajectories that are more representative of the overall distribution. We have further added a plot with the spatially-resolved cycle skipping for all analyzed cells in Figure 5A.

      (15) In the second paragraph of the Discussion, dorso-ventral topography of hippocampal projections to the LS (Risold and Swanson, Science, 90s) could be more explicitly stated here.

      Thank you for the suggestion. We have now explicitly mentioned the dorsal-ventral topography of hippocampal-lateral septum projections and cite Risold & Swanson (1997).

      (16) Discussion point: why do the differences in spatial information of cells in the ventral/intermediate vs. dorsal hippocampus not translate into similarly prominent differences in LSI vs. LSD?

      In our data, we do observe clear differences in spatial coding between LSD and LSI. Specifically, cell activity in the LSD is more directional, has higher goal arm selectivity, and higher spatial information (we have now added statistical comparisons to Figure 6 – figure supplement 1). As a result, spatial decoding performance is much better for LSD cell populations than LSI cell populations (see updated Figure 8, with statistical comparison of decoding performance). Spatial coding in the LS is not as strong as in the hippocampus, likely because of the convergence of hippocampal inputs, which may give the impression of a less prominent difference between the two subregions.

      (17) Discussion, last paragraph: citation of the few original anatomical and neurophysiological studies would be fitting here, in addition to the recent review article.

      Thank you for the suggestion. We have added selected citations of the original literature.

      (18) Methods, what was the reference electrode?

      We used an external reference electrode that was soldered to a skull screw, which was positioned above the cerebellum. We have added this to the Methods section.

      (19) Methods, Theta cycle skipping: bandwidth = Gaussian kernel parameter?

      The bandwidth is indeed a parameter of the Gaussian smoothing kernel and is equal to the standard deviation.
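
      To make the meaning of the bandwidth concrete, here is a minimal sketch of a Gaussian-smoothed spike-train autocorrelogram in which the bandwidth is the kernel's standard deviation. It is not the authors' cycle skipping analysis: the function names, the 20 ms bandwidth, the 0.4 s maximum lag, and the toy spike train (one spike every 0.25 s, i.e. every other cycle of an assumed 8 Hz theta rhythm) are illustrative assumptions.

```python
import math


def gaussian_kernel(x, bandwidth):
    """Gaussian density whose standard deviation equals `bandwidth`."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * (x / bandwidth) ** 2) / norm


def smoothed_autocorrelogram(spike_times, lags, bandwidth, max_lag):
    """Kernel-smoothed density of positive spike-pair lags, evaluated at `lags`.

    spike_times must be sorted (seconds); only pairs up to max_lag are used.
    """
    diffs = [t2 - t1
             for i, t1 in enumerate(spike_times)
             for t2 in spike_times[i + 1:]
             if 0 < t2 - t1 <= max_lag]
    return [sum(gaussian_kernel(lag - d, bandwidth) for d in diffs)
            for lag in lags]


# Toy cell that fires on every other cycle of an assumed 8 Hz rhythm
spikes = [0.0, 0.25, 0.5, 0.75, 1.0]
one_cycle, two_cycles = 0.125, 0.25
acg = smoothed_autocorrelogram(spikes, [one_cycle, two_cycles],
                               bandwidth=0.02, max_lag=0.4)
```

      For this toy spike train the smoothed autocorrelogram is larger at two theta periods than at one, which is the qualitative signature of cycle skipping; a wider bandwidth would blur the two peaks together.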

      Reviewer #3 (Recommendations For The Authors)

      Below I offer a short list of minor comments and suggestions that may benefit the manuscript.

      (A) I was not able to access the Open Science Framework Repository. Can this be rectified?

      Thank you for checking the OSF repository. The data and analysis code are now publicly available.

      (B) In the discussion the authors should attempt to flesh out whether they can place theta cycle skipping into context with left/right sweeps or scan ahead phenomena, as shown in the Redish lab.

      Thank you for the excellent suggestion. We have now added a discussion of the possible link between theta cycle skipping and the previously reported scan-ahead theta sweeps.

      (C) What is the mechanism of cycle skipping? This could be relevant to intrinsic vs network oscillator models. Reference should also be made to the Deshmukh model of interference between theta and delta (Deshmukh, Yoganarasimha, Voicu, & Knierim, 2010).

      We had discussed a potential mechanism in the discussion (2nd to last paragraph in the revised manuscript), which now includes a citation of a recent computational study (Chu et al., 2023). We have now also added a reference to the interference model in Deshmukh et al, 2010.

      (D) Little background was given for the motivation and expectation for potential differences between the comparison of the dorsal and intermediate lateral septum. I don't believe that this is the same as the dorsal/ventral axis of the hippocampus, but if there's a physiological justification, the authors need to make it.

      We have added a paragraph to the introduction to explain the anatomical and physiological differences across the lateral septum subregions that provide our rationale for comparing dorsal and intermediate lateral septum (we excluded the ventral lateral septum because the number of cells recorded in this region was too low).

      (E) It would help to label "outbound" and "inbound" on several of the figures. All axes need to be labeled, with appropriate units indicated.

      We have carefully checked the figures and added inbound/outbound labels and axes labels where appropriate.

      (F) In Figure 6, the legend doesn't match the figure.

      Indeed, the legend was outdated. This has now been corrected.

      (G) The firing rate was non-uniform across the Y-maze. Does this mean that the cells tended to fire more in specific positions of the maze? If so, how would this affect the result? Would increased theta cycle skipping at the choice point translate to a lower firing rate at the choice point? Perhaps less overdispersion of the firing rate (Fenton et al., 2010)?

      Individual cells indeed show a non-uniform firing rate across the maze. To address the reviewer’s comment and test if theta cycle skipping cells were active preferentially near the choice point or other locations, we computed the mean-corrected spatial tuning curves for cell-trajectory pairs with and without significant theta cycle skipping. This additional analysis indicates that, on average, the population of theta cycle skipping cells showed a higher firing rate in the goal arms than in the stem of the maze as compared to non-skipping cells for outbound and inbound directions (shown in Figure 5 - figure supplement 1).

      (H) As mentioned above, it could be helpful to look at phase preference. Was there an increased phase preference at the choice point? Would half-cycle firing correlate with an increased or decreased phase preference? Based on prior work, one would expect increased phase preference, at least in CA1, at the choice point (Schomburg et al., 2014). In contrast, other work might predict phasic preference according to spatial location (Tingley & Buzsaki, 2018). Including phase analyses is a suggestion, of course. The manuscript is already sufficiently novel and informative. Yet, the authors should state why phase was not analyzed and that these questions remain for follow-up analyses. If the authors did analyze this and found negative results, it should be included in this manuscript.

      We thank the reviewer for their suggestion. We have not yet analyzed the theta phase preference of lateral septum cells or other relations to the theta phase. We agree that this would be a valuable extension of our work, but prefer to leave it for future analyses.

      (I) One of the most important aspects of the manuscript is that there is now evidence of theta cycle skipping in the circuit loop between the EC, CA1, and LS. This now creates a foundation for circuit-based studies that could dissect the origin of route planning. Perhaps the authors should state this? In the same line of thinking, how would one determine whether theta cycle skipping is necessary for route planning as opposed to a byproduct of route planning? While this question is extremely complex, other studies have shown that spatial navigation and memory are still possible during the optogenetic manipulation of septal oscillations (Mouchati, Kloc, Holmes, White, & Barry, 2020; Quirk et al., 2021). However, pharmacological perturbation or lesioning of septal activity can have a more profound effect on spatial navigation (Bolding, Ferbinteanu, Fox, & Muller, 2019; Winson, 1978). As a descriptive study, I think it would be helpful to remind the readers of these basic concepts.

      We thank the reviewer for their comment and for pointing out possible future directions for linking theta cycle skipping to route planning. Experimental manipulations to directly test this link would be very challenging, but worthwhile to pursue. We now mention how circuit-based studies may help to test if theta cycle skipping in the broader subcortical-cortical network is necessary for route planning. Given that the discussion is already quite long, we decided to omit a more detailed discussion of the possible role of the medial septum (which is the focus of the papers cited by the reviewer).

      Very minor points

      (A) In the introduction, "one study" begins the sentence but there is a second reference.

      Thank you, we have rephrased the sentence.

      (B) Also in the introduction, it could be helpful to have an operational definition of theta cycle skipping (i.e., 'enhanced rhythmicity at half theta frequency').

      We followed the reviewer’s suggestion.

      (C) The authors should be more explicit in the introduction about their main question. State that theta cycle skipping exists in CA1, and then import some of the explanations mentioned in the discussion to the introduction (i.e., attractor states of multiple routes). The main question is then whether this phenomenon, and others from CA1, translate to the output in LS.

      We have edited the introduction to more clearly state the main question of our study, following the suggestion from the reviewer.

      (D) There are a few instances of extra closing parentheses.

      We checked the text but did not find instances of erroneous extra closing parentheses. There are instances of nested parentheses, which may have given the impression that closing parentheses were duplicated.

      (E) The first paragraph of the Discussion lacks sufficient references.

      We have now added references to the first paragraph of the discussion.

      (F) At the end of the 2nd paragraph in the Discussion, the comparison is missing. More than what? It's not until the next reference that one can assume that the authors are referring to a dorsal/ventral axis. However, the physiological motivation for this comparison is lacking. Why would one expect a dorsal/intermediate continuum for theta modulation as there is along the dorsal/ventral axis of the hippocampus?

      Thank you for spotting this omission. We have rewritten the paragraph to more clearly make the parallel between dorsal-ventral gradients in the lateral septum and hippocampus and how this relates to the topographical connections between the two structures.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations For The Authors):

      In this revision the authors address some of the key concerns, including clarification of the balanced nature of the RL driven pitch changes and conducting analyses to control for the possible effects of singing quantity on their results. The paper is much improved but still has some sources of confusion, especially around Fig. 4, that should be fixed. The authors also start the paper with a statistically underpowered minor claim that seems unnecessary in the context of the major finding. I recommend the authors may want to restructure their results section to focus on the major points backed by sufficient n and stats.

      Major issues.

      (1) The results section begins very weak - a negative result based on n=2 birds and then a technical mistake of tube clogging re-spun as an opportunity to peek at intermittent song in the otherwise muted birds. The logic may be sound but these issues detract from the main experiment, result, analysis, and interpretation. I recommend re-writing this section to home in on, from the outset, the well-powered results. How much is really gained from the n=2 birds that were muted before ANY experience? These negative results may not provide enough data to make a claim. Nor is this claim necessary to motivate what was done in the next 6 birds. I recommend dropping the claim?

      We thank the reviewer for the recommendation. We moved the information to the Methods.

      (2) Fig. 4 is very important yet remains very confusing, as detailed below.

      Fig. 4a. Can the authors clarify if the cohort of WNd birds that give rise to the positive result in Fig 4 ever experienced the mismatch in the absence of ongoing DAF reinforcement pre-deafening? Neither Fig. 4a nor the text clearly specifies this. This is important because we know that there are day timescale delays in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway (Andalman and Fee, 2009). Thus, if birds experienced mismatch pre-deafening in the absence of DAF, then an early learning phase in Area X could be set in place. Then deafening occurs, but these weight changes in X could result in LMAN bias that expresses only days later, independent of auditory feedback. Such a process would not require an internal model as the authors are arguing for here. It would simply arise from delays in implementing reinforcement-driven feedback. If the birds in Fig 4 always had DAF on before deafening, then this is not an issue. But if the birds had hours of singing with DAF off before deafening, and therefore had the opportunity to associate DA error signals with the targeted time in the song (e.g. pauses on the far-from-target renditions (Duffy et al, 2022)), then the return-to-baseline would be expected to be set in place independent of auditory feedback. Please clarify exactly if the pitch-contingent DAF was on or off in the WNd cohort in the hours before deafening. In Fig. 3b it looks like the answer is yes but I cannot find this clearly stated in the text.

      We did not provide DAF-free singing experience to the birds in Fig. 4 before deafening. Thus, according to the reviewer, the concern does not apply.

      Note that we disagree with the reviewer’s premise that there is ‘day timescale delay in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway’. More recent data reveals immediate consolidation of the anterior forebrain bias without a night-time effect (Kollmorgen, Hahnloser, Mante 2020; Tachibana, Lee, Kai, Kojima 2022). Thus, the single bird in (Andalman and Fee 2009) seems to be somewhat of an outlier.

      Hearing birds can experience the mismatch regardless of whether they experience DAF-free singing (provided their song was sufficiently shifted): even the renditions followed by white noise can be assessed with regards to their pitch mismatch, so that DAF imposes no limitation on mismatch assessment.

      We disagree with the claim that no internal model would be needed if consolidation were delayed in Area X. If Area X indeed stores the needed change and it takes time to implement this change in LMAN, then we would interpret the change in Area X as the plan that birds would be able to implement without auditory feedback. Because pitch can either revert (after DAF stops) or shift further away (when DAF is still present), recovering the target involves no rigid delay but rather a flexible decision about implementing the plan, which in our view amounts to using a model.

      Fig 4b. Early and Late colored dots in legend are both red; late should be yellow? Perhaps use colors that are more distinct - this may be an issue of my screen but the two colors are difficult to discern.

      We used colors yellow to red to distinguish different birds and not early and late. We modified the markers to improve visual clarity: Early is indicated with round markers and late with crosses.

      Fig 4b. R, E, and L phases are only plotted for 4c; not in 4b. But the figure legend says that R, E and L are on both panels.

      In Fig. 4b E and L are marked with markers because they are different for different birds. In Fig. 4c the phases are the same for all birds and thus we labeled them on top. We additionally marked R in Fig. 4b as in Fig. 4c.

      Fig 4e. Did the color code switch? In the rest of Fig 4, DLO is red and WND is blue. Then in 4e it swaps. Is this a typo in the caption? Or are the colors switched? Please fix this; it's very confusing.

      Thank you for pointing out the typo in the caption. We corrected it.

      The y axes in Fig 4d-e are both in std of pitch change - yet they have different ylim, which makes it difficult to compare them by eye. Is there a reason for this? Can the authors make the ylim the same for Fig 4d-e?

      We added dashed lines to clarify the difference in ylim.

      Fig 4d-e is really the main positive finding of the paper. Can the authors show an example bird that showcases this positive result, plotted as in Fig 3b? This will help the audience clearly visualize the raw data that go into the d' analyses and get a more intuitive sense of the magnitude of the positive result.

      We added example birds to figure 4, one for WNd and one for dLO.

      Please define 'late' in Fig.4 legend.

      Done

      Minor

      Define NRP in the text with an example. Is an NRP of 100 where the bird was before the withdrawal of reinforcement?

      We added the sentence to the results:

      "We quantified recovery in terms of 𝑵𝑹𝑷 to discount for differences in the amount of initial pitch shift where 𝑵𝑹𝑷 = 𝟎% corresponds to complete recovery and 𝑵𝑹𝑷 = 𝟏𝟎𝟎% corresponds to pitch values before withdrawal of reinforcement (R) and thus no recovery."
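
      The two anchor points in this definition can be illustrated with a minimal sketch. It assumes that NRP is a linear rescaling of the residual pitch shift relative to baseline; the function name, argument names, and the linear form are illustrative assumptions and are not taken from the manuscript, whose exact formula may differ.

```python
def normalized_residual_pitch(pitch, baseline, pitch_at_r):
    """Return a hypothetical NRP in percent.

    Assumption (not from the manuscript): NRP scales the remaining pitch
    shift linearly, so that 0% means pitch is back at baseline and 100%
    means pitch is unchanged from its value just before reinforcement
    was withdrawn (R).
    """
    initial_shift = pitch_at_r - baseline
    if initial_shift == 0:
        raise ValueError("no initial pitch shift; NRP is undefined")
    return 100.0 * (pitch - baseline) / initial_shift


# Illustrative values in Hz (made up for this sketch).
baseline_hz = 700.0
shifted_hz = 730.0
print(normalized_residual_pitch(715.0, baseline_hz, shifted_hz))
```

      Under this assumed form, dividing by the initial shift is what discounts for differences in the amount of initial pitch shift between birds.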

      Reviewer #3 (Recommendations For The Authors):

      The use of "hierarchically lower" to refer to the flexible process is confusing to me, and possibly to many readers. Some people think of flexible, top-down processes as being _higher_ in a hierarchy. Regardless, it doesn't seem important, in this paper, to label the processes in a hierarchy, so perhaps avoid using that terminology.

      We reformulated the paragraph using ‘nested processes’ instead of hierarchical processes.

      In the statement "a seeming analogous task to re-pitching of zebra finch song, in humans, is to modify developmentally learned speech patterns", a few suggestions: it is not clear whether "re-pitching" refers to planning or feedback-dependent learning (I didn't see it introduced anywhere else). And if this means planning, then it is not clear why this would be analogous to "humans modifying developmentally learned speech patterns". As you mentioned, humans are more flexible at planning, so it seems re-pitching would _not_ be analogous (or is this referring to the less flexible modification of accents?).

      We changed the sentence to:

      "Thus, a seeming analogous task to feedback-dependent learning of zebra finch song, in humans, is to modify developmentally learned speech patterns."

    1. Author response:

      We would first like to thank the editor for considering our findings for publication in eLife. Furthermore, we thank the reviewers and editors for their encouraging reviews and for providing helpful and insightful comments.

      Reviewer #1 (Public Review):

      Summary:

      The pituitary gonadotropins, FSH and LH, are critical regulators of reproduction. In mammals, synthesis and secretion of FSH and LH by gonadotrope cells are controlled by the hypothalamic peptide, GnRH. As FSH and LH are made in the same cells in mammals, variation in the nature of GnRH secretion is thought to contribute to the differential regulation of the two hormones. In contrast, in fish, FSH and LH are produced in distinct gonadotrope populations and may be less (or differently) dependent on GnRH than in mammals. In the present manuscript, the authors endeavored to determine whether FSH may be independently controlled by a distinct peptide, cholecystokinin (CCK), in zebrafish.

      Strengths:

      The authors demonstrated that the CCK receptor is enriched in FSH-producing relative to LH-producing gonadotropes, and that genetic deletion of the receptor leads to dramatic decreases in gonadotropin production and gonadal development in zebrafish. Also, using innovative in vivo and ex vivo calcium imaging approaches, they show that LH- and FSH-producing gonadotropes preferentially respond to GnRH and CCK, respectively. Exogenous CCK also preferentially stimulated FSH secretion ex vivo and in vivo.

      Weaknesses:

      The concept that there may be a distinct FSH-releasing hormone (FSHRH) has been debated for decades. As the authors suggest that CCK is the long-sought FSHRH (at least in fish), they must provide data that convincingly leads to such a conclusion. In my estimation, they have not yet met this burden. In particular, they show that CCK is sufficient to activate FSH-producing cells, but have not yet demonstrated its necessity. Their one attempt to do so was using fish in which they inactivated the CCK receptor using CRISPR-Cas9. While this manipulation led to a reduction in FSH, LH was affected to a similar extent. As a result, they have not shown that CCK is a selective regulator of FSH.

      Our conclusion regarding the necessity of CCK signaling for FSH secretion is based on the following evidence:

      (1) CCK-like receptors are expressed in the pituitary gland predominantly on FSH cells.

      (2) Application of CCK to pituitaries elicits FSH cell activation and FSH release, and, to a lesser degree, activation of LH cells.

      (3) Mutating the CCK-like receptor causes a decrease in fsh and lh mRNA synthesis.

      (4) Mutating the CCK-like receptor gives rise to a phenotype which is identical to that caused by mutation of both lh and fsh genes in zebrafish.

      (5) Mutating the FSH-specific CCK receptor in a different species of fish (medaka) also causes a complete shutdown of FSH production and phenocopies a fsh-mutant phenotype (Uehara et al, bioRxiv, DOI: 10.1101/2023.05.26.542428).

      Taken together, we believe that these data strongly support the conclusion that CCK is necessary for FSH production and release from the fish pituitary. Admittedly, the overlapping effects of CCK on both FSH and LH cells in zebrafish (evident in both our calcium imaging experiments and the KO phenotype) complicate the interpretation of the phenotype. We speculate that the effect of CCK on LH cells in zebrafish can be caused either by paracrine signaling within the gland or by the effects of CCK on higher levels of the axis. In our revised manuscript we will make sure to highlight the overlapping effects of CCK on LH cells rather than portray it as a selective activator of FSH cells.

      Moreover, they do not yet demonstrate that the effects observed reflect the loss of the receptor's function in gonadotropes, as opposed to other cell types.

      Although there is evidence for the expression of CCK receptor in other tissues, we do show a direct decrease of FSH and LH expression in the gonadotrophs of the pituitary of the mutant fish; taken together with its significant expression in FSH cells, this is the most reasonable and straightforward explanation for the mutant phenotype. Unfortunately, unlike in mice, technologies for conditional knockout of genes in specific cell types are not yet available for our model and cell types. However, in the revised manuscript we will add a supplementary figure describing the distribution of this receptor in other tissues.

      It also is not clear whether the phenotypes of the fish reflect perturbations in pituitary development vs. a loss of CCK receptor function in the pituitary later in life. Ideally, the authors would attempt to block CCK signaling in adult fish that develop normally. For example, if CCK receptor antagonists are available, they could be used to treat fish and see whether and how this affects FSH vs. LH secretion.

      While the observed gonadal phenotype of the KO (sex inversion) should have a developmental origin since it requires a long time to manifest, the effect of the KO on FSH and LH cells is probably more acute.

      In the Discussion, the authors suggest that CCK, as a satiety factor, may provide a link between metabolism and reproduction. This is an interesting idea, but it is not supported by the data presented. That is, none of the results shown link metabolic state to CCK regulation of FSH and fertility. Absent such data, the lengthy Discussion of the link is speculative and not fully merited.

      In the revised manuscript, we will address this comment by either providing data to link CCK with metabolic status or toning down the Discussion of this topic.

      Also in the Discussion, the authors argue that "CCK directly controls FSH cells by innervating the pituitary gland and binding to specific receptors that are particularly abundant in FSH gonadotrophs." However, their imaging does not demonstrate innervation of FSH cells by CCK terminals (e.g., at the EM level).

      Innervation of the fish pituitary does not imply a synaptic-like connection between axon terminals and endocrine cells. In fact, such connections are extremely rare, and their functionality is unclear. Instead, the mode of regulation between hypothalamic terminals and endocrine cells in the fish pituitary is more similar to "volume transmission" in the CNS, i.e. peptides are released into the tissue and carried to their endocrine cell targets by the circulation or via diffusion.

      Moreover, they have not demonstrated the binding of CCK to these cells. Indeed, no CCK receptor protein data are shown.

      Our revised manuscript will include detailed experiments showing the activation of the receptor by its ligand. Unfortunately, no antibody is available against this fish-specific receptor (one of the caveats of working with fish models); therefore, we cannot present receptor protein data.

      The calcium responses of FSH cells to exogenous CCK certainly suggest the presence of functional CCK receptors therein; but, the nature of the preparations (with all pituitary cell types present) does not demonstrate that CCK is acting directly in these cells.

      We agree with the reviewer that there are some disadvantages in choosing to work with a whole-tissue preparation. However, we believe that the advantages of working in a more physiological context far outweigh the drawbacks, as it reflects the natural dynamics more precisely. Since our transcriptome data, as well as our ISH staining, show that the CCK receptor is exclusively expressed on FSH cells, it is improbable that the observed calcium response is mediated via a different pituitary cell type.

      Indeed, the asynchrony in responses of individual FSH cells to CCK (Figure 4) suggests that not all cells may be activated in the same way. Contrast the response of LH cells to GnRH, where the onset of calcium signaling is similar across cells (Figure 3).

      The difference between the synchronization levels of LH and FSH cell activity stems from the gap-junction-mediated coupling between LH cells that does not exist between FSH cells (Golan et al 2016, DOI: 10.1038/srep23777). Therefore, the onset of the calcium response in FSH cells depends on the irregular diffusion rate of the peptide within the preparation, whereas the tight homotypic coupling between LH cells generates a strong and synchronized calcium rise that propagates quickly throughout the entire population; we will make sure this is clear in the final revision.

      Finally, as the authors note in the Discussion, the data presented do not enable them to conclude that the endogenous CCK regulating FSH (assuming it does) is from the brain as opposed to other sources (e.g., the gut).

      We agree with the reviewer that, for now, we are unable to determine whether hypothalamic or peripheral CCK is the main driver of FSH cells. While the strong innervation of the gland by CCK-secreting hypothalamic neurons strengthens the notion of a hypothalamic-releasing hormone and also fits with the dogma of the neural control of the pituitary gland in fish (Ball, 1981; doi: 10.1016/0016-6480(81)90243-4), more experiments are required to resolve this question.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript builds on previous work suggesting that the CCK peptide is the releasing hormone for FSH in fishes, which is different than that observed in mammals where both LH and FSH release are under the control of GnRH. Based on data using calcium imaging as a readout for stimulation of the gonadotrophs, the researchers present data supporting the hypothesis that CCK stimulates FSH-containing cells in the pituitary. In contrast, LH-containing cells show a weak and variable response to CCK but are highly responsive to GnRH. Data are presented that support the role of CCK in the release of FSH. Researchers also state that functional overlap exists in the potency of GnRH to activate FSH cells, thus the two signalling pathways are not separate.

      The results are of interest to the field because for many years the assumption has been that fishes use the same signalling mechanism. These data present an intriguing variation where a hormone involved in satiation acts in the control of reproduction.

      Strengths:

      The strengths of the manuscript are that researchers have shed light on different pathways controlling reproduction in fishes.

      Weaknesses:

      Weaknesses are that it is not clear if multiple ligand/receptors are involved (more than one CCK and more than one receptor?). The imaging of the CCK terminals and CCK receptors needs to be reinforced.

      Reviewer consultation summary:

      • The data presented establish sufficiency, but not necessity of CCK in FSH regulation. The paper did not show that CCK endogenously regulates FSH in fish. This has not been established yet.

      This is a very important comment, also raised by reviewer 1. To avoid repetition, please see our detailed response to the comment above.

      • The paper presents the pharmacological effects of CCK on ex vivo preparations but does not establish the in vivo physiological function of the peptide. The current evidence for a novel physiological regulatory mechanism is incomplete and would require further physiological experiments. These could include the use of a CCK receptor antagonist in adult fish to see the effects on FSH and LH release, the generation of a CCK knockout, or cell-specific genetic manipulations.

      As detailed in the responses to the first reviewer, we cannot conduct conditional, cell-specific gene knockout in our model.

      • Zebrafish have two CCK ligands: ccka, cckb and also multiple receptors: cckar, cckbra and cckbrb. There is ambiguity about which CCK receptor and ligand are expressed and which gene was knocked out.

      In the revised manuscript, we will clarify which of the receptors are expressed and which receptor is targeted. We will also provide data showing the specificity of the receptors (both WT and mutant) to the ligands.

      • Blocking CCK action in fish (with receptor KO) affects FSH and LH. Therefore, the work did not demonstrate a selective role for CCK in FSH regulation in vivo and any claims to have discovered FSHRH need to be more conservative.

      We agree with the reviewer that the overlap in the effect of CCK measured in the calcium activation of cells and in the KO model does not allow us to conclude selectivity. In this context, it is crucial to highlight that CCK-R exhibits high expression on FSH cells but not on LH cells. Therefore, the effect of CCK on LH cells is likely paracrine rather than solely endocrine. We will tone down our claims of selectivity in the revised manuscript.

      • The labelling of the terminals with anti-CCK looks a lot like the background and the authors did not show a specificity control (e.g. anti-CCK antibody pre-absorbed with the peptide or anti-CCK in morphant/KO animals).

      We will update the colors of the image for better clarity. Also, the same antibody had been previously used to mark CCK-positive cells in the gut of the red drum fish (K.A. Webb, Jr. 2010; DOI: https://doi.org/10.1016/j.ygcen.2009.10.010), where a control (pre-absorbed with the peptide) experiment had been conducted.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      The authors have addressed my comments. As a final minor point, regarding comment 2, these condensates are likely viscoelastic rather than purely viscous. It is prudent to indicate that the data may refer to an apparent viscosity.

      We added the following text to the manuscript to highlight the viscoelastic nature of ELP condensates, and the relationship of reported values with the steady state viscosity. “It is worth noting that the reported values, although related, may not quantitatively represent the steady-state viscosity. This discrepancy arises from the slow relaxation timescale inherent in ELP condensates with viscoelastic properties.”

    1. Author response:

      We thank eLife and the reviewers for the thoughtful summary and valuable review of our manuscript. We largely agree with the summary and review and have provided our responses to the comments below. We believe BADGERS is a significant new tool for identifying associated risk factors for complex diseases, and the associations we observed in the analysis provide insights into the genetic basis of Alzheimer's disease.

      Reviewer #1 (Public Review):

      The major aim of the paper was a method for determining genetic associations between two traits using common variants tested in genome-wide association studies. The work includes a software implementation and application of their approach. The results of the application of their method generally agree with what others have seen using similar AD and UKB data.

      The paper has several distinct portions. The first is a method for testing genetic associations between two or more traits using genome-wide association tests statistics. The second is a python implementation of the method. The last portion is the results of their method using GWAS from AD and UK Biobank.

      We thank the reviewer for the conclusion and positive comments.

      Regarding the method, it seems like it has similarities to LDSC, and it is not clear how it differs from LDSC or other similar methods. The implementation of the method used python 2.7 (or at least was reportedly tested using that version) that was retired in 2020. The implementation was committed between Wed Oct 3 15:21:49 2018 to Mon Jan 28 09:18:09 2019 using data that existed at the time so it was a bit surprising it used python 2.7 since it was initially going to be set for end-of-life in 2015. Anyway, trying to run the package resulted in unmet dependency errors, which I think are related to an internal package not getting installed. I would expect that published software could be installed using standard tooling for the language, and, ideally, software should have automated testing of key portions.

      We thank the reviewer for their comments. To clarify, the primary difference between our proposed method, BADGERS, and LDSC lies in their respective objectives and applications. LDSC is designed to estimate heritability and genetic correlations between traits by utilizing GWAS summary statistics, thereby aiding in the elucidation of the genetic architecture of complex traits and diseases. Conversely, BADGERS is specifically developed to explore causal relationships between risk factors, such as biomarkers, and diseases of interest. It employs genetic variants as variables to deduce causality, thereby addressing the challenges of confounding and reverse causation that are common in observational studies. Although BADGERS utilizes the LD reference panel derived from LDSC, the LD reference panel is used to obtain the predicted trait expression. The ultimate goal is to focus on linking biobank traits with Alzheimer’s disease and building causal relationships instead of identifying genetic architecture.

      Regarding the technical aspects mentioned, we acknowledge the concerns about the use of Python 2.7 and the issues encountered during the package installation. We are in the process of updating the software to ensure compatibility with current versions of Python and to enhance the installation process with standard tooling and automated testing for a more user-friendly experience. We have provided tests for each portion of the software so the user can test if the software is working properly.

      Regarding the main results, they find what has largely been shown by others using the same data or similar data, which adds prima facie validity to the work. The portions of the work dealing with AD subgroups, pathology, biomarkers, and cognitive traits of interest. I was puzzled why the authors suggested surprise regarding parental history and high cholesterol not associated with MCI or cognitive composite scores since this would seem like the likely fallout of selection of the WRAP cohort. The discussion paragraph that started "What's more, environmental factors may play a big role in the identified associations." confused me. I think what the authors are referring to are how selection, especially in a biobank dataset, can induce correlations, which is not what I think of as an environmental effect.

      We thank the reviewer very much for their comment. We're glad that our findings align with existing research using similar data, increasing the validity of our work and the proposed BADGERS algorithm. Your point about the lack of association between parental history, high cholesterol, and mild cognitive impairment (MCI) or cognitive composite scores in the WRAP cohort is well-taken. We agree that the selection criteria of the WRAP cohort may influence these findings, as it consists of individuals with a specific risk profile for Alzheimer's disease. This selection could indeed mitigate the observed association between these factors and cognitive outcomes, which we initially found surprising.

      Regarding the environmental factors, we appreciate your clarification and understand the confusion. Our intention was to discuss the potential for selection bias and confounding factors in biobank datasets for the identified associations, which might not necessarily be direct environmental effects.

      Overall, the work has merit, but I am left without a clear impression of the improvement in the approach over similar methods. Likewise, the results are interesting, but similar findings are described with the data that was used in the study, which are over 5 years old at the time of this review.

      We thank the reviewer for their endorsement of the BADGERS framework. We believe that our method, BADGERS, improves on existing approaches by effectively linking genetic data with the detailed phenotypic information in biobanks and large disease GWAS. This enhances our ability to detect associations without needing individual-level data, offering clearer insights while reducing issues like reverse causality and confounding factors.

      Even though the IGAP dataset is over five years old, it remains one of the largest publicly available datasets for Alzheimer’s disease. Likewise, the UK Biobank is one of the largest publicly available human traits datasets, which researchers continue to use. These datasets' continued utility demonstrates their value in the research community. Additionally, the versatility of the BADGERS framework makes it suitable for future research investigating the relationship between human traits and various diseases using different datasets.

      Reviewer #2 (Public Review):

      Summary:

      Yan, Hu, and colleagues introduce BADGERS, a new method for biobank-wide scanning to find associations between a phenotype of interest, and the genetic component of a battery of candidate phenotypes. Briefly, BADGERS capitalizes on publicly available weights of genetic variants for a myriad of traits to estimate polygenic risk scores for each trait, and then identify associations with the trait of interest. Of note, the method works using summary statistics for the trait of interest, which is especially beneficial for running in population-based cohorts that are not enriched for any particular phenotype (ie. with few actual cases of the phenotype of interest).

      Here, they apply BADGERS on Alzheimer's disease (AD) as the trait of interest, and a battery of circa 2,000 phenotypes with publicly available precalculated genome-wide summary statistics from the UK Biobank. They run it on two AD cohorts, to discover at least 14 significant associations between AD and traits. These include expected associations with dementia, cognition (educational attainment), and socioeconomic status-related phenotypes. Through multivariate modelling, they distinguish between (1) clearly independent components associated with AD, from (2) by-product associations that are inflated in the original bivariate analysis. Analyses stratified according to APOE inclusion show that this region does not seem to play a role in the association of some of the identified phenotypes. Of note, they observe overlap but significant differences in the associations identified with BADGERS and other Mendelian randomization (MR), hinting at BADGERS being more powerful than classical top variant-based MR approaches. They then extend BADGERS to other AD-related phenotypes, which serves to refine the hypotheses about the underlying mechanisms accounting for the genetic correlation patterns originally identified for AD. Finally, they run BADGERS on a pre-clinical cohort with mild cognitive impairment. They observe important differences in the association patterns, suggesting that this preclinical phenotype (at least in this cohort) has a different genetic architecture than general AD.

      We thank the reviewer for the summary and positive comments.

      Strengths:

      BADGERS is an interesting new addition to a stream of attempts to "squeeze" biobank data beyond pure association studies for diagnosis. Increasingly available biobank cohorts do not usually focus on specific diseases. However, they tend to be data-rich, opening for deep explorations that can be useful to refine our knowledge of the latent factors that lead to diagnosis. Indeed, the possibility of running genetic correlation studies in specific sub-settings of interest (e.g. preclinical cohorts) is arguably the most interesting aspect of BADGERS. Classical methods like LDSC or two-sample MR capitalize on publicly available summary statistics from large cohorts, or having access to individual genotype data of large cohorts to ensure statistical power. Seemingly, BADGERS provides a balanced opportunity to dissect the correlation between traits of interest in settings with small sample size in which other methods do not work well.

      We thank the reviewer for the positive comments.

      Weaknesses:

      However, the increased statistical power is just hinted, and for instance, they do not explore if LDSC would have identified these associations. Although I suspect that is the case, this evidence is important to ensure that the abovementioned balance is right. Finally, as discussed by the authors, the reliance on polygenic risk scoring necessarily undermines the causality evidence gained through BADGERS. In this sense, BADGERS provides an alternative to strict instrumental-variable based analysis, which can be particularly useful to generate new mechanistic hypotheses.

      We thank the reviewer for the comments. We understand the importance of comparing BADGERS with other methods. The comparison with LDSC, while not directly relevant to the causal inference aims of BADGERS, is indeed an interesting aspect to consider for future studies. In this paper, we focused on comparing BADGERS with Mendelian randomization (MR), which shares its causal inference objective.

      BADGERS identified a total of 48 traits that reached Bonferroni-corrected statistical significance. In contrast, MR-IVW identified only nine traits with Bonferroni-corrected statistical significance, seven of which were also identified by BADGERS. This demonstrates that BADGERS has higher power in detecting causal relationships.
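
      The comparison above rests on applying the same Bonferroni cutoff to both methods and then intersecting the two sets of significant traits. A minimal sketch of that bookkeeping follows; the trait names and p-values are hypothetical placeholders chosen for illustration and are not results from the study, and `bonferroni_significant` is an illustrative helper rather than part of BADGERS.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the names of tests whose p-value is below alpha / m,
    where m is the number of tests performed."""
    m = len(p_values)
    threshold = alpha / m
    return {name for name, p in p_values.items() if p < threshold}


# Hypothetical p-values for four traits (illustration only, not study data).
badgers_p = {"trait_A": 1e-6, "trait_B": 0.004, "trait_C": 0.2, "trait_D": 0.012}
mr_ivw_p = {"trait_A": 0.003, "trait_B": 0.3, "trait_C": 0.5, "trait_D": 0.04}

badgers_hits = bonferroni_significant(badgers_p)  # threshold is 0.05 / 4 = 0.0125
mr_hits = bonferroni_significant(mr_ivw_p)
shared = badgers_hits & mr_hits                   # traits found by both methods

print(sorted(badgers_hits), sorted(mr_hits), sorted(shared))
```

      With roughly 2,000 traits tested, the same rule would give a per-trait threshold of about 0.05 / 2000 = 2.5e-5.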

      Regarding the use of polygenic risk scoring, we agree that it poses challenges for directly inferring causality. While BADGERS offers an innovative way to explore genetic correlations and can help generate new hypotheses about disease mechanisms, it does not replace the causal inferences that can be drawn from instrumental-variable-based analyses. Instead, it should be viewed as a complementary tool that can illuminate potential genetic relationships and guide further causal investigations.

      In summary, after 15 years of focus on diagnosis that would require having individual access to large patient cohorts, BADGERS can become an excellent tool to dig into trait heterogeneity, especially if it turns out to be more powerful than other available methodologies.

      We thank the reviewer for the conclusion and the positive comments.

    1. Author response:

      We thank the reviewers and editors for their time and effort reviewing and improving this manuscript. We also thank them for their support.

      Following the guidelines received from eLife, we submit here the preliminary authors’ response to the Public Review with our planned changes to the manuscript.

      Reviewer 1.

      Comment 1. Issue on cross-reactivities of MafB antibodies.

      We are confident that our description of MafB V1 interneurons is correct despite some cross-reactivity with one of the antibodies used. We test all antibodies we use, and unfortunately, we found an inverse relationship between sensitivity and specificity with the two MafB antibodies used in this study. We chose for quantification the one with highest sensitivity, despite the presence of some cross-reactivity in interneurons other than the dorsal and ventral (Renshaw) V1 populations we focus on. The dorsal and ventral (Renshaw) V1 populations we describe here are also reactive with the more specific antibody (although with lower sensitivity) and both are neatly labeled in a MafB-GFP reporter mouse as described in Figure 3. We will add an image to the supplement with MafB-GFP V1 Interneurons at P5 showing the immunoreactivity of both MafB antibodies as suggested by the reviewer. We agree with the reviewer that this will give further support to the characterization of these populations by either immunocytochemical or genetic means at P5.

      Unfortunately, we cannot show lack of immunoreactivity for MafB antibodies in MafB GFP/GFP knockout mice at P5 because MafB global KOs die at birth as a result of respiratory failure. This is due to removal of inhibitory interneurons in brainstem centers critical for respiration (Blanchi et al. 2003 MafB deficiency causes defective respiratory rhythmogenesis and fatal central apnea at birth. Nat Neurosci. 6(10):1091-100. doi: 10.1038/nn1129. PMID: 14513037). This is why we used tissues from late embryos for testing antibody specificity in KO spinal cords. We will make this clearer in the text.

      Comment 2. Overlap of V1 clades with lineage labeled Foxp2-V1s at P5.

      We collected the data requested by the reviewer for P5 Foxp2-V1 interneurons and this will be added to an updated version of this figure. In comparison to the results with the OTP mouse, we only found marginal overlap at P5 with Renshaw cells, Pou6f2, and Sp8 V1s in our genetic intersection to label Foxp2-V1s. We apologize for not showing the data. We will make this clearer.

      Reviewer 2.

      Comment 1. Paper VERY hard to read.

      We will make every effort to make the paper more readable by moving methodological discussions to supplementary materials. We strive to keep our methods as rigorous, clean, and replicable as possible, and that sometimes requires lengthy explanations of the details and reasoning behind our approaches. We will make sure this does not distract from the principal scientific messages we want to convey. We agree with the reviewer that these should be emphasized over methodological detail, and we will correct any mistakes in the text that lead to confusion. Thank you for pointing out this problem, which we hope to correct in a new version.

      Why focus on Foxp2 V1s? We focus on the Foxp2 population for several reasons: 1) This is the largest population of V1s, and it is the one with a close spatial association to motoneurons, in particular limb motoneurons; 2) Given previous results (Benito-Gonzalez and Alvarez, 2012, cited in bibliography) it likely includes many reciprocal inhibitory interneurons; 3) We do not have the mice for studying the Pou6f2 (or Sp8) population, but similar studies are now being carried out in the Bikoff lab.

      Comment 2. Lack of functional studies.

      Functional studies are currently being carried out, both during development of limb function in postnatal mice as well as in adult animals. These studies required the creation of several new animal models and reagents. As with the present manuscript, we thoroughly characterize all animals and methods. This takes time and space. These studies are beyond the goals and length of the current manuscript, but we agree with the reviewer that these are the critical next experiments that need to be performed. We are now finalizing studies on the role of Foxp2-V1 interneurons in the postnatal development of limb coordination and validating approaches for silencing them in the adult while also optimizing behavioral assays and recordings. The data presented here on Foxp2-V1 interneuron heterogeneity and relations with limb motoneurons gives the necessary context for raising stronger hypotheses and aiding in the interpretation of future results in functional studies.

      Synapse counts.

      We respectfully disagree with the reviewer’s comments on our synapse density estimates. To fully explain the reasons and prevent any ambiguity, we need to focus on detailed methodological aspects. We apologize for the lengthy response. Two major issues were raised:

      (1) Focus on the cell body.

      The issue raised by the reviewer of potential synapses in distal dendrites from V1 subgroups not projecting proximally was already discussed in the text. We focus on the cell body because 1) it is not feasible to study the full dendritic arbor of so many different types of motoneurons and 2) it allows us to identify V1 subpopulations that likely exert stronger modulation of motoneuron firing by targeting the proximal somatodendritic membrane. The fact that synaptic organization on motoneurons is similar on cell bodies and proximal dendrites (first 100 µm) suggests that inputs from V1 clades other than Renshaw cells are likely further away, and therefore there is limited benefit in including analyses of proximal dendrites in these data. Additionally, dendrites would be difficult to follow consistently in Chat immunostained tissue. We are currently using novel viral approaches to obtain labeling of single motoneurons and their full dendritic trees for more in-depth dendritic analyses in the mouse. The classical method based on single cell in vivo intracellular labeling using micropipettes is presently very low yield in the adult mouse. We are experienced with detailed single motoneuron dendritic arbor analyses in cat and rat motoneurons (Alvarez et al. 1997 Cell-type specific organization of glycine receptor clusters in the mammalian spinal cord. J Comp Neurol. 379(1):150-70; Alvarez et al., 1998 Distribution of 5-hydroxytryptamine-immunoreactive boutons on alpha-motoneurons in the lumbar spinal cord of adult cats. J Comp Neurol. 393(1):69-83; Rotterman et al., 2014. Normal distribution of VGLUT1 synapses on spinal motoneuron dendrites and their reorganization after nerve injury. J Neurosci. 34(10):3475-92. doi: 10.1523/JNEUROSCI.4768-13.2014). Based on this experience, we do not believe it is feasible to include similar analyses to compare all motor columns throughout 6 segments of the spinal cord in this study. We agree with the reviewer that these are important data sets that need to be collected and they are planned for future experiments. These analyses will address different questions than the ones posed and answered in our current manuscript.

      (2) Number of motoneurons analyzed.

      We disagree with the reviewer’s assessment that our conclusions might be biased because of the numbers of motoneurons analyzed. We sampled a total of 295 motoneurons in 5 different mice (117 LMC/HMC, 99 MMC, and 79 PGC motoneurons), and we used stringent methods for synapse detection. Due to a technical error, Mouse 3 lacked data in upper lumbar and Th13, but all other mice included data in almost all motor columns and segments. We disagree with the characterization that these are small samples. For full transparency, all motoneurons analyzed were identified in Figure 6D. Each of the nearly 300 motoneuron cell bodies was carefully reconstructed through several optical planes to obtain an accurate estimate of synapse density. More automatic methods in current use in the literature sometimes analyze larger samples, but our methods are designed to avoid methodological biases inherent to these automatic methods. We do not use image thresholding to extract synaptic contacts because it lacks accuracy in identifying single synapses. Thus, estimates using this technique frequently refer to coverage, not synapse density. In addition, it is hard to keep threshold criteria consistent across multiple optical planes to analyze enough section thickness to estimate a motoneuron surface. This is because tissue light diffraction alters thresholding levels continuously across optical planes. Thus, many authors present data as linear densities across a perimeter (in a single plane) measuring many cells in one field in one plane. We avoid cell body linear densities (or coverage) because they bias counts towards larger synapses that have a higher probability of being present at any single confocal plane. Moreover, estimates along a surface reduce synapse sampling variability and better estimate synaptic coverage compared to estimates derived from analyzing single cross-sections. We also confirm each genetically labeled varicosity as a likely synapse by accumulation of VGAT. In this manner we restrict our counts to synaptic boutons and not axons or intervaricose regions. Previously, we used bassoon to show the accuracy of our methods (Wootz et al. 2013 Alterations in the motor neuron-Renshaw cell circuit in the Sod1(G93A) mouse model. J Comp Neurol. 521(7):1449-69. doi: 10.1002/cne.23266). That means that our densities are true synaptic densities, which are difficult to extract from automatic methods that estimate fluorescence coverage over larger samples of somatic profiles but fail to individualize synapses and frequently bias results. These bulk methods introduce significant confounds in data interpretation: Is higher coverage due to bigger synapses or more synapses? Do thresholded structures represent true synapses or also include axons? To what extent does sub- or over-thresholding in different planes affect identification of structures in contact with the motoneuron surface? We avoid all these problems. Not surprisingly, a nested ANOVA demonstrated consistent significant differences among motor columns and segments.

      In summary, while more automatic methods allow larger samples, they disregard true synaptic densities and are based on thresholding methods with high variability across motoneurons, optical planes, and histological sections. They therefore require much larger numbers of motoneurons to overcome their many biases and sources of error. This is not our case. Our sample size is large enough considering the accuracy of our methods and data quality. This is demonstrated by consistency in statistical results across motor columns in different segments and mice.

      Comment 3. Possibility of anterograde transsynaptic labeling from primary afferents infected with rabies virus.

      This is a fair question that we did not clearly explain. The reviewer compares our results with those of Pimpinella et al., 2022. The methods used are different. To obtain anterograde tracing, these authors used Cre lines to achieve high levels of expression of TVA and RV glycoprotein in specific subtypes of sensory neurons including proprioceptors. Then EnvA-coated rabies virus was injected directly inside the spinal cord for cell-type specificity. This method transsynaptically labeled, in the anterograde direction, interneurons receiving inputs from specific types of sensory afferents, but the method does not have the muscle specificity required in our analyses. In our case, we used intramuscular injections at P5 of AAV1-G for transcomplementation with rabies virus delta G injected in the same muscles later, at P15. In previous studies in which we used the RV-delta G virus without AAV1-G, we analyzed motoneuron and primary afferent infection rates and found both to be considerably reduced with injection age. In our hands, there is almost no RV infection of primary afferents when rabies virus is injected i.m. at P15, but there is some limited motoneuron infection remaining (that we used to our advantage in this paper to avoid primary afferent and developmental confounds).

      Unfortunately, these methodological studies are presently communicated only in abstract form (Gomez-Perez et al., 2015 and 2016; Program Nos. 242.08 and 366.06). Therefore, we will add to the supplementary information some images from sections serial to those illustrated in the paper. These will show a few “start” LG motoneurons that remained labeled at this survival time point and the lack of any dorsal horn primary afferent labeling. This is consistent with our as yet unpublished data, which are based on a larger number of animals and more extensive time courses.

      Comment 4. Temporal resolution of birth-dating.

      We agree with the reviewer, and that is the reason we explicitly discuss that temporal resolution is not perfect (we also add a few more caveats that affect temporal resolution beyond the reviewers’ comments). However, the method is good enough to differentiate temporal sequences of neurogenesis with close to 12-hour resolution, once enough animals are analyzed to compensate for methodological temporal overlaps. That is the reason for our Figure 1D.

      Reviewer 3

      Comment 1. Text is too long and main message buried in technical details.

      We agree and, as in our response to the first comment of Reviewer 2, we will revise the writing to make it more straightforward while moving some of the information on methods and technical discussion to supplementary materials. As demonstrated by Reviewer 2’s comments, methodological discussions are still important to best interpret the data presented in this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable contribution to cardiac arrhythmia research by demonstrating long noncoding RNA Dachshund homolog 1 (lncDACH1) tunes sodium channel functional expression and affects cardiac action potential conduction and rhythms. Whereas the evidence for functional impact of lncDACH1 expression on cardiac sodium currents and rhythms is convincing, biochemical experiments addressing the mechanism of changes in sodium channel expression and subcellular localization are incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors show that a long non-coding RNA, lncDACH1, inhibits sodium currents in cardiomyocytes by binding to and altering the localization of dystrophin. The authors use a number of methodologies to demonstrate that lncDACH1 binds to dystrophin and disrupts its localization to the membrane, which in turn downregulates NaV1.5 currents. Knockdown of lncDACH1 upregulates NaV1.5 currents. Furthermore, in heart failure, lncDACH1 is shown to be upregulated, which suggests that this mechanism may have pathophysiological relevance.

      Strengths:

      (1) This study presents a novel mechanism of Na channel regulation which may be pathophysiologically important.

      (2) The experiments are comprehensive and systematically evaluate the physiological importance of lncDACH1.

      Weaknesses:

      (1) What is indicated by the cytoplasmic level of NaV1.5, a transmembrane protein? The methods do not provide details regarding how this was determined. Do the authors mean NaV1.5 retained in various intracellular organelles?

      Thank you for the good suggestion. Our study showed that Nav1.5 is transferred to the cell membrane by the scaffold protein dystrophin under the regulation of LncDACH1, but not all Nav1.5 in the cytoplasm is transferred to the cell membrane. Therefore, the cytoplasmic level of Nav1.5 represents the Nav1.5 protein that is not transferred to the cell membrane but stays in the cytoplasm and in various organelles within the cytoplasm when Nav1.5 is regulated by LncDACH1.

      (2) What is the negative control in Fig. 2b, Fig. 4b, Fig. 6e, Fig. 7c? The maximum current amplitude in these seem quite different. -40 pA/pF in some, -30 pA/pF in others and this value seems to be different than in CMs from WT mice (<-20 pA/pF). Is there an explanation for what causes this variability between experiments and/or increase with transfection of the negative control? This is important since the effect of lncDACH1 is less than 50% reduction and these could fall in the range depending on the amplitude of the negative control.

      Thank you for the insightful comment. The negative controls in Fig. 2b, Fig. 4b, and Fig. 6e are primary cardiomyocytes transfected with empty plasmids. The negative control in Fig. 7c is cardiomyocytes from wild-type mice injected with control virus. When cells are prepared for the patch-clamp experiments, differences in the transfection efficiency of the transfection reagent between batches of cells, as well as differences in cell size, ultimately lead to differences among CMs.

      (3) NaV1.5 staining in Fig. 1E is difficult to visualize and to separate from lncDACH1. Is it possible to pseudocolor differently so that all three channels can be visualized/distinguished more robustly?

      Thank you for the good suggestion. We have re-colored the original image so that the three channels can be distinguished.

      Author response image 1.

      (4) The authors use shRNA to knockdown lncDACH1 levels. It would be helpful to have a scrambled ShRNA control.

      Thank you for the insightful comment. The control group we used was in fact the scrambled shRNA, but we labeled it as NC in the article, which may have caused the misunderstanding.

      (5) Is there any measurement of the baseline levels of LncDACH1 in wild-type mice? It seems quite low, and yet there is a substantial increase in NaV1.5 currents upon knocking down LncDACH1. By comparison, the level of LncDACH1 seems to be massively upregulated in TAC models. Have the authors measured NaV1.5 currents in these cells? Furthermore, does LncDACH1 knockdown evoke a larger increase in NaV1.5 currents?

      Thank you for the insightful comment.

      (1) The baseline protein levels of LncDACH1 in wild-type mice and LncDACH1-CKO mice have been verified in a previously published article (Figure 3) (Hypertension. 2019;74:00-00. DOI: 10.1161/HYPERTENSIONAHA.119.12998).

      Author response image 2.

      (2) We did not measure the Nav1.5 currents in cardiomyocytes of the TAC model mice in this article, but in another published paper we found that the Nav1.5 current in the TAC model mice was markedly lower than that in wild-type mice (Figure 4) (Gene Ther. 2023 Feb;30(1-2):142-149. DOI: 10.1038/s41434-022-00348-z).

      Author response image 3.

      This is consistent with our results in this article: LncDACH1 levels are significantly upregulated in the TAC model, and in the LncDACH1-TG group the Nav1.5 current is significantly reduced after LncDACH1 upregulation (Figure 3).

      Author response image 4.

      (6) What do error bars denote in all bar graphs, and also in the current voltage relationships?

      Thank you for the good comment. All data are presented as mean ± SEM, and the error bars denote the SEM. They indicate the variability of the individual values in each data set around the mean of that data set.
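
      For readers unfamiliar with the convention, the SEM is the sample standard deviation divided by the square root of the number of observations, so it shrinks as the sample grows, whereas the standard deviation describes the spread of the individual values. A minimal sketch, using hypothetical current densities (not data from the study) and an illustrative helper named `mean_sem`:

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample SD / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean, sd / math.sqrt(n)


# Hypothetical peak current densities in pA/pF (illustration only).
current_density = [-38.0, -42.0, -40.0, -36.0, -44.0]
mean, sem = mean_sem(current_density)
print(f"{mean:.1f} ± {sem:.2f} pA/pF")
```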

      Reviewer #2 (Public Review):

      This manuscript by Xue et al. describes the effects of a long noncoding RNA, lncDACH1, on the localization of Nav channel expression, the magnitude of INa, and arrhythmia susceptibility in the mouse heart. Because lncDACH1 was previously reported to bind and disrupt membrane expression of dystrophin, which in turn is required for proper Nav1.5 localization, much of the findings are inferred through the lens of dystrophin alterations.

      The results report that cardiomyocyte-specific transgenic overexpression of lncDACH1 reduces INa in isolated cardiomyocytes; measurements in whole heart show a corresponding reduction in conduction velocity and enhanced susceptibility to arrhythmia. The effect on INa was confirmed in isolated WT mouse cardiomyocytes infected with a lncDACH1 adenoviral construct. Importantly, reducing lncDACH1 expression via either a cardiomyocyte-specific knockout or using shRNA had the opposite effect: INa was increased in isolated cells, as was conduction velocity in heart. Experiments were also conducted with a fragment of lncDACH1 identified by its conservation with other mammalian species. Overexpression of this fragment resulted in reduced INa and greater proarrhythmic behavior. Alteration of expression was confirmed by qPCR.

      The mechanism by which lncDACH1 exerts its effects on INa was explored by measuring protein levels from cell fractions and immunofluorescence localization in cells. In general, overexpression was reported to reduce Nav1.5 and dystrophin levels and knockout or knockdown increased them.

      Thank you for summarizing our work, and thank you very much for your appreciation of it.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report the first evidence of Nav1.5 regulation by a long noncoding RNA, LncRNA-DACH1, and suggest its implication in the reduction in sodium current observed in heart failure. Since no direct interaction is observed between Nav1.5 and the LncRNA, they propose that the regulation is via dystrophin and targeting of Nav1.5 to the plasma membrane.

      Strengths:

      (1) First evidence of Nav1.5 regulation by a long noncoding RNA.

      (2) Implication of LncRNA-DACH1 in heart failure and mechanisms of arrhythmias.

      (3) Demonstration of LncRNA-DACH1 binding to dystrophin.

      (4) Potential rescuing of dystrophin and Nav1.5 strategy.

      Thank you very much for your appreciation of our work.

      Weaknesses:

      (1) Main concern is that the authors do not provide evidence of how LncRNA-DACH1 regulates Nav1.5 protein level. The decrease in total Nav1.5 protein by about 50% seems to be the main consequence of the LncRNA on Nav1.5, but no mechanistic information is provided as to how this occurs.

      Thank you for the insightful comment.

      (1) The overall mechanism proposed in the article is as stated in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5. Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      And we performed pulldown and RNA immunoprecipitation experiments to verify it (Figure 1).

      Author response image 5.

      (2) We then found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 6.

      (3) Lastly, we found that lncDACH1 failed to pull down Nav1.5 and anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1).

      Author response image 7.

      These data indicated that lncDACH1 does not interact with Nav1.5 directly. It participates in the regulation of Nav1.5 by binding to dystrophin. Cytoplasmic Nav1.5 that fails to reach the plasma membrane may be quickly recognized and then degraded by the ubiquitination enzymes.

      (2) The fact that the total Nav1.5 protein is reduced by 50% which is similar to the reduction in the membrane reduction questions the main conclusion of the authors implicating dystrophin in the reduced Nav1.5 targeting. The reduction in membrane Nav1.5 could simply be due to the reduction in total protein.

      Thank you for the insightful comment. We do not rule out the possibility that the reduction in membrane Nav1.5 may be due to the reduction in total protein, but we do not think this is the main mechanism. Our data indicate that both the membrane and total protein levels of Nav1.5 were reduced by 50%. However, cytoplasmic Nav1.5 was increased in the hearts of lncDACH1-TG mice compared with WT controls, rather than reduced like the membrane and total protein (Figure 1).

      Author response image 8.

      Therefore, we think the main mechanism is the one stated in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig. 6E the error bars are only in one direction for cF-lncDACH1. It seems that this error overlaps for NC and cF-lncDACH1 at several voltages, yet it is marked as statistically significant. Also in Fig. 7C, what statistical test was used? Do the authors account for multiple comparisons?

      Thank you for the insightful comment.

      (1) We have recalculated the two sets of data and confirmed that the difference between NC and cF-lncDACH1 in Fig. 6E is indeed statistically significant. The overlap in the figure may be only a visual effect.

      (2) The data in Fig. 7C are expressed as mean ± SEM. Statistical analysis was performed using unpaired Student’s t test or One-Way Analysis of Variance (ANOVA) followed by Tukey’s post-hoc analysis.

      (2) line 57, "The Western blot" remove "The"

      Sorry for the mistake. We have corrected it.

      (3) line 61, "The opposite data were collected" It is unclear what is meant by opposite.

      Sorry for the mistake. We have corrected it.

      (4) Lines 137-140. This sentence is complex, I would simplify as two sentences.

      Sorry for the mistake. We have corrected it.

      (5) Line 150, "We firstly validated" should be "we first validated"

      Sorry for the mistake. We have corrected it.

      (6) Line 181, "Consistently, the membrane" Is this statement meant to indicate that the experiments yielded a consistent results or that this statement is consistent with the previous one? In either case, this sentence should be reworded for clarification.

      Sorry for the mistake. We have corrected it.

      (7) Line 223, "In consistent, the ex vivo" I am not sure what In consistent means here.

      Thank you for the good suggestion. We mean that the ex vivo results are consistent with the in vivo results. We have corrected the sentence to make it clearer.

      (8) Line 285. "a bunch of studies" could be rephrased as "multiple studies"

      Sorry for the mistake. We have corrected it.

      (9) Line 299 "produced no influence" Do you mean produced no change?

      Thank you for the good suggestion. As you put it, we mean that it produced no change.

      (10) Line 325 "is to interact with the molecules" no need for "the molecules"

      Sorry for the mistake. We have corrected it.

      (11) lines 332-335. This sentence is very confusing.

      Thank you for the insightful comment. We have corrected it.

      (12) Lines 341-342. It is unnecessary to claim primacy here.

      Thank you for the good suggestion. We have removed this sentence.

      (13) Line 373. "Sodium channel remodeling is commonly occured in" perhaps rephrase as occurs commonly

      Thank you for the insightful comment. We have corrected it.

      Reviewer #2 (Recommendations For The Authors):

      Critique

      (1) Aside from some issues with presentation noted below, these data provide convincing evidence of a link between lncDACH1 and Na channel function. The identification of a lncDACH1 segment conserved among mammalian species is compelling. The observation that lncDACH1 is increased in a heart failure model provides a plausible hypothesis for disease mechanism.

      Thank you very much for your appreciation of our work.

      (2) Has a causal link between dystrophin and Na channel surface expression been made, or is it an argument based on correlation? Is it possible to rule out a direct effect of lncDACH1 on Na channel expression? A bit more discussion of the limitations of the study would help here.

      Thank you for the insightful comment.

      (1) Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      Author response image 9.

      (2) We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 failed to pull down Nav1.5 and anti-Nav1.5 did not precipitate lncDACH1 (Online Supplementary Figure 11). These data indicated that lncDACH1 does not interact with Nav1.5 directly (Supplementary Fig. 1).

      Author response image 10.

      (3) What normalization procedures were used for qPCR quantification? I could not find these.

      Thank you for the good suggestion. The expression levels of mRNA were calculated using the comparative cycle threshold (Ct) method (2^−ΔΔCt). Each data point was normalized to ACTIN as an internal control in each sample. The final results are expressed as fold changes by normalizing the data to the values from control subjects. We have added the normalization procedures to the methods section of the article.
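
      The comparative Ct calculation described above can be sketched as follows. This is a minimal illustration of the standard 2^−ΔΔCt arithmetic, assuming roughly 100% amplification efficiency; the function name and the Ct values are hypothetical and are not taken from the manuscript.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene, e.g. ACTIN), within each sample
    ddCt = dCt(sample) - dCt(control)
    fold = 2 ** (-ddCt)
    """
    dct_sample = ct_target_sample - ct_ref_sample
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values (illustration only): the target amplifies two cycles
# earlier in the treated sample than in the control, with equal reference Ct.
fold = fold_change_ddct(24.0, 18.0, 26.0, 18.0)
print(fold)
```

      A control sample compared with itself gives ΔΔCt = 0 and therefore a fold change of 1, which is why results are reported relative to the control group.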

      (4) In general, I found the IF to be unconvincing - first, because the reported effects were not very apparent to me, but more importantly, because only exemplars were shown without quantification of a larger sample size.

      Thank you for the good suggestion. Accordingly, we quantified the immunostaining data. The data have been included in Supplementary Figure 2-16. The sample size is given in each caption.

      Author response image 11.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-TG mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1. N=10. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 12.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocyte overexpressing lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=9. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9 for dys. N=12 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 13.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-cKO mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=12 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1 expression. N=8. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 14.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes after knocking down of lncDACH1. a,b, Distribution of membrane levels of dystrophin and Nav1.5. N=11 for dys. N=8 for Nav1.5. P<0.05 versus NC group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12 for dys. N=9 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 15.

      Fluorescence intensity of dystrophin and Nav1.5 in isolated cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=7 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=6 for dys. N=7 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 16.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=10 for dys. N=11 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=7 for dys. N=6 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 17.

      Fluorescence intensity of Nav1.5 in human iPS differentiated cardiomyocytes overexpressing cF-lncDACH1. a, Membrane levels of Nav1.5. N=8 for Nav1.5. P<0.05 versus NC group. b, Cytoplasm levels of Nav1.5. N=10 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      (5) More information on how the fractionation kit works would be helpful. How are membrane v. cytoplasm fractions identified?

      a. I presume the ER is part of the membrane fraction? When Nav1.5 is found in the cytoplasmic fraction, what subcompartment is it in - the proteasome?

      b. In the middle panel of A - is the dystrophin signal visible on the WB for WT? I assume the selected exemplar is the best of the blots and so this raises concerns. Much is riding on the confidence with which the fractions report "membrane" v "cytoplasm."

      Thank you for the insightful comment.

      (1). How the fractionation kit works:

      The kit uses centrifuge column technology to obtain plasma membrane structures with native activity and minimal cross-contamination from organelles, without the need for an ultracentrifuge, and the fractions can be used for a variety of downstream assays. Separation principle: cells/tissues are sensitized with Buffer A and then passed through the centrifuge column by centrifugation at 16,000 × g, which shears the cell membrane and ruptures the cells. The four components (nucleus, cytoplasm, organelles and plasma membrane) are then obtained sequentially through differential centrifugation and density centrifugation and can be used for downstream detection.

      Author response image 18.

      (2). How are membrane v. cytoplasm fractions identified:

      Membrane and cytosolic proteins were isolated with the kit. The internal controls we chose for the western blot experiments were N-cadherin for the membrane fraction and β-actin for the cytosolic fraction.

      Most importantly, when we incubated the N-cadherin primary antibody with the PVDF membrane carrying the cytosolic proteins, or the primary antibody against the cytosolic control β-actin with the PVDF membrane carrying the membrane proteins, no protein bands were detected in the scans.

      Author response image 19.

      (6) More detail in Results, figures, and figure legends will assist the reader.

      a. In Fig. 5, it would be helpful to label sinus rhythm vs. arrhythmia segments.

      Thank you for the good suggestion. We have marked the sinus rhythm and arrhythmia segments with arrows.

      Author response image 20.

      b. Please explain in the figure legend what the red bars in 5A are

      Thank you for the insightful comment. We have added the explanation to the figure legend. The red lines in the ECG traces indicate VT duration.

      c. In 5C, please explain what the durations pertain to.

      Thank you for the good suggestion. 720 ms-760 ms refers to the duration of one action potential, with 720 ms being the peak of one action potential and 760 ms being the peak of the next. The interval duration is not fixed; in this article, we used 10 ms as the interval for counting phase singularities from the consecutive phase maps, because the shorter the interval, the larger the sample size and the more convincing the data.

      d. In the text, please define "breaking points" and explain what the physiological underpinning is. Define "phase singularity."

      Thank you for the insightful comment. Cardiac excitation can be viewed as an electrical wave, with a wavefront corresponding to the action potential upstroke (phase 0) and a waveback corresponding to rapid repolarization (phase 3). Under normal circumstances, cardiac conduction is composed of a sequence of well-ordered action potentials, and when a wave propagates through cardiac tissue, the wavefront and waveback never touch. In the optical mapping results, different colors represent different phases. When arrhythmias occur, owing to factors such as reentry, the activation contour meets the refractory contour and the waves break up, initiating a new spiral reentry. In the optical mapping phase maps, the colors representing the different phases (including depolarization and repolarization) then converge to form a vortex, and the center of the vortex is defined as the phase singularity.
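
      One common way to locate such a vortex center in a phase map is to sum the wrapped phase differences around each small closed loop of pixels: a total of ±2π marks a phase singularity, and a total of 0 does not. The sketch below illustrates that general idea only; it is an assumption for illustration and not the procedure used in this study, and the function names are hypothetical.

```python
import math


def wrap_phase(d):
    """Wrap a phase difference into the interval [-pi, pi)."""
    return (d + math.pi) % (2.0 * math.pi) - math.pi


def find_phase_singularities(phase):
    """Return (row, col, charge) for each 2x2 pixel loop with nonzero winding.

    `phase` is a 2D list of phase values in radians, indexed as phase[row][col].
    """
    found = []
    for r in range(len(phase) - 1):
        for c in range(len(phase[0]) - 1):
            # Walk once around the four corners of the 2x2 loop
            loop = [phase[r][c], phase[r][c + 1],
                    phase[r + 1][c + 1], phase[r + 1][c]]
            total = sum(wrap_phase(loop[(k + 1) % 4] - loop[k])
                        for k in range(4))
            # The winding number is the total phase change divided by 2*pi
            charge = round(total / (2.0 * math.pi))
            if charge != 0:
                found.append((r, c, charge))
    return found


# Synthetic 2x2 vortex: the phase rotates once around the loop centre
vortex = [[math.atan2(r - 0.5, c - 0.5) for c in range(2)] for r in range(2)]
print(find_phase_singularities(vortex))
```

      Counting the entries returned for each phase map, at whatever sampling interval is chosen, would give one count of phase singularities per frame.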

      (7) In reflecting on why enhanced INa is not proarrhythmic, it is noted that the kinetics are not altered. I agree that is key, but perhaps the consequence could be better articulated. Because lncDACH1 does not alter Nav1.5 gating, the late Na current may not be enhanced to the same effect as observed with LQT gain-of-function Nav1.5 mutations, in which APD prolongation is attributed to gating defects that increase late Na current.

      Thank you for the good suggestion. Your explanation is very brilliant and important for this article. We have revised the discussion section of the article and added these explanations to it.

      Reviewer #3 (Recommendations For The Authors):

      (1) Experiments to specifically address the reduction in total Nav1.5 protein should be included.

      Thank you for the insightful comment. We examined the ubiquitination of Nav1.5. We found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 21.

      (2) Experiments to convincingly demonstrate that LncRNA-DACH1 regulates Nav1.5 targeting via dystrophin are missing. As it is, total reduction in Nav1.5 seems to be the explanation as to why there is a decrease in membrane Nav1.5.

      Thank you for the insightful comment. We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 can pull down dystrophin (Figure 1), but failed to pull down Nav1.5, and anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1). These data indicate that lncDACH1 does not interact with Nav1.5 directly. Instead, it participates in the regulation of Nav1.5 by binding to dystrophin.

      Author response image 22.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      In this revised manuscript Aguillon and collaborators convincingly demonstrate that CLK is required for free-running behavioral rhythms under constant conditions in the Cnidarian Nematostella. The results also convincingly show that CLK impacts rhythmic gene expression in this organism. This original work thus demonstrates that CLK was recruited very early during animal evolution in the circadian clock mechanism to optimize behavior and gene expression with the time-of-day. The manuscript could still benefit from some improvements so that it is more accessible for a wide readership.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Aguillon and collaborators have deeply revised, and in the process significantly improved, the presentation of their interesting results with the first Cnidarian circadian gene mutant. Results are now very convincingly demonstrating that CLK is required for free-running behavioral rhythms under constant conditions. The results also now more convincingly show that CLK impacts rhythmic gene expression, although interpretation of the transcriptomics data is not straightforward. I think there are still improvements that are needed to make the manuscript more accessible. The authors need to keep in mind that a broad audience will read their report, not just chronobiologists. I have listed below several issues that I think should be addressed, and some editing suggestions.

      General comment to Editor and Reviewers:

      We are genuinely grateful to both reviewers and editors for all the feedback, which helped us make the best of our data and question our analysis to the point that we redefined our approach and ended up with an article we are proud of. Only the names of the authors are visible on the article, and considering how much the reviewing system helps to improve the research, this seems almost unfair. As such, we thank all of you and really appreciate the new eLife system. Bravo all.

      Abstract:

      (1) Line 40: It should read "transcript levels" instead of "transcription". There is no measurement of transcription rates in this manuscript, only mRNA levels.

      Modified accordingly.

      (2) Line 41: the authors mention "constant light". Does this refer to previous work? Their data in Figure 4 were in constant darkness, not in LL.

      Modified accordingly.

      (3) Line 46 and throughout the manuscript, the allelic nomenclature is not standard. 1-/- seems to indicate there are two different alleles. Since the allele might not be a null, I would suggest simply using 1/1, or perhaps delta/delta since the mutation results in a truncated CLK.

      NvClk1-/- became NvClkΔ/Δ, except in the .xls supplementary table, where the mutant kept the NvClk-/- nomenclature. There it is not possible to replace only part of a word with a different font, so generating the delta sign would have to be done one entry at a time.

      (4) The last sentence of the abstract needs to be rephrased, as it suggests that CLK evolved to maintain circadian rhythms under constant conditions. Constant conditions very rarely exist on Earth, and thus cannot be an evolutionary driving force. Different explanations have been proposed on why a self-sustained clock is the evolutionary solution to timekeeping, but the purpose of the clock and of clock genes is not to maintain oscillations in constant conditions. Actually, this sentence conflicts with the title.

      Modified to: the Clock gene has evolved in cnidarians to sustain 24-hour rhythmic physiology and behavior in absence of diel environmental conditions. From my current understanding, you are right: the purpose of clock genes is not to maintain oscillations in constant conditions (this is simply the result of the experiment), but to synchronize physiology to the day/night rhythm, and surely to sustain 24 h oscillations in case the environment challenges the perception of the diel cues. DD or LL is just an artificial experimental design to reveal the endogenous time-keeping pacemaker.

      Results:

      (1) Line 148 and elsewhere in the MS: I would not use the word "lower" or "higher" to qualify acrophases. I would suggest advanced/delayed or earlier/later.

      Modified accordingly.

      (2) Line 157-9: The introductory sentence does not clearly present the rationale for the 6/6 experiments.

      We modified the paragraph accordingly: The presence of a 24-hour rhythm of NvClkΔ/Δ polyps under LD conditions could be attributed to either a direct light-response or the partial functioning of the circadian clock due to the nature of the mutation….

      (3) At the end of the behavior section, or perhaps at the end of each paragraph in this section, it would be helpful to have a summary of the results and more clearly explain their interpretation. The authors need to guide the readers, particularly non-chronobiologists, so that they can understand what the really neat data that were obtained mean. For example, what does it mean that the acrophase is different between mutant and wild-type, why are Clk mutants rhythmic under LD12/12 or 6/6, etc.

      We added a concluding sentence to help non-specialists understand each result.

      (4) Line 172 and elsewhere: "true rhythmic genes" sounds odd to me. Either they are, or they are not rhythmic.

      Modified to “rhythmic genes.”

      (5) Paragraph starting with line 184: I do not follow what is important about the number of genes per time cluster. What does it tell us, beyond the simple fact that fewer genes are rhythmic in the Clk mutants?

      We rewrote the result paragraph to make it clearer why we performed this clustering analysis. This clustering analysis became Extended Data Fig.2 with modification of the figures (see my comments in your review about Figure 3).

      (6) Line 197: The authors need to explain what they saw with circadian clock genes and their expression in Clk mutants. In some cases, amplitude increased in LD. This surprising observation deserves some explanation. "Complex regulatory effect" is too vague.

      We replaced the vague “complex regulatory effect” with a more thorough description of Figure 3a.

      (7) Line 198-203: Again, help the reader understand the significance of these observations.

      We rewrote the paragraph to help the reader to better understand the significance of these observations.

      Discussion:

      (1) Line 236-40. Careful with the use of -/-, which implies that an allele is a null. The first Clk mutants in mammals and flies, which the authors refer to, were actually dominant negatives.

      I went over the citations we used for this paragraph, and the first mutation in fly, dClkar, is a null, not a dominant negative. Flies are still rhythmic in the dark. Unless there is an older mutation? However, you are right that the first mutation identified in mouse was a dominant negative with loss of rhythmicity, while the gene deletion did not show any effect on behavior, suggesting compensation by a paralog. I removed two references that were not relevant to the discussion.

      (2) Line 265-268 are not very clear. Do the authors mean that the lack of overlap for non-circadian pacemaker genes is because of different experimental conditions? What would be those differences? It is reassuring that the Leach/Reitzel study and the present share pacemaker genes as rhythmic, but it is also surprising that there is almost no overlap beyond these genes. How robust are those other rhythms compared to circadian clock genes?

      We revised this paragraph and raised major points regarding differences between labs in the rearing conditions of the polyps and their potential genetic differences, which could explain these discrepancies.

      (3) Line 270. I am not sure "compensation" is the right word, since there is no overlap between the rhythmic genes in mutants under LD and wild-type under either LD or DD. Also, saying on line 273 that the transcriptional pattern is not fully reproduced is a rather striking understatement, given the absence of rhythmic gene overlap.

      We rewrote the paragraph accordingly. We replaced it with “alternative way to drive rhythmicity under LD condition”.

      (4) Line 279. The authors mention the possibility of false positives. Based on the FDR, are there more rhythmic genes than expected by chance?

      The possibility of false positives is a risk to consider when no multiple-testing correction is performed. We added to the Results paragraph the number of rhythmic genes identified with BH.Q or p.adj, which are the multiple-testing-corrected values for the two algorithms we used (RAIN and JTK).
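
      For readers unfamiliar with multiple-testing correction, the sketch below shows the standard Benjamini-Hochberg adjustment of a list of p-values. It assumes that BH.Q denotes Benjamini-Hochberg-adjusted values; it is a generic illustration with made-up p-values, not the internal implementation of RAIN or JTK.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, for illustration only
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
# Genes would be called rhythmic when the adjusted value is below the cutoff
n_rhythmic = sum(1 for q in adjusted_p if q < 0.05)
print(adjusted_p, n_rhythmic)
```

      Comparing the number of genes passing the adjusted cutoff with the number passing the raw p-value cutoff shows how many calls depend on skipping the correction.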

      (5) Line 279-82. The reference to the Ray study is rather obscure. What is the point the authors are trying to make here?

      In the end, we removed the reference from this article and modified the corresponding paragraph of the Discussion. Indeed, the discussion around the Ray study did not give an interesting direction for discussing our results and analysis approach.

      (6) Line 284: define BHQ and p.adj

      Defined and referenced.

      (7) The way Lines 283-288 are worded does not provide a good rationale for how transcriptional rhythms were analyzed. The idea to combine two different approaches (JTK and RAIN) to be selective with rhythmicity was great. The authors need however to justify these choices in a more convincing manner. The goal is to detect rhythmic genes in a reliable manner, irrespective of the number of rhythmic genes observed. Also, explaining the choice of methodology belongs to the Results section.

      We explained our choice of methodology and moved it to the result section as suggested.

      (8) Line 292-3. There are known mechanisms that explain how transcriptional time clusters are generated. In particular, the use of interlocked feedback loop with antiphase peaks of transcriptions is well documented. Actually, it seems to me the clustering shown in Fig 4 might hint at such a mechanism.

      Indeed, you are right: the clustering shown in Fig 3 (former Fig 4) reveals such a mechanism.

      Figures:

      Figure 2: Define relative amplitude

      We added the definition of the relative amplitude to the Results, if this is what you asked for.

      Figure 3: Some of the cycles look odd (first row of graphs in panel C). Why would the first and last data point be so different in three of these graphs?

      We decided to modify this figure as we realized it was not informative and not objective enough, since we had selected a few “representatives” among multiple patterns. In the new figure we combined the cluster analysis with the behavior. Thus, readers can now pick a cluster according to a specific behavioral activity level (or ZT/CT) and find the “genes of potential interest” in Supp. Table 4. However, generally speaking, this figure does not further explain the consequences of the mutation, so we moved it to Extended Data Fig. 2.

      Figure 4: define the color coding in the correlation panels (blue to red)

      These values from -1 to 1 are the Pearson correlation values. They are now indicated on the figure with a color-coding legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides an important cell atlas of the gill of the mussel Gigantidas platifrons using a single nucleus RNA-seq dataset, a resource for the community of scientists studying deep sea physiology and metabolism and intracellular host-symbiont relationships. The work, which offers solid insights into cellular responses to starvation stress and molecular mechanisms behind deep-sea chemosymbiosis, is of relevance to scientists interested in host-symbiont relationships across ecosystems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Wang et al have constructed a comprehensive single nucleus atlas for the gills of the deep sea Bathymodioline mussels, which possess intracellular symbionts that provide a key source of carbon and allow them to live in these extreme environments. They provide annotations of the different cell states within the gills, shedding light on how multiple cell types cooperate to give rise to the emergent functions of the composite tissues and the gills as a whole. They pay special attention to characterizing the bacteriocyte cell populations and identifying sets of genes that may play a role in their interaction with the symbiotes.

      Wang et al sample mussels from 3 different environments: animals from their native methane-rich environment, animals transplanted to a methane-poor environment to induce starvation, and animals that have been starved in the methane-poor environment and then moved back to the methane-rich environment. They demonstrated that starvation had the biggest impact on bacteriocyte transcriptomes. They hypothesize that the upregulation of genes associated with lysosomal digestion leads to the digestion of the intracellular symbiont during starvation, while the non-starved and reacclimated groups more readily harvest the nutrients from symbiotes without destroying them.

      Strengths:

      This paper makes available a high-quality dataset that is of interest to many disciplines of biology. The unique qualities of this non-model organism and the collection of conditions sampled make it of special interest to those studying deep sea adaptation, the impact of environmental perturbation on Bathymodioline mussels populations, and intracellular symbiotes. The authors do an excellent job of making all their data and analysis available, making this not only an important dataset but a readily accessible and understandable one.

      The authors also use a diverse array of tools to explore their data. For example, the quality of the data is augmented by the use of in situ hybridizations to validate cluster identity and KEGG analysis provides key insights into how the transcriptomes of bacteriocytes change.

      The authors also do a great job of providing diagrams and schematics to help orient non-mussel experts, thereby widening the audience of the paper.

      We thank the reviewer for the valuable feedback on our study. We are grateful that the reviewer found our work to be interesting, and we appreciate the thorough evaluation of our research. The constructive comments will be considered as we continue to develop and improve our study.

      Weaknesses:

      One of the main weaknesses of this paper is the lack of coherence between the images and the text, with some parts of the figures never being referenced in the body of the text. This makes it difficult for the reader to interpret how they fit in with the author's discussion and assess confidence in their analysis and interpretation of data. This is especially apparent in the cluster annotation section of the paper.

      We appreciate the feedback and suggestions provided by the reviewer, and we have revised our manuscript to make it more accessible to general audiences.

      Another concern is the linking of the transcriptomic shifts associated with starvation with changes in interactions with the symbiotes. Without examining and comparing the symbiote population between the different samples, it cannot be concluded that the transcriptomic shifts correlate with a shift to the 'milking' pathway and not other environmental factors. Without comparing the symbiote abundance between samples, it is difficult to disentangle changes in cell state that are due to their changing interactions with the symbiotes from other environmental factors.

      We are grateful for the valuable feedback and suggestions provided by the reviewer. Our keen interest lies in understanding symbiont responses, particularly at the single-cell level. However, it's worth noting that existing commercial single-cell RNA-seq technologies rely on oligo dT priming for reverse transcription and barcoding, thus omitting bacterial gene expression information from our dataset. We hope that advancements in technology will soon enable us to perform an integrated analysis encompassing both host and symbiont gene expression.

      Additionally, conclusions in this area are further complicated by using only snRNA-seq to study intracellular processes. This is limiting since cytoplasmic mRNA is excluded and only nuclear reads are sequenced after the organisms have had several days to acclimate to their environment and major transcriptomic shifts have occurred.

      We appreciate the comments shared by the reviewer and agree that scRNA-seq provides more comprehensive transcriptional information by targeting the entire mRNA of the cell. However, we would like to highlight that snRNA-seq has some unique advantages over scRNA-seq. Notably, snRNA-seq allows for simple snap-freezing of collected samples, facilitating easier storage, particularly for samples obtained during field trips involving deep-sea animals and other ecologically significant non-model animal samples. Additionally, unlike scRNA-seq, snRNA-seq eliminates the need for tissue dissociation, which often involves prolonged enzymatic treatment of deep-sea animal tissue/cells under atmospheric pressure. This process can potentially lead to the loss of sensitive cells or alterations in gene expression. Moreover, snRNA-seq procedures are independent of the size and shape of animal cells, rendering it a superior technology for constructing the cell atlas of animal tissues. Consequently, we assert that snRNA-seq offers flexibility and represents a suitable choice for the subjects of our current research.

      Reviewer #2 (Public Review):

      Wang, He et al. shed insight into the molecular mechanisms of deep-sea chemosymbiosis at the single-cell level. They do so by producing a comprehensive cell atlas of the gill of Gigantidas platifrons, a chemosymbiotic mussel that dominates the deep-sea ecosystem. They uncover novel cell types and find that the gene expression of bacteriocytes, the symbiont-hosting cells, supports two hypotheses of host-symbiont interactions: the "farming" pathway, where symbionts are directly digested, and the "milking" pathway, where nutrients released by the symbionts are used by the host. They perform an in situ transplantation experiment in the deep sea and reveal transitional changes in gene expression that support a model where starvation stress induces bacteriocytes to "farm" their symbionts, while recovery leads to the restoration of the "farming" and "milking" pathways.

      A major strength of this study includes the successful application of advanced single-nucleus techniques to a non-model, deep-sea organism that remains challenging to sample. I also applaud the authors for performing an in situ transplantation experiment in a deep-sea environment. From gene expression profiles, the authors deftly provide a rich functional description of G. platifrons cell types that is well-contextualized within the unique biology of chemosymbiosis. These findings offer significant insight into the molecular mechanisms of deep-sea host-symbiont ecology, and will serve as a valuable resource for future studies into the striking biology of G. platifrons.

      The authors' conclusions are generally well-supported by their results. However, I recognize that the difficulty of obtaining deep-sea specimens may have impacted experimental design. In this area, I would appreciate more in-depth discussion of these impacts when interpreting the data.

      We thank the reviewer for their valuable feedback on our study. We're grateful that the reviewer found our work interesting, and we appreciate their thorough evaluation of our research. We'll consider their constructive comments as we continue to develop and improve our study.

      Because cells from multiple individuals were combined before sequencing, the in situ transplantation experiment lacks clear biological replicates. This may potentially result in technical variation (ie. batch effects) confounding biological variation, directly impacting the interpretation of observed changes between the Fanmao, Reconstitution, and Starvation conditions. It is notable that Fanmao cells were much more sparsely sampled. It appears that fewer cells were sequenced, resulting in the Starvation and Reconstitution conditions having 2-3x more cells after doublet filtering. It is not clear whether this is due to a technical factor impacting sequencing or whether these numbers are the result of the unique biology of Fanmao cells. Furthermore, from Table S19 it appears that while 98% of Fanmao cells survived doublet filtering, only ~40% and ~70% survived for the Starvation and Reconstitution conditions respectively, suggesting some kind of distinction in quality or approach.

      There is a pronounced divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation (Fig. S11). This is potentially a very interesting finding, but it is difficult to know if these differences are the expected biological outcome of the experiment or the fact that Fanmao cells are much more sparsely sampled. The study also finds notable differences in gene expression between Fanmao and the other two conditions- a key finding is that bacteriocytes had the largest Fanmao-vs-starvation distance (Fig. 6B). But it is also notable that for every cell type, one or both comparisons against Fanmao produced greater distances than comparisons between Starvation and Reconstitution (Fig. 6B). Again, it is difficult to interpret whether Fanmao's distinctiveness from the other two conditions is underlain by fascinating biology or technical batch effects. Without biological replicates, it remains challenging to disentangle the two.

      As highlighted by the reviewer, our experimental design involves pooling multiple biological samples within a single treatment state before sequencing. We acknowledge the concern regarding the absence of distinct biological replicates and the potential impact of batch effects on result interpretation. While we recognize the merit of conducting multiple sequencing runs for a single treatment to provide genuine biological replicates, we contend that batch effects may not exert a strong influence on the observed patterns.

      In addition, we applied a bootstrap sampling algorithm to assess whether the gene expression patterns within a cluster are more similar than those between clusters. This algorithm involves selecting a portion of cells per cluster and examining whether this subset remains distinguishable from other clusters. Our assumption was that if different samples exhibited distinct expression patterns due to batch effect, the co-assignment probabilities of a cluster would be very low. This expectation was not met in our data, as illustrated in Fig. S2. The lack of significantly low co-assignment probabilities within clusters suggests that batch effects may not exert a strong influence on our results.

      Indeed, we acknowledge a noticeable shift in the expression patterns of certain cell types, such as the bacteriocyte. However, this is not universally applicable across all cell types. For instance, the UMAP figure in Fig. 6A illustrates a substantial overlap among basal membrane cell 2 from Fanmao, Starvation, and Reconstitution treatments, and the centroid distances between the three treatments are subtle, as depicted in Fig. 6B. This consistent pattern is also observed in DEPC, smooth muscle cells, and the food groove ciliary cells.
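
      For readers who want to reproduce this kind of comparison, a minimal sketch of a centroid distance calculation follows. It assumes Euclidean distance between per-treatment centroids in the embedding space; the distance metric, the function name centroid_distances, and the input layout are assumptions for illustration and are not taken from the manuscript.

```python
from itertools import combinations
from math import dist


def centroid_distances(coords_by_treatment):
    """Pairwise Euclidean distance between treatment centroids.

    coords_by_treatment maps a treatment name to a list of embedding
    coordinates (e.g. UMAP 1 and UMAP 2) for the cells of ONE cell type.
    """
    centroids = {
        treatment: [sum(axis) / len(points) for axis in zip(*points)]
        for treatment, points in coords_by_treatment.items()
    }
    return {
        (a, b): dist(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }
```

      Running this once per cell type yields three distances per cell type (one per pair of treatments), which is the quantity a bar plot of pairwise distances would display.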

      The reviewer also noted variations in the number of cells per treatment. Specifically, Fanmao sequencing yielded fewer than 10 thousand cells, whereas the other two treatments produced 2-3 times more cells after quality control (QC). It is highly probable that the technician loaded different quantities of cells into the machine for single-nucleus sequencing—a not uncommon occurrence in this methodology. While loading more cells may increase the likelihood of doublets, it is crucial to emphasize that this should not significantly impact the expression patterns post-QC. It's worth noting that overloading samples has been employed as a strategic approach to capture rare cell types, as discussed in a previous study (reference: 10.1126/science.aay0267).

      The reviewer highlighted the discrepancy in cell survival rates during the 'doublet filtering' process, with 98% of Fanmao cells surviving compared to approximately 40% and 70% for the Starvation and Reconstitution conditions, respectively. It's important to clarify that the reported percentages reflect the survival of cells through a multi-step QC process employing various filtering strategies.

      Post-doublet removal, we filtered out cells with <100 or >2500 genes and <100 or >6000 unique molecular identifiers (UMIs). Additionally, genes with <10 UMIs in each data matrix were excluded. The observed differences in survival rates for Starvation and Reconstitution cells can be attributed to the total volume of data generated in Illumina sequencing. Specifically, we sequenced approximately 91 GB of data for Fanmao, ~196 GB for Starvation, and ~249 GB for Reconstitution. As a result, the qualified data obtained for Starvation and Reconstitution conditions was only about twice that of Fanmao due to the limited data volume.
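
      The thresholds above can be expressed as a short filter. This is a sketch under stated assumptions rather than the pipeline actually used: the count matrix is represented as a plain dictionary, cells are filtered before genes, and the boundaries are treated as inclusive (a cell with exactly 100 genes is kept). The function name qc_filter and its parameter names are hypothetical.

```python
def qc_filter(counts, min_genes=100, max_genes=2500,
              min_umis=100, max_umis=6000, min_gene_umis=10):
    """Filter a {cell: {gene: UMI count}} matrix.

    Cells are kept when their detected-gene count and total UMI count
    fall within the given bounds; genes are then kept when their total
    UMI count across the remaining cells reaches min_gene_umis.
    """
    kept_cells = {}
    for cell, genes in counts.items():
        n_genes = sum(1 for c in genes.values() if c > 0)
        n_umis = sum(genes.values())
        if min_genes <= n_genes <= max_genes and min_umis <= n_umis <= max_umis:
            kept_cells[cell] = genes

    # Total UMIs per gene across the cells that passed the cell filter.
    gene_totals = {}
    for genes in kept_cells.values():
        for gene, count in genes.items():
            gene_totals[gene] = gene_totals.get(gene, 0) + count
    kept_genes = {g for g, total in gene_totals.items() if total >= min_gene_umis}

    return {
        cell: {g: c for g, c in genes.items() if g in kept_genes}
        for cell, genes in kept_cells.items()
    }
```

      Because every step removes cells or genes, the fraction of cells surviving the whole procedure depends on sequencing depth per cell as well as on doublet removal.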

      The reviewer also observed a divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation, as depicted in Fig. S1. This discrepancy may hold true biological significance, presenting a potentially intriguing finding. However, our discussion on this pattern was rather brief, as we acknowledge that the observed differences could be influenced by the sample preparation process for dissection and digestion. It is crucial to consider that cutting a slightly different area during dissection may result in variations in the proportion of cells obtained. While we recognize the potential impact of this factor, we do not think that the sparsity of sampling alone could significantly affect the relative proportions of cells per cell type.

      In conclusion, we acknowledge the reviewer's suggestion that sequencing multiple individual samples per treatment condition would have been ideal, rather than pooling them together. However, the homogeneous distribution observed in UMAP and the consistent results obtained from bootstrap sampling suggest that the impact of batch effects on our analyses is likely not substantial. Additionally, to our understanding, the smaller number of cells in the Fanmao sample should not significantly affect either the relative proportions of cells or the expression patterns within each cluster.

      Reviewer #3 (Public Review):

      Wang et al. explored the unique biology of the deep-sea mussel Gigantidas platifrons to understand the fundamental principles of animal-symbiont relationships. They used single-nucleus RNA sequencing and validation and visualization of many of the important cellular and molecular players that allow these organisms to survive in the deep sea. They demonstrate that a diversity of cell types that support the structure and function of the gill including bacteriocytes, specialized epithelial cells that host sulfur-oxidizing or methane-oxidizing symbionts as well as a suite of other cell types including supportive cells, ciliary, and smooth muscle cells. By performing experiments of transplanting mussels from one habitat which is rich in methane to methane-limited environments, the authors showed that starved mussels may consume endosymbionts versus in methane-rich environments upregulated genes involved in glutamate synthesis. These data add to the growing body of literature that organisms control their endosymbionts in response to environmental change.

      The conclusions of the data are well supported. The authors adapted a technique that would have been technically impossible in their field environment by preserving the tissue and then performing nuclear isolation after the fact. The use of single-nucleus sequencing opens the possibility of new cellular and molecular biology that is not possible to study in the field. Additionally, the in-situ data (both WISH and FISH) are high-quality and easy to interpret. The use of cell-type-specific markers along with a symbiont-specific probe was effective. Finally, the SEM and TEM were used convincingly for specific purposes in the case of showing the cilia that may support water movement.

      We appreciate the valuable feedback provided by the reviewer on our study. It is encouraging to know that our work was found to be interesting and that the reviewer conducted a thorough evaluation of our research. We will take these constructive comments into account as we strive to develop and enhance our study. We thank the reviewer for all the input.

      The one particular area for clarification and improvement surrounds the concept of a proliferative progenitor population within the gill. The authors imply that three types of proliferative cells within gills have long been known, but their study may be the first to recover molecular markers for these putative populations. The markers the authors present for gill posterior end budding zone cells (PEBZCs) and dorsal end proliferation cells (DEPCs) are not intuitively associated with cell proliferation and some additional exploration of the data could be performed to strengthen the argument that these are indeed proliferative cells. The authors do utilize a trajectory analysis tool called Slingshot which they claim may suggest that PEBZCs could be the origin of all gill epithelial cells, however, one of the assumptions of this analysis is that differentiated cells are developed from the same precursor PEBZC population.

      However, these conclusions do not detract from the overall significance of the work of identifying the relationship between symbionts and bacteriocytes and how these host bacteriocytes modulate their gene expression in response to environmental change. It will be interesting to see how similar or different these data are across animal phyla. For instance, the work of symbiosis in cnidarians may converge on similar principles or there may be independent ways in which organisms have been able to solve these problems.

      We are grateful for the valuable comments and suggestions provided by the reviewer. All suggestions have been carefully considered, and the manuscript has been revised accordingly. We particularly value the reviewer's insights regarding the characterization of the G. platifrons gill proliferative cell populations. In a separate research endeavor, we have conducted experiments utilizing both cell division and cell proliferation markers on these proliferative cell populations. While these results are not incorporated into the current manuscript, we would be delighted to share our preliminary findings with the reviewer. Our preliminary results indicate that the proliferative cell populations are positive for cell proliferation markers and contain a significant number of mitotic cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Further experiments are needed to link the changes in transcriptomes of Bathymodioline mussels in the different environmental conditions to changes in their interactions with symbiotes. For example, quantifying the abundance and comparing the morphology of symbiotes between the environmental conditions would lend much support for shifting between milking and farming strategies. Without analyzing the symbiotes and comparing them across populations, it is difficult to comment on the mechanisms of interactions between symbiotes and the hosts. Without this analysis, this data is better suited towards comments about the general effect of environmental perturbation and stress on gene expression in these mussels.

      We appreciate the reviewer’s comments. We are also very curious about the symbiont responses, especially at the single-cell level. However, all the current commercial single-cell RNA-seq technologies are based on oligo(dT) priming for reverse transcription and barcoding. Because bacterial transcripts generally lack the stable poly(A) tails that oligo(dT) primers capture, the bacterial gene expression information is omitted from our dataset. Hopefully, as the technology develops, we will soon be able to conduct an integrated analysis of both host and symbiont gene expression.

      Additionally, clarification is needed on which types of symbiotes are being looked at. Are they MOX or SOX populations? Are they homogenous? What are the concentrations of sulfur at the sampled sites?

      We thank you for your valuable comments and suggestions. Gigantidas platifrons harbors a MOX endosymbiont population characterized by a single 16S rRNA phylotype. We apologize for any confusion resulting from our previous wording. To clarify, we have revised lines 57-59 of our introduction.

      In the text and images, consider using standardized gene names and leaving out the genome coordinates. This would greatly help with readability. Also, be careful to properly follow gene naming and formatting conventions (ie italicizing gene names and symbols).

      We appreciate the reviewer’s insightful comments. In model animals, gene nomenclature often stems from forward genetic approaches, such as the identification of loss-of-function mutants. These gene names, along with their protein products, typically correspond to unique genome coordinates. Conversely, in non-model invertebrates (e.g., Gigantidas platifrons in the present study), gene prediction relies on a combination of bioinformatics methods, including de novo prediction, homolog-based prediction, and transcriptomics mapping. Subsequently, the genes are annotated by identifying their best homologs in well-characterized databases. Given that different genes may encode proteins with similar annotated functions, we chose to include both the gene ID (genome coordinates) and the gene name in our manuscript. This dual labeling approach ensures that our audience receives accurate and comprehensive information regarding gene identification and annotation.

      Additionally, extending KEGG analysis to the atlas annotation section could help strengthen the confidence of annotations. For example, when identifying bacteriocyte populations, the functional categories of individual marker genes (lysosomal proteases, lysosomal traffic regulators, etc) are used to justify the annotation. Presenting KEGG support that these functional categories are upregulated in this population relative to others would help further support how you characterize this cluster by showing it's not just a few specific genes that are enriched in this cell group, but rather an overall functionality.

      We appreciate the valuable suggestion provided by the reviewer. Indeed, incorporating KEGG analysis into the atlas annotation section could further enhance the confidence in our annotations. However, in our study, we encountered some limitations that impeded us from conducting a comprehensive KEGG enrichment analysis.

      Firstly, the number of differentially expressed genes (DEGs) that we identified for certain cell populations was relatively small, making it challenging to meet the threshold required for meaningful KEGG enrichment analysis. For instance, among the 97 marker genes identified for the Bacteriocyte cluster, only two genes, Bpl_scaf_59648-4.5 (lysosomal alpha-glucosidase-like) and Bpl_scaf_52809-1.6 (lysosomal-trafficking regulator-like isoform X1), were identified as lysosomal genes. To generate reliable KEGG enrichments, a larger number of genes is typically required.

      Secondly, single-nucleus sequencing, as employed in our study, tends to yield a relatively smaller number of genes per cell compared to bulk RNA sequencing. This limited gene yield can make it challenging to achieve sufficient gene representation for rigorous KEGG enrichment analysis.

      Furthermore, many genes in the genome still lack comprehensive annotation, both in terms of KEGG and GO annotations. In our dataset, out of the 33,584 genes obtained through single-nuclei sequencing, 26,514 genes have NO KEGG annotation, and 25,087 genes have NO GO annotation. This lack of annotations further restricts the comprehensive application of KEGG analysis in our study.
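
      To make the scale of this limitation concrete, the annotated fraction can be derived directly from the figures quoted above. The short calculation below uses only those numbers; the function name annotation_coverage is an illustrative choice.

```python
# Figures quoted in the response: total genes and genes lacking annotation.
total_genes = 33_584
unannotated = {"KEGG": 26_514, "GO": 25_087}


def annotation_coverage(total, missing):
    """Return {database: (annotated gene count, annotated fraction)}."""
    return {db: (total - n, (total - n) / total) for db, n in missing.items()}


coverage = annotation_coverage(total_genes, unannotated)
for db, (n_annotated, fraction) in coverage.items():
    print(f"{db}: {n_annotated} of {total_genes} genes annotated ({fraction:.1%})")
```

      This gives 7,070 genes with a KEGG annotation (about 21.1%) and 8,497 genes with a GO annotation (about 25.3%), so roughly three quarters or more of the genes cannot contribute to either type of enrichment test.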

      The claim that VEPCs are symbiote free is not demonstrated. Additional double in situs are needed to show that markers of this cell type localize in regions free of symbiotes.

      We appreciate your comments and suggestions. In Figure 5B, our results demonstrate that the bacteriocytes (green fluorescent signal) are distant from the VEPCs, which are located around the tip of the gill filaments (close to the food groove). We have revised our Figure 5B to make it clear.

      Additionally, it does not seem like trajectory analysis is appropriate for these sampling conditions. Generally, to create trajectories confidently, more closely sampled time points are needed to sufficiently parse out the changes in expression. More justification is needed for the use of this type of analysis here and a discussion of the limitations should be mentioned, especially when discussing the hypotheses relating to PEBZCs, VEPCs, and DEPCs.

      We greatly appreciate your thoughtful commentary. It is important to acknowledge that in the context of a developmental study, incorporating more closely spaced time points indeed holds great value. In our ongoing project investigating mouse development, for instance, we have implemented time points at 24-hour intervals. However, in the case of deep-sea adult animals, we hypothesized a slower transcriptional shift in such an extreme environment, which led us to opt for a time interval of 3-7 days. Examining the differential expression profiles among the three treatments, we observed that most cell types exhibited minimal changes in their expression profiles. For the cell types strongly impacted by in situ transplantation, the expression profiles of each cell type still overlapped substantially in the UMAP analysis (Figure 6a), thus enabling meaningful comparisons. Nevertheless, we recognize that our sampling strategy may not be flawless. Additionally, the challenging nature of conducting in situ transplantation at depths of 1000 meters limited the number of sampling occasions available to us. We sincerely appreciate your input and understanding.

      Finally, more detail should be added on the computational methods used in this paper. For example, the single-cell genomics analysis protocol should be expanded on so that readers unfamiliar with BD single-cell genomics handbooks could replicate the analysis. More detail is also needed on what criteria and cutoffs were used to calculate marker genes. Also, please be careful to cite the algorithms and software packages mentioned in the text.

      Acknowledged, thank you for highlighting this. In essence, the workflow closely resembles the 10x Genomics workflow, although the software differs (the 10x Genomics workflow uses Cell Ranger). We explain the workflow in more detail below. We also note that this information may no longer be relevant for newer users of BD or individuals who are not acquainted with BD, given that the workflow underwent a complete overhaul in the summer of 2023.

      References to lines

      Line 32: typo "..uncovered unknown tissue heterogeny" should read "uncovering" or "and uncovered"

      Overall abstract could include more detail of findings (ex: what are the "shifts in cell state" in line 36 that were observed)

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 60: missing comma "...gill filament structure, but also"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 62-63: further discussion here, or in the relevant sections of the specific genes identified in the referenced bulk RNA-seq project could help strengthen confidence in annotation

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 112: what bootstrapping strategy? Applied to what?

      This is a bootstrap sampling algorithm, developed in a recent bioRxiv paper, that assesses the robustness of each cell cluster. (Singh, P. & Zhai, Y. Deciphering Hematopoiesis at single cell level through the lens of reduced dimensions. bioRxiv, 2022.2006.2007.495099 (2022). https://doi.org:10.1101/2022.06.07.495099)

      Lines 127-129: What figures demonstrate the location of the inter lamina cells? Are there in situs that show this?

      We apologize for any errors; the referencing of figures in the manuscript has been revised for clarity.

      Lines 185-190: does literature support these as markers of SMCs? Are they known smooth muscle markers in other systems?

      We characterized the SMCs by the expression of LDL-associated protein, angiotensin-converting enzyme-like protein, and the "molecular spring" titin-like protein, all of which are commonly found in human vascular smooth muscle cells. Based on this analysis, we hypothesize that these cells belong to the smooth muscle cell category.

      Line 201: What is meant by "regulatory roles"?

      In this context, we are discussing the expression of genes encoding regulatory proteins, such as SOX transcription factors and secreted-frizzled proteins.

      Line 211: which markers disappeared? What in situs show this?

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 211: typo, "role" → "roll"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 214: what are these "hallmark genes"

      We apologize for the mistake. Here we are referring to the genes listed in Figure 4B. We have revised the manuscript accordingly.

      Line 220: are there meristem-like cells in metazoans? If so, this would be preferable to a comparison with plants.

      In this context, we are discussing the morphological characteristics of gill proliferative cell populations found in filibranch bivalves. These populations, namely PEBZC, VEPC, and DEPC, consist of cells exhibiting morphological traits akin to those of plant cambial-zone meristem cells. These cells are typically small and round, with a high nucleus-to-cytoplasm ratio. We acknowledge that while these terms are used in bivalve studies (citations below), they lack the robust molecular evidence available in model systems. The present snRNA-seq data, however, may offer valuable cell markers for future comprehensive investigations.

      Leibson, N. L. & Movchan, O. T. Cambial zones in gills of Bivalvia. Mar. Biol. 31, 175-180 (1975). https://doi.org:10.1007/BF00391629

      Wentrup, C., Wendeberg, A., Schimak, M., Borowski, C. & Dubilier, N. Forever competent: deep-sea bivalves are colonized by their chemosynthetic symbionts throughout their lifetime. Environ. Microbiol. 16, 3699-3713 (2014). https://doi.org:10.1111/1462-2920.12597

      Cannuel, R., Beninger, P. G., McCombie, H. & Boudry, P. Gill Development and its functional and evolutionary implications in the blue mussel Mytilus edulis (Bivalvia: Mytilidae). Biol. Bull. 217, 173-188 (2009). https://doi.org:10.1086/BBLv217n2p173

      Line 335: what is slingshot trajectory analysis? Does this differ from the pseudotime analysis?

      Slingshot is an algorithm that uses the principal graph of the cells to infer trajectories. It models trajectories as curves on the principal graph, capturing the progression and transitions between different cellular states.

      Both Slingshot and pseudotime analysis aim to infer cellular trajectories. Slingshot focuses on capturing branching patterns and is fully compatible with the graphs generated by dimensionality reduction methods such as UMAP and PHATE, whereas pseudotime analysis aims to order cells along a continuous trajectory and does not rely on dimensionality reduction graphs. We used both in the manuscript for different purposes.

      Line 241: introduce FISH methodology earlier in the paper, when in situ images are first referenced

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 246-249: can you quantify the decrease in signal or calculate the concentration of symbiotes in the cells? Was 5C imaged whole? This can impact the fluorescent intensity in tissues of different thicknesses.

      We appreciate your comment. In Figure 5C, most of the typical gill filament region is visible (the ventral tip of the gill filament, and the mid part of the gill filament) except for the dorsal end. The gill filament of bathymodioline mussels exhibits a simple structure: a single layer of bacteriocytes grow on the basal membrane. Consequently, the gill slices have a fairly uniform thickness (with two layers of bacteriocytes and one layer of interlamina cells in between), minimizing any potential impact on fluorescent intensity. As of now, detailed quantification of intracellular symbionts may necessitate continuous TEM or ultra-resolution confocal sections to 3D reconstruct the bacteriocytes, which may exceed the scope of the current study. Therefore, fluorescent intensity remains the only method available to us for estimating bacterial density/distribution across the gill filament.

      Line 249: What is meant by 'environmental gradient?'

      Here we are referring to the gases needed for the symbionts’ chemosynthesis. We have revised the manuscript to make this clear.

      Lines 255-256: Were the results shown in the TEM images previously known? Not clear what novel information is conveyed in images Fig 5 C and D

      In Fig. 5C and D, we provide high-quality SEM and TEM images of a typical bacteriocyte, clearly showing its morphology and subcellular machinery. These electron microscopy images offer the audience a comprehensive introduction to the cellular function of bacteriocytes. Additionally, they serve as supportive evidence for the bacteriocytes' snRNA-seq data.

      Line 295-296: Can you elaborate on what types of solute carrier genes have been shown to be involved with symbioses?

      We appreciate the comment, and have revised the manuscript accordingly. The putative functions of the solute carriers can be found in Figure 5I.

      Line 297-301: Which genes from the bulk RNA-seq study? Adding more detail and references in cluster annotation would help readers better understand the justifications.

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 316 -322: Can you provide the values of the distances?

      We now provide the values in the main text, in addition to Fig. 6B, and in a supplementary table (Supplementary Table S19).

      Line 328: What are the gene expression patterns?

      We observed genes that are up- and down-regulated in Starvation and Reconstitution.

      Line 334-337: A visualization of the different expression levels of the specific genes in clusters between sites might be helpful to demonstrate the degree of difference between sites.

      We have prepared a new supplementary file showing the different expression levels.

      Line 337: Citation needed

      We appreciate the comment. Here, we hypothesize the cellular responses based on the genes’ functions and their expression patterns.

      Line 402-403: Cannot determine lineages from data presented. Need lineage tracing over time to determine this

      We acknowledge the necessity of conducting lineage tracing over time to validate this hypothesis. In practical terms, however, it is difficult to obtain suitable samples for such tests. Shallow-sea relatives may be more tractable for testing this hypothesis, although this also remains very difficult in practice.

      Line 413-414: What are the "cell-type specific responses to environmental change"? It could be interesting to present these results in the "results and discussion" section

      These results are shown in Supplementary Figure S8.

      Line 419-424: Sampling details might go better earlier on in the paper, when the sampling scheme is introduced.

      We appreciate the comments. Here, we are discussing the limitations of our current study, not sampling details.

      Line 552: What type of sequencing? Paired end? How long?

      We conducted 150 bp paired-end sequencing.

      Line 556-563: More detail here would be useful to readers not familiar with the BD guide. Also be careful to cite the software used in analysis!

      The provided guide and handbook elucidate the intricacies of gene name preparation, data alignment to the genome, and the generation of an expression matrix. It is worth mentioning that we relied upon outdated versions of the aforementioned resources during our data analysis phase, as they were the only ones accessible to us at the time. However, we have since become aware of a newer pipeline available this year, rendering the information presented here of limited significance to other researchers utilizing BD.

      Many thanks for your kind reminder. We have now included a reference for STAR. All other software was cited accordingly. There are no scholarly papers or publications on the BD pipeline that we can cite.

      Line 577-578: How was the number of clusters determined? What is meant by "manually combine the clusters?" If cells were clustered by hand, more detail on the method is needed, as well as direct discussion and justification in the body of the paper.

      It would be more appropriate to emphasize the determination of cell types rather than clusters. The clusters were identified using a clustering function, as mentioned in the manuscript. It's important to note that the clustering function (in our case, the FindClusters function of Seurat) provides a general overview based on diffuse gene expression. Technically speaking, there is no guarantee that one cluster corresponds to a single cell type. Therefore, it is crucial to manually inspect the clustering results to assign clusters to the appropriate cell types. In some cases, multiple clusters may be assigned to the same cell type, while in other cases, a single cluster may need to be further subdivided into two or more cell types or sub-cell types, depending on the specific circumstances.

      For studies conducted on model species such as humans or mice, highly and specifically expressed genes within each cluster can be compared to known marker genes of cell types mentioned in previous publications, which generally suffices for annotation purposes. However, in the case of non-model species like Bathymodioline mussels, there is often limited information available about marker genes, making it challenging to confidently assign clusters to specific cell types. In such situations, in situ hybridisation proves to be incredibly valuable. In our study, WISH was employed to visualise the expression and morphology of marker genes within clusters. When WISH revealed the expression of marker genes from a cluster in a specific type of cell, we classified that cluster as a genuine cell type. Moreover, if WISH demonstrated uniform expression of marker genes from different clusters in the same cell, we assigned both clusters to the same cell type.

      We expanded the description of the strategy in the Method section.
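
      The manual step described above amounts to a many-to-one mapping from clusters to cell types. A minimal sketch follows; the cluster IDs, cell barcodes, and the particular mapping are hypothetical placeholders, not the assignments used in the manuscript.

```python
def assign_cell_types(cell_clusters, cluster_to_type):
    """Relabel each cell with the cell type curated for its cluster.

    cell_clusters:   {cell barcode: cluster ID} from automated clustering
    cluster_to_type: {cluster ID: cell type}, curated by hand from marker
                     genes and in situ hybridisation evidence
    """
    unmapped = set(cell_clusters.values()) - set(cluster_to_type)
    if unmapped:
        # Fail loudly so that no cluster is silently left unannotated.
        raise KeyError(f"clusters without a curated cell type: {sorted(unmapped)}")
    return {cell: cluster_to_type[cluster] for cell, cluster in cell_clusters.items()}


# Hypothetical example: clusters 0 and 3 showed the same WISH pattern,
# so both are merged into a single cell type.
curated = {0: "bacteriocyte", 3: "bacteriocyte", 1: "smooth muscle cell"}
cells = {"cell_a": 0, "cell_b": 3, "cell_c": 1}
cell_types = assign_cell_types(cells, curated)
```

      Keeping the curated mapping as an explicit table also documents which clusters were merged, which is the detail a reader would need to replicate the annotation.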

      Line 690-692: When slices were used, what part of the gill were they taken from?

      We sectioned the gill around the mid part which could represent the mature bacteriocytes.

      References to figures:

      General

      Please split the fluorescent images into different channels with an additional composite. It is difficult to see some of the expression patterns. It would also make it accessible to colorblind readers.

      We appreciate the comments and suggestions from the reviewer. We have converted our figures to CMYK colour which will help the colorblind audiences to read our paper.

      Please provide the number of replicates for each in situ and what proportion of those displayed the presented pattern.

      We appreciate the reviewer’s comments. We have explained this in the Materials and Methods section of the manuscript.

      Figure 2.C' is a fantastic summary and really helps the non-mussel audience understand the results. Adding schematics like this to Figures 3-5 would be helpful as well.

      We value the reviewer's comments. We propose that Figures 3K, 4C, and 5A-D could offer similar schematic explanations to assist the audience.

      Figure 2:

      Figures 2.C-F, 2.C', 2.H-J are not referenced in the text. Adding in discussions of them would help strengthen your discussions on the cluster annotation

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 2.B. 6 genes are highlighted in red and said to be shown in in situs, but only 5 are shown.

      We apologize for the mistake. We did not include the WISH result for 20639-0.0 in the present study. We have changed the label to black.

      Figure 3:

      Fig 2C-E not mentioned.

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 3.B 8 genes are highlighted in red and said to be shown in in situs. Only 6 are.

      The WISH results are provided in Supplementary Figures S4 and S5.

      Figure 3.K is not referenced in the legend.

      We appreciate the comment, and have revised the manuscript accordingly.

      Figure 4:

      In Figure D, it might be helpful to indicate the growth direction.

      We appreciate the comment, and have revised the manuscript accordingly by adding an arrow in panel D to indicate growth direction.

      4F: A double in situ with the symbiote marker is needed to demonstrate the nucleolin-like positive cells are symbiote free.

      We appreciate the comment. The symbiont-free region can be found in Figure 5A.

      Figure 5:

      In 5.A, quantification of symbiote concentration would help support your conclusion that they are denser around the edges.

      We appreciate the comment. As mentioned above, detailed quantification of intracellular symbionts may necessitate continuous TEM or ultra-resolution confocal sections to 3D reconstruct the bacteriocytes, which may exceed the scope of the current study. Therefore, fluorescent intensity remains the only method available to us for estimating bacterial density/distribution across the gill filament.

      In 5.D, the annotation is not clear. Adding arrows like in 5.C would be helpful.

      We appreciate the comment, and have revised the manuscript accordingly.

      A few genes in 5.F are not mentioned in the paper body when listing other genes. Mentioning them would help provide more support for your clustering.

      We appreciate the comment, and have revised the manuscript accordingly.

      Is 5.I meant to be color coded with the gene groups from 5.F? Color Coding the gene names, rather than organelles or cellular structures might portray this better and help visually strengthen the link between the diagram and your dot plot.

      We appreciate the suggestions. We've experimented with color-coding the gene names, but some colors are less discernible against a white background.

      Figure 6:

      6.B Is there a better way to visualize this data? The color coding is confusing given the pairwise distances. Maybe heatmaps?

      We attempted a heatmap, as shown in the figure below. However, all co-authors agree that a bar plot provides clearer visualization than the heatmap. We agree that the color scheme may be confusing because it uses the same colors as the individual treatments, so we have changed the colors.

      Author response image 1.

      Figure 6.D: Why is the fanmao sample divided in the middle?

      Fig 6C shows that the single-cell trajectories include branches. The branches occur because cells execute alternative gene expression programs. Thus, in Fig 6D, we show changes for genes that are significantly branch dependent in both lineages at the same time. Specifically, in cluster 2, the genes are upregulated during starvation but downregulated during reconstitution. Conversely, genes in cluster 1 are downregulated during starvation but upregulated during reconstitution. Note that Fig 6D displays only a small subset of the significantly branch-dependent genes.

      Figure 6.D: Can you visualize the expression in the same format as in figures 2-5?

      We appreciate the comments from the reviewer. As far as we know, this heatmap is the best format to demonstrate this type of gene expression profile.

      Supplementary Figure S2:

      Please provide a key for the cell type abbreviations

      We appreciate the comment, and have added the abbreviations of cell types accordingly.

      Supplementary Figures S4 and S5:

      What part of the larger images are the subsetted image taken from?

      We appreciate the comment. These images were taken from the ventral tip and the middle of the gill slices, respectively. We have revised the figure legends to make this clear.

      Supplemental Figure S7:

      If clusters 1 and 2 show genes up and downregulated during starvation, what do clusters 4 and 3 represent?

      Cluster 1: genes that are clearly upregulated during starvation and downregulated during reconstitution. Cluster 4: genes that are downregulated during reconstitution but not clearly upregulated during starvation.

      Cluster 2: genes that are upregulated during reconstitution. Cluster 3: genes that are clearly downregulated during starvation.

      Author response table 1.

      Supplemental Figure S8:

      This is a really interesting figure that I think shows some of the results really well! Maybe consider moving it to the main figures of the paper?

      We appreciate the comments and suggestions. We concur with the reviewer on the significance of the results presented. However, considering the length of this manuscript, we have prioritized the inclusion of the most pertinent information in the main figures. Supplementary materials containing additional figures and details on the genes involved in these pathways are provided for interested readers.

      Supplemental Figure S11:

      Switching the axes might make this image easier for the reader to interpret. Additionally, calculating the normalized contribution of each sample to each cluster could help quantify the extent to which bacteriocytes are reduced when starving.

      Thank you for the insightful suggestion, which we have implemented as detailed below. We acknowledge the importance of understanding the changes in bacteriocyte proportions across different treatments. However, it's crucial to note that the percentage of cells per treatment is highly influenced by factors such as the location of digestion and sequencing, as previously mentioned.

      Author response image 2.
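
      The normalized contribution suggested by the reviewer can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `normalized_contribution` and all cell counts are hypothetical toy values. Each sample's count in a cluster is first divided by that sample's total cell number, and the resulting fractions are then rescaled so that the shares within each cluster sum to 1.

```python
def normalized_contribution(counts):
    """Share of each sample in each cluster after correcting for sample size.

    counts: {sample: {cluster: n_cells}} (every sample must contain cells)
    returns: {cluster: {sample: share}}, shares sum to 1 within a cluster
    """
    # Total number of cells sequenced per sample.
    totals = {s: sum(c.values()) for s, c in counts.items()}
    clusters = sorted({cl for c in counts.values() for cl in c})
    result = {}
    for cl in clusters:
        # Fraction of each sample's cells that fall in this cluster.
        fractions = {s: counts[s].get(cl, 0) / totals[s] for s in counts}
        denom = sum(fractions.values())
        result[cl] = {s: (f / denom if denom else 0.0)
                      for s, f in fractions.items()}
    return result


# Hypothetical toy counts; NOT data from the study.
toy_counts = {
    "Fanmao": {"bacteriocyte": 300, "other": 700},
    "Starvation": {"bacteriocyte": 100, "other": 900},
    "Reconstitution": {"bacteriocyte": 400, "other": 1600},
}
shares = normalized_contribution(toy_counts)
for cluster, per_sample in shares.items():
    print(cluster, {s: round(v, 3) for s, v in per_sample.items()})
```

      Normalizing by sample size first prevents a sample with more sequenced cells from dominating every cluster. It does not remove the sampling biases noted in the response above, such as the location of digestion.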

      Reviewer #2 (Recommendations For The Authors):

      The following are minor recommendations for the text and figures that may help with clarity:

      Fig. 3K: This figure describes water flow induced by different ciliary cells. It is not clear what the color of the arrows corresponds to, as they do not match the UMAP (i.e. the red arrow) and this is not indicated in the legend. Are these colours meant to indicate the different ciliary cell types? If so it would be helpful to include this in the legend.

      We appreciate the reviewer's comments and suggestions. The arrows indicate the water flow that might be generated by particular types of cilia. We have revised our figure and figure legends to make this clear.

      Line 369: The incorrect gene identifier is given for the mitochondrial trifunctional enzyme. This gene identifier is identical to the one given in line 366, which describes long-chain-fatty-acid-ligase ACSBG2-like (Bpl_scaf_28862-1.5).

      We appreciate the reviewer's comments and suggestions. We have revised our manuscript accordingly.

      Line 554: The Bioproject accession number (PRJNA779258) does not appear to lead to an existing page in any database.

      We appreciate the reviewer's comments and suggestions. We have released this Bioproject to the public.

      Line 597-598: it would be helpful to know the specific number of cells that the three sample types were downsampled to, and the number of cells remaining in each cluster, as this can affect the statistical interpretation of differential expression analyses.

      The number of cells per cluster in our analysis ranged from 766 to 14633. To mitigate potential bias introduced by varying cell numbers, we implemented downsampling, restricting the number of cells per cluster to no more than 3500. This ensured that cluster sizes differed by less than five-fold. We experimented with several downsampling strategies, exploring cell limits of 4500 and 2500, and consistently observed similar patterns across these variations.
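
      The capping step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `downsample_per_cluster`, the random seed, and the toy cluster labels are assumptions. Only the cluster-size range (766 to 14633) and the cap of 3500 come from the response.

```python
import random
from collections import defaultdict


def downsample_per_cluster(cluster_labels, max_cells=3500, seed=0):
    """Return sorted indices of cells kept after capping each cluster size."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)
    kept = []
    for indices in by_cluster.values():
        # Clusters at or below the cap are kept whole.
        if len(indices) > max_cells:
            indices = rng.sample(indices, max_cells)
        kept.extend(indices)
    return sorted(kept)


# Toy labels using the smallest and largest cluster sizes quoted above.
labels = ["small"] * 766 + ["large"] * 14633
kept = downsample_per_cluster(labels, max_cells=3500)

sizes = defaultdict(int)
for i in kept:
    sizes[labels[i]] += 1
ratio = max(sizes.values()) / min(sizes.values())
print(dict(sizes), round(ratio, 2))
```

      With a cap of 3500 and a smallest cluster of 766 cells, the largest-to-smallest ratio is 3500 / 766, which is about 4.57 and therefore below the five-fold threshold mentioned in the response.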

      Data and code availability:

      The supplementary tables and supplementary data S1 appear to be the final output of the differential expression analyses. Including the raw data (e.g. reads) and/or intermediate data objects (e.g. count matrices, R objects), in addition to the code used to perform the analyses, may be very helpful for replication and downstream use of this dataset. As mentioned above, the Bioproject accession number appears to be incorrect.

      We appreciate the reviewer's comments and suggestions. Regarding our sequencing data, we have deposited all relevant information with the National Center for Biotechnology Information (NCBI) under Bioproject PRJNA779258. Additionally, we have requested the release of the Bioproject. Furthermore, as part of this round of revision, we have included the count matrices for reference.

      Reviewer #3 (Recommendations For The Authors):

      As noted in the public review, my only major concerns are around the treatment of progenitor cell populations. I am sympathetic to the challenges of these experiments but suggest a few possible avenues to the authors.

      First, there could be some demonstration that these cells in G. platifrons are indeed proliferative, using EdU incorporation labeling or a conserved epitope such as the phosphorylation of serine 10 in histone 3. It appears in Mytilus galloprovincialis that proliferating cell nuclear antigen (PCNA) and phospho-histone H3 have previously been used as good markers for proliferative cells (Maiorova and Odintsova 2016). The use of any of these markers along with the cell type markers the authors recover for PEBZCs for example would greatly strengthen the argument that these are proliferative cells.

      If performing these experiments would not be currently possible, the authors could use some computational approaches to strengthen their arguments. Based on conserved cell cycle markers and the use of Cell-Cycle feature analysis in Seurat, could the authors provide evidence that these progenitors occupy the G2/M phase at a greater percentage than other cells? Other than the physical position of the cells, is there much that suggests that these are proliferative? While I am more convinced by markers in VEPCs, the markers for PEBZCs and DEPCs are not particularly compelling.

      While I do not think the major findings of the paper hinge on this, comments such as "the PBEZCs gave rise to new bacteriocytes that allowed symbiont colonization" should be taken with care. It is not clear that the PBEZCs are proliferative and there does not seem to be any direct evidence that PBEZCs (or DEPCs or VEPCs for that matter) are the progenitor cells through any sort of labeling or co-expression studies.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly. We especially appreciate the reviewer’s suggestions about the characterisation of the G. platifrons gill proliferative cell populations. In a separate research project, we have tested both cell division and cell proliferation markers on the proliferative cell populations. Though we are not able to include these results in the current manuscript, we are happy to share our preliminary results with the reviewer. Our results demonstrate that the proliferative cell populations, particularly the VEPCs, are positive for cell proliferation markers and contain a high number of mitotic cells.

      Author response image 3.

      Finally, there is a body of literature that has examined cell proliferation and zones of proliferation in mussels (such as Piquet, B., Lallier, F.H., André, C. et al. Regionalized cell proliferation in the symbiont-bearing gill of the hydrothermal vent mussel Bathymodiolus azoricus. Symbiosis 2020) or other organisms (such as Bird, A. M., von Dassow, G., & Maslakova, S. A. How the pilidium larva grows. EvoDevo. 2014) that could be discussed.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly (lines 226-229).

      Minor comments also include:

      Consider changing the orientation of diagrams in Figure 2C' in relationship to Figure 2C and 2D-K.

      We appreciate the comments and suggestions from the reviewer. Figure 2 has been reorganized.

      For the diagram in Figure 3K, please clarify if the arrows drawn for the direction of inter lamina water flow is based on gene expression, SEM, or some previous study.

      We are grateful for the reviewer's valuable feedback and suggestions. The arrows in the figure indicate the direction of water flow that could be affected by specific types of cilia. Our prediction is based on both gene expression and SEM results. To further clarify this point, we have revised the figure legend of Fig. 3.

      Please include a label for the clusters in Figure 5E for consistency.

      We have revised our Figure 5E to keep our figures consistent.

      Please include a note in the Materials and Methods for Monocle analysis in Figure 6.

      We conducted the Monocle analyses using Monocle 2 and Monocle 3 in the R environment. We have revised our Materials and Methods with further information on the analysis shown in Figure 6.

      In Supplement 2, the first column is labeled PEBC while the first row is labeled PEBZ versus all other rows and columns have corresponding names. I am guessing this is a typo and not different clusters?

      We appreciate the great effort of the reviewer in reviewing our manuscript. We have corrected the typo in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The authors' findings are primarily rooted in a series of well-conducted in vitro experiments using two CML cell lines, K562 and MEG-01. While the findings are interesting and novel, further work to corroborate these findings in primary CML samples would have greatly strengthened the potential real-world relevance of these discoveries. The authors appear to have some PBMCs from primary CML patients and a BM sample from a Ph+ ALL in which they performed western blot analyses (Fig 1). Couldn't these samples have been used to at least confirm some of the key discoveries? For example, the neddylation of BCR-ABL, or; sensitivity of primary leukemic cells to RAPSYN knockdown, and/or; phosphorylation of RAPSYN by SRC?

      We agree with your points and really appreciate your comments. To demonstrate the clinical relevance, we have conducted a series of experiments to address your concerns.

      (1) After thorough optimization of the transduction process, we have managed to show that shRNA-mediated gene silencing of RAPSYN impaired the growth of primary CML samples. These additional data are presented as Figure 1D in the revised manuscript with its corresponding figure legend and description, lines 136-141.

      (2) We have invested considerable time and effort in addressing the “key discoveries” despite the great technical difficulty of the task. With 5 mL (ethical approval) of PBMCs on hand, we have managed to confirm BCR-ABL neddylation by IP in samples from two newly acquired CML patients. The results are presented in Figure 2F in the revised manuscript with its corresponding figure legend and description, lines 186-187.

      (2) The authors initially interrogated a fairly dated (circa 2009) microarray-based primary dataset to show that the increase in RAPSYN is primarily a post-transcriptional event, as mRNA levels are not different between healthy and CML samples. It would be interesting to see whether differences might be more readily seen in more recent RNA-seq datasets from CML patients, given the well-known differences in sensitivity between the two platforms. Additionally, I wonder if there would be transcriptional signatures of increased NEDDylation (or RAPSYN-induced NEDDylation) that could be interrogated in primary samples? Furthermore, there are proteomics datasets of CML cells made resistant to TKIs (through in vitro selection experiments) that could be interrogated for independent validation of the authors' discoveries (for example, from K562 cells, PMID: 30730747 or PMID: 34922009).

      Thank you very much for your constructive comments. Based on your suggestion, we have 1) analyzed the mRNA level of RAPSYN in the RNA-seq datasets GSE13159 (2009), GSE138883 (2020) and GSE140385 (2020), and found no difference between CML patients and healthy donors. We have included the results in Figure 1-figure supplement 1A and in the revised manuscript (lines 123-127); and 2) examined the RNA levels of RAPSYN-related neddylation enzymes, including E1 (NAE1), E2 (UBE2M), NEDD8 and NEDP1, in these datasets, and likewise found no significant differences in these neddylation-related genes between CML patients and healthy donors (Supplementary Figure 2C, lines 168-172).

      We have also analyzed the proteomics datasets from PMID: 30730747 and PMID: 34922009 according to your suggestion. Unfortunately, no information on RAPSYN expression is available in these datasets. To avoid overlooking relevant data, we have examined all CML-related proteomics datasets from 2002 to 2024, and still found no information about RAPSYN protein expression. Consequently, the higher expression of RAPSYN in the PBMCs of Ph+ patients reported in this study appears to be a first observation. We also believe that our results are more clinically relevant than any that might be obtained from cells generated by in vitro selection.

      Reviewer #2 (Public Review):

      Most of the conclusions drawn in this paper are well supported by data, but some aspects of the data need to be clarified and extended:

      (1) The authors propose that targeting RAPSYN in Ph+ leukemia could have a high therapeutic index, suggesting that inhibition of RAPSYN may lead to cytotoxicity in Ph+ leukemia with high specificity and minimal side effects. To substantiate this assertion, the authors should investigate the impact on cell viability upon RAPSYN knockdown in non-Ph leukemic cell lines or HS-5 cells (similar to Figure 1C), despite their lower RAPSYN protein levels.

      We appreciate your valuable comments. When we used shRNA to knock down the expression of RAPSYN in HS-5 cells, it did not affect their growth. We have included the data in Figure 1C, modified its figure legend, and added the corresponding description, lines 136-141.

      (2) The authors intriguingly show that the protein levels of RAPSYN are significantly enriched in Ph+ patient samples and cell lines (Figure 1A, B), even though the mRNA levels remain unchanged (Supplementary Figure 1 A-C). This observation merits a clear explanation in the context of the presented results. The data in the manuscript does imply a feedforward loop mechanism (Figure 7), where BCR-ABL activates SRC, which subsequently stabilizes RAPSYN, which in turn helps protect BCR-ABL from c-CBL-mediated degradation. If this is the working hypothesis, it would be beneficial for the reader to see supporting evidence.

      Thank you very much for pointing out the issue. We have realized that Figure 7, which was originally placed as a summarizing figure, was inappropriate. To avoid potential confusion, this figure has been deleted, which does not affect the results and conclusions of this study. In addition, the differences between mRNA levels and protein expression have been addressed in our response to Reviewer #1.

      (3) The authors present compelling evidence to suggest that RAPSYN may possess direct NEDD8-ligase activity on BCR-ABL. To strengthen this claim, it may be valuable to conduct further assays involving a ligase-deficient mutant, such as C366A, beyond its use in Figure 2J. Incorporating this mutant into the in vitro assay illustrated in Figure 2K, for instance, could offer substantial validation for the claim. In addition, showing whether the ligase-deficient mutant is capable of phenocopying the phosphorylation-mutant Y336F, as showcased in Figures 5E, F, and 6D, F, would be beneficial.

      We are grateful for your comments. In the manuscript, we have provided sufficient data to support the direct neddylation of BCR-ABL by RAPSYN, as you commented: “The authors present compelling evidence to suggest that RAPSYN may possess direct NEDD8-ligase activity on BCR-ABL.” Cys366 was previously demonstrated to be the catalytic residue essential for the E3 activity of RAPSYN (Li et al. 2016, PMID: 27839998), and the phosphorylation at Tyr336 was thoroughly verified by site-directed mutagenesis and by treatment with the SRC-specific inhibitor saracatinib in the present cellular experiments. Therefore, while we fully respect your opinion, we do not think it necessary to perform laborious in vitro reactions for expected negative results, which is why we did not conduct enzymatic reactions with known inactive mutants, such as C366A and Y336F, in the first place.

      (4) The observations presented in Figures 6 C-G require additional clarification. Notably, there are discrepancies in relative cell viability effects in K562 cells, and to some extent in MEG-01 cells, under conditions that are indicated as being either identical or highly similar. For instance, this inconsistency is observable when comparing the left panels of Figure 6C and 6D in the case of NC overexpression + shSRC#2, and the left panels of Figure 6E and 6G with NC overexpression or shNC, respectively. Listing potential causes of these discrepancies would strengthen the overall validity of the findings and their subsequent interpretation.

      Thank you for your comments, and we apologize for the confusion. To make a meaningful comparison, we have revised the method part “Preparation of stable RAPSYNWT, RAPSYNY336F or SRC expression cell lines” (lines 625-627) and reorganized Figure 6 to reflect the differences in the negative controls. We first used the LV6 (EF-1a/Puro; OE-NC1) vector for the overexpression of RAPSYNWT and SRC. Because of the low expression level with LV6 and the long time required for subsequent selection, we switched to LV18 (CMV/Puro; OE-NC2) for the overexpression of RAPSYNY336F. Since the sensitivities of K562/MEG01-OE-NC cells to shSRC transduction in Figure 6C (now revised to K562/MEG01-OE-NC1) and 6D (now revised to K562/MEG01-OE-NC2) were noticeably different, we have separated RAPSYNWT and RAPSYNY336F cells into 6C and 6D, each with its own corresponding empty vector as negative control, instead of merging the results into a single figure with one OE-NC negative control. In addition, given that K562/MEG01 cells reacted differently to saracatinib treatment after transduction with the empty vector, we have also distinguished the negative controls as OE-NC1 in Figure 6E, OE-NC2 in Figure 6F and shNC in Figure 6G. In summary, the transduction of K562/MEG01 cells with different expression vectors and viral particles caused the discrepancies in the cell viability experiments, which we have clarified by reorganizing Figure 6 in the revision.

      (5) Throughout the manuscript, immunoblots which showcase immunoprecipitations of BCR-ABL or His-BCR-ABL depict poly-neddylation (e.g. Figures 2E-M, 3D-G, and 5A-E) and poly-ubiquitination (e.g. Figures 3D-G) patterns/smears where these patterns seem to extend below the molecular weight of BCR-ABL. To enhance clarity, it would be valuable for the authors to provide an explanation in the text or the figure legend for this observation. Is it reflective of potential degradation of BCR-ABL or is there another explanation behind it?

      Thank you for your valuable comments. After carefully checking the original immunoblots, we have ascertained that the BCR-ABL protein band was at 250 kDa and that the smeared bands above 250 kDa were likely caused by the conjugation of NEDD8 (neddylation) or ubiquitin (ubiquitination) onto BCR-ABL. Whether the modified BCR-ABL signal at a lower-than-expected molecular weight is a common feature, as previously reported (Mao, J., et al., 2010, PMID: 21118980), or reflects possible degradation during the modification process or sample preparation requires further investigation. We have corrected the labeling of the figures in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) It would really nail the real-world relevance of these nice findings if the authors are able to confirm some aspects of their cell line-based discoveries in publicly available 'omics datasets generated from primary CML samples. I have suggested some of these in the public review as well.

      Alternatively, if they are able to investigate samples from murine CML models (eg. BALB/c CML models), it would represent a step towards real-world relevance.

      Thank you very much for your constructive comments. According to your suggestion, we have examined and analyzed RAPSYN mRNA and protein in updated and publicly available datasets as replied in the public response.

      (2) The Discussion repeats some of the information already presented in the Introduction (for example, lines 311-327 of the merged document, or lines 349-358). I would urge the authors to instead expand more about how RAPSYN might be upregulated at the post-transcriptional level, or its potential post-translational regulation by SRC-mediated phosphorylation.

      Thanks for your constructive suggestion. We have rewritten this part according to your suggestion and marked the changes in red in the revised manuscript, lines 319-325 and lines 351-378.

      (3) There are instances of clunky phrases/grammatical mistakes in the manuscript which detract from its readability (eg: lines 142-143: "...empty body transduced shRAPSN#3 or K562 cells into...."; lines 163-164: "Despite AChR subunits α7, M2, M3, and M4 were expressed in all tested cells, no change..."; line 178: "Preeminent BCR-ABL neddylation was detected in..."). A closer proof-reading of the final manuscript is advisable.

      We appreciate the valuable comments. We have made changes for improvement, which are marked in red in the revised manuscript, lines 145-147, lines 166-168 and line 185.

      (4) The western blot in Fig 5C (particularly the control "OE-NC" of K562) looks drastically different from the corresponding control lanes in Figs 5A and 5B. Similarly, the cell viability curves presented in Fig 6D and 6F (for both K562 and MEG-01, control conditions) look very different from the corresponding curves in Figs 6A and 6B.

      We appreciate your valuable comments. Because we accidentally used images with different exposure times, the western blots in Fig 5C (particularly the control "OE-NC" of K562) look very different from the corresponding control lanes in Figs 5A and 5B. We have replaced them with images of the same exposure time in the revised manuscript.

      To make this clear to readers, we have revised the method part “Preparation of stable RAPSYNWT, RAPSYNY336F or SRC expression cell lines” (lines 625-627) and the related figure legends to reflect the differences.

      We have addressed the discrepancy in cell viability in our response to the public review.

      Reviewer #2 (Recommendations For The Authors):

      In reviewing your study, I must insist that the completeness and robustness of your work would significantly benefit from a more exhaustive listing of the antibodies used for immunoblotting and immunoprecipitation within the Materials and Methods section. A number of antibodies have been accounted for, however, crucial ones targeting BCR-ABL, c-CBL, Ubiquitin, NEDD8, HA, Myc, and others appear to be omitted. To maintain rigorous scientific standards, I strongly encourage you to include these.

      We appreciate your comments. We have carefully checked the section of Methods and added detailed information of antibodies for Immunoblotting and Immunoprecipitation in the revised manuscript, lines 502-516.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors have made important contributions to our understanding of the pathogenesis of erectile dysfunction (ED) in diabetic patients. They have identified the gene Lbh, expressed in pericytes of the penis and decreased in diabetic animals. Overexpression of Lbh appears to counteract ED in these animals. The authors also confirm Lbh as a potential marker in cavernous tissues in both humans and mice. While solid evidence supports Lbh's functional role as a marker gene, further research is needed to elucidate the specific mechanisms by which it exerts its effects. This work is of interest to those working in the fields of ED and angiogenesis.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the researchers aimed to investigate the cellular landscape and cell-cell interactions in cavernous tissues under diabetic conditions, specifically focusing on erectile dysfunction (ED). They employed single-cell RNA sequencing to analyze gene expression patterns in various cell types within the cavernous tissues of diabetic individuals. The researchers identified decreased expression of genes associated with collagen or extracellular matrix organization and angiogenesis in several cell types, including fibroblasts, chondrocytes, myofibroblasts, valve-related lymphatic endothelial cells, and pericytes. They also discovered a newly identified marker, LBH, that distinguishes pericytes from smooth muscle cells in mouse and human cavernous tissues. Furthermore, the study revealed that pericytes play a role in angiogenesis, adhesion, and migration by communicating with other cell types within the corpus cavernosum. However, these interactions were found to be significantly reduced under diabetic conditions. The study also investigated the role of LBH and its interactions with other proteins (CRYAB and VIM) in maintaining pericyte function and highlighted their potential involvement in regulating neurovascular regeneration. Overall, the manuscript is well-written and the study provides novel insights into the pathogenesis of ED in patients with diabetes and identifies potential therapeutic targets for further investigation.

      Comments on revised version:

      For Figure 4, immunofluorescent staining of LBH following intracavernous injections with lentiviruses is required to justify overexpression and tissue specificity.

      We agree with this comment. We have therefore performed immunofluorescent staining of LBH in cavernous tissues after infection with LBH O/E lentiviruses. We found that LBH expression is significantly decreased in the DM and DM+NC groups, whereas after infection with LBH O/E lentiviruses, LBH expression is significantly increased, as shown in Supplementary Fig. 10. (Please see revised ‘Result’ and ‘Supplementary Fig. 10’)

      Reviewer #3 (Public Review):

      Bae et al. described the key roles of pericytes in cavernous tissues in diabetic erectile dysfunction using both mouse and human single-cell transcriptomic analysis. Erectile dysfunction (ED) is caused by dysfunction of the cavernous tissue and affects a significant proportion of men aged 40-70. The most common treatment for ED is phosphodiesterase 5 inhibitors; however, these are less effective in patients with diabetic ED. Therefore, there is an unmet need for a better understanding of the cavernous microenvironment, cell-cell communications in patients with diabetic ED, and the development of new therapeutic treatments to improve the quality of life.

      Pericytes are mesenchymal-derived mural cells that directly interact with capillary endothelial cells (ECs). They play a vital role in the pathogenesis of erectile function as their interactions with ECs are essential for penile erection. Loss of pericytes has been associated with diabetic retinopathy, cancer, and Alzheimer's disease and has been investigated in relation to the permeability of cavernous blood vessels and neurovascular regeneration in the authors' previous studies. This manuscript explores the mechanisms underlying the effect of diabetes on pericyte dysfunction in ED. Additionally, the cellular landscape of cavernous tissues and cell type-specific transcriptional changes were carefully examined using both mouse and human single-cell RNA sequencing in diabetic ED. The novelty of this work lies in the identification of a newly identified pericyte (PC)-specific marker, LBH, in mouse and human cavernous tissues, which distinguishes pericytes from smooth muscle cells. LBH not only serves as a cavernous pericyte marker, but its expression level is also reduced in diabetic conditions. The LBH-interacting proteins (Cryab and Vim) were further identified in mouse cavernous pericytes, indicating that these signaling interactions are critical for maintaining normal pericyte function. Overall, this study demonstrates the novel marker of pericytes and highlights the critical role of pericytes in diabetic ED.

      Comments on revised version:

      Bae and colleagues substantially improved the data quality and revised their manuscript "Pericytes contribute to pulmonary vascular remodeling via HIF2a signaling". While these revisions clarify some of the concerns raised, others remain. In my view, the following question must be addressed.

      In my prior question on #3, I completely disagree with the statement that "identified cells with pericyte-like characteristics in the walls of large blood vessels". The staining that authors provided for LBH, was clearly stained for SMCs, not pericytes. Per Fig 2E, the authors are correct that LBH is colocalized with SMA+ cells (SMCs). However, the red signal from LBH clearly stains endothelial cells. In the rest of 2E and 2D, LBH is CD31- and their location suggests LBH stained for SMCs in the Aorta, Kidney vasculature, Dorsal vein, and Dorsal Artery.

      We respect the reviewer's comments and provide further justification for the reviewer's concerns. We first performed double staining of LBH and CD31 on dorsal artery and dorsal vein tissues. We found that LBH-expressing cells are completely different from CD31-expressing cells (Figure 2D, indicated by arrows, and Supplementary Fig. 10A) and that expression is higher in veins than in arteries. This is consistent with previous understanding. In addition, in the double staining of LBH and α-SMA, we also found that there was no overlap between LBH-expressing cells and α-SMA-expressing smooth muscle cells in the cavernosum tissues, but there was some overlap in dorsal artery and dorsal vein (Figure 2E, indicated by arrows). This may indicate that LBH is expressed slightly differently in different types of blood vessels. This requires further experiments to prove in the future. In addition, to avoid confusion among other readers, we modified our previous discussion regarding the identification of cells with pericyte-like characteristics in the walls of large blood vessels. We removed the associated immunofluorescence staining in the aorta and kidneys and replaced it with staining in the dorsal artery and dorsal vein (please see revised ‘Result’, ‘Figure 2’ and ‘Supplementary Fig. 10A’).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment. The patch clamp experiments are comprehensive and overall solid but a direct demonstration of the role of these conductances in being necessary for surge generation (or at least having a direct physiological consequence on surge properties) is lacking, substantially reducing the impact of the findings.

      Strengths:

      (1) Examination of multiple types of calcium and potassium currents, both through electrophysiology and molecular biology.

      (2) Focus on arcuate kisspeptin neurons during the surge is relatively conceptually novel as the anteroventral periventricular nucleus (AVPV) kisspeptin neurons have received much more attention as the "surge generator" population.

      (3) The modeling studies allow for direct examination of manipulation of single and multiple conductances, whereas the electrophysiology studies necessarily require examination of each current in isolation. The construction of an arcuate kisspeptin neuron model promises to be of value to the reproductive neuroendocrinology field.

      We thank the reviewer for recognizing our comprehensive examination of Kiss1-ARH neurons through electrophysiological, molecular and computational modeling of their activity during the preovulatory surge, which as the reviewer pointed out is “conceptually novel.” We will bolster our argument that Kiss1-ARH neurons transition from synchronized firing to burst firing with the E2-mediated regulation of channel expression with the addition of new experiments. We will address the weaknesses as follows:

      Weaknesses:

      (1) The novelty of some of the experiments needs to be clarified. This reviewer's understanding is that prior experiments largely used a different OVX+E2 treatment paradigm mimicking periods of low estradiol levels, whereas the present work used a "high E2" treatment model. However, Figures 10C and D are repeated from a previous publication by the same group, according to the figure legend. Findings from "high" vs. "low" E2 treatment regimens should be labeled and clearly separated in the text. It would also help to have direct comparisons between results from low E2 and high E2 treatment conditions.

      We will revise Figures 10C and 10D to include new findings on Tac2 and Vglut2 expression in OVX and E2-treated Kiss1ARH. We did show the previously published data (Qiu, eLife 2018) to contrast with Figures 10E, F showing the downregulation of TRPC5 and GIRK2 channels following E2 treatment. Most importantly, our E2 treatment regime is clearly stated in the Methods and is exactly the same that was used previously (Qiu, eLife 2016 and Qiu, eLife 2018) for the induction of the LH surge in OVX mice (Bosch, Molecular and Cellular Endocrinology 2013).

      (2) In multiple places, links are made between the changes in conductances and the transition from peptidergic to glutamatergic neurotransmission. However, this relationship is never directly assessed. The data that come closest are the qPCR results showing reduced Tac2 and increased Vglut2 mRNA, but in the figure legend, it appears that these results are from a prior publication using a different E2 treatment regimen.

      In the revised Figure 1, we will now include a clear depiction of the transition from synchronized firing driven by NKB signaling in OVX females to burst firing driven by glutamate in E2-treated females. We have used the same E2 treatment paradigm as previously published (Qiu, eLife 2018).

      (3) Similarly, no recordings of arcuate-AVPV glutamatergic transmission are made so the statements that Kiss1ARH neurons facilitate the GnRH surge via this connection are still only conjecture and not supported by the present experiments.

      Using a horizontal hypothalamic slice preparation, we have shown that Kiss1-ARH neurons excite GnRH neurons via Kiss1ARH glutamatergic input to Kiss1AVPV neurons (summarized in Fig. 12, Qiu, eLife 2016). We do not think that it is necessary to repeat these experiments in the current manuscript.

      (4) Figure 1 is not described in the Results section and is only tenuously connected to the statement in the introduction in which it is cited. The relevance of panels C and D is not clear. In this regard, much is made of the burst firing pattern that arises after E2 treatment in the model, but this burst firing pattern is not demonstrated directly in the slice electrophysiology examples.

      We will revise Figure 1 to include new whole-cell, current clamp recordings documenting the burst firing in response to glutamate in E2-treated, OVX females.

      (5) In Figure 3, it would be preferable to see the raw values for R1 and R2 in each cell, to confirm that all cells were starting from a similar baseline. In addition, it is unclear why the data for TTA-P2 is not shown, or how many cells were recorded to provide this finding.

      Before initiating photo-stimulation for each Kiss1-ARH neuron, we adjust the resting membrane potential to -70 mV, as noted in each panel in Figure 3, through current injections. We will include new findings on the effects of the T-channel blocker TTA-P2 on slow EPSP in the revised Figure 3. The number of cells tested with each calcium channel blocker is depicted in each of the bar graphs summarizing the effects of the blockers.

      (6) In Figure 5, panel C lists 11 cells in the E2 condition but panel E lists data from 37 cells. The reason for this discrepancy is not clear.

      In Figure 5E, we measured the L-, N-, P/Q and R channel currents after pretreatment with TTA-P2 to block the T-type current, whereas in Figure 5C, we measured the current without TTA-P2.

      (7) In all histogram figures, it would be preferable to have the data for individual cells superimposed on the mean and SEM.

      In all revised Figures we will include the individual data points for the individual neurons.

      (8) The CRISPR experiments were only performed in OVX mice, substantially limiting interpretation with respect to potential roles for TRPC5 in shaping arcuate kisspeptin neuron function during the preovulatory surge.

      The TRPC5 channels are most important for generating slow EPSPs when expression of NKB is high in the OVX state. Conversely, the glutamatergic response becomes more significant when the expression of NKB and TRPC5 channel are muted. Therefore, the CRISPR experiments were specifically conducted in OVX mice to maximize the effects.

      (9) Furthermore, there are no demonstrations that the CRISPR manipulations impair or alter the LH surge.

      In this manuscript, our focus is on the cellular electrophysiological activity of the Kiss1ARH neurons in OVX and E2-treated females. Exploration of CRISPR manipulations related to the LH surge is certainly slated for future experiments, but these in vivo experiments are beyond the scope of these comprehensive cellular electrophysiological and molecular studies.

      (10) The time of day of slice preparation and recording needs to be specified in the Methods.

      We will provide the times of slice preparation and recordings in the revised Methods and Materials.

      Reviewer #2 (Public Review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin neurons coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of the neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels, and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to less intense firing in a glutamatergic burst-like pattern that could favor glutamate release from ARC kisspeptin neurons. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology, and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards. Robust statistical analyses are provided throughout, although some experiments (illustrated in Figures 7 and 8) do have rather low sample numbers.

      The impact of E2 on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      We thank the reviewer for recognizing that the “pharmacological and electrophysiological experiments appear of the highest standards” and “the addition of the computer modeling for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.” However, we agree with the reviewer that we need to provide a direct demonstration of “burst-like” firing of Kiss1-ARH neurons. We will address the weaknesses as follows:

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One has to do with the fact that "burst-like" firing that the authors postulate ARC kisspeptin neurons transition to after E2 replacement is only seen in computer simulations, and not in slice patch-clamp recordings. A more direct demonstration of the existence of this firing pattern, and of its prominence over neuropeptide-dependent sustained firing under conditions of high E2 would make a more convincing case for the authors' hypothesis.

      We will provide a more direct demonstration of the existence of this firing pattern in the whole-cell current clamp experiments in the revised Figure 1.

      In addition, and quite importantly, the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions (the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle) under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place. This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of these ionic currents will vary during the estrous cycle.

      We have published that the magnitude of the slow EPSP, which is TRPC5 channel mediated, varies throughout the estrous cycle in a manner similar to the difference found between OVX and E2-treated, OVX females (Figure 2, Qiu, eLife 2016). Moreover, TRPC5 channel mRNA expression, similar to the peptides, is downregulated by an E2 treatment (Figure 10 this manuscript) that mimics proestrus levels of the steroid (Bosch, Mol Cell Endocrinology 2013). Furthermore, the magnitude of ionic currents is directly proportional to the number of ion channels expressed in the plasma membrane, which we have found correlates with mRNA expression. Therefore, it is likely that the magnitude of these ionic currents will vary during the estrous cycle.

      Lastly, the results of some of the pharmacological and genetic experiments may be difficult to interpret as presented. For example, in Figure 3, although it is possible that blockade of individual calcium channel subtypes suppresses the slow EPSP through decreased calcium entry at the somato-dendritic compartment to sustain TRPC5 activation and the slow depolarization (as the authors imply), a reasonable alternative interpretation would be that at least some of the effects on the amplitude of the slow EPSP result from suppression of presynaptic calcium influx and, thus, decreased neurotransmitter and neuropeptide secretion. Along the same lines, in Figure 12, one possible interpretation of the observed smaller slow EPSPs seen in mice with mutant TRPC5 could be that at least some of the effect is due to decreased neurotransmitter and neuropeptide release due to the decreased excitability associated with TRPC5 knockdown.

      The reviewer raises a good point, but our previous findings clearly demonstrate that chelating intracellular calcium with BAPTA in whole-cell current clamp recordings abolishes the slow EPSP and persistent firing (Qiu, J. Neurosci 2021), which we have noted is the rationale for dissecting out the contribution of T, R, N, L and P/Q calcium channels to the slow EPSP in our current studies (revised Figure 3 will include the effects of T-channel blocker).

      However, to further bolster the argument for the post-synaptic contribution of the calcium channels to the slow EPSP and eliminate the potential presynaptic effects of calcium channel blockers on the postsynaptic slow EPSP amplitude, which may result from reduced presynaptic calcium influx and subsequently decreased neurotransmitter release, we will utilize an additional strategy. Specifically, we will measure the response to the externally administered TACR3 agonist senktide under conditions in which the extracellular calcium influx, as well as neurotransmitter and neuropeptide release, are blocked (new Figure 3).

    1. Author response:

      eLife assessment

      Unlocking the potential of molecular genetic tools (optogenetics, chemogenetics, sensors, etc.) for the study of systems neuroscience in nonhuman primates requires the development of effective regulatory elements for cell-type specific expression to facilitate circuit dissection. This study provides a valuable building block, by carefully characterizing the laminar expression profile of two viral vectors, one designed for general GABA+ergic neurons and the second for parvalbumin+ cell-type selective expression in the marmoset primary visual cortex. The authors provide solid evidence for the first enhancer S5E2 and incomplete evidence for the second one, h56D. This study contributes to our understanding of these tools but is limited by the understandably small number of animals used.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Federer et al. tested AAVs designed to target GABAergic cells and parvalbumin-expressing cells in marmoset V1. Several new results were obtained. First, AAV-h56D targeted GABAergic cells with >90% specificity, and this varied with serotype and layer. Second, AAV-PHP.eB.S5E2 targeted parvalbumin-expressing neurons with up to 98% specificity. Third, the immunohistochemical detection of GABA and PV was attenuated near viral injection sites.

      Strengths:

      Vormstein-Schneider et al. (2020) tested their AAV-S5E2 vector in marmosets by intravenous injection. The data presented in this manuscript are valuable in part because they show the transduction pattern produced by intraparenchymal injections, which are more conventional and efficient.

      Our manuscript additionally provides detailed information on the laminar specificity and coverage of these viral vectors, which was not investigated in the original studies.

      Weaknesses:

      The conclusions regarding the effects of serotype are based on data from single injection tracks in a single animal. I understand that ethical and financial constraints preclude high throughput testing, but these limitations do not change what can be inferred from the measurements. The text asserts that "...serotype 9 is a better choice when high specificity and coverage across all layers are required". The data presented are consistent with this idea but do not make a strong case for it.

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      A related criticism extends to the analysis of injection volume on viral specificity. Some replication was performed here, but reliability across injections was not reported. My understanding is that individual ROIs were treated as independent observations. These are not biological replicates (arguably, neither are multiple injection tracks in a single animal, but they are certainly closer). Idiosyncrasies between animals or injections (e.g. if one injection happened to hit one layer more than another) could have substantial impacts on the measurements. It remains unclear which results regarding injection volume or serotype would hold up had a large number of injections been made into a large number of marmosets.

      For the AAV-S5E2, we made a total of 7 injections (at least 2 at the same volume), all of which, irrespective of volume, resulted in high specificity and efficiency for PV interneurons. Our conclusion is that larger volumes are slightly less specific, but the differences are minimal and do not warrant additional injections. Additionally, all of our injections involved all cortical layers, and the ROIs we selected for counts encompassed reporter protein expression across all layers. To provide a better sense of the reliability of the results across injections, in the revised version of the manuscript we will provide a supplementary table with results for each injection case separately.

      Reviewer #2 (Public Review):

      This is a straightforward manuscript assessing the specificity and efficiency of transgene expression in marmoset primary visual cortex (V1), for 4 different AAV vectors known to target transgene expression to either inhibitory cortical neurons (3 serotypes of AAV-h56D-tdTomato) or parvalbumin (PV)+ inhibitory cortical neurons in mice. Vectors are injected into the marmoset cortex and then postmortem tissue is analyzed following antibody labeling against GABA and PV. It is reported that: "in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% efficiency, depending on viral serotype and cortical layer. AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency."

      These claims are largely supported but slightly exaggerated relative to the actual values in the results presented. In particular, the overall efficiency for the best h56D vectors described in the results is: "Overall, across all layers, AAV9 and AAV1 showed significantly higher coverage (66.1±3.9 and 64.9%±3.7)". The highest coverage observed is just in middle layers and is also less than 80%: "(AAV9: 78.5%±9.1; AAV1: 76.9%±7.4)".

      In the abstract, we indeed summarize the overall data and round up the decimals, and state that these percentages are upper bounds and that they vary by serotype and layer, while in the Results we report the detailed counts with decimals. To clarify this, in the revised version of the Abstract we will change 80% to 79% and emphasize even more clearly the dependence on serotype and layer. We will amend this sentence of the Abstract as follows: “We show that in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 79% efficiency, but this depends on viral serotype and cortical layer.”

      For the AAV-PHP.eB-S5E2 the efficiency reported in the abstract ("86-90%") is also slightly exaggerated relative to the results: "Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl."

      Indeed, the numbers in the Abstract are upper bounds; for example, efficiency in L4A/B with S5E2 reaches 90%. To further clarify this important point, in the revised abstract we will state: “AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency, depending on layer.”

      These data will be useful to others who might be interested in targeting transgene expression in these cell types in monkeys. Suggestions for improvement are to include more details about the vectors injected and to delete some comments about results that are not documented based on vectors that are not described (see below).

      Major comments:

      Details provided about the AAV vectors used with the h56D enhancer are not sufficient to allow assessment of their potential utility relative to the results presented. All that is provided is: "The fourth animal received 3 injections, each of a different AAV serotype (1, 7, and 9) of the AAV-h56D-tdTomato (Mehta et al., 2019), obtained from the Zemelman laboratory (UT Austin)." At a minimum, it is necessary to provide the titers of each of the vectors. It would also be helpful to provide more information about viral preparation for both these vectors and the AAVPHP.eB-S5E2.tdTomato. Notably, what purification methods were used, and what specific methods were used to measure the titers?

      We thank the Reviewer for this comment. In the revised version of the manuscript, we will provide a Table with titers of each viral vector injected as well as more information regarding viral preparation methods. In fact, the methods for viral preparation and purification are detailed in the original publications so we feel it may be sufficient to cite the original papers?

      The first paragraph of the results includes brief anecdotal claims without any data to support them and without any details about the relevant vectors that would allow any data that might have been collected to be critically assessed. These statements should be deleted. Specifically, delete: "as well as 3 different kinds of PV-specific AAVs, specifically a mixture of AAV1-PaqR4-Flp and AAV1-h56D-mCherry-FRT (Mehta et al., 2019), an AAV1-PV1-ChR2-eYFP (donated by G. Horwitz, University of Washington)," and delete "Here we report results only from those vectors that were deemed to be most promising for use in primate cortex, based on infectivity and specificity. These were the 3 serotypes of the GABA-specific pAAV-h56D-tdTomato, and the PV-specific AAVPHP.eB-S5E2.tdTomato." These tools might in fact be just as useful or even better than what is actually tested and reported here, but maybe the viral titer was too low to expect any expression.

      This data is indeed anecdotal, and while we could delete it from the manuscript, as suggested by the Reviewer, we feel it could be useful information for the scientific community. It could prevent other labs from wasting resources, animals and time, particularly, as some of these vectors have been reported to be selective and efficient in the primate cortex, which we have not been able to confirm. We made several injections in several animals of those vectors that failed either to infect a sufficient number of cells or turned out to be poorly specific. Therefore, the negative results have been consistent. But we agree with the Reviewer that our negative results could have depended on factors such as titer. In the revised version of the manuscript, we will provide a supplementary Methods section in which we will report the specifics of the vectors that failed in our hands (i.e. number of injections made in how many animals, volumes, survival time, and titers).

      Based on the description in the Methods it seems that no antibody labeling against TdTomato was used to amplify the detection of the transgenes expressed from the AAV vectors. It should be verified that this is the case - a statement could be added to the Methods.

      That is indeed the case. We used no immunohistochemistry to enhance the reporter proteins as this was unnecessary. The native / non-amplified tdT signal was strong.

      Reviewer #3 (Public Review):

      Summary:

      Federer et al. describe the laminar profiles of GABA+ and of PV+ neurons in marmoset V1. They also report on the selectivity and efficiency of expression of a PV-selective enhancer (S5E2). Three further viruses were tested, with a view to characterizing the expression profiles of a GABA-selective enhancer (h56d), but these results are preliminary.

      Strengths:

      The derivation of cell-type specific enhancers is key for translating the types of circuit analyses that can be performed in mice - which rely on germline modifications for access to cell-type specific manipulation - in higher-order mammals. Federer et al. further validate the utility of S5E2 as a PV-selective enhancer in NHPs.

      Additionally, the authors characterize the laminar distribution pattern of GABA+ and PV+ cells in V1. This survey may prove valuable to researchers seeking to understand and manipulate the microcircuitry mediating the excitation-inhibition balance in this region of the marmoset brain.

      Weaknesses:

      Enhancer/promoter specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      This is an important point that was also brought up by Reviewer 1, which we thoroughly addressed in our comments. For clarity and convenience, we copied our response to Reviewer 1 below:

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      The language used throughout conflates the cell-type specificity conferred by the regulatory elements with that conferred by the serotype of the virus.

      In the revised version of the manuscript we will correct ambiguous language.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General responses to the weaknesses of this work:

      The two reviewers mentioned two major weaknesses of this work:

      (1) The one unexplained step in this intricately described mechanism is how HSCB functions to promote TACC3 degradation. It appears that the proteasome is involved since MG-132 reverses the effect of HSCB deficiency, but no other details are provided. Does HSCB target TACC3 for ubiquitination somehow? Future studies will be required to understand this portion of the mechanism.

      We totally agree that the detailed mechanisms through which HSCB promotes TACC3 degradation should be clarified. We tried to find the ubiquitin ligases involved in this regulatory process but could not identify such a key protein so far. We also investigated whether HSCB itself is a ubiquitin ligase but found that the protein does not possess this activity. We therefore consider this weakness another limitation of this research and have added one sentence to the penultimate paragraph of the Discussion section to address this issue.

      (2) This study only uses cell models. The significance of this work may be broadened by further studies using animal models.

      We totally agree that in vivo models should be adopted to validate the major findings of this study. As we stated in the penultimate paragraph of the Discussion section, we did not have access to biological samples from the patient harboring the HSCB mutation. Additionally, HSCB constitutive knockout mice died during the embryonic stage, while conditional knockout did not cause embryonic death but resulted in almost no erythroid cells in the bone marrow. Therefore, we were not able to further validate our findings in in vivo models.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Figure 3A - Should include FOG1 on the total cell lysate blots to show if total FOG1 is changing or only the cytoplasmic/nuclear ratio. This is shown later but would be good to include here.

      We would like to thank the reviewer for the nice suggestion. We have added the blots for total FOG1 to updated Figure 3A as requested.

      • Figures 3C and 4F - Should include the qPCR results from control cultures on the graphs (EPO + CRISPR NC and shNC, respectively).

      We would like to thank the reviewer for the good suggestion. We have added the control groups for all qPCR assays to the updated figures throughout the study.

      • Figure 4 - The addition of genetic manipulation of TACC3 to confirm its role in the cytoplasmic retention of FOG1 and failed erythroid differentiation in HSCB-deficient cells would strengthen the conclusions of this figure.

      We would like to thank the reviewer for the good suggestion. We initially tried to knock down TACC3 expression through siRNAs to confirm its role in the cytoplasmic retention of FOG1. However, we found that siRNAs that worked well in untreated K562 and erythroid progenitor cells as well as several other cell lines had poor efficiency of knocking down gene expression upon HSCB deficiency. This happened not only to siRNAs targeting TACC3, but also to those targeting several other genes. Interestingly, gene overexpression plasmids worked especially well in HSCB-deficient cells. We were not able to explain these phenomena and chose to use an inhibitor of TACC3 to study its functional implications in this research.

      • Text should be added to discuss the implications of this work for the lineage-specifying function of GATA-1. There are papers by John Crispino and Alan Cantor/Stu Orkin using the FOG-binding mutant of GATA-1 that implicate FOG1-dependent GATA-1 activity as Meg/Ery specifying, whereas FOG1-independent GATA-1 activity promotes mast cell or eosinophil fate. This work suggests that GATA1-expressing myeloid progenitors where FOG1 is kept cytoplasmic (no EPO signaling) would be driven towards the mast cell fate.

      We would like to thank the reviewer for the valuable suggestion. We have added a new paragraph in the Discussion section of the updated manuscript to discuss the implication of this work for the lineage-specifying function of GATA-1.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      (1) In the model provided in Figure 7H, HSCB and FOG1 bind TACC3 simultaneously. However, based on the data provided in Figure 6B and other figures, it seems that their interactions are more likely to be mutually exclusive. Is there a possibility that, besides inducing the degradation of TACC3, the binding of HSCB can inhibit the interaction between TACC3 and FOG1?

      We would like to thank the reviewer for the insightful comment. According to the data presented in the updated Figure 5F, TACC3 can simultaneously bind with HSCB and FOG1 in E 2-day HSCs. That is why we depict the simultaneous binding pattern in the model provided in Figure 7H. However, we agree that there is a possibility that the binding of HSCB can inhibit the interaction between TACC3 and FOG1 and have mentioned this possibility in the “Phosphorylation of HSCB by PI3K was necessary for its functionalization during human erythropoiesis” subsection of the “Results” section in the updated manuscript.

      (2) Is the decreased TACC3 protein abundance (Figure 5D) during erythroblast differentiation mainly due to the effect of HSCB? Can silencing of HSCB block this reduction?

      We would like to thank the reviewer for the great question. We have analyzed the protein abundance of TACC3 in HSCB-deficient hematopoietic stem cells induced for erythropoiesis for 0, 2 and 4 days and summarized the results as a new Figure 5E. According to the results, TACC3 protein abundance in HSCB-deficient hematopoietic stem cells exhibited no obvious change when the cells were induced for erythropoiesis for 0, 2 and 4 days. These results suggest that the decreased TACC3 protein abundance during early erythroblast differentiation was indeed due to the effect of HSCB. We only investigated the effect of HSCB on TACC3 abundance in early erythroid progenitors because, as shown in Figure 1, HSCB-deficient hematopoietic stem cells stopped differentiation at an early phase of their erythropoiesis. We have also mentioned these data in the “HSCB facilitated FOG1 nuclear translocation by binding with and mediating the proteasomal degradation of TACC3 upon activation of the EPO/EPOR signaling” subsection of the “Results” section in the updated manuscript.

      (3) This study shows that HSCB can be phosphorylated by PI3K, and this modification is important for its role in regulating FOG1 distribution. Does the phosphorylation of HSCB also affect its function in ISC biogenesis?

      We would like to thank the reviewer for the instructive question. We have analyzed the mitochondrial and cytosolic aconitase activities in wortmannin-treated K562 and E 2-day HSCs and their respective controls. The results have been summarized as a new Figure S5. According to the results, wortmannin treatment did not significantly affect mitochondrial and cytosolic aconitase activities. Therefore, it seems that HSCB phosphorylation does not affect its function in ISC biogenesis. We have also mentioned these data in the “Phosphorylation of HSCB by PI3K was necessary for its functionalization during human erythropoiesis” subsection of the “Results” section in the updated manuscript.

      (4) The method of isolation of nuclear fraction needs to be provided in the "Materials and Methods" section.

      We would like to thank the reviewer for the thoughtful suggestion. We have added the required information to the “Nuclear proteomics analysis” subsection of the "Materials and Methods" section in the updated manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Following small molecule screens, this study provides convincing evidence that 7,8 dihydroxyflavone (DHF) is a competitive inhibitor of pyridoxal phosphatase. These results are important since they offer an alternative mechanism for the effects of 7,8 dihydroxyflavone in cognitive improvement in several mouse models. This paper is also significant due to the interest in the protein phosphatases and neurodegeneration fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zink et al set out to identify selective inhibitors of the pyridoxal phosphatase (PDXP). Previous studies had demonstrated improvements in cognition upon removal of PDXP, and here the authors reveal that this correlates with an increase in pyridoxal phosphate (PLP; PDXP substrate and an active coenzyme form of vitamin B6) with age. Since several pathologies are associated with decreased vitamin B6, the authors propose that PDXP is an attractive therapeutic target in the prevention/treatment of cognitive decline. Following high throughput and secondary small molecule screens, they identify two selective inhibitors. They follow up on 7, 8 dihydroxyflavone (DHF). Following structure-activity relationship and selectivity studies, the authors then solve a co-crystal structure of 7,8 DHF bound to the active site of PDXP, supporting a competitive mode of PDXP inhibition. Finally, they find that treating hippocampal neurons with 7,8 DHF increases PLP levels in a WT but not PDXP KO context. The authors note that 7,8 DHF has been used in numerous rodent neuropathology models to improve outcomes. 7, 8 DHF activity was previously attributed to activation of the receptor tyrosine kinase TrkB, although this appears to be controversial. The present study raises the possibility that it instead/also acts through modulation of PLP levels via PDXP, and is an important area for future work.

      Strengths:

      The strengths of the work are in the comprehensive, thorough, and unbiased nature of the analyses revealing the potential for therapeutic intervention in a number of pathologies.

      Weaknesses:

      Potential weaknesses include the poor solubility of 7,8 DHF that might limit its bioavailability given its relatively low potency (IC50 = 0.8 µM), which was not improved by SAR. However, the compound has an extended residence time and the co-crystal structure could aid the design of more potent molecules and would be of interest to those in the pharmaceutical industry. The images related to crystal structure could be improved.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors performed a screening for PDXP inhibitors to identify compounds that could increase levels of pyridoxal 5'- phosphate (PLP), the co-enzymatically active form of vitamin B6. For the screening of inhibitors, they first evaluated a library of about 42,000 compounds for activators and inhibitors of PDXP and secondly, they validated the inhibitor compounds with a counter-screening against PGP, a close PDXP relative. The final narrowing down to 7,8-DHF was done using PLP as a substrate and confirmed the efficacy of this flavonoid as an inhibitor of PDXP function. Physiologically, the authors show that, by acutely treating isolated wild-type hippocampal neurons with 7,8-DHF they could detect an increase in the ratio of PLP/PL compared to control cultures. This effect was not seen in PDXP KO neurons.

      Strengths:

      The screening and validation of the PDXP inhibitors have been done very well because the authors have performed crystallographic analysis, a counter screening, and mutation analysis. This is very important because such rigor has not been applied to the original report of 7,8 DHF as an agonist for TrkB, which is why there is so much controversy about this finding.

      Weaknesses:

      As mentioned in the summary report the study may benefit from some in vivo analysis of PLP levels following 7,8-DHF treatment, although I acknowledge that it may be challenging because of the working out of the dosage and timing of the procedure.

      Reviewer #3 (Public Review):

      This is interesting biology. Vitamin B6 deficiency has been linked to cognitive impairment. It is not clear whether supplements are effective in restoring functional B6 levels. Vitamin B6 is composed of pyridoxal compounds and their phosphorylated forms, with pyridoxal 5-phosphate (PLP) being of particular importance. The levels of PLP are determined by the balance between pyridoxal kinase and phosphatase activities. The authors are testing the hypothesis that inhibition of pyridoxal phosphatase (PDXP) would arrest the age-dependent decline in PLP, offering an alternative therapeutic strategy to supplements. Published data illustrating that ablation of the Pdxp gene in mice led to increases in PLP levels and improvement in learning and memory trials are consistent with this hypothesis.

      In this report, the authors conduct a screen of a library of ~40k small molecules and identify 7,8-dihydroxyflavone (DHF) as a candidate PDXP inhibitor. They present an initial characterization of this micromolar inhibitor, including a co-crystal structure of PDXP and 7,8-DHF. In addition, they demonstrate that treatment of cells with 7,8-DHF increases PLP levels. Overall, this study provides further validation of PDXP as a therapeutic target for the treatment of disorders associated with vitamin B6 deficiency and provides proof-of-concept for inhibition of the target with small-molecule drug candidates.

      Strengths include the biological context, the focus on an interesting and under-studied class of protein phosphatases that includes several potential therapeutic targets, and the identification of a small molecule inhibitor that provides proof-of-concept for a new therapeutic strategy. Overall, the study has the potential to be an important development for the phosphatase field in general.

      Weaknesses include the fact that the compound is very much an early-stage screening hit. It is an inhibitor with micromolar potency for which mechanisms of action other than inhibition of PDXP have been reported. Extensive further development will be required to demonstrate convincingly the extent to which its effects in cells are due to on-target inhibition of PDXP.

      Recommendations for the authors:

      There is general agreement that the study represents an advance regarding the mechanisms of pyridoxal phosphatase and 7,8 DHF. From the reviewers' comments, several major questions and considerations are raised, followed by their detailed remarks:

      (1) More analysis of the solubility and dose of 7,8 DHF with regard to the 50% inhibition and the salt bridge of the B protomer, as raised by the reviewers.

      (2) Is there a possible involvement of another phosphatase?

      (3) Does 7,8 DHF cause an effect upon TrkB tyrosine phosphorylation?

      We thank the Reviewers and Editors for their fair and constructive comments and suggestions. We have performed additional experiments to address these questions and considerations. In addition, we have generated two new high-resolution (1.5 Å) crystal structures of human PDXP in complex with 7,8-DHF that substantially expand our understanding of 7,8-DHF-mediated PDXP inhibition. The scientist who performed this work for the revision of our manuscript has been added as an author (shared first authorship).

      We believe that the insights gained from these new data have further strengthened and improved the quality of our manuscript. Together, our data provide compelling evidence that 7,8-dihydroxyflavone is a direct and competitive inhibitor of pyridoxal phosphatase.

      Please find our point-by-point responses to the Public Reviews that are not addressed in the Recommendations for the Authors, and the Recommendations for the Authors below.

      Reviewer #2:

      As mentioned in the summary report the study may benefit from some in vivo analysis of PLP levels following 7,8-DHF treatment, although I acknowledge that it may be challenging because of the working out of the dosage and timing of the procedure.

      We agree that an in vivo analysis of PLP levels following 7,8-DHF treatment could be informative for the further evaluation of a possible mechanistic link between the reported effects of this compound and PDXP/vitamin B6. However, we currently do not have a corresponding animal experimentation permission in place and are unlikely to obtain such a permit within a reasonable time frame for this revision.

      Recommendations For The Authors:

      Reviewer #1:

      The work is already well-written, comprehensive, and convincing.

      Suggestions that could improve the manuscript.

      (1) Include a protein tyrosine phosphatase (PTP) in the selectivity analysis. One possibility is that 7,8 DHF acts on a PTP (such as PTP1B), leading to TrkB activation by preventing dephosphorylation. I note that a previous study has looked at SAR for flavones with PTP1B (PMID: 29175190), which is worth discussion.

      We thank the reviewer for bringing this interesting possibility to our attention. We were not aware of the SAR study for flavonoids with PTP1B by Proenca et al. but have now tested the effect of 7,8-DHF on PTP1B, referring to this paper. As shown in Figure 2d, PTP1B was not inhibited by 7,8-DHF at a concentration of 5 or 10 µM. At the highest tested concentration of 40 µM, 7,8-DHF inhibited PTP1B merely by ~20%. For comparison, compound C13 (3-hydroxy-7,8-dihydroxybenzylflavone-3’,4’dihydroxymethyl-phenyl), which emerged as the most active flavonoid in the SAR study by Proenca et al. inhibited PTP1B with an IC50 of 10 µM. Consistent with the results of these authors, our finding confirms that less polar substituents, such as O-benzyl groups at positions 7 and 8, and O-methyl groups at positions 3’ and 4’ of the flavone scaffold, are important for the ability of flavonoids to effectively inhibit PTP1B. We conclude that PTP1B inhibition by 7,8-DHF is unlikely to be a primary contributor to the reported cellular and in vivo effects of this flavone.

      In addition to PTP1B, we have now tested the effect of 7,8-DHF on the serine/threonine protein phosphatase calcineurin/PP2B, the DNA/RNA-directed alkaline phosphatase CIP, and three other metabolite-directed HAD phosphatases, namely NANP, NT5C1A and PNKP. PP2B, CIP and NANP were not inhibited by 7,8-DHF. Similar to PTP1B, PNKP activity was attenuated (~30%) only at 40 µM 7,8-DHF. In contrast, 7,8-DHF effectively inhibited NT5C1A (IC50 ~10 µM). NT5C1A is an AMP hydrolase expressed in skeletal muscle and heart. To our knowledge, a role of NT5C1A in the brain has not been reported. Based on currently available information, the inhibition of NT5C1A therefore appears unlikely to contribute to 7,8-DHF effects in the brain.

      The results of these experiments are shown in the revised Figure 2d. Taken together, the extended selectivity analysis of 7,8-DHF on a total of 12 structurally and functionally diverse protein- and nonprotein-directed phosphatases supports our initial conclusion that 7,8-DHF preferentially inhibits PDXP.

      (2) Line 144: It is unclear how fig 2c supports the statement here. Remove call out for clarity.

      Our intention was to highlight the fact that 7,8-DHF concentrations >12.5 µM could not be tested in the BLI assay (shown in Figure 2c) due to 7,8-DHF solubility issues under these experimental conditions. However, since this is discussed in the text, but not directly visible in Figure 2c, we agree with the Reviewer and have removed this call out.

      (3) Figure 3a. It is difficult to see the pink 7,8 DHF on top of the pink ribbon backbone. A better combination of colours could be used. Likewise in Figure 3b it is pink on pink again.

      We have improved the combination of colors to enhance the visibility of 7,8-DHF and have consistently color-coded murine and the new human PDXP structures throughout the manuscript.

      (4) Figure 3c and d. These are the two protomers I believe, but the colour coding is not present in 3c where the ribbon is now gray. Please choose colours that can be used to encode protomers throughout the figure.

      Please see response to point 3 above.

      (5) Figure 3f. I think this is the same protomer as 3c but a 180-degree rotation. Could this be indicated, or somehow lined up between the two figures for clarity? It would also be useful to have 3e in the same orientation as 3f, to better visualise the overlap with PLP binding. PLP and 7,8 DHF could be labelled similarly to the amino acids in 3f (the colour coding here is helpful).

      Please see response to point 3 above. We have substantially revised the structural figures and have used consistent color coding and the same perspective of 7,8-DHF in the PDXP active sites.

      (6) Figure 3g. The colours of the bars relating to specific mutations do not quite match the colours in Figure 3f, which I think was the aim and is very helpful.

      We have adapted the colours of the residues in Figure 3f (now Fig. 3b and additionally Fig. 3 – figure supplement 1e) so that they exactly match the colours of the bars in Figure 3g (now Fig. 3d).

      Reviewer #2:

      No further comments.

      Reviewer #3:

      Page 4: The authors describe 7,8DHF as a "selective" inhibitor of PDXP - in my opinion, they do not have sufficient data to support such a strong assertion. Reports that 7,8DHF may act as a TRK-B-agonist already highlight a potential problem of off-target effects. Does 7,8DHF promote tyrosine phosphorylation of TRK-B in their hands? The selectivity panel presented in Figure 2, focusing on 5 other HAD phosphatases, is much too limited to support assertions of selectivity.

      We agree with the Reviewer that our previous selectivity analysis with six HAD phosphatases was limited. To further explore the phosphatase target spectrum of 7,8-DHF, we have now analyzed six other enzymes: three other non-HAD phosphatases (the tyrosine phosphatase PTP1B, the serine/threonine protein phosphatase PP2B/calcineurin, and the DNA/RNA-directed alkaline phosphatase/CIP) and three other non-protein-directed C1/C0-type HAD phosphatases (NT5C1A, NANP, and PNKP). The C1-capped enzymes NT5C1A and NANP were chosen because we previously found them to be sensitive to small molecule inhibitors of the PDXP-related phosphoglycolate phosphatase PGP (PMID: 36369173). PNKP was chosen to increase the coverage of C0-capped HAD phosphatases (previously, only the C0-capped MDP1 was tested).

      We found that calcineurin, CIP and NANP were not inhibited by up to 40 µM 7,8-DHF. The activities of PTP1B and PNKP were attenuated (by ~20% and ~30%, respectively) only at 40 µM 7,8-DHF. In contrast, 7,8-DHF effectively inhibited NT5C1A (IC50 ~10 µM). We have previously found that NT5C1A was sensitive to small-molecule inhibitors of the PDXP paralog PGP, although these molecules are structurally unrelated to 7,8-DHF (PMID: 36369173). NT5C1A is an AMP hydrolase expressed in skeletal muscle and heart (PMID: 12947102). To our knowledge, a role of NT5C1A in the brain has not been reported. Based on currently available information, the inhibition of NT5C1A therefore appears unlikely to contribute to 7,8-DHF effects in the brain. The results of these experiments are shown in the revised Figure 2d. Taken together, the extended selectivity analysis of 7,8-DHF on a total of 12 structurally and functionally diverse protein- and non-protein-directed phosphatases supports our initial conclusion that 7,8-DHF preferentially inhibits PDXP. To nevertheless avoid any overstatement, we have now also replaced “selective” by “preferential” in this context throughout the manuscript.

      We have not tested if 7,8-DHF promotes tyrosine phosphorylation of TRK-B. Being able to detect 7,8-DHF-induced TRK-B phosphorylation in our hands would not exclude an additional role for PDXP/vitamin B6-dependent processes. Not being able to detect TRK-B phosphorylation may indicate absence of evidence or evidence of absence. This would neither conclusively rule out a biological role for 7,8-DHF-induced TRK-B phosphorylation in vivo, nor contribute further insights into a possible involvement of vitamin B6-dependent processes in 7,8-DHF-induced effects.

      Page 6: The authors report that they obtained only two PDXP-selective inhibitor hits from their screen; 7,8DHF and something they describe as FMP-1. For the latter, they state that it "was obtained from an academic donor, and its structure is undisclosed for intellectual property reasons". In my opinion, this is totally unacceptable. This is an academic research publication. If the authors wish to present data, they must do so in a manner that allows a reader to assess their significance; in the case of work with small molecules that includes the chemical structure. In my opinion, the authors should either describe the compound fully or remove mention of it altogether.

      We are unable to describe “FMP-1” because its identity has not been disclosed to us. The academic donor of this molecule informed us that they were not able to permit release of any details of its structure or general structural class due to an emerging commercial interest.

      We mentioned FMP-1 simply to highlight the fact that the screening campaign yielded more than one inhibitor. FMP-1 was also of interest due to its complete inhibition of PDXP phosphatase activity.

      Because the structure of this molecule is unknown to us, we have now removed any mention of this compound in the manuscript. For the same reason, we have removed the mention of the inhibitor hits “FMP-2” and “FMP-3” in Figure 2 – figure supplement 1 and Figure 2 – figure supplement 2. The number of PDXP inhibitor hits in the manuscript has been adapted accordingly.

      Page 7: The observed plateau at 50% inhibition requires further explanation. It is not clear how poor solubility of the compound explains this observation. For example, the authors state that "due to the aforementioned poor solubility of 7,8DHF, concentrations higher than 12.5µM could not be evaluated". Yet on page 8, they describe assays against the specificity panel at concentrations of compound up to 40µM. Do the analogues of 7,8DHF (Fig 2b) result in >50% inhibition at higher concentrations? Further explanation and data on the solubility of the compounds would be of benefit.

      We currently do not have a satisfactory explanation for the apparent plateau of ~50% PDXP inhibition by 7,8-DHF. Resolving this question will likely require other approaches, including computational chemistry such as molecular dynamics simulations, and we feel that this is beyond the scope of the present manuscript.

      We previously speculated that the limited solubility of 7,8-DHF may counteract a complete enzyme inhibition if higher concentrations of this molecule are required. Specifically, we referred to Todd et al. who have performed HPLC-UV-based solubility assays of 7,8-DHF (ref. 35). These authors found that immediately after 7,8-DHF solubilization, nominal 7,8-DHF concentrations of 5, 20 or 50 µM resulted in 0.5, 3.0 or 13 µM of 7,8-DHF in solution (i.e., 10, 15 or 26% of the respective nominal concentration). Seven hours later, 46, 26 or 26% of the respective nominal 7,8-DHF concentrations were found in solution. Hence, above a nominal concentration of 5 µM, 7,8-DHF solubility does not increase linearly with the input concentration, but plateaus at ~20% of the nominal concentration. This phenomenon could potentially contribute to the apparent plateau of human or murine PDXP inhibition by 7,8-DHF in vitro.
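
      The percentages quoted above from Todd et al. (ref. 35) follow directly from the ratio of measured to nominal concentration. As a minimal sketch of that arithmetic (the function and variable names are illustrative and not taken from the manuscript), using only the values measured immediately after solubilization:

```python
# Nominal and measured 7,8-DHF concentrations (in µM) immediately after
# solubilization, as quoted in the text from Todd et al. (ref. 35).
nominal_um = [5.0, 20.0, 50.0]
measured_um = [0.5, 3.0, 13.0]


def percent_in_solution(nominal, measured):
    """Return the measured concentration as a rounded percentage of nominal."""
    return [round(100.0 * m / n) for n, m in zip(nominal, measured)]


percentages = percent_in_solution(nominal_um, measured_um)
for n, m, p in zip(nominal_um, measured_um, percentages):
    print(f"nominal {n:g} µM -> {m:g} µM in solution ({p}% of nominal)")
```

      With these inputs the sketch reproduces the 10, 15 and 26% figures given in the text.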

      However, experiments performed during the revision of our manuscript show that the HAD phosphatase NT5C1A can be effectively inhibited by 7,8-DHF with an IC50-value of 10 µM (see revised Fig. 2). Together with the fact that the activity of the PDXP-Asn61Ser variant can be completely inhibited by 7,8-DHF (see Fig. 3d), we conclude that the reason for the observed plateau of PDXP inhibition is likely to be primarily structural, with Asn61 impeding 7,8-DHF binding. We have therefore removed the mention of the limited solubility of 7,8-DHF here. On p.14, we now say: “These data also suggest that Asn61 contributes to the limited efficacy of 7,8-mediated PDXP inhibition in vitro.”

      The solubility of 7,8-DHF is dependent on the specific assay and buffer conditions. In BLI experiments, interference patterns caused by binding of 7,8-DHF in solution to biotinylated PDXP immobilized on the biosensor surface are measured. In phosphatase selectivity assays, phosphatases are in solution, and the effect of 7,8-DHF on the phosphatase activity is measured via the quantification of free inorganic phosphate.

      In BLI experiments, we observed that the sensorgrams obtained with the highest tested 7,8-DHF concentration (25 µM) showed the same curve shapes as the sensorgrams obtained with 12.5 µM 7,8-DHF. This contrasts with the expected steeper slope of the curves at 25 µM vs. 12.5 µM 7,8-DHF. The same behavior was observed for the reference sensors (i.e., the SSA sensors that were not loaded with PDXP, but incubated with 7,8-DHF at all employed concentrations for referencing against nonspecific binding of 7,8-DHF to the sensors). The sensorgrams at 25 µM 7,8-DHF were therefore not included in the analysis (this is now specified in the Materials and Methods BLI section on p.27). To clarify this point, we now state that “As a result of the poor solubility of the molecule, a saturation of the binding site was not experimentally accessible” (p.7).

      In contrast, the phosphatase selectivity assays described on p.8 could be performed with nominal 7,8-DHF concentrations of up to 40 µM. Although the effective 7,8-DHF concentration in solution is expected to be lower (see ref. 35 and discussed above), the limited solubility of 7,8-DHF in phosphatase assays does not prevent the quantification of free inorganic phosphate. Nevertheless, we cannot exclude some interference with this absorbance-based assay (e.g., due to turbidity caused by insoluble compound). Indeed, 5,6-dihydroxyflavone and 5,6,7-trihydroxyflavone caused an apparent increase in PDXP activity at concentrations above 10 µM (see Figure 2b), which may be related to compound solubility issues. Alternatively, these flavones may activate PDXP at higher concentrations.

      We have tested the 7,8-DHF analogue 3,7,8,4’-tetrahydroxyflavone at concentrations of 70 and 100 µM. At concentrations >100 µM, the DMSO concentration required for solubilizing the flavone interferes with PDXP activity. PDXP inhibition by 3,7,8,4’-tetrahydroxyflavone was slightly increased at 70 µM compared to 40 µM (by ~18%) but plateaued between 70 and 100 µM. These results are now mentioned in the text (p.7): “The efficacy of PDXP inhibition by 3,7,8,4’-tetrahydroxyflavone was not substantially increased at concentrations >40 µM (relative PDXP activity at 40 µM: 0.46 ± 0.05; at 70 µM: 0.38 ± 0.15; at 100 µM: 0.37 ± 0.09; data are mean values ± S.D. of n=6 experiments).”
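
      The plateau described above can be made explicit by converting the quoted mean relative PDXP activities into percent inhibition. The following is a minimal sketch under that assumption (percent inhibition taken as 100 × (1 − relative activity)); the names are illustrative, and only the rounded mean values quoted in the text are used, without their standard deviations:

```python
# Mean relative PDXP activity at each nominal 3,7,8,4'-tetrahydroxyflavone
# concentration (µM), as quoted in the text (S.D. values omitted here).
relative_activity = {40: 0.46, 70: 0.38, 100: 0.37}


def percent_inhibition(activity_by_conc):
    """Convert relative activity (1.0 = uninhibited) to rounded % inhibition."""
    return {conc: round(100 * (1 - act)) for conc, act in activity_by_conc.items()}


inhibition = percent_inhibition(relative_activity)

# Change in inhibition (percentage points) between successive concentrations.
concs = sorted(inhibition)
steps = [inhibition[b] - inhibition[a] for a, b in zip(concs, concs[1:])]

print(inhibition)
print(steps)
```

      On these rounded means, inhibition rises by 8 percentage points between 40 and 70 µM and by only 1 point between 70 and 100 µM, consistent with the plateau described in the text.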

      Page 9: The authors report that PDXP crystallizes as a homodimer in which 7,8DHF is bound only to one protomer. Is the second protomer active? Does that contribute to the 50% inhibition plateau? If Arg62 is mutated to break the salt bridge, does inhibition go beyond 50%?

      We have no way to measure the activity of the second, inhibitor-free protomer in murine PDXP. We know that PDXP functions as a constitutive homodimer, and based on our current understanding, both protomers are active. We have previously shown that the experimental monomerization of PDXP (upon introduction of two point mutations in the dimerization interface) strongly reduces its phosphatase activity. Specifically, PDXP homodimerization is required for an inter-protomer interaction that mediates the proper positioning of the substrate specificity loop. Thus, homodimerization is necessary for effective substrate coordination and dephosphorylation (PMID: 24338687).

      In the murine structure, we observed that 7,8-DHF binding to the second subunit (the B-protomer) is prevented by a salt bridge between Arg62 and Asp14 of a symmetry-related A-protomer in the crystal lattice (i.e., this is not a salt bridge between Arg62 in the B-protomer and Asp14 in the A-protomer of a PDXP homodimer). As suggested, we have nevertheless tested the potential role of this salt bridge for the sensitivity of the PDXP homodimer to 7,8-DHF.

      The mutation of Arg62 is not suitable to answer this question, because this residue is involved in the coordination of 7,8-DHF (see Figure 3b), and the PDXP-Arg62Ala mutant is inhibitor resistant (see Figure 3d). We have therefore mutated Asp14, which is not involved in 7,8-DHF coordination. As shown in the new Figure 3 – figure supplement 1d, the 7,8-DHF-mediated inhibition of PDXP-Asp14Ala again reached a plateau at ~50%. This result suggests that while an Arg62-Asp14 salt bridge is stabilized in the murine crystal, it is not a determinant of the active site accessibility of protomer B in solution.

      To address this important question further, we have now also generated co-crystals of human PDXP bound to 7,8-DHF, and refined two structures to 1.5 Å. We found that in human PDXP, both protomers bind 7,8-DHF. These new, higher resolution data are now shown in the revised Figure 3 and its figure supplements, and we have moved the panels referring to the previously reported murine PDXP structure to Figure 3 – figure supplement 1. Thus, both protomers of human PDXP but only one protomer of murine PDXP bind 7,8-DHF in the crystal structure, yet the 7,8-DHF-mediated inhibition of human and murine PDXP plateaus at ~50% under the phosphatase assay conditions (see Figure 2a). We conclude that 7,8-DHF binding efficiency in the PDXP crystal does not necessarily reflect its inhibitory efficiency in solution.

      Taken together, these data indicate that the apparent partial inhibition of murine and human PDXP phosphatase activity by 7,8-DHF in our in vitro assays is not explained by an exclusive binding of 7,8-DHF to just one protomer of the homodimer.

      Page 10-12: Is it possible to generate a mutant form of PDXP in which activity is maintained but inhibition is attenuated - an inhibitor-resistant mutant form of PDXP? Can such a mutant be used to assess on-target vs off-target effects of 7,8-DHF in cells?

      This is an excellent point, and we agree with the Reviewer that such an approach would provide further evidence for cellular on-target activity of 7,8-DHF. Indeed, the verification of the PDXP-7,8-DHF interaction sites has led to the generation of catalytically active, inhibitor-resistant PDXP mutants, such as Tyr146Ala and Glu148Ala (Fig. 3d). However, the biochemical analysis of such mutants in primary hippocampal neurons is a very difficult task.

      Primary hippocampal neurons are derived from pooled, isolated hippocampi of mouse embryos and are subsequently differentiated for 21 days in vitro. The resulting cellular yield is typically low and variable, and the viability (and contamination of the respective cultures with e.g. glial cells) varies from batch to batch. Although such cell preparations are suitable for electrophysiological or immunocytochemical experiments, they are far from ideal for biochemical studies. A meaningful experiment would require the efficient expression of a catalytically active, but inhibitor-resistant PDXP-mutant in PDXP-KO neurons. In parallel, PDXP-KO cells reconstituted with PDXP-WT (at phosphatase activity levels comparable with the PDXP mutant cells) would be needed for comparison. Unfortunately, the generation of (a) sufficient numbers of (b) viable cells that (c) efficiently express (d) functionally comparable levels of PDXP-WT or -mutant for downstream analysis (PLP/PL-levels upon inhibitor treatment) is currently not possible for us.

      Human iPSC-derived (hippocampal) spheroids are at present no alternative, due to the necessity of generating PDXP-KO lines first, and the difficulties with transfecting/transducing them. Such a system would require extensive validation. We have attempted to use SH-SY5Y cells (a metastatic neuroblastoma cell line), but PDXK expression in these cells is modest and they produce too little PLP. We therefore feel that this question is beyond the scope of our current study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study that performs scRNA-Seq on infected and uninfected wounds. The authors sought to understand how infection with E. faecalis influences the transcriptional profile of healing wounds. The analysis demonstrated that there is a unique transcriptional profile in infected wounds with specific changes in macrophages, keratinocytes, and fibroblasts. They also speculated on potential crosstalk between macrophages and neutrophils and macrophages and endothelial cells using NicheNet analysis and CellChat. Overall the data suggest that infection causes keratinocytes to not fully transition which may impede their function in wound healing and that the infection greatly influenced the transcriptional profile of macrophages and how they interact with other cells.

      Strengths:

      It is a useful dataset to help understand the impact of wound infection on the transcription of specific cell types. The analysis is very thorough in terms of transcriptional analysis and uses a variety of techniques and metrics.

      Weaknesses:

      Some drawbacks of the study are the following. First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study. Wound healing is a dynamic and variable process so understanding the full course of the wound healing response would be very important to understand the impact of infection on the healing wound. Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study. Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds. So while the conclusions are generally supported, the scope of the work is limited.

      Thank you for your thoughtful review and acknowledgment of the thoroughness of our analysis.

      First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study.

      We acknowledge your concerns regarding the limitations of our study, particularly regarding the small number of mice per group and the examination of only one time point post-wounding. We agree that a more comprehensive analysis across multiple time points would provide a deeper understanding of the temporal changes induced by infection. While our primary focus in this study was to elucidate the foundational responses to bacteria-infected wounds, we attempted to augment our analysis by incorporating publicly available datasets of similar nature. However, these datasets lacked power in terms of cell number and populations. Nonetheless, we have bolstered our analysis by applying a cross-entropy test on the integrated dataset and reporting its significance (Figure S1F), ensuring the robustness of our single-cell RNA sequencing datasets.

      Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study.

      We also recognize the significance of comparing infected wounds to unwounded skin to establish a baseline for transcriptional changes. While we attempted to incorporate publicly available unwounded skin samples into our analysis, we encountered limitations in the number of cells, particularly within the immune population. This constraint is addressed in the Limitations section of the manuscript.

      Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds.

      Regarding the concern about differences between murine and human wound healing mechanisms, we took measures during tissue isolation to mitigate this issue, extracting incisions of the wounds rather than contracted tissues. Despite the primary mode of wound closure in mice being contraction, we believe our analysis still offers valuable insights into cellular responses to infection relevant to human wound healing.

      We appreciate your constructive criticism of our study. Despite these constraints, we believe our work provides valuable insights into the transcriptional changes induced by infection in healing wounds.

      Reviewer #2 (Public Review):

      Summary:

      The authors have performed a detailed analysis of the complex transcriptional status of numerous cell types present in wounded tissue, including keratinocytes, fibroblasts, macrophages, neutrophils, and endothelial cells. The comparison between infected and uninfected wounds is interesting and the analysis suggests possible explanations for why infected wounds are delayed in their healing response.

      Strengths:

      The paper presents a thorough and detailed analysis of the scRNAseq data. The paper is clearly written and the conclusions drawn from the analysis are appropriately cautious. The results provide an important foundation for future work on the healing of infected and uninfected wounds.

      Weaknesses:

      The analysis is purely descriptive and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing. The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We are thankful for your acknowledgment of the thoroughness of our analysis and the cautious nature of our conclusions.

      The analysis is purely descriptive, and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing.

      Regarding your concern about the purely descriptive nature of our analysis and the lack of functional validation of identified factors, we agree on the importance of understanding the functional roles of transcriptional changes in wound healing. To address this limitation, we plan to conduct functional experiments, such as perturbation assays or in vivo validation studies, to validate the roles of specific factors identified in our analysis.

      The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We acknowledge the importance of comparing wounded tissue to unwounded skin to establish a baseline for understanding transcriptional changes. This point is noted and acknowledged in the limitations section of our manuscript.

      We appreciate your feedback and assure you that we will consider your suggestions in future iterations of our research.

      Recommendations For The Authors:

      We are grateful for the positive overall assessment of our revised work by the reviewers. Critical comments on specific aspects of our work are listed verbatim below followed by our responses.

      Reviewer 1 (Recommendations for the Authors):

      (1) The figures are a bit cluttered and hard to parse out. The different parts of the figure seem to be scattered all over the place with no consistent order.

      Thank you for your feedback regarding the figures in our manuscript. We acknowledge your concern that some panels may appear cluttered and challenging to navigate. In response, we made concerted efforts to declutter certain panels, taking into account page size constraints and ensuring a minimum font size for readability.

      (2) I didn't really understand what the last sentence on page 6 meant. Is this meant to say that these could be biomarkers of infection?

      We thank the reviewer for noting this lack of clarity. We revised the statement.

      Updated manuscript (lines 111-113)

      “Overall, the persistent E. faecalis infection contributed to higher Tgfb1 expression, whilst Pdgfa levels remained low, correlating with delayed wound healing.”

      (3) A reference on page 19 didn't format correctly.

      We thank the reviewer for catching the typos. We corrected the reference formatting.

      Updated manuscript (lines 503-505)

      “We confirm the immune-suppressive role of E. faecalis in wound healing, consistent with previous findings in different experimental settings (Chong et al., 2017; Kao et al., 2023; Tien et al., 2017).”

      (4) The title doesn't really address the scope of the finding which goes beyond immunomodulatory.

      The reviewer is correct! We therefore revised the title to cover all aspects of the study as:

      “Decoding the complexity of delayed wound healing following Enterococcus faecalis infection”

      Reviewer 2 (Recommendations for the Authors):

      (1) On page 6, the expression of Tgfb1 is described as "aggravated" by wounding alone. I am not sure whether this means Tgfb1 levels are increased or decreased. It appears from the data that it is increased, which was confusing to me since I interpreted "aggravated" as meaning decreased. So perhaps a different more straightforward word could be used to describe the data.

      We modified this ambiguous statement to:

      Updated manuscript (lines 105-106)

      “By contrast, wounding alone resulted in higher transforming growth factor beta 1 (Tgfb1) expression.”

      (2) On page 7, the authors state that "cells from infected wounds...demonstrated distinct clustering patterns compared to cells from uninfected wounds (Figure S1F)" but when I look at the data in this figure, I cannot really see a difference. Perhaps the differences could be more clearly highlighted?

      Thank you for pointing out this issue. We utilized the cross-entropy test for statistical comparison, employing UMAP embedding space data. While the data underwent batch correction based on infection status, the UMAP plots for each condition may appear visually similar. However, it is important to note that the number of cells per cluster varies significantly between the infected and uninfected conditions. This aspect influences the selection of points (cells) and their nearest neighbours for statistical testing within each cluster in the embedding space. To address this concern, we have included a table indicating the number of cells per cell type alongside the plot (Figure S1F), providing additional context for the interpretation of our results.

      Author response table 1.

      Author response image 1.

      (3) On page 8, Zeb2hi cells are described as "immunosuppressive" and yet the genes they are highlighted as expressing include Cxcl2 and IL1b, which I would classify as inflammatory, not immunosuppressive. Can the authors be a bit more clear on why they describe the phenotype of these cells as "immunosuppressive"?

      We agree with the reviewer that this is a bit counterintuitive. Conventionally, CXCL2 is thought to be a chemoattractant for neutrophil recruitment. However, the infection-specific keratinocyte cluster expressing Cxcl2, Il1b, and Wfdc17 along with Zeb2 and Thbs1 shows myeloid-derived suppressor cell-like features, and such cells play immunosuppressive roles during infection and in cancer (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021).

      Updated manuscript (lines 159-163)

      “As the barrier to pathogens, keratinocytes secrete a broad range of cytokines that can induce inflammatory responses (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021). However, Zeb2hi keratinocytes co-expressing Cxcl2, Il1b, and Wfdc17, indicate myeloid-derived suppressor cell-like phenotype which implies an immunosuppressive environment (Hofer et al., 2021; Veglia et al., 2021).”

      (4) On pages 8-9, Keratinocytes are described to express MHC class II. I find this quite unexpected since class II is usually thought to be expressed primarily by APCs such as DCs and B cells. Is there a precedent for keratinocytes to express class II? The authors should acknowledge that this is unexpected and in need of further validation, or support the claim with references in which class II expression has been previously observed on keratinocytes (and is thus not unexpected)

      Although MHC class II expression is predominantly on immune cells, an antigen-presenting role for keratinocytes has been reported in many studies (Banerjee et al., 2004; Black et al., 2007; Carr et al., 1986; Gawkrodger et al., 1987; Jiang et al., 2020; Li et al., 2022; Oh et al., 2019; Tamoutounour et al., 2019). Therefore, the antigen-presenting role of keratinocytes is known and expected, and we think that this should be further investigated in the context of wound infection.

      Updated manuscript (lines 177-179)

      “These genes are associated with the major histocompatibility complex (MHC) class II, suggesting a self-antigen presenting keratinocyte population, which have a role in costimulation of T cell responses (Meister et al., 2015; Tamoutounour et al., 2019).”

      REFERENCES

      Alshetaiwi, H., Pervolarakis, N., McIntyre, L. L., Ma, D., Nguyen, Q., Rath, J. A., Nee, K., Hernandez, G., Evans, K., Torosian, L., Silva, A., Walsh, C., & Kessenbrock, K. (2020). Defining the emergence of myeloid-derived suppressor cells in breast cancer using single-cell transcriptomics. Science Immunology, 5(44), eaay6017. https://doi.org/10.1126/sciimmunol.aay6017

      Banerjee, G., Damodaran, A., Devi, N., Dharmalingam, K., & Raman, G. (2004). Role of keratinocytes in antigen presentation and polarization of human T lymphocytes. Scandinavian Journal of Immunology, 59(4), 385–394. https://doi.org/10.1111/j.0300-9475.2004.01394.x

      Black, A. P. B., Ardern-Jones, M. R., Kasprowicz, V., Bowness, P., Jones, L., Bailey, A. S., & Ogg, G. S. (2007). Human keratinocyte induction of rapid effector function in antigen-specific memory CD4+ and CD8+ T cells. European Journal of Immunology, 37(6), 1485–1493. https://doi.org/10.1002/eji.200636915

      Carr, M. M., McVittie, E., Guy, K., Gawkrodger, D. J., & Hunter, J. A. (1986). MHC class II antigen expression in normal human epidermis. Immunology, 59(2), 223–227.

      Gawkrodger, D. J., Carr, M. M., McVittie, E., Guy, K., & Hunter, J. A. (1987). Keratinocyte expression of MHC class II antigens in allergic sensitization and challenge reactions and in irritant contact dermatitis. The Journal of Investigative Dermatology, 88(1), 11–16. https://doi.org/10.1111/1523-1747.ep12464641

      Jiang, Y., Tsoi, L. C., Billi, A. C., Ward, N. L., Harms, P. W., Zeng, C., Maverakis, E., Kahlenberg, J. M., & Gudjonsson, J. E. (2020). Cytokinocytes: The diverse contribution of keratinocytes to immune responses in skin. JCI Insight, 5(20), e142067, 142067. https://doi.org/10.1172/jci.insight.142067

      Li, D., Cheng, S., Pei, Y., Sommar, P., Kärner, J., Herter, E. K., Toma, M. A., Zhang, L., Pham, K., Cheung, Y. T., Liu, Z., Chen, X., Eidsmo, L., Deng, Q., & Xu Landén, N. (2022). Single-Cell Analysis Reveals Major Histocompatibility Complex II‒Expressing Keratinocytes in Pressure Ulcers with Worse Healing Outcomes. The Journal of Investigative Dermatology, 142(3 Pt A), 705–716. https://doi.org/10.1016/j.jid.2021.07.176

      Oh, S., Chung, H., Chang, S., Lee, S.-H., Seok, S. H., & Lee, H. (2019). Effect of Mechanical Stretch on the DNCB-induced Proinflammatory Cytokine Secretion in Human Keratinocytes. Scientific Reports, 9(1), 5156. https://doi.org/10.1038/s41598-019-41480-y

      Siriwach, R., Ngo, A. Q., Higuchi, M., Arima, K., Sakamoto, S., Watanabe, A., Narumiya, S., & Thumkeo, D. (2022). Single-cell RNA sequencing identifies a migratory keratinocyte subpopulation expressing THBS1 in epidermal wound healing. iScience, 25(4), 104130. https://doi.org/10.1016/j.isci.2022.104130

      Tamoutounour, S., Han, S.-J., Deckers, J., Constantinides, M. G., Hurabielle, C., Harrison, O. J., Bouladoux, N., Linehan, J. L., Link, V. M., Vujkovic-Cvijin, I., Perez-Chaparro, P. J., Rosshart, S. P., Rehermann, B., Lazarevic, V., & Belkaid, Y. (2019). Keratinocyte-intrinsic MHCII expression controls microbiota-induced Th1 cell responses. Proceedings of the National Academy of Sciences of the United States of America, 116(47), 23643–23652. https://doi.org/10.1073/pnas.1912432116

      Veglia, F., Sanseviero, E., & Gabrilovich, D. I. (2021). Myeloid-derived suppressor cells in the era of increasing myeloid cell diversity. Nature Reviews. Immunology, 21(8), 485–498. https://doi.org/10.1038/s41577-020-00490-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      Summary:

      It has been proposed in the literature, that the ATP release channel Panx1 can be activated in various ways, including by tyrosine phosphorylation of the Panx1 protein. The present study reexamines the commercial antibodies used previously in support of the phosphorylation hypothesis and the presented data indicate that the antibodies may recognize proteins unrelated to Panx1. Consequently, the authors caution about the use and interpretation of results obtained with these antibodies.

      Strengths:

      The manuscript by Ruan et al. addresses an important issue in Panx1 research, i.e. the activation of the channel formed by Panx1 via protein phosphorylation. If the authors' conclusions are correct, the previous claims for Panx1 phosphorylation on the basis of the commercial anti-phospho-Panx1 antibodies would be in question.

      This is a very detailed and comprehensive analysis making use of state-of-the-art techniques, including mass spectrometry and phos-tag gel electrophoresis.

      In general, the study is well-controlled as relating to negative controls.

      The value of this manuscript is, that it could spawn new, more function-oriented studies on the activation of Panx1 channels.

      Weaknesses:

      Although the manuscript addresses an important issue, the activation of the ATP-release channel Panx1 by protein phosphorylation, the data provided do not support the firm conclusion that such activation does not exist. The failure to reproduce published data obtained with commercial anti-phospho Panx1 antibodies can only be of limited interest for a subfield.

      (1) The title claiming that "Panx1 is NOT phosphorylated..." is not justified by the failure to reproduce previously published data obtained with these antibodies. If, as claimed, the antibodies do not recognize Panx1, their failure cannot be used to exclude tyrosine phosphorylation of the Panx1 protein. There is no positive control for the antibodies.

      The full title of our manuscript is “Human Pannexin 1 Channel is NOT Phosphorylated by Src Tyrosine Kinase at Tyr199 and Tyr309”. The major conclusion of our manuscript should not be extended to the claim that “Panx1 is NOT phosphorylated”. This is by no means our conclusion. In fact, LC-MS/MS data from both our group and others have shown that PANX1 is phosphorylated at both serine and tyrosine sites (1). However, we provided solid evidence that Tyr199 and Tyr309 of human PANX1 are not effective substrates of the Src kinase.

      We did provide several positive controls for the antibodies in our study. We showed that the anti-PANX1 and anti-Src antibodies unambiguously recognized PANX1 and Src, respectively (Figure 3A), and that a pan-specific phosphotyrosine antibody (P-Tyr-100) unambiguously recognized phosphorylated Src (Figure 3A)—as expected—but did not recognize PANX1. In addition, we demonstrated that the two antibodies in question (anti-PANX1-pY198 and anti-PANX1-pY308) did produce signals in our western blot analysis, but we provided compelling evidence that the bands produced by these antibodies do not correspond to PANX1 (Figure 2B).

      (2) The authors claim that exogenous SRC expression does not phosphorylate Y198. DeLalio et al. 2019 show that Panx1 is constitutively phosphorylated at Y198, so an effect of exogenous SRC expression is not necessarily expected.

      We have unambiguously identified peptide fragments containing non-phosphorylated Y198 in our LC-MS/MS experiment, and none of the identified fragments corresponds to a phosphorylated Y198. Therefore, our LC-MS/MS data do not support the notion that Panx1 is constitutively phosphorylated at Y198.

      (3) The authors argue that the GFP tag of Panx1 at the COOH terminus does not interfere with folding since the COOH modified (thrombin cleavage site) Panx1 folds properly, forming an amorphous glob in the cryo-EM structure. However, they do not show that the COOH-modified Panx1 folds properly. It may not, because functional data strongly suggest that the terminal cysteine dives deep into the pore. For example, the terminal cysteine, C426, can form a disulfide bond with an engineered cysteine at position F54 (Sandilos et al. 2012).

      Our manuscript included results of using a non-GFP tagged PANX1 construct (Figure 2-figure supplement 1). We didn’t notice any difference for PANX1 phosphorylation between GFP-tagged and non-GFP-tagged PANX1. Therefore, the folding of the C-terminal tail of PANX1 doesn’t affect the conclusion of our study.

      (4) The authors dismiss the additional arguments for tyrosine phosphorylation of Panx1 given by the various previous studies on Panx1 phosphorylation. These studies did not, as implied, solely rely on the commercial anti-phospho-Panx1 antibodies, but also presented a wealth of independent supporting data. Contrary to the authors' assertion, in the previous papers the pY198 and pY308 antibodies recognized two protein bands in the size range of glycosylated and partial glycosylated Panx1.

      We didn’t dismiss additional arguments for the Src-dependent PANX1 regulation. In fact, in the discussion of our manuscript, we acknowledged the fact that Src may still be involved in PANX1 regulation, but probably through indirect mechanisms. In the two previous studies (2, 3), it’s unclear whether the multimeric bands detected by the pY198/pY308 antibodies correspond to glycosylated PANX1, as the authors did not overlay the protein markers with their blots. In particular, the migration pattern of PANX1 changes across different western blot images from DeLalio et al. (2). It’s also worth noting that none of these “independent supporting data” in the two previous studies provided direct evidence that Src can phosphorylate Y198/Y308.

      (5) A phosphorylation step triggering channel activity of Panx1 would be expected to occur exclusively on proteins embedded in the plasma membrane. The membrane-bound fraction is small in relation to the total protein, which is particularly true for exogenously expressed proteins. Thus, any phosphorylated protein may escape detection when total protein is analyzed. Furthermore, to be of functional consequence, only a small fraction of the channels present in the plasma membrane need to be in the open state. Consequently, only a fraction of the Panx1 protein in the plasma membrane may need to be phosphorylated. Even the high resolution of mass spectroscopy may not be sufficient to detect phosphorylated Panx1 in the absence of enrichment processes.

      We agree with the reviewer that only phosphorylation of plasma membrane-residing Panx1 is functionally relevant. Interestingly, however, previous studies actually analyzed total protein from cell lysate and concluded that PANX1 is phosphorylated at Y198 and Y308 (2, 3). This has motivated our analysis, in which we found that the phosphorylation events cannot be detected when using whole cell lysate. Therefore, we have also conducted an electrophysiology experiment comparing conditions with and without active Src kinase (Figure 7). Our result indicates that PANX1 current is not affected by the presence of Src. This result suggests that even if there is minor Src-mediated phosphorylation below the detection limit of western blot or mass spectrometry, it may not be functionally significant.

      (6) In the electrophysiology experiments described in Figure 7, there is no evidence that the GFP-tagged Panx1 is in the plasma membrane. Instead, the image in Figure 7a shows prominent fluorescence in the cytoplasm. In addition, there is no evidence that the CBX-sensitive currents in 7b are mediated by Panx1-GFP and are not endogenous Panx1. Previous literature suggests that the hPanx1 protein needs to be cleaved (Chiu et al. 2014) or mutated at the amino terminus (Michalski et al 2018) to see voltage-activated currents, so it is not clear that the currents represent hPANX1 voltage-activated currents.

      Our previous analysis has already shown that the endogenous current of non-transfected cells is not sensitive to CBX (4). Therefore, the CBX-sensitive current in cells overexpressing PANX1 is from PANX1-GFP. It should be noted that when a protein is overexpressed, it tends to accumulate at different intracellular membranes during protein synthesis/maturation. However, this does not prevent a portion of the protein from being trafficked to the plasma membrane. In the paper from Michalski et al. 2018, it was shown that WT human/mouse PANX1 displayed voltage-dependent activation (5). Although the current is relatively small, it is clearly distinguishable from non-transfected HEK and CHO cells. This voltage-dependent activation is also sensitive to CBX, consistent with our measurement (Figure 7) (4). When GS is introduced at the N-terminus, the voltage-dependent activation of human/mouse PANX1 is significantly boosted, likely due to the altered NTH conformation resulting from the N-terminal extension.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Literature quotes are still problematic. Why are secondary papers quoted instead of the original work? At least quote reviews by authors who published the original findings.

      We appreciate the reviewer pointing this out. We have carefully checked our references and made sure that the original literature is cited.

      Why does wtPanx1 run close to the 37 kD marker (Figure 2 supplement 1) instead of close to 50 kD as shown in the previous papers using the pY198 and pY308 antibodies?

      It is a common observation that the migration of membrane proteins in SDS-PAGE gels doesn’t correlate with their formula molecular weight, a phenomenon also known as “gel shifting” (6–8). The molecular mechanism of this phenomenon remains complex. Therefore, simply relying on protein molecular weight standards cannot unambiguously identify the PANX1 protein band. This is an issue for identifying the PANX1 band, especially in light of the fact that some antibodies may not be very specific (see Figure 6B). In our experiment, we have correlated the in-gel fluorescence and western blot signal, which allowed us to determine the protein band corresponding to PANX1. It is worth noting that in Figure S3 of DeLalio 2019, PANX1 is detected at 37 kDa (2). However, in many other panels of the paper, PANX1 is detected at close to 50 kDa (for example, Figure S2B).

      Figure 6, supplement 1: why are there oligomers observed in the absence of crosslinking? Why is there no shift in the size of the "oligomers" in response to glycosidase F?

      It is common to observe multimeric membrane proteins, including PANX1, forming oligomeric bands in SDS-PAGE gels, likely because they are not fully denatured or disassembled. PANX1 also contains several free cysteines, which may non-specifically crosslink subunits. There is actually a small shift for the 75 kDa band (dimer) in Figure 6, supplement 1. For higher molecular weight bands, this small shift may not be apparent due to the limited resolution of the gel.

      A positive control for the antibodies used is missing. The authors argue that such controls are not available, since these commercial antibodies are "proprietary".

      We did provide several positive controls for the antibodies in our study. We showed that the anti-PANX1 and anti-Src antibodies unambiguously recognized PANX1 and Src, respectively (Figure 3A), and that a pan-specific phosphotyrosine antibody (P-Tyr-100) unambiguously recognized phosphorylated Src (Figure 3A)—as expected—but did not recognize PANX1. In addition, we demonstrated that the two antibodies in question (anti-PANX1-pY198 and anti-PANX1-pY308) did produce signals in our western blot analysis, but we provided compelling evidence that the bands produced by these antibodies do not correspond to PANX1 (Figure 2B).

      Unfortunately, the epitopes that Millipore Sigma used to generate anti-PANX1-pY198 and anti-PANX1-pY308 are not available. The Millipore Sigma website describes the immunogens as “A linear peptide corresponding to 12 amino acids surrounding phospho-Tyr198 of murine Pannexin-1” and “A linear peptide corresponding to 13 amino acids surrounding phosphotyrosine 308 of rat pannexin-1”. However, these immunogen peptides are not available for us to purchase.

      References

      (1) Nouri-Nejad, D. et al. Pannexin 1 mutation found in melanoma tumor reduces phosphorylation, glycosylation, and trafficking of the channel-forming protein. Mol Biol Cell 32, (2021).

      (2) DeLalio, L. J. et al. Constitutive SRC-mediated phosphorylation of pannexin 1 at tyrosine 198 occurs at the plasma membrane. Journal of Biological Chemistry 294, (2019).

      (3) Weilinger, N. L. et al. Metabotropic NMDA receptor signaling couples Src family kinases to pannexin-1 during excitotoxicity. Nat Neurosci 19, (2016).

      (4) Ruan, Z., Orozco, I. J., Du, J. & Lü, W. Structures of human pannexin 1 reveal ion pathways and mechanism of gating. Nature 584, (2020).

      (5) Michalski, K., Henze, E., Nguyen, P., Lynch, P. & Kawate, T. The weak voltage dependence of pannexin 1 channels can be tuned by N-terminal modifications. Journal of General Physiology 150, (2018).

      (6) Rath, A., Cunningham, F. & Deber, C. M. Acrylamide concentration determines the direction and magnitude of helical membrane protein gel shifts. Proc Natl Acad Sci U S A 110, (2013).

      (7) Rath, A. & Deber, C. M. Correction factors for membrane protein molecular weight readouts on sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Anal Biochem 434, (2013).

      (8) Rath, A., Glibowicka, M., Nadeau, V. G., Chen, G. & Deber, C. M. Detergent binding explains anomalous SDS-PAGE migration of membrane proteins. Proc Natl Acad Sci U S A 106, (2009).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The evolution of non-shivering thermogenesis is of fundamental importance to understand. Here, in small mammals, the contractile apparatus of the muscle is shown to increase energy expenditure upon a drop in ambient temperature. Additionally, in the state of torpor, small hibernators did not show an increase in energy expenditure under the same challenge.

      Strengths:

      The authors have conducted a very well-planned study that has sampled the muscles of large and small hibernators from two continents. Multiple approaches were then used to identify the state of the contractile apparatus, and its energy expenditure under torpor or otherwise.

      Weaknesses:

      There was only one site of biopsy from the animals used (leg). It would be interesting to know if non-shivering thermogenesis is something that is regionally different in the animal, given the core body and distal limbs have different temperatures.

      We thank the reviewer for their time and effort in reviewing our manuscript. We agree that it would be of interest to perform similar experiments on different muscle sites in these animals. This is of particular interest because in some mammals, such as mice, distal limbs do not shiver, and non-shivering thermogenesis may therefore play a more prominent role in heat regulation. Aydin et al. demonstrated that when shivering muscles (soleus) were prevented from undergoing non-shivering thermogenesis via knock-out of UCP1 and were then exposed to cold temperatures, the force production of these muscles was significantly reduced due to prolonged shivering [1]. These results suggest that even in shivering muscle, non-shivering thermogenesis plays a key role in the generation of heat for survival and in the maintenance of muscle performance. Furthermore, there is evidence from garden dormice that muscle temperature during torpor is slightly warmer than abdominal temperature and slightly cooler than heart temperature, which is 7-8°C warmer than abdominal temperature, suggesting the existence of non-shivering thermogenesis in skeletal and cardiac muscles (Giroud et al. in prep) [2]. We have added this information and reference to our discussion to reflect this important point (Discussion, paragraph 6, “As the biopsies which were used…”).

      Reviewer #2:

      Summary:

      The authors utilized (permeabilized) fibers from muscle samples obtained from brown and black bears, squirrels, and Garden dormice, to provide interesting and valuable data regarding changes in myosin conformational states and energetics during hibernation and different types of activity in summer and winter. Assuming that myosin structure is similar between species then its role as a regulator of metabolism would be similar and not different, yet the data reveal some interesting and perplexing differences between the selected hibernating species.

      Strengths:

      The experiments on the permeabilized fibers are complementary, sophisticated, and well-performed, providing new information regarding the characteristics of skeletal muscle fibers between selected hibernating mammalian species under different conditions (summer, interarousal, and winter).

      The studies involve complementary assessments of muscle fiber biochemistry, sarcomeric structure using X-ray diffraction, and proteomic analyses of posttranslational modifications.

      Weaknesses:

      It would be helpful to put these findings on permeabilized fibers into context with the other anatomical/metabolic differences between the species to determine the relative contribution of myosin energetics (with these other contributors) to overall metabolism in these different species, including factors such as fat volume/distribution.

      We thank the reviewer for the time and effort they have put into reviewing our paper and are grateful for the helpful suggestions, which we believe enhance our work (please see below for detailed answers to the critiques).

      Reviewer #3:

      Summary and strengths:

      The manuscript, "Remodelling of skeletal muscle myosin metabolic states in hibernating mammals", by Lewis et al, investigates whether myosin ATP activity may differ between states of hibernation and activity in both large and small mammals. The study interrogates (primarily) permeabilized muscle strips or myofibrils using several state-of-the-art assays, including the mant-ATP assay to investigate ATP utilization of myosin, X-ray diffraction of muscles, proteomics studies, metabolic tests, and computational simulations. The overall data suggests that ATP utilization of myosin during hibernation is different than in active conditions.

      A clear strength of this study is the use of multiple animals that utilize two different states of hibernation or torpor. Two large animal hibernators (Eurasian Brown Bear, American Black Bear) represent large animal hibernators that typically undergo prolonged hibernation. Two small animal hibernators (Garden Dormouse, 13 Lined Ground Squirrel) undergo torpor with more substantial reductions in heart rate and body temperature, but whose torpor bouts are interrupted by short arousals that bring the animals back to near-summer-like metabolic conditions.

      Especially interesting, the investigators analyze the impact that body temperature may have on myosin ATP utilization by performing assays at two different temperatures (8 and 20 degrees C, in 13 Lined Ground Squirrels).

      The multiple assays utilized provide a more comprehensive set of methods with which to test their hypothesis that muscle myosins change their metabolic efficiency during hibernation.

      We thank this reviewer for the effort and time they have put into carefully reviewing our manuscript and have taken on board their valuable suggestions to improve our manuscript (please see below for detailed answers to the critiques).

      Suggestions and potential weaknesses:

      While the samples and assays provide a robust and comprehensive coverage of metabolic needs and testing, the data is less categorical. Some of these may be dependent on sample size or statistical analysis while others may be dependent on interpretation.

      (1) Statistical Analysis

      (1a) The results of this study often cannot be assessed properly due to a lack of clarity in the statistical tests.

      For example, the results related to the large animal hibernators (Figure 1) do not describe the statistical test (in the text of the results, methods, or figure legends). (Similarly for figure 6 and Supplemental Figure 1). Further, it is not clear whether or when the analysis was performed with paired samples. As the methods described, it appears that the Eurasian Brown Bear data should be paired per animal.

      We thank the reviewer for these important points and have added information on the statistical tests used, where previously missing, to each figure legend. Details of the statistical testing used for Figure 6 are listed in the Methods section, paragraph 18, “All statistical analysis of TMT derived protein expression data…”.

      (1b) The statistical methods state that non-parametric testing was utilized "where data was unevenly distributed". Please clarify when this was used.

      We have now clarified all statistical tests used in the figure legends.

      (1c) While there are two different myosin isoforms, the isoform may be considered a factor. It is unclear why a one-way ANOVA is generally used for most of the mant-ATP chase data.

      The reviewer is right: in our analysis, we have not considered ‘myosin isoforms’ as a factor. One of the main reasons is that we decided to treat fibres expressing different myosin heavy chain isoforms as entirely separate entities (not interconnected).

      (1d) While the technical replicates on studies such as the mant-ATP chase assay are well done, the total biological replicates are small. A consideration of the sample power should be included.

      Unfortunately, obtaining additional biological samples from these unique species is challenging. Hence, we have added a statement to the Discussion section on the potential benefits of increasing sample size to increase statistical power (Discussion, paragraph 2, “In contrast to our study hypothesis…”).

      (1e) An analysis of the biological vs statistical significance should be considered, especially for the mant-ATP chase data from the American Black Bear, where there appear to be shifts between the summer and winter data.

      We agree that it is important to be careful when drawing conclusions from data based only on p-values. We agree that the modest differences observed in these data on the American Black Bear, whilst not significant, are worth noting, and we have added these considerations to the manuscript (Discussion, paragraph 2, “In contrast to our study hypothesis…”).

      (2) Consistency of DRX/SRX data.

      (2a) The investigators performed both mant-ATP chase and x-ray diffraction studies to investigate whether myosin heads are in an "on" or "off" state. The results of these two studies do not appear to be fully consistent with each other, which should not be a surprise. The recent work of Mohran et al (PMID 38103642) suggests that the mant-ATP-predicted SRX:DRX proportions are inconsistent with the position of the myosin heads. The discussion appears to lack a detailed assessment of this prior work and lack a substantive assessment contrasting the differing results of the two assays in the current study. i.e. why the current study's mant-ATP chase and x-ray diffraction results differ.

      Prior work on skeletal muscle (observing discrepancies between the Mant-ATP chase assay and X-ray diffraction) is rather scarce. Adding a comprehensive discussion of this may be beyond the scope of the current study and would distract the reader from the main topic. For this reason, we have not added a section. Note that we have other manuscripts in preparation that are specifically dedicated to the discrepancy.

      (2b) The discussion of the current study's x-ray diffraction data relating to the I_1,1/I_1,0 ratio and how substantially different this is to the M6 results merits discussion. i.e. how can myosin both be more primed to contract during IBA versus torpor (according to intensity ratio), but also have less mass near the thick filament (M6).

      The I_1,1/I_1,0 ratio indicates a subtle mass shift towards the myosin thick filament, whilst the M6 spacing shows a more compliant thick filament. These results are not incompatible and rely on interpretation of the X-ray diffraction patterns. To avoid any confusion and avoid distracting the reader from the main topic, we have decided not to speculate on this point.

      (3) Possible interactions with Heat Shock Proteins

      Heat Shock Proteins (HSPs), such as HSP70, have been shown to be differential during torpor vs active states. A brief search of HSP and myosin reveals HSPs related to thick filament assembly and Heat Shock Cognate 70 interacting with myosin binding protein C. Especially given the authors' discussion of protein stability and the potential interaction with myosin binding protein C and the SRX state, the limitation of not assessing HSPs should be discussed. (While HSPs' relation to thick filament assembly might conceivably modify the interpretation of the M3 x-ray diffraction results, this reviewer acknowledges that possibility as a leap.)

      The reviewer raises an interesting and potentially important point about the impact of HSPs and their interaction with the thick filament during hibernation. We have added a section to the discussion of this manuscript regarding this, with particular focus on HSP70 acting as a chaperone for myosin binding protein. However, we feel that it is important to point out that HSPs have only been shown to interact with MYBPC3, a cardiac isoform of this protein which is not present in skeletal muscle [3] (Discussion, paragraph 5, “Of potential further interest to the regulation of myosin…”).

      Despite the above substantial concerns/weaknesses, this reviewer believes that this manuscript represents a valuable data set.

      Other comments related to interpretation:

      (4) The authors briefly mention the study by Toepfer et al [Ref 25] and that it utilizes cardiac muscles. The manuscript would benefit from increased discussion regarding the possible differences in energetics between cardiac and skeletal muscle in these states.

      As this manuscript focuses solely on skeletal muscle, we believe that introducing comparisons between cardiac and skeletal muscles would confuse the reader. For example, these muscle types differ markedly in their regulation of SRX/DRX. Note that we are preparing a manuscript focusing on cardiac muscle and hibernation.

      (5) The authors' analysis of temperature is somewhat limited.

      (5a) First, the authors use 20 degrees C (room temperature), not 37 degrees C, a more physiologic body temperature for large mammals. While it is true that limbs are likely at a lower temperature, 20 degrees C seems substantially outside of a normal range. Thus, temperature differences may have been minimized by the authors' protocol.

      We agree that performing these single-fiber studies at slightly higher temperatures might have better replicated the physiological conditions of the hind leg muscles in the analyzed animals. However, previous work has shown that resting myosin dynamics are in fact stable at temperatures between 20 and 30 degrees Celsius in type I, type II and cardiac mammalian muscle fibers [4].

      (5b) Second, the authors discuss the possibility of myosin contributing to non-shivering thermogenesis. The magnitude of this impact should be discussed. The suggestion of myosin ATP utilization also implies that there is some basal muscle tone (contraction), as the myosin ATPase utilizes ATP to release from actin, before binding and hydrolyzing again. Evidence of this tone should be discussed.

      The reviewer raises an interesting point, and it would indeed be interesting to assess the magnitude of the impact and whether a basal muscle tone exists. Assessing the magnitude of the impact is not an easy task and would require very advanced simulations, in which we are unfortunately not experts. As for basal muscle tone, this is difficult to say, as myosin is not actually binding to actin but hydrolyzing ATP at a faster pace during hibernation. We therefore think that the relation between our data and basal muscle tone is unclear. Hence, we have decided not to discuss these points in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting paper. I have some minor suggestions to help improve it.

      Is there any way to estimate the contribution of contractile apparatus to energy expenditure in reference to what is being generated at SERCA in the resting muscle under the various states examined?

      This is an interesting idea; however, as far as we know, this would be challenging experimentally (in the hibernating mammals) and difficult to achieve in a reliable manner.

      It is important to emphasize that while BAT has been traditionally seen to be the site of NST, the skeletal muscle is very important, especially in large mammals, where BAT is going to be a very small % of the body and unlikely to be able to adequately provide heat. The addition of the contractile apparatus to SERCA as a heat generator at rest is very important -- also, the activation of ryanodine receptor Ca2+ to increase the local [Ca2+] at SERCA to generate heat has also recently been shown and should be mentioned (Meizoso-Huesca et al 2022, PNAS; Singh et al 2023, PNAS) alongside the work of Bal et al 2012 etc...

      We have included these mechanisms and references in the manuscript discussion [5, 6] (Discussion, paragraph 4, “A critical difference between the large hibernators…”).

      Are you able to report the likely proportion of type II fibers in the muscles you have sampled?

      The fiber type breakdown for all animals used in this study is reported in supplementary table 1.

      The sampling of muscle from the legs of live animals is sensible and convenient. Is it possible different muscles in the body have different levels of NST, changes in energy expenditure in torpor, and other states?

      As discussed in the public review, we have added to the discussion of this manuscript to reflect this important point about potentially different results from different muscle sites in these animals.

      Reviewer #2 (Recommendations For The Authors):

      Is it likely that the proportion of fast and slow myosin-heavy chains within the selected sample of myofibers from the different mammals contributes to the overall differences in the energetics of different conformational states? In living animals, how does the relative contribution of the energetics from different muscle fiber types compare with the contribution from other organs to the overall regulation of metabolism during activities in summer, winter, or periods of intermittent arousal?

      Fiber types in mammals can be vastly different between species as well as having a considerable amount of plasticity to change within each species upon specific stimuli. Furthermore, some mammals also have specific myosin heavy chain isoforms which have considerable expression, for example, myosin heavy chain 2B which is expressed in rodents such as mice but not larger mammals such as humans.

      In the manuscript, we demonstrate that there is no significant change in the ATP usage by myosin in resting muscle in any of the species which we examined (Fig 1 F, L; Fig 2 E, J). The relatively high mitochondrial density of type I fibers when compared to type II fibers may contribute to a higher overall requirement of energy storage primarily via lipid oxidation. However, mitochondrial respiration is heavily suppressed during hibernation, so questions remain over the overall energy demand in hibernating muscle beyond myosin [7]. The fact that myosin ATP demand is relatively preserved in hibernating muscle suggests that skeletal muscle may be a relatively energy-demanding organ even during hibernation. We speculate in the manuscript that this may be due to the requirement of maintaining muscular tone and function during this period of prolonged immobilization. This may be of relevance when one considers the almost complete shutdown of organs involved with food intake and breakdown, such as the stomach and liver, during hibernation. Furthermore, heart rate and breathing rates are vastly suppressed. Altogether, whilst it is difficult at this point to make an accurate estimate of energy demands between the different organs of hibernators, our data point to skeletal muscle being a relatively high energy demand organ during these periods. When considering the difference between fiber types, again our data suggest that both type I and type II fibers have relatively similar energy demands during hibernation.

      The supplementary data are quite revealing as to how the myosin isoform composition is stable in some species but highly plastic in others in response to the same environmental/metabolic challenges. Why is the myosin heavy chain isoform (I and II) composition stable for brown bears but not for black bears between summer and winter? This is very interesting. For the Ground squirrel, there is remarkable plasticity between myosin heavy chain isoforms ( I and II) between summer, interbout arousal, and torpor. Yet in the Garden Dormouse, the myosin heavy chain isoform (I and II) composition is stable between these three activity states. The inconsistencies between and within species are perplexing and worthy of closer interrogation.

      The measurements and role of myosin energetics in different conformational states are interesting but need to be explained in context with other metabolic regulators for these hibernating mammals, especially because some species show remarkable plasticity whereas others show remarkable stability. For example, compare brown and black bears, which show differences in the response of myosin composition to activity, interbout arousal, and torpor. Ground squirrels show remarkable plasticity in myosin isoform composition between activity states (and likely metabolic differences), but the Garden Dormouse has a remarkably stable myosin isoform composition during the three metabolic/environmental challenges. What mechanisms facilitate these modifications in some but not other mammals, even those of similar size? The differences are very interesting, worthy of follow-up, and may well contribute to further understanding the significance of the energetics of different myosin conformational states.

      We agree that the changes seen between these species are very interesting and worthy of further investigation. It would be of further interest to look at methods which would allow for even deeper phenotyping, such as single fiber proteomics, to assess the percentage of hybrid fibers and of fibers undergoing any fiber type switch during hibernating periods. Our results show a modest, albeit not significant, increase in the number of type I muscle fibers in 13-lined ground squirrels and Garden dormice during torpor, which is consistent with previous studies [8]. Previous studies have demonstrated that lower temperatures may promote a shift towards more oxidative type I muscle fibers in mammals [9]. This could explain why we see this specifically in the smaller hibernators. However, as we demonstrate and discuss, these lower temperatures are vital for the survival of these smaller mammals during hibernation, so it would be inconsistent to hypothesize that these shifts are for heat-production purposes. Further studies, particularly those with a larger sample size, are warranted to understand the relevance of these shifts. It would also be of interest to examine fiber type percentages during the progression of these long hibernating periods to observe whether these changes are progressive.

      As for the triggers and mechanisms which facilitate these changes to myosin dynamics, these are under active investigation by the field. One which may be of particular relevance to the changes seen during hibernation is that of steroid hormones: previous research has demonstrated that steroid hormone levels in male and female bears change differentially [10]. This may be of relevance as the steroid hormone estradiol has been shown to slow the resting myosin ATP turnover via the binding of myosin RLC [11]. Considering these studies, future work which looks at hibernating animals of each sex as different groups may be fruitful.

      Reviewer #3 (Recommendations For The Authors):

      i. PDF Pg 8- Results- 'Myosin temperature sensitivity is lost in relaxed skeletal muscles fibers of hibernating Ictidomys tridecemlineatus.': An extra comma appears to be placed between "temperature, decrease".

      ii. PDF Pg 9- Results- 'Hyper-phosphorylation of Myh2 predictably stabilizes myosin backbone in hibernating Ictidomys tridecemlineatus.' (last paragraph): A parenthesis needs to be closed upon the first reference to "supplemental figures 2 and 3".

      iii. PDF Pg 15- Methods- 'Samples collection and cryo-preservation'- The authors use the term "individuals" in the 2nd line. Consider using "subjects".

      iv. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (2nd paragraph)- define "subadult" in approximate months or years.

      v. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (2nd paragraph)- The authors state that brown bears were located in "February and again ... in late June". Was this order of operations always held? If so, a comment about how the potential ageing from the hibernation (especially if sub-adult transitions to adulthood in this period) should be included.

      All samples were collected during the subadult period of the lifespan of each bear, and therefore we do not think that a potential aging effect would be observed, considering that the lifespan of this species is 20-30 years.

      vi. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (3rd paragraph)- The justification for deprivation of feeding of black bears 24 hours prior to euthanasia should be included. A comment on how this might impact post-translational modifications or gene expression should be included.

      Animals are fasted prior to euthanasia to prevent aspiration. Considering that these samples are to be compared to samples from animals which have not consumed food or water for five months, the relative impact on PTMs and gene expression would be considered negligible.

      vii. PDF Pg 17- Methods- 'Mant-ATP chase experiments' (just after normalized fluorescence equation): The "Where" may be lowercase.

      viii. PDF Pg 17- Methods- 'Mant-ATP chase experiments' (last paragraph): The protocol for myosin staining, along with the antibody identification (source, catalog number) should be included.

      ix. PDF Pg 18- Methods- 'Post-translational Modification Peptide mapping': Define the makeup of the acrylamide gel and/or the source and catalog number.

      x. PDF Pg 18- Methods- 'Post-translational Modification Peptide mapping': The authors state that "Gel bands were washed..." Please specify which protein bands and if multiple bands (i.e. multiple isoforms) were isolated.

      We thank this reviewer for their careful reading of our manuscript; we have made the changes above where relevant.

      Reference list

      (1) Aydin, J., et al., Nonshivering thermogenesis protects against defective calcium handling in muscle. FASEB J, 2008. 22(11): p. 3919-24.

      (2) Stickler, S., Regional body temperatures and fatty acid compositions in hibernating garden dormice: a focus on cardiac adaptions. 2022, Vienna: Vienna. p. v, 49 Seiten, Illustrationen.

      (3) Glazier, A.A., et al., HSC70 is a chaperone for wild-type and mutant cardiac myosin binding protein C. JCI Insight, 2018. 3(11).

      (4) Walklate, J., et al., Exploring the super-relaxed state of myosin in myofibrils from fast-twitch, slow-twitch, and cardiac muscle. Journal of Biological Chemistry, 2022. 298(3).

      (5) Meizoso-Huesca, A., et al., Ca2+ leak through ryanodine receptor 1 regulates thermogenesis in resting skeletal muscle. Proceedings of the National Academy of Sciences, 2022. 119(4): p. e2119203119.

      (6) Singh, D.P., et al., Evolutionary isolation of ryanodine receptor isoform 1 for muscle-based thermogenesis in mammals. Proceedings of the National Academy of Sciences, 2023. 120(4): p. e2117503120.

      (7) Staples, J.F., K.E. Mathers, and B.M. Duffy, Mitochondrial Metabolism in Hibernation: Regulation and Implications. Physiology, 2022. 37(5): p. 260-271.

      (8) Xu, R., et al., Hibernating squirrel muscle activates the endurance exercise pathway despite prolonged immobilization. Exp Neurol, 2013. 247: p. 392-401.

      (9) Yu, J., et al., Effects of Cold Exposure on Performance and Skeletal Muscle Fiber in Weaned Piglets. Animals (Basel), 2021. 11(7).

      (10) Frøbert, A.M., et al., Differential Changes in Circulating Steroid Hormones in Hibernating Brown Bears: Preliminary Conclusions and Caveats. Physiol Biochem Zool, 2022. 95(5): p. 365-378.

      (11) Colson, B.A., et al., The myosin super-relaxed state is disrupted by estradiol deficiency. Biochemical and biophysical research communications, 2015. 456(1): p. 151-155.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      Comments on revised version:

      The authors have satisfactorily addressed my concerns.

      I suggest some minor edits, however. Line 747 does not mention MARK3 and neither does the Figure 8 legend (just MARK2). It would be helpful if the authors could include references to the papers reporting the shown structures in the Figure 8 legend.

      We have added MARK3 and related references in the revised Figure 8 legend.

      Reviewer #2:

      I would recommend that the catalog numbers of the different antibodies used in the study, mainly CST and Invitrogen, be listed in the Materials and Methods (see Methods/Recombinant proteins and general reagents).

      Thank you for the comment. We have now added the antibody catalog numbers in the revised methods section.

      I have one remark related to question number 5 (my question was not clear enough). I meant if the authors did look at the functional relevance of the residues implicated in the identified salt-bridge network/tethers. What happens to the proteins functionally when you mutate those residues? (represented on Fig. 8).

      Otherwise, the authors have satisfactorily addressed my concerns.

      Yes, we have analyzed the stability of the salt bridge interaction in the context of cysteine mutations, and our findings are described in the results section titled “Cysteine mutations alter critical structural interactions required for kinase allosteric regulation” (Figure 6). However, we have not performed mutational analysis of the salt bridge residues, as we feel this would be beyond the scope of the current study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): Weaknesses:

      However, the molecular mechanisms leading to NPC dysfunction and the cellular consequences of resulting compartmentalization defects are not as thoroughly explored. Results from complementary key experiments using western blot analysis are less impressive than microscopy data and do not show the same level of reduction. The antibodies recognizing multiple nucleoporins (RL1 and Mab414) could have been used to identify specific nucleoporins that are most affected, while the selection of Nup98 and Nup107 is not well explained.

      The Western blot results are less impressive than the single-nucleus imaging analysis because the nuclei isolated from whole brain are heterogeneous and include non-neuronal cells. For this reason, we selected specific nucleoporins for Western blot studies to complement the nonspecificity of pan-NPC antibodies, for which detection is based on glycosylated moieties. We reasoned that a combination of pan-NPC and select NUPs would give the strongest complementary validation of the mutant phenotype. We have discussed the rationale for NUP selection in the discussion. In brief, we selected NUP107 as it is a major component of the Y-scaffold complex and is a long-lived subunit of the NPCs (Boehmer et al., 2003; D'Angelo et al., 2009). NUP98 is a mobile nucleoporin and is associated with the central pore, nuclear basket and cytoplasmic filaments. Both NUPs have been implicated in degenerative disorders (Eftekharzadeh et al., 2018; Wu et al., 2001).

      There is also no clear hypothesis on how Aβ pathology may affect nucleoporin levels and NPC function. All functional NCT experiments are based on reporters or dyes, although one would expect widespread mislocalization of endogenous proteins, likely affecting many cellular pathways.

      We agree that the interaction between Aβ pathology and the NPC remains a work in progress. We decided to rigorously characterize Aβ-mediated deficits in App KI neurons – using different approaches and in more than one animal model – before moving on to explore mechanisms in subsequent studies, which we think deserve more extensive experiments. We seek your understanding and have included in the discussion possible mechanisms for direct and indirect Aβ-mediated disruption of NPCs. We have also included an additional study showing disrupted localization of an endogenous nucleocytoplasmic protein, CRTC1 (CREB-regulated transcription coactivator 1), a CREB coactivator responsive to neural activity. Under both basal and tetrodotoxin-silenced conditions, we observed much higher levels of CRTC1 in the nucleus of App KI neurons relative to WT. This reflects the compromised permeability barrier that we observed via FRAP studies (Supplementary Figure S15).

      The second part of this manuscript reports that in App KI neurons, disruption in the permeability barrier and nucleocytoplasmic transport may enhance activation of key components of the necrosome complex that include receptor-interacting kinase 3 (RIPK3) and mixed lineage kinase domain-like (MLKL) protein, resulting in an increase in TNFα-induced necroptosis. While this is of potential interest, it is not well integrated in the study. This potential disease pathway is not shown in the very simple schematic (Fig. 8) and is barely mentioned in the Discussion section, although it would deserve a more thorough examination.

      The study of necroptosis is meant to showcase a single cellular pathway that requires nucleocytoplasmic transport for activation, is compromised, and is relevant for AD. We agree there is much more to explore in this pathway but feel it is outside the scope of this study. We have included a new illustration that models how damage to NPCs and the permeability barrier results in enhanced vulnerability of App KI neurons to necroptosis (Supplementary Figure S12).

      Reviewer #2 (Public Review):

      (1) Adding statistics and comparisons between wild-type changes at different times/ages to determine if the nuclear pore changes with time in wild-type neurons. The images show differences in the Nuclear pore in neurons from the wild-type mice, with time in culture and age. However, a rigorous statistical analysis is lacking to address the impact of age/development on NUP function. Although the authors state that nuclear pore transport is reported to be altered in normal brain aging, the authors either did not design their experiments to account for the normal aging mechanisms or overlooked the analysis of their data in this light.

      All our quantifications and statistical comparisons in neuron cocultures are time-matched between WT and App KI neurons, and thus independent of the age and maturity of the neurons in culture. The accelerated loss of NUP expression is evident across all time groups. However, we cannot compare across age groups in cultured neurons, as the time-matched WT and App KI samples for each time point were processed and imaged separately as neurons matured over time (Fig. 1B-C). To account for time-dependent changes and compare age-related effects in WT and App KI neurons, an experiment would have to be performed simultaneously across all age groups. Given the unique challenges of studying “aging” in culture systems, we opted to be more conservative in our interpretation of the results; as such, we were careful to describe the accelerated nuclear pore deficits in App KI neurons relative to time-matched WT expression and to speculate on their relationship to normal brain aging only in the discussion section. We seek your understanding in this matter. That said, we were able to capture the decline of the NPC in histology of brain sections and observed a statistically significant drop in WT NUP levels across age groups when we quantified and compared the raw nuclear intensities from brain sections that were processed and imaged simultaneously across independent experiments (Fig. 1D-E). We have included a statement in the results section to highlight that point.

      (2) Add experiments to assess the contribution of wild-type beta-amyloid accumulation with aging. As described in 2012 (Guix FX, Wahle T, Vennekens K, Snellinx A, Chávez-Gutiérrez L, Ill-Raga G, Ramos-Fernandez E, Guardia-Laguarta C, Lleó A, Arimon M, Berezovska O, Muñoz FJ, Dotti CG, De Strooper B. 2012. Modification of γ-secretase by nitrosative stress links neuronal ageing to sporadic Alzheimer's disease. EMBO Mol Med 4:660-673, doi:10.1002/emmm.201200243) and 2021 (Burrinha T, Martinsson I, Gomes R, Terrasso AP, Gouras GK, Almeida CG. 2021. Upregulation of APP endocytosis by neuronal aging drives amyloid-dependent synapse loss. J Cell Sci 134. doi:10.1242/jcs.255752), 28 DIV neurons are senescent and accumulate beta-amyloid 42. In addition, beta-amyloid 42 accumulates normally in the human brain (Baker-Nigh A, Vahedi S, Davis EG, Weintraub S, Bigio EH, Klein WL, Geula C. 2015. Neuronal amyloid-β accumulation within cholinergic basal forebrain in ageing and Alzheimer's disease. Brain 138:1722-1737. doi:10.1093/brain/awv024); thus, it would be important to determine if it contributes to NUP dysfunction. Unfortunately, the authors tested the Abeta contribution at DIV14, when wild-type Abeta accumulation was undetected. It would enrich the paper and allow the authors to draw conclusions about normal aging if additional experiments were performed, namely, treating 28 DIV neurons with DAPT and assessing if NUP is restored.

      Your point is well noted. We are intrigued by the potential contribution of WT Aβ to the decline in NUPs and NPCs but decided to focus on mutant Aβ for this manuscript. We observed negligible MOAB2-positive Aβ signals in WT neurons across all age groups (data not shown) but acknowledge the potential contribution of aging to a reduction in NPC function. Instead, we have included a section in the discussion to highlight the aging-related expression of Aβ in WT neurons and a subset of the citations above to indicate a possible link with the normal decay of NPCs.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) It does not consider the relationship of the findings here to other published work on the intraneuronal perinuclear and nuclear accumulation of amyloid in other transgenic mouse models and in humans.

      We have updated the discussion to further elaborate on intraneuronal and perinuclear accumulation of amyloid and how that relates to our NPC phenotype.

      (2) It appears to presume that soluble, secreted Abeta is responsible for the effect rather than the insoluble amyloid fibrils.

      At present, our data cannot fully discount the role of fibrils or other forms of Aβ causing the NPC deficits, but our studies do show that external presence of Aβ (e.g. addition of synthetic oligomeric Aβ or App KI conditioned media) leads to intracellular accumulation and NPC dysfunction. We are aware that endogenous formation of fibrils could also contribute to the NPC dysfunction but refrained from drawing any conclusions without further studies. We have stated this in the discussion.

      (5) It is not clear when the alteration in NUP expression begins in the KI mice as there is no time at which there is no difference between NUP expression in KI and Wt and the earliest time shown is 2 months. If NUP expression is decreased from the earliest times at birth, then this makes the significance of the observation of the association with amyloid pathology less clear.

      The phenotype we observed early in neuronal cultures and in very young animals is subtle, and in all our studies the severity of the NUP phenotypes consistently correlates with elevated intracellular Aβ. We expect that in earlier/younger neurons the deficits will not be present. However, neurons before DIV7 are immature, and hence we chose not to include them in our observations. In animals, we observed Aβ expression in neuronal soma in young mice (2 mo.), but it is not clear when the deficits manifest and how early to look. While NUP expression is reduced at an early stage, we speculate in the discussion that cellular homeostatic mechanisms can compensate for any compromised nuclear functions and maintain viability up to the point where age-dependent degradation of cellular mechanisms eventually leads to progression of AD.

      Reviewer #1 (Recommendations For The Authors):

      While the App KI model is suitable for modeling one key aspect of human AD, the use of the term "AD neurons" throughout the manuscript is misleading and should be avoided when describing experiments with "App KI neurons".

      Noted and corrected.

      The claim that Aβ pathology causes NPC dysfunction via reduced nucleoporin protein expression would be stronger if it was better supported by biochemical evidence based on western blots (WBs) to complement the strong microscopy data. The results shown in Figure 2H show a very weak effect compared to microscopy data that does not appear to match the quantification (e.g. Lamin-B1 staining appears reduced after 2 months in WB but not the graph). It is also not clear why nuclear fractionation is required. WB analyses with RL1 and MAB414 (that recognizes multiple FG-Nups in ICCs and WBs) would help identify Nups that are most affected by Aβ pathology.

      The weaker Western blot results are due to the heterogeneity of the nuclei we isolated from the whole brain, which include non-neuronal cells. We reasoned that isolating the nuclear fraction would give us a cleaner Western blot with fewer background bands, as the input lysate is more specific. We also decided to use antibodies against specific NUPs as a way to complement the pan-NPC antibodies that detect glycosylation-enriched epitopes in the nucleus. We reasoned that Western blot identification of individual subunits should provide complementary and stronger evidence for the reduction of NUPs at the peptide level. Overall, we used four different nuclear pore antibodies (RL1, Mab414, NUP98, NUP107) to demonstrate the same mutant phenotype in App KI neurons.

      While the observed NCT defects are discussed in detail, the authors do not present any potential mechanisms to be tested, how intracellular Aβ may impact NPCs. Does Aβ pathology affect nucleoporin expression or stability?

      We have observed the presence of Aβ adjacent to the nuclear membrane and also in the cytosol via high-resolution confocal microscopy (Supplementary Figure S14). Our primary goal in this paper is to provide convincing evidence – using different assays and in more than one mouse model – for the reduction of NUPs and lower NPC counts. We feel that the mechanistic details of Aβ-driven NPC disruption require more extensive experimentation that is more suitable for subsequent publications.

      The very simple schematic just represents the loss of compartmentalization, without illustrating more complex concepts. It would also be improved by representing the outer and inner nuclear membrane fusing around the NPCs with a much wider perinuclear space between the membranes. As shown now, the nuclear envelope almost looks like a single membrane, while >60kDa proteins are shown at a similar size as the 125MDa NPC.

      We have updated the illustration along with a new schematic for necroptosis (Supplementary Figure S12). We have refrained from giving specific details of the damage to the nuclear pore complex because the nature of these deficits is not yet clear.

      Misspelling of "Hoechst" as "Hochest" in several figures (Fig. 1, 2, S5, S7).

      Noted and corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Additional data analysis is required concerning the wild-type controls. The figures show clear differences in the wild-type neurons with time in culture (referring to figures 1A, 1B, 1C; 2A, 2B, 2C, 2D, 6E, 6F, 6G, S4) and in different ages (2E, 2F, 2G, 5B, 5C, 5D). The data analysis is shown for knock-in vs the time-matched wild-type condition. The effect of time in wild-type neurons/mice should also be analyzed. All the data is suggested to be normalized to 7 DIV/2-month wild-type neurons/mice. Were these experiments done with different time points of the same culture? This would be the best to conclude on the effect of time.

      We have noted a decline of NUPs in WT neurons over time in primary cultures and in animal sections. This is not surprising since the NPC and nuclear signaling pathways deteriorate with age (Liu and Hetzer, 2022; Mertens et al., 2015). However, we are unable to do a direct comparison across age groups in cultured neurons, as the time-matched WT and App KI neuronal samples for each time point were processed and imaged separately as neurons matured over time (Fig. 1B-C). Hence, we performed statistical analyses between time-matched WT and App KI neurons at each time point. To be clear, multiple independent experiments across different cultures were performed at each time point. Given the inherent challenges of studying aging in culture systems, we opted to be more conservative in our interpretation of the results; as such, we were careful to describe the accelerated nuclear pore deficits in App KI neurons relative to WT levels without inferring the effect of time, and to speculate on their relationship to normal brain aging only in the discussion section. That said, we were able to capture the decline of the nuclear pore complex across different age groups in histology of brain sections, where we observed a drop in WT NUP levels when we quantified and compared the raw nuclear intensities from brain sections that were processed and imaged simultaneously across independent experiments (Fig. 1D-E).

      Similarly, in Figure 2H, why aren't 2 months compared with 14 months? Why were these ages chosen? 2 months is a young adult, and 14 months is a middle-aged adult. To conclude, aging should have included an age between 18 and 24 months old.

      As with cultures, we processed age-matched WT and App KI animals separately for each age group. We chose 2 and 14 months as they represent young and middle-aged adults, and we wanted to showcase the nuclear pore deficits induced by the presence of Aβ without drawing a conclusion on the effects of age or time. That said, we do show histology of brain sections at 18 months of age with individual NUPs. We agree that the temporal aspects of NPC loss in WT neurons are interesting; however, given our experimental parameters, we cannot draw conclusions across different age groups at the moment.

      In Figure 3, statistics between wild type should have been included.

      Similar to the above comment, samples were processed and imaged independently across different groups, hence we cannot compare the datapoints across time.

      (4) Additional quantification: The intensity of MOAB2 at 2 and 13 months should be measured as in Figure 3C.

      The intracellular Aβ signal in 2-mo.-old App KI mice is diffuse throughout the soma, but in older animals it is punctate. This observation was similarly described by Lord et al. for tgAPPArcSwe mice (Lord et al., 2006). We have included a confocal micrograph of MOAB-2 immunocytochemistry of a 13-mo. App KI brain section in the supplemental figures (Supplementary Figure S13). We found it challenging to determine whether the signal is localized intracellularly or represents an extracellular aggregate. Regardless, the differences in the quality and the uneven distribution of the Aβ signal make any direct comparison of soma intensity across the different age groups harder to interpret in the context of the mutant phenotype.

      (5) Additional experiments: Because primary neurons differentiate, mature, and age with time in culture, it is necessary to control for the developmental stage of your cultures, for example by analyzing neuronal markers such as doublecortin for neuronal precursors, MAP2 (or Tau) for dendritic/axonal maturation, synapsin for synaptic maturation, and accumulation of senescence-associated beta-galactosidase (SA-Beta-Gal) as an aging marker.

      As part of the maintenance of cultures, we stain for axodendritic markers (e.g. MAP2), glial cell distribution (e.g. GFAP), excitatory vs. inhibitory neuronal subpopulations (e.g. Gad65) and synaptic markers (e.g. PSD95) to ensure that the growth, survival and viability of neurons are not compromised (data not shown). These markers of maturity are routinely tracked to ensure proper development. We also test the health of the cultures (e.g. apoptosis, necrosis) and look for cytoskeletal disruption or fragmentation of neuronal processes.

      (6) Additional methods: The quantification of Abeta intensity in Figure 3 is not clearly explained in the methods. Was the intensity measured per field, per cell body?

      The quantifications for Aβ were done for each MAP2-positive cell body, and we have included that statement in the methods.

      (7) Missing in discussion integration and references to these papers:

      a. Mertens J, Paquola ACM, Ku M, Hatch E, Böhnke L, Ladjevardi S, McGrath S, Campbell B, Lee H, Herdy JR, Gonçalves JT, Toda T, Kim Y, Winkler J, Yao J, Hetzer MW, Gage FH. 2015. Directly Reprogrammed Human Neurons Retain Aging-Associated Transcriptomic Signatures and Reveal Age-Related Nucleocytoplasmic Defects. Cell Stem Cell 17:705-718. doi:10.1016/j.stem.2015.09.001

      b. Guix FX, Wahle T, Vennekens K, Snellinx A, Chávez-Gutiérrez L, Ill-Raga G, Ramos-Fernandez E, Guardia-Laguarta C, Lleó A, Arimon M, Berezovska O, Muñoz FJ, Dotti CG, De Strooper B. 2012. Modification of γ-secretase by nitrosative stress links neuronal ageing to sporadic Alzheimer's disease. EMBO Mol Med 4:660-673. doi:10.1002/emmm.201200243

      c. Burrinha T, Martinsson I, Gomes R, Terrasso AP, Gouras GK, Almeida CG. 2021. Upregulation of APP endocytosis by neuronal aging drives amyloid-dependent synapse loss. J Cell Sci 134. doi:10.1242/jcs.255752

      d. Baker-Nigh A, Vahedi S, Davis EG, Weintraub S, Bigio EH, Klein WL, Geula C. 2015. Neuronal amyloid-β accumulation within cholinergic basal forebrain in ageing and Alzheimer's disease. Brain 138:1722-1737. doi:10.1093/brain/awv024

      We have cited a subset of the papers in the discussion section and also expanded the discussion to include the possibility of time-dependent changes for Aβ expression in WT neurons.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments:

      (1) Fig. 1D,E. Fig. 2E, F. This shows the change in NUP IR with time for the APP-KI, but there is also a difference between Wt and KI from the earliest time shown. How early is this difference apparent? From birth? The study should go back to the earliest time possible as the timing of the staining for NUP is important to correlate this with other events of intraneuronal Abeta and amyloid IR. Is the difference between 4 and 7-month ko mice in Figures 2G and 2F statistically significant? If not, perhaps we need a larger N to determine the timing accurately.

      The point is well taken. We have not examined the WT and App KI brains before 2 mo. of age. At this early time point, the extracellular amyloid deposits are very low, but intracellular Aβ can be readily detected in neuronal soma. We expect that as the animal ages, the Aβ inside cells will directly impact the NPC mutant phenotype, but it is unclear how early this phenotype manifests in animals and when we should look. To be clear, in less mature neurons (DIV7), the phenotype is very subtle and can only be observed via high-resolution microscopy. The differences between 4- and 7-mo.-old animals (Fig. 2F and G) in terms of the severity of the reduction cannot be assessed, as the age-matched animals for each time point were processed separately, but at each time point we observed a significant reduction of NPC relative to WT. Nevertheless, in Figure 1E, we performed immunohistochemistry experiments with pan-NPC antibodies and quantified raw intensities to show a difference between 4/7-mo.- and 13-mo.-old animals.

      (2) Similarly, the increase in Abeta IR is only shown for cultured neurons and only a single time point of 2 months is shown for CA1 in KI brain. Since a major point is that the decrease in NUP IR is correlated with an increase in Abeta IR, a more convincing approach would be to stain for both simultaneously in KI brain, especially since Abeta IR is quite sensitive to conformational variation between APP, Abeta, and aggregated forms and whether they are treated with denaturants for "antigen retrieval". The entire brain hemisphere should be shown as the pathology is not limited to CA1. There are many different Abeta antibodies that are specific to the amyloid state so it should be possible to come up with a set of antibodies and conditions that work for both Abeta and NUP staining.

      The intracellular Aβ signal in 2-mo.-old App KI mice is diffuse throughout the soma, but in older animals it is punctate. We have included a confocal micrograph of MOAB-2 immunocytochemistry of a 13-mo. App KI brain section (Supplementary Figure S13). We did not quantify Aβ, as it was challenging to determine whether the signal is intracellular Aβ or amyloid β plaques. Regardless, the differences in the quality and the uneven distribution of the Aβ signal make any direct comparison of soma intensity across the different age groups much harder to interpret.

      (3) Figure 3A. The staining with MOAB 2 and 82E1 appears qualitatively different with 82E1 exhibiting larger perinuclear puncta. Both antibodies appear to stain puncta inside the nucleus consistent with previously published reports of intranuclear amyloid IR. If these are flattened images, then 3D Z stacks should be shown to clarify this. Figure 3H shows what appears to be Abeta immunofluorescence quantitation in DAPT-treated cells, but the actual images are apparently not shown. The details of this experiment aren't clear or what antibody is used, but this may not be Abeta as many APP fragments that are not Abeta also react with antibodies like MOAB2.

      Since 82E1 detects a larger epitope (aa1-16 as compared to 1-4 in MOAB-2), it is possible some forms of Aβ are differentially detected inside the cell. MOAB-2 is shown to detect the different forms of Aβ40 and 42, with a stronger selectivity for the latter. However, it is not known to react with APP or APP/CTFs (Youmans et al., 2012). DAPT-treated cells were processed and imaged as with other experiments in figure 3 using MOAB-2 antibodies to detect Aβ. We have included that information in the figure legends.

      We image each cell by collecting LSM800 confocal stacks and using IMARIS software to render the nucleus as a 3D object prior to quantifying the intensity or coverage. In this way, we capture and quantify the entire volume of the nucleus and not just a single plane. The majority of the MOAB-2-positive Aβ signal is punctate and located in the cytosol, with a subset adjacent to the nucleus (Supplementary Figure S14; Airyscan; single plane). We also detected MOAB-2 signals coming from within the nucleus. The nature of this interaction between Aβ and the nuclear membrane/perinuclear space/nucleoplasm remains unclear.

      (4) P20 L12. "We demonstrate an Aβ-driven loss of NUP expression in hippocampal neurons both in primary cocultures and in AD mouse models" It isn't clear that exogenous or extracellular Abeta drives this in living animals. All the data that demonstrate this is derived from cell culture and things may be very different (eg. Soluble Abeta concentration) in vivo. It is OK to speculate that the same thing happens in vivo, but to say it has been demonstrated in vivo is not correct.

      We have rewritten the opening statement in the paragraph to narrowly define our observations in the context of App KI. We understand the caveats of our studies in primary cultures, but we have done our due diligence to study the phenomenon in different assays, using at least four different nuclear pore antibodies, and in more than one mouse model to show the deficits. We mentioned Aβ-driven loss but did not conclude which Aβ peptide (e.g. 40 vs. 42) or form (e.g. fibrillar) drives the deficits. However, we have shown data indicating that oligomers, but not monomers, as well as extracellular Aβ, can accumulate in the soma and trigger NPC deficits. We also state in the discussion that other possible mechanisms of action, mainly via indirect interactions of Aβ with the cell, could result in the deficits.

      (5) P21, L21 "Inhibition of γ-secretase activity prevented cleavage of mutant APP and generation of Aβ, which led to the partial restoration of NUP levels". What the data actually shows is that treatment of the cells with DAPT led to partial restoration of NUP levels. Other studies have shown that DAPT is a gamma secretase inhibitor, so it is reasonable to suspect that the effect is due to gamma secretase activity, but the substrates and products are assumed rather than measured, so a little caution is a good idea here. For example, CTF alpha is also a substrate, producing P3, which is not considered Abeta. The products Abeta and P3 also typically are secreted, where they can be further degraded. Abeta and P3 can also aggregate into amyloid, so whether the effect is really due to Abeta per se as a monomer or Abeta-containing aggregates isn't clear.

      The point is noted. DAPT inhibition of γ-secretase can impact more than one substrate, as the complex can cleave multiple substrates. However, we have measured Aβ intensity, which decreases with DAPT, and while a single experiment is insufficient to show direct Aβ involvement, we have performed other experiments that show a correlation between Aβ levels inside the soma and the degree of NPC reduction. These include the direct application of synthetic Aβ42 oligomers. We agree that the data cannot fully exclude the involvement of other γ-secretase cleavage products, but we feel there is strong enough evidence that Aβ – in whatever form – is at least a partial, if not the main, driver of these deficits.

      (6) Discussion. The authors point to "intracellular Abeta" as a potential causative agent for decreased NUP expression and function and cite a number of papers reporting intracellular Abeta. (D'Andrea et al., 2001; Iulita et al., 2014; Kimura et al., 2003; LaFerla et al., 1997; Oddo et al., 2003b; Takahashi et al., 2004; Wirths et al., 2001). Most of these papers report immunoreactivity with Abeta antibodies and argue about whether this is really Abeta40 or 42 and not APP or APP-CTF immunoreactivity. What is missing from these papers and the discussion in this manuscript is that this is not just soluble Abeta, but Abeta amyloid of the same type that ends up in plaques because it has the same immunoreactivity with Abeta amyloid fibril-specific antibodies and even the classical anti-Abeta antibodies 6E10 and 4G8 after antigen retrieval as shown in papers by Pensalfini, et al., 2014 and Lee, et al., 2022 (1,2) who describe the evolution of neuritic plaques and their amyloid core beginning inside neurons. The term "dystrophic neurite" is a misnomer because the structures that resemble "neurites" morphologically are actually autophagic vesicles packed with Abeta and APP immunoreactive material which has the detergent insolubility properties of amyloid plaques. See (1,2). The apparent intranuclear IR of MOAB2 and 82E1 mentioned in comment 3 is relevant here. In Lee et al., the 3D serial section EM reconstruction of one of these neurons with perinuclear and nuclear amyloid shows abundant amyloid fibrils in the remnant of the nucleus. The nuclear envelope appears to break down as evidenced by the redistribution of NeuN immunoreactivity (Pensalfini et al.,) and other nuclear markers and the EM evidence (Lee et al.,). These papers are also improperly cited as evidence for a hypothetical intracellular source for soluble Abeta.

      We have devoted a section of the discussion to highlight some of these findings in the context of Pensalfini et al. 2014 and Lee et al. 2022. Lee et al. tested multiple animal strains to observe the Panthos structures but did not use the App KI mouse model. Since none of our experiments directly tested their observations (e.g. perinuclear fibrils or acidity of autophagic vesicles) in App KI, we decided to take a more conservative approach in our interpretations by framing the NPC deficits without specifying the nature of the intracellular Aβ. We note in discussion that it is entirely possible that App KI animals also show the same Panthos phenotypes and the perinuclear accumulation of Aβ which results in damaged NUPs. To do that, the Panthos phenotype must first be established in App KI mice.

      (7) The authors also cite the work of Ditaranto et al., 2001 and Ji et al., 2002 for Aβ-induced lysosomal leakage from these vesicular structures but overlook the original publications on Abeta-induced lysosomal leakage by Yang et al., (3) who further show that this is correlated with aggregation of Abeta42 upon internalization which also leads to the co-aggregation of APP and APP-CTFs in a detergent-insoluble form (4) and pulse-chase studies demonstrate that metabolically-labeled APP ultimately ends up as insoluble Abeta that have "ragged" N-termini (5). This work seems relevant to the results reported here as the perinuclear amyloid that the authors report here is likely to be the same insoluble, aggregated APP and APP-CTF-containing amyloid as that reported in references 1 and 2.

      We have included the literature references in the discussion, highlighting the possibility of lysosomal leakage contributing to the NPC damage.

      Minor points.

      (1) P2, L28 "permeability barrier facilities passive" should be 'facilitates'.

      (2) P7, L24 "homogenate and grounded for 5 additional strokes" One of the peculiarities of English is that the past tense of grind is ground. Grounded means something else.

      (3) P8, L9 "For synthetic Aβ experiments," Abeta what? 42? 40? It makes a difference and if it is Abeta42, you should be specific in the rest of the text where it is used.

      (4) P11, L14. "To determine if Aβ can trigger changes in nuclear structure and function" It seems a little early to start by presupposing that it is Abeta that triggers changes in nuclear structure and function. It sounds like you are starting out with a bias.

      (5) P11, L16,17 "While Aβ pathology is robustly detected in App KIs" At some point in the manuscript, either here or in the introduction, it would be useful to include a couple of sentences about what the pathology is in these mice along with the timing of the development of the pathology to compare with the results presented here. There are several types of amyloid deposits, "neuritic" plaques, diffuse plaques, and cerebrovascular amyloid. This is important because the early "neuritic" plaques are intraneuronal at least early on before the neuron dies. See (1,2).

      (6) P19, L10. "LMB is an inhibitor or CRM-1 mediated" should be of

      All minor points have been addressed in the manuscript and figures.

      References

      (1) Pensalfini, A., Albay, R., 3rd, Rasool, S., Wu, J. W., Hatami, A., Arai, H., Margol, L., Milton, S., Poon, W. W., Corrada, M. M., Kawas, C. H., and Glabe, C. G. (2014) Intracellular amyloid and the neuronal origin of Alzheimer neuritic plaques. Neurobiol Dis 71C, 53-61

      (2) Lee, J. H., Yang, D. S., Goulbourne, C. N., Im, E., Stavrides, P., Pensalfini, A., Chan, H., Bouchet-Marquis, C., Bleiwas, C., Berg, M. J., Huo, C., Peddy, J., Pawlik, M., Levy, E., Rao, M., Staufenbiel, M., and Nixon, R. A. (2022) Faulty autolysosome acidification in Alzheimer’s disease mouse models induces autophagic build-up of Abeta in neurons, yielding senile plaques. Nat Neurosci 25, 688-701

      (3) Yang, A. J., Chandswangbhuvana, D., Margol, L., and Glabe, C. G. (1998) Loss of endosomal/lysosomal membrane impermeability is an early event in amyloid Aβ1-42 pathogenesis. J. Neurosci. Res. 52, 691-698

      (4) Yang, A. J., Knauer, M., Burdick, D. A., and Glabe, C. (1995) Intracellular A beta 1-42 aggregates stimulate the accumulation of stable, insoluble amyloidogenic fragments of the amyloid precursor protein in transfected cells. J Biol Chem 270, 14786-14792

      (5) Yang, A., Chandswangbhuvana, D., Shu, T., Henschen, A., and Glabe, C. G. (1999) Intracellular accumulation of insoluble, newly synthesized Aβn-42 in APP transfected cells that have been treated with Aβ1-42. J. Biol. Chem. 274, 20650-20656

      References

      Boehmer, T., Enninga, J., Dales, S., Blobel, G., and Zhong, H. (2003). Depletion of a single nucleoporin, Nup107, prevents the assembly of a subset of nucleoporins into the nuclear pore complex. Proc Natl Acad Sci U S A 100, 981-985.

      D'Angelo, M.A., Raices, M., Panowski, S.H., and Hetzer, M.W. (2009). Age-dependent deterioration of nuclear pore complexes causes a loss of nuclear integrity in postmitotic cells. Cell 136, 284-295.

      Eftekharzadeh, B., Daigle, J.G., Kapinos, L.E., Coyne, A., Schiantarelli, J., Carlomagno, Y., Cook, C., Miller, S.J., Dujardin, S., Amaral, A.S., et al. (2018). Tau Protein Disrupts Nucleocytoplasmic Transport in Alzheimer's Disease. Neuron 99, 925-940 e927.

      Liu, J., and Hetzer, M.W. (2022). Nuclear pore complex maintenance and implications for age-related diseases. Trends Cell Biol 32, 216-227.

      Lord, A., Kalimo, H., Eckman, C., Zhang, X.Q., Lannfelt, L., and Nilsson, L.N. (2006). The Arctic Alzheimer mutation facilitates early intraneuronal Abeta aggregation and senile plaque formation in transgenic mice. Neurobiol Aging 27, 67-77.

      Mertens, J., Paquola, A.C., Ku, M., Hatch, E., Bohnke, L., Ladjevardi, S., McGrath, S., Campbell, B., Lee, H., Herdy, J.R., et al. (2015). Directly Reprogrammed Human Neurons Retain Aging-Associated Transcriptomic Signatures and Reveal Age-Related Nucleocytoplasmic Defects. Cell stem cell 17, 705-718.

      Wu, X., Kasper, L.H., Mantcheva, R.T., Mantchev, G.T., Springett, M.J., and van Deursen, J.M. (2001). Disruption of the FG nucleoporin NUP98 causes selective changes in nuclear pore complex stoichiometry and function. Proc Natl Acad Sci U S A 98, 3191-3196.

      Youmans, K.L., Tai, L.M., Kanekiyo, T., Stine, W.B., Jr., Michon, S.C., Nwabuisi-Heath, E., Manelli, A.M., Fu, Y., Riordan, S., Eimer, W.A., et al. (2012). Intraneuronal Abeta detection in 5xFAD mice by a new Abeta-specific antibody. Molecular neurodegeneration 7, 8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank both reviewers for their supportive comments. Reviewer 1 has suggested a different data processing strategy to better resolve subunits at the CALHM4/CALHM2 interface:

      I recommend an alternative data processing strategy. First, refine particles with 2-4 CALHM4 subunits with symmetry imposed. This is followed by symmetry expansion, signal subtraction of two adjacent subunits, and subsequent classification and refinement of the subtracted particles. This approach, while not guaranteed, can potentially provide a clearer definition of CALHM2 and CALHM4 interfaces and show whether CALHM2 subunits adopt different conformations based on their proximity to CALHM4 subunits.

      We have followed the recommended strategy in an attempt to improve the resolution and better resolve the structural heterogeneity in CALHM2/4 channels. To this end, we have combined symmetry expansion and partial signal subtraction, as suggested by the reviewer. Initially, a symmetrized (C11) 3.4 Å consensus map of undecameric CALHM2/4 channels bound to sybodies SbC2 and SbC4 was used. The particles of this reconstruction were subjected to symmetry expansion (C11) followed by signal subtraction of nine adjacent subunits. Next, we performed focused, alignment-free 3D classification of the remaining two subunits followed by refinement of these classes, leading to the classification of CALHM subunit pairs. The majority of the classes feature well-resolved CALHM2 pairs, consistent with the original approach (Author response image 1A). A minority of the classes contain CALHM4 subunits, revealing heterogeneity similar to regions of CALHM4 subunits observed in the non-symmetrized channel reconstruction (Author response image 1B). Unfortunately, this approach did not improve the resolution or facilitate a more accurate subunit assignment. Consequently, we decided not to include these attempts in our manuscript. The resubmitted version thus contains only small corrections compared to the previous version.
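
      As a rough illustration of the symmetry-expansion step described above, the following minimal sketch duplicates each particle orientation once per C11 symmetry operator. It assumes orientations are stored as Euler angles whose first angle (here called rot) is a rotation about the symmetry axis; the field names, the symmetry_expand helper, and the example angles are hypothetical and do not come from the authors' processing pipeline.

```python
def symmetry_expand(particles, order=11):
    """Return one copy of each particle per cyclic symmetry operator.

    Assumption: 'rot' is a rotation (in degrees) about the symmetry axis,
    so a C_n operator simply adds k * 360 / n to it.
    """
    step = 360.0 / order
    expanded = []
    for particle in particles:
        for k in range(order):
            copy = dict(particle)
            copy["rot"] = (particle["rot"] + k * step) % 360.0
            copy["sym_index"] = k  # which symmetry-related view this copy is
            expanded.append(copy)
    return expanded


# Hypothetical particle orientations, for illustration only.
particles = [
    {"id": 0, "rot": 10.0, "tilt": 85.0, "psi": 40.0},
    {"id": 1, "rot": 200.0, "tilt": 92.0, "psi": 310.0},
]
expanded = symmetry_expand(particles, order=11)
print(len(expanded), "expanded particles from", len(particles))
```

      After expansion, every subunit position of every channel appears once in a common frame, which is what allows a mask covering two subunits to be classified across all positions following signal subtraction.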

      Author response image 1.

      Classification of subunit pairs of undecameric CALHM2/4 channels bound to sybodies SbC2 and SbC4 after the processing combining symmetry expansion and partial signal subtraction. (A) Classes showing CALHM2 subunit pairs. (B) Classes showing subunits at interfaces to CALHM4.

  2. Apr 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to express our gratitude to the reviewers for their suggestions and critiques as we continually strive to enhance the quality of the manuscript. We have improved it by incorporating the reviewers’ suggestions, changing the content and numbering of figures (Figs 1, 3S1 were edited; 4 figures were moved to supplemental materials), and adding several analyses suggested by the reviewers along with accompanying figures (1S2, 1S3) and tables (1 and 2). These analyses include investigating the link between freezing behavior and 44-kHz calls as well as their sound mean power and duration. We have also added detailed information regarding the experiments performed and expanded the description and discussion of the results. Finally, we added information about 44-kHz calls reported by another group, a report inspired by our findings.

      Below is the point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Olszyński and colleagues present data showing variability from canonical "aversive calls", typically described as long 22 kHz calls rodents emit in aversive situations. Similarly long but higher-frequency (44 kHz) calls are presented as a distinct call type, including analyses both of their acoustic properties and animals' responses to hearing playback of these calls. While this work adds an intriguing and important reminder, namely that animal behavior is often more variable and complex than perhaps we would like it to be, there is some caution warranted in the interpretation of these data. The authors also do not provide adequate justification for the use of solely male rodents. With several reported sex differences in rat vocal behaviors this means caution should be exercised when generalizing from these findings.

      We fully agree that our data should be interpreted with caution and we followed the Reviewer’s suggestions along these lines (see below). Also, we appreciate the suggestion to explore the prevalence of 44-kHz calls in female subjects, which would indeed represent an important and intriguing extension of our research. However, due to present financial constraints, we can only plan such experiments. To address the comment, we have added the sentence: “Here we are showing introductory evidence that 44-kHz vocalizations are a separate and behaviorally-relevant group of rat ultrasonic calls. These results require further confirmations and additional experiments, also in form of repetition, including research on female rat subjects.”

      It is important to note that the data presented in the current manuscript originates primarily from previously conducted experiments. These earlier experiments employed male subjects only; this was due to established evidence indicating that the female estrus cycle significantly influences ultrasonic vocalization (Matochik et al., 1992). Adhering to controls for the estrus cycle would require a greater number of female subjects than males, which would not only increase animal suffering but also escalate the demands of human labor and financial costs.

      Firstly, the authors argue that the shift to higher-frequency aversive calls is due to an increase in arousal (caused by the animals having received multiple aversive foot shocks towards the end of the protocols). However, it cannot be ruled out that this shift would be due to factors such as the passage of time and increase in fatigue of the animals as they make vocalizations (and other responses) for extended periods of time. In fact the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day in testing is in line with this.

      Answer: We would like to point out that the “increased-arousal” hypothesis, declared in the manuscript, is only a hypothesis – as reflected by the wording used. However, we changed the beginning of the sentence in question from “It could be argued” to “We would like to propose a hypothesis” to emphasize the speculative aspect of the proposed explanation behind the increase of 44-kHz ultrasonic emissions.

      Also, we do agree that other factors could contribute to the increased emission of 44-kHz calls. These factors could include: heightened fear, stress/anxiety, annoyance/anger, disgust/boredom, grief/sadness, despair/helplessness, and weariness/fatigue. We list these potential factors in the discussion. Also, we added: “It is not possible, at this stage, to determine which factors played a decisive role. Please note that the potential contribution of these factors is not mutually exclusive”. However, we propose a list of arguments supporting the idea that 44-kHz vocalizations communicate an increased negative emotional state. Among these arguments were the conclusions drawn from additional analyses, mostly inspired by the fatigue hypothesis proposed by Reviewer #1. In particular, we investigated changes in the sound mean power and duration of 22-kHz and 44-kHz calls. Specifically, we showed that the mean power of 44-kHz vocalizations did not change, and was higher than that of 22-kHz vocalizations (Fig. 1S2EF).

      Finally, Reviewer #1 listed “the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day” as arguments for the fatigue hypothesis. We do not agree that the “increase” should be interpreted as a sign of fatigue (producing and maintaining higher-frequency calls requires greater effort from the vocalizer, as we elaborated in the manuscript). We are also not sure which “drop in 44 kHz calls” the Reviewer is referring to (we assume it refers to fewer 44-kHz calls during testing vs. training; we suppose that the levels of arousal are lower in the test due to the shorter session time and lack of shocks, which additionally contributes to fear extinction).

      Secondly, regarding the analysis where calls were sorted using DBSCAN based on peak frequency and duration, it is not surprising that the calls cluster based on frequency and duration, i.e. the features that are used to define the 44 kHz calls in the first place. Thus presenting this clustering as evidence of them being truly distinct call types comes across as a circular argument.

      Answer: The DBSCAN sorting results were meant to convey that, when the clustering ε value (which sets the degree of cluster separation) was changed, the 44-kHz vocalizations remained distinct from the 22-kHz and various short-call clusters, which merged. In other words: 44-kHz calls remained separate from long 22-kHz, short 22-kHz and 50-kHz vocalizations, which all consolidated into one common cluster. As a result, in this mathematical analysis, 44-kHz vocalizations remained distinct without applying human biases. Additionally, frequency and duration are the two most common features used to define all types of calls (Barker et al., 2010; Silkstone & Brudzynski, 2019a, 2019b; Willey & Spear, 2013). In summary, we did not expect the analysis to isolate the 44-kHz calls, and we were surprised by this result.
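
      The effect of varying ε can be illustrated with a minimal, pure-Python DBSCAN run on toy data. This is only a sketch of the general technique: the three synthetic point groups, the blob helper, the min_pts value, and the two ε values are hypothetical and are not the authors' recordings or settings.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


def blob(cx, cy):
    """A 3 x 3 grid of synthetic points centred on (cx, cy)."""
    return [(cx + dx, cy + dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]


def n_clusters(labels):
    return len(set(labels) - {-1})


# Two neighbouring groups and one distant group in an abstract, already
# scaled two-feature space (think: scaled peak frequency and duration).
points = blob(0, 0) + blob(3, 0) + blob(10, 10)

for eps in (1.0, 2.5):
    labels = dbscan(points, eps, min_pts=3)
    print(f"eps={eps}: {n_clusters(labels)} clusters")
```

      In this toy set, the two neighbouring groups fuse into one cluster at the larger ε while the distant group keeps its own label, which is the kind of behaviour described above for the 44-kHz calls.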

      The sparsity of calls in the 30-40 kHz range (shown in the individual animal panels in Figure 2C) could in theory be explained by some bioacoustics properties of rat vocal cords, without necessarily the calls below and above that range being ethologically distinct.

      Answer: We respectfully disagree with the argument regarding sparsity. It is important to note that, during prolonged fear conditioning experiments, we observed an increased incidence of 44-kHz calls (Fig. 1E-G) of up to >19% (Fig. 1S2AB) of the total ultrasonic vocalizations during specific inter-trial intervals. Also, it is possible that in observed experimental circumstances almost every fifth call could be attributed to the vocal apparatus as an artifact of its functioning (assuming we are interpreting the Reviewer’s argument correctly). While we do not believe this to be the case, we acknowledge the importance of considering such a hypothesis.

      The behavioral response to call playback is intriguing, although again more in line with the hypothesis that these are not a distinct type of call but merely represent expected variation in vocalization parameters. Across the board animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls. This does raise interesting questions about how, ethologically, animals may interpret such variation and integrate this interpretation in their responses. However, the categorical approach employed here does not address these questions fully.

      Answer: We are unsure of the Reviewer’s critique in this paragraph and will attempt to address it to the best of our understanding. Our finding of up to >19% of long, seemingly aversive 44-kHz calls, at a frequency in the defined appetitive ultrasonic range (usually >32 kHz), is unexpected rather than “expected”. We would agree that aversive call variation is expected, but not in the appetitive frequency range.

      Kindly note the findings by Saito et al. (2019), which claim that frequency band plays the main role in rat ultrasonic perception. It is possible that the higher peak frequency of 44-kHz calls may be a strong factor in their perception by rats, which is, however, modified by the longer duration and the lack of modulation.

      Also, from our experience, it is quite challenging to demonstrate different behavioral responses of naïve rats to pre-recorded 22-kHz (aversive) vs. 50-kHz (appetitive) vocalizations. Therefore, we would expect demonstrating a difference in response to two distinct, potentially aversive calls, i.e., 22-kHz vs. 44-kHz calls, to be even more difficult (to our knowledge, a comparable experiment comparing short vs. long 22-kHz ultrasonic vocalizations has not been done before).

      Therefore, we do not take lightly the surprising and interesting finding that “animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls”. We would rather put this description in analogous words: “the rats responded similarly to hearing 44-kHz calls as they did to hearing aversive 22-kHz calls, especially regarding heartrate change, despite the 44-kHz calls occupying the frequency band of appetitive 50-kHz vocalizations” and “other responses to 44-kHz calls were intermediate, they fell between response levels to appetitive vs. aversive playback” – which we added to the Discussion.

      Finally, we acknowledge that our findings do not present a finite and complete picture of the discussed aspects of behavioral responses to the presented ultrasonic stimuli (44-kHz vocalizations). Therefore, we have incorporated the Reviewer’s suggestion in the discussion. The added sentence reads: “Overall, these initial results raise further questions about how, ethologically, animals may interpret the variation in hearing 22-kHz vs. 44-kHz calls and integrate this interpretation in their responses.”

      In sum, rather than describing the 44kHz long calls as a new call type, it may be more accurate to say that sometimes aversive calls can occur at frequencies above 22 kHz. Individual and situational variability in vocalization parameters seems to be expected, much more so than all members of a species strictly adhering to extremely non-variable behavioral outputs.

      Answer: The surprising fact that there are presumably aversive calls that are beyond the commonly applied thresholds, i.e. >32 kHz, while sharing some characteristics with 22-kHz calls, is the main finding of the current publication. Whether they are eventually classified as a new type or subtype (i.e. a separate category), or form a supergroup of aversive calls together with 22-kHz vocalizations, is of secondary importance and remains to be discussed with other researchers in the field.

      However, we would argue, by way of comparison, that 22-kHz calls occur at durations of <300 ms and also >300 ms, and are usually referred to in the literature as short and long 22-kHz vocalizations, respectively (not introduced with a description that “sometimes 22-kHz calls can occur at durations below 300 ms”). These are then regarded and investigated as separate groups or classes, usually referred to as two different “types” (e.g., Barker et al., 2010) or “subtypes” (e.g., Brudzynski, 2015). Analogously, 44-kHz vocalizations can also be regarded as a separate type or a subtype of 22-kHz calls. The problem with the latter is that 22-kHz vocalizations are traditionally and predominantly defined by 18–32 kHz frequency bandwidth (Araya et al., 2020; Barroso et al., 2019; Browning et al., 2011; Brudzynski et al., 1993; Hinchcliffe et al., 2022; Willey & Spear, 2013).
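
      The conventional categories mentioned above can be written as a simple threshold rule. The sketch below uses the 18–32 kHz band and the 300 ms duration boundary quoted in this response; the function name, the handling of values that fall exactly on a boundary, and the example calls are assumptions made only for illustration.

```python
def classify_canonical(peak_khz, duration_s):
    """Assign a call to one of the three conventional categories.

    Assumptions: calls above 32 kHz are '50-kHz'; calls at 18-32 kHz are
    split into short/long at 0.3 s; boundary values go to the 22-kHz band
    and to the 'long' class.
    """
    if peak_khz > 32.0:
        return "50-kHz"
    if 18.0 <= peak_khz <= 32.0:
        return "short 22-kHz" if duration_s < 0.3 else "long 22-kHz"
    return "unclassified"


# Hypothetical example calls: (peak frequency in kHz, duration in s).
examples = [(22.0, 0.1), (22.0, 1.0), (55.0, 0.05), (44.0, 0.8)]
for peak_khz, duration_s in examples:
    label = classify_canonical(peak_khz, duration_s)
    print(f"{peak_khz} kHz, {duration_s} s -> {label}")
```

      Under this rule a long, flat call at about 44 kHz is labeled as a 50-kHz call purely because of its frequency, which is why such calls fit poorly into a scheme defined by the 18–32 kHz band.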

      Reviewer #2 (Public Review):

      Olszyński et al. claim that they identified a "new-type" ultrasonic vocalization around 44 kHz that occurs in response to prolonged fear conditioning (using foot-shocks of relatively high intensity, i.e. 1 mA) in rats. Typically, negative 22-kHz calls and positive 50-kHz calls are distinguished in rats, commonly by using a frequency threshold of 30 or 32 kHz. Olszyński et al. now observed so-called "44-kHz" calls in a substantial number of subjects exposed to 10 tone-shock pairings, yet call emission rate was low (according to Fig. 1G around 15%, according to the result text around 7.5%).

      Answer: We are thankful for the praise of the strengths. Please note that Figure 1G refers to 10-trial Wistar rats during the delay fear conditioning session, in which 44-kHz calls constituted 14.1% of ultrasonic vocalizations. The 7.5% number in the results refers to the total of vocalizations analyzed across all animal groups used in fear conditioning experiments. These values have been updated in the current version of the manuscript. Also, please note that 44-kHz calls constituted up to 19.4% of calls, on average, in one of the ITIs during the fear conditioning session. However, the prevalence of aversive calls, and of 44-kHz vocalizations in particular, varied between individual rats; we added the text: “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.” See also further for the description of the array of experiments analyzed and the prevalence/percentage of 44-kHz calls encountered (Tab. 1, Fig. 1S3).

      Weaknesses: I see a number of major weaknesses.

      While the descriptive approach applied is useful, the findings have only focused importance and scope, given the low prevalence of "44 kHz" calls and limited attempts made to systematically manipulate factors that lead to their emission. In fact, the data presented appear to be derived from reanalyses of previously conducted studies in most cases and the main claims are only partially supported. While reading the manuscript, I got the impression that the data presented here are linked to two or three previously published studies (Olszyński et al., 2020, 2021, 2023). This is important to emphasize for two reasons:

      (1) It is often difficult (if not impossible) to link the reported data to the different experiments conducted before (and the individual experimental conditions therein). While reanalyzing previously collected data can lead to important insight, it is important to describe in a clear and transparent manner what data were obtained in what experiment (and more specifically, in what exact experimental condition) to allow appropriate interpretation of the data. For example, it is said that in the "trace fear conditioning experiment" both single- and group-housed rats were included, yet I was not able to tell what data were obtained in single- versus group-housed rats. This may sound like a side aspect, however, in my view this is not a side aspect given the fact that ultrasonic vocalizations are used for communication and communication is affected by the social housing conditions.

      Answer: Preparing the current manuscript, we indeed used data collected during fear conditioning experiments which were described previously (Olszyński et al., 2021; Olszyński et al., 2022). Please note, however, that vocalization behavior during the fear conditioning itself was not the main subject of these publications. Our previous publications (Olszyński et al., 2020; Olszyński et al., 2021; Olszyński et al., 2022) present primarily ultrasonic-vocalization data from playback-part of experiments whereas here we analyze recordings obtained during fear conditioning experiments, thus we are analyzing new parts, i.e., not yet analyzed, of previously published studies. Also, we have performed additional experiments.

      In the first version of the current manuscript, we did not attempt to demonstrate exactly which calls were recorded in which conditions as the focus was to demonstrate that 44-kHz calls were emitted in several different fear-conditioning experiments. Also, as the experiments were not performed simultaneously and are results from different experimental situations, we would prefer to not compare these results directly.

      However, in the current version of the manuscript, we have introduced an additional reference system, based on Tab. 1, to more clearly indicate which rats have been employed in each analysis, e.g. the group of “Wistar rats that undergone 10 trials of fear conditioning” are described as “Tab. 1/Exp. 1-3/#2,4,8,13; n = 46”, i.e., these are the rats listed in rows 2, 4, 8, and 13 of Tab. 1.

      We have also tried to unify the analyses, in terms of rats used, as much as possible. Finally, we have also introduced Fig. 1S3 to demonstrate the prevalence of 44-kHz calls in all experiments analyzed with the note that “the experiments were not performed in parallel”.

      Regarding the Reviewer’s concerns about analyzing single- and pair-housed rats together, we have examined the ultrasonic vocalizations emitted and the freezing behavior in these two groups.

      • Ultrasonic vocalizations; when comparing the number of vocalizations, their duration, peak frequency and latency to first occurrence, equally for all types of calls and divided into types (short 22-kHz, long 22-kHz, 44-kHz, 50-kHz), the only difference was observed in peak frequency in 50-kHz vocalizations (50.7 ± 2.8 kHz for paired vs. 61.8 ± 3.1 kHz for single rats; p = 0.0280, Mann-Whitney). Since 50-kHz calls are not the subject of the current publication, we did not investigate this difference further. Also, this difference was not observed during playback experiments (Olszyński et al., 2020, Tab. 1).

      • Freezing. There were no differences between single- and pair-housed groups in freezing behavior, both in the time before first shock presentation and during fear conditioning training (Mann-Whitney).
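
      For readers unfamiliar with the test named in the two points above, a minimal sketch of a two-sided Mann-Whitney U test is given below. It uses the normal approximation without tie or continuity correction, so it is only an approximation for small samples; the two input lists are made-up values for illustration and are not the authors' measurements.

```python
import math


def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (min(u_x, u_y) - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, u_y, p


# Made-up values for two independent groups (illustration only).
group_a = [48.0, 49.5, 50.0, 51.2, 52.3, 53.1]
group_b = [58.0, 60.1, 61.5, 62.4, 63.0, 65.2]
u_a, u_b, p_value = mann_whitney_u(group_a, group_b)
print(f"U_a={u_a}, U_b={u_b}, p={p_value:.4f}")
```

      Because the test works on ranks rather than raw values, it makes no assumption of normality, which suits skewed measures such as call counts or latencies.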

      In summary, since the two groups did not differ in relevant ultrasonic features and freezing, we decided to present the results obtained from these rats together. However, we agree with the Reviewer, and it is possible that social housing conditions may in fact affect the emission of 44-kHz vocalizations, which could be a subject of another project – involving, e.g., larger experimental groups observed under hypothesis-oriented and defined conditions.

      (2) In at least two of the previously published manuscripts (Olszyński et al., 2021, 2023), emission of ultrasonic vocalizations was analyzed (Figure S1 in Olszyński et al., 2021, and Fig. 1 in Olszyński et al., 2023). This includes detailed spectrographic analyses covering the frequency range between 20 and 100 kHz, i.e. including the frequency range, where the "new-type" ultrasonic vocalization, now named "44 kHz" call, occurs, as reflected in the examples provided in Fig. 1 of Olszyński et al. (2023). In the materials and methods there, it was said: "USV were assigned to one of three categories: 50-kHz (mean peak frequency, MPF >32 kHz), short 22-kHz (MPF of 18-32 kHz, <0.3 s duration), long 22-kHz (MPF of 18-32 kHz, >0.3 s duration)". Does that mean that the "44 kHz" calls were previously included in the count for 50-kHz calls? Or were 44 kHz calls (intentionally?) left out? What does that mean for the interpretation of the previously published data? What does that mean for the current data set? In my view, there is a lack of transparency here.

      Answer: As mentioned above, we indeed used data collected during fear conditioning experiments which were described previously (Olszyński et al., 2021; Olszyński et al., 2022). However, in these publications, ultrasonic vocalizations emitted during playback experiments were the main subject, while the ultrasonic calls emitted during fear conditioning (performed before the playback) were only analyzed in a preliminary way. As a result, the 44-kHz vocalizations analyzed in the current manuscript were not included in the previous analyses. In particular, in Olszyński et al. (2021), we counted the overall number of ultrasonic vocalizations before the fear conditioning session to determine the basal ultrasonic emissions (Fig. S1). Then, in our next article (Olszyński et al., 2022), we again analyzed the number of all ultrasonic vocalizations before fear conditioning (Fig. S1) and restricted the analysis of vocalizations during fear conditioning to 22-kHz calls (Tab. S1 and S2).

      Also, we re-reviewed all the data used in our previous playback publications. Overall, 44-kHz calls were extremely rare in playback parts of the experiments. There were no 44-kHz calls in the playback data used in Olszyński et al. (2022) and Olszyński et al. (2020). In Olszyński et al. (2021), one rat produced eight 44-kHz calls. These 44-kHz calls constituted 0.03% of all vocalizations analyzed in the experiment (8/24888) and were included in the total number of calls analyzed (but not in the 50-kHz group); they were not described in further detail in that publication.

      Moreover, whether the newly identified call type is indeed novel is questionable, as also mentioned by the authors in their discussion section. While they wrote in the introduction that "high-pitch (>32 kHz), long and monotonous ultrasonic vocalizations have not yet been described", they wrote in the discussion that "long (or not that long (Biały et al., 2019)), frequency-stable high-pitch vocalizations have been reported before (e.g. Sales, 1979; Shimoju et al., 2020), notably as caused by intense cholinergic stimulation (Brudzynski and Bihari, 1990) or higher shock-dose fear conditioning (Wöhr et al., 2005)" (and I wish to add that to my knowledge this list provided by the authors is incomplete). Therefore, I believe, the strong claims made in abstract ("we are the first to describe a new-type..."), introduction ("have not yet been described"), and results ("new calls") are not justified.

      Answer: We would argue that 44-kHz vocalizations were indeed reported but not described. As far as we are concerned, an in-depth analysis of the properties and experimental circumstance of emission of long, high-frequency calls has not yet been performed. These researchers have observed, at least to a degree, similar calls to the ones we observed – as we mentioned in the discussion section. However, since these reported 44-kHz vocalizations were not fully described, we can only guess that they may be similar to ours. We speculate that perhaps like us, these researchers unknowingly recorded 44-kHz calls in their experiments and may also be able to describe them more extensively when re-analyzing their data as we have done here.

      Possibly, it was difficult to find reports on vocalizations, similar to the 44-kHz calls that we observed, because of the canonical and accepted definitions of ultrasonic vocalization types. Biały et al. (2019) allocated them as a part of 22-kHz group, perhaps because their calls were often of a step variation having both low and high components. Shimoju et al. (2020) grouped them along with 50-kHz vocalizations because they appeared during stroking rats held vertically; this procedure was compared to tickling which usually elicits appetitive calls.

      Reviewer #2 states there are other publications to complete the list. We are aware of other articles authored by the same team as Shimoju et al. (2020) with different first authors. However, they are reporting similar findings to the cited article. Otherwise, we would gladly cite a more complete list of publications showing atypical, long, monotonous high-frequency vocalizations, similar to those observed in our experiments. Therefore, we would argue that ultrasonic vocalizations which were long, flat, high in frequency, and repeatedly occurring in a defined behavioral situation, have not been reported before. However, concerning the strong claims of novelty of our finding, we toned them down where we found this was warranted.

      In general, the manuscript is not well written/ not well organized, the description of the methods is insufficient, and it is often difficult (if not impossible) to link the reported data to the experiments/ experimental conditions described in the materials and methods section.

      Answer: The description of the methods has been adjusted and expanded. We added the requested link to each particular experiment in the form “Tab. 1/Exp. nos./# nos.”, which shows, each time, which experiments and experimental groups were analyzed. The list of the experiments and groups is found in Tab. 1.

      For example, I miss a clear presentation of basic information: 1) How many rats emitted "44 kHz" calls (in total, per experiment, and importantly, also per experimental condition, i.e. single- versus group-housed)?

      Answer: We now clearly show which experiments were performed and how many animals were tested in each condition (Tab. 1), while the prevalence of 44-kHz calls amongst experimental conditions and animal groups is shown in Fig. 1S3. Also, we included information regarding the number of animals and treatment of each group of rats when reporting results. For example, we are stating that:

      (1a) “53 of all 84 conditioned Wistar rats (Tab. 1/Exp. 1-3/#2,4,6-8,13, Figs 1B, 1E, 1S1BC) displayed” 44-kHz vocalizations – as a general assessment; these numbers differ from those in the first version of the manuscript, which referred only to Wistar rats conditioned 6 or 10 times.

      (1b) “From this group of rats (n = 46), n = 41 (89.1%) emitted long 22-kHz calls, and 32 of them (69.6%) emitted 44-kHz calls” – this time referring only to 10-times conditioned Wistar rats as the biggest group that could be analyzed together (Figs 1F, 1G, 1S2A).

      (1c) “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.”
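      The percentages quoted in (1b) and (1c) can be recomputed directly from the counts given above. The following is only a minimal sketch for checking the arithmetic; the helper name `percent` and the variable names are ours and do not come from the manuscript.

```python
# Counts are copied from the response text; `percent` is a hypothetical helper.
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# (1b) 10-times conditioned Wistar rats, n = 46
n_rats = 46
long_22khz = percent(41, n_rats)   # rats emitting long 22-kHz calls
calls_44khz = percent(32, n_rats)  # rats emitting 44-kHz calls

# (1c) 44-kHz calls out of all calls within single ITIs, for three rats
iti_counts = [(140, 142), (222, 231), (263, 265)]
iti_shares = [percent(k, total) for k, total in iti_counts]

print(long_22khz, calls_44khz)  # 89.1 69.6
print(iti_shares)               # [98.6, 96.1, 99.2]
```

      All three ITI shares lie above the 95% threshold mentioned in (1c).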

      (2) Out of the ones emitting "44 kHz" calls, what was the prevalence of "44 kHz" calls (relative to 22- and 50-kHz calls, e.g. shown as percentage)?

      Answer: The prevalence of 44-kHz vocalizations in all investigated experiments and groups is shown in Fig. 1S3CD. Also, more information regarding the percentage of 44-kHz calls was demonstrated in Fig. 1S2AB where we calculated the distribution of 44-kHz calls to 22-kHz calls in Wistar rats, in 10-trial fear conditioning, across the length of the session.

      Additionally, the values are listed in the sentence regarding all Wistar rats which underwent 10 trials of fear conditioning: “these vocalizations were less frequent following the first trial (1.2 ± 0.4% of all calls), and increased in subsequent trials, particularly after the 5th (8.8 ± 2.8%), through the 9th (19.4 ± 5.5%, the highest value), and the 10th (15.5 ± 4.9%) trials, where 44-kHz calls gradually replaced 22-kHz vocalizations in some rats (Fig. 1F, 1S2B, Video 1; comp Fig. 1D vs. 1E).”

      (3) How did this ratio differ between experiments and experimental conditions?

      Answer: The prevalence of 44-kHz vocalizations in all experimental conditions is shown in Fig. 1S3. However, a direct comparison of results obtained in different conditions was not the goal of the present work. We would also argue that such direct comparisons between experiments would not be valid: the experiments were done with different groups of animals, at different times, and with different timetables of experimental manipulations.

      However, we are comfortable to state that:

      • There were more 44-kHz vocalizations during fear conditioning training than testing in all fear-conditioned Wistar rats;

      • We observed more 44-kHz vocalizations in Wistar rats compared to SHR.

      (4) Was there a link to freezing? Freezing was apparently analyzed before (Olszyński et al., 2021, 2023) and it would be important to see whether there is a correlation between "44-kHz" calls and freezing. Moreover, it would be important to know what behavior the rats are displaying while such "44-kHz" calls are emitted? (Note: Even not all 22-kHz calls are synced to freezing.) All this could help to substantiate the currently highly speculative claims made in the discussion section ("frequency increases with an increase in arousal" and "it could be argued that our prolonged fear conditioning increased the arousal of the rats with no change in the valence of the aversive stimuli"). Such more detailed analyses are also important to rule out the possibility that the "new-type" ultrasonic vocalization, the so-called "44 kHz" call, is simply associated with movement/ thorax compression.

      Answer: We analyzed freezing behavior and its association with ultrasonic emissions. The emission of 44-kHz vocalizations was associated with freezing. The results are now described and presented in the manuscript, i.e., Tab. 2, its legend and the description in Results: “Freezing during the bins of 22-kHz calls only (p < 0.0001, for both groups) and during 44-kHz calls only bins (p = 0.0003) was higher than during the first 5 min baseline freezing levels of the session. Also, the freezing associated with emissions of 44-kHz calls only was higher than during bins with no ultrasonic vocalizations (p = 0.0353), and it was also 9.9 percentage points higher than during time bins with only long 22-kHz vocalizations, but the difference was not significant (p = 0.1907; all Wilcoxon)” and “To further investigate this potential difference, we measured freezing during the emission of randomly selected single 44-kHz and 22-kHz vocalizations. The minimal freezing behavior detection window was reduced to compensate for the higher resolution of the measurements (3, 5, 10, or 15 video frames were used). There was no difference in freezing during the emission of 44-kHz vs. 22-kHz vocalizations for ≥150-ms-long calls (3 frames, p = 0.2054) and for ≥500-ms-long calls (5 frames, p = 0.2404; 10 frames, p = 0.4498; 15 frames, p = 0.7776; all Wilcoxon, Tab. 2B).”

      Please note that the general observation that "frequency increases with an increase in arousal" is not our claim but a general rule derived from a large body of observations and proposed by others (Briefer et al., 2012); we changed the wording of this statement to: “frequency usually increases with an increase in arousal (Briefer et al., 2012)”.

      The figures currently included are purely descriptive in most cases - and many of them are just examples of individual rats (e.g. majority of Fig. 1, all of Fig. 2 to my understanding, with the exception of the time course, which in case of D is only a subset of rats ("only rats that emitted 44-kHz calls in at least seven ITI are plotted" - is there any rationale for this criterion?)), or, in fact, just representative spectrograms of calls (all of Fig. 3, with the exception of G, all of Fig. 4).

      Answer: Please note that the former figures 2, 4, 6, and 8 have now been moved to supplementary figures 1S1, 2S1, 3S1, and 4S1 to better organize the presentation of data. Figures 1, 3, 5, 7 are now 1, 2, 3, 4, respectively. Data from individual rats were presented to show the general patterns of ultrasonic call distributions observed. Showing the full data set as seen in Fig. 5A (now Fig. 3A) would obscure the readability of the graph without using mathematical clustering techniques such as DBSCAN.

      Concerning Reviewer #2’s question regarding the criterion of “minimum seven ITI”, we selected the highest vocalizers by taking animals above the 75th percentile of the number of ITI with 44-kHz calls. However, in the current version of the manuscript, we decided to omit this part of the analysis and the accompanying part of the figure, since it did not provide any additional informative value (apart from employing a questionable criterion).
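      The percentile-based selection described above can be sketched as follows. This is an illustration only: the per-rat counts are HYPOTHETICAL (they are not the study data), and the interpolation method used for the percentile is our assumption, since the response does not state how the 75th percentile was computed.

```python
from statistics import quantiles

# HYPOTHETICAL data: number of ITIs (out of 10) containing 44-kHz calls, one value per rat.
itis_with_44khz = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10]

# quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th percentile.
# The "inclusive" interpolation method is an assumption made for this sketch.
q3 = quantiles(itis_with_44khz, n=4, method="inclusive")[2]

# Keep only the rats whose count lies strictly above the 75th percentile.
top_vocalizers = [count for count in itis_with_44khz if count > q3]

print(q3, top_vocalizers)
```

      With these made-up counts the cut-off falls between six and seven ITI, so the selected rats are those with at least seven ITI containing 44-kHz calls; a different data set or interpolation method would move the cut-off.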

      Moreover, the differences between Fig. 5 and Fig. 6 are not clear to me. It seems Fig. 5B is included three times - what is the benefit of including the same figure three times?

      Answer: We hope that designating Fig. 6 as supplementary to Fig. 5 (now Figs 3S1 and 3, respectively) will make interpreting them more straightforward. Fig. 6A (now Fig. 3S1A) is a more detailed look at the information presented in Fig. 5B (now Fig. 3B), with spectrogram images of ultrasonic vocalizations from different areas of the plot. Also, Fig. 3B (former Fig. 5B) was removed from Fig. 3S1B (former Fig. 6B).

      A systematic comparison of experimental conditions is limited to Fig. 7 and Fig. 8, the figures depicting the playback results (which led to the conclusion that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in between responses to 22-kHz and 50-kHz playbacks", although it remains unclear to me why differences were seen b e f o r e the experimental manipulation, i.e. the different playback types in Fig. 8B).

      Answer: There were indeed instances of such before-differences. Such differences were observed in our previous studies (Olszyński et al., 2020, Tabs S9-12; Olszyński et al., 2021, Tabs S7; Olszyński et al., 2022, Tabs S4, S9, S13, S17, S18) and were most likely due to analyzing multiple comparisons. However, we think that the carry-over effect mentioned by Reviewer #2 (see below) also played a role.

      Related to that, I miss a clear presentation of relevant methodological aspects: 1) Why were some rats single-housed but not the others?

      Answer: As stated before, data were collected from our previous experiments and the observation of 44-kHz vocalizations in fear conditioning was an emergent discovery as we decided to analyze ultrasonic recordings from fear conditioning procedures. Single-housed animals were part of our experiment comparing fear conditioning and social situation on the perception of ultrasonic playback as described in Olszyński et al. (2020). Aside from this experiment, all other rats were housed in pairs.

      (2) Is the experimental design of the playback study not confounded? It is said that "one group (n = 13) heard 50-kHz appetitive vocalization playback while the other (n = 16) 22-kHz and 44-kHz aversive calls". How can one compare "44 kHz" calls to 22- and 50-kHz calls when "44 kHz" calls are presented together with 22-kHz calls but not 50-kHz calls? What about carry-over effects? Hearing one type of call most likely affects the response to the other type of call. It appears likely that rats are a bit more anxious after hearing aversive 22-kHz calls, for example. Therefore, it would not be very surprising to see that the response to "44 kHz" calls is more similar to 22-kHz calls than 50-kHz calls.

      Of note, in case of the other playback experiment it is just said that rats "received appetitive and aversive ultrasonic vocalization playback" but it remains unclear whether "44 kHz" calls are seen as appetitive or aversive. Later it says that "rats were presented with two 10-s-long playback sets of either 22-kHz or 44-kHz calls, followed by one 50-kHz modulated call 10-s set and another two playback sets of either 44-kHz or 22-kHz calls not previously heard" (and wonder what data set was included in the figures and how - pooled?). Again, I am worried about carry-over effects here. This does not seem to be an experimental design that allows to compare the response to the three main call types in an unbiased manner.

      Answer: We apologize for the unclear and brief original description of the playback experiments. We wanted to avoid confusion associated with including several additional playback signals (please note that some are not related to the current comparisons and include different 50-kHz ultrasonic subtypes and two different subtypes of short 22-kHz calls). We lengthened the description of these playback experiments in the current version.

      In general, including more than one type of ultrasonic calls as playback has a risk of a carry-over effect as well as a habituation effect (the responses become weak). However, it greatly reduces the number of required animals. Finally, regarding the first experiment, we chose 3 playbacks to compare the rats’ reactions, as this was the most conservative choice we thought of.

      We would like to highlight that we wanted to compare specifically the rats’ responses to 22-kHz vs. 44-kHz playback (as well as the effects of playback of different subtypes of 50-kHz calls, which is not the subject of the current work). Therefore, we would argue that the design of both experiments is actually unbiased regarding this key comparison (responses to 22-kHz vs. 44-kHz playback). In both experiments, 22-kHz and 44-kHz playbacks were included in the same sequences of stimuli, counterbalanced regarding their order (i.e., taking into account possible carry-over effects), and presented to the same rats. We regarded the group of rats that heard 50-kHz recordings as a baseline/control, since we know from previous playback studies what reactions to expect from rats exposed to these vocalizations (and 22-kHz playback), while in the second experiment, we reduced the 50-kHz playback to one set in order to minimize possible habituation to multiple playbacks.

      We agree that the design of both experiments does not allow for a full comparison of the effects of aversive playbacks to 50-kHz playback. Also, we agree that some carry-over effects could play a role. This is mentioned in the discussion: “Please factor in potential carryover effects (resulting from hearing playbacks of the same valence in a row) in the differences between responses to 50-kHz vs. 22/44-kHz playbacks, especially, those observed before the signal (Fig. 4AB).” However, we would still argue that the observed lack of difference in heart-rate response (Fig. 4A) and the differences regarding the number of 50-kHz calls emitted (e.g., Fig. 4S1F) are free of the constraints raised by Reviewer #2.

      We acknowledge that our studies do not give a complete picture of 44-kHz ultrasonic perception in relation to other ultrasonic bands and, given the possibility, we would like to perform more in-depth and focused experiments to study this aspect of 44-kHz calls in the future.

      Finally, regarding the second experiment, the description of the rats now includes that they “received 22-kHz, 44-kHz, and 50-kHz ultrasonic vocalization playback”, while the description of the experiment itself includes: “Responses to the pairs of playback sets were averaged”.

      Of note, what exactly is meant by "control rats" in the context of fear conditioning is also not clear to me. One can think of many different controls in a fear conditioning experiment. More concrete information is needed.

      Answer: This information was included in our previous publications. However, it was now provided in the method section of the current version of the manuscript. In general, control rats were subjected to the same procedures but did not receive electric shocks.

      Literature included in the answers

      Araya, E. I., Baggio, D. F., Koren, L. O., Andreatini, R., Schwarting, R. K. W., Zamponi, G. W., & Chichorro, J. G. (2020). Acute orofacial pain leads to prolonged changes in behavioral and affective pain components. Pain, 161(12), 2830-2840. https://doi.org/10.1097/j.pain.0000000000001970

      Barker, D. J., Root, D. H., Ma, S., Jha, S., Megehee, L., Pawlak, A. P., & West, M. O. (2010). Dose-dependent differences in short ultrasonic vocalizations emitted by rats during cocaine self-administration. Psychopharmacology (Berl), 211(4), 435-442. https://doi.org/10.1007/s00213-010-1913-9

      Barroso, A. R., Araya, E. I., de Souza, C. P., Andreatini, R., & Chichorro, J. G. (2019). Characterization of rat ultrasonic vocalization in the orofacial formalin test: Influence of the social context. Eur Neuropsychopharmacol, 29(11), 1213-1226. https://doi.org/10.1016/j.euroneuro.2019.08.298

      Biały, M., Podobinska, M., Barski, J., Bogacki-Rychlik, W., & Sajdel-Sulkowska, E. M. (2019). Distinct classes of low frequency ultrasonic vocalizations in rats during sexual interactions relate to different emotional states. Acta Neurobiol Exp (Wars), 79(1), 1-12. https://www.ncbi.nlm.nih.gov/pubmed/31038481

      Briefer, E. F., Padilla de la Torre, M., & McElligott, A. G. (2012). Mother goats do not forget their kids' calls. Proc Biol Sci, 279(1743), 3749-3755. https://doi.org/10.1098/rspb.2012.0986

      Browning, J. R., Browning, D. A., Maxwell, A. O., Dong, Y., Jansen, H. T., Panksepp, J., & Sorg, B. A. (2011). Positive affective vocalizations during cocaine and sucrose self administration: a model for spontaneous drug desire in rats. Neuropharmacology, 61(1-2), 268-275. https://doi.org/10.1016/j.neuropharm.2011.04.012

      Brudzynski, S. M. (2015). Pharmacology of Ultrasonic Vocalizations in adult Rats: Significance, Call Classification and Neural Substrate. Curr Neuropharmacol, 13(2), 180-192. https://doi.org/10.2174/1570159x13999150210141444

      Brudzynski, S. M., & Bihari, F. (1990). Ultrasonic vocalization in rats produced by cholinergic stimulation of the brain. Neurosci Lett, 109(1-2), 222-226. https://doi.org/10.1016/0304-3940(90)90567-s

      Brudzynski, S. M., Bihari, F., Ociepa, D., & Fu, X. W. (1993). Analysis of 22 kHz ultrasonic vocalization in laboratory rats: long and short calls. Physiol Behav, 54(2), 215-221. https://doi.org/10.1016/0031-9384(93)90102-l

      Hinchcliffe, J. K., Jackson, M. G., & Robinson, E. S. (2022). The use of ball pits and playpens in laboratory Lister Hooded male rats induces ultrasonic vocalisations indicating a more positive affective state and can reduce the welfare impacts of aversive procedures. Lab Anim, 56(4), 370-379. https://doi.org/10.1177/00236772211065920

      Matochik, J. A., White, N. R., & Barfield, R. J. (1992). Variations in scent marking and ultrasonic vocalizations by Long-Evans rats across the estrous cycle. Physiol Behav, 51(4), 783-786. https://doi.org/10.1016/0031-9384(92)90116-j

      Olszyński, K. H., Polowy, R., Małż, M., Boguszewski, P. M., & Filipkowski, R. K. (2020). Playback of Alarm and Appetitive Calls Differentially Impacts Vocal, Heart-Rate, and Motor Response in Rats. iScience, 23(10), 101577. https://doi.org/10.1016/j.isci.2020.101577

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., & Filipkowski, R. K. (2021). Increased Vocalization of Rats in Response to Ultrasonic Playback as a Sign of Hypervigilance Following Fear Conditioning. Brain Sci, 11(8). https://doi.org/10.3390/brainsci11080970

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., Zieliński, J., & Filipkowski, R. K. (2022). Spontaneously hypertensive rats manifest deficits in emotional response to 22-kHz and 50-kHz ultrasonic playback. Prog Neuropsychopharmacol Biol Psychiatry, 120, 110615. https://doi.org/10.1016/j.pnpbp.2022.110615

      Saito, Y., Tachibana, R. O., & Okanoya, K. (2019). Acoustical cues for perception of emotional vocalizations in rats. Scientific Reports, 9(1), 10539.

      Sales, G. D. (1979). Strain Differences in the Ultrasonic Behavior of Rats (Rattus norvegicus). Am Zool, 19(2), 513-527. https://www.jstor.org/stable/3882331

      Shimoju, R., Shibata, H., Hori, M., & Kurosawa, M. (2020). Stroking stimulation of the skin elicits 50-kHz ultrasonic vocalizations in young adult rats. J Physiol Sci, 70(1), 41. https://doi.org/10.1186/s12576-020-00770-1

      Silkstone, M., & Brudzynski, S. M. (2019a). The antagonistic relationship between aversive and appetitive emotional states in rats as studied by pharmacologically-induced ultrasonic vocalization from the nucleus accumbens and lateral septum. Pharmacology Biochemistry and Behavior, 181, 77-85. https://doi.org/10.1016/j.pbb.2019.04.009

      Silkstone, M., & Brudzynski, S. M. (2019b). Intracerebral injection of R-(-)-Apomorphine into the nucleus accumbens decreased carbachol-induced 22-kHz ultrasonic vocalizations in rats. Behavioural Brain Research, 364, 264-273. https://doi.org/10.1016/j.bbr.2019.01.044

      Willey, A. R., & Spear, L. P. (2013). The effects of pre-test social deprivation on a natural reward incentive test and concomitant 50 kHz ultrasonic vocalization production in adolescent and adult male Sprague-Dawley rats. Behav Brain Res, 245, 107-112. https://doi.org/10.1016/j.bbr.2013.02.020

      Wöhr, M., Borta, A., & Schwarting, R. K. (2005). Overt behavior and ultrasonic vocalization in a fear conditioning paradigm: a dose-response study in the rat. Neurobiol Learn Mem, 84(3), 228-240. https://doi.org/10.1016/j.nlm.2005.07.004

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Additional considerations:

      The discussion of the "perfect fifth" and the proposition that this observation could be evidence of an evolutionary mechanism underlying it is rather far-fetched, especially for being presented in the Results section (with no supporting non-anecdotal evidence).

      Answer: We agree with Reviewer #1. The text was modified and the word “evolutionary” was deleted. Instead, we expanded on the possible reason for the prevalence of the perfect fifth in the current version of the manuscript; we added that the prevalence of the perfect fifth: “could be explained by the observation that all physical objects capable of producing tonal sounds generate harmonic vibrations, the most prominent being the octave, perfect fifth, and major third (Christensen, 1993, discussed in Bowling and Purves, 2015).”

      It is not clear why Sprague-Dawleys were used as "receivers" in the playback experiment, when presumably the calls were recorded from Wistars and SHRs. While this does not critically impact the conclusions, within the species rats should be able to respond appropriately to calls made by rats of different genetic backgrounds, it adds an unnecessary source of variance.

      Answer: Sprague-Dawley rats were used to test another normotensive strain of rats. Regarding the Reviewer’s main point, we beg to differ, as we think that it is worth testing playback stimuli in different strains. Using different stimuli for different rat strains would add unnecessary variance, and it seemed logical to use the same recordings to test effects in different strains. Please note that, in spite of this additional variance, the results of both playback experiments are, in general, similar, which may point to a universal effect of 44-kHz playback across rat strains.

      It is pertinent to note that for the trace fear conditioning experiment, the rats had previously been exposed to a vocalization playback experiment. While such a pre-exposure is unlikely to be a very strong stressor, the possibility for it to influence the vocal behaviors of these rats in later experiments cannot be ruled out. It is also not clear what the control rats in this experiment experienced (home cage only?), nor what they were used for in analyses.

      Answer: In the current version of the manuscript, we have described in greater detail all the experiments performed and analyzed. We would like to emphasize that both delay and trace fear conditioning experiments with radiotelemetric transmitters were not performed specifically to elicit any particular response during fear conditioning; rather, our observation of 44-kHz vocalizations emerged as a result of re-examining the audio recordings. As a result, this work summarizes our observations of 44-kHz calls from several different experiments. It is relevant to note that 44-kHz vocalizations were observed “in rats which were exposed to vocalization playback experiment”, in rats before the playback experiments, as well as in naïve rats, without transmitters implanted, trained in fear conditioning (Tab. 1/Exp. 1-3).

      Our main message is that 44-kHz vocalizations were present in several experiments, with different conditions and subjects, while we are not attempting to compare in detail the results across the different experiments. In other words, we agree that pre-exposure to playback (and, even more likely, transmitter implantation) could influence 44-kHz ultrasonic emissions by the rats, but neither is necessary for them. To demonstrate this, we added a prolonged fear conditioning group with naïve Wistar rats (Exp. 3) to verify the emission of 44-kHz calls in the absence of those experimental factors.

      We modified the methods section to clarify the circumstances under which these discoveries were made, such as including the information regarding the control rats in trace fear conditioning. In particular we mention that: “Control rats were subjected to the exact same procedures but did not receive the electric shock at the end of trace periods”.

      For Figure 1A-E, only example call distributions from individual rats are shown. It would perhaps be more informative to see the full data set displayed in this manner, with color/shape codes distinguishing individuals if desired.

      Answer: Please note that Fig. 1S1 shows more examples of ultrasonic call distribution. Showing all the data would make it more difficult to read and interpret. The problem is partly addressed in Fig. 3A.

      It is not clear what is presented in Figure 2D vs. E, i.e. panel D is shown only for "selected rats" but the legend does not clarify how and why these rats were selected. It is also not clear why the legend reports p-values for both Friedman and Wilcoxon tests; the latter is appropriate for paired data which seems to be the case when the question is whether the call peak frequency alters across time, but the Friedman assumes non-paired input data.

      Answer: The question refers to the current Fig. 1S2C panel (former Fig. 2E panel) and the former Fig. 2D panel. The latter was not included in the current version of the manuscript, since both reviewers opposed the presentation of “selected rats” only (see above). The full description of the Fig. 1S2C panel is now in the results section together with p-values for the Friedman and Wilcoxon tests. We used the latter to investigate the difference between the first and the last ITI (selected paired data), and the Friedman test to investigate the presence of change within the chain of ten ITI, since it is a suitable test for a difference between two or more paired samples.
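      To illustrate why the Friedman test suits repeated (paired) measurements, the sketch below computes the Friedman chi-square statistic by ranking the values within each animal before summing ranks per time point. It is a simplified, pure-Python illustration without tie correction or p-value; the peak-frequency values are HYPOTHETICAL, three ITI are used instead of ten for brevity, and the function names are ours. In practice a library routine such as `scipy.stats.friedmanchisquare` would normally be used.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square (no tie correction); rows = animals, columns = repeated time points."""
    n = len(table)       # number of animals (blocks)
    k = len(table[0])    # number of repeated measurements per animal
    rank_sums = [0.0] * k
    for row in table:
        # Ranking is done within each animal, which is what makes the test paired.
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# HYPOTHETICAL peak frequencies (kHz) of 4 rats across 3 consecutive ITI.
peak_khz = [
    [42.0, 43.1, 44.5],
    [40.8, 41.9, 43.0],
    [43.5, 44.0, 45.2],
    [41.2, 42.6, 42.9],
]
q = friedman_statistic(peak_khz)
print(q)
```

      Because every hypothetical rat shows the same ordering across the three ITI, the statistic reaches its maximum for this design, n(k − 1) = 8, even though the rats differ in their absolute frequencies.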

      Reviewer #2 (Recommendations For The Authors):

      The weaknesses listed in the public review need to be addressed.

      Answer: We have done our best to address the weaknesses.

      Notes: 1) Page and line numbers would have been useful.

      Answer: We are including a separate manuscript version with page and line numbers.

      (2) English language needs to be improved.

      Answer: The text has been checked by two native English speakers (one with a scientific background). Both only identified minor changes to improve the text which we applied.

      (3) I am a bit unsure whether the comment about the Star Wars movie (1997) and the Game of Thrones series (2011) is supposed to be a joke.

      Answer: These are indeed two genuine examples of the perfect fifth in human music that we hope are easily recognizable and familiar to readers. Parts of the same examples of the perfect fifth can also be heard in the rat voice files provided.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      During the last decades, extensive studies (mostly neglected by the authors), using in vitro and in vivo models, have elucidated the five-step mechanism of intoxication of botulinum neurotoxins (BoNTs). The binding domain (H chain) of all serotypes of BoNTs binds polysialogangliosides and the luminal domain of a synaptic vesicle protein (which varies among serotypes). When bound to the synaptic membrane of neurons, BoNTs are rapidly internalized by synaptic vesicles (SVs) via endocytosis. Subsequently, the catalytic domain (L chain) translocates, a process triggered by the acidification of these organelles. Following translocation, the disulfide bridge connecting the H chain with the L chain is reduced by the thioredoxin reductase/thioredoxin system, and it is refolded by the chaperone Hsp90 on SV's surface. Once released into the cytosol, the L chains of different serotypes cleave distinct peptide bonds of specific SNARE proteins, thereby disrupting neurotransmission. In this study, Yeo et al. extensively revise the neuronal intoxication model, suggesting that BoNT/A follows a more complex intracellular route than previously thought. The authors propose that upon internalization, BoNT/A-containing endosomes are retro-axonally trafficked to the soma. At the level of the neuronal soma, this serotype then traffics to the endoplasmic reticulum (ER) via the Golgi apparatus. The ER SEC61 translocon complex facilitates the translocation of BoNT/A's LC from the ER lumen into the cytosol, where the thioredoxin reductase/thioredoxin system and HSP complexes release and refold the catalytic L chain. Subsequently, the L chain diffuses and cleaves SNAP25 first in the soma before reaching neurites and synapses.

      Strengths:

      I appreciate the authors' efforts to confirm that the newly established methods somehow recapitulate aspects of the BoNTs mechanism of action, such as toxin binding and uptake occurring at the level of active synapses. Furthermore, even though I consider the SNAPR approach inadequate, the genome-wide RNAi screen has been well executed and thoroughly analyzed. It includes well-established positive and negative controls, making it a comprehensive resource not only for scientists working in the field of botulinum neurotoxins but also for cell biologists studying endocytosis more broadly.

      Weaknesses:

      I have several concerns about the authors' main conclusions, primarily due to the lack of essential controls and validation for the newly developed methods used to assess toxin cleavage and trafficking into neurons. Furthermore, there is a significant discrepancy between the proposed intoxication model and existing studies conducted in more physiological settings. In my opinion, the authors have omitted over 20 years of work done in several labs worldwide (Montecucco, Montal, Schiavo, Rummel, Binz, etc.). I want to emphasize that I support changes in biological dogma only when these changes are supported by compelling experimental evidence, which I could not find in the present manuscript.

      We thank the reviewer for their reading and comments and for pointing out the discrepancy between our proposed model and the existing model. However, we respectfully disagree with the phrase “extensive studies have elucidated the five-step mechanism of intoxication…”. This sentence and the ones that follow imply that the model is well established and demonstrated. They also highlight how convinced the reviewer is of this previous model.

      We contest this model for theoretical reasons and contest the strength of the evidence that supports it. We previously included references to work showing that the model is also being challenged by others. In light of the reviewer’s comments, we included more references in the introduction, where we also make our main theoretical concern explicit:

      “Arguably, the main problem of the model is its failure to propose a thermodynamically consistent explanation for the directional translocation of a polypeptidic chain across a biological membrane. Other known instances of polypeptide membrane translocation such as the co-translational translocation into the ER indicate that it is an unfavorable process, which consumes significant energy (Alder and Theg 2003).”

      We also added the following text in the Discussion to address the reviewer’s concerns: “Our study contradicts the long-established model of BoNT intoxication, which is described in several reviews specifically dedicated to the subject 1–4. In short, these reviews support the notion that BoNT are molecular machines able to mediate their own translocation across membranes; this notion has convinced some cell biologists interested in toxins and retrograde traffic, who describe BoNT mode of translocation in their reviews 5,6.

      But is this notion well supported by data? A careful examination of the primary literature reveals that early studies indeed report that BoNTs form ion channels at low pH values 7,8. These studies have been extended by the use of patch-clamp 9,10. These works and others led to various suppositions on how the toxin forms a channel and translocates the LC 1,11.

      However, only a single study claims to reconstitute in vitro the translocation of BoNT LC across membranes 12. In this paper, the authors report using a system of artificial membranes separating two aqueous compartments. They load the toxin in the cis compartment and measure the protease activity in the trans compartment after incubation. However, when the experimental conditions described are converted to molarity, it appears that the cis compartment was loaded at 10e-8 M BoNT and that the reported translocated protease activity is equivalent to 10e-17 M (Figure 3D, 12). Thus, in this experiment, about 1 LC molecule in 100 million has crossed the membrane. Such an extremely low transfer rate does not tally with the extreme efficiency of intoxication in vivo, even taking into account the difference between artificial and biological membranes.

      In sum, a careful analysis of the primary literature indicates that while there is ample evidence that BoNTs have the ability to affect membranes and possibly create ion channels, there is actually no credible evidence that these channels mediate translocation of the LC. As mentioned earlier, it is not clear how such a self-translocation mechanism would function thermodynamically. By contrast, our model proposes a mechanism without a thermodynamic problem, is consistent with current knowledge about other protein toxins, such as PE, Shiga and Ricin, and can help explain previously puzzling features of BoNT effects. It is worth noting that a similar self-translocation model was proposed for other protein toxins such as Pseudomonas exotoxin, which has a molecular organisation similar to that of BoNT (68). However, it has since been demonstrated that the PE toxins require cellular machinery, in particular in the ER, for intoxication (21,69,70).”

      Reviewer #2 (Public Review):

      Summary:

      The study by Yeo and co-authors addresses a long-lasting issue about botulinum neurotoxin (BoNT) intoxication. The current view is that the toxin binds to its receptors at the axon terminus by its HCc domain and is internalized in recycled neuromediator vesicles just after the release of the neuromediators. Then, the HCn domain assists the translocation of the catalytic light chain (LC) of the toxin through the membrane of these endocytic vesicles into the cytosol of the axon terminus. There, the LC cleaves its SNARE substrate and blocks neurosecretion. However, other views involving kinetic aspects of intoxication suggest that the toxin follows the retrograde axonal transport up to the nerve cell body and then back to the nerve terminus before cleaving its substrate.

      In the current study, the authors claim that the BoNT/A (isotype A of BoNT) not only progresses to the cell body but once there, follows the retrograde transport trafficking pathway in a retromer-dependent fashion, through the Golgi apparatus, until reaching the endoplasmic reticulum. Next, the LC dissociates from the HC (a process not studied here) and uses the translocon Sec61 machinery to retro-translocate into the cytosol. Only then does the LC traffic back to the nerve terminus following the anterograde axonal transport. Once there, LC cleaves its SNARE substrate (SNAP25 in the case of BoNT/A) and blocks neurosecretion.

      To reach their conclusion, Yeo and co-authors use a combination of engineered tools: a cell line able to differentiate into neurons (ReNcell VM), a dual fluorescent reporter protein derived from SNAP25, the substrate of BoNT/A (called SNAPR), either native BoNT/A or a toxin in which three copies of fragment 11 of the reporter fluorescent protein Neon Green (mNG) are fused to the N-terminus of the LC (BoNT/A-mNG11x3), and finally ReNcell VM transfected with mNG1-10 (a protein consisting of the first 10 beta strands of the mNG).

      SNAPR is stably expressed throughout the ReNcell VM cells. SNAPR is yellow (red and green) when intact and becomes red only when cleaved by BoNT/A LC, the green tip being degraded by the cell. When the LC of BoNT/A-mNG11x3 reaches the cytosol in ReNcell VM transfected with mNG1-10, the complete mNG is reconstituted and emits a green fluorescence.

      In the first experiment, the authors show that the catalytic activity of the LC appears first in the cell body of neurons where SNAPR is cleaved first. This phenomenon starts 24 hours after intoxication and progresses along the axon towards the nerve terminus during an additional 24 hours. In a second experiment, the authors intoxicate the ReNcell VM transfected with mNG1-10 using the BoNT/A-mNG11x3. The fluorescence also appears first in the soma of neurons, then diffuses in the neurites in 48 hours. The conclusion of these two experiments is that translocation occurs first in the cell body and that the LC diffuses in the cytosol of the axon in an anterograde fashion.

      In the second part of the study, the authors perform a siRNA screen to identify regulators of BoNT/A intoxication. Their aim is to identify genes involved in intracellular trafficking of the toxin and translocation of the LC. Interestingly, they found positive and negative regulators of intoxication. Regulators could be regrouped according to the sequential events of intoxication.

      Genes affecting binding to the cell-surface receptor (SV2) and internalization. Genes involved in intracellular trafficking. Genes involved in translocation such as reduction of the disulfide bond linking the LC to the HC and refolding in the cytosol. Genes involved in signaling such as tyrosine kinases and phosphatases. All these groups of genes may be consistent with the current view of BoNT intoxication within the nerve terminus. However, two sets of genes were particularly significant to reach the main conclusion of the work and definitely constitute an original finding important to the field. One set of genes consists of those of the retromer, and the other relates to the Sec61 translocon. This should indicate that once endocytosed, the BoNT traffics from the endosomes to the Golgi apparatus, and then to the ER. Ultimately, the LC should translocate from the ER lumen to the cytosol using the Sec61 translocon. The authors further control that the SV2 receptor for the BoNT/A traffics along the axon in a retromer-dependent fashion and that BoNT/A-mNG11x3 traverses the Golgi apparatus by fusing the mNG1-10 to a Golgi resident protein.

      Strengths:

      The findings in this work are convincing. The experiments are carefully done and are properly controlled. In the first part of the study, the activity of the LC is monitored together with the physical presence of the toxin. In the second part of the work, the most relevant genes that came out of the siRNA screen are checked individually in the ReNcell VM / BoNT/A reporter system to confirm their role in BoNT/A trafficking and retro-translocation.

      These findings are important to the fields of toxinology and medical treatment of neuromuscular diseases by BoNTs. They may explain some aspects of intoxication such as slow symptom onset, aggravation, and appearance of central effects.

      Weaknesses:

      The findings antagonize the current view of the intoxication pathway that is sustained by a vast amount of observations. The findings are certainly valid, but their generalization as the sole mechanism of BoNT intoxication should be tempered. These observations are restricted to one particular neuronal model and engineered protein tools. Other models such as isolated nerve/muscle preparations display nerve terminus paralysis within minutes rather than days. Also, the tetanus neurotoxin (TeNT), whose mechanism of action involving axonal transport to the posterior ganglia in the spinal cord is well described, takes between 5 and 15 days. It is thus possible that different intoxication mechanisms co-exist for BoNTs or even vary depending on the type of neurons.

      Although the siRNA experiments are convincing, it would be nice to reach the same observations with drugs affecting the endocytic to Golgi to ER transport (such as Retro-2, golgicide or brefeldin A) and the Sec61 retrotranslocation (such as mycolactone). Then, it would be nice to check other neuronal systems for the same observations.

      We thank the reviewer for the careful reading of and comments on our manuscript. The reference to “a vast amount of observations” is an argument similar to that of Reviewer 1 and is used to suggest that our study may not be applicable as a general mechanism.

      We respectfully disagree, as described above, and posit on the contrary that the model we propose is much more likely to be general than the model presented in current reviews, for the several reasons cited (see added text in Introduction and Discussion). While we agree that more work is needed to confirm the proposed mechanisms of BoNT translocation in other models, these experiments fall outside the scope of our study.

      The fact that BoNT activity in nerve/muscle preparations has relatively fast kinetics does not contradict our model. Our model reveals primarily the requirement for trafficking to the ER membranes. This ER targeting requires trafficking through the Golgi complex, in turn explaining the requirement for trafficking to the soma of neurons in the experimental system we used. However, in neuronal cells in vivo, Golgi bodies can be found along the length of the axon, thus BoNT may not always require trafficking to the soma of the affected cells. The time required for intoxication could thus vary greatly depending on the neuronal structural organisation.

      TeNT is proposed to transfer from excitatory neurons into inhibitory neurons before exerting its action. While the details of this fascinating mechanism remain to be explored, they clearly fall beyond the purview of this manuscript.

      Regarding the use of drugs, we agree that it would be a nice addition; unfortunately we are unable to perform such experiments at this stage. Setting up a large scale siRNA screen for BonT mechanism of action is challenging as it requires a special facility with controlled access and police authorisation (in Singapore) given the high toxicity of this molecule. Unfortunately, the authorisations have now lapsed.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Yeo et al. investigates the intracellular trafficking of Botulinum neurotoxin A (BoNT/A), a potent toxin used in clinical and cosmetic applications. Contrary to the prevailing understanding of BoNT/A translocation into the cytosol, the study suggests a retrograde migration from the synapse to the soma-localized Golgi in neurons. Using a genome-wide siRNA screen in genetically engineered neurons, the researchers identified over three hundred genes involved in this process. The study employs organelle-specific split-mNG complementation, revealing that BoNT/A traffics through the Golgi in a retromer-dependent manner before moving to the endoplasmic reticulum (ER). The Sec61 complex is implicated in the retro-translocation of BoNT/A from the ER to the cytosol. Overall, the research challenges the conventional model of BoNT/A translocation, uncovering a complex route from synapse to cytosol for efficient intoxication. The findings are based on a comprehensive approach, including the introduction of a fluorescent reporter for BoNT/A catalytic activity and genetic manipulations in neuronal cell lines. The conclusions highlight the importance of retrograde trafficking and the involvement of specific genes and cellular processes in BoNT/A intoxication.

      Strengths:

      The major part of the experiments are convincing. They are well-controlled and the interpretation of their results is balanced and sensitive.

      Weaknesses:

      In my opinion, the main weakness of the paper is in the interpretation of the data equating loss of tGFP signal (when using the Red SNAPR assay) with proteolytic cleavage by the toxin. Indeed, the first step for loss of tGFP signal by degradation of the cleaved part is the actual cleavage. However, the cleaved part then needs to be degraded (by the proteasome, I presume), a process that could in principle be affected (in speed or extent) by the toxin.

      We thank the reviewer for his comments and careful reading of our manuscript.

      Regarding the read-out of the assay, we agree that the assay could be sensitive to alteration in the protein degradation pathway. We have added the following sentence in the Discussion to take it into account:

      “As noted by one reviewer, the assay may be sensitive to perturbation in the general rate of protein degradation, a consideration to keep in mind when evaluating the results of large scale screens.”

      While this may be valid for some hits in the general list, it is important to note that the main hits have been shown to affect toxin trafficking by an independent, orthogonal assay based on the split GFP reconstitution.

      Recommendations to authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To assess the activity of BoNT/A in neurons, Yeo et al. have generated a neuronal stem line referred to as SNAPR. This cell line stably expresses a chimeric reporter protein that consists of SNAP25 flanked at its N-terminus with a tagRFPT and at its C-terminus with a tagGFP. After exposure to BoNT/A, SNAP25 is cleaved and, the C-terminal tGFP-containing moiety is rapidly degraded. I have many doubts about the validity of the described method. Indeed, BoNT/A activity is analysed in an indirect way by quantifying the degradation of the GFP moiety generated after toxin cleavage (Fig. 2). In this regard, the authors should consider that their approach is dependent, not only on the toxin's metalloprotease activity but also on the functionality of the proteasome in neurons. Therefore, considering the current dataset, it is impossible to rule out the possibility that the progression of GFP signal loss from the soma to the neurite terminals may be attributed to the different proteasome activity in these compartments. Is it conceivable that the GFP fragment generated upon toxin cleavage degrades more rapidly in the soma in comparison to axonal terminals? This alternative explanation could challenge the conclusion drawn in Fig. 2.

      The reviewer’s alternative explanation disregards the experiments performed with the split-GFP complementation approach, which indicate translocation in the soma first. The split GFP reporter is not dependent on proteasome activity. It also disregards the genetic data implicating many genes involved in membrane retrograde traffic, which are also not consistent with the hypothesis of the reviewer. These gene depletions affect not only SNAPR degradation but also BoNT/A-mNG11 trafficking: thus, their effect cannot be attributed to a completely hypothetical spatially heterogeneous distribution of the proteasome.

      For this reason, I strongly suggest using a more physiological approach that does not depend on proteasomal degradation or on the expression of the sensor in neurons. The authors should consider performing a time course experiment following intoxication and staining BoNT/A-cleaved SNAP25 by using specific antibodies (see Antonucci F. et al., Journal of Neuroscience, 2008 or Rheaume C. et al., Toxins 2015).

      For the above reason, we do not agree with the pressing importance of confirming by a third method using specific antibodies, especially considering that BoNT is very difficult to detect in cells when incubated at physiological levels. By the way, the cited paper by Antonucci F. et al. documents long-distance retrograde traffic of BoNT/A, which is in line with our data.

      An alternative approach could involve the use of microfluidic devices that physically separate axons from cell bodies. Such a separation will allow us to test the authors' primary conclusion that SNAP25 is initially cleaved in the soma. The suggested experiments will also rule out potential overexpression artifacts that could influence the authors' conclusions when using the newly developed SNAPR approach. Without these additional experiments, the authors' main conclusion that SNAP25 is cleaved first in the neuronal soma rather than at the nerve terminal is inadequate.

      As discussed above, we disagree with the doubts raised by the reviewer: we present three types of evidence (SNAPR, split GFP and genetic hits) and they all point in the same direction. Thus, we respectfully doubt that a fourth approach would convince this reviewer. Of note, we attempted to use microfluidic devices as suggested by the reviewer; however, the ReNcell VM neurons were not able to extend axons long enough across the device.

      (2) To detect BoNT/A translocation into the cytosol, the authors have used a complementation assay by intoxicating ReNcell VM cell expressing a cytosolic HA-tagged split monomeric NeonGreen (Cyt-mNG1-10) with an engineered BoNT/A, where the catalytic domain (LC) was fused to mNG1-11. When drawing conclusions regarding the detection of cytosolic LC in the neuronal soma, the authors should highlight the limitations of this assay and explicitly describe them to the readers. Firstly, the authors need to investigate whether the addition of mNG1-11 to the LC affects the translocation process itself (by comparing with a WT, not tagged, LC).

      Additionally, from the data shown in Fig. 2C, it is evident that the Cyt-mNG1-10 is predominantly expressed in the cytosol and less detected in neurites. This raises the question of whether there might be a bias for the cell soma in this assay. To address this important concern, I suggest quantifying MFI per cell (Fig. 2D) taking into consideration the amount of HA-tagged Cyt-mNG1-10. Furthermore, I strongly suggest targeting mNG1-10 to synapses and performing a similar time course experiment to observe when LC translocation occurs at nerve terminals. Alternative experiments, to prove that BoNT/A requires retrograde trafficking before it can translocate, may be done to repeat the experiments shown in Fig. 2D in the presence of inhibitors (or by KD some of the hits identified as microtubule stabilizers) that should interfere with BoNT/A trafficking to the neuronal somata. Without these additional experiments, the authors' main conclusion that the BoNT/A catalytic domain is first detected in the neuronal soma rather than at the nerve terminal is very preliminary.

      As with the SNAPR assay, the reviewer is raising the level of doubt to very high levels. We respect his thoroughness and eagerness to question the new model. However, we note that a similar level of scrutiny is not applied to the prevalent competing model. Indeed, the data supporting the self-translocation model are based on a single in vitro experiment published in one panel, as we have explained in the Discussion (see above).

      (3) In the genome-wide RNAi screening, rather than solely assessing SV2 surface levels, it would have been beneficial to directly investigate BoNT/A binding to the neuronal membrane. For instance, this could have been achieved by using a GFP-tagged HC domain of BoNT/A. At present, the authors cannot exclude the possibility that among the 135 hits that did not affect SV2 levels, some might still inhibit BoNT/A binding to the neuronal surface. These concerns, already exemplified by B4CALT4 (which is known to be involved in the synthesis of GT1b), should be explicitly addressed in the main text.

      We agree with the reviewer that perturbation of binding of BonT is possible. We added the following text:

      “Network analysis reveals regulators of signaling, membrane trafficking and thioreductase redox state involved in BoNT/A intoxication

      Among the positive regulators of the screen, 135 hits did not significantly influence surface SV2 levels and are thus likely to function in post-endocytic processes (Supplementary Table 2). However, we cannot formally exclude that they could affect binding of BoNT to the cell surface independently of SV2.”

      (4) The authors should clearly state which reagents they have tried to use in order to explain the challenges they faced when directly testing the trafficking of BoNT/A. The accumulation of Dendra-SV2 bulbous structures at the neurite tips in VPS35-depleted cells could be interpreted as a sign of neuronal stress/death. Have the authors investigated other proteins that do not undergo retro-axonal trafficking in a retromer-dependent manner? This control is essential. In this regard, the use of a GFP-tagged HC domain of BoNT/A could prove to be quite helpful.

      We tried multiple commercially available antibodies against BoNT but we could not get a very good signal. The postdoc in charge of this project has now gone to greener pastures and we are not in a position to provide the details corresponding to these antibodies. We did not observe significant cell death after VPS35 knockdown at the time of the experiment; however, longer-term treatment might indeed result in toxicity.

      (5) Considering my concerns related to the SNAPR system and the complementation assay to study SNAP25 cleavage and BoNT/A trafficking, I suggest validating some of their major hits (ex. VPS34 and Sec61) by performing WB or IF analysis to examine the cleavage of endogenous SNAP25. Furthermore, the authors should test VPS35 depletion in the context of the experiments performed in Fig. 6G-H, by validating that this protein is essential for BoNT/A retrograde trafficking.

      The reviewer’s concerns are well noted but, as discussed above, the two systems we used are completely orthogonal. Thus, for the reviewer’s concerns to be valid, there would have to be two completely independent artefacts giving rise to the same result. The alternative explanation is that BoNT/A translocates in the soma. The principle of Occam’s razor dictates that the simplest explanation is the likeliest.

      (6) The introduction and the discussion section of this paper completely disregard more than 20 years of research conducted by several labs worldwide (Montecucco, Montal, Schiavo, Rummel, Binz, etc). The authors should make an effort to contextualize their data within the framework of these studies and address the significant discrepancies between their proposed intoxication model and existing research that clearly demonstrates BoNTs translocating upon the endocytic retrieval of SVs at presynaptic sites. Nevertheless, even assuming that the model proposed by the authors is accurate, numerous questions emerge. One such question is: How can the authors explain the exceptional toxicity of botulinum neurotoxin in an ex vivo neuromuscular junction preparation devoid of neuronal cell bodies (see Cesare Montecucco and Andreas Rummel's seminal studies)?

      Please see above in the answer to public reviews.

      (7) Scale bars should be added to all representative pictures.

      This has been done. Thank you for the thorough reading of our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) The title overstates the results. It may be indicated "in differentiated ReNcell VM".

      Title changed to: “Botulinum toxin intoxication requires retrograde transport and membrane translocation at the ER in RenVM neurons”

      (2) In the provided manuscript there are two Figure 2 and no Figure 3. This made the reading and understanding extremely difficult and should be corrected. As a result, the Figure legends do not fit the numbering. There are also discrepancies between some Figure panels (A, B, C, etc), the text, and the Legends. All this needs to be carefully checked.

      We apologize for the confusion, which arose because the manuscript has gone through multiple rounds of revisions. We have carefully verified labels and legends.

      (3) The BoNT/A-mNG11x3 may introduce some bias that could be discussed. Would these additional peptides block LC translocation from synaptic vesicles in the nerve termini? In addition, the mNG peptides that are unfolded before complementation may direct LC towards Sec61. These aspects should be discussed.

      The comment would be valid if BoNT/A-mNG11x3 were the only approach used in the paper; however, the SNAPR reporter is used with native BoNT and shows data consistent with the split GFP approach.

      (4) In the Figure about SV2 (Fig 3 or 4): The authors did not locate SV2. The cells seem not to have the same differentiated phenotype as in Figure 1 and Figure 2/3A.

      We apologized above for the mislabeling. It is not clear what the question is here.

      (5) The authors should check whether BoNT/A wt cleaves the endogenous SNAP25 by western blot, for instance in the original ReNcell VM before SNAPR engineering. This should be compared with wt SNAP25 cleavage by the BoNT/A-LC-mNG.

      It is likely that BoNT/A-LC-mNG11 should have similar activity as it is only adding a small peptide at the end of the LC. At any rate, it is not clear why this is so important since both molecules translocate in the cytosol, with the same kinetics and in the same subcellular locale.

      (6) Perhaps I did not understand. How can the authors exclude that what is observed is the kinetic overproduction of the reporter substrate SNAPR?

      The authors could use SLO toxin (PNAS 98, 3185-3190, 2001) to permeabilize the cells all along their body and axon to introduce BoNT/A or LC (wt) and observe synchronized SNAPR cleavage throughout the cells.

      The concept mentioned here is not very clear to us. Is the reviewer proposing that the SNAPR is produced much more efficiently at the tips of the neurites, and thus its cleavage takes longer to be detected and is apparent first in the soma? With all due respect, this is a strange hypothesis, at odds with what we know of protein dynamics in neurons (i.e. most proteins are largely made in the soma and transported or diffuse into the neurites).

      Again, the two orthogonal approaches, split GFP and the SNAPR reporter, use different constructs and methods, yet converge on similar results. Perhaps the incredulity of the reviewer might be more productively directed at the current data “demonstrating” the translocation of LC in the synaptic bouton?

      (7) The authors could also use an assay monitoring neurotransmitter release by electrophysiology measurements to check the functional consequences of the kinetic diffusion of LC activity along the axon. Can the authors exclude that some toxin molecules translocate from the endocytic vesicles and block neurotransmission within minutes or a few hours?

      It is well established that inhibition of neurotransmission does not occur within minutes in vivo and in vitro, but rather within hours or even days. This kinetic delay is experienced by many patients and is one of the key arguments against the current model of self-translocation at the synaptic vesicle level.

      Minor remarks

      Thank you for pointing out all these.

      (1) Please check typos. There are many. Check space before the parenthesis, between numbers and h (hours), reference style etc.

      Thank you. We have reviewed the text and tried to eliminate all these instances.

      (2) Line 90: The C of HC should be capitalized.

      Fixed

      (3) Line 107: add space between "neurons(Donato".

      Fixed

      (4) Line 109: space "72 h".

      Fixed

      (5) Line 115: a word is missing ? ...to show retro-axonal... ? Please clarify this sentence.

      Fixed

      (6) Figure 1E: does nm refer to nM (nanomolar)? Please correct. No mention of panel F.

      Fixed

      (7) Line 161: do you mean ~16 µm/h? Please correct.

      Fixed

      (8) Line 168, words are missing.

      Fixed, thank you

      We verified that Cyt-mNG1-10 was expressed using the HA tag; the expression was homogeneously distributed in differentiated neurons and we observed no GFP signal (Figure 2C).

      (9) Line 171: Isn't mNG 11 the eleventh beta strand of the neon green fluorescent protein, not alpha helix? Otherwise, can the authors confirm it acquires the shape of an alpha helix? Same at line 326.

      We have corrected the mistake; thanks for pointing it out.

      (10) Figure 2 is doubled. The legend of Fig 2 refers to Figure 3. There is no legend for Figure 2. Then, some figures are shifted in their numbering.

      Fixed

      (11) The fluorescence in the cell body must appear before the fluorescence in the axon due to higher volume. Please discuss.

      The fluorescence progresses in the neurite extensions in a centripetal fashion. The volume of the neurite near the cell body is not significantly different from that at the end of the neurite. Thus, the fluorescence data are consistent with translocation in the soma and not with an effect due to higher volume in the soma.

      (12) Figure 2D, right: the term intoxication is improper for this experiment. Rather, it is the presence of the BoNT/A-mNG11 that is detected. I believe the authors should be particularly careful about the use of terms: intoxication means blockade of neurosecretion, SNAPR cleavage means activity etc.

      While the reviewer is correct that it is the presence of BoNT/A-mNG11 that is detected, it remains that it is an active toxin, so the neurons are effectively intoxicated; as they are when we use the wild type toxin. We do not imply that we are measuring intoxication, but simply that the neurons are put into contact with a toxin.

      (13) Line 196: Should we read TXNRD1 is required for BoNT/A LC translocation? TXNRD1 in the current model of translocation is located in the cytoplasm and is supposed to play a role in the cleavage of the disulfide bond linking LC to HC. In the model proposed by this study, LC is translocated through the Sec61 translocon. In this case, I would assume that the protein disulfide isomerase (PDI) in the endoplasmic reticulum would reduce the LC-HC disulfide bond. In that case, TXNRD1 would not be required anymore. Please discuss.

      Why should we assume that a PDI is involved in the reduction of the LC-HC disulfide bond? In our previous studies on A-B toxins (PE and Ricin), different reduction systems seemed to be at play. There is no conceptual imperative to assume reduction in the ER because the Sec61 translocon is implicated. Reduction might occur on the cytosolic side by TXNRD1 or the effect of this reductase could be indirect.

      (14) The legend of Figure 4 (in principle Figure 5?) is not matching with the panels and panel entries are missing (Figure 4F in particular).

      Fixed

      (15) Figure 6 panels E and H, please match colors with legend (grey and another color).

      Not clear

      (16) Please indicate BoNT/A construct concentrations in all Figure legends.

      Done

      (17) Line 416: isn't SV2 also involved in epilepsy?

      Yes it is.

      (18) Line 433: as above, shouldn't the disulfide bond linking LC to HC be cleaved by PDI in the ER in this model (as for other translocating bacterial toxins) rather than by thioredoxin reductases in the cytoplasm? Please discuss.

      See above

      (19) Identification of vATPase in the screen could be consistent with the endocytic vesicle acidification model of translocation.

      Yes

      (20) Did the authors add KCl in screening controls without toxins? This should be detailed in the Materials and Methods. Could there be a KCl effect on the cells? KCl exposure for 48 hours may be highly stressful for cells. The KCl exposure should last only several minutes for toxin entry.

      We did not observe significant cell death with the cell culture conditions used. Cell viability was controlled at multiple stages, using nuclei number for instance.

      Reviewer #3 (Recommendations For The Authors):

      Main comments:

      (1) In Figure 1B: could you devise a means to prevent proteasomal degradation of the tGFP cleaved part to assess whether this is formed?

      We have also used a FRET assay after tintoxication and obtained similar results

      (2) Line 152: Where it reads "was not surprising", maybe I missed something, but to me, this is indeed surprising. If the toxin is rapidly internalized and translocated (therefore, it is able to cleave SNAP25), the fact that tGFP requires 48 hours to be degraded seems surprising to me. Or does it mean that the toxin also slows down the degradation of the tGFP fragment? So, how can you differentiate between the effect being on cleavage of the fragment or in tGFP degradation?

      The reviewer is correct: the “not” was a typo introduced during rewriting, and the long delay between adding the toxin and observing cleavage was indeed surprising. Our interpretation is that trafficking takes time; indeed, the split-GFP kinetics indicate that the toxin takes about 48 h to fill the entire cytosol (Fig. 2D).

      (3) Regarding the effect of Sec61G knockdown, is it possible that the observed effects are indirect and not due to the translocon being directly responsible for translocating the protein?

      As discussed in the last part of the results, Sec61 knockdown blocks intoxication but does not prevent BoNT from reaching the lumen of the ER (Figure 6G,H). Thus, Sec61 “is instrumental to the translocation of BoNT/A LC into the neuronal cytosol at the soma.”

      Minor comments:

      (1) Fig. 3E: in the legend I think one of the NT3+ should be NT3-.

      Yes, thanks for spotting it

      (2) Would you consider adding Figure S4 as a main figure?

      Thanks for the suggestion

      (3) Please, check that all microscopy image panels have scale bars.

      Done

      (4) Figure 6B (bottom panels): why does it seem that there is a lot of mNeonGreen positive signal in regions that are not positive for HA? Shouldn't complementation keep HA in the complemented protein?

      Our assumption is that there is an excess of receptor protein (HA tag) over reconstituted protein (GFP protein) given the relatively low concentration of toxin being internalized and translocated.

      References:

      (1) Pirazzini M, Azarnia Tehran D, Leka O, Zanetti G, Rossetto O, Montecucco C. On the translocation of botulinum and tetanus neurotoxins across the membrane of acidic intracellular compartments. Biochim Biophys Acta. 2016 Mar;1858(3):467–474. PMID: 26307528

      (2) Pirazzini M, Rossetto O, Eleopra R, Montecucco C. Botulinum Neurotoxins: Biology, Pharmacology, and Toxicology. Pharmacol Rev. 2017 Apr;69(2):200–235. PMCID: PMC5394922

      (3) Dong M, Masuyer G, Stenmark P. Botulinum and Tetanus Neurotoxins. Annu Rev Biochem. Annual Reviews; 2019 Jun 20;88(1):811–837.

      (4) Rossetto O, Pirazzini M, Fabris F, Montecucco C. Botulinum Neurotoxins: Mechanism of Action. Handb Exp Pharmacol. 2021;263:35–47. PMCID: 6671090

      (5) Williams JM, Tsai B. Intracellular trafficking of bacterial toxins. Curr Opin Cell Biol. 2016 Aug;41:51–56. PMCID: PMC4983527

      (6) Mesquita FS, van der Goot FG, Sergeeva OA. Mammalian membrane trafficking as seen through the lens of bacterial toxins. Cell Microbiol. 2020 Apr;22(4):e13167. PMCID: PMC7154709

      (7) Hoch DH, Romero-Mira M, Ehrlich BE, Finkelstein A, DasGupta BR, Simpson LL. Channels formed by botulinum, tetanus, and diphtheria toxins in planar lipid bilayers: relevance to translocation of proteins across membranes. Proc Natl Acad Sci U S A. 1985 Mar;82(6):1692–1696. PMCID: PMC397338

      (8) Donovan JJ, Middlebrook JL. Ion-conducting channels produced by botulinum toxin in planar lipid membranes. Biochemistry. 1986 May 20;25(10):2872–2876. PMID: 2424493

      (9) Fischer A, Montal M. Single molecule detection of intermediates during botulinum neurotoxin translocation across membranes. Proc Natl Acad Sci U S A. 2007 Jun 19;104(25):10447–10452. PMCID: PMC1965533

      (10) Fischer A, Nakai Y, Eubanks LM, Clancy CM, Tepp WH, Pellett S, Dickerson TJ, Johnson EA, Janda KD, Montal M. Bimodal modulation of the botulinum neurotoxin protein-conducting channel. Proc Natl Acad Sci U S A. 2009 Feb 3;106(5):1330–1335. PMCID: PMC2635780

      (11) Fischer A, Montal M. Crucial role of the disulfide bridge between botulinum neurotoxin light and heavy chains in protease translocation across membranes. J Biol Chem. 2007 Oct 5;282(40):29604–29611. PMID: 17666397

      (12) Koriazova LK, Montal M. Translocation of botulinum neurotoxin light chain protease through the heavy chain channel. Nature structural biology. 2003. p. 13–18. PMID: 12459720

      (13) Moreau D, Kumar P, Wang SC, Chaumet A, Chew SY, Chevalley H, Bard F. Genome-wide RNAi screens identify genes required for Ricin and PE intoxications. Dev Cell. 2011 Aug 16;21(2):231–244. PMID: 21782526

      (14) Bassik MC, Kampmann M, Lebbink RJ, Wang S, Hein MY, Poser I, Weibezahn J, Horlbeck MA, Chen S, Mann M, Hyman AA, Leproust EM, McManus MT, Weissman JS. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell. 2013 Feb 14;152(4):909–922. PMCID: PMC3652613

      (15) Tian S, Muneeruddin K, Choi MY, Tao L, Bhuiyan RH, Ohmi Y, Furukawa K, Furukawa K, Boland S, Shaffer SA, Adam RM, Dong M. Genome-wide CRISPR screens for Shiga toxins and ricin reveal Golgi proteins critical for glycosylation. PLoS Biol. 2018 Nov;16(11):e2006951. PMCID: PMC6258472

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Importantly, it would be useful to have provided more detailed information on the structure and histological properties of the murine cysts and how such findings relate to human lung cysts. Also, the authors should examine whether there is any information on Bmpr1a in human cyst formation (i.e GWAS data).

      We fully agree that it is important to examine Bmpr1a in human cyst pathology. Unfortunately, there are no GWAS data on this. Published RNA-seq data obtained from postnatal lung specimens of congenital pulmonary airway malformation (CPAM) patients reported “integrated suppression of BMP signaling pathway”, although altered expression of BMPR1A was not presented. We speculate that (1) BMPR1A is critical in embryonic development, and a germline deficiency of BMPR1A may lead to early embryonic lethality prior to lung formation, as supported by mouse data; (2) as suggested by our previously published study on TGF-beta signaling and prenatal pulmonary cysts (Miao et al., Am J Physiol Lung Cell Mol Physiol 2021), dysregulation of BMPR1A-mediated signaling in a particular time window of fetal lung development may be sufficient to cause cyst formation, so that the BMPR1A alteration may not persist in postnatal lung specimens.

      (2) Throughout the paper, there is a lack of quantification for the histological findings. Littermate controls should also be clearly defined genetically.

      We thank the reviewer for this suggestion and acknowledge the importance of quantitative measurement for the changes. We now add quantitative data on branching number and size of the airway tips to define the difference between wild-type and Bmpr1a CKO mouse lungs in Fig.1. “The littermate controls were the mice without any gene deletion due to lack of transgenes Tbx4-rtTA and/or TetO-Cre”, which is now added in Materials and Methods.

      (3) Figure 1 suppl: "Doxycycline" is misspelled.

      This has been corrected.

      (4) Figure1c Suppl: Hard to discern clear-cut expression of Bmpr1a protein in mesenchyme in WT. Comparable images with similar sizes of airways should be used.

      To provide a clearer comparison of Bmpr1a expression patterns between Bmpr1a CKO and control lungs, we have enlarged the fluorescent stained lungs presented in Supplemental Figure 1C as suggested by the editor. Additionally, dotted lines have been added to delineate the airway boundaries from the surrounding mesenchyme to better visualize the Bmpr1a distribution in lung mesenchyme. Bmpr1a expression in fetal lung mesenchyme is easily detected at E15.5, when significant dilation of airways is present in the Bmpr1a CKO lung. It is rare to find peripheral airways of comparable size in the Bmpr1a CKO lung at this point.

      (5) Figure 2a: Expression of several genes studied and altered should be identified on scatter plot.

      As suggested by the reviewer, we now highlight the related genes, including Acta2, Myocd, Eln, Bmp4, Sox2, etc., in the scatter plot. In addition, we also highlight these critical genes in the heatmap (Fig. 2B and Fig. 7B).

      (6) Figure 2c: Authors should also consider staining for other smooth muscle markers.

      We now include a panel of Myh11 immunostaining in Figure 2E. Myh11 is another common marker for smooth muscle cells. Lack of Myh11 staining in Bmpr1a CKO lung airways further supports our conclusion that loss of mesenchymal Bmpr1a leads to defective airway smooth muscle growth.

      (7) Figure 3: ELN expression should be defined in a clear quantitative manner.

      We have presented RNA-seq data, real-time PCR results, immunostaining, and western blot data for in vivo samples. Additionally, we have included an in vitro experiment illustrating that Bmp4 induces Eln expression, suggesting that BMP signaling regulates Eln expression. We believe that these datasets collectively support our conclusion.

      (8) Figure 4: Additional information on p38 dependent signaling (Including in vivo studies) would potentially help to understand key molecular events and perhaps could help to address key mechanistic events, including their location and identity.

      We sincerely appreciate the insightful suggestion from the reviewer. While the study of p38-dependent signaling is definitely important to dissect the entire mechanisms, we are not going to include such experiments in this manuscript due to time constraints associated with in vivo studies.

      (9) Figure 6: Would be helpful to know whether Bmpr1a receptor is expressed in Myocd KO.

      Bmpr1a expression is not changed in Myocd KO lungs, which is now included as Figure 6C. Together with other data, this suggests that Myocd is a downstream target directly mediating Bmpr1a-regulated airway smooth muscle development.

      (10) Figure 7: Not clear how these findings, though interesting, relate to the body of studies and the pathogenesis of cyst formation. Other points: 1) The authors should re-examine/repeat co-staining in the KO mouse lung (right 2 images in the top group of 4) for Foxj1, Sox2, and CDH (right 2 images, Figure 7A). For one thing, the cadherin stain in the 2 KO images seems localized to the lumen. Secondly, the pattern of cadherin staining looks exactly the same in both KO images, suggesting an error and/or duplication 2) authors should place arrows on the heat map showing the location of SPC, Sox2, Sox9, and FoxJ1 bands 3) figure 7D graph needs numbers on y axis.

      Fig.7 provides an additional potential mechanism by which deficient Bmp signaling leads to abnormally increased Bmp ligand expression, which disrupts the formation of epithelial proximal-distal axis, and results in cystic defects. Further in vivo experiments are needed to test this, which is beyond the scope of this paper.

      The E-cadherin staining signal in the lumen is caused by the tissue section positioned at an interface between lumen and the apical membrane of the lining epithelial cells where the E-cadherin is localized.

      Triple immunostaining of E-Cadherin, Sox2, and FoxJ1 was performed on the same tissue section (upper two panels of Figure 7A), as these antibodies were derived from different species, but the images are presented in two different combinations for simplicity and clarity. For the lower two panels of Figure 7A, double immunostaining of Sox9/E-Cadherin and Spc/E-Cadherin was performed separately on different tissue sections because both the anti-Sox9 and anti-Spc antibodies were raised in rabbits.

      The genes listed in the heatmap are canonical and putative marker genes for differentiated lung epithelial cell lineages, such as Scgb1a1 for Clara cells and FoxJ1 for ciliated cells. Therefore, the progenitor cell markers Sox2 and Sox9 were not included. In the updated heatmap, four widely acknowledged epithelial cell markers (Scgb1a1, FoxJ1, Sftpb, and Sftpc) are highlighted in a distinct font color (red) to enhance their visibility.

      Label for the y axis of Fig.7D is now added.

      Reviewer #2 (Public Review):

      (1) The authors may be aware that a recent paper (https://doi.org/10.1038/s41598-022-24858-3) reported on transcriptional changes seen in human CPAM. It would seem that some of the molecular changes seen in human CPAM move in the opposite direction of what is reported in mice lacking mesenchymal Bmpr1a. Perhaps the authors could comment on these differences in the discussion and whether they potentially explain the etiology of CPAM or branching morphogenesis in general.

      We thank the reviewer for referring us to this paper on human CPAM. CPAM has a variety of histopathologies. Type 1 CPAM is assumed to develop from more proximal bronchial/bronchiolar airways, while type 2 CPAM develops from relatively distal bronchiolar airways. In that publication, surgically resected lung specimens were collected postnatally (0.5-1 year) from type 1 CPAM patients, in which the cysts were lined with ciliated pseudostratified columnar epithelial cells. Gene expression was compared between cystic lung tissues and adjacent non-cystic lung tissues. Interestingly, integrated suppression of the BMP signaling pathway was shown by their data analysis. In our mouse model, the histopathology resembles human type 2 CPAM, with back-to-back cysts lined with a simple layer of epithelial cells. Therefore, several factors could explain the differences between their published data and our study at the molecular level: (1) different types of CPAM based on the histopathology; (2) different sampling time points: developing cysts at the fetal stage in mouse samples vs. developed cysts in postnatal human samples; (3) different comparisons of diseased and normal tissues: separate normal lungs vs. cystic lungs in mice, but cystic tissues vs. non-cystic tissues from the same lungs in humans. We now include this reference in the Discussion.

      (2) Figure 4 shows that BMP4 increases SMADs, p38, and several muscle genes in mesenchymal cells. Figure 5 extends this finding with a clever strategy to label airway and vascular smooth muscle with different fluorescent molecules used to isolate different types of mesenchymal cells. It shows that non-vascular smooth muscle cells but not perivascular smooth muscles are responsive to BMP4 signaling as defined by increased expression of Myh11. Are there cell-restricted responses to the other genes shown in Figure 4? Given the lack of SMAD signaling and the increase seen in p38 signaling, would blocking p38 signaling influence the BMP responsiveness of these nonvascular smooth muscle cells?

      We thank the reviewer for this constructive comment. As addressed above, we will leave the question of p38-mediated signaling and cyst formation for a follow-up study due to the time constraints associated with these experiments.

      (3) Figure 6 shows that mesenchymal loss of Myocd causes a deficiency of airway smooth muscle cells, but this was not sufficient to create cysts. Did the authors ever check to see if it changed Sox2-Sox9 staining in the airway epithelium?

      There is no significant change in Sox2 expression in proximal airway epithelia of Myocd CKO lungs as detected by immunostaining. The result was not included in this manuscript.

      (4) Figure 7 shows that mesenchymal loss of Bmpr1a proximalizes the distal airway as defined by loss of Sox2 and FoxJ1 (a ciliated marker) and gain in (Sox9 and SP-C) staining. But Club cells expressing Scgb1a1 and Cyp2F2 are the predominant epithelial cells in the distal airway. The transcriptomics data in panel B shows expression of these genes is less in the mutant mice. Does this mean they fail to generate Club cells or there is just less expression per cell? In other words, what are the primary epithelial cells present in the airways of mice with loss of mesenchymal Bmpr1a?

      As shown in the heatmap of Fig.7b, the dysregulated gene expression in the Bmpr1a CKO extends beyond the featured epithelial cell markers, encompassing alterations in numerous putative marker genes. For example, several putative Club cell markers in addition to Scgb1a1 and Cyp2F2 were reduced in the Bmpr1a CKO lungs, suggesting a compromised differentiation of Club cells. Additionally, we observed upregulations of some molecular markers for distal progenitors and differentiated cells in the proximal region of airways, again suggesting a significant disruption in epithelial differentiation in the Bmpr1a CKO lungs. These abnormal cells can be further defined by a single cell transcriptomic approach in future.

      Recommendations for Authors:

      Reviewer #1 (Recommendations For The Authors):

      As discussed above, there may be an issue with the histological images and staining in 2 images in Figure 7A. The precise images, problems and suggestions to resolve the issue are in the Review.

      Please see our response to Reviewer 1 above.

      Reviewer #2 (Recommendations For The Authors):

      Minor Weaknesses:

      (1) Please enlarge the fluorescent stained lungs presented in Supplemental Figure 1C.

      We have revised this panel accordingly.

      (2) Figure 1D and E show that loss of Bmpr1a does not change proliferation or apoptosis on E15.5. Was that also seen through E18.5?

      We thank the reviewer for the thoughtful question about proliferation and apoptosis at later embryonic stages. Our focus here was to elucidate the mechanisms underlying abnormal branching morphogenesis and lung cyst initiation that occur prior to E15.5 in our model. Measuring the dynamic changes in cell proliferation and apoptosis at later timepoints will help to understand cyst progression, which will be our next focus.

      (3) BMP inhibitors used in Figure 4 show that BMP signaling regulates mesenchymal myogenesis independent of SMAD. But the experiments don't show how the inhibitors impact the control cells.

      We have examined the effects of the BMPR1 inhibitor LDN on the control cells. At the same dose (200 nM) and under the same serum-free culture conditions, LDN did not affect the basal level of BMP signaling (data not included) but blocked the elevation in signaling induced by exogenous BMP4 (Fig.4E).

      (4) Bmpr1a was deleted by administering doxycycline to pregnant dams prior to lung bud formation. It caused cystic disorders by disrupting proximal airspace. Could the authors speculate on why it does not impact tracheal and bronchiolar development? In other words, does the TBX4 promoter not target these cells? Do these cells not express Bmpr1a?

      The Tbx4 enhancer does target mesenchymal cells surrounding the trachea and bronchioles. Deletion of Bmpr1a in tracheal mesenchymal cells results in disruption of tracheal cartilage formation and smooth muscle differentiation. These phenotypes are evident in the gross view of lungs from E15.5 and later (Fig.1A). However, our manuscript focuses on the phenotype of prenatal lung cysts, and we have chosen not to include complex data on tracheal development.

    1. Author response:

      We would like to thank the reviewers for their helpful comments. We note that both reviews are strongly supportive with comments including, “a biophysical tour de force” (rev #1), “the study is exemplary” (rev #2), and “represents a roadmap for future work” (rev #2). Below we respond to each reviewer comment.

      Reviewer #1

      This study provides a detailed and quantitative description of the allosteric mechanisms resulting in the paradoxical activation of BRAF kinase dimers by certain kinase inhibitors. The findings provide a much-needed quantitative basis for this phenomenon and may lay the foundation for future drug development efforts aimed at the important cancer target BRAF. The study builds on very evidence obtained by multiple independent biophysical methods.

      Summary:

      The authors quantitatively describe the complex binding equilibria of BRAF and its inhibitors resulting in some cases in the paradoxical activation of BRAF dimer when bound to ATP competitive inhibitors. The authors use a biophysical tour de force involving FRET binding assays, NMR, kinase activity assays and DEER spectroscopy.

      We are gratified by the reviewer’s supportive summary.

      Strengths:

      The strengths of the study are the beautifully conducted assays that allow for a thorough characterization of the allostery in this complex system. Additionally, the use of F-NMR and DEER spectroscopy provide important insights into the details of the process. The resulting model for binding of inhibitors and dimerization (Fig.4) is very helpful.

      Weaknesses:

      This is a complex system and its communication is inherently challenging. It might be of interest to the broader readership to understand the implications of the model for drug development and therapy.

      We agree with the reviewer that this is a complicated system. With regard to inhibitor development, a key insight is that designing aC-in state inhibitors that avoid paradoxical activation may be non-trivial because these molecules not only induce dimers but also tend to bind the second dimer subunit more weakly than the first, due to allosteric asymmetry and/or inherently different affinities for each RAF isoform. We feel the full implications for future therapeutic development are an extensive topic that is beyond the scope of our work, which is focused on the properties of current inhibitors.

      Recommendations for the author:

      The experimental work, analysis and resulting model are excellent. I had some difficulty following the complex model in some instances and it may be useful to review the description of the model and see whether it can be made more palatable to the broader readership. I think it would be useful to discuss the model presented in reference 40 (Kholodenko) and to compare it to the presented model here.

      We regret any confusion regarding the nature of the model. Our analysis was built upon the model developed by Boris Kholodenko, as reported in his 2015 Cell Reports paper. This formed the theoretical framework that, combined with our experimental data, allowed us to parameterize the model and obtain experimental values for the equilibrium constants and allosteric coupling factors.

      Reviewer #2

      This manuscript combines elegant biophysical solution measurements to address paradoxical kinase activation by Type II BRAF inhibitors. The novel findings challenge prevailing models, through experiments that are rigorous and carefully controlled. The study is exemplary in the breadth of strategies it uses to address protein kinase dynamics and inhibitor allostery.

      Summary:

      This manuscript uses FRET, 19F-NMR and DEER/EPR solution measurements to examine the allosteric effects of a panel of BRAF inhibitors (BRAFi). These include first-generation aC-out BRAFi, and more recent Type I and Type II aC-in inhibitors. Intermolecular FRET measurements quantify Kd for BRAF dimerization and inhibitor binding to the first and second subunits. Distinct patterns are found between aC-in BRAFi, where Type I BRAFi bind equally well to the first and second subunits within dimeric BRAF. In contrast, Type II BRAFi show stronger affinity for the first subunit and weaker affinity for the second subunit, an effect named "allosteric asymmetry". Allosteric asymmetry has the potential for Type II inhibitors to promote dimerization while favoring occupancy of only one subunit (BBD form), leading to enrichment of an active dimer.

      Measurements of in vitro BRAF kinase activity correlate amazingly well with the calculated amounts of the half site-inhibited BBD forms with Type II inhibitors. This suggests that the allosteric asymmetry mechanism explains paradoxical activation by this class of inhibitors. DEER/EPR measurements further examine the positioning of helix aC. They show systematic outward movement of aC with Type II inhibitors, relative to the aC-in state with Type I inhibitors, and further show that helix aC adopts multiple states and is therefore dynamic in apo BRAF. This makes a strong case that negative cooperativity between sites in the BRAF dimer can account for paradoxical kinase activation by Type II inhibitors by creating a half site-occupied homodimer, BBD. In contrast, Type I inhibitors and aC-out inhibitors do not fit this model, and are therefore proposed to be explained by previously proposed models involving negative allostery between subunits in BRAF-CRAF heterodimers, RAS priming, and transactivation.

      Strengths:

      This study integrates orthogonal spectroscopic and kinetic strategies to characterize BRAF dynamics and determine how it impacts inhibitor allostery. The unique combination of approaches presented in this study represents a road map for future work in the important area of protein kinase dynamics. The work represents a worthy contribution not only to the field of BRAF regulation but protein kinases in general.

      Weaknesses:

      Some questions remain regarding the proposed model for Type II inhibitors and its comparison to Type I and aC-out inhibitors that would be useful to clarify. Specifically, it would be helpful to address whether the activation of BRAF by Type II inhibitors, while strongly correlated with BBD model predictions in vitro, also depends on CRAF via BRAF-CRAF in cells and therefore overlaps with the mechanisms of paradoxical activation by Type I and aC-out inhibitors.

      We agree with the reviewer that this is a worthy question to be pursued. However, given the substantial experimental effort required for such an endeavor, and the highly supportive nature of the reviewer comments, including that “This is a strong manuscript that I feel is well above the bar for publication”, we believe this effort is more appropriate for a future study.

      This is a strong manuscript that I feel is well above the bar for publication. Nevertheless, it is recommended that the authors consider addressing the following points in order to support their major conclusions.

      (1) Fig 3D shows similar effects of Type II and Type I inhibitors in the biphasic increase of cellular pMEK/pERK. From this, the authors argue that Type II inhibitors are explained by negative allostery in the BRAF homodimer (based on Fig 2E), while Type I inhibitors are not. But it seems possible that despite the terrific correlation between BBD and BRAF kinase activities measured in vitro, CRAF is still important to explain pathway activation in cells. It also seems conceivable that the calculated %BBD between different Type II inhibitors may not correlate as well with their effects on pathway activation in cells. These possibilities should be addressed.

      We agree with the reviewer that it is likely that CRAF contributes to paradoxical activation by type II inhibitors in cells. It is also likely that other cellular factors such as RAS-priming and membrane recruitment play a role in activation. However, we note that for the type II inhibitors there is good agreement between the biophysical predictions and the concentration regimes in which activation is observed in cells, suggesting that these predictions are capturing a key part of the activation process that occurs in cells.

      (2) In Fig 2A, is it possible to report the activity of dimeric BRAF-WT in the absence of inhibitor? This would help confirm that the maximal activity measured after titrating inhibitor is indeed consistent with the predicted %BBD population, which would be expected to have half of the specific activity of BB.

      In principle, it is possible to determine the catalytic activity of apo dimers (BB) by combining our model predictions for the concentration of BB dimers and our activity measurements. However, because the activity assays are performed at nanomolar kinase concentrations, whereas the baseline dimerization affinity of BRAF is in the micromolar range, the observed activity of apo BRAF arises from a small subpopulation of dimers (on the order of 4 percent under the conditions of our experiments) and is therefore difficult to define accurately. As a result, we deemed it more suitable to compare our results to published activity measurements derived from 14-3-3-activated dimers which should represent fully dimerized BRAF. This analysis, as reported in Figure 2E, suggests that the BBD activity is approximately half of that of BB.
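      The argument above (nanomolar kinase, micromolar dimerization affinity, hence only a small dimer subpopulation) can be illustrated with a minimal sketch of a simple monomer-dimer equilibrium. This is not the authors' full model, which also includes inhibitor binding and allosteric coupling; the function name and the concentrations used below are hypothetical values chosen only to fall in the stated nanomolar and micromolar ranges.

```python
import math


def dimer_fraction(total_nM: float, kd_nM: float) -> float:
    """Fraction of protomers found in dimers for the equilibrium 2M <-> D.

    Assumes a simple self-association with Kd = [M]^2 / [D] and no
    inhibitor present (an illustrative simplification, not the full model).
    """
    if total_nM <= 0 or kd_nM <= 0:
        raise ValueError("concentrations must be positive")
    # Mass balance: total = [M] + 2[D], with [D] = [M]^2 / Kd.
    # This gives 2[M]^2/Kd + [M] - total = 0; take the positive root.
    monomer = kd_nM / 4.0 * (math.sqrt(1.0 + 8.0 * total_nM / kd_nM) - 1.0)
    dimer = monomer ** 2 / kd_nM
    return 2.0 * dimer / total_nM


if __name__ == "__main__":
    # Hypothetical inputs: 50 nM total kinase, 2 uM dimerization Kd.
    for total in (10.0, 50.0, 200.0):
        frac = dimer_fraction(total, kd_nM=2000.0)
        print(f"total = {total:6.1f} nM -> {100 * frac:.1f}% of protomers in dimers")
```

      With inputs of this kind, only a few percent of protomers are dimeric, which is why a specific activity for apo dimers is hard to extract from measurements made at nanomolar kinase concentrations.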

      (3) The 19F-NMR experiments make a good case for broadening of the helix aC signal in the BRAF dimer. From this, the study proposes that after inhibitor binds one subunit, the second unoccupied subunit retains dynamics. It would be useful to address this experimentally, if possible. For example, can the 19F-NMR signal be measured in the presence of inhibitor, to support the prediction that the unoccupied subunit is indeed dynamic and samples multiple conformations as in apo BRAF?

      We agree with the reviewer that it would be interesting to determine the dynamic response of BRAF to inhibitor binding. However, this is a challenging undertaking due to the biochemical heterogeneity that occurs at subsaturating inhibitor concentrations. For example, at any given inhibitor concentration, BRAF exists as a mixture of monomers, apo dimers, dimers with one inhibitor molecule bound, and dimers with two inhibitor molecules bound. This makes it challenging to relate the 19F NMR signal to a single biochemical state. Addressing this would require a substantial experimental effort that we feel is beyond the scope of this study.

    1. Author response:

      Reviewer 1:

      The paper “Quantifying gliding forces of filamentous cyanobacteria by self-buckling” combines experiments on freely gliding cyanobacteria, buckling experiments using two-dimensional V-shaped corners, and micropipette force measurements with theoretical models to study gliding forces in these organisms. The aim is to quantify these forces and use the results to perhaps discriminate between competing mechanisms by which these cells move. A large data set of possible collision events is analyzed, buckling events evaluated, and critical buckling lengths estimated. A line elasticity model is used to analyze the onset of buckling and estimate the effective (viscous type) friction/drag that controls the dynamics of the rotation that ensues post-buckling. This value of the friction/drag is compared to a second estimate obtained by consideration of the active forces and speeds in freely gliding filaments. The authors find that these two independent estimates of friction/drag correlate with each other and are comparable in magnitude. The experiments are conducted carefully, the device fabrication is novel, the data set is interesting, and the analysis is solid. The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion. While consistent with the data, this conclusion is inferred.

      We thank the reviewer for the positive evaluation of our work.

      Summary:

      The paper addresses important questions on the mechanisms driving the gliding motility of filamentous cyanobacteria. The authors aim to understand these by estimating the elastic properties of the filaments, and by comparing the resistance to gliding under a) freely gliding conditions, and b) in post-buckled rotational states. Experiments are used to estimate the propulsion force density on freely gliding filaments (assuming over-damped conditions). Experiments are combined with a theoretical model based on Euler beam theory to extract friction (viscous) coefficients for filaments that buckle and begin to rotate about the pinned end. The main results are estimates for the bending stiffness of the bacteria, the propulsive tangential force density, the buckling threshold in terms of the length, and estimates of the resistive friction (viscous drag) providing the dissipation in the system and balancing the active force. It is found that experiments on the two bacterial species yield nearly identical values of f (albeit with rather large variations). The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion.

      We appreciate this comprehensive summary of our work.

      Strengths of the paper:

      The strengths of the paper lie in the novel experimental setup and measurements that allow for the estimation of the propulsive force density, critical buckling length, and effective viscous drag forces for movement of the filament along its contour – the axial (parallel) drag coefficient, and the normal (perpendicular) drag coefficient (I assume this is the case, since the post-buckling analysis assumes the bent filament rotates at a constant frequency). These direct measurements are important for serious analysis and discrimination between motility mechanisms.

      We thank the reviewer for this positive assessment of our work.

      Weaknesses:

      There are aspects of the analysis and discussion that may be improved. I suggest that the authors take the following comments into consideration while revising their manuscript.

      The conclusion that adhesion via focal adhesions is the cause for propulsion rather than slime protrusion is consistent with the experimental results that the frictional drag correlates with propulsion force. At the same time, it is hard to rule out other factors that may result in this (friction) viscous drag - (active) force relationship while still being consistent with slime production. More detailed analysis aiming to discriminate between adhesion vs slime protrusion may be outside the scope of the study, but the authors may still want to elaborate on their inference. It would help if there was a detailed discussion on the differences in terms of the active force term for the focal adhesion-based motility vs the slime motility.

      We appreciate this critical assessment of our conclusions. Of course we are aware that many different mechanisms may lead to similar force/friction characteristics, and that a definitive conclusion on the mechanism would require the combination of various techniques, which is beyond the scope of this work. Therefore, we were very careful in formulating the discussion of our findings, refraining, in particular, from a singular conclusion on the mechanism but instead indicating “support” for one hypothesis over another, and emphasizing “that many other possibilities exist”.

      The most common competing hypotheses for bacterial gliding suggest that either slime extrusion at the junctional pore complex [A1], rhythmic contraction of fibrillar arrays at the cell wall [A2], focal adhesion sites connected to intracellular motor-microtubule complexes [A3], or modified type-IV pilus apparatuses [A4] provide the propulsion forces. For the slime extrusion hypothesis, which is still widespread today, one would rather expect an anticorrelation of force and friction: more slime extrusion would generate more force, but also enhance lubrication. The other hypotheses are more consistent with the trend we observed in our experiments, because both pili and focal adhesion require direct contact with a substrate. How contraction of fibrillar arrays would micromechanically couple to the environment is not clear to us, but direct contact might still facilitate force transduction. Please note that these hypotheses were all postulated without any mechanical measurements, solely based on ultra-structural electron microscopy and/or genetic or proteomic experiments. We see our work as complementary to that, providing a mechanical basis for evaluating these hypotheses.

      We agree with the referee that narrowing down this discussion to focal adhesion should have been avoided. We rewrote the concluding paragraph (page 8):

      “…it indicates that friction and propulsion forces, despite being quite variable, correlate strongly. Thus, generating more force comes, inevitably, at the expense of added friction. For lubricated contacts, the friction coefficient is proportional to the thickness of the lubricating layer (Snoeijer et al., 2013), and we conjecture active force and drag both increase due to a more intimate contact with the substrate. This supports mechanisms like focal adhesion (Mignot et al., 2007) or a modified type-IV pilus (Khayatan et al., 2015), which generate forces through contact with extracellular surfaces, as the underlying mechanism of the gliding apparatus of filamentous cyanobacteria: more contacts generate more force, but also closer contact with the substrate, thereby increasing friction to the same extent. Force generation by slime extrusion (Hoiczyk and Baumeister, 1998), in contrast, would lead to the opposite behavior: More slime generates more propulsion, but also reduces friction. Besides fundamental fluid-mechanical considerations (Snoeijer et al., 2013), this is rationalized by two experimental observations: i. gliding velocity correlates positively with slime layer thickness (Dhahri et al., 2013) and ii. motility in slime-secretion deficient mutants is restored upon exogenous addition of polysaccharide slime. Still we emphasize that many other possibilities exist. One could, for instance, postulate a regulation of the generated forces to the experienced friction, to maintain some preferred or saturated velocity.”

      Can the authors comment on possible mechanisms (perhaps from the literature) that indicate how isotropic friction may be generated in settings where focal adhesions drive motility? A key aspect here would probably be estimating the extent of this adhesion patch and comparing it to a characteristic contact area. Can lubrication theory be used to estimate characteristic areas of contact (knowing the radius of the filament, and assuming a height above the substrate)? If the focal adhesions typically cover areas smaller than this lubrication area, it may suggest the possibility that bacteria essentially present a flat surface insofar as adhesion is concerned, leading to a transversely isotropic response in terms of the drag. Of course, we will still require the effective propulsive force to act along the tangent.

      We thank the referee for suggesting that we estimate the dimensions of the contact region. Both pili and focal adhesion sites would be of sizes below one micron [A3, A4], much smaller than the typical contact region in the lubricated contact, which is on the order of the filament radius (a few microns). So indeed, isotropic friction may be expected in this situation [A5] and is assumed frequently in theoretical work [A6–A8]. Anisotropy may then indeed be induced by active forces [A9], but we are not aware of measurements of the anisotropy of friction in bacterial gliding.

      For a more precise estimate using lubrication theory, rheology and extrusion rate of the secreted polysaccharides would have to be known, but we are not aware of detailed experimental characterizations.

      We extended the paragraph in the buckling theory on page 5 regarding the assumption of isotropic friction:

      “We use classical Kirchhoff theory for a uniform beam of length L and bending modulus B, subject to a force density $\vec{b} = -f\,\vec{t} - \eta\,\vec{v}$, with an effective active force density f along the tangent $\vec{t}$, and an effective friction proportional to the local velocity $\vec{v}$, analogous to existing literature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). Presumably, this friction is dominated by the lubrication drag from the contact with the substrate, filled by a thin layer of secreted polysaccharide slime which is much more viscous than the surrounding bulk fluid. Speculatively, the motility mechanism might also comprise adhering elements like pili (Khayatan et al., 2015) or foci (Mignot et al., 2007) that increase the overall friction (Pompe et al., 2015). Thus, the drag due to the surrounding bulk fluid can be neglected (Man and Kanso, 2019), and friction is assumed to be isotropic, a common assumption in motility models (Fei et al., 2020; Tchoufag et al., 2019; Wada et al., 2013). We assume…”

      We also extended the discussion regarding the outcome of isotropic friction (page 7):

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      I am not sure why the authors mention that the power of the gliding apparatus is not rate-limiting. The only way to verify this would be to put these in highly viscous fluids where the drag of the external fluid comes into the picture as well (if focal adhesions are on the substrate-facing side, and the upper side is subject to ambient fluid drag). Also, the friction referred to here has the form of a viscous drag (no memory effect, and thus not viscoelastic or gel-like), and it is not clear if forces generated by adhesion involve other forms of drag such as chemical friction via temporary bonds forming and breaking. In quasi-static settings and under certain conditions such as the separation of chemical and elastic time scales, bond friction may yield overall force proportional to local sliding velocities.

      We agree with the referee that the origin of the friction is not easily resolved. Lubrication yields an isotropic force density that is proportional to the velocity, and the same could be generated by bond friction. Importantly, both types of friction would be assumed to be predominantly isotropic. We explicitly referred to lubrication drag because it has been shown that mutants deficient in slime extrusion do not glide [A4].

      Assuming, in contrast, that in free gliding, friction with the environment is not rate limiting, but rather the internal friction of the gliding apparatus, i.e., the available power, we would expect a rather different behavior during early-buckling evolution. During early buckling, the tangential motion is stalled, and the dynamics is dominated by the growing buckling amplitude of filament regions near the front end, which move mainly transversely. For geometric reasons, in this stage the (transverse) buckling amplitude grows much faster than the rear part of the filament advances longitudinally. Thus that motion should not be impeded much by the internal friction of the gliding apparatus, but by external friction between the buckling parts of the filament and the ambient. The rate at which the buckling amplitude initially grows should be limited by the accumulated compressive stress in the filament and the transverse friction with the substrate. If the latter were much smaller than the (longitudinal) internal friction of the gliding apparatus, we would expect a snapping-like transition into the buckled state, which we did not observe.

      In our paper, we do not intend to evaluate the exact origin of the friction; quantifying the gliding force is the main objective. A linear force-velocity relation agrees with our observations. A detailed analysis of friction in cyanobacterial gliding would be an interesting direction for future work.

      To make these considerations clearer, we rephrased the corresponding paragraph on pages 7 and 8:

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      For readers from a non-fluids background, some additional discussion of the drag forces, and the forms of friction would help. For a freely gliding filament if f is the force density (per unit length), then steady gliding with a viscous frictional drag would suggest (as mentioned in the paper) f ∼ v! L η||. The critical buckling length is then dependent on f and on B the bending modulus. Here the effective drag is defined per length. I can see from this that if the active force is fixed, and the viscous component resulting from the frictional mechanism is fixed, the critical buckling length will not depend on the velocity (unless I am missing something in their argument), since the velocity is not a primitive variable, and is itself an emergent quantity.

      We are not sure what “f ∼ v! L η||” means, possibly the spelling was corrupted in the forwarding of the comments.

      We assumed an overdamped motion in which the friction force density $f_f$ (per unit length of the filament) is proportional to the velocity v0, i.e. $f_f \sim \eta v_0$, with a friction coefficient η. Overdamped means that the friction force density is equal and opposite to the propulsion force density, so the propulsion force density is $f \sim f_f \sim \eta v_0$. The total friction and propulsion forces can be obtained by multiplication with the filament length L, which is not required here. In this picture, v0 is an emergent quantity and f and η are assumed as given and constant. Thus, by observing v0, f can be inferred up to the friction coefficient η. Therefore, by using two descriptive variables, L and v0, with known B, the primitive variable η can be inferred by logistic regression, and f then follows from the overdamped equation of motion.

      To clarify this, we revised the corresponding section on page 5 of the paper:

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an over-damped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed $L_c \sim (f/B)^{-1/3}$ and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting $x = (L - L_c(v_0))/\Delta L_c$ in Equation 1, with $L_c(v_0) = (\eta v_0/(30.5722\,B))^{-1/3}$, to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E,F show the buckling behavior…”
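      As a hedged illustration of the regression variable described in the quoted paragraph, the following Python sketch evaluates Lc(v0) and a logistic buckling probability. Equation 1 is not reproduced in this response, so the standard logistic form p = 1/(1 + exp(−x)) is an assumption; the function names are hypothetical, and any consistent unit system may be used (for example µm, µm/s, nN s/µm² for η and nN µm² for B).

```python
import math

# Dimensionless buckling threshold of f * L**3 / B quoted in the text.
BUCKLING_CONSTANT = 30.5722


def critical_length(eta, v0, bending_modulus):
    """Critical length Lc(v0) = (eta * v0 / (30.5722 * B)) ** (-1/3)."""
    force_density = eta * v0  # overdamped balance f = eta * v0
    return (BUCKLING_CONSTANT * bending_modulus / force_density) ** (1.0 / 3.0)


def buckling_probability(length, lc, delta_lc):
    """Assumed logistic form p = 1 / (1 + exp(-x)), x = (L - Lc) / delta_Lc."""
    x = (length - lc) / delta_lc
    return 1.0 / (1.0 + math.exp(-x))
```

      In this sketch, a filament with L = Lc(v0) buckles with probability 1/2, and an eightfold increase in v0 at fixed η and B only halves Lc, which reflects the weak velocity dependence discussed in the reply.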

      Reviewer 2:

      In the presented manuscript, the authors first use structured microfluidic devices with gliding filamentous cyanobacteria inside in combination with micropipette force measurements to measure the bending rigidity of the filaments.

      Next, they use triangular structures to trap the bacteria with the front against an obstacle. Depending on the length and rigidity, the filaments buckle under the propulsive force of the cells. The authors use theoretical expressions for the buckling threshold to infer propulsive force, given the measured length and stiffnesses. They find nearly identical values for both species, f ∼ (1.0 ± 0.6) nN/µm, nearly independent of the velocity.

      Finally, they measure the shape of the filament dynamically to infer friction coefficients via Kirchhoff theory. This last part seems a bit inconsistent with the previous inference of propulsive force. Before, they assumed the same propulsive force for all bacteria and showed only a very weak correlation between buckling and propulsive velocity. In this section, they report a strong correlation with velocity, and report propulsive forces that vary over two orders of magnitude. I might be misunderstanding something, but I think this discrepancy should have been discussed or explained.

      We regret the misunderstanding of the reviewer regarding the velocity dependence, which indicates that the manuscript should be improved to convey these relations correctly.

      First, in the Buckling Measurements section, we did not assume the same propulsion force for all bacteria. The logistic regression yields an ensemble median for Lc (and thus an ensemble median for f ), along with the width ∆Lc of the distribution (and thus also the width of the distribution of f ). Our result f ∼ (1.0 ± 0.6) nN/µm indicates the median and the width of the distribution of the propulsion force densities across the ensemble of several hundred filaments used in the buckling measurements. The large variability of the forces found in the second part is consistently reflected by this very wide distribution of active forces detected in the logistic regression in the first part.

      We made small modifications to the buckling theory paragraph to clarify that in the first part, a distribution of forces rather than a constant value is inferred (page 6):

      “Inserting the population median and quartiles of the distributions of bending modulus and critical length, we can now quantify the distribution of the active force density for the filaments in the ensemble from the buckling measurements. We obtain nearly identical values for both species, f ∼ (1.0±0.6) nN/µm, where the uncertainty represents a wide distribution of f across the ensemble rather than a measurement error.”
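      The inference step in the quoted passage amounts to inverting the threshold relation Lc = (30.5722 B/f)^(1/3) for f. A minimal Python sketch under that assumption follows; the function name and the numerical inputs are hypothetical placeholders, not values from the manuscript.

```python
# Dimensionless buckling threshold of f * L**3 / B quoted in the text.
BUCKLING_CONSTANT = 30.5722


def force_density_from_buckling(bending_modulus, critical_length):
    """Invert Lc = (30.5722 * B / f) ** (1/3) to obtain f = 30.5722 * B / Lc**3."""
    return BUCKLING_CONSTANT * bending_modulus / critical_length ** 3


# Hypothetical inputs, for illustration only: B in nN µm^2, Lc in µm.
example_B = 1.0e4
example_Lc = 100.0
example_f = force_density_from_buckling(example_B, example_Lc)  # in nN/µm
```

      Because f scales with the inverse cube of Lc, a modest spread in the critical length translates into a much wider spread in the inferred force density.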

      The same holds, of course, when inferring the distribution of the friction coefficients (page 5):

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an over-damped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed $L_c \sim (f/B)^{-1/3}$ and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting $x = (L - L_c(v_0))/\Delta L_c$ in Equation 1, with $L_c(v_0) = (\eta v_0/(30.5722\,B))^{-1/3}$, to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E,F show the buckling behavior…”

      The (naturally) wide distribution of force (and friction) leads to a distribution of Lc as well. However, due to the small exponent of 1/3 in the buckling threshold $L_c \sim f^{-1/3}$, the distribution of Lc is not as wide as the distributions of the individually inferred f or η. This is visualized in panel G of Figure 3, plotting Lc as a function of v0 (v0 is equivalent to f, up to a proportionality coefficient η). The natural length distribution, in contrast, is very wide. Therefore, the buckling propensity of a filament is most strongly characterized by its length, while force variability, which alters Lc of the individual, plays a secondary role.
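      To illustrate how strongly the cube root compresses the spread, one can read the reported f ∼ (1.0 ± 0.6) nN/µm as a range from 0.4 to 1.6 nN/µm. This reading is an assumption made only for this worked example, and B is held fixed:

```latex
\[
  \frac{L_c(f_{\mathrm{low}})}{L_c(f_{\mathrm{high}})}
  = \left(\frac{f_{\mathrm{high}}}{f_{\mathrm{low}}}\right)^{1/3}
  = \left(\frac{1.6}{0.4}\right)^{1/3}
  = 4^{1/3} \approx 1.59 .
\]
```

      A fourfold variation in the force density thus shifts the critical length by less than a factor of two.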

      In order to clarify this, we edited the last paragraph of the Buckling Measurements section on page 5 of the manuscript:

      “…Within the characteristic range of observed velocities (1 − 3 µm/s), the median Lc depends only mildly on v0, as compared to its rather broad distribution, indicated by the bands in Figure 3 G. Thus a possible correlation between f and v0 would only mildly alter Lc. The natural length distribution (cf. Appendix 1—figure 1 ), however, is very broad, and we conclude that growth rather than velocity or force distributions most strongly impacts the buckling propensity of cyanobacterial colonies. Also, we hardly observed short and fast filaments of K. animale, which might be caused by physiological limitations (Burkholder, 1934 ).”

      Second, in the Profile analysis section, we did not report a correlation between force and velocity. As can be seen in Figure 4—figure Supplement 1, neither the active force nor the friction coefficient, as determined from the analysis of individual filaments, show any significant correlation with the velocity. This is also written in the discussion (page 7):

      “We see no significant correlation between L or v0 and f or η, but the observed values of f and η cover a wide range (Figure 4 B, C and Figure 4—figure Supplement 1).”

      Note that this is indeed consistent with the logistic regression: Using v0 as a second regressor did not significantly reduce the width of the distribution of Lc as compared to the simple logistic regression, indicating that force and velocity are not strongly correlated.

      In order to clarify this in the manuscript, we modified that part (page 7):

      “…We see no significant correlation between L or v0 and f or η, but the observed values of f and η cover a wide range (Figure 4 B, C and Figure 4—figure Supplement 1). This is consistent with the logistic regression, where using v0 as a second regressor did not significantly reduce the width of the distribution of critical lengths or active forces. The two estimates of the friction coefficient, from logistic regression and individual profile fits, are measured in (predominantly) orthogonal directions: tangentially for the logistic regression where the free gliding velocity was used, and transversely for the evolution of the buckling profiles. Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic…”

      From a theoretical perspective, not many new results are presented. The authors repeat the well-known calculation for filaments buckling under propulsive load and arrive at the literature result of buckling when the dimensionless number $f L^3/B$ is larger than 30.6, as previously derived by Sekimoto et al. in 1995 [1] (see [2] for a clamped boundary condition and simulations). Other theoretical predictions for pushed semi-flexible filaments [1–4] are not discussed or compared with the experiments. Finally, the authors use molecular dynamics type simulations similar to [2–4] to reproduce the buckling dynamics from the experiments. Unfortunately, no systematic comparison is performed.

      [1] Ken Sekimoto, Naoki Mori, Katsuhisa Tawada, and Yoko Y Toyoshima. Symmetry breaking instabilities of an in vitro biological system. Physical Review Letters, 75(1):172, 1995.

      [2] Raghunath Chelakkot, Arvind Gopinath, Lakshminarayanan Mahadevan, and Michael F Hagan. Flagellar dynamics of a connected chain of active, polar, Brownian particles. Journal of The Royal Society Interface, 11(92):20130884, 2014.

      [3] Rolf E Isele-Holder, Jens Elgeti, and Gerhard Gompper. Self-propelled worm-like filaments: spontaneous spiral formation, structure, and dynamics. Soft Matter, 11(36):7181–7190, 2015.

      [4] Rolf E Isele-Holder, Julia Jäger, Guglielmo Saggiorato, Jens Elgeti, and Gerhard Gompper. Dynamics of self-propelled filaments pushing a load. Soft Matter, 12(41):8495–8505, 2016.

      We thank the reviewer for pointing us to these publications, in particular the work by Sekimoto, which we were not aware of. We agree with the referee that the calculation is straightforward (basically known since Euler, up to modified boundary conditions). Our paper focuses on experimental work; the molecular dynamics simulations were included mainly as a consistency check and were not intended to generate the beautiful post-buckling patterns observed in references [2–4]. However, such shapes do emerge in filamentous cyanobacteria, and with the data provided in our manuscript, simulations can be quantitatively matched to our experiments, which will be covered by future work.

      We included the references in the revision of our manuscript, and a statement that we do not claim priority on these classical theoretical results.

      Introduction, page 2:

      “…Self-Buckling is an important instability for self-propelling rod-like micro-organisms to change the orientation of their motion, enabling aggregation or the escape from traps (Fily et al., 2020; Man and Kanso, 2019; Isele-Holder et al., 2015; Isele-Holder et al., 2016). The notion of self-buckling goes back to work of Leonhard Euler in 1780, who described elastic columns subject to gravity (Elishakoff, 2000). Here, the principle is adapted to the self-propelling, flexible filaments (Fily et al., 2020; Man and Kanso, 2019; Sekimoto et al., 1995) that glide onto an obstacle. Filaments buckle if they exceed a certain critical length $L_c \sim (B/f)^{1/3}$, where B is the bending modulus and f the propulsion force density…”

      Buckling theory, page 5:

      “…The buckling of gliding filaments differs in two aspects: the propulsion forces are oriented tangentially instead of vertically, and the front end is supported instead of clamped. Therefore, with L < Lc all initial orientations are indifferently stable, while for L > Lc, buckling induces curvature and a resultant torque on the head, leading to rotation (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). Buckling under concentrated tangential end-loads has also been investigated in literature (de Canio et al., 2017; Wolgemuth et al., 2005), but leads to substantially different shapes of buckled filaments. We use classical Kirchhoff theory for a uniform beam of length L and bending modulus B, subject to a force density $\vec{b} = -f\,\vec{t} - \eta\,\vec{v}$, with an effective active force density f along the tangent $\vec{t}$, and an effective friction proportional to the local velocity $\vec{v}$, analogous to existing literature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995)…”

      Further on page 6:

      “To derive the critical self-buckling length, Equation 5 can be linearized for two scenarios that lead to the same Lc: early-time small amplitude buckling and late-time stationary rotation at small and constant curvature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). […] Thus, in physical units, the critical length is given by $L_c = (30.5722\,B/f)^{1/3}$, which is reproduced in particle based simulations (Appendix Figure 2) analogous to those in Isele-Holder et al. (2015, 2016).”

      Discussion, page 7 & 8:

      “…This, in turn, has dramatic consequences on the exploration behavior and the emerging patterns (Isele-Holder et al., 2015, 2016; Abbaspour et al., 2021; Duman et al., 2018; Prathyusha et al., 2018; Jung et al., 2020): $(L/L_c)^3$ is, up to a numerical prefactor, identical to the flexure number (Isele-Holder et al., 2015, 2016; Duman et al., 2018; Winkler et al., 2017), the ratio of the Peclet number and the persistence length of active polymer melts. Thus, the ample variety of non-equilibrium phases in such materials (Isele-Holder et al., 2015, 2016; Prathyusha et al., 2018; Abbaspour et al., 2021) may well have contributed to the evolutionary success of filamentous cyanobacteria.”
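      The link between the critical length and the dimensionless group f L³/B mentioned by the reviewer follows directly from the expression for Lc quoted above. The short derivation below is a sketch that uses only that expression; the precise prefactor relating f L³/B to the flexure number in the cited simulation papers is not stated here and is left open:

```latex
\[
  L_c = \left(\frac{30.5722\,B}{f}\right)^{1/3}
  \quad\Longrightarrow\quad
  \left(\frac{L}{L_c}\right)^{3} = \frac{f L^{3}}{30.5722\,B},
\]
\[
  L > L_c
  \quad\Longleftrightarrow\quad
  \frac{f L^{3}}{B} > 30.5722 \approx 30.6 .
\]
```

      The buckling criterion L > Lc is therefore the same statement as the threshold of about 30.6 on f L³/B quoted in the reviewer's comment.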

      Reviewer 3:

      Summary:

      This paper presents novel and innovative force measurements of the biophysics of gliding cyanobacteria filaments. These measurements allow for estimates of the resistive force between the cell and substrate and provide potential insight into the motility mechanism of these cells, which remains unknown.

      We thank the reviewer for the positive evaluation of our work. We have revised the manuscript according to their comments and detail our replies and modifications next to the individual points below.

      Strengths:

      The authors used well-designed microfabricated devices to measure the bending modulus of these cells and to determine the critical length at which the cells buckle. I especially appreciated the way the authors constructed an array of pillars and used it to do 3-point bending measurements and the arrangement the authors used to direct cells into a V-shaped corner in order to examine the length at which the cells buckled. By examining the gliding speed of the cells before buckling events, the authors were able to determine how strongly the buckling length depends on the gliding speed, which could be an indicator of how the force exerted by the cells depends on cell length; however, the authors did not comment on this directly.

      We thank the referee for the positive assessment of our work. Importantly, we do not see a significant correlation between buckling length and gliding speeds, and we also do not see a correlation with filament length, consistent with the assumption of a propulsion force density that is more or less homogeneously distributed along the filament. Note that each filament consists of many metabolically independent cells, which renders cyanobacterial gliding a collective effort of many cells, in contrast to gliding of, e.g., myxobacteria.

      In response also to the other referees’ comments, we modified the manuscript to reflect more on the absence of a strong correlation between velocity and force/critical length. We modified the Buckling measurements section on page 5 of the paper:

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an over-damped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed $L_c \sim (f/B)^{-1/3}$ and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting $x = (L - L_c(v_0))/\Delta L_c$ in Equation 1, with $L_c(v_0) = (\eta v_0/(30.5722\,B))^{-1/3}$, to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E, F show the buckling behavior…”

      Further, we edited the last paragraph of the Buckling measurements section on page 5 of the manuscript:

      “Within the characteristic range of observed velocities (1 − 3 µm/s), the median Lc depends only mildly on v0, as compared to its rather broad distribution, indicated by the bands in Figure 3 G. Thus a possible correlation between f and v0 would only mildly alter Lc. The natural length distribution (cf. Appendix 1—figure 1 ), however, is very broad, and we conclude that growth rather than velocity or force distributions most strongly impacts the buckling propensity of cyanobacterial colonies. Also, we hardly observed short and fast filaments of K. animale, which might be caused by physiological limitations (Burkholder, 1934 ).”

      We also rephrased the corresponding discussion paragraph on page 7:

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      Weaknesses:

      There were two minor weaknesses in the paper.

      First, the authors investigate the buckling of these gliding cells using an Euler beam model. A similar mathematical analysis was used to estimate the bending modulus and gliding force for Myxobacteria (C.W. Wolgemuth, Biophys. J. 89: 945-950 (2005)). A similar mathematical model was also examined in G. De Canio, E. Lauga, and R.E Goldstein, J. Roy. Soc. Interface, 14: 20170491 (2017). The authors should have cited these previous works and pointed out any differences between what they did and what was done before.

      We thank the reviewer for pointing us to these references. The paper by Wolgemuth is a theoretical work describing A-motility in myxobacteria by a concentrated propulsion force at the rear end of the bacterium, possibly stemming from slime extrusion. This model was refuted shortly afterwards by [A3], who demonstrated that focal adhesion along the bacterial body, and thus a distributed force, powers A-motility, a mechanism that has by now been investigated in great detail (see [A10]). The paper by De Canio et al. contains a thorough theoretical analysis of a filament that is clamped at one end and subject to a concentrated tangential load at the other. Since both models comprise a concentrated end-load rather than a distributed propulsion force density, they describe a substantially different motility mechanism, leading also to substantially different buckling profiles. Consequently, these models cannot be applied to cyanobacterial gliding.

      We included both citations in the revision and pointed out the differences to our work in the introduction (page 2):

      “…A few species appear to employ a type-IV-pilus related mechanism (Khayatan et al., 2015; Wilde and Mullineaux, 2015), similar to the better-studied myxobacteria (Godwin et al., 1989; Mignot et al., 2007; Nan et al., 2014; Copenhagen et al., 2021), which are short, rod-shaped single cells that exhibit two types of motility: S (social) motility based on pilus extension and retraction, and A (adventurous) motility based on focal adhesion (Chen and Nan, 2022) for which also slime extrusion at the trailing cell pole was earlier postulated as mechanism (Wolgemuth et al., 2005). Yet, most gliding filamentous cyanobacteria do not exhibit pili and their gliding mechanism appears to be distinct from myxobacteria (Khayatan et al., 2015).”

      And in Buckling theory, page 5:

      “… The buckling of gliding filaments differs in two aspects: the propulsion forces are oriented tangentially instead of vertically, and the front end is supported instead of clamped. Therefore, with L < Lc all initial orientations are indifferently stable, while for L > Lc, buckling induces curvature and a resultant torque on the head, leading to rotation (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). Buckling under concentrated tangential end-loads has also been investigated in literature (de Canio et al., 2017; Wolgemuth et al., 2005), but leads to substantially different shapes of buckled filaments.”

      The second weakness is that the authors claim that their results favor a focal adhesion-based mechanism for cyanobacterial gliding motility. This is based on their result that friction and adhesion forces correlate strongly. They then conjecture that this is due to more intimate contact with the surface, with more contacts producing more force and pulling the filaments closer to the substrate, which produces more friction. They then claim that a slime-extrusion mechanism would necessarily involve more force and lower friction. Is it necessarily true that this latter statement is correct? (I admit that it could be, but is it a requirement?)

      We thank the referee for raising this interesting question. Our claim regarding slime extrusion is based on three facts: i. Mutants deficient in slime extrusion do not glide, but start gliding as soon as slime is provided externally [A4]. ii. A positive correlation between speed and slime layer thickness was observed in Nostoc [A11]. iii. The fluid mechanics of lubricated sliding contacts is very well understood and predicts a decreasing resistance with increasing layer thickness.
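
      Point iii can be illustrated with the simplest textbook case. The sketch below assumes a Newtonian film in plain shear between two parallel surfaces, for which the drag stress per unit sliding speed is the viscosity divided by the film thickness. This simplification is an assumption made here for illustration and is not the lubrication model of the cited literature; the function name and the values are hypothetical.

```python
def film_friction_per_area(viscosity, thickness):
    """Drag stress per unit sliding speed of a sheared Newtonian film: mu / h."""
    if thickness <= 0:
        raise ValueError("film thickness must be positive")
    return viscosity / thickness


# Same (hypothetical) viscosity, film twice as thick: resistance is halved.
thin = film_friction_per_area(viscosity=1.0, thickness=1.0)
thick = film_friction_per_area(viscosity=1.0, thickness=2.0)
```

      In this idealized picture a thicker slime layer always lowers the resistance, which is the trend the argument relies on.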

      We included these considerations in the revision of our manuscript (page 8):

      “…it indicates that friction and propulsion forces, despite being quite variable, correlate strongly. Thus, generating more force comes, inevitably, at the expense of added friction. For lubricated contacts, the friction coefficient is inversely proportional to the thickness of the lubricating layer (Snoeijer et al., 2013), and we conjecture active force and drag both increase due to a more intimate contact with the substrate. This supports mechanisms like focal adhesion (Mignot et al., 2007) or a modified type-IV pilus (Khayatan et al., 2015), which generate forces through contact with extracellular surfaces, as the underlying mechanism of the gliding apparatus of filamentous cyanobacteria: more contacts generate more force, but also closer contact with the substrate, thereby increasing friction to the same extent. Force generation by slime extrusion (Hoiczyk and Baumeister, 1998), in contrast, would lead to the opposite behavior: More slime generates more propulsion, but also reduces friction. Besides fundamental fluid-mechanical considerations (Snoeijer et al., 2013), this is rationalized by two experimental observations: i. gliding velocity correlates positively with slime layer thickness (Dhahri et al., 2013) and ii. motility in slime-secretion deficient mutants is restored upon exogenous addition of polysaccharide slime. Still we emphasize that many other possibilities exist. One could, for instance, postulate a regulation of the generated forces to the experienced friction, to maintain some preferred or saturated velocity.”

      Related to this, the authors use a model with isotropic friction. They claim that this is justified because they are able to fit the cell shapes well with this assumption. How would assuming a non-isotropic drag coefficient affect the shapes? It may be that it does equally well, in which case, the quality of the fits would not be informative about whether or not the drag was isotropic or not.

      The referee raises another very interesting point. Given the typical variability and uncertainty in experimental measurements (cf. error bars in Figure 4 A), a model with a slightly anisotropic friction could be fitted to the observed buckling profiles as well, without a significant increase of the mismatch. Yet, strongly anisotropic friction would not be consistent with our observations.

      Importantly, however, we did not infer isotropic friction from the fit quality, but from a comparison between free gliding and early buckling (Figure 4 D). In early buckling, the dominant motion is in the transverse direction, while longitudinal motion is insignificant for geometric reasons. Thus, independent of the underlying model, mostly the transverse friction coefficient is inferred. In contrast, free gliding is a purely longitudinal motion, and thus only the friction coefficient for longitudinal motion can be inferred. These two friction coefficients are compared in Figure 4 D. Still, the scatter of that data would allow fitting a certain anisotropy within the error margins. What we can exclude based on our observations is the case of strongly anisotropic friction. If there is neither an ab-initio reason for anisotropy nor a measurement that indicates it, we prefer to stick with the simplest assumption. We carefully chose our wording in the Discussion as “mainly isotropic” rather than “isotropic” or “fully isotropic”.
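
      The comparison described above can be sketched as follows. This is a hedged illustration, assuming that the force density is obtained by inverting the threshold relation quoted earlier, f = 30.5722 B / Lc^3, and that the transverse coefficient η comes from a separate fit of the early buckling dynamics. The function names and arguments are hypothetical.

```python
# Prefactor of the self-buckling threshold, as quoted in the manuscript text.
BUCKLING_PREFACTOR = 30.5722


def force_density_from_lc(critical_length, bending_modulus):
    """Invert Lc = (f / (30.5722 * B)) ** (-1/3) to obtain f."""
    return BUCKLING_PREFACTOR * bending_modulus / critical_length ** 3


def anisotropy_ratio(critical_length, bending_modulus, v0, eta_transverse):
    """Ratio of longitudinal (f / v0) to transverse friction coefficient."""
    eta_longitudinal = force_density_from_lc(critical_length, bending_modulus) / v0
    return eta_longitudinal / eta_transverse
```

      A ratio close to 1 across filaments corresponds to the “mainly isotropic” case; a ratio far from 1 would indicate strongly anisotropic friction.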

      We added a small statement to the Discussion on page 7 & 8:

      “... Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces ...”

      Recommendations for the authors

      The discussion regarding how the findings of this paper imply that cyanobacteria filaments are propelled by adhesion forces rather than slime extrusion should be improved, as this conclusion seems questionable. There appears to be an inconsistency with a buckling force said to be only weakly dependent on the gliding velocity, while its ratio with the velocity correlates with a friction coefficient. Finally, data and source code should be made publicly available.

      In the revised version, we have modified the discussion of the force-generating mechanism according to the reviewers’ suggestions. The perceived inconsistency in the velocity dependence of the buckling force was based on a misunderstanding, as we detailed in our reply to the referee. We revised the corresponding section to make it clearer. Data and source code have been uploaded to a public data repository.

      Reviewer #2 (recommendations for the authors)

      Despite eLife policy, the authors do not provide a Data Availability Statement. For the presented manuscript, data and source code should be provided “via trusted institutional or third-party repositories that adhere to policies that make data discoverable, accessible and usable.” https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-data-sharing-policies

      Most of the issues in this reviewer’s public review should be easy to correct, so I would strongly support the authors to provide an amended manuscript.

      We added the Data Availability Statement in the amended manuscript.

      References

      [A1] E. Hoiczyk and W. Baumeister. “The junctional pore complex, a prokaryotic secretion organelle, is the molecular motor underlying gliding motility in cyanobacteria”. In: Curr. Biol. 8.21 (1998), pp. 1161–1168. doi: 10.1016/s0960-9822(07)00487-3.

      [A2] N. Read, S. Connell, and D. G. Adams. “Nanoscale Visualization of a Fibrillar Array in the Cell Wall of Filamentous Cyanobacteria and Its Implications for Gliding Motility”. In: J. Bacteriol. 189.20 (2007), pp. 7361–7366. doi: 10.1128/jb.00706-07.

      [A3] T. Mignot, J. W. Shaevitz, P. L. Hartzell, and D. R. Zusman. “Evidence That Focal Adhesion Complexes Power Bacterial Gliding Motility”. In: Science 315.5813 (2007), pp. 853–856. doi: 10.1126/science.1137223.

      [A4] Behzad Khayatan, John C. Meeks, and Douglas D. Risser. “Evidence that a modified type IV pilus-like system powers gliding motility and polysaccharide secretion in filamentous cyanobacteria”. In: Mol. Microbiol. 98.6 (2015), pp. 1021–1036. doi: 10.1111/mmi.13205.

      [A5] Tilo Pompe, Martin Kaufmann, Maria Kasimir, Stephanie Johne, Stefan Glorius, Lars Renner, Manfred Bobeth, Wolfgang Pompe, and Carsten Werner. “Friction-controlled traction force in cell adhesion”. In: Biophysical journal 101.8 (2011), pp. 1863–1870.

      [A6] Hirofumi Wada, Daisuke Nakane, and Hsuan-Yi Chen. “Bidirectional bacterial gliding motility powered by the collective transport of cell surface proteins”. In: Physical Review Letters 111.24 (2013), p. 248102.

      [A7] Joël Tchoufag, Pushpita Ghosh, Connor B Pogue, Beiyan Nan, and Kranthi K Mandadapu. “Mechanisms for bacterial gliding motility on soft substrates”. In: Proceedings of the National Academy of Sciences 116.50 (2019), pp. 25087–25096.

      [A8] Chenyi Fei, Sheng Mao, Jing Yan, Ricard Alert, Howard A Stone, Bonnie L Bassler, Ned S Wingreen, and Andrej Košmrlj. “Nonuniform growth and surface friction determine bacterial biofilm morphology on soft substrates”. In: Proceedings of the National Academy of Sciences 117.14 (2020), pp. 7622–7632.

      [A9] Arja Ray, Oscar Lee, Zaw Win, Rachel M Edwards, Patrick W Alford, Deok-Ho Kim, and Paolo P Provenzano. “Anisotropic forces from spatially constrained focal adhesions mediate contact guidance directed cell migration”. In: Nature communications 8.1 (2017), p. 14923.

      [A10] Jing Chen and Beiyan Nan. “Flagellar motor transformed: biophysical perspectives of the Myxococcus xanthus gliding mechanism”. In: Frontiers in Microbiology 13 (2022), p. 891694.

      [A11] Samia Dhahri, Michel Ramonda, and Christian Marliere. “In-situ determination of the mechanical properties of gliding or non-motile bacteria by atomic force microscopy under physiological conditions without immobilization”. In: PLoS One 8.4 (2013), e61663.

    1. Author response:

      We extend our gratitude to the two reviewers and the editors at eLife for their meticulous examination of our manuscript, as well as for their valuable feedback and positive assessment. We are particularly pleased that both the reviews and the editorial evaluation recognize the importance of our findings. Through this provisional response, we wish to convey to the editors, reviewers, and the readership of eLife our intention to enhance the paper by incorporating a detailed description of the sections pertaining to MAD analysis, data interpretation with combined HS-AFM and PCA methods, and specific portions of the discussions. This will involve editing the manuscript accordingly and providing separate explanations in the “author response”. We acknowledge that such additions will strengthen the comprehensiveness of our work and render it more self-contained.

      Moreover, in alignment with the recommendations from the review team, we will provide a thorough discussion of published data and offer a clearer explanation of our utilized methods, thereby providing a more robust foundation for our conclusions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I will summarize my comments and suggestions below.

      (1) Abstract:

      "Non-catalytic (pseudo)kinase signaling mechanisms have been described in metazoans, but information is scarce for plants." To the best of my understanding EFR is an active protein kinase in vitro and in vivo and cannot be considered a pseudokinase. Consider rephrasing.

      We rephrased to: “Non-catalytic signaling mechanisms of protein kinase domains have been described in metazoans, but information is scarce for plants.”

      (2) Page 4: It should be noted, that while membrane associated Rap-RiD systems have been used in planta to activate receptor kinase intracellular domains by promoting interaction with a co-receptor kinase domain, this system does not resemble the actual activation mechanism in the plasma membrane. This would be worth discussing when introducing the system. For example, the first substrates of the RK signaling complex may also be membrane associated and not freely diffuse in solution, which may be important for enzyme-substrate interaction.

      We inserted on page 4: “The RiD system was previously applied in planta, maintaining membrane-association by N-terminal myristoylation (Kim et al., 2021). For the in vitro experiments, the myristoylation sites were excluded to facilitate the production of recombinant protein.”

      (3) Page 4 and Fig 1: The catalytic Asp in BRI1 is D1027 and not D1009 (https://pubmed.ncbi.nlm.nih.gov/21289069/). Please check and prepare the correct mutant protein if needed.

      We clarified this in the text by stating that we mutated the HRD-aspartate to asparagine in all our catalytic-dead mutants: “Kinase-dead variants with the catalytic residue (HRD-aspartate) replaced by asparagine (EFRD849N and BRI1D1009N), had distinct effects […]”. D1027 in BRI1 is the DFG-Asp, which was not mutated in our study.

      (4) Page 4 and Fig 1: Is BIK1 a known component of the BR signaling pathway and a direct BRI1 substrate? Or in other words how specific is the trans-phosphorylation assay? In my opinion, a more suitable substrate for BRI1/BAK1 would be BSK1 or BSK3 (for example https://pubmed.ncbi.nlm.nih.gov/30615605/).

      Kinase-dead BIK1 is a reported substrate of BRI1. We clarified this in the results section by inserting: “BIK1 was chosen as it is reported substrate of both, EFR/BAK1 and BRI1/BAK1 complexes (Lin et al., 2013).”

      (5) Fig. 1B Why is BIK1 D202N partially phosphorylated in the absence of Rap? I would suggest to add control lanes showing BRI1, EFR, FLS2, BAK1 and BIK1 in isolation. Given that a nice in vitro activation system with purified components is available, why not compare the different enzyme kinetics rather than band intensities at only 1 enzyme : substrate ratio?

      BIK1 D202N is partially phosphorylated because active BAK1 is present and capable of trans-phosphorylating BIK1 D202N, as reported in a previous study (DOI: 10.1038/s41586-018-0471-x).

      (6) Page 4 and Fig 1: Is the kinase dead variant of EFR indeed kinase dead? I could still see a decent autorad signal for this mutant when expressed in E. coli (Fig 1 A in Bender et al., 2021; https://pubmed.ncbi.nlm.nih.gov/34531323/)? If this mutant is not completely inactive, could this change the interpretation of the experiments performed with the mutant protein in vitro and in planta in the current manuscript? In my opinion, it could be possible that a partially active EFR mutant can be further activated by BAK1, and in turn can phosphorylate BIK1 D202N. The differences in autorad signal for BRI1D1009?N and EFRD849N is very small, and the entire mechanism hinges on this difference.

      We would like to emphasize that the mechanism hinges on the difference between non-dimerized and dimerized kinase domains in the in vitro kinase assay. BRI1 D1009N fails to enhance BIK1 D202N trans-phosphorylation compared to the non-dimerized sample, while EFR D849N is still capable of enhancing BIK1 transphosphorylation upon dimerization as indicated by quantification of autorads (Figure 1B/C). We have also addressed this point in a section on the limitations of our study.

      (7) Fig 1B. "Our findings therefore support the hypothesis that EFR increases BIK1 phosphorylation by allosterically activating the BAK1 kinase domain." To the best of my understanding presence of wild-type EFR in the EFR-BAK1 signaling complex leads to much better phosphorylation of BIK1D202N when compared to the EFRD849N mutant. How does that support the allosteric mechanism? By assuming that the D849N mutant is in an inactive conformation and fully catalytically inactive (see above)? Again, I think the data could also be interpreted in such a way that the small difference in autorad signal for BIK1 between BRI1 inactive (but see above) and EFR inactive is due to EFR not being completely kinase dead (see above), rather than EFR being an allosteric regulator. To clarify this point I would suggest to (1) perform quantitative auto- and trans-(generic substrate) phosphorylation assays with wt and D849N EFR to derive enzyme kinetic parameters, to (2) include the EFRD849N mutant in the HDX analysis and (3) to generate transgenic lines for EFRD849N/F761H/Y836F // EFRD849N/F761H/SSAA and compare them to the existing lines in Fig. 3.

      Mutations in proteins, especially those that require conformational plasticity for their function, can have pleiotropic effects, because a mutation may alter the conformational plasticity and consequently both the catalytic and non-catalytic functions that depend on it. In such cases, it is difficult to fully untangle catalytic and non-catalytic functions. Returning to EFR D849N, the D849N mutation may also impact the non-catalytic function by altering the conformational plasticity, which would explain the difference observed between EFR and EFR D849N. As you rightly suggested, HDX would be a way to address this, but it would still not clarify whether catalytic activity contributes to activation. We instead attempted to produce analog-sensitive EFR variants for in vivo characterization of EFR-targeted catalytic inhibition. Unfortunately, we failed to produce an analog-sensitive variant for which we could show ATP-analog binding. To address your concern, we inserted a section on limitations of the study.

      (8) Fig. 2B,C, supplement 3 C,D. Has it been assessed if the different EFR versions were expressed to similar protein levels and still localized to the PM?

      Localization of the mutant receptors has not been explicitly evaluated by confocal microscopy. However, the selected variant EFRF761H is shown to accumulate in stable Arabidopsis lines (Figure 3 – Supplement 1C), and BAK1 was co-immunoprecipitated with all EFR variants upon elf18 treatment (Figure 3 B), indicating plasma membrane localization.

      (9) How the active-like conformation of EFR is in turn activating BAK1 is poorly characterized, but appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question. I tried to come up with an experimental plan to test if indeed the kinase activity of BAK1 and not of EFR is essential for signal propagation, but this is a complex issue. You would need to be able to mimic an activated form of EFR (which you can), to make sure its inactive (possibly, see above) and likewise to engineer a catalytically inactive form of BAK1 in an active-like state (difficult). As such a decisive experiment is difficult to implement, I would suggest to discuss different possible interpretations of the existing data and alternative scenarios in the discussion section of the manuscript.

      We addressed your concern whether BAK1 kinase activity is essential for signal propagation by pairing EFRF761H and BAK1D416N (Figure 4 – Supplement 2 C), which fails to induce signaling. In this case, EFRF761H is in its activated conformation but cannot activate downstream signaling. We also attempted to address your concern with an in vitro kinase assay, pairing EFR and BAK1D416N and using a range of concentrations of the substrate BIK1D202N. We observed that the catalytic activity of BAK1, but not of EFR, was essential for BIK1 phosphorylation. However, this experiment does not address whether activated EFR can efficiently propagate signaling in the absence of BAK1 catalytic activity. In the limitations of the study section, we now discuss the catalytic importance of EFR for signaling activation.

      Author response image 1.

      BIK1 trans-phosphorylation depends on BAK1 catalytic activity. Increasing concentrations of BIK1 D202N were used as substrate for Rap-induced dimers of EFR-BAK1, EFR D849N-BAK1, and EFR-BAK1 D416N respectively. BIK1 trans-phosphorylation depended on the catalytic activity of BAK1. Proteins were purified from E. coli λPP cells. Three experiments yielded similar results of which a representative is shown here.

      Reviewer #2:

      All of my suggestions are minor.

      Figure 1B, I think it would be more useful to readers to explain the amino acid in the D-N change, rather than just call it D-to-N? Also, please label the bands on the stained gel; the shift on FKBP-BRI1 and FKBP-EFR are noticeable on the Coomassie stain.

      We implemented your suggestions.

      Figure 1-Supplement 1. There is still a signal in pS612 BAK1 (it states 'also failed to induce BAK1 S612 phosphorylation' in the text, which is not quite correct). Also, could mention the gel shift seen in BAK1, which appears absent in Y836F.

      We corrected the text which now states: “To test whether the requirement for Y836 phosphorylation is similar, we immunoprecipitated EFR-GFP and EFRY836F-GFP from mock- or elf18-treated seedlings and probed co-immunoprecipitated BAK1 for S612 phosphorylation. EFRY836F also obstructed the induction of BAK1 S612 phosphorylation (Figure 1 – Supplement 1), indicating that EFRY836F and EFRSSAA impair receptor complex activation.” The gel shift of BAK1 you pointed out was not observed in replications and thus we prefer not to comment on it.

      Figure 2 and 3 are full of a, b, c,d's, which I don't understand. Sorry

      We used uppercase letters to indicate subpanels and lowercase letters to indicate the results of the statistical testing. In the figure caption, we have clarified that the lowercase letters refer to statistical comparisons.

      Figure 2 A. If each point on the x-axis is one amino acid, I think it would again be useful to name the amino acids that the gold or purple or blue colored lines extend through.

      Each point represents a peptide; the peptides are sorted by the position of their starting amino acid, from N-terminus to C-terminus. We have now added HDX plots for the individual peptides that correspond to the highlighted region in subpanel A.

      Figure Supplement 1 is very small for what it is trying to show, even on the printed page. If this residue were to be phosphorylated, what would happen to the H-bond?

      We suppose that VIa-Tyr phosphorylation would break the H-bond and cause displacement of the αC-β4 loop. Recent studies, published after our submission, highlight the importance of this loop for substrate coordination and ATP binding. Thus, phosphorylating VIa-Tyr and displacing this loop may render the kinase unproductive. We have expanded the discussion to include this point.

      Figure 2B: Tyr 836 is not present in any of the alignments in Figure 2A. This should be rectified, because the text talks about the similarity to Tyr 156 in PKA.

      We have adjusted the alignments such that they now contain the VIa-Tyr residues of EFR and PKA.

      Figure 4D. Is there any particular reason that these Blots are so hard to compare or FKBP and BAK1?

      We assume this refers to Figure 4 – Supplement 2 D. FKBP-EFR and FRB-BAK1 are both approximately the size of RuBisCO, the most abundant protein in plant protein samples, which overlays the FKBP- and FRB-tagged kinases. Thus, it is difficult to detect these proteins.

      Reviewer #3:

      (1) The paper reporting the allosteric activation mechanism of EGFR should be cited.

      Will be included.

      (2)The authors showed that "Rap addition increased BIK1 D202N phosphorylation when the BRI1 or EFR kinase domains were dimerized with BAK1, but no such effect was observed with FLS2". Please explain why FLS2 failed to enhance BIK1 transphosphorylation by Rap treatment?

      Even though BIK1 is a reported downstream signaling component of FLS2/BAK1, it might not be the most relevant one; related RLCKs, such as PBL1, might be better substrates for dimerized FLS2/BAK1. We have not tested this, however. Alternatively, the purified FLS2 kinase domain might be labile and unfold quickly even though it was kept on ice until the start of the assay, or the N-terminal FKBP-tag may disrupt its function. As the reason for our observation is not clear, we have removed the FLS2 in vitro dimerization experiments from the manuscript.

      (3) Based solely on the data presented in Figure 1, it can be concluded that EFR's kinase activity is not required to facilitate BIK1 transphosphorylation. Therefore, the title of Figure 1, "EFR Allosterically Activates BAK1," may be inappropriate.

      We have changed the figure title to: “EFR facilitates BIK1 trans-phosphorylation by BAK1 non-catalytically.”

      (4) In Figure 1- Supplement 1, I could not find any bands in anti-GFP and anti-BAK1 pS612 of input. Please redo it.

      Indeed, we could not detect protein in the input samples of this experiment. BAK1 S612 phosphorylation is an activation mark and is not necessarily expected to be abundant enough for detection in input samples. EFR-GFP, however, is usually detected in input samples, as reported in Macho et al. 2014, the manuscript from which these lines originate. Why EFR-GFP is not detected in this set of experiments is unclear but, in our opinion, this does not detract from the conclusions drawn, since similar amounts of EFR-GFP are pulled down across all samples.

      (5) For Figure 2A, please mark the structure represented by each color directly in the figure.

      We have made the suggested change.

      (6) Please modify "EFRF761/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation" to "EFRF761H/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation".

      Thank you for spotting this. We changed it.

      (7) The HDX-MS analysis demonstrated that the EFR (Y836F) mutation inhibits the formation of the active-like conformation. Conversely, the EFR (F761H) mutation serves as a potent intragenic suppressor, significantly stabilizing the active-like conformation. Confirming through HDX-MS conformational testing that the EFR (Y836F F761H) double mutation does not hinder the formation of the active-like EFR kinase conformation would greatly strengthen the conclusions of the article.

      Response: We agree that this would be beneficial. We attempted the experiment but failed to produce enough protein for HDX-MS analysis. We now state this in an extra section of the paper (“Limitations of the study”).

    1. Author response:

      eLife assessment

      This study investigates associations between retrotransposon element expression and methylation with age and inflammation, using multiple public datasets. The study is valuable because a systematic analysis of retrotransposon element expression during human aging has been lacking. However, the data provided are incomplete due to the sole reliance on microarray expression data for the core analysis of the paper.

      Both reviewers found this study to be important. We selected the human blood microarray datasets used in a comprehensive study of ageing published in Nature Communications (DOI: 10.1038/ncomms9570). We only included datasets specifically collected for ageing studies. Therefore, the large RNA-seq cohorts for cancer, cardiovascular, and neurological diseases were not relevant to this study and could not be included.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tsai and Seymen et al. investigate associations between RTE expression and methylation and age and inflammation, using multiple public datasets. The concept of the study is in principle interesting, as a systematic analysis of RTE expression during human aging is lacking.

      We thank the reviewer for the positive comment.

      Unfortunately, the reliance on expression microarray data, used to perform the core analysis of the paper places much of the study on shaky ground. The findings of the study would not be sufficiently supported until the authors validate them with more suitable methods.

      In our discussion section in the manuscript, we have clarified that “we are aware of the limitations imposed by using microarray in this study, particularly the low number of intergenic probes in the expression microarray data. Our study can be enriched with the advent of large RNA-seq cohorts for aging studies in the future.” However, the application of microarray for RTE expression analysis was introduced previously. In fact, in a manuscript published by Reichmann et al. (DOI: 10.1371/journal.pcbi.1002486) which was cited 76 times, the authors showed and experimentally verified that cryptic repetitive element probes present in Illumina and Affymetrix gene expression microarray platforms can accurately and sensitively monitor repetitive element expression data. Inspired by this methodological manuscript with reasonable acceptance by other researchers, we trusted that the RTE microarray probes could accurately quantify RTE expression at class and family levels.

      Strengths:

      This is a very important biological problem.

      Weaknesses:

      RNA microarray probes are obviously biased to genes, and thus quantifying transposon analysis based on them seems dubious. Based on how arrays are designed there should at least be partial (perhaps outdated) evidence that the probe sites overlap a protein-coding or non-coding RNA.

      We disagree with the reviewer that quantifying transposon analysis based on microarray data is dubious. As previously shown by Reichmann et al., the quantification is reliable as long as the probes do not overlap with annotated genes and they are in the correct orientation to detect sense repetitive element transcripts. Reichmann et al. identified 1,400 repetitive element probes in version 1.0, version 1.1 and version 2.0 of the Illumina Mouse WG-6 Beadchips by comparing the genomic locations of the probes with the Repeatmasked regions of the mouse genome. We applied the same criteria for Illumina Human HT-12 V3 (29431 probes) and V4 (33963 probes) to identify the RTE-specific probes.
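      For illustration only, the selection criteria described above (overlap with a repeat-masked region, no overlap with an annotated gene, and matching orientation) can be sketched as a simple interval filter. This is a minimal sketch, not the code used by Reichmann et al. or in this study: the `Interval` type, the function name `select_rte_probes`, the assumption that each probe's strand field denotes the strand of the transcript it detects, and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical representation for this sketch)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def select_rte_probes(probes, repeats, genes):
    """Map probe name -> repeat name for probes passing all three criteria.

    Assumption: a probe's strand is the strand of the transcript it
    detects, so "correct orientation" means probe and repeat strands match.
    """
    selected = {}
    for probe in probes:
        # Criterion 1: discard probes overlapping any annotated gene.
        if any(overlaps(probe, gene) for gene in genes):
            continue
        # Criteria 2 and 3: overlap a repeat on the same strand.
        for repeat in repeats:
            if overlaps(probe, repeat) and probe.strand == repeat.strand:
                selected[probe.name] = repeat.name
                break
    return selected


# Toy data (invented coordinates, for demonstration only).
probes = [
    Interval("chr1", 100, 150, "+", "p1"),  # in a repeat, sense, no gene
    Interval("chr1", 500, 550, "+", "p2"),  # in a repeat, but inside a gene
    Interval("chr1", 900, 950, "-", "p3"),  # in a repeat, wrong orientation
]
repeats = [
    Interval("chr1", 80, 400, "+", "L1"),
    Interval("chr1", 480, 600, "+", "Alu"),
    Interval("chr1", 850, 1000, "+", "ERVK"),
]
genes = [Interval("chr1", 450, 700, "+", "geneA")]

selected = select_rte_probes(probes, repeats, genes)
print(selected)
```

      In this toy example only the first probe passes all three filters. A real analysis would use indexed interval queries rather than the nested loops shown here, which are kept for readability.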

      The authors state they only used intergenic probes, but based on supplementary files, almost half of RTE probes are not intergenic but intronic (n=106 out of 264).

      All our identified RTE probes overlap with intergenic regions. However, due to their repetitive nature, some probes overlap with intronic regions, too. We can replace "intergenic" with "noncoding" in our revision to show that they do not overlap with the exons of protein-coding genes. However, we do not rule out the possibility that some of our detected RTE probes might overlap noncoding RNAs. In fact, the border between coding and non-coding genomes has recently become very fuzzy with new annotations of the genome. RTE RNAs can be easily considered as non-coding RNAs if we challenge our junk DNA view.

      This is further complicated by the fact that not all this small subset of probes is available in all analyzed datasets. For example, 232 probes were used for the MESA dataset but only 80 for the GTP dataset. Thus, RTE expression is quantified with a set of probes which is extremely likely to be highly affected by non-RTE transcripts and that is also different across the studied datasets. Differences in the subsets of probes could very well explain the large differences between datasets in multiple of the analyses performed by the authors, such as in Figure 2a, or 3a. It is nonetheless possible that the quantification of RTE expression performed by the authors is truly interpretable as RTE expression, but this must be validated with more data from RNA-seq. Above all, microarray data should not be the main type of data used in the type of analysis performed by the authors.

      In this study, we did not compare MESA with GTP etc. We have analysed each dataset separately based on the available data for that dataset. Therefore, sacrificing one analysis because of the lack of information from the other does not make sense. We would do that if we were after comparing different datasets. Moreover, the datasets are not comparable because they were produced from different blood cell types.

      Reviewer #2 (Public Review):

      Summary:

      Yi-Ting Tsai and colleagues conducted a systematic analysis of the correlation between the expression of retrotransposable elements (RTEs) and aging, using publicly available transcriptional and methylome microarray datasets of blood cells from large human cohorts, as well as single-cell transcriptomics. Although DNA hypomethylation was associated with chronological age across all RTE biotypes, the authors did not find a correlation between the levels of RTE expression and chronological age. However, expression levels of LINEs and LTRs positively correlated with DNA demethylation, and inflammatory and senescence gene signatures, indicative of "biological age". Gene set variation analysis showed that the inflammatory response is enriched in the samples expressing high levels of LINEs and LTRs. In summary, the study demonstrates that RTE expression correlates with "biological" rather than "chronological" aging.

      Strengths:

      The question the authors address is both relevant and important to the fields of aging and transposon biology.

      We thank the reviewer for finding this study relevant and important.

      Weaknesses:

      The choice of methodology does not fully support the primary claims. Although microarrays can detect certain intergenic transposon sequences, the authors themselves acknowledge in the Discussion section that this method's resolution is limited. More critical considerations, however, should be addressed when interpreting the results. The coverage of transposon sequences by microarrays is not only very limited (232 unique probes) but also predetermined. This implies that any potential age-related overexpression of RTEs located outside of the microarray-associated regions, or of polymorphic intact transposons, may go undetected. Therefore, the authors should be more careful while generalising their conclusions.

      This is a bioinformatics study, and we have already admitted and discussed the limitations in the discussion section of this manuscript. All technologies have their own limitations, and this should not stop us from shedding light on scientific facts because of inadequate information. In the manuscript, we have discussed that all large and proper ageing studies were performed using microarray technology. Peters et al. (DOI: 10.1038/ncomms9570) adopted all these microarray data in their transcriptional landscape of ageing manuscript. Our study essentially applies the Reichmann et al. method to the peripheral blood-related data from the Peters et al. manuscript. Since hypomethylation due to ageing is a well-established and broad epigenetic reprogramming, it is unlikely that only a fraction of RTEs is affected by this phenomenon. Therefore, the subsampling of RTEs should not affect the result so much. Indeed, this is supported in our study by the inverse correlation between DNA methylation and RTE expression for LINE and SINE classes despite having limited numbers of probes for LINE and SINE expressions.

      Additionally, for some analyses, the authors pool signals from RTEs by class or family, despite the fact that these groups include subfamilies and members with very different properties and harmful potentials. For example, while sequences of older subfamilies might be passively expressed through readthrough transcription, intact members of younger groups could be autonomously reactivated and cause inflammation. The aggregation of signals by the largest group may obscure the potential reactivation of smaller subgroups. I recommend grouping by subfamily or, if not possible due to the low expression scores, by subgroup. For example, all HERV subfamilies are from the ERVL family.

      We agree with the reviewer that different subfamilies of RTEs play different roles through their activation. However, we will lose our statistical power if we study RTE subfamilies with a few probes. Global epigenetic alteration and derepression of RTEs by ageing have been observed to be genome-wide. While our systematic analysis across RTE classes and families cannot capture alterations in subfamilies due to statistical power, it is still relevant to the research question we are addressing.

      Next, Illumina arrays might not accurately represent the true abundance of TEs due to non-specific hybridization of genomic transposons. Standard RNA preparations always contain traces of abundant genomic SINEs unless DNA elimination is specifically thorough. The problem of such noise should be addressed.

      We have checked the RNA isolation step from MESA, GTP, and GARP manuscripts. The total RNA was isolated using the Qiagen mini kit following the manufacturer’s recommendations. The authors of these manuscripts did not mention whether they eliminated genomic DNA, but we assumed they were aware of the DNA contamination and eliminated it based on the manufacturer’s recommendations. We have looked up the literature about non-specific hybridization of RTEs but could not find any evidence to support this observation. We would appreciate the reviewers providing more evidence about such RTE contaminations.

      Lastly, scRNAseq was conducted using 10x Genomics technology. However, quantifying transposons in 10x sequencing datasets presents major challenges due to sparse signals.

      Applying the scTE pipeline (https://www.nature.com/articles/s41467-021-21808-x), we have found that the statistical power of quantifying RTE classes (LINE, SINE, and LTR) or RTE families (L1, L2, Alu, ERVK, etc.) is as good as that for individual genes. However, our proposed method cannot analyse RTE subfamilies, and we did not do that.

      Smart-seq single-cell technology is better suited to this particular purpose.

      We agree with the reviewer that Smart-seq provides higher yield than 10x, but there is no Smart-seq data available for ageing studies.

      Anyway, it would be more convincing if the authors demonstrated TE expression across different clusters of immune cells using standard scRNAseq UMAP plots instead of boxplots.

      Since the number of RTE reads per cell is low, showing the expression of RTEs per cell in UMAP may not be the best statistical approach to show the difference between the aged and young groups. This is why we chose to analyse with pseudobulk and displayed differential expression using boxplot rather than UMAP for each immune cell type.
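      As a rough illustration of the pseudobulk idea mentioned above, per-cell counts can be summed within each donor and cell type before normalisation, so that sparse per-cell RTE signal is pooled into one value per sample. The sketch below is a generic example under stated assumptions, not the scTE pipeline and not the code used in this study: the function name `pseudobulk_cpm`, the dictionary layout of each cell, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def pseudobulk_cpm(cells, feature):
    """Pool cells by (donor, cell_type) and return counts per million.

    Each cell is assumed to be a dict with keys:
      "donor", "cell_type", "counts" (feature -> count), "total" (all counts).
    """
    feature_sum = defaultdict(int)
    total_sum = defaultdict(int)
    for cell in cells:
        key = (cell["donor"], cell["cell_type"])
        # Cells with no reads for the feature contribute zero.
        feature_sum[key] += cell["counts"].get(feature, 0)
        total_sum[key] += cell["total"]
    # Normalise the pooled feature count by the pooled library size.
    return {
        key: 1e6 * feature_sum[key] / total_sum[key]
        for key in total_sum
        if total_sum[key] > 0
    }


# Toy data (invented counts, for demonstration only).
cells = [
    {"donor": "young1", "cell_type": "T", "counts": {"LINE": 2}, "total": 1000},
    {"donor": "young1", "cell_type": "T", "counts": {}, "total": 1000},
    {"donor": "old1", "cell_type": "T", "counts": {"LINE": 3}, "total": 1000},
    {"donor": "old1", "cell_type": "T", "counts": {"LINE": 1}, "total": 1000},
]

line_cpm = pseudobulk_cpm(cells, "LINE")
for key, value in sorted(line_cpm.items()):
    print(key, value)
```

      The resulting per-donor values for each cell type are the kind of quantity that can be compared between age groups and displayed as a boxplot, with one point per donor rather than one point per cell.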

      I recommend validating the data by RNAseq, even on small cohorts. Given that the connection between RTE overexpression and inflammation has been previously established, the authors should consider better integrating their observations into the existing knowledge.

      Until recently, there was no publicly available, non-cancerous, large-cohort RNA-seq dataset for ageing studies. We tried to gain access to the two RNA-seq datasets suggested by reviewer 2: Marquez et al. 2020 (phs001934.v1.p1, controlled access) and Morandini et al. 2023 (GSE193141, public access).

      Unfortunately, Marquez et al. 2020 data is not accessible because the authors only provide the data for projects related to cardiovascular diseases. However, we did analyse Morandini et al. 2023 data, and we can confirm that no association was observed between any class and family of RTEs with chronological ageing, which is the second strong piece of evidence supporting the statement in the manuscript. However, as expected, we found a positive correlation between RTE expression and IFNI signature score.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides an important finding that the local abundance of metabolites impacts the biology of the tumor microenvironment by utilizing kidney tumors from patients and adjacent normal tissues. The evidence supporting the claims of the authors is convincing although certain caveats need to be taken into consideration as the authors acknowledged in the paper. The work will be of interest to the research community working on metabolism and on kidney cancer especially.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The present study addresses how the local abundance of metabolites impacts the biology of the tumor microenvironment. The authors enroll patients harboring kidney tumors and use freshly resected tumor material for metabolic studies. Specifically, the authors separate the adjacent normal kidney tissue from the tumor material and then harvest the interstitial fluid from the normal kidney (KIF) or the tumor (TIF) for quantitative metabolomics. The plasma samples from the patient are used for comparison. Additionally, the authors also compare metabolite levels in the plasma of patients with kidney versus lung cancer (or healthy donors) to address how specific tumor types might contribute to circulating levels of metabolites. Altogether, the authors find that the metabolite levels in the KIF and TIF, although vastly different than plasma, are largely overlapping. These findings indicate that tissue of origin appears to have a stronger role in determining the local metabolic environment of tumors than the genetics or biochemistry of the tumor itself.

      Strengths:

      The biggest strength of the current study is the use of human patient-derived samples. The cohort size (~50 patients) is relatively large, which adds to the rigor of the work. The work also relies on a small pool of metabolites that can be quantitatively measured using methods developed by the authors. Focusing on a smaller metabolic pool also likely increases the signal-to-noise ratio and enables the more rigorous determination of any underlying differences. The manuscript is well-written and highlights both the significance of the findings and also acknowledges many of the caveats. The recognition of the metabolic contributions of surrounding normal tissue as the primary driver of local nutrient abundance is a novel finding in the work, which can be leveraged in future studies.

      We thank the Reviewer for their careful evaluation of the study and for their supportive comments.

      Weaknesses:

      The work has certain caveats, some of which have been already recognized by the authors. These include the use of steady-state metabolites and the possibility of cross-contamination of some TIF into the adjacent KIF. This study is also unable to distinguish the mechanisms driving the metabolic changes in KIF/TIF relative to circulating levels in plasma.

      We agree with the Reviewer that these are important caveats to consider when interpreting the results of this study.

      The relative similarity of KIF and TIF is quite surprising. However, this interpretation is presently based on a sampling of only ~100 polar metabolites and ~200 lipid molecules. It is, perhaps, possible that future technological developments that enable more comprehensive quantitative metabolic profiling might distinguish between KIF and TIF composition.

      The Reviewer raises another important point that our interpretation of KIF vs TIF is limited to the ~300 metabolites we measured. We agree it would be worthwhile quantifying more metabolites where technically feasible to further characterize similarities and differences in nutrient availability between tumor and normal tissues.

      In vitro, tissue culture is recognized to suffer from ‘non-physiological’ nutrient dependencies, which are impacted by the composition of culture media. Thus, in vivo studies remain our current gold-standard in mechanistic studies of tumor metabolism. It is presently unclear whether the findings of this work will be recapitulated in any of the kidney cancer in vivo models and thus be functionally testable.

      We thank the Reviewer for calling attention to the limitations of cell culture media in studying tumor metabolism. While both in vitro and in vivo approaches have inherent limitations, formulating culture media based on metabolite concentrations measured here and in other studies provides a tool to study the influence of nutrient availability on kidney cell or kidney cancer cell phenotypes in vitro. We also agree with the Reviewer that it would be valuable to determine whether the findings in our study are recapitulated in mouse models of kidney cancer, as this might enable investigation into the factors that modulate nutrient availability in this tissue context.

      Reviewer #2 (Public Review):

      The study employs quantitative metabolomic and lipidomic analyses to scrutinize tumor interstitial fluid (TIF), adjacent normal kidney interstitial fluid (KIF), and plasma samples from renal cell carcinoma (RCC) patients. The authors delve into the intricate world of renal cell carcinoma and its tumor microenvironment, shedding light on the factors that shape nutrient availability in both cancerous and adjacent normal tissues. The authors prove that non-cancer-driven tissue factors play a dominant role in shaping nutrient availability in RCC. This finding opens up new avenues for research, suggesting that the tumor microenvironment is profoundly influenced by factors beyond the presence of cancer cells. This study not only contributes valuable insights into RCC metabolism but also prompts a reevaluation of the factors governing nutrient availability in tumor microenvironments more broadly. Overall, it represents a significant step forward in our understanding of the intricate interplay between cancer and its surrounding milieu.

      We thank the Reviewer for their evaluation of our work and for their supportive comments.

      The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures. Since the authors exclusively employed samples from RCC patients and did not include kidney interstitial fluid and plasma samples from healthy individuals, we cannot accurately assess the true significance and applicability of the results until the role of cancer cells in reshaping KIF is understood. In essence, some metabolite levels in the tumor interstitial fluid did not show an increase or decrease compared to the adjacent normal kidney interstitial fluid. However, the levels of these metabolites in both TIF and KIF might be higher or lower than those in kidney interstitial fluid from healthy individuals, and the roles of these metabolites should not be overlooked. Similar concerns extend to plasma levels, emphasizing the importance of metabolites that synchronously change in RCC TIF, KIF, and plasma, whether elevated or reduced.

      We agree with the Reviewer that an important caveat in considering the study findings is that we do not have KIF values from healthy individuals. Since resection of normal kidney is not a common procedure, obtaining KIF samples from healthy patients was not possible to complement our analysis. We further agree that the metabolite levels we measured in KIF or plasma are plausibly impacted by the presence of RCC. We did compare the composition of polar metabolites in the plasma from RCC, lung cancer, and healthy patients, highlighting how cystine is affected by tumor presence and/or sample collection methodology. We also point out that factors such as diet will impact metabolites in both blood and tissues.

      Reviewer #3 (Public Review):

      In this study, the authors utilized mass spectrometry-based quantification of polar metabolites and lipids in normal and cancerous tissue interstitial fluid and plasma. This showed that nutrient availability in tumor interstitial fluid was similar to that of interstitial fluid in adjacent normal kidney tissue, but that nutrients found in both interstitial fluid compartments were different from those found in plasma. This suggests that the nutrients in kidney tissue differ from those found in blood and that nutrients found in kidney tumors are largely dictated by factors shared with normal kidney tissue. Those data could be useful as a resource to support further study and modeling of the local environment of RCC and normal kidney physiology.

      We thank the Reviewer for their time considering our paper and for their supportive comments.

      In Figures 1D and 1E, there were about 30% of polar metabolites and 25% of lipids significantly different between TIF and KIF, which could be key factors for RCC tumors. This reviewer considers that the authors should make comments on this.

      We agree with the Reviewer that the metabolites that significantly differ between TIF and KIF are of interest, particularly for those studying RCC tumor metabolism. We comment on some of the metabolites driving differences between TIF and KIF in our discussion of Figure 2, and in the revised manuscript we now include a new figure showing a heatmap that enables visualization of these metabolites (Figure 2-Supplement 1A-B).

      Recommendations for the authors:

      From the Reviewing Editor:

      Figure 2 needs to plot heatmaps for both upregulated and downregulated metabolites in TIF.

      We agree and now include heatmaps for significantly differing polar metabolites and lipids in TIF vs KIF as requested by Reviewer 3 (Figure 2-Supplement 1A-B). For completeness, we also include heatmaps for metabolites differing between healthy and RCC plasma (Figure 2-Supplement 2C) and for NSCLC and RCC plasma (Figure 2-Supplement 2D).

      There is a need to show whether the differences in these metabolites between plasma and tissue interstitial fluid are specific to RCC patients or if they are also present in normal individuals.

      Unfortunately, it has not been possible for us to collect KIF from healthy individuals. Since resection of normal kidney is not a common procedure, we have no way to obtain sufficient KIF samples from healthy patients for this measurement. We discuss this as a limitation of the study.

      Reviewer #1 (Recommendations For The Authors):

      a. The authors should provide additional details about the methodology to separate the KIF and TIF. Contaminating metabolites from surrounding tissue or the peritoneal fluids could impact interpretation and it would be helpful to understand how these challenges were addressed during tissue collection for this study. Additionally, was the collected tissue minced or otherwise dissociated? If so, could these procedures cause tissue lysis and contaminate the KIF/TIF with intracellular components?

      We thank the Reviewer for the suggestions to include more information about the sampling methodology. Care was taken to minimize cell lysis incurred by the processing methodology, as tissues were not minced, smashed, or dissociated; however, there is still a possibility of some level of tissue lysis that is pre-existing or occurs during the isolation procedure. We note this caveat in the text (lines 218-220) and have updated the Methods with more details of the sampling and processing of the samples.

      b. Although the authors focus on metabolites that are elevated in TIF (relative to KIF and plasma), it would be equally relevant to consider the converse. Metabolites that are reduced in TIF, either due to underproduction or overconsumption, could render the tumors auxotrophic for some critical dependencies and identify some novel metabolic vulnerabilities. In this regard, Figure 2 could have a heatmap of the top metabolites that are elevated and depleted specifically in the TIF.

      We agree with the Reviewer it is useful to include heatmaps to better display the metabolites that significantly differ between TIF and KIF and now include these in Figure 2-Supplement 1A-B.

      c. The future utilization of this knowledge would depend on our ability to model these differences. Would interstitial tissue from a normal mouse kidney or tumor-bearing mouse kidney recapitulate the same differences relative to mouse plasma?

      We agree with the Reviewer that it would be worth determining whether the findings in our study are recapitulated in mouse models of kidney cancer, which would support future investigation into the factors that modulate nutrient availability. This is an interesting question, but we did not have access to endogenously arising models of RCC, which have been a limitation for the field, and comparison of normal mouse kidney metabolite data to human metabolite data is problematic for obvious reasons. Thus, we had no choice but to discuss this as a limitation of the study.

      Reviewer #2 (Recommendations For The Authors):

      In this study, Abbott et al. investigated the metabolic profile of renal cell carcinoma (RCC) by analyzing the tumor interstitial fluid (TIF), adjacent normal kidney interstitial fluid (KIF), and plasma samples from patients. The results indicate that nutrient composition in TIF closely resembles that of KIF, suggesting that tissue-specific factors, rather than tumor-driven alterations, have a more significant impact on nutrient levels. These findings are interesting. The study is overall well-constructed, including appropriate analysis, and the manuscript is written clearly and supported by high-quality figures. However, some issues are raised which if addressed, would strengthen the paper.

      We thank the Reviewer for their suggestions to improve the paper.

      The authors found a difference in the number of metabolites when comparing TIF or KIF lipid composition with plasma. The discoveries are intriguing; however, I am keen to understand whether the differences in these metabolites between plasma and tissue interstitial fluid are specific to RCC patients or if they are also present in normal individuals. I am particularly interested in identifying which metabolites could serve as potential diagnostic markers, intervention targets, or potentially reshape the tumor microenvironment. Because, even though some metabolite levels show no difference between TIF and KIF in RCC patients, I wonder if these metabolite levels in KIF increase or decrease compared to the interstitial fluid in healthy individuals. I am intrigued by the metabolites that simultaneously increase or decrease in both TIF and KIF compared to the kidney interstitial fluid in healthy individuals.

      We agree with the Reviewer that it would be interesting to measure kidney interstitial fluid from healthy patients to be able to compare metabolites changing due to the presence of RCC tumor. As we discuss in response to the public review, this was not possible as we could not obtain material from healthy individuals for analysis. Nevertheless, we agree it warrants future study if material were available.

      The analysis conducted using plasma from healthy donors, as applauded by the author, is noteworthy. The author seems to have found that cystine levels do not differ between RCC patient plasma and tissue interstitial fluid. However, considering that in patient plasma, the cystine concentration is approximately two-fold higher than in plasma from healthy individuals, it is likely that cystine levels in patient tissue fluid have also increased nearly two-fold compared to levels in the interstitial fluid of normal kidney tissues. This finding aligns with the discovery of elevated GSH levels in cancer cells.

      We agree with the Reviewer that a higher cystine concentration in RCC patient plasma and interstitial fluid is interesting, and also considered this in relationship to past findings including reports of elevated GSH levels in RCC. However, we think this observation is driven at least in part by the fasting status of the patients pre-surgery. This does not rule out some part being related to the presence of the tumor, as this would be consistent with elevated GSH levels as noted by the Reviewer. Future studies will be needed to further delineate the factors that impact elevated cystine levels in both interstitial fluid and plasma.

      Some minor typos, such as "HIF1􀀀-driven" should be corrected.

      We thank the Reviewer for pointing out this typo and we have corrected it in the revised manuscript.

    1. Author response:

      eLife assessment

      This study provides valuable evidence indicating that Syngap1 regulates the synaptic drive and membrane excitability of parvalbumin- and somatostatin-positive interneurons in the auditory cortex. Since haplo-insufficiency of Syngap1 has been linked to intellectual disabilities without a well-defined underlying cause, the central question of this study is timely. However, the support for the authors' conclusions is incomplete in general and some parts of the experimental evidence are inadequate. Specifically, the manuscript requires further work to properly evaluate the impact on synaptic currents, intrinsic excitability parameters, and morphological features.

      We are happy that the editors found that our study provides valuable evidence and that the central question is timely. We thank the reviewers for their detailed comments and suggestions. Below, we provide a point-by-point answer (in blue) to the specific comments and indicate the changes to the manuscript and the additional experiments we plan to perform to answer these comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study is designed to assess the role of Syngap1 in regulating the physiology of the MGE-derived PV+ and SST+ interneurons. Syngap1 is associated with some mental health disorders, and PV+ and SST+ cells are the focus of many previous and likely future reports from studies of interneuron biology, highlighting the translational and basic neuroscience relevance of the authors' work.

      Strengths of the study are using well-established electrophysiology methods and the highly controlled conditions of ex vivo brain slice experiments combined with a novel intersectional mouse line, to assess the role of Syngap1 in regulating PV+ and SST+ cell properties. The findings revealed that in the mature auditory cortex, Syngap1 haploinsufficiency decreases both the intrinsic excitability and the excitatory synaptic drive onto PV+ neurons from Layer 4. In contrast, SST+ interneurons were mostly unaffected by Syngap1 haploinsufficiency. Pharmacologically manipulating the activity of voltage-gated potassium channels of the Kv1 family suggested that these channels contributed to the decreased PV+ neuron excitability by Syngap insufficiency. These results therefore suggest that normal Syngap1 expression levels are necessary to produce normal PV+ cell intrinsic properties and excitatory synaptic drive, albeit, perhaps surprisingly, inhibitory synaptic transmission was not affected by Syngap1 haploinsufficiency.

      Since the electrophysiology experiments were performed in the adult auditory cortex, while Syngap1 expression was potentially affected since embryonic stages in the MGE, future studies should address two important points that were not tackled in the present study. First, what is the developmental time window in which Syngap1 insufficiency disrupted PV+ neuron properties? Albeit the embryonic Syngap1 deletion most likely affected PV+ neuron maturation, the properties of Syngap-insufficient PV+ neurons do not resemble those of immature PV+ neurons. Second, whereas the observation that Syngap1 haploinsufficiency affected PV+ neurons in auditory cortex layer 4 suggests auditory processing alterations, MGE-derived PV+ neurons populate every cortical area. Therefore, without information on whether Syngap1 expression levels are cortical area-specific, the data in this study would predict that by regulating PV+ neuron electrophysiology, Syngap1 normally controls circuit function in a wide range of cortical areas, and therefore a range of sensory, motor and cognitive functions. These are relatively minor weaknesses regarding interpretation of the data in the present study that the authors could discuss.

      We agree with the reviewer on the proposed open questions, which we will certainly discuss in the revised manuscript we are preparing. We do have experimental evidence suggesting that Syngap1 mRNA is expressed by PV+ and SST+ neurons in different cortical areas, during early postnatal development and in adulthood; therefore, we agree that it will be important, in future experiments, to tackle the question of when the observed phenotypes arise.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haploinsufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel

      Weaknesses:

      Despite the interesting and novel questions, there are significant concerns regarding the experimental design and data quality, as well as potential misinterpretations of key findings. Consequently, the current manuscript fails to contribute substantially to our understanding of SynGap1 loss mechanisms and may even provoke unnecessary controversies.

      Major issues:

      (1) One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar.

      We understand the reviewer’s perspective; indeed, we asked ourselves the very same question regarding why the sEPSC and mEPSC frequency fall within a similar range when we analysed neuron means (bar graphs). We have already recorded sEPSCs followed by mEPSCs from several PV neurons (control and cHet) and are in the process of analyzing the data. We will add this data to the revised version of the manuscript. We will also rephrase the manuscript to present multiple potential interpretations of the data.

      We hope that we have correctly interpreted the reviewer's concern. However, if the question is why sEPSC amplitude but not frequency is affected in cHet vs ctrl, then the reviewer’s comment is perhaps based on the assumption that the amplitude and frequency of miniature events should be lower for all events compared to those observed for spontaneous events. However, it's essential to note that changes in the mean amplitude of sEPSCs are primarily driven by alterations in large sEPSCs (>9-10 pA, as shown in cumulative probability in Fig. 1b right), with smaller ones being relatively unaffected. Consequently, a reduction in sEPSC amplitude may not necessarily result in a significant decrease in frequency, since the reduced amplitudes likely remain above the detection threshold of 3 pA. This could explain the lack of a significant change in the average inter-event interval of sEPSCs (as depicted in Fig. 1b left).
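
      The threshold argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the log-normal amplitude distribution and the 20% reduction applied to large events are hypothetical choices made for illustration, and only the 3 pA detection threshold and the roughly 9 pA boundary for "large" events come from the response. The names used are illustrative and do not describe the analysis pipeline used in the study.

```python
import random
from statistics import mean

DETECTION_THRESHOLD_PA = 3.0   # detection threshold quoted in the response
LARGE_EVENT_CUTOFF_PA = 9.0    # lower bound of the "large" events quoted in the response
LARGE_EVENT_SCALING = 0.8      # hypothetical 20% amplitude reduction (assumption)


def detected(amplitudes):
    """Keep only the events that an amplitude-threshold detector would report."""
    return [a for a in amplitudes if a >= DETECTION_THRESHOLD_PA]


def shrink_large_events(amplitudes):
    """Scale down only the large events, leaving small events untouched."""
    return [
        a * LARGE_EVENT_SCALING if a > LARGE_EVENT_CUTOFF_PA else a
        for a in amplitudes
    ]


# Hypothetical amplitudes (pA) for one recording of fixed duration.
rng = random.Random(0)
control = [rng.lognormvariate(2.3, 0.5) for _ in range(2000)]
mutant = shrink_large_events(control)

control_detected = detected(control)
mutant_detected = detected(mutant)

# Same recording duration, so equal event counts mean equal event frequency.
print(len(control_detected), len(mutant_detected))
print(mean(control_detected), mean(mutant_detected))
```

      In this toy case the scaled events stay above 9 × 0.8 = 7.2 pA, well above the detection threshold, so the number of detected events is unchanged while the mean amplitude drops.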

      If the question is whether we should see the same parameters affected by the genetic manipulation in both sEPSC and mEPSC, then another critical consideration is the involvement of the releasable pool in mEPSCs versus sEPSCs. Current knowledge suggests that activity-dependent and -independent release may not necessarily engage the same pool of vesicles or target the same postsynaptic sites. This concept has been extensively explored (reviewed in Kavalali, 2015). Consequently, while we may have traditionally interpreted activity-dependent and -independent data assuming they utilize the same pool, this is no longer accurate. The current discussion in the field revolves around understanding the mechanisms underlying such phenomena. Therefore, comparisons between sEPSCs and mEPSCs may not yield conclusive data but rather speculative interpretations. For a rigorous analysis, particularly in this context involving thousands of events, it is essential to assess these data sets (mEPSCs vs sEPSCs) separately and provide cumulative probability curves. This approach allows for a more comprehensive understanding of the underlying distributions and helps to elucidate any potential differences between the two types of events. We will rephrase the text, and as mentioned above, add additional data, to better reflect these considerations.
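
      As a minimal sketch of the cumulative probability curves mentioned above, the following builds an empirical cumulative distribution from pooled event values and reports the largest vertical gap between two curves. The function names and the example values are hypothetical, and the gap is shown only as a descriptive summary; it is not the LMM-based statistic reported in the manuscript.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function giving the fraction of events with value <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def max_cdf_gap(sample_a, sample_b):
    """Largest vertical distance between the two empirical CDFs."""
    cdf_a = empirical_cdf(sample_a)
    cdf_b = empirical_cdf(sample_b)
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Hypothetical inter-event intervals (ms), analysed separately per event type.
control_intervals = [40, 55, 60, 80, 120, 150]
mutant_intervals = [60, 90, 110, 160, 200, 240]

print(max_cdf_gap(control_intervals, mutant_intervals))
```

      Keeping the sEPSC and mEPSC data sets in separate calls, rather than pooling them, mirrors the point made above that the two event types should be assessed independently.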

      (2) Another significant concern is the quality of synapse counting experiments. The authors attempted to colocalize pre- and postsynaptic markers Vglut1 and PSD95 with PV labelling. However, several issues arise. Firstly, the PV labelling seems confined to soma regions, with no visible dendrites. Given that the perisomatic region only receives a minor fraction of excitatory synapses, this labeling might not accurately represent the input coverage of PV cells. Secondly, the resolution of the images is insufficient to support clear colocalization of the synaptic markers. Thirdly, the staining patterns are peculiar, with PSD95 puncta appearing within regions clearly identified as somas by Vglut1, hinting at possible intracellular signals. Furthermore, PSD95 seems to delineate potential apical dendrites of pyramidal cells passing through the region, yet Vglut1+ partners are absent in these segments, which are expected to be the marker of these synapses here. Additionally, the cumulative density of Vglut2 and Vglut1 puncta exceeds expectations, and it's surprising that subcortical fibers labeled by Vglut2 are comparable in number to intracortical Vglut1+ axon terminals. Ideally, N(Vglut1)+N(Vglut2) should be equal or less than N(PSD95), but this is not the case here. Consequently, these results cannot be considered reliable due to these issues.

      We apologize, as it appears that the images we provided have caused confusion. The selected images represent a single focal plane of a confocal stack, which was visually centered on the PV cell somata. We chose just one confocal plane because we thought it showed more clearly the apposition of presynaptic and postsynaptic immunolabeling around the somata. In the revised version of the manuscript, we will provide higher magnification images, which will clearly show how we identified and selected the region of interest for the quantification of colocalized synaptic markers. In our confocal stacks, we can also identify PV immunolabeled dendrites and colocalized vGlut1/PSD95 or vGlut2/PSD95 puncta on them; but these do not appear in the selected images because, as explained, only one focal plane, centered on the PV cell somata, was shown.

      We acknowledge the reviewer's point that in PV+ cells the majority of excitatory inputs are formed onto dendrites; however, we focused on the somatic excitatory inputs to PV cells, because despite their lower number, they produce much stronger depolarization in PV neurons than dendritic excitatory inputs (Hu et al., 2010; Norenberg et al., 2010). Further, quantification of perisomatic putative excitatory synapses is more reliable since, by using PV immunostaining, we can visualize the soma and larger primary dendrites, but smaller, higher-order dendrites are not always detectable. Of note, PV-positive somata receive more excitatory synapses than SST-positive and pyramidal neuron somata, as found by electron microscopy studies in the visual cortex (Hwang et al., 2021; Elabbady et al., 2024).

      Regarding the comment on the density of vGlut1 and vGlut2 puncta, the numbers appear high and similar between the two markers because we present normalized data (cHet normalized to their control values for each set of immunolabelling) to clearly represent the differences between genotypes. This information is present in the legends, but we apologize for not clearly explaining it in the methods section. We will provide a more detailed explanation of our methods in the revised manuscript.

      Briefly, immunostained sections were imaged using a Leica SP8-STED confocal microscope, with a 63x objective (NA 1.4) at 1024 x 1024 pixels, z-step = 0.3 μm, stack size of ~15 μm. Images were acquired from the auditory cortex from at least 3 coronal sections per animal. All the confocal parameters were kept constant throughout the acquisition of an experiment. All images shown in the figures are from a single confocal plane. To quantify the number of vGlut1/PSD95 or vGlut2/PSD95 putative synapses, images were exported as TIFF files and analyzed using Fiji (ImageJ) software. We first manually outlined the profile of each PV cell soma (identified by PV immunolabeling). At least 4 innervated somata were selected in each confocal stack. We then used a series of custom-made macros in Fiji as previously described (Chehrazi et al., 2023). After applying background subtraction (rolling value = 10) and Gaussian blur (σ value = 2) filters, the stacks were binarized and vGlut1/PSD95 or vGlut2/PSD95 puncta were independently identified around the perimeter of a targeted soma in the focal plane with the highest soma circumference. Puncta were quantified after filtering particles for size (between 0 and 2 μm²) and circularity (between 0 and 1). Data quantification was done by investigators blind to the genotype, and data are presented normalized to control values for each experiment.
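
      The size and circularity filter described above can be illustrated with a short sketch. It assumes the common definition of circularity, 4π × area / perimeter², and uses a hypothetical `Particle` record with made-up measurements. It is not the custom Fiji macro cited above, and all names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    """Hypothetical measurements of one binarized punctum."""
    area_um2: float
    perimeter_um: float


def circularity(p: Particle) -> float:
    """4*pi*area / perimeter**2, clamped to [0, 1]; 1.0 is a perfect circle."""
    if p.perimeter_um <= 0:
        return 0.0
    return min(1.0, 4.0 * math.pi * p.area_um2 / p.perimeter_um ** 2)


def keep_punctum(p: Particle, max_area_um2=2.0, min_circ=0.0, max_circ=1.0) -> bool:
    """Apply the size (0-2 um^2) and circularity (0-1) ranges from the methods."""
    size_ok = 0.0 < p.area_um2 <= max_area_um2
    shape_ok = min_circ <= circularity(p) <= max_circ
    return size_ok and shape_ok


def count_puncta(particles) -> int:
    """Number of particles that pass both filters."""
    return sum(keep_punctum(p) for p in particles)


particles = [
    Particle(area_um2=math.pi * 0.25 ** 2, perimeter_um=2 * math.pi * 0.25),  # small, round
    Particle(area_um2=1.5, perimeter_um=8.0),  # elongated but within the size range
    Particle(area_um2=5.0, perimeter_um=9.0),  # larger than 2 um^2, rejected
]

print(count_puncta(particles))
```

      With a circularity range of 0 to 1, the shape criterion accepts every particle, so in this sketch only the size limit actually excludes anything.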

      (3) One observation from the minimal stimulation experiment was concluded by an unsupported statement. Namely, the change in the onset delay cannot be attributed to a deficit in the recruitment of PV+ cells, but it may suggest a change in the excitability of TC axons.

      We agree with the reviewer; please see our answer to the point below.

      (4) The conclusions drawn from the stimulation experiments are also disconnected from the actual data. To make conclusions about TC release, the authors should have tested release probability using established methods, such as paired-pulse changes. Instead, the only observation here is a change in the AMPA components, which remained unexplained.

      We agree with the reviewer and we will perform additional paired-pulse ratio experiments at different intervals. We will rephrase the discussion and our interpretation and potential hypothesis according to the data obtained from this new experiment.

      (5) The sampling rate of CC recordings is insufficient to resolve the temporal properties of the APs. Therefore, the phase-plots cannot be interpreted (e.g. axonal and somatic AP components are not clearly separated), raising questions about how AP threshold and peak were measured. The low sampling rate also masks the real derivative of the AP signals, making them apparently faster.

      We acknowledge that a higher sampling rate could offer a more detailed analysis of the action potential waveform. However, in the context of action potential analysis, it is acceptable to use sampling rates ranging from 10 kHz to 20 kHz (Golomb et al., 2007; Stevens et al., 2021; Zhang et al., 2023), which are considered adequate in the context of the present study. Indeed, our study aims to evaluate "relative" differences in the electrophysiological phenotype when comparing groups following a specific genetic manipulation. A sampling rate of 10 kHz is commonly employed in similar studies, including those conducted by our collaborator and co-author S. Kourrich (e.g., Kourrich and Thomas 2009, Kourrich et al., 2013), as well as others (Russo et al., 2013; Ünal et al., 2020; Chamberland et al., 2023).

      Despite being acquired at a lower sampling rate than potentially preferred by the reviewer, our data clearly demonstrate significant differences between the experimental groups, especially for parameters that are negligibly or not affected by the sampling rate used here (e.g., #spikes/input, RMP, Rin, Cm, Tm, AP amplitude, AP latency, AP rheobase).

      Regarding the phase-plots, we agree that a higher sampling rate would have resulted in smoother curves and more accurate absolute values. However, the differences were sufficiently pronounced to discern the relative variations in action potential waveforms between the experimental groups.
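
      To make the sampling-rate discussion concrete, the sketch below estimates the peak dV/dt of a synthetic spike by finite differences at two sampling rates. Everything in it is an assumption made for illustration: the Gaussian waveform, its 80 mV amplitude and 0.15 ms width are hypothetical and are not taken from the recordings. It only shows how the absolute estimate depends on the sampling interval, not how the study's phase plots were computed.

```python
import math

AMPLITUDE_MV = 80.0   # hypothetical spike amplitude
CENTER_MS = 2.0       # hypothetical time of the spike peak
SIGMA_MS = 0.15       # hypothetical width (full width at half maximum ~0.35 ms)


def gaussian_spike(t_ms):
    """Synthetic spike-shaped waveform (mV) at time t_ms."""
    return AMPLITUDE_MV * math.exp(-((t_ms - CENTER_MS) ** 2) / (2 * SIGMA_MS ** 2))


def peak_dvdt(sampling_khz, duration_ms=4.0):
    """Largest forward finite difference (mV/ms, i.e. V/s) at a given sampling rate."""
    n_intervals = int(round(duration_ms * sampling_khz))
    dt_ms = 1.0 / sampling_khz
    v = [gaussian_spike(i / sampling_khz) for i in range(n_intervals + 1)]
    return max((v[i + 1] - v[i]) / dt_ms for i in range(n_intervals))


# Exact maximum slope of a Gaussian: A / (sigma * sqrt(e)).
analytic_peak = AMPLITUDE_MV / (SIGMA_MS * math.sqrt(math.e))

for rate_khz in (10, 50):
    print(rate_khz, "kHz:", peak_dvdt(rate_khz), "V/s; analytic:", analytic_peak)
```

      Because each finite difference averages the slope over one sampling interval, a coarser interval can only lower the estimated peak for this waveform. The same bias applies to every group sampled at the same rate, which is consistent with the emphasis above on relative rather than absolute comparisons.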

      A related issue is that the Methods section lacks essential details about the recording conditions, such as bridge balance and capacitance neutralization.

      We indeed performed bridge balance and neutralized the capacitance before starting every recording. We will add the information in the methods.

      (6) Interpretation issue: One of the most fundamental measures of cellular excitability, the rheobase, was differentially affected by cHet in BCshort and BCbroad. Yet, the authors concluded that the cHet-induced changes in the two subpopulations are common.

      We are uncertain if we have correctly interpreted the reviewer's comment. While we observed distinct impacts on the rheobase (Fig. 7d and 7i), there seems to be a common effect on the AP threshold (Fig. 7c and 7h), as interpreted and indicated in the final sentence of the results section for Figure 7 (page 12). If our response does not address the reviewer's comment adequately, we would greatly appreciate it if the reviewer could rephrase their feedback.

      (7) Design issue:

      The Kv1 blockade experiments are disconnected from the main manuscript. There is no experiment that shows the causal relationship between changes in DTX and cHet cells. It is only an interesting observation on AP halfwidth and threshold. However, how they affect rheobase, EPSCs, and other topics of the manuscript are not addressed in DTX experiments.

      Furthermore, Kv1 currents were never measured in this work, nor was the channel density tested. Thus, the DTX effects are not necessarily related to changes in PV cells, which can potentially generate controversies.

      While we acknowledge the reviewer's point that Kv1 currents and density weren't specifically tested, an important insight provided by Fig. 5 is the prolonged action potential latency. This delay is significantly influenced by slowly inactivating subthreshold potassium currents, namely the D-type K+ current. It's worth noting that D-type current is primarily mediated by members of the Kv1 family. The literature supports a role for Kv1.1-containing channels in modulating responses to near-threshold stimuli in PV cells (Wang et al., 1994; Goldberg et al., 2008; Zurita et al., 2018). However, we recognize that besides the Kv1 family, other families may also contribute to the observed changes.

      To address this concern, we will revise our interpretation. We will opt for the more accurate term "D-type K+ current" and only speculate about the involved channel family in the discussion. It is not our intention to open unnecessary controversy, but to present the data we obtained. We believe this approach and rephrasing the discussion as proposed will prevent unnecessary controversy and instead foster fruitful discussions.

      (8) Writing issues:

      Abstract:

      The auditory system is not mentioned in the abstract.

      One statement in the abstract is unclear. What is meant by "targeting Kv1 family of voltage-gated potassium channels was sufficient..."? "Targeting" could refer to altered subcellular targeting of the channels, simple overexpression/deletion in the target cell population, or targeted mutation of the channel, etc. Only the final part of the Results revealed that it was none of the above and that these channels were instead blocked selectively.

      We agree with the reviewer and we will rephrase the abstract accordingly.

      Introduction:

      There is a contradiction in the introduction. The second paragraph describes in detail the distinct contribution of PV and SST neurons to auditory processing. But at the end, the authors state that "relatively few reports on PV+ and SST+ cell-intrinsic and synaptic properties in adult auditory cortex". Please be more specific about the unknown properties.

      We agree with the reviewer and we will rephrase more specifically.

      (9) The introduction emphasizes the heterogeneity of PV neurons, which certainly influences the interpretation of the results of the current manuscript. However, the initial experiments did not consider this and handled all PV cell data as a pooled population.

      In the initial experiments, we handled all PV cell data together because we wanted to be rigorous and not make assumptions/biases on the different PV cells, which in later experiments we were to distinguish based on the intrinsic properties alone. We will make this point clear in the revised manuscript.

      (10) The interpretation of the results strongly depends on unpublished work, which potentially provides the physiological and behavioral contexts about the role of GABAergic neurons in SynGap-haploinsufficiency. The authors cite their own unpublished work, without explaining the specific findings and relation to this manuscript.

      We agree with the reviewer and apologize for the lack of clarity. Our unpublished work is in revision right now. We will provide more information and update references in the revised version of this manuscript.

      (11) The introduction of Sholl analysis experiments mentions SOM staining, however, there is no such data about this cell type in the manuscript.

      We apologize for the error; we will replace SOM with SST (SOM and SST are two commonly used acronyms for somatostatin-expressing interneurons).

      Reviewer #3 (Public Review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences at both levels, although predominantly in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunction observed in Syngap1 haploinsufficiency-related intellectual disability. The subject of the work is interesting, and most of the approach is direct and quantitative, which are major strengths. There are also some weaknesses that reduce its impact for a broader field.

      (1) The choice of mice with conditional (rather than global) haploinsufficiency makes the link between the findings and Syngap1 relatively easy to interpret, which is a strength. However, it also remains unclear whether an entire network with the same mutation at a global level (affecting also excitatory neurons) would react similarly.

      The reviewer raises an interesting and pertinent open question which we will address in the discussion of the revised paper.

      (2) There are some (apparent?) inconsistencies between the text and the figures. Although the authors appear to have used a sophisticated statistical analysis, some datasets in the illustrations do not seem to match the statistical results. For example, neither Fig 1g nor Fig 3f (eNMDA) reach significance despite large differences.

      We respectfully disagree; we do not think the text and figures are inconsistent. In the cited examples, the large apparent difference in mean values does not reach significance because of the large variability in the data; further, we did not exclude any data points, because we wanted to be rigorous. In particular, for Fig. 1g, statistical analysis shows a significant increase in the inter-mEPSC interval (*p=0.027, LMM) when all events are considered (cumulative probability plots), while there is no significant difference in the inter-mEPSC interval for the inter-cell mean comparison (inset, p=0.354, LMM). The inter-cell mean comparison does not show a difference with the Mann-Whitney test either (p=0.101; the data are not normally distributed, hence the choice of the Mann-Whitney test). For Fig. 3f (eNMDA), the higher mean value for the cHet versus the control is driven by two data points that are particularly high, while the other data points overlap with the control values. The Mann-Whitney test also shows no statistical difference (p=0.174).

      In the manuscript, discussion of the data is based on the results of the LMM analysis, which takes into account both the number of cells and the number of mice from which these cells were recorded. We chose this statistical approach because it does not rely on the assumption that cells recorded from the same mouse are independent variables. In the supplemental tables, we provided the results of the statistical analysis done with both LMM and the most commonly used Mann-Whitney test (for non-normally distributed data) or t-test (for normally distributed data), for each data set.
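
      The non-independence of cells recorded from the same mouse can be illustrated with a toy example. The sketch below is not a linear mixed model and is not the analysis used in the manuscript; it only contrasts a pooled cell-level mean with a mean of per-mouse means, using hypothetical mouse identifiers and values, to show why the grouping of cells within animals matters.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(records):
    """Average cell values within each mouse.

    records: iterable of (mouse_id, value) pairs, one pair per recorded cell.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}


# Hypothetical data: three cells from one mouse, one cell from another.
cells = [("m1", 10.0), ("m1", 12.0), ("m1", 11.0), ("m2", 20.0)]

pooled_mean = mean(value for _, value in cells)            # every cell weighted equally
mouse_level_mean = mean(per_mouse_means(cells).values())   # every mouse weighted equally

print(pooled_mean, mouse_level_mean)
```

      When one animal contributes more cells than another, the two summaries diverge. A mixed model handles this by treating the mouse as a grouping factor instead of discarding the cell-level information.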

      Also, the legend to Fig 9 indicates the presence of "a significant decrease in AP half-width from cHet in absence or presence of a-DTX", but the bar graph does not seem to show that.

      We apologize for our lack of clarity. In the legend of Fig. 9, we reported the statistical comparisons between 1) cHet mice in the absence of a-DTX and control mice and 2) cHet mice in the presence of a-DTX and control mice. We will rephrase the description of the results and the legend of the figure to avoid confusion.

      (3) The authors mention that the lack of differences in synaptic current kinetics is evidence against a change in subunit composition. However, in some Figures, for example, 3a, the kinetics of the recorded currents appear dramatically different. It would be important to know and compare the values of the series resistance between control and mutant animals.

      We agree with the reviewer that there appears to be a qualitative difference in eNMDA decay between conditions, although quantified eNMDA decay itself is similar between groups. We used a cutoff of 15% for the series resistance (Rs), which is considerably more stringent than the cutoffs typically used in electrophysiology, the vast majority of which lie between 20 and 30%. To address this concern, we re-examined and compared Rs between groups and found no difference for Rs in eAMPA (13.2±0.5 in WT n=16 cells, 7 mice vs 13.7±0.3 in cHet n=14 cells, 7 mice, p=0.432 LMM) and eNMDA (12.7±0.7 in WT n=6 cells, 3 mice vs 13.8±0.7 in cHet n=6 cells, 5 mice, p=0.231, LMM). Thus, the apparent qualitative difference in eNMDA decay stems from inter-cell variability rather than inter-group differences. Notably, this discrepancy between the trace (Fig. 3a) and the data (Fig. 3f, right) is largely due to inter-cell variability, particularly in eNMDA, where a higher but non-significant decay rate is driven by a couple of very high values (Fig. 3f, right). In the revised manuscript, we will show traces that better represent our findings.

      (4) A significant unexplained variability is present in several datasets. For example, the AP threshold for PV+ includes points between -50-40 mV, but also values at around -20/-15 mV, which seems too depolarized to generate healthy APs (Fig 5c, Fig7c).

      We acknowledge the variability in AP threshold data, with some APs appearing too depolarized to generate healthy spikes. However, we meticulously examined each AP that spiked at these depolarized thresholds and found that other intrinsic properties (such as Rin, Vrest, AP overshoot, etc.) all indicate that these cells are healthy. Therefore, to maintain objectivity and provide unbiased data to the community, we opted to include them in our analysis. It's worth noting that similar variability has been observed in other studies (Bengtsson Gonzales et al., 2020; Bertero et al., 2020).

      Further, we conducted a significance test on AP threshold excluding these potentially unhealthy cells and found that the significant differences persist. After removing two outliers from the cHet group with values of -16.5 and 20.6 mV, we obtain: -42.6±1.01 mV in control, n=33, 15 mice vs -36.2±1.1 mV in cHet, n=38 cells, 17 mice, ***p<0.001, LMM. Thus, whether these cells are included or excluded, our interpretations and conclusions remain unchanged.

      We would like to clarify that these data have not been corrected for the junction potential. We will add this information in the revised version.

      (5) I am unclear as to how the authors quantified colocalization between VGluts and PSD95 at the low magnification shown in Supplementary Figure 2.

      We apologize for our lack of clarity. Although the analysis was done at high resolution, the figures were focused on showing multiple PV somata receiving excitatory inputs. We will add higher magnification figures and more detailed information in the methods of the revised version. Please also see our response to reviewer #2.

      (6) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties", but this claim would seem to be directly refuted by the data of Fig 8f. In the absence of changes in either active or passive membrane properties, shouldn't the current/#AP plot remain unchanged?

      While we acknowledge the theoretical expectation that changes in intrinsic parameters should correlate with alterations in neuronal firing, the absence of differences in the parameters analyzed in this study should not overshadow the clear and significant decrease in firing rate observed in cHet SST+ cells. This decrease serves as a compelling indication of reduced intrinsic neuronal excitability. It's certainly possible that other intrinsic factors, not assessed in this study, may have contributed to this effect. However, exploring these mechanisms is beyond the scope of our current investigation. We will rephrase the discussion and add this limitation of our study in the revised version.

      (7) The plots used for the determination of AP threshold (Figs 5c, 7c, and 7h) suggest that the frequency of acquisition of current-clamp signals may not have been sufficient; this value is not included in the Methods section.

      This study utilized a sampling rate of 10 kHz, which is a standard rate for action potential analysis in the present context. We will describe more extensively the technical details in the method section of the revised manuscript we are preparing. While we acknowledge that a higher sampling rate could have enhanced the clarity of the phase plot, our recording conditions, as detailed in our response to Rev#2/comment#5, were suitable for the objectives of this study.

      Reference list

      Bengtsson Gonzales C, Hunt S, Munoz-Manchado AB, McBain CJ, Hjerling-Leffler J (2020) Intrinsic electrophysiological properties predict variability in morphology and connectivity among striatal Parvalbumin-expressing Pthlh-cells. Scientific Reports, 10, 15680. https://doi.org/10.1038/s41598-020-72588-1

      Bertero A, Zurita H, Normandin M, Apicella AJ (2020) Auditory long-range parvalbumin cortico-striatal neurons. Frontiers in Neural Circuits, 14, 45. http://doi.org/10.3389/fncir.2020.00045

      Chamberland S, Nebet ER, Valero M, Hanani M, Egger R, Larsen SB, Eyring KW, Buzsáki G, Tsien RW (2023) Brief synaptic inhibition persistently interrupts firing of fast-spiking interneurons. Neuron, 111, 1264–1281. http://doi.org/10.1016/j.neuron.2023.01.017

      Chehrazi P, Lee KKY, Lavertu-Jolin M, Abbasnejad Z, Carreño-Muñoz MI, Chattopadhyaya B, Di Cristo G (2023). The p75 Neurotrophin Receptor in Preadolescent Prefrontal Parvalbumin Interneurons Promotes Cognitive Flexibility in Adult Mice. Biol Psychiatry, 94, 310-321. doi: 10.1016/j.biopsych.2023.04.019.

      Elabbady L, Seshamani S, Mu S, Mahalingam G, Schneider-Mizell C, Bodor AL, Bae JA, Brittain D, Buchanan J, Bumbarger DJ, Castro MA, Dorkenwald S, Halageri A, Jia Z, Jordan C, Kapner D, Kemnitz N, Kinn S, Lee K, Li K…Collman F (2024) Perisomatic features enable efficient and dataset wide cell-type classifications across large-scale electron microscopy volumes. bioRxiv, https://doi.org/10.1101/2022.07.20.499976

      Goldberg EM, Clark BD, Zagha E, Nahmani M, Erisir A, Rudy B (2008) K+ Channels at the axon initial segment dampen near-threshold excitability of neocortical fast-spiking GABAergic interneurons. Neuron, 58, 387–400. https://doi.org/10.1016/j.neuron.2008.03.003

      Golomb D, Donner K, Shacham L, Shlosberg D, Amitai Y, Hansel D (2007) Mechanisms of firing patterns in fast-spiking cortical interneurons. PLoS Computational Biology, 3, e156. http://doi.org/10.1371/journal.pcbi.0030156

      Hu H, Martina M, Jonas P (2010). Dendritic mechanisms underlying rapid synaptic activation of fast-spiking hippocampal interneurons. Science, 327, 52–58. http://doi.org/10.1126/science.1177876

      Hwang YS, Maclachlan C, Blanc J, Dubois A, Petersen CH, Knott G, Lee SH (2021). 3D ultrastructure of synaptic inputs to distinct gabaergic neurons in the mouse primary visual cortex. Cerebral Cortex, 31, 2610–2624. http://doi.org/10.1093/cercor/bhaa378

      Kavalali E (2015) The mechanisms and functions of spontaneous neurotransmitter release. Nature Reviews Neuroscience, 16, 5–16. https://doi.org/10.1038/nrn3875

      Kourrich S, Thomas MJ (2009) Similar neurons, opposite adaptations: psychostimulant experience differentially alters firing properties in accumbens core versus shell. Journal of Neuroscience, 29, 12275-12283. http://doi.org/10.1523/JNEUROSCI.3028-09.2009

      Kourrich S, Hayashi T, Chuang JY, Tsai SY, Su TP, Bonci A (2013) Dynamic interaction between sigma-1 receptor and Kv1.2 shapes neuronal and behavioral responses to cocaine. Cell, 152, 236–247. http://doi.org/10.1016/j.cell.2012.12.004

      Norenberg A, Hu H, Vida I, Bartos M, Jonas P (2010) Distinct nonuniform cable properties optimize rapid and efficient activation of fast-spiking GABAergic interneurons. Proceedings of the National Academy of Sciences, 107, 894–9. http://doi.org/10.1073/pnas.0910716107

      Stevens SR, Longley CM, Ogawa Y, Teliska LH, Arumanayagam AS, Nair S, Oses-Prieto JA, Burlingame AL, Cykowski MD, Xue M, Rasband MN (2021) Ankyrin-R regulates fast-spiking interneuron excitability through perineuronal nets and Kv3.1b K+ channels. Elife, 10, e66491. http://doi.org/10.7554/eLife.66491

      Russo G, Nieus TR, Maggi S, Taverna S (2013) Dynamics of action potential firing in electrically connected striatal fast-spiking interneurons. Frontiers in Cellular Neuroscience, 7, 209. https://doi.org/10.3389/fncel.2013.00209

      Ünal CT, Ünal B, Bolton MM (2020) Low-threshold spiking interneurons perform feedback inhibition in the lateral amygdala. Brain Structure and Function, 225, 909–923. http://doi.org/10.1007/s00429-020-02051-4

      Wang H, Kunkel DD, Schwartzkroin PA, Tempel BL (1994) Localization of Kv1.1 and Kv1.2, two K channel proteins, to synaptic terminals, somata, and dendrites in the mouse brain. The Journal of Neuroscience, 14, 4588-4599. https://doi.org/10.1523/JNEUROSCI.14-08-04588.1994

      Zhang YZ, Sapantzi S, Lin A, Doelfel SR, Connors BW, Theyel BB (2023) Activity-dependent ectopic action potentials in regular-spiking neurons of the neocortex. Frontiers in Cellular Neuroscience, 17. https://doi.org/10.3389/fncel.2023.1267687

      Zurita H, Feyen PLC, Apicella AJ (2018) Layer 5 callosal parvalbumin-expressing neurons: a distinct functional group of GABAergic neurons. Frontiers in Cellular Neuroscience, 12, 53. https://doi.org/10.3389/fncel.2018.00053

    1. Author response:

      Reviewer #1

      The first is that data on the general health of mice with single and double knockouts is not shown, nor is there any data on effects in any other tissues. This gives the impression that the only phenotype is in the male reproductive system, which would be misleading if there were phenotypes in other tissues that are not reported.

      We thank the reviewer for helpful and constructive suggestions that we plan to implement in the revision. We agree with this point and we will add a statement that the effect on the urogenital system was not the only observed phenotype, although it was the most striking histological feature that we found. We did notice some other physiological differences that we are examining in detail and determining their mechanisms, for future publications.

      Furthermore, data for the genitourinary system in single knockouts are very sparse; data are described for fertility in Figure 1H, ploidy, and cell number in Figures 2B and C, plasma testosterone and luteinizing hormone levels in Figures 5C and 5D, and morphology of testis and prostate tissue for single Cdk8 knockout in Supplementary Figure 1C (although in this case the images do not appear very comparable between control and CDK8 KO, thus perhaps wider fields should be shown), but, for example, there is no analysis of different meiotic stages or of gene expression in single knockouts. It is worth mentioning that single knockouts seem to show a corresponding upregulation of the level of the paralogue kinase, indicating that any lack of phenotypes might be due to feedback compensation, which would be an interesting finding if confirmed; this has not been mentioned.

      We agree that a description of the single KO could be beneficial, but we expect no big differences with the WT or Cre-Ert. We found neither histological differences nor changes in cell counts or ratios of cell types. Our ethical committee also has concerns about sacrificing mice without major phenotypic changes, without a well formulated hypothesis about the observed effects. We plan to add histological pictures to the next version of the article.

      We thank the reviewer for raising an important point about the paralog upregulation. Indeed, our data on primary cells (Supplementary Figure 1B) suggest the upregulation of CDK19 in CDK8KO and vice versa. We will point this out in the discussion. We plan to examine the data for the testis as soon as more tissues are available.

      The second major weakness is that the correlation between double knockout and reduced expression of genes involved in steroid hormone biosynthesis is portrayed as a causal mechanism for the phenotypes observed. While this is a possibility, there are no experiments performed to provide evidence that this is the case. Furthermore, there is no evidence showing that CDK8 and/or CDK19 are directly responsible for the transcription of the genes concerned.

      We agree with the reviewer that the effects on CDK8/CDK19/CCNC could lead to the observed transcriptional changes in multiple indirect steps. There are, however, major technical challenges in examining the binding of transcription factors in the tissue, especially in Leydig cells which are a relatively minor population. We will clarify it in the revision, and strengthen this point in the discussion.

      Finally, the authors propose that the phenotypes are independent of the kinase activity of CDK8 or CDK19 because treatment of mice for a month with an inhibitor does not recapitulate the effects of the knockout, and nor does expression of two steroidogenic genes change in cultured Leydig cells upon treatment with an inhibitor. However, there are no controls for effective target inhibition shown.

      We thank the reviewer for raising this concern, which we will address in the revision. This study used the same CDK8/19 inhibitor (SNX631-6) as in the recently published study on prostate cancer (doi: 10.1172/JCI176709). That study describes the inhibitor, its target engagement in cell-free and cell-based assays, its anticancer potency, and its transcriptomic effects in vivo at the same dosage strength as in the present study; these effects phenocopy those of CDK8/19 knockdown. Additional data will be included in the revision.

      Reviewer #2

      The claim of reproductive defects in the induced double knockout of CDK8/19 resulted from the loss of CCNC via a kinase-independent mechanism is interesting but was not supported by the data presented. While the construction and analysis of the systemic induced knockout model of Cdk8 in Cdk19KO mice is not trivial, the analysis and data are weakened by the systemic effect of Cdk8 loss, making it difficult to separate the systemic effect from the local testis effect.

      We agree with the reviewer that the effects on the testis could be due to the systemic loss of CDK8 rather than its loss specifically in the testis, and we will clarify this in the revision. We will also clarify that, although our results suggest that the effects of CDK8/19 knockout are kinase-independent and that the loss of Cyclin C is a likely explanation for this kinase independence, we do not claim that it is the mechanism.

      The analysis of male sterile phenotype is also inadequate with poor image quality, especially testis HE sections. The male reproductive tract picture is also small and difficult to evaluate.

      Unfortunately, the image quality deteriorated during submission through bioRxiv. We uploaded high-resolution pictures for the journal, but they were probably not made available to the reviewer. We will re-send the high-resolution images.

      The mice crossing scheme is unusual as you have three mice to cross to produce genotypes, while we could understand that it is possible to produce pups of desired genotypes with different mating schemes, such a vague crossing scheme is not desirable and of poor genetics practice.

      We thank the reviewer for this suggestion. Indeed, our scheme does not represent the actual breeding scheme; it is only a brief explanation of the lineages used to obtain the triple transgenic mice. We will include the full crossing scheme in the revision.

      Also using TAM-treated wild type as control is ok, but a better control will be TAM-treated ERT2-cre; CDK8f/f or TAM-treated ERT2 Cre CDK19/19 KO, so as to minimize the impact from the well-recognized effect of TAM.

      We used TAM-treated ERT2-cre for most of the experiments, and did not observe any major histological or physiological differences with the WT+TAM. We will make sure to present them in the revision.

      While the authors proposed that the inducible loss of CDK8 in the CDK19 knockout background is responsible for spermatogenic defects, it was not clear in which cells CDK8/19 genes are interested and which cell types might have a major role in spermatogenesis. The authors also put forward the evidence that reduction/loss of Testosterone might be the main cause of spermatogenic defects, which is consistent with the expression change in genes involved in steroigenesis pathway in Leydig cells of inducible double knockout. However it is not clear how the loss of Testosterone contributed to the loss of CcnC protein.

      We agree with the reviewer that the spermatogenic defects could be caused by the effects on gene expression in tissues other than Leydig cells. Nevertheless, this is our primary hypothesis since these changes resemble the effects of chemical castration in rats (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408499/), and in SCARKO mice (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3968405/).

      Our hypothesis is actually the reversed scenario proposed by the reviewer. We think that the loss of steroidogenic gene expression is caused by the loss of CDK8/19 and Cyclin C in Leydig cells. This, in turn, leads to a drop of testosterone levels. We will expand this explanation for clarity.

      The authors should clarify or present the data on where CDK8 and CDK19 as well as CcnC are expressed so as to help the readers understand which tissues both CDK might be functioning in and cause the loss of CcnC. It should be easier to test the hypothesis of CDK8/19 stabilizing CcnC protein using double knock-out primary cells, instead of the whole testis.

      The stabilizing effect of Cdk8/19 on CcnC has been previously discovered and reported in cell culture (doi: 10.1093/nar/gkad538), and here we have confirmed it at the level of whole tissue. Because of the limited sensitivity of single cell sequencing (only ~5,000 transcripts are sequenced out of an average total of 500,000 transcripts per cell, so low-expressed transcripts are not detected in all cells), it is challenging to firmly establish CDK8/19-positive and -negative tissues from single cell data, as both transcripts are minor. This image will be included in the next version. We plan to resolve this matter using two approaches. First, we will try immunohistochemistry. If this method is not sufficiently sensitive, we will analyze published single cell sequencing data from mouse databases and re-analyze our own data. So far, the former approach has been challenging for us because of the absence of anti-mouse antibodies that are specific for CDK8 and CDK19 and work on tissue sections. Neither we nor others could produce tissue-specific staining with the currently available commercial antibodies. The only published specific antibody is currently not available.

      Since CDK8KO and CDK19KO have significantly reduced fertility compared to the wild type, it might be important to measure the sperm quantity and motility among CDK8 KO, CDK19KO, and induced DKO to evaluate spermatogenesis based on their sperm production.

      We agree that this is an interesting question. We did not do spermograms for single KOs but we don’t think that a decreased sperm count would explain CDK8KO infertility as the vasectomized males are able to produce copulative plugs in females whereas CDK8KO males do not, suggesting the absence of mating behavior as a reason for low fertility in the latter genotype.

      Some data for the inducible knockout efficiency of Cdk8 were presented in Supplemental Figure 1, but there is no legend for the supplemental figures, it was not clear which band represented the deletion band, and which tissues were examined. Tail or testis?

      We apologize for the accidental loss of supplementary figure legends, which will be presented in the next version. The efficiency of CDK8 KO in different tissues was previously examined by us in https://www.ncbi.nlm.nih.gov/gene/264064. The western blot in the MS represents deletion data for the testis.

      It seems that two months after the injection of Tam, all the Cdk8 were completely deleted, indicating extremely efficient deletion of Tam induction by two months post administration. Were the complete deletion of Cdk8 happening even earlier?

      The complete deletion of CDK8 occurs within a week, or even as early as 2-3 days, in culture, and at least after two weeks in vivo. We chose the two-month period to prevent the effect of tamoxifen on gene expression. We examined other time points (Figure 6) and observed the onset of effects at two weeks and the maximum effect by one month.

      The authors found that Sertoli cells re-entered the cell cycle in the inducible double knockout but stopped short of careful characterization other than increased expression of cell cycle genes.

      We agree with the reviewer, and we will add Ki67 (or equivalent) staining along with Sertoli cell markers.

      Dko should be appropriately named iDKO (induced dKO).

      We will make the corresponding change.

      “We performed necropsy”: not the right wording here. “Colchicine-like apoptotic bodies”: what does this mean? Not clear.

      We will amend the next version to address these minor points, and we thank the reviewer for careful reading of the manuscript.

      Images throughout the manuscript suffer from poor resolution and are often blurry and hard to evaluate.

      As mentioned above, we had a problem with image quality during submission through bioRxiv, and we will provide high-resolution images in the next version.

      To pinpoint the meiotic stage defect of iDKO, it is better to use the meiotic chromosome spread approach.

      Unfortunately, meiotic spreads would not be feasible or informative, due to a low number of surviving cells in iDKO and the fact that there were evidently no cells in stages after SYCP3+.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you very much for the careful and positive reviews of our manuscript. We have addressed each comment in the attached revised manuscript. We describe the modifications below. To avoid confusion, we've changed supplementary figure and table captions to start with "Supplementary Figure" and "Supplementary Table," instead of "Figure" and "Table."

      We have modified/added:

      ● Supplementary Table S1: AUC scores for the top 10 frequent epitope types (pathogens) in the testing set of epitope split.

      ● Supplementary Table S5: AUCs of TCR-epitope binding affinity prediction models with BLOSUM62 to embed epitope sequences.

      ● Supplementary Table S6: AUCs of TCR-epitope binding affinity prediction models trained on catELMo TCR embeddings and random-initialized epitope embeddings.

      ● Supplementary Table S7: AUCs of TCR-epitope binding affinity prediction models trained on catELMo and BLOSUM62 embeddings.

      ● Supplementary Figure 4: TCR clustering performance for the top 34 abundant epitopes representing 70.55% of TCRs in our collected databases.

      ● Section Discussion.

      ● Section 4.1 Data: TCR-epitope pairs for binding affinity prediction.

      ● Section 4.4.2 Epitope-specific TCR clustering.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors described a computational method catELMo for embedding TCR CDR3 sequences into numeric vectors using a deep-learning-based approach, ELMo. The authors applied catELMo to two applications: supervised TCR-epitope binding affinity prediction and unsupervised epitope-specific TCR clustering. In both applications, the authors showed that catELMo generated significantly better binding prediction and clustering performance than other established TCR embedding methods. However, there are a few major concerns that need to be addressed.

      (1) There are other TCR CDR3 embedding methods in addition to TCRBert. The authors may consider incorporating a few more methods in the evaluation, such as TESSA (PMCID: PMC7799492), DeepTCR (PMCID: PMC7952906) and the embedding method in ATM-TCR (reference 10 in the manuscript). TESSA is also the embedding method in pMTnet, which is another TCR-epitope binding prediction method and is the reference 12 mentioned in this manuscript.

      TESSA is designed for characterizing TCR repertoires, so we initially excluded it from the comparison. Our focus was on models developed specifically for amino acid embedding rather than TCR repertoire characterization. However, to address the reviewer's inquiry, we conducted further evaluations. Since both TESSA and DeepTCR used autoencoder-based models to embed TCR sequences, we selected the one used in TESSA for evaluation in our downstream prediction task, conducting ten trials in total. It achieved an average AUC of 75.69 in TCR split and 73.3 in epitope split. Notably, catELMo significantly outperformed it, with an AUC of 96.04 in TCR split and 94.10 in epitope split.
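
      As a minimal illustration of how an AUC averaged over repeated trials can be obtained, the sketch below computes a rank-based ROC AUC and summarizes several trials with a mean and a sample standard deviation. The function names and toy values are hypothetical, and the use of the sample standard deviation is an assumption; the sketch is not the evaluation code used for the numbers above.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Rank-based ROC AUC: the probability that a random positive outscores
    a random negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_trials(aucs):
    """Mean and sample standard deviation of per-trial AUCs (assumed convention)."""
    return mean(aucs), stdev(aucs)


# Toy labels and scores, for illustration only.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 0.75

# Hypothetical per-trial AUCs from two runs.
print(summarize_trials([0.5, 1.0]))
```

      The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch but would normally be replaced by a sort-based computation on large test sets.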

      Regarding the embedding method in ATM-TCR, it simply uses BLOSUM as an embedding matrix which we have already compared in Section 2.1. Furthermore, we have provided the comparison results between our prediction model trained on catELMo embeddings with the state-of-the-art prediction models such as netTCR and ATM-TCR in Table 6 of the Discussion section.

      (2) The TCR training data for catELMo is obtained from ImmunoSEQ platform, including SARS-CoV2, EBV, CMV, and other disease samples. Meanwhile, antigens related to these diseases and their associated TCRs are extensively annotated in databases VDJdb, IEDB and McPAS-TCR. The authors then utilized the curated TCR-epitope pairs from these databases to conduct the evaluations for eptitope binding prediction and TCR clustering. Therefore, the training data for TCR embedding may already be implicitly tuned for better representations of the TCRs used in the evaluations. This seems to be true based on Table 4, as BERT-Base-TCR outperformed TCRBert. Could catELMo be trained on PIRD as TCRBert to demonstrate catELMo's embedding for TCRs targeting unseen diseases/epitopes?

      We would like to note that catELMo was trained exclusively on TCR sequences in an unsupervised manner, which means it has never been exposed to antigen information. We also ensured that the TCRs used in catELMo's training did not overlap with our downstream prediction data. Please refer to Section 4.1 Data, where we explicitly stated, “We note that it includes no identical TCR sequences with the TCRs used for training the embedding models.” Moreover, the performance gap (~1%) between BERT-Base-TCR and TCRBert, as observed in Table 4, is relatively small, especially when compared to the performance difference (>16%) between catELMo and TCRBert.

      To further address this concern, we conducted experiments using the same number of TCRs, 4,173,895 in total, sourced exclusively from healthy ImmunoSeq repertoires. This alternative catELMo model demonstrated a similar prediction performance (based on 10 trials) to the one reported in our paper, with an average AUC of 96.35% in TCR split and an average AUC of 94.03% in epitope split.

      We opted not to train catELMo on the PIRD dataset for several reasons. First, approximately 7.8% of the sequences in PIRD also appear in our downstream prediction data, which could be a potential source of bias. Furthermore, PIRD encompasses sequences related to diseases such as Tuberculosis, HIV, CMV, among others, which the reviewer is concerned about.
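
      The overlap check described above can be sketched as a simple set comparison between the sequences considered for embedding training and the sequences held out for downstream prediction. The function names and toy CDR3 strings are hypothetical and serve only as an illustration; exact string matching is assumed as the overlap criterion.

```python
def overlap_fraction(candidate_tcrs, downstream_tcrs):
    """Fraction of unique candidate sequences that also occur downstream."""
    candidate = set(candidate_tcrs)
    if not candidate:
        return 0.0
    return len(candidate & set(downstream_tcrs)) / len(candidate)


def remove_overlap(candidate_tcrs, downstream_tcrs):
    """Drop candidate sequences found in the downstream set, keeping order."""
    held_out = set(downstream_tcrs)
    return [seq for seq in candidate_tcrs if seq not in held_out]


# Toy sequences, hypothetical and for illustration only.
pretraining = ["CASSLGQAYEQYF", "CASSPDRGGYTF", "CASSIRSSYEQYF", "CASSLAPGATNEKLFF"]
downstream = ["CASSIRSSYEQYF", "CASSQETQYF"]

print(overlap_fraction(pretraining, downstream))  # 0.25
print(remove_overlap(pretraining, downstream))
```

      A stricter variant could also exclude near-identical sequences, for example those within a small edit distance, but that goes beyond what is stated here.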

      (3) In the application of TCR-epitope binding prediction, the authors mentioned that the model for embedding epitope sequences was catElMo, but how about for other methods, such as TCRBert? Do the other methods also use catELMo-embedded epitope sequences as part of the binding prediction model, or use their own model to embed the epitope sequences? Since the manuscript focuses on TCR embedding, it would be nice for other methods to be evaluated on the same epitope embedding (maybe adjusted to the same embedded vector length).

      Furthermore, the authors found that catELMo requires less training data to achieve better performance. So one would think the other methods could not learn a reasonable epitope embedding with limited epitope data, and catELMo's better performance in binding prediction is mainly due to better epitope representation.

      Reviewers 1 and 3 have raised similar concerns regarding the epitope embedding approach employed in our binding affinity prediction models. We address both comments together on page 6 where we discuss the epitope embedding strategies in detail.

      (4) In the epitope binding prediction evaluation, the authors generated the test data using TCR-epitope pairs from VDJdb, IEDB, McPAS, which may be dominated by epitopes from CMV. Could the authors show accuracy categorized by epitope types, i.e. the accuracy for TCR-CMV pair and accuracy for TCR-SARs-CoV2 separately?

      The categorized AUC scores have been added in Supplementary Table 7. We observed significant performance boosts from catELMo compared with other embedding models.

      (5) In the unsupervised TCR clustering evaluation, since GIANA and TCRdist direct outputs the clustering result, so they should not be affected by hierarchical clusters. Why did the curves of GIANA and TCRdist change in Figure 4 when relaxing the hierarchical clustering threshold?

      For fair comparisons, we performed GIANA and TCRdist with hierarchical clustering instead of the nearest neighbor search. We have clarified it in the revised manuscript as follows.

      “Both methods are developed on the BLOSUM62 matrix and apply nearest neighbor search to cluster TCR sequences. GIANA used the CDR3 of TCRβ chain and V gene, while TCRdist predominantly experimented with CDR1, CDR2, and CDR3 from both TCRα and TCRβ chains. For fair comparisons, we perform GIANA and TCRdist only on CDR3 β chains and with hierarchical clustering instead of the nearest neighbor search.”

      (6 & 7) In the unsupervised TCR clustering evaluation, the authors examined the TCR related to the top eight epitopes. However, there are much more epitopes curated in VDJdb, IEDB and McPAS-TCR. In real application, the potential epitopes is also more complex than just eight epitopes. Could the authors evaluate the clustering result using all the TCR data from the databases? In addition to NMI, it is important to know how specific each TCR cluster is. Could the authors add the fraction of pure clusters in the results? Pure cluster means all the TCRs in the cluster are binding to the same epitope, and is a metric used in the method GIANA.

      We would like to note that there is a significant disparity in TCR binding frequencies across different epitopes in current databases. For instance, the most abundant epitope (KLGGALQAK) has approximately 13k TCRs binding to it, while 836 out of 982 epitopes are associated with fewer than 100 TCRs in our dataset. Furthermore, there are 9347 TCRs having the ability to bind multiple epitopes. In order to robustly evaluate the clustering performance, we originally selected the top eight frequent epitopes from McPAS and removed TCRs binding multiple epitopes to create a more balanced dataset.
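
      The filtering described above, removing TCRs that bind multiple epitopes and then keeping the most frequent epitopes, can be sketched as follows. The function names, the pair representation, and the toy identifiers are assumptions made for illustration, not the preprocessing code used in the study.

```python
from collections import Counter, defaultdict


def drop_multi_epitope_tcrs(pairs):
    """Keep only pairs whose TCR is associated with exactly one epitope."""
    epitopes_by_tcr = defaultdict(set)
    for tcr, epitope in pairs:
        epitopes_by_tcr[tcr].add(epitope)
    return [(t, e) for t, e in pairs if len(epitopes_by_tcr[t]) == 1]


def top_epitopes(pairs, k):
    """Return the k epitopes with the most distinct TCR-epitope pairs."""
    counts = Counter(epitope for _, epitope in set(pairs))
    return [epitope for epitope, _ in counts.most_common(k)]


# Toy pairs with hypothetical identifiers.
toy_pairs = [
    ("TCR_A", "EPI_1"), ("TCR_B", "EPI_1"), ("TCR_F", "EPI_1"),
    ("TCR_C", "EPI_1"), ("TCR_C", "EPI_2"),  # TCR_C binds two epitopes
    ("TCR_D", "EPI_2"), ("TCR_G", "EPI_2"),
    ("TCR_E", "EPI_3"),
]

single = drop_multi_epitope_tcrs(toy_pairs)
selected = set(top_epitopes(single, k=2))
balanced = [(t, e) for t, e in single if e in selected]
print(balanced)
```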

      We acknowledge that the real-world scenario is more complex than just eight epitopes. Therefore, we conducted clustering experiments using the most abundant epitopes whose combined cognate TCRs make up at least 70% of TCRs across the three databases (34 epitopes). This is illustrated in Supplementary Figure 5. Furthermore, we extended our analysis by clustering all TCRs after filtering out those that bind to multiple epitopes, resulting in 782 unique epitopes. We found that catELMo achieved the 3rd and 2nd best performance in NMI and Purity, respectively (see Table below). These results are aligned with our previous observations on the eight epitopes.

      Author response table 1.
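
      For readers unfamiliar with the purity metrics mentioned in this exchange, the sketch below computes two common variants from cluster assignments and epitope labels: overall clustering purity (the share of TCRs that carry the majority epitope of their cluster) and the fraction of pure clusters as defined in the reviewer's comment. Which variant underlies the reported Purity is not stated here, so both are shown under that assumption; all names and toy values are hypothetical.

```python
from collections import Counter, defaultdict


def group_labels_by_cluster(cluster_ids, epitope_labels):
    """Collect the epitope labels of the TCRs assigned to each cluster."""
    if len(cluster_ids) != len(epitope_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")
    groups = defaultdict(list)
    for cluster, epitope in zip(cluster_ids, epitope_labels):
        groups[cluster].append(epitope)
    return groups


def clustering_purity(cluster_ids, epitope_labels):
    """Share of TCRs that carry the majority epitope label of their cluster."""
    groups = group_labels_by_cluster(cluster_ids, epitope_labels)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in groups.values()
    )
    return majority_total / len(epitope_labels)


def pure_cluster_fraction(cluster_ids, epitope_labels):
    """Share of clusters in which every TCR binds the same epitope."""
    groups = group_labels_by_cluster(cluster_ids, epitope_labels)
    pure = sum(1 for labels in groups.values() if len(set(labels)) == 1)
    return pure / len(groups)


# Toy assignment: three clusters, one of them mixed.
toy_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
toy_epitopes = ["A", "A", "B", "B", "B", "C", "C", "C"]
print(clustering_purity(toy_clusters, toy_epitopes))  # 0.875
print(pure_cluster_fraction(toy_clusters, toy_epitopes))
```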

      Reviewer #2 (Public Review):

      In the manuscript, the authors highlighted the importance of T-cell receptor (TCR) analysis and the lack of amino acid embedding methods specific to this domain. The authors proposed a novel bi-directional context-aware amino acid embedding method, catELMo, adapted from ELMo (Embeddings from Language Models), specifically designed for TCR analysis. The model is trained on TCR sequences from seven projects in the ImmunoSEQ database, instead of the generic protein sequences. They assessed the effectiveness of the proposed method in both TCR-epitope binding affinity prediction, a supervised task, and the unsupervised TCR clustering task. The results demonstrate significant performance improvements compared to existing embedding models. The authors also aimed to provide and discuss their observations on embedding model design for TCR analysis: 1) Models specifically trained on TCR sequences have better performance than models trained on general protein sequences for the TCR-related tasks; and 2) The proposed ELMo-based method outperforms TCR embedding models with BERT-based architecture. The authors also provided a comprehensive introduction and investigation of existing amino acid embedding methods. Overall, the paper is well-written and well-organized.

      The work has originality and has potential prospects for immune response analysis and immunotherapy exploration. TCR-epitope pair binding plays a significant role in T cell regulation. Accurate prediction and analysis of TCR sequences are crucial for comprehending the biological foundations of binding mechanisms and advancing immunotherapy approaches. The proposed embedding method presents an efficient context-aware mathematical representation for TCR sequences, enabling the capture and analysis of their structural and functional characteristics. This method serves as a valuable tool for various downstream analyses and is essential for a wide range of applications.

      Thank you.

      Reviewer #3 (Public Review):

      Here, the authors trained catElMo, a new context-aware embedding model for TCRβ CDR3 amino acid sequences for TCR-epitope specificity and clustering tasks. This method benchmarked existing work in protein and TCR language models and investigated the role that model architecture plays in the prediction performance. The major strength of this paper is comprehensively evaluating common model architectures used, which is useful for practitioners in the field. However, some key details were missing to assess whether the benchmarking study is a fair comparison between different architectures. Major comments are as follows:

      • It is not clear why epitope sequences were also embedded using catELMo for the binding prediction task. Because catELMO is trained on TCRβ CDR3 sequences, it's not clear what benefit would come from this embedding. Were the other embedding models under comparison also applied to both the TCR and epitope sequences? It may be a fairer comparison if a single method is used to encode epitope sequence for all models under comparison, so that the performance reflects the quality of the TCR embedding only.

      In our study, we indeed used the same embedding model for both TCRs and epitopes in each prediction model, ensuring a consistent approach throughout.

      Recognizing the importance of evaluating the impact of epitope embeddings, we conducted experiments in which we used BLOSUM62 matrix to embed epitope sequences for all models. The results (Supplementary Table 5) are well aligned with the performance reported in our paper. This suggests that epitope embedding may not play as critical a role as TCR embedding in the prediction tasks. To further validate this point, we conducted two additional experiments.

      Firstly, we used catELMo to embed TCRs while employing randomly initialized embedding matrices with trainable parameters for epitope sequences. It yielded similar prediction performance as when catELMo was used for both TCR and epitope embedding (Supplementary Table 6). Secondly, we utilized BLOSUM62 to embed TCRs but employed catELMo for epitope sequence embedding, resulting in performance comparable to using BLOSUM62 for both TCRs and epitopes (Supplementary Table 4). These experiment results confirmed the limited impact of epitope embedding on downstream performance.

      We conjecture that these results may be attributed to the significant disparity in data scale between TCRs (~290k) and epitopes (less than 1k). Moreover, TCRs tend to exhibit high similarity, whereas epitopes display greater distinctiveness from one another. These features of TCRs require robust embeddings to facilitate effective separation and improve downstream performance, while epitope embedding primarily serves as a categorical encoding.

      We have included a detailed discussion of these findings in the revised manuscript to provide a comprehensive understanding of the role of epitope embeddings in TCR binding prediction.

      • The tSNE visualization in Figure 3 is helpful. It makes sense that the last hidden layer features separate well by binding labels for the better performing models. However, it would be useful to know if positive and negative TCRs for each epitope group also separate well in the original TCR embedding space. In other words, how much separation between these groups is due to the neural network vs just the embedding?

      It is important to note that we used the same downstream prediction model, a simple three-linear-layer network, for all the discussed embedding methods. We believe that the separation observed in the t-SNE visualization effectively reflects the ability of our embedding model. Also, we would like to mention that it can be hard to see a clear distinction between positive and negative TCRs in the original embedding space because embedding models were not trained on positive/negative labels. Please refer to the t-SNE of the original TCR embeddings below.

      Author response image 1.

      • To generate negative samples, the author randomly paired TCRs from healthy subjects to different epitopes. This could produce issues with false negatives if the epitopes used are common. Is there an estimate for how frequently there might be false negatives for those commonly occurring epitopes that most populations might also have been exposed to? Could there be a potential batch effect for the negative sampled TCR that confounds with the performance evaluation?

      Thank you for bringing this valid and interesting point up. Generating negative samples is non-trivial since only a limited number of non-binding TCR-epitope pairs are publicly available and experimentally validating non-binding pairs is costly [1]. Standard practices for generating negative pairs are (1) pairing epitopes with healthy TCRs [2, 3], and (2) randomly shuffling existing TCR-epitope pairs [4,5]. We used both approaches (the former included in the main results, and the latter in the discussion). In both scenarios, catELMo embeddings consistently demonstrated superior performance.
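
      The second practice, random shuffling of existing TCR-epitope pairs, can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the seed handling, and the rule of rejecting any candidate that matches a known positive pair are choices made for this sketch, not a description of the pipeline used in the study.

```python
import random


def shuffle_negatives(positive_pairs, n_negatives, seed=0, max_attempts=100_000):
    """Sample (TCR, epitope) pairs that do not occur among the known positives."""
    rng = random.Random(seed)
    positives = set(positive_pairs)
    # Sort so that results are reproducible for a given seed.
    tcrs = sorted({tcr for tcr, _ in positives})
    epitopes = sorted({epitope for _, epitope in positives})
    negatives = set()
    attempts = 0
    while len(negatives) < n_negatives and attempts < max_attempts:
        attempts += 1
        candidate = (rng.choice(tcrs), rng.choice(epitopes))
        if candidate not in positives:
            negatives.add(candidate)
    return sorted(negatives)


# Toy positive pairs with hypothetical identifiers.
toy_positives = [
    ("TCR_A", "EPI_1"), ("TCR_B", "EPI_1"),
    ("TCR_C", "EPI_2"), ("TCR_D", "EPI_3"),
]

negatives = shuffle_negatives(toy_positives, n_negatives=5, seed=1)
print(negatives)
```

      Changing `seed` yields a different negative set, which is one way to probe the sensitivity of downstream performance to the sampled negatives. Note that a sampled pair is only presumed non-binding, so false negatives remain possible.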

      We acknowledge the possibility of false negatives due to the finite-sized TCR database from which we randomly selected TCRs. However, we believe that the likelihood of such occurrences is low. Given the vast diversity of human TCR clonotypes, which can exceed 10^15 [6], the chance of randomly selecting a TCR that specifically recognizes a target epitope is relatively small.

      In order to investigate the batch effect, we generated new negative pairs using different seeds and observed consistent prediction performance across these variations. However, we agree that there could still be a potential batch effect for the negative samples due to potential data bias.

      We have discussed the limitations of generating negative samples in the revised manuscript.

      • Most of the models being compared were trained on general proteins rather than TCR sequences. This makes their comparison to catELMO questionable since it's not clear if the improvement is due to the training data or architecture. The authors partially addressed this with BERT-based models in section 2.4. This concern would be more fully addressed if the authors also trained the Doc2vec model (Yang et al, Figure 2) on TCR sequences as baseline models instead of using the original models trained on general protein sequences. This would make clear the strength of context-aware embeddings if the performance is worse than catElmo and BERT.

      We agree it is important to distinguish between the effects of training data and architecture on model performance.

      In Section 2.4, as the reviewer mentioned, we compared catELMo with BERT-based models trained on the same TCR repertoire data, demonstrating that architecture plays a significant role in improving performance. Furthermore, in Section 2.5, we compared catELMo-shallow with SeqVec, which share the same architecture but were trained on different data, highlighting the importance of data on the model performance.

      To further address the reviewer's concern, we trained a Doc2Vec model on the TCR sequences that have been used for catELMo training. We observed significantly lower prediction performance compared to catELMo, with an average AUC of 50.24% in TCR split and an average AUC of 51.02% in epitope split, making the strength of context-aware embeddings clear.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is known that TRB CDR3, the CDR1, CDR2 on TRBV gene and the TCR alpha chain also contribute to epitope recognition, but were not modeled in catELMo. It would be nice for the authors to add this as a current limitation for catELMo in the Discussion section.

      We have discussed the limitation in the revised manuscript.

      “Our study focuses on modeling the TCRβ chain CDR3 region, which is known as the primary determinant of epitope binding. Other regions, such as CDR1 and CDR2 on the TRB V gene, along with the TCRα chain, may also contribute to specificity in antigen recognition. However, a limited number of available samples for those additional features can be a challenge for training embedding models. Future work may explore strategies to incorporate these regions while mitigating the challenges of working with limited samples.”

      (2) I tried to follow the instructions to train a binding affinity prediction model for TCR-epitope pairs, however, the cachetools=5.3.0 seems could not be found when running "pip install -r requirements.txt" in the conda environment bap. Is this cachetools version supported after Python 3.7 so the Python 3.6.13 suggested on the GitHub repo might not work?

      This has been fixed. We have updated the README.md on our GitHub page.

      Reviewer #2 (Recommendations For The Authors):

      The article is well-constructed and well-written, and the analysis is comprehensive.

      The comments for minor issues that I have are as follows:

      (1) In the Methods section, it will be clearer if the authors interpret more on how the standard deviation is calculated in all tables. How to define the '10 trials'? Are they based on different random training and test set splits?

      ‘10 trials’ refers to the process of splitting the dataset into training, validation, and testing sets using different seeds for each trial. Different trials have different training, validation, and testing sets. For each trial, we trained a prediction model on its training set and measured performance on its testing set. The standard deviation was calculated from the 10 measurements, estimating model performance variation across different random splits of the data.
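
      The procedure described above can be sketched as follows. This is an illustrative outline only: the split fractions, the use of the sample (n-1) standard deviation, and the `train_and_score` callback are assumptions, since the response does not specify them.

```python
import random
import statistics

def split_indices(n_samples, seed, frac_train=0.8, frac_val=0.1):
    """Shuffle sample indices with a trial-specific seed and cut them into
    train / validation / test partitions (fractions are assumed values)."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = int(frac_train * n_samples)
    n_val = int(frac_val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def summarize_trials(scores):
    """Mean and sample standard deviation over the per-trial test scores."""
    return statistics.mean(scores), statistics.stdev(scores)

def run_trials(n_samples, train_and_score, n_trials=10):
    """Run one train/evaluate cycle per seed; `train_and_score` is a
    hypothetical callback returning a test metric such as AUC."""
    scores = []
    for seed in range(n_trials):
        train, val, test = split_indices(n_samples, seed)
        scores.append(train_and_score(train, val, test))
    return summarize_trials(scores)
```

      Because each trial reshuffles the full dataset, the resulting standard deviation reflects variation across random splits rather than across independent datasets.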

      (2) The format of AUCs and the improvement of AUCs need to be consistent, i.e., with the percent sign.

      We have updated the format of AUCs.

      Reviewer #3 (Recommendations For The Authors):

      In addition to the recommendations in the public review, we had the following more minor questions and recommendations:

      • Could you provide some more background on the data, such as overlaps between the databases, and how the training and validation split was performed between the three databases? Also summary statistics on the length of TCR and epitope sequence data would be helpful.

      We have provided more details about data in our revision.

      • Could you comment on the runtime to train and embed using the catELMo and BERT models?

      Our training data consist of TCR sequences with relatively short lengths (averaging less than 20 amino acid residues). This characteristic significantly reduces the computational resources required compared to training large-scale language models on extensive text corpora. Using standard machines equipped with two GeForce RTX 2080 GPUs, we were able to complete the training tasks within a matter of days. After training, embedding one sequence can be accomplished in a matter of seconds.

      • Typos and wording:

      • Table 1 first row of "source": "immunoSEQ" instead of "immuneSEQ"

      This has been corrected.

      • L23 of abstract "negates the need of complex deep neural network architecture" is a little confusing because ELMo itself is a deep neural network architecture. Perhaps be more specific and add that the need is for downstream tasks.

      We have made it more specific in our abstract.

      “...negates the need for complex deep neural network architecture in downstream tasks.”

      References

      (1) Montemurro, Alessandro, et al. "NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data." Communications biology 4.1 (2021): 1060.

      (2) Jurtz, Vanessa Isabell, et al. "NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks." BioRxiv (2018): 433706.

      (3) Gielis, Sofie, et al. "Detection of enriched T cell epitope specificity in full T cell receptor sequence repertoires." Frontiers in immunology 10 (2019): 2820.

      (4) Cai, Michael, et al. "ATM-TCR: TCR-epitope binding affinity prediction using a multi-head self-attention model." Frontiers in Immunology 13 (2022): 893247.

      (5) Weber, Anna, et al. "TITAN: T-cell receptor specificity prediction with bimodal attention networks." Bioinformatics 37 (2021): i237-i244.

      (6) Lythe, Grant, et al. "How many TCR clonotypes does a body maintain?." Journal of theoretical biology 389 (2016): 214-224.

    1. Author response:

      eLife assessment

      This is an important study describing a neuromuscular junction co-culture system using human cells that the authors use to study the synaptic consequences of ALS mutations. The data supporting the system are solid and show the value of using myotubes and motor neurons from the same donor. The study will be of interest to researchers who model neuromuscular junction disorders, however, the authors could more comprehensively compare and contrast their system with previous literature describing other similar models. There are also technical weaknesses that limit the interpretation of specific findings.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors propose an improved neuro-muscle co-culture system to study ALS-related functional differences in human pluripotent stem cell lines.

      Strengths:

      A simple co-culture system with functional readout.

      We appreciate the recognition that this is a simplified co-culture system with a straight-forward functional evaluation.

      Weaknesses:

      There are concerns about the lack of novelty, rigor, and clarity in the approach. The strength of the study is undermined by its reliance on transcription factors used more than a decade ago, low myocyte activity, and inadequate validation methods, such as the lack of single-cell transcriptome analysis and detailed neuromuscular synapse characterization. The evidence presented requires substantial validation through rigorous experimental approaches and resolution of the identified concerns for the study's findings to be considered significant and reliable.

      The muscle differentiation protocol used in our work is an adaptation of Albini et al. (Cell Rep. 2013). This protocol was selected due to its efficiency in differentiating skeletal muscles from pluripotent stem cells (PSCs). Modifications from the original publication were made in the plasmids (MYOD and BAF60C) used, such as the inclusion of selection genes, puromycin and blasticidin, to improve efficiency. Moreover, a criticism of the previously used overexpression system, especially overexpression of MYOD, is that it introduces artificial expression of this gene throughout muscle differentiation, when it is only supposed to be expressed early in myogenesis. Thus, the constructs used in our work are dox inducible, which enables us to control the expression of MYOD and restrict it to the first 48 hours. This protocol resulted in a highly efficient skeletal muscle differentiation, as noted in our manuscript: “The PSC-derived skeletal muscles were characterized by the presence of Desmin (DES) and Myosin Heavy Chain (MHC), and as early as day 8 of differentiation nearly 100% of the cells co-expressed these markers.” We agree with the reviewer that the myocyte activity identified in our work is lower compared to Albini et al. (2013), mostly explained by the modification we made to the method, from a 3D to a 2D culture. In Albini et al. (2013) the electrophysiological properties were assayed in skeletal myospheres (3D), which are known to improve contractility measurements. Conversely, in 2D cultures when the contractility intensifies the cells detach from the plate. Thus, a tight regulation of cell concentration for optimal maturation and formation of contractile skeletal muscle culture without premature detachment of the cells is required.

      We believe that single-cell or single-nuclei transcriptome analysis from the co-culture setting of two well-defined cell types might yield little value for method characterization. However, as part of a follow-up study we are performing morphological NMJ characterization and applying single-nuclei transcriptome analysis in the fALS disease context to identify specific molecular mechanisms that result in synaptic dysfunction.

      Reviewer #2 (Public Review):

      The manuscript by Chen et al from the group of Helen Miranda aims to describe an improved neuromuscular junction (NMJ) model to study synaptic dysfunction in several cases of familial ALS. Overall, the system described in the paper appears as a valid platform to study disease phenotypes with exciting results showing specific effects of GDNF on non-SOD1 ALS patient lines. The strength of the paper lies in the use of myotubes, and motor neurons derived from the same donor. However, the current study: (1) lacks a clear comparison of the current system with numerous previously described systems; (2) is limited by the number of repeat experiments in the study and (3) has no description of the synaptic phenotype observed in the study. These major points are discussed in more detail below.

      We appreciate the recognition that “the system described in the paper appears as a valid platform to study disease phenotypes with exciting results showing specific effects of GDNF on non-SOD1 ALS patient lines” and the careful evaluation of our work. We plan to address the points raised by this reviewer in the revision.

      Major points:

      (1) In the introduction the authors state (p. 4): "Finally, recent human NMJ models have been established from PSCs by differentiating these cells into both skeletal muscles and motor neurons in 2D and 3D formats. These previous systems present a remarkable advancement to the studies of human NMJs, however, they require long NMJ formation and maturation time (40 to 60 days), which, restricts their sensitivity and scalability [42]"

      In fact, a number of studies have described various in-vitro NMJ systems, with the same timeframes for NMJ formation. For example, in studies by Osaki et al, 2018, Sci Adv; Bellmann et al, 2019, Biomat; Demestre et al, 2015, Stem Cell Res; Badu-Mensah et al, 2022, Biomat (this is just an exemplar selection of the papers); NMJ formation was observed as early as 14 d in culture, in line with or at least slightly longer than reported by Chen et al. With the exception of the study by Osaki et al, all co-culture systems cited above are 2D-based. The authors need to expand on this further or provide a quantitative assessment of why their system is better compared to previously published models.

      Indeed, there are previous publications that have described neuromuscular junctions (NMJs) in co-cultures of iPSC-derived skeletal muscles and motor neurons. Some of the publications mentioned above did show NMJ formation within ~20 days, albeit with several caveats such as culture heterogeneity, e.g., 50% motor neuron differentiation efficiency. We agree with the reviewer that this needs to be expanded and clarified, and we will address this concern in the revision.

      (2) Further, when comparing their results with other work it is hard to claim how the current system is (p. 5) "more reproducible, and offers a 6-fold increase in scalability compared to previous models [40-43]".

      The authors need to expand on this further.

      This is an important aspect of this work, and we believe that our protocol offers a higher reproducibility due to, at least partially, the homogeneity of the starting cultures of iPSC-derived skeletal muscles and iPSC-derived motor neurons, and that the direct 2D co-culture approach is more suitable for miniaturization compared to 3D cultures or microfluidic chamber devices. Thus, we will expand on this idea in the revision.

      (3) Although mentioned, there were no examples of the modularity of the system, which of course would strengthen the paper and help to uncover ALS mechanisms of synaptic formation, for example by combining WT myotubes and fALS motor neurons (see point 4 below). The authors should show how they would adapt to 96 well plate format to showcase the scalability of the system. Based on their data on the efficacy of synaptic formation (60 per 0.7 cm2 area), is further miniaturization allowed?

      We appreciate the points raised by the reviewer. The “mix-and-match” approach to co-culture wild-type and affected iPSC-derived skeletal muscles with iPSC-derived motor neurons is a main focus of our lab and an advantage to protocols like ours, where cells are differentiated independently and later co-cultured together; however, a comprehensive characterization of various mix-match combinations is beyond the scope of this Tools and Resources article. Since the initial submission of this manuscript, we have extensively optimized the scalability of the co-cultures from the initial 0.7 cm2 to 0.32 cm2 (96-well plates). Further miniaturization is also being optimized to 0.136 cm2 (384-well plates). This point will be clarified in the revision.

      (4) A lot of a-bungarotoxin staining corresponds to AChR clusters that do not seem to be associated with muscle and do not form normal rings of clustering (pretzel-like) associated with the NMJ in vivo. This is seen clearly in Figure 3B and Figure 5B. Figures 3B and 5B only show low-magnification images which makes it difficult to assess the specificity of localization of the pre-/post-synaptic markers. The authors should clearly show the morphologies of the NMJs formed in WT and fALS lines at high magnification. In addition, the authors should show co-localization images for a-bungarotoxin and myosin-heavy chains to confirm the localization of the bungarotoxin signal on the myotubes.

      In addition to that, the authors report that the number of functional synapses formed on a plate varies from 30 (fALS) to 60 (Ctrl) per 10,000 neurons spread over the 0.7 cm2 area (0.6%). How do the authors explain an extensive loss of a-bungarotoxin signal in Figure 5B, the majority of which likely corresponds to AChR clusters that are formed outside of neuronal connections? Such clustering can usually be observed in immature co-cultures and in vivo prior to the innervation of myotubes. One explanation could be that myotubes derived from fALS PSC are less capable of synaptic formation. Notably, a study of PSC-derived myotubes and motor neurons from PSC lines with various SOD1 mutations has already been published, but not cited by Chen et al (Badu-Mensah et al). Given the importance of those confounding factors, the authors should test cell-intrinsic (motor neuron-related) vs non-cell-intrinsic mechanisms by co-culturing healthy myotubes with fALS-derived motor neurons followed by NMJ quantification.

      The iPSC-derived skeletal muscle cultures were plated as a monolayer and even though the a-bungarotoxin staining does not show the pretzel-like shape NMJs, similar to other in vitro NMJ protocols (Badu-Mensah et al, Biomat 2023; Pereira et al., Nat Commun 2021; Uzel et al., Sci Adv 2016), a-bungarotoxin does show association with the muscles. For quantification purposes we omitted the MHC staining to decrease background; however, we will include it in the revision in response to the reviewer’s concern.

      We agree with the reviewer that the suggested approaches would yield insight into disease mechanism but are beyond the scope of this method development study. In fact, we are very excited about our follow-up study pursuing a more in-depth analysis of cell-autonomous vs non-cell autonomous pathogenesis to understand the NMJ dysfunction in fALS. We apologize that the “Badu-Mensah et al” work was not included; this was our oversight, and the citation will be added in the revision.

      (5) The authors present the advantage of optogenetic stimulation, but they only show the proof-of-principle and never really apply it to their studies. Specifically, with regard to Figure 6, are motor units derived from fALS PSCs incapable of being optogenetically activated to the same extent as control motor units? Does the dysfunction stem from fALS motor neurons or fALS myotubes?

      We agree that these are important questions to be addressed and are actively pursuing these experiments as part of the natural follow-up investigation from the present Tools and Resources article.

      (6) Figures 6 B and C appear to be identical except for the addition of the GDNF effect on the fALS lines. This should all be put in one figure. The authors should also show whether GDNF-induced functional recovery is associated with recovery in the number of motor units or with merely synaptic function by quantifying the NMJ number in the presence of GDNF.

      We will combine Figures 6B and 6C in the revision. Our follow-up study also includes the interrogation of the mechanism through which GDNF rescues fALS NMJ dysfunction.

      (7) Figure 5 and Figure 6. The authors only use one line per fALS mutation and their corresponding isogenic controls. They state that the n=6 for these experiments represents the technical replication of the experiment. These experiments should be performed at least n=3 times starting from neuronal differentiation, and not by seeding replicate wells representing a true replication of each experiment. This would significantly strengthen their argument that their method is robust and the results are easily reproducible.

      We will clarify that the technical replicates originated from independent differentiations in the revision.

      (8) In the discussion the authors may want to mention that the lack of function of GDNF on the SOD1 lines may relate to the fact that SOD1 mutations do not lead to TDP43 pathology. Although speculative this suggests that in cases with TDP43 mutations (their data) or sporadic disease GDNF may be effective.

      We appreciate this suggestion and will highlight this as possible inclusion criteria for GDNF treatment in the discussion of our revised version of the manuscript.

      (9) Although beyond the scope of this paper, it would of course be interesting to see if sporadic forms of ALS had this same phenotype.

      We agree with the reviewer, and we hope to include iPSC-derived NMJs from sporadic ALS patients in a future study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      This work (almost didactically) demonstrates how to develop, calibrate, validate and analyze a comprehensive, spatially resolved, dynamical, multicellular model. Testable model predictions of (also non-monotonic) emergent behaviors are derived and discussed. The computational model is based on a widely-used simulation platform and shared openly such that it can be further analyzed and refined by the community.

      Weaknesses:

      While the parameter estimation approach is sophisticated, this work does not address issues of structural and practical non-identifiability (Wieland et al., 2021, DOI:10.1016/j.coisb.2021.03.005) of parameter values, given just tissue-scale summary statistics, and does not address how model predictions might change if alternative parameter combinations were used. Here, the calibrated model represents one point estimate (column "Value" in Suppl. Table 1) but there is specific uncertainty of each individual parameter value and such uncertainties need to be propagated (which is computationally expensive) to the model predictions for treatment scenarios.

      We thank the reviewer for the excellent suggestions and observations. The CaliPro parameterization technique applied puts an emphasis on finding a robust parameter space instead of a global optimum. To address structural non-identifiability, we utilized partial rank correlation coefficient with each iteration of the calibration process to ensure that the sensitivity of each parameter was relevant to model outputs. We also found that there were ranges of parameter values that would achieve passing criteria but when testing the ranges in replicate resulted in inconsistent outcomes. This led us to further narrow the parameters into a single parameter set that still had stochastic variability but did not have such large variability between replicate runs that it would be unreliable. Additional discussion on this point has been added to lines 623-628. We acknowledge that there are likely other parameter sets or model rules that would produce similar outcomes but the main purpose of the model was to utilize it to better understand the system and make new predictions, which our calibration scheme allowed us to accomplish.
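
      For readers unfamiliar with the partial rank correlation coefficient (PRCC) mentioned above, a generic textbook-style sketch in Python follows. It is not the CaliPro implementation or the authors' code: the function names and toy inputs are assumptions, and the recursive partial-correlation formula used here is practical only for a handful of parameters.

```python
import math
import random

def ranks(values):
    """Rank-transform a list (1 = smallest); ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after removing linear effects of the controls,
    computed with the standard recursive partial-correlation formula."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def prcc(param_samples, output):
    """PRCC of each parameter (dict: name -> sampled values) with the output."""
    ranked = {name: ranks(vals) for name, vals in param_samples.items()}
    ranked_out = ranks(output)
    return {
        name: partial_corr(ranked[name], ranked_out,
                           [ranked[o] for o in ranked if o != name])
        for name in ranked
    }

# Toy example: the output depends on a and b but not on c.
rng = random.Random(0)
params = {k: [rng.random() for _ in range(200)] for k in ("a", "b", "c")}
output = [5 * a - 5 * b + 0.1 * rng.gauss(0, 1)
          for a, b in zip(params["a"], params["b"])]
sensitivities = prcc(params, output)
```

      In this toy setup a parameter with a PRCC near zero (here `c`) has little monotonic influence on the output once the other parameters are accounted for, which is the sense in which such a parameter would be flagged as weakly identifiable from that output.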

      Regarding practical non-identifiability, we acknowledge that there are some behaviors that are not captured in the model because those behaviors were not specifically captured in the calibration data. To ensure that the behaviors necessary to answer the aims of our paper were included, we used multiple different datasets and calibrated with multiple different output metrics. We believe we have identified the appropriate parameters to recapitulate the dominating mechanisms underlying muscle regeneration. We have added additional discussion on practical non-identifiability to lines 621-623.

      Suggested treatments (e.g. lines 484-486) are modeled as parameter changes of the endogenous cytokines (corresponding to genetic mutations!) whereas the administration of modified cytokines with changed parameter values would require a duplication of model components and interactions in the model such that cells interact with the superposition of endogenous and administered cytokine fields. Specifically, as the authors also aim at 'injections of exogenously delivered cytokines' (lines 578, 579) and propose altering decay rates or diffusion coefficients (Fig. 7), there needs to be a duplication of variables in the model to account for the coexistence of cytokine subtypes. One set of equations would have unaltered (endogenous) and another one have altered (exogenous or drugged) parameter values. Cells would interact with both of them.

      Our perturbations did not include the delivery of exogenous cytokines and instead were focused on microenvironmental changes in cytokine diffusion and decay rates or specific cytokine concentration levels. For example, the purpose of the VEGF delivery perturbation was to test how an increase in VEGF concentrations would alter regeneration outcome metrics with the assumption that the delivered VEGF would act in the same manner as the endogenous VEGF. We have clarified the purpose of the simulations on line 410. We agree that it would be worthwhile to explore whether model predictions would be altered if endogenous and exogenous cytokines were represented separately; however, we did not explore this type of scenario.

      This work shows interesting emergent behavior from nonlinear cytokine interactions but the analysis does not provide insights into the underlying causes, e.g. which of the feedback loops dominates early versus late during a time course.

      Indeed, analyzing the model to fully understand the time-varying interactions between the multiple feedback loops is a challenge in and of itself, and we appreciate the opportunity to elaborate on our approach to addressing this challenge. First: the crosstalk/feedback between cytokines and the temporal nature was analyzed in the heatmap (Fig. 6) and lines 474-482. Second: the sensitivity of cytokine parameters to specific outputs was included in Table 9 and full-time course sensitivity is included in Supplemental Figure 2. Further correlation analysis was also included to demonstrate how cytokine concentrations influenced specific output metrics at various timepoints (Supplemental Fig. 3). We agree that further elaboration of these findings is required; therefore, we added lines 504-509 to discuss the specific mechanisms at play with the combined cytokine interactions. We also added more discussion (lines 637-638) regarding future work that could develop more analysis methods to further investigate the complex behaviors in the model.

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript identified relevant model parameters from a long list of biological studies. This collation of a large amount of literature into one framework has the potential to be very useful to other authors. The mathematical methods used for parameterization and validation are transparent.

      Weaknesses:

      I have a few concerns which I believe need to be addressed fully.

      My main concerns are the following:

      (1) The model is compared to experimental data in multiple results figures. However, the actual experiments used in these figures are not described. To me as a reviewer, that makes it impossible to judge whether appropriate data was chosen, or whether the model is a suitable descriptor of the chosen experiments. Enough detail needs to be provided so that these judgements can be made.

      Thank you for raising this point. We created a new table (Supplemental table 6) that describes the techniques used for each experimental measurement.

      (2) Do I understand it correctly that all simulations are done using the same initial simulation geometry? Would it be possible to test the sensitivity of the paper results to this geometry? Perhaps another histological image could be chosen as the initial condition, or alternative initial conditions could be generated in silico? If changing initial conditions is an unreasonably large request, could the authors discuss this issue in the manuscript?

      We appreciate your insightful question regarding the initial simulation geometry in our model. The initial configuration of the fibers/ECM/microvascular structures was kept consistent but the location of the necrosis was randomly placed for each simulation. Future work will include an in-depth analysis of altered histology configuration on model predictions which has been added to lines 618-621. We did a preliminary example analysis by inputting a different initial simulation geometry, which predicted similar regeneration outcomes. We have added Supplemental Figure 5 that provides the results of that example analysis.

      (3) Cytokine knockdowns are simulated by 'adjusting the diffusion and decay parameters' (line 372). Is that the correct simulation of a knockdown? How are these knockdowns achieved experimentally? Wouldn't the correct implementation of a knockdown be that the production or secretion of the cytokine is reduced? I am not sure whether it's possible to design an experimental perturbation which affects both parameters.

      We appreciate that this important question has been posed. Yes, in order to simulate the knockout conditions, the cytokine secretion was reduced/eliminated. The diffusion and decay parameters were also adjusted to ensure that the concentration within the system was reduced. Lines 391-394 were added to clarify this assumption.

      (4) The premise of the model is to identify optimal treatment strategies for muscle injury (as per the first sentence of the abstract). I am a bit surprised that the implemented experimental perturbations don't seem to address this aim. In Figure 7 of the manuscript, cytokine alterations are explored which affect muscle recovery after injury. This is great, but I don't believe the chosen alterations can be done in experimental or clinical settings. Are there drugs that affect cytokine diffusion? If not, wouldn't it be better to select perturbations that are clinically or experimentally feasible for this analysis? A strength of the model is its versatility, so it seems counterintuitive to me to not use that versatility in a way that has practical relevance. - I may well misunderstand this though, maybe the investigated parameters are indeed possible drug targets.

      Thank you for your thoughtful feedback. The first sentence (lines 32-34) of the abstract was revised to focus on beneficial microenvironmental conditions to best reflect the purpose of the model. The clinical relevance of the cytokine modifications is included in the discussion (lines 547-558) with additional information added to lines 524-526. For example, two methods to alter diffusion experimentally are: antibodies that bind directly to the cytokine to prevent it from binding to its receptor on the cell surface and plasmins that induce the release of bound cytokines.

      (5) A similar comment applies to Figure 5 and 6: Should I think of these results as experimentally testable predictions? Are any of the results surprising or new, for example in the sense that one would not have expected other cytokines to be affected as described in Figure 6?

      We appreciate the opportunity to clarify the basis for these perturbations. The perturbations included in Figure 5 were designed to mimic the conditions of a published experiment that delivered VEGF in vivo (Arsic et al. 2004, DOI:10.1016/J.YMTHE.2004.08.007). The perturbation input conditions and experimental results are included in Table 8 and Supplemental Table 6 has been added to include experimental data and method description of the perturbation. The results of this analysis provide both validation and new predictions, because some of the outputs were measured in the experiments while others were not measured. The additional output metrics and timepoints that were not collected in the experiment allow for a deeper understanding of the dynamics and mechanisms leading to the changes in muscle recovery (lines 437-454). These model outputs can provide the basis for future experiments; for example, they highlight which time points would be more important to measure and even provide predicted effect sizes that could be the basis for a power analysis (lines 639-640).

      Regarding Figure 6, the published experimental outcomes of cytokine KOs are included in Table 8. The model allowed comparison of different cytokine concentrations at various timepoints when other cytokines were removed from the system due to the KO condition. The experimental results did not provide data on the impact on other cytokine concentrations but by using the model we were able to predict temporally based feedback between cytokines (lines 474-482). These cytokine values could be collected experimentally but would be time consuming and expensive. The results of these perturbations revealed the complex nature of the relationship between cytokines and how removal of one cytokine from the system has a cascading temporal impact. Lines 533-534 have been added to incorporate this into the discussion.

      (6) In figure 4, there were differences between the experiments and the model in two of the rows. Are these differences discussed anywhere in the manuscript?

      We appreciate your keen observation and the opportunity to address these differences. The model did not match experimental results for CSA output in the TNF KO and anti-inflammatory nanoparticle perturbations, or for TGF levels with macrophage depletion. Although the model aligned with the other experimental metrics from those studies, it is likely that other mechanisms are at play in the experimental conditions that were not captured by simulating the downstream effects of the experimental perturbations. We have added discussion of the differences to lines 445-454.

      (7) The variation between experimental results is much higher than the variation of results in the model. For example, in Figure 3 the error bars around experimental results are an order of magnitude larger than the simulated confidence interval. Do the authors have any insights into why the model is less variable than the experimental data? Does this have to do with the chosen initial condition, i.e. do you think that the experimental variability is due to variation in the geometries of the measured samples?

      Thank you for your insightful observations and questions. The lower model variability is attributed to the larger sample size of model simulations compared with experimental subjects. Running 100 simulations narrows the confidence interval (average 2.4 and max 3.3) relative to the experiments, which typically had a sample size of less than 15. If the number of simulations had been reduced to 15, the stochasticity within the model would result in a larger confidence interval (average 7.1 and max 10). There are also several possible confounding variables in the experimental protocols (e.g., variations in injury, different animal subjects for each timepoint) that are kept constant in the model simulation. We have added discussion of this point to the manuscript (lines 517-519). Future work with the model will examine how variations in conditions, such as initial muscle geometry and injury, alter regeneration outcomes and overall variability. This discussion has been incorporated into lines 640-643.
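
      The sample-size argument above can be illustrated with a minimal sketch. It assumes a normal-approximation 95% confidence interval of the mean, which may differ from the interval calculation used in the study; the function name and the synthetic draws are hypothetical stand-ins for model outputs. Under this approximation the half-width scales as 1/sqrt(n), so going from 15 to 100 runs shrinks it by roughly sqrt(100/15), or about 2.6, which is the same order as the reported change from 7.1 to 2.4.

```python
import math
import random
import statistics


def ci_half_width(samples, z=1.96):
    """Half-width of a normal-approximation 95% CI of the mean (an assumed CI definition)."""
    return z * statistics.stdev(samples) / math.sqrt(len(samples))


def synthetic_runs(n_runs, mean=100.0, sd=12.0, seed=0):
    """Hypothetical stand-in for n_runs stochastic model outputs; values are made up."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_runs)]


if __name__ == "__main__":
    for n in (15, 100):
        # The interval narrows as the number of runs grows.
        print(n, round(ci_half_width(synthetic_runs(n)), 2))
```

      The same function applied to experimental replicates would give wider intervals simply because n is smaller, independent of any additional biological variability.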

      (8) Is figure 2B described anywhere in the text? I could not find its description.

      Thank you for pointing that out. We have added a reference for Fig. 2B on line 190.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The model code seems to be available from https://simtk.org/projects/muscle_regen but that website requests member status ("This is a private project. You must be a member to view its contents.") and applying for membership could violate eLife's blind review process. So, this reviewer liked to but couldn't run the model her/himself. To eLife: Can the authors upload their model to a neutral server that reviewers and editors can access anonymously?

      The code has been made publicly available on the following sites:

      SimTK: https://simtk.org/docman/?group_id=2635

      Zenodo: https://zenodo.org/records/10403014

      GitHub: https://github.com/mh2uk/ABM-of-Muscle-Regeneration-with-MicrovascularRemodeling

      Line 121 has been updated with the new link and the additional resources were added to lines 654-657.

      (2) The muscle regeneration field typically studies 2D cross-sections and the present model can be well compared to these other 2D models but cells as stochastic and localized sources of diffusible cytokines may yield different cytokine fields in 3D vs. 2D. I would expect more broadened and smoothened cytokine fields (from sources in neighboring cross-sections) than what the 2D model predicts based on sources just within the focus cross-section. Such relations of 2D to 3D should be discussed.

      We thank the reviewer for the excellent suggestions and observations. It has been reported for other CompuCell3D models (Sego et al. 2017, DOI:10.1088/1758-5090/aa6ed4) that diffusion solutions converged to similar outcomes in 2D and 3D model configurations, with the 3D simulations incurring excessive computational cost without contributing any noticeable additional accuracy. Similarly, other cell-based ABMs that incorporate diffusion mechanisms (Marino et al. 2018, DOI:10.3390/computation6040058) have found that 2D and 3D versions of the model both predict the same mechanisms and that the 2D resolution was sufficient for determining outcomes. Lines 615-618 were added to elaborate on this topic.

      (3) Since the model (and title) focuses on "nonlinear" cytokine interactions, what would change if cytokine decay would not be linear (as modeled here) but saturated (with nonlinear Michaelis-Menten kinetics as ligand binding and endocytosis mechanisms would call for)?

      Thank you for raising an intriguing point. The model includes a combination of cytokine decay as well as ligand binding and endocytosis mechanisms that can be saturated. For a cytokine-dependent model behavior to occur, the cytokines necessary to induce that action had to reach a minimum threshold. Once that threshold was reached, that amount of the cytokine would be removed at that location to simulate ligand-receptor binding and endocytosis. These ligand binding and endocytosis mechanisms behave in a saturated way, removing a set amount when above a certain threshold or a defined ratio when under the threshold. Lines 313-315 were revised to clarify this point. There were certain concentrations of cytokines where we saw a plateau in outputs, likely as a result of reaching a saturation threshold (Supplemental Fig. 3). In future work, more robust mathematical simulation of the binding kinetics of cytokines (e.g., using ODEs) could be included.
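
      The saturating removal rule described above can be sketched as follows. This is an illustration only, not the code from the published model: the function name, parameter names, and example values are assumptions, and the cap that prevents removing more than is present is an added safeguard.

```python
def cytokine_uptake(concentration, threshold, fixed_amount, fraction):
    """Amount of cytokine removed at one lattice site (hypothetical rule and names).

    Above the threshold, a set amount is removed (saturated regime).
    At or below the threshold, a defined ratio of what is present is removed.
    """
    if concentration > threshold:
        removed = fixed_amount
    else:
        removed = fraction * concentration
    # Added safeguard: never remove more than is present at the site.
    return min(removed, concentration)


if __name__ == "__main__":
    # Removal stays flat above the threshold and scales with concentration below it.
    for c in (1.0, 2.0, 10.0, 20.0):
        print(c, cytokine_uptake(c, threshold=5.0, fixed_amount=2.0, fraction=0.4))
```

      Because removal is constant above the threshold, the per-site uptake plateaus with increasing concentration, which is qualitatively the behavior a Michaelis-Menten term would produce at high ligand levels.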

      (4) Limitations of the model should be discussed together with an outlook for model refinement. For example, fiber alignment and ECM ultrastructure may require anisotropic diffusion. Many of the rate equations could be considered with saturation parameters etc. There are so many model assumptions. Please discuss which would be the most urgent model refinements and, to achieve these, which would be the most informative next experiments to perform.

      We appreciate your thoughtful consideration of the model's limitations and the need for a comprehensive discussion on model refinements and potential future experiments. The future direction section was expanded to discuss additional possible model refinements (lines 635-643) and additional possible experiments for model validation (lines 630-634).

      (5) It is not clear how the single spatial arrangement that is used affects the model predictions. E.g. now the damaged area surrounds the lymphatic vessel but what if the opposite corner was damaged and the lymphatic vessel is deep inside the healthy area?

      Thank you for highlighting the importance of considering different spatial arrangements in the model and its potential impact on predictions. We previously tested model perturbations that included specifying the injury surrounding the lymphatic vessel versus on the side opposite the vessel. Since this paper focuses more on cytokine dynamics, we plan to include this perturbation, along with other injury alterations, in a follow-on paper. We added more context about this in the future efforts section lines 640-643.

      (6) It seems that not only parameter values but also the initial values of most of the model components are unknown. The parameter estimation strategy does not seem to include the initial (spatial) distributions of collagen and cytokines and other model components. Please discuss how other (reasonable) initial values or spatial arrangements will affect model predictions.

      We appreciate your thoughtful consideration of unknown initial values/spatial arrangements and their potential influence on predictions. Initial cytokine levels prior to injury had a low relative concentration compared to levels post injury and were assumed to be negligible. The initial spatial distribution of cytokines was not specified as an input (except in knockout simulations); instead, cytokines are secreted from cells, with baseline resident cell counts defined from the literature. The distribution of cytokines is an emergent behavior that results from the cell behaviors within the model. The collagen distribution is altered in response to clearance of necrosis by the immune cells (decreased collagen with necrosis removal) and subsequent secretion of collagen by fibroblasts. The secretion of collagen by fibroblasts was included in the parameter estimation sweep (Supplemental Table 1).

      We are working on further exploring the model sensitivity to altered spatial arrangements and have added this to the future directions section (lines 618-621), as well as provided Supplemental Figure 5 to demonstrate that model outcomes are similar with altered initial spatial arrangements.

      (7) Many details of the CC3D implementation are missing: overall lattice size, interaction neighborhood order, and "temperature" of the Metropolis algorithm. Are the typical adhesion energy terms used in the CPM Hamiltonian and if so, then how are these parameter values estimated?

      Thank you for bringing attention to the missing details regarding the CC3D implementation in our manuscript. We have included supplemental information providing greater detail for CPM implementation (Lines 808-854). We also added two additional supplemental tables for describing the requested CC3D implementation details (Supplemental Table 4) and adhesion energy terms (Supplemental Table 5).

      (8) Extending the model analysis of combinations of altered cytokine properties, which temporal schedules of administration would be of interest, and how could the timing of multiple interventions improve outcomes? Such a discussion or even analysis would further underscore the usefulness of the model.

      In response to your valuable suggestion, lines 558-562 were added to discuss the potential of using the model as a tool to perturb different cytokine combinations at varying timepoints throughout regeneration. This is also included as future work in lines 636-637.

      (9) The CPM is only weakly motivated, just one sentence on lines 142-145 which mentions diffusion in a misleading way as the CPM just provides cells with a shape and mechanical interactions. The diffusion part is a feature of the hybrid CompuCell3D framework, not the CPM.

      Thank you for bringing up this distinction. We removed the statement regarding diffusion and updated lines 143-146 to focus on CPM representation of cellular behavior and interactions. We also added a reference to supplemental text that includes additional details on CPM.

      (10) On lines 258-261 it does not become clear how the described springs can direct fibroblasts towards areas of low-density collagen ECM. Are the lambdas dependent on collagen density?

      Thank you for highlighting this area for clarification. The fibroblasts form links with low collagen density ECM and then are pulled towards those areas based on a constant lambda value. The links between the fibroblast and the ECM will only be made if the collagen is below a certain threshold. We added additional clarification to lines 260-264.

      (11) On line 281, what does the last part in "Fibers...were regenerating but not fully apoptotic cells" mean? Maybe rephrase this.

      The last part of that line indicates that some fibers surrounding the main injury site were damaged but still had healthy portions; they were impacted by the injury and are regenerating, but did not become fully apoptotic like the fiber cells at the main site of injury. We rephrased this line to indicate that the nearby fibers were damaged but not fully apoptotic.

      (12) Lines 290-293 describe interactions of cells and fields with localized structures (capillaries and lymphatic vessel). Please explain in more detail how "capillary agents...transport neutrophiles and monocytes" in the CPM model formalism. Are new cells added following rules? How is spatial crowding of the lattice around capillaries affecting these rules? Moreover, how can "lymphatic vessel...drain the nearby cytokines and cells"? How is this implemented in the CPM and how is "nearby" calculated?

      We appreciate your detailed inquiry into the interactions of cells and fields with localized structures. The neutrophils and monocytes are added to the simulation at the lattice sites above capillaries (within the cell layer, Fig. 2B) and undergo chemotaxis up their respective gradients. Recruited neutrophils and monocytes are randomly distributed among the healthy capillaries that do not already have an immune cell at the capillary location (a modeling artifact that is a byproduct of only having one cell per lattice site). This approach helped to prevent excessive crowding at certain capillaries. Because immune cells in the simulation are sufficiently small, chemotactic gradients are sufficiently large, and the simulation space is sufficiently large, we do not see aggregation of recruited immune cells in the CPM.

      The lymphatic vessel uptakes cytokines at lattice locations corresponding to the lymphatic vessel and will remove cells located in lattice sites neighboring the lymphatic vessel. In addition, we have included a rule in our ABM to encourage cells to migrate towards the lymphatic vessel utilizing CompuCell3D External Potential Plugin. The influence of this rule is inversely proportional to the distance of the cells to the lymphatic vessel.

      We have updated lines 294-298 and 305-309 to include the above explanation.

      (13) Tables 1-4 define migration speeds as agent rules but in the typical CPM, migration speed emerges from random displacements biased by chemotaxis and other effects (like the slope of the cytokine field). How was the speed implemented as a rule while it is typically observable in the model?

      We appreciate your inquiry regarding the implementation of migration speeds. To determine the lambda parameters (Table 7) for each cell type, we tested each in a simplified control simulation with a concentration gradient for the cell to move towards. We tuned the lambda parameters within this simulation until the cell velocity output by the model aligned with the cell velocity reported in the literature for each cell type (Tables 1-4). We have incorporated clarification on this into lines 177-180.
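
      One way such a tuning loop could be organized is sketched below. The response does not state how lambda was adjusted, so the bisection search here is purely an illustrative choice; `toy_speed` is a made-up stand-in for running the simplified control simulation and measuring cell velocity, and the search assumes speed increases monotonically with lambda.

```python
def tune_lambda(measure_speed, target_speed, lo=0.0, hi=1000.0, tol=1e-3, max_iter=100):
    """Bisection search for the lambda whose measured speed matches target_speed.

    measure_speed(lam) is a hypothetical callable assumed to increase with lam.
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        speed = measure_speed(mid)
        if abs(speed - target_speed) <= tol:
            return mid
        if speed < target_speed:
            lo = mid  # cells too slow: increase lambda
        else:
            hi = mid  # cells too fast: decrease lambda
    return 0.5 * (lo + hi)


def toy_speed(lam):
    """Made-up response curve: speed rises with lambda and saturates at 10."""
    return 10.0 * lam / (lam + 50.0)


if __name__ == "__main__":
    lam = tune_lambda(toy_speed, target_speed=5.0)
    print(lam, toy_speed(lam))
```

      In a stochastic simulation the measured speed would be noisy, so in practice each evaluation would likely need to average several runs before comparing against the literature value.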

      (14) Line 312 shows the first equation with number (5), either add eqn. (1-4) or renumber.

      We have revised the equation number.

      (15) Typos: Line 456, "expect M1 cell" should read "except M1 cell".

      Line 452, "thresholds above that diminish fibroblast response (Supplemental Fig 3)." remains unclear, please rephrase.

      Line 473, "at 28." should read "at 28 days.".

      Line 474, is "additive" correct? Was the sum of the individual effects calculated and did that match?

      Line 534, "complexity our model" should read "complexity in our model".

      We have corrected the typos and clarified line 452 (updated line 594) to indicate that the TNF-α concentration threshold results in diminished fibroblast response. We updated the terminology on line 474 (updated line 512) to indicate that there was a synergistic effect with the combined perturbation.

      (16) Table 7 defines cell target volumes with the same value as their diameter. This enforces a strange cell shape. Should there be brackets to square the value of the cell diameter, e.g. Value=(12µm)^2 ?

      The target volume parameter values were selected to reflect the relative differences in average cell diameter as reported in the literature; however, there are no parameters that directly enforce a diameter for the cells in the CPM formalism separate from the volume. We have observed that these relative cell sizes allow the ABM to effectively reproduce cell behaviors described in the literature. Single cells that are too large in the ABM would be unable to migrate far enough per time step to carry out cell behaviors, and cells that are too small in the CPM would be unstable in the simulation environment and not persist in the simulation when they should. We removed the units for the cell shape values in Table 7 since the target volume is a relative parameter and does not directly represent µm.

      (17) Table 7 gives estimated diffusion constants but they appear to be too high. Please compare them to measured values in the literature, especially for MCP-1, TNF-alpha and IL-10, or relate these to their molecular mass and compare to other molecules like FGF8 (Yu et al. 2009, DOI:10.1038/nature08391).

      We utilized a previously published estimation method (Filion et al. 2004, DOI:10.1152/ajpheart.00205.2004) for estimating cytokine diffusivity within the ECM. This method incorporates the molecular masses and accounts for the combined effects of the collagen fibers and glycosaminoglycans. The paper acknowledged that the estimated value is faster than experimentally determined values, but that this was a result of the less-dense matrix composition which is more reflective of the tissue environment we are simulating in contrast to other reported measurements which were done in different environments. Using this estimation method also allowed us to more consistently define diffusion constants versus using values from the literature (which were often not recorded) that had varied experimental conditions and techniques (such as being in zebrafish embryo Yu et al. 2009, DOI:10.1038/nature08391 as opposed to muscle tissue). This also allowed for recalculation of the diffusivity throughout the simulation as the collagen density changed within the model. Lines 318-326 were updated to help clarify the estimation method.

      (18) Many DOIs in the bibliography (Refs. 7,17,20,31,40,47...153) are wrong and do not resolve because the appended directory names are not allowed in the DOI, just with a journal's URL after resolution.

      Thank you for bringing this to our attention. The incorrect DOIs have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      (9) On line 174, the authors say "We used the CC3D feature Flip2DimRatio to control the number of times the Cellular-Potts algorithm runs per mcs." What does this mean? Isn't one monte carlo timestep one iteration of the Cellular Potts model? How does this relate to physical timescales?

      We appreciate your attention to detail and thoughtful question regarding the statement about the use of the CC3D feature Flip2DimRatio. Lines 175-177 were revised to simplify the meaning of Flip2DimRatio. That parameter alters the number of times the Cellular-Potts algorithm is run, which is the limiting factor for cell movement. The physical timescale is kept to a 15-minute timestep but a high Flip2DimRatio allows more flexibility and stability to allow the cells to move faster in one timestep.

      (10) Has the custom MATLAB script to process histology images into initial conditions been made available?

      The Matlab script along with CC3D code for histology initialization with documentation has been made available with the source code on the following sites:

      SimTK: https://simtk.org/docman/?group_id=2635

      Zenodo: https://zenodo.org/records/10403014

      GitHub: https://github.com/mh2uk/ABM-of-Muscle-Regeneration-with-MicrovascularRemodeling

      (11) Equation 5 is provided without a reference or derivation. Where does it come from and what does it mean?

      Thank you for highlighting the diffusion equation and seeking clarification on its origin and significance. Lines 318-326 were revised to clarify where the equation comes from. This is a previously published estimation method that we applied to calculate the diffusivity of the cytokines considering both collagen and glycosaminoglycans.

      (12) Line 326: "For CSA, experimental fold-change from pre-injury was compared with fold-change in model-simulated CSA". Does this step rely on the assumption that the fold change will not depend on the CSA? If so, is this something that is experimentally known, or otherwise, can it be confirmed by simulations?

      We appreciate the opportunity to clarify our rationale. The fold change was used as a method to normalize the model and experiment so that they could be compared on the same scale. Yes, this step relies on the assumption that fold change does not depend on pre-injury CSA. Experimentally, it is difficult to determine the impact of initial fiber morphology on the regeneration time course. The fold change allows us to compare percent recovery, which is a common metric used to assess muscle regeneration outcomes experimentally. Lines 340-343 were revised to clarify.
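
      The normalization can be illustrated with a short sketch. The function name and all numbers are hypothetical; the point is only that two CSA series on different absolute scales become directly comparable once each is divided by its own pre-injury value.

```python
def fold_change(series, pre_injury):
    """Express each value as a multiple of the pre-injury value."""
    if pre_injury == 0:
        raise ValueError("pre-injury value must be non-zero")
    return [value / pre_injury for value in series]


if __name__ == "__main__":
    # Made-up values: experimental CSA in physical units, model CSA in lattice units.
    experiment = fold_change([1200.0, 1800.0, 2400.0], pre_injury=2400.0)
    model = fold_change([50.0, 75.0, 100.0], pre_injury=100.0)
    print(experiment, model)
```

      A fold change of 1.0 corresponds to full recovery of pre-injury CSA, which is why this scale maps naturally onto percent recovery.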

      (13) Line 355: "The final passing criteria were set to be within 1 SD for CSA recovery and 2.5 SD for SSC and fibroblast count" Does this refer to the experimental or the simulated SD?

      The model had to fit within the stated number of experimental SDs. Lines 371-372 were edited to specify that we are referring to the experimental SD.
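
      As a sketch of how such a criterion could be checked, the snippet below tests whether each simulated output lies within the stated number of experimental SDs of the experimental mean. The tolerances (1 SD for CSA recovery, 2.5 SD for SSC and fibroblast counts) come from the text; the function names, dictionary layout, and example numbers are assumptions, and the actual calibration may apply the check per timepoint.

```python
# Tolerances quoted in the response, in units of experimental SD.
TOLERANCES = {"csa": 1.0, "ssc": 2.5, "fibroblast": 2.5}


def within_sd(simulated, exp_mean, exp_sd, n_sd):
    """True if a simulated value lies within n_sd experimental SDs of the experimental mean."""
    return abs(simulated - exp_mean) <= n_sd * exp_sd


def run_passes(sim_outputs, exp_stats):
    """sim_outputs: metric -> value; exp_stats: metric -> (mean, sd). Layout is hypothetical."""
    return all(
        within_sd(sim_outputs[metric], exp_stats[metric][0], exp_stats[metric][1], n_sd)
        for metric, n_sd in TOLERANCES.items()
    )


if __name__ == "__main__":
    # Made-up experimental statistics and one made-up simulation result.
    exp_stats = {"csa": (1.0, 0.1), "ssc": (40.0, 8.0), "fibroblast": (20.0, 4.0)}
    print(run_passes({"csa": 1.05, "ssc": 55.0, "fibroblast": 12.0}, exp_stats))
```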

      (14) "Following 8 iterations of narrowing the parameter space with CaliPro, we reached a set that had fewer passing runs than the previous iteration". Wouldn't one expect fewer passing runs with any narrowing of the parameter space? Why was this chosen as the stopping criterion for further narrowing?

      We appreciate your observation regarding the statement about narrowing the parameter space with CaliPro. We started with a wide parameter space, expecting that certain parameters would give outputs that fall outside of the comparable data. So, when the parameter space was narrowed to enrich parts that give passing output, initially the number of passing simulations increased.

      Once the set of possible parameters has been narrowed into an ideal parameter space, further narrowing will cut out viable parameters, resulting in fewer passing runs. Therefore, we stopped narrowing as soon as an iteration yielded fewer passing simulations than the previous, wider parameter set. Lines 375-379 have been updated to clarify this point.
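
      The stopping rule can be sketched as a simple loop. This is not CaliPro itself: `count_passing` and `narrow` are hypothetical callables standing in for "sample the space and count passing runs" and "shrink the space toward passing regions", and the scripted pass counts in the usage example are invented for illustration.

```python
def narrow_until_pass_count_drops(initial_space, count_passing, narrow, max_iter=20):
    """Narrow a parameter space until the number of passing runs decreases.

    Returns the last space (and its pass count) before the decrease.
    Both callables are hypothetical stand-ins for the calibration steps.
    """
    best_space = initial_space
    best_count = count_passing(best_space)
    for _ in range(max_iter):
        candidate = narrow(best_space)
        candidate_count = count_passing(candidate)
        if candidate_count < best_count:
            break  # narrowing has started to cut out viable parameters
        best_space, best_count = candidate, candidate_count
    return best_space, best_count


if __name__ == "__main__":
    # Invented pass counts per iteration; the "space" is just the iteration index.
    scripted_counts = [12, 30, 55, 70, 64]
    result = narrow_until_pass_count_drops(
        0, lambda space: scripted_counts[space], lambda space: space + 1
    )
    print(result)  # (3, 70): iteration 4 passed fewer runs, so iteration 3 is kept
```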

      (15) Line 516: 'Our model could test and optimize combinations of cytokines, guiding future experiments and treatments." It is my understanding that this is communicated as a main strength of the model. Would it be possible to demonstrate that the sentence is true by using the model to make actual predictions for experiments or treatments?

      This is demonstrated by the combined cytokine alterations in Figure 7 and discussed in lines 509-513. We have also added in a suggested experiment to test the model prediction in lines 691-695.

      (16) Line 456, typo: I think 'expect' should be 'except'.

      Thank you for pointing that out. The typo has been corrected.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors collected genomic information from public sources covering 423 eukaryote genomes and around 650 prokaryote genomes. Based on pre-computed CDS annotation, they estimated the frequency of alternative splicing (AS) as a single average measure for each genome and computed correlations with this measure and other genomic properties such as genome size, percentage of coding DNA, gene and intergenic span, etc. They conclude that AS frequency increases with genome complexity in a somewhat directional trend from "lower" organisms to "higher" organisms.

      Strengths:

      The study covers a wide range of taxonomic groups, both in prokaryotes and eukaryotes.

      Weaknesses:

      The study is weak both methodologically and conceptually. Current high throughput sequencing technologies, coupled with highly heterogeneous annotation methods, can observe cases of AS with great sensitivity, and one should be extremely cautious of the biases and rates of false positives associated with these methods. These issues are not addressed in the manuscript. Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      We are aware of the bias that may exist in annotation files. Since the source of noise can be highly variable, we have assumed that most of the data has a similar bias. However, we agree with the reviewer that we could perform some analysis to test for these biases and their association with different methodologies. Thus, we will measure the uncertainty present in the data. On the one hand, we will be more explicit about the limitations of the data and the biases they can generate in the results. On the other hand, while analyzing the false positives in the data is beyond our scope, we will perform a statistical test to detect possible biases related to different sequencing and annotation methods and to types of organisms (model or non-model organisms). If such biases are detected, we will proceed, as far as possible, to normalize the data or to estimate a confidence interval.

      Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      Beyond taking into account the differential bias that may exist in the data, we do not consider our AS measure to be problematic. The NCBI database is one of the most reliable databases available to date and is continuously updated by the entire scientific community. The use of these data and the corresponding procedures for deriving the AS measure are therefore perfectly acceptable for a comparative analysis on such a large global scale. Furthermore, the proposal of a new genome-level measure of AS that allows comparison of species spanning the whole tree of life is part of the novelty of the study. We understand that small-scale studies require high specificity about the molecular processes involved. However, that is not the case here, where we are dealing with a large-scale problem. On the other hand, as we have previously mentioned, we agree with the reviewer that the degree of uncertainty in the data should be analyzed to better interpret the results.

      There is no mention of the possibility that AS could be largely caused by random splicing errors, a possibility that could very well fit with the manuscript's data. Instead, the authors adopt early on the view that AS is regulated and functional, generally citing outdated literature.

      There is no question that some AS events are functional, as evidenced by strongly supported studies. However, whether all AS events are functional is questionable, and the relative fractions of functional and non-functional AS are unknown. With this in mind, the authors should be more cautious in interpreting their data.

      Many studies suggest that most of the AS events observed are the result of splicing errors and are therefore neither functional nor conserved. However, we still have limited knowledge about the functionality of AS. Just because we don’t have a complete understanding of its functionality, doesn’t mean there isn’t a fundamental cause behind these events. AS is a highly dynamic process that can be associated with processes of a stochastic nature that are fundamental for phenotypic diversity and innovation. This is one of the reasons why we do not get into a discussion about the functionality of AS and consider it as a potential measure of biological innovation. Nevertheless, we agree with the reviewer’s comments, so we will add a discussion about this issue with updated literature and look at any possible misinterpretation of the results.

      The "complexity" of organisms also correlates well (negatively) with effective population size. The power of selection to eliminate (slightly) deleterious mutations or errors decreases with effective population size. The correlation observed by the authors could thus easily be explained by a non-adaptive interpretation based on simple population genetics principles.

      We appreciate the reviewer's observation. We are well aware of M. Lynch's theory on the role of effective population size and its eventual correlation with genomic parameters, but we want to emphasize that our objective is not to find an adaptive or non-adaptive explanation for the evolution of AS, but rather to reveal it. Nevertheless, as the reviewer suggests, we will examine the correlation between AS and effective population size and discuss a possible non-adaptive interpretation.

      The manuscript contains evidence that the authors might benefit from adopting a more modern view of how evolution proceeds. Sentences such as "... suggests that only sophisticated organisms optimize alternative splicing by increasing..." (L113), or "especially in highly evolved groups such as mammals" (L130), or the repeated use of "higher" and "lower" organisms need revising.

      As the reviewer suggests, we will proceed with the corresponding linguistic corrections.

      Because of the lack of controls mentioned above, and because of the absence of discussion regarding an alternative non-adaptive interpretation, the analyses presented in the manuscript are of very limited use to other researchers in the field. In conclusion, the study does not present solid conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF. When correlating the degree of alternative splicing with these three variables, they find that the different taxonomic groups have a different correlation coefficient, and identify a "progressive pattern" among metazoan groups, namely that the correlation coefficient mostly increases when moving from flowering plants to arthropods, fish, birds, and finally mammals. They conclude that therefore the amount of splicing that is performed by an organismal group could be used as a measure of its complexity.

      Weaknesses:

      While I find the analysis of alternative splicing interesting, I also find that it is a very imperfect measure of organismal complexity and that the manuscript as a whole is filled with unsupported statements. First, I think it is clear to anyone studying evolution over the tree of life that it is the complexity of gene regulation that is at the origin of much of organismal structural and behavioral complexity. Arguably, creating different isoforms out of a single ORF is just one example of complex gene regulation. However, the complexity of gene regulation is barely mentioned by the authors.

      We disagree with the reviewer that our measure of AS is imperfect. As in our response to the first reviewer, we will quantify the uncertainty in the data and correct for differential biases caused by annotation and sequencing methods. Thus, beyond correcting relevant biases in the data, we consider our measure adequate for a comparative analysis at a global scale. A novelty of our study is the proposal of a genome-level measure of AS that takes into account data from the entire scientific community.

      We also want to emphasize that we assume from the beginning that AS may reflect some kind of biological complexity; this is not a conclusion drawn from the results. An argument in favor of such an assumption is that AS is associated with stochastic processes that are fundamental for phenotypic diversity and innovation. Of course, we agree with the reviewer that it is not the only mechanism behind biological complexity, so we will emphasize this in the manuscript. In addition, we will be more explicit about the assumptions and objectives, and will correct any unsupported statements.

      Further, it is clear that none of their correlation coefficients actually show a simple trend (see Table 3). According to these coefficients, birds are more complex than mammals for 3 out of 4 measures.

      An evolutionary trend is broadly defined as the gradual change in some characteristic of organisms as they evolve or adapt to a specific environment. In our context, we define an evolutionary trend as the gradual change in genome composition and its association with AS across the main taxonomic groups. Looking at Figure 4 and Table 3, we can conclude that there is a progressive trend. We will be more precise about how we define an evolutionary trend and correct any possible misinterpretation of the results. We also do not assume that mammals should be more complex than birds. First, we will emphasize that our results show that birds have the highest values of such a trend. Second, after reading the reviewer’s comments, we have decided to perform an additional analysis to correct for differences in the taxonomic group sizes, which will allow us to have more confidence in the results.
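
      As an illustration only, a correction for unequal taxonomic group sizes could be sketched as repeated subsampling: every group is drawn down to the size of the smallest group before the correlation is computed, and the spread across draws gives a rough error estimate. This is not the analysis used in the study; the function names, the number of draws, and the input layout (a dictionary of `(x, y)` pairs per group) are assumptions made for the example.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def subsampled_r(groups, n_draws=1000, seed=0):
    """Mean and standard deviation of r per group.

    `groups` maps a group name to a list of (x, y) pairs, e.g.
    (genome size, alternative splicing ratio) per species (hypothetical layout).
    Each draw uses as many species as the smallest group contains.
    """
    rng = random.Random(seed)
    k = min(len(pairs) for pairs in groups.values())
    summary = {}
    for name, pairs in groups.items():
        rs = []
        for _ in range(n_draws):
            xs, ys = zip(*rng.sample(pairs, k))
            rs.append(pearson_r(xs, ys))
        summary[name] = (statistics.fmean(rs), statistics.pstdev(rs))
    return summary
```

      Comparing the subsampled mean of r between groups, together with its standard deviation, would indicate whether an apparent difference survives once all groups are evaluated at the same sample size.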

      It is also not clear why the correlation coefficient between alternative splicing ratio and genome length, gene content, and coding percentage should display such a trend, rather than the absolute value. There are only vague mechanistic arguments.

      The study analyzes the relationship of AS with genomic composition for the large taxonomic groups. We assume that significant differences in these relationships are indicators of the presence of different mechanisms of genome evolution. However, we agree with the reviewer that a correlation does not imply a causal relation, so we will be more cautious when interpreting the results.

      To quantify the relationships, we use correlation coefficients, the slopes of such correlations, and the relation of variability. Although the absolute values of AS are also shown in Table 4, we consider them less informative on their own than when combined with how AS relates to the genomic composition. For example, we observe that plants have a different genome composition and a different relation with AS compared to animals, which suggests that they follow different mechanisms of genome evolution. On the other hand, we observe a trend in animals, where high values of AS are associated with a large percentage of introns and a percentage of intergenic DNA of about 50% of the genome.

      Much more troubling, however, is the statement that the data supports "lineage-specific trends" (lines 299-300). Either this is just an ambiguous formulation, or the authors claim that you can see trends *within* lineages.

      We agree with the reviewer that this statement is not correct, so we will proceed to correct it.

      The latter is clearly not the case. In fact, within each lineage, there is a tremendous amount of variation, to such an extent that many of the coefficients given in Table 3 are close to meaningless. Note that no error bars or p-values are presented for the values shown in Table 3. Figure 2 shows the actual correlation, and the coefficient for flowering plants there is given as 0.151, with a p-value of 0.193. Table 3 seems to quote r=0.174 instead. It should be clear that a correlation within a lineage or species is not a sign of a trend.

      We believe the reviewer has misunderstood the results in Table 3. It is precisely the variation of the genome variables that we are measuring. Having standardized these values by the mean values, we compared the variability between groups, which is the result shown in Table 3. In this case, there are no associated error bars or p-values. We agree that a correlation is not a sign of a trend. However, the relations of variability, together with the results obtained in Figure 3, are indicators of a trend. As we mentioned before, we will analyze whether the variation in the group sizes is causing a bias in the results.

      There are several wrong or unsupported statements in the manuscript. Early on, the authors state that the alternative splicing ratio (a number greater or equal to one that can be roughly understood as the number of different isoforms per ORF) "quantifies the number of different isoforms that can be transcribed using the same amount of information" (lines 51-52). But in many cases, this is incorrect, because the same sequence can represent different amounts of information depending on the context. So, if a changed context gives rise to a different alternative splice, it is because the genetic sequence has a different meaning in the changed context: the information has changed.

      We agree that some statements are not well supported, so we will revise them.

      In line 149, the authors state that "the energetic cost of having large genomes is high". No citation is given, and while such a statement seems logical, it does not have very solid support.

      We will also revise the bibliography and support our statements with updated references.

      If there was indeed a strong selective force to reduce genome size, we would not see the stunning diversity of genome sizes even within lineages. This statement is repeated (without support) several times in the manuscript, apparently in support of the idea that mammals had "no choice" to increase complexity via alternative splicing because they can't increase it by having longer genomes. I don't think this reasoning can be supported.

      We agree with the reviewer on this issue, so we will carefully revise the statements that indirectly (or directly) assume the action of selective forces on the genome composition.

      Even more problematic is the statement that "the amount of protein-coding DNA seems to be limited to a size of about 10MB" (line 219). There is no evidence whatsoever for this statement.

      In Figure 1A we observe a one-to-one relationship between the genome size and the amount of coding DNA. However, in multicellular organisms, although the genome size increases, we observe that the amount of coding DNA does not increase by more than 10Mb, which suggests the presence of some genomic limitation. Of course, this is not an absolute or general statement, but rather a suggestion. We are only describing our results.

      The reference that is cited (Choi et al 2020) suggests that there is a maximum of 150GB in total genome size due to physiological constraints. In lines 257-258, the authors write that "plants are less restricted in terms of storing DNA sequences compared to animals" (without providing evidence or a citation).

      We will revise the bibliography and add updated references.

      I believe this statement is made due to the observation that plants tend to have large intergenic regions. But without examining the functionality of these intergenic regions (they might host long non-coding RNA stretches that are used to regulate the expression of other genes, for example) it is quite adventurous to use such a simple measure as being evidence that plants "are less restricted in terms of storing DNA sequences", whatever that even means. I do not think the authors mean that plants have better access to -80 freezers. The authors conclude that "plant's primary mechanism of genome evolution is by expanding their genome". This statement itself is empty: we know that plants are prone to whole genome duplication, but this duplication is not, as far as we understand, contributing to complexity. It is not a "primary mechanism of genome evolution".

      We will revise these statements.

      In lines 293-294, the authors claim that "alternative splicing is maximized in mammalian genomes". There is no evidence that this ratio cannot be increased. So, to conclude (on lines 302-303) that alternative splicing ratios are "a potential candidate to quantify organismal complexity" seems, based on this evidence, both far-fetched and weak at the same time.

      Our results show the highest values of AS in mammals, but we understand that the results are limited by the availability and accuracy of data, which we will emphasize in the manuscript. As we previously mentioned, we will also analyze the uncertainty in the data and carry out the appropriate corrections.

      I am also not very comfortable with the data analysis. The authors, for example, say that they have eliminated from their analysis a number of "outlier species". They mention one: Emmer wheat because it has a genome size of 900 Mb (line 367). Since 900MB does not appear to be extreme, perhaps the authors meant to write 900 Gb. When I consulted the paper that sequenced Triticum dicoccoides, they noted that 14 chromosomes are about 10GB. Even a tetraploid species would then not be near 900Gb. But more importantly, such a study needs to state precisely which species were left out, and what the criteria are for leaving out data, lest they be accused of selecting data to fit their hypothesis.

      The reviewer is right: we wanted to say 900Mb, which is approximately 7.2Gb. We made a nomenclature mistake. This value is extreme compared to the typical values, so it generates large deviations when applying measures of central tendency and dispersion. We want to obtain mean values that are representative of most species composing the taxonomic groups, so we find it appropriate to exclude all outlier values from the study. Nevertheless, we will rigorously specify the criteria that we have used to select the data.
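
      To make the idea of an explicit exclusion criterion concrete, the sketch below applies one widely used rule (Tukey's fences based on the interquartile range). It is an assumption for illustration, not the criterion used in the study; the function name, the multiplier `k`, and the input layout (a dictionary of species name to genome size) are hypothetical.

```python
import statistics


def flag_outliers(genome_sizes, k=1.5):
    """Return species whose value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    `genome_sizes` maps a species name to a numeric value (hypothetical layout).
    Quartiles use the default method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(genome_sizes.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(
        name for name, value in genome_sizes.items()
        if value < low or value > high
    )
```

      Reporting the rule, its parameters, and the resulting list of excluded species would let readers check that the exclusions do not depend on the hypothesis being tested.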

      I understand that Methods are often put at the end of a manuscript, but the measures discussed here are so fundamental to the analysis that a brief description of what the different measures are (in particular, the "alternative splicing ratio") should be in the main text, even when the mathematical definition can remain in the Methods.

      We agree with the reviewer, so we will add a brief description of the genomic variables at the beginning of the Results section.

      Finally, a few words on presentation. I understand that the following comments might read differently after the authors change their presentation. This manuscript was at the border of being comprehensible. In many cases, I could discern the meaning of words and sentences in contexts but sometimes even that failed (as an example above, about "species-specific trends", illustrates). The authors introduced jargon that does not have any meaning in the English language, and they do this over and over again.

      Note that I completely agree with all the comments by the other reviewer, who alerted me to problems I did not catch, including the possible correlation with effective population size: a possible non-adaptive explanation for the results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments to improve the quality of the work:

      (1) The choice of subunits to tag are really not ideal. In the available structures of the human proteasome, The C-terminus of Rpn3/PSMD3 points directly toward the ATPase pore and is likely to disrupt the structure and/or dynamics of the proteasome during proteolysis (see comments regarding controls for functionality below). Similarly, the C-terminal tail of Rpt1/PSMC2 has a key role in the opening of the 20S core particle gate for substrate translocation and processing (see 2018 Nature Communications, 9:1360 and 2018 Cell Reports 24:1301-1315), and Alpha3/PSMA4 can be substituted by a second copy of Alpha4/PSMA7 in some conditions (although tagging Alpha3/PSMA4 would admittedly provide a picture of the canonical proteasome interactome while actively excluding the interactome of the non-canonical proteasomes that form via replacement of Alpha3/PSMA4). Comparison of these cell lines with lines harboring tags on subunits that are commonly used for tagging in the field because of a lack of impacts, such as the N-terminus of Rpn1/PSMD2, the C-terminus of Rpn11/PSMD14, and the C-terminus of Beta4/PSMB2 would help instill confidence that the interactome reported largely arises from mature, functional proteasomes rather than subcomplexes, defective proteasomes, or other species that may occur due to tagging at these positions.

      We thank the reviewer for pointing this out. The original purpose of our strategy was to establish proximity labeling of proteasomes to enable applications both in cell culture and in vivo. The choice of PSMA4 and PSMC2 was dictated by previous successful tagging with GFP in mammalian cells (Salomons et al., Exp Cell Res 2010; Bingol and Schuman, Nature 2006). However, the choice of the PSMC2 C-terminus might not have been optimal. HEK293 cells overexpressing PSMC2-BirA show slower growth, and the BioID data show a higher enrichment of assembly factors, suggesting slower assembly of this fusion protein into the proteasome. Nevertheless, we did not observe a negative impact on overall proteasome activity, and PSMC2-BirA was (at least in part) incorporated into fully assembled proteasomes, as indicated by the enrichment of 20S proteins. We apologize for not making it clear that we labeled the N-terminus of PSMD3/Rpn3 and not the C-terminus (Figure 1a and S1a). Therefore, we included in Figure S1a of the revised manuscript structures of the proteasome in which the tagged subunit termini are highlighted: the C-terminus for PSMA4 and PSMC2 and the N-terminus for PSMD3. Additionally, we would like to point out that, unlike PSMC2-BirA, cells expressing BirA-PSMD3 did not show slower growth, and the BioID data showed a more homogeneous enrichment of both 19S and 20S proteins compared to PSMC2-BirA (Figure 1D and 1E). However, the overall level of enrichment of proteasome subunits was not comparable to that of PSMA4-BirA and, therefore, we opted to focus the rest of the manuscript on this construct.

      In support of this point, the data provided in Figure 1E in which the change in the abundances of each proteasome subunit in the tagged line vs. the BirA control line demonstrates substantial enrichment of the subcomplexes of the proteasome that are tagged in each case; this effect may represent the known feedback-mediated upregulation of new proteasome subunit synthesis that occurs when proteasomal proteolysis is impaired, or alternatively, the accumulation of subcomplexes containing the tagged subunit that cannot readily incorporate into mature proteasomes. Acknowledging this limitation in the text would be valuable to readers who are less familiar with the proteasome.

      We would like to clarify that the data shown in Figure 1E do not represent whole proteome data, but rather log2 fold changes vs. BirA* control calculated on streptavidin enrichment samples. The differences in the enrichment of the various subcomplexes between cell lines derive from the fact that the effect size of the enrichment depends both on protein abundance in the isolated complexes and on the efficiency of biotinylation. The latter will be higher for proteins located in closer proximity to the bait. A similar observation was made in a recent publication (PMID: 36410438) that compared BioID and Co-IP for the same bait. When a component of the nuclear pore complex (Nup158) was analyzed by BioID, only the more proximal proteins were enriched, as compared to the whole complex in the Co-IP data (Author response image 1):

      Author response image 1.

      Proteins identified in the NUP158 BioID or pulldown experiments are filled in red or light red for significance intervals A or B, respectively. The bait protein NUP158 is filled in yellow. Proteins enriched in the pulldown falling outside the SigA/B cutoff are filled in gray. NPC, nuclear pore complex. SigA, significant class A; SigB, significant class B. Reproduced from Figure 6 of PMID: 36410438.

      However, we would like to point out that despite quantitative differences between different proteasome subunits, both 19S and 20S proteins were found to be strongly enriched (typically >2 fold) in all the constructs compared to BirA* control line (Figure 1E). This indicates that at least a fraction of all the tagged subunits are incorporated into fully assembled proteasomes.

      Regarding the upregulation of proteasome subunits as a consequence of proteasome dysfunction, we did not find evidence of this, at least in the case of PSMA4. The immunoblot shown in Figure 2A and its quantification in S3A indicate no increased abundance of endogenous PSMA4 upon tetracycline induction of PSMA4-BirA*.

      (2) The use of myc as a substrate of the proteasome for demonstration that proteolysis is unaffected is perhaps not ideal. Myc is known to be degraded via both ubiquitin-dependent and ubiquitin-independent mechanisms, such that disruption of one means of degradation (e.g., ubiquitin-dependent degradation) via a given tag could potentially be compensated by another. A good example of this is that the C-terminal tagging of PSMC2/Rpt1 is likely to disrupt interaction between the core particle and the regulatory particle (as suggested in Fig. 1D); this may free up the core particle for ubiquitin-independent degradation of myc.

      Aside from using specific reporters for ubiquitin-dependent vs. independent degradation or a larger panel of known substrates, analysis of the abundance of K48-ubiquitinated proteins in the control vs. tag lines would provide additional evidence as to whether or not proteolysis is generally perturbed in the tag lines.

      We thank the reviewer for this suggestion. We have included an immunoblot analysis showing that the levels of K48 ubiquitylation (Figure S3d) are not affected by the expression of tagged PSMA4.

      (3) On pg. 8 near the bottom, the authors accidentally refer to ARMC6 as ARMC1 in one instance.

      We have corrected the mistake.

      (4) On pg. 10, the authors explain that they analyzed the interactome for all major mouse organs except the brain; although they explain in the discussion section why the brain was excluded, including this explanation on pg. 10 here instead of in the discussion might be a better place to discuss this.

      We moved the explanation from the discussion to the results part.

      Reviewer #2 (Recommendations For The Authors):

      (1) Perhaps the authors can quantify the fraction of unassembled PSMA4-BirA* from the SEC experiment (Fig. 2b) to give the readers a feeling for how large a problem this could be.

      The percentages based on Area Under the Curve calculations have been added to Figure S3b.
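
      For readers unfamiliar with this kind of quantification, a minimal sketch of an area-under-the-curve calculation on an elution profile is given below. It uses the trapezoidal rule over user-defined elution windows; the function names, the window boundaries, and the example numbers are assumptions for illustration and do not reproduce the values in Figure S3b.

```python
def trapezoid_area(x, y):
    """Area under y(x) by the trapezoidal rule."""
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
        for i in range(len(x) - 1)
    )


def peak_fractions(volumes, signal, windows):
    """Fraction of the summed window areas that falls in each window.

    `windows` maps a label to (start, end) elution volumes, inclusive.
    Only data points inside a window contribute to its area.
    """
    areas = {}
    for label, (start, end) in windows.items():
        pts = [(v, s) for v, s in zip(volumes, signal) if start <= v <= end]
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        areas[label] = trapezoid_area(xs, ys)
    total = sum(areas.values())
    return {label: area / total for label, area in areas.items()}
```

      For example, `peak_fractions([0, 1, 2, 3, 4], [0, 2, 0, 1, 0], {"assembled": (0, 2), "free": (2, 4)})` assigns two thirds of the signal to the first window and one third to the second.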

      (2) Do the authors observe any difference in the enrichment scores between proteins that are known to interact with the proteasome vs proteins that the authors can justify as "interactors of interactors" vs the completely new potential interactors? This could be an interesting way to show that the potential new interactors are not simply because of poor false positive rate calibration, but that they behave in the same way as the other populations.

      We thank the reviewer for this suggestion. We analyzed the enrichment scores for 20S proteasome subunits, known PIPs, first neighbors and the remaining enriched proteins. The remaining proteins (potential new interactors) have very similar scores as the first neighbors of known interactors. This plot has been added to Figure S3g.

      (3) Did the authors try to train a logistic model for the miniTurbo experiments, like it was done for the BirA* experiments? Perhaps combining the results of both experiments would yield higher confidence on the proteasome interactors.

      Following the reviewer's suggestion, we applied the classifier to the dataset comparing miniTurbo and PSMA-miniTurbo. We found a clear separation between the FPR and the TPR, with 136 protein groups enriched in PSMA-miniTurbo. We have added the classifier and the corresponding ROC curve to Figure S4f and S4g.
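
      The relationship between a score cutoff, the FPR, and the TPR can be sketched as follows. This is a generic illustration of how ROC points are derived from scored proteins with known positive and negative labels, not the classifier used in the study; the function names and the 0.05 cutoff default are assumptions (the cutoff mirrors the fpr<0.05 threshold quoted for the enriched proteins).

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for every distinct score.

    A protein is called positive when its score is >= threshold.
    `labels` holds 1 for known positives and 0 for known negatives.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and not l)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def loosest_threshold(points, max_fpr=0.05):
    """Lowest threshold whose FPR stays below `max_fpr`, or None."""
    ok = [p for p in points if p[1] < max_fpr]
    return min(ok, key=lambda p: p[0]) if ok else None
```

      Proteins scoring at or above the returned threshold would then be reported as enriched at the chosen false positive rate.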

      75 protein groups were found to be enriched for both PSMA4-BirA* and PSMA4-miniTurbo (Author response image 2), including the proteasome core particles, regulatory particles, known interactors and potential new interactors. As we focused more on the identification of substrates with PSMA4-miniTurbo, we did not pursue these overlapping protein groups further, but rather used the comparison to the mouse model to identify potential new interactors.

      Author response image 2.

      Overlap between ProteasomeID enriched proteins (fpr<0.05) between PSMA4-BirA* and PSMA4-miniTurbo.

      (4) Perhaps this is already known, but did the authors check if MG132 affect proteasome assembly? The authors could for example repeat their SEC experiments in the presence of MG132.

      We thank the reviewer for the suggestion; however, to our knowledge, there are no reports that MG132 affects the assembly of the proteasome. MG132 is one of the most widely used proteasome inhibitors in basic research and, as such, has been extensively characterized over the last 3 decades. The small peptide aldehyde acts as a substrate analogue and binds directly to the active site of the protease PSMB5/β5. We therefore think it is unlikely that MG132 interferes with the assembly of the proteasome.

      (5) Minor comment: at the bottom of page 8, the authors probably mean ARMC6 and not ARMC1.

      We have corrected the mistake.

      (6) It would be interesting to expand the analysis of the already acquired in vivo data to try to identify tissue-specific proteasome interactors. Can the authors draw a four-way Venn diagram with the interactors of each tissue?

      We thank the reviewer for this suggestion. We have generated an UpSet plot showing the overlap of ProteasomeID enriched proteins in the four tissues that gave us meaningful results (Author response image 3). In order to investigate whether the observed differences in ProteasomeID enriched proteins could be meaningful in terms of proteasome biology, we have highlighted proteins belonging to the UPS that show tissue specific enrichments. We found proteasome activators such as PSME1/PA28alpha and PSME2/PA28beta to enrich preferentially in kidney and liver, respectively, as well as multiple deubiquitinases to enrich preferentially in the heart. These differences might be related to the specific cellular composition of the different tissues, e.g., number of immune cells present, or the tissue-specific interaction of proteasomes with enzymes involved in the ubiquitin cycle. Given the rather preliminary nature of these findings, we have opted for not including this figure in the main manuscript, but rather include it only in this rebuttal letter.

      Author response image 3.

      UpSet plot showing overlap between ProteasomeID enriched proteins in different mouse organs.
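
      The quantity an UpSet plot displays is the number of items belonging to each exact combination of sets. A minimal sketch of that calculation is shown below; the function name and the protein identifiers in the usage note are hypothetical, and the sketch does not reproduce the data behind the image.

```python
from collections import Counter


def exclusive_intersections(sets_by_name):
    """Count items per exact combination of sets.

    Each item is assigned to the combination of all sets that contain it,
    so every item is counted exactly once.
    """
    membership = {}
    for name, items in sets_by_name.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())
```

      For example, with `{"heart": {"P1", "P2"}, "liver": {"P2", "P3"}, "kidney": {"P2"}}`, one protein is heart-only, one is liver-only, and one is shared by all three tissues.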

      Reviewer #3 (Recommendations For The Authors):

      (1) In the first paragraph of the Introduction, the authors link cellular senescence caused by partial proteasome inhibition with the efficacy of proteasome inhibitors in cancer therapy. Although this is an interesting hypothesis, I am not aware of any direct evidence for this; rather, I believe the efficacy of bortezomib/carfilzomib in haematological malignancies is most commonly attributed to these cells having adapted to high levels of proteotoxic stress (e.g., chronic unfolded protein response activation). I would suggest rephrasing this sentence.

      We thank the reviewer for the comment and have amended the introduction.

      (2) For the initial validation experiments (e.g., Fig. 1B), have the authors checked what level of Streptavidin signal is obtained with "+ bio, - tet" ? Although I accept that the induction of PSMA4-BirA* upon doxycycline addition is clear from the anti-Flag blots, it would still be informative to ascertain what level of background labelling is obtained without induction (but in the presence of exogenous biotin).

      We tested four different conditions +/- tet and +/- biotin (24h) in PSMA4-BirA* cell lines (Author response image 4). As expected, biotinylation was most pronounced when tet and biotin were added. When biotin was omitted, streptavidin signal was the lowest regardless of the addition of tet. Compared to the -biotin conditions, a slight increase of streptavidin signal could be observed when biotin was added but tet was not added. This could be either due to the promoter leaking (PMID: 12869186) or traces of tetracycline in the FBS we used, as we did not specifically use tet-free FBS for our experiments.

      Author response image 4.

      Streptavidin-HRP immunoblot following induction of BirA fusion proteins with tetracycline (+tet) and supplementation of biotin (+bio). For the sample used as expression control tetracycline was omitted (-tet). To test background biotinylation, biotin supplementation was omitted (-bio). Immunoblot against BirA and PSMA was used to verify induction of fusion proteins, while GAPDH was used as loading control.

      (3) For the proteasome structure models in Fig. 1D, a scale bar would be useful to inform the reader of the expected 10 nm labelling radius (as the authors have done later, in Fig. 2D).

      We have added 10 nm scale bars to Figure 1d.

      (4) In the "Identification of proteasome substrates by ProteasomeID" Results subsection, I believe there is a typo where the authors refer to ARMC1 instead of ARMC6.

      We have corrected the mistake.

      (5) I think Fig. S5 was one of the most compelling in the manuscript. Given the interest in confirming on-target efficacy of targeted degradation modalities, as well as identifying potential off-target effects early-on in development, I would consider promoting this out of the supplement.

      We thank the reviewer for the comment and share the excitement about using ProteasomeID for targeted degradation screening. We have moved the data on PROTACs (Figure S5) into a new main Figure 5.

      In addition, in relation to the comment of this reviewer regarding the detection of endogenous substrates, we have now included validation for one more hit emerging from our analysis (TIGD5) and included the results in Figure 4f, 4g and S4j.

    1. Author response:

      Overall recommendations.

      A brief summary of the main reviewers' recommendations that should be prioritized is listed below. Detailed recommendations as suggested by each individual reviewer are also included.

      -Better justification of the choice of the substitutions for the mutations should be added. In addition, authors should strongly consider adding more mutations to enable mechanistic tests of the proposed model for lipid conduction.

      We will characterize more mutations to the key residues at the TM4-TM6 interface. In addition to the TM4 lysine mutations shown in the original manuscript, we will include mutations to alanine and glutamate, and justify our choice of the substitutions in the revised manuscript. Furthermore, we will also test if introducing lysine mutations in TM6 will convert the ion channels into lipid scramblases. These additional experiments will greatly strengthen our conclusion.

      -Blockers to validate the concern that the recorded currents indeed arise from TMEM16A or OSCA/TMEM63 channels should be tested. Do the pore blockers also block scramblase activity in the gating mutants?

      TMEM16A and OSCA1.2 are readily expressed on the cell surface. OSCA1.2 also has a large conductance. This is why we can record large currents even with inside-out patches. We will include the TMEM16A inhibitor Ani9 and a non-specific inhibitor of OSCA channels for further validation. We have reported that Ani9 can inhibit a TMEM16A-derived lipid scramblase (L543K in TM4) in our previous publication (PMID: 31015464). We will test if Ani9 can also inhibit the other TMEM16A scramblases reported in this study. We will also examine whether Gd3+ can block lipid scrambling by the OSCA1.2 gating mutants.

      -Include details of missing experimental conditions for scramblase activity.

      We will conduct a thorough revision to include detailed experimental conditions for scramblase activity measurement.

      -Additional mutants above and below the putative lysine gate as suggested by reviewer 3 to better assess the model.

      As we explained in Response #1, we will extend our mutations around the putative activation gate.

      -Concern about whether osmolarity changes are in fact activating OSCA and TMEM63, as raised by reviewers 1 and 3. This could be addressed by assessing scramblase activity and currents at different osmolarity levels.

      We will test the engineered OSCA1.2 scramblases in response to solutions with different osmolarity.

      Reviewer #1 (Public Review):

      Summary:

      TMEM16, OSCA/TMEM63, and TMC belong to a large superfamily of ion channels where TMEM16 members are calcium-activated lipid scramblases and chloride channels, whereas OSCA/TMEM63 and TMCs are mechanically activated ion channels. In the TMEM16 family, TMEM16F is a well-characterized calcium-activated lipid scramblase that plays an important role in processes like blood coagulation, cell death signaling, and phagocytosis. In a previous study, the group demonstrated that lysine mutation in TM4 of TMEM16A can enable the calcium-activated chloride channel to permeate phospholipids too. Based on this they hypothesize that the energy barrier for lipid scramblase in these ion channels is low, and that modification in the hydrophobic gate region by introducing a charged side chain between the TM4/6 interface in TMEM16 and OSCA/TMEM63 family can allow lipid scramblase. In this manuscript, using scramblase activity via Annexin V binding to phosphatidylserine, and electrophysiology, the authors demonstrate that lysine mutation in TM4 of TMEM16F and TMEM16A can cause constitutive lipid scramblase activity. The authors then go on to show that analogous mutations in OSCA1.2 and TMEM63A can lead to scramblase activity.

      Strengths:

      Overall, the authors introduce an interesting concept that this large superfamily can permeate ions and lipids.

      Weaknesses:

      The electrophysiology data does not entirely support their claims.

      We appreciate your positive comments. We will conduct more experiments including more electrophysiology characterizations as suggested.

      Reviewer #2 (Public Review):

      This concise and focused study by Lowry and colleagues identifies a motif in the pores of three families of channel/scramblase proteins that regulate exclusive ion permeation and lipid transport. These three ion channel families, which include the TMEM16s, the plant-expressed and stress-gated cation channel OSCA, and the mammalian homolog and mechanosensitive cation channel, TMEM63 share low sequence similarity between them and have seemingly differing functions, as anion (TMEM16s), or stress-activated cation channels (OSCA/TMEM63). The study finds that in all three families, mutating a single hydrophobic residue in the ion permeation pathway of the channels confers lipid transport through the pores of the channels, indicating that TMEM16 and the related OSCA and TMEM63 channels have a conserved potential for both ion and lipid permeation. The authors interpret the findings as revealing that these channel/scramblase proteins have a relatively low "energetic barrier for scramblase" activity. The experiments themselves seem to be done with a high level of rigor and the paper is well written. A weakness is the limited scope of the experiments which, if fixed, could open up a new line of inquiry.

      We appreciate the positive comments from the reviewer. We will conduct more experiments listed in our responses to the Overall Recommendations to improve the scope and quality of our study.

      Reviewer #3 (Public Review):

      This study was focused on the conserved mechanisms across the Transmembrane Channel/Scramblase superfamily, which includes members of the TMEM16, TMEM63/OSCA, and TMC families. The authors show that the introduction of lysine residues at the TM4-TM6 interface can disrupt gating and confer scramblase activity to non-scramblase proteins. Specifically, they show this to be true for conserved TM4 residues across TMEM16F, TMEM16A, OSCA1.2, and TMEM63A proteins. This breadth of data is a major strength of the paper and provides strong evidence for an underlying linked mechanism for ion conduction and phospholipid transport. Overall, the confocal imaging experiments, patch clamping experiments, and data analysis are performed well.

      However, there are several concerns regarding the scope of experiments supporting some claims in the paper. Although the authors propose that the TM4/TM6 interface is critical to ion conduction and phospholipid scramblase activity, in each case, there is very narrow evidence of support consisting of 1-3 lysine substitutions at specific residues on TM4. Given that the authors postulate that the introduction of a positive charge via the lysine side chain is essential to the constitutive activity of these proteins, additional mutation controls for side chain size (e.g. glutamine/methionine) or negative charge (e.g. glutamic acid), or a different positive charge (i.e. arginine) would have strengthened their argument. To more comprehensively understand the TM4/TM6 interface, mutations at locations one turn above and one turn below could be studied until there is no phenotype. In addition, the equivalent mutations on the TM6 side should be explored to rule out the effects of conformational changes that arise from mutating TM4 and to increase the strength of evidence for the importance of side-chain interactions at the TM6 interface. The experiments for OSCA1.2 osmolarity effects on gating and scramblase in Figure 4 could be improved by adding different levels of osmolarity in addition to time in the hypotonic solution.

      We appreciate the positive and constructive comments from the reviewer. As we outlined in our responses to the Overall Recommendations, we will include more mutations at the TM4 and TM6 interface to further strengthen our conclusion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors examined the role of IBTK, a substrate-binding adaptor of the CRL3 ubiquitin ligase complex, in modulating the activity of the eIF4F translation initiation complex. They find that IBTK mediates the non-degradative ubiquitination of eIF4A1 and promotes cap-dependent translation initiation, nascent protein synthesis, oncogene expression, and tumor cell growth. Correspondingly, phosphorylation of IBTK by mTORC1/S6K1 increases eIF4A1 ubiquitination and sustains oncogenic translation.

      Strengths:

      This study utilizes multiple biochemical, proteomic, functional, and cell biology assays to substantiate its results. Importantly, the work nominates IBTK as a unique substrate of mTORC1, and further validates eIF4A1 (a crucial subunit of the eIF4F complex) as a promising therapeutic target in cancer. Since IBTK interacts broadly with multiple members of the translation initiation complex, it will be interesting to examine its role in eIF2alpha-mediated ER stress as well as eIF3-mediated translation. Additionally, since IBTK exerts pro-survival effects in multiple cell types, it will be of relevance to characterize the role of IBTK in mediating increased mTORC1-mediated translation in other tumor types, thus potentially impacting their treatment with eIF4F inhibitors.

      Limitations/Weaknesses:

      The findings are mostly well supported by data, but some areas need clarification and could potentially be enhanced with further experiments:

      (1) Since eIF4A1 appears to function downstream of IBTK, can the effects of IBTK KO/KD in reducing puromycin incorporation (in Fig 3A), cap-dependent luciferase reporter activity (Fig 3G), reduced oncogene expression (Fig 4A) or 2D growth/invasion assays (Fig 4) be overcome or bypassed by overexpressing eIF4A1? These could potentially be tested in future studies.

      We appreciate the reviewer for bringing up this crucial point. As per the reviewer's suggestion, we conducted experiments in which we overexpressed Myc-eIF4A1 in IBTK-KO SiHa cells. Our findings indicate that increasing eIF4A1 levels through ectopic overexpression does not reverse the decrease in puromycin incorporation (Fig. S3C) or in protein expression of eIF4A1 targets caused by IBTK ablation (Fig. S4E). These results demonstrate that the eIF4A1 dysfunction induced by IBTK ablation cannot be rescued by simply elevating eIF4A1 protein levels. Given that these results were negative, the impact of eIF4A1 overexpression on the 2D growth/invasion capacities of IBTK-KO cells was not examined further. We sincerely appreciate the reviewer's understanding regarding this matter.

      (2) The decrease in nascent protein synthesis in puromycin incorporation assays in Figure 3A suggests that the effects of IBTK KO are comparable to and additive with silvestrol. It would be of interest to examine whether silvestrol decreases nascent protein synthesis or increases stress granules in the IBTK KO cells stably expressing IBTK as well.

      We appreciate the reviewer for bringing up this crucial point. We have shown that silvestrol treatment still decreased nascent protein synthesis in IBTK-KO cells overexpressing FLAG-IBTK (Fig. S3B).

      (3) The data presented in Figure 5 regarding the role of mTORC1 in IBTK-mediated eIF4A1 ubiquitination need further clarification on several points:

      • It is not clear if the experiments in Figure 5F with Phos-tag gels are using the FLAG-IBTK deletion mutant or the peptide containing the mTOR sites, as it is mentioned on line 517, page 19: "To do so, we generated an IBTK deletion mutant (900-1150 aa) spanning the potential mTORC1-regulated phosphorylation sites". This needs further clarification.

      We appreciate the reviewer for bringing up this crucial point. The IBTK deletion mutant used in Fig. 5F is FLAG-IBTK900-1150aa. We have annotated it with smaller font size in the panel (red box) in Author response image 1.

      Author response image 1.

      • It may be of benefit to repeat the Phos tag experiments with full-length FLAG-IBTK and/or endogenous IBTK with molecular weight markers indicating the size of migrated bands.

      We appreciate the reviewer for bringing up this crucial point. We attempted to perform Phos-tag assays to detect the overexpressed full-length FLAG-IBTK or endogenous IBTK. However, we encountered difficulties in successfully transferring the full-length FLAG-IBTK or endogenous IBTK onto the nitrocellulose membrane during Phos-tag WB analysis. This is likely due to the limitations of this technique. Based on our experience, Phos-tag gels are less efficient in detecting mobility shifts of proteins with large molecular weights. As the molecular weight of IBTK protein is approximately 160 kDa, it falls within this category. Considering these technical constraints, we did not include Phos-tag assay results for full-length IBTK in our study. We sincerely appreciate the reviewer's understanding regarding this matter.

      The binding of Phos-tag to phosphorylated proteins induces a mobility shift during gel electrophoresis or other protein separation techniques. This shift allows phosphorylated proteins to be visualized and quantified separately from non-phosphorylated proteins. It is important to note that these mobility shifts indicate phosphorylation status rather than actual molecular weights. Pre-stained protein markers are typically used as a reference to assess the efficiency of protein transfer onto the membrane [Ref: 1]. For these reasons, we did not add molecular weights to the WB images.

      Reference [1]. FUJIFILM Wako Pure Chemical Corporation, https://www.wako-chemicals.de/media/pdf/c7/5e/20/FUJIFILM-Wako_Phos-tag-R.pdf

      • Additionally, torin or Lambda phosphatase treatment may be used to confirm the specificity of the band in separate experiments.

      We appreciate the reviewer for bringing up this crucial point. Torin1 is a synthetic mTOR inhibitor that prevents the binding of ATP to mTOR, leading to the inactivation of both mTORC1 and mTORC2, whereas rapamycin primarily targets mTORC1 activity and may inhibit mTORC2 in certain cell types after prolonged treatment. We have identified the mTORC1/S6K1 complex as the predominant mediator of IBTK phosphorylation. Therefore, in this context, we think that rapamycin is sufficient to inactivate the mTORC1/S6K1 pathway. As shown in Fig. 5F, the phosphorylated IBTK900-1150aa was markedly decreased while the non-phosphorylated form was simultaneously increased in rapamycin-treated cells. As per the reviewer's suggestion, we treated cells overexpressing FLAG-IBTK900-1150aa with lambda phosphatase. As shown in Fig. 5G, lambda phosphatase treatment completely abolished the mobility shifts of phosphorylated FLAG-IBTK900-1150aa. Additionally, the lowest band displayed an abundant accumulation of the non-phosphorylated form of FLAG-IBTK900-1150aa. These findings confirm that the mobility shifts observed in WB analysis correspond to the phosphorylated forms of FLAG-IBTK900-1150aa.

      • Phos-tag gels with the IBTK CRISPR KO line would also help confirm that the non-phosphorylated band is indeed IBTK.

      We appreciate the reviewer for bringing up this crucial point. As stated above, we performed Phos-tag assays to detect the mobility shifts of phosphorylated FLAG-IBTK900-1150aa. An anti-FLAG antibody, not the anti-IBTK antibody, was used for WB detection. This antibody does not exhibit cross-reactivity with endogenous IBTK.

      • It is unclear why the lower, phosphorylated bands seem to be increasing (rather than decreasing) with AA starvation/Rapa in Fig 5H.

      We appreciate the reviewer for bringing up this crucial point. We think the panel the reviewer mentioned is Fig. 5F. According to the principle of Phos-tag assays, proteins with higher phosphorylation levels have slower migration rates on SDS-PAGE, while proteins with lower phosphorylation levels have faster migration rates.

      As shown in Author response image 2, the green box indicates the most phosphorylated forms of FLAG-IBTK900-1150aa, the red box indicates the moderately phosphorylated forms of FLAG-IBTK900-1150aa, and the yellow box indicates the non-phosphorylated forms of FLAG-IBTK900-1150aa. AA starvation or Rapamycin treatment reduced the hyperphosphorylated forms of FLAG-IBTK900-1150aa (green box), while simultaneously increasing the hypophosphorylated (red box) and non-phosphorylated (yellow box) forms of FLAG-IBTK900-1150aa. Thus, we conclude that AA starvation or Rapamycin treatment leads to a marked decrease in the phosphorylation levels of FLAG-IBTK900-1150aa.

      Author response image 2.

      Reviewer #2 (Public Review):

      Summary:

      This study by Sun et al. identifies a novel role for IBTK in promoting cancer protein translation, through regulation of the translational helicase eIF4A1. Using a multifaceted approach, the authors demonstrate that IBTK interacts with and ubiquitinates eIF4A1 in a non-degradative manner, enhancing its activation downstream of mTORC1/S6K1 signaling. This represents a significant advance in elucidating the complex layers of dysregulated translational control in cancer.

      Strengths:

      A major strength of this work is the convincing biochemical evidence for a direct regulatory relationship between IBTK and eIF4A1. The authors utilize affinity purification and proximity labeling methods to comprehensively map the IBTK interactome, identifying eIF4A1 as a top hit. Importantly, they validate this interaction and the specificity for eIF4A1 over other eIF4 isoforms by co-immunoprecipitation in multiple cell lines. Building on this, they demonstrate that IBTK catalyzes non-degradative ubiquitination of eIF4A1 both in cells and in vitro through the E3 ligase activity of the CRL3-IBTK complex. Mapping IBTK phosphorylation sites and showing mTORC1/S6K1-dependent regulation provides mechanistic insight. The reduction in global translation and eIF4A1-dependent oncoproteins upon IBTK loss, along with clinical data linking IBTK to poor prognosis, support the functional importance.

      Weaknesses:

      While these data compellingly establish IBTK as a binding partner and modifier of eIF4A1, a remaining weakness is the lack of direct measurements showing IBTK regulates eIF4A1 helicase activity and translation of target mRNAs. While the effects of IBTK knockout/overexpression on bulk protein synthesis are shown, the expression of multiple eIF4A1 target oncogenes remains unchanged.

      Summary:

      Overall, this study significantly advances our understanding of how aberrant mTORC1/S6K1 signaling promotes cancer pathogenic translation via IBTK and eIF4A1. The proteomic, biochemical, and phosphorylation mapping approaches established here provide a blueprint for interrogating IBTK function. These data should galvanize future efforts to target the mTORC1/S6K1-IBTK-eIF4A1 axis as an avenue for cancer therapy, particularly in combination with eIF4A inhibitors.

      Reviewer #1 (Recommendations For The Authors):

      (1) Certain references should be provided for clarity. For example, page 15, line 418: "The C-terminal glycine glycine (GG) amino acid residues are essential for Ub conjugation to targeted proteins".

      We appreciate the reviewer for bringing up this crucial point. We have cited two fundamental review papers on the ubiquitin system (PMID: 22524316, 9759494) as references for this sentence.

      (2) Please describe the properties of the ΔBTB mutant on page 15 when first describing it. What motifs does it lack and has it been described before in functional studies?

      We appreciate the reviewer for bringing up this crucial point. We added a sentence to describe the properties of the ΔBTB mutant. This mutant lacks the BTB1 and BTB2 domains (deletion of aa 554–871), which have been previously demonstrated to be essential for binding to CUL3. The original reference has been added to the revised manuscript.

      (3) In Figure 2G how do the authors explain the fact that co-expression of the Ub K-ALLR mutant, which is unable to form polyubiquitin chains, formed only a moderate reduction in IBTK-mediated eIF4A1 ubiquitination?

      We appreciate the reviewer for bringing up this crucial point. The Ub K-ALLR mutant can indeed conjugate to substrate proteins, but because it lacks lysine residues it cannot form chains, resulting in mono-ubiquitination. Multi-mono-ubiquitination refers to the attachment of single ubiquitin molecules to multiple lysine residues on a substrate protein. It is worth noting that a poly-ubiquitinated protein and a multi-mono-ubiquitinated protein appear strikingly similar in Western blot. Our findings demonstrated that co-expression of the Ub K-ALLR mutant resulted in only a modest reduction in IBTK-mediated eIF4A1 ubiquitination (Fig. 2G), and that eIF4A1 was ubiquitinated at twelve lysine residues when co-expressed with IBTK (Fig. S2F). As such, we conclude that the CRL3-IBTK complex primarily catalyzes multi-mono-ubiquitination on eIF4A1.

      (4) In Figure 5, The identity of the seven sites in the IBTK 7ST A mutants should be specified.

      We appreciate the reviewer for bringing up this crucial point. We have specified the seven mutation sites in the IBTK-7ST A mutant (Fig. 6A).

      (5) In Figure 5, the rationale for generating antibodies only to S990/992/993, as opposed to the other mTORC1/S6K motifs should be specified.

      We appreciate the reviewer for bringing up this crucial point. Having demonstrated that IBTK can be phosphorylated (with evidence from positive Phos-tag and in vitro phosphorylation assays), we sought to directly detect changes in the phosphorylation levels using an antibody specific to phosphorylated IBTK. However, generating a separate phosphorylation-specific antibody for each of the seven sites would be prohibitively expensive. Recognizing that S990/992/993 are three adjacent sites, we deemed it appropriate to generate a single antibody to recognize the phospho-S990/992/993 epitope. Moreover, of the seven phosphorylation sites, S992 perfectly matches the consensus motif for S6K1 phosphorylation (RXRXXS). Using this antibody, we observed a substantial decrease in the phosphorylation levels of these three adjacent Ser residues in IBTK following either AA deprivation or Rapamycin treatment (Fig. 5L). We have specified these points in the manuscript.
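
      To illustrate what matching the RXRXXS consensus means in practice, the following is a minimal sketch of a motif scan, assuming X stands for any residue. The function name and the example peptide are hypothetical; the peptide is not the IBTK sequence, and the authors do not describe using such a script.

```python
import re

# Consensus motif cited in the response: R-X-R-X-X-S, with X assumed to be any residue.
# A lookahead is used so that overlapping candidate sites are also reported.
S6K1_MOTIF = re.compile(r"(?=(R.R..S))")


def find_motif_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the candidate phospho-acceptor Ser, matched hexamer)."""
    sites = []
    for match in S6K1_MOTIF.finditer(sequence.upper()):
        start = match.start(1)  # 0-based index of the first Arg in the hexamer
        sites.append((start + 6, match.group(1)))  # Ser is the 6th residue of the match
    return sites


# Hypothetical peptide for illustration only (NOT the real IBTK sequence).
example_peptide = "MKRSRAASLLGRPRTTSQ"
for position, hexamer in find_motif_sites(example_peptide):
    print(f"candidate site S{position}: {hexamer}")
```

      A scan like this only flags sequence-level candidates; whether a site is actually phosphorylated still has to be shown experimentally, as the Phos-tag and antibody data above aim to do.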

      Reviewer #2 (Recommendations For The Authors):

      The following suggestions would strengthen the study:

      (1) Directly examine the effects of IBTK modulation (knockdown/knockout/overexpression) on eIF4A1 helicase activity.

      We appreciate the reviewer for bringing up this crucial point. We agree with the reviewer's suggestion that evaluating IBTK's influence on eIF4A1 helicase activity directly would strengthen our conclusion. However, the current eIF4A1 helicase assays, as described in previous publications [Ref: 1, 2], can only be conducted using purified recombinant proteins in vitro. For instance, it is feasible to assess the varying levels of helicase activity exhibited by recombinant wild-type or mutant eIF4A1 proteins [Ref: 2]. Importantly, there is currently no reported methodology for evaluating the helicase activity of eIF4A1 in the cellular contexts mentioned by the reviewer (gene knockdown, knockout, or overexpression). Therefore, we have not performed these assays, and we sincerely appreciate the reviewer's understanding regarding this matter.

      Reference:

      [1] Chu J, Galicia-Vázquez G, Cencic R, Mills JR, Katigbak A, Porco JA, Pelletier J. CRISPR-mediated drug-target validation reveals selective pharmacological inhibition of the RNA helicase, eIF4A. Cell reports. 2016 Jun 14;15(11):2340-7.

      [2] Chu J, Galicia-Vázquez G, Cencic R, Mills JR, Katigbak A, Porco JA, Pelletier J. CRISPR-mediated drug-target validation reveals selective pharmacological inhibition of the RNA helicase, eIF4A. Cell reports. 2016 Jun 14;15(11):2340-7.

      (2) Justify why the expression of some but not all eIF4A1 target oncogenes is affected in IBTK-depleted/overexpressing cells. This is important if IBTK should be considered as a therapeutic target. The authors should consider which of the eIF4A1 targets are most impacted by IBTK KO. This would provide a more focused therapeutic approach in the future.

      We appreciate the reviewer for bringing up this crucial point. As the reviewer has pointed out, we assessed the protein levels of ten reported eIF4A1 target genes across three cancer cell lines (Fig. 4, Fig. S4A, C). We observed that IBTK depletion led to a substantial reduction in the protein levels of most eIF4A1-regulated oncogenes, although there were some exceptions. For instance, IBTK KO in H1299 cells exerted minimal influence on the protein levels of ROCK1 (Fig. S4A). Several possible explanations might account for this observation. Firstly, given that our list of eIF4A1 target genes was collected from previous studies conducted using distinct cell lines, it is not unexpected for different lines to exhibit subtle differences in the regulation of eIF4A1 target genes. Secondly, as a CRL3 adaptor, IBTK potentially performs other biological functions via ubiquitination of specific substrates; dysregulation of these could buffer the impact of IBTK KO on the protein expression of some eIF4A1 target genes. We added these comments to the Discussion section of the revised manuscript.

      (3) Expand mTOR manipulation experiments (inhibition, Raptor knockout, activation) and evaluate impacts on IBTK phosphorylation, eIF4A1 ubiquitination, and translation.

      The mTORC1 signaling pathway is constitutively active under normal culture conditions. In order to inhibit mTORC1 activation, we employed several approaches including AA starvation, Rapamycin treatment, or Raptor knockout. Our results have demonstrated that both AA starvation and rapamycin treatment led to a reduction in eIF4A1 ubiquitination (Fig. 5M). Moreover, we have included new findings in the revised manuscript, which highlight that Raptor knockout specifically decreases eIF4A1 ubiquitination (Fig. 5N). It is worth mentioning that the impacts of mTOR inhibition or activation on protein translation have been extensively investigated and documented in numerous studies. Therefore, in our study, we did not feel it necessary to examine these treatments further.

      (4) Although not absolutely necessary, it would be nice to see if some of these findings are true in other cancer cell types.

      We appreciate the reviewer for bringing up this crucial point. We concur with the reviewer's suggestion that including data from other cancer cell types would strengthen our conclusion. While the majority of our data are derived from two cervical cancer cell lines, we have corroborated certain key findings, such as the impact of IBTK on eIF4A1 and its target gene expression, in H1299 cells (human lung cancer) (Fig. 2C, Fig. S4A, B) and in CT26 cells (murine colon adenocarcinoma) (Fig. S4C, D). Additionally, we demonstrated that IBTK promotes IFN-γ-induced PD-L1 expression and tumor immune escape in both the H1299 and CT26 cells (Fig. S6A-K).

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewer comments have been helpful, and we have revised the manuscript to address the concerns of reviewer 2. In addition to text changes, we also added a negative control to Figure 1 to address concerns about photobleaching or DNA unwrapping.

      Reviewer #1:

      This manuscript presents an extremely exciting and very timely analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. Intriguingly, SWR1 loses activity almost completely if any of the acidic patches are absent. To my knowledge, this makes SWR1 the first remodeler with such a unique and pronounced requirement for the acidic patch. The authors demonstrate that SWR1 affinity is dramatically reduced if at least one of the acidic patches is absent, pointing to a key role of the acidic patch in SWR1 binding to the nucleosome. The authors also pinpoint a specific subunit - Swc5 - that can bind nucleosomes, engage the acidic patch, and obtain a cryo-EM structure of Swc5 bound to a nucleosome. They also identify a conserved arginine-rich motif in this subunit that is critical for nucleosome binding and histone exchange in vitro and for SWR1 function in vivo. The authors provide evidence that suggests a direct interaction between this motif and the acidic patch.

      Strengths:

      The manuscript is well-written and the experimental data are of outstanding quality and importance for the field. This manuscript significantly expands our understanding of the fundamentally important and complex process of H2A.Z deposition by SWR1 and would be of great interest to a broad readership.

      We thank the reviewer for their enthusiastic and positive comments on our work.

      Reviewer #2:

      Summary:

      In this study, Baier et al. investigated the mechanism by which SWR1C recognizes nucleosomal substrates for the deposition of H2A.Z. Their data convincingly demonstrate that the nucleosome's acidic patch plays a crucial role in the substrate recognition by SWR1C. The authors presented clear evidence showing that Swc5 is a pivotal subunit involved in the interaction between SWR1C and the acidic patch. They pared down the specific region within Swc5 responsible for this interaction. However, two central assertions of the paper are less convincing. First, the data supporting the claim that the insertion of one Z-B dimer into the canonical nucleosome can stimulate SWR1C to insert the second Z-B dimer is somewhat questionable (see below). Given that this claim contradicts previous observations made by other groups, this hypothesis needs further testing to eliminate potential artifacts. Secondly, the claim that SWR1C simultaneously recognizes the acidic patch on both sides of the nucleosome also needs further investigation, as the assay used to establish this claim lacks the sensitivity necessary to distinguish any difference between nucleosomal substrates containing one or two intact acidic patches.

      Strengths:

      As mentioned in the summary, the authors presented clear evidence demonstrating the role of Swc5 in recognition of the nucleosome acidic patch. The identification of the specific region in Swc5 responsible for this interaction is important.

      We thank the reviewer for their careful critique of our work. Below we address each major concern.

      Major comments: (1) Figure 1B: It is unclear how much of the decrease in FRET is caused by the bleaching of fluorophores. The authors should include a negative control in which Z-B dimers are omitted from the reaction. In the absence of ZB dimers, SWR1C will not exchange histones. Therefore, any decrease in FRET should represent the bleaching of fluorophores on the nucleosomal substrate, allowing normalization of the FRET signal related to A-B eviction.

      In this manuscript, as well as in our two previous publications (Singh et al., 2019; Fan et al., 2022), we have presented the results of no enzyme controls, +/- ZB dimers, no ATP controls, or AMP-PNP controls for our FRET-based H2A.Z deposition assay (see also Figure S3). We do not observe significant levels of photobleaching in this assay, either during ensemble measurements or in an smFRET experiment. To aid the reader, we have added the AMP-PNP data for the experiment shown in Figure 1B. The results show that there is less than a 10% decrease in FRET over 30 min, and the signal from the double acidic patch disrupted nucleosome is identical to this negative control.

      (2) Figure S3: The authors use the decrease in FRET signal as a metric of histone eviction. However, Figure S3 suggests that the FRET signal decrease could be due to DNA unwrapping. Histone exchange should not occur when SWR1C is incubated with AMP-PNP, as histone exchange requires ATP hydrolysis (10.7554/eLife.77352). And since the insertion of Z-B dimer and the eviction of A-B dimer are coupled, the decrease of FRET in the presence of AMP-PNP is unlikely due to histone eviction or exchange. Instead, the FRET decrease is likely due to DNA unwrapping (10.7554/eLife.77352). The authors should explicitly state what the loss of FRET means.

      We agree with the reviewer that loss of FRET can be due to DNA unwrapping from the nucleosome. We have previously demonstrated this activity by SWR1C in our smFRET study (Fan et al., 2022). However, DNA unwrapping is highly reversible and lasts only 1-3 seconds. Neither we nor others have observed stable unwrapping of nucleosomes by SWR1C; rather, the stable loss of FRET reports on dimer eviction. We assume the reviewer is concerned about the rather large decrease in FRET signal shown in the AMP-PNP controls for Figure S3, panels A and D. For the other 7 panels, the decrease in FRET with AMP-PNP is minimal. In fact, if we average all of the AMP-PNP data points, the rate of FRET loss is not statistically different from no enzyme control reactions (nucleosome plus ZB dimers).

      Data for panels A and D used a 77N0 nucleosomal substrate, with Cy3 labeling the linker distal dimer. This is our standard DNA fragment, and it was used in Figure 1B. The only difference between data sets is that the data shown in Fig 1B used nucleosomes reconstituted with a Cy5-labelled histone octamer, rather than the hexasome assembly method used for Fig S3. Three points are important. First, for all of these substrates, we assembled 3 independent nucleosomes, and the results are highly reproducible. Second, we performed a total of 6 experiments for the 77N0-Cy5 substrates to ensure that the rates were accurate (+/-ATP). Third, and most important, we do not see this decrease in FRET signal in the absence of SWR1C (no enzyme control). These data were included in the data source file. Thus, it appears that there is significant SWR1C-induced nucleosome instability for these two hexasome-assembled substrates. We now note this in the legend to Figure S3. Key for this work, however, is that there is a large increase in the rate of FRET loss in the presence of ATP, and this rate is faster when a ZB dimer is present at the linker proximal location. In response to the last point, we state in the first paragraph of the results: “The dimer exchange activity of SWR1C is monitored by following the decrease in the 670 nm FRET signal due to eviction of the Cy5-labeled AB-Cy5 dimer (Figure 1A).”

      (3) Related to point 2. One way to distinguish nucleosomal DNA unwrapping from histone dimer eviction is that unwrapping is reversible, whereas A-B eviction is not. Therefore, if the authors remove AMP-PNP from the reaction chamber and a FRET signal reappears, then the initial loss of FRET was due to reversible DNA unwrapping. However, if the removal of AMP-PNP did not regain FRET, it means that the loss of FRET was likely due to A-B eviction. The authors should perform an AMP-PNP and/or ATP removal experiment to make sure the interpretation of the data is correct.

      See response to item 2 above.

      (4) The nature of the error bars in Figure 1C is undefined; therefore, the statistical significance of the data is not interpretable.

      We apologize for not making this more explicit for each figure. The error bars report on 95% confidence intervals from at least 3 sets of experiments. This statement has been added to the legend.
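
      For readers unfamiliar with how such intervals are obtained from a handful of replicates, the following is a minimal sketch assuming a Student's t-based confidence interval of the mean. The response does not state which method the authors used, and the replicate values below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t for small degrees of freedom (n - 1).
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def ci95(values: list[float]) -> tuple[float, float]:
    """Return the (lower, upper) 95% confidence interval of the mean for 3-6 replicates."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only tabulates t for 3 to 6 replicates")
    centre = mean(values)
    # Half-width = t * (sample standard deviation / sqrt(n))
    half_width = T_CRIT_95[df] * stdev(values) / sqrt(n)
    return centre - half_width, centre + half_width


# Hypothetical exchange rates from three independent experiments (arbitrary units).
rates = [0.11, 0.13, 0.12]
lower, upper = ci95(rates)
print(f"mean = {mean(rates):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

      With only three replicates the t multiplier is large (4.303), so intervals computed this way are considerably wider than the standard error alone would suggest.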

      (5) The authors claim that the SWR1C requires intact acidic patches on both sides of the nucleosomes to exchange histone. This claim was based on the experiment in Figure 1C where they showed mutation of one of two acidic patches in the nucleosomal substrate is sufficient to inhibit SWR1C-mediated histone exchange activity. However, one could argue that the sensitivity of this assay is too low to distinguish any difference between nucleosomes with one (i.e., AB/AB-apm) versus two mutated acidic patches (i.e., AB-apm/AB-apm). The lack of sensitivity of the eviction assay can be seen when Figure 1B is taken into consideration. In the gel-shift assay, the AB-apm/AB-apm nucleosome exhibited a 10% SWR1C-mediated histone exchange activity compared to WT. However, in the eviction assay, the single AB/AB-apm mutant has no detectable activity. Therefore, to test their hypothesis, the authors should use the more sensitive in-gel histone exchange assay to see if the single AB/AB-apm mutant is more or equally active compared to the double AB-apm/AB-apm mutant.

      Our pincher model is based on three independent sets of data, not just Figure 1C. First, as noted by the reviewer, we find that disruption of either acidic patch cripples the dimer exchange activity of SWR1C in the FRET-based assay. Whether the defect is identical to that of the double APM mutant nucleosome does not seem pertinent to the model. In a second set of assays, we used fluorescence polarization to quantify the binding affinity of SWR1C for wildtype nucleosomes, a double APM nucleosome, or each single APM nucleosome. Consistent with the pincher model, each single APM disruption decreases binding affinity at least 10-fold (below the sensitivity of the assay). Finally, we monitored the ability of different nucleosomes to stimulate the ATPase activity of SWR1C. Consistent with the pincher model, a single APM disruption was sufficient to eliminate nucleosome stimulation.

      (6) The authors claim that the AZ nucleosome is a better substrate than the AA nucleosome. This is a surprising result as previous studies showed that the two insertion steps of the two Z-B dimers are not cooperative (10.7554/eLife.77352 and 10.1016/J.CELREP.2019.12.006). The authors' claim was based on the eviction assay shown in Fig 1C. However, I am not sure how much variation in the eviction assay is contributed by different preparations of nucleosomes. The authors should use the in-gel assay to independently test this hypothesis.

      For all data shown in our manuscript, at least three different nucleosome preparations were used. The impact of a ZB dimer on the rates of dimer exchange was highly reproducible among different nucleosome preparations and experiments. We also see reproducible ZB stimulation for three different substrates: with ZB on the linker proximal side, on the linker distal side, and on one side of a core particle. We do not believe that our data are inconsistent with previous studies. First, the previous work referenced by the reviewer performed dimer exchange reactions with a large excess of nucleosomes to SWR1C (catalytic conditions), whereas we used single turnover reactions. Second, our study is the first to use a homogeneous, ZA heterotypic nucleosome as a substrate for SWR1C. All previous studies used a standard AA nucleosome, following the first and second rounds of dimer exchange that occur sequentially. Finally, we observe only a 20-30% increase in rate by a ZB dimer (e.g. 77N0 substrates), and such an increase was unlikely to have been detected by previous gel-based assays.

      Minor comments:

      (1) Abstract line 4: To say 'Numerous' studies have shown acidic patch impact chromatin remodeling enzymes activity may be too strong.

      Removed

      (2) Page 15, line 15: The authors claim that swc5∆ was inviable on formamide media. However, the data in Figure 8 shows cell growth in column 1 of swc5∆.

      The term ‘inviable’ has been replaced with ‘poor’ or ‘slow growth’.

      (3) The authors should use standard yeast nomenclature when describing yeast genes and proteins. For example, for Figure 8 and legend, Swc5∆ was used to describe the yeast strain BY4741; MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YBR231c::kanMX4. Instead, the authors should describe the swc5∆ mutant strain as BY4741 MAT a his3∆1 leu2∆0 met15∆0 ura3∆0 swc5∆::kanMX4. Exogenous plasmid should also be indicated in italics and inside brackets, such as [SWC5-URA3] or [swc5(R219A)-URA3].

      We apologize for missing this mistake in the Figure 8 legend. We had inadvertently copied this from the EUROSCARF entry and forgot to edit it. We decided not to add all the plasmid names to the figure, as it became too cluttered. We state in the figure legend that the panels show growth of swc5 deletion strains harboring the indicated swc5 alleles on CEN/ARS plasmids.

      (4) According to Lin et al. 2017 NAR (doi: 10.1093/nar/gkx414), there is only one Swc5 subunit per SWR1C. Therefore, the pincher model proposed by the authors would suggest that there is a missing subunit that recognizes the second acidic patch. The authors should point out this fact in the discussion. However, as mentioned in Major comment 6, I am not sure if the pincer model is substantiated.

      In our discussion, we had noted that the published cryoEM structure suggested that the Swc2 subunit likely interacts with the acidic patch on the dimer that is not targeted for replacement, and we proposed that Swc5 interacts with the acidic patch on the exchanging H2A/H2B dimer. We have now made this clearer in the text.

    1. Author response:

      We thank the reviewers for the feedback on our manuscript; we are planning to address the raised concerns in the following manner:

      We will be more explicit about the novelty of this method, framing it more concretely within the scope of current research. From some of the reviewers' comments, we understand that it is not clear that our method is an extension of an existing method and model that has been extensively validated, with pre-trained models brought online. Consequently, the details of the model and of the training cohort are covered only briefly, with references to relevant published works on this topic. We will improve the clarity in this respect in the full responses. Nevertheless, we agree that the work would benefit from a simulation study that formally evaluates the performance of our method compared with more traditional approaches, and we will add it in our full responses. We will take particular care to investigate the effect of assumptions such as centile stability in healthy controls, as suggested by Reviewer 2.

      The novelty of this work lies in introducing a mathematically transparent method to use normative modelling for evaluating studies with a longitudinal design, using normative models trained on cross sectional data. We emphasise strongly that this is otherwise not possible using current methods. Furthermore, by building on a pre-trained model, this method enjoys the benefits of big (cross-sectional) data (by the pre-trained model being fitted on an extensive population sample) without the need to have direct access to them, or a ‘big’ longitudinal dataset from the cohort at hand. This is crucial in neuroimaging, where longitudinal data are much more scarce than cross-sectional data.

      We strongly disagree with the notion raised by Reviewer 1 that after the first episode cortical thickness alterations are expected to become more severe. There is now increasing evidence that: (i) trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode and (ii) that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode. Indeed, we can provide evidence for this in an independent cohort, with different analytical methodologies, where precisely this occurs (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v1, https://pubmed.ncbi.nlm.nih.gov/36805840/). In the full revision, we would be happy to provide further discussion of evidence in support of this.

      We would also like to re-emphasise that the data were processed with the utmost rigour using state-of-the-art processing pipelines, including quality control.

      We will take care to improve the flow of the manuscript, with special attention to the theoretical part and the sections highlighted by Reviewer 2.

      We agree with the challenge outlined by Reviewer 2 regarding the limitations in interpreting overall trends when the position at the first visit differs between subjects. However, this is a much broader challenge and is not specific to this study. The non-random sampling of large cohort studies is problematic for nearly all studies using such cohorts, regardless of the statistical approach used. We will explicitly acknowledge these limitations in the full response.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This solid study investigates the transdifferentiation of chicken embryonic fibroblasts into muscle and fat cells in 3D to create whole-cut meat mimics. The study is important and provides a method to control muscle, fat, and collagen content within the 3D meat mimics and thus provides a new avenue for customized cultured meat production. Limitations of this study include the use of transgene for transdifferentiation and thus the creation of GMO food.

      We are grateful for the substantial effort that the editors and reviewers put into assessing our manuscript and providing insightful feedback. We have tried to address, as far as possible, all comments and criticisms. We believe that we now have a significantly improved manuscript. A point-by-point response follows.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors presented here a novel 3D fibroblast culture and transdifferentiation approach for potential meat production with GelMA hydrogel.

      Strengths:

      (1) Reduced serum concentration for 3D chicken fibroblast culture and transdifferentiation is optimized.

      (2) Efficient myogenic transdifferentiation and lipogenesis as well as controlled fat deposition are achieved in the 3D GelMA.

      Weaknesses:

      (1) While the authors stated the rationale of using fibroblasts instead of myogenic/adipogenic stem cells for meat production, the authors did not comment on the drawbacks/disadvantages of genetic engineering (e.g., forced expression of MyoD) in meat production.

      We thank the reviewer for raising this important issue. We have now described this drawback in the Discussion.

      As a proof-of-concept study, we sought to explore the potential of integrated transgene tools for overexpressing a transdifferentiation factor to achieve maximum muscle production. However, it is important to acknowledge that genetically modified meat products derived from genetically engineered cultured cells will not be suitable in terms of consumer acceptance and market viability. We are currently testing non-integrating delivery methods, such as modRNAs and chemical cocktails, to induce myogenic transdifferentiation in fibroblasts. We believe these non-integrating methods would be compatible with meat production and consumer acceptance.

      Please see lines 439-445.

      “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products.”

      (2) While the authors cited one paper to state the properties and applications of GelMA hydrogel in tissue engineering and food processing, concerns/examples of the food safety with GelMA hydrogel are not discussed thoroughly.

      Thank you for pointing out this issue. We now discuss the drawbacks of using GelMA hydrogel for meat production in the main text.

      GelMA-based hydrogels have shown great potential due to their biocompatibility and mechanical tunability. GelMA is widely used in 3D cell culture and tissue engineering for regenerative medicine, but is less common in food processing and agricultural applications. Its photo-crosslinking properties, biocompatibility, and degradability allow it to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for cultured meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Future research will carefully consider GelMA hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety-compliant cultured meat production (Bomkamp et al., 2022).

      Bomkamp, C., Skaalure, S. C., Fernando, G. F., Ben‐Arye, T., Swartz, E. W., & Specht, E. A. (2022). Scaffolding biomaterials for 3D cultivated meat: prospects and challenges. Advanced Science (Weinh), 9(3), 2102908.

      Jeong, D., Seo, J. W., Lee, H. G., Jung, W. K., Park, Y. H., & Bae, H. (2022). Efficient Myogenic/Adipogenic Transdifferentiation of Bovine Fibroblasts in a 3D Bioprinting System for Steak-Type Cultured Meat Production. Advanced Science (Weinh), 9(31), e2202877.

      Li, Y., Liu, W., Li, S., Zhang, M., Yang, F., & Wang, S. (2021). Porcine skeletal muscle tissue fabrication for cultured meat production using three-dimensional bioprinting technology. Journal of Future Foods, 1(1), 88-97.

      Park, S., Hong, Y., Park, S., Kim, W., Gwon, Y., Jang, K.-J., & Kim, J. (2023). Designing Highly Aligned Cultured Meat with Nanopatterns-Assisted Bio-Printed Fat Scaffolds. Journal of Biosystems Engineering, 48(4), 503-511.

      We discussed the drawbacks of GelMA hydrogel. Please see lines 445-457.

      “Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”

      (3) In Fig. 4C, there seems no significant difference in the Vimentin expression between Fibroblast_MyoD and Myofibroblast. The conclusion of "greatly reduced in the myogenic transdifferentiated cells" is overstated.

      Thanks for pointing out this mistake.

      We revised the wording accordingly. Vimentin expression was reduced in Fibroblast_MyoD cells compared with the original fibroblasts.

      Please see lines 231-233.

      “The fibroblast intermediate filament Vimentin (Tarbit et al., 2019) was abundantly expressed in the fibroblasts but reduced in the myogenic transdifferentiated cells (Figure 4C)”

      (4) The presented cell culture platform is only applied to chicken fibroblasts and should be tested in other species such as pigs and fish.

      Thank you for the suggestion.

      In this pilot cultured meat study, we utilized chicken embryonic fibroblasts. These cells were chosen for their near-immortal nature and robustness in culture, as well as their inducible myogenic capacity. In our previous experiments (Ren et al., Cell Reports, 2022, 40:111206), we tested the myogenic transdifferentiation potential of fibroblasts from mice, pigs, and chickens, and observed varying efficiencies of myogenesis. It is important to note that fibroblasts derived from different species, or even from different tissues within the same species, can differ significantly in their capacity for myogenic and adipogenic transdifferentiation.

      In this proof-of-concept study, we used only one source of fibroblasts to test cultured meat production and confirmed that myogenic/adipogenic transdifferentiation can be manipulated as a feasible means of precisely controlling muscle, fat, and collagen content. We would expect fibroblasts of different origins to display different transdifferentiation efficiencies and thus produce different muscle/fat ratios in meat mimics. That is beyond the scope of the current study.

      Furthermore, we are also testing myogenic/adipogenic transdifferentiation of pig fibroblasts through non-genomic integration approaches. We believe that only non-transgene tools are viable solutions for cultured meat production in the future. We added the species information to the Discussion.

      See lines 515-517.

      “This approach can be readily extrapolated to other species such as pigs and presents promising avenues for the large-scale production of customized and versatile meat products that may cater to varying consumer preferences.”

      Reviewer #2 (Public Review):

      The manuscript by Ma et al. tries to develop a protocol for cell-based meat production using chicken fibroblasts as three-dimensional (3D) muscle tissues with fat accumulation. The authors used genetically modified fibroblasts which can be forced to differentiate into muscle cells and formulated 3D tissues with these cells and a biphasic material (hydrogel). The degrees of muscle differentiation and lipid deposition in culture were determined by immunohistochemical, biochemical, and molecular biological evaluations. Notably, the protocol successfully achieved the process of myogenic and lipogenic stimulation in the 3D tissues.

      Overall, the study is reasonably designed and performed including adequate analysis. The manuscript is clearly written with well-supported figures. While it presents valuable results in the field of cultivated meat science and skeletal muscle biology, some critical concerns were identified. First, it is unclear whether some technical approaches were really the best choice for cell-based meat production. Next, more careful evaluations and justifications would be required to properly explain biological events in the results. These points include additional evaluations and considerations with regard to myocyte alignment and lipid accumulation in the differentiated 3D tissues. The present data are very suggestive in general, but further clarifications and arguments would properly support the findings and conclusions.

      We thank the reviewer for these comments. We have performed additional experiments and analyses to address the critical questions. We have also revised the text extensively to clarify or discuss some of the concerns, such as cell alignment and the cellular distribution of intramuscular fat. We expect that the revised data and text adequately support the conclusions of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1, the authors used 1% chicken serum. Have the authors tested other lower concentrations? It will be interesting to see the lowest chicken serum concentrations in fibroblast culture and transdifferentiation;

      Thank you for your suggestion.

      Yes, we have tested lower serum concentrations, such as 1% FBS and 0.5% chicken serum. However, the cells are not healthy at these low serum levels, as shown by abnormal cell morphology and almost no cell growth. Please see the revised Supplementary Figure S1D, to which we added the 1% FBS and 0.5% chicken serum data. Hence, 1% chicken serum is optimal in our hands. We will also test other types of specialized serum-free medium in future experiments.

      (2) In Figure 2, the authors should quantify the fold expansion of fibroblasts cultured in 3D gel after 1, 3, 5, and 9 days since this data is important for future meat manufacturing. In addition, long-term expansion (e.g., 1 month) in 3D gel should also be shown;

      Thanks for the question. We quantified cell growth in 3D by measuring PKH26-stained cells. After the cells were seeded into the gel, they proliferated exponentially from day 1 to day 9. These cell proliferation data provide a good reference for future meat manufacturing (Figure 2D). We also attempted long-term expansion in 3D but were unable to measure cell proliferation, because the 3D gel always collapsed after 12-15 days in culture. The reason is unknown: either the cells became too crowded and compromised the gel structure, or the gel matrix itself is not strong enough to hold up over the long term. We believe the cells will grow well over the long term if given enough 3D attachment surface, since they grow indefinitely in 2D. We will test different 3D matrices in the future.

      Please see the revised Figure 2D for the quantification of cells.
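
      For readers who want to turn growth data of this kind into manufacturing-relevant numbers, the following minimal sketch computes fold expansion and an apparent doubling time. It assumes purely exponential growth between two time points; the function names and the cell counts are hypothetical and are not taken from Figure 2D.

```python
import math

def fold_expansion(count_start, count_end):
    """Ratio of the final to the initial cell count."""
    return count_end / count_start

def doubling_time(count_start, count_end, days_elapsed):
    """Apparent doubling time in days, assuming exponential growth."""
    return days_elapsed * math.log(2) / math.log(count_end / count_start)

# Hypothetical counts of stained cells on day 1 and day 9.
day1_count, day9_count = 1000, 16000
print(fold_expansion(day1_count, day9_count))        # fold change over 8 days
print(doubling_time(day1_count, day9_count, 9 - 1))  # days per doubling
```

      The doubling time is only meaningful over the window in which growth is actually exponential, so it should not be extrapolated past the point at which the gel begins to collapse.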

      (3) In Figure 3, please also show MyoD staining as it'll be interesting to see the expression of exogenous and endogenous MyoD expression after dox treatment. In Figure G, the hydrogel meat seems very small, please show/discuss the maximum size of hydrogel meat that may be achieved using this approach;

      Thank you for requesting this information. We performed immunostaining with anti-MyoD and anti-Flag antibodies to show the expression of total MyoD (exogenous and endogenous) and of exogenous MyoD alone after dox treatment. MyoD and 3xFlag were fused in-frame in the transgene plasmid, so anti-Flag staining indicates exogenous MyoD expression, whereas anti-MyoD staining indicates exogenous and endogenous MyoD together.

      As shown in Figure S4, almost 100% of cells were positive for MyoD staining, and 60% of these expressed Flag. These data are consistent with our previous results (Ren et al., 2022, Cell Reports).

      Author response image 1.

      As for the size of the hydrogel-based cultured meat, we discussed the possibility of scalable production of hydrogel-based whole-cut meat mimics. Please see lines 446-449. “Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters.”

      (4) In Figure 5 and Supplementary Figure 6, please quantify the Oil-red O+ fat cells in the 2D and 3D lipogenic induction. Also in Fig. 6B, quantify the oil-red+MHC+ cells;

      Thank you for this advice. We quantified the Oil Red O-stained images in the Results section “Stimulate the fat deposition in chicken fibroblasts in 3D” using the image analysis software ImageJ, and added the quantification of Oil Red O area to the corresponding graphs (Figure 5C, Figure S6C and S6F).

      However, due to the unique structure of the 3D matrix, many MHC+ and Oil Red O+ double-positive cells overlap with each other across different Z-stack layers in 3D. This overlap makes it challenging to accurately position and quantify the double-positive cells as the different layers interfere with each other.

      (5) In Figure 7, please show immunostaining images of collagen and other major ECMs;

      Thank you for this question. We tried to stain the collagen networks with Picrosirius Red but were unsuccessful. Instead, we used laminin immunostaining to confirm that the ECM content in the 3D matrix increases steadily during cell culture.

      Please see Figure 7C. Lines 346-348.

      “the laminin protein content was accumulated and increased steadily during 3D culturation (Figure 7C)”

      (6) In Figure 8, please show hierarchical clustering analysis of whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI. A Venn Diagram showing the overlap and distinct gene expression among these groups is also appreciated.

      Thank you for the suggestion.

      We added a hierarchical clustering analysis of the whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI, using Euclidean distance with the ward.D clustering method. The results showed that these groups formed two large clusters, in which 3D+FI clustered separately and 3D_fibroblasts, 3D_MyoD, and 3D_MyoD+FI were more similar to one another. Please see Figure 8B.

      As the reviewer suggested, we also compared the transcriptomes of 3D_MyoD, 3D+FI, and 3D_MyoD+FI with that of the original 3D_fibroblasts to identify differentially expressed genes (DEGs), and then analyzed the overlapping and distinct DEGs. As shown in Figure 8D, the Venn diagram showed that the majority of DEGs from 3D_MyoD+FI (3D_MyoD+FI versus 3D_fibroblasts) overlap with those of 3D_MyoD and 3D+FI, indicating that 3D_MyoD+FI cells are compatible with both myogenic and adipogenic function.

      Please see the revised Figure 8.
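
      The overlap analysis behind a Venn diagram of this kind can be sketched with plain set operations. The code below is illustrative only: the function name `venn_regions` and the placeholder gene identifiers are hypothetical and do not reproduce the DEG lists in Figure 8D.

```python
def venn_regions(deg_sets):
    """Map each combination of group names to the genes found in exactly those groups."""
    all_genes = set().union(*deg_sets.values())
    regions = {}
    for gene in all_genes:
        members = frozenset(name for name, genes in deg_sets.items() if gene in genes)
        regions.setdefault(members, set()).add(gene)
    return regions

# Hypothetical DEG lists, each defined relative to 3D_fibroblasts.
deg_sets = {
    "3D_MyoD": {"g1", "g2", "g3"},
    "3D+FI": {"g3", "g4", "g5"},
    "3D_MyoD+FI": {"g2", "g3", "g4", "g6"},
}
regions = venn_regions(deg_sets)

# Fraction of 3D_MyoD+FI DEGs shared with at least one other group.
combined = deg_sets["3D_MyoD+FI"]
shared = combined & (deg_sets["3D_MyoD"] | deg_sets["3D+FI"])
print(len(shared) / len(combined))
```

      Each key of `regions` corresponds to one area of the Venn diagram, so the sizes of the value sets are the counts that would be printed in the figure.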

      Reviewer #2 (Recommendations For The Authors):

      In this study, the authors demonstrated a new approach for cultivated meat production using chicken fibroblasts. Specifically, the cells were cultured as 3D and induced muscle differentiation and lipid deposition. The manuscript contains a good set of data, which would be valuable to researchers in the fields of both cell-based meat and skeletal muscle biology. From the aspect of cultivated meat science, the rationale behind the idea is understandable, but it remains unclear whether the proposed approach was really the best choice to achieve their final goal. On the other hand, when we read this manuscript as a paper in skeletal muscle biology, the overall approach was not innovative enough and several uncertain issues remain. The authors should add more sufficient justifications, arguments, and discussions.

      (1) When considering their goal to produce edible meat products, the current approach has some concerns. First, there are issues with the approach used for the induction of myogenesis by MyoD transgene. This makes the end products GMO foods, which are not easily acceptable to a wide range of consumers. Next, the hydrogel was used for 3D tissue formation, but it is unclear whether this matrix type is edible, safe, and bio-comparable for cell-based meat production. The authors already discussed these points by excusing that the current work remains proof-of-concept. However, more careful considerations and justifications would be required.

      Thank you for the suggestion.

      We acknowledge that the current transgene-based myogenic induction method is not suitable for mass production of cultured meat because of GMO food concerns. We used the MyoD transgene as the means of myogenic transdifferentiation in the first place because of its ease of genetic manipulation and maximum efficiency. We are currently testing non-genomic integration tools, such as chemical cocktails and modified RNAs, for myogenic transdifferentiation.

      When it comes to the applications of hydrogel in the food industry, certain types of hybrid hydrogels, such as those made from pectin or sodium polyacrylate, are not only edible but also safe for consumption. While GelMA hydrogel is typically utilized in tissue engineering and subsequent implantation in patients for therapeutic regenerative medicine purposes, it has not been commonly employed in food processing. In this study, we cultivated cells within GelMA hydrogel due to its durability and ease of use in cell culture. Moving forward, we plan to investigate alternative types of matrices to develop cultured meat suitable for food applications.

      We have now described the GMO and hydrogel drawbacks in the discussion part. Please see lines 439-457.

      “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products. Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”

      (2) From the view of skeletal muscle biology, the approaches (MyoD overexpression, hydrogel-based 3D tissue formation, and lipogenic induction) have already been tested.

      Thank you for the insightful comments from the perspective of skeletal muscle cell biology. We fully agree that the current approaches, including MyoD overexpression, 3D cell culture, and lipogenic induction, are routine experiments in muscle cell biology. However, we want to highlight that using these classical and robust muscle cell approaches, combined with the unique advantages of fibroblast cells (easily accessible, immortalized, cost-effective, ...), provides a novel and practical avenue for cultured meat production. We have stated these points in the Discussion of the revised manuscript.

      Please see lines 511-515.

      “In conclusion, we have effectively utilized immortalized chicken fibroblasts in conjunction with classical myogenic/adipogenic transdifferentiation approaches within 3D hydrogel to establish a cultured meat model. This model allows for the precise regulation of the synthesis of key components found in conventional meat, including muscle, fat, and ECM.”

      (3) The common emphasis in this manuscript is to use the advantages of 3D culture for tissue differentiation. As the authors described, skeletal muscle is a highly aligned tissue. In this study, some results successfully demonstrated advantages in terms of myocyte alignment, maturation, and lipid deposition. However, the current results cannot address whether the entire 3D tissues maintained these advantageous characteristics or not. Because the method for 3D formation does not have any additional modifications to make the cells aligned, like micropatterning, scaffolding, or bioprinting.

      Thank you for the suggestion.

      We agree with the reviewer that skeletal muscle tissues are composed of well-organized, directional bundles of fibers, and that cell alignment would greatly affect meat tenderness and sensory properties. It would therefore be desirable for the cells in the cultured meat matrix to be aligned. However, such alignment would require sophisticated biomaterial engineering, mainly involving scaffold manipulation, which is beyond the scope of this study. The hydrogel used in this study forms pores of different sizes in random directions, so we would expect the embedded cells to be entirely non-directional. Nevertheless, we still found localized cell alignment in some parts of the gel matrix, which confirms cell-cell interactions; please see Figure 3D. We describe this feature in the Results. In the future, we will test whether applying physical or electrical stimulation to the matrix can improve alignment so that the muscle cells throughout the whole matrix align together.

      Please see lines 186-190.

      “The separate XY axis views of the orthogonal projections at different depths (Figure 3D) and a multi-angle video (Supplementary Video 2) also showed the several myotubes were aligned together. Nevertheless, many myotubes were oriented in different directions, preventing the entire matrix from aligning in one direction.”

      (4) In the skeletal muscle, fat accumulation mainly occurs in adipocytes between myocytes. This means that "intra-" muscular fat deposition is identified. However, lipid deposition within myocytes also occurred in this preparation (Supplementary Figure 7C). This situation is not "intra-" muscular accumulation, which sounds different from what is going on in normal skeletal muscle tissues. Please explain what happened and what biological situations accounted for this. Also, the authors should clarify better how lipogenesis was induced in the 3D tissues, such as cell types (transdifferentiated myocytes, remained/un-transdifferentiated fibroblasts, or both).

      Thank you for the very insightful question. We have revised the corresponding text to further explain the intramuscular fat distribution in different cell types in culture meat.

      We agree with the reviewer that intramuscular fat accumulation occurs mainly in intramuscular adipocytes. However, under some pathological and physiological conditions in humans and animals, lipid droplets are also abundant inside myofibers (intramyocellular lipids within the myofiber cytoplasm). For instance, high intramyocellular lipid content has been found in insulin-resistant patients and, paradoxically, in endurance-trained athletes (doi.org/10.1016/j.tem.2012.05.009), as well as in some farm animals under intensive selective breeding (doi:10.2174/1876142910901010059). In the current study, Oil Red O staining of lipid droplets identified lipid deposition in both the transdifferentiated myocytes and the remaining un-transdifferentiated fibroblasts in the culture meat. This lipid distribution pattern is comparable to the intramuscular fat storage pattern observed in some humans and animals, in which fat accumulates both in myofibers (intramyocellular lipids) and in intramuscular adipocytes (extramyocellular lipids), which reside within the muscle tissue bundle but between myofibers. We reason that the current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and the un-transdifferentiated fibroblasts. It is difficult to compare the absolute amount of lipid between these two cell types by Oil Red O staining, and it is almost impossible to separate them from the 3D meat mimics. Thus, we can only confirm that lipid deposition occurs in both transdifferentiated myocytes and un-transdifferentiated fibroblasts, without knowing which is the dominant contributor to the intramuscular fat content of the culture meat.

      Please see lines 486-492.

      “In this study, the deposition of fat in the myotubes/myofibers facilitated the storage of significant lipid quantities in transdifferentiated muscle cells, known as intramyocellular lipids. Additionally, we observed Oil Red O staining in the remaining un-transdifferentiated fibroblasts, resembling cells of intramuscular adipocytes (extramyocellular lipids) found within muscle tissue. Hence, current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and un-transdifferentiated fibroblasts.”

    1. Author response:

      Reviewer #1 (Public Review):

      Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.

      Most published studies are of men living with HIV-1 subtype B and do not cover the hyperacute infection phase, so a direct head-to-head comparison with the FRESH study is difficult. However, we can cite and contrast our study with a few examples from other acute infection studies, as follows.

      (1) Peluso et al., JCI, 2020, showed that in Caucasian men with subtype B infection (SCOPE study) who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year. In our study, we showed that in chronic treated participants, genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      (2) White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the decay of intact viral genomes in the first phase of decay was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      (3) A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of predominantly HIV-1 CRF01-AE subtype compared HIV reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in the SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while there was a sharp decay in these infected cells in all acutely treated individuals during the first 12 weeks of therapy. Rates of decay were not provided, and therefore a direct comparison with our data from the FRESH cohort is not possible.

      (4) A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in an acute treated (within 100 days) and chronic treated (>180 days), predominantly male subtype B cohort. In comparison to the FRESH chronic treated group, they showed that in chronic treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated. In acute treated infection, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated. The differences in the frequency of hypermutation could be explained by differences in the timing of treatment relative to infection, specifically in the acute treated groups, where FRESH participants initiated ART at a median of 1 day after infection. It is also possible that sex- or race-based differences in immunological factors that impact the reservoir play a role.

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is a site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      (5) In 2017, Hiener et al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute treated and 3 chronic treated male study participants with subtype B HIV. Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses of 6% (94% defective) in participants who initiated therapy during acute/early infection and 3% (97% defective) in chronic infection. In contrast, the frequencies in FRESH were 52% intact and 48% defective in acute treated participants, and 13% intact and 87% defective in chronic infection. These differences could be attributed to the timing of treatment initiation: in the aforementioned study, early treatment ranged from 0.6-3.4 months after infection.
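
      The rates quoted in points (1) and (2) above are reported in different units (percent per year, percent per month, log copies per month). As a rough illustration only, the sketch below puts them on a common scale by converting each to a half-life in months. It assumes simple first-order (single-exponential) decay and reads "log copies" as log10, neither of which the cited studies necessarily assume; the function names are hypothetical and this is not the analysis used in any of the studies. Because the quoted rates cover different treatment phases and follow-up windows, the resulting numbers are not a like-for-like comparison.

```python
import math


def half_life_from_percent_decline(percent_per_period: float) -> float:
    """Half-life, in the same time unit as the input period, for a constant
    percentage decline per period (assumes first-order exponential decay)."""
    remaining = 1.0 - percent_per_period / 100.0
    if not 0.0 < remaining < 1.0:
        raise ValueError("percent_per_period must be strictly between 0 and 100")
    # After one period a fraction `remaining` is left, so the decay constant
    # is -ln(remaining) and the half-life is ln(2) divided by that constant.
    return math.log(2) / -math.log(remaining)


def half_life_from_log10_decline(log10_per_period: float) -> float:
    """Half-life, in the same time unit as the input period, for a constant
    log10 decline per period (assumes first-order exponential decay)."""
    if log10_per_period <= 0:
        raise ValueError("log10_per_period must be positive")
    # One half-life corresponds to a drop of log10(2) on the log10 scale.
    return math.log10(2) / log10_per_period


# Rates as quoted in the response; the conversion itself is an assumption.
fresh_chronic_intact = half_life_from_percent_decline(25.0)        # 25% per month
scope_chronic_intact = half_life_from_percent_decline(15.7) * 12   # 15.7% per year, in months
fresh_acute_intact = half_life_from_log10_decline(0.31)            # 0.31 log10 per month

print(f"FRESH chronic, intact: {fresh_chronic_intact:.2f} months")
print(f"SCOPE chronic, intact: {scope_chronic_intact:.2f} months")
print(f"FRESH acute, intact:   {fresh_acute_intact:.2f} months")
```

      Expressing every rate as a half-life makes the unit mismatch explicit, but it does not remove the differences in study design noted above.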

      Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size.

      We will edit the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly. We will perform an analysis of area under the curve to compare viral burden in the two study groups.
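
      One common way to summarize viral burden over time as an area under the curve is the trapezoidal rule applied to each participant's viral load measurements. The sketch below is a minimal illustration under stated assumptions: it works on log10-transformed viral load, the function name is hypothetical, and the time points and values are invented for demonstration rather than taken from the study. The analysis actually performed may differ, for example in the scale used or the handling of undetectable values.

```python
from typing import Sequence


def viral_load_auc(days: Sequence[float], log10_copies: Sequence[float]) -> float:
    """Trapezoidal area under a log10 viral load curve.

    `days` are sampling times and `log10_copies` the matching log10 viral
    loads; the result is in log10 copies/mL x days.
    """
    if len(days) != len(log10_copies):
        raise ValueError("days and log10_copies must have the same length")
    if len(days) < 2:
        raise ValueError("at least two time points are required")
    auc = 0.0
    for i in range(1, len(days)):
        interval = days[i] - days[i - 1]
        if interval <= 0:
            raise ValueError("days must be strictly increasing")
        # Area of one trapezoid: interval width times the mean of its two heights.
        auc += interval * (log10_copies[i] + log10_copies[i - 1]) / 2.0
    return auc


# Hypothetical participant, for illustration only (not study data).
example_days = [0, 7, 14, 28]
example_log10_vl = [5.0, 4.0, 3.0, 2.0]
print(viral_load_auc(example_days, example_log10_vl))
```

      Per-participant values computed this way could then be compared between the two study groups with a suitable two-sample test.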

      Reviewer #2 (Public Review):

      Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role. However, participants in this study were of similar age (18-23), and information on co-morbidities and coinfections is not known.

      Reviewer #3 (Public Review):

      The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and will amend the use of the word reservoir to only refer to the proviral DNA load after full viral suppression, i.e., during undetectable viral load.

      All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties.

      The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represents an overall analysis of all sequences available for each participant at all time points. This will be explained more clearly in the manuscript and added to the figure legend.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a follow-up study to the authors' previous eLife report about the roles of an alpha-arrestin called protein thioredoxin interacting protein (Txnip) in cone photoreceptors and in the retinal pigment epithelium. The findings are important because they provide new information about the mechanism of glucose and lactate transport to cone photoreceptors and because they may become the basis for therapies for retinal degenerative diseases.

      Strengths:

      Overall, the study is carefully done and, although the analysis is fairly comprehensive with many different versions of the protein analyzed, it is clearly enough described to follow. Figure 4 greatly facilitated my ability to follow, understand and interpret the study. The authors have appropriately addressed a few concerns about statistical significance and the relationship between their findings and previous studies of the possible roles of Txnip on GLUT1 expression and localization on the surfaces of RPE cells.

      We are delighted that Reviewer #1 is satisfied with this revised version.

      Reviewer #2 (Public Review):

      The hard work of the authors is much appreciated. With overexpression of the α-arrestin Txnip in the RPE, in cones, and in both combined, the authors show a potential gene-agnostic treatment that can be applied to retinitis pigmentosa. Furthermore, since Txnip is related to multiple intracellular signaling pathways, this study is of value for research into the mechanism of secondary cone dystrophy as well.

      There are a few areas in which the article may be improved through further analysis and application of the data, as well as some adjustments that should be made to clarify specific points in the article.

      Strengths

      • The follow-up study builds on innovative ground by exploring the impact of TxnipC247S and its combination with HSP90AB1 knockdown on cone survival, offering novel therapeutic pathways.

      • Testing of different Txnip deletion mutants provides a nuanced understanding of its functional domains, contributing valuable insights into the mechanism of action in RP treatment.

      • The findings regarding GLUT1 clearance and the differential effects of Txnip mutants on cone and RPE cells lay the groundwork for targeted gene therapy in RP.

      Weaknesses

      • The focus on specific mutants and overexpression systems might overlook broader implications of Txnip interactions and its variants in the wider context of retinal degeneration.

      Txnip is not expressed in WT or RP cones, as described in our previous study (Xue et al., 2021, eLife), so we could not perform loss of function assays. We thus chose overexpression, and assayed various alleles, based upon the literature, as we describe in our manuscript.

      • The study's reliance on cell count and GLUT1 expression as primary outcomes misses an opportunity to include functional assessments of vision or retinal health, which would strengthen the clinical relevance.

      In our previous study, we demonstrated that the optomotor response of Txnip-treated RP mice improved (Xue et al., 2021, eLife). Also, as described in our previous Txnip study, as well as an independent study (Xue et al., 2021, eLife; Xue et al., 2023, PNAS), ERG assays of Txnip-treated RP cones were no different than the controls. Other therapies that prolong RP cone survival and the optomotor response in our lab also failed to save the ERG, suggesting that there are other pathways that need to be addressed, e.g. the visual cycle. A combination therapy addressing multiple problems is one of our goals.

      • The paper could benefit from a deeper exploration of why certain treatments (like Best1-146 Txnip.C247S) do not lead to cone rescue and the potential for these approaches to exacerbate disease phenotypes through glucose shortages.

      This system is more complicated than we currently understand, and more work needs to be done.

      • Minor inconsistencies, such as the missing space in text references and the need for clarification on data representation (retinas vs. mice), should be addressed for clarity and accuracy.

      The missing spaces are added.

      We described the strategy of injecting the same mouse in each eye, one eye with control and one with the experimental vector. However, the following sentence has been added to the Materials and Methods to better assist the reader:

      “In almost all experiments, other than as noted, one eye of the mouse was treated with control (AAV8-RedO-H2BGFP, 2.5 × 10⁸ vg/eye), and the other eye was treated with the experimental vector plus AAV8-RedO-H2BGFP, 2.5 × 10⁸ vg/eye.”

      • The observation of promoter leakage and potential vector tropism issues raise questions about the specificity and efficiency of the gene delivery system, necessitating further discussion and validation.

      The following sentences have been added to the Results. We do not think this phenomenon affects the practice of the experiments or the interpretation of the results in this study.

      “To enable automated cone counting and trace the infection, we co-injected an AAV (AAV8-RedO-H2BGFP-WPRE-bGHpA) encoding an allele of GFP fused to histone 2B (H2BGFP), which localized to the nucleus. As the red opsin promoter was used to express this gene, H2BGFP was seen in cone nuclei, but not in the RPE, if AAV8-RedO-H2BGFP-WPRE-bGHpA was injected alone. However, when an AAV that expressed in the RPE, i.e. AAV8-Best1-Sv40intron-(Gene)-WPRE-bGHpA, was co-injected with AAV8-RedO-H2BGFP-WPRE-bGHpA, H2BGFP was expressed in the RPE, along with expression in cones (Figure 2A). We speculate that this is due to concatenation or recombination of the two genomes, such that the H2BGFP comes under the control of the RPE promoter. This may be due to the high copy number of AAV in the RPE, as it did not happen in the reverse combination, i.e. AAV with an RPE promoter driving GFP and a cone promoter driving another gene, perhaps due to the observation that the AAV genome copy number is »10 fold lower in cones than in the RPE (Wang et al., 2020).”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper provides a straightforward mechanism of how mycobacterial cAMP level is increased under stressful conditions and shows that the increase is important for the survival of the bacterium in animal hosts. The cAMP level is increased by decreasing the expression of an enzyme that degrades cAMP.

      We thank the reviewer for these extremely encouraging comments.

      Strengths:

      The paper shows that under different stresses the response regulator PhoP represses a phosphodiesterase (PDE) that degrades cAMP specifically. Identification of PhoP as a regulator of cAMP is significant progress in understanding Mtb pathogenesis, as increase in cAMP apparently increases bacterial survival upon infection. On the practical side, reduction of cAMP by increasing PDE can be a means to attenuate the growth of the bacilli. The results have wider implications since PhoP is implicated in controlling diverse mycobacterial stress responses and many bacterial pathogens modulate host cell cAMP level. The results here are straightforward, internally consistent, and of both theoretical and applied interests. The results also open considerable future work, especially how increases in cAMP level help to increase survival of the pathogen.

      Weaknesses:

      It is not clear whether PhoP-PDE Rv0805 is the only pathway to regulate cAMP level under stress.

      Reviewer 1 (Recommendations for the authors):

      (1) L.1: "maintenance of" or 'regulating'- I thought change in cAMP level upon stress is the whole point of the paper. Also, can replace "intracellular survival" with 'survival in host macrophages' if you want to be more specific.

      We agree with the reviewer, and have therefore replaced “maintenance of” with “regulating cAMP level” in the title. However, we feel more comfortable with “intracellular survival” than with the more specific ‘survival in host macrophages’, as we have also included animal experiments to demonstrate the ‘in vivo’ effect in mouse lung and spleen.

      (2) L.26: ---requires the bacterial virulence regulator –

      The suggested change has been made to the text.

      (3) L.30: Replace "phoP locus since the" with 'PhoP since this'. (The product, not the locus, is the regulator). The same comment for l.113.

      We agree with the reviewer. The suggested changes have been made to the text.

      (4) L.31: Change represtsor to repressor.

      We are sorry for the embarrassing spelling mistake. We have rectified the mistake in the revised version.

      (5) L.32: "hydrolytically degrades" or hydrolyses? (lytic and degrade sound like tautology). Same comment for l.117.

      We agree. The suggested change has been made to the text in both places of the revised manuscript.

      (6) L.35: I would also suggest changing "intra-mycobacterial" to 'intra bacterial' because you are talking about one bacterium here. The same change is recommended in l.29.

      Following reviewer’s recommendation, we have made the changes in the revised manuscript.

      (7) L.37: bacillus unless use of the plural form is the norm in the field.

      We agree. The suggested change has been made to the text.

      (8) L.43: Delete "intracellular" and change "intracellular" to host in l.44.

      The suggested changes have been made to the text.

      (9) L.66: --that a burst--

      We have corrected the mistake in the revised manuscript.

      (10) L.76: Receptor or receptor?

      We have corrected the mistake in the revised manuscript.

      (11) L.86: -- mechanisms of regulation of mycobacterial cAMP level. (homeostasis needs to be introduced first, and not used in the concluding statement for the first time).

      The suggested changes have been made to the text.

      (12) L.96: "essential" or 'a requirement'. (reduction is not the same as elimination)

      We understand the reviewer’s concern. However, several studies have independently established that phoPR remains an essential requirement for mycobacterial virulence.

      (13) L.97: Moreover, a mutant

      The suggested change has been made to the text.

      (14) L.113: --locus since PhoP has been –

      The suggested change has been made to the text.

      (15) L.119: mechanism or manner? (you are stating a fact, not a mechanism)

      We agree. We have now replaced ‘mechanism’ with ‘manner’ in the revised manuscript.

      (16) L.130: --lacking copies of both phoP and phoR (I am assuming you don't have two copies of each gene)

      We understand the reviewer’s concern. For better clarity, we have now clearly mentioned that the phoPR-KO mutant lacks both the single copies of phoP and phoR genes.

      (17) L.156: Indicate why GroEL2? - cells as another cytoplasmic protein, GroEL2 was also undetectable

      We now state, in describing the secretion experiments, that the mycobacterial cells did not undergo autolysis. To prove this point, we used cytoplasmic GroEL2 as a marker protein. The absence of detectable GroEL2 in the culture filtrates (CFs) suggests the absence of autolysis. To this end, we have modified the sentence in the revised manuscript (duplicated below):

      “Fig. 1C confirms absence of autolysis of mycobacterial cells as GroEL2, a cytoplasmic protein, was undetectable in the culture filtrates (CF).”

      (18) L.266: May delete "Together". Start with These data--, which would draw more attention to integrated view. In l.268-270, a reminder that intracellular pH is acidic in the normal course would enhance the physiological significance of the present results.

      We agree. We have made the suggested changes to the text. In view of the second comment of the reviewer, we have modified the text (duplicated below):

      “These data represent an integrated view of our results suggesting that PhoP-dependent repression of rv0805 regulates intra-mycobacterial cAMP level. In keeping with these results, activated PhoP under acidic pH conditions significantly represses rv0805, and intracellular mycobacteria most likely utilize a higher level of cAMP to effectively mitigate stress for survival under a hostile environment, including the acidic pH of the phagosome.”

      (19) L.272: Delete "and intracellular survival" (?) (I am assuming the survival is due to stress tolerance; also the section talks about stress only). No period in l.273.

      Following reviewer’s recommendations, the suggested changes have been made to the text.

      (20) L.295: Start the sentence thus: It appears that at least one of ---. (This would put more emphasis on the inference)

      We agree. We have now incorporated the recommended changes in the revised version.

      (21) L.301: No parenthesis.

      The parenthesis has been removed in the revised manuscript.

      (22) L.306: Together already implies these. Either delete Together (which I would prefer) or say 'Together, the results suggest that strains expressing wild type and mutant----properties, and the results are

      We agree. We have now deleted ‘Together’ in the revised manuscript.

      (23) L.311: These results support our view that higher---- (to avoid repetition of l.266)

      We agree. We have now incorporated the suggested change in the revised manuscript.

      (24) L.316: Using or with?

      We think “with” goes well with the statement.

      (25) L.329: Rephrase thus: Effect of intra-bacterial cAMP level on in vivo--

      The recommended change has been made to the text.

      (26) L.333: I would use ~, if you want to indicate about.

      We agree. We have now used ‘~’ in the revised version. Changes were incorporated in lines 328, 330 and 333 of the revised manuscript.

      (27) L.350: Change "somewhat functionally" to phenotypically?

      We thank the reviewer for this suggestion. We have changed “somewhat functionally” to “phenotypically” in the revised manuscript.

      (28) L.361: Change "is connected to" to 'regulates'.

      The suggested change has been made to the text.

      (29) L.365: ACs (to be parallel with PDEs)

      We agree. The suggested change has been made to the text.

      (30) L.366: delete "very" (let the readers decide how recent from the reference date).

      The suggested change has been made to the text.

      (31) L.382: level remained unknown before the present study.

      The recommended change has been made to the text.

      (32) L.399: add at the end of the sentence 'under stress'. Also, represent, not represents.

      The recommended changes have been made to the text.

      (33) L.560 and 571: Section headings formatted differently from the rest. Similar problem in l.900.

      We have rectified the issue and all of the section headings are now formatted in the same style.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript, the authors have presented new mechanistic details to show how intracellular cAMP levels are maintained linked to the phosphodiesterase enzyme which in turn is controlled by PhoP. Later, they showed the physiological relevance linked to altered cAMP concentrations.

      Strengths:

      Well thought out experiments. The authors carefully planned the experiments well to uncover the molecular aspects of it diligently.

      We thank the reviewer for these extremely encouraging comments.

      Weaknesses:

      Some fresh queries were made based on the author's previous responses and hope to get satisfactory answers this time.

      We provide below a point-by-point response to the fresh queries.

      (2) Line 134: please describe the complementation strain features as it is mentioned for the first time (plasmid, copy number, promoter etc.) in the manuscript. Especially under NO stress what could be the authors' justification regarding the high cAMP concentration in the complementation strain?

      As recommended by the reviewer, the details of construction of the complemented strain have been incorporated in the 'Materials and Methods' section of the revised manuscript (duplicated below): "To complement phoPR expression, pSM607 containing a 3.6-kb DNA fragment of M. tuberculosis phoPR including 200-bp phoP promoter region, a hygromycin resistance cassette, attP site and the gene encoding phage L5 integrase, as detailed earlier (Walters et al., 2006) was used to transform phoPR mutant to integrate at the L5 attB site."

      To address the reviewer's other concern, we have now included the following sentence in the 'Results' section of the revised manuscript (duplicated below): "A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress condition (Khan et al., 2022)."

      Reference: Khan et al. (2022) Convergence of two global regulators to coordinate expression of essential virulence determinants of Mycobacterium tuberculosis. eLife 2022, 11:e80965.

      New query: The complemented gene (in pSM607 plasmid) becomes a single copy after chromosomal integration, so it should ideally behave like a WT strain. How could authors still justify the high cAMP concentration under NO stress?

      We agree with the reviewer. We are unable to provide a cogent justification regarding this result. We speculate that PhoP is strikingly activated under NO stress by a non-canonical mechanism and strongly represses rv0805 expression. As a result, there is a significantly higher cAMP concentration in case of the complemented mutant under NO stress.

      (13) Line 292: There is a difference between red and green bars. Authors should do statistical analysis and then comment on whether overexpression of WT and mutant pde are different or similar, to me they are different; also, explain why the WT-Rv0805 strain is different than the phoPR-KO strain in the context of cell wall metabolism.

      As recommended by the reviewer, we have now included statistical significance of the data in the revised version, and modified the text accordingly in the manuscript.

      New query: Authors are asked to put a statistical significance test between WT-Rv0805 and WT-Rv0805M.

      We have included it in the modified figure. Also, to explain it we incorporated new text in the legend to Fig. 4C of the revised manuscript (duplicated below):

      “Note that similar to phoPR-KO, WT-Rv0805 shows a comparably higher sensitivity to CHP relative to WT bacilli. However, WT-Rv0805M expressing a mutant Rv0805, shows a significantly lower sensitivity to CHP relative to WT-Rv0805, as measured by the corresponding CFU values.”

      (14) Line 299-303: Authors should explain how the colocalization % are calculated. Also, in the figure 4D merge panel please highlight the difference.

      As suggested by the reviewer, we have now explained the methodology used to calculate percent colocalization in greater detail. Also, we have modified Figure 4D to highlight the difference between the samples shown in the merge panel. Please see our response to comment # 33 from Reviewer 1.

      New query: In the figure legend it should be mentioned that the white arrow indicates non-co-localization, which is visibly higher in WT and WT-Rv0805M.

      We thank the reviewer for this very important suggestion. We have now included the following text in the legend to Fig. 4D of the revised manuscript.

      “White arrowheads in the merge panels indicate non-colocalization, which remains higher in WT-H37Rv and WT-Rv0805M relative to phoPR-KO or WT-Rv0805.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Through an unbiased genomewide KO screen, the authors identified loss of DBT to suppress MG132-mediated death of cultured RPE cells. Further analyses suggested that DBT reduces ubiquitinated proteins by promoting autophagy. Mechanistic studies indicated that DBT loss promotes autophagy via AMPK and its downstream ULK and mTOR signaling. Furthermore, loss of DBT suppresses polyglutamine- or TDP-43-mediated cytotoxicity and/or neurodegeneration in fly models. Finally, the authors showed that DBT proteins are increased in ALS patient tissues, compared to non-neurological controls.

      Strengths:

      The idea is novel, the evidence is mostly convincing, and the data are clean. The findings have implications for human diseases.

      Reply: We thank the reviewer for the supportive comments.

      Weaknesses:

      More experiments are needed to establish the connections between DBT and autophagy. The mechanistic studies are somewhat biased, and it's unclear whether the same mechanism (i.e., AMPK-->mTOR) can be applied to TDP-43-mediated neurodegeneration. Also, some data interpretation has to be more accurate.

      Reply: We thank the reviewer for raising these questions, and we have provided additional evidence in the revised manuscript to support the model that DBTKO can enhance autophagy and induce resistance to TDP-43-associated toxicity. This is described in greater detail below.

      (1) To provide further evidence for the connection between DBT and autophagy, we have introduced additional controls. We included AMPK shRNA and drug treatment controls (Fig.4D, Fig.S4B), and these results suggest that reducing the AMPK level renders DBTKO cells sensitive to MG132 toxicity. We also added TSC1 shRNA and mTOR agonist treatment controls (Fig.5E, Fig.S4G), and the results show that increasing mTOR levels also makes the DBTKO cells sensitive to MG132.

      (2) To further confirm the roles of AMPK and mTOR in DBTKO cells, we introduced the AMPK agonist (EX229) and mTOR inhibitors (RAD001 and AZD8055) in co-treatment experiments with MG132 and then measured cell survival (Fig.S4D, S4G). The results indicate that promoting AMPK activation or inhibiting mTOR can enhance cell resistance to MG132-induced toxicity.

      (3) Additionally, we included the overexpression and rescue experiments for DBT and analyzed the AMPK-ULK1 signaling in WT RPE1 and DBTKO cells (Fig.S5D, S5E). The results indicate that the increase of DBT can significantly reduce the phosphorylation of AMPK/ULK1 and the levels of the autophagy marker LC3II. Together, these results suggest that DBT plays an important role in autophagy.

      (4) We had shown in the original version of the manuscript that DBTKO renders cells more resistant to TDP-43-associated toxicity, similar to the tolerance of MG132-induced toxicity. Here we further show that expression of TDP-43M337V enhances the phosphorylation of AMPK in the DBTKO cells (Fig. S7A), similar to the effect of the MG132 treatment. These results suggest that the resistance of DBTKO cells to MG132 and to TDP-43-associated toxicity shares a similar mechanism involving activation of AMPK signaling.

      Reviewer #2 (Public Review):

      Summary:

      Hwang, Ran-Der et al utilized a CRISPR-Cas9 knockout in human retinal pigment epithelium (RPE1) cells to evaluate for suppressors of toxicity by the proteasome inhibitor MG132 and identified that knockout of dihydrolipoamide branched chain transacylase E2 (DBT) suppressed cell death. They show that DBT knockout in RPE1 cells does not alter proteasome or autophagy function at baseline. However, with MG132 treatment, they show a reduction in ubiquitinated proteins but with no change in proteasome function. Instead, they show that DBT knockout cells treated with MG132 have improved autophagy flux compared to wildtype cells treated with MG132. They show that MG132 treatment decreases ATP/ADP ratios to a greater extent in DBT knockout cells, and in accordance causes activation of AMPK. They then show downstream altered autophagy signaling in DBT knockout cells treated with MG132 compared to wild-type cells treated with MG132. Then they express the ALS mutant TDP43 M337 or expanded polyglutamine repeats to model Huntington's disease and show that knockdown of DBT improves cell survival in RPE1 cells with improved autophagic flux. They also utilize a Drosophila model and show that utilizing either a RNAi or CRISPR-Cas9 knockout of DBT improves eye pigment in TDP43M337V and polyglutamine repeat-expressing transgenic flies. Finally, they show evidence for increased DBT in postmortem spinal cord tissue from patients with ALS via both immunoblotting and immunofluorescence.

      Strengths:

      This is a mechanistic and well-designed paper that identifies DBT as a novel regulator of proteotoxicity via activating autophagy in the setting of proteasome inhibition. Major strengths include careful delineation of a mechanistic pathway to define how DBT is protective. These conclusions are largely justified, but additional experiments and information would be useful to clarify and extend these conclusions.

      Reply: We thank the reviewer for the supportive comments.

      Weaknesses:

      The large majority of the experiments are evaluating suppression of drug (MG132) toxicity in an in vitro epithelial cell line, so the generalizability to disease is unclear. Indeed, MG132 itself has been shown to modulate autophagy, and off-target effects of MG132 are not addressed. While this paper is strengthened by the inclusion of mouse-induced motor neurons, Drosophila models, and postmortem tissue, the putative mechanisms are minimally evaluated in these models.

      Also, this effect is only seen with MG132 treatment, at a dose that causes markedly impaired cell survival. In this setting, it is certainly plausible that changes in autophagy could be the result of differences in cell survival, as opposed to an underlying mechanism for cell survival. Additional controls would be useful to increase confidence that DBT knockdown is protective via modulation of autophagy.

      While the authors report increased DBT in postmortem ALS tissue as suggestive that DBT may modulate proteotoxicity in neurodegeneration, this point would be better supported with the evaluation of overexpression of DBT in their model.

      Reply: We appreciate the reviewer for raising these questions, and we have provided further evidence in the revised manuscript to support the proposed mechanism that DBTKO confers resistance to MG132-induced toxicity through activation of autophagy. This is discussed in greater detail below.

      (1) To provide further mechanistic analysis, we have included additional controls for the analysis of AMPK signaling in Fig. 4D and Fig. S4B. These results demonstrate that using drugs or shRNAs to reduce AMPK activity can decrease DBTKO survival. We have also shown that increasing AMPK activity with an activator enhances the survival of both WT and DBTKO cells under MG132 treatment (Fig. S4D), suggesting that DBTKO cells resist MG132-induced toxicity through the activation of AMPK signaling.

      (2) We have included additional controls for the analysis of mTOR signaling in Fig. 5E and Fig. S4F. The results in Fig. 5E show that reducing TSC1 using shRNAs can decrease DBTKO survival. We also added the experiments with mTOR agonist MHY1485 as a control in Fig. S4F. These results indicate that mTOR activation can promote DBTKO cells' sensitivity to MG132 toxicity. To further confirm the importance of mTOR in DBTKO-mediated resistance to MG132 toxicity, we included the mTOR inhibitors RAD001 and AZD8055 in the co-treatment experiments with MG132, and then measured cell survival (Fig. S4G). The results show that both mTOR inhibitors can enhance cell resistance to MG132-induced toxicity (Fig. S4G). These findings suggest that mTOR inhibition is required for DBTKO-mediated cell survival under MG132 treatment.

      (3) To further test the hypothesis that DBT knockdown is protective via modulation of autophagy, we have introduced the overexpression of DBT and the rescue of DBT in DBTKO cells to analyze the AMPK signaling that regulates autophagy (Fig. S5E). The results demonstrate that overexpression of DBT significantly reduced the phosphorylation of AMPK and ULK1 (Fig. S5E). In the rescue experiment, the results mirror those of the overexpression experiment, showing a significant reduction in the phosphorylation of AMPK and ULK1 (Fig. S5E). We also analyzed the autophagy marker LC3II in both the overexpression and rescue experiments, and the results indicate that increasing the DBT level specifically reduces the LC3II level (Fig. S5D). These results support the model that loss of DBT promotes the activation of autophagy.

      (4) To test the hypothesis that DBT may modulate proteotoxicity in neurodegeneration, we included the studies with TDP-43M337V and found that the expression of the mutant TDP43 enhanced the phosphorylation of AMPK in the DBTKO cells (Fig. S7A), consistent with the observations made with MG-132 treatment. Together with other findings in the manuscript, these results indicate that DBTKO can sensitize the activation of the AMPK signaling and confer the resistance to TDP-43-associated toxicity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editor’s summary:

      This paper by Castello-Serrano et al. addresses the role of lipid rafts in trafficking in the secretory pathway. By performing carefully controlled experiments with synthetic membrane proteins derived from the transmembrane region of LAT, the authors describe, model and quantify the importance of transmembrane domains in the kinetics of trafficking of a protein through the cell. Their data suggest affinity for ordered domains influences the kinetics of exit from the Golgi. Additional microscopy data suggest that lipid-driven partitioning might segregate Golgi membranes into domains. However, the relationship between the partitioning of the synthetic membrane proteins into ordered domains visualised ex vivo in GPMVs, and the domains in the TGN, remain at best correlative. Additional experiments that relate to the existence and nature of domains at the TGN are necessary to provide a direct connection between the phase partitioning capability of the transmembrane regions of membrane proteins and the sorting potential of this phenomenon.

      The authors have used the RUSH system to study the traffic of model secretory proteins containing single-pass transmembrane domains that confer defined affinities for liquid ordered (lo) phases in Giant Plasma Membrane derived Vesicles (GPMVs), out of the ER and Golgi. A native protein termed LAT partitioned into these lo-domains, unlike a synthetic model protein termed LAT-allL, which had a substituted transmembrane domain. The authors' experiments provide support for the idea that ER exit relies on motifs in the cytosolic tails, but that accelerated Golgi exit is correlated with lo domain partitioning.

      Additional experiments provided evidence for segregation of Golgi membranes into coexisting lipid-driven domains that potentially concentrate different proteins. Their inference is that lipid rafts play an important role in Golgi exit. While this is an attractive idea, the experiments described in this manuscript do not provide a convincing argument one way or the other. It does however revive the discussion about the relationship between the potential for phase partitioning and its influence on membrane traffic.

      We thank the editors and scientific reviewers for thorough evaluation of our manuscript and for positive feedback. While we agree that our experimental findings present a correlation between trafficking rates and raft affinity, in our view, the synthetic, minimal nature of the transmembrane protein constructs in question makes a strong argument for involvement of membrane domains in their trafficking. These constructs have no known sorting determinants and are unlikely to interact directly with trafficking proteins in cells, since they contain almost no extramembrane amino acids. Yet, the LATTMD traffics through Golgi similarly to the full-length LAT protein, but quite differently from mutants with lower raft phase affinity. We suggest that these observations can be best rationalized by involvement of raft domains in the trafficking fates and rates of these constructs, providing strong evidence (beyond a simple correlation) for the existence and relevance of such domains.

      We have substantially revised the manuscript to address all reviewer comments, including several new experiments and analyses. These revisions have substantially improved the manuscript without changing any of the core conclusions and we are pleased to have this version considered as the “version of record” in eLife.

      Below is our point-by-point response to all reviewer comments.

      ER exit:

      The experiments conducted to identify an ER exit motif in the C-terminal domain of LAT are straightforward and convincing. This is also consistent with available literature. The authors should comment on whether the conservation of the putative COPII association motif (detailed in Fig. 2A) is significantly higher than that of other parts of the C-terminal domain.

      Thank you for this suggestion; this information has now been included as Supp Fig 2B. While there are other well-conserved residues of the LAT C-terminus, many regions have relatively low conservation. In contrast, the essential residues of the COPII association motif (P148 and A150) are completely conserved in LAT across all species analyzed.

      One cause of concern is that addition of a short cytoplasmic domain from LAT is sufficient to drive ER exit, and in its absence the synthetic constructs are all very slow. However, the argument presented that specific lo phase partitioning behaviour of the TMDs do not have a significant effect on exit from the ER is a little confusing. This is related to the choice of the allL-TMD as the 'non-lo domain' partitioning comparator. Previous data has shown that longer TMDs (23+) promote ER export (eg. Munro 91, Munro 95, Sharpe 2005). The mechanism for this is not, to my knowledge, known. One could postulate that it has something to do with the very subject of this manuscript- lipid phase partitioning. If this is the case, then a TMD length of 22 might be a poor choice of comparison. A TMD 17 Ls' long would be a more appropriate 'non-raft' cargo. It would be interesting to see a couple of experiments with a cargo like this.

      The basis for the claim that raft affinity has relatively minor influence on ER exit kinetics, especially in comparison to the effect of the putative COPII interaction motif, is in Fig 1G. We do observe some differences between constructs and they may be related to raft affinity, however we considered these relatively minor compared to the nearly 4-fold increase in ER efflux induced by COPII motifs.

      We have modified the wording in the manuscript to avoid the impression that we have ruled out an effect of raft affinity on ER exit.

      We believe that our observations are broadly consistent with those of Munro and colleagues. In both their work and ours, long TMDs were able to exit the ER. In our experiments, this was true for several proteins with long TMDs, either as full-length or as TMD-only versions (see Fig 1G). We intentionally did not measure shorter synthetic TMDs because these would not have been comparable with the raft-preferring variants, which all require relatively long TMDs, as demonstrated in our previous work1,2. Thus, because our manuscript does not make any claims about the influence of TMD length on trafficking, we did not feel that experiments with shorter non-raft constructs would substantively influence our conclusions.

      However, to address reviewer interest, we did complete one set of experiments to test the effect of shortening the TMD on ER exit. We truncated the native LAT TMD by removing 6 residues from the C-terminal end of the TMD (LAT-TMDd6aa). This construct exited the ER similarly to all others we measured, revealing that for this set of constructs, short TMDs did not accumulate in the ER. ER exit of the truncated variant was slightly slower than the full-length LAT-TMD, but somewhat faster than the allL-TMD. These effects are consistent with our previous measurements, which showed that this shortened construct has slightly lower raft phase partitioning than the LAT-TMD but higher than allL2. While these are interesting observations, a more thorough exploration of the effect of TMD length would be required to make any strong conclusion, so we did not include these data in the final manuscript.

      Author response image 1.

      Golgi exit:

      For the LAT constructs, the kinetics of Golgi exit as shown in Fig. 3B are surprisingly slow. About half of the protein remains in the Golgi at 1 h after biotin addition. Most secretory cargo proteins would have almost completely exited the Golgi by that time, as illustrated by VSVG in Fig. S3. There is a concern that LAT may have some tendency to linger in the Golgi, presumably due to a factor independent of the transmembrane domain, and therefore cannot be viewed as a good model protein. For kinetic modeling in particular, the existence of such an additional factor would be far from ideal. A valuable control would be to examine the Golgi exit kinetics of at least one additional secretory cargo.

      We disagree that LAT is an unusual protein with respect to Golgi efflux kinetics. In our experiments, Golgi efflux of VSVG was similar to full-length LAT (t1/2 ~ 45 min), and both of these were similar to previously reported values3. Especially for the truncated (i.e. TMD) constructs, it is very unlikely that some factor independent of their TMDs affects Golgi exit, as they contain almost no amino acids outside the membrane-embedded TMD.

      Practically, it has proven somewhat challenging to produce functional RUSH-Golgi constructs. We attempted the experiment suggested by the reviewer by constructing SBP-tagged versions of several model cargo proteins, but all failed to trap in the Golgi. We speculate that the Golgin84 hook is much more sensitive to the location of the SBP on the cargo, being an integral membrane protein rather than the lumenal KDEL-streptavidin hook. This limitation can likely be overcome by engineering the cargo, but we did not feel that another control cargo protein was essential for the conclusions we presented, thus we did not pursue this direction further.

      Comments about the trafficking model

      (1) In Figure 1E, the export of LAT-TMD from the ER is fitted to a single-exponential fit that the authors say is "well described". This is unclear and there is perhaps something more complex going on. It appears that there is an initial lag phase and then similar kinetics after that - perhaps the authors can comment on this?

      This is a good observation. This effect is explainable by the mechanics of the measurement: in Figs 1 and 2, we measure not ‘fraction of protein in ER’ but ‘fraction of cells positive for ER fluorescence’. This is because the very slow ER exit of the TMD-only constructs presents a major challenge for live-cell imaging, so ER exit was quantified on a population level, by fixing cells at various time points after biotin addition and quantifying the fraction of cells with observable ER localization (rather than tracking a single cell over time).

      For fitting to the kinetic model (which attempts to describe ‘fraction in ER/Golgi’) we re-measured all constructs by live-cell imaging (see Supp Fig 5) to directly quantify relative construct abundance in the ER or Golgi. These data did not have the plateau in Fig 1E, suggesting that this is an artifact of counting “ER positive cells” which would be expected to have a longer lag than “fraction of protein in ER”. Notably however, t1/2 measured by both methods was similar, suggesting that the population measurement agrees well with single-cell live imaging.

      We have included all these explanations and caveats in the manuscript. We have also changed the wording from “well described” to “reasonably approximated”.
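      The lag described in this reply can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes first-order ER exit in every cell, a hypothetical cell-to-cell spread in exit rates, and a hypothetical rule that a cell is scored "ER positive" while more than half of its protein remains in the ER. All parameter values (`k_mean`, `sigma`, `threshold`, the cell count) are illustrative assumptions.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration (not taken from the manuscript).
rng = np.random.default_rng(0)
n_cells = 2000
k_mean = np.log(2) / 60.0   # assumed median exit rate: half-time of 60 min
sigma = 0.2                 # assumed log-normal cell-to-cell variability in exit rate
threshold = 0.5             # assumed: a cell counts as "ER positive" above this fraction

rates = k_mean * rng.lognormal(mean=0.0, sigma=sigma, size=n_cells)
t = np.linspace(0.0, 240.0, 49)   # minutes after release, 5-min steps

# Per-cell fraction of protein still in the ER, assuming first-order exit.
remaining = np.exp(-np.outer(t, rates))   # shape: (time points, cells)

# Readout 1: mean fraction of protein in the ER (what live-cell imaging tracks).
protein = remaining.mean(axis=1)

# Readout 2: fraction of cells scored as ER positive (population, fixed-cell readout).
positive = (remaining > threshold).mean(axis=1)

for minutes in (0, 20, 60, 120):
    i = int(minutes // 5)
    print(f"t={minutes:3d} min  protein in ER={protein[i]:.2f}  ER-positive cells={positive[i]:.2f}")
```

      Under these assumptions the protein fraction starts to fall immediately, whereas the fraction of ER-positive cells stays at its initial value until the fastest cells cross the scoring threshold, which produces an initial plateau. With the threshold set at one half, both readouts reach their midpoint at a similar time, in line with the similar t1/2 values reported for the two methods; a different threshold would shift the population curve.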

      (2) The model for Golgi sorting is also complicated and controversial, and while the authors' intention to not overinterpreting their data in this regard must be respected, this data is in support of the two-phase Golgi export model (Patterson et al PMID:18555781).

      The reviewers are correct: our observations and model are consistent with Patterson et al., and it was a major oversight that a reference to this foundational work was not included. We have now added a discussion regarding the “two phase model” of Patterson and Lippincott-Schwartz.

      Furthermore contrary to the statement in lines 200-202, the kinetics of VSVG exit from the Golgi (Fig. S3) are roughly linear and so are NOT consistent with the previous report by Hirschberg et al.

      Regarding kinetics of VSVG, our intention was to claim that the timescale of VSVG efflux from the Golgi was similar to previously reported in Hirschberg, i.e. t1/2 roughly between 30-60 minutes. We have clarified this in the text. Minor differences in the details between our observations and Hirschberg are likely attributable to temperature, as those measurements were done at 32°C for the tsVSVG mutant.

      Moreover, the kinetics of LAT export from the Golgi (Fig. 3B) appear quite different, more closely approximating exponential decay of the signal. These points should be described accurately and discussed.

      Regarding linear versus exponential fits, we agree that the reality of Golgi sorting and efflux is far more complicated than accounted for by either the phenomenological curve fitting in Figs 1-3 or the modeling in Fig 4. In addition to the possibility of lateral domains within Golgi stacks, there is transport between stacks, retrograde traffic, etc. The fits in Figs 1-3 are not intended to model specifics of transport, but rather to be phenomenological descriptors that allowed us to describe efflux kinetics with one parameter (i.e. t1/2). In contrast, the more refined kinetic modeling presented in Figure 4 is designed to test a mechanistic hypothesis (i.e. coexisting membrane domains in Golgi) and describes well the key features of the trafficking data.
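      As an illustration of what a one-parameter phenomenological descriptor looks like, the sketch below fits a single-exponential decay to a time course and reports t1/2 = ln(2)/k. It is a minimal example on synthetic, noise-free data, not the fitting procedure used in the manuscript; the function name `fit_half_time` and the 45-min half-time are assumptions chosen to echo the timescale quoted above.

```python
import numpy as np

def fit_half_time(t, signal):
    """Fit signal(t) = A * exp(-k * t) by least squares on log(signal).

    Returns (A, k, t_half). Assumes all signal values are strictly positive.
    """
    t = np.asarray(t, dtype=float)
    signal = np.asarray(signal, dtype=float)
    slope, intercept = np.polyfit(t, np.log(signal), 1)
    k = -slope
    return np.exp(intercept), k, np.log(2) / k

# Synthetic, noise-free time course with an assumed half-time of 45 min (not real data).
t = np.arange(0.0, 125.0, 5.0)                  # minutes after release
signal = np.exp(-np.log(2) / 45.0 * t)          # normalized Golgi signal

A, k, t_half = fit_half_time(t, signal)
fraction_at_1h = A * np.exp(-k * 60.0)
print(f"t1/2 = {t_half:.1f} min, fraction remaining at 60 min = {fraction_at_1h:.2f}")
```

      A log-linear fit is used here only to keep the example dependency-free; with noisy data or a nonzero plateau, a nonlinear least-squares fit with an offset term would be the more appropriate choice.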

      Relationship between membrane traffic and domain partitioning:

      (1) Phase segregation in the GPMV is dictated by thermodynamics given its composition and the measurement temperature (at low temperatures, 4°C). However, at physiological temperatures (32-37°C) at which membrane trafficking is taking place, these GPMVs are not phase separated. Hence it is difficult to argue that a sorting mechanism based solely on the partitioning of the synthetic LAT-TMD constructs into lo domains detected at low temperatures in GPMVs provides a basis (or its lack) for the differential kinetics of traffic out of the Golgi (or ER). The mechanism in a living cell to form any lipid based sorting platforms naturally requires further elaboration, and by definition cannot resemble the lo domains generated in GPMVs at low temperatures.

      We thank the reviewers for bringing up this important point. GPMVs are a useful tool because they allow direct, quantitative measurements of protein partitioning between coexisting ordered and disordered phases in complex, cell-derived membranes. However, we entirely agree that GPMVs do not fully represent the native organization of the living cell plasma membrane and we have previously discussed some of the relevant differences4,5. Despite these caveats, many studies have supported the cellular relevance of phase separation in GPMVs and the partitioning of proteins to raft domains therein 6-9. Most notably, elegant experiments from several independent labs have shown that fluorescent lipid analogs that partition to Lo domains in GPMVs also show distinct diffusive behaviors in live cells 6,7, strongly suggesting the presence of nanoscopic Lo domains in live cells. Similarly, our recent collaborative work with the lab of Sarah Veatch showed excellent agreement between raft preference in GPMVs and protein organization in living immune cells imaged by super-resolution microscopy10. Further, several labs6,7, including ours11, have reported nice correlations between raft partitioning in GPMVs and detergent resistance, which is a classical (though controversial) assay for raft association.

      Based on these points, we feel that GPMVs are a useful tool for quantifying protein preference for ordered (raft) membrane domains and that this preference is a useful proxy for the raft-associated behavior of these probes in living cells. We propose that this approach allows us to overcome a major reason for the historical controversy surrounding the raft field: nonquantitative and unreliable methodologies that prevented consistent definition of which proteins are supposed to be present in lipid rafts and why. Our work directly addresses this limitation by relating quantitative raft affinity measurements in a biological membrane with a relevant and measurable cellular outcome, specifically inter-organelle trafficking rates.

      Addressing the point about phase transition temperatures in GPMVs: this is the temperature at which macroscopic domains are observed. Based on physical models of phase separation, it has been proposed that macroscopic phase separation at lower temperatures is consistent with sub-microscopic, nanoscale domains at higher temperatures8,12. These smaller domains can potentially be stabilized / functionalized by protein-protein interactions in cells13 that may not be present in GPMVs (e.g. because of lack of ATP).

      (2) The lipid compositions of each of these membranes - PM, ER and Golgi are drastically different. Each is likely to phase separate at different phase transition temperatures (if at all). The transition temperature is probably even lower for Golgi and the ER membranes compared to the PM. Hence, if the reported compositions of these compartments are to be taken at face value, the propensity to form phase separated domains at a physiological temperature will be very low. Are ordered domains even formed at the Golgi at physiological temperatures?

      It is a good point that the membrane compositions and the resulting physical properties (including any potential phase behavior) will be very different in the PM, ER, and Golgi. Whether ordered domains are present in any of these membranes in living cells remains difficult to directly visualize, especially for non-PM membranes which are not easily accessible by probes, are nanoscopic, and have complex morphologies. However, the fact that raft-preferring probes / proteins share some trafficking characteristics, while very similar non-raft mutants behave differently argues that raft affinity plays a role in subcellular traffic.

      (3) The hypothesis of 'lipid rafts' is a very specific idea, related to functional segregation, and the underlying basis for domain formation has been also hotly debated. In this article the authors conflate thermodynamic phase separation mechanisms with the potential formation of functional sorting domains, further adding to the confusion in the literature. To conclude that this segregation is indeed based on lipid environments of varying degrees of lipid order, it would probably be best to look at the heterogeneity of the various membranes directly using probes designed to measure lipid packing, and then look for colocalization of domains of different cargo with these domains.

      This is a very good suggestion, and a direction we are currently following. Unfortunately, due to the dynamic nature and small size of putative lateral membrane domains, combined with the interior of a cell being filled with lipophilic environments that overlay each other, directly imaging domains in organellar membranes with lipid packing probes remains extremely difficult with current technology (or at least with the technology available to us). We argue that the TMD probes used in this manuscript are a reasonable alternative, as they are fluorescent probes with validated selectivity for membrane compartments with different physical properties.

      Ultimately, the features of membrane domains suggested by a variety of techniques – i.e. nanometric, dynamic, relatively similar in composition to the surrounding membrane, potentially diverse/heterogeneous – make them inherently difficult to microscopically visualize. This is one reason why we believe studies like ours, which use a natural model system to directly quantify raft-associated behaviors and relate them to cellular effects (in our case, protein sorting), are a useful direction for this field.

      We believe we have been careful in our manuscript to avoid confusing language surrounding lipid rafts, phase separation, etc. Our experiments clearly show that mammalian membranes have the capacity to phase separate, that some proteins preferentially interact with more ordered domains, and that this preference is related to the subcellular trafficking fates and rates of these proteins. We have edited the manuscript to emphasize these claims and avoid the historical controversies and confusions.

      (4) In the super-resolution experiments (by SIM, where the enhancement of resolution is around two fold or less compared to optical), the authors are able to discern a segregation of the two types of Golgi-resident cargo that have different preferences for the lo-domains in GPMVs. It should be noted that TMD-allL and the LATallL end up in the late endosome after exit of the Golgi. Previous work from the Bonifacino laboratory (PMID: 28978644) has shown that proteins (such as M6PR) destined to go to the late endosome bud from a different part of the Golgi in vesicular carriers, while those that are destined for the cell surface first (including TfR) bud with tubular vesicular carriers. Thus at the resolution depicted in Fig 5, the segregation seen by the authors could be due to an alternative explanation, that these molecules are present in different areas of the Golgi for reasons different from phase partitioning. The relatively high colocalization of TfR with the GPI probe in Fig 5E is consistent with this explanation. TfR and GPI prefer different domains in the GPMV assays yet they show a high degree of colocalization and also traffic to the cell surface.

      This is a good point. Even at microscopic resolutions beyond the optical diffraction limit, we cannot make any strong claims that the segregation we observe is due to lateral lipid domains and not several reasonable alternatives, including separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids. We have explicitly included this point in the Discussion: “Our SIM imaging suggests segregation of raft from nonraft cargo in the Golgi shortly (5 min) after RUSH release (Fig 5B), but at this level of resolution, we can only report reduced colocalization, not intra-Golgi protein distributions. Moreover, segregation within a Golgi cisterna would be very difficult to distinguish from cargo moving between cisternae at different rates or exiting via Golgi-proximal vesicles.”

      We have also added a similar caveat in the Results section of the manuscript: “These observations support the hypothesis that proteins can segregate in Golgi based on their affinity for distinct membrane domains; however, it is important to emphasize that this segregation does not necessarily imply lateral lipid-driven domains within a Golgi cisterna. Reasonable alternative possibilities include separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids.”

      Finally, while probes with allL TMD do eventually end up in late endosomes (consistent with the Bonifacino lab’s findings which we include), they do so while initially transiting the PM2,11.

      Minor concerns:

      (1) Generally, the quantitation is high quality from difficult experimental data. Although a lot appears to be manual, it appears appropriately performed and interpreted. There are some claims that are made based on this quantitation, however, where there are no statistics performed. For example, figure 1B. Any quantitation with an accompanying conclusion should be subject to a statistical test. I think this is particularly important for the quality of the model fits.

      We appreciate the thoughtful feedback. The quantifications and fits were not trivial, but we believe they are important. We have added statistical significance to Figure 1B and others where it was missing.

      (2) Modulation of lipid levels in Fig 4E shows a significant change for the trafficking rate for the LAT-TMD construct and a not so significant change for the allL-TMD construct. However, these data are not convincing and appear to depend on a singular data point that appears to lower the mean value. In general, the experiment with the MZA inhibitor (Fig. 4D-F) is hard to interpret because cells will likely be sick after inhibition of sphingolipid and cholesterol synthesis. Moreover, the difference in effects for LAT-TMD and allL-TMD is marginal.

      We disagree with this interpretation. Fig 4E shows the average of three experiments and demonstrates clearly that the inhibitors change the Golgi efflux rate of LAT-TMD but not allL-TMD. This is summarized in the t1/2 quantifications of Fig 4F, which show a statistically significant change for LAT-TMD but not allL-TMD. This is not an effect of a singular data point, but rather the trend across the dataset.

      Further, the inhibitor conditions were tuned carefully to avoid cells becoming “sick”: at higher concentrations, cells did adopt unusual morphologies and began to detach from the plates. We pursued only lower concentrations, which cells survived for at least 48 hrs and without major morphological changes.

      (3) Line 173: 146-AAPSA-152 should read either 146-AAPSA-150 or 146-AAPSAPA-152, depending on what the authors intended.

      Thanks for the careful reading, we intended the former and it has been fixed.

      (4) What is the actual statistical significance in Fig. 3C and Fig. 3E? There is a single asterisk in each panel of the figure but two asterisks in the legend.

      Apologies, a single asterisk representing p<0.05 was intended. It has been fixed.

      (5) The code used to calculate the model is not accessible. It is standard practice to host well-annotated code on GitHub or similar, and it would be good to have this publicly available.

      We have deposited the code on a public repository (doi: 10.5281/zenodo.10478607) and added a note to the Methods.

      (1) Lorent, J. H. et al. Structural determinants and functional consequences of protein affinity for membrane rafts. Nature communications 8, 1219 (2017). PMC5663905

      (2) Diaz-Rohrer, B. B., Levental, K. R., Simons, K. & Levental, I. Membrane raft association is a determinant of plasma membrane localization. Proc Natl Acad Sci U S A 111, 8500-8505 (2014). PMC4060687

      (3) Hirschberg, K. et al. Kinetic analysis of secretory protein traffic and characterization of golgi to plasma membrane transport intermediates in living cells. J Cell Biol 143, 1485-1503 (1998). PMC2132993

      (4) Levental, K. R. & Levental, I. Giant plasma membrane vesicles: models for understanding membrane organization. Current topics in membranes 75, 25-57 (2015)

      (5) Sezgin, E. et al. Elucidating membrane structure and protein behavior using giant plasma membrane vesicles. Nat Protoc 7, 1042-1051 (2012)

      (6) Komura, N. et al. Raft-based interactions of gangliosides with a GPI-anchored receptor. Nat Chem Biol 12, 402-410 (2016)

      (7) Kinoshita, M. et al. Raft-based sphingomyelin interactions revealed by new fluorescent sphingomyelin analogs. J Cell Biol 216, 1183-1204 (2017). PMC5379944

      (8) Stone, M. B., Shelby, S. A., Nunez, M. F., Wisser, K. & Veatch, S. L. Protein sorting by lipid phase-like domains supports emergent signaling function in B lymphocyte plasma membranes. eLife 6 (2017). PMC5373823

      (9) Machta, B. B. et al. Conditions that Stabilize Membrane Domains Also Antagonize n-Alcohol Anesthesia. Biophys J 111, 537-545 (2016)

      (10) Shelby, S. A., Castello-Serrano, I., Wisser, I., Levental, I. & S., V. Membrane phase separation drives protein organization at BCR clusters. Nat Chem Biol in press (2023)

      (11) Diaz-Rohrer, B. et al. Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120, e2207461120 (2023)

      (12) Veatch, S. L. et al. Critical fluctuations in plasma membrane vesicles. ACS Chem Biol 3, 287-293 (2008)

      (13) Wang, H. Y. et al. Coupling of protein condensates to ordered lipid domains determines functional membrane organization. Science advances 9, eadf6205 (2023). PMC10132753

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Suggestions to the authors:

      • Please re-analyze findings by omitting from all Tables and Figures all data of comparators who were not randomized (BAC). I understand the difficulties of running this trial but the results of excess reduction of mortality do not allow the publication of a trial where comparators do not come from the randomized patient population.

      We wish to thank the editors and reviewers for their useful comments. Given that the study was designed with both randomised and CC participants, we cannot easily exclude the CC analysis from the paper. However, we do provide graphs for both randomised only and randomised and CC participants for the primary and secondary endpoints. The fact that the primary endpoint (CRP) results are mirrored in both instances is also informative from a trial design perspective and indicative of the effect of dornase alfa therapy on inflammation being robust enough to yield the same results with small and larger cohorts.

      We agree that there are potential drawbacks of using contemporary controls. To address these potential biases, we used CC patients recruited during the same time period at a single site using the same selection criteria as the randomised group, which minimised potential bias. Furthermore, the enrolment and comparison of CRP in CC-BAC participants to concurrent randomised control R-BAC patients indicated that the two groups responded to BAC treatment in the same manner (Table 2, LS means log(CRP) 3.78 vs 3.53, P=0.386), whereas the R-BAC+DA vs R-BAC group comparison yielded significant differences (Table 2, LS means log(CRP) 3.1 vs 3.59, P=0.041). These comparisons mitigate to a large degree these potential problems.

      Still, to make the groups easy to distinguish, we now use the following unique nomenclature throughout the manuscript, which is clearly defined on ln. 111, and we state that comparisons of treated participants were performed with both control groups separately and combined.

      R-BAC: Randomised BAC

      CC-BAC: Contemporary control BAC

      R-BAC+DA: Randomised BAC + dornase alfa

      T-BAC: R-BAC + CC-BAC

      In fact, the most important bias in our study might actually be the placebo effect, given that participants randomised to BAC did not receive a nebulized control substance. We now discuss these points in more detail in the manuscript and modified the title by removing the reference to a randomised trial and clinical outcomes.

      • The presentation remains confusing and the manuscript should be critically revised for clarity. There is a repetition of methods (e.g. lines 176-187 repeat 160-175) and redundant results (e.g. Figure S2, Table 3).

      We apologise for the repetition. We removed the repeated text in the Exclusion criteria (lines 176-187 in the old manuscript).

      Figure S2 is not related to Table 3. Figure S2 depicts baseline characteristics, whereas Table 3 complements the graph in Figure 3A but lists the mean daily value of the primary endpoint as requested by Reviewer 1 in the first round of revision.

      At Table 4: the authors should select one method of illustration for lab results, either Table or figure, without repetitions

      We agree and have removed Table 4 leaving the graphs instead.

      • Regarding inclusion criteria, it is unclear whether high radiological suspicion is sufficient for inclusion or whether PCR based confirmation is required in all instances (differences in wording between lines 153 and 191), and under which oxygen requirements (lines 155 and 192)

      We thank the reviewer for pointing this out. Indeed, radiological suspicion was not sufficient and all participants in this study had a positive PCR test as part of their diagnosis prior to inclusion in the study. The entire eligibility section was rewritten to reflect this important point.

      • Table 1 should be merged with Table S2 and a better description of cohort baseline severity (P/F, SOFA, APACHE, organ support, number of patients in each point of the WHO severity score) and treatments should be made available.

      We thank the reviewer for this suggestion. We have now merged Table 1 and S2 and included WHO ordinal severity information in Table 1, with median, average, SD, min and max values which reflect the participant distribution. Unfortunately, although the additional requested information was recorded, it was not systematically collected for the analysis of the trial and it was not straightforward to compile at this stage.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendation for the authors):

      (1) On a few occasions, I found that the authors would introduce a concept, but provide evidence much later on. For example, in line 57, they introduced the idea that feedback timing modulates engagement of the hippocampus and striatum, but they provided the details much later on around line 99. There are a few instances like these, and the authors may want to go through the manuscript critically to bridge such gaps to improve the flow of reading.

      First, we thank the reviewer for acknowledging the contribution of our study and the methodological choices. We acknowledge the concern raised about the flow of information in the introduction. We have critically reviewed the manuscript, especially on writing style and overall structure, to ensure a smoother transition between the introduction of concepts and the provision of supporting evidence. In the case of the concept of feedback timing and memory systems, lines 46-58 first introduce the concept enhanced with evidence regarding adults, and we then pick up the concept around line 103 again to relate it to children and their brain development to motivate our research question. To further improve readability, we have included an outline of what to expect in the introduction. Specifically, we added a sentence in lines 66-68 that provides an overview of the different paragraphs: “We will introduce the key parameters in reinforcement learning and then we review the existing literature on developmental trajectories in reinforcement learning as well as on hippocampus and striatum, our two brain regions of interest.”

      This should better prepare the reader for when to expect more evidence regarding the concepts introduced. We included similar “road-marker” outline sentences on the other occasions the reviewer commented on, to enhance consistency and readability.

      (2) I am curious as to how they think the 5-second delay condition maps onto real-life examples, for example in a classroom setting feedback after 5 seconds could easily be framed as immediate feedback.

      The authors may want to highlight a few illustrative examples.

      Thank you for asking about the practical implications of a 5-second delay condition, which may be very relevant to the reader. We have modified the introduction example in lines 39-41 towards the role of feedback timing in the classroom to point out its practical relevance early on: “For example, children must learn to raise their hand before speaking during class. The teacher may reinforce this behavior immediately or with a delay, which raises the question whether feedback timing modulates their learning”.

      We have also expanded a respective discussion point in lines 720-728 to pick up the classroom example and to illustrate how we think timescale differences may apply: “In scenarios such as in the classroom, a teacher may comment on a child’s behavior immediately after the action or some moments later, in par with our experimental manipulation of 1 second versus 5 seconds. Within such short range of delay in teachers’ feedback, children’s learning ability during the first years of schooling may function equally well and depend on the striatal-dependent memory system. However, we anticipate that the reliance on the hippocampus will become even more pronounced when feedback is further delayed for longer time. Children’s capacity for learning over longer timescales relies on the hippocampal-dependent memory system, which is still under development. This knowledge could help to better structure learning according to their development.”

      (3) In the methods section, there are a few instances of task description discrepancies which make things a little bit confusing, for example, line 173 reward versus punishment, or reward versus null elsewhere e.g. line 229. In the same section, line 175, there are a few instances of typos.

      We appreciate your attention to detail in pointing out discrepancies in task descriptions and typos in the method section. We have revised the section, corrected typos, and now phrased the learning outcomes consistently as “reward” and “punishment”.

      (4) I wasn't very clear as to why the authors did not compute choice switch probability directly from raw data but implemented this as a model that makes use of a weight parameter. The former would be much easier and more straightforward for data plotting, especially for uninformed readers, i.e., people who do not have backgrounds in computational modelling.

      Thank you for asking for clarification on the calculation of switching behavior. Indeed, in the behavioral results, switching behavior was directly calculated from the raw data. We now stressed this in the methods in lines 230-235, also by naming win-stay and lose-shift as “proportions” instead of as “probabilities”: “As a first step, we calculated learning outcomes directly from the raw data, which were learning accuracy, win-stay and lose-shift behavior as well as reaction time.

      Learning accuracy was defined as the proportion to choose the more rewarding option, while win-stay and lose-shift refer to the proportion of staying with the previously chosen option after a reward and switching to the alternative choice after receiving a punishment, respectively.”
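
      The raw-data definitions quoted above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `switching_proportions`, the 0/1 coding of choices, and the assumption that the trial sequence belongs to a single cue are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def switching_proportions(
    choices: Sequence[int], rewarded: Sequence[bool]
) -> Tuple[float, float]:
    """Return (win_stay, lose_shift) proportions for one trial sequence.

    choices[t]  : option chosen on trial t (hypothetical 0/1 coding)
    rewarded[t] : True if trial t ended in a reward, False if in a punishment
    """
    if len(choices) != len(rewarded):
        raise ValueError("choices and rewarded must have the same length")

    wins = stays_after_win = 0
    losses = shifts_after_loss = 0
    # Each trial from the second one onward is classified by the previous outcome.
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if rewarded[t - 1]:
            wins += 1
            stays_after_win += int(stayed)
        else:
            losses += 1
            shifts_after_loss += int(not stayed)

    # A proportion is undefined (NaN) if there was no preceding win or loss.
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


# Two wins, both followed by a stay; two losses, one followed by a shift.
print(switching_proportions([0, 0, 1, 1, 1], [True, False, True, False, True]))
# (1.0, 0.5)
```

      Because these are simple counts, they can be plotted directly without fitting any model, which is the distinction from the model-based weights discussed next.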

      In contrast to the raw data switching behavior, the computational heuristic strategy model indeed uses a weight for a relative tendency of switching behavior. We have also stressed the advantage of the computational measure and its difference to the raw data switching behavior in lines 248-252 and believe that the reader can now clearly distinguish between the raw data and the computational results: “Note that these model-based outcomes are not identical to the win-stay and lose-shift behavior that were calculated from the raw data. The use of such a model-based measure offers the advantage in discerning the underlying hidden cognitive process with greater nuance, in contrast to classical approaches that directly use raw behavioral data.”

      (5) I agree with the authors' assertion that both inverse temperature and outcome sensitivity parameters may lead to non-identifiability issues, but I was not 100% convinced about their modelling approach exclusively assessing a different family of models (inv temperature versus outcome sensitivity). Here, I would like to make one mid-way recommendation. They may want to redefine the inverse temperature term in terms of reaction time, i.e., B = exp(s + g(RT - mean(RT))), where s and g are free parameters (see Webb, 2019), and keep the outcome sensitivity parameter in the model with bounds [0,2] so that the interpretation could be % increase or decrease in actual outcome. Personally, in tasks with binary outcomes i.e. [0,1: null vs reward] I do not think outcome sensitivity parameters higher than 2 are interpretable as these assign an inflated coefficient to outcomes.

      We appreciate the mid-way recommendation regarding the modeling approach for inverse temperature and outcome sensitivity parameters. We have carefully revised our analysis approach by considering alternative modeling choices. Regarding the suggestion to redefine the inverse temperature in terms of reaction time by B = exp(s + g(RT - mean(RT))), we unfortunately were not able to identify the reference Webb (2019), nor did we find references to the suggested modeling approach. Any further information that the reviewer could provide will be greatly appreciated. Regardless, we agree that including reaction times through the implementation of drift-diffusion modeling may be beneficial. However, changing the inverse temperature model in such a way would necessitate major changes in our modeling approach, which unfortunately would result in non-convergence issues in our MCMC pipeline using Rstan. Hence, this approach goes beyond the scope of the manuscript. Nonetheless, we have decided to mention the use of a drift-diffusion model, along with other methodological considerations, as a future recommendation for disentangling outcome sensitivity from inverse temperature in lines 711-712: “Future studies might shed new light by examining neural activations at both task phases, by additionally modeling reaction times using a drift-diffusion approach, or by choosing a task design that allows independent manipulations of these phases and associated model parameters, e.g., by using different reward magnitudes during reinforcement learning, or by studying outcome sensitivity without decision-making.”

      Regarding the upper bound of outcome sensitivity, we agree that, traditionally, capping the parameter at 2 makes it most interpretable. During model fitting, we had experienced non-convergence issues and ceiling effects in the outcome sensitivity parameter when fixing the inverse temperature at 1. The non-convergence issue was not resolved when we fixed the inverse temperature at 15.47, which was the group mean of the winning inverse temperature family. Model convergence was only achieved after increasing the outcome sensitivity upper bound to 20, with inverse temperature again fixed at 1. Since this model also performed well during parameter and model recovery, we argue that the parameter is nevertheless meaningful, despite the more extreme trial-to-trial value fluctuations under higher outcome sensitivity. We described our choice for this model in the methods section in lines 282-288: “Even though outcome sensitivity is usually restricted to an upper bound of 2 to not inflate outcomes at value update, this configuration led to ceiling effects in outcome sensitivity and non-converging model results. Further, this issue was not resolved when we fixed the inverse temperature at the group mean of 15.47 of the winning inverse temperature family model. It may be that in children, individual differences in outcome sensitivity are more pronounced, leading to more extreme values. Therefore, we decided to extend the upper bound to 20, parallel to the inverse temperature, and all our models converged with Rhat < 1.1.”
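
      The non-identifiability at stake in this exchange can be illustrated with a minimal sketch. It assumes a standard two-option Rescorla-Wagner learner with a softmax choice rule, initial values of zero, and outcomes coded +1 (reward) and -1 (punishment); it is not the authors' Stan model, and the function name `prob_choose_first` is hypothetical. Under these assumptions, learned values scale linearly with outcome sensitivity (rho), so halving the inverse temperature (beta) while doubling rho leaves every choice probability unchanged.

```python
import math


def prob_choose_first(choices, outcomes, alpha, beta, rho):
    """Model probability of choosing option 0 on each trial.

    alpha: learning rate, beta: inverse temperature, rho: outcome sensitivity.
    The probability is computed before that trial's outcome updates the values.
    """
    q = [0.0, 0.0]  # assumption: both options start with zero value
    probs = []
    for choice, outcome in zip(choices, outcomes):
        # Two-option softmax written as a logistic function of the value difference.
        probs.append(1.0 / (1.0 + math.exp(-beta * (q[0] - q[1]))))
        # Rescorla-Wagner update with the outcome scaled by rho.
        q[choice] += alpha * (rho * outcome - q[choice])
    return probs


choices = [0, 1, 0, 0, 1]    # hypothetical choice sequence
outcomes = [1, -1, 1, -1, 1]  # hypothetical outcomes (+1 reward, -1 punishment)

a = prob_choose_first(choices, outcomes, alpha=0.3, beta=4.0, rho=1.0)
b = prob_choose_first(choices, outcomes, alpha=0.3, beta=2.0, rho=2.0)
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True
```

      Since only the product of beta and rho is constrained by the choices in this simplified setting, fixing one of the two parameters, as described above, is one common way to make the other estimable.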

      (6) I think the authors reporting optimal parameters for the model is very important (line 464), but the learning rate they report under stable contingencies is much higher than LRs reported by for example Behrens et al 2007, LRs around 0.08 for the optimal learning behaviour. The authors may want to discuss why their task design calls for higher learning rates.

      Thank you for appreciating our optimal parameter analysis, and for the recommendation to discuss why optimal learning rates in our task design may call for higher learning rates compared to those reported in some other studies. As largely articulated in Zhang et al (2020; primer piece by one of our co-authors), the optimal parameter combination is determined by several factors, such as the reward schedule (e.g., 75:25, vs 80:20) and task design (e.g., no reversal, one reversal, vs multiple reversal) and number of trials (e.g., 80, vs 100, vs, 120). Notably, in these task-related regards, our task is different from Behrens et al. (2007), which hinders a quantitative comparison among the optimal parameters in the two tasks. We have now included more details in our discussion in lines 643-656: “However, the differences in learning rate across studies have to be interpreted with caution. The differences in the task and the analysis approach may limit their comparability. Task properties such as the trial number per condition differed across studies. Our study included 32 trials per cue in each condition, while in adult studies, the trials per condition ranged from 28 to 100. Optimal learning rates in a stable learning environment were at around 0.25 for 10 to 30 trials, another study reported a lower optimal learning rate of around 0.08 for 120 trials. This may partly explain why in our case of 32 trials per condition and cue, optimal learning rates called for a relatively high optimal learning rate of 0.29, while in other studies, optimal learning rates may be lower. Regarding differences in the analysis approach, the hierarchical Bayesian estimation approach used in our study produces more reliable results in comparison to maximum likelihood estimation, which had been used in some of the previous adult studies and may have led to biased results towards extreme values. Taken together, our study underscores the importance of using longitudinal data to examine developmental change as well as the importance of simulation-based optimal parameters to interpret the direction of developmental change.”
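
      The idea of a simulation-based optimal learning rate can be sketched as follows. This is a toy grid search under stated assumptions, not the authors' simulation: a stable two-option task, an 80:20 reward schedule, outcomes coded +1 and -1, an inverse temperature fixed at 5, and the function names `simulate_accuracy` and `best_learning_rate` are all hypothetical. The sketch is meant only to show how the accuracy-maximising learning rate can be read off for a given trial count, and it is not expected to reproduce the values reported above.

```python
import math
import random


def simulate_accuracy(alpha, n_trials, beta=5.0, p_reward=0.8,
                      n_agents=500, seed=0):
    """Mean proportion of choices of the better option across simulated agents."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    n_correct = 0
    for _ in range(n_agents):
        q = [0.0, 0.0]  # option 0 is the better option throughout (stable task)
        for _ in range(n_trials):
            p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
            choice = 0 if rng.random() < p0 else 1
            p_win = p_reward if choice == 0 else 1.0 - p_reward
            outcome = 1.0 if rng.random() < p_win else -1.0
            q[choice] += alpha * (outcome - q[choice])  # Rescorla-Wagner update
            n_correct += int(choice == 0)
    return n_correct / (n_agents * n_trials)


def best_learning_rate(n_trials, grid=(0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 0.8)):
    """Learning rate on the grid that maximises simulated accuracy."""
    return max(grid, key=lambda alpha: simulate_accuracy(alpha, n_trials))


# Compare a short and a long task; the optimum depends on the assumptions above.
for n in (32, 120):
    print(n, best_learning_rate(n))
```

      Changing the reward schedule, the inverse temperature, or the trial count in this sketch shifts the optimum, which is the reason given above for caution when comparing learning rates across tasks.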

      (7) The authors may want to report degrees of freedom in t-tests so that it would be possible to infer the final sample size for a specific analysis, for example, line 546.

      We appreciate the recommendation to include degrees of freedom, which are now added in all t-test results, for example in line 579: “Episodic memory, as measured by individual corrected object recognition memory (hits - false alarms) of confident (“sure”) ratings, showed at trend level better memory for items shown in the delayed feedback condition (β(feedback=delayed) = .009, SE = .005, t(df = 137) = 1.80, p = .074, see Figure 5A).”

      (8) I'm not sure why reductions in lose shift behaviour are framed as an improvement between 2 assessment points, e.g. line 578. It all depends on the strength of the contingency so a discussion around this point should be expanded.

      We acknowledge that a reduction in lose-shift behavior only reflects improvements under certain conditions where uncertainty is low and the learning contingencies are stable, which is the case in our task. We have added Supplementary Material 4 to illustrate the optimality of win-stay and lose-shift proportions from model simulation and to confirm that children’s longitudinal development was indeed towards more optimal switching behavior. In the manuscript, we refer to these results in lines 488-490: “We further found that the average longitudinal change in win-stay and lose-shift proportion also developed towards more optimal value-based learning (Supplementary Material 4).”

      (9) If I'm not mistaken, the authors reframe a trend-level association as weak evidence. I do not think this is an accurate framing considering the association is strictly non-significant, therefore should be omitted line 585.

      We thank the reviewer for the point regarding the interpretation of a trend-level association as weak evidence. We changed our interpretation, corrected in lines 581-585: “The inclusion of poor learners in the complete dataset may have weakened this effect because their hippocampal function was worse and was not involved in learning (nor encoding), regardless of feedback timing. To summarize, there was inconclusive support for enhanced episodic memory during delayed compared to immediate feedback, calling for future study to test the postulation of a selective association between hippocampal volume and delayed feedback learning.” as well as lines 622-623: “Contrary to our expectations, episodic memory performance was not enhanced under delayed feedback compared to immediate feedback.”

      Reviewer #2 (Public Review):

      We thank the reviewer for acknowledging the strength of our study and pointing out its weaknesses.

      Weaknesses:

      There were a few things that I thought would be helpful to clarify. First, what exactly are the anatomical regions included in the striatum here?

      We appreciate the clarification question regarding the anatomical regions included in the striatum. The striatum included ventral and dorsal regions, i.e., accumbens, caudate and putamen. We have now specified the anatomical regions that were included in the striatum in lines 211-212: “We extracted the bilateral brain volumes for our regions of interest, which were striatum and hippocampus. The striatum regions included nucleus accumbens, caudate and putamen.”

      Second, it was mentioned that for the reduced dataset, object recognition memory focused on "sure" ratings. This seems like the appropriate way to do it, but it was not clear whether this was also the case for the full analyses in the main text.

      Thank you for pointing out that in the full dataset analysis, the use of “sure” ratings for object recognition memory was previously not mentioned. Including only “sure” ratings was used consistently across analyses. This detail is now described under methods in lines 332-333: “Only confident (“sure”) ratings were included in the analysis, which were 98.1 % of all given responses.”

      Third, the children's fitted parameters were far from optimal; is it known whether adults would be closer to optimal on the task?

      Thank you for your question on whether adult learning rates in the task have been reported to be more optimal than those of the children in our study. This indeed seems to be the case, and we added this point in our discussion in lines 639-643: “Adult studies that examined feedback timing during reinforcement learning reported average learning rates range from 0.12 to 0.34, which are much closer to the simulated optimal learning rates of 0.29 than children’s average learning rates of 0.02 and 0.05 at wave 1 and 2 in our study. Therefore, it is likely that individuals approach adult-like optimal learning rates later during adolescence.”

      The main thing I would find helpful is to better integrate the differences between the main results reported and the many additional results reported in the supplement, for example from the reduced dataset when excluding non-learners. I found it a bit challenging to keep track of all the differences with all the analyses and parameters. It might be helpful to report some results in tables side-by-side in the two different samples. And if relevant, discuss the differences or their implication in the Discussion. For example, if the patterns change when excluding the poor learners, in particular for the associations between delayed feedback and hippocampal volume, and those participants were also those less well fit by the value-based model, is that something to be concerned about and does that affect any interpretations? What was not clear to me is whether excluding the poor learners at one extreme simply weakens the general pattern, or whether there is a more qualitative difference between learners and non-learners. The discussion points to the relevance of deficits in hippocampal-dependent learning for psychopathology and understanding such a distinction may be relevant.

      We appreciate the feedback that it might seem challenging to keep track of differences between the analyses of the full and the reduced dataset. We have now gathered all the analyses for the reduced dataset in Supplementary Material 6, with side-by-side tables for comparison to the full dataset results. Whenever there were differences between the results, they were pointed out in the results section, see lines 557-560: “In the results of the reduced dataset, the hippocampal association to the delayed learning score was no longer significant, suggesting a weakened pattern when excluding poor learners (Supplementary Material 6). It is likely that the exclusion reduced the group variance for hippocampal volume and delayed learning score in the model.” and lines 579-581: “Note that in the reduced dataset, delayed feedback predicted enhanced item memory significantly (Supplementary Material 6).”

      The found differences were further included in our discussion in lines 737-740 in the context of deficits in hippocampal-dependent learning and psychopathology: “Interestingly, poor learners showed relatively less value-based learning in favor of stronger simple heuristic strategies, and excluding them modulated the hippocampal-dependent associations to learning and memory in our results. More studies are needed to further clarify the relationship between hippocampus and psychopathology during cognitive and brain development.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) There appears to be a flaw in the exploration of cortical inputs. The authors never show that HFS of cortical inputs has no effect in the absence of thalamic stimulation. It appears that there is a citation showing this, but I think it would be important to show this in this study as well.

      We understand that the reviewer would like us to induce an HFS protocol on cortical input and then test if there is any change in synaptic strength in thalamic input. We have done this experiment which shows that without a footshock, high-frequency stimulation (HFS) of the cortical inputs did not induce synaptic potentiation on the thalamic pathway (Extended Data Fig. 4d).

      (2) It is somewhat confusing that the authors refer to the cortical input as driving heterosynaptic LTP, but this is not shown until Figure 4J, that after non-associative conditioning (unpaired shock and tone) HFS of the cortex can drive freezing and heterosynaptic LTP of thalamic inputs.

      We agree with the reviewer that it is in Figure 4j and Figure 5b,c that we show electrophysiological evidence for cortical input driving heterosynaptic LTP. It is only to be consistent with our terminology that we initially used behavioral evidence as the proxy for heteroLTP (Figure 3c).

      …, the authors are 'surprised' by this outcome, which appears to be what they predict.

      We removed the phrase “To our surprise”.

      (3) 'Cortex' as a stimulation site is vague. The authors have coordinates they used, it is unclear why they are not using standard anatomical nomenclature.

      We replaced “cortex” with “auditory/associative cortex”.

      (4) The authors' repeated use of homoLTP and heteroLTP to define the input that is being stimulated makes it challenging to understand the experimental detail. While I appreciate this is part of the goal, more descriptive words such as 'thalamic' and 'cortical' would make this much easier to understand.

      We agree with the reviewer that a phrase such as “an LTP protocol on thalamic and cortical inputs” would be more descriptive. We chose the words “homoLTP” and “heteroLTP” only to clarify (for the readers) the physiological relevance of these protocols. We thought by using “thalamic” and “cortical” readers may miss this point. However, when for the first time we introduce the words “homoLTP” and “heteroLTP”, we describe which stimulated pathway each refers to.

      Reviewer #2 (Public Review):

      (1) …The experimental schemes in Figs. 1 and 3 (and Fig. 4e and extended data 4a,b) show that one group of animals was subjected to retrieval in the test context at 24 h, then received HFS, which was then followed by a second retrieval session. With this design, it remains unclear what the HFS impacts when it is delivered between these two 24 h memory retrieval sessions.

      We understand that the reviewer has raised the concern that the increase in freezing we observed after the HFS protocol (e.g., Fig. 1b, the bar labeled as Wth+24hHFSth) could be caused or modulated by the recall prior to the HFS (Fig. 1a, top branch). To address this concern, in a new group of mice, 24 hours after weak conditioning, we induced the HFS protocol, followed by testing (that is, no testing prior to the HFS protocol). We observed that homoLTP was as effective in mice that were tested prior to the induction protocol as those that were not (Fig. 1b, Extended Data Fig. 1d,e).

      It would be nice to see these data parsed out in a clean experimental design for all experiments (in Figs 1, 3, and 4), that means 4 groups with different treatments that are all tested only once at 24 h, and the appropriate statistical tests (ANOVA). This would also avoid repeating data in different panels for different pairwise comparisons (Fig 1, Fig 3, Fig 4, and extended Fig 4).

      While we understand the benefit of the reviewer’s suggestion, the current presentation of the data was done to match the flow of the text and the delivery of the information throughout the manuscript. We think it is unlikely that the retrieval test prior to the HFS impacts its effectiveness, as confirmed by homosynaptic HFS data (Extended Data Fig. 1d,e). It is beyond the scope of the current manuscript to investigate the mechanisms and manipulations related to reconsolidation and retrieval effects.

      (2) … It would be critical to know if LFPs change over 24 h in animals in which memory is not altered by HFS, and to see correlations between memory performance and LFP changes, as two animals displayed low freezing levels. … They would suggest that thalamo-LA potentiation occurs directly after learning+HFS (which could be tested) and is maintained over 24 h.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (3) The statistical analyses need to be clarified. All statements should be supported with statistical testing (e.g. extended data 5c, pg 7 stats are missing). The specific tests should be clearly stated throughout. For ANOVAs, the post-hoc tests and their outcomes should be stated. In some cases, 2-way ANOVAs were performed, but it seems there is only one independent variable, calling for one-way ANOVA.

      All the statistical analyses have been revised and the post-hoc tests performed after the ANOVAs are mentioned in the relevant figure legends.

      Reviewer #2 (Recommendations For The Authors):

      The wording "transient" and "persistent" used here in the context of memory seems a bit misleading, as only one timepoint was assessed for memory recall (24 h), at which the memory strength (freezing levels) seem to change.

      As the reviewer mentioned, we have tested memory recall only at one time point. For this reason, throughout the text we used “transient” exclusively to refer to the experience (receiving footshock) and not to the memory. We replaced “persistence” with “stabilization” where it refers to a memory (“the induction of plasticity influences the stabilization of the memory”).

      For the procedures in which the CS and US were not paired, the term "unpairing" is used (which is probably the more adequate one), but the term "non-associative conditioning" appears in the text, which seems a bit misleading, as this term may have another connotation. There is also literature that an unpairing of CS and US could lead to the formation of a safety memory to the CS, that may be disrupted by HFS stimulation.

      We replaced "non-associative" with “unpaired”.

      Validation of viral injection sites for all experiments: Only representative examples are shown, it would be nice to see all viral expression sites.

      For this manuscript, we used 155 mice. For this reason, including the injection sites for all the animals in the manuscript is not feasible. Except for the mice that were excluded (please see the exclusion criteria added in the methods), the expression pattern we observed was consistent across animals, and therefore the images shown are truly representative.

      Extended Data 1b: Please explain what N, U, W, and S behavioral groups mean. To what groups mentioned in the text (pg 2,3) do these correspond?

      The requested clarifications are implemented in the figure legend.

      Please elaborate on the following aspects of your methods and approaches:

      • Please explain if the protocol for HFS to manipulate behavior was the same as the one used for the LTP experiments (Fig 1d, Fig 4j) and was identical for homo/hetero inputs from thal and ctx?

      We used the same HFS protocol for all the HFS inductions. We included this information in the methods section.

      • Please state when the HFS was given in respect to the conditioning (what means immediately before and after?) and in which context it was given. Were animals subjected to HFS exposed to the context longer (either before or after the conditioning while receiving HFS) than the other groups? When the HFS was given in another context (for the 24 h group)- how was this controlled for?

      Requested information has been added to the methods section. The control and intervention groups were treated in the same way.

      • When were the footshocks given in the anesthetized recordings (Fig. 4j) and how was the temporal relationship to the HFS? Was the timing the same as for the HFS in the behavioral experiments?

      Requested information has been added to the methods section.

      • Please add information on how the LFP was stimulated and how the LFP- EPSP slope was determined in in vivo recordings, likewise for the whole cell recordings of EPSPs in Fig. 5d-f.

      Requested information has been added to the methods section.

      Here, the y-Axis in Fig. 5e should be corrected to EPSP slope rather than fEPSP slope if these are whole-cell recordings.

      This has been corrected.

      • Please include information if the viral injections and opto-manipulations were done bilateral or unilateral and if so in which hemisphere. Likewise, indicate where the LFP recordings were done.

      Requested information has been added to the methods section.

      • Were there any exclusion criteria for animals (e.g. insufficient viral targeting or placement of fibers and electrodes), other than the testing of the optical CS for adverse effects?

      Requested information has been added to the methods section.

      Statistics: In addition to clarifying analytical statistics, please clarify n-numbers for slice recordings (number of animals, number of slices, and number of cells if applicable).

      Requested information has been added to the methods section.

      It would be nice to scrutinize the results in extended data 4b. The freezing levels with U+24h HFS show a strong trend towards an increase, the effect size may be similar to immediate HFS Fig 4f and extended data 4a) if n was increased.

      We agree with the reviewer. To address this point, we added “HomoLTP protocol when delivered 24hrs later, produced an increase in freezing; however, the value was not statistically significant.” To show this point, we used the same scale for freezing in Extended Data Fig. 4a and b.

      In the final experiment (Fig. 5a-c), Fig. 5b seems to show results from only one animal, but behavioral results are from 4 animals (Fig 5c). It would be helpful to see the quantification of potentiation in each animal.

      The results (now with error bars) include all mice.

      Please spell out the abbreviation "STC".

      Now, it is spelled out.

      Page 8 last sentence of the discussion does not seem to fit there.

      The sentence has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors did not determine how WTh affects Th-LA synapses, as field EPSPs were recorded only after HFS. WTh was required for the effects of HFS, as HFS alone did not produce CR in naïve and/or unpaired controls. As such the effects of the WTh protocol on synaptic strength must be investigated.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (2) The authors provide some evidence that their dual opsin approach is feasible, particularly the use of sustained yellow light to block the effects of blue light on ChrimsonR. However, this validation was done using single pulses making it difficult to assess the effect of this protocol on Th input when HFS was used. Without strong evidence that the optogenetic methods used here are fault-proof, the main conclusions of this study are compromised. Why did the authors not use a protocol in which fibers were placed directly in the Ctx and Th while using soma-restricted opsins to avoid cross-contamination?

      We understand that the reviewer raises the possibility that our dual-opsin approach, although effective with single pulses, may fail in higher frequency stimulation protocols (10Hz and 85Hz). To address this concern, in a new group of mice we applied our approach to 10Hz and 85Hz stimulation protocols. We show that our approach is effective in single-pulse as well as in 10Hz and 85Hz stimulation protocols (Fig. 2d-h).

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. demonstrate that CD4+ single positive (SP) thymocytes, CD4+ recent thymic emigrants (RTE), and CD4+ T naive (Tn) cells from Cd11c-p28-flox mice, which lack IL-27p28 selectively in Cd11c+ cells, exhibit a hyper-Th1 phenotype instead of the expected hyper Th2 phenotype. Using IL-27R-deficient mice, the authors confirm that this hyper-Th1 phenotype is due to IL-27 signaling via IL-27R, rather than the effects of monomeric IL-27p28. They also crossed Cd11c-p28-flox mice with autoimmune-prone Aire-deficient mice and showed that both T cell responses and tissue pathology are enhanced, suggesting that SP, RTE, and Tn cells from Cd11c-p28-flox mice are poised to become Th1 cells in response to self-antigens. Regarding mechanism, the authors demonstrate that SP, RTE, and Tn cells from Cd11c-p28-flox mice have reduced DNA methylation at the IFN-g and Tbx21 loci, indicating 'de-repression', along with enhanced histone tri-methylation at H3K4, indicating a 'permissive' transcriptional state. They also find evidence for enhanced STAT1 activity, which is relevant given the well-established role of STAT1 in promoting Th1 responses, and surprising given IL-27 is a potent STAT1 activator. This latter finding suggests that the Th1-inhibiting property of thymic IL-27 may not be due to direct effects on the T cells themselves.

      Strengths:

      Overall the data presented are high quality and the manuscript is well-reasoned and composed. The basic finding - that thymic IL-27 production limits the Th1 potential of SP, RTE, and Tn cells - is both unexpected and well described.

      Weaknesses:

      A credible mechanistic explanation, cellular or molecular, is lacking. The authors convincingly affirm the hyper-Th1 phenotype at epigenetic level but it remains unclear whether the observed changes reflect the capacity of IL-27 to directly elicit epigenetic remodeling in developing thymocytes or knock-on effects from other cell types which, in turn, elicit the epigenetic changes (presumably via cytokines). The authors propose that increased STAT1 activity is a driving force for the epigenetic changes and resultant hyper-Th1 phenotype. That conclusion is logical given the data at hand but the alternative hypothesis - that the hyper-STAT1 response is just a downstream consequence of the hyper-Th1 phenotype - remains equally likely. Thus, while the discovery of a new anti-inflammatory function for IL-27 within the thymus is compelling, further mechanistic studies are needed to advance the finding beyond phenomenology.

      Thanks for the comments. Following the suggestions of the reviewer, further studies will be performed to test whether developing thymocytes are the direct targets of IL-27 using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras of wildtype and IL-27ra knockout cells.

      To address the potential autocrine loop in the STAT1 hyperactivation, we added IFN-γ antibody into CD4+ T cell cultures and saw no obvious impact on STAT1 phosphorylation. If deemed necessary, we could further test this possibility in vivo using Cd4-Ifng and CD11c-p28 double knockout mice.

      The detailed mechanisms underlying the hyperactivation of STAT1 remain to be determined. IL-27p28 has recently been shown to act as an antagonist of gp130-mediated signaling. In addition, structural studies have demonstrated that IL-27p28 interfaces with EBI3, as well as with the two receptor subunits IL-27Rα and gp130. Taking these findings into consideration, together with the fact that p28 and IL-27ra deficiencies exhibit similar phenotypes, we speculate that deficiency in either p28 or IL-27ra makes more gp130 available to transduce signals elicited by other cytokines. We will next focus on gp130-related cytokines to search for the candidate(s) that ultimately lead to enhanced STAT1 activation in the absence of p28. Alternatively, release of EBI3 in the absence of p28 may facilitate its coupling with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest as IL-27Rα is also involved in its signaling.

      To narrow down the candidate cytokines, we will first examine the expression of IL-35 and gp130-related cytokines, including IL-6, IL-11, LIF, CT1, OSM, IL-31, CLCF1, and CNTF, in the thymus and in thymocyte-depleted thymic stromal cells by mining public databases and by RT-PCR. Similarly, CD4+ thymocytes will be examined for the expression of receptor subunits that can couple with gp130, including IL-6R, IL-11R, LIFR, OSMRβ, IL-31Rα, CNTFRα, IL-23R, and IL-12Rβ2.

      We will next select those cytokines that are expressed in the thymus or thymic stromal cells and whose cognate receptors are expressed in CD4+ thymocytes, and test their effect on STAT1 phosphorylation in wildtype and p28-deficient CD4+ thymocytes. If deemed necessary, double knockout mice will be used to rescue the hyper-Th1 phenotype.

      Reviewer #2 (Public Review):

      Summary:

      Naïve CD4 T cells in CD11c-Cre p28-floxed mice express highly elevated levels of proinflammatory IFNg and the transcription factor T-bet. This phenotype turned out to be imposed by thymic dendritic cells (DCs) during CD4SP T cell development in the thymus [PMID: 23175475]. The current study affirms these observations, first, by developmentally mapping the IFNg dysregulation to newly generated thymic CD4SP cells [PMID: 23175475], second, by demonstrating increased STAT1 activation being associated with increased T-bet expression in CD11c-Cre p28-floxed CD4 T cells [PMID: 36109504], and lastly, by confirming IL-27 as the key cytokine in this process [PMID: 27469302]. The authors further demonstrate that such dysregulated cytokine expression is specific to the Th1 cytokine IFNg, without affecting the expression of the Th2 cytokine IL-4, thus proposing a role for thymic DC-derived p28 in shaping the cytokine response of newly generated CD4 helper T cells. Mechanistically, CD4SP cells of CD11c-Cre p28-floxed mice were found to display epigenetic changes in the Ifng and Tbx21 gene loci that were consistent with increased transcriptional activities of IFNg and T-bet mRNA expression. Moreover, in autoimmune Aire-deficiency settings, CD11c-Cre p28-floxed CD4 T cells still expressed significantly increased amounts of IFNg, exacerbating the autoimmune response and disease severity. Based on these results, the investigators propose a model where thymic DC-derived IL-27 is necessary to suppress IFNg expression by CD4SP cells and thus would impose a Th2-skewed predisposition of newly generated CD4 T cells in the thymus, potentially relevant in autoimmunity.

      Strengths:

      Experiments are well-designed and executed. The conclusions are convincing and supported by the experimental results.

      Weaknesses:

      The premise of the current study is confusing as it tries to use the CD11c-p28 floxed mouse model to explain the Th2-prone immune profile of newly generated CD4SP thymocytes. Instead, it would be more helpful to (1) give full credit to the original study which already described the proinflammatory IFNg+ phenotype of CD4 T cells in CD11c-p28 floxed mice to be mediated by thymic dendritic cells [PMID: 23175475], and then, (2) build on that to explain that this study is aimed to understand the molecular basis of the original finding. In its essence, this study mostly rediscovers and reaffirms previously reported findings, but with different tools. While the mapping of epigenetic changes in the IFNg and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, these are expected results, and they only reaffirm what would be assumed from the literature. Thus, there is only incremental gain in new insights and information on the role of DC-derived IL-27 in driving the Th1 phenotype of CD4SP cells in CD11c-p28 floxed mice.

      Indeed, the present study is based on the finding of enhanced IFN-γ production by CD4+ T cells from CD11c-p28 floxed mice, which was originally reported by Zhang et al. and is repeatedly cited in our manuscript. We revisited this phenomenon in the context of the functional bias of newly generated CD4+ T cells and sought to reveal the mechanisms underlying the hyper-Th1 phenotype in the absence of thymic DC-derived IL-27. We showed that deletion of p28 resulted in an unexpected hyperactivation of STAT1, which was accompanied by epigenetic changes in favor of Th1 bias. However, a gap remains between p28 deficiency and STAT1 activation.

      Altogether, the major issues of this study remain unresolved:

      (1) It is still unclear why the p28-deficiency in thymic dendritic cells would result in increased STAT1 activation in CD4SP cells. Based on their in vitro experiments with blocking anti-IFNg antibodies, the authors conclude that it is unlikely that the constitutive activation of STAT1 would be a secondary effect due to autocrine IFNg production by CD4SP cells. However, this possibility should be further tested with in vivo models, such as Ifng-deficient CD11c-p28 floxed mice. Alternatively, is this an indirect effect by other IFNg producers in the thymus, such as iNKT cells? It is necessary to explain what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells in the first place.

      Thanks for the suggestions. Further studies will be performed to test the potential autocrine loop for IFN-γ production in vivo using Cd4-Ifng and CD11c-p28 double knockout mice. This model should also help to exclude the possibility of an indirect role of IFN-γ production by other cells such as iNKT cells.

      As pointed out by the reviewer, a critical unanswered question is what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells. Several lines of evidence point to the possibility that p28 deficiency increases the responsiveness of developing thymocytes to STAT1-activating cytokines. Firstly, IL-27p28 has recently been shown to act as an antagonist of gp130-mediated signaling. Secondly, structural studies have demonstrated that IL-27p28 is centrally positioned in the complex formed with EBI3, as well as the two receptor subunits IL-27Rα and gp130. Thirdly, we observed a similar hyper-Th1 phenotype in the absence of either p28 or IL-27ra. We therefore speculate that more gp130 is available to transduce signals elicited by other cytokines in such a scenario. We will next seek to determine the candidate cytokine(s) responsible for the enhanced STAT1 activation in the absence of p28, as outlined in the response to Reviewer 1.

      (2) It is also unclear whether CD4SP cells are the direct targets of IL-27 p28. The cell-intrinsic effects of IL-27 p28 signaling in CD4SP cells should be assessed and demonstrated, ideally by CD4SP-specific deletion of IL-27Ra, or by establishing bone marrow chimeras of IL-27Ra germline KO mice.

      Thanks for the suggestions. Further studies will be performed to test whether developing thymocytes are the direct targets of IL-27 using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras of wildtype and IL-27ra knockout cells.

    1. Author Response:

      We thank the editors for their assessment of our manuscript. We appreciate the reviewers’ thoughtful comments and plan to incorporate their feedback into a revised manuscript. We agree that incorporating an additional, more common ablation tool would be highly complementary to our Kir2.1 ablation studies. We also agree that images across timepoints should be expanded for contact analyses, that connectomics data can be better leveraged, that additional quantifications can be performed as suggested by the reviewers to better support claims, and that the introduction and discussion can be revised to better position our work in the context of previous studies. We also strongly agree that providing data on receptor RNA and protein expression in the GF across timepoints would be extremely informative; however, we have found that acquiring these data at the necessary resolution would require new approaches and tools that may be outside the scope of the project.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Farhat-Younis and colleagues demonstrate tumor-specific IgM's capacity to induce tumor cell death in monocyte-derived dendritic cell cultures. They subsequently designed a chimeric receptor based on high-affinity FcγRI. However, the authors found that the transfection process was more efficient when either the variable light or heavy chain was transfected individually rather than the entire scFv. This scFv construct led to an endoplasmic reticulum (ER) stress response and scFv degradation. A considerable portion of the manuscript is dedicated to the negative scFv expression results. The authors pivoted to a modified FcgRI capable of transmitting IgM signals. This represents a tremendous amount of work in the development of this chimeric receptor, but the critical experiment showing efficacy in vivo was not presented; instead, various in vitro assays are shown. Thus, this manuscript will markedly benefit from showing improved responses to tumors in vivo when macrophages express FcgRI-IgM.

      We deeply thank the reviewer for his thoughtful comments and overall favorable review of our manuscript.

      (1) In a mouse tumor model, the authors demonstrated that monocyte-derived dendritic cells (MoDCs) treated with IgG immune complexes (ICs) were more effective at preventing tumor growth compared to those treated with IgM ICs (as shown in Figure 1B). In Figure 1C, their in vitro experiments revealed that IgM resulted in tumor cell death, as well as increased production of nitric oxide (NO) and granzyme B. How do the authors reconcile IgG IC-treated MoDCs performing better in preventing tumors in vivo than IgM IC-treated MoDCs, despite the in vitro results with IgM-ICs? The authors speculate that IgG IC-treated MoDCs might trigger T cell immunity but do not show T cell involvement.

      We apologize for not making this point clearer. We have studied this phenomenon extensively and detailed the underlying mechanism in two consecutive papers (PMID: 27812544, PMID: 25924063). Briefly, we showed that DC activated with IgM-IC undergo cell death concomitantly with their release of lytic granules and lysis of tumor cells. As a result, they do not migrate to the lymph nodes, where they would otherwise induce reactive T cell clones. In contrast, DC activated with IgG-IC do not elicit in vitro cytotoxicity but rather process the IC to present its derived antigens on MHC-II. We addressed this issue in the revised version and cited the relevant papers to further clarify it.

      (2) The authors report distinct functional consequences of MoDCs incubated with tumor-IgG complexes and tumor IgM complexes. Tumor growth was inhibited and T cell immunity induced with the former. The latter, however, elicited robust anti-tumor killing. What happens if MoDCs are incubated with both IgG and IgM complexes? If this combined treatment induces effective killing and T cell memory, would this impact the design of the chimeric receptor to include IgG responsiveness as well?

      This is a very interesting point. As mentioned above, our previous publications strongly suggest that tumor-binding IgG and IgM induce different processes in myeloid cells. Yet, since MoDC naturally express FcγRI, the high-affinity receptor for IgG, we speculated that treating tumor-bearing mice with modified monocytes, alone or in combination with tumor-binding IgG, would shed some light on this question. Indeed, such treatment elicits strong T cell immunity in these mice, and the data were added to Supplementary Data Figure S4J. That said, a complete analysis of this question is very complicated and extends beyond the scope of this work. We would like to emphasize that the purpose of this work is to highlight some of the challenges unique to genetic manipulation in myeloid cells and to suggest one alternative scaffold for integrating signaling in these cells. We do not argue that the specific solution presented here is the most potent one, and more work is required before promoting such treatment into the clinic. We have added a sentence to the Discussion section that stresses this issue.

      (3) In Figure 5H, the authors demonstrate the ability of the chimeric receptor construct to deplete tumor cells in vitro. The ms would improve if the authors could show the chimeric receptor construct results in tumor cell death and/or prevention in an in vivo model. Similarly, if combined stimulation with IgG and IgM complexes enhances tumor response, this should be incorporated into the therapeutic strategy.

      This is a wonderful suggestion. To address that, we challenged C57Bl/6 mice with B16F10 melanoma and allowed the tumors to grow until they reached a palpable size of approximately 25 mm². Concomitantly, we cultured bone marrow dendritic cells from syngeneic mice and transfected them with a linear mRNA of the alpha/mu construct. Tumor-bearing mice were then treated with alpha/mu or sham-transduced BMDC, alone or in combination with an antibody against the melanoma antigen Trp1 (TA99). The results were added as Figure 5K and Supplementary Figure S4H-S4I.

      Reviewer #2 (Public Review):

      Summary:

      While a significant portion of immunotherapy research has focused on the pivotal role of T cells in tumor immunity, their effectiveness may be limited by the suppressive nature of the tumor environment. On the other hand, myeloid cells are commonly found within tumors and can withstand these adverse conditions. However, these cells often adopt an immunosuppressive phenotype when infiltrating tumors. Therefore, manipulating myeloid cells could potentially enhance the anti-tumor potential of immunotherapy.

      In this manuscript, Farhat-Younis and colleagues have demonstrated that activating the IgM receptor signaling in myeloid cells induces an oxygen burst, the secretion of Granzyme B, and the lysis of adjacent tumor cells. Furthermore, they have outlined a strategy to utilize these features to generate CAR macrophages. However, they have identified a limitation: the expression of scFv in myeloid cells induces ER stress and the degradation of misfolded proteins. To address this issue, chimeric receptors were designed based on the high-affinity FcγRI for IgG. When macrophages transfected with these receptors were exposed to tumor-binding IgG, extensive tumor cell killing and the release of reactive oxygen species and Granzyme B were observed.

      Strengths:

      In general, I consider this work to be significant, and the results are compelling. It emphasizes the specific considerations and requirements for successful manipulation in myeloid cells, which could further advance the field of cellular engineering for the benefit of immunotherapy.

      We thank the reviewer for his thoughtful comments and overall appreciation of our findings.

      Weaknesses:

      Nevertheless, there are several minor issues that should be addressed:

      (1) TCR fragments are commonly used to induce ER stress in non-immune cells. Therefore, it would be interesting to investigate whether TCR fragments can be expressed in myeloid cells and if they induce ER stress. Addressing this issue would support the notion that these cells lack the ER chaperones required for folding immunoglobulin variable chains.

      This is a wonderful suggestion. To assess that possibility, we cloned the alpha chain of the anti-Trp1 TCR and transfected it into RAW 264.7 macrophages. Importantly, we could not detect expression of this construct in macrophages, further supporting our findings with scFv in these cells. We added this result to Figure 4J and Supplementary Figure S3C.

      (2) It would be valuable to determine whether, after the degradation of scFv fragments by myeloid cells, they are presented on MHC-I and MHC-II.

      This is a very interesting point. To address that, we generated a genetic construct in which we fused the anti-CD19 scFv to a polypeptide composed of the MHCI and MHCII fragments of ovalbumin (Ova). Next, we transfected DC 2.4 cells with this construct and measured their capacity to stimulate the proliferation of CD8+ T cells from OT-I mice and CD4+ T cells from OT-II mice. DC transfected with this construct efficiently stimulated the proliferation of both T cell types, suggesting that both Ova fragments are indeed presented on MHCI and MHCII. Nonetheless, DC transfected with the polypeptide of MHCI and MHCII Ova fragments alone (with no scFv) were almost equally effective in stimulating OT-I and OT-II T cell proliferation. We added this result to Supplementary Figure S3D-S3E.

      (3) Some methodological details, such as the vaccination protocol and high-resolution microscopy procedures, are missing from the text.

      We thank the reviewer for pointing out these issues. We added the missing details to the revised version of the manuscript.

    1. Author response:

      We thank both reviewers for their feedback and for underlining the potential of our new tool and experimental approach to identify signalling molecules that can improve the in vitro derivation of specific cell types from human pluripotent stem cells. To address the reviewers' points, we plan to carry out further analysis that should solidify our conclusions. We will also edit the text to temper conclusions where appropriate.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We sincerely appreciate the reviewer’s dedication to evaluating our manuscript and raising essential considerations regarding the classification of the migration behavior we described. While the reviewer suggests that this behavior aligns with the concept of itinerancy, we contend that it represents a distinct phenomenon, albeit with similarities, as both involve the non-breeding movements of birds. We acknowledge that our manuscript did not adequately address this distinction and have considered the reviewer’s feedback. In our response, we clarify the difference between the described phenomenon and itinerancy. Our revised manuscript will include a new section in the Discussion to address this issue comprehensively.

      In the first part of the review, the reviewer emphasizes that the pattern we are describing is consistent with itinerancy. Regardless of the terminology used, we want to highlight the existence of two different types of migratory behavior, both of which involve movement in non-breeding areas.

      The first type, called itinerancy, was first described by Moreau in 1972 in “The Palaearctic-African Bird Migration Systems.” As noted by the reviewer, this behavior involves an alternation of stopovers and movements between different short-term non-breeding residency areas. They usually occur in response to food scarcity in one part of the non-breeding range, causing birds to move to another part of the same range. These movements typically cover distances of 10 to 100 kilometers but are neither continuous nor directional. Moreau (1972) defined itinerancy as prolonged stopovers, normally lasting several months, primarily in tropical regions. He noted observations of certain species disappearing from his study areas in sub-Saharan Africa in December and others appearing, suggesting they may have multiple home ranges during the non-breeding season. Subsequent research, as mentioned by the reviewer, has confirmed itinerancy in many species, particularly among Palaearctic-African migrants in sub-Saharan Africa. In particular, the Montagu’s Harrier has been extensively studied in this regard. The reviewer rightly points out that our study does not include recent findings on this species. In our revised version, we will include references to recent studies, such as those by Trierweiler et al. (2013, Journal of Animal Ecology, 82:107-120) and Schlaich et al. (2023, Ardea, 111:321-342), which show that Montagu’s Harrier has an average of 3-4 home ranges separated by approximately 200 kilometers. These studies suggest that the species spends approximately 1.5 months at each site, with the most extended period typically observed at the last site before migrating to the breeding grounds.

      In the second type, birds undertake a post-breeding migration, arrive in their non-breeding range, and then gradually move in a particular direction throughout the season. This continuous directional movement covers considerable distances and continues throughout the non-breeding period. In our study, this movement covered about 1000 km, comparable to the total migration distance of Rough-legged Buzzards of about 1500 km. As observed in our research, these movements are influenced by external factors such as snow cover. In such cases, the progression of snow cover in a south-westerly direction during winter can prevent birds from finding food, forcing them to continue migrating in the same direction. In essence, this movement represents a prolonged phase of the migration process but at a slower pace. Similar behavior has been documented in buzzards, as reported by Strandberg et al. (2009, Ibis 151:200-206). Although several transmitters in their study stopped working in mid-winter, the authors observed a phenomenon they termed ‘prolonged autumn migration.’

      In the second part of the review, the reviewer questions the need to distinguish between the two behaviors we have discussed. However, we believe these behaviors differ in their structure (with the first being intermittent and often non-directional, whereas the second is continuous and directional) and in their causes (with the first being driven by seasonal food resource cycles and the second by advancing snow cover). We therefore argue that it is worth distinguishing between them. To differentiate these forms of non-breeding movement, we propose to use ‘itinerancy’ for the first type, as described initially by Moreau in 1972, and introduce a separate term for the second behavior. Although ‘slow directional itinerancy’ could be considered, we find it too cumbersome.

      Moreover, ‘itinerancy’ in the literature refers not only to non-breeding movements but also to the use of different nesting sites, e.g., Lislevand et al. (2020, Journal of Avian Biology: e02595), reinforcing its association with movements between multiple sites within habitats. We, therefore, propose that the second behavior be given a distinct name. We acknowledge the reviewer’s point that we did not adequately address this distinction in the Discussion and plan to include a separate section in the revised version of our paper.

      In the third part of the review, the reviewer suggests an alternative title. Another reviewer, Dr Theunis Piersma, suggested the current title during the first round of reviewing, and we have chosen his version.

      In the fourth part of the review, the reviewer questions whether it is appropriate to discuss the conservation aspect of this study. This type of non-breeding movement raises concerns about accurately determining non-breeding ranges and population dynamics for species that exhibit this behavior. We believe that accurate determination of range and population dynamics is critical to conservation efforts. While this may be less important for species breeding in Europe and migrating to Africa, for which monitoring breeding territories is more feasible, it’s essential for Arctic and sub-Arctic breeding species. Large-scale surveys in these regions have historically been challenging and have become even more so with the end of Arctic cooperation following Russia’s war with Ukraine (Koivurova and Shibata, 2023). For North America and Europe, non-breeding abundance is typically estimated once per season in mid-winter. In North America, these are the so-called Christmas counts (which take place once at the end of December), and in Europe, they are the IWC counts mentioned by the reviewer (according to the official website, “The IWC requires a single count at each site, which should be repeated each year. The exact dates vary slightly from region to region, but take place in January or February”). Because there is only a single count in mid-winter, non-breeding habitats occupied in autumn and spring will be listed as ‘uncommon’ at best, while south-western habitats where birds are only present in mid-winter will be listed as ‘common.’ However, the situation will be reversed if we consider the time birds spend in these habitats.

      The reviewer also highlights the introduction’s unconventional structure and information redundancy at the beginning. We have chosen this structure and provided basic explanations to improve readability for a wider audience, given eLife’s readership. At the same time, we will certainly take the reviewers’ feedback into account in the revised version. We plan to include the references to modern itinerancy research mentioned above and to add a section on itinerancy to the Discussion.

      We appreciate the reviewer’s input and sincerely thank them for their time and effort in reviewing our paper. While we may not fully agree on the classification of the behavior we describe, we value the opportunity to engage in discussion and believe that presenting arguments and counterarguments to the reader is beneficial to scientific progress.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I much enjoyed reading this manuscript, that is, once I understood what it is about. Titles like "Conserving bird populations in the Anthropocene: the significance of non-breeding movements" are a claim to so-called relevance, they have NOTHING to do with the content of the paper, so once I understood that this paper was about the "Quick quick slow: the foxtrot migration of rough-legged buzzards is a response to habitat and snow" (an alternative title), it was becoming very interesting. So the start of the abstract as well as the introduction is very tedious, as clearly much trouble is taken here to establish reputability. In my eyes this is unnecessary: eLife should be interested in publishing such a wonderful description of such a wonderful migrant in a study that comes to grips with limiting factors on a continental scale!

      We sincerely appreciate your time and effort in reviewing our manuscript. Thank you for your appreciation of our study.

      We agree that the focus of the article should be changed from conservation to migration patterns. We have rewritten the Introduction and Discussion as suggested. We have added a discussion of the applications of this pattern, including conservation, at the end of the Discussion and have completely revised Figure 5. We have also changed the title to the suggested one.

      Not sure that the first paragraph statements that seek to downplay what we know about wintering vs breeding areas are valid (although I see what purpose they serve). Migratory shorebirds have extensively been studied in the nonbreeding areas, for example, including movement aspects (see, as just one example, Verhoeven, M.A., Loonstra, A.H.J., McBride, A.D., Both, C., Senner, N.R. & Piersma, T. (2020) Migration route, stopping sites, and non-breeding destinations of adult Black-tailed Godwits breeding in southwest Fryslân, The Netherlands. Journal of Ornithology 162, 61-76) and there are very impressive studies on the winter biology of migrants across large scales (for example in Zwarts' Living on the Edge book on the Sahel wetlands). Think also about geese and swans and about seabirds!

      We have rewritten the first paragraph, which now discusses patterns of migratory behavior. We have also rewritten the second paragraph, which is now devoted to studies of movements in the non-breeding period. We explain how our pattern differs from those already studied and give references to the papers you mentioned.

      Directional movements in nonbreeding areas as a function of food (in this case locusts) have really beautifully been described by Almut Schlaich et al in JAnimEcol for Montagu's harriers.

      We have added the Montagu's Harrier example to the second paragraph of the Introduction and to the Discussion. We have added a reference to Schlaich and to Garcia and Arroyo, who suggested that Montagu's Harriers have long directional migrations during the non-breeding period.

      Once the paper starts talking buzzards, and the analyses of the wonderful data, all is fine. It is a very competent analysis with a description of a cool pattern.

      Thank you for your appreciation of our study. We hope the revised version is better and clearer.

      However, I would say that it is all a question of spatial scale. The buzzards here respond to changes in food availability, but there is not an animal that doesn't. The question is how far they have to move for an adequate response: in some birds movements of 100s of meters may be enough, and then anything to the scale of rough-legged buzzards.

      In the new version of the manuscript, we emphasize that this is a large distance (about 1000 km), comparable to the distance of the fall and spring migrations (about 1400 km) in lines 70-72 of the Introduction and 379-383 of the Discussion.

      And actually, several of the shorebirds I know best also do a foxtrot, such as red knots and bar-tailed godwits moulting in the Wadden Sea, then spending a few months in the UK estuaries, before returning to the Wadden Sea before the long migrations to Arctic breeding grounds. The publication of the rough-legged buzzard story may help researchers to summarize patterns such as this too. My problem with this paper is the framing. A story on the how and why of these continental movements in response to snow and other habitat features would be a grand contribution. Drop Anthropocene, and rethink whether foxtrot should be introduced as a hypothesis or a summary of cool descriptions. I prefer the latter, and recommend eLife to go with that too, rather than encourage "disconnected frames that seek 'respectability'". Good luck, Theunis Piersma

      We thank the reviewer again for his valuable comments and suggestions. We have changed the framing to the suggested one and removed the Anthropocene from the article.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and effort you have taken to review our manuscript. We have carefully considered all of your comments, including both public and author comments, and provided detailed responses to each of them below. In addition, we would like to address the most important public comments.

      We agree with the suggestion to shift the focus of the article from conservation to migration patterns. Accordingly, we have rewritten both the Introduction and Discussion sections to focus on migration behavior rather than conservation.

      However, we respectfully disagree with the suggestion that the migration patterns we describe are synonymous with itinerancy. We acknowledge that our original presentation may have been unclear and may have hindered full understanding. In the revised version, we provide a detailed analysis of migratory behavior in the Introduction that describes how our pattern differs from itinerancy. We also revisit this distinction in the Discussion section. We have also carefully revised Figure 1 to improve clarity and avoid potential misunderstandings.

      Regarding the applicability of the described migration pattern, we acknowledge that the Rough-legged Buzzard is not listed as an endangered species. However, we believe that our findings have practical implications. We have moved our discussion of this issue to the end of the Discussion section and have completely revised Figure 5. While the overall population of Rough-legged Buzzards is not declining, certain regions within its range are experiencing declines. We show that this decline does not warrant listing the species as endangered. Instead, it may represent a redistribution within the non-breeding range - a shift in range dynamics. We use the example of the Rough-legged Buzzard to illustrate this concept and emphasize the importance of considering such dynamics when assessing the conservation status of species in the future.

      We also acknowledge that the hypothesis of this form of behavior has been proposed previously for Montagu's Harrier, and we have included this information in the revised manuscript. In addition, we agree that the focus on the Anthropocene is unnecessary in this context and have therefore removed it.

      We believe that these revisions significantly improve the clarity and robustness of the manuscript, and we are grateful for your insightful comments and suggestions.

      As a general comment, please note that including line numbers (as it is the standard in any manuscript submission) would facilitate reviewers providing more detailed comments on the text.

      We apologize for this oversight and have added line numbers to our revised manuscript.

      Dataset: unclear what is the frequency of GPS transmissions. Furthermore, information on relative tag mass for the tracked individuals should be reported.

      We have included this information in our manuscript (L 157-163). We also refer to the study in which this dataset was first used and described in detail (L 164).

      Data pre-processing: more details are needed here. What data have been removed if the bird died? The entire track of the individual? Only the data classified in the last section of the track? The section also reports on an 'iterative procedure' for annotating tracks, which is only vaguely described. A piecewise regression is mentioned, but no details are provided, not even on what is the dependent variable (I assume it should be latitude?).

      Regarding the deaths: we removed only the data recorded after a bird had died. We have corrected the text to make this clear (L 170).

      Regarding the iterative procedure: we have added a detailed description on lines 175-188.

      Data analysis: several potential issues here:

      (1) Unclear why sex was not included in all mixed models. I think it should be included.

      Our dataset contains 35 females and eight males. This ratio does not allow us to include sex in all models and adequately assess the influence of this factor. At the same time, because adult females disperse farther than males in some raptor species, we conducted a separate analysis of the dependence of migration distance on sex (Table S8) and found no evidence for this in our species. We have written a separate paragraph about this. This paragraph can be found on lines 356-360 of the new manuscript.

      (2) Unclear what is the rationale of describing habitat use during migration; is it only to show that it is a largely unsuitable habitat for the species? But is a formal analysis required then? Wouldn't be enough to simply describe this?

      Habitat use and snow cover determine the two main phases (quick and slow) of the pattern we describe. We believe that habitat analysis is appropriate in this case and that a simple description would be uninformative and would not support our conclusions.

      (3) Analysis of snow cover: such a 'what if' analysis is fine but it seems to be a rather indirect assessment of the effect of snow cover on movement patterns. Can a more direct test be envisaged relating e.g. daily movement patterns to concomitant snow cover? This should be rather straightforward. The effectiveness of this method rests on among-year differences in snow cover and timing of snowfall. A further possibility would be to demonstrate habitat selection within the entire non-breeding home range of an individual in relation to snow cover. Such an analysis would imply associating presence-absence of snow with every location within the non-breeding range and testing whether the proportion of locations with snow is lower than the proportion of snow of random locations within the entire non-breeding home range (95% KDE) for every individual (e.g. by setting a 1/10 ratio of presence to random locations).
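      The comparison proposed in this comment can be sketched as follows. This is only an illustrative sketch under stated assumptions, not an analysis from the manuscript: the function name `snow_use_vs_availability` is hypothetical, the random locations are assumed to have already been drawn inside each bird's 95% KDE home range, and the toy values are invented purely to show the calculation.

```python
from typing import Dict, Sequence


def snow_use_vs_availability(
    used_snow: Sequence[bool], random_snow: Sequence[bool]
) -> Dict[str, float]:
    """Compare the snow-covered share of used vs. random locations for one bird.

    used_snow   -- True where a real GPS location fell on snow-covered ground
    random_snow -- True where a random location (assumed to be drawn inside the
                   bird's 95% KDE home range) fell on snow-covered ground
    """
    if len(used_snow) == 0 or len(random_snow) == 0:
        raise ValueError("both location sets must be non-empty")
    p_used = sum(used_snow) / len(used_snow)
    p_random = sum(random_snow) / len(random_snow)
    # A negative difference means the bird was on snow less often than
    # expected from what was available inside its home range.
    return {"p_used": p_used, "p_random": p_random, "difference": p_used - p_random}


# Toy data (hypothetical): 4 real locations and 40 random ones (1:10 ratio).
used = [False, False, True, False]
random_points = [True] * 20 + [False] * 20
result = snow_use_vs_availability(used, random_points)
print(result)
```

      A formal test would then compare the per-individual differences against zero; the choice of that test is left open here.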

      The proposed analysis would make it possible to assess whether the Rough-legged Buzzard selects areas with the lowest snow cover, but it would not capture the dynamics and would therefore give a misleading overall picture. This is especially true in the spring months. In March-April, Rough-legged Buzzards move northeast and occupy an area that is not the most snow-free. At this time, areas to the southwest are more snow-free (this can be seen in Figure 4b). If we performed the proposed analysis, the control points for this period would lie both to the north (where there is more snow) and to the south (where there is less snow) of the real locations, and the result would be that there is no difference in snow cover.

      A step-selection analysis could be used, as we did in our previous work (Curk et al., 2020, Sci Rep) with the same Rough-legged Buzzard (but during migration, not winter). But this would only give us a qualitative idea, not a quantitative one: that Rough-legged Buzzards move away from snow (in the fall) and follow the snowmelt progression (in the spring).

      At the same time, our analysis gives a complete picture of snow cover dynamics in different parts of the non-breeding range. This allows us to see that if Rough-legged Buzzards remained at their fall migration endpoint without moving southwest, they would encounter 14.4 percentage points more snow cover (99.5% vs. 85.1%). Although this difference may seem small, it is significant for rodent-hunting birds because it separates complete from patchy snow cover. Conversely, if Rough-legged Buzzards immediately flew to the southwest and stayed there throughout winter, they would experience 25.7 percentage points less snow cover (57.3% vs. 31.6%). Although this difference is greater than in the first case, it doesn't compel them to adopt this strategy, as it only represents a difference between varying degrees of snow-free ground.

      We write about this in the new manuscript on lines 385-394.
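      The two snow-cover contrasts quoted in this reply are differences in percentage points, which the short sketch below reproduces. The four input values are taken from the reply itself; the function and variable names are hypothetical and serve only as an illustration.

```python
def percentage_point_gap(a_percent: float, b_percent: float) -> float:
    """Return the difference between two percentages, in percentage points."""
    # Rounded to one decimal to match the precision quoted in the reply.
    return round(a_percent - b_percent, 1)


# Values quoted in the reply (snow cover, %), in the order they are given there.
gap_if_staying_at_fall_endpoint = percentage_point_gap(99.5, 85.1)
gap_if_wintering_in_southwest = percentage_point_gap(57.3, 31.6)

print(gap_if_staying_at_fall_endpoint, gap_if_wintering_in_southwest)
```

      Expressing the gaps in percentage points avoids reading "14.4%" as a relative change.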

      Results: it is unclear whether the reported dispersion measures are SDs or SEs. Please provide details.

      For the date and coordinates of the start and end of the different phases of migration, we specified the mean, SD, and sample size. We wrote this in line 277. For the values of the parameters of the different phases of the migration (duration, distance, speed, and direction), we used the mean, the standard error of the mean, and the confidence interval (obtained using the ‘emmeans’ package). We have indicated this in lines 302-303 and the caption of Table 1 (L 315) and Figure 2 (L 293-294). For the values of habitat and snow cover experienced by the Rough-legged Buzzards, we used the mean and the standard error of the mean. We reported this on lines 322 and 337 and in Figures 3 (L 332-333) and 4 (L 355-356).

      Discussion: in general, it should be reshaped taking into account the comments. It is overlong, speculative and quite naive in several passages. Entire sections can be safely removed (I think it can be reduced by half without any loss of information). I provide some examples of the issues I have spotted below. For instance, the entire paragraph starting with 'Understanding....' is not clear to me. What do you mean by 'prohibited management' options? Without examples, this seems a rather general text, based on unclear premises when related to the specific of this study. Some statements are vague, derive from unsubstantiated claims, and unclear. E.g. "Despite their scarcity in these habitats, forests appear to hold significant importance for Rough-legged buzzards for nocturnal safety". I could not find any day-night analysis showing that they actually roost in forests during nighttime. Being a tundra species, it may well be possible that rough-legged buzzards perceive forests as very dangerous habitats and that they prefer instead to roost in open habitats. Analysing habitat use during day and night during the non-breeding period may be of help to clarify this. Furthermore, considering the fast migration periods, what is the flight speed during day and night above forests? Do these birds also migrate at night or do they roost during the night? Perhaps a figure visualizing day and night track segments could be of help (or an analysis of day vs. night flight speed) (there are several R packages to annotate tracks in relation to day and night). This is an example of another problematic statement: "The progression of snow cover in the wintering range of Rough-legged buzzards plays a significant role in their winter migration pattern." The manuscript does not contain any clear demonstration of this, as I wrote in my previous comments. Without such evidence, you must considerably tone down such assertions. 
      But since providing a direct link is certainly possible, I think that additional analyses would clearly strengthen your take-home message.

      The paragraph starting with "The quantification of environmental changes that could prove fatal to bird species presents yet another challenge for conservation efforts in an era of rapid global change." is quite odd. Take the following statement "For instance, the presence of small patches of woodland in the winter range might appear crucial to the survival of the Rough-legged buzzard. Elimination of these seemingly minor elements of vegetation cover through management actions could have dire consequences for the species.". It is based on the assumption that minor vegetation elements play a key role in the ecology of the species, without any evidence supporting this. Does it have any sense? I could safely say exactly the opposite and I would believe it might even be more substantiated.

      We agree with these comments.

      We have completely rewritten this section. As suggested, we have shortened it by removing statements that were not supported by the research. We have completely removed the statements about "prohibited management". We have also removed the statement that "forests appear to be of significant importance to Rough-legged buzzards for nocturnal safety" and everything associated with that statement, e.g. the statement about "small elements of vegetation cover", etc. We do believe that this statement is true in substance, but we also agree that it is not supported by the results and requires separate analysis. At the same time, we believe that this is a topic for a separate study and would be redundant here. Therefore, we leave it for a separate publication.

      Conclusion paragraph: I believe this severely overstates the conservation importance of this study. That the results have "crucial implications for conservation efforts in the Anthropocene, where rapidly changing environmental factors can severely impact bird migration" seems completely untenable to me. What is the evidence for such crucial implications? For instance, these results may suggest that climate change, because global warming is predicted to reduce snow cover in the non-breeding areas, might well be beneficial for populations of this species, by reducing non-breeding energy expenditure and improving non-breeding survival. I think statements like these are simply not necessary, and that the study should be more focused on the actual results and evidence provided.

      We have completely rewritten this section. We removed the reference to the Anthropocene and focused on migratory behavior and migration patterns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      In this manuscript, the authors set out to understand how different TLR4 agonists trigger Myddosome assembly and seek to examine how the potent LPS agonist induces a heightened TLR4 response. A strength of the study is that the authors employ a novel light sheet imaging modality coupled to nanopipette delivery of TLR4 ligands. The authors use this technological innovation to resolve the dynamics of Myddosome formation within the whole cell volume of macrophage cell lines expressing MyD88-YFP. The main finding is that the kinetics of Myddosome formation is slower for the weaker agonist Abeta than LPS. However, Abeta amyloids resulted in the formation of larger MyD88-YFP puncta that persisted for longer. The authors suggest the slower kinetics of formation and larger puncta size reflect how Abeta amyloids are a less efficient TLR4 agonist. Many Toll-like receptors are now known to recognize endogenous produced danger signals and microbially derived molecules. This work is the first to compare the signaling kinetics of endogenous versus microbially derived TLR agonists.

      Strengths:

      A key strength of this work is the technological achievement of imaging Myddosomes within the entire cell volume and using a nanopipette to administer ligands directly to single cells. The authors also combine this light sheet microscopy with STORM imaging to gain a super-resolved view of the assembly of Myddosomes. These findings suggest that Myddosomes formed in response to Abeta have a more irregular morphology. We conclude that these technological achievements are significant in improving our understanding of the dynamics of TLR4 signaling in response to diverse agonists. Given the limited literature on the molecular dynamics of innate immune signal transduction, this study is an important addition to the field.

      Weaknesses:

      One limitation of the paper is the lack of a suitable explanation for how larger Myddosomes would contribute to an attenuated downstream signaling response. Do the larger clusters of nucleated MyD88 polymers reflect inefficiency in assembling fully formed Myddosomes that contain IRAK4/2? Could the MyD88-GFP puncta be stained with antibodies against IRAK4 (or IRAK2) to determine the frequency and probability of the two ligands to stimulate signal transduction beyond MyD88 assembly?

      A second weakness is the discussion. The authors should explore other explanations for the observed differences in Myddosome formation between TLR4 agonists. For example, could the observed delay in Myddosome assembly in response to Abeta be due to different binding affinity or kinetics to TLR4? Can this be ruled out?

      We thank the reviewer for these comments.

      To address the first comment, we have added a section on the limitations of the current study and suggested that future work could use IRAK4 or IRAK2 staining to identify Myddosomes that are functional, as well as working with cells where Myddosome expression is at physiological levels, which may reduce the formation of larger Myddosomes.

      The reviewer is correct that the difference in delay time for Myddosome formation could be due to slow formation of a TLR4 dimer or slow binding to the TLR4 dimer, rather than to the time taken to assemble the Myddosome after TLR4 dimerisation and binding, since we have only measured the delay time for Myddosome formation when triggered by LPS or Aβ aggregates. This delay time involves dimerisation of TLR4, binding of LPS or Aβ aggregates to the TLR4 dimer, and then Myddosome formation. These other processes might contribute to the difference in delay time that we observed between LPS and Aβ aggregates. It is worth noting that in our experiments we deliver the LPS or Aβ aggregates directly onto the surface for 5 seconds and that we previously showed the presence of preformed TLR4 dimers on the cell surface (Latty et al., 2018). The affinity of Aβ aggregates for TLR4 is not known, but LPS has a high affinity for TLR4, estimated at ∼3 nM for lipid A–TLR4-MD-2 (Akashi et al., 2003). However, even with this high affinity, which implies fast binding, delivery directly onto the surface, and the presence of preformed TLR4 dimers on the cell surface, it took 80 s to observe Myddosome formation. This indicates that Myddosome formation is the slow step for LPS triggering. This is likely also to be the case for Aβ aggregates, since pM concentrations of aggregates can trigger TLR4 signalling (Hughes et al., 2020), indicating high affinity. However, it is not possible to rule out a contribution of a difference in affinity to the observed difference in delay time without measuring the affinity directly.

      We have added both these points to a new paragraph on the limitations of the study in the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers are concerned that our lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that i) influence TM insertion (V276T) and ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V276T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V276T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V276T background could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/ or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/ or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remain contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro, which is a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/ or mechanical unfolding has also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/ or mechanical stability should necessarily correspond to their resistance to cotranslational and/ or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial.
Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests that introducing polar side chains near the center of a nascent TMD should almost invariably reduce the efficiency of topogenesis. To test this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1), at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble.

      In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both mutations reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/ or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this may not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/ or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data required further discussion and documentation, which we have provided in the revised version of the manuscript as described below.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the distribution of surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations that express collections of variants that deviate from the expression of W107A to an extent that is significant enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample; by comparison, the WT immunostaining intensity was 9.5-fold over background. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We have included these and other signal-to-noise metrics for each experiment in the Results section of the revised manuscript.
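
      For illustration, a fold-over-background value of the kind quoted above can be computed as the ratio of two mean intensities. The following is a minimal sketch rather than the analysis code used for this study; the function name and the per-cell intensity values are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
from statistics import mean


def fold_over_background(sample_intensities, background_intensities):
    """Return the mean sample intensity divided by the mean background intensity."""
    background = mean(background_intensities)
    if background <= 0:
        raise ValueError("mean background intensity must be positive")
    return mean(sample_intensities) / background


# Hypothetical per-cell immunostaining intensities (arbitrary units).
# These are NOT measurements from the study.
untransfected_cells = [100.0, 110.0, 90.0]
w107a_cells = [210.0, 230.0, 220.0]

print(round(fold_over_background(w107a_cells, untransfected_cells), 2))
```

      Whether a mean, median, or geometric mean is the appropriate summary for log-distributed flow cytometry intensities is a separate choice that this sketch does not address.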

      Beyond considerations related to intensity, we also previously noticed that the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If the signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form a diffuse, uncorrelated scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R² = 0.95, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR. We have included these discussion points in the Results section as well as scatter plots for replicate variant intensities within all three genetic backgrounds in Figure S3 of the revised manuscript.
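
      The replicate comparison described above amounts to computing a Pearson correlation between per-variant intensities from the two trials. A minimal sketch using only the standard library is given below; the function name and the replicate values are hypothetical and do not come from the DMS data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-variant intensities from two replicates (arbitrary units).
replicate_1 = [120.0, 95.0, 240.0, 60.0, 180.0]
replicate_2 = [130.0, 90.0, 225.0, 70.0, 170.0]

r = pearson_r(replicate_1, replicate_2)
print(f"R = {r:.3f}, R^2 = {r * r:.3f}")
```

      Computing the correlation before any filtering of inconsistent variants, as Reviewer #2 requests in point (1), only requires passing the unfiltered lists to the same function.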

      Author response image 2.

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/ degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results, we believe it is virtually certain that the secondary mutations characterized herein would form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work: stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as the initial stability of the WT/ reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation like W107A as a point of reference.
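
      The dependence of apparent epistasis on both ΔΔG and the starting ΔG can be illustrated with a toy calculation. The sketch below assumes a simple two-state folding equilibrium, strictly additive ΔΔG values, and a log-ratio epistasis score; all three are simplifying assumptions introduced here, and the score is not necessarily the one defined in the manuscript. The function names and energies are hypothetical. It is meant only to show that the same pair of ΔΔG values yields a different epistasis value when the reference ΔG changes.

```python
import math

RT = 0.001987 * 310.15  # kcal/mol at 37 °C (gas constant times temperature)


def fraction_folded(dg_fold):
    """Two-state folded fraction; dg_fold < 0 means the folded state is favored."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def epistasis(dg_wt, ddg_a, ddg_b):
    """Log-ratio epistasis for two mutations whose ddG values add in the double mutant.

    Returns the observed log10 effect of the double mutant minus the sum of the
    log10 effects of the two single mutants, each relative to the reference.
    """
    f_wt = fraction_folded(dg_wt)
    f_a = fraction_folded(dg_wt + ddg_a)
    f_b = fraction_folded(dg_wt + ddg_b)
    f_ab = fraction_folded(dg_wt + ddg_a + ddg_b)
    observed = math.log10(f_ab / f_wt)
    expected = math.log10(f_a / f_wt) + math.log10(f_b / f_wt)
    return observed - expected


# Same two destabilizing mutations (+1 kcal/mol each), two hypothetical references.
print("stable reference   (dG = -3):", round(epistasis(-3.0, 1.0, 1.0), 4))
print("unstable reference (dG = +2):", round(epistasis(2.0, 1.0, 1.0), 4))
```

      Under these assumptions, the two references give epistasis values of different magnitude even though the underlying energetic effects are identical and additive, which is the sense in which stability-mediated epistasis is hard to generalize.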

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree. Our findings suggest different mutations may not behave similarly, which we believe is a key finding of this work. We have emphasized this point in the Discussion section of the revised manuscript as follows:

      “These findings suggest the folding-mediated epistasis is likely to vary among different classes of destabilizing mutations in a manner that should also depend on folding efficiency and/ or the mechanism(s) of misfolding in the cell.”

      Some statistical aspects of the study could be improved:

      (1) It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and will include scatters for each genetic background like the one shown above in Figure S3 of the revised version of the manuscript.

      (2) The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We utilized paired Wilcoxon-Signed Rank Tests to evaluate the statistical significance of these observations and modified the description of these findings in the revised version of the results section as follows:

      “Variants bearing mutations within the C-terminal regions including ICL3, TMD6, and TMD7 fare consistently worse in the V276T background relative to WT (paired Wilcoxon-Signed Rank Test p-values of 0.0001, 0.02, and 0.005, respectively) (Fig. 4 B & E). Given that V276T perturbs the cotranslational membrane integration of TMD6 (Fig. S1, Table S1), this directional bias potentially suggests that the apparent interactions between these mutations manifest during the late stages of cotranslational folding. In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form ICL2, TMD4, and ECL2 (paired Wilcoxon-Signed Rank Test p-values of 0.0005, 0.0001, and 0.004, respectively) (Fig. 4 C & F).”
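
      For readers unfamiliar with the paired test named above, the following is a minimal standard-library sketch of a Wilcoxon signed-rank test. It assumes that each pair consists of a score for the same secondary mutation measured in two genetic backgrounds, which is one plausible pairing and not necessarily the one used in the manuscript. The p-value uses the normal approximation without continuity or tie corrections, so it is only approximate for small samples. The function name and example scores are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (w_plus, w_minus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Hypothetical scores for the same five mutations in two backgrounds.
wt_background = [1.00, 0.80, 0.90, 0.70, 0.95]
mutant_background = [0.70, 0.60, 0.50, 0.45, 0.40]
print(wilcoxon_signed_rank(wt_background, mutant_background))
```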

      (3) The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants". Needs to be backed by relevant data and statistics, or at least a reference.

      We thank the reviewer for this reasonable suggestion. In the revised manuscript, we included the results of a Fisher’s Exact Test that confirms the statistical significance of this observation and modified the Results section to reflect this as follows:

      “Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD, Fisher’s Exact Test p = 0.0019). These findings suggest random mutations form epistatic interactions in the context of unstable mGnRHR variants in a manner that depends on the specific folding defect (V276T vs. W107A) and topological context.”
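
      An enrichment test of this kind operates on a 2×2 contingency table (for example, TMD versus non-TMD variants, inside versus outside a subset of interest). The sketch below implements a two-sided Fisher's exact test with the standard library. The function name is hypothetical, the counts are arbitrary, and they are not the counts from the manuscript, whose exact table layout is not stated in this response.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Arbitrary illustrative counts: rows are two groups, columns are two categories.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))
```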

      Reviewer #1 (Recommendations for the Authors):

      As far as this reviewer is aware, the effect of the V276T variant on MP insertion has not been measured directly; its position corresponds to T277 in TMD6 of human GnRHR that has been measured for TM insertion, but given the clear lack of conservation (threonine vs valine) the mutation in TM6 could potentially have a different impact on the mouse homologue. Please clarify what the predicted delta TM for insertion is between human and mouse GnRHR. Moreover, I would argue that single TM insertion by tethering to Lep is insufficient to understand MP insertion/folding, as neighbouring TM helices could help to drive TM6 insertion. Have ER microsome experiments for mouse GnRHR also been carried out in the context of neighbouring helices?

      We included measurements (and predictions) of the impact of the V276T substitution on the translocon-mediated membrane integration of the mouse TMD6 in the context of a chimeric Lep protein (see Fig. S1 & Table S1). Our results reveal that this substitution decreases the efficiency of TMD6 membrane integration by ~10%. Though imperfect, this prevailing biochemical assay remains popular for a variety of theoretical and technical reasons. Importantly, extensive experimental testing of this system has shown that these measurements report apparent equilibrium constants that are well-described by two-state equilibrium partitioning models (see DOIs 10.1038/nature03216 and 10.1038/nature06387). This observation provides a reasonable rationale to interpret these measurements using energetic models as we have in this work (see Table S1). From a technical perspective, the Lep system is also advantageous due to the fact that this protein is generally well expressed in the context of in vitro translation systems containing native membranes, which generally ensures a consistent signal-to-noise ratio and dynamic range for membrane integration measurements. Nevertheless, the reviewers are correct that membrane integration efficiencies are likely distinct in the context of the native mGnRHR protein. For this reason, we attempted to develop a glycosylation-based topology reporter prior to the posting and submission of this manuscript. However, all GnRHR reporters we tested were poorly expressed in vitro and the resulting 35S-labeled proteins only generated faint smears on our phosphorimaging screens that could not be interpreted. We therefore chose to rely on the Lep measurements for these investigations.
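
      Under a two-state partitioning model, a measured insertion fraction can be converted into an apparent free energy via K_app = f / (1 − f) and ΔG_app = −RT ln K_app. The sketch below illustrates that conversion only. The insertion fractions (0.60 and 0.50) are hypothetical stand-ins for a roughly ten-point drop and are not the values in Table S1, the default temperature of 30 °C is an assumption about the in vitro translation conditions, and the function name is invented for this example.

```python
import math

R_KCAL = 0.001987  # gas constant in kcal/(mol*K)


def delta_g_app(fraction_inserted, temperature_k=303.15):
    """Apparent free energy of membrane integration from an insertion fraction.

    Uses the two-state relation K_app = f / (1 - f) and
    dG_app = -R * T * ln(K_app); negative values favor insertion.
    """
    if not 0.0 < fraction_inserted < 1.0:
        raise ValueError("fraction_inserted must lie strictly between 0 and 1")
    k_app = fraction_inserted / (1.0 - fraction_inserted)
    return -R_KCAL * temperature_k * math.log(k_app)


# Hypothetical insertion fractions for a reference and a mutant segment.
dg_reference = delta_g_app(0.60)
dg_mutant = delta_g_app(0.50)
print(f"ddG_app = {dg_mutant - dg_reference:+.2f} kcal/mol")
```

      Because the relation is logarithmic, the same absolute change in insertion fraction corresponds to a larger ΔΔG_app near 0% or 100% insertion than near 50%.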

      The lack of a more relevant topological reporter is one of many challenges we faced in our investigations of this unstable, poorly behaved protein. We share the reviewer’s frustrations concerning the speculative aspects of this work. Nevertheless, there is increasing appreciation for the fact that our perspectives on protein biophysics have been skewed by our continuing choice to focus on the relatively small set of model proteins that are compatible with our favored methodologies (doi: 10.1016/j.tibs.2013.05.001). We humbly suggest this work represents an example of how we can gain a deeper understanding of the limits of biochemical systems when we instead choose to study the unsavory bits of cellular proteomes. But this choice requires a willingness to make some reasonable assumptions and to lean on energetic/ structural modeling from time to time. Despite this limitation, we believe there is still tremendous value in this compromise.

      What is the experimental evidence that the W107A variant affects the protein structure? Has its melting temperature with and without inverse agonist binding for WT vs the W107A variant been measured, for example? Even heat-FSEC of detergent-solubilised membranes would be informative to know how unstable the W107A variant is. If it is very unstable in detergent, then it could be that recovery mutants are going to be unlikely as you are already starting with a poor construct showing poor folding/localisation.

      We again understand the rationale for this concern, but do not believe that thermal melting measurements are likely to report the same sorts of conformational transitions involved in cellular misfolding. Heating up a protein to the point at which membranes (or micelles) are disrupted and the proteins begin to form insoluble aggregates is a distinct physical process from those that occur during co- and post-translational folding within intact ER membranes at physiological temperatures (discussed further in the Response to the Reviews). Indeed, as the reviewer points out below, there seems to be little evidence that secretion is linked to thermal stability or various other metrics that others have attempted to optimize for the sake of purification and/ or structural characterization. Thus, we believe it would be just as speculative to suggest thermal aggregation represents a relevant metric for the propensity of membrane proteins to fold in the cell. The physical interpretation of membrane protein misfolding reactions remains contentious in our field due to the key fact that the denatured states of helical membrane proteins remain highly structured in a manner that is hard to generalize beyond the fact that the denatured states retain α-helical secondary structure (doi: 10.1146/annurev-biophys-051013-022926). This is in stark contrast to soluble proteins, where random coil reference states have proven to be generally useful for energetic interpretations of protein stability. For reference, our lab is currently working to leverage epistatic measurements like this to map the prevailing physiological denatured states of an integral membrane protein. Our current findings suggest that non-native electrostatic interactions form in the context of misfolded states. We hope that more information on the structural aspects of these states will help us to develop and interpret meaningful folding measurements within the membrane.

      Even in cases when quantitative folding measurements can be achieved, their relevance remains actively debated. As a point of reference, the corresponding author of this work previously worked on the stability and misfolding of another human α-helical membrane protein (PMP22). Like GnRHR, PMP22 is prone to misfolding in the secretory pathway and is associated with dozens of pathogenic mutations that cause protein misfolding. To understand how the thermodynamic stability of this protein is linked to secretion, the corresponding author purified PMP22, reconstituted it into n-Dodecyl-phosphocholine (DPC) micelles, and measured its resistance to denaturation by an anionic denaturing detergent (Lauryl Sarcosine, LS). The results were initially perplexing due to the fact that equilibrium unfolding curves manifested as an exponential decay (rather than a sigmoid) and relaxation kinetics appeared to be dominated by the rate constant for unfolding (doi: 10.1021/bi301635f). Unfortunately, these data could not be fit with existing folding models due to the lack of a folded protein baseline and the absence of a folding arm in the chevron plot. We eventually found that a full sigmoidal unfolding transition and refolding kinetics could be measured upon addition of 15% (v/v) glycerol. Our measurements revealed that the free energy of unfolding in DPC micelles was 0 kcal/ mol (without glycerol). This shocking lack of WT stability made it impossible to directly measure the effects of destabilizing mutations that enhance misfolding: you can’t measure the unfolding of a protein that is already unfolded. We ultimately had to instead infer the energetic effects of such mutations from the thermodynamic coupling between cofactor binding and folding (doi: 10.1021/jacs.5b03743).
Finally, after demonstrating the resulting ΔΔGs correlated with both cellular trafficking and disease phenotype, we still faced justified scrutiny about the relevance of these measurements due to the fact that they were carried out in micelles. For these reasons, we do not feel that additional biophysical measurements will add much to this work until more is understood about the nature of misfolding reactions in the membrane and how to effectively recapitulate it in vitro. We also note that PMP22 is secreted with 20% efficiency in mammalian cell lines, which is 20-fold more efficient than human GnRHR under similar conditions (doi: 10.1016/j.celrep.2021.110046). Thus, we suspect equilibrium unfolding measurements are likely out of reach using previously described measurements.

      Our strongest evidence that W107A destabilizes the protein is that it deletes a highly conserved structural contact and that this structural modification kills its secretion. The fact that this mutation clearly reduces the escape of GnRHR from ER quality control is a classic indicator of misfolding that represents the cell’s way of telling us that the mutation compromises the folding of the nascent protein in some way or another. Precisely how this mutation remodels the conformational ensemble of nascent GnRHR and how this relates to the free energy difference between the native and non-native portions of its conformational ensemble under cellular conditions is a much more challenging question that lies beyond the scope of this investigation (and likely beyond the scope of what’s currently possible). Indeed, there is an entire field dedicated to understanding such questions. Nevertheless, the difference in the epistatic interactions formed by W107A and V276T is at the very least consistent with our speculative interpretation that these two mutations vary in their misfolding mechanism and/ or in the extent to which they destabilize the protein. For these reasons, we feel the main conclusions of this manuscript are well-justified.

      Please clarify if the protein is glycosylated or not and, if it is, how would this requirement affect the conclusions of your analysis?

      As we noted in the Response to the Reviewers, which also constitutes a published portion of the final manuscript, this protein is indeed glycosylated. We were well aware of this aspect of the protein since the inception of this project and do not think this changes our interpretation at all. Most membrane proteins are glycosylated, and several groups have demonstrated in various ways that the secretion efficiency of glycoproteins is proportional to certain stability metrics for secreted soluble proteins and membrane proteins alike. Generally, mutations that enhance misfolding do not change the propensity of the nascent chain to undergo N-linked glycosylation, which occurs during translation before protein synthesis and/ or folding is complete. Misfolded proteins typically carry lower weight glycans, which reflects their failure to advance from the ER to the Golgi, where N-linked glycans are modified and O-linked glycans are added. From our perspective, glycosyl modifications just ensure that nascent proteins are engaged by calnexin and other lectin chaperones involved in QC. They do not decouple folding from secretion efficiency. In the case of PMP22 (described above), we found that removal of its glycosylation site allows the nascent protein to bypass the lectin chaperones in a manner that enhances its plasma membrane expression eight-fold (doi: 10.1016/j.jbc.2021.100719). Similar to WT, the expression of several misfolded PMP22 variants also significantly increases upon removal of the glycosylation site. Nevertheless, their expression is still significantly lower than that of the un-glycosylated WT protein, and the expression patterns of the mutants relative to WT were quite similar across this panel of un-glycosylated proteins. Thus, while glycosylation certainly impacts secretion, it does not change its dependence on folding efficiency within the ER.
There are many layers of partially redundant QC within the ER, and it seems that folding imposes a key bottleneck to secretion regardless of which QC proteins are involved. For these reasons, we do not think glycosylation (or other PTMs) should factor into our interpretation of these results.

      One caveat with the study is that there is a poor understanding of the factors that decide if the protein should be trafficked to the PM or not. Even secretory proteins not going through the calnexin/calreticulin cycle (as they have no N-linked glycans) might still get stuck in the ER, despite the fact that they are functional. Could this be a technical issue of heterologous expression overloading the Sec system?

      While we agree that there is much to be learned about this topic, we disagree with the notion that our understanding of folding and secretion is insufficient to generally interpret the molecular basis of the observed trends. In collaboration with various other groups, the corresponding author of this paper has shown for several other proteins that the stability of the native topology and the native tertiary structure can constrain secretion efficiency (see dois: 10.1021/jacs.8b08243, 10.1021/jacs.5b03743, and 10.1016/j.jbc.2021.100423). Moreover, the Balch and Kelly groups demonstrated many years ago that relatively simple models for the coupling between folding and chaperone binding can recapitulate the observed effects of mutations on the secretion efficiency of various proteins (doi: 10.1016/j.cell.2007.10.025). Given a wide body of prevailing knowledge in this area, we believe it is entirely reasonable to assume that the conformational effects of these mutations have a dominant effect on plasma membrane expression.

      Whether or not some of the proteins retained in the ER are folded and/or functional is an interesting question, but is outside the scope of this work. Various lines of evidence concerning approaches to rescue misfolded membrane proteins suggest many of these variants are likely to retain residual function once they escape the ER, which may suggest there are pockets of foldable/folded proteins within the ER. But it seems generally clear that the efficiency of folding in the ER bottlenecks secretion regardless of whether or not the ER contains some fraction of folded/functional protein. We note that it is certainly possible, if not likely, that secretion efficiency is higher at lower expression levels (doi: 10.1074/jbc.AC120.014940). However, the mutational scanning platform used in this work was designed such that all variants are expressed from an identical promoter at the same location within the genome. Thus, for the purposes of these investigations, we believe it is entirely fair to draw “apples-to-apples” comparisons of their relative effects on plasma membrane expression.

      Please see Frances Arnold's paper on this point and their mutagenesis library of the channelrhodopsin (https://www.pnas.org/doi/10.1073/pnas.1700269114), which further found that 20% of mutations improved WT trafficking. Some general comparisons to this paper might be informative.

      We agree that it may be interesting to compare the results from this paper to those in our own. Indeed, we find that 20% of the point mutations characterized herein also enhance the expression of WT mGnRHR, as mentioned in the Results section. However, we think it might be a bit premature to suggest this is a more general trend in light of the fact that the channelrhodopsins engineered in those studies were not of eukaryotic origin and have likely resulted from distinct evolutionary constraints. We ultimately decided against adding more on this to our already lengthy discussion in order to maintain focus on the mechanisms of epistasis.

      Chris Tate and others have shown that there is a high frequency of finding stabilising point mutations in GPCRs and this is the premise of the StaR technology used to thermostabilise GPCRs in the presence of different ligands, i.e. agonists vs. inverse agonists. As far as I am aware, there is a poor correlation between expression levels and thermostability (measured by ligand binding to detergent-solubilised membranes). As such, it is possible that some of the mutants might be more stable than WT even though they have lower levels of PME.

      We believe the disconnect between thermostability and expression precisely speaks to our main point about the suitability of current membrane protein folding assays for the questions we address herein. The degradative activity of ER quality control has not necessarily selected for proteins that are resistant to thermal degradation and/or are suitable for macromolecular crystallography. For this reason, it is often not so difficult to engineer proteins with enhanced thermal stability. We do not believe this disconnect signals that quality control is insensitive to protein folding and stability, but rather that it is more likely to recognize conformational defects that are distinct from those involved in thermal degradation and/or aggregation. Indeed, recent work from the Fluman group, which builds on a wider body of previous observations, has shown that the exposure of polar groups within the membrane is a key factor that recruits degradation machinery (doi: 10.1101/2023.12.12.571171). It is hard to imagine that these sorts of conformational defects are the same as those involved in thermal aggregation.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe that by focusing more on the epistasis with V276T, and less on W107A, the paper could be strengthened significantly.

      We appreciate this sentiment. But we believe the comparison of these two mutants really drives home the point that destabilizing mutations are not equivalent with respect to the epistatic interactions they form.

      (2) In the abstract - please define the term epistasis in a simple way, to make it accessible to a general audience. For example - negative epistasis means that... this should be explicitly explained.

      We thank the reviewer for this suggestion. To meet eLife formatting, we had to cut down the abstract significantly. We simplified this as best we could in the following statement:

      “Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations.”

      We also define positive and negative epistasis in the results section as follows:

      “Positive Ɛ values denote double mutants that have greater PME than would be expected based on the effects of single mutants. Negative Ɛ values denote double mutants that have lower PME than would be expected based on the effects of single mutants. Pairs of mutations with Ɛ values near zero have additive effects on PME.”

      (3) The title is quite complex and might deter readers from outside the protein evolution field. Consider simplifying it.

      We thank the reviewer for this suggestion. We have simplified the title to the following:

      “Divergent Folding-Mediated Epistasis Among Unstable Membrane Protein Variants”

      (4) The paper could benefit from a simple figure explaining the different stages of membrane protein folding (stages 1+2) to make it more accessible to readers from outside the membrane protein field.

      This is a great suggestion. We incorporated a new schematic in the revised manuscript that outlines the nature of these processes (see Fig. 1A in the revised manuscript).

      (5) For the FACS-Seq experiment - it was not clear to me if and when all cells are pooled together. For example - are the 3 libraries mixed together already at the point of transfection, or are the transfected cells pooled together at any point before sorting? This could have some implications on batch effects and should, therefore, be explicitly mentioned in the main text.

      We thank the reviewer for this suggestion. We modified the description of the DNA library assembly to emphasize that the mutations were generated in the context of three mixed plasmid pools, which were then transfected into the cells and sorted independently:

      “We then generated a mixed array of mutagenic oligonucleotides that collectively encode this series of substitutions (Table S3) and used nicking mutagenesis to introduce these mutations into the V276T, W107A, and WT mGnRHR cDNAs (Medina-Cucurella et al., 2019), which produced three mixed plasmid pools.”

      (6) The following description in the text is quite confusing. It would be better to simplify it considerably or remove it: "scores (Ɛ) were then determined by taking the log of the double mutant fitness value divided by the difference between the single mutant fitness values (see Methods)."

      We thank the reviewer for this valuable feedback and have simplified the text as follows:

      “To compare epistatic trends in these libraries, we calculated epistasis scores (Ɛ) for the interactions that these 251 mutations form with V276T and W107A by comparing their relative effects on PME of the WT, V276T, and W107A variants using a previously described epistasis model (product model, see Methods) (Olson et al. 2014).”
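
      As a rough illustration of the product model mentioned in the revised text, the sketch below computes an epistasis score from PME values normalized to WT. The function name, the use of the natural logarithm, and the example values are assumptions made for illustration; the authors' exact definition is given in their Methods.

```python
import math

def epistasis_score(pme_double, pme_single_a, pme_single_b, pme_wt):
    """Product-model epistasis score (hypothetical helper, not the authors' code).

    Each PME value is normalized to WT. Under the product model, the expected
    relative PME of the double mutant is the product of the two single-mutant
    values, so the score is the log of observed over expected.
    """
    w_ab = pme_double / pme_wt
    w_a = pme_single_a / pme_wt
    w_b = pme_single_b / pme_wt
    return math.log(w_ab / (w_a * w_b))

# Made-up values: two single mutants that each halve PME.
additive = epistasis_score(0.25, 0.5, 0.5, 1.0)   # double mutant matches expectation
positive = epistasis_score(0.50, 0.5, 0.5, 1.0)   # double mutant better than expected
negative = epistasis_score(0.10, 0.5, 0.5, 1.0)   # double mutant worse than expected
```

      With these toy inputs, the first score is zero, the second is positive, and the third is negative, matching the sign convention quoted in the response to point (2).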

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Connelly and colleagues provide convincing genetic evidence that importation from mainland Tanzania is a major source of Plasmodium falciparum lineages currently circulating in Zanzibar. This study also reveals ongoing local malaria transmission and occasional near-clonal outbreaks in Zanzibar. Overall, this research highlights the role of human movements in maintaining residual malaria transmission in an area targeted for intensive control interventions over the past decades and provides valuable information for epidemiologists and public health professionals.

      Reviewer #1 (Public Review):

      Zanzibar archipelago is close to achieving malaria elimination, but despite the implementation of effective control measures, there is still a low-level seasonal malaria transmission. This could be due to the frequent importation of malaria from mainland Tanzania and Kenya, reservoirs of asymptomatic infections, and competent vectors. To investigate population structure and gene flow of P. falciparum in Zanzibar and mainland Tanzania, they used 178 samples from mainland Tanzania and 213 from Zanzibar that were previously sequenced using molecular inversion probes (MIPs) panels targeting single nucleotide polymorphisms (SNPs). They performed Principal Component Analysis (PCA) and identity by descent (IBD) analysis to assess genetic relatedness between isolates. Parasites from coastal mainland Tanzania contribute to the genetic diversity in the parasite population in Zanzibar. Despite this, there is a pattern of isolation by distance and microstructure within the archipelago, and evidence of local sharing of highly related strains sustaining malaria transmission in Zanzibar that are important targets for interventions such as mass drug administration and vector control, in addition to measures against imported malaria.

      Strengths:

      This study presents important samples to understand population structure and gene flow between mainland Tanzania and Zanzibar, especially from the rural Bagamoyo District, where malaria transmission persists and there is a major port of entry to Zanzibar. In addition, this study includes a larger set of SNPs, providing more robustness for analyses such as PCA and IBD. Therefore, the conclusions of this paper are well supported by data.

      Weaknesses:

      Some points need to be clarified:

      (1) SNPs in linkage disequilibrium (LD) can introduce bias in PCA and IBD analysis. Were SNPs in LD filtered out prior to these analyses?

      Thank you for this point. We did not filter SNPs in LD prior to this analysis. In the PCA analysis in Figure 1, we did restrict to a single isolate among those that were clonal (high IBD values) to prevent bias in the PCA. In general, without selective forces at play, disequilibrium extends only over small distances (<5-10 kb). This is much less than the average spacing of the markers in the panel. With minimal LD, the conclusions drawn on relative levels and connections at high IBD are unlikely to be confounded by any effects of disequilibrium.

      (2) Many IBD algorithms do not handle polyclonal infections well, despite an increasing number of algorithms that are able to handle polyclonal infections and multiallelic SNPs. How were polyclonal samples handled for IBD analysis?

      Thank you for this point. We added lines 157-161 to clarify. This section now reads:

      “To investigate genetic relatedness of parasites across regions, identity by descent (IBD) estimates were assessed using the within sample major alleles (coercing samples to monoclonal by calling the dominant allele at each locus) and estimated utilizing a maximum likelihood approach using the inbreeding_mle function from the MIPanalyzer package (Verity et al., 2020). This approach has previously been validated as a conservative estimate of IBD (Verity et al., 2020).”

      Please see the supplement in (Verity et al., 2020) for an extensive simulation study that validates this approach.
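
      The coercion step described in the quoted passage can be sketched as follows. This is a minimal illustration and not the MIPanalyzer implementation: the function name, the input layout (one within-sample alternate-allele frequency per locus, `None` for missing calls), and the handling of exact ties are all assumptions.

```python
def call_major_alleles(wsaf):
    """Coerce one sample to a monoclonal genotype by calling the dominant allele.

    wsaf: list of within-sample alternate-allele frequencies (0..1), with None
    where the locus was not genotyped. Returns 1 (alternate), 0 (reference),
    or None (missing) per locus. A tie at exactly 0.5 is called reference,
    which is an arbitrary choice for this sketch.
    """
    return [None if f is None else (1 if f > 0.5 else 0) for f in wsaf]

# Hypothetical polyclonal sample typed at five loci.
genotype = call_major_alleles([0.9, 0.2, None, 0.5, 1.0])
```

      The resulting per-locus calls would then be passed to the IBD estimator in place of the mixed within-sample frequencies.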

      Reviewer #1 (Recommendations For The Authors):

      (3) I think Supplementary Figures 8 and 9 are more visually informative than Figure 2.

      Thank you for your response. We performed the analysis in Figure 2 to show how IBD varies between different regions and is higher within a region than between.

      Reviewer #2 (Public Review):

      This manuscript describes P. falciparum population structure in Zanzibar and mainland Tanzania. 282 samples were typed using molecular inversion probes. The manuscript is overall well-written and shows a clear population structure. It follows a similar manuscript published earlier this year, which typed a similar number of samples collected mostly in the same sites around the same time. The current manuscript extends this work by including a large number of samples from coastal Tanzania, and by including clinical samples, allowing for a comparison with asymptomatic samples.

      The two studies made overall very similar findings, including strong small-scale population structure, related infections on Zanzibar and the mainland, near-clonal expansion on Pemba, and frequency of markers of drug resistance. Despite these similarities, the previous study is mentioned a single time in the discussion (in contrast, the previous research from the authors of the current study is more thoroughly discussed). The authors missed an opportunity here to highlight the similar findings of the two studies.

      Thank you for your insights. We appreciated the level of detail of your review, and it strengthened our work. We have added sentences on lines 292-295, which now read:

      “A recent study investigating population structure in Zanzibar also found local population microstructure in Pemba (Holzschuh et al., 2023). Further, both studies found near-clonal parasites within the same district, Micheweni, and found population microstructure over Zanzibar.”

      Strengths:

      The overall results show a clear pattern of population structure. The finding of highly related infections detected in close proximity shows local transmission and can possibly be leveraged for targeted control.

      Weaknesses:

      A number of points need clarification:

      (1) It is overall quite challenging to keep track of the number of samples analyzed. I believe the number of samples used to study population structure was 282 (line 141), thus this number should be included in the abstract rather than 391. It is unclear where the number 232 on line 205 comes from; I failed to deduce this number from supplementary table 1.

      Thank you for this point. We have included 282 instead of 391 in the abstract. We added a statement in the results at lines 203-205 to clarify this point, which now reads:

      “PCA analysis of 232 coastal Tanzanian and Zanzibari isolates, after pruning 51 samples with an IBD of greater than 0.9 to one representative sample, demonstrates little population differentiation (Figure 1A).”

      (2) Also, Table 1 and Supplementary Table 1 should be swapped. It is more important for the reader to know the number of samples included in the analysis (as given in Supplementary Table 1) than the number collected. Possibly, the two tables could be combined in a clever way.

      Thank you for this advice. Rather than switch to another table altogether, we appended two columns to the original table to better portray the information (see Table 1).

      Methods

      (3) The authors took the somewhat unusual decision to apply K-means clustering to GPS coordinates to determine how to combine their data into a cluster. There is an obvious cluster on Pemba islands and three clusters on Unguja. Based on the map, I assume that one of these three clusters is mostly urban, while the other two are more rural. It would be helpful to have a bit more information about that in the methods. See also comments on maps in Figures 1 and 2 below.

      Cluster 3 is a mix of rural and urban areas, while clusters 2, 4 and 5 are mostly rural. This analysis was performed to see how IBD changes in relation to local context within different regions in Zanzibar, showing that there is higher IBD within a locale than between locales.

      (4) Following this point, in Supplemental Figure 5 I fail to see an inflection point at K=4. If there is one, it will be so weak that it is hardly informative. I think selecting 4 clusters in Zanzibar is fine, but the justification based on this figure is unclear.

      The K-means clustering experiment was used to cluster a continuous space of geographic coordinates in order to compare genetic relatedness in different regions. We selected this inflection point based on the elbow plot and chose the number of clusters to obtain sufficient subsections of Zanzibar to compare genetic relatedness. This point is added to the methods at lines 174-178, which now read:

      “The K-means clustering experiment was used to cluster a continuous space of geographic coordinates in order to compare genetic relatedness in different regions. We selected K = 4 as the inflection point based on the elbow plot (Supplemental Figure 5) and based the number to obtain sufficient subsections of Zanzibar to compare genetic relatedness.”
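
      The elbow-plot reasoning above can be sketched with a small standard-library K-means. This is an illustration only: the coordinates are made-up toy points rather than shehia centroids, latitude and longitude are treated as planar, and the deterministic farthest-point initialization is an assumption of this sketch, not the authors' procedure.

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization.

    Returns (centroids, labels, wss), where wss is the within-cluster sum of
    squared distances used on the y-axis of an elbow plot.
    """
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    # Start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(d2(p, c) for c in centroids)))

    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: d2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    wss = sum(d2(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss

# Toy coordinates forming two well-separated groups (not real data).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
wss_by_k = {k: kmeans(toy_points, k)[2] for k in range(1, 5)}
```

      Plotting `wss_by_k` against K gives the elbow curve; in practice the bend can be faint, as the reviewer notes, so the choice of K may also rest on practical grounds such as obtaining enough subregions to compare.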

      (5) For the drug resistance loci, it is stated that "we further removed SNPs with less than 0.005 population frequency." Was the denominator for this analysis the entire population, or were Zanzibar and mainland samples assessed separately? If the latter, as for all markers <200 samples were typed per site, there could not be a meaningful way of applying this threshold. Given data were available for 200-300 samples for each marker, does this simply mean that each SNP needed to be present twice?

      Population frequency is calculated as the average within-sample allele frequency across all individuals in the population, which is an unbiased estimator. Within-sample allele frequency can range from 0 to 1. Thus, if only one sample out of 100 has an allele and it is at 0.1 within-sample frequency, the population allele frequency would be 0.1/100 = 0.001. This allele is removed even though this would have resulted in a prevalence of 0.01. This filtering is prior to any final summary frequency or prevalence calculations (see MIP variant Calling and Filtering section in the methods). This protects against errors occurring only at low frequency.
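
      The arithmetic in this reply can be reproduced with a short sketch. The function names and the convention that an allele is kept when its population frequency is at least 0.005 are assumptions for illustration; the cohort of 100 samples mirrors the worked example in the text.

```python
def population_frequency(wsaf_by_sample):
    """Mean within-sample allele frequency across all genotyped samples."""
    return sum(wsaf_by_sample) / len(wsaf_by_sample)

def prevalence(wsaf_by_sample):
    """Fraction of samples carrying the allele at any within-sample frequency."""
    return sum(1 for f in wsaf_by_sample if f > 0) / len(wsaf_by_sample)

MIN_POP_FREQ = 0.005  # threshold quoted in the reviewer's question

# One of 100 samples carries the allele at a within-sample frequency of 0.1.
wsaf = [0.1] + [0.0] * 99
freq = population_frequency(wsaf)   # about 0.001
prev = prevalence(wsaf)             # 0.01
keep = freq >= MIN_POP_FREQ         # the allele is filtered out
```

      The example shows why the threshold is not equivalent to requiring the SNP to be present in two samples: a single low-frequency call contributes little to the population frequency even though it counts fully toward prevalence.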

      Discussion:

      (6) I was a bit surprised to read the following statement, given Zanzibar is one of the few places that has an effective reactive case detection program in place: "Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020)." I think the current RACD program should be mentioned and referenced. A number of studies have investigated this program.

      Thank you for this point. We have added additional context and clarification on lines 275-280, which now reads:

      “Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020). Currently, a reactive case detection program within index case households is being implemented, but local transmission continues and further investigation into how best to control this is warranted (Mkali et al. 2023).”

      (7) The discussion states that "In Zanzibar, we see this both within and between shehias, suggesting that parasite gene flow occurs over both short and long distances." I think the term 'long distances' should be better defined. Figure 4 shows that highly related infections rarely span beyond 20-30 km. In many epidemiological studies, this would still be considered short distances.

      Thank you for this point. We have edited the text at lines 287-288 to indicate that highly related parasites mainly occur at the range of 20-30km, which now reads:

      “In Zanzibar, highly related parasites mainly occur at the range of 20-30km.”

      (8) Lines 330-331: "Polymorphisms associated with artemisinin resistance did not appear in this population." Do you refer to background mutations here? Otherwise, the sentence seems to repeat lines 324. Please clarify.

      We are referring to the list of Pfk13 polymorphisms stated in the Methods from lines 146-148. We added clarifying text on lines 326-329:

      “Although polymorphisms associated with artemisinin resistance did not appear in this population, continued surveillance is warranted given emergence of these mutations in East Africa and reports of rare resistance mutations on the coast consistent with spread of emerging Pfk13 mutations (Moser et al., 2021).”

      (9) Line 344: The opinion paper by Bousema et al. in 2012 was followed by a field trial in Kenya (Bousema et al., 2016) that found that targeting hotspots did NOT have an impact beyond the actual hotspot. This and other more recent findings need to be considered when arguing for hotspot-targeted interventions in Zanzibar.

      We added a clarification on this point on lines 335-345, which now reads:

      “A recent study identified “hotspot” shehias, defined as areas with comparatively higher malaria transmission than other shehias, near the port of Zanzibar town and in northern Pemba (Bisanzio et al., 2023). These regions overlapped with shehias in this study with high levels of IBD, especially in northern Pemba (Figure 4). These areas of substructure represent parasites that differentiated in relative isolation and are thus important locales to target intervention to interrupt local transmission (Bousema et al., 2012). While a field cluster-randomized control trial in Kenya targeting these hotspots did not confer much reduction of malaria outside of the hotspot (Bousema et al. 2016), if areas are isolated pockets, which genetic differentiation can help determine, targeted interventions in these areas are likely needed, potentially through both mass drug administration and vector control (Morris et al., 2018; Okell et al., 2011). Such strategies and measures preventing imported malaria could accelerate progress towards zero malaria in Zanzibar.”

      Figures and Tables:

      (10) Table 2: Why not enter '0' if a mutation was not detected? 'ND' is somewhat confusing, as the prevalence is indeed 0%.

      Thank you for this point. We now report zero and also give confidence intervals to provide more detail.

      (11) Figure 1: Panel A is very hard to read. I don't think there is a meaningful way to display a 3D-panel in 2D. Two panels showing PC1 vs. PC2 and PC1 vs. PC3 would be better. I also believe the legend 'PC2' is placed in the wrong position (along the Y-axis of panel 2).

      Supplementary Figure 2B suffers from the same issue.

      Thank you for your comment. A revised Figure 1 and Supplemental Figure 2 are included, where there are separate plots for PC1 vs. PC2 and PC1 vs. PC3.

      (12) The maps for Figures 1 and 2 don't correspond. Assuming Kati represents cluster 4 in Figure 2, the name is put in the wrong position. If the grouping of shehias is different between the Figures, please add an explanation of why this is.

      Thank you for this point. The districts with at least 5 samples present are plotted in the map in Figure 1B. In Figure 2, a totally separate analysis was performed, where all shehias were clustered into separate groups with k-means and the IBD values were compared between these clusters. These maps are not supposed to match, as they are separate analyses. Figure 1B is at the district level and Figure 2 is clustering shehias throughout Zanzibar.

      The figure legend of Figure 1B on lines 410-414 now reads:

      “B) A Discriminant Analysis of Principal Components (DAPC) was performed utilizing isolates with unique pseudohaplotypes, pruning highly related isolates to a single representative infection. Districts were included with at least 5 isolates remaining to have sufficient samples for the DAPC. For plotting the inset map, the district coordinates (e.g. Mainland, Kati, etc.) are calculated from the averages of the shehia centroids within each district.”

      The figure legend of Figure 2 on lines 417-425 now reads:

      “Figure 2. Coastal Tanzania and Zanzibari parasites have more highly related pairs within their given region than between regions. K-means clustering of shehia coordinates was performed using geographic coordinates all shehias present from the sample population to generate 5 clusters (colored boxes). All shehias were included to assay pairwise IBD between differences throughout Zanzibar. Pairwise comparisons of within cluster IBD (column 1 of IBD distribution plots) and between cluster IBD (column 2-5 of IBD distribution plots) was done for all clusters. In general, within cluster IBD had more pairwise comparisons containing high IBD identity.”

      (13) Figure 2: In the main panel, please clarify what the lines indicate (median and quartiles?). It is very difficult to see anything except the outliers. I wonder whether another way of displaying these data would be clearer. Maybe a table with medians and confidence intervals would be better (or that data could be added to the plots). The current plots might be misleading as they are dominated by outliers.

      Thank you for this point; it greatly improved this figure. We now use a beeswarm plot, which shows all pairwise IBD values within each comparison group.

      (14) In the insert, the cluster number should not only be given as a color code but also added to the map. The current version will be impossible to read for people with color vision impairment, and it is confusing for any reader as the numbers don't appear to follow any logic (e.g. north to south).

      Thank you very much for these considerations. We changed the color coding to a color-blind-friendly palette and renamed the clusters to more informative names: Pemba, Unguja North (Unguja_N), Unguja Central (Unguja_C), Unguja South (Unguja_S) and mainland Tanzania (Mainland).

      (15) The legend for Figure 3 is difficult to follow. I do not understand what the difference in binning was in panels A and B compared to C.

      Thank you for this point. We have edited the legend to reflect these changes. The legend for Figure 3 on lines 427-433 now reads:

      “Figure 3. Isolation by distance is shown between all Zanzibari parasites (A), only Unguja parasites (B) and only Pemba parasites (C). Samples were analyzed based on geographic location, Zanzibar (N=136) (A), Unguja (N=105) (B) or Pemba (N=31) (C) and greater circle (GC) distances between pairs of parasite isolates were calculated based on shehia centroid coordinates. These distances were binned at 4km increments out to 12 km. IBD beyond 12km is shown in Supplemental Figure 8. The maximum GC distance for all of Zanzibar was 135km, 58km on Unguja and 12km on Pemba. The mean IBD and 95% CI is plotted for each bin.”
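
      The distance calculation and binning described in this legend can be sketched as follows. The haversine formula, the Earth radius constant, the function names, and the example values are assumptions of this sketch; the legend itself specifies only great-circle distances between shehia centroids, 4 km bins, and a 12 km cutoff.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def mean_ibd_by_bin(pairs, bin_km=4.0, max_km=12.0):
    """Average IBD in fixed-width distance bins.

    pairs: iterable of (distance_km, ibd). Pairs beyond max_km are skipped.
    Returns {bin_start_km: mean_ibd} for the non-empty bins.
    """
    sums, counts = {}, {}
    for dist, ibd in pairs:
        if dist > max_km:
            continue
        # A distance exactly at max_km is placed in the last bin.
        start = min(int(dist // bin_km) * bin_km, max_km - bin_km)
        sums[start] = sums.get(start, 0.0) + ibd
        counts[start] = counts.get(start, 0) + 1
    return {s: sums[s] / counts[s] for s in sorted(sums)}

# Made-up pairwise values: (distance in km, IBD).
binned = mean_ibd_by_bin([(1.0, 0.2), (3.0, 0.4), (5.0, 0.1), (20.0, 0.9)])
```

      Here the pair at 20 km is dropped, the two pairs under 4 km are averaged in the first bin, and the pair at 5 km falls in the second bin. A confidence interval per bin, as plotted in the figure, would be computed separately.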

      (16) Font sizes for panel C differ, and it is not aligned with the other panels.

      Thank you for pointing this out. Figure 3 and Supplemental Figure 10 are adjusted with matching formatting for each plot.

      (17) Why is Kusini included in Supplemental Figure 4, but not in Figure 1?

      In Supplemental Figure 4, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection. That is why there are additional isolates in Kusini. The legend for Supplemental Figure 4 now reads:

      “Supplemental Figure 4. PCA with highly related samples shows population stratification radiating from coastal Mainland to Zanzibar. PCA of 282 total samples was performed using whole sample allele frequency (A) and DAPC was performed after retaining samples with unique pseudohaplotypes in districts that had 5 or more samples present (B). As opposed to Figure 1, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection.”

      (18) Supplemental Figures 6 and 7: What does the width of the line indicate?

      The sentence below was added to the figure legends of Supplemental Figures 6 and 7 and the legends of each network plot were increased in size:

      “The width of each line represents higher magnitudes of IBD between pairs.”

      (19) What was the motivation not to put these lines on the map, as in Figure 4A? This might make it easier to interpret the data.

      Thank you for this comment. For Supplemental Figures 8 and 9, we omitted the lines that represent lower pairwise IBD in order to draw the reader's attention to the highly related pairs between and within shehias.

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a rather long paragraph (lines 300-323) on COI of asymptomatic infections and their genetic structure. Given that the current study did not investigate most of the hypotheses raised there (e.g. immunity, expression of variant genes), and the overall limited number of asymptomatic samples typed, this part of the discussion feels long and often speculative.

      Thank you for your perspective. The key sections highlighted in this comment, regarding immunity and expression of variant genes, were shortened. This section on lines 300-303 now reads:

      “Asymptomatic parasitemia has been shown to be common in falciparum malaria around the globe and has been shown to have increasing importance in Zanzibar (Lindblade et al., 2013; Morris et al., 2015). What underlies the biology and prevalence of asymptomatic parasitemia in very low transmission settings where anti-parasite immunity is not expected to be prevalent remains unclear (Björkman & Morris, 2020).”

      (2) As a detail, line 304 mentions "few previous studies" but only one is cited. Are there studies that investigated this and found opposite results?

      Thank you for this comment. We added additional studies that did not find an association between clinical disease and COI. These changes are on lines 303-308, which now reads:

      “Similar to a few previous studies, we found that asymptomatic infections had a higher COI than symptomatic infections across both the coastal mainland and Zanzibar parasite populations (Collins et al., 2022; Kimenyi et al., 2022; Sarah-Matio et al., 2022). Other studies have found lower COI in severe vs. mild malaria cases (Robert et al., 1996) or no significant difference between COI based on clinical status (Earland et al. 2019; Lagnika et al. 2022; Conway et al. 1991; Kun et al. 1998; Tanabe et al. 2015)”

      (3) Table 2: Percentages need to be checked. To take one of several examples, for Pfk13-K189N a frequency of 0.019 for the mutant allele is given among 137 samples. 2/137 equals to 0.015, and 3/137 to 0.022. 0.019 cannot be achieved. The same is true for several other markers. Possibly, it can be explained by the presence of polyclonal infections. If so, it should be clarified what the total of clones sequenced was, and whether the prevalence is calculated with the number of samples or number of clones as the denominator.

      Thank you for this point. We mistakenly reported allele frequency instead of prevalence. An updated Table 2 is now in the manuscript. The method for calculating the prevalence is now at lines 148-151:

      “Prevalence was calculated separately in Zanzibar or mainland Tanzania for each polymorphism by the number of samples with alternative genotype calls for this polymorphism over the total number of samples genotyped and an exact 95% confidence interval was calculated using the Pearson-Klopper method for each prevalence.”
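      The reviewer's arithmetic and the prevalence definition quoted above can be checked with a short sketch. It is illustrative only and is not the authors' analysis code. The function names are hypothetical, and the exact interval is implemented here as the standard Clopper-Pearson construction solved by bisection, on the assumption that this is the method the manuscript refers to.

```python
from math import ceil, comb, floor


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(f, iters=200):
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_prevalence_ci(k, n, alpha=0.05):
    """Prevalence k/n with an exact (Clopper-Pearson) two-sided interval.

    k: samples carrying the alternative genotype call; n: samples genotyped.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Lower bound solves P(X >= k | p) = alpha/2; it is 0 when k == 0.
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha/2; it is 1 when k == n.
    upper = 1.0 if k == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return k / n, lower, upper


def nearest_achievable(reported, n):
    """Sample-based prevalences (count / n) that bracket a reported value."""
    return floor(reported * n) / n, ceil(reported * n) / n


# Reviewer's example: 0.019 reported among 137 samples.
below, above = nearest_achievable(0.019, 137)
print(round(below, 3), round(above, 3))
print(exact_prevalence_ci(3, 137))
```

      With the sample count as the denominator, a reported value can only be an integer count divided by 137, which is the inconsistency the reviewer identified in the original Table 2.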

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Granados-Aparici et al., investigate somatic-germline interactions in female mice. Mammalian oocytes are nurtured in multi-cellular ovarian follicles and communication with surrounding somatic cells is critical for oocyte development. This study focused on transzonal projections (TZP) extending from granulosa cells to the surface of oocytes and documented the importance of SMAD4, a TGF-β mediator, in regulating the TZPs. They propose a model in which individual TZPs contact the surface of the oocyte and stably attach if there is sufficient N-cadherin. In SMAD4-depleted cells, there is insufficient N-cadherin to stabilize the attachment. The TZP continues to elongate but eventually retracts. Their model is well supported by their experimental evidence and the manuscript is both well-formulated and written.

      Reviewer #2 (Public Review):

      Summary:

      This study proposed a new mechanism by which the TGF-beta signaling pathway promotes contacts between oocytes and the surrounding somatic cells in mice, by regulating the numbers of transzonal projections (TZPs).

      Strengths:

      The conditional Smad4 knockout and three-dimensional observation of transzonal projections are solid and sufficiently support the major conclusions.

      Weaknesses:

      The physiological significance of SMAD4-dependent formation of transzonal projection networks is not assessed in this study.

      Previous studies have shown that physical contact and gap junctional communication with the granulosa cells is essential for normal oocyte development. A recent study has also shown that depleting Myo10 in granulosa cells reduces the number of TZPs and leads to abnormalities in oocyte and embryo development. Thus, the importance of TZPs is well-established. These findings, which were insufficiently brought out in the Introduction of the original manuscript, are now presented more clearly (Introduction, 2nd paragraph). We recognize that these reports do not directly test a role for SMAD4-dependent TZPs. Unfortunately, it is beyond our technical capacity to obtain embryos following meiotic maturation and fertilization of oocytes that have grown in vitro, which would be necessary for us to fully test the physiological role of SMAD4-dependent TZPs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors switch from Amhr2-cre to ER-cre to increase the number of GFP-positive granulosa cells in 12 d/o ovaries. To avoid disruption of FSH secretion by SMAD4, they use an in vitro model that requires 6 days in GEO culture (1 d tamoxifen + 5 d). Could it be that Amhr2-cre didn't work because most follicles would not have reached the atretic preantral stage in 12 d/o ovaries? Did the authors consider 6 days in vitro GEO culture to determine if Amhr2-cre would be efficient and avoid exposure to tamoxifen?

      Please see below.

      When is Amhr2 expressed?

      Previous studies (Jorgez et al, 2004; Pangas et al, 2006) report that Amhr2 is expressed in growing follicles that have progressed beyond a single layer of granulosa cells (often defined as secondary and primary follicles, respectively). As shown in Fig. 1C, we did not observe evidence of widespread Cre activity in multilayer follicles. At least two factors may explain why we observed relatively weak Cre activity. One possibility is that, on the genetic background of our mice, Amhr2 is expressed relatively late during follicular growth. Thus, we might have observed more GFP-positive granulosa cells in antral or pre-ovulatory follicles. Because the granulosa cells of these late-stage follicles would already have produced many TZPs, the number of new TZPs generated in wild-type but not SMAD4-depleted cells after Amhr2 activation would be a relatively small proportion of the total population. This would make it more difficult to detect a reduction in TZP number in the absence of SMAD4.

      A second point is that we used prepuberal mice whereas Jorgez et al examined Amhr2 expression in ovaries of adult mice. Pangas et al evaluated both prepuberal and adult females. It may be that Amhr2 is expressed earlier or more strongly in granulosa cells of adult mice. Regarding the suggestion to culture complexes obtained from mice on the Amhr2-Cre background, as this might allow widespread expression of Cre without the need for tamoxifen, this is an excellent idea. If there is considerable heterogeneity among cells in the timing of Amhr2-Cre activity, though, this may further cloud efforts to uncover the role of SMAD4 in the production or stability of TZPs, as noted above.

      (2) Did most of the GEO cultured in vitro reach the antral follicle stage after 6 days?

      Since GOCs were treated with collagenase, the thecal layer was removed. Therefore, development of an antrum does not occur. We observed that, in some cases, the oocyte was extruded from the granulosa cell mass. These abnormal complexes were discarded.

      (3) Was the development/diameter of the oocyte in the GEO comparable to the oocyte growing in vivo?

      We did not compare the diameter of the oocytes grown in vitro to those grown in vivo. Thus, we cannot say whether the oocytes grown in vitro reached the same size as those grown in vivo. We did, however, compare the diameter of the oocytes in the wt and ko groups and observed no difference (Figure 2). This indicates that depletion of SMAD4 in the granulosa cells does not impair oocyte growth. Importantly for our studies, it excludes the possibility that the reduction in TZP-number is simply due to a smaller surface area of the oocyte.

      (4) SMAD4 depletion in granulosa cells disrupts steroidogenesis leading to increased progesterone levels and precocious luteinization of granulosa cells (Pangas et al., 2006). Did the authors determine the expression level of luteal markers of granulosa cells in the in vitro GEO culture Smad4 knockout model? Are their observations direct effects of the absence of SMAD4?

      This is an excellent point. We checked our previously performed RNA-seq analysis of the wild-type and knockout granulosa cells, but found no difference in the quantities of Cyp11a1, Sfrp4, Star or Ptgfr. This is now described in the Discussion (4th paragraph). One potentially important difference between our study and that of Pangas et al (2006) is that they observed premature luteinization when prepuberal (3-week old) mice were injected with the FSH analogue, equine serum gonadotropin, whereas we studied granulosa-oocyte complexes cultured in vitro. This could underlie the apparent differences with respect to luteinization.

      (5) Could the reduced number of TZPs in ER-cre+; Smad4fl/fl GOCs be explained by luteinization?

      This interesting and logical possibility is related to the previous point. In other words, luteinization could be considered as a default pathway of differentiation that is suppressed by SMAD signaling. It is possible that luteinized cells are unable to generate or maintain TZPs. This model offers a potential mechanistic basis for our observation, and we now raise it in the Discussion (3rd paragraph).

      Reviewer #3 (Recommendations For The Authors):

      The expression and localization of N-cadherin should be observed in Smad4 and control granulosa cell-oocyte complexes.

      We agree that this would be an excellent approach to confirm the decreased expression of N-cadherin in the granulosa cells that was observed by immunoblotting. We were confronted by two challenges, however. First, we were unable to consistently obtain strong staining of granulosa cell membranes in the inner layers of multilayer granulosa-oocyte complexes. Other antibodies are able to stain structures at the oocyte surface, indicating that antibodies are not physically blocked from penetrating the complex. More likely, the anti-N-cadherin does not bind its target strongly enough to generate a robust signal that can be detected through multiple overlying layers of cells. Second, whereas for immunoblotting we collect all granulosa cells from culture complexes, for immunofluorescence we are only able to examine those that remain in the complex. This means that, for immunofluorescence, we essentially but unavoidably select against cells that are only loosely attached – as would be expected for N-cadherin-deficient cells – to their neighbours. Given these challenges, we believe that the immunoblotting approach, which produced highly reproducible results over six biological replicates (Fig. 6), is the most reliable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents useful findings regarding the role of formin-like 2 in mouse oocyte meiosis. The submitted data are supported by incomplete analyses, and in some cases, the conclusions are overstated. If these concerns are addressed, this paper would be of interest to reproductive biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The presented study focuses on the role of formin-like 2 (FMNL2) in oocyte meiosis. The authors assessed FMNL2 expression and localization in different meiotic stages and subsequently, by using siRNA, investigated the role of FMNL2 in spindle migration, polar body extrusion, and distribution of mitochondria and endoplasmic reticulum (ER) in mouse oocytes.

      Strengths:

      Novelty in assessing the role of formin-like 2 in oocyte meiosis.

      Weaknesses:

      Methods are not properly described.

      Overstating presented data.

      It is not clear what statistical tests were used.

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section are not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage.

      Thank you for your comment. We regret the oversight of omitting critical information in the manuscript. In the revised manuscript, we have included essential details such as mouse strains, culture media, oocyte stages, and statistical methods in the Materials and Methods section. Please find our detailed responses in the “Recommendations for the authors” part.

      Reviewer #2 (Public Review):

      Summary:

      This research involves conducting experiments to determine the role of Fmnl2 during oocyte meiosis I.

      Strengths:

      Identifying the role of Fmnl2 during oocyte meiosis I is significant.

      Weaknesses:

      The quantitative analysis and the used approach to perturb FMNL2 function are currently incomplete and would benefit from more confirmatory approaches and rigorous analysis.

      (1) Most of the results are expected. The new finding here is that FMNL2 regulates cytoplasmic F-actin in mouse oocytes, which is also expected given the role of FMNL2 in other cell types. Given that FMNL2 regulates cytoplasmic F-actin, it is very expected to see all the observed phenotypes. It is already established that F-actin is required for spindle migration to the oocyte cortex, extruding a small polar body and normal organelle distribution and functions.

      Thank you for your comment. In recent decades, the Arp2/3 complex (Nat Cell Biol 2011), Formin2 (Nat Cell Biol 2002, Nat Commun 2020), and Spire (Curr Biol 2011) were reported to be three key factors involved in this process. These factors regulate actin filaments in different ways. However, how they interact with each other during these subcellular events was still not fully clear. Our current study identified that FMNL2 played a critical role in coordinating these molecules for actin assembly in oocytes. Our findings demonstrate that FMNL2 interacts with both the Arp2/3 complex and Formin2 to facilitate actin-based meiotic spindle migration. Additionally, we discovered a novel role for FMNL2 in determining the distribution and function of the endoplasmic reticulum and mitochondria, which may in turn influence meiotic spindle migration in oocytes. Our results not only uncover the novel functions of FMNL2-mediated actin for organelle distribution, but also extend our understanding of the molecular basis for the unique meiotic spindle migration in oocyte meiosis.

      (2) The authors used Fmnl2 cRNA to rescue the effect of siRNA-mediated knockdown of Fmnl2. It is not clear how this works. It is expected that the siRNA will also target the exogenous cRNA construct (which should have the same sequence as endogenous Fmnl2) especially when both of them were injected at the same time. Is this construct mutated to be resistant to the siRNA?

      Thank you for your question. We regret any misunderstanding that may have been caused by the inappropriate description in our manuscript. In the rescue experiments, we initially injected FMNL2 siRNA into oocytes, followed by the microinjection of FMNL2 mRNA 18-20 hours later. In our previous experiments, we verified through Western blotting that endogenous FMNL2 is effectively suppressed 18-20 hours following the microinjection of FMNL2 siRNA. Additionally, we observed a significant increase in exogenous FMNL2 protein expression 2 hours after the injection of FMNL2 mRNA. We believe that the exogenous FMNL2 could compensate for the decrease caused by FMNL2 knockdown, and this approach has been adopted in many oocyte studies.

      (3) The authors used only one approach to knockdown FMNL2 which is by siRNA. Using an additional approach to inhibit FMNL2 would be beneficial to confirm that the effect of siRNA-mediated knockdown of FMNL2 is specific.

      Thank you for your question. Yes, specificity is always a concern for siRNA or morpholino microinjection because of off-target effects. Owing to technical limitations, we could not generate a knockout model, and there are no known inhibitors that specifically target FMNL2. To address this, we performed the rescue study with exogenous mRNA to confirm the effective knockdown of FMNL2. These measures provide reassurance regarding the credibility of the experimental outcomes, and this is also the general way to control for off-target effects of siRNA or morpholino.

      Reviewer #3 (Public Review):

      Summary:

      The authors focus on the role of formin-like protein 2 in the mouse oocyte, which could play an important role in actin filament dynamics. The cytoskeleton is known to influence a number of cellular processes from transcription to cytokinesis. The results show that downregulation of FMNL2 affects spindle migration with resulting abnormalities in cytokinesis in oocyte meiosis I.

      Weaknesses:

      The overall description of methods and figures is overall dismissively poor. The description of the sample types and number of replicate experiments is impossible to interpret throughout, and the quantitative analysis methods are not adequately described. The number of data points presented is unconvincing and unlikely to support the conclusions. On the basis of the data presented, the conclusions appear to be preliminary, overstated, and therefore unconvincing.

      Thank you for your comment. We regret the oversight of omitting critical information in the manuscript. In the revised manuscript, we have incorporated your suggestions for modification, particularly regarding the Materials and Methods section. Please see the detailed revision and responses in the “Recommendations for the authors” part.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section is not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage. My specific comments are listed below.

      (1) Information about statistical tests that were used needs to be provided for all quantification experiments.

      Thank you for your suggestion. Based on your suggestions, we revised the statistical analysis description in the Materials and Methods section. We also included a description of the statistical methods in the legends of the relevant result figures.

      (2) I recommend replacing the plunger plots, used in most quantification data, with alternatives allowing evaluation of the distribution of the data (dot plots, box plots, whisker plots).

      Thank you for your suggestion. Following your suggestion, we replaced the plunger plots in Fig 2C, D, H, I and Fig3 B, C with dot plots.

      (3) Can the authors provide information about particular time points when were individual oocyte stages (GVBD, meiosis I, and meiosis II) harvested/used for immunofluorescence protein detection, western blotting, microinjection, and ER and mitochondria staining? Were the time points always the same in all presented experiments and experimental vs control group? If not, this needs to be clarified.

      Thank you for your suggestion. We used oocytes in the metaphase I (MI) stage for the statistical analysis of spindle migration, actin filament aggregation, endoplasmic reticulum localization, and mitochondrial localization. In the Western blot analysis, GV stage oocytes were utilized to evaluate the efficiency of knockdown and rescue experiments. The protein expression levels of Arp2, Formin2, INF2, Cofilin, Grp78, and Chop in different treatment groups were detected using MI-stage oocytes. In the revised version, we provided all the detailed information about the stages.

      (4) Figure 1B: Can the authors comment on why there is a missing representative image of MII oocyte FMNL2-Ab? I recommend including this in the figure to have a complete view of comparing overexpressed and endogenous FMNL2 localization in oocyte meiosis.

      Thank you for your suggestion. In the revised manuscript, we added immunostaining images of FMNL2 antibody in MII stage oocytes.

      (5) Figure 1C: The figure legend says, "FMNL2 and actin overlapped in cortex and spindle surrounding". In MI oocytes, there is usually no accumulated actin signal around the spindle, which is also true in the presented images, so there cannot be overlapping with the FMNL2 signal. The interpretation should be changed.

      We apologize for the inappropriate description and have deleted this sentence.

      (6) Figure 2B: What were the parameters of the "large" and "normal" polar bodies for performing the analysis?

      Thank you for your question. To assess the size of the polar body, we compared the diameter of the polar body with that of the oocyte. If the diameter of the polar body was less than 1/3 of the oocyte's diameter, we categorized it as a normal-sized polar body. Conversely, if the polar body's diameter exceeded 1/3 of the oocyte's diameter, we categorized it as a large polar body. We have included these details in the Results section of the manuscript.
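      The size criterion described above can be written as a small sketch. The function name and the micrometre units are hypothetical and chosen for illustration. The response does not say how a ratio of exactly 1/3 was handled, so treating it as normal is an assumption made here.

```python
def classify_polar_body(pb_diameter_um, oocyte_diameter_um, threshold=1 / 3):
    """Classify a polar body as "normal" or "large" by its diameter ratio.

    The 1/3 threshold follows the criterion stated in the response.
    A ratio exactly equal to the threshold is treated as "normal",
    which is an assumption (the source does not specify this case).
    """
    if oocyte_diameter_um <= 0 or pb_diameter_um < 0:
        raise ValueError("diameters must be positive")
    ratio = pb_diameter_um / oocyte_diameter_um
    return "large" if ratio > threshold else "normal"


# Hypothetical measurements for an oocyte of 80 um diameter.
print(classify_polar_body(20, 80))  # ratio 0.25
print(classify_polar_body(40, 80))  # ratio 0.50
```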

      (7) Figure 2F: Can the authors comment on what can be the second band in the rescue group?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. As a result, compared to endogenous FMNL2, the protein size increased by approximately 27 kDa due to the addition of the EGFP tag. Hence, in the Western blot bands of the rescue group, the upper band represents the expression of exogenous FMNL2-EGFP, while the lower band corresponds to the expression of endogenous FMNL2. We have provided annotations in the revised Figure 2F to clarify this.

      (8) Can the authors comment on the variability of PBE between 2C and 2H in the FMNL2-KD groups? In panel C, the PBE in the KD group was 59.5 ± 2.82%; in panel H, the PBE in the KD group was 48.34 ± 4.2%, and in the rescue group, the PBE was 62.62 ± 3.6%. The rescue group has a similar PBE rate as the KD group in panel C. How consistent was the FMNL2 knockdown across individual replicates? Can the authors provide more details on how the rescue experiment was performed?

      Thank you for your question. We believe that the difference in PBE observed in Figure 2C and 2H of the FMNL2-KD group was due to the microinjection times and the duration of in vitro arrest. The results shown in Figure 2C depict the outcome of a single injection of FMNL2 siRNA into GV stage oocytes, followed by 18 hours of in vitro arrest; the results shown in Figure 2H contain a subsequent additional injection of FMNL2-EGFP mRNA with another 2 hours of arrest. The two rounds of microinjection and the extended period of in vitro arrest both affect oocyte maturation rates.

      (9) Figure 2J and K: What groups were compared together? The used statistic needs to be properly described.

      Thank you for your question. The FMNL2-KD, FMNL3-KD, and FMNL2+3-KD groups were each compared to the Control group; therefore, a t-test was used for the analysis. We have provided explanations in the revised manuscript.

      (10) Figure 4B and C: Can the authors provide representative images without oversaturated actin signal?

      Thank you for your question. For the analysis of oocyte F-actin, the signal is divided into cortical actin and cytoplasmic actin. Because the strong cortical actin signal interferes with detection of the cytoplasmic actin during imaging, it was necessary to increase the scanning index, which overexposes the cortical actin signal. This allows better observation of the cytoplasmic signal.

      (11) Figure 4G + 5H: Can the authors comment on why they used as a housekeeping gene actin instead of tubulin, which was used in the rest of the WB experiments?

      Thank you for your question. In most of the western blot experiments conducted in this study, we used tubulin as a housekeeping gene. However, owing to delays in antibody delivery, we also used GAPDH and actin for some experiments. All of these housekeeping genes were valid for the study.

      (12) Based on what parameters was ER considered normally or abnormally distributed, and what stages of oocytes were assessed?

      Thank you for your question. In this study, we used oocytes at the MI stage for the analysis of ER localization. At the MI stage, the ER localizes around the spindle, which is regarded as the typical localization pattern. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as showing aberrant positioning. We included relevant descriptions in the revised version of the manuscript.

      (13) Figure 5H: Actin was used as the housekeeping gene, but the quantification is labeled as a Grp78 to tubulin ratio.

      Thank you for pointing out the error. This was a labeling mistake, and we have corrected it.

      (14) Information about how JC-1 staining was done needs to be provided.

      Thank you for your careful reading. We included a description of JC-1 staining in the Materials and Methods section.

      (15) Line 231-232: "As shown in Figure 4A" - the text doesn't correspond to the figure.

      Thank you for pointing out the error. We revised this mistake in the revised manuscript by correcting "Fig3A" to "Fig4A."

      (16) Line 265: there is probably a missing word "Formin2".

      Thank you. We corrected the error and made the necessary changes in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      (1) Quantification and analysis:

      • Fig. 3B: The rate of spindle migration should be quantified based on the distance from the spindle to the cortex. Also, the orientation of the spindle (Z-position) needs to be taken into consideration.

      • Fig. 5C, D: It is unclear how the rate of ER distribution was calculated.

      • Western blot: In many experiments (such as Fig. 5H), the bands are saturated which will prevent accurate intensity measurements and quantifications.

      For spindle migration, we specifically focused on spindles exhibiting a distinctive spindle-like shape with clear bipolarity to eliminate any statistical discrepancies potentially caused by variations in Z-axis alignment. Our criterion for determining successful migration was based on the contact between the spindle pole and the cortical region of the oocyte. Therefore, we think that the rate reflects the phenotype better than the distance does.

      For the examination of ER localization, Reviewer 1 also raised this issue. We used oocytes at the MI stage in this study. The ER localizes around the spindle at the MI stage. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as showing aberrant positioning. We included relevant descriptions in the revised version of the manuscript.

      For the bands of the western blot results, during the experimental procedure we typically capture multiple images at different exposure levels (3-5 images). In the revised manuscript, we have replaced the inappropriate images with more suitable ones.

      (2) Given that all Immunoprecipitation experiments in this manuscript were performed on the whole ovary which contains more somatic cells than oocytes, the results do not necessarily reflect meiotic oocytes. Please consider this possibility during the interpretation.

      Thank you for your suggestion. Yes, we agree with you. In the revised manuscript, we made appropriate modifications to the relevant descriptions.

      (3) 351-365: The conclusion that Arp2/3 compensates for the decreased formin 2 in FMNL2 knockdown oocytes is a bit unconvincing. 1- In mouse oocytes, it is already known that Arp2/3 and formin 2 regulate different pools of F-actin nucleation. 2- The authors found an increase in Arp2/3 in FMNL2 knockdown oocytes compared to control oocytes without any change in cortical F-actin. Given that Arp2/3 is primarily promoting cortical F-actin, it is expected to see an increase in cortical F-actin in FMNL2 knockdown oocytes, which was not the case.

      Thank you for your question. Yes, previous studies showed that formin2 localizes to the cytoplasm of oocytes and accumulates around the spindle, where it facilitates cytoplasmic actin assembly, whereas Arp2/3 is primarily responsible for actin assembly at the cortex region of oocytes. In invasive cells, FMNL2 is mainly localized at the leading edge of the cell and at lamellipodia and filopodia tips, where it improves cell migration ability in an actin-based manner (Curr Biol 2012). We showed that FMNL2 localized both at the spindle periphery and at the cortex, but depletion of FMNL2 did not affect cortex actin intensity. We think that FMNL2 and Arp2/3 both contribute to cortex actin dynamics: when FMNL2 decreased, Arp2 increased to compensate, which maintained the cortex actin level. In the revised manuscript, we have made modifications to avoid excessive extrapolation from our results, ensuring that our conclusions are presented in a more objective manner.

      (4) Lines 195-197: The spindle is initially formed soon after the GVBD, so there is no spindle during GVBD. Also, I can't see oocytes at anaphase I or telophase I in this figure. Please revise.

      Thank you for your suggestion. We apologize for the inappropriate descriptions that were used. In the revised manuscript, we have made modifications to the respective descriptions in the Results part.

      (5) Fig. 2E: It seems that the control oocyte is abnormal with mild cytokinesis defects. Please replace or delete it since this information is already included in Fig. 3A.

      Thank you for your suggestion. Based on our observations, during the extrusion of the first polar body in oocytes, there is a temporary occurrence of cellular morphological fragmentation due to cortical reorganization (11h in control oocyte from Fig 2E). However, after the extrusion of the first polar body, the oocyte morphology returns to normal. Figure 2E illustrates the meiotic division process of oocytes, while Figure 3A primarily focuses on the process of oocyte spindle migration. We think that it is better to retain both to present our results.

      Reviewer #3 (Recommendations for The Authors):

      In the case of the observed phenotype, the stage of GV is important. The phenotypes presented also occur in meiotic or developmentally incompetent oocytes. In addition, the images of GV oocytes appear as NSN, which also show the KD phenotype in Figs. 2 and 3.

      Thank you for your concern. As the oocyte grows, the proportion of SN-type oocytes gradually increases. When the oocyte diameter reaches 70-80 μm, the proportion of SN oocytes is approximately 52.7% (Mol Reprod Dev. 1995). In our study, oocytes in both the control and knockdown groups were collected at a diameter of around 80 μm, which is considered fully grown and predominantly in the SN phase. Since the collection period and size of the oocytes were consistent, we are confident that the differences in phenotype observed between the control and knockdown groups are solid and reliable.

      MII is absent in Fig. 1B.

      In the revised manuscript, we added immunostaining images of FMNL2 in MII stage oocytes.

      The result of KD is not convincing. Also, discuss whether the heterozygous effect of Fmnl2 deletion affects reproductive fitness.

      Thank you for your concern. Because we were limited in setting up a knockout model, we employed siRNA to knock down FMNL2 expression. To exclude off-target effects, we performed a rescue experiment with exogenous mRNA, which we believe addresses this issue. When designing the siRNA sequences, we ensured their specificity for binding to FMNL2 mRNA only, and we assessed the levels of FMNL2 and FMNL3 mRNA in oocytes after injection of FMNL2 siRNA. The results showed that, compared to the control group, the expression of FMNL2 mRNA decreased by approximately 70% at 18 hours after FMNL2 siRNA injection, while the level of FMNL3 mRNA did not decrease.

      Fig. 2F rescue experiment with double bands. What bands are seen here? Did the authors inject tagged or untagged FMNL2? Or does endogenous FMNL2 appear higher in the sample after KD?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. As a result, compared to endogenous FMNL2, the protein size increased due to the addition of the EGFP tag, approximately 27 kDa. Hence, in the Western blot bands of the rescue group, the upper band represents the expression of exogenous FMNL2-EGFP, while the lower band corresponds to the expression of endogenous FMNL2. We provided annotations in the revised Figure 2F to clarify this.

      Variability in mitochondria and ER distribution patterns is also known in healthy and developing oocytes, although the authors described only a single phenotype.

      Thank you for your concern. Yes, mitochondria and ER show dynamic localization at different stages of oocyte maturation. However, in this study we used MI-stage oocytes for the analysis of ER and mitochondria localization, and at the MI stage both the ER and mitochondria localize around the spindle. This pattern is considered the normal localization. Several studies have shown that dispersed or clustered localization contributes to maturation defects. We included relevant descriptions in the revised manuscript.

      What exactly is meant by input in the IP experiments? Why is the target missing in the input sample?

      Thank you for your question. When we subjected the input samples to electrophoresis on a single channel, all the analyzed proteins demonstrated normal expression, confirming the viability of the input sample. However, upon simultaneous exposure with the IP samples, we observed a lack of clear signal for certain proteins in the input group. This is due to the excessive signal intensity resulting from protein enrichment in the IP group, which left the proteins in the input group underexposed.

      Explain the rationale for using actin or tubulin as loading or normalization controls in a study focusing on the cytoskeleton.

      Thank you for your question. Actin and tubulin are both widely used as controls due to their stable expression. Actin has α-actin and β-actin isoforms. Formins and the Arp2/3 complex regulate the polymerization of α-actin and β-actin to form F-actin, not the expression of the isoforms. In our study, F-actin (the functional form) was examined. Similarly, α-tubulin and β-tubulin are two subtypes of tubulin that interact with each other to form stable α/β-tubulin heterodimers. Changes in cytoskeleton dynamics do not change the expression of α/β-tubulin. Therefore, β-actin and α-tubulin could be used as normalization controls.

      Fig. 6E shows only , but the legend says *.

      Thank you for pointing out the error. We corrected the mistake in the revised manuscript.

      Spindle positioning appears to differ between control and KD. Does this affect the quantification of Fig. 6F? Adequate nomenclature should be used here.

      Thank you for your question. Yes, spindle positioning was affected by FMNL2 depletion. However, oocytes with either a central or a cortical spindle all belong to the MI stage, and the JC1 signal is not related to this difference. To avoid misunderstanding, we replaced the representative images and corresponding description in Figure 6F.

      The description of the methods and legends should be significantly improved.

      Thank you for your suggestion. Reviewers 1 and 2 also raised a similar concern. We enriched the description of methods and legends in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable new insights into HIV-associated nephropathy (HIVAN) kidney phenotype in the Tg26 transgenic mouse model and delineates the kidney cell types that express HIV genes and are injured in these HIV-transgenic mice. A series of compelling experiments demonstrated that PKR inhibition can ameliorate HIVAN with reversal of mitochondrial dysfunction (mainly confined to endothelial cells), a prominent feature shared in other kidney diseases. Although there are concerns regarding the specificity of C16 to PKR inhibition, as well as with the in situ hybridization studies, the data suggests that inhibition of PKR and mitochondrial dysfunction has potential clinical significance for HIVAN.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      HIV-associated nephropathy (HIVAN) is a rapidly progressing form of kidney disease that manifests secondary to untreated HIV infection, and is predominantly seen in individuals of African descent. Tg26 mice carrying an HIV transgene lacking gag and pol exhibit high levels of albuminuria and rapid decline in renal function that recapitulates many features of HIVAN in humans. HIVAN is seen predominantly in individuals carrying two copies of missense variants in the APOL1 gene, and the authors have previously shown that APOL1 risk variant mRNA induces activity of the double-strand RNA sensor kinase PKR. Because of the tight association between the APOL1 risk genotype and HIVAN, the authors hypothesized that PKR activation may mediate renal injury in Tg26 mice and tested this hypothesis by treating mice with a commonly used PKR inhibitory compound called C16. Treatment with C16 substantially attenuated renal damage in the Tg26 model as measured by urinary albumin/creatinine ratio, urinary NGAL/creatinine ratio, and improvement in histology. The authors then performed bulk and single-nucleus RNAseq on kidneys from mice from different treatment groups to identify pathways and patterns of cell injury associated with HIV transgene expression as well as to determine the mechanistic basis for the effect of C16 treatment. They show that proximal tubule nuclei from Tg26 mice appear to have more mitochondrial transcripts which was reversed by C16 treatment and suggest that this may provide evidence of mitochondrial dysfunction in this model. They explore this hypothesis by showing there is a decrease in the expression of nuclear-encoded genes and proteins involved in oxidative phosphorylation as well as a decrease in respiratory capacity via functional assessment of respiration in tubule and glomerular preparations from these mouse kidneys. All of these changes were reversed by C16 treatment. 
The authors propose the existence of a novel injured proximal tubule cell-type characterized by the leak of mitochondrial transcripts into the nucleus (PT-Mito). Analysis of HIV transgene expression showed high level expression in podocytes, consistent with the pronounced albuminuria that characterizes this model and HIVAN, but transcripts were also detected in tubular and endothelial cells. Because of the absence of mitochondrial transcripts in the podocytes, the authors speculate that glomerular mitochondrial dysfunction in this model is driven by damage to glomerular endothelial cells.

      Strengths:

      The strengths of this study include the comprehensive transcriptional analysis of the Tg26 model, including an evaluation of HIV transgene expression, which has not been previously reported. This data highlights that HIV transcripts are expressed in a subset of podocytes, consistent with the highly proteinuric disease seen in mice and humans. However, transcripts were also seen in other tubular cells, notably intercalated cells, principal cells and injured proximal tubule cells. Though the podocyte expression makes sense, the relevance of the tubular expression to human disease is still an open question.

      The data in support of mitochondrial dysfunction are also robust and rely on combined evidence from downregulation of transcripts involved in oxidative phosphorylation, decreases in complex I and II as determined by immunoblot, and assessments of respiratory capacity in tubular and glomerular preparations. These data are largely consistent with other preclinical renal injury models reported in the literature as well as previous, less thorough assessments in the Tg26 model.

      Weaknesses:

      The key weakness of the study lies in the use of a PKR inhibitor with questionable specificity. C16 has been reported to inhibit numerous other kinases including cyclin CDKs and GSK3α and -β, and this means that the conclusions of this study with respect to the role of PKR are highly questionable. The rationale for the dose used was not provided (and is lower than used in other publications with C16), and in the absence of drug exposure data and assessment of target engagement, it is difficult to ascertain whether substantial inhibition of PKR was achieved.

      A second key weakness lies in the identification of the PT-Mito cell cluster. Though the authors provide some rationale for the identification of this specific cell type, it seems equally plausible the cells merely reflect a high background capture of mitochondria in a subset of droplets. The IHC analysis that was provided is not convincing enough to support the claim and more careful high resolution imaging and in situ hybridization (with appropriate quantitation) will be needed to provide substantive support for the presence of a proximal tubule cell type with mitochondrial transcript that are trafficked to the nucleus.

      We appreciate the reviewer’s thoughtful summary.

      With regard to the non-specificity of C16, we added to the Discussion a description and references on this point, as suggested by the reviewer. Of note, the C16 doses that we used were also used previously (Okamoto, CommBiol, 2018). Importantly, newly added immunofluorescence images using a phospho-PKR specific antibody showed PKR inhibition (Supplemental Figure 1).

      Identification of the PT-Mito cluster in tissues was challenging, mainly due to the absence of known marker genes for the newly identified cluster. Finally, we added in situ hybridization images, with a negative control probe, to show the specificity of the target probes.

      Reviewer #2 (Public Review):

      Summary:

      Numerous studies by the authors and other groups have demonstrated an important role for HIV gene expression in kidney cells in promoting progressive chronic kidney disease, especially HIV-associated nephropathy. The authors had previously demonstrated a role for protein kinase R (PKR) in a non-HIV transgenic model of kidney disease (Okamoto, Commun Bio, 2021). In this study, the authors used innovative techniques including bulk and single nuclear RNAseq to demonstrate that mice expressing a replication-incompetent HIV transgene have prominent dysregulation of mitochondrial gene expression and activation of PKR and that treatment of these mice with a small molecule PKR inhibitor ameliorated the kidney disease phenotype in HIV-transgenic mice. They also identified STAT3 as a key upstream regulator of kidney injury in this model, which is consistent with previously published studies. Other important advances include identifying the kidney cell types that express the HIV transgene and have dysregulation of cellular pathways.

      Strengths:

      Major strengths of the study include the use of a wide variety of state-of-the-art molecular techniques to generate important new data on the pathogenesis of kidney injury in this commonly used model of kidney disease and the identification of PKR as a potential druggable target for the treatment of HIV-induced kidney disease. The authors also identify a potential novel cell type within the kidney characterized by high expression of mitochondrial genes.

      Weaknesses:

      Though the HIV-transgenic model used in these studies results in a phenotype that is very similar to HIV-associated nephropathy in humans, the model has several limitations that may prevent direct translation to human disease, including the fact that mice lack several genetic factors that are important contributors to HIV and kidney pathogenesis in humans. Additional studies are therefore needed to confirm these findings in human kidney disease.

      We appreciate the succinct summary of the present work. We agree that the findings from the HIV Tg26 mouse model warrant additional investigation in human kidney disease samples. Further studies will be needed to confirm whether the mechanisms presented here are operative in human HIVAN or other RNA virus-associated kidney diseases.

      Reviewer #1 (Recommendations For The Authors)

      The specificity of the C16 tool has been called into question in 3 publications - Chen et al, 2008, PMID: 19046382; Lopez-Grancha et al, 2021, PMID: 34531308; and Cusak et al, 2023, PMID: 36400288. Lopez-Grancha et al have reported a novel, more selective PKR inhibitor with good pharmacological properties that might enable a more robust test of the PKR hypothesis. Regardless, compound exposures and target engagement (i.e. by monitoring phosphorylation of PKR targets such as eIF2α) should accompany these studies. Alternatively, it may be easier to probe the role of PKR in Tg26 pathogenicity by crossing the Tg26 line to a PKR knockout mouse.

      In response, we have added a description of, and references on, the possible non-specificity of C16 to the Discussion as a limitation, as suggested (Page 21).

      “Third, we acknowledge possibility of a non-specific effect of C16 as an inhibitor of PKR.66-68”

      Further, we added immunohistochemistry images of pPKR on kidney tissue as shown in Supplemental Figure 1A-D. Images showed PKR activation in Tg26 tubular cells, which was inhibited by C16 treatment.

      Author response image 1.

      Immunofluorescent images showing pPKR. (A-D) Immunofluorescent images showed PKR activation by detecting pPKR in Tg26 mouse kidney. pPKR was inhibited by C16 treatments.

      The suggested PKR knockout mouse experiment is an excellent idea for future work, but we believe it is outside the scope of the current manuscript.

      To enhance the evidentiary base for the PT-Mito cell type, it would be interesting to know whether these cells can also be found in human datasets like KPMP, though this might require reprocessing the original snRNAseq data. Further in situ hybridization in both mouse and human samples using fluorescent rather than colorimetric approaches should yield a more compelling dataset to provide evidence for this cell type. These approaches would also allow for more precise quantification of the PT-Mito cells compared to the population of proximal tubule cells. Again, the default assumption here should be that the mitochondrial transcripts represent a contamination, and the purpose of these additional experiments is to definitively rule out that explanation.

      Authors: First, as suggested, we carried out additional analyses. We examined a publicly available human kidney snRNA-seq dataset (GSE131882) and found in it the same PT-Mito cluster, as shown in Supplemental Figure 6. The PT-Mito cluster was located in close proximity to the PT cluster in a UMAP plot. We added this finding to the Results as follows (Page 12):

      “We also confirmed the existence of similar PT-Mito cluster in published human kidney single-nuclear RNA-seq data47 by the re-analysis of the original data. (Supplemental Figure 6A-C).”

      Author response image 2.

      PT-Mito cluster detection in publicly available human kidney single-nuclear RNA-seq data (GSE131882). (A) UMAP plot of human kidney single-nuclear RNA-seq data shows 16 clusters. Clusters 1 and 4 are proximal tubule (PT) clusters, and cluster 7 is the PT-Mito cluster. (B) Dot plot shows expression of PT marker genes and PT-Mito marker genes obtained from current manuscript data. PT-Mito markers including MT-CO1 and MT-CO2 had high expression in cluster 7. (C) UMAP plot shows all six samples are contributing to all cell clusters.

      Second, as suggested, we also included negative control data from in situ hybridization studies (Supplementary Figure 5A, 5B), which shows that the signals in Figure 4B, 4C are true signals.

      Author response image 3.

      Additional in situ hybridization images. (A) In situ hybridization images probing dapB (negative control probe) showed no signals. (B) In situ hybridization images probing Ppib (positive control probe) showed strong signals.

      Reviewer #2 (Recommendations For The Authors)

      (1) The supplementary data file seems to have been uploaded twice but the supplementary methods were not available which would have been helpful when assessing some methods such as using PodoCount to count podocytes.

      We acknowledge that we inadvertently failed to upload the Supplementary Methods section; thank you for pointing this out. The supplementary methods are now provided in the revised submission, including detailed methods about PodoCount. Corresponding descriptions are as follows:

      “Estimation of glomerular podocyte count

      PodoCount5, a computational tool for whole slide podocyte estimation from digitized histologic sections, was used to detect, enumerate, and characterize podocyte nuclear profiles in the glomeruli of immunohistochemically labeled (IHC-labeled) murine kidney sections. Formalin-fixed, paraffin embedded tissues (2 µm thickness) were IHC-labeled for p57kip2, a marker of podocyte terminal differentiation (ab75974, Abcam, Cambridge, UK), and detected with horse radish peroxidase (RU-HRP1000, Diagnostic BioSystems, Pleasanton, CA) and diaminobenzidine chromogen substrate (BSB0018A, Bio SB, Santa Barbara, CA). A periodic acid-Schiff post-stain was applied without hematoxylin counterstain. The tool uses a combination of stain deconvolution, digital image processing, and feature engineering to compute histologic podometrics6 with correction for section thickness7. In this study, PodoCount was used to assess mean glomerular podocyte count per mouse.”

      (2) In the abstract, the authors give the impression that they know definitively the sequence of HIV gene expression, cytoskeletal dysregulation, dedifferentiation, then loss from glomeruli. Since they could only examine cells that were present in glomeruli, they can't definitively say much about the cells that were lost from glomeruli.

      As suggested, we deleted the following text: “and were lost from glomeruli tuft”

      (3) The authors state that 56,976 cells were used for snRNAseq studies. Was the number of cells similar for each of the 8 mice (from 4 different groups)?

      In response, we have created a new table summarizing the numbers of nuclei from each sample (i.e. each mouse) and added it as Supplemental Figure 2D, as follows:

      Author response table 1.

      Pre-processing of single-nuclear RNA-seq data. Breakdown of nuclei numbers from each sample showed comparable numbers of nuclei analyzed.

      (4) Please provide information on the assay that was used to measure creatinine since some methods can be unreliable in mice

      This is now provided in the revised submission, including creatinine measurement methods (LC-MS/MS) on page 3 of Supplementary Material:

      “Mouse chemistry measurements

      Plasma creatinine was measured by isotope dilution LC-MS/MS at The University of Alabama at Birmingham O’Brien Center Core C (Birmingham, AL).”

      (5) The authors state that PKR (Eif2ak2) was expressed in all nephron segments. However, it appears on visual inspection of the UMAP in Fig S2B that the percentage of cells expressing Eif2ak2 was low. What percent of cells expressed Eif2ak2 and, if it was a low percentage, what is the authors' hypothesis for how expression in a small percentage of cells led to the kidney phenotype?

      Supplemental Figure 2B (now 3B) does show modest expression of Eif2ak2, approximately 10%. The technique may lack sensitivity to detect low gene expression and even low gene expression may be sufficient to cause phenotypic change.

      (6a) In figure 4B and C, it is not clear what genotype/treatment group is shown.

      The legend for Figure 4B, 4C has been modified to state that the group was wild-type mice:

      (B, C) In situ hybridization of mt-Co1 and mt-Atp6 genes showed signals inside nuclei of WT mice

      (6b) Also, if these ISH images are from Tg26 mice, it would be helpful to do ISH in mice with/without C16 treatment.

      These images of ISH for these two genes are from wild-type mice, as now stated in the revised legend. Our purpose was to show that these mitochondrial-encoded gene transcripts (mt-Co1 and mt-Atp6) are transported to nuclei from the cytoplasm. We believe it is not necessary to do ISH in Tg26 mice because these genes are not disease-specific.

      (6c) Also, only 3-6% of cells express these "PT-mito" markers by snRNAseq, but it appears that far more are expressed by ISH, raising concerns for nonspecific binding of the ISH probe.

      (6d) Also, nonsense controls should be included to demonstrate the specificity of the ISH data.

      First (comment 6c), the PT-mito cluster does not have specific markers, to our knowledge. Second (comment 6d), to address the concern for non-specific binding of the ISH probes, we have now added additional ISH images, together with a negative control probe (C. elegans gene dapB) and a positive control probe (mouse Ppib), as shown in Supplementary Figure 5A and 5B, respectively.

      Author response image 4.

      Additional in situ hybridization images. (A) In situ hybridization images probing dapB (negative control probe) showed no signals. (B) In situ hybridization images probing Ppib (positive control probe) showed strong signals.

      (7) The authors state that "mitochondrial dysfunction was most pronounced in the PT-Mito cluster" but in Figure 4D, the oxidative phosphorylation activation Z score was most down in the PT-inj (injured PT cells) and the PT-Mito cells were the fourth-most downregulated cell type.

      We appreciate the careful reading and agree with reviewer’s comment. In the revision, we have deleted “most” from this description.

      (8) In Fig 4F, please state what "Cp expression" means.

      We have spelled out ceruloplasmin (Cp).

      (9) It is not clear in immunohistochemistry images in Fig 5F where the p-stat3 was detected due to the hematoxylin counterstain which may have obscured subtle nuclear staining. Also, some of the strongest staining appears to be in peritubular capillaries, instead of tubular and glomerular epithelial cells.

      We have added arrows to help readers see where p-Stat3 was detected: as faintly brown, distinct cytoplasmic granules in injured tubular cells in Tg26 mice (panel F), as opposed to diffuse cytoplasmic color in tubular cells in wild-type mice (panel E).

      Author response image 5.

      (10) For the studies of mitochondrial oxygen consumption (Fig 6), it would be helpful to also provide data on the effect of C16 in wild-type kidneys, in case C16 somehow causes a primary increase in mitochondrial oxygen consumption rather than preventing HIV-induced loss in kidney cells from HIV-transgenic mice.

      We did not include Seahorse data regarding oxygen consumption from WT mice treated with C16, as C16 did not affect either renal function or transcriptomes in WT mice, in contrast to the Tg26 mice (Figure 1A-G).

      (11) The authors emphasize that podocytes had the highest expression of HIV genes (Fig 7). However, it appears that <2% of podocytes expressed HIV genes. How do the authors explain the severe renal phenotype given the relatively small number of cells expressing the HIV transgene? Also, did the same cells express all/most of the HIV transcripts, or did some cells express some HIV transcripts? For instance, since the authors state that vpr and nef have the most important role in kidney injury, were the same cells that expressed nef also expressing Vpr?

      We know that snRNA-seq cannot detect the whole transcriptome in each cell, due to the well-known drop-out effect characteristic of the method. Several factors may contribute to this drop-out effect, including stochastic patterns of gene expression, low RNA amounts and inefficient mRNA capture (Qiu, Nature Comm, 2020; Ran, Bioinformatics, 2020).

      Our interpretation is that HIV gene-expressing podocytes had higher expression of HIV genes, but this does not mean that other kidney cells entirely lack HIV gene expression. With regard to co-expression of other HIV transcripts, nef and vpr were more often co-expressed, as shown in Figure 7J. Vpr was expressed in nef-positive podocytes and not detected in nef-negative podocytes.
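
      The drop-out argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each transcript in a nucleus is captured independently with the same probability; the function names and the capture efficiency of 0.1 are hypothetical and are not taken from the manuscript or from the cited references. Under that assumption, a gene present at low copy number is frequently missed, and two low-copy genes are seen together in the same nucleus less often than either is seen alone.

```python
def detection_probability(n_transcripts: int, capture_efficiency: float) -> float:
    """Probability that at least one of n transcripts is captured.

    Assumes (hypothetically) that each transcript is captured
    independently with probability `capture_efficiency`.
    """
    if n_transcripts < 0:
        raise ValueError("n_transcripts must be non-negative")
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must be in [0, 1]")
    return 1.0 - (1.0 - capture_efficiency) ** n_transcripts


def joint_detection_probability(n_a: int, n_b: int, capture_efficiency: float) -> float:
    """Probability that two genes expressed in the same nucleus are both detected.

    Assumes the capture of one gene's transcripts is independent of the other's.
    """
    return (detection_probability(n_a, capture_efficiency)
            * detection_probability(n_b, capture_efficiency))


if __name__ == "__main__":
    p = 0.1  # hypothetical per-transcript capture efficiency
    for copies in (1, 5, 20):
        print(copies, round(detection_probability(copies, p), 3))
```

      Under this simple model, the fraction of nuclei in which a transcript is detected is a lower bound on the fraction that actually express it, which is consistent with the interpretation given in the response.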

      (12) In figure 8, the authors emphasize the dysregulation of genes involved in cell-cell interaction, particularly PDGF-D. They show some data for the effect of C16 in this system in Fig 8 but it would be helpful if they can state the effect in the text of the Results section.

      We have added text in the Results describing activating interactions in Tg26 mice, that were reduced by C16 exposure, as follows: (page 18)

      “For example, platelet derived growth factor D (PDGF-D) was upregulated in PT-Inj in Tg26 mice and was downregulated by C16 treatment (Figure 8D). Further, PDGF-D may interact with PDGFR-B in fibroblasts.”

    1. Author response:

      We extend our sincere gratitude to the editor and three reviewers for their invaluable feedback, which not only included positive comments but also provided constructive suggestions for enhancing the quality of our manuscript.

      Of potential interest to you is our forthcoming investigation into vaccine efficacy, where we will compare the effectiveness of our live-attenuated vaccine with an mRNA-based alternative.

      Moreover, we acknowledge and fully endorse the recommendation to elucidate why immunization with our live-attenuated vaccine confers protection against viral challenge, even in the absence of sufficient neutralizing antibodies. As pointed out by the reviewers, this phenomenon may be attributed to mucosal immunity. Consequently, we have outlined plans to investigate whether the attenuated live vaccine elicits mucosal immunity as part of our ongoing research.

      We are currently working to gather the necessary data to address these inquiries comprehensively, and are aiming to resubmit our manuscript at the earliest opportunity.

      Reviewer #1: We sincerely appreciate the insightful comments provided by Reviewer #1. In response to this feedback, we will conduct a comparative analysis of efficacy between our live-attenuated vaccine and an mRNA-based alternative. Furthermore, we will thoroughly examine and delineate the advantages and limitations of our live-attenuated vaccine in our discussion.

      Reviewer #2: We express our sincere appreciation to Reviewer #2 for the invaluable suggestions. In light of the insightful observation that our study only weakly evaluated the induction of mucosal immunity by our vaccine candidate, we have resolved to undertake a comprehensive analysis in this regard.

      Furthermore, we will take into account this reviewer's recommendation to compare BK2102 results with those of an mRNA vaccine. We are currently in the process of planning additional experiments to thoroughly address this aspect.

      Reviewer #3: We are very grateful to Reviewer #3 for the positive feedback and invaluable suggestions. In order to further explore the immune mechanisms underlying the protection against the Omicron variant in the absence of detectable neutralizing antibodies, we are currently devising plans for experiments focused on evaluating mucosal immunity.

      Moreover, in accordance with Reviewer #3's suggestion, we are considering the incorporation of an ELISPOT assay experiment. However, we acknowledge uncertainties regarding the feasibility of establishing an experimental system for this purpose.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies.

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful, and the data generally support the conclusions.

      Strengths:

      HiExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions.

      Weaknesses:

      (1) Based on the exceedingly small volume of solution used to form the hydrogel in the well, there may be many unexpanded cells in the well and possibly underneath the expanded hydrogel at the end of this. How would this affect the image acquisition, analysis, and interpretation of HiExM data?

      The hydrogel footprint covers approximately 5% of the surface within an individual well and only cells within this area are embedded in the polymerized hydrogel for subsequent processing steps. Cells that are outside of this footprint are not incorporated into the gel, meaning that these cells are digested by Proteinase K and subsequently washed away by the excess water exchange in the gel swelling step. Note that different cell types may require higher or lower concentrations of Proteinase K to adequately digest cells for expansion while maintaining fluorescence signal. Given the compatibility of HiExM with 96-well plates, this titration can be performed rapidly in a single experiment. Although cells outside of the hydrogel footprint are removed prior to imaging, we do occasionally observe Hoechst signal that appears to be underneath the gels. We believe this signal is likely from excess DNA from digested cells that was not fully washed out in the gel swelling step. This signal is both spatially and morphologically distinct from the nuclear signal of intact cells and it does not affect image acquisition, analysis, or data interpretation.

      (2) It is unclear why the expansion factor is so variable between plates (e.g., Figure 2H). This should be discussed in more detail.

      The variability in expansion factor across plates can likely be attributed to the small volume (~250 nL) deposited by the device posts. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, gels in HiExM are ~1000x smaller than standard expansion gel preparations and therefore have a larger air-liquid interface relative to their volume, which makes them more sensitive to evaporation. Evaporation in HiExM gels increases monomer and crosslinker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that variance is slightly increased between plates. These differences will be discussed in the revised manuscript.
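
      To illustrate the evaporation argument, the following minimal sketch computes how much a solute is concentrated when a gel droplet loses part of its volume. The ~250 nL starting volume and the ~1000x size difference are taken from the response above; the evaporated volume and the function name are hypothetical and chosen only for illustration, and solutes are assumed to be non-volatile.

```python
def concentration_factor(initial_volume_nl: float, evaporated_volume_nl: float) -> float:
    """Fold-increase in solute concentration after solvent evaporation.

    Assumes solutes (monomer, crosslinker) stay in the droplet and only
    solvent is lost, so concentration scales as V_initial / V_remaining.
    """
    remaining = initial_volume_nl - evaporated_volume_nl
    if remaining <= 0:
        raise ValueError("evaporated volume must be smaller than the initial volume")
    return initial_volume_nl / remaining


# Hypothetical example: the same absolute loss (25 nL) in a ~250 nL HiExM gel
# and in a gel that is ~1000x larger (250,000 nL).
small_gel = concentration_factor(250.0, 25.0)
large_gel = concentration_factor(250_000.0, 25.0)
print(f"small gel: {small_gel:.3f}x, large gel: {large_gel:.5f}x")
```

      Under these assumptions, a loss that is negligible for the larger gel raises concentrations in the small gel by about 11%.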

      (3) The authors claim that CF dyes are more resistant to bleaching than other dyes. However, in Figure S3, it appears that half of the CF dyes tested still show bleaching, and no data are shown supporting the claim that Alexa dyes bleach. It would be helpful to include data supporting the claim that Alexa dyes bleach more than CF dyes, and the claim that CF dyes in general are resistant to bleaching should be modified to more accurately reflect the data shown.

      We did not show data using Alexa dyes because these fluorophores are highly sensitive to photobleaching using Irgacure and thus we could not obtain images. In contrast, some CF dyes are more robust to bleaching in HiExM including CF488A, CF568, and CF633 dyes. We have recently adapted our protocol to PhotoExM chemistry which is compatible with a wider range of fluorophores as described by Günay et al. (2023) and as shown in current Fig. S11.

      (4) Related to the above point, it appears that Figure S11 may be missing the figure legend. This makes it hard to understand how HiExM can use other photo-inducible polymerization methods and dyes other than CF dyes.

      The following figure legend will be included in the revised manuscript. Fig. S11: Example of a cell expanded in HiExM using Photo-ExM gel chemistry. Photo-ExM does not require an anoxic environment for gel deposition and polymerization, improving ease of use of HiExM. Mitochondria were stained with an Alexa 647 conjugated secondary antibody, indicating that HiExM is compatible with additional fluorophores when combined with Photo-ExM.

      (5) The use of automated high-content imaging is impressive. However, it is unclear to me how the increased search space across the extended planar area and focal depths in expanded samples is overcome. It would be helpful to explain this automated imaging strategy in more detail.

      We imaged plates on the Opera Phenix using the PreciScan Acquisition Software in Harmony. In brief, each well is imaged at 5x magnification in the Hoechst channel to capture the full well at low resolution. Hoechst is used for this step given its signal brightness, ubiquity across established staining protocols, and spectral independence from most fluorophores commonly conjugated to secondary antibodies. Using this information, the microscope detects regions of interest (nuclei) based on criteria including size, brightness, circularity, etc. Finally, the positional information for each region is stored, and the microscope automatically images those regions at 63x magnification. The working distance for the objective used in this study is 600 µm, which is sufficient to capture the entirety of expanded cells in the Z direction. This strategy minimizes off-target imaging and allows robust image acquisition even in cultures with lower seeding density. A detailed description of the automated imaging strategy will be included in the revised manuscript.
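
      The region-selection step described above (filter pre-scan objects by size, brightness, and circularity, then store their positions for high-magnification imaging) can be sketched generically as follows. This is not the PreciScan or Harmony implementation; the `Region` fields, the threshold values, and the example detections are all hypothetical placeholders used only to show the logic.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """One object segmented from the low-magnification Hoechst pre-scan."""
    x_um: float            # centroid position in the well
    y_um: float
    area_um2: float
    perimeter_um: float
    mean_intensity: float


def circularity(region: Region) -> float:
    """4*pi*A / P^2: 1.0 for a perfect circle, smaller for irregular shapes."""
    return 4.0 * math.pi * region.area_um2 / region.perimeter_um ** 2


def select_targets(regions, min_area=2000.0, max_area=20000.0,
                   min_intensity=500.0, min_circularity=0.6):
    """Return (x, y) positions of regions that pass all three criteria.

    All thresholds are hypothetical placeholders, not the authors' settings.
    """
    targets = []
    for r in regions:
        if not (min_area <= r.area_um2 <= max_area):
            continue  # too small (debris) or too large (merged objects)
        if r.mean_intensity < min_intensity:
            continue  # too dim to be a stained nucleus
        if circularity(r) < min_circularity:
            continue  # irregular outline, e.g. smeared DNA signal
        targets.append((r.x_um, r.y_um))
    return targets


# Hypothetical pre-scan detections
prescan = [
    Region(120.0, 340.0, math.pi * 40 ** 2, 2 * math.pi * 40, 1500.0),  # round, bright
    Region(800.0, 150.0, 3000.0, 600.0, 1200.0),                         # elongated streak
    Region(450.0, 900.0, 100.0, 36.0, 2000.0),                           # small debris
    Region(300.0, 300.0, 5000.0, 260.0, 200.0),                          # dim object
]
print(select_targets(prescan))
```

      Only the stored positions are passed on to the high-magnification acquisition, which is why wells with few cells do not cost additional imaging time in such a scheme.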

      (6) The general method of imaging pre- and post-expansion is not entirely clear to me. For example, on page 5 the authors state that pre-expansion imaging was done at the center of each gel. Is pre-expansion imaging done after the initial gel polymerization? If so, this would assume that the gelation itself has no effect on cell size and shape if these gelled but not yet expanded cells are used as the reference for calculating expansion factor and isotropy.

      Pre-expansion imaging is performed after staining is complete, but prior to the application of AcX, which is the first step of the HiExM protocol. Following staining and imaging, plates can be sealed with paraffin and stored at 4˚C for up to a week prior to starting the expansion protocol. We typically image 61 fields of view at the center of the well plate (where the gel will be deposited) to obtain sufficient pre-expansion images as shown in Figure 2b (left). After pre-expansion imaging, we perform the HiExM protocol followed by image acquisition. We then tile all the images, as shown in Figure 2b, and compare tiled images from the same well pre- and post-expansion to manually identify the same cells. Comparisons of the pre- and post-expansion images of the same cell are then used to calculate expansion factor and isotropy measurements as described. This detailed description will be included in the revised manuscript.
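
      As a rough illustration of how matched pre- and post-expansion images of the same cell can yield an expansion factor, the sketch below compares distances between the same landmarks before and after expansion. It is a simplified stand-in, not the authors' analysis pipeline: the landmark coordinates are hypothetical, and the spread of the distance ratios is used here only as a crude uniformity indicator rather than as the isotropy measure reported in the manuscript.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pairwise_ratios(pre_pts, post_pts):
    """Post/pre distance ratio for every pair of matched landmarks."""
    if len(pre_pts) != len(post_pts) or len(pre_pts) < 2:
        raise ValueError("need the same landmarks (at least two) in both images")
    ratios = []
    for i, j in combinations(range(len(pre_pts)), 2):
        d_pre = math.dist(pre_pts[i], pre_pts[j])
        d_post = math.dist(post_pts[i], post_pts[j])
        if d_pre == 0:
            raise ValueError("two pre-expansion landmarks coincide")
        ratios.append(d_post / d_pre)
    return ratios


def expansion_summary(pre_pts, post_pts):
    """Return (mean expansion factor, relative spread of the ratios)."""
    ratios = pairwise_ratios(pre_pts, post_pts)
    factor = mean(ratios)
    return factor, pstdev(ratios) / factor


# Hypothetical landmarks (in µm) on the same cell before and after expansion.
pre = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (7.0, 7.0)]
post = [(5.0, 5.0), (45.0, 5.0), (5.0, 45.0), (33.0, 33.0)]  # 4x scaling plus a shift
factor, spread = expansion_summary(pre, post)
print(f"expansion factor ~ {factor:.2f}, relative spread ~ {spread:.3f}")
```

      Because only inter-landmark distances are used, the estimate does not depend on how the pre- and post-expansion images are shifted or rotated relative to each other.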

      (7) In the dox experiments, are only 4 expanded nuclei analyzed? It is unclear in the Figure 3 legend what the replicates are because for the unexpanded cells, it says the number of nuclei but for expanded it only says n=4. If only 4 nuclei are analyzed, this does not play to the strengths of HiExM by having high throughput.

      We performed the DOX titration assay across four different well plates (i.e. n=4). For each condition, the total number of nuclei measured was 56, 71, 64, 92, and 62 for DMSO, 1nM, 10nM, 100nM, and 1µM, respectively. For SEM calculations, we included the number of technical replicates to avoid underestimating error. We have revised the Figure 3 legend to better reflect the experimental details.
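
      The effect of the choice of n on the reported error can be illustrated with the standard formula SEM = SD / sqrt(n). The sketch below does not reproduce the authors' calculation; the standard deviation is a hypothetical value in arbitrary units, and only the two sample sizes (4 plates, 56 nuclei for the DMSO condition) come from the response above.

```python
import math
from statistics import stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return stdev(values) / math.sqrt(n)


def sem_from_sd(sd: float, n: int) -> float:
    """SEM for a known standard deviation and sample size."""
    return sd / math.sqrt(n)


sd = 1.0  # hypothetical standard deviation, arbitrary units
for label, n in [("plates", 4), ("nuclei, DMSO", 56)]:
    print(f"n = {n:>2} ({label}): SEM = {sem_from_sd(sd, n):.3f}")
```

      For the same spread, treating each plate as the unit of replication gives a larger SEM than treating each nucleus as independent, so the more conservative choice avoids understating the error.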

      (8) I am not sure if the analysis of dox-treated cells is accurate for the overall phenotype because only a single slice at the midplane is analyzed. It would be helpful to show, at least in one or two example cases, that this trend of changing edge intensity occurs across the whole 3D nucleus.

      We will repeat our analysis on a subset of images using multiple optical sections for each nucleus reported. These new data will be included in the revised manuscript.

      (9) It would be helpful to provide an actual benchmark of imaging speed or throughput to support the claims on page 8 that HiExM can be combined with autonomous imaging to capture thousands of cells a day. What is the highest throughput you have achieved so far?

      The parameters that dictate imaging speed in HiExM include exposure time, z-stack height, and number of channels. Depending on the signal intensity for a given channel, exposure times vary from 200 ms to 1000 ms. For z-stack height, we found that imaging 65 sections with 1 µm spacing allowed for robust identification of each region of interest in the 5x pre-scan. As an example, collecting images for a full well plate (e.g., 20 images per well with 4 channels) requires approximately 24 hours of autonomous image acquisition using the Opera Phenix. Depending on cell size, this yields imaging data for between 1200 cells (1 cell per field of view) and 6000 cells (5 cells per field of view). Different autonomous imagers, as well as improved staining techniques that increase signal-to-noise, can be expected to significantly decrease the exposure time and reduce the number of z-sections needed for each region.
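
      The throughput figures quoted above follow from simple arithmetic, sketched here. The number of fields (1200) is inferred from the statement that one cell per field yields 1200 cells; at 20 images per well this would correspond to 60 imaged wells, which is an inference and is not stated in the response. Function names are hypothetical.

```python
def cells_per_run(fields_of_view: int, cells_per_field: int) -> int:
    """Total cells imaged in one plate run."""
    return fields_of_view * cells_per_field


def cells_per_hour(total_cells: int, run_hours: float) -> float:
    """Average imaging rate over the whole run."""
    return total_cells / run_hours


FIELDS = 1200      # inferred from "1200 cells (1 cell per field of view)"
RUN_HOURS = 24.0   # approximate acquisition time quoted for one plate

for cells_per_field in (1, 5):
    total = cells_per_run(FIELDS, cells_per_field)
    rate = cells_per_hour(total, RUN_HOURS)
    print(f"{cells_per_field} cell(s)/field: {total} cells per run, {rate:.0f} cells/hour")
```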

      Reviewer #2 (Public Review):

      Summary:

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit the expansion of the gel. A device was engineered that can spot a small droplet of hydrogel solution and keep it in place as it polymerizes. Because the droplet occupies only a small portion of space at the center of each well, the gel can expand in all directions, and imaging and staining can proceed by liquid handling robots and an automated microscope.

      Strengths:

      In contrast to Reference 8, the authors' system is compatible with standard 96 well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal.

      Weaknesses:

      The assay they chose to demonstrate what high-throughput ExM could be useful for, is not very convincing. But for this reviewer that is not important.

      We appreciate this reviewer’s point. We believe the data provide an example of the power of HiExM for collecting thousands of nanoscale images that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.). The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this experiment was to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM.

      Reviewer #3 (Public Review):

      Summary:

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand the toroidal gel within each well.

      Strengths:

      This configuration eliminates the need for transferring gels to other dishes or wells, thereby enhancing the throughput and reproducibility of parallel expansion microscopy. This methodological uniqueness indicates the applicability of HiExM in detecting subtle cellular changes on a large scale.

      Weaknesses:

      To demonstrate the potential utility of HiExM in cell phenotyping, drug studies, and toxicology investigations, the authors treated hiPS-derived cardiomyocytes with a low dose of doxycycline (dox) and quantitatively assessed changes in nuclear morphology. However, this reviewer is not fully convinced of the validity of this specific application. Furthermore, some data about the effect of expansion require reconsideration.

      The application we chose was intended as a proof of concept. We believe the data provide an example of the power of HiExM for collecting thousands of nanoscale images that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.). The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this experiment was to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM.

      The variability in expansion factor across plates can likely be attributed to the small volume (~250 nL) deposited by the device posts. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, gels in HiExM are ~1000x smaller than standard expansion gel preparations and therefore have a larger air-liquid interface relative to their volume, which makes them more sensitive to evaporation. Evaporation in HiExM gels increases monomer and crosslinker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that variance is slightly increased between plates. These differences will be discussed in the revised manuscript.

    1. Author response:

      eLife assessment

      This study presents valuable information on the mechanism of how birnavirus VP3 protein interacts with PI3P in early endosomes. Evidence supporting the proposed two-stage mechanism is incomplete and would benefit from additional supporting experiments, and additional experimentation would also address concerns about data consistency.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zanetti et al. use biophysical and cellular assays to investigate the interaction of the birnavirus VP3 protein with the early endosome lipid PI3P. The major novel finding is that the association of the VP3 protein with an anionic lipid (PI3P) appears to be important for viral replication, as evidenced through a cellular assay on FFUs.

      Strengths:

      Supports previously published claims that VP3 may associate with early endosomes and bind to PI3P-containing membranes. The claim that mutating a single residue (R200) critically affects early endosome binding and that the same mutation also inhibits viral replication suggests a very important role for this binding in the viral life cycle.

      Weaknesses:

      The manuscript is relatively narrowly focused: one bimolecular interaction between a host cell lipid and one protein of an unusual avian virus (VP3-PI3P). Aspects of this interaction have been described previously. Additional data would strengthen claims about the specificity and some technical issues should be addressed. Many of the core claims would benefit from additional experimental support to improve consistency.

      We focused our efforts on the characterization of the molecular interaction between the birnaviral protein VP3 and the anionic lipid PI3P, which is found in the host cell. This decision was motivated by our previous research, which made use of cell biology and virology techniques to demonstrate that VP3 facilitates the formation of the viral replication machinery on the cytosolic leaflet of early endosomes due to its inherent endosome-targeting capability (J Virol. 2018 May 14;92(11):e01964-17). Additionally, our previous findings indicated that PI3P, present in early endosomal membranes, is a critical host factor enabling VP3's association with these membranes, thereby promoting viral replication (J Virol. 2021 Feb 24;95(6):e02313-20). Consequently, an in-depth characterization of the VP3/PI3P interaction was necessary and motivated the present work. We plan to incorporate specific recommendations to further substantiate our assertions in the revised version of our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Birnavirus replication factories form alongside early endosomes (EEs) in the host cell cytoplasm. Previous work from the Delgui lab has shown that the VP3 protein of the birnavirus strain infectious bursal disease virus (IBDV) interacts with phosphatidylinositol-3-phosphate (PI3P) within the EE membrane (Gimenez et al., 2018, 2020). Here, Zanetti et al. extend this previous work by biochemically mapping the specific determinants within IBDV VP3 that are required for PI3P binding in vitro, and they employ in silico simulations to propose a biophysical model for VP3-PI3P interactions.

      Strengths:

      The manuscript is generally well-written, and much of the data is rigorous and solid. The results provide deep knowledge into how birnaviruses might nucleate factories in association with EEs. The combination of approaches (biochemical, imaging, and computational) employed to investigate VP3-PI3P interactions is deemed a strength.

      Weaknesses:

      (1) Concerns about the sources, sizes, and amounts of recombinant proteins used for co-flotation: Figures 1A, 1B, 1G, and 4A show the results of co-flotation experiments in which recombinant proteins (control His-FYVE v. either full length or mutant His VP3) were either found to be associated with membranes (top) or non-associated (bottom). However, in some experiments, the total amounts of protein in the top + bottom fractions do not appear to be consistent in control v. experimental conditions. For instance, the Figure 4A western blot of His-2xFYVE following co-flotation with PI3P+ membranes shows almost no detectable protein in either top or bottom fractions.

      Liposome-based methods, such as the co-flotation assay, are well-known and preferred to study protein-phosphoinositide interaction because the phosphoinositides are incorporated in a membrane, the composition of which can mimic cellular membranes. Additionally, by modifying the phosphoinositide incorporated in the liposomes, this technique allows for determining the specificity of the protein binding. However, this approach is rather qualitative, meaning that, after density gradient separation, the protein is found in the top fractions (bound to liposomes) or in the bottom fractions (not bound to liposomes), and our quantifications have the aim of showing the difference in the bound fraction between liposome populations with or without PI3P. Given the setting of the co-flotation assays, each protein-liposome system [2xFYVE-PI3P(-), 2xFYVE-PI3P(+), VP3-PI3P(-), or VP3-PI3P(+)] is assessed separately, and even if the conditions are homogeneous, it’s not surprising to observe differences in the protein level between each one. Indeed, our revised version of the manuscript will include membranes with more similar band intensities.

      Reading the paper, it was difficult to understand which source of protein was used for each experiment (i.e., E. coli or baculovirus-expressed), and this information is contradicted in several places (see lines 358-359 v. 383-384). Also, both the control protein and the His-VP3-FL proteins show up as several bands in the western blots, but they don't appear to be consistent with the sizes of the proteins stated on lines 383-384. For example, line 383 states that His-VP3-FL is ~43 kDa, but the blots show triplet bands that are all below the 35 kDa marker (Figures 1B and 1G). Mass spectrometry information is shown in the supplemental data (describing the different bands for His-VP3-FL) but this is not mentioned in the actual manuscript, causing confusion. Finally, the results appear to differ throughout the paper (see Figures 1B v. 1G and 1A v. 4A).

      We used two sources of recombinant VP3: baculovirus and Escherichia coli. Initially, we opted for the baculovirus system based on evidence from previous studies that it was suitable for ectopic expression of VP3. Subsequently, we successfully produced VP3 using Escherichia coli and chose to transition to this system due to several technical advantages. Moreover, mass spectrometry analysis did not reveal any post-translational modifications that may have favored retaining the baculoviral system. We confirmed that VP3, produced in either system, exhibited similar behavior in our co-flotation assays. We will clarify all this in the revised version of our manuscript.

      (2) Possible "other" effects of the R200D mutation on the VP3 protein. The authors performed mutagenesis to identify which residues within patch 2 on VP3 are important for association with PI3P. They found that a VP3 mutant with an engineered R200D change (i) did not associate with PI3P membranes in co-flotation assays, and (ii) did not co-localize with EE markers in transfected cells. Moreover, this mutation resulted in the loss of IBDV viability in reverse genetics studies. The authors interpret these results to indicate that this residue is important for "mediating VP3-PI3P interaction" (line 211) and that this interaction is essential for viral replication. However, it seems possible that this mutation abrogated other aspects of VP3 function (e.g., dimerization or other protein/RNA interactions) aside from or in addition to PI3P binding. Such possibilities are not mentioned by the authors.

      The arginine amino acid at position 200 of VP3 is not located in any of the protein regions associated with its other known functions. VP3 has a dimerization domain located in the second helical domain, where different amino acids across the three helices form a total of 81 interprotomeric close contacts; however, R200 is not involved in these contacts (Structure. 2008 Jan;16(1):29-37). VP3 also has an oligomerization domain mapped within the 42 C-terminal residues of the polypeptide, i.e., the segment of the protein composed of the residues at positions 216-257 (J Virol. 2003 Jun;77(11):6438–6449). Regarding VP3’s ability to bind RNA, it is facilitated by a region of positively charged amino acids, identified as P1, which includes K99, R102, K105, and K106 (PLoS One. 2012;7(9):e45957). Furthermore, our findings indicate that the R200D mutant retains a folding pattern similar to the wild-type protein, as shown in Figure 4B. Together, these observations lead us to conclude that the loss of replication capacity of R200D viruses results from impaired, or even lost, VP3-PI3P interaction.

      (3) Interpretations from computational simulations. The authors performed computational simulations on the VP3 structure to infer how the protein might interact with membranes. Such computational approaches are powerful hypothesis-generating tools. However, additional biochemical evidence beyond what is presented would be required to support the authors' claims that they "unveiled a two-stage modular mechanism" for VP3-PI3P interactions (see lines 55-59). Moreover, given the biochemical data presented for R200D VP3, it was surprising that the authors did not perform computational simulations on this mutant. The inclusion of such an experiment would help tie together the in vitro and in silico data and strengthen the manuscript.

      We acknowledge that the language used may have overstated the "unveiling" of the two-stage binding mechanism for VP3 on membranes containing PI3P. We intended to propose, rather than confirm, this mechanism, largely based on our coarse-grained simulations. Accordingly, we will revise the manuscript to temper our claims and frame them more appropriately. Regarding the absence of computer simulations for the R200D VP3 mutant, these were indeed conducted, and the results are detailed in Figure 14 of the supplementary material. We realize this was not adequately emphasized in the main manuscript, an oversight we will correct in the revised version.

      Reviewer #3 (Public Review):

      Summary:

      Infectious bursal disease virus (IBDV) is a birnavirus and an important avian pathogen. Interestingly, IBDV appears to be a unique dsRNA virus that uses early endosomes for RNA replication, a strategy that is more common for +ssRNA viruses such as SARS-CoV-2.

      This work builds on previous studies showing that IBDV VP3 interacts with PIP3 during virus replication. The authors provide further biophysical evidence for the interaction and map the interacting domain on VP3.

      Strengths: Detailed characterization of the interaction between VP3 and PIP3 identified R200D mutation as critical for the interaction. Cryo-EM data show that VP3 leads to membrane deformation.

      Weaknesses:

      The work does not directly show that the identified R200 residues are directly involved in VP3-early endosome recruitment during infection. The majority of work is done with transfected VP3 protein (or in vitro) and not in virus-infected cells. Additional controls such as the use of PIP3 antagonizing drugs in infected cells together with a colocalization study of VP3 with early endosomes would strengthen the study. In addition, it would be advisable to include a control for cryo-EM using liposomes that do not contain PIP3 but are incubated with HIS-VP3-FL. This would allow ruling out any unspecific binding that might not be detected on WB.

      The authors also do not propose how their findings could be translated into drug development that could be applied to protect poultry during an outbreak. The title of the manuscript is broad and would improve with rewording so that it captures what the authors achieved.

      In previous works from our group, we demonstrated the crucial role of the VP3 P2 region in targeting the early endosomal membranes and for viral replication, including the use of PI3K inhibitors to deplete PI3P, showing that both the control RFP-2xFYVE and VP3 lost their ability to associate with the early endosomal membranes (J Virol. 2018 May 14;92(11):e01964-17; J Virol. 2021 Feb 24;95(6):e02313-20). In the present work, to further characterize the role of R200 in binding to early endosomes and for viral replication, we show that: i) the transfected VP3 R200D protein loses the ability to bind to early endosomes in immunofluorescence assays (Figure 2E and Figure 3); ii) the recombinant VP3 R200D protein loses the ability to bind to liposomes PI3P(+) in co-flotation assays (Figure 4A); and, iii) the mutant virus R200D loses replication capacity (Figure 4C).

      Regarding the cryo-EM comment: we will include images where we used PI3P(-) liposomes in the revised version of our manuscript.

      We will also modify the title of the manuscript.

      Regarding the question of how our findings could be translated into drug development, indeed, VP3-PI3P binding constitutes a good target for drugs that counteract infectious bursal disease. However, we did not mention this idea in the manuscript, first because it is somewhat speculative and second because infected farms do not implement any specific treatment. The control is based on vaccination. We will mention these aspects of the infection in the revised version of our manuscript.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you once again for your patience and guidance through this revision process. I would like to add an important aspect to our previous discussion regarding the identification and impact of potential contaminants in our study.

      In recent years, advanced tools such as SCRuB (recently published in Nature Biotechnology, DOI:10.1038/s41587-023-01696-w) and the widely-used tool decontam have been developed to address the issue of contaminants in metagenomic studies. These tools primarily operate based on sequence similarity, identifying potential contaminants by marking and removing those found in only a minority of samples or those that display patterns indicative of laboratory contamination.

      As the reviewer rightly pointed out, contaminants are often rare species that appear in very few samples. Our study, focusing on high-abundance species in the vaginal microbiome, is less susceptible to the influences of such rare contaminants. This approach aligns with the methodology employed by leading research groups in the field, such as Professor Jacques Ravel's lab. Their decision not to use blank controls in several of their studies on the female reproductive tract microbiome likely stems from a similar understanding — that the impact of rare contaminants is minimal on the study's conclusions, especially when high-abundance species are the main focus.

      We believe that the methodologies and tools currently available for contaminant identification and removal, while highly effective for their intended purpose, reinforce our decision to focus on high-abundance species. This focus minimizes the potential impact of rare contaminants on our study's conclusions. In light of this, our study's methodology remains robust and well-suited for achieving our research objectives.

      In our revised manuscript, we will include a discussion of these points, further clarifying our approach and the rationale behind our methodological choices. We hope that this additional information will address the concerns raised and provide a clearer understanding of the context and reliability of our findings.

      Thank you for considering these additional points. We look forward to your feedback on our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments. We were pleased that they thought our study was "well crafted and written", "important", and that it provides a "valuable resource for researchers studying color vision". They also expressed several constructive criticisms, concerning – among other things – the lack of details regarding experimental procedures and analysis, the challenge in relating retinal data to cortical recordings, and consistency of results across animals. In response to the reviewers’ comments and following their suggestions, we performed additional analyses, and substantially revised the paper:

      We added a section in the Discussion about "Limitations of the stimulus paradigm". In addition, we added a new Suppl. Figure that illustrates the effect of deconvolution of calcium traces on our results and clarified in the text why we use deconvolved signals for all analyses. The new Suppl. Figure also shows an additional analysis with a more conservative threshold of neuron exclusion.

      We now clarify how retinal signaling relates to our cortical results and rewrote the text to be more conservative regarding our conclusions.

      In addition, we added a new Suppl. Figure showing the key analyses from Figures 2 and 4 separately for each animal. We now mention consistency across animals in the Results section and clearly state which analyses were performed on data pooled across animals.

      We are positive that these additions address the issues raised by the reviewers. Please find our point-by-point replies to all comments below.

      eLife assessment

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions and details about some procedures are incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and resolution of some technical issues.

      We thank the reviewers for appreciating our manuscript and their thoughtful comments.

      Referee 1 (Remarks to the Author):

      Summary:

      In this study, Franke et al. explore and characterize the color response properties across the primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake-behaving 2P imaging to define the spectral response properties of visual interneurons in Layer 2/3. They find that opponent responses are more prominent at photopic light levels, and diversity in color opponent responses exists across the visual scene, with green ON/UV OFF responses being more strongly represented in the upper visual field. This is argued to be relevant for detecting certain features that are more salient when the chromatic space is used, possibly due to noise reductions.

      Strengths:

      The work is well crafted and written and provides a thorough characterization that reveals an uncharacterized diversity of visual properties in V1. I find this characterization important because it reveals how strongly chromatic information can modulate the response properties in V1. In the upper visual field, 25% of the cells differentially relay chromatic information, and one may wonder how this information will be integrated and subsequently used to aid vision beyond the detection of color per se. I personally like the last paragraph of the discussion that highlights this fact.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses: One major point highlighted in this paper is the fact that Green ON/UV OFF responses are not generated in the retina. But glancing through the literature, I saw this is not necessarily true. Fig. 1 of Joesch and Meister, a paper cited, shows this can be the case. Thus, I would not emphasize that this wasn’t present in the retina. This is a minor point, but even if the retina could not generate these signals, I would be surprised if the diversity of responses would only arise through feed-forward excitation, given the intricacies of cortical connectivity. Thus, I would argue that the argument holds for most of the responses seen in V1; they need to be further processed by cortical circuitries.

      We thank the reviewer for this comment. When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited by the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      This takes me to my second point, defining center and surround. The center spot is 37.5 deg of visual angle, more than 1 mm of the retinal surface. That means that for all retinal cells, at least half and most likely all of their surrounds will also be activated. Although 37.5 deg is roughly the receptive field size previously determined for V1 neurons, the one-to-one comparison with retinal recordings, particularly with their center/surround properties, is difficult. This should be discussed. I assume that the authors tried a similar approach with sparse or dense checker white noise stimuli. If so, it would be interesting if there were better ways of defining the properties of V1 neurons on their complex/simple receptive field properties to define how much of their responses are due to an activation of the true "center" or a coactivation of the surround. Interestingly, at least some of the cells (Fig. 1d, cells 2 and 5) don’t have a surround. Could it be that in these cases, the "center" and "surround" are being excited together? How much would the overall statistics change if one used a full-field flicker stimulus instead of a center/surround stimulus? How stable are the results if the center/surround flicker stimulus is shifted? These results won’t change the fact that chromatic coding is present in the VC and that there are clear differences depending on their position, but it might change the interpretation. Thus, I would encourage you to test these differences and discuss them.

      Thanks for this comment. We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degree visual angle in diameter, which is slightly larger than center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we used the following steps:

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.
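      The alignment check described above (the fraction of a neuron's center RF that falls inside the 37.5 degree center spot) can be illustrated with a minimal sketch. It assumes the center RF is available as a boolean pixel mask and that pixels map linearly to degrees of visual angle; the function name, the mask representation, and the `deg_per_px` parameter are hypothetical and are not taken from the authors' analysis code.

```python
import numpy as np

def center_overlap_fraction(rf_mask, deg_per_px, spot_center_deg, spot_diameter_deg=37.5):
    """Fraction of the center-RF area that lies inside the circular center spot.

    rf_mask         : 2D boolean array, True where the center RF is (hypothetical input)
    deg_per_px      : degrees of visual angle per pixel (assumed linear mapping)
    spot_center_deg : (x, y) position of the spot center in degrees
    """
    rf_mask = np.asarray(rf_mask, dtype=bool)
    rows, cols = np.indices(rf_mask.shape)
    # Distance of every pixel from the spot center, in degrees.
    dist = np.hypot(cols * deg_per_px - spot_center_deg[0],
                    rows * deg_per_px - spot_center_deg[1])
    inside_spot = dist <= spot_diameter_deg / 2.0
    rf_area = rf_mask.sum()
    if rf_area == 0:
        return 0.0
    return float((rf_mask & inside_spot).sum() / rf_area)
```

      Under these assumptions, a neuron would count as well aligned when the returned fraction exceeds two thirds, which is the criterion quoted for 83% of tested neurons.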

      Nevertheless, we cannot exclude that the stimulus was misaligned for a subset of the recorded neurons used for analysis. We agree with the reviewer that such misalignment might have contributed to cells not having surround STAs, due to simultaneous activation of antagonistic center and surround RF components by the surround stimulus. While a full-field stimulus would get rid of the misalignment problem, it would not allow us to study color tuning in center and surround RF components separately. Instead, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is out of the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we now explicitly mention the steps we perform to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion. We believe these changes will help the reader to interpret our results.

      Referee 2 (Remarks to the Author):

      Summary:

      Franke et al. characterize the representation of color in the primary visual cortex of mice and how it changes across the visual field, with a particular focus on how this may influence the ability to detect aerial predators. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet were presented in random combinations. Using a clustering approach, a set of functional cell-types were identified based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have varying spatial distributions in V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths:

      The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses:

      While the study presents solid evidence, a few weaknesses exist, including the size of the dataset, clarity regarding details of data included in each step of the analysis, and discussion of caveats of the work. The results presented here are based on recordings of 3 mice. While the number of neurons recorded is reasonably large (n > 3000), an analysis that tests for consistency across animals is missing. Related to this, it is unclear how many neurons at each stage of the analysis come from the 3 different mice (except for Suppl. Fig 4).

      Thank you for this comment. We apologize that the original manuscript did not clearly indicate the consistency of our results across animals. We have revised the manuscript in the following ways:

      We have added an additional Suppl. Figure, which shows the variability of the data within and across animals (Suppl. Fig. 4). Specifically, we show the distribution of color and luminance selectivity for (i) center and surround components of V1 RFs and (ii) for upper and lower visual field. This data is used for all analyses shown in Figures 2-4. The figure legend of this figure also states the number of neurons per animal.

      We now clearly state in the Results section that all analyses in the main figures were performed by pooling data across animals, and refer to the Suppl. Figures for consistency across animals.

      We believe these changes help the reader to interpret our results.

      Finally, the paper would greatly benefit from a more in depth discussion of the caveats related to the conclusion drawn at each stage of the analysis. This is particularly relevant regarding the caveats related to using spike triggered averages to assess the response preferences of ON-OFF neurons, and the conclusions drawn about the contribution of retinal color opponency.

      Thanks. We substantially revised the text to discuss caveats and limitations of the approach. For example, we added a section into the Discussion called "Limitations of the stimulus paradigm". In addition, we clarified how retinal signals relate to cortical ones and phrased our conclusions more conservatively.

      The authors provide solid evidence to support an asymmetric distribution of color opponent cells in V1 and a reduced color contrast representation in lower light levels. Some statements would benefit from more direct evidence such as the integration of upstream visual signals for color opponency in V1.

      Based on the comments from Reviewer 1, we have rephrased the statements regarding the integration of upstream visual signals for color opponency in V1. We think these revisions increase the clarity of the results and help the reader with interpretation.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      Thanks! We thank the reviewer again for the helpful comments.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. Several technical concerns limit how clearly the data support the conclusions. If these issues can be fixed, the paper would make a valuable contribution to how color is coded in mouse V1.

      We thank the reviewer for the helpful comments.

      Analysis: The central tool used to analyze the data is a "spike triggered average" of the responses to randomly varying stimuli. There are several steps in this analysis that are not documented, and hence evaluating how well it works is difficult. Central to this is that the paper does not measure spikes. Instead, measured calcium traces are converted to estimated spike rates, which are then used to estimate STAs. There are no raw calcium traces shown, and the approach to estimate spike rates is not described in any detail. Confirming the accuracy of these steps is essential for a reader to be able to evaluate the paper. Further, it is not clear why the linear filters connecting the recorded calcium traces and the stimulus cannot be estimated directly, without the intermediate step of estimating spike rates.

      Thank you for this comment. We have used the genetically encoded calcium sensor GCaMP6s in our recordings. This sensor is a very sensitive GCaMP6 variant, but also one with slow kinetics. To remove the effect of the slow sensor kinetics from recorded calcium responses, the recorded traces are commonly deconvolved with the impulse function of the sensor to obtain the deconvolved calcium traces. We now include this reasoning in the Results section. To illustrate the effect of the deconvolution, we added a new Suppl. Figure (Suppl. Fig. 2) showing raw calcium and deconvolved traces, and the STAs estimated from both types of traces. This illustrates that the results regarding neuronal color preferences are consistent across raw and deconvolved calcium traces.
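      The idea of removing slow sensor kinetics by deconvolution can be illustrated with a minimal sketch. It assumes the sensor's impulse response is a single exponential decay, so that the fluorescence follows c[t] = g * c[t-1] + s[t]; this is a simplification chosen for illustration, and the response does not specify the deconvolution method actually used. The function name and the `tau_s` and `frame_rate_hz` parameters are hypothetical.

```python
import numpy as np

def deconvolve_exponential(trace, tau_s, frame_rate_hz):
    """Invert a single-exponential sensor kernel.

    Assumed forward model: c[t] = g * c[t-1] + s[t], with g = exp(-1 / (tau * fs)).
    The deconvolved trace is then s[t] = c[t] - g * c[t-1], clipped at zero.
    """
    trace = np.asarray(trace, dtype=float)
    g = np.exp(-1.0 / (tau_s * frame_rate_hz))  # per-frame decay factor
    events = trace.copy()
    events[1:] -= g * trace[:-1]
    # Negative values cannot be neural events under this model; treat them as noise.
    return np.clip(events, 0.0, None)
```

      As the response notes, the output of such a step has no interpretable unit; it is correlated with spike rate but is not a spike count.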

      We agree with the reviewer that the term STA might be confusing. We have replaced it with the term "event-triggered average" (ETA). In addition, we have replaced the phrase "estimated spike rate" with "deconvolved calcium trace" throughout the manuscript because the unit of the deconvolved traces is not interpretable in the way a spike rate (spikes per second) would be. In the revised version, we now clarify in the Methods section that we estimate the ETAs based on deconvolved calcium traces, which are correlated with, and an approximation of, spike rate.

      A further issue about the STAs is that the inclusion criterion (correlation of predicted vs measured responses of 0.25) is pretty forgiving. It would be helpful to see a distribution of those correlation values, and some control analyses to check whether the STA is providing a sufficiently accurate measure to support the results (e.g. do the central results hold for the cells with the highest correlations).

      We thank the reviewer for this comment. To exclude noisy neurons from analysis, we used the following procedure:

      For each of the four stimulus conditions (center and surround for green and UV stimuli), kernel quality was measured by comparing the variance of the STA with the variance of the baseline, defined as the first 500 ms of the STA. Only cells with at least 10-times more variance of the kernel compared to baseline for UV or green center STA were considered for further analysis.
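      The inclusion criterion above can be sketched in a few lines. This is a minimal illustration, assuming each kernel is a 1D array sampled at a known frame rate and that the baseline is its first 500 ms; the function names and the `frame_rate_hz` parameter are hypothetical, and the authors' own implementation may differ in detail.

```python
import numpy as np

def kernel_quality(kernel, frame_rate_hz, baseline_ms=500.0):
    """Ratio of the variance of the whole kernel to the variance of its baseline."""
    kernel = np.asarray(kernel, dtype=float)
    n_baseline = int(round(baseline_ms / 1000.0 * frame_rate_hz))
    total_var = np.var(kernel)
    baseline_var = np.var(kernel[:n_baseline])
    if baseline_var == 0:
        # A perfectly flat baseline: any structure at all counts as infinitely good.
        return np.inf if total_var > 0 else 0.0
    return total_var / baseline_var

def passes_quality(green_center, uv_center, frame_rate_hz, threshold=10.0):
    """Keep a cell if either center kernel (green or UV) reaches the threshold."""
    return bool(
        kernel_quality(green_center, frame_rate_hz) >= threshold
        or kernel_quality(uv_center, frame_rate_hz) >= threshold
    )
```

      Raising `threshold` corresponds to the more conservative control analysis, in which only the cells with the highest quality values are retained.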

      We have added the distribution of quality values to a new Suppl. Figure (Suppl. Fig. 2d,e). We now also show the percentage of neurons above threshold, given different quality thresholds. Finally, we have repeated the analysis shown in Figure 2 for a much more conservative threshold, including only the top 25% of neurons (Suppl. Fig. 2e,f). We now mention this new analysis in the Methods and Results section.

      Limitations of stimulus choice: The paper relies on responses to a large (37.5 degree diameter) modulated spot and surrounding region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells. As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs the surround region). The impact of these issues on the conclusions is considered briefly at the start of the results but needs to be evaluated in considerably more detail. This is particularly true for retinal ganglion cells given the size of their receptive fields (see also next point).

      We agree with the reviewer that the centering of the stimulus is critical and apologize if this point was not discussed sufficiently. To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degree visual angle in diameter, which is slightly larger than center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we have used different experimental and analysis steps and controls (see also second comment of Reviewer 1):

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      We now mention those clearly in the Results section and added the limitations of our approach to the Discussion section.

      Comparison with retina: A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously-collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. This issue may be handled by the analysis presented in the paper, but if so it needs to be described more clearly. The paper from which the retina data is taken argues that rod-cone chromatic opponency originates largely in the outer retina. This mechanism would be expected to be shared across retinal outputs. Thus it is not clear how the Green-On/UV-Off vs Green-Off/UV-On asymmetry could originate. This should be discussed.

      We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited by the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      Residual chromatic cells at low mesopic light levels: The presence of chromatically tuned cells at the lowest light level probed is surprising. The authors describe these conditions as rod-dominated, in which case chromatic tuning should not be possible. This again is discussed only briefly. It either reflects the presence of an unexpected pathway that amplifies weak cone signals under low mesopic conditions such that they can create spectral opponency or something amiss in the calibrations or analysis. Data collected at still lower light levels would help resolve this.

      Thank you for this comment. We call the lowest light level "low mesopic" and "rod-dominated" because the spectral contrast of V1 center responses in posterior recording fields is green-shifted for this light level (Fig. 3a). This is only expected if responses in the UV-cone dominant ventral retina are predominantly driven by rod photoreceptors. We now explain this rationale in the Results section. In addition, we mention in the Discussion that future studies are required to test whether cone signals need to be amplified for low light levels. While we agree with the reviewer that it would be exciting to use even lower light levels during recordings, we believe this is out of the scope of the current study due to the technical challenges involved in achieving scotopic stimulation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have revised the manuscript mainly in the following aspects: (1) the data on electrophysiological and behavioral responses of larvae and adults to trehalose have been added, and the related figures and texts have been modified accordingly; (2) photos of the taste organs of larvae and adults indicating the position of the recorded sensilla have been added; (3) the potential off-target effects of GR knock-out on the expression of other GRs have been carefully explained and revised in the relevant text; (4) the abstract has been revised to present the findings more technically in a limited number of words; (5) some experimental details in Materials and Methods and some new references have been added; (6) a new figure (Figure 8) summarizing the main findings of the study has been added.

      In the following, we respond to the reviewers’ comments and suggestions one by one. We hope that our answers will satisfy you and the three reviewers. We would also be very happy to receive further valuable advice from you.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The process of taste perception is significantly more intricate and complex in Lepidopteran insects. This investigation provides valuable insights into the role of Gustatory receptors and their dynamics in the sensation of sucrose, which serves as a crucial feeding cue for insects. The article highlights the differential sensitivity of Grs to sucrose and their involvement in feeding and insect behavior.

      Strengths:

      To support the notion of the differential specificity of Gr to sucrose, this study employed electrophysiology, ectopic expression of Grs in Xenopus, genome editing, and behavioral studies on insects. This investigation offers a fundamental understanding of the gustation process in lepidopteran insects and its regulation of feeding and other gustation-related physiological responses. This study holds significant importance in advancing our comprehension of lepidopteran insect biology, gustation, and feeding behavior.

      Thank you for your recognition of our research.

      Weaknesses:

      While this manuscript demonstrates technical proficiency, there exists an opportunity for additional refinement to optimize comprehensibility for the intended audience. Several crucial sugars have been overlooked in the context of electrophysiology studies and should be incorporated. Furthermore, it is imperative to consider the potential off-target effects of Gr knock-out on other Gr expressions. This investigation focuses exclusively on Gr6 and Gr10, while neglecting a comprehensive narrative regarding other Grs involved in sucrose sensation.

      We accept the reviewer's suggestion. Because trehalose is a main sugar in insect blood, and it is converted by insects after feeding on plant sugars, we have added new data on the electrophysiological and behavioral responses of larvae and adults of Helicoverpa armigera to trehalose (see Figure 1-2, Figure 1-figure supplement 1, Figure 2-figure supplement 1). The eight sugars now tested comprise 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose), which were chosen because they are mainly present in host-plants of H. armigera and/or are representative in the structure and source of sugars.

      We fully agree with the reviewer’s opinion and have taken the potential off-target effects of CRISPR/Cas9 knockout of Gr on the expression of other GRs into consideration. To predict the potential off-target sites of the Gr6 and Gr10 sgRNAs used to establish homozygous mutants with CRISPR/Cas9 technology, we first used the online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to search the genome of the wild-type cotton bollworm, with the mismatch number set to less than or equal to 3. We found that the Gr10 sgRNA had no potential off-target site, and the Gr6 sgRNA had only one potential off-target site. Therefore, we designed primers according to the sequence of the potential off-target site of the Gr6 sgRNA, conducted PCR using genomic DNA of the homozygous mutant as a template, performed Sanger sequencing on the PCR products obtained, and found that the potential off-target site of the Gr6 sgRNA was no different from that of the wild type. In particular, because the sgRNAs of Gr6 and Gr10 may produce off-target effects on other sugar receptor genes of H. armigera, we conducted the same off-target site analysis with the designed sgRNAs on each of the other eight sugar receptor genes, and found that there were no off-target sites on these receptor genes (see Line 254-256).
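      The mismatch criterion used in the off-target prediction can be illustrated with a toy scan. This is only a sketch of the idea of tolerating up to 3 mismatches: it is not Cas-OFFinder's algorithm or interface, it searches the forward strand only, and it ignores the PAM requirement. The function names and input strings are hypothetical.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def find_candidate_sites(genome, guide, max_mismatches=3):
    """Return (start, window, mismatches) for every window within the mismatch limit.

    Toy illustration: forward strand only, no PAM check, brute-force scan.
    """
    genome = genome.upper()
    guide = guide.upper()
    hits = []
    for start in range(len(genome) - len(guide) + 1):
        window = genome[start:start + len(guide)]
        mismatches = count_mismatches(window, guide)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits
```

      In such a scan, the intended target appears as a hit with zero mismatches, and any further hit is a candidate off-target site that would then be checked experimentally, as the authors did by PCR and Sanger sequencing.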

      Reviewer #2 (Public Review):

      Summary:

      To identify sugar receptors and assess the capacity of these genes the authors first set out to identify behavioral responses in larvae and adults as well as physiological response. They used phylogenetics and gene expression (RNAseq) to identify candidates for sugar reception. Using first an in vitro oocyte system they assess the responses to distinct sugars. A subsequent genetic analysis shows that the Gr10 and Gr6 genes provide stage specific functions in sugar perception.

      Strengths:

      A clear strength of the manuscript is the breadth of techniques employed allowing a comprehensive study in a non-canonical model species.

      Thank you for your recognition of our research.

      Weaknesses:

      There are no major weaknesses in the study for the current state of knowledge in this species. Since it is much basic work to establish a broader knowledge, context with other modalities remains unknown. It might have been possible to probe certain contexts known from the fruit fly, which would have strengthened the manuscript.

      Thank you so much for your suggestion. According to this suggestion, we further added some sentences probing sugar sensing and behaviors of fruit fly larvae in the Introduction and discussion sections (Line 68-71 in Introduction section, Line 395-399 in Discussion section).

      Reviewer #3 (Public Review):

      In this study, the authors combine electrophysiology, behavioural analyses, and genetic editing techniques on the cotton bollworm to identify the molecular basis of sugar sensing in this species.

      The larval and adult forms of this species feed on different plant parts. Larvae primarily consume leaves, which have relatively lower sugar concentrations, while adults feed on nectar, rich in sugar. Through a series of experiments spanning electrophysiological recordings from both larval and adult sensillae, qPCR expression analysis of identified GRs from these sensillae, response profiles of these GRs to various sugars via heterologous expression in Xenopus oocytes, and evaluations of CRISPR mutants based on these parameters, the authors discovered that larvae and adults employ distinct GRs for sugar sensing. While the larva uses the highly sensitive GR10, the adult uses the less sensitive and broadly tuned GR6. This differential use of GRs is in keeping with their behavioral ecology.

      The data are cohesive and consistently align across the methodologies employed. They are also well presented and the manuscript is clearly written.

      Recommendations for the authors:

      While appreciating the quality of the work and its presentation, we have a few comments for the authors, should they wish to consider them, that would significantly improve the presentation of the work.

      Title: Could the authors please revisit their title to better reflect the main finding of their work?

      The title has been changed into “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      Text: There are a few comments related to the text, and these are listed below:

      (1) Could the authors place their work in the context of what's known about sugar sensing in Drosophila larva and adult?

      In the Introduction section, we added the status of research on sugar perception in Drosophila larvae, pointing out "No external sugar-sensing mechanism in Drosophila larvae has yet been characterized." (Line 70-71); in the Discussion section, the research progress of sugar sensing in Drosophila adults and larvae was also summarized (Line 397-399).

      (2) For each results section, could the authors please include a sentence or two that interprets the data in the context of previously presented data?

      We accept the reviewer's suggestion. To make the manuscript easier for readers to follow, we have included a sentence that interprets the preceding data at the beginning of each part of the Results, while avoiding duplication.

      (3) Could the authors please provide details of the generation and screening of the CRISPR mutants?

      We have added more details on mutant establishment and screening in the Materials and Methods section (Line 722-726, 729-732).

      Figures: Could the authors please include images and schematics wherever possible? For example, a schematic depicting the position of the sense organs and one summarising the main findings of the studies.

      In Figure 1 we added the photo of each taste organ, on which the recorded sensilla were indicated. We also added a new figure, Figure 8, summarizing the main findings of the study.

      Choice of Sugars: Could the authors please justify their choice of sugars they have used in the analyses?

      In the first paragraph of the Results section of the article, we further explain the reasons for using the sugars in the study. “We first investigated the electrophysiological responses of the lateral and medial sensilla styloconica in the larval maxillary galea to eight sugars. These sugars were chosen because they are mostly found in host-plants of H. armigera or are representative in the structure and source of sugars.”

      In addition to this, there are several specific comments in the detailed reviewers comments below, which the authors could consider responding to.

      Reviewer #1 (Recommendations For The Authors):

      The article titled "Sucrose taste receptors exhibit dissimilarities between larval and adult stages of a moth" by Shuai-Shuai Zhang and colleagues provides an intriguing analysis. The authors have conducted a meticulously planned and executed study. However, I do have some inquiries.

      (1) What precisely does the term "differ" signify in the title? It can be expounded upon in terms of differing in expression or sensitivity. The title could benefit from being more informative. The authors should appropriately specify the insect species in the title of the paper. This would make it more comprehensible to readers. Merely mentioning the term "moth" does not provide any information about the model organism. Hence, it would be preferable to mention Helicoverpa armigera instead of using the generic term "moth" in the title.

      Thank you for your suggestions. We considered it better to emphasize that the receptors for sucrose are different, and we have accepted the suggestion of adding the name of the animal. The title has been changed into “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      (2) The abstract is written in a simple and easily understandable manner, but it overlooks important findings from a technical standpoint.

      We have added some key experimental techniques to the Abstract to illustrate important findings.

      (3) Almost all herbivorous insects are said to consume plants and utilize sucrose as a stimulus for feeding, as stated by the authors. Sucrose, glucose, and fructose sugar are among the commonly observed stimulants for feeding in numerous insects. It would be appropriate to incorporate not only sucrose but also glucose and fructose as feeding stimulants for almost all herbivorous insects.

      Thank you for your suggestion. Sucrose is the major sugar in plants, and its concentration varies greatly from tissue to tissue, while the concentration of the hexose sugars is much lower and the concentration does not change much. In Line 48, we state that sucrose, glucose, and fructose are feeding stimuli for herbivorous insects. From the previous studies, it seems that sucrose is the strongest, followed by fructose, and finally glucose. The cotton bollworm larvae showed no electrophysiological and behavioral response to glucose.

      (4) The reason why trehalose is not considered in the electrophysiology analysis is unclear. Given that trehalose is a major sugar in insects and plants, it would be intriguing to include it in the analysis.

      We have accepted the reviewer's suggestion, and supplemented the electrophysiological responses of taste organs in larvae and adults of Helicoverpa armigera to trehalose (Figure 1, Figure 1-Figure Supplement 1), and also tested the behavioral responses of the larvae and adults to trehalose (Figure 2, Figure 2-Figure Supplement 1). Therefore, all the related figures have been changed.

      (5) The author's intention regarding the co-receptor relationship between Gr5 and Gr6 (line 211) is unclear. If this is indeed the case, then the reason for considering Gr5 in further studies remains uncertain.

      We have changed the sentence as follows: “Since Gr5 was highly expressed with Gr6 in the proboscis and tarsi (Figure 3D-3E, Figure 3—figure supplement 1), we suspected that Gr5 and Gr6 might be expressed in the same cells, and then tested the response profile of their co-expression in oocytes.”

      (6) The homologous nature of Grs is emphasized by the authors. It is not specified how the author ensured that the guide RNA targeting Gr6 or Gr10 did not result in off-target effects on other Grs.

      Thank you so much for your suggestion. We have rewritten the relevant paragraph (Line 238-251), detailing our tests and the results on the potential off-target effects of knocking out GRs by CRISPR/Cas9: “In order to predict the potential off-target sites of sgRNA of Gr6 and Gr10, we used online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to blast the genome of H. armigera, and the mismatch number was set to less than or equal to 3. According to the predicted results, the Gr10 sgRNA had no potential off-target region but Gr6 sgRNA had one. Therefore, we amplified and sequenced the potential off-target region of Gr6-/- and found there was no frameshift or premature stop codon in the region compared to WT (Figure 5—figure supplement 2). It is worth mentioning that there was no potential off-target region of Gr6 and Gr10 sgRNA in other sugar receptor genes of H. armigera, Gr4, Gr5, Gr7, Gr8, Gr9, Gr11 and Gr12. We further found there was no difference in the response to xylose of the medial sensilla styloconica among WT, Gr10-/- and Gr6-/- (Figure 5—figure supplement 2). Furthermore, WT, Gr10-/- and Gr6-/- did not show differences in the larval body weight, adult lifespan, and number of eggs laid per female (Figure 5—figure supplement 2). All these results suggest that no off-target effects occurred in the study.”
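
      The mismatch criterion described above (candidate sites differing from the sgRNA at no more than 3 positions) can be illustrated with a minimal sketch. This is not the Cas-OFFinder algorithm or its interface; it is a simplified scan that assumes a 20-nt guide followed by an NGG PAM, and the function names and any sequences passed to it are hypothetical.

```python
def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def find_offtargets(genome, guide, max_mismatches=3):
    """Scan both strands for sites within max_mismatches of the guide.

    Assumption: a site is a guide-length protospacer immediately followed
    by an NGG PAM. Returns (strand, start, protospacer, mismatches) tuples;
    start is the index on the scanned strand (the reverse complement for "-").
    """
    hits = []
    n = len(guide)
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - n - 3 + 1):
            protospacer = seq[i:i + n]
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":  # require an NGG PAM
                continue
            mismatches = hamming(protospacer, guide)
            if mismatches <= max_mismatches:
                hits.append((strand, i, protospacer, mismatches))
    return hits
```

      A perfect match (0 mismatches) is the intended target; any other hit is a candidate off-target region that would then be amplified and sequenced, as the authors did for the single predicted Gr6 site.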

      (7) Is it possible that knocking out Gr10 is not compensated for by the overexpression of Gr6 or other sucrose sensing Grs? Similarly, would the vice versa scenario hold true?

      In the Discussion section, we have added some sentences to discuss this issue: “From our results, knocking out Gr10 or Gr6 is unlikely to be compensated by overexpression of other sugar GRs. One of our recent studies showed that Orco knockout had no significant effect on the expression of most OR, IR and GR genes in adult antennae of H. armigera, but some genes were up- or down-regulated (Fan et al., 2022).”

      (8) What was the rationale for selecting nine candidate GR genes for expression analysis?

      Based on the reviewer's suggestion, we expanded the relevant paragraphs to illustrate the rationale for selecting nine candidate GR genes for expression analysis: “To reveal the molecular basis of sugar reception in the taste sensilla of H. armigera, we first analyzed the putative sugar gustatory receptor genes based on the reported gene sequences of GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al., 2015; Pearce et al., 2017; Xu et al., 2017). Nine putative sugar GR genes, Gr4–12 were identified, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Line 155-161)

      (9) What is the potential reason for the difference between the major larval sugar receptors of Drosophila and Lepidopterans?

      The difference between the major larval sugar receptors of Drosophila and Lepidopterans is probably due to differences in the food their larvae feed on. Fruit fly larvae feed on rotten fruit, the main sugar of which is fructose. The larvae of Lepidoptera mainly feed on plants, and the main sugar is sucrose. In the Discussion section, we have added a sentence “This is most likely due to fruit fly larvae feeding on rotten fruits, which contain fructose as the main sugar.” (Line 399-401)

      (10) There is a disparity in GRs, specifically GR5 and GR6, between the female antenna, proboscis, and tarsi. What could be the possible justification and significance of this?

      Thank you so much for this question. We have added a sentence in the Discussion section, “In this study, the expression patterns of 9 sugar GRs in three taste organs of adult H. armigera show that there is a disparity in GRs, specifically GR5 and GR6, between the female antenna, tarsi and proboscis, which may be an evolutionary adaptation reflecting subtle differentiation in the function of these taste organs in adult foraging. Antennae and tarsi play a role in the exploration of potential sugar sources, while the proboscis plays a more precise role in the final decision to feed.” (Line 433-438)

      (11) I suggest that a visual representation illustrating the positioning of GSNs, particularly the lateral and medial sensilla, in both larva and adult stages would enhance the correlation with the results.

      In Figure 1, we added a photo of each taste organ showing the position of the recorded sensilla, and we also added a new figure, Figure 8, summarizing the main findings of the study.

      (12) Further experiments can be conducted to elucidate the precise molecular mechanisms, particularly the downstream effects of GRs, in order to establish the specificity of GRs more convincingly.

      Thank you so much for your suggestion. We have discussed the further experiments in the Discussion section, “To elucidate the precise molecular mechanisms of sugar reception in H. armigera is necessary to compare a series of single, double and even multiple Gr knock-out lines and investigate the downstream effects of the GRs.” (Line 363-369)

      (13) Figure 6 caption: In Figure 6 (D to I), the percentage of PER is depicted. There is redundancy in the Y-axis title (Percentage of PER) and the legend. This appears to be repetitive. I suggest that it would be better to include the Y-axis title only in Figure D or in Figures D and G.

      We accept the suggestion. Figure 7 (not Figure 6) has been revised accordingly.

      (14) In Figures 6A and 6C, there is inconsistency in the colors used for WT, Gr6, and Gr10. This could potentially confuse the reader. I recommend using the same colors in both figures instead of using a blue color. Please specify how the authors calculated the feeding area in Figure 6.

      We accept the reviewer's suggestion and have changed the colors in Figure 7A, B. We have also added a detailed method for calculating the feeding area (Line 541-545).

      (15) In Two-choice tests, why did the authors use 0.01% Tween 80? Please provide comments on this.

      We used 0.01% Tween 80 to reduce the surface tension and increase the malleability of the solution. We have given a detailed explanation in the Methods section and cited the reference (Line 538-540).

      (16) It would be valuable if the authors could comment on the prospects of this study, considering that GRs play a vital role in controlling behavior and developmental pathways. What are the potential consequences of blocking or disrupting these receptors in terms of behavioral and developmental phenotypic deformities? Could this potentially lead to increased insect mortality?

      Thank you so much for your suggestions. In the last paragraph of the Discussion section, we have added the following perspectives, “Knockout of Gr10 or Gr6 led to a significant decrease in sugar sensitivity and food preference of the larvae and adults of H. armigera, respectively, which is bound to bring adverse consequences to survival and reproduction of the insects. Therefore, studying the molecular mechanisms underlying sugar perception in phytophagous insects may provide new insights into the behavioral ecology of this important and highly diverse group of insects, and measures blocking or disrupting sugar receptors could also have applications to control agricultural pests and improve crop yields worldwide” (Line 449-456).

      Reviewer #2 (Recommendations for The Authors):

      There are a few comments, that I feel would be beneficial to be addressed.

      • The authors used 7 different sugars for their experimental approach. While I agree that this is a sufficiently large collection for a study, I was wondering why they specifically chose these sugars; an explanatory section might be helpful for a reader to follow the reasoning.

      Following Reviewer 1's suggestion, we added trehalose, bringing the number of sugars tested to 8. Trehalose is a main sugar in insect blood; insects produce it from the plant sugars they ingest. The 8 sugars were chosen because they are present in host plants of H. armigera or are representative in structure and source. They comprise 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose).

      • It might be beneficial to provide some broader overview on the gustatory system in the cotton bollworm, particularly at the larval stage since this may not be common knowledge. Along these lines eg. the complexity of sensilla types, organs and overall number (or estimation) of neurons might be good to know, a graphical representation of the sense organs might be informative.

      In the Introduction section, we now give a more specific description of the sugar-sensitive GSNs in the taste system of the larva and adult of H. armigera, and cite the corresponding references.

      • Concerning phylogeny of GRs, it might be relevant to know how complete the genome information is and some more general background on GR diversity in the cotton bollworm.

      We agree. Accordingly, we obtained the putative sugar GRs from the previously published genome (Pearce et al. 2017) and the related annotation of GRs (Jiang et al. 2015, Xu et al. 2012). We have explained this in more detail in the new version of the manuscript: “We first analyzed the putative sugar gustatory receptor genes based on the genome data of H. armigera (Pearce et al. 2017), the reported gene sequences of sugar GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al. 2015, Xu et al. 2012). All nine putative sugar GR genes in H. armigera, Gr4–12 were validated, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Line 155-161).

      • Generation of mutants based on CRISPR is intriguing and a powerful step. While the techniques are well described in the method section, there is no information concerning efficiency or broader feasibility of the approach. I feel it would be quite interesting to learn about how feasible or laborious the approach is to generate mutants (e.g. number of initial injected eggs, the resulting F0 offspring, number of back-crosses, number of screened F1s ....).

      In the Materials and Methods section, we have added specific success rates for each step in the process of building the two mutants (Line 722-726, 729-732).

      Reviewer #3 (Recommendations For The Authors):

      I want to congratulate the authors on this very nice study and have only minor comments for them.

      (1) It would be very nice to include pictures of the larva and adult of H. armigera. It would also help to have schematics of where the sensilla they are recording from are.

      We have added photos of the four taste organs, on which the recorded sensilla are indicated (Figure 1), and pictures of the larva and adult, on which the stimulation site is indicated (Figure 2).

      (2) A schematic summarising their findings, including the relevance to the animal's behavioural ecology, will greatly improve interpretations for the broader audience.

      A schematic summarizing the findings has been added.

      (3) The manner in which PIs are represented in figure 2A, B (among others) is confusing. Can the authors please plot the PI and not the feeding area? From the PI values listed beside the plot, it actually suggests that the larvae don't really show a preference. Could the authors please comment on this?

      Yes, sucrose has a significant stimulating effect on larval feeding, but the effect is not as large as would be predicted from the sensitivity of the sensillum. The main reasons are as follows: (1) many factors affect larval feeding, and sucrose is only one of them; (2) because the substrate leaf discs also contain sugar, the effect of the newly added sucrose may be reduced. After careful consideration, we think it is better to display the feeding area and the PI together so that readers have a complete understanding of the data.
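
      To make the relationship between feeding area and PI concrete, the sketch below computes a preference index from two consumed areas. The formula is an assumption: it is the commonly used two-choice definition, PI = (treated − control) / (treated + control), and the manuscript's exact formula is not quoted in this response. The function name and arguments are hypothetical.

```python
def preference_index(treated_area, control_area):
    """Two-choice preference index, assuming the common definition
    PI = (treated - control) / (treated + control).

    Returns a value between -1 (only control discs eaten) and +1 (only
    treated discs eaten), or None when nothing was consumed.
    """
    total = treated_area + control_area
    if total == 0:
        return None  # index is undefined without any feeding
    return (treated_area - control_area) / total
```

      Under this definition, a significant difference in feeding area can still yield a PI well below 1, because the index is normalized by the total area consumed; this is consistent with the reviewer's observation that the listed PI values look modest.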

      (4) The heterologous expression experiments suggest that co-expression of GR6 with either GR10 or GR5 somehow suppress the response of the GR6 alone to fucose. Am I reading the data correctly? Why would this be? Perhaps the authors could discuss this. In this context, it would help to reproduce all the GR6 data together.

      Your interpretation is reasonable to a certain extent. The result of co-injection might be that Gr10 or Gr5 inhibited the response of Gr6. However, there is another possibility that the amount of Gr6 sRNA was diluted by co-injection of two GRs, resulting in a reduced response of Gr6 to fucose.

      (5) In general, for each results section, it would help to have a sentence or two that interprets the data in the context of previously presented data. This would help the reader digest the data and interpret it as they read along. Currently, the authors summarise the observations and leave all the interpretation to the discussion section.

      We accept the suggestion. In each part of the results, we have added a sentence to explain the above data, which will help readers to clarify the context of the research more easily.

      (6) Is the GR6 data in 4C not lined up correctly?

      Yes, it is right.

      (7) Line 228 suggests that the mutants were validating with qPCRs - I don't see that data.

      The mutants were not validated by qPCR. We used conventional PCR at the mRNA level to verify that the relevant sequences were indeed deleted in the mutants.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #3 (Public Review):

      Kang, Huang, and colleagues have provided new data to address concerns regarding confirmation of LRRK1 and LRRK2 deletion in their mouse model and the functional impact of the modest loss of TH+ neurons observed in the substantia nigra of their double KO mice. In the revised manuscript, the new data around the characterization of the germline-deleted LRRK1 and LRRK2 mice add confidence that LRRK1 and LRRK2 can be deleted using the genetic approach. They have also added new text to the discussion to try and address some of the comments and questions raised regarding how LRRK1/2 loss may impact cell survival and the implications of this work for PD-linked variants in LRRK2 and therapeutic approaches targeting LRRK2.

      The new data provides additional support for the author's claims. I have provided below some suggestions for clarification/additions to the text that can be addressed without additional experiments.

      (1) The authors added additional text highlighting that more studies are warranted in mice where LRRK1/2 are deleted in other CNS cell types (microglia/astrocytes) to understand cell extrinsic drivers of the autophagy deficits observed in their previous work. It still remains unclear how loss of LRRK1/2 leads to increased apoptosis and gliosis in dopaminergic neurons in a cell-intrinsic manner, and, as suggested in the original review, it would be helpful to add some text to the discussion speculating on potential mechanisms by which this might occur.

      (2) Revisions have been made to the discussion to clarify their rationale around how variants in LRRK2 associated with PD may be loss-of-function to support the relevance of this mouse model to phenotypes observed in PD. However, as written, the argument that PD-linked variants are loss-of-function is based on the fact that the double KO mice have a mild loss of TH+ neurons while the transgenic mice overexpressing PD-linked LRRK2 variants often do not and that early characterization of kinase activity was done in vitro are relatively weak. Given that the majority of evidence generated by many labs in the field supports a gain-of-function mechanism, the discussion should be further tempered to better highlight the uncertainty around this (rather than strongly arguing for a loss-of-function effect). This could include the mention of increased Rab phosphorylation observed in cellular and animal models and opposing consequences on lysosomal function observed in cellular studies in KO and pathogenic variant expressing cells. Further, a reference to the Whiffen et al. 2020 paper mentioned by another reviewer should be included in the discussion for completeness.

      We thank the reviewer for the comments. The discussion has been further revised and expanded to explain the cell extrinsic microglial response to pathophysiological changes in DA neurons of cDKO mice and propose future studies of single-cell RNA-sequencing to identify molecular changes within DA neurons of cDKO mice that may drive their apoptotic death during aging.

      We also added paragraphs summarizing existing experimental evidence for the toxic gain-of-function mechanism (biochemical data of increased kinase activity but the lack of evidence for the elevated pRabs and the altered pLRRK2 driving dopaminergic neurodegeneration) and for the loss-of-function mechanism (genetic data of relevant physiological roles) as well as the relationships between LRRK1 and LRRK2 (functional homologues sharing functional domains and overlapping roles in dopaminergic neuron survival) and how dominantly inherited missense mutations can confer a loss of function mechanism (impairing its function in cis and inhibiting wild-type protein function in trans). We also provided a brief summary and discussion of the Whiffen et al. 2020 paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study identifies a family of solute transporters in the enteric protist, Blastocystis, that may mediate the transport of glycolytic intermediates across the mitochondrial membrane. The study builds on previous observations suggesting that Blastocystis (and other Stramenopiles) are unusual in having a compartmentalized glycolytic pathway with enzymes involved in upper and lower glycolysis being located in the cytosol and mitochondria, respectively. In this study, the authors identified two putative Stramenopile metabolite transporters that are related to plant di/tricarboxylic acid transporters that might mediate the transport of glycolytic intermediates across the mitochondrial membrane. These GIC-transporters were localized to the Blastocystis mitochondrion using specific rabbit antibodies and shown to bind several glycolytic intermediates (including GAP, DHAP, and PEP) based on thermostability shift assays. Direct evidence for transport activity was obtained by reconstituting native proteins in proteoliposomes and measuring the uptake of 14C-malate or 35S-sulphate against unlabelled substrates. This assay showed that GIC-2 transported DHAP, GAP, and PEP. However, significant transport activity was not observed for bGIC-2. Overall, the study provides strong, but not conclusive evidence that bGIC-1 is involved in transporting glycolytic intermediates across the inner membrane of the mitochondria, while the function of GIC-2 remains unclear, despite exhibiting the same metabolite binding properties as bGIC-2 in thermostability assays.

      Strengths:

      Overall, the findings are of interest in the context of understanding the diversity of core metabolic pathways in evolutionarily diverse eukaryotes, as well as the process by which cytosolic glycolysis evolved in most eukaryotes. The experiments are carefully performed and clearly described.

      We thank the reviewer for their constructive comments. We note that bGIC-2 is the identified glycolytic intermediate transporter, not bGIC-1.

      Weaknesses:

      The main weakness of the study is the lack of direct evidence that either bGIC-1 and/or bGIC-2 are active in vivo. While it is appreciated that the genetic tools for disrupting GIC genes in Blastocystis are limited/lacking, are there opportunities to ectopically express or delete these genes in other Stramenopiles, such as Phaeodactylum tricornutum, to demonstrate function in vivo?

      Here, we have identified a transport protein, unique to stramenopiles, which is present in mitochondria of Blastocystis and can bind and transport glycolytic intermediates. We agree that it would have been desirable to confirm that they function as glycolytic intermediate transporters in vivo. However, the reviewer is correct in saying that the genetic tools for disrupting GIC genes in Blastocystis in vivo are not available. While the reviewer mentions the possibility of performing these analyses in Phaeodactylum tricornutum, it is important to note that this species possesses aerobic mitochondria and that the pay-off phase of glycolysis is present in both the mitochondrial matrix and the cytosol. Consequently, any data obtained from this species might not be conclusive and would also not be relevant to the glycolytic metabolism in Blastocystis, the subject of this study.

      The authors demonstrate that both bGIC-1 and bGIC-2 are targeted to the mitochondrion, based on immunofluorescence studies. However, the precise localization and topology of these carriers in the inner or outer membrane are not defined. The conclusions of the study would be strengthened if the authors could show that one/both transporters are present in the inner membrane using protease protection experiments following differential solubilization of the outer and inner mitochondrial membranes.

      The protein is a member of the mitochondrial carrier family, which are extremely hydrophobic membrane proteins. Those with an established transport function are known to localise consistently to the mitochondrial inner membrane, which is impermeable to charged molecules, whereas the outer membrane is porous through VDAC. Furthermore, when the carriers are overproduced in Saccharomyces cerevisiae, the protein is found in the enriched mitochondrial fraction, adding further support to the idea that they are localised to the inner membrane, as the outer membrane has a limited surface area.

      It is not clear why hetero-exchange reactions were not performed for bGIC-1 (only for bGIC-2).

      Unfortunately, bGIC-1 did not display transport activity when tested in [14C]-malate/malate, [35S]-sulphate/sulphate or [33P]-phosphate/phosphate homo-exchange reactions, as shown in Figure 6 (Figure 5 in the revised manuscript). Phosphoenolpyruvate and dihydroxyacetone phosphate are not available in a radiolabelled form and glyceraldehyde-3-phosphate is prohibitively expensive, so we were unable to test glycolytic intermediates directly in homo-exchange reactions. Hetero-exchange reactions, as performed in Figure 5 (Figure 6 in the revised manuscript) for bGIC-2, are conclusive, as accumulation of the radiolabelled substrate inside the proteoliposomes can only occur when the internal substrate is exported. It seems that Blastocystis has multiple copies, some of which may encode dysfunctional carriers and could be pseudogenes.

      The summary slide depicted in Fig 7 is somewhat simplified and inaccurate. First, the authors show that TPI is located in the mitochondria in this study, while in the summary figure, TPI is shown to be present in both the cytosol and mitochondrial matrix. A cytosolic localization for TPI provides a functional rationale for having a triose-P carrier in the inner membrane - however, this is not supported by the data shown here. Second, if bGIC1/2 uses PEP as a counter ion to import GA3P and DHAP into the mitochondrion, as proposed in Fig 7, the lower glycolytic pathway would be effectively truncated at PEP, removing substrate for pyruvate kinase and formation of pyruvate/ATP. Third, the authors suggest that DHAP may have other functions in the mitochondria although these are not shown in the figure.

      Figure 7 presents a schematic comparison of the localisation of glycolysis in humans and Blastocystis, specifically focused on the transport steps of either pyruvate (humans) or glycolytic intermediates (Blastocystis) into the mitochondrial matrix. Most of the metabolism of Blastocystis has been inferred from the presence or absence of genes, encoding for particular enzymes, with the exception of the unusual glycolytic pathway. We feel that overcomplicating this schematic figure would detract from the main message of this analysis. Although the transport data show that PEP, another glycolytic intermediate, is transported, we agree with the reviewer that PEP export cannot be rationalised in the context of our current understanding of the metabolism, and we have changed the figure accordingly.

      We have not suggested that DHAP has other functions in mitochondria; on line 230, we state that ‘we have not found any evidence for the presence of dihydroxyacetone phosphate inside mitochondria in the literature. It is possible that it is not transported under physiological conditions in competition with dicarboxylates or other substrates.’

      Reviewer #2 (Public Review):

      In this manuscript, the authors set out to identify transporters that must exist in Stramenopiles due to the fact that the second half of glycolysis appears to be conducted in the mitochondria. They hypothesize that a Stramenopile-specific clade of transporters related to the dicarboxylate carriers is likely the relevant family and then go on to test two proteins from Blastocystis due to the infectious disease relevance of this organism. They show rather convincingly that these two proteins are expressed and are localized to the mitochondria in the native organism. The purified proteins bind to glycolytic intermediates and one of them, GIC-2, transports several glycolytic intermediates in vitro. This is a very solid and well-executed study that clearly demonstrates that bGIC-2 can transport glycolytic intermediates.

      We thank the reviewer for their positive comments on the manuscript, and their careful analyses of the presented data.

      (1) The major weakness is that the authors aren't able to show that this protein actually has this function in the native organism. This could be impossible due to the lack of genetic tools in Blastocystis, but it leaves us without absolute confidence that bGIC-2 is the important glycolytic intermediate mitochondrial transporter (or even that it has this function in vivo).

      Unfortunately, genetic manipulation in Blastocystis is currently not feasible and thus we cannot conduct a comparative metabolic study with the appropriate controls. The gold standard for identification is to prove the function with purified protein directly, which we have done here by using binding studies and transport assays.

      (2) It's atypical that the figures and figure panels don't really follow the order of their citation in the text. It's not a big deal, but mildly annoying to have to skip around in the figures (e.g. Figure 3D-E are described in the same paragraph as Figure 5). In addition, to facilitate the flow and a proper understanding I would encourage a reordering between figures 5D and 6 since Figure 6 is needed to understand the results shown in panel 5D, which may lead to confusion.

      We agree with the reviewer and have reordered the figures, switching Figure 5 and 6, which makes the manuscript easier to follow.

      (3) My impression is that the authors under-emphasize the fact that the hDIC also binds (and is stabilized by) glycolytic intermediates (G3P and 3PG). In the opinion of this reviewer, this might change the interpretation about the uniqueness of the bGIC proteins. They act on additional glycolytic intermediates, but it's not unique.

      The reviewer is correct that hDIC is stabilized by both G3P and 3PG, but neither is transported, as shown in Figure 5B (Figure 6B in the revised manuscript). It is not uncommon for compounds to bind to some extent without being transported, because they share certain structural and chemical features with the substrates, which results in stabilisation in thermostability analyses. For example, GTP stabilises the ADP/ATP carrier in thermostability analyses to some extent (Majd et al, 2018), although it is not a transported substrate of the carrier (King et al, 2020). Although thermostability assays are very useful for screening potential substrates, it is always necessary to carry out transport assays, which are the gold standard for transporter identification.

      Reviewer #3 (Public Review):

      Summary:

      Unlike most eukaryotes, Blastocystis has a branched glycolysis pathway, which is split between the cytoplasm and the mitochondrial matrix. An outstanding question was how the glycolytic intermediates generated in the 'preparatory' phase are transported into the mitochondrial matrix for the 'pay off' phase. Here, the authors use bioinformatic analysis to identify two candidate solute carrier genes, bGIC-1, and bGIC-2, and use biochemical and biophysical methods to characterise their substrate specificity and transport properties. The authors demonstrate that bGIC-2 can transport dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, 3-phosphoglycerate, and phosphoenolpyruvate, establishing this protein as the 'missing link' connecting the two split branches of glycolysis in this branch of single-celled eukaryotes. The authors also present their data on bGIC-1, which suggests a role in anion transport and bOGC, which is a close functional homologue of the human oxoglutarate carrier (hOGC, SLC25A11) and human dicarboxylate carrier (hDIC, SLC25A10).

      Strengths:

      The results are presented in a clear and logical arrangement, which nicely leads the reader through the process of gene identification and subsequent ligand screening and functional reconstitution. The results are compelling and well supported - the thermal stabilisation data is supported by the exchange studies. Caveats, where apparent, are discussed and rational explanations are given.

      We thank the reviewer for their positive and constructive comments on the manuscript.

      Weaknesses:

      The study does not contain any significant weaknesses in my view. I would like to see the authors include the initial rate plots used in the main figures (possibly as insets), so we can observe the data points used for these calculations. It would also have been interesting to include the AlphaFold models for bGIC-1 and bGIC-2 and a discussion/rationalisation for the substrate specificity discussed in the study.

      We have shown uptake curves in both Figure 3 and Figure 6 (Figure 5 in the revised manuscript) to provide the typical uptake curves recorded by our robot, and we also show how we calculate the initial rates. We feel that the inclusion of uptake curves for each compound for each carrier (96 uptake curves in total) would make Figure 5 (Figure 6 in the revised manuscript) extremely complicated.

      It would also have been interesting to include the AlphaFold models for bGIC-1 and bGIC-2 and a discussion/rationalisation for the substrate specificity discussed in the study.

      Whilst AlphaFold is an important step forward in the prediction of protein structures, it is not accurate enough at this time to be used for the rationalisation of the substrate specificity. For instance, there are significant structural differences between the predicted AlphaFold structure of the human uncoupling protein (https://alphafold.ebi.ac.uk/entry/P25874), by and large based on the mitochondrial ADP/ATP carrier, and the experimentally determined structure, especially for the central cavity where the substrate recognition takes place (Jones et al, 2023; Kang & Chen, 2023). More importantly, it is believed that the optimal binding of the substrate takes place in the occluded state (Klingenberg, 2007; Springett et al, 2017), for which we have no structure.

      References

      Jones SA, Gogoi P, Ruprecht JJ, King MS, Lee Y, Zögg T, Pardon E, Chand D, Steimle S, Copeman DM et al (2023) Structural basis of purine nucleotide inhibition of human uncoupling protein 1. Sci Adv 9: eadh4251

      Kang Y, Chen L (2023) Structural basis for the binding of DNP and purine nucleotides onto UCP1. Nature 620: 226-231

      King MS, Tavoulari S, Mavridou V, King AC, Mifsud J, Kunji ERS (2020) A single cysteine residue in the translocation pathway of the mitosomal ADP/ATP carrier from Cryptosporidium parvum confers a broad nucleotide specificity. Int J Mol Sci 21: 8971

      Klingenberg M (2007) Transport viewed as a catalytic process. Biochimie 89: 1042-1048

      Majd H, King MS, Palmer SM, Smith AC, Elbourne LD, Paulsen IT, Sharples D, Henderson PJ, Kunji ER (2018) Screening of candidate substrates and coupling ions of transporters by thermostability shift assays. Elife 7: e38821

      Springett R, King MS, Crichton PG, Kunji ERS (2017) Modelling the free energy profile of the mitochondrial ADP/ATP carrier. Biochim Biophys Acta 1858: 906-914

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a detailed study of a nearly complete Entomophthora muscae genome assembly and annotation, along with comparative analyses among related and non-related entomopathogenic fungi. The genome is one of the largest fungal genomes sequenced, and the authors document the proliferation and evolution of transposons and the presence/absence of related genetic machinery to explore how this may have occurred. There has also been an expansion in gene number, which appears to contain many "novel" genes unique to E. muscae. Functionally, the authors were interested in CAZymes, proteases, circadian clock related genes (due to entomopathogenicity/ host manipulation), other insect pathogen-specific genes, and secondary metabolites. There are many interesting findings including expansions in trehalases, unique insulinase, and another peptidase, and some evidence for RIP in Entomophthoralean fungi. The authors performed a separate study examining E. muscae species complex and related strains. Specifically, morphological traits were measured for strains and then compared to the 28S+ITS-based phylogeny, showing little informativeness of these morpho characters with high levels of overlap.

      This work represents a big leap forward in the genomics of non-Dikarya fungi and large fungal genomes. Most of the gene homologs have been studied in species that diverged hundreds of millions of years ago, and therefore using standard comparative genomic approaches is not trivial and still relatively little is known. This paper provides many new hypotheses and potential avenues of research about fungal genome size expansion, entomopathogenesis in zygomycetes, and cellular functions like RIP and circadian mechanisms.

      Strengths:

      There are many strengths to this study. It represents a massive amount of work and a very thorough functional analysis of the gene content in these fungi (which are largely unsequenced and definitely understudied). Too often comparative genomic work will focus on one aspect and leave the reader wondering about all the other ways genome(s) are unique or different from others. This study really dove in and explored the relevant aspects of the E. muscae genome.

      The authors used both a priori and emergent properties to shape their analyses (by searching for specific genes of interest and by analyzing genes underrepresented, expanded, or unique to their chosen taxa), enabling a detailed review of the genomic architecture and content. Specifically, I'm impressed by the analysis of missing genes (pFAMs) in E. muscae, none of which are enriched in relatives, suggesting this fungus is really different not by gene loss, but by its gene expansions.

      Analyzing species-level boundaries and the data underlying those (genetic or morphological) is not something frequently presented in comparative genomic studies, however, here it is a welcome addition as the target species of the study is part of a species complex where morphology can be misleading and genetic data is infrequently collected in conjunction with the morphological data.

      Thank you for your careful reading of our work. We’re glad that you identified these areas as strengths.

      Weaknesses:

      The conclusions of this paper are mostly well supported by data, but a few points should be clarified.

      In the analysis of Orthogroups (OGs), the claim in the text is that E. muscae "has genes in multi-species OGs no more frequently than Entomophaga maimaiga. (Fig. 3F)" I don't see that in 3F. But maybe I'm really missing something.

      Thank you for catching this. You were, in fact, not missing anything at all. There was a mismatch between the data plotted in F and G and how the caption described these data. We very much apologize for the confusion that this must have caused. We have corrected these plots and also made changes to improve interpretability (see below).

      Also related, based on what is written in the text of the OG section, I think portions of Figure 3G are incorrect/ duplicated. First, a general question, related to the first two portions of the graph. How do "Genes assigned to an OG" and "Genes not assigned to an OG" not equal 100% for each species? The graph as currently visualized does not show that. Then I think the bars in portion 3 "Genes in species-specific OG" are wrong (because in the text it says "N. thromboides had just 16.3%" species-specific OGs, but the graph clearly shows that bar at around 50%). I think portion 3 is just a duplicate of the bars in portion 4 - they look exactly the same - and in addition, as stated in the text, portion 4 "Potentially species-specific genes" should be the simple addition of the bars in portion 2 and portion 3 for each species.

      As mentioned above, we sincerely regret the error made in the plot and the confusion that this caused. F now reflects the percentage of orthogroups (OGs) that possess at least one representative from the indicated species (left) and the percentage of OGs that are species-specific (only possess genes from one species; right). The latter is a subset of the former. G now reflects the percentage of annotated genes that were assigned an OG, per species, as well as the inverse of this - genes that were not assigned to any OG. These should, and now do, sum to 100%. The “Within species-specific OG” data summed with the “Not assigned OG” data yields the “Potentially species-specific” data in the rightmost column.
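
      The relationships among the Figure 3G quantities described in this reply can be checked with a short calculation. The following is a minimal sketch on toy data, not the authors' pipeline: the function name, the input layout, and the assumption that each gene belongs to at most one orthogroup are all illustrative.

```python
from collections import Counter


def orthogroup_summary(gene_species, orthogroups):
    """Per-species gene percentages (hypothetical helper, toy inputs).

    gene_species: dict mapping gene ID -> species name (all annotated genes).
    orthogroups:  dict mapping OG ID -> list of gene IDs; each gene is
                  assumed to appear in at most one OG.
    """
    totals = Counter(gene_species.values())
    assigned = Counter()
    in_species_specific_og = Counter()

    for genes in orthogroups.values():
        species_in_og = {gene_species[g] for g in genes}
        for g in genes:
            sp = gene_species[g]
            assigned[sp] += 1
            # An OG is species-specific when all its genes come from one species.
            if len(species_in_og) == 1:
                in_species_specific_og[sp] += 1

    summary = {}
    for sp, n in totals.items():
        unassigned = n - assigned[sp]
        summary[sp] = {
            "assigned": 100.0 * assigned[sp] / n,
            "not_assigned": 100.0 * unassigned / n,
            "within_species_specific_og": 100.0 * in_species_specific_og[sp] / n,
            # Sum of the two categories that lack orthologs in other species.
            "potentially_species_specific":
                100.0 * (in_species_specific_og[sp] + unassigned) / n,
        }
    return summary


# Toy example: species A has 4 genes, species B has 2.
genes = {"a1": "A", "a2": "A", "a3": "A", "a4": "A", "b1": "B", "b2": "B"}
ogs = {"OG1": ["a1", "b1"], "OG2": ["a2", "a3"]}
result = orthogroup_summary(genes, ogs)
```

      In this toy case, "assigned" and "not_assigned" sum to 100 for every species, and "potentially_species_specific" equals the sum of the species-specific OG and unassigned percentages, which is the consistency the reviewer asked for.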

      In the introduction, there is a name for the phenomenon of "clinging to or biting the tops of plants," it's called summit disease. And just for some context for the readers, summit disease is well-documented in many of these taxa in the older literature, but it is often ignored in modern studies - even though it is a fascinating effect seen in many insect hosts, caused by many, many fungi, nematodes (!), etc. This phenomenon has evolved many times. Nice discussions of this in Evans 1989 and Roy et al. 2006 (both of whom cite much of the older literature).

      You’re right. We have now clarified that this behavior is called “summit disease” and referenced the suggested articles, along with a more recent review.

      Reviewer #2 (Public Review):

      In their study, Stajich and co-authors present a new 1.03 Gb genome assembly for an isolate of the fungal insect parasite Entomophthora muscae (Entomophthoromycota phylum, isolated from Drosophila hydei). Many species of the Entomophthoromycota phylum are specialised insect pathogens with relatively large genomes for fungi, with interesting yet largely unexplored biology. The authors compare their new E. muscae assembly to those of other species in the Entomophthorales order and also more generally to other fungi. For that, they first focus on repetitive DNA (transposons) and show that Ty3 LTRs are highly abundant in the E. muscae genome and contribute to ~40% of the species' genome, a feature that is shared by closely related species in the Entomophthorales. Next, the authors describe the major differences in protein content between species in the genus, focusing on functional domains, namely protein families (pfam), carbohydrate-active enzymes, and peptidases. They highlight several protein families that are overrepresented/underrepresented in the E. muscae genome and other Entomophthorales genomes. The authors also highlight differences in components of the circadian rhythm, which might be relevant to the biology of these insect-infecting fungi. To gain further insights into E. muscae specificities, the authors identify orthologous proteins among four Entomophthorales species. Consistently with a larger genome and protein set in E. muscae, they find that 21% of the 17,111 orthogroups are specific to the species. To finish, the authors examine the consistency between methods for species delineation in the genus using molecular (ITS + 28S) or morphological data (# of nuclei per conidia + conidia size) and highlight major incongruences between the two.

      Although most of the methods applied in the frame of this study are appropriate with the scripts made available, I believe there are some major discrepancies in the datasets that are compared which could undermine most of the results/conclusions. More precisely, most of the results are based on the comparison of protein family content between four Entomophthorales species. As the authors mention on page 5, genome (transcriptome) assembly and further annotation procedures can strongly influence gene discovery. Here, the authors re-annotated two assemblies using their own methods and recovered between 30 and 60% more genes than in the original dataset, but if I understand it correctly, they perform all downstream comparative analyses using the original annotations. Given the focus on E. muscae and the small sample size (four genomes compared), I believe performing the comparisons on the newly annotated assemblies would be more rigorous for making any claim on gene family variation.

      Thank you for this comment. While we did compare gene model predictions for two of these assemblies to assess if this difference could account for discrepancies in gene counts, completely reannotating all non-E. muscae datasets was outside of the scope of this study. In our opinion, the total number of predicted genes in a genome is not the best representation of differences, since splitting or fusing gene models can inflate apparent differences; the orthology and domain counts are a more accurate assessment of the content. It’s possible that annotation differences may have inflated some gene family counts; however, we note that similar domain trends were observed in the closest species to E. muscae, Entomophaga maimaiga, suggesting that these differences were not sufficient to prevent us from detecting real biological signals. We look forward to continued improvement of our genome through additional sequencing and more clarity on total gene content of E. muscae.

      The authors also investigate the putative impact of repeat-induced point mutation on the architecture of the large Entomophthorales genomes (for three of the eight species in Figure 1) and report low RIP-like dinucleotide signatures despite the presence of RID1 (a gene involved in the RIP process in Neurospora crassa) and RNAi machinery. They base their analysis on the presence of specific PFAM domains across the proteome of the three Entomophthorales species. In the case of RID1, the authors searched for a DNA methyltransferase domain (PF00145); however, proteins other than RID1 bear such a functional domain (DNMT family), so in the current analysis it is impossible to say whether the authors are actually looking at RID1 homologs (probably not; RID1 is monophyletic to the Ascomycota, I believe). Similar comments apply to the analysis of components of the RNAi machinery. A more reliable alternative to the PFAM analysis would be to work with full protein sequences in addition to the functional domains.

      While we understand this concern regarding domain vs. full-length protein, the advantage of the domain search is that HMM-based searches are sensitive enough to detect more distantly related homologs. Entomophthoralean fungi are distantly related to the ascomycetes in which these mechanisms have been characterized, so we chose a broader search approach that may identify proteins with similar domain structure that are not necessarily homologs. These searches are presented in the manuscript as preliminary, but worth further investigation. However, our RID-based analysis did not identify convincing homologs for RID1 in the entomophthoralean fungi included in our investigation, and we reported low homology (i.e., 12-14%) between our orthogroup of interest and RID1. We have further edited this section to clarify our understanding that these candidates are not RID1 homologs. We had hoped to avoid this implication, but we felt this investigation and null result were worth reporting.
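
      For readers unfamiliar with the "RIP-like dinucleotide signatures" mentioned in this exchange, the sketch below computes two widely used dinucleotide ratios, the product index (TpA/ApT) and the substrate index ((CpA + TpG)/(ApC + GpT)). That the manuscript used exactly these indices is an assumption; the function name and the toy sequence are hypothetical.

```python
def rip_indices(seq):
    """Return (product_index, substrate_index) for a DNA sequence.

    Assumed definitions (common in the RIP literature, not confirmed
    as the manuscript's method):
      product index   = TpA / ApT
      substrate index = (CpA + TpG) / (ApC + GpT)
    Dinucleotides are counted with overlap along one strand.
    """
    seq = seq.upper()
    counts = {}
    for i in range(len(seq) - 1):
        dinuc = seq[i:i + 2]
        counts[dinuc] = counts.get(dinuc, 0) + 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when the denominator is zero.
        return numerator / denominator if denominator else float("nan")

    product = ratio(counts.get("TA", 0), counts.get("AT", 0))
    substrate = ratio(
        counts.get("CA", 0) + counts.get("TG", 0),
        counts.get("AC", 0) + counts.get("GT", 0),
    )
    return product, substrate


# Toy sequence: TA=2, AT=2, CA=1, TG=0, AC=1, GT=0.
product_index, substrate_index = rip_indices("TATAATCAAC")
```

      Because RIP converts C to T in duplicated sequence, affected repeats are generally expected to show an elevated product index and a depressed substrate index relative to unaffected sequence; in practice the indices would be computed over repeat annotations or sliding windows rather than a single short string.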

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific points:

      Results:

      "1.03 Gb genome consisting of 7,810 contigs (N50 = 301.1 kb). Additional... resulted in a final contig count of 7,810 (N50 = 329.6 kb)" So you started and ended with the same contig count but a different N50? Is this a typo?

      Yes, this was a typo. Thank you for bringing this to our attention.
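
      As a reminder of what the N50 statistic in the queried sentence measures, here is a minimal sketch of its usual definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are illustrative and are not taken from the assembly discussed here.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running total to half the
        # assembly size defines N50.
        if running >= half_total:
            return length


toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

      N50 depends on the distribution of contig lengths rather than on the contig count alone, which is why the reviewer flagged the reported values as needing a check.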

      Figure 1D.

      The colors of Complete1x and Complete2x are too similar to tell them apart.

      The colors have been made more distinct.

      Figure 4B.

      I know C. rosea has been found from insects before, but it's mostly a mycoparasite and occasionally an endophyte, and has bioactivity against a lot of things. I just saw that it's listed as an entomopathogen, and I was surprised. Anyway, leave it as is if you want to, but it's definitely better studied and better known (Google Scholar) as a mycoparasite.

      Thanks for this comment. For the sake of including a more diverse representation of entomopathogenic fungi, we have opted to leave this as is.

      Full references (from the public comment)

      Evans, H.C., 1989. Mycopathogens of insects of epigeal and aerial habitats. Insect-fungus interactions, pp.205-238.

      Roy, H.E., Steinkraus, D.C., Eilenberg, J., Hajek, A.E. and Pell, J.K., 2006. Bizarre interactions and endgames: entomopathogenic fungi and their arthropod hosts. Annu. Rev. Entomol., 51, pp.331-357.

      Reviewer #2 (Recommendations For The Authors):

      I believe the manuscript could largely benefit from restructuring the results section to enhance clarity. The results section reads like a lot of descriptive back and forth, so that the reader lacks a clear rationale. The absence of a consistent dataset used for the different comparisons made all along the manuscript makes it hard to follow.

      Minor comments:

      (No line numbers were available so I refer to page numbers).

      p1

      • not sure about the use of "allied" to describe other fungal species in the title and after (sister species?).

      We didn’t want to use the word sister because not all of these species could be considered sister.

      • Genomic defence against transposable elements rather than "anti"?

      We have rephrased to genomic defense.

      p3

      • Extra parenthesis at Bronski et al.

      This is now corrected.

      • What does newly-available mean here?

      We mean recent. A lot of the datasets we used were very new, and we wanted to emphasize that point.

      • The back and forth between genomes and transcriptomes makes it hard to follow, would clarify from the beginning (in addition to the sequencing method - short vs long-read assemblies as in Figure 1B) or perhaps use a consistent dataset for all subsequent comparative analysis in the Entomophthorales.

      We have denoted our transcriptomic datasets in Fig 1C using parentheses.

      p5

      • Perhaps clarify that class II DNA transposons can also "copy" (single-strand excisions can be repaired by the host machinery).

      We have now included mention of “copy” as well as “jump” mechanisms of Class II transposons per your suggestion.

      p6

      • "beginning roughly concurrently", not clear what "began".

      This is now corrected.

      • "control" rather than "protect against"?

      We’ve changed “protect against” to “counter”.

      • I believe RIP has only been observed (experimentally) in a handful of fungal species, all from the Ascomycota phylum.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      • "RID1 contains two DNA_methylase domains", RID1 has one methyltransferase domain according to the reference Freitag et al, 2002.

      Thank you for drawing this to our attention. It is true RID1 has one methyltransferase region; however, the sequence deposited by Freitag et al, 2002 (AAM27408) is predicted by HMMer to have two adjacent Pfam DNA_methylase domains (i.e., PF00145). In this exploratory analysis, we tried to leverage this characteristic to identify candidate proteins of interest. We have reworded this section to clarify this.

      p8

      • Here and after I would use more informative titles for each paragraph.

      With the exception of the headings for Pfam, CAZy and MEROPs analyses, we believe the other headings are informative. We appreciate this comment, but opt to leave the heading titles as is.

      • I believe presenting the orthology analysis before the more in-depth protein family domain search would be preferable.

      We leveraged the OG analysis mostly as a way to identify potentially unique genes in E. muscae, so we think the current order makes the most sense.

      p10

      • Figures 3F and G are confusing. The legend for Figure 3F mentions "OGs with >= 2 species" while the figure shows "multi-species OGs", and reads as redundant with the "species-specific" OGs. For the "OGs within species" do I understand it correctly that it represents the number of genes assigned to OGs for each species? If yes, the numbers are in contradiction with Figure 3G. And in Figure 3G shouldn't the sum of "genes assigned in OGs" and "genes not assigned in OGs" add up to 100? I'm probably missing something here, but I would clarify what the different sets of orthogroups are in the figure and in the text (perhaps adopting a pangenome-like nomenclature).

      Thanks for this comment. This legend, unfortunately, reflected an earlier version of the figure and was overlooked prior to submission. We have since amended this and sincerely apologize for the error on our part.

      p12

      • The whole first paragraph reads more like it should be part of an introduction/discussion.

      We’ve moved some of this paragraph to the discussion but left the background information necessary for the reader to understand why we were looking for homologs of wc and frq.

      p13

      • The last paragraph reads like discussion.

      We have revised this paragraph so it now reads: “Because E. muscae is an obligate insect-pathogen only living inside live flies, we investigate the presence of canonical entomopathogenic enzymes in the genome. We find that E. muscae appear to have an expanded group of acid-trehalases compared to other entomopathogenic and non-entomopathogenic Entomophthorales (Fig. 4A), which correlates with the primary sugar in insect blood (hemolymph) being trehalose (Thompson, 2003). The obligate insect-pathogenic lifestyle is also evident when comparing the repertoire of lipases, subtilisin-like serine proteases, trypsins, and chitinases in our focal species versus Zoopagomycota and Ascomycota fungi that are not obligate insect pathogens (Fig. 4B). Sordariomycetes within Ascomycota contains the other major transition to insect-pathogenicity within the kingdom Fungi (Araújo and Hughes, 2016). Based on our comparison of gene numbers, Entomophthorales possess more enzymes suitable for cuticle penetration than Sordariomycetes (Fig. 4B). In contrast, insect-pathogenic fungi within Hypocreales possess a more diverse secondary metabolite biosynthesis machinery as evidenced by the absence of polyketide synthase (PKS) and indole pathways in Entomophthorales (Fig. 4C).”

      p15 and 16

      • This all reads as redundant with the previous protein family domain analysis. I would try to merge them.

      Thank you for this comment, however we have opted to maintain the current structure.

      p18

      • In the first sentence, I'm not sure about what was performed here.

      This has been reworded to clarify.

      p20

      • Regarding the assembly, do I understand it correctly that a nuclear genome can be partially haploid / diploid?

      Thanks for your comment. The genome itself is, of course, some integer multiple of n, but based on BUSCO scores our assembly doesn’t appear to have completely collapsed into a haploid genome. We think it makes more sense here to say “partially haploid” than “partially diploid” so have altered this.

      p21

      • RIP has only been observed in a couple of Ascomycetes. RIP-like genomic signatures (GC bias) have been observed elsewhere.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      p23

      • Interesting that the peptidase A2B domain is found uniquely in E. muscae genome and is associated with Ty3 activity. Does the domain often overlap with annotated Ty3 in E. muscae genome? Or how come the domain is not present in other sister species with large genomes full of Ty3 transposons? Could it relate to a new active transposon in E. muscae specifically?

      Thanks for this comment. The domain-based analysis was only performed on the predicted transcriptome of the genome assembly, which does not include the repeat elements (e.g., Ty3). It could be that this peptidase reflects a new active transposon that’s specific to E. muscae, which would certainly be very interesting. We’ve now included this idea in the discussion.

      p26

      • In the case of fungal genomes, I would not advise masking the assembly for repeated sequences prior to gene annotation (in particular given the current focus on protein family variation).

      Thank you for this comment; however, we disagree with this assertion, as a typical approach for genome annotation in fungi and other eukaryotes is to soft-mask transposable elements before performing gene prediction to avoid over-prediction. While there could be alternative approaches that compare masked and unmasked assemblies, soft masking is a recommended protocol for underlying tools like Augustus (10.1002/cpbi.57) and in general descriptions of genome annotation (10.1002/0471250953.bi0401s52). In our experience, the false positive rate of genes predicted through TE regions is likely to be more of a problem than false negatives from missed genes. Further, because comparative analyses are best served by datasets generated with consistent methods, it seems appropriate to use a consistent approach to annotation throughout when including genomes from other sources (e.g., Joint Genome Institute annotated genomes), which also apply repeat masking before annotation. It is outside the scope of this project to reannotate all genomes with and without repeat masking.

      p27

      • Interrupted sentence at "Classification of DNA and LTR .. by similarity The".

      This was an unnecessary partial phrase as the information on classification of elements via RepBase was made a few sentences above this.

      p28

      • Enriched/depleted rather than "significantly different"?

      Thank you for this comment, however we have opted to maintain the current phrasing.

    1. Author response:

      Reviewer #2 (Public Review):

      In this study, the authors report that both mice and human patients carrying function-disrupting mutations in the OFD1 gene exhibited ectopic brown adipose tissue formation in the malformed tongue. The OFD1 gene is located on the X-chromosome and encodes a protein product required for the formation and function of the primary cilium, which is required for cells to properly receive and activate several signaling pathways, particularly the hedgehog signaling pathway. Loss of OFD1 function causes prenatal lethality of male fetuses and mosaic disruption of tissues in females due to random inactivation of the X-chromosome carrying either the mutant or wildtype allele. Using cell type-specific gene inactivation and genetic lineage labeling, the manuscript shows that the ectopic brown adipose tissue in the mutant tongue was not derived from cranial neural crest cells (CNCCs). Additional genetic and embryological studies led to the conclusion that loss of Ofd1 function in the CNCC cells in the embryonic hypoglossal cord, via which the tongue myoblast precursor cells migrate from anterior somites to the tongue primordia, caused disruption of cell-cell interactions between the CNCCs and migrating muscle precursor cells, resulting in altered differentiation of those myoblast precursor cells into brown adipocytes. The authors provided data that disruption of Smo in a subset of CNCCs also resulted in ectopic adipose tissue formation in the tongue, indicating that this phenotype in the Ofd1 mutant mice was likely caused by disruption of hedgehog signaling in CNCCs. However, no experimental evidence is provided to support a major conclusion of the manuscript regarding altered differentiation of the tongue myoblast precursor cells into brown adipocytes in the Ofd1 mutant mice. 

      Since it is well established that hedgehog signaling in the CNCCs is required for them to direct tongue myoblast cell migration as well as for tongue muscle differentiation/organization after the myoblasts arrived in the tongue primordia, the finding of tongue muscle defects in the Ofd1 mutant mice is not surprising. However, if proven true that disruption of Ofd1 function in CNCCs caused tongue myoblast precursor cells to alter their fate and differentiate into brown adipocytes, it would be an interesting new finding. Further identification of the signals produced by the Ofd1 mutant CNCCs for directing the cell fate switch will be a highly significant new advance in understanding the cellular and molecular mechanisms regulating tongue morphogenesis.

      We have added substantial new in vitro and in vivo data, which we hope are sufficient to support our conclusion. It is extremely difficult to identify the signals produced by the Ofd1 mutant CNCCs that direct the cell fate switch of mesodermal cells after activation of Hh signaling in CNCCs. Instead, our new findings raise the possibility that Hh signaling in mesodermal cells, as well as in CNCCs, is important for their differentiation; this has been added to the revised paper. However, we think that exploring these points further is beyond the scope of this study.

      Reviewer #3 (Public Review):

      The authors observed phenotypes of ciliopathy model mice, and these seem to coincide with those in human patients. They used mutants in which ciliary function genes are deleted in cranial neural crest cells, and found that the mutants exhibit abnormal cell differentiation in both neural crest- and mesoderm-lineage cells. The finding clearly shows the importance of tissue/cell interaction. The authors mainly observed mice in which the Ofd1 gene, which is encoded on the X chromosome, is deleted; therefore, in Ofd1fl/WT;Wnt1Cre (HET) mice about one-fourth of neural crest cells can exhibit Ofd1 function, whereas Ofd1fl;Wnt1Cre (HM) mice have null Ofd1 function and show more severe phenotypes than HET.

      The ectopic brown adipose tissue in the tongue is derived from mesoderm, and the authors tried to show that the hypoglossal cord failed to obtain myogenic lineage after entering branchial arches in HET and HM due to lack of communication with neural crest cells. For ectopic bone formation, they found that it is due to the lack of Hedgehog signaling in neural crest cells, which was consistent with the reports in the Smofl/fl;Wnt1-Cre (Xu et al., 2019) and Ift88fl/fl;Wnt1Cre (Kitamura et al. 2020). The ectopic bone is connected to the original mandibular bone. The authors attribute the ectopic bone formation to the migration of mandibular bone neural crest cells into the tongue-forming area.

      For the poor tongue frenum formation, the authors found the importance of cell migration from the lateral sides of the branchial arch to the midline and its formation relies on non-canonical Wnt signaling. The authors observed similar phenotypes in the human patients as those in the mutants. The adipose tissue in the tongue area is normally found in the salivary gland region and intermuscular space, and it is intriguing to find the brown adipose tissue anterior to the cervical area in which the most anterior brown adipose tissue develops. qRT-PCR indicates that some of the marker genes are expressed in the laser micro-dissected sections of the ectopic brown adipose tissue. However, histology does not show the typical brown adipose tissue feature. In addition, brown adipose tissue is normally recognized in the sixth pharyngeal region as the cervical brown tissue from around E14.5 (Schulz and Tseng 2013), not E12 as the authors observe. Although the mutants develop under abnormal conditions, is it possible to say they are brown adipose tissue? The point has to be further investigated with more marker expression by immunohistochemical detection and other methods. Since the mutants seem to show impaired midline formation (which is consistent with the condition of human ciliopathy), is it possible to hypothesize that the adipose-like tissue is derived from the mesoderm of posterior branchial arch levels if the tissue is brown adipose tissue?

      Immunohistochemistry data have been added as new Figures S4 and S5.

      We agree with the reviewer’s comment. The histology of the ectopic adipose tissue in Ofd1 cKO mice is slightly different from typical images of brown adipose tissue. The molecular characteristics of the ectopic adipose tissue in the Ofd1 mutant tongue are similar to those of low-thermogenic adipocytes. The histological features of low-thermogenic adipocytes are known to differ from those of typical brown adipose tissue and are similar to those of the ectopic adipose tissue in Ofd1 mutant mice. This has been mentioned in the Results section.

      If the ectopic adipose tissue in the mutant tongue were derived from the cervical brown adipose tissue through mis-migration, the cervical brown adipose tissue in Ofd1 mutants should be reduced in size or connected to the ectopic adipose tissue. However, we could not detect any significant change in the cervical brown adipose tissue, or any connection between the cervical brown adipose tissue and the tongue adipose tissue, in Ofd1 mutant mice. We therefore think that the ectopic adipose tissue in the mutant tongue is unlikely to be derived from the cervical brown adipose tissue. These points have been added to the Results section.

      Cranial neural crest cells start migrating around E8.0 and reach their destination by E9.5. The authors show the lack of neural crest cells in the midline, the fluorescence is absent from the midline in HM, however, they studied it in the E11 mandible (Fig. 4E), almost more than two days after neural crest migration completes. Since the mandibular arch seems to form at the beginning in the mutants, is there a failure in allocating the neural crest and mesoderm at the beginning of the mandibular arch formation?

      It is difficult to prove how much migration is affected in mutant mice. Therefore, the sentence describing migration has been deleted from the revised paper.

      The authors tried to disturb the interaction between the hypoglossal cord and neural crest cells by making incisions in the dorsal area of the branchial arches. That area contains both neural crest and mesoderm but not the hypoglossal cord-derived mesoderm. The hypoglossal cord passed through the posterior edge of the caudal (6th) pharyngeal arch, along the lateral side of the pericardium towards the anterior, ventral to branchial arches, and then inside the 2nd and 1st branchial arches (Adachi et al., 2018). It expresses Pax3 before entering the branchial arches, then Myf5 in the branchial arches. It seems that the migration of the hypoglossal cord does not require interaction with neural crest cells but it has to be confirmed as well as neural crest migration into the branchial arches from the beginning. Although the hypoglossal cord migrates mostly in mesoderm-derived mesenchyme, we cannot exclude the possibility that hypoglossal cord migration is affected.

      The cutting region in the original Figure 2Q was not accurate; it has been corrected in the new Figure 3Q. We agree with the reviewer’s comment that “we cannot exclude the possibility that hypoglossal cord migration is affected”. However, it is difficult to prove how much migration is affected in mutant mice. Therefore, the sentence describing migration has been deleted from the revised paper.

      The lack of Myf5 expression in Ofd1fl;Wnt1Cre (HM) was explained as a failure in the differentiation of the hypoglossal cord into myoblasts on entrance into the branchial arches. Most of the cervical brown adipose tissue is derived from either Myf5- or Pax3- expressing lineage (Sanchez-Gurmaches and Guertin, 2014). Although the authors suggest that brown adipose cells are fate-changed mesoderm in the branchial arches, how do they explain the association with Myf5- or Pax3- expression?

      As the reviewer mentioned, the cervical brown adipose tissue is derived from either a Myf5- or a Pax3-expressing lineage. However, these cells lose Myf5 or Pax3 expression when they differentiate into brown adipocytes. Although the ectopic adipose tissue in the Ofd1 mutant tongue showed Pax3 expression at an early stage, the cells likely lose Pax3 expression soon after. Alternatively, the ectopic adipocytes might retain Pax3 expression if they are abnormal adipocytes; in that case, it would not be surprising for their expression pattern to differ from that of normal brown adipose tissue. Because we cannot distinguish between these possibilities, we do not discuss them in the text.

      In addition, the cervical brown tissue is supposed to be derived from the branchial arch mesoderm (Mo et al., 2017). Is the formation of the cervical brown tissue affected in the Ofd1fl/WT;Wnt1Cre(HET) or Ofd1fl;Wnt1Cre (HM) if dysfunction of neural crest cells results in the cell fate change of mesoderm?

      We could not detect any significant morphological changes in the cervical brown adipose tissue of Ofd1 mutant mice. Ectopic adipose tissue in Ofd1 cKO mice was found from E11.5, whereas the cervical adipose tissue forms from E14.5. We think that dysfunction of CNCC at E14.5 does not affect the mesodermal cells that form the cervical adipose tissue.

      For the tongue frenum development, it is hard to understand to hypothesize that its formation is unlikely to associate with midline formation. Although Lgr5 and Tbx22 are not expressed in the midline, the defect in midline formation could cause unnecessary interaction between the right and left tissues.

      We agree with the reviewer’s comment. The sentences have been changed in the new manuscript.

      Tissue morphogenesis takes place in three dimensions, which were not considered in the data, especially in the labeling experiments. When the authors labelled the cells, which cells in which area were labelled? In the textbook, tongue formation is a result of the fusion of the midline processes derived from the branchial arches, therefore, it is important to identify which cells in which area are labelled.

      In situ hybridization data for Lgr5 and Tbx22 have been added as new Figure 10-S1D and -S1E, since we labelled cells within the Lgr5 and Tbx22 expression domains. Sections of explants with DiD injection, before and after culture, have been added as new Figure 10-S1F and -S1G; they show that DiD-labelled cells were located within the Lgr5 and Tbx22 expression domains before culture and in the tongue frenum region after culture.

      The weakest point is that the authors demonstrate many interesting phenotypes but fail to show the mechanism of altered cell differentiation and direct evidence of the tissue origin of ectopic brown tissue. Without the data, suggestion from the authors' argument is weak, which is reflected in the conclusion of the abstract.

      We have added a substantial amount of new in vitro and in vivo data. We hope that these data are sufficient to support our conclusion.

    1. Author response:

      Reviewer #2 (Public Review):

      (1) Some changes to statistical analyses are needed in this study.

      Fig. 1B, 1D, 2A, 3E, and 3F report the QL.d phenotype as a percentage of animals scored that were defective in migration. The methods make it clear this data is categorical rather than quantitative. Therefore, a t-test or any test designed for quantitative data is not appropriate. I suggest that the authors should investigate using a chi-squared or Fisher's exact test.

      For the reasons mentioned above, the calculation of standard deviation (as shown in error bars) is also not appropriate for Fig. 1B, 1D, 2A, 3E, and 3F. Of course, it is excellent that the authors scored multiple trials. For experiments with mutants, I suggest the authors might combine these trials or show separate results of each trial. For experiments using RNAi (Fig. 1B), each trial should be plotted separately because RNAi effectiveness can vary. If there is not enough space to show multiple trials, then I would ask that a representative trial be shown in the main figure and additional trials in a supplement.

      We thank the reviewer for pointing out the statistical mistake. For all figures assessing the QL.d migration phenotype (Fig.1B, 1D, 2A, 4A (former 3E), 4D (former 3F) and Fig.1 – figure supplement 1, Fig.2 – figure supplement 1, Fig.4 – figure supplement 2) the statistical significance was evaluated using Fisher’s exact test. For RNAi experiments (Fig. 1B), results from a representative experiment are shown, and two additional trials are shown in Figure 1 – figure supplement 1. For experiments with mutants, results from separate trials were pooled and are presented in the main figures.
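
      To illustrate the kind of test named above, the following is a minimal Python sketch of a two-sided Fisher's exact test on a 2×2 table of categorical counts. The function name `fisher_exact_two_sided` and the counts are hypothetical and are not data from the study; the sketch only shows why a count-based test suits "defective vs. normal" scoring.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts of (defective, normal) animals per genotype
wild_type = (2, 98)
mutant = (30, 70)
p_value = fisher_exact_two_sided(*wild_type, *mutant)
print(f"p = {p_value:.3g}")
```

      Pooling trials, as described for the mutant experiments, amounts to summing the per-trial counts before building the table.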

      In Fig. 1, 2, 3, and 5, it is not specified whether/how p-values were adjusted for multiple tests.

      We have applied Bonferroni correction for multiple testing in all Figures where it was relevant (Fig. 1, 2, 4, 5 and 6 and in their supplements) and this is now stated in all corresponding Figure legends.
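
      As a brief illustration of the correction mentioned above, a Bonferroni adjustment multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below uses made-up p-values and a hypothetical helper name `bonferroni`; it is not the code used for the figures.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Return Bonferroni-adjusted p-values (each p times the number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from four pairwise comparisons
raw = [0.001, 0.02, 0.04, 0.3]
adjusted = bonferroni(raw)
print(adjusted)
```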

      (2) I felt the authors' interpretation of the sel-5 mutant phenotypes in EXC, and the genetic interactions with Wnt signaling mutants, might be improved. The authors show convincing data that the sel-5 mutants display a shortened EXC outgrowth phenotype. Conversely, mutants with reduced Wnt signaling, such as the lin-17 or lin-44 mutants, displayed lengthened EXC outgrowth. The authors show that in double mutants, loss of sel-5 partially suppressed the EXC overgrowth defects of lin-17 or lin-44 mutants (Fig. 5). In my opinion, this data is consistent with a model where SEL-5 acts to inhibit Wnt signaling in EXC. An inhibitory role in a Wnt-receiving cell would be consistent with the known activity for human AAK1 in promoting negative feedback and endocytosis of LRP6. Interestingly, the authors mention in their discussion that a mutant of plr-1, which acts in the internalization of Frizzled receptors, has a shortened EXC phenotype similar to that of sel-5 mutants. These observations all seem consistent with an inhibitory role, yet the authors do not state this as their conclusion. A clarification of their interpretation is needed.

      We thank the reviewer for this feedback. Indeed, the above interpretation of the excretory cell migration data is plausible, however, we think that several lines of evidence argue against this possibility. First, measurements of the posterior canal length during L1/L2 larval stages show that LIN-44/LIN-17 signalling is not required for the early stages of excretory canal outgrowth, unlike SEL-5/VPS-29 (Fig. 5E, 6D). This suggests that SEL-5 and VPS-29 are required earlier than LIN-44 and LIN-17 and therefore should not act at the level of Wnt receptor internalization. Our new data with more mutant combinations revealed canal shortening in cwn-1; cfz-2 and cwn-2; cfz-2 mutants. This would rather suggest a positive role for SEL-5 and VPS-29 in Wnt pathway regulation. Either SEL-5/VPS-29 employ two different mechanisms of Wnt pathway regulation or alternatively, act prior to any Wnt-dependent step in the excretory canal outgrowth. The observed partial rescue of the lin-17 or lin-44 overgrowth defect by sel-5 could then be explained for example by a reduced speed of canal outgrowth in sel-5 mutants. Based on new findings about CWN-1, CWN-2 and CFZ-2 involvement we have also modified our model now presented in Fig.7.

      For changes to the Results section, see Response to Reviewer 1, point 4b. The Discussion part has been substantially rewritten and is presented below:

      LINE 428 “Our analysis of single Wnt and Frizzled mutants revealed that while loss of cwn-2 or cfz-2 expression resulted in a very mild shortening of the excretory canal, loss of lin-44 or lin-17 led to profound canal overgrowth (summarized in Fig. 7A). These findings suggested that two independent Wnt pathways could be employed to establish proper excretory canal length – one promoting canal extension and one generating the stop signal for growth termination. Further analyses of double mutants and other Wnt signalling components revealed that the extension-promoting pathway includes cwn-1 in addition to cwn-2 and cfz-2, while the stop-signal pathway encompasses lin-44, lin-17, dsh-1, mig-5 and mig-14. A similar repulsive role of LIN-44/LIN-17 complex has been described in the case of a posterior axon of C. elegans GABAergic DD6 motor neuron (Maro et al., 2009) or PLM, ALN and PLN neurons (Zheng et al., 2015). Loss of lin-44 or lin-17 expression promoted outgrowth of the posterior neurites of these neurons implicating that in wild type animals, LIN-44 serves as a repulsive cue. On the other hand, cwn-2 and cfz-2 were shown to positively regulate the posterior neurite outgrowth of RMED/V neurons with cwn-2 acting as an attractive cue (Song et al., 2010). The role of two other Wnt signalling components, egl-20 and mig-1, is less clear. No effect (mig-1) or only very mild overgrowth defect (egl-20) is observed in single mutants. However, both egl-20 and mig-1 significantly rescue the overgrowth phenotype of lin-17 mutants, while at the same time, mig-1 can suppress the shortening of canals in cfz-2 mutants. EGL-20-producing cells are localized around the rectum (Whangbo et al., 1999; Harterink et al., 2011), exactly where the excretory canals stop, while LIN-44 is expressed more posteriorly (Herman et al., 1995; Harterink et al., 2011). 
A possible explanation could thus be that while LIN-44 provides a general posterior repulsive signal, EGL-20 fine-tunes the exact stopping position of the growing canal. The role of different Wnts and Frizzleds in excretory canal outgrowth is summarized in Fig. 7B. Further investigation will be required to decipher the exact way how SEL-5 and the retromer crosstalk with Wnt signalling during excretory cell outgrowth. It is clear though that more than one mechanism is likely involved. First, sel-5 vps-29 mutants display canal shortening similarly to cwn-1; cfz-2 or cwn-2; cfz-2 suggesting a positive regulatory role. Mutants in lin-17 and lin-44 display canal overgrowth, yet sel-5 is partially able to suppress this phenotype. This would imply a negative regulatory role of sel-5 and be in agreement with the role of AAK1 in Wnt pathway regulation (Agajanian et al., 2019). However, sel-5 and vps-29 are required already during the initial larval outgrowth while the LIN-44/LIN-17 signal is required later. The observed rescue might thus also be explained by a delayed growth of the canal and not by a direct impact of sel-5 and vps-29 on LIN-44 or LIN-17 levels or localization.”

    1. Author response:

      Reviewer #2 (Public Review):

      The manuscript entitled " Multimodal HLA-I genotypes regulation by human cytomegalovirus US10 and resulting surface patterning" by Gerke et al describes the biochemical analysis of US10-mediated down regulation of HLA-I molecules. The authors systemically examine the surface expression of different HLA-I alleles in cells expressing US10 and interactions of US10 with HLA-I and antigen presentation machinery. Further, studies examined genotypic and allotypic differences during expression of US10/US11 transcripts suggest a different allelic class I downregulation. In general, the authors have included data supporting the major claims. Yet, the conclusions and findings of the study only marginally advance the overall understanding of HCMV viral evasion and the mechanism of US10 function.

      Strengths:

      The studies are well characterized and the studies utilize diverse HLA-I and HCMV viral molecules. The biochemistry is excellent and is of high quality. Importantly, the study describes HLA-I allelic specific HCMV down regulation at the cell surface and molecular levels.

      Weaknesses:

      (1) The authors use over expressive language such as "strong binding" that does not have a quantitative value and it is relative to the specific assay with only small differences among the factors.

      We have changed the language to avoid non-quantitative expressions.

      (2) The US10 binding to the HLA-I did not correlate with class I surface levels suggesting that binding to the APC machinery (Figure 1); hence, why does the binding of US10 to the APC define its mechanism of action.

      We hypothesized that since binding to HLA-I allomorphs did not correlate with surface expression, further factors could be involved in regulation. Since the PLC (APC machinery) plays a major role for HLA-I expression, it was relevant to investigate this. The new data underlines the importance of the PLC for US10-mediated HLA-I regulation.

      (3) The innovative and significant aspects of the study are limited. The study does not delineate the US10 mechanism of action or show data in which US10-mediated MHC class I down regulation impacts adaptive or innate immune function.

      These remarks are important. We want to emphasize the variable impact of US10 on HLA-I. To our knowledge previous studies have not uncovered genotype-dependent effects on HLA-I as distinct as those observed with US10, indicating that US10 may exploit aspects of HLA-I that are yet to be fully elucidated. Therefore, confirming these findings is crucial for our study. The quantitative analysis of the HeLa HLA-I ligandome in US10-expressing cells strongly supports this conclusion. The precise quantification of HLA-I peptide ligands was made possible through collaboration with Dr. Andreas Schlosser from Würzburg, Germany, who possesses profound expertise in this specific method. Thus, in our opinion, this revision has enabled us to advance innovation and, importantly, enhance the significance of our study.

    1. Author response:

      Reviewer #3 (Public Review):

      Software UX design is not a trivial task and a point-and-click interface may become difficult to use or misleading when such design is not very well crafted. While Phantasus is a laudable effort to bring some of the out-of-the box transcriptomics workflows closer to the broader community of point-and-click users, there are a number of shortcomings that the authors may want to consider improving.

      Thank you for such an in-depth review. We really appreciate this feedback and have tried to address all of the concerns in the new version of Phantasus.

      Here I list the ones I found running Phantasus locally through the available Bioconductor package:

      (1) The feature of loading in one click one of the thousands of available GEO datasets is great. However, one important use of any such interfaces is the possibility for the users to analyze his/her own data. One of the standard formats for storing tables of RNA-seq counts are CSV files. However, if we try to upload from the computer a CSV file with expression data, such as the counts stored in the file GSE120660_PCamerge_hg38.csv.gz from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120660, a first problem is that the system does not recognize that the CSV file is compressed. A second problem is that it does not recognize that values are separated by commas, the very original CSV format, giving a cryptic error "columnVector is undefined". If we transform the CSV format into tab-separated values (TSV) format, then it works, but this constitutes already a first barrier for the target user of Phantasus.

      Thank you for highlighting the issue of file format support. We acknowledge how common CSV and CSV.gz files are in gene expression analysis. In response, we have updated our data loading procedure to support these file formats. Moreover, the most recent version of our web application recognizes gzip-compressed files in any of the supported table formats: GCT, TSV, CSV and XLSX.
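
      For readers curious how such format detection can work in principle, here is a minimal Python sketch, not the Phantasus implementation. It assumes gzip data can be recognized by its two leading magic bytes and that the delimiter can be guessed from the header line; the function name `read_table` and the toy data are hypothetical.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def read_table(raw: bytes) -> list[list[str]]:
    """Parse a comma- or tab-separated table, gzip-compressed or not."""
    # Transparently decompress when the gzip magic bytes are present
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    first_line = text.splitlines()[0] if text else ""
    # Pick whichever candidate delimiter occurs more often in the header line
    delimiter = "\t" if first_line.count("\t") > first_line.count(",") else ","
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Toy count table with made-up gene identifiers
csv_bytes = b"gene,s1,s2\nENSG01,10,0\nENSG02,3,7\n"
rows_plain = read_table(csv_bytes)
rows_gz = read_table(gzip.compress(csv_bytes))
rows_tsv = read_table(csv_bytes.replace(b",", b"\t"))
print(rows_gz)
```

      Checking the content rather than the file extension means a mislabelled upload is still handled.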

      (2) Many RNA-seq processing pipelines use Ensembl annotations, which for the purpose of downstream interpretation of the analysis, need to be translated into HUGO gene symbols. When I try to annotate the rows to translate the Ensembl gene identifiers, I get the error

      "There is no AnnotationDB on server. Ask administrator to put AnnotationDB sqlite databases in cacheDir/annotationdb folder"

      Thank you for revealing this issue. Indeed, locally installed instances of Phantasus might lose some functionality in the absence of certain auxiliary files. For example, gene annotation mapping is unavailable without annotation databases. Previously, the user had to perform additional setup steps to unlock a few features, which could be confusing and unclear. To overcome this, we have significantly revised the installation procedure. The newly added ‘setupPhantasus’ function creates all necessary configuration files and provides an interactive dialog that helps the user load all necessary data files from our official cache mirror (https://alserglab.wsutl.edu/files/phantasus/minimal-cache/). Docker-based installation follows the same approach, but it is configured to install everything by default. Thus, with the new installation procedure, a locally installed Phantasus now has all of the functionality available at the official mirrors. A comprehensive installation description is now available at https://ctlab.github.io/phantasus-doc/installation.

      (3) When trying to normalize the RNA-seq counts, there are no standard options such as within-library (RPKM, FPKM) or between-library (TMM) normalization procedures.

      We appreciate your feedback and have expanded the normalization options available in the updated version of Phantasus. We added support for TMM normalization as suggested by the edgeR package and voom normalization from the limma package. However, certain strategies such as RPKM/FPKM or TPM rely on gene-specific effective lengths, which are challenging to infer without protocol and alignment details. As Phantasus operates on gene expression matrices and does not execute alignment steps, implementing these normalizations seems infeasible. On the other hand, if the user has a matrix with FPKM or TPM gene values (for example, from a core facility), such a matrix can be loaded into Phantasus and used for the analysis.
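
      The dependence on gene lengths can be made concrete with a small Python sketch. It contrasts counts per million (CPM), which needs only the count matrix, with TPM, which cannot be computed without a per-gene length. The function names, counts, and lengths are illustrative assumptions for a single sample, not Phantasus code.

```python
def cpm(counts: list[float]) -> list[float]:
    """Counts per million: needs only the counts of one sample."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Transcripts per million: additionally needs a length for every gene."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical counts and effective gene lengths (bp) for three genes
counts = [100.0, 100.0, 800.0]
lengths = [1000.0, 2000.0, 8000.0]
print(cpm(counts))
print(tpm(counts, lengths))
```

      Genes 1 and 3 have very different counts but the same reads-per-base rate, so they receive equal TPM values, which is exactly the information a bare count matrix does not carry.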

      If I take log2(1+x) a new tab is created with the normalized data, but it's not easy to realize what happened because the tab has the same name as the previous one and while the colors of the heatmap changed to reflect the new scale of the data, this is quite subtle. This may lead an inexperienced user to apply the same normalization step again on the normalized data. Ideally, the interface should lead the user through a pipeline, reducing unnecessary degrees of freedom associated with each step.

      Thank you for your comment. Indeed, our approach of creating a new tab for each alteration of the expression values while preserving the tab name might be a source of confusion for the user. On the other hand, generating informative tab names without overwhelming users with too much detail is also challenging. As a compromise, we offer the user the option to rename the tab manually. Still, we agree that this remains an area for improvement. We also consider it to be part of a larger issue: for example, the loaded data can already be log-scaled, so that even one round of log-scale transformation in Phantasus would be incorrect. Accordingly, we are exploring ways to address this issue in the future by adding automated checks to the tools or, as you suggested, implementing stricter pipelines.
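
      One possible form of such an automated check is sketched below in Python. It relies on a crude, assumed heuristic: values that never exceed about 30 are treated as already log2-scaled, since 2^30 is roughly 10^9. The threshold and the function names are hypothetical, and this is not a description of what Phantasus does or will do.

```python
import math


def looks_log_scaled(values: list[float], max_log_value: float = 30.0) -> bool:
    """Crude heuristic: raw counts usually contain values far above 30."""
    return bool(values) and max(values) <= max_log_value


def log2_once(values: list[float]) -> list[float]:
    """Apply log2(1 + x) unless the data already look log-scaled."""
    if looks_log_scaled(values):
        return list(values)  # leave the data untouched
    return [math.log2(1.0 + v) for v in values]


# Hypothetical raw expression values
raw = [0.0, 15.0, 1023.0, 65535.0]
once = log2_once(raw)
twice = log2_once(once)  # the second call is a no-op
print(once, twice)
```

      The heuristic misclassifies raw data whose values are all small, so in practice a check like this would more plausibly trigger a warning than silently skip the transformation.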

      (4) Phantasus allows one to filter out lowly-expressed genes by averaging expression of genes across samples and discarding/selecting genes using some cutoff value on that average. This strategy is fine, but to make an informed decision on that cutoff it would be useful to see a density plot of those averages that would allow one to identify the modes of low and high expression and decide the cutoff value that separates them.

      Thank you for the suggestion. Indeed, a density plot might help users make informed decisions during gene filtering. We have added such a plot to the ‘Plot/Chart’ tool as a ‘histogram’ chart type.
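
      The idea behind the reviewer's suggestion can be sketched in a few lines of Python: average each gene across samples, bin the averages to expose the low and high expression modes, and keep genes above a cutoff placed between them. The matrix, bin count, cutoff, and function names are illustrative assumptions rather than the tool's internals.

```python
def mean_expression(matrix: list[list[float]]) -> list[float]:
    """Average each gene (row) across samples (columns)."""
    return [sum(row) / len(row) for row in matrix]


def histogram(values: list[float], n_bins: int) -> list[int]:
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all values identical: everything falls in the first bin
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


# Hypothetical log-scale expression matrix: 6 genes x 3 samples
matrix = [
    [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [0.6, 0.8, 1.0],
    [7.0, 7.0, 7.0], [8.0, 8.2, 8.4], [9.0, 9.0, 9.0],
]
means = mean_expression(matrix)
bins = histogram(means, n_bins=3)

cutoff = 4.0  # chosen by eye in the empty region between the two modes
kept = [i for i, m in enumerate(means) if m > cutoff]
print(bins, kept)
```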

      It would be also nice to have an interface to the filterByExpr() function from the edgeR package, which provides more control on how to filter out lowly-expressed genes.

      Thank you for proposing the inclusion of an interface for the filterByExpr() function from the edgeR package. In the recent update we have incorporated filterByExpr() as part of the voom normalization tool. For now, for simplicity, we have decided to keep only the default parameter values. However, we will explore the addition of the dedicated filtering tool in the future.

      (5) When attempting a differential expression (DE) analysis, a popup window appears saying:

      "Your dataset is filtered. Limma will apply to unfiltered dataset. Consider using New Heat Map tool."

      One of the main purposes of filtering lowly-expressed genes is mainly to conduct a DE analysis afterwards, so it does not make sense that the tool says that such an analysis will be done on the unfiltered dataset. The reference to the "New Heat Map tool" is vague and unclear where should the user look for that other tool, without any further information or link.

      Thank you for highlighting this issue. We agree that the message in the popup window and the default action were confusing. In response to your feedback, we've updated the default behavior of our DE tools to automatically use the filtered data in a new tab. Additionally, we've clarified the warning message to ensure a better understanding of this process.

      (6) The DE analysis only allows for a two-sample group comparison, which is an important limitation in the question we may want to address. The construction of more complex designs could be graphically aided by using the ExploreModelMatrix Bioconductor package (Soneson et al, F1000Research, 2020).

      Indeed, the ability to create complex designs and various comparisons is important for many applications of gene expression analysis. Accordingly, in the latest Phantasus version, we've introduced an advanced design feature for the DE analysis, enabling the use of multiple column annotations for the design matrix. Combined with the existing ability to create new annotations, this update facilitates the setup of diverse design matrices. While at the moment we do not allow setting a complex contrast, we hope that the current interface will cover most of the differential expression use cases.
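
      To show what combining several column annotations into one design matrix means, here is a small Python sketch that builds an additive design with an intercept and treatment-style dummy coding, taking the first level of each factor in sorted order as the reference. The annotation names, levels, and the `design_matrix` helper are hypothetical and do not reflect the Phantasus implementation.

```python
def design_matrix(annotations: dict[str, list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build an intercept-plus-dummy-variable design from sample annotations."""
    n_samples = len(next(iter(annotations.values())))
    names = ["(Intercept)"]
    columns = [[1] * n_samples]
    for factor, labels in annotations.items():
        levels = sorted(set(labels))
        for level in levels[1:]:  # the first level serves as the reference
            names.append(f"{factor}{level}")
            columns.append([int(label == level) for label in labels])
    rows = [list(row) for row in zip(*columns)]  # one row per sample
    return names, rows


# Hypothetical annotations for four samples
annotations = {
    "treatment": ["ctrl", "ctrl", "drug", "drug"],
    "batch": ["b1", "b2", "b1", "b2"],
}
names, rows = design_matrix(annotations)
print(names)
for row in rows:
    print(row)
```

      With this coding, the coefficient of the `treatmentdrug` column estimates the drug effect adjusted for batch, which is the kind of comparison a two-group-only interface cannot express.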

      (7) When trying to perform a pathway analysis with FGSEA, I get the following error:

      "Couldn't load FGSEA meta information. Please try again in a moment. Error: cannot open the connection In call: file(file, "rt")

      We expect this issue to be resolved now that we have implemented a more streamlined setup process. Among other things, the new approach aims to eliminate the unexpected absence of metafiles in local installations. The latest Phantasus package version explicitly prompts the user to load the necessary additional files automatically during the initial run, reducing the chance of an invalid setup.

      Finally, there have been already some efforts to approach R and Bioconductor transcriptomics pipelines to point-and-click users, such as iSEE (Rue-Albrecht et al, 2018) and GeneTonic (Marini et al, 2021) but they are not compared or at least cited in the present work.

      Indeed, our comparison was focused toward tools that offer non-programmatic functionalities for gene expression data analysis. While tools like iSEE and GeneTonic are adept at visualizing data and hold their own in providing extensive abilities, they do necessitate additional data preparation using R, distinguishing them from the specific scope of tools we assessed.

      One nice feature of these two tools that I missed in Phantasus is the possibility of generating the R code that produces the analysis performed through the interface. This is important to provide a way to ensure the reproducibility of the analyses performed.

      The ability to generate R code within tools like these indeed helps ensure analysis reproducibility. We have previously attempted to implement this functionality in Phantasus; however, it proved hard to do in a useful fashion because of potentially complex interactions between the user and the client-side part of Phantasus. Nevertheless, we acknowledge the significance of such a feature and aim to introduce it in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for a careful review of the manuscript and for their comments, which we address below.

      Reviewer #1:

      (1) …the authors could examine division in a population of cells with only one centrosome. Seeing some restoration of mitotic progression in the absence of SAC-dependent delays would suggest that even one centrosome with uninhibited Eg5 is sufficient to negate SAC-dependent delays, and would limit models for what exactly centrosomes contribute.

      We agree that the one-centrosome question (i.e. whether cells with a single centriole, and therefore a single centrosome, have the same SAC dependence) would be interesting to address. It is known that cells with a single centriole generated through centrinone treatment also have elongated mitoses, like cells lacking centrioles (see Chinen et al., 2021; compare Fig 2C to Fig 2D). We have tried this experiment in RPE-1 cells, with preliminary results confirming that there is a mitotic delay. It is not known whether this delay requires SAC activity, and we hope to address that in future work. In addition, we note that we show in Fig. 4b-c that cells with the normal centrosome number but with a single focus of microtubules due to Eg5 inhibition were also sensitive to MPS1 inhibition. This suggests that centrosome presence alone cannot overcome the requirement for SAC activity; rather, the centrosomes need to be able to separate in a timely fashion.

      Reviewer #2:

      (1) An example is how to interpret the effect of Aurora B inhibition, which does not block acentrosomal cell division. If Aurora B is required for SAC activity, it suggests this effect of MPS1 may be a function other than SAC. Given the complexity of the SAC, it would be informative to test other SAC components. Instead, the authors conclude that the mitotic delay caused by MPS is required for acentrosomal cell division. I don't think they have ruled out, or even addressed other functions of MPS1.

      We agree that it is possible that functions of the MPS1 kinase other than those involved in the SAC could be important. Although we have not directly tested other SAC components, we did “mimic” SAC activity by delaying anaphase onset using APC/C inhibition while also inhibiting MPS1 (Fig. 2b-b’’). The fact that this restored division suggests that it is the SAC function of MPS1 kinase activity that is relevant to this delay. 

      (2) The authors find that when both the APC and MPS1 are inhibited, the cells eventually divide. These results are intriguing, but hard to interpret. The authors suggest that the failure to divide in MPS1-inhibited cells is because they enter anaphase, and then must back out. This is hard to understand and there is not data supporting some kind of aborted anaphase. Is the division observed with double inhibition some sort of bypass of the block caused by MPS1 inhibition alone? It is not clear why inhibition of APC causes increased cell division when MPS1 is inhibited.

      As described in the response to 1), we believe that reinstating the delay to anaphase onset by APC/C inhibition provided the time needed to establish a functional bipolar spindle even in the absence of the SAC, and that cells eventually overcome the proTAME block and proceed through mitosis, as observed in control cells in our experiments. We note that we chose concentrations of proTAME specifically for each cell line (RPE-1 and U2OS) that would result only in a temporary block, following on the work of Lara-Gonzalez and Taylor (2012), who reported similar findings for HeLa cells.

      (3) The authors characterize MTOC formation in these cells, which is also interesting. MTOCs are established after NEB in acentrosomal cells. Indeed, forming these MTOCs is probably a key mechanism for how these cells complete a division, like mouse oocytes.

      We agree that the observed intermediates of MTOCs are interesting and likely crucial to the mechanism of cell division in acentrosomal somatic cells. We are investigating further the differences and similarities between somatic cell MTOC formation in the absence of centrosomes and the naturally-occurring form of that process in oocytes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for providing feedback and suggestions for our manuscript.

      In response to the reviewers' comments, we changed several main Figures and added new tables and supplementary figures. We also made edits to the Discussion.

      Reviewer #1 (Public Review):

      Weaknesses:

      Limited data are shown on the let-7afdLOF mice. Does this mouse respond to nCB similarly to the let-7bc2LOF?

      In the revised manuscript, we have added a baseline lung phenotypic assessment for the let-7afdLOF mice up to 6 months of age within Figure 4-figure supplement 1. The data support our original statement and observation that let-7afdLOF mice do not exhibit lung pathology, inflammation, or changes in T cell subsets at baseline. Our view is that the current manuscript addresses the importance of the let-7bc2-cluster in experimental emphysema, and the let-7afd-cluster mice are used to validate Rorc as a direct target of let-7. In the future, new grant funding will make it possible to ascertain whether absence of the let-7afd-cluster also sensitizes mice to experimentally induced emphysema.

      Because the authors validate their findings from a previously published RNA-seq dataset in subjects with and without emphysema, the authors should include patient demographics from the data presented in Figure 1C-D.

      We thank the reviewers for their recommendation. To address this, the revised manuscript contains a new Supplementary Table 1 with the human subject demographic information that corresponds to Figure 1D.

      To validate their mouse models, the absence of Let-7 or enhanced Let-7 expression needs to be shown in isolated T cells from exposed mice.

      In the case of let-7bc2-cluster, we have included Figure 2-figure supplement 2 which shows pri-let7bc2 expression assessed by qPCR from selected CD8+ lung T cells of control and let-7bc2LOF mice exposed to PBS vehicle or nCB. The let-7g GOF model used in our studies has been validated for the induction of let-7g in thymic and peripheral T cells and elicitation of gain-of-function phenotypes (Pobezinskaya et al. 2019; Angelou et al. 2020; Wells et al. 2023).

      In Figure 3, the authors are missing the unexposed let-7bc2LOF group from all panels.

      We emphasize that our exhaustive characterization of control and let-7bc2LOF mice in the absence of challenge showed no phenotype. The baseline data were collectively shown in Figure 2-figure supplement 1.

      Why did the authors choose to overexpress Let-7g? The rationale is not clear.

      We concur that ideal GOF experiments can be carried out with let-7b or let-7c. Unfortunately, let-7b/c2 transgenic mice are not currently available, so we elected to use the well characterized let-7g T cell GOF mouse model (Pobezinskaya et al. 2019; Angelou et al. 2020; Wells et al. 2023). Furthermore, it is worth noting that the binding/seed sequence of let-7g is identical to let-7a/b/c and other members. Nonetheless, we have edited our Discussion section to reflect this as a potential caveat that can confound the utilization of this let-7GOF mouse model.

      The purity of the CD4+ and CD8+ T cells is not shown and the full gating strategy should be included.

      In the revision, we have included the flow gating strategy and display the representative populations with purities in Supplementary Figure 1.

      Reviewer #2 (Public Review):

      Weaknesses:

      The functional analyses are unusually focused on IL-17 producing CD8 T cells, but it is not made clear whether these cells are an important player in emphysema pathogenesis in the nCB and CS models. The data shown reveal that they are far less numerous than IL-17-producing CD4 T cells. It is also notable that the Figure 1 expression data from human subjects used sorted CD4+ T cells. And as the author mentioned, prior work on let-7 showed that it regulated Th17 (CD4) responses.

      As we showed that the let-7bc2LOF enhanced the Tc17 cell population without any significant impact on Th17 cells, we elected to focus our analysis on this population. Furthermore, the connection of let-7 with the generation of a Tc17 inflammatory response is a novel finding, which has so far remained unappreciated in the field and opens new lines of inquiry.

      Compared with Let7bc2 deletion, Let7afd deletion had a much larger effect on IL17 production by CD8 T cells in vitro, and it also had a larger effect on RORgt expression in untreated mice in vivo, especially in the lung. It would be valuable to more thoroughly characterize the let7afd mice. RORgt expression should be shown in the in vitro assays. In the results, the authors state that let7afdLOF mice "did not exhibit lung histopathology nor inflammatory changes" up to 6 months of age. Similarly, it is stated in the conclusion that "the let-7afdLOF mice ... did not exhibit changes in Tc17/Th17 subpopulations" in vivo. All these data should be shown, and if no baseline changes are apparent, then I also recommend challenging these mice with nCB and/or cigarette smoke.

      We concur that additional phenotypic characterization on the let-7afdLOF mice will contribute valuable information in the future. Reviewer 1 had a similar comment. As described above in response to Reviewer 1, we added comprehensive phenotypic analysis of let-7afdLOF mice within Figure 4-figure supplement 1 in the revised manuscript. The new data indicates that there is no overt lung pathology in the let-7afdLOF mice despite the subtle induction of RORγt expression in T cells. Furthermore, we have now included flow cytometric analysis of RORγt expression from in vitro polarized Tc0 and Tc17 cells from let-7afdLOF mice within revised Figure 5H.

      This brings up the larger issue of redundancy among the let-7 family members and genomic clusters. This should be discussed, including some explanation of the relative expression of each mature family member in T cells, and how that maps to the clusters studied here (and those that were not investigated). It would also be helpful to explain the relationship between mouse Let7bc2 and human Let7a3b, since Let7bc2 is the primary focus of emphysema experiments in this manuscript. This is especially important because the study of individual let-7 clusters is the core novelty of this body of work, as described in the first paragraph of the discussion. The regulation of let-7 expression has been reported before and its functional role has been investigated with a variety of tools.

      We appreciate the interest and suggestion to expand the discussion on the let-7 family and their expression regulation. To address these points, we included additional references and expanded the Discussion section of the revised manuscript.

      Let7g overexpression caused a marked reduction in Rorgt expression in T cells at baseline and in the setting of nCB challenge, and it reduced the frequency of IL17+ producing CD8 T cells in the lung to baseline levels. Yet there was no change in the MLI measurement of histopathology. Is this a robust result? The responses in the experiment shown in Fig. 6C-D are quite muted compared to those shown in Figure 2. The latter also shows a larger number of replicates, and it is unclear whether the data in 6D include measurement from all of the mice tested (e.g. pooled from 2 small experiments) or only mice from one experiment.

      We appreciate the reviewer's inquiry into the data presented in Figure 6C-D. The data are representative of a single experiment, and the number of experiments has been added to the revised Figure 6 legend. We note that all let-7GOF and associated control mice in Figure 6 are exposed to doxycycline as part of the let-7g induction model, whereas mice in Figure 2 are not. It has been previously reported that doxycycline, a member of the tetracycline family of molecules, has anti-inflammatory properties (Di Caprio et al. 2015), which we speculate could account for the differences in the magnitude of the emphysemic response.

      Reviewer #3 (Public Review):

      Weaknesses:

      The authors show no change in frequencies of Treg cells in let-7bc2LOF mice exposed to nCB. Do these Treg cells also express higher levels of RORgt and IL-17? The major question that was not addressed in this study is how let-7 expression is regulated in emphysema. The other recommendation is that the authors include the sequences of the let-7 mimic oligos used in the luciferase assay.

      We did not have the opportunity to address whether RORγt is in fact also upregulated in Treg cells. It remains unclear what upstream mechanisms drive the downregulation of the let-7 clusters in T cells with exposure to smoke/nCB. However, we agree that this is an important question, and we therefore updated the Discussion section of the manuscript by including several citations that could explain how let-7 clusters become repressed in a coordinated fashion. Regarding the last point, the sequence of the duplex used in the luciferase assay corresponds to the canonical mature let-7b in NCBI and has been added to Supplementary Table 3.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that "Recent evidence suggests the let-7 family is downregulated in patients with COPD, however, how they cause emphysema remains unclear." This should be reworded. Its downregulation in disease does not necessarily indicate that let-7 causes emphysema. Also, recommend rewording "Overall, our findings shed light on the let-7/RORγt axis as a braking and driving regulatory circuit in the generation of Tc17 cells..." What does it mean to be a "braking and driving" circuit? These terms seem contradictory.

      We recognize that the sentences were not phrased clearly. We have rephrased these statements as “Recent evidence suggests the let-7 miRNA family is downregulated in patients with COPD, however, whether this repression conveys a functional consequence in emphysema pathology has not been elucidated.” and “Overall, our findings shed light on the let-7/RORγt axis with let-7 acting as a molecular brake in the generation of Tc17 cells…”

      Experimental details are needed for the human miRNA expression studies. Too little information is provided in the methods section, and the article cited there (Yuan et al 2020) is not listed in the bibliography.

      We expanded the Materials and Methods section for the collection, isolation, and qPCR analysis of human subject lung T cells. We have corrected the bibliography and added the missing citation.

      The claim of novelty for miRNA-mediated silencing of Rorc in the discussion section is unnecessary and incorrect (https://pubmed.ncbi.nlm.nih.gov/23359619).

      Thank you for bringing the publication to our attention. Close inspection of this publication indicates that the authors did not experimentally validate Rorc as a direct target of let-7 itself. In addition, the work was limited to immortalized in vitro cell cultures. We amended the sentence in the Discussion section to highlight the novelty of our findings, which is the demonstration of Rorc as an in vivo target of let-7 in T cells.

      Citations

      Angelou, Constance C., Alexandria C. Wells, Jyothi Vijayaraghavan, Carey E. Dougan, Rebecca Lawlor, Elizabeth Iverson, Vanja Lazarevic, et al. 2020. “Differentiation of Pathogenic Th17 Cells Is Negatively Regulated by Let-7 MicroRNAs in a Mouse Model of Multiple Sclerosis.” Frontiers in Immunology 10: 3125. https://doi.org/10.3389/fimmu.2019.03125.

      Di Caprio, Roberta, Serena Lembo, Luisa Di Costanzo, Anna Balato, and Giuseppe Monfrecola. 2015. “Anti-Inflammatory Properties of Low and High Doxycycline Doses: An in Vitro Study.” Mediators of Inflammation 2015: 329418. https://doi.org/10.1155/2015/329418.

      Pobezinskaya, Elena L., Alexandria C. Wells, Constance C. Angelou, Eric Fagerberg, Esengul Aral, Elizabeth Iverson, Motoko Y. Kimura, and Leonid A. Pobezinsky. 2019. “Survival of Naïve T Cells Requires the Expression of Let-7 miRNAs.” Frontiers in Immunology 10 (May). https://doi.org/10.3389/fimmu.2019.00955.

      Wells, Alexandria C., Kaito A. Hioki, Constance C. Angelou, Adam C. Lynch, Xueting Liang, Daniel J. Ryan, Iris Thesmar, et al. 2023. “Let-7 Enhances Murine Anti-Tumor CD8 T Cell Responses by Promoting Memory and Antagonizing Terminal Differentiation.” Nature Communications 14 (1): 5585. https://doi.org/10.1038/s41467-023-40959-7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) The description of the wing phenotype that results from combinations of wingless and delex alleles at the bottom of page 4 (figure 1) is quite confusing. Are the trans-hets suppressed to wt or enhanced? The images in the Fig look enhanced.

      We thank the reviewer for this thoughtful observation regarding the wing phenotype description in combination with wg and dx alleles. We understand the confusion and appreciate the opportunity to clarify.

      In response to the concern raised, the trans-heterozygotes are indeed enhanced rather than suppressed to wild type. We acknowledge that the description could have been clearer. We have revised the relevant section to explicitly state that trans-heterozygotes exhibit an enhanced wing phenotype in the updated version of the manuscript.

      (2) Use of Cut as a Wg readout in Fig1 is problematic since it is also a Notch target. Perhaps a more direct measure of Arm activity would be a better choice here, e.g., naked-lacZ.

      We appreciate the reviewer’s insightful comment regarding the use of Cut as a Wg readout. The point about Cut being a Notch target raises a valid concern. To address this issue and provide a more direct measurement of Arm activity, we agree that incorporating a specific Arm readout, such as naked-lacZ, would be a more suitable choice.

      We will incorporate this valuable feedback into our future research endeavors to augment the comprehensiveness of our study.

      (3) The dx allele effects on Sens and Vg in Fig 2C appear greater at two points along the DV margin (arrows). Do these match the expression pattern of dx mRNA?

      We thank the reviewer for this thoughtful observation. We understand that the effect of the dx LOF allele on Sens and Vg seems more pronounced at two specific points along the D/V margin. To our understanding, Dx shows a homogeneous expression pattern throughout the Wg disc, as has been reported earlier (Busseau et al., 1994; Mukherjee et al., 2005).

      (4) It really looks to my eye that dx loss lowers Wg expression in source cells in Fig 2. To confirm the model that Dx controls the spread of Wg protein, it would be ideal to rule out txnal effects with a wg-lacZ reporter.

      We thank the reviewer for raising this important point. In the revised version of the manuscript, we have introduced Wg-lacZ staining for both the Wg-lacZ/+ and dx152/Y; Wg-lacZ/+ combinations in Figure 2. This additional information eliminates the possibility of Deltex influencing Wg transcriptional regulation in source cells, thus reinforcing our hypothesis that the reduction of Deltex leads to a decline in Wg protein levels in the source cells, given Dx's essential role in Wingless gradient formation.

      (5) The drop in DV Wg and expansion of Vg domain in dx mutants seem paradoxical but could be explained by accelerated Wg spread and uptake. This could be tested by depleting the dally-like glypican that promotes long-range Wg diffusion in dx mutants, and seeing if this restores Wg levels at the DV margin.

      This is indeed a very thoughtful comment and we thank the reviewer for this insightful suggestion for further exploration. We believe that depleting dally-like glypican in dx mutants could possibly restore Wg levels at the DV margin.

      We recognize the importance of this experiment in providing a more comprehensive understanding of the underlying mechanisms, and we will give major emphasis on incorporating this suggestion in our future research.

      (6) The authors describe the effect of Dx over-expression as "reducing" the Wg gradient when they actually mean "flattening". Please be careful with this word choice as they mean different things.

      We thank the reviewer for the insightful feedback. The suggested modifications have been incorporated into the revised version of the manuscript.

      (7) The combined effects of Rab5dn and Dx o/e on Wg protein loc/levels are interesting but need to be followed up by testing whether the endogenous Dx/Rab5 show genetic interactions in control of Wg protein levels/localization.

      We acknowledge the reviewer's comment and in addressing it, we wish to highlight that the over-expression of Dx with endogenous Rab5 or Rab7 does not affect Wg protein levels or localization. We have mentioned the supporting data for this control in Figure 5(G, H).

      (8) The ability of MG132 to restore Arm levels in en-Dx discs is very promising. However, MG132 will also block Arm degradation by the Slmb-APC destruction complex, so this result could be non-specific. Tests of whether Dx drives poly-ub of Arm, and how much Dx is redundant to Slmb in this role, would be needed to solidify the authors' conclusion.

      We thank the reviewer for this insightful comment. We understand that the concern about MG132 blocking Arm degradation by Slmb-APC destruction complex adds an important layer of complexity to the interpretation of the results. We agree with the reviewer's comment that conducting these experiments will indeed offer valuable insight into the specificity of MG132 effects and further strengthen our conclusion.

      We are interested to see how future experiments addressing the points raised by the reviewer will shape our understanding of the intricate mechanisms involved in Wg signaling and Arm/β-catenin degradation. Once again, we thank the reviewer for the thoughtful engagement with the research, and the comments will undoubtedly stimulate further investigation and discussion in this area.

      Reviewer #2 (Recommendations For The Authors):

      The work really needs more experiments to further provide a mechanistic understanding and distinguish between direct and indirect action (via Notch signaling) on Wingless, but instead switches in the second half to a second interaction with β-catenin, leaving the conclusions of the first part hanging. More mechanistic information on the cell biology of how Deltex might affect wingless endocytic trafficking directly would be beneficial, for example involving some cell culture experiments where the action of deltex on Notch and wingless could be more clearly separated and a more detailed study of the consequences on wingless trafficking could be explored.

      Wingless is secreted into an extracellular compartment and so won't be accessible for a direct interaction with cytoplasmic deltex. Therefore are the authors proposing Deltex interacts with a membrane-bound wingless receptor such as frizzled in order to mediate its effects? These avenues could be explored further experimentally to derive a more mechanistic conclusion.

      The colocalisation images are not high resolution and colocalisation is not quantified, and no differences ( +/- Deltex) in wingless subcellular localisation, which would aid mechanistic interpretation, are shown.

      We thank the reviewer for the insightful feedback on our work. We appreciate the suggestion for more experiments to provide a mechanistic understanding and to distinguish between direct and indirect actions of Notch on Wingless signaling. We acknowledge the importance of clarifying these aspects and agree that further experiments could help separate the effects of Deltex on Notch and Wingless signaling, allowing for a more detailed examination of their respective trafficking and ubiquitination mechanisms.

      We will consider your valuable input in our future research efforts to enhance the comprehensiveness of our study.

      Other specific points

      Figure 2: Narrowing and broadening of different marker gene expression patterns in dx mutants needs to be quantified so that variation is taken into account and the numbers of wings imaged should be clearly stated.

      We greatly appreciate this valuable suggestion from the reviewer. As a response, we have incorporated quantification data to address the observed variations. We have also provided information regarding the number of wing discs that were imaged for the purpose of quantification.

      Figure 3: The number of discs imaged in total should be mentioned

      We express our appreciation to the reviewer for the input. We have taken their comment into account and have subsequently included details regarding the number of discs imaged in the figure legend section of the manuscript.

      Figure 6: There is no description of (E5-E6) in the figure legend. F1 to F5 eye size phenotypes require quantification.

      We are grateful to the reviewers for bringing this to our attention. In response, we have included a description of E5-E6 in the figure legend. Also, as per the reviewer’s suggestions, we have incorporated the quantification data of the eye size phenotype.

      Discussion

      Links between the Notch and wingless pathways should be more comprehensively discussed, including previous work that has linked Notch/Deltex to β-catenin degradation e.g.

      Acar et al. Sci Rep 2021 Apr 27;11(1):9096. doi: 10.1038/s41598-021-88618-5

      Hayward et al. Development 2005 Apr;132(8):1819-30. doi: 10.1242/dev.01724

      Kwon et al. Nat Cell Biol 2011 Aug 14;13(10):1244-51. doi: 10.1038/ncb2313

      Sanders et al. PLoS Biol 2009 Aug;7(8):e1000169. doi: 10.1371/journal.pbio.1000169. Epub 2009 Aug 11.

      The links between endocytic trafficking and wingless gradient formation could also be further discussed e.g.

      Marois et al. Development 2006 Jan;133(2):307-17. doi: 10.1242/dev.02197. Epub 2005 Dec 14.

      Yamazaki et al. Nat Cell Biol 2016 Apr;18(4):451-7. doi: 10.1038/ncb3325. Epub 2016 Mar 14.

      We appreciate the reviewer's valuable suggestions and we have now included these references in the discussion section of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The data strongly suggest that iron depletion in urine leads to conditional essentiality of some genes. It would be informative to test the single gene deletions (Figure 3G) for growth in urine supplemented with iron, to determine how many of those genes support growth in urine due to iron limitation.

      We appreciate this suggestion. We have now included this suggested experiment as a new panel (Figure 5G).

      (2) Line 641. The authors raise the intriguing possibility that some mutants can "cheat" by benefitting from the surrounding cells that are phenotypically wild-type. Growing a fepA deletion strain in urine, either alone or mixed with wild-type cells, would address this question. Given that other mutants may be similarly "masked", it is important to know whether this phenomenon occurs.

      We thank the reviewer for this suggestion but believe that this would be very difficult to ascertain in K. pneumoniae as several redundant iron uptake systems exist. This would require significantly more time to construct sequential/combinatorial iron-uptake mutants to exactly determine this “cheating” and “masking” phenomenon and such work is beyond the scope of the current study.

      (3) In cases where there are disparities between studies, e.g., for genes inferred to be essential for serum resistance, it would be informative to test individual deletions for genes described as essential in only one study.

      We thank the reviewer for this suggestion, and we agree that deleting conditionally essential genes (i.e. serum resistance) could help identify discrepancies in methodology with other studies but this is beyond the scope of this study. Furthermore, we do not have these other strains readily available to us and importing these strains into Australia is challenging due to the strict import/quarantine laws.

      Reviewer #1 (Recommendations For The Authors)

      (4) Line 529. Why was 50 chosen as the read count threshold?

      This was chosen as the minimum threshold needed to exclude essential genes from the comparative analysis, as these can contribute false-positive results where a change from, for example, 2 to 5 reads between conditions is considered a >2-fold change. We have updated the manuscript text to highlight this: “were removed from downstream analysis to exclude confounding essential genes and minimize the effect of stochastic mutant loss” (line 539).
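
      The reasoning behind the threshold can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name, the example counts, the pseudocount of 1, and the rule that a gene is dropped only when its read count is below the threshold in both conditions are all assumptions made for illustration.

```python
import math

MIN_READS = 50  # read-count threshold quoted in the response


def filter_and_fold_change(counts, min_reads=MIN_READS):
    """Return log2 fold changes for genes that pass the read-count filter.

    counts maps gene name -> (control_reads, condition_reads).
    Assumption: a gene is removed when BOTH counts are below min_reads,
    which is how genes that are essential in the input library would look.
    """
    kept = {}
    for gene, (control, condition) in counts.items():
        if control < min_reads and condition < min_reads:
            continue  # too few reads in either condition to trust a ratio
        # Assumed pseudocount of 1 avoids division by zero or log of zero.
        kept[gene] = math.log2((condition + 1) / (control + 1))
    return kept


# Hypothetical counts: geneA goes from 2 to 5 reads (a >2-fold change driven
# by noise), geneB drops from 400 to 100 reads (a well-supported change).
example = {"geneA": (2, 5), "geneB": (400, 100)}
fold_changes = filter_and_fold_change(example)
print(fold_changes)
```

      In this hypothetical example, geneA is excluded even though its raw ratio exceeds 2-fold, while geneB is retained because its ratio rests on hundreds of reads.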

      (5) The titles for Figure 5 and Figure 6 appear to be switched.

      Thank you, we have now corrected this error.

      (6) Line 381. "Forty-six of these regions contain potential open reading frames that could encode proteins". How is a potential ORF defined?

      This was based on submitting the selected 145 bp regions to BLASTx using default parameters and listing the top hit (if one was found). We have now edited the manuscript text to make this clearer (line 394).

      (7) Two previous TnSeq studies looking at Escherichia coli and Vibrio cholerae suggest that H-NS can prevent transposon insertion, leading to false positive essentiality calls. Is there any evidence of this phenomenon here? A/T content could be used as a proxy for H-NS occupancy.

      We thank the reviewer for this point and agree that H-NS or other DNA-binding proteins could indeed lead to false-positive essentiality calls using TraDIS. Based on this, we have now included a sentence in the conclusion section mentioning this methodological caveat (line 631). We believe that A/T content could potentially be used as a proxy for H-NS occupancy.
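
      As a rough sketch of how A/T content might be screened as such a proxy, the Python below computes the A/T fraction in sliding windows along a sequence. The window and step sizes, the function names, and the toy sequence are assumptions for illustration only; no such analysis is reported in the study.

```python
def at_fraction(seq):
    """Fraction of bases in seq that are A or T (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def windowed_at(seq, window=100, step=50):
    """Return (start, A/T fraction) for sliding windows along seq.

    Window and step sizes are arbitrary illustrative defaults. If seq is
    shorter than one window, a single entry covering the whole sequence
    is returned.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [
        (start, at_fraction(seq[start:start + window]))
        for start in range(0, last_start, step)
    ]


# Toy sequence: an A/T-rich stretch followed by a G/C-rich stretch.
toy = "AAAATTTT" + "GGGGCCCC"
print(windowed_at(toy, window=8, step=8))
```

      Windows with an unusually high A/T fraction that also lack transposon insertions would be candidates for occlusion by a DNA-binding protein rather than true essentiality, although this would need experimental confirmation.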

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors may wish to reformat the manuscript by decanting a number of panels and figures as supplementary material. These include the panels related to the description of TraDIS (for example Fig 1D, 1E, 1F. 1G, Fig 2A, Fig 3C, 3D, 3E, 3F, Fig 5C, Fig 6D). This is a well-established method.

      We thank the reviewer for this suggestion but believe that these panels make the methodology and resulting insertion plots easier to follow and allow other researchers, of varying expertise, to better understand this functional genetic screen technique.

      (2) The authors need to indicate how relevant the strain they have probed is. Is it a good reference strain of the KpI group?

      This is a great suggestion and we have now included a new figure illustrating the genetic context and relatedness of K. pneumoniae ECL8 within the KpI phylogroup (New Figure 3).

      (3) The authors need to provide an extensive comparison between the data obtained and those reported testing other Klebsiella strains. A Table identifying the common and different genes, as well as a figure, may suffice. I would encourage authors to compare also their data against E. coli and Salmonella. For example, igaA seems to be not essential in Klebsiella although data indicates it is in Salmonella.

      We thank the reviewer for their comment and appreciate that our data could be extended and compared to other relevant Enterobacteriaceae members. However, we believe this is beyond the scope of this study as the focus is more on K. pneumoniae.

      (4) None of the mutants tested further are complemented. Without these experiments, it cannot be rigorously claimed that these loci play any role in the phenotypes investigated.

      We agree that complementation is an important tenet for attributing mutant phenotypes to specific gene loci. In this case, wbbY has already been complemented, and we believe that complementation for an already known molecular mechanism would be redundant. Please refer to our response to point 6.

      We complemented isolated transposon mutants hns7::Tn5 and hns18::Tn5 with a mid-copy, IPTG-inducible plasmid. We observed a slight increase in serum susceptibility but not full rescue of the WT phenotype (i.e. serum susceptibility). We suspect that the imperfect rescue of the serum-resistance phenotype could be due to the expression levels and copy number of the hns complementation plasmid used. As hns is a known global regulator, its possible pleiotropic role is complex, as many aspects of stress response, metabolism, or capsule could be affected in Klebsiella (doi.org/10.1186/1471-2180-6-72, doi.org/10.3389/fcimb.2016.00013). We have now described our complementation efforts in the text and have included a new supplementary figure (Figure S11).

      (5) The contribution of siderophores to survival in urine is not conclusively established. Authors may wish to test the transcription of relevant genes, and to assess whether the expression is fur dependent in urine. Also, authors may wish to identify the main siderophore needed for survival in urine by probing a number of mutants; this will allow us to assess whether there is a degree of selection and redundancy.

      We thank the reviewer for their comment and agree siderophore uptake is important. We have now included an additional panel (Figure 5G) interrogating the importance of iron-uptake genes grown in urine which is iron limited. We do appreciate that further experiments looking into the Fur regulon and siderophore biosynthesis would be interesting but believe this is outside the scope of this study.

      (6) The role of wbbY is intriguing, pointing towards the importance of high molecular weight O-polysaccharide. In this mutant background, the authors need to assess whether the expression of the capsule, and ECA is affected. Authors need also to complement the mutant. Which is the mechanism conferring resistance?

      We thank the reviewer for their comment and would like to mention that wbbY has already been shown to play a role in LPS profile/biosynthesis and serum resistance (10.3389/fmicb.2014.00608). Furthermore, BLAST analysis shows that the wbbY gene in NTUH-K2044 (the strain used in the aforementioned study) and in ECL8 shares 100% sequence identity, and that the two strains also share the lps operon structure. Hence, we do not find it pertinent to complement this mutant, as we believe its molecular mechanism has already been established. We have now highlighted more prominently in the text the results of this study and how our screen was robust enough to also identify this gene for serum resistance.

      (7) hns and gnd mutants most likely will have their capsule affected. The authors need to assess whether this is the case. Which is the mechanism conferring resistance?

      As mentioned in point 6, we believe that the serum resistance phenotype is attributable to the LPS phenotype. Previous studies have suggested that hns and gnd mutants would likely have differences in capsule, but because hns is pleiotropic and gnd is intercalated/adjacent to the LPS/O-antigen biosynthesis loci, it would be difficult to delineate exactly which cell surface structure is involved.

      (8) The conclusion section can be shortened significantly as much of the text is a repetition of the results/discussion section.

      We thank the reviewer for their suggestion and have made edits to limit repetition in the conclusion section.

      Reviewer #3 (Public Review):

      Below I include several comments regarding potential weaknesses in the methodology used:

      • The study was done with biological duplicates. In vitro studies usually require 3 samples for performing statistically robust analysis. Thus, are two duplicates enough to reach reproducible results? This is important because many genes are analyzed which could lead to false positives. That said, I acknowledge that genes that were confirmed through targeted mutagenesis led to similar phenotypic results. However, what about all those genes with higher p and q values that were not confirmed? Will those differences be real or represent false positives? Could this explain the differences obtained between this and other studies?

      We thank the reviewer for their comment and apologize for the confusion; data were pooled only for the statistical analysis of gene essentiality. Here, two technical replicates of the input library were sequenced and the number of insertions per gene quantified (insertion index scores). These replicates had a correlation coefficient of r² = 0.955, and the insertions-per-gene data were pooled to give total insertion index scores to predict gene essentiality. For conditional analyses (growth in urine or serum), replicate data were not combined. As mentioned previously, differences between this and other studies could also be attributed to inherent genomic differences or to differences in experimental methodology, computational approaches, or the stringency of analysis used to categorize these genes.

      • Two approaches are performed to investigate genes required for K. pneumoniae resistance to serum. In the first approach, the resistance to complement in serum is investigated. And here a total of 356 genes were identified to be relevant. In contrast, when genes required for overall resistance to serum are studied, only 52 genes seem to be involved. In principle, one would expect to see more genes required for overall resistance to serum and within them identify the genes required for resistance to complement. So this result is unexpected. In addition, it seems unlikely that 356 genes are involved in resistance to complement. Thus, is it possible false positives account for some of the results obtained?

      We thank the reviewer for their comment and do believe false positives may account for some of the identified genes. As for the large contrast in gene numbers, we believe this is due to the methodology, as alluded to in our conclusion section. For overall resistance to serum, we used a longer exposure (180 min), after which fewer surviving mutants are recovered and hence fewer genes are identified, whereas the shorter killing window (i.e. complement-mediated killing, 90 min exposure) yields more.

      Reviewer #3 (Recommendations For The Authors):

      • In Figure 4 it is shown that genes important for growth in urine include several that are required for enterobactin uptake. Moreover, an in vitro experiment shows that the complementation of urine with iron increases K. pneumoniae growth. It would have been informative to do a competition experiment between the WT and Fep mutants in urine supplemented with iron. This could demonstrate that the genes identified are only necessary for conditions in which iron is in limiting concentrations and confirm that the defect of the mutants is not due to other characteristics of urine.

      We appreciate this suggestion. We have now included a new panel (Figure 5G) addressing the supplementation of iron in urine for these select mutants.

      • Considering the results section, the title for Figure 6 seems to be more appropriate for Figure 5.

      Thank you, this has now been corrected.

      Other points:

      • Line 44: treat instead of treating

      Thank you, this has now been corrected.

      • Line 63: found that only 3 genes played a role instead of "found only 3 genes played a role"

      Thank you, this has now been corrected.

      • Line 105: is there any reason for only using males? Since UTIs are frequent in women? Why not use urine from women volunteers?

      Due to accessibility of willing volunteers and human ethic application processes, only male samples were available. We are currently undertaking further studies to understand how male and female urine influences growth of uropathogens.

      • Line 105: since the urine was filter-sterilized, maybe the authors can comment that another point that is missing in urine - and that it may be important to study - will be the presence of the urine microbiome and how this affects growth of K. pneumoniae.

      We again thank the reviewer for this comment and have now edited the manuscript discussing how the absence of urine microbiome could affect growth (Line 659). As an aside, future studies in our lab are interested in looking at the role of commensal/microbiome co-interactions for essentiality/pathogenesis using TraDIS.

      • Line 116: I understand that the 8 healthy volunteers combined males and females

      Thank you, we have now edited this methods line to make this clearer.

      • Line 120: incubate in serum 90 min and 180 RPM shaking: any reasons for using these conditions, any reference supporting these conditions?

      Thank you for pointing this out, we were mirroring a previous K. pneumoniae serum-resistance study (doi.org/10.1128/iai.00043-).

      • Line 156: space after the dot.

      Thank you, we have now corrected this in the manuscript.

      • Line 164: resulting reads were mapped to the K. pneumoniae: what are the parameters used for mapping (e.g. % of identity...)?

      Thank you for bringing this to our attention; we have now included in our manuscript that we used the default parameters of BWA-MEM for mapping, including the minimum seed length (default -k = 20 bp exact match).

      • Line 180: it will be good to upload to a repository the In-house scripts used or indicate the link beside the reference for those scripts.

      Our scripts are derived from the pioneering TraDIS study (doi: 10.1101/gr.097097.109). We are currently still optimizing our scripts and intend to upload these to be publicly available. However, in the meantime we are more than happy to share them with other parties upon request.

      • Line 191: why were genes classified as 12 times more likely to be situated in the left mode? Any particular reason for using this threshold?

      We opted for a more-stringent threshold for classifying essential genes, in keeping with previous and comparable studies (doi.org/10.1371/journal.pgen.1003834).
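
      The "12 times more likely" rule can be illustrated as a likelihood-ratio cut-off on a gene's insertion index. The sketch below is only an illustration under stated assumptions: it assumes the left (essential) mode is modelled by an exponential density and the right (non-essential) mode by a gamma density, and all parameter values (`rate`, `shape`, `scale`) are hypothetical rather than taken from the manuscript or its scripts.

```python
import math

def exp_pdf(x, rate):
    """Exponential density, assumed here to model the left (essential) mode."""
    return rate * math.exp(-rate * x)

def gamma_pdf(x, shape, scale):
    """Gamma density, assumed here to model the right (non-essential) mode.
    Written for shape > 1, where the density at x <= 0 is zero."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def is_essential(insertion_index, rate, shape, scale, fold=12.0):
    """Call a gene essential if the left mode is at least `fold` times
    more likely than the right mode at this insertion index."""
    left = exp_pdf(insertion_index, rate)
    right = gamma_pdf(insertion_index, shape, scale)
    if right == 0.0:
        return left > 0.0
    return left / right >= fold

# Hypothetical fitted parameters (not from the study)
RATE, SHAPE, SCALE = 100.0, 4.0, 0.02

low_ii_call = is_essential(0.005, RATE, SHAPE, SCALE)   # few insertions
high_ii_call = is_essential(0.08, RATE, SHAPE, SCALE)   # many insertions
zero_ii_call = is_essential(0.0, RATE, SHAPE, SCALE)    # no insertions
```

      Raising `fold` makes the essential call more conservative, which is the sense in which the threshold is described as more stringent.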

      • Line 209: do you mean Q-value of <0.05 instead of >0.05 ? How is this Q value is calculated, and which specific tests are applied?

      Thank you for pointing out this Q-value error; we have now corrected this in the manuscript. These values were generated using the Bio-TraDIS tradis_comparison.R script, which uses the edgeR package. For further reading please see DOI: 10.1093/bioinformatics/btp616. The Q-values are P values corrected for multiple testing by the Benjamini-Hochberg method.
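
      For readers unfamiliar with the correction mentioned here, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic illustration of the method, not the code used by the cited scripts, and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q-values) in input order."""
    m = len(p_values)
    # Indices of the P values in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that Q-values stay monotonic
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical per-gene P values
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(p)
significant = [qi < 0.05 for qi in q]
```

      In this toy example two of the five genes pass a Q-value cut-off of < 0.05, although four have a raw P value below 0.05.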

      • Line 212: again, which type of test is used? What about the urine growth analysis? The same type of tests were applied?

      Thank you for bringing this to our attention; we have now indicated in the referenced methods section which package was used for which dataset (i.e. urine or serum). Line 212 refers to our use of the AlbaTraDIS package, which builds on the Bio-TraDIS toolkit, to identify gene commonalities/differences between the selected growth conditions, again with correction for multiple testing by the Benjamini-Hochberg method. For further reading, please refer to DOI: 10.1371/journal.pcbi.1007980.

      • Line 226: do the authors mean Sanger sequencing instead of SangerSanger sequencing?

      Thank you, we have now corrected this in the manuscript.

      • Line 239: does the WT strain contain another marker for differentiating this strain from the mutant? Or is the calculation of the number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics? The former will be a more accurate method.

      The calculation was based on the latter assumption, “number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics”. We have now updated the methods section to make this clearer.
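
      The subtraction described above can be written out as follows. The competitive index formula is a commonly used convention and is an assumption here, since the response does not state how the ratio was computed; all CFU values are hypothetical.

```python
def wild_type_cfu(total_cfu, resistant_cfu):
    """WT CFU = CFU on antibiotic-free medium minus CFU on antibiotic medium."""
    if resistant_cfu > total_cfu:
        raise ValueError("resistant CFU cannot exceed total CFU")
    return total_cfu - resistant_cfu

def competitive_index(total_in, mutant_in, total_out, mutant_out):
    """(mutant/WT at output) divided by (mutant/WT at input); assumed convention."""
    wt_in = wild_type_cfu(total_in, mutant_in)
    wt_out = wild_type_cfu(total_out, mutant_out)
    return (mutant_out / wt_out) / (mutant_in / wt_in)

# Hypothetical plate counts: 1:1 inoculum, mutant outcompeted tenfold
ci = competitive_index(total_in=1000, mutant_in=500,
                       total_out=1100, mutant_out=100)
```

      Because the WT count is obtained as a difference of two plate counts, its error grows when the mutant makes up most of the population, which is why the reviewer notes that a separate WT marker would be more accurate.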

      • Line 266: can you indicate approximately how many CFUs you have in this OD?

      Thank you, we have now also indicated an approximate CFU for this mentioned OD600 (OD600 1 = 7 × 10⁸ cells).

      • Line 309: besides indicating Figure 1D please indicate here Dataset S1 (the table where one can see the list of essential and non-essential genes). This table is shown afterwards but I think it will be more appropriate to show it at the beginning of the section.

      Thank you, we have now taken on this recommendation and have now edited the manuscript to also indicate Dataset S1 earlier.

      • Table 3. regarding the comparison of essential genes between different strains. I think it will be more clear if a Venn diagram was drawn including only genes that have homologs in all the studied strains (i.e. defining the core genome essentially).

      We would like to thank the reviewer for suggesting a Venn diagram and have now removed Table 3, which has been replaced with a new Figure 3.

      • Line 461: replicates were combined for downstream analyses? But are replicates combined for doing the statistical analysis? If so, how is the statistical analysis performed? How is it taken into account the potential variability in the abundance in each library? An r of 0.9 is high but not perfect.

      Technical replicates of the sequenced input library were combined, following identification of a correlation coefficient of r² = 0.955, for the calculation of insertion index scores (IIS) used in gene essentiality analysis. While r² = 0.955 is not perfect, discrepancies here can be attributed to higher variance in insertion index scores when sampling small genes, as these are represented by fewer insertions and the stochastic absence of a single insertion event has a greater effect on the overall IIS. Replicate data were not pooled for statistical analysis of mutant fitness (growth in urine and serum).
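
      A minimal sketch of this pooling step is given below. It assumes the insertion index is the number of insertions in a gene divided by gene length, and it pools replicates by summing counts, which is a simplification; the gene lengths and counts are hypothetical.

```python
import math

def insertion_index(insertions, gene_length_bp):
    """Insertions normalised by gene length (assumed definition of the IIS)."""
    return insertions / gene_length_bp

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: four genes, two technical replicates
gene_lengths = [1000, 500, 2000, 300]
rep1 = [50, 20, 110, 0]
rep2 = [55, 18, 100, 1]

iis1 = [insertion_index(c, n) for c, n in zip(rep1, gene_lengths)]
iis2 = [insertion_index(c, n) for c, n in zip(rep2, gene_lengths)]
r_squared = pearson_r(iis1, iis2) ** 2

# Pool replicates by summing counts before normalising (simplification)
pooled_iis = [insertion_index(a + b, n)
              for a, b, n in zip(rep1, rep2, gene_lengths)]
```

      The 300 bp gene shows the small-gene effect described above: a single insertion present in one replicate and absent in the other moves its score from zero to a non-zero value.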

      • Line 487: is there any control strain containing the kanamycin gene in a part of the genome that does not affect the growth of K. pneumoniae? This could be used to show that having the kanamycin gene does not provide any defect in urine growth.

      We thank the reviewer for this suggestion but argue that introduction of the kanamycin gene into each unique locus may result in various levels of gene fitness that would be incomparable to a single control strain. Instead, we culture the ECL8 mutant library in urine and ensure that its kinetics are comparable to the wildtype. As the library contains thousands of kanamycin cassettes uniquely positioned across most of the genome with no observable growth defect, we do not anticipate the presence or expression of the cassette to have an appreciable impact.

      • Line 569: in the methodology it was indicated that control cells were incubated in PBS for the same amount of time. I think this is an important control that is not cited in the results section. Please can you indicate?

      We apologise for this misunderstanding due to how the methodology was written. The experiment did not sequence the PBS-incubated samples, as these were used solely as a check for viability of the K. pneumoniae ECL8 stock solution.

      • Line 597: "Mutants in igaA are enriched in our experiments". Can you show this data?

      We have now included this as a supplementary figure (Figure S11A).

      • Line 615: when doing this calculation, I guess the authors take into account only genes that are also present in the other strains.

      That is correct, we were aiming to highlight the high conservation of “essential genes” among all the selected strains.

      • Line 627: why surprisingly? Because is too low. Then indicate.

      Thank you, we have now edited this sentence to indicate that.

      • Figure 4: please, for clarity, can you indicate the meaning of the colors in the figure itself besides indicating it in the figure legend?

      Thank you, we have now included a color legend in these figure panels for clarity.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      Summary:

      The receptor tyrosine kinase Anaplastic Lymphoma Kinase (ALK) in humans is nervous system expressed and plays an important role as an oncogene. A number of groups have been studying ALK signalling in flies to gain mechanistic insight into its various roles. In flies, ALK plays a critical role in development, particularly embryonic development and axon targeting. In addition, ALK was also shown to regulate adult functions including sleep and memory. In this manuscript, Sukumar et al. used a suite of molecular techniques to identify downstream targets of ALK signalling. They first used targeted DamID, a technique that involves fusing a DNA methylase to RNA polymerase II, so that GATC sites in close proximity to PolII binding sites are marked. They performed these experiments in wild type and ALK loss of function mutants (using an Alk dominant negative, AlkDN), to identify Alk responsive loci. Comparing these loci with a larval single cell RNAseq dataset identified neuroendocrine cells as an important site of Alk action. They further combined these TaDa hits with data from RNA seq in Alk Loss and Gain of Function manipulations to identify a single novel target of Alk signalling - a neuropeptide precursor they named Sparkly (Spar) for its expression pattern. They generated a mutant allele of Spar, raised an antibody against Spar, and characterised its expression pattern and mutant behavioural phenotypes including defects in sleep and circadian function.

      Strengths:

      The molecular biology experiments using TaDa and RNAseq were elegant and very convincing. The authors identified a novel gene they named Spar. They also generated a mutant allele of Spar (using CrisprCas technology) and raised an antibody against Spar. These experiments are lovely, and the reagents will be useful to the community. The paper is also well written, and the figures are very nicely laid out making the manuscript a pleasure to read.

      We thank the reviewer for this analysis.

      Weaknesses:

      The manuscript has improved substantially in the revision. Yet, some concerns remain around the genetics and behavioural analysis which is incomplete and confusing. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele and of RNAi knockdown of Spar. The RNAi knockdown is a welcome addition. However, the authors only show one parental control the GAL4 / +, but leave out the other parental control i.e. the UAS RNAi / + e.g. in Fig. 9. It is important to show both parental controls.

      We would like to express our gratitude for your insightful comments and feedback on our manuscript. We acknowledge the concerns raised regarding the genetics and behavioural analysis, and we appreciate the opportunity to address these issues. We have added the reciprocal UAS Spar-RNAi control in addition to the GAL4/+ control and we have incorporated both controls in the revised Figure 9, Figure 9 Supplementary Figure 1 and Figure 9 Supplementary Figure 2. Figure legends have been modified accordingly.

      Further, the sleep and circadian characterisation could be substantially improved. It is unclear how sleep was calculated - what program was used or what the criteria to define a sleep bout was.

      The data were analysed using an Excel macro, as outlined in the study by Berlandi et al. (2017) (PMID: 28912696). As previously indicated in the methodology, sleep is characterized as 5 minutes of inactivity. The raw data acquired from the TriKinetics DAM system were input into an Excel spreadsheet, and the parameters, encompassing sleep and activity, were computed for each day of the trial as an average derived from the data of all living animals at that time. Subsequently, these parameters were plotted over the course of the experiment. We have further detailed this part in the methods section to avoid confusion (Page 32 of revised MS).
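
      The sleep criterion stated here (at least 5 minutes of inactivity) can be illustrated with a short sketch. This is not the Excel macro used in the study; it assumes activity is recorded as beam-crossing counts in 1-minute bins, and the example trace is hypothetical.

```python
def sleep_summary(counts_per_min, min_bout=5):
    """Return (total sleep minutes, number of sleep bouts).

    A sleep bout is a run of at least `min_bout` consecutive
    1-minute bins with zero activity counts.
    """
    total_sleep = 0
    bouts = 0
    run = 0
    # A trailing non-zero sentinel closes a run that reaches the end
    for count in list(counts_per_min) + [1]:
        if count == 0:
            run += 1
        else:
            if run >= min_bout:
                total_sleep += run
                bouts += 1
            run = 0
    return total_sleep, bouts

# Hypothetical trace: 6 quiet min, activity, 4 quiet min, activity, 5 quiet min
trace = [0] * 6 + [2] + [0] * 4 + [1] + [0] * 5
total_sleep, n_bouts = sleep_summary(trace)
```

      Under this definition the 4-minute quiet period does not count as sleep, which is why sleep is usually summarised over bins longer than the 5-minute criterion itself, such as the 30 or 60 minute bins the reviewer mentions.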

      In the legend for Fig 8c, it says sleep was shown as "percentage of time flies spend sleeping measured every 5min across a 24h time span". Sleep in flies is (usually) defined as at least 5 min of inactivity. With this definition, I'm not sure how one can calculate the % time asleep in a 5 min bin! Typically people use 30min or 60min bins.

      We thank the reviewer for bringing this to our attention. As previously stated, in our experiments, sleep is defined as 5 minutes of inactivity. We have now modified the wording in the figure legend (Figure 8, Page 41), which was previously misleading.

      The sleep numbers for controls also seem off to me e.g. in Fig. 8H and H' average sleep / day is ~100. Is this minutes of sleep? 100 min / day is far too low, is it a typo? The same applies to Figure 8, figure supplement 2. Other places e.g. Fig 8 figure supplement 1, avg sleep is around 1000 min / day.

      The numbers for sleep bouts are also too low to me e.g. in Fig 9 number of sleep bouts avg around 4, and in Fig. 8 figure supplement 2 they average 1 sleep bout. There are several free software packages to analyse sleep data (e.g. Sleep Mat, PMID 35998317, or SCAMP). I would recommend that the authors reanalyse their data using one of these standard packages that are used routinely in the field. That should help resolve many issues.

      We thank the reviewer for pointing this out. There was indeed a typo “missing 0”, resulting in 0 values as only 3 days of raw data were chosen for the analysis of the average sleep in the mentioned figures. We have corrected this mistake in all figures.

      The circadian anticipatory activity analyses could also be improved. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827). This typically computed as the ratio of activity in the 3hrs preceding light transition to activity in the 6hrs preceding light transition. The programs referenced above should help with this.

      For consistency, we used the same Excel macro (Berlandi et al., 2017) (PMID: 28912696) and followed the methodology of Harrisingh et al. (PMID: 18003827) to assess the anticipatory activity. We selected the activity in the 6 h period before lights on and defined it as a.m. anticipation, and the activity in the 6 h period preceding lights off and defined it as p.m. anticipation (Figure 8 f-g).
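
      The ratio described by the reviewer (activity in the 3 h before a light transition divided by activity in the 6 h before it) can be sketched as follows. The hourly binning, the function name and the activity values are assumptions made for illustration only.

```python
def anticipation_index(hourly_activity, transition_hour):
    """Activity in the 3 h before a light transition divided by
    activity in the 6 h before it.

    `hourly_activity` holds one activity total per hour and
    `transition_hour` is the index of the first hour after the transition.
    """
    if transition_hour < 6 or transition_hour > len(hourly_activity):
        raise ValueError("need 6 full hours before the transition")
    last_six = hourly_activity[transition_hour - 6:transition_hour]
    total = sum(last_six)
    if total == 0:
        return float("nan")
    return sum(last_six[3:]) / total

# Hypothetical hourly counts with activity ramping up before the transition
ramping = [10, 10, 10, 20, 30, 40]
flat = [5, 5, 5, 5, 5, 5]
ramping_index = anticipation_index(ramping, 6)
flat_index = anticipation_index(flat, 6)
```

      A value of 0.5 corresponds to evenly distributed activity, and values above 0.5 indicate that activity is concentrated in the hours just before the transition.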

      Finally, in many cases I'm not sure that the appropriate statistical tests have been used e.g. in Fig 8c, 8e, 8h t-tests have been used when there are three groups in the figure. The appropriate test here would be an ANOVA, followed by post-hoc comparisons.

      We agree with the reviewer’s comments. We have re-evaluated the data in Figure 8 b, c, e, h and h’ and Figure 8 Supplement 2 and 4 using a One-Way ANOVA followed by Tukey post-hoc test and we have indicated this in all legends.
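
      To show why a three-group comparison calls for an ANOVA rather than repeated t-tests, the F statistic of a one-way ANOVA can be computed directly. The sketch below stops at the F statistic and its degrees of freedom; the P value and the Tukey post-hoc comparisons would in practice come from a statistics package. The group values are hypothetical.

```python
def one_way_anova_f(*groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three genotypes
control_a = [1.0, 2.0, 3.0]
control_b = [2.0, 3.0, 4.0]
mutant = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f(control_a, control_b, mutant)
```

      A single F test across all groups keeps the overall false-positive rate at the chosen level, and the post-hoc test then identifies which pairs differ.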

    1. Author response:

      We kindly thank the senior editor, the reviewing editor, and the esteemed reviewers for their invaluable insights in enhancing our manuscript. The assessment and feedback, particularly on the role of directly released bacterial ATP versus OMV-delivered bacterial ATP and its role on neutrophils, addressing study limitations, and discussing our models is highly appreciated.

      The points you raised let us critically rethink our approach, our results, and our conclusions. Furthermore, it gave us the chance to elaborate on some critical aspects that you mentioned. With your help, we will make clarifications throughout the manuscript, and we will add the data about neutrophil numbers in the different organs (reviewer #1, weaknesses #3).

      Reviewer #1 (Public Review):

      Summary:

      • Extracellular ATP represents a danger-associated molecular pattern associated with tissue damage and can also act in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect possibly conditioning the outcome of sepsis, namely the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth-dependent and strain-specific manner. However, whether this bacteria-derived ATP plays a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant strains of E. coli have been used, first to correlate bacterial ATP release with growth and then with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To extrapolate the function of bacterial ATP from the systemic response to microorganisms, the authors exploited bacterial OMVs either loaded or not with ATP to investigate the systemic effects devoid of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      • A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      • As pointed out in the limitations of the study whether ATP-loaded OMVs provide a mechanistic proof of the pathogenetic role of bacteria-derived ATP independently of live microorganisms in sepsis is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment and might activate purinergic receptors, which then leads to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. Also, the role of OMV-derived bacterial ATP should be assessed locally as well as the importance of directly released vs. OMV-mediated bacterial ATP dissected locally. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph will be added to the manuscript, in which we discuss this particular issue.

      • Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the mentioned discrepancy. What we propose is that bacterial ATP exhibits different functions that are dependent on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and impacts on neutrophil function. We agree that, in particular, in the peritoneal cavity, both effects may play a role. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophils are decreased locally because of directly released bacterial ATP, which prevents efficient infection control and, therefore, impairs sepsis survival. In addition, these fewer neutrophils might even be dysregulated by the engulfment of bacterial ATP delivered via OMV, which leads to an upregulated and possibly aberrant degranulation process worsening local and remote tissue damage. We agree that in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or neutrophil cell lines as well as degranulation assays.

      • Are the neutrophils counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity, where we have both directly released bacterial ATP and OMV-derived bacterial ATP. We did, however, assess such a putative difference in the systemic organs and the blood, where we did not find any differences in neutrophil numbers. We will include the figure in the revised manuscript as Figure 6-figure supplement 3C.

      Author response image 1.

      • A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that there are several open questions that remain to be elucidated, in particular, to differentiate the local role of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Such future research avenues might be to investigate the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles are relevant to understand the different mechanisms of how bacterial ATP alters sepsis. However, such experiments should be ideally performed in systems where the source and the delivery of ATP can be modulated locally.

      • Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lung and the respiratory system are among the clinically most important organs affected during sepsis resulting in a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      • In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      • The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper implies a concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary depending on the individual patients. It would be beneficial if the authors could specify whether they've verified the existence of unculturable species capable of secreting high levels of Extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly either to the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus or E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We validated this well documented existing evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, the modulation of ATP generation or loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically secreted by many cells of the host in active and passive manners in the case of any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses between germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of Extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of Extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequent predominance of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). Also, it has been shown that purinergic signaling modulates infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. Having established the general principles and fundamentals of bacterial ATP in our study, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than Extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to utilize them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived Extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well established, albeit with contradictory results (Csóka et al., 2015; Ledderose et al., 2016). These conflicting data were the rationale for testing the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis, when bacteria invade the sterile compartment and before an efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that Extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes of immune cells within the peritoneal cavity caused by Extracellular ATP released from bacterial death or by OMVs?

      This is a relevant question that was also asked by reviewer #1, and we answered it in detail above (weaknesses comment #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that locally, the role of directly released bacterial ATP (extracellular) predominates over OMV-derived bacterial ATP. Furthermore, the mechanisms of action of directly released bacterial ATP and OMV-derived bacterial ATP (contained within OMV, engulfed, and transported to the endolysosomal compartment) differ, and extracellular ATP in particular has been described to lead to apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were used only for the bacterial cultures in Figure 1, Figure 2, and Figure 3; these gave similar results with very small standard deviations, indicating robustness. In the in vitro experiments in Figure 5, we used a sample size of 6 (three biological replicates measured in technical duplicates), since we saw larger deviations in our measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), 10.1128/iai.00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1), Article 1. https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewing Editor's comments:

      There appear to be several mistakes or missing details in the additional statistical analyses reported in their response to Reviewer #1’s comments:

      (1) Detecting differentially expressed genes (DEGs):

      Reviewer #1 suggested adding an interaction term between sex and environment (ethnicity) in identifying DEGs. The authors performed ANCOVA analysis with sex and ethnicity as covariates (but not the interaction) and found sex explained more variance. This is not what the reviewer asked for, and the results do not help identify DEGs.

      We understand the reviewer’s suggestion to identify DEGs using a sex × ethnicity interaction. However, we could not find an appropriate tool for such an analysis, although we searched the literature carefully. It should be noted that interaction analysis between sex and environment was designed only for genotype data rather than gene expression data. Besides, considering that we already include multiple covariates in our DEG detection, adding an interaction term between sex and environment (ethnicity) would make the model too complex to fit using current tools. As an alternative, we fitted a linear regression model to test the contribution of sex to DEG detection in the revision (see details below). We would appreciate it if the reviewer could point us to any available tools or previous studies that conduct interaction analysis for DEG identification.

      (2) Overlap between DEGs and genes under positive selection in Tibetans (TSNGs)

      The authors claimed that the overlaps are significantly enriched in the "sex-combined" set (p=0.048) and the "male-only" set (p=9e-4), but it seems that the authors calculated the p-values incorrectly. Based on the histogram shown in Fig 3R (left panel), at least 750 out of 10,000 permutations led to 4 genes in overlap and there are additional permutations with 5 or more genes in overlap, so the p-value for the sex-combined set cannot be 0.048. In addition, the permutation procedure is somewhat questionable: it is unclear whether randomly sampling 192 genes from the human genome is a reasonable choice, without matching for relevant gene features.

      As we explained in the response to Reviewer-1, we agree with the reviewer’s point that random sampling of genes in permutation should be extracted from genes expressed in each tissue rather than the entire genome. Based on this updated random sampling procedure, we redid the analysis, and our previous conclusions remain unchanged.

      (3) Polygenic adaptation signal based on eQTL information:

      The PolyGraph method is designed for highly polygenic traits with causal variants spread across the genome. However, the genetic architecture of the expression of a gene is much less polygenic, with at most a few cis-eQTLs per gene, so the PolyGraph model does not apply to the expression of individual genes. On the other hand, eQTLs for different genes are associated with different "traits", so they cannot be simply aggregated together for PolyGraph analysis. Based on the Methods description, it is unclear how the authors ran the PolyGraph analysis on eQTLs in practice and whether this approach is appropriate for detecting a polygenic adaptation signal on gene expression.

      We understand the reviewer’s concern about the polygenic adaptation analysis. In this study, we tested whether the estimated polygenic scores from eQTLs (estimated using sums of allele frequencies at independent eQTLs weighted by their effect sizes) were significantly enriched in Tibetans compared to other populations. A detailed description of the polygenic test is provided in the response to Reviewer-1.

      Reviewer #1 (Public Review):

      The revised manuscript now presents 1) a permutation-based test for the significance of the overlap between DEGs and genes with positive selection signals in Tibetans, and 2) a polygenic adaptation test for the eQTLs. My detailed suggestions are below:

      Major Comments

      (1) My previous concern regarding the DEG analysis remains unresolved. Although the authors agreed in their response that the difference between the male- and female-specific DEGs is insufficient to explain the difference between sex-combined and sex-specific DEGs (Figure S6), the results section still states the opposite pattern between males and females as a decisive reason for the difference (p. 9, lines 236-239). Again, I would recommend that the authors test alternative analyses that boost statistical power for DEG detection, rather than simply splitting the data into males and females and analyzing each subset. For example, the authors may consider gene-by-environment interaction analysis schemes, here treating biological sex as an environmental factor.

      To evaluate the effect of sex on gene expression in each layer, we adopted two strategies: 1) calculating the variance in the expression data explained by sex; and 2) evaluating the statistical significance of the association between sex and expression.

      Firstly, we observed a significantly higher variance explained by sex than by ethnicity in six layers of the placenta (see details in our previous response to reviewers).

      Then, we fitted a linear regression model to test whether sex affects gene expression. For each gene, a linear regression model was fitted using the R glm function with sex as the covariate: glm(gene expression ~ sex). We discovered 5,865 genes significantly associated with sex, most of which were located on the sex chromosomes. We observed that 62.63% of genes overlapped with those showing opposite differential directions between the sex-combined and the sex-specific analyses.
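
      As a rough illustration of the per-gene model described above, the sketch below fits expression ~ sex by ordinary least squares for a single gene, which is what R's glm does with its default Gaussian family. It is written in Python rather than R; the function name `regress_on_sex`, the 0/1 coding of sex, and the toy values are assumptions made for illustration and are not the authors' code or data. Converting the t statistic into a p-value requires a t distribution with n - 2 degrees of freedom and is left out here.

```python
from math import sqrt


def regress_on_sex(expression, sex):
    """Fit expression ~ sex by ordinary least squares for one gene.

    `sex` is coded 0/1 (hypothetical coding). Returns the intercept,
    the slope (mean difference between the two groups), and the
    t statistic of the slope.
    """
    n = len(expression)
    mean_x = sum(sex) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in sex)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sex, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(sex, expression))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


if __name__ == "__main__":
    # Toy data: three samples coded 0 and three coded 1 (made-up values).
    toy_expression = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
    toy_sex = [0, 0, 0, 1, 1, 1]
    print(regress_on_sex(toy_expression, toy_sex))
```

      With a single binary covariate, the slope is simply the difference between the two group means, so this model tests a main effect of sex and does not include a sex × ethnicity interaction term.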

      Considering that the opposite direction of DEGs is likely only one of the explanations for the discrepancy between the sex-combined and the sex-specific DEGs, and that there might be alternative mechanisms for this phenomenon, we have toned down the description of this point in the revised manuscript:

      “Considering 62.63% of DEGs (248/396) with an opposite direction of between-population expression divergence in males and females, respectively (Figure S6), we reckon that there might be other factors such as sample size or cell composition affecting the identification of DEGs, which could cancel out the differences in the sex-combined analysis.” (Page 9)

      (2) Multiple testing schemes are still sub-optimal in some cases. Most notably, for the p-values in the WGCNA analysis (p. 11), the authors corrected for the number of traits (n=12) after adjusting for the correlation between them. However, they did not mention whether they accounted for the number of modules they tested (n=136 and 161 for males and females, respectively). Whether they account for the number of modules will make a substantial difference in the significance threshold. Please incorporate and describe a proper multiple testing scheme for this analysis.

      We understand the reviewer’s point. Indeed, for the multiple testing schemes, we considered both the number of traits and the number of modules. For the number of modules, multiple testing correction is already embedded in WGCNA, as described in published studies (Li et al. 2018; Zeng et al. 2023).
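
      To give a sense of scale for the reviewer's point, the sketch below compares a plain Bonferroni threshold that counts only the 12 traits with one that counts every trait-by-module test, using the module counts quoted in the comment (136 and 161). Plain Bonferroni and alpha = 0.05 are assumptions chosen for illustration; the sketch does not reproduce the authors' correlation-adjusted scheme and makes no claim about what WGCNA applies internally.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a plain Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05   # assumed family-wise error rate
    n_traits = 12  # number of traits quoted in the review
    for label, n_modules in (("males", 136), ("females", 161)):
        traits_only = bonferroni_threshold(alpha, n_traits)
        traits_by_modules = bonferroni_threshold(alpha, n_traits * n_modules)
        print(f"{label}: traits only = {traits_only:.2e}, "
              f"traits x modules = {traits_by_modules:.2e}")
```

      Counting modules as well as traits tightens the per-test threshold by a factor equal to the number of modules, which is why the choice needs to be stated explicitly.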

      (3) Evidence for natural selection on the observed DEG pattern is still weak and not properly described.

      (1) For the overlap between DEGs and TSNGs, the authors introduced a permutation-based test, but used a total set of genes in the human genome as a comparison set (p. 25, lines 699-700). I believe that the authors should sample random sets of genes from those already expressed in each tissue to make a fair comparison.

      We agree with the reviewer’s point that random sampling of genes in permutation should be extracted from genes expressed in each tissue, which is a fair comparison between the observed and the simulated counts of the overlapped genes.

      Therefore, for each permutation, we randomly extracted 192 genes from all the placenta-expressed genes identified from the seven layers (17,284 genes in total), overlapped them with the DEGs of the three sets (female + male, female only, and male only), and counted the overlapping genes. After 10,000 permutations, we constructed a null distribution for each set and found that the overlaps between DEGs and TSNGs were significantly enriched in the “sex-combined” set (p-value = 0.0123) and the “male-only” set (p-value < 1e-4), but not in the “female-only” set (p-value = 0.0572) (Figure R1). This result suggests that the observed DEGs are significantly enriched in TSNGs compared with randomly sampled gene sets, especially for the DEGs from the “male-only” set.
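
      The permutation procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' script: the function name `permutation_overlap_pvalue`, the toy gene identifiers, and the add-one correction in the empirical p-value (which keeps the estimate from being exactly zero) are all choices made here for illustration.

```python
import random


def permutation_overlap_pvalue(background, selected_set, deg_set,
                               n_perm=10_000, seed=0):
    """Empirical p-value for the overlap between a selected gene set and DEGs.

    Random gene sets of the same size as `selected_set` are drawn from
    `background` (for example, all genes expressed in the tissue), and the
    overlap of each random set with `deg_set` is compared with the
    observed overlap.
    """
    rng = random.Random(seed)
    pool = sorted(background)
    observed = len(selected_set & deg_set)
    k = len(selected_set)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, k)
        overlap = sum(1 for gene in sample if gene in deg_set)
        if overlap >= observed:
            hits += 1
    # Add-one correction (an assumption of this sketch).
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy universe of 1,000 hypothetical genes; 100 are "DEGs".
    background = [f"g{i}" for i in range(1000)]
    degs = set(background[:100])
    # 20 "selected" genes, 10 of which are DEGs by construction.
    selected = set(background[:10]) | set(background[500:510])
    print(permutation_overlap_pvalue(background, selected, degs))
```

      The p-value is the fraction of random sets whose overlap is at least as large as the observed one, so a histogram in which many permutations reach the observed count implies a correspondingly large p-value.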

      Author response image 1.

      The distribution of 10,000 permutation tests of counts of the overlapped genes between 192 TSNGs and the DEGs randomly selected from the expressed genes in the placenta. The red-dashed lines indicate the observed values based on the randomly selected DEGs.

      (2) The entire PolyGraph analysis for polygenic adaptation is poorly described. The current version of the Methods does not clarify i) for which genes the eQTLs are discovered, ii) how the authors performed the eQTL analysis, iii) how the authors polarized the effect, and iv) how they set up a comparison between the eQTLs and the others.

      Considering that the placental RNA-seq data mostly represent the transcriptomes of the newborns, according to our analysis of the maternal-fetal composition of each dissected layer, we conducted the eQTL analysis using the fetal genotypes and the placental tissue gene expression data (TPM) with the R package MatrixEQTL (https://github.com/andreyshabalin/MatrixEQTL), taking altitude and maternal age as covariates. We used a window of 1 Mb upstream and 1 Mb downstream of each SNP to select the genes or expression probes to test. Associations for these SNP–gene combinations were calculated using a linear model. This tool can distinguish local (cis-) and distant (trans-) eQTLs. We performed separate corrections for multiple testing.

      Finally, we detected 5,251 eQTLs (involving 319 eGenes), comprising the SNPs significantly associated with gene expression (p-value < 5e-8). To identify signatures of polygenic selection in Tibetans using the eQTL information, we removed SNPs in linkage disequilibrium (r² > 0.2 in the 1000 Genomes Project) and obtained 176 independent eQTLs as input to PolyGraph (Racimo et al. 2018). The QB (Racimo et al. 2018) and QX (Berg and Coop 2014) frameworks are used in PolyGraph to determine whether the estimated polygenic scores exhibit more variance among populations than expected under genetic drift, using the summary statistics of the eQTL set.

      In this study, we focused on testing whether the estimated polygenic scores from eQTLs (estimated using sums of allele frequencies at independent eQTLs weighted by their effect sizes) were significantly enriched in Tibetans compared to other populations. The significance was evaluated by comparing to 10,000 sets of the control SNPs. Each set of control SNPs was randomly drawn from the genomic SNPs, and contained an equal number of SNPs as the eQTLs matched one-to-one by minor allele frequency.
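
      The logic of a frequency-matched control comparison can be sketched as follows. This is a deliberately simplified Python illustration and not the QX or QB statistic implemented in PolyGraph: it assumes each SNP has one frequency in the target population and one in a single reference population, matches controls on reference minor allele frequency in bins of 0.05, lets each control SNP inherit the effect size of the eQTL it is matched to, and uses a two-sided empirical p-value with an add-one correction. All function names and toy values are hypothetical.

```python
import random
from collections import defaultdict


def polygenic_score(freqs, effects):
    """Sum of allele frequencies weighted by effect sizes."""
    return sum(p * b for p, b in zip(freqs, effects))


def maf(p):
    """Minor allele frequency of an allele with frequency p."""
    return min(p, 1.0 - p)


def score_difference(snps):
    """Target-minus-reference score for (p_target, p_ref, beta) triples."""
    effects = [b for _, _, b in snps]
    target = polygenic_score([pt for pt, _, _ in snps], effects)
    reference = polygenic_score([pr for _, pr, _ in snps], effects)
    return target - reference


def matched_control_pvalue(eqtls, genome_snps, n_sets=10_000,
                           bin_width=0.05, seed=0):
    """Compare the eQTL score difference with MAF-matched control sets.

    eqtls: list of (p_target, p_ref, beta); genome_snps: list of
    (p_target, p_ref). Each control set has one SNP per eQTL, drawn from
    the same reference-MAF bin.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for pt, pr in genome_snps:
        bins[int(maf(pr) / bin_width)].append((pt, pr))
    observed = score_difference(eqtls)
    extreme = 0
    for _ in range(n_sets):
        control = []
        for _, pr, beta in eqtls:
            candidates = bins.get(int(maf(pr) / bin_width))
            if not candidates:
                raise ValueError("no control SNP available in this MAF bin")
            c_pt, c_pr = rng.choice(candidates)
            control.append((c_pt, c_pr, beta))
        if abs(score_difference(control)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_sets + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    genome = []
    for _ in range(5000):
        p_ref = rng.uniform(0.05, 0.95)
        p_target = min(1.0, max(0.0, p_ref + rng.gauss(0.0, 0.05)))
        genome.append((p_target, p_ref))
    # Five made-up eQTLs whose target frequency is shifted upward by 0.15.
    toy_eqtls = [(pr + 0.15, pr, 1.0) for pr in (0.1, 0.2, 0.3, 0.4, 0.6)]
    print(matched_control_pvalue(toy_eqtls, genome, n_sets=2000))
```

      Matching on allele frequency means the control sets share the frequency spectrum of the eQTLs, so an extreme observed score is less likely to be an artifact of that spectrum alone.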

      The PolyGraph result showed that Tibetans have a clear signature of polygenic selection on gene expression (Bonferroni-corrected p-value = 0.003, Figure S12). In other words, the alleles associated with gene expression (up-regulation or down-regulation) were specifically enriched in Tibetans, a signal of positive selection.

      Minor comments

      (1) In Figure S1, the amount of variance explained by PC1 and PC2 needs to be corrected. PC1 explains less variance than PC2 (0.11 vs 0.68%).

      It was a typing error that mixed up the variances between PC1 and PC2. We have corrected it in the revised version.

      (2) In the section "Sex-biased expression divergence ..." (p. 8), the authors are using the term "gender" instead of sex. Considering that they are talking about the biological sex of each infant, I believe that sex is a more appropriate term to be used than gender.

      Following the reviewer’s suggestion, we rephrased “gender” as “sex” in the revised manuscript to describe the biological differences between females and males.

      Reviewer #3 (Public Review):

      More than 80 million people live at high altitude. This impacts health outcomes, including those related to pregnancy. Longer-lived populations at high altitudes, such as the Tibetan and Andean populations show partial protection against the negative health effects of high altitude. The paper by Yue sought to determine the mechanisms by which the placenta of Tibetans may have adapted to minimise the negative effect of high altitude on fetal growth outcomes. It compared placentas from pregnancies from Tibetans to those from the Han Chinese. It employed RNAseq profiling of different regions of the placenta and fetal membranes, with some follow-up of histological changes in umbilical cord structure and placental structure. The study also explored the contribution of fetal sex in these phenotypic outcomes.

      A key strength of the study is the large sample sizes for the RNAseq analysis, the analysis of different parts of the placenta and fetal membranes, and the assessment of fetal sex differences.

      A main weakness is that this study, and its conclusions, largely rely on transcriptomic changes informed by RNAseq. Changes in genes and pathways identified through bioinformatic analysis were not verified by alternate methods, such as by western blotting, which would add weight to the strength of the data and its interpretations. There is also a lack of description of patient characteristics, so the reader is unable to make their own judgments on how placental changes may link to pregnancy outcomes. Another weakness is that the histological analyses were performed on n=5 per group and were rudimentary in nature.

      For the three weaknesses raised by the reviewer, here are our responses:

      (1) Considering that our conclusions largely rely on the transcriptomic data, we agree with the reviewer that more experiments are needed to validate the results. However, this study aimed mainly to provide a transcriptomic landscape of the high-altitude placenta and to characterize the gene-expression differences between native Tibetans and Han migrants. Exploring the molecular mechanism is not the main task of this study, and more validation experiments are warranted in the future.

      (2) Regarding the lack of description of patient characteristics, we in fact provided three levels of results on the placental changes in Tibetans: macroscopic phenotypes (higher placental weight and volume), histological phenotypes (larger umbilical vein walls and umbilical artery intima and media; lower syncytial knots/villi ratios), and transcriptomic phenotypes (DEGs and differential modules). Combined with previous studies, these placental changes suggest a better reproductive outcome. For example, placental volume shows a significant positive correlation with birth weight (R = 0.31, p-value = 2.5e-16); therefore, the larger placental volume of Tibetans is beneficial to fetal development at high altitude. In addition, the larger umbilical vein wall and umbilical artery intima and media of Tibetans can explain their adaptation in preventing preeclampsia.

      (3) Regarding the sample size of the histological analyses, we understand the reviewer’s concern that 5 vs. 5 samples is not a large sample for histological analysis. This is because it was difficult to collect high-altitude Han placenta samples: we obtained only 13 Han samples, from which we selected 5 infant-sex-matched samples.

      Minor point:

      I feel the authors have responded well to the other reviewer comments. However, I am disappointed that the authors did not address my comment related to the validation of their RNAseq data. In particular, they failed to add new data that verifies and supports their RNAseq findings on pathways affected. This is imperative as their conclusions are based solely on the RNAseq analysis. The only other comment I have is that they should add a description of all abbreviations, including those in the supplementary information (like Table S12).

      Regarding experimental validation of the transcriptome data, we understand the reviewer’s concern. However, as we mentioned before, this study aimed mainly to provide a transcriptomic landscape of the high-altitude placenta; exploring the molecular mechanism is not its main task, and more validation experiments are warranted in the future. We have toned down the description of the power of the transcriptomic data to explain biological differences and called for further functional validation in the future:

      “the transcriptome data is insufficient to explain the underlying molecular mechanisms of genetic adaptation in Tibetans. Future single-cell transcriptome analysis and functional validations of the candidate genes are warranted to reveal the responsible cell types and the molecular pathways.” (highlighted in Page 20)

      Regarding abbreviations, following the reviewer’s suggestion, we added descriptions of all abbreviations used in this study at the corresponding positions (Table S1 and S12).

      References

      Berg JJ, and Coop G (2014). A population genetic signal of polygenic adaptation. PLoS Genet 10(8): e1004412.

      Li J, et al. (2018). Application of Weighted Gene Co-expression Network Analysis for Data from Paired Design. Sci Rep 8(1): 622.

      Racimo F, Berg JJ, and Pickrell JK (2018). Detecting Polygenic Adaptation in Admixture Graphs. Genetics 208(4): 1565-1584.

      Zeng JF, et al. (2023). Functional investigation and two-sample Mendelian randomization study of neuropathic pain hub genes obtained by WGCNA analysis. Frontiers in Neuroscience 17.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors):

      (1) Since the data suggests that the degradation of Mecp2 is a crucial event in the exit from quiescence, gaining a better understanding of the underlying mechanism would improve the significance of the study. In this regard, the authors should take advantage of the serum stimulated degradation of Mecp2 (Fig. 3D) to identify the signaling pathway(s) required for the degradation.

      Thank you for this suggestion. To decipher the molecular mechanisms underlying Mecp2-regulated quiescence exit, we performed RNA-seq combined with ChIP-seq to identify the Mecp2-dependent transcriptome genome-wide during the early stage of liver regeneration (Figure S6C). There were 2658 Mecp2 direct target genes, of which 537 were PHx-activated and 2121 were PHx-repressed (Figure 6A). GO analysis showed that PHx-activated Mecp2 targets were highly enriched in proliferation-associated biological processes such as ribosome biogenesis, rRNA metabolic process, ncRNA metabolic process, and regulation of transcription by RNA polymerase I, whereas PHx-repressed Mecp2 targets were associated with several metabolic processes including carboxylic acid catabolic process, cellular amino acid metabolic process, fatty acid metabolic process, and steroid metabolic process (Figure 6B). These results suggest that Mecp2 plays a negative regulatory role during quiescence exit by activating metabolism-associated genes while repressing proliferation-associated genes in quiescent cells.

      Given the more rapid decay of Mecp2 at the protein level than at the mRNA level during the quiescence-proliferation transition, we speculated that Mecp2 is subject to posttranslational regulation. This hypothesis was supported by treatment with the proteasome inhibitor MG132, which attenuated the reduction of Mecp2 in quiescent cells after S.R. (Figure S5A). To identify the signaling pathways that regulate Mecp2 degradation during the G0/G1 transition, we performed immunoprecipitation followed by mass spectrometry (IP-MS) using an Mecp2 antibody in quiescent 3T3 cells treated with or without S.R. (Figure S5B). A total of 647 proteins were identified as putative Mecp2 interactors. We were particularly interested in the proteins involved in the proteasome-mediated ubiquitin-dependent protein catabolic process, which was one of the enriched Gene Ontology (GO) terms in the Mecp2 interactome (Table S1).

      (2) The authors suggest that Mecp2 downregulation accelerates the induction of pRb, which serves as a key marker for G0/G1 transition. However, their data only show increased magnitudes of the expression in Mecp2 downregulated cells at the timepoints when samples were collected (Figs. 2B and 4B). In the in vitro experiments, the authors should investigate earlier timepoints to demonstrate that induction of pRB during the quiescence exit occurs earlier in Mecp2 deficient cells compared to control cells. Likewise, a later induction of pRB in Mecp2 overexpression cells, in comparison to normal cells, should be demonstrated.

      Thank you for these valuable suggestions. Accordingly, we collected samples of cells re-entering the cell cycle at 30, 60, 90, and 120 minutes post-S.R. We examined pRb expression and found that phosphorylation of retinoblastoma protein (pRb) at Ser807/811 occurs earlier (about 90 minutes) in Mecp2-deficient cells than in control cells (Figure S4C). Compared to the EV, Mecp2 OE resulted in delayed induction of pRb (about 60 minutes) upon S.R. (Figure S4D). These data indicate that enhanced reduction of Mecp2 stimulates exit from quiescence.

      (3) There are three well-known phosphorylation sites in Mecp2, including S80, S229, and S423. As protein ubiquitination and degradation are often triggered by phosphorylation, it would be interesting to examine whether phosphorylation at these sites of Mecp2 is required for its downregulation during quiescence exit. This can be achieved using non-phosphorylate mutants of Mecp2.

      This is a very good question. Indeed, the 26S ubiquitin-proteasome system (26S UPS) is responsible for the breakdown of MeCP2 (PMID: 28394263, 28973632). In 2009, bona fide PEST (enriched in proline, glutamic acid, serine, and threonine) domains were identified, which are highly conserved across vertebrate evolution (PMID: 19319913). Consensus sequences enriched in PEST residues have been found to predispose the proteins containing them to rapid proteolytic degradation (PMID: 8755249, 2876518). In addition, phosphorylation within PEST motifs precedes ubiquitination of proteins (PMID: 15229225). One of the best-characterized sites of MeCP2 phosphorylation (S80) (PMID: 19225110) and one of the identified ubiquitination sites (K82/K99) (PMID: 22615490) both fall within one of these regions. It is also noteworthy that most of the MeCP2 phosphorylation sites were found in close proximity to potential ubiquitylation sites. For example, Rett syndrome missense mutations affect three of the ubiquitination sites (K82R, K135A, K256S) (PMID: 25165434), and S80 (within one of the PEST sequences) and K82 have been shown to be phosphorylated and ubiquitinated, respectively.

      Based on the above discussion, we propose the hypothesis that MeCP2 turnover during cell cycle re-entry is achieved by an initial phosphorylation signal (phosphorylation at S80, S229, or S421) that triggers the ubiquitination of a nearby lysine residue. We hope to address these issues and present the findings in future work. Thanks again for your professional suggestions.

      (4) It would be interesting if the authors could also examine the effect of altered expression of Mecp2 on the maintenance of quiescence. For example, whether the downregulation of Mecp2 sensitizes quiescent cells for entry of the cell cycle in response to serum stimulation or delays withdrawal from the cell cycle upon serum starvation or contact inhibition.

      Thank you for your suggestions. Cell cycle synchronization was induced by serum deprivation. When nutrients are exhausted, altered expression of Mecp2 has no statistically significant effect on the maintenance of quiescence, as analyzed by flow cytometry (Figure 4D and H). This suggests that altered expression of Mecp2 alone may not be sufficient for cell cycle exit. In the presence of growth factors or nutrients, loss of MeCP2 only accelerates the rate of cell cycle re-entry.

      Minor points:

      For Figs. 2D, 2H, and 2L, it would be more intuitive if the percentage of changes in liver index rather than the relative index values were used. Also, the values listed in the figures should start from time zero after partial hepatectomy rather than pre-surgery.

      Liver weight changes in correspondence with body weight. The liver index (the ratio of regenerated liver weight to body weight) is tightly regulated and depends on the metabolic demands of the organism. During liver regeneration, the re-establishment of liver volume after resection is regulated by the functional needs of the organism. Using the percentage of regenerated liver weight to body weight as a liver growth index can therefore reflect regenerative function. We also agree with the suggested form of data presentation, and the values listed in the figures have been modified in the revised version.

      Reviewer #2 (Recommendations for The Authors):

      My concerns are as follows:

      (1) The authors note that the decrease in Mecp2 protein levels was more pronounced than the decrease in mRNA levels, suggesting the presence of post-translational regulation of Mecp2 during the early stages of G0 exit. Could the decrease in MeCP2 levels be related to autophagy flux?

      Thank you for your valuable comments. We compared cell extracts from untreated and chloroquine-treated cells (to block lysosomal degradation). Chloroquine did not cause any accumulation of MeCP2 (Figure S5B). The results suggest that autophagy is not involved in the decrease in MeCP2 protein.

      (2) In addition to Cyclin D1, how about other cell cycle-related proteins (cyclin A, cyclin B, and cyclin E) were changed when MeCP2 was lost during cell cycle re-entry? Protein expression should be examined by western blot.

      We appreciate your valuable suggestions. The expression of the cell cycle-related proteins cyclin A2, cyclin B1, and cyclin E1 was evaluated by Western blotting. The expression of cyclin A2, cyclin B1, and cyclin E1 was enhanced by the knockdown of MeCP2 (Figure 4B). Conversely, the expression of cyclin A2, cyclin B1, and cyclin E1 was repressed by the over-expression of MeCP2 (Figure 4F).

      (3) By combining MeCP2 ChIP-seq and RNA-seq of genes regulated by MeCP2, the authors uncovered the dual role of Mecp2 in preventing quiescence exit by targeting Rara and Nr1h3. All they show are the Q-PCR results. The authors should show the protein level of Rara and Nr1h3 when MeCP2 was lost during cell cycle re-entry.

      Thank you for your advice. In Figure 7C, the knockdown efficiency of Rara and Nr1h3 was checked by Western blot analysis.

      (4) The authors performed lentiviral and AAV-mediated gene knockdown to target Rara and Nr1h3 in Cells and Mecp2-cKO livers, respectively. The Knockdown efficacy should be verified by western blots (Fig 7 C and F).

      In Figure 7F, the Rara and Nr1h3 knockdown efficiency was verified by Western blot analysis.

      (5) The other major concern is regarding the lack of quantitative assessments of MeCP2 WB results (Fig 2, Fig 4, and Fig 7).

      Thank you for this suggestion. We added supplementary figures for Figures 2B, 2F, and 2J to show the quantified membrane signal of MeCP2 protein during liver regeneration, and Fig S4A and 4B to show the quantified signal of MeCP2 protein in the NIH3T3 cell cycle re-entry model.

      (6) In the Figure legends of Fig 4 B and Fig 4F, the authors should delete the statistical descriptions, as there are no statistical results. In Fig 5F, Fig 5J, Fig 6D, Fig 7D and Fig7H, there are no statistical results of p < 0.01, p < 0.05 or *p < 0.0001, respectively. The authors should check the description in the figure legends. In Fig S2C, the level of significance should be annotated.

      We would like to express our heartfelt thanks for your thorough reading of our manuscript. We have made corrections to make the manuscript clearer and more accurate. The level of significance has been annotated in Fig S2C.

      (7) In Fig S4A, there are no WB results of Cyclin D1 and pRb, the authors should check the description.

      Thank you for pointing this out. We have deleted the confusing statements in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their constructive criticism. Based on these suggestions, we have thoroughly reworked the manuscript. The changes include, but are not limited to, the following:

      (1) We have corrected the mistakes mentioned by the reviewers on a point-by-point basis.

      (2) We have provided additional experimental evidence to explain the rationale behind selecting five miRNAs for q-PCR validation. Furthermore, we have elaborated on the reasons for focusing primarily on research related to cartilage.

      (3) In response to concerns regarding overinterpretation in the manuscript, we have made more precise descriptions and revisions. Furthermore, we have added some details in our methods, including the addition of results showing the conservation of miR-199b-5p sequences between human and mouse species.

      (4) We have provided additional details on the experiments, including the process for predicting target genes, timing of chondrocyte culture and other experimental operations.

      (5) Finally, we have made additional revisions to the details of the figures to avoid any distortions and enhance the precision of the language.

      Below please find our responses to the reviewers’ comments on a point-by-point basis. You can also track the changes in the modified manuscript. We believe that the revised manuscript has been substantially improved.

      eLife assessment

      The manuscript provides interesting evidence that miR-199b-5p regulates osteoarthritis and as such it may be considered as a potential therapeutic target. This finding may be useful to further advance the field.

      Thank you for your positive comments.

      Although the study is considered potentially clinically relevant, the evidence provided was deemed insufficient and incomplete to support the conclusions drawn by the authors.

      Thank you for your critical comments and constructive advice. We have responded point by point to the reviewers’ questions and thoroughly reworked our manuscript. We hope the revised manuscript meets the criteria for publication in eLife.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors observed that miR-199b-5p is elevated in osteoarthritis (OA) patients. They also found that overexpression of miR-199b-5p induced OA-like pathological changes in normal mice and inhibiting miR-199b-5p alleviated symptoms in knee OA mice. They concluded that miR-199b-5p is not only a potential micro-target for knee OA but also provides a potential strategy for the future identification of new molecular drugs.

      Thanks for your comment.

      Strengths:

      The data are generated from both human patients and animal models.

      Thanks for the positive comment.

      Weaknesses:

      The data presented in this manuscript is not solid enough to support their conclusions. There are several questions that need to be addressed to improve the quality of this study.

      The following questions need to be addressed to improve the quality of the study.

      (1) Exosomes were characterized by electron microscopy and western blot analysis (for CD9, CD63, and CD81). However, figure S1 only showed WB results for two samples, there are no positive or negative controls, and the WB figure is unclear.

      Thank you for your suggestion. We acknowledge that a comprehensive identification of extracellular vesicles should include both positive and negative samples. However, in some of the initial studies we referenced, positive and negative controls were not mentioned (refs. 1, 2). In our study, we identified extracellular vesicles using a combination of electron microscopy, nanoparticle tracking analysis, and detection of exosome markers. We agree that negative samples would make our results more convincing, and we will include a negative control group in our future experiments. Additionally, we have provided clearer images in the revised version (supplemental Fig. 1A).

      Reference

      (1) Ying W, Riopel M, Bandyopadhyay G, et al. Adipose Tissue Macrophage-Derived Exosomal miRNAs Can Modulate In Vivo and In Vitro Insulin Sensitivity. Cell. 2017;171(2).

      (2) Fang T, Lv H, Lv G, et al. Tumor-derived exosomal miR-1247-3p induces cancer-associated fibroblast activation to foster lung metastasis of liver cancer. Nature Communications. 2018;9(1):191.

      (2) The sequencing of miRNAs in serum exosomes showed that 88 miRNAs were upregulated and 89 miRNAs were downregulated in KOA patients compared with the control group based on fold change > 1.5 and p < 0.05. Figure 2 legend did not clearly elucidate what those represent and why the authors chose those five miRNAs to further validate although they did mention it with several words in line 108 'based on the p-value and exosomal'.

      In fact, our study included two additional groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks), in addition to the healthy control and knee osteoarthritis (OA) patient groups. After comparing these four groups, we found that 11 miRNAs (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 miRNAs (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 miRNAs are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 of them, namely the 5 miRNAs mentioned in this article. We have included this result in supplementary Fig. 5 (Fig. S5).

      Author response image 1.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.
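      The four-group selection described in this response can be read as a set operation: keep miRNAs that change in KOA, change in the opposite direction after acupuncture, and do not change in the waiting group. The following is a minimal sketch under that reading; the function name and the placeholder labels (`miR-A`, `miR-B`, `miR-C`) are hypothetical and are not data from the study.

```python
def reversed_by_treatment(changed_in_disease, reversed_after_treatment, changed_in_waiting):
    """Return items altered in disease, reversed after treatment,
    and unchanged in the untreated (waiting) group."""
    candidates = set(changed_in_disease) & set(reversed_after_treatment)
    return candidates - set(changed_in_waiting)


# Placeholder labels only; these are not results from the manuscript.
up_in_koa = {"miR-A", "miR-B", "miR-C"}
down_after_acupuncture = {"miR-A", "miR-B"}
changed_in_waiting_group = {"miR-B"}

selected = reversed_by_treatment(up_in_koa, down_after_acupuncture, changed_in_waiting_group)
print(sorted(selected))
```

      The same function would apply to the downregulated direction by passing the sets of miRNAs that are down in KOA and up after treatment.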

      (3) In Figure 3 legend and methods, the authors did not mention how they performed the cell viability assay. What cell had been used? How long were they treated and all the details? Other figure legends have the same problem without detailed information.

      Thank you for your suggestions. In Figure 3, cell viability was determined using the CCK-8 assay. We used second-generation chondrocytes for this analysis. The chondrocytes were obtained from young mice aged 3-5 days after birth. The cartilage tissues were extracted, and the cells were cultured in complete medium after digestion with collagenase. The detailed description of the cell viability assay, cell culture procedures, specific timing, and treatment methods of the cells used can be found in our revised manuscript. (page14-15,line304-313)

      Besides, we have made thorough revisions to all figure legends to provide a clearer explanation of the relevant content.

      (4) The authors claimed that Gcnt2 and Fzd6 are two target genes of miR-199b-5p. However, there is no convincing evidence such as western blot to support their bioinformatics prediction.

      In the current study, we first identified six potential target genes by intersecting the predicted targets obtained from six bioinformatics websites. Subsequently, q-PCR was employed to test all six genes, revealing two genes with significant changes, namely Fzd6 and Gcnt2. We then predicted the binding sites of these genes and validated their existence through luciferase assays. Moreover, we examined the expression of these two potential targets in human KOA samples using a human database and found them to be expressed specifically in the samples. These results suggest that Fzd6 and Gcnt2 are potential target genes for KOA. However, we did not perform western blotting to verify the results. Based on your suggestions, we have further discussed the limitations of our study in this regard and proposed future research strategies.

      (5) To verify the binding site on 3'UTR of two potential targets, the authors designed a mouse sequence for luciferase assay, but not sure if it is the same when using a human sequence.

      Thank you for your great advice. We carried out a comparative analysis of sequence conservation between human and mouse and found that the binding site on the 3'UTR matches the human sequence very well. The sequence conservation between hsa_miR-199b-5p and mmu_miR-199b-5p was as high as 95.65%. We have added the methods and results to the revised manuscript (page 9, lines 181-184; page 17, lines 361-365) (supplemental fig6).

      In detail: First, the sequence information of mmu_miRNA-199b-5p was used to locate the human homologous sequence in the UCSC database. The homologous sequence was found to be located in the human genome at chr9:128244721-128244830 (supplemental fig6 A). Based on this positional information and the source gene, a further comparison was conducted in miRbase to identify the nearest miRNA at this position in the human genome. It was discovered that hsa_miR-199b-5p is positionally conserved and located at chr9:128244721-128244830 (supplemental fig6 B). The sequence of hsa_miR-199b-5p was obtained from the miRbase database (supplemental fig6 C), and a comparative analysis was performed between the human and mouse sequences (supplemental fig6 D). Besides being positionally conserved, the sequence conservation between hsa_miR-199b-5p and mmu_miR-199b-5p was as high as 95.65%, indicating good sequence conservation.

      Author response image 2.

      (A) By using the sequence information of mmu_miRNA-199b-5p, we located the position of its human homologous sequence in the UCSC database. (B) Based on the positional information and the source gene, we further aligned this position with the closest miRNA in miRbase. (C) We compared the sequences of hsa_miR-199b-5p and mmu_miR-199b-5p. (D) Conservation analysis was performed to compare the sequence conservation of miR-199b-5p.
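      The reported 95.65% figure is consistent with one mismatch across a 23-nucleotide mature miRNA (22/23). The sketch below shows that arithmetic as a simple ungapped percent identity. The human sequence is the one quoted later in this response; the second sequence is a hypothetical single-mismatch variant built for illustration and is not taken from miRBase, and the authors' actual alignment method is not stated here.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# hsa-miR-199b-5p as quoted in this response (23 nt).
HSA_MIR_199B_5P = "CCCAGUGUUUAGACUAUCUGUUC"

# Hypothetical variant with exactly one substitution (illustration only,
# not the real mouse sequence).
one_mismatch_variant = HSA_MIR_199B_5P[:16] + "C" + HSA_MIR_199B_5P[17:]

print(round(percent_identity(HSA_MIR_199B_5P, one_mismatch_variant), 2))
```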

      Reviewer #2 (Public Review):

      Summary:

      The authors identified miR-199b-5p as a potential OA target gene using serum exosomal small RNA-seq from human healthy and OA patients. Their RNA-seq results were further compared with publicly available datasets to validate their finding of miR-199b-5p. In vitro chondrocyte culture with miR-199b-5p mimic/inhibitor and in vivo animal models were used to evaluate the function of miR-199b-5p in OA. The possible genes that were potentially regulated by miR-199b-5p were also predicted (i.e., Fzd6 and Gcnt2) and then validated by using Luciferase assays.

      We greatly appreciate Reviewer #2's constructive comments.

      Strengths:

      (1) Strong in vivo animal models including pain tests.

      (2) Validates the binding of miR-199b-5p with Fzd6 and binding of miR-199b-5p with Gcnt2.

      Thanks for the positive comment.

      Weaknesses:

      (1) The authors may overinterpret their results. The current work shows the possible bindings between miR-199b-5p and Fzd6 as well as bindings between miR-199b-5p and Gcnt2. However, whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p.

      In this study, we employed a comprehensive approach by integrating data from six bioinformatics databases to identify potential target genes for miR-199b-5p. Subsequent qPCR analysis revealed significant changes in two genes, Fzd6 and Gcnt2. We then utilized luciferase assays to validate the predicted binding sites and confirmed the interaction between miR-199b-5p and these genes. Additionally, we examined the expression profiles of these potential target genes in human KOA samples using a human database, which unveiled distinct expression patterns.

      While our findings suggest that Fzd6 and Gcnt2 may serve as potential target genes for miR-199b-5p, we acknowledge the necessity for further experimental validation and in-depth functional characterization. Building upon your insightful recommendations, we have thoroughly addressed the research limitations and proposed potential research strategies for future investigations in our discussion. (page11,line227-231)

      (2) In vitro chondrocyte experiments were conducted in a 2D manner, which led to chondrocyte de-differentiation and thus may not represent the chondrocyte response to the treatments.

      We acknowledge that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example in the study by Liu et al. (3). In addition, the second-generation primary mouse chondrocytes used in the current study did not exhibit a markedly dedifferentiated morphology. Therefore, given the experimental conditions in our lab, we used second-generation cultured primary mouse chondrocytes throughout the cell experiments. To show the reliability of the cells, we have provided more pictures in supplementary Fig. 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript. (page 11, lines 237-240)

      Author response image 3.

      Primary mouse chondrocytes that we cultured (P1) and the second-generation cells (P2) that we used in the subsequent experiments.

      Reference that used 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (3) There is a lack of description for bioinformatic analysis.

      We apologize for this oversight. We have added the relevant descriptions and details. (page 14, lines 299-303)

      (4) There are several errors in figure labeling.

      We have revised the figure labeling. (Fig. 3, Fig. 4, Fig. 5 and Fig. 7)

      Recommendations for the authors:

      We appreciate the reviewers' feedback as we believe it has significantly contributed to the refinement of our manuscript. We are confident that our revisions have strengthened the quality and impact of our study, and we agree that the suggestions presented by the reviewers are valuable and appropriate for publication.

      Reviewer #2 (Recommendations For The Authors):

      I would like to thank the authors for investigating the functional role of miR-199b-5p in knee OA. While this study has the potential to provide valuable knowledge to the fields of miRNAs and joint diseases, significant improvements in several areas are required.

      We appreciate your constructive comments, and we have made a substantial improvement to the manuscript. We thank all the reviewers for their advice as well as their criticisms.

      Major concerns:

      (1) According to the Authors, miR-199b-5p is identified by the results from their own miRNA-sequencing as well as comparison with other publicly available datasets (both synovium and cartilage datasets). It is unclear to me why the synovium dataset was used here as it appears that the entire manuscript was mainly focused on chondrocytes.

      Thank you for your question. As we are aware, cartilage degradation is the initial pathological change in knee osteoarthritis (KOA), which subsequently leads to other pathological changes such as synovial inflammation (4). These factors are interrelated, and current research on KOA encompasses cartilage, synovium, and systemic inflammation, among others. Therefore, when we identified a large number of dysregulated miRNAs in extracellular vesicles isolated from serum, it was crucial to determine whether these dysregulated miRNAs were also altered in cartilage or synovium. To address this, we compared our findings with publicly available databases and found a higher overlap with the cartilage cell dataset, including miRNA-199b. Consequently, we decided to focus our subsequent investigations on cartilage-related research.

      Reference

      (4) Hunter D, Bierma-Zeinstra S. Osteoarthritis. Lancet (London, England). 2019;393(10182):1745-1759.

      (2) Also, 169 of 177 differentially expressed exosome miRNAs were intersected with differentially expressed miRNAs from OA cartilage datasets. It is surprising that in the 5 selected miRNAs for further qRT-PCR validation, 3 out of 5 were not in the exosome miRNA dataset (i.e., hsa-mir-1296-5p, hsa-mir-15b-3p, and hsa-mir-338-3p; page 5, line 109 and Fig. 1B). Isn't that selecting the miRNAs that both differently expressed in exosome and cartilage datasets for validation more essential? Furthermore, from the Authors' exosome miRNA dataset, only 5 out of 15 KOA patients actually exhibited up-regulated miR-199b-5p vs. health controls. Please elaborate on how the target was determined.

      In fact, our study included two additional groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks), in addition to the healthy control and knee osteoarthritis (OA) patient groups. After comparing these four groups, we found that 11 miRNAs (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 miRNAs (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 miRNAs are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 of them, namely the 5 miRNAs mentioned in this article. We have included this result in supplementary Fig. 5 (Fig. S5).

      Author response image 4.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.

      (3) There is also a lack of description for bioinformatic analysis regarding how miRNA sequencing datasets were analyzed. What R/python packages or algorithms were used? What were the QC criteria?

      We apologize for any confusion caused. We have now included a clear description of the method employed, and R was utilized for this data analysis (revised in Page14, Line301-305). To ensure consistency, we compared our findings with publicly available human serum data from the database (GSE105027) using a fold change threshold of > 1.5 and a significance level of p < 0.05. In the cartilage data (GSE175961), we observed a list of miRNAs with shared expression patterns, yet the precise differential values could not be determined.
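      The thresholds quoted in this response (fold change > 1.5 and p < 0.05) can be expressed as a small classification rule. The sketch below is written in Python for illustration, although the authors state that R was used. It assumes the fold-change cutoff is applied symmetrically on the log2 scale (so a fold change below 1/1.5 counts as downregulated), which is a common convention but is not confirmed by the text; the function name and inputs are hypothetical.

```python
import math


def classify_mirna(fold_change: float, p_value: float,
                   fc_cutoff: float = 1.5, alpha: float = 0.05) -> str:
    """Label a miRNA as 'up', 'down', or 'ns' (not significant).

    Assumption: the cutoff is symmetric in log2 space, i.e.
    |log2(fold_change)| > log2(fc_cutoff) together with p < alpha.
    """
    if fold_change <= 0 or p_value >= alpha:
        return "ns"
    log2_fc = math.log2(fold_change)
    threshold = math.log2(fc_cutoff)
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return "ns"


# Made-up values for illustration; not taken from the sequencing data.
print(classify_mirna(2.0, 0.01))
print(classify_mirna(0.5, 0.01))
print(classify_mirna(2.0, 0.20))
```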

      (4) Another major concern is the chondrocyte culture method. Chondrocytes should be cultured in a 3D manner (i.e., a 3D pellet culture system or a micro mass culture method). 2D cultured chondrocytes tend to de-differentiate into MSC-like cells and thus lose their chondrocyte phenotype. This is evident from Fig. 3B and C. Cells started to spread out and only a few cells were positive for COL2A1 with a deep brown staining color. Thus, the results from the in vitro studies may not be representative of chondrocyte response to the treatments.

      We acknowledge that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example in the study by Liu et al. (3). In addition, the second-generation primary mouse chondrocytes used in the current study did not exhibit a markedly dedifferentiated morphology. Therefore, given the experimental conditions in our lab, we used second-generation cultured primary mouse chondrocytes throughout the cell experiments. To show the reliability of the cells, we have provided more pictures in supplementary Fig. 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript. (page 11, lines 237-240)

      Author response image 5.

      Primary mouse chondrocytes that we cultured (P1) and the second-generation cells (P2) that we used in the subsequent experiments.

      Reference that used 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (5) Page 7, lines 148-149: "The cartilage of mice injected with the miR-199b-5p mimic was slightly degraded (p=0.02) (Fig. 4E, F)". However, there was no significance between the groups found in Fig. 4F. Also, from the histological images of Fig. 4E, it looks like mice with inhibitor injection had more cartilage damage than miR-199b-5p mimic.

      We apologize for any confusion caused. Figures 4E and 4F show Safranin Fast Green staining of the joint after the administration of the miR-199b-5p inhibitor and mimic under physiological conditions. As you can see, there is minimal difference between these four images, and the difference is not statistically significant. However, in Figures 5E and 5F, the MIA-induced KOA model was used, and noticeable differences can be observed after the administration of the inhibitor and mimic. In the revised version, we have emphasized that Figures 4E and 4F represent the results under physiological conditions, not under the MIA-induced model. (page 7, lines 146-151)

      (6) Page 7, lines 149-150: "Additionally, the articular surface showed insect erosion (Fig. 4G)." It is also unclear how micro-CT analysis will be able to demonstrate the erosion of cartilage. Or the authors actually indicate the trochlear groove. However, this could also be observed in the control group and the results were not quantified. It is also unclear if the cross-section images of micro-CT shown here are helpful at all without any further explanation in the manuscript.

      Figure 4G represents the control, vehicle control, inhibitor, and mimic groups, while Figure 5G represents the model, model+vehicle control, model+inhibitor, and model+mimic groups. From Figure 4G, it can be observed that the mimic group showed the most obvious erosion appearance, while the inhibitor group did not exhibit this phenomenon (5). From Figure 5G, it can be seen that the model group and model+mimic group exhibited the most pronounced erosion appearance, while the model+inhibitor group showed the best recovery. To highlight the pathological changes in the erosion appearance, we marked the typical locations with red arrows in the images for easy comparison by readers (Fig. 4G; Fig. 5G). We also made corresponding textual modifications in the manuscript to address these findings (page 7, lines 150-151; page 8, lines 160-161). In addition, the 3D reconstruction of micro-CT is based on the synthesis of these cross-sectional images.

      References

      (5) Tao Y, Wang Z, Wang L, et al. Downregulation of miR-106b attenuates inflammatory responses and joint damage in collagen-induced arthritis. Rheumatology (Oxford, England). 2017;56(10):1804-1813.

      (7) Page 17, line 309-310: "Before model establishment and at 3, 7, 10, 14, 21, and 28 days after model establishment." Please re-write this as this is not clear regarding the experimental procedure.

      Thank you. We have rewritten the sentence as follows: Baseline testing of behavioral pain thresholds was conducted prior to model establishment, followed by behavioral pain threshold testing on days 3, 7, 10, 14, 21, and 28 after model establishment. (page 15, lines 322-324)

      (8) Fig. 5A. The M + inhibitor and Model images are not at the same plane as M + mimic and M + RNAnc images.

      Thank you. We have modified the figure.

      (9) Fig. 5B. There are two lines both with circle markers (Control and M+inhibitor). Please correct.

      We have corrected this.

      (10) Fig. 5F. Missing * sign.

      We have added the * sign.

      (11) Please elaborate on how the potential binding sites between miR-199b-5p and Gcnt2 and between miR-199b-5p and Fzd6 were predicted.

      We apologize for any lack of clarity in the original text. In fact, we utilized targets to predict potential binding sites. Specifically, for the mouse species, we predicted that the 3'UTR of Fzd6 binds with miR-199b-5p at positions 2483-2490, 3244-3251, 3303-3309, and 3854-3860, while the 3'UTR of Gcnt2 binds with miR-199b-5p at positions 2755-2762 and 4144-4151. In the revised version, we provide a detailed description of the methodology used for predicting these sites and offer an elaborate explanation of the results. (pages16, line352)
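      Binding-site positions of this kind are often located by scanning the 3'UTR for the reverse complement of the miRNA seed. The sketch below illustrates that idea under stated assumptions: it uses the canonical seed at miRNA positions 2-8 and exact matching only, which is a common heuristic and not necessarily the prediction method used by the authors. The function names and the toy UTR string are hypothetical; the miRNA sequence is the hsa-miR-199b-5p sequence quoted in this response.

```python
# RNA complement table used to build the reverse complement of the seed.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_target_motif(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the UTR motif (as RNA) complementary to the miRNA seed.

    start and end are 1-based, inclusive positions within the miRNA.
    """
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    return seed.translate(RNA_COMPLEMENT)[::-1]


def find_seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 1-based start positions of exact seed matches in a 3'UTR."""
    motif = seed_target_motif(mirna)
    utr_rna = utr.upper().replace("T", "U")
    positions = []
    index = utr_rna.find(motif)
    while index != -1:
        positions.append(index + 1)
        index = utr_rna.find(motif, index + 1)
    return positions


HSA_MIR_199B_5P = "CCCAGUGUUUAGACUAUCUGUUC"

# Toy UTR for illustration only; it is not a real transcript sequence.
toy_utr = "TTTTACACTGGCCCACACTGGTT"

print(seed_target_motif(HSA_MIR_199B_5P))
print(find_seed_sites(HSA_MIR_199B_5P, toy_utr))
```

      Dedicated prediction tools additionally score site type, conservation, and sequence context, so an exact-match scan like this should be treated only as a first-pass check.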

      Additionally, to demonstrate consistency with human binding sites, we not only predicted the binding sites of the human miRNA with these two target genes but also found a high conservation of up to 95.65% between the human and mouse sequences of miR-199b-5p. We have included this information in the supplementary materials (Fig. S6). In Fig. 6E-F, we present the potential binding sites between miR-199b-5p and Gcnt2, as well as between miR-199b-5p and Fzd6. In addition, we provide the predicted binding of the human sequence to illustrate the binding sites. Furthermore, the predicted binding of human miR-199b-5p with FZD6 and GCNT2 showed a high degree of consistency. (The fluorescent labeling in the following text indicates the potential predicted binding sites.) (Supplement file 8)

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 8323 GenBank Accession NM_001164615

      Gene Symbol FZD6 3' UTR Length 1368

      Gene Description frizzled class receptor 6

      3' UTR Sequence: agaacattttctctcgttactcagaagcaaatttgtgttacactggaagtgacctatgcactgttttgtaagaatcactgttacattcttcttttgcacttaaagttgcattgcctactgttatactggaaaaaatagagttcaagaataatatgactcatttcacacaaaggttaatgacaacaatatacctgaaaacagaaatgtgcaggttaataatatttttttaatagtgtgggaggacagagttagaggaatcttccttttctatttatgaagattctactcttggtaagagtattttaagatgtactatgctattttacttttttgatataaaatcaagatatttctttgctgaagtatttaaatcttatccttgtatctttttatacatatttgaaaataagcttatatgtatttgaacttttttgaaatcctattcaagtatttttatcatgctattgtgatattttagcactttggtagcttttacactgaatttctaagaaaattgtaaaatagtcttcttttatactgtaaaaaaagatataccaaaaagtcttataataggaatttaactttaaaaacccacttattgataccttaccatctaaaatgtgtgatttttatagtctcgttttaggaatttcacagatctaaattatgtaactgaaataaggtgcttactcaaagagtgtccactattgattgtattatgctgctcactgatccttctgcatatttaaaataaaatgtcctaaagggttagtagacaaaatgttagtcttttgtatattaggccaagtgcaattgacttcccttttttaatgtttcatgaccacccattgattgtattataaccacttacagttgcttatattttttgttttaacttttgttttttaacatttagaatattacattttgtattatacagtacctttctcagacattttgtagaattcatttcggcagctcactaggattttgctgaacattaaaaagtgtgatagcgatattagtgccaatcaaatggaaaaaaggtagttttaataaacaagacacaacgtttttatacaacatactttaaaatattaaggagttttcttaattttgtttcctattaagtattattctttgggcaagattttctgatgcttttgattttctctcaatttagcatttgcttttggtttttttctctatttagcattctgttaaggcacaaaaactatgtactgtatgggaaatgttgtaaatattaccttttccacattttaaacagacaactttgaatacaaaaactttgttttgtgtgatcttttcattaataaaattatctttgtataagaaaaaaaaaaaaaa

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 2651 GenBank Accession NM_001491

      Gene Symbol GCNT2 3' UTR Length 2780

      Gene Description glucosaminyl (N-acetyl) transferase 2 (I blood group)

      3' UTR Sequence: gctattcatgagctactcatgactgaagggaaactgcagctgggaagaggagcctgtttttgtgagagacttttgccttcgtaatgttaaccgtttcaggaccacgtttatagcttcaggacctggctacgtaattatacttaaaatatccactggacactgtgaaatacactaacaggatggctgggtagagcaatctgggcactttggccaattttagtcttgctgtttcttgatgctcacctctatattagtttattgttaggatcaatgataaatttaaatgacctcagatctttgcaccagatactcatcatatacaaatgttttagtaaaaaagagaattgtagataatactgtctaggaaaataagaattaggtttctttgaagaaggaatcttttataacaccttaacagtcaccactgtgctcaaccagacagatagtgaaacagctttctgggtaattcaccaatttcctttaaaacataagctacctgaatggagaatacatcttgtttctgagtttcaacactagcatttttggcttactcatggacaaagttctgtatatagtataaagtcattaacaagaaacaggatatgctttaagacagaattcactgtctgttgcttcagtaaaaggacctcggggaataaaacatttctctcttatatgccagaatgtaggctggtccctatgtcatgtcttccattaagaacactaaaaagtccttgcaagaatggagatatgcattcaagagaggtgctatcacatagatctagtctgaagtctggaacactttcctcttctatgacccctctctccccagtattatcttacttgcaaaatggagaccaaattctatcctgtgaggcttttaattgcaccatagtatgctctgagtagctttacactgcctggtactgatagtagtggctcgatttttaagagccttcaattgtagatgaacatctctgttatttatccctcattcatccatccgttcattcattcagccttcaatcaacatctcttgagtgtctattatgtacaggacatgtactgagacaaaaaggaaacataagagctttttcactctaaaaatcttggcaataatgtcaacaccagaaagcctcctctggagaatcttacagagtgattgtagtttaatacaggaacacacagggctgtgtagcatgataccaggcccaggagatcagtaattacaaattaagggttaaatcagagattattcaacagagagggagaaaggaggagacagagggaggacctgttgtgttccagccattctggtattcctttatgtatctaatttcattcaaacctcacaacagtcttgtgaggcccttatataattactcccattttgcagatgaagtaactgaggcttagaaaggttaatagcaccggggaacaatttctctgggtgagaattgggactctgttgctggtcttctcagttcatttcctgaggtggatttactgagagaaggtgaaataaagccatatttagtataccagagaaggtagattttaagaatggtctcagtgttaatactgagaaaaagtcctgtcagttcagaaaaaatgtgaagtctactttagtattcctgtaatactaaaccgttgagtttctaaatatttatttattctaacaaaaagcaattactacaaatggatgacacatttaatgaacacaattttattttttttctgtaactgtgcttgttgaatgtcaatcatatttaaagggaatgactttgaagtaaaaccttttttcttgctactgaaaaaaatggagttgttttgggtggtaaagtgttaaggaatagggacagctggtcacacaaggaactcttgaaggccacatgtgaaaacctgtcacttgcacagaggccagtcccactaaggtgaccagagtgggctccaagcacaaactgccattggctatag
atgggactgtgtccccccaaaattcatgtgttggagccttaaccctcaatgtgatggtatttgagatggggcctttggtaagggaagtttagatgaggtcacgagggtaggaccctcatgatgggatgagtccccttacaagacctctggcttgggccgggcgtggtggctcacacctgtaatcccaacactttgggaggccaaggcaggtagatcacttgatgccaggagttccagaccaggctggccgacatggtgaaaccccatctctactaaaaaatataaaaattagccgggctttgtggcatgtgcctgtaatcccagctatttggcaggctgaggcatgagaatcgcttgaacccaggaggtggaggttacagtgagctgagagtgccccactgcactccagcctgggtgacagagcgagactttgtcccaaaacaaaataggtgaggggatagcgaatgcactcagggtcagcagtggagtttaaaaattgtctcttttcaacttatttaaatgacagcacctgagaagaggaaccgttttacactggatgtttctcatgtagaacaagaaatctttctggaattgatgtttacatgtctgttgttggtcatctctcctgtgtcttaaatactttaatgttggaagagcatagtgtttgggctagtgggtttctgacagcccatgggaatgccctgaaactactgtatctgatgtttgttttcgatgaggttccatgttttgttttcttgggaataaattaatatattgttttccaaaaaaaaaaaaaaaaaaaa
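      Because the highlighting of the predicted sites does not survive in plain text, the following minimal sketch shows one way to locate candidate sites in the 3' UTR sequences listed above. It assumes a canonical seed match (miRNA positions 2-8) and is not the prediction tool used for Fig. 6E-F or Fig. S6; the function names are hypothetical.

```python
# Sketch only: scans a 3' UTR (DNA letters) for perfect matches to the
# reverse complement of the miRNA seed. Assumes the canonical seed at
# positions 2-8; real target-prediction tools apply additional criteria.

MIR_199B_5P = "CCCAGUGUUUAGACUAUCUGUUC"  # hsa-miR-199b-5p, as listed above


def seed_match_motif(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the DNA motif complementary to miRNA positions start..end (1-based)."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    complement = {"A": "T", "U": "A", "G": "C", "C": "G"}
    # The target site pairs antiparallel to the seed, so reverse before complementing.
    return "".join(complement[base] for base in reversed(seed))


def find_sites(utr: str, motif: str) -> list[int]:
    """Return 0-based start positions of every occurrence of motif in the UTR."""
    utr = utr.upper().replace("U", "T")
    positions = []
    index = utr.find(motif)
    while index != -1:
        positions.append(index)
        index = utr.find(motif, index + 1)
    return positions


if __name__ == "__main__":
    # Short fragment copied from the start of the FZD6 3' UTR listed above.
    fragment = "aatttgtgttacactggaagtgacc"
    motif = seed_match_motif(MIR_199B_5P)
    print(motif, find_sites(fragment, motif))
```

      Running the same scan over the full FZD6 and GCNT2 sequences would list every perfect seed match; whether any of them is a functional site still requires experimental validation.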

      (12) Page 10-11, Line 222-223: "Our findings indicate that miR-199b-5p plays a crucial role in KOA by targeting Fzd6 and Gcnt2". This is an overstatement. The current work shows the possible bindings of miR-199b-5p and Fzd6 as well as bindings of miR-199b-5p and Gcnt2. Whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p. Thus, please tone down this statement and the title of the manuscript.

      We agree with your assessment of our conclusion. We have therefore deleted the overstated sentences and toned down the conclusions of the manuscript. (the title; page 8, line 179; page 11, lines 227-228)

      (13) The Schematic figure (the last figure). Please remove osteophyte as this was not quantified in the study.

      We modified the schematic figure accordingly.

      Minor concerns:

      (1) Most figures were distorted.

      We have provided new versions of the figures to avoid distortion.

      (2) Providing GO term numbers in Fig. 1C is not very helpful. Maybe show the GO term and corresponding numbers in the manuscript (Page 4, lines 79 - 82).

      Thank you for your advice. We have added notes on the GO term numbers in the manuscript to explain the biological concept behind each one. (Page 4, lines 77-89; Page 22, lines 515-532)

      (3) What were M-0.5 and M-1 in Fig. 2D? Different MIA concentrations?

      Yes, these are different MIA concentrations, which we now explain in the legend. (Page 23, lines 535-536)

      (4) Please follow the nomenclature of the gene symbol. For example, Fig. 3E-P should be mouse genes (?).

      We have corrected the relevant gene symbols.

      (5) Page 3, line 59. Not all chondrocytes are pathogenic cells in OA.

      We apologize for the mistake; it has now been corrected. (Page 3, line 59)

      (6) Typo. Page 3, line 55.

      We have corrected the typo.

      (7) Page 4, line 78. These are differentially expressed miRNAs, not genes.

      We have revised the incorrect wording. (Page 4, lines 75-76)

      I wish the authors all the best with their continued work in this area.

      Thank you for your wishes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy. Leveraging new strains with sagA deletion/complementation constructs, the investigators reveal that sagA is non-essential, with sagA deletion leading to a marked growth defect due to impaired cell division, and sagA being necessary for the immunogenic and anti-tumor effects of E. faecium. In aggregate, the study utilizes compelling methods to provide both fundamental new insights into E. faecium biology and host interactions and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      We thank the Reviewers for their positive feedback on our manuscript. We also appreciate their helpful comments/critiques and have revised the manuscript as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Klupt, Fam, Zhang, Hang, and colleagues present a novel study examining the function of sagA in E. faecium, including impacts on growth, peptidoglycan cleavage, cell separation, antibiotic sensitivity, NOD2 activation, and modulation of cancer immunotherapy. This manuscript represents a substantial advance over their prior work, where they found that sagA-expressing strains (including naturally-expressing strains and versions of non-expressing strains forced to overexpress sagA) were superior in activating NOD2 and improving cancer immunotherapy. Prior to the current study, an examination of sagA mutant E. faecium was not possible and sagA was thought to be an essential gene.

      The study is overall very carefully performed with appropriate controls and experimental checks, including confirmation of similar densities of ΔsagA throughout. Results are overall interpreted cautiously and appropriately.

      I have only two comments that I think addressing would strengthen what is already an excellent manuscript.

      In the experiments depicted in Figure 3, the authors should clarify the quantification of peptidoglycans from cellular material vs supernatants. It should also be clarified whether sagA needs to be expressed endogenously within E. faecium, and whether ambient endopeptidases (perhaps expressed by other nearby bacteria or recombinant enzymes added) can enzymatically work on ΔsagA cell wall products to produce NOD2 ligands.

      We mentioned in the main text that peptidoglycan was isolated from bacterial sacculi and digested with mutanolysin for LC-MS analysis. We have now also specified “mutanolysin-digested” sacculi in the Figure 3 legend.

      We have added the following text “We next evaluated live bacterial cultures with mammalian cells to determine their ability to activate the peptidoglycan pattern recognition receptor NOD2” and “our analysis of these bacterial strains” to indicate live cultures were evaluated for NOD2 activation.

      We have also added the following text “Our results also demonstrated that while many enzymes are required for the biosynthesis and remodeling of peptidoglycan in E. faecium, SagA is essential for generating NOD2 activating muropeptides ex vivo.”

      In the murine experiments depicted in Figure 4, because the bacterial intervention is being performed continuously in the drinking water, the investigators have not distinguished between colonization vs continuous oral dosing of the mice with peptidoglycans. While I do not think additional experimentation is required to distinguish the individual contributions of these 2 components in their therapeutic intervention, I do think the interpretation of their results should include this perspective.

      We have added the following text “We note that by continuous oral administration in the drinking water, live E. faecium and soluble muropeptides that are released into the media during bacterial growth may both contribute to NOD2 activation in vivo.” and revised the following text “Nonetheless, these results demonstrate SagA is not essential for E. faecium colonization, but required for promoting the ICI antitumor activity through NOD2 in vivo.”

      Reviewer #2 (Public Review):

      Summary:

      The gut microbiome contributes to variation in the efficacy of immune checkpoint blockade in cancer therapy; however, the mechanisms responsible remain unclear. Klupt et al. build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy, leveraging novel strains with sagA deleted and complemented. They find that sagA is non-essential, but sagA deletion leads to a marked growth defect due to impaired cell division. Furthermore, sagA is necessary for the immunogenic and anti-tumor effects of E. faecium. Together, this study utilizes compelling methods to provide fundamental new insights into E. faecium biology and host interactions, and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      Strengths:

      Klupt et al. provide a well-written manuscript with clear and compelling main and supplemental figures. The methods used are state-of-the-art, including various imaging modalities, bacterial genetics, mass spectrometry, sequencing, flow cytometry, and mouse models of immunotherapy response. Overall, the data supports the conclusions, which are a valuable addition to the literature.

      Weaknesses:

      Only minor revision recommendations were noted.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General comments - the number/type of replicates and statistics are missing from some of the figure panels. Please be sure to add these throughout - all main figure panels should have replicates. I've also noted some specific cases below.

      Abstract - sagA is non-essential, need to edit text at "essential functions".

      This change has been made.

      "small number of mutations" - specify how many in the text.

      We revised the text. “Small number” is changed to “11”.

      "under control of its native promoter" - what was the plasmid copy number? It looks clearly overexpressed in Figure 1d despite using a native promoter, although it's a bit hard to know for sure without a loading control.

      pAM401 has a p15A origin of replication; the plasmid copy number is therefore ~20-30 copies (Lutz R. et al. Nucleic Acids Res. 1997). Total protein was visualized by Stain-Free™ imaging technology (BioRad) and serves as the protein loading control; the panel has been relabeled accordingly.

      "decrease levels of small muropeptides" - the asterisks are missing from Figure 3a.

      Green asterisks for peaks 2, 3, 7 and purple asterisks for peaks 13, 14 were added.

      The use of "Com 15 WT" in the figures is confusing - just replace it with "wt" and specify the strain in the text. Presumably, all of the strains are on the Com 15 background.

      “Com15 WT” was replaced with “WT” in the figures and main text.

      Change 1d to 1b so that the panels are in order (reading left to right and then top to bottom).

      Figure 1 legend is missing a number of replicates and statistics for 1a.

      The number of replicates was added.

      Figure 1b - it's unclear to me what to look at here, could add arrows indicating the feature of interest and expand the relevant text.

      Arrows pointing to cell clusters were added.

      Figure 1d - what is "stain free"? It would be preferable to show a loading control using an antibody against a constitutive protein to allow for normalization of the loading control.

      Stain-Free Imaging technology (BioRad) uses a trihalo compound contained in the gel to make proteins fluorescent directly in the gel after a short photoactivation, allowing immediate visualization of proteins at any point during electrophoresis and western blotting. Stain-Free total protein measurement serves as a reliable loading control comparable to Coomassie Blue staining. The panel has been relabeled as “Total protein” in the Figure, and Stain-Free imaging technology is noted in the legend.

      ED Figure 1 - representative of how many biological replicates?

      Legends are updated.

      ED Figure 2a - I would replace this with a table, it's not necessary to show the strip images. Also, please specify the number of replicates per group.

      Additional Extended Data Table 2 was added.

      ED Figure 2b - This data was not that convincing since the sagA KO has a marked growth defect and the time points are cut off too soon to know if growth would occur later. The MIC definition is potentially misleading. Should specify a % growth cutoff (i.e. <10% of vehicle control) and the metric used (carrying capacity or AUC). Then assign MIC to the tested concentration, not a range. The empty vector also seems to impact MIC, which is concerning and complicates the interpretation. Specify the number of replicates and add statistics. Given these various concerns, I might suggest removing this figure, as it doesn't really add much to the story.

      We appreciate this comment from the Reviewer, but believe these data are helpful for the paper and have included longer time points for the growth data. The definition of MIC for ED Fig. 2b has been included in the legend.
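      For illustration only, the cutoff-based definition suggested by the Reviewer can be sketched as below. The 10% cutoff comes from the comment above; the function name, the growth metric, and the example values are hypothetical and are not the definition or data reported in the ED Fig. 2b legend.

```python
# Sketch only: MIC taken as the lowest tested concentration at which growth
# falls below a fixed fraction of the vehicle control. The growth metric
# (e.g. carrying capacity or AUC) is whatever the caller supplies.
from typing import Optional


def mic_from_growth(
    growth_by_conc: dict[float, float],
    vehicle_growth: float,
    cutoff: float = 0.10,
) -> Optional[float]:
    """Return the lowest tested concentration with growth < cutoff * vehicle, or None."""
    inhibitory = [
        conc
        for conc, growth in growth_by_conc.items()
        if growth < cutoff * vehicle_growth
    ]
    return min(inhibitory) if inhibitory else None


if __name__ == "__main__":
    # Hypothetical values: concentration (ug/mL) -> growth metric.
    example = {1.0: 0.90, 2.0: 0.50, 4.0: 0.05, 8.0: 0.02}
    print(mic_from_growth(example, vehicle_growth=1.0))
```

      Reporting a single tested concentration rather than a range makes comparisons between strains with different growth rates explicit, provided each strain is normalized to its own vehicle control.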

      Figure 2 - specify the type of replicate. Number of cells? Number of slices? Number of independent cultures?

      For Cryo-ET experiments single bacterial cultures were prepared. Number of cells and slices for analysis are indicated in the legend. Legends are updated.

      Figure 4e - missing the water group, was it measured?

      Water (αPD-L1) group was not included in immune profiling of tumor infiltrating lymphocytes (TILs) experiment, as we have previously demonstrated limited impact on ICI anti-tumor activity and T cell activation in this setting (Griffin M et al Science 2021).

      Figure 4d - is this media specific to your strains? If not, qPCR may be a better method using strain-specific primers.

      Yes, HiCrome™ Enterococcus faecium agar plates (HIMEDIA 1580) are selective for Enterococcus species; moreover, the agar is chromogenic, allowing E. faecium to be identified as yellow colonies among other Enterococcus species.

    1. Author response:

      We are planning to extend our results of the Jurkat model system to primary T cells, as requested by the referees and eLife’s Senior Editor. This will involve the inclusion of new figures, including super-resolution/STED images to reinforce our results and to satisfy the referees’ points. In addition, we will improve and/or replace all the mentioned images to solve the raised caveats, including further quantification and analyses.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      [...] This study is a fundamental step towards our better understanding of the mechanisms underlying light effects on cognition and consequently optimising lighting standards.

      Strengths:

      While it is still impossible to distinguish individual hypothalamic nuclei, even with the high-resolution fMRI, the authors split the hypothalamus into five areas encompassing five groups of hypothalamic nuclei. This allowed them to reveal that different parts of the hypothalamus respond differently to an increase in illuminance. They found that higher illuminance increased the activity of the posterior part of the hypothalamus encompassing the MB and parts of the LH and TMN, while decreasing the activity of the anterior parts encompassing the SCN and another part of TMN. These findings are somewhat in line with studies in animals. It was shown that parts of the hypothalamus such as SCN, LH, and PVN receive direct retinal input in particular from ipRGCs. Also, acute chemogenetic activation of ipRGCs was shown to induce activation of LH and also increased arousal in mice.

      Weaknesses:

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin and/or other photoreceptors. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. This may be something to consider when designing the follow-up studies.

      We thank the reviewer for acknowledging the quality and interest of our work and agree with the weaknesses they pointed out.

      Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al. 2010 PNAS, Vandewalle et al. 2011 Biol. Psy.). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. Its photopic illuminance should ideally have been set similar to the low illuminance blue-enriched light condition, but it was not the case. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes indeed a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.

      The revised version of the manuscript will include a better explanation as to the choice of illuminances and spectra. The discussion will make clear that these choices limit the interpretation about the photoreceptors involved. The discussion will also point out that silent substitution could be used in the future to resolve such questions.

      Reviewer #2 (Public Review):

      [...] By shedding light on these complex interactions, this research endeavors to contribute to the foundational knowledge necessary for developing innovative therapeutic strategies aimed at enhancing cognitive function through environmental modulation.

      Strengths:

      (1) Considerable Sample Size and Detailed Analysis: The study leverages a robust sample size and conducts a thorough analysis of hypothalamic dynamics, which enhances the reliability and depth of the findings.

      (2) Use of High-Resolution Imaging: Utilizing 7 Tesla fMRI to analyze brain activity during cognitive tasks offers high-resolution insights into the differential effects of illuminance on hypothalamic activity, showcasing the methodological rigor of the study.

      (3) Novel Insights into Illuminance Effects: The manuscript reveals new understandings of how different regions of the hypothalamus respond to varying illuminance levels, contributing valuable knowledge to the field.

      (4) Exploration of Potential Therapeutic Applications: Discussing the potential therapeutic applications of light modulation based on the findings suggests practical implications and future research directions.

      Weaknesses:

      (1) Foundation for Claims about Orexin and Histamine Systems: The manuscript needs to provide a clearer theoretical or empirical foundation for claims regarding the impact of light on the orexin and histamine systems in the abstract.

      (2) Inclusion of Cortical Correlates: While focused on the hypothalamus, the manuscript may benefit from discussing the role of cortical activation in cognitive performance, suggesting an opportunity to expand the scope of the manuscript.

      (3) Details of Light Exposure Control: More detailed information about how light exposure was controlled and standardized is needed to ensure the replicability and validity of the experimental conditions.

      (4) Rationale Behind Different Exposure Protocols: To clarify methodological choices, the manuscript should include more in-depth reasoning behind using different protocols of light exposure for executive and emotional tasks.

      We thank the reviewer for recognising the interest and strength of our study. We agree that corrections and clarifications to the text were needed. We will address the weaknesses they pointed out as follows:

      (1) As detailed in the discussion, we do believe orexin and histamine are excellent candidates for mediating the results we report. As we also point out there, however, we are in no position to know which neurons, nuclei, neurotransmitters and neuromodulators underlie the results. We will therefore remove the last sentence of the abstract as we agree our final statement in the abstract was too strong. We will carefully reconsider the discussion to avoid such overstatements.

      (2) We are unsure at this stage how to address the comment of the reviewer without considerably lengthening the manuscript with statements which can only be putative. Hypothalamus nuclei are connected to multiple cortical (and subcortical) structures. The relevance of these projections will vary with the cognitive task considered. In addition, we have not yet considered the cortex in our analyses such that truly integrating cortical structures appears premature. We will nevertheless refer to the general statement that subcortical structures (and particularly those receiving direct retinal projections) are likely to receive light illuminance signal first before passing on the light modulation to the cortical regions involved in the ongoing cognitive process.

      (3) Illuminance and spectra could not be directly measured within the MRI scanner due to the ferromagnetic nature of measurement systems. The MR coil and the associated optic fibre stand, together with the entire lighting system, were therefore placed outside of the MR room to reproduce the experimental conditions in a completely dark room. A sensor was placed 2 cm away from the mirror of the coil (mounted at eye level), i.e. where the eye of the first author of the paper would be positioned, to measure illuminance and spectra. The procedure was repeated 4 times for illuminance and twice for spectra, and measurements were averaged. This procedure does not take into account inter-individual variation in head size and orbit shape, such that the reported illuminance levels may have varied slightly across subjects. The relative differences between illuminance levels are very unlikely to vary substantially across participants, such that statistics consisting of tests for the impact of relative differences in illuminance were not affected. We will report these methodological details in the supplementary material file associated with the paper.

      (4) The comment is similar to the issue raised by reviewer 1 (and reviewer 3) so we refer to the response provided to reviewer 1 to address the final comment of reviewer 2.

      Reviewer #3 (Public Review):

      [...] The authors find evidence in support of a posterior-to-anterior gradient of increased blood flow in the hypothalamus during task performance that they later relate to performance on two different tasks. The results provide an enticing link between light levels, hypothalamic activity, and cognitive/affective function, however, clarification of some methodological choices will help to improve confidence in the findings.

      Strengths:

      The authors' focus on the hypothalamus and its relationship to light intensity is an important and understudied question in neuroscience.

      Weaknesses:

      I found it challenging to relate the authors' hypotheses, which I found to be quite compelling, to the apparatus used to test the hypotheses - namely, the use of orange light vs. different light intensities; and the specific choice of the executive and emotional tasks, which differed in key features (e.g., block-related vs. event-related designs) that were orthogonal to the psychological constructs being challenged in each task.

      Given the small size of the hypothalamus and the irregular size of the hypothalamic parcels, I wondered whether a more data-driven examination of the hypothalamic time series would have provided a more parsimonious test of their hypothesis.

      We thank the reviewer for acknowledging the originality and interest of our study. We agree that some methodological choices needed more explanations. We will address the weaknesses they pointed out as follows:

      The first comment questions the choices of the light conditions and of the tasks. Regarding light conditions, since reviewer 1 (and reviewer 2) raised a similar issue, we refer to the response provided to reviewer 1. We agree that many different tasks could have been used to test our hypotheses. Prior work of our team showed that the n-back task and emotional task we used were successful probes to demonstrate that light illuminance modulates cognitive activity, including within subcortical structures (though resolution did not allow precise isolation of nuclei or subparts). When taking the step of ultra-high field imaging we therefore opted for these tasks as our goal was to show that illuminance affects subcortical brain activity across cognitive domains in general and we were not interested in tasks that would test specific aspects of these domains. The fact that one task is event-related while the other consists of a block design adds, in our view, to the robustness of our finding that a similar anterior-posterior gradient of activity modulation by illuminance is present in hypothalamus. We will update the discussion to highlight this aspect.

      As mentioned in the text, the protocol also included an auditory attentional task that could have further broadened the potential generalisability of our findings, but it was not part of the analyses as it could only include 2 illuminance levels due to time constraints.

      We agree that a data-driven approach could have constituted an alternative means to test our hypothesis. We opted for an approach that we mastered best while still allowing to conclusively test for regional differences in activity across the hypothalamus. Examination of time series of the very same data we used will mainly confirm the results of our analyses – an anterior-posterior gradient in the impact of illuminance - and may yield slight differences in the limits of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance. While the suggested approach may have been envisaged if we had been facing negative results (i.e. no differences between subparts, potentially because subparts would not correspond to functional differences in response to illuminance change), it would now constitute a circular confirmation of our main findings (i.e. using the same data). While we truly appreciate the suggestion, we do not consider that it would constitute a more parsimonious test of our hypothesis now that we successfully applied GLM/parcellation and GLMM approaches.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bell et al. provide an exhaustive and clear description of the diversity of a new class of predicted type IV restriction systems that the authors denote as CoCoNuTs, for their characteristic presence of coiled-coil segments and nuclease tandems. Along with a comprehensive analysis that includes phylogenetics, protein structure prediction, extensive protein domain annotations, and an in-depth investigation of encoding genomic contexts, they also provide detailed hypotheses about the biological activity and molecular functions of the members of this class of predicted systems. This work is highly relevant, it underscores the wide diversity of defence systems that are used by prokaryotes and demonstrates that there are still many systems to be discovered. The work is sound and backed-up by a clear and reasonable bioinformatics approach. I do not have any major issues with the manuscript, but only some minor comments.

      Strengths:

      The analysis provided by the authors is extensive and covers the three most important aspects that can be covered computationally when analysing a new family/superfamily: phylogenetics, genomic context analysis, and protein-structure-based domain content annotation. With this, one can directly have an idea about the superfamily of the predicted system and infer their biological role. The bioinformatics approach is sound and makes use of the most current advances in the fields of protein evolution and structural bioinformatics.

      Weaknesses:

      It is not clear how coiled-coil segments were assigned if only based on AF2-predicted models or also backed by sequence analysis, as no description is provided in the methods. The structure prediction quality assessment is based solely on the average pLDDT of the obtained models (with a threshold of 80 or better). However, this is not enough, particularly when multimeric models are used. The PAE matrix should be used to evaluate relative orientations, particularly in the case where there is a prediction that parts from 2 proteins are interacting. In the case of multimers, interface quality scores, such as the ipTM or pDockQ, should also be considered and, at minimum, reported.

      A description of the coiled-coil predictions has been added to the Methods. For multimeric models, PAE matrices and ipTM+pTM scores have been included in Supplementary Data File S1.
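      As a rough illustration of the checks discussed above, the sketch below combines the mean-pLDDT threshold of 80 with the weighted interface score that AlphaFold-Multimer uses to rank models (0.8·ipTM + 0.2·pTM). The function name and example values are hypothetical, and this is not necessarily how the scores in Supplementary Data File S1 were compiled; PAE matrices still need to be inspected separately for relative domain orientations.

```python
# Sketch only: summarises per-model confidence metrics. The 80 pLDDT cutoff
# follows the threshold mentioned in the review; the 0.8/0.2 weighting is the
# AlphaFold-Multimer ranking confidence.
from statistics import mean
from typing import Optional, Sequence


def assess_model(
    plddt: Sequence[float],
    iptm: Optional[float] = None,
    ptm: Optional[float] = None,
    plddt_cutoff: float = 80.0,
) -> dict:
    """Return mean pLDDT, whether it passes the cutoff, and multimer confidence if available."""
    average = mean(plddt)
    report = {"mean_plddt": average, "passes_plddt": average >= plddt_cutoff}
    if iptm is not None and ptm is not None:
        # Interface accuracy is weighted more heavily than overall fold accuracy.
        report["multimer_confidence"] = 0.8 * iptm + 0.2 * ptm
    return report


if __name__ == "__main__":
    # Hypothetical per-residue pLDDT values and scores for one multimeric model.
    print(assess_model([92.0, 88.5, 76.0, 81.0], iptm=0.70, ptm=0.80))
```

      A model can pass the pLDDT cutoff while still having a low interface score, which is why both values are reported side by side rather than collapsed into one pass/fail flag.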

      Reviewer #2 (Public Review):

      Summary:

      In this work, using in-depth computational analysis, Bell et al. explore the diverse repertoire of type IV McrBC modification-dependent restriction systems. The prototypical two-component McrBC system has been structurally and functionally characterised and is known to act as a defence by restricting phage and foreign DNA containing methylated cytosines. Here, the authors find previously unanticipated complexity and versatility of these systems and focus on detailed analysis and classification of a distinct branch, the so-called CoCoNuT, named after its composition of coiled-coil structures and tandem nucleases. These CoCoNuT systems are predicted to target RNA as well as DNA and to utilise defence mechanisms with some similarity to type III CRISPR-Cas systems.

      Strengths:

      This work is enriched with a plethora of ideas and a myriad of compelling hypotheses that now await experimental verification. The study comes from the group that was amongst the first to describe, characterize, and classify CRISPR-Cas systems. By analogy, the findings described here can similarly promote ingenious experimental and conceptual research that could further drive technological advances. It could also instigate vigorous scientific debates that will ultimately benefit the community.

      Weaknesses:

      The multi-component systems described here function in the context of large oligomeric complexes. Some of the single chain AF2 predictions shown in this work are not compatible, for example, with homohexameric complex formation due to incompatible orientation of domains. The recent advances in protein structure prediction, in particular AlphaFold2 (AF2) multimer, now allow us to confidently probe potential protein-protein interactions and protein complex formation. This predictive power could be exploited here to produce a better glimpse of these multimeric protein systems. It can also provide a more sound explanation for some of the observed differences amongst different McrBC types.

      Hexameric CnuB complexes with CnuC stimulatory monomers for Type I-A, I-B, I-C, II, and III-A CoCoNuT systems have been modeled with AF2 and included in Supplementary Data File S1, albeit without the domains fused to the GTPase N-terminus (with the exception of Type I-B, which lacks the long coiled-coil domain fused to the GTPase and was modeled with its entire sequence). Attempts to model the other full-length CnuB hexamers did not lead to convincing results.

      Recommendations for the authors:

      Reviewing Editor:

      The detailed recommendations by the two reviewers will help the authors to further strengthen the manuscript, but two points seem particularly worth considering: 1. The methods are barely sketched in the manuscript, but it could be useful to detail them more closely. Particularly regarding the coiled-coil segments, which are currently just statists, useful mainly for the name of the family, more detail on their prediction, structural properties, and purpose would be very helpful. 2. Due to its encyclopedic nature, the wealth of material presented in the paper makes it hard to penetrate in one go. Any effort to make it more accessible would be very welcome. Reviewer 1 in particular has made a number of suggestions regarding the figures, which would make them provide more support for the findings described in the text.

      A description of the techniques used to identify coiled-coil segments has been added to the Methods. Our predictions ranged from near certainty in the coiled-coils detected in CnuB homologs, to shorter helices at the limit of detection in other factors. We chose to report all probable coiled-coils, as the extensive coiled-coils fused to CnuB, which are often the only domain present other than the GTPase, imply involvement in mediating complex formation by interacting with coiled-coils in other factors, particularly the other CoCoNuT factors. The suggestions made by Reviewer 1 were thoughtful and we made an effort to incorporate them.

      Reviewer #1 (Recommendations For The Authors):

      I do not have any major issues with the manuscript. I have however some minor comments, as described below.

      • The last sentence of the abstract at first reads as a fact and not a hypothesis resulting from the work described in the manuscript. After the second read, I noticed the nuances in the sentence. I would suggest a rephrasing to emphasize that the activity described is a theoretical hypothesis not backed up by experiments.

      This sentence has been rephrased to make explicit the hypothetical nature of the statement.

      • In line 64, the authors rename DUF3578 as ADAM because indeed its function is not unknown. Did the authors consider reaching out to InterPro to add this designation to this DUF? A search in InterPro with DUF3578 results in "MrcB-like, N-terminal domain" and if a name is suggested, it may be worthwhile to take it to the InterPro team.

      We will suggest this nomenclature to InterPro.

      • I find Figure 1E hard to analyse and think it occupies too much space for the information it provides. The color scheme, the large amount of small slices, and the lack of numbers make its information content very small. I would suggest moving this to the supplementary and making it instead a bar plot. If removed from Figure 1, more space is made available for the other panels, particularly the structural superpositions, which in my opinion are much more important.

      We have removed Figure 1E from the paper as it adds little information beyond the abundance and phyletic distribution of sequenced prokaryotes, in which McrBC systems are plentiful.

      • In Figure 2, it is not clear due to the presence of many colorful "operon schemes" that the tree is for a single gene and not for the full operon segment. Highlighting the target gene in the operons or signalling it somehow would make the figure easy to understand even in the absence of the text and legend. The same applies to Supplementary Figure 1.

      The legend has been modified to show more clearly that this is a tree of McrB-like GTPases.

      • In line 146, the authors write "AlphaFold-predicted endonuclease fold" to say that a protein contains a region that AF2 predicts to fold like an endonuclease. This is a weird way of writing it and can be confusing to non-expert readers. I would suggest rephrasing for increased clarity.

      This sentence has been rephrased for greater clarity.

      • In line 167, there is a [47]. I believe this is probably due to a previous reference formatting.

      Indeed, this was a reference formatting error and has been fixed.

      • In most figures, the color palette and the use of very similar color palettes for taxonomy pie charts, genomic context composition schemes, and domain composition diagrams make it really hard to have a good understanding of the image at first. Legends are often close to each other, and it is not obvious at first which belong to what. I would suggest changing the layouts and maybe some color schemes to make it easier to extract the information that these figures want to convey.

      It seemed that Figure 4 was the most glaring example of these issues, and it has been rearranged for easier comprehension.

      • In the paragraph that starts at line 199, the authors mention an Ig-like domain that is often found at the N-terminus of Type I CoCoNuTs. Are they all related to each other? How conserved are these domains?

      These domains are all predicted to adopt a similar beta-sandwich fold and are found at the N-terminus of most CoCoNuT CnuC homologs, suggesting they are part of the same family, but we did not undertake a more detailed sequence-based analysis of these regions.

      We also find comparable domains in the CnuC/McrC-like partners of the abundant McrB-like NxD motif GTPases that are not part of CoCoNuT systems, and given the similarity of some of their predicted structures to Rho GDP-dissociation inhibitor 1, we suspect that they have coevolved as regulators of the non-canonical NxD motif GTPase type. Our CnuBC multimer models showing consistent proximity between these domains in CnuC and CnuB GTPase domains suggest this could indeed be the case. We plan to explore these findings further in a forthcoming publication.

      • In line 210, the authors write "suggesting a role in overcrowding-induced stress response". Why so? In all other cases, the authors justify their hypothesis, which I really appreciated, but not here.

      A supplementary note justifying this hypothesis has been added to Supplementary Data File S1.

      • At the end of the paragraph that starts in line 264, the authors mention that they constructed AF2 multimeric models to predict if 2 proteins would interact. However, no quality scores were provided, particularly the PAE matrix. This would allow for a better judgement of this prediction, and I would suggest adding the PAE matrix as another panel in the figure where the 3D model of the complex is displayed.

      The PAE matrix and ipTM+pTM scores for this and other multimer models have been added to Supplementary Data File S1. For this model in particular, the surface charge distribution of the model has been presented to support the role of the domains that have a higher PAE in RNA binding.

      • In line 306, "(supplementary data)" refers to what part of the file?

      This file has been renamed Supplementary Table S3 and referenced as such.

      • In line 464, the authors suggest that ShdA could interact with CoCoNuTs. Why not model the complex as done for other cases? what would co-folding suggest?

      As we were not able to convincingly model full-length CnuB hexamers with N-terminal coiled-coils, we did not attempt modeling of this hypothetical complex with another protein with a long coiled-coil, but it remains an interesting possibility.

      • In line 528, why and how were some genes additionally analyzed with HHPred?

      Justification for this analysis has been added to the Methods, but briefly, these genes were additionally analyzed if there were no BLAST hits or to confirm the hits that were obtained.

      • In the first section of the methods, the first and second (particularly the second) paragraphs are extremely long. I would suggest breaking them to facilitate reading.

      This change has been made.

      • In line 545, what do the authors mean by "the alignment (...) were analyzed with HHPred"?

      A more detailed description of this step has been added to the Methods.

      • The authors provide the models they produced as well as extensive supplementary tables that make their data reusable, but they do not provide the code for the automated steps, as to excise target sequence sections out of multiple sequence alignments, for example.

      The code used for these steps has been in use in our group at the NCBI for many years. It will be difficult to utilize outside of the NCBI software environment, but for full disclosure, we have included a zipped repository with the scripts and custom-code dependencies, although there are external dependencies as well, such as FastTree and BLAST. In brief, the procedure involves PSI-BLAST detection of regions with the most significant homology to one of a set of provided alignments (seals-2-master/bin/wrappers/cog_psicognitor). In this case, the reference alignments of McrB-like GTPases and DUF2357 were generated manually using HHpred to analyze alignments of clustered PSI-BLAST results. This step provided an output of coordinates defining domain footprints in each query sequence, which were then combined and/or extended using scripts based on manual analysis of many examples with HHpred (footprint_finders/get_GTPase_frags.py and footprint_finders/get_DUF2357_frags.py). Finally, these coordinates were used to excise such regions from the query amino acid sequence with a final script (seals-2-master/bin/misc/fa2frag).
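
      The last two steps described in this response (combining domain footprints and excising them from the query) can be sketched in a few lines. This is not the NCBI code from the repository: `merge_footprints`, `excise`, the `max_gap` parameter, and the 1-based inclusive coordinate convention are all assumptions made for illustration, and the sequence and coordinates are invented.

```python
def merge_footprints(footprints, max_gap=0):
    """Merge overlapping or nearby (start, end) footprints.

    Coordinates are assumed to be 1-based and inclusive. Two footprints are
    merged when the gap between them is at most max_gap residues.
    """
    merged = []
    for start, end in sorted(footprints):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            # Extend the previous footprint instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def excise(sequence, footprints):
    """Return the subsequence covered by each 1-based inclusive footprint."""
    return [sequence[start - 1:end] for start, end in footprints]


# Hypothetical query sequence and domain footprints (made-up values).
seq = "MKTAYIAKQRQISFVKSHFSRQ"
hits = [(3, 8), (7, 12), (18, 20)]
regions = merge_footprints(hits, max_gap=2)
print(regions)
print(excise(seq, regions))
```

      In the actual workflow the footprints come from PSI-BLAST hits against the reference alignments, and the rules for combining or extending them were tuned manually with HHpred, so a single gap threshold is only a stand-in for that logic.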

      Reviewer #2 (Recommendations For The Authors):

      (1) Page 4, line 77 - 'PUA superfamily domains' could be more appropriate to use instead of "EVE superfamily".

      While this statement could perhaps be applied to PUA superfamily domains, our previous work we refer to, which strongly supports the assertion, was restricted to the EVE-like domains and we prefer to retain the original language.

      (2) Page 5. lines 128-130 - AF2 multimer prediction model could provide a more sound explanation for these differences.

      Our AF2 multimer predictions added in this revision indeed show that the NxD motif McrB-like CoCoNuT GTPases interact with their respective McrC-like partners such that an immunoglobulin-like beta-sandwich domain, fused to the N-termini of the McrC homologs and similar to Rho GDP-dissociation inhibitor 1, has the potential to physically interact with the GTPase variants. However, we did not probe this in greater detail, as it is beyond the scope of this already highly complex article, but we plan to study it in the future.

      (3) Page 8, line 252 - The surface charge distribution of the CnuH OB-fold domain looks very different from that of SmpB (PDB 3IYR). In fact, the regions that are in contact with RNA in SmpB are highly acidic in CoCoNuT CnuH. Although it looks likely that this domain is involved in RNA binding, the mode of interaction should be very different.

      We did not detect a strong similarity between the CnuH SmpB-like SPB domain and PDB 3IYR, but when we compare the surface charge distribution of PDB 1WJX and the SPB domain, while there is a significant area that is positively charged in 1WJX that is negatively charged in SPB, there is much that overlaps with the same charge in both domains.

      The similarity between SmpB and the SPB domain is significant, but definitely not exact. An important question for future studies is: If the domains are indeed related due to an ancient fusion of SmpB to an ancestor of CnuH, would this degree of divergence be expected?

      In other words, can we say anything about how the function of a stand-alone tmRNA-binding protein could evolve after being fused to a complex predicted RNA helicase with other predicted RNA binding domains already present? Experimental validation will ultimately be necessary to resolve these kinds of questions, but for now, it may be safe to say that the presence of this domain, especially in conjunction with the neighboring RelE-like RTL domain and UPF1-like helicase domain, signals a likely interaction with the A-site of the ribosome, and perhaps restriction of aberrant/viral mRNA.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work provides a valuable contribution and assessment of what it means to replicate a null study finding, and what are the appropriate methods for doing so (apart from a rote p-value assessment). Through a convincing re-analysis of results from the Reproducibility Project: Cancer Biology using frequentist equivalence testing and Bayes factors, the authors demonstrate that even when reducing 'replicability success' to a single criterion, how precisely replication is measured may yield differing results. Less focus is directed to appropriate replication of non-null findings.

      Reviewer #1 (Public Review):

      Summary:

      The goal of Pawel et al. is to provide a more rigorous and quantitative approach for judging whether or not an initial null finding (conventionally with p ≥ 0.05) has been replicated by a second similarly null finding. They discuss important objections to relying on the qualitative significant/non-significant dichotomy to make this judgment. They present two complementary methods (one frequentist and the other Bayesian) which provide a superior quantitative framework for assessing the replicability of null findings.

      Strengths:

      Clear presentation; illuminating examples drawn from the well-known Reproducibility Project: Cancer Biology data set; R-code that implements suggested analyses. Using both methods as suggested provides a superior procedure for judging the replicability of null findings.

      Weaknesses:

      The proposed frequentist and the Bayesian methods both rely on binary assessments of an original finding and its replication. I'm not sure if this is a weakness or is inherent to making binary decisions based on continuous data.

      For the frequentist method, a null finding is considered replicated if the original and replication 90% confidence intervals for the effects both fall within the equivalence range. According to this approach, a null finding would be considered replicated if the p-values of both equivalence tests (original and replication) were, say, 0.049, whereas it would not be considered replicated if, for example, the equivalence test of the original study had a p-value of 0.051 and the replication had a p-value of 0.001. Intuitively, the evidence for replication would seem to be stronger in the second instance. The recommended Bayesian approach similarly relies on a dichotomy (e.g., Bayes factor > 1).

      Thanks for the suggestions. We now emphasize more strongly in the “Methods for assessing replicability of null results” and “Conclusions” sections that both TOST p-values and Bayes factors are quantitative measures of evidence that do not require dichotomization into “success” or “failure”.
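
      To make the reviewer's example concrete, the sketch below computes TOST p-values under a normal approximation, in which the effect estimate is assumed to be normally distributed around the true effect with a known standard error. The function names and the numerical inputs are illustrative assumptions; the margin of 0.74 is taken from the discussion of the paper, and the authors' own R code should be consulted for the actual analysis.

```python
from statistics import NormalDist


def tost_p_value(estimate, se, margin):
    """TOST p-value for H0: |effect| >= margin (normal approximation)."""
    z = NormalDist()
    p_lower = 1.0 - z.cdf((estimate + margin) / se)  # H0: effect <= -margin
    p_upper = z.cdf((estimate - margin) / se)        # H0: effect >= +margin
    return max(p_lower, p_upper)


def both_equivalent(p_original, p_replication, alpha=0.05):
    """Dichotomous criterion: both TOST p-values must be at most alpha."""
    return p_original <= alpha and p_replication <= alpha


# A precise estimate near zero yields a small TOST p-value ...
p_precise = tost_p_value(estimate=0.1, se=0.2, margin=0.74)
# ... while an imprecise estimate with the same point value does not.
p_vague = tost_p_value(estimate=0.1, se=0.6, margin=0.74)
print(round(p_precise, 4), round(p_vague, 4))

# The reviewer's example: the dichotomy ranks these pairs counter-intuitively.
print(both_equivalent(0.049, 0.049))  # True
print(both_equivalent(0.051, 0.001))  # False
```

      Under this approximation, a TOST p-value of at most 0.05 is equivalent to the 90% confidence interval lying entirely within the equivalence range, which is the criterion the reviewer describes. Reporting the two p-values themselves, rather than only the yes/no outcome, preserves the information that the second pair carries stronger evidence overall.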

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      Strengths:

      The study uses reliable and shareable/open data to demonstrate its findings, sharing as well the code for statistical analysis. The study provides sensitivity analysis for different scenarios of equivalence margin and alpha level, as well as for different scenarios of standard deviations for the prior of Bayes factors and different thresholds to consider. All analysis and code of the work is open and can be replicated. As well, the study demonstrates on a case-by-case basis how the different criteria can diverge, regarding one sample of a field of science: preclinical cancer biology. It also explains clearly what Bayes factors and equivalence tests are.

      Weaknesses:

      It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Other comments:

      • Introduction: The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      • Overall picture vs. case-by-case scenario: An interesting finding is that the authors observe that in most cases, there is no substantial evidence for either the absence or the presence of an effect, as evidenced by the equivalence tests. Thus, using both suggested criteria results in a picture similar to the one initially raised by the paper itself. The work done by the authors highlights additional criteria that can be used to further analyze replication success on a case-by-case basis, and I believe that this is where the paper's main contributions lie. Despite not changing the overall picture much, I agree that the p-value criterion by itself does not distinguish between (1) a situation where the original study had low statistical power, resulting in a highly inconclusive non-significant result that does not provide evidence for the absence of an effect and (2) a scenario where the original study was adequately powered, and a non-significant result may indeed provide some evidence for the absence of an effect when analyzed with appropriate methods. Equivalence testing and Bayesian factor approaches are valuable tools in both cases.

      Regarding the 0.05 threshold, the choice of the prior distribution for the SMD under the alternative H1 is debatable, and this also applies to the equivalence margin. Sensitivity analyses, as highlighted by the authors, are helpful in these scenarios.

      Thank you for the thorough review and constructive feedback. We have added an additional “Appendix B: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for the RPP and EPRP null results.
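
      The sensitivity to the prior raised in this exchange can be illustrated with a minimal Bayes factor sketch. It assumes a normally distributed effect estimate with known standard error and a zero-mean normal prior for the effect under H1; this may differ from the authors' exact specification, and the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bf01(estimate, se, prior_sd):
    """Bayes factor for H0: effect = 0 against H1: effect ~ N(0, prior_sd^2).

    Values above 1 favour H0, values below 1 favour H1.
    """
    under_h0 = NormalDist(0.0, se).pdf(estimate)
    # Marginal distribution of the estimate when the effect has a normal prior.
    under_h1 = NormalDist(0.0, sqrt(se**2 + prior_sd**2)).pdf(estimate)
    return under_h0 / under_h1


# Made-up estimate and standard error; sweep the prior standard deviation.
for prior_sd in (0.5, 1.0, 2.0, 4.0):
    print(prior_sd, round(bf01(estimate=0.1, se=0.2, prior_sd=prior_sd), 2))
```

      For an estimate close to zero, widening the prior increases the Bayes factor in favour of H0, so the strength of evidence reported depends on the prior standard deviation. This is the reason a sensitivity analysis over that parameter is informative.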

      Reviewer #3 (Public Review):

      Summary:

      The paper points out that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. Also, it cannot be considered a "replication success". The main point of the paper is rather obvious. It may be that both studies are underpowered, in which case their non-significance does not prove anything. The absence of evidence is not evidence of absence! On the other hand, statistical significance is a confusing concept for many, so some extra clarification is always welcome.

      One might wonder if the problem that the paper addresses is really a big issue. The authors point to the "Reproducibility Project: Cancer Biology" (RPCB, Errington et al., 2021). They criticize Errington et al. because they "explicitly defined null results in both the original and the replication study as a criterion for replication success." This is true in a literal sense, but it is also a little bit uncharitable. Errington et al. assessed replication success of "null results" with respect to 5 criteria, just one of which was statistical (non-)significance.

      It is very hard to decide if a replication was "successful" or not. After all, the original significant result could have been a false positive, and the original null-result a false negative. In light of these difficulties, I found the paper of Errington et al. quite balanced and thoughtful. Replication has been called "the cornerstone of science" but it turns out that it's actually very difficult to define "replication success". I find the paper of Pawel, Heyard, Micheloud, and Held to be a useful addition to the discussion.

      Strengths:

      This is a clearly written paper that is a useful addition to the important discussion of what constitutes a successful replication.

      Weaknesses:

      To me, it seems rather obvious that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. I'm not sure how often this mistake is made.

      Thanks for the feedback. We do not have systematic data on how often the mistake of confusing absence of evidence with evidence of absence has been made in the replication context, but we do know that it has been made in at least three prominent large-scale replication projects (the RPP, RPEP, RPCB). We therefore believe that there is a need for our article.

      Moreover, we agree that the RPCB provided a nuanced assessment of replication success using five different criteria for the original null results. We emphasize this now more in the “Introduction” section. However, we do not consider our article as “a little bit uncharitable” to the RPCB, as we discuss all other criteria used in the RPCB and note that our intent is not to diminish the important contributions of the RPCB, but rather to build on their work and provide constructive recommendations for future researchers. Furthermore, in response to comments made by Reviewer #2, we have added an additional “Appendix B: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for null results from two other replication projects, where the same issue arises.

      Reviewer #1 (Recommendations For The Authors):

      The authors may wish to address the dichotomy issue I raise above, either in the analysis or in the discussion.

      Thank you. We now emphasize that Bayes factors and TOST p-values do not need to be dichotomized but can be interpreted as quantitative measures of evidence, both in the “Methods for assessing replicability of null results” and the “Conclusions” sections.

      Reviewer #2 (Recommendations For The Authors):

      Given that, here follow additional suggestions that the authors should consider in light of the manuscript's word count limit, to avoid confusing the paper's main idea:

      2) Referencing: Could you reference the three interesting cases among the 15 RPCB null results (specifically, the three effects from the original paper #48) where the Bayes factor differs qualitatively from the equivalence test?

      We now explicitly cite the original and replication study from paper #48.

      3) Equivalence testing: As the authors state, only 4 out of the 15 study pairs are able to establish replication success at the 5% level, in the sense that both the original and the replication 90% confidence intervals fall within the equivalence range. Among these 4, two (Paper #48, Exp #2, Effect #5 and Paper #48, Exp #2, Effect #6) were initially positive with very low p-values, one (Paper #48, Exp #2, Effect #4) had an initial p of 0.06 and was very precisely estimated, and the only one in which equivalence testing provides a clearer picture of replication success is Paper #41, Exp #2, Effect #1, which had an initial p-value of 0.54 and a replication p-value of 0.05. In this latter case (or in all these ones), one might question whether the "liberal" equivalence range of Δ = 0.74 is the most appropriate. As the authors state, "The post-hoc specification of equivalence margins is controversial."

      We agree that the post hoc choice of equivalence ranges is a controversial issue. The margins define an equivalence region where effect sizes are considered practically negligible, and we agree that in many contexts SMD = 0.74 is a large effect size that is not practically negligible. We therefore present sensitivity analyses for a wide range of margins. However, we do not think that the choice of this margin is more controversial for the mentioned studies with low p-values than for other studies with greater p-values, since the question of whether a margin plausibly encodes practically negligible effect sizes is not related to the observed p-value of a study. Nevertheless, for the new analyses of the RPP and EPRP data in Appendix B, we have added additional sensitivity analyses showing how the individual TOST p-values and Bayes factors vary as a function of the margin and the prior standard deviation. We think that these analyses provide readers with an even more transparent picture regarding the implications of the choice of these parameters than the “project-wise” sensitivity analyses in Appendix A.
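
      One way to see how the choice of margin drives the conclusions is to compute, for each study pair, the smallest margin at which equivalence would be declared. The sketch below does this under a normal approximation; the study values are invented, and the function names are hypothetical rather than taken from the authors' code.

```python
from statistics import NormalDist


def minimal_margin(estimate, se, alpha=0.05):
    """Smallest margin at which the TOST p-value is at most alpha.

    Under a normal approximation this is the largest absolute bound of the
    (1 - 2 * alpha) confidence interval.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    return abs(estimate) + z * se


def pair_minimal_margin(original, replication, alpha=0.05):
    """Margin needed for both studies of a pair to show equivalence."""
    return max(minimal_margin(*original, alpha), minimal_margin(*replication, alpha))


# Hypothetical (estimate, standard error) pairs on the SMD scale.
pairs = [
    ((0.10, 0.20), (0.05, 0.10)),
    ((0.30, 0.90), (0.20, 0.40)),
    ((-0.40, 1.50), (0.10, 0.30)),
]
needed = [pair_minimal_margin(orig, rep) for orig, rep in pairs]
for margin in (0.3, 0.74, 1.0, 2.0, 3.0):
    successes = sum(m <= margin for m in needed)
    print(f"margin {margin}: {successes} of {len(pairs)} pairs equivalent")
```

      Because the required margin grows with the standard error, imprecise studies can only be declared equivalent with very wide margins, regardless of their observed p-values for the null hypothesis of no effect.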

      4) Bayes factor suggestions: For the Bayes factor approach, it would be interesting to discuss examples where the BF differs slightly. This is likely to occur in scenarios where sample sizes differ significantly between the original study and replication. For example, in Paper #48, Exp #2 and Effect #4, the initial p is 0.06, but the BF is 8.1. In the replication, the BF dramatically drops to < 1/1000, as does the p-value. The initial evidence of 8.1 indicates some evidence for the absence of an effect, but not strong evidence ("strong evidence for H0"), whereas a p-value of 0.06 does not lead to such a conclusion; instead, it favors H1. It would be interesting if the authors discussed other similar cases in the paper. It's worth noting that in Paper #5, Exp #1, Effect #3, the replication p-value is 0.99, while the BF01 is 2.4, almost indicating "moderate" evidence for H0, even though the p-value is inconclusive.

      We agree that some of the examples nicely illustrate conceptual differences between p-values and Bayes factors, e.g., how they take into account sample size and effect size. As methodologists, we find these aspects interesting ourselves, but we think that emphasizing them is beyond the scope of the paper and would distract eLife readers from the main messages.

      Concerning the conceptual differences between Bayes factors and TOST p-values, we already discuss a case where there are qualitative differences in more detail (original paper #48). We added another discussion of this phenomenon in Appendix B, as it also occurs for the replication of Ranganath and Nosek (2008) that was part of the RPP.

      5) p-values, magnitude and precision: It's noteworthy to emphasize, if the authors decide to discuss this, that the p-value is influenced by both the effect's magnitude and its precision, so in Paper #9, Exp #2, Effect #6, BF01 = 4.1 has a higher p-value than a BF01 = 2.3 in its replication. However, there are cases where both p-values and BF agree. For example, in Paper #15, Exp #2, Effect #2, both the original and replication studies have similar sample sizes, and as the p-value decreases from p = 0.95 to p = 0.23, BF01 decreases from 5.1 ("moderate evidence for H0") to 1.3 (region of "Absence of evidence"), moving away from H0 in both cases. This also occurs in Paper #24, Exp #3, Effect #6.

      We appreciate the suggestions but, as explained before, think that the message of our paper is better understood without additional discussion of more general differences between p-values and Bayes factors.

      6) The grey zone: Given the above topic, it is important to highlight that in the "Absence of evidence grey zone" for the null hypothesis, for example, in Paper #5, Exp #1, Effect #3 with a p = 0.99 and a BF01 = 2.4 in the replication, BF and p-values reach similar conclusions. It's interesting to note, as the authors emphasize, that Dawson et al. (2011), Exp #2, Effect #2 is an interesting example, as the p-value decreases, favoring H1, likely due to the effect's magnitude, even with a small sample size (n = 3 in both original and replications). Bayes factors are very close to one due to the small sample sizes, as discussed by the authors.

      We appreciate the constructive comments. We think that the two examples from Dawson et al. (2011) and Goetz et al. (2011) already nicely illustrate absence of evidence and evidence of absence, respectively, and therefore decided not to discuss additional examples in detail, to avoid redundancy.

      7) Using meta-analytical results (?): For papers from RPCB, comparing the initial study with the meta-analytical results using Bayes factor and equivalence testing approaches (thus, increasing the sample size of the analysis, but creating dependency of results since the initial study would affect the meta-analytical one) could change the conclusions. This would be interesting to explore in initial studies that are replicated by much larger ones, such as: Paper #9, Exp #2, Effect #6; Goetz et al. (2011), Exp #1, Effect #1; Paper #28, Exp #3, Effect #3; Paper #41, Exp #2, Effect #1; and Paper #47, Exp #1, Effect #5.

      Thank you for the suggestion. We considered adding meta-analytic TOST p-values and Bayes factors before, but decided that Figure 3 and the results section are already quite technical, so adding more analyses may confuse more than help. Nevertheless, these meta-analytic approaches are discussed in the “Conclusions” section.
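
      For readers curious what the meta-analytic variant would involve, the sketch below pools an original and a replication estimate by inverse-variance weighting. It assumes a fixed-effect model, which is only one of several ways to combine the two studies, and all numbers are invented; it is not an analysis reported by the authors.

```python
from math import sqrt


def pool_fixed_effect(estimates, ses):
    """Inverse-variance (fixed-effect) pooled estimate and standard error."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    return pooled, sqrt(1.0 / sum(weights))


# Made-up small original study and much larger replication.
pooled, pooled_se = pool_fixed_effect(estimates=[0.4, 0.0], ses=[0.8, 0.2])
print(round(pooled, 3), round(pooled_se, 3))
```

      The pooled estimate is dominated by the more precise replication, and its standard error is smaller than that of either study. An equivalence test or Bayes factor computed from the pooled values would therefore tend to be more conclusive, at the cost of the dependency the reviewer mentions.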

      8) Other samples of fields of science: It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Thank you for the excellent suggestion. We added an Appendix B where the null results from the RPP and EPRP are analyzed with our proposed approaches. The results are also discussed in the “Results” and “Conclusions” sections.

      9) Other approaches: I am curious about the potential impact of using an approach based on equivalence testing (as described in https://arxiv.org/abs/2308.09112). It would be valuable if the authors could run such analyses or reference the mentioned work.

      Thank you. We were unaware of this preprint. It seems related to the framework proposed by Stahel W. A. (2021) New relevance and significance measures to replace p-values. PLoS ONE 16(6): e0252991. https://doi.org/10.1371/journal.pone.0252991

      We now cite both papers in the discussion.

      10) Additional evidence: There is another study in which replications of initially p > 0.05 studies with p > 0.05 replications were also considered as replication successes. You can find it here: https://www.medrxiv.org/content/10.1101/2022.05.31.22275810v2. Although it involves a small sample of initially p > 0.05 studies with already large sample sizes, the work is currently under consideration for publication in PLOS ONE, and all data and materials can be accessed through OSF (links provided in the work).

      Thank you for sharing this interesting study with us. We feel that it is beyond the scope of the paper to include further analyses as there are already analyses of the RPCB, RPP, and EPRP null results. However, we will keep this study in mind for future analysis, especially since all data are openly available.

      11) Additional evidence 02: Ongoing replication projects, such as the Brazilian Reproducibility Initiative (BRI) and The Sports Replication Centre (https://ssreplicationcentre.com/), continue to generate valuable data. BRI is nearing completion of its results, and it promises interesting data for analyzing replication success using p-values, equivalence regions, and Bayes factor approaches.

      We now cite these two initiatives as examples of ongoing replication projects in the introduction. As with your previous point, we think that it is beyond the scope of the paper to include further analyses as there are already analyses of the RPCB, RPP, and EPRP null results.

      Reviewer #3 (Recommendations For The Authors):

      I have no specific recommendations for the authors.

      Thank you for the constructive review.

      Reviewing Editor (Recommendations For the Authors):

      I recognize that it was suggested to the authors by the previous Reviewing Editor to reduce the amount of statistical material to be made more suitable for a non-statistical audience, and so what I am about to say contradicts advice you were given before. But, with this revised version, I actually found it difficult to understand the particulars of the construction of the Bayes Factors and would have appreciated a few more sentences on the underlying models that fed into the calculations. In my opinion, the provided citations (e.g., Dienes Z. 2014. Using Bayes to get the most out of non-significant results) did not provide sufficient background to warrant a lack of more technical presentation here.

      Thank you for the feedback. We added a new “Appendix C: Technical details on Bayes factors” that provides technical details on the models, priors, and calculations underlying the Bayes factors.
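
      As a rough illustration of the kind of calculation involved, the sketch below computes a Bayes factor for a point null against a normal prior under the alternative. This is one common textbook construction and is an assumption here; the models and priors actually used are those described in Appendix C. It assumes the effect estimate is approximately normal with known standard error, and the function names and prior parameters are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bf01(estimate, std_error, prior_mean, prior_var):
    """Bayes factor for H0: theta = 0 versus H1: theta ~ N(prior_mean, prior_var).

    Under H0 the estimate is N(0, se^2); under H1 its marginal distribution
    is N(prior_mean, se^2 + prior_var). BF01 > 1 favours the null.
    """
    marginal_h0 = normal_pdf(estimate, 0.0, std_error ** 2)
    marginal_h1 = normal_pdf(estimate, prior_mean, std_error ** 2 + prior_var)
    return marginal_h0 / marginal_h1


# Hypothetical inputs: an estimate of exactly zero with unit standard error
example_bf = bf01(estimate=0.0, std_error=1.0, prior_mean=0.0, prior_var=3.0)
```

      The result depends on the prior variance under the alternative, which is why the choice of prior needs to be reported alongside the Bayes factor.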

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Bendzunas, Byrne et al. explore two highly topical areas of protein kinase regulation in this manuscript. Firstly, the idea that Cys modification could regulate kinase activity. The senior authors have published some standout papers exploring this idea of late, and the current work adds to the picture of how active site Cys might have been favoured in evolution to serve critical regulatory functions. Second, BRSK1/2 are understudied kinases listed as part of the "dark kinome" so any knowledge of their underlying regulation is of critical importance to advancing the field.

      Strengths:

      In this study, the authors pinpoint highly-conserved, but BRSK-specific, Cys residues as key players in kinase regulation. There is a delicate balance between equating what happens in vitro with recombinant proteins relative to what the functional consequence of Cys mutation might be in cells or organisms, but the authors are very clear with the caveats relating to these connections in their descriptions and discussion. Accordingly, by extension, they present a very sound biochemical case for how Cys modification might influence kinase activity in cellular environs.

      Weaknesses:

      I have very few critiques for this study, and my major points are barely major.

      Major points

      (1) My sense is that the influence of Cys mutation on dimerization is going to be one of the first queries readers consider as they read the work. It would be, in my opinion, useful to bring forward the dimer section in the manuscript.

      We agree that the influence of Cys on BRSK dimerization is a topic of significant interest. Our primary focus was to explore oxidative regulation of the understudied BRSK kinases as they contain a conserved T-loop Cys, and we have previously demonstrated that equivalent residues at this position in related kinases were critical drivers of oxidative modulation of catalytic activity. We have demonstrated here that BRSK1 & 2 are similarly regulated by redox and this is due to oxidative modification of the T+2 Cys, in addition to Cys residues that are conserved amongst related ARKs as well as BRSK-specific Cys. Although we also provide evidence for limited redox-sensitive higher order BRSK species (dimers) in our in vitro analysis, these represent a small population of the total BRSK protein pool (this was validated by SEC-MALS analysis). As such, we do not have strong evidence to suggest that these limited dimers significantly contribute to the pronounced inhibition of BRSK1 & 2 in the presence of oxidizing agents, and instead believe that other biochemical mechanisms likely drive this response. This may result from oxidized Cys altering the conformation of the activation loop. Indeed, the formation of an intramolecular disulfide within the T-loop of BRSK1 & 2, which we detected by MS, is one such regulatory modification. It is noteworthy that intramolecular disulfide bonds within the T-loop of AKT and MELK have already been shown to induce an inactive state in the kinase, and we posit a similar mechanism for BRSKs.

      While we recognize the potential importance of dimerization in this context, our current data from in vitro and cell-based assays do not provide substantial evidence to assert dimerization as a primary regulatory mechanism. Hence, we maintained a more conservative stance in our manuscript, discussing dimerization in later sections where it naturally followed from the initial findings. That being said, we acknowledge the potential significance of dimerization in the regulation of the BRSK T-loop cysteine. We believe this aspect merits further investigation and could indeed be the focus of a follow-up study.

      (2) Relatedly, the effect of Cys mutation on the dimerization properties of preparations of recombinant protein is not very clear as it stands. Some SEC traces would be helpful; these could be included in the supplement.

      In order to determine whether our recombinant BRSK proteins (and T-loop mutants) existed as monomers or dimers, we performed SDS-PAGE under reducing and non-reducing conditions (Fig 7). This unambiguously revealed that a monomer was the prominent species, with little evidence of dimers under these experimental conditions (even in the presence of oxidizing agents). Although we cannot discount a regulatory role for BRSK dimers in other physiological contexts, we could not produce sufficient evidence to suggest that multimerization played a substantial role in modifying BRSK kinase activity in our assays. We note that our in vitro analysis was performed using truncated forms of the protein, and as such it is entirely possible that regions of the protein that flank the kinase domain may serve additional regulatory functions that may include higher order BRSK conformations. In this regard, although we have not included SEC traces of our recombinant proteins, we have included analytical SEC-MALS of the truncated proteins (Supplementary Figure 6) which we believe to be more informative. We have also now included additional SEC-MALS data for BRSK2 C176A and C183A (Supplementary Figure 6d and e), which supports our findings in Fig 7, demonstrating the presence of limited dimer species under non-reducing conditions.

      (3) Is there any knowledge of Cys mutants in disease for BRSK1/2?

      We have conducted an extensive search across several databases: COSMIC (Catalogue of Somatic Mutations in Cancer), ProKinO (Protein Kinase Ontology), and TCGA (The Cancer Genome Atlas). These databases are well-regarded for their comprehensive and detailed records of mutations related to cancer and protein kinases. Our analysis using the COSMIC and TCGA databases focused on identifying any reported instances of Cys mutations in BRSK1/2 that are implicated in cancer. Additionally, we utilized the ProKinO database to explore the broader landscape of protein kinase mutations, including any potential disease associations of Cys mutations in BRSK1/2. However, we found no evidence to indicate the presence of Cys mutations in BRSK1/2 that are associated with cancer or disease. This lack of association in the current literature and database records suggests that, as of our latest search, Cys mutations in BRSK1/2 have not been reported as significant contributors to pathogenesis.

      (4) In bar charts, I'd recommend plotting data points. Plus, it is crucial to report in the legend what error measure is shown, the number of replicates, and the statistical method used in any tests.

      We have added the data points to the bar charts and included statistical methods in figure legends.

      (5) In Figure 5b, the GAPDH loading control doesn't look quite right.

      The blot has been repeated and updated.

      (6) In Figure 7 there is no indication of what mode of detection was used for these gels.

      We have updated the figure legend to confirm that the detection method was western blot.

      (7) Recombinant proteins - more detail should be included on how they were prepared. Was there a reducing agent present during purification? Where did they elute off SEC... consistent with a monomer or higher-order species?

      We have added ‘produced in the absence of reducing agents unless stated otherwise’ in the methods section to improve clarity. Although we have not added additional sentences to describe the elution profile of the BRSK proteins by SEC during purification, we believe that the inclusion of analytical SEC-MALS data is sufficient evidence that the proteins are largely monomeric under non-reducing conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this study by Bendzunas et al, the authors show that the formation of intra-molecular disulfide bonds involving a pair of Cys residues near the catalytic HRD motif and a highly conserved T-Loop Cys with a BRSK-specific Cys at an unusual CPE motif at the end of the activation segment function as repressive regulatory mechanisms in BRSK1 and 2. They observed that mutation of the CPE-Cys only, contrary to the double mutation of the pair, increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family. Understanding the molecular mechanisms underlying kinase regulation by redox-active Cys residues is fundamental as it appears to be widespread in signaling proteins and provides new opportunities to develop specific covalent compounds for the targeted modulation of protein kinases.

      The authors demonstrate that intramolecular cysteine disulfide bonding between conserved cysteines can function as a repressing mechanism as indicated by the effect of DTT and the consequent increase in activity by BRSK1 and 2 (WT). The cause-effect relationship of why mutation of the CPE-Cys only increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells is not clear to me. The explanation given by the authors based on molecular modeling and molecular dynamics simulations is that oxidation of the CPE-Cys (that will favor disulfide bonding) destabilizes a conserved salt bridge network critical for allosteric activation. However, no functional evidence of the impact of the salt-bridge network is provided. If you mutated the two main Cys-pairs (aE-CHRD and A-loop T+2-CPE) you lose the effect of DTT, as the disulfide pairs cannot be formed, hence no repression mechanisms take place. However, when looking at individual residues, I do not understand why mutating the CPE only results in the opposite effect, unless it is independent of its connection with the T+2 residue on the A-loop.

      Strengths:

      This is an important and interesting study providing new knowledge in the protein kinase field with important therapeutic implications for the rational design and development of next-generation inhibitors.

      Weaknesses:

      There are several issues with the figures that this reviewer considers should be addressed.

      Reviewer #1 (Recommendations for The Authors):

      Major points

      Page 26 - the discussion could be more concise. There's an element of recapping the results, which should be avoided.

      Regarding the conciseness of the discussion section, we have thoroughly revised it to ensure a more succinct presentation, deliberately avoiding the recapitulation of results. The revised discussion now focuses on interpreting the findings and their implications, steering clear of redundancy with the results section.

      Figure 1b seems to be mislabeled/annotated. I recommend checking whether the figure legends match more broadly. Figure 1 appears to be incorrectly cited throughout the results.

      Thank you for pointing out the discrepancies in the labeling and citation of Figure 1b. We have carefully reviewed and corrected these issues to ensure that all figure labels, legends, and citations accurately reflect the corresponding data and illustrations. We appreciate your attention to detail and the opportunity to improve the clarity and accuracy of our presentation.

      Figure 6 - please include a color-coding key in the figure. Further support for these simulations could be provided by supplementary movies or plots of the interaction. Figure 4 colour palette should be adjusted for the spheres in the Richardson diagrams to have greater distinction.

      As suggested, we have amended the colour palette in Figure 4 to improve conformity throughout the figure.

      Minor points

      Figure 2 - it'd be helpful to know what the percentage coverage of peptides is.

      We have updated the figure legend to include peptide coverage for both proteins.

      Some typos - Supp 2 legend "Domians".

      Fixed

      Figure 6 legend - analyzed by needs a space;

      Fixed

      Fig 8 legend schematic misspelled.

      Fixed

      Broadly, if you Google T-loop you get a pot pourri of enzyme answers. Why not just use Activation loop?

      The choice of "T-loop" over "Activation loop" in our manuscript was made to maintain consistency with other literature in the field, and in particular our previous paper “Aurora A regulation by reversible cysteine oxidation reveals evolutionarily conserved redox control of Ser/Thr protein kinase activity” where we refer to the activation loop cysteine as T-loop + 2. We acknowledge the varied enzyme contexts in which "T-loop" is used and agree on the importance of clarity. To address this, we made an explicit note in the manuscript that the "T-loop" is also referred to as the "Activation loop", ensuring readers are aware of the interchangeable use of these terms. Additionally, this nomenclature facilitates a more straightforward designation of cysteine residues within the loop (T+2 Cysteine). We believe this approach balances adherence to established conventions with the need for clarity and precision in our descriptions.

      Methods - what is LR cloning. Requires some definition. Some manufacturer detail is missing in methods, and referring to prior work is not sufficient to empower readers to replicate.

      We agree, and have added the following to the methods section:

      “BRSK1 and 2 were sub-cloned into pDest vectors (to encode the expression of N-terminal Flag or HA tagged proteins) using the Gateway LR Clonase II system (Invitrogen) according to the manufacturer’s instructions. pENTR BRSK1/2 clones were obtained in the form of Gateway-compatible donor vectors from Dr Ben Major (Washington University in St. Louis). The Gateway LR Clonase II enzyme mix mediates recombination between the attL sites on the Entry clone and the attR sites on the destination vector. All cloned BRSK1/2 genes were fully sequenced prior to use.”

      Page 7 - optimal settings should be reported. How were pTau signals quantified and normalised?

      We have added the following to the methods section:

      “A two-color Western blot detection method employing infrared fluorescence was used to measure the ratio of Tau phospho serine 262 to total Tau. Total GFP Tau was detected using a mouse anti-GFP antibody and visualized at 680 nm using goat anti-mouse IRDye 680, while phospho-Tau was detected using a Tau phospho serine 262 specific antibody and visualized at 800 nm using goat anti-rabbit IRDye 800. Imaging was performed using a LI-COR Odyssey CLx with scan control settings set to 169 μm, medium quality, and 0.0 mm distance. Quantification was performed using LI-COR Image Studio on the raw image files. The phospho-Tau to total Tau ratio was determined as the ratio of the fluorescence intensity measured at 800 nm (pTau) to that measured at 680 nm (total Tau).”
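
      The ratio calculation described in this methods text can be sketched as follows. This is a minimal illustration rather than the Image Studio workflow itself: the intensity values are made up, the function names are hypothetical, and the additional step of expressing each ratio relative to a control lane is an assumption that is not stated in the methods text.

```python
def phospho_ratio(signal_800, signal_680):
    """Ratio of the 800 nm signal (pTau Ser262) to the 680 nm signal (total Tau)."""
    if signal_680 <= 0:
        raise ValueError("total Tau signal must be positive")
    return signal_800 / signal_680


def normalise_to_control(ratios, control_key):
    """Express each lane's pTau/total Tau ratio relative to a chosen control lane.

    Assumption: normalisation to a control lane is shown for illustration only.
    """
    control = ratios[control_key]
    return {lane: value / control for lane, value in ratios.items()}


# Hypothetical band intensities per lane: (800 nm, 680 nm)
lanes = {"WT": (200.0, 100.0), "mutant": (150.0, 150.0)}
ratios = {lane: phospho_ratio(s800, s680) for lane, (s800, s680) in lanes.items()}
relative = normalise_to_control(ratios, "WT")
```

      Dividing by the total Tau signal from the same lane corrects for differences in Tau expression or loading between lanes.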

      In the Figure 6g-j legend, the salt bridge is incorrectly annotated as E185-R248 rather than 258.

      Fixed

      Lines 393-395 provides a repeat statement on BRSKs phosphorylating Tau (from 388-389).

      We have removed the repetition and reworded the opening lines of the results section to improve the overall flow of the manuscript.

      Supp. Figure 1 is difficult to view - would it be possible to increase the size of the phylogenetic analysis?

      We thank the reviewer for this observation. We have rotated (90°) and expanded the figure so that it can be more clearly viewed.

      Supp. Figure 2 - BRSK1/2 incorrectly spelled.

      Fixed

      Please check the alignment of labels in Supp. Figure 3e.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1, current panel b is not mentioned/described in the figure legend and as a consequence, the rest of the panels in the legends do not fit the content of the figure.

      Reviewer 1 also noted this error, and we have amended the manuscript accordingly.

      What is the rationale for using the HEK293T cells as the main experimental/cellular system? Are there cell lines that express both proteins endogenously so that the authors can recapitulate the results obtained from ectopic overexpression?

      The selection of HEK-293T cells was driven by their well-established utility in overexpression studies, which make them ideal for the investigation of protein interactions and redox regulation. This cell line's robust transfection efficiency and well-characterized biology provide a reliable platform for dissecting the molecular mechanisms underlying the redox regulation of proteins. Furthermore, the use of HEK-293T cells aligns with the broader scientific practice, serving as a common ground for comparability with existing literature in the field of BRSK1/2 signaling, protein regulation and interaction studies.

      The application of HEK-293T cells as a model system in our study serves as a foundational step towards eventually elucidating the functions of BRSK1/2 in neuronal cells, where these kinases are predominantly expressed and play critical roles. Given the fact that BRSKs are classed as ‘understudied’ kinases, the choice of a HEK-293T co-overexpression system allowed us to analyze the direct effects of BRSK kinase activity (using phosphorylation of Tau as a readout) in a cellular context and in a more controlled manner. This approach not only aids in the establishment of a baseline understanding of the redox regulation of BRSK1/2, but also sets the stage for subsequent investigations in more physiologically relevant neuronal models.

      In current panel d, could the authors recapitulate the same experimental conditions as in current panel c?

      Figure 1 panel c shows that both BRSK1 and 2 are reversibly inhibited by oxidizing agents such as H2O2, whilst panels d and e show the concentration dependent activation and inhibition of the BRSKs with increasing concentrations of DTT and H2O2 respectively. The experimental conditions were identical, other than changing amounts of reducing and oxidizing agents, and used the same peptide coupled assays. Data for all experiments were originally collected in ‘real time’ as depicted in Fig 1c (increase in substrate phosphorylation over time). However, to aid interpretation of the data, we elected to present the latter two panels as dose response curves by calculating the change in the rate of enzyme activity (shown as pmol phosphate incorporated into the peptide substrate per min) for each condition. To aid the reader, we now include an additional supplementary figure (new supplementary figure 2) depicting BRSK1 and 2 dependent phosphorylation of the peptide substrate in the presence of different concentrations of DTT and H2O2 in a real time (kinetic) assay. The new data shown is a subset of the unprocessed data that was used to calculate the rates of BRSK activity in Fig 1d & e.
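
      The conversion from a real-time trace to a single rate value can be illustrated with a short sketch. It assumes the rate is taken as the least-squares slope of product formed against time over a linear portion of the trace; the response does not specify the fitting procedure, so this choice, the function name, and the numbers are all hypothetical.

```python
def initial_rate(times_min, product_pmol):
    """Least-squares slope of product (pmol) against time (min).

    Assumes the supplied points lie in the linear phase of the reaction,
    so the slope approximates the rate in pmol phosphate per minute.
    """
    if len(times_min) != len(product_pmol) or len(times_min) < 2:
        raise ValueError("need at least two paired time/product points")
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_p = sum(product_pmol) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_min, product_pmol))
    return sxy / sxx


# Hypothetical kinetic trace for one DTT or H2O2 concentration
rate = initial_rate([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
```

      Repeating this for each concentration of reducing or oxidizing agent gives one rate per condition, which is the quantity plotted in a dose response curve.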

      Why did the authors use full-length constructs in these experiments and did not in e.g. Figure 2 where they used KD constructs instead?

      In the initial experiments, illustrated in Figure 1, we employed full-length protein constructs to establish a proof of concept, demonstrating the overall behavior and interactions of the proteins in their full-length form. This confirmed that BRSK1 & 2, which both contain a conserved T + 2 Cys residue that is frequently prognostic for redox sensitivity in related kinases, displayed a near-obligate requirement for reducing agents to promote kinase activity.  

      Subsequently, in Figure 2, our focus shifted towards delineating the specific regions within the proteins that are critical for redox regulation. By using constructs that encompass only the kinase domain, we aimed to demonstrate that the redox-sensitive regulation of these proteins is predominantly mediated by specific cysteine residues located within the kinase domain itself. This strategic use of the kinase domain of the protein allowed for a more targeted investigation. Furthermore, in our hands these truncated forms of the protein were more stable at higher concentrations, enabling more detailed characterization of the proteins by DSF and SEC-MALS. We predict that the flanking disordered regions of the full-length protein (as predicted by AlphaFold) contribute to this effect.

      (2) In Figure 2, Did the authors try to do LC/MS-MS in the same experimental conditions as in Figure 1 (e.g. buffer minus/plus DTT, H2O2, H2O2 + DTT)?

      We would like to clarify that the mass spectrometry experiments were conducted exclusively on proteins purified under native (non-reducing) conditions. We did not extend the LC/MS-MS analyses to include proteins treated with various buffer conditions such as minus/plus DTT, H2O2, or H2O2 + DTT as used in the experiments depicted in Figure 1. Given that we could readily detect disulfides in the absence of oxidizing agents, we did not see the benefit of additional treatment conditions as peroxide treatment of protein samples can frequently complicate interpretation of MS data. However, it should be noted that prior to MS analysis, tryptic peptides were subjected to a 50:50 split, with one half alkylated in the presence of DTT (as described in the methods section) to eliminate disulfides and other transiently oxidized Cys forms. Comparative analysis between reduced and non-reduced tryptic peptides improved our confidence when assigning disulfide bonds (which were eliminated in identical peptides in the presence of DTT).

      On panel b, why did the authors show alphafold predictions and not empiric structural information (e.g. X-ray, EM,..)?

      The AlphaFold models were primarily utilized to map the general locations of redox-sensitive cysteine pairs within the proteins of interest. Although we have access to the crystal structure of mouse BRSK2, it does not fully capture the active conformation seen in the AlphaFold model of the human version. The use of AlphaFold models for human proteins in this study aids in consistently tracking residue numbering across the manuscript, offering a useful framework for understanding the spatial arrangement of these critical cysteine pairs in their potentially active-like states. This approach facilitates our analysis and discussion by providing a reference for the structural context of these residues in the human proteins.

      What was the rationale for using the KD construct and not the FL as in Figure 1?

      The rationale to use the kinase domain was primarily based on the significantly lower confidence in the structural predictions for regions outside the kinase domain (KD). Our experimental focus was to investigate the role of conserved cysteine residues within the kinase domain, which are critical for the protein's function and regulation. This targeted approach allowed us to concentrate our analyses on the most functionally relevant and structurally defined portion of the protein, thereby enhancing the precision and relevance of our findings. As is frequently the case, truncated forms of the protein, consisting only of the kinase domain, are much more stable than their full length counterparts and are therefore more amenable to in vitro biochemical analysis. In our hands this was true for both BRSK1 and 2, and as such much of the data collected here was generated using kinase-domain (KD) constructs. Simulations using the KD structures are therefore much more representative of our original experimental setup.

      The BRSK1 KD construct appears to be rather inactive and not responsive to DTT treatment. Could the authors comment on the differences observed with the FL construct of Figure 1?

      It is important to note that BRSK1, in general, exhibits lower intrinsic activity compared to BRSK2. This reduced activity could be attributed to a range of factors, including the need for activation by upstream kinases such as LKB1, as well as potential post-translational modifications (PTMs) that may be absent in the bacterially expressed KD construct. The full-length forms of the protein were purified from Sf21 cells, and as such may have additional modifications that are lacking in the bacterially derived KD counterparts. We also cannot discount additional regulatory roles of the regions that flank the KD, and these may contribute in part to the modest discrepancy observed between constructs.  Despite these differences, it is crucial to emphasize that both the KD and FL constructs of BRSK1 are regulated by DTT, indicating a conserved redox-dependent activation for both of the related BRSK proteins.  

      (3) In Figure 4, on panel A, wouldn't the authors expect that mutating one of the pair, e.g. C198A in BRSK1, would have the same effect as mutating the C191 from the T+2 site? Did they try mutating individual sites of the aE/CHRD pair? The same will apply to BRSK2.

      We appreciate the insightful comment. It's important to clarify that the redox regulation of these proteins is influenced not solely by the formation of disulfide bonds but also by the oxidation state of individual cysteine residues, particularly the T+2 Cys. This nuanced mechanism of regulation allows for a diverse range of functional outcomes based on the specific cysteine involved and its state of oxidation. This aspect forms a key finding of our paper, highlighting the complexity of redox regulation beyond mere disulfide bond formation. For example, Aurora A (AURA) kinase activity is regulated by oxidation of a single T+2 Cys (Cys290, equivalent to Cys191 and Cys176 of BRSK1 and 2 respectively), but this regulation can be supplemented through artificial incorporation of a secondary Cys at the DFG+2 position (Byrne et al., 2020). This targeted genetic modification of AURA mirrors equivalent regulatory disulfide-forming Cys pairs that naturally occur in kinases such as AKT and MELK, and which provide an extra layer of regulatory fine tuning (and a possible protective role to prevent deleterious over oxidation) to the T+2 Cys. We surmise that the CPE Cys is also an accessory regulatory element that locks the T+2 Cys in an inactive disulfide configuration in BRSK1 and 2. The T+2 Cys itself remains the dominant driver of BRSK redox sensitivity, as judged by the fact that CPE Cys mutants are still potently regulated by redox (Fig 4).

      In our preliminary analysis of BRSK1, we observed that mutation of individual sites within the aE/CHRD pair was as detrimental to kinase activity as the tandem mutation (see Author response image 1). As discussed in the manuscript, we think that these Cys may serve important structural regulatory functions and opted to focus on co-mutations of the aE/CHRD pair for the remainder of our investigation.

      Author response image 1.

      In vitro kinase assays showing rates of in vitro peptide phosphorylation by WT and Cys-to-Ala (aE/CHRD residues) variants of BRSK1 after activation by LKB1.

      In panels C and D, the same experimental conditions should have been measured as in A and B.

      Panels A and B were designed to demonstrate the enzymatic activity and the response to DTT treatment to establish the baseline redox regulation of the kinase and a panel of Cys-to-Ala mutant variants. In contrast, panels C and D were specifically focused on rescue experiments with mutants that showed a significant effect under the conditions tested in A and B. These panels were intended to further explore the role of redox regulation in modulating the activity of these mutants, particularly those that retained some level of activity or exhibited a notable response to redox changes.

      The rationale for this experimental design was to prioritize the investigation of mutants, such as those at the T+2 and CPE cysteine sites, which provided the most insight into the redox-dependent modulation of kinase activity. Other mutants, which resulted in inactivation, were deprioritized in this context as they offered limited additional information regarding the redox regulation mechanism. This focused approach allowed us to delve deeper into understanding how specific cysteine residues contribute to the redox-sensitive control of kinase function, aligning with the overall objective of elucidating the nuanced roles of redox regulation in kinase activity.

      (4) In figure 5: Why did the authors use reduced Glutathione instead of DTT? The authors should have recapitulated the same experimental conditions as in Figure 4 and not focused only on the T+2 or the CPE single mutants but using the double and the aE/CHRD mutants as well, as internal controls and validation of the enzymatic assays using the modified peptide

      Regarding the use of reduced glutathione (GSH) instead of DTT in Figure 5, we chose GSH for its well characterized biological relevance as an antioxidant in cellular responses to oxidative stress. Furthermore, while DTT has been widely used in experimental setups, it is also potentially cytotoxic at high concentrations.

      Addressing the point on experimental consistency with Figure 4, we appreciate the suggestion and indeed had already conducted such experiments (Previously Supp Fig 3, now changed to current Supp Fig 4). These experiments include analyses of BRSK mutant activity in a HEK-293T model. However, we chose not to focus on inactivating mutants (such as the aE/CHRD mutants which had depleted expression levels possibly as a consequence of compromised structural integrity) or pursue the generation of double mutant CMV plasmids, as these were deemed unlikely to add significant insights into the core narrative of our study. Our focus remained on the mutants that yielded the most informative results regarding the redox regulation mechanisms in the in vitro setting, ensuring a clear and impactful presentation of our findings.

      A time course evaluation of the reducing or oxidizing reagents should have been performed. Would we expect that in WT samples, and in the presence of GSH, and also in the case of the CPE mutant, an increment in the levels of Tau phosphorylation as a readout of BRSK1-2 activity?

      We acknowledge the importance of such analyses in understanding the dynamic nature of redox regulation on kinase activity and have included a time course (Supp Fig 2 e-g). These results confirm a depletion of Tau phosphorylation over time in response to peroxide generated by the enzyme glucose oxidase.

      (5) In Figure 6, did the authors look at the functional impact of the residues with which the T+2 and the CPE motifs interact, e.g. T174 and the E185-R258 tether?

      Our primary focus was on the salt bridges, as this is a key regulatory structural feature that is conserved across many kinases. Regarding the additional interactions mentioned, we have thoroughly evaluated their roles and dynamics through molecular dynamics (MD) simulations but did not find any results of significant relevance to warrant inclusion.

      (6) In Figure 7: Did the authors look at the oligomerization state of the BRSK1-2 multimers under non-reducing conditions? Were they also observed in the case of the FL constructs? What was the stoichiometry?

      Our current work indicates that the kinase domain of BRSK1-2 primarily exists in a monomeric state, with some evidence of dimerization or multimer formation under specific conditions. Our SEC-MALS (Supp Fig 6) and SDS-PAGE analyses (Figure 7) clearly demonstrate that monomers are overwhelmingly the dominant species under non-reducing conditions (>90%). We also conclude that these limited oligomeric species can be removed by inclusion of reducing agents such as DTT (Figure 7), which may suggest a role for a Cys residue(s). Notably, removal of the T+2 Cys was insufficient to prevent multimerization.

      We were unable to obtain reliable SEC-MALS data for the full-length forms of the protein, likely due to the presence of disordered regions that flank the kinase domain which results in a highly heterodispersed and unstable preparation (at the concentrations required for SEC-MALS). Although we are therefore unable to comment on the stoichiometry of FL BRSK dimers, we can detect BRSK1 and 2 hetero- and homo-complexes in HEK-293T cells by IP, which supports the existence of limited BRSK1 & 2 dimers (Supp Fig 6a). However, we were unable to detect intermolecular disulfide bonds by MS, although this does not necessarily preclude their existence. The physiological role of BRSK multimerization (if any) and establishing specifically which Cys residues drive this phenomenon is of significant interest to our future investigations.

    1. Author response:

      We thank the reviewers for their attention to our study and for their fair and reasonable assessment of the strengths and weaknesses of our work. We believe the reviewers adequately captured both the potential implications of our work as well as its major current limitations. As both reviewers noted, we believe the work presented in this manuscript is an exciting first step in adapting minibinders as antigen sensors for synthetic receptors but many questions remain before these new tools can be widely adopted. We hope that this work will catalyze others to try minibinders as potential antigen sensors when developing novel synthetic receptors, and we hope that future work will more thoroughly test a wide range of linkers to better optimize antigen sensor function across synthetic receptors.

      In our future work, we intend to evaluate a greater diversity of minibinders across different relevant therapeutic targets. We are working to test both existing minibinders as well as generate novel minibinders using deep-learning-based de novo protein design methods. We further hope to explore additional linker modifications, especially focusing on modifications that will allow minibinder-coupled synthetic receptors to escape the glycocalyx of engineered cells. We hope to share findings on these topics in either an update to this manuscript or in future manuscripts, depending on the results of our studies in progress.

      Finally, reviewers noted a mismatch in the data displayed in Figure 5A and 5C, whereby LCB-CAR-expressing cells induced higher lysis in Figure 5C than in Figure 5A. This is due to Figure 5C showing only 24 hours of incubation between effector and target cells, as opposed to the 72 hours of incubation that is quantitated in 5A. These mismatched timepoints were selected because linker-dependent differences in lysis were most readily apparent at 24 hours and were negligible at 72 hours. The full time course of lysis for this experiment can be seen in Supplemental Figure 2D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thorough review of and overall positive comments on our manuscript. We have revised the manuscript to address most of the concerns raised. Below is a point-by-point response to the reviewers’ comments outlining these changes.

      The novelty of the study is compromised due to the recently published structure of unliganded P-Rex1 (Chang et al. 2022). The unliganded and IP4-bound structures of P-Rex1 appear virtually identical; however, no clear comparison is presented in the manuscript. In the same paper, a very similar model of P-Rex1 activation upon binding to PIP3 membranes and Gbeta/gamma is presented.

      This comparison has been added as Supplemental Figure 5. Although similar models of activation are presented in our manuscript and in that of Chang et al. 2022, our model is extended to incorporate inhibition by IP4 and other aspects of regulation not previously incorporated, shown in both schematic form (Figure 6B) and including supporting data (Figure 6A). We also point out that in the work by Chang et al. they used domain insertions to stabilize the structure, and here we present the native protein structure. It turns out that they look similar, but our work reduces concerns over possible engineering artifacts. Finally, our model is further informed by HDX-MS measurements of the enzyme bound to PIP3 in liposomes (Figure 6A and Supplemental figure 8), which reveal the regions of the protein subject to higher dynamics and are consistent with a more fully extended conformation.

      The authors demonstrate that IP4 binding to P-Rex1 results in catalytic inhibition and increased protection of autoinhibitory interfaces, as judged by HDX. The relevance of this in a cellular setting is not clear and is not experimentally demonstrated. Further, mechanistically, it is not clear whether the biochemical inhibition by IP4 of PIP3 activated P-Rex1 is due to competition of IP4 with activating PIP3 binding to the PH domain of P-Rex1, or due to stabilizing the autoinhibited conformation, or both.

      We feel that both occur. IP4 and PIP3 bind to the same site of the PH domain, thus they must be competitive at the very least. We also show that IP4 stabilizes the autoinhibited conformation (based on both our cryo-EM and HDX-MS data). Because PIP3 does not activate either DH/PH or DH/PH-DEP1 (nor does IP4 inhibit, see Sup. Fig. 1), it is not possible for us to tell with this suite of experiments how much the inhibition is due to competition versus stabilization of the autoinhibited conformation.

      It is difficult to judge the error in the HDX experiments presented in Sup. data 1 and 2. In the method section, it is stated that the results represent the average from two samples. How is the SD error calculated in Fig.1B-C?

      To clarify, the following passages have been revised:

      Figure 1 legend – “Graphs show the exchange over time for select regions in the P-Rex1 (B) PH domain and (C) an IP4P region that was disordered in the P-Rex1–Gβγ structure. Shown is the average of two experiments with error bars representing the mean ± standard deviation.” Methods section – “Each sample was analyzed twice by HDX-MS, and the data shown in graphs represent the average of these experiments. For each peptide, the average of all five time points was calculated and used to plot the difference data onto the coordinates.”
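
      As an illustration of how a mean ± standard deviation can be obtained from only two HDX-MS runs, the following minimal Python sketch computes both per time point. It assumes the sample standard deviation (n − 1 denominator); the response does not state which estimator was used. The function name and the uptake values are hypothetical.

```python
from statistics import mean, stdev


def summarize_duplicates(run_a, run_b):
    """Per time point, return (mean, sample SD) of two replicate runs."""
    return [(mean(pair), stdev(pair)) for pair in zip(run_a, run_b)]


# Hypothetical percent-deuteration values at five time points (not real data).
run_a = [12.0, 20.5, 31.0, 44.2, 58.9]
run_b = [13.0, 19.5, 33.0, 43.8, 60.1]

for avg, sd in summarize_duplicates(run_a, run_b):
    print(f"{avg:.2f} ± {sd:.2f}")
```

      With two values a and b, the sample standard deviation reduces to |a − b| / √2, so each error bar simply reflects the gap between the duplicate measurements.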

      As mentioned, from the explanations in the manuscript it is difficult to judge the differences between the unliganded and the IP4 bound structure. A superposition, pointing to the main differences, would help. Are there any additional interactions observed that could explain a more stable autoinhibitory conformation?

      Added as Supplemental Figure 5. Although there are global shifts in some of the domains, the overall structures are similar to one another. Due to the moderate resolution of both structures (~4.2 Å), accurate placement of sidechains is difficult, in some places more than others. Because of this, we cannot pinpoint many specific sidechain interactions with certainty. There are no obvious interactions observed in our IP4 bound structure compared to that of 7SYF that would explain a more stable autoinhibited conformation, and thus the evidence comes primarily from the HDX-MS data.

      The cellular significance of IP4 regulation is not clear. Finding a way to manipulate intracellular IP4 levels and showing that this affects P-Rex1 cellular activity would greatly increase the significance of this finding.

      We agree that this would be an informative experiment, but not one that we currently have the means to perform.

      From the presented data it is not clear if inhibition by IP4 is due to competition with PIP3 or due to the proposed stabilization of P-Rex1 autoinhibition. Performing a study as shown in Fig.1D, but with the DH/PH construct could resolve this question.

      First, please see our response to the similar concern from Reviewer 1 above. It is not possible for us to test the DH/PH construct and assess if there is direct competition with PIP3. To emphasize this point (and to correct the error that we never made a call to Sup. Fig. 1C in the original manuscript), we added the following lines to the first paragraph of the Results.

      “Negatively charged liposomes (containing PC/PS), including those that also contain PIP3, unexpectedly inhibit the GEF activity of the DH/PH-DEP1 and DH/PH fragments (Sup. Fig. 1C). Because full-length P-Rex1 is not affected by PC/PS liposomes, this suggests that the observed inhibition represents a non-productive interaction of the DH/PH-DEP1 and DH/PH fragments with negatively charged surfaces in our assay. The lack of activation of DH/PH-DEP1 by PIP3 prevents us from testing whether IP4 can directly inhibit via direct competition with PIP3.”

      If I understand correctly, the data shown in Supplementary Data 1 and 2 are averages of 2 measurements, which makes it difficult to judge real signals from outliers. Perhaps, rather than showing the average, the results from the two experiments could be shown. Also, please explain how the SD error is calculated in Fig.1B-C if the data points indeed are averages of 2 measurements.

      We are sorry for the confusion. The data shown in Sup. Data 1 and 2 are not averages of two experiments. The Methods section has therefore been modified to read: “Each image in Supplemental Data 1 and 2 shows one experiment (rainbow plots) or a difference analysis from those experiments (red to blue plots). Only one of the two sets of experiments performed for each condition (+/- liposomes or +/- IP4) is shown here.” As described above, text has been added to clarify the SD error calculated in Fig. 1B and 1C.

      The authors claim that the data presented in Fig 4B suggests that the salt bridge formed by K207 and E251 is important for autoinhibition. If so, the authors should explain why the K207C mutant is not activated.

      Multiple reviewers had problems with this panel, and we now recognize that we misinterpreted the data, which did not help with this. Because this data is largely just supportive of our structure and SAXS data, Figure 4 was moved to the Supplement and this section of the results now reads:

      “Flexibility of the hinge in the α6-αN helix of the DH/PH module is important for autoinhibition.

      One of our initial goals in this project was to determine a high-resolution structure of the autoinhibited DH/PH-DEP1 core by X-ray crystallography. To this end, we started with the DH/PH-DEP1 A170K variant, which was more inhibited than wild-type but still dynamic, and then introduced S235C/M244C and K207C/E251C double mutants to completely constrain the hinge in the α6-αN helix via disulfide bond formation in a redox-sensitive manner. Single cysteine variants K207C and M244C were generated as controls. The S235C/M244C variant performed as expected, decreasing the activity of the A170K variant to nearly background in the oxidized but not the reduced state (Supplemental Fig. 4). However, the M244C single mutant exhibited similar effects, suggesting that it forms disulfide bonds with cysteine(s) other than S235C. Indeed, the side chains of Cys200 and Cys234 are very close to that of M244C. The K207C/E251C mutant was similar to S235C/M244C under oxidized conditions, but ~15-fold more active (similar to WT DH/PH levels, see Fig. 3C) under reducing conditions. The K270C variant, on the other hand, exhibited higher activity than A170K on its own under oxidizing conditions, but similar activity to all the variants except K207C/E251C when reduced. These results suggest that K207C/E251C in a reduced state and K270C in an oxidized state favor a configuration where the DEP1 domain is less able to engage the DH domain and maintain the kinked state. The mechanism for this is not known. Regardless, these data show that perturbation of contacts between the kinked segments of the α6-αN helix can have profound consequences on the activity of the DH/PH-DEP1 core.”

      In the low-resolution cryo-EM study, it is mentioned that only a few classes exhibit the extra density that ultimately corresponds to autoinhibited P-Rex1. If so, is this also the case in the high-resolution study and how many of the most populated classes contribute to the autoinhibited structure? It would be informative for the reader to provide this information.

      Indeed, only a small subset of the particles are in the autoinhibited conformation in the Krios data set, similar to the Glacios. How many classes these particles partition to is dependent on how many classes are asked for during 2D classification and how many “garbage” particles are present at the different stages of particle stack cleaning during 2D classification. Also, because of the preferred orientation problem, many of the particles in this conformation segregate together during 2D classification. Therefore, in addition to the information shown in Sup. Fig. 2, we think a more informative metric to answer the reviewer’s question is the number of particles at the start of data processing compared to at the end, which is shown in Table 1.

      Page 10, line 217: "The kink .... is important for autoinhibition". It seems unlikely that there is no kink in the activated state. Perhaps it should say something like "Mobility in the kink is important ..."

      Agreed. In fact, the SAXS data we reported on the DH/PH module in Ravala et al. (2020) are most consistent with a DH/PH that exhibits both extended and condensed conformations in solution.

      Fig. 4A: It would help to label helices alpha6 and alphaN.

      These helices have now been labeled.

      Page 11, lines 223 and 228 are contradictory: In line 223 it is stated that K207C/E251C exhibit reduced GEF activity, while on line 228 it says this has little effect under non-reducing conditions.

      We thank the reviewer for this catch. We have modified the text to make it self-consistent.

      In Fig.5B, it would help if the authors mention in the legend that a trans-well migration assay was used, in order to know what the increase in stained cells signifies.

      The legend has been modified to include this information.

      The previous work by Chang et al., 2022 (PMID: 35864164) found that the final DH domain α6 formed the hinge helix (the kink in this manuscript), which undergoes a significant conformational change between closed and opened conformations of P-Rex1. Could the authors discuss the state of the kink in the presence of IP4 and in the P-Rex1 variants A170K and L177E?

      We have now included an alignment of our structure in the presence of IP4 with the Chang et al., 2022 structure (Supplemental Figure 5). There is very little difference in the kink region. Because the A170K variant exhibits reduced GEF activity and a smaller Dmax, it could be speculated that the kink might be further stabilized as compared to wild-type. The L177E variant exhibited activity similar to that of DH/PH alone, implying a relief of the kink. This interpretation is supported by our SAXS analysis of A170K and L177E in Fig. 3.

      I am a bit confused about the set of experiments with the intended DH-DEP1 interface disruptive mutation A170K, which later turned out to enhance P-Rex1 activity inhibition. The authors explained that the DH K170 salt bridges with DEP1 Glu411 stabilize the DH-DEP1 interaction. Next, the authors used P-Rex1 A170K mutant as the backbone for the introduction of disulfide bonds to block the closed configuration of the DH-PH hinge region by creating some mutants S235C/M244C and K207C/E251C. The first intended C235-C244 disulfide bond did not show any effect on the GEF activity because C235 is so close to the native C234 for a potential disulfide bond. I would recommend putting the data of S235C/M244C into a supplemental figure. Also, I am wondering if the GEF activity measurements in Fig 4B could be performed in the presence or absence of IP4 to see whether the IP4-induced autoinhibition form is distinct from the natural autoinhibitory once the kink was unblocked by reducing agent DTT.

      The confusion was warranted by our poor analysis of this data, rectified as discussed above.

      With regards to experiments plus/minus IP4, due to the absence of the IP4P domain, IP4 had no inhibitory effect on the activity of DH/PH or DH/PH-DEP1 (Supplemental Figure 1A and 1B) and as such this experiment would not likely be informative (or at best very hard to interpret).

      For the IP4 versus PIP3 activity assays, the authors indicated that P-Rex1 inhibition is dependent on the Inositol 3-phosphate. Have the authors tested and could they test with either Ins (1,3,4)P3 or Ins(1,3,5)P3?

      In these assays (Figure 1D), we show that inhibition does not occur with Ins(1,4,5)P3. Based on previous structures of IP4 bound to the PH domain and supporting biochemical assays (Cash et al., 2016, Structure), the 3- and 4-phosphates are the most highly coordinated and the next most thermostabilizing headgroup other than IP4 was Ins(1,3,4)P3. Therefore, we would anticipate that Ins(1,3,4)P3 might stabilize the autoinhibited state, perhaps at higher concentrations, but we have not directly tested this.

      The authors should provide the electron density maps of the P-REX1-IP4 complex in the supplemental figure and highlight the maps for two key interactions between DEP1 and DH and between PH and IP4P 4-helix bundle subdomain.

      The Coulomb potential map of this complex is shown in Figure 2A. Due to the moderate resolution of the reconstruction, side chain details cannot be unambiguously modeled at these interfaces, which is why we do not highlight any observed, specific interactions between sidechains.

      The manuscript was written very well and there is only one typing error in the legend of Supplemental Figure 1.

      Thank you for this catch.

      Details of EM density at significant domain interfaces and at the IP4 binding site should be provided as supplementary material.

      Beyond our comment about interfaces above, we have now provided the map representing the bound IP4 as Figure 4B.

      Line 123: It is difficult to discern in Figure 2A the "severe bend" in the helix that connects the DH and PH domains. It was not apparent (to me, at least) where this helix is located until eventually encountering Figure 4. It would be helpful to highlight or label (maybe with an asterisk) the bend site in Fig 2A.

      This has been labeled in Figure 2A.

      Line 125-126: likewise, it would be helpful to the reader to highlight the GTPase binding site in the DH domain.

      This has been labeled in Figure 2A.

      Line 159. Consider adding a supplementary figure showing a superposition of the two pREX-1 regulatory interfaces in the present structure and in 7SYF.

      A superposition of the two structures has now been added as Supplemental Figure 5. Because both structures are of moderate resolution, it is difficult to place side chains with a high degree of certainty. Thus, we did not think it wise to draw conclusions from comparisons between the details of these interfaces.

      Is the positioning of IP4 dictated by the EM density, prior knowledge from high-resolution structures, or both? A rendering of the EM density over the stick model as a supplementary figure would be helpful.

      This was modeled based on both. This image has now been added as Figure 4B.

      It should be emphasized that the jackknife model is similar to the hinge model proposed by Chang et al (2022).

      Mention of similarity between our model and the model proposed by Chang et al., 2022 occurs twice in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1:

      Thank you for the careful reading and the positive evaluation of our manuscript. As you mentioned, the present study tried to address the question of how the lost genomic functions could be compensated by evolutionary adaptation, indicating the potential mechanism of "constructive" rather than "destructive" evolution. Thank you for the instructive comments that helped us to improve the manuscript. We sincerely hope the revised manuscript and the following point-to-point response meet your concerns.

      • Line 80 "Growth Fitness" is this growth rate?

      Yes. The sentence was revised as follows.

      (L87-88) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper).”

      • Line 94 a more nuanced understanding of r/K selection theory, allows for trade-ups between R and K, as well as trade-offs. This may explain why you did not see a trade-off between growth and carrying capacity in this study. See this paper https://doi.org/10.1038/s41396-023-01543-5. Overall, your evos lineages evolved higher growth rates and lower carrying capacity (Figures 1B, C, E). If selection was driving the evolution of higher growth rates, it may have been that there was no selective pressure to maintain high carrying capacity. This means that the evolutionary change you observed in carrying capacity may have been neutral "drift" of the carrying capacity trait, during selection for growth rate, not because of a trade-off between R and K. This is especially likely since carrying capacity declined during evolution. Unless the authors have convincing evidence for a tradeoff, I suggest they remove this claim.

      • Line 96 the authors introduce a previous result where they use colony size to measure growth rate, this finding needs to be properly introduced and explained so that we can understand the context of the conclusion.

      • Line 97 This sentence "the collapse of the trade-off law likely resulted from genome reduction." I am not sure how the authors can draw this conclusion, what is the evidence supporting that the genome size reduction causes the breakdown of the tradeoff between R and K (if there was a tradeoff)?

      Thank you for the reference information and the thoughtful comments. The recommended paper was newly cited, and the description of the trade-off collapse was deleted. Accordingly, the corresponding paragraph was rewritten as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somewhat consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome-reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection 30,31, was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32–34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      • Line 103 Genome mutations. The authors claim that there are no mutations in parallel but I see that there is a 1199 base pair deletion in eight of the nine evo strains (Table S3). I would like the author to mention this and I'm actually curious about why the authors don't consider this parallel evolution.

      Thank you for your careful reading. According to your comment, we added a brief description of the 1199-bp deletion detected in the Evos as follows.

      (L119-122) “The number of mutations largely varied among the nine Evos, from two to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      • Line 297 Please describe the media in full here - this is an important detail for the evolution experiment. Very frustrating to go to reference 13 and find another reference, but no details of the method. Looked online for the M63 growth media and the carbon source is not specified. This is critical for working out what selection pressures might have driven the genetic and transcriptional changes that you have measured. For example, the parallel genetic change in 8/9 populations is a deletion of insH and tdcD (according to Table S3). This is acetate kinase, essential for the final step in the overflow metabolism of glucose into acetate. If you have a very low glucose concentration, then it could be that there was selection to avoid fermentation and devote all the pyruvate that results from glycolysis into the TCA cycle (which is more efficient than fermentation in terms of ATP produced per pyruvate).

      Sorry for the missing information on the medium composition, which was additionally described in the Materials and Methods. The glucose concentration in M63 was 22 mM, which was supposed to be enough for bacterial growth. Thank you for your intriguing thinking about linking the medium component to the genome mutation-mediated metabolic changes. As there was no experimental result regarding the biological function of gene mutation in the present study, please allow us to address this issue in our future work.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM iron(II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      • Line 115. I do not understand this argument "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Is this a significant enrichment compared to the expectation, i.e. the number of essential genes in the genome? This enrichment needs to be tested with a Hypergeometric test or something similar.

      • Also, "As the essential genes were known to be more conserved than nonessential ones, the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." I do not think that there is enough evidence to support this claim, and it should be removed.

      Sorry for the unclear description. Yes, the mutations were significantly enriched in the essential genes (11 out of 45 genes) compared to the essential genes in the whole genome (286 out of 3290 genes). The improper description linking the mutation in essential genes to the fitness increase was removed, and an additional explanation on the ratio of essential genes was newly supplied as follows.

      (L139-143) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable.”

      • Line 124 Regarding the mutation simulations, I do not understand how the observed data were compared to the simulated data, and how conclusions were drawn. Can the authors please explain the motivation for carrying out this analysis, and clearly explain the conclusions?

      Random simulation was additionally explained in the Materials and Methods and the conclusion of the random simulation was revised in the Results, as follows.

      (L392-401) “The mutation simulation was performed with Python in the following steps. A total of 65 mutations were randomly generated on the reduced genome, and the distances from the mutated genomic locations to the nearest genomic scars caused by genome reduction were calculated. Subsequently, Welch's t-test was performed to evaluate whether the distances calculated from the random mutations were significantly longer or shorter than those calculated from the mutations that occurred in Evos. The random simulation, distance calculation, and statistical test were performed 1,000 times, which resulted in 1,000 p values. Finally, the mean of p values (μp) was calculated, and a 95% reliable region was applied. It was used to evaluate whether the 65 mutations in the Evos were significantly close to the genomic scars, i.e., the locational bias.”

      (L148-157) “Random simulation was performed to verify whether there was any bias or hotspot in the genomic location for mutation accumulation due to the genome reduction. A total of 65 mutations were randomly generated on the reduced genome (Fig. 2B), and the genomic distances from the mutations to the nearest genome reduction-mediated scars were calculated. Welch's t-test was performed to evaluate whether the genomic distances calculated from random mutations significantly differed from those from the mutations accumulated in the Evos. As the mean of p values (1,000 times of random simulations) was insignificant (Fig. 2C, μp > 0.05), the mutations fixed on the reduced genome were either closer or farther to the genomic scars, indicating there was no locational bias for mutation accumulation caused by genome reduction.”
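
      The simulation steps quoted above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names, the linear (non-circular) distance, and the fixed random seed are assumptions, and the p value uses a normal approximation to the t distribution so that the sketch needs only the standard library (scipy.stats.ttest_ind with equal_var=False would give the exact Welch p value). The scar and mutation positions must be supplied by the user; none are taken from the manuscript.

```python
import bisect
import math
import random
from statistics import NormalDist, mean, variance

GENOME_LENGTH = 3_900_000  # ~3.9 Mb reduced genome, as stated in the response


def nearest_scar_distance(pos, scars_sorted):
    """Distance from pos to the closest scar (assumption: linear, not circular)."""
    i = bisect.bisect_left(scars_sorted, pos)
    candidates = []
    if i < len(scars_sorted):
        candidates.append(scars_sorted[i] - pos)
    if i > 0:
        candidates.append(pos - scars_sorted[i - 1])
    return min(candidates)


def welch_p_value(a, b):
    """Two-sided Welch's t-test; p from a normal approximation (assumption)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def mean_p_value(observed_positions, scars, rounds=1000, seed=0):
    """Mean Welch p value over repeated rounds of random mutation placement."""
    rng = random.Random(seed)
    scars_sorted = sorted(scars)
    observed = [nearest_scar_distance(p, scars_sorted) for p in observed_positions]
    p_values = []
    for _ in range(rounds):
        # Place as many random mutations as were observed (65 in the Evos).
        simulated = [
            nearest_scar_distance(rng.randrange(GENOME_LENGTH), scars_sorted)
            for _ in range(len(observed_positions))
        ]
        p_values.append(welch_p_value(simulated, observed))
    return mean(p_values)
```

      A mean p value above 0.05 would be read, as in the quoted text, as no detectable locational bias of the observed mutations relative to the scars.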

      • Line 140 The authors should give some background here - explain the idea underlying chromosomal periodicity of the transcriptome, to help the reader understand this analysis.

      • Line 142 Here and elsewhere, when referring to a method, do not just give the citation, but also refer to the methods section or relevant supplementary material.

      The analytical process (references and methods) was described in the Materials and Methods, and the reason we performed the chromosomal periodicity was added in the Results as follows.

      (L165-172) “As the E. coli chromosome was structured, whether the genome reduction caused the changes in its architecture, which led to the differentiated transcriptome reorganization in the Evos, was investigated. The chromosomal periodicity of gene expression was analyzed to determine the structural feature of genome-wide pattern, as previously described 28,38. The analytical results showed that the transcriptomes of all Evos presented a common six-period with statistical significance, equivalent to those of the wild-type and ancestral reduced genomes (Fig. 3A, Table S4).”

      • Line 151 "The expression levels of the mutated genes were higher than those of the remaining genes (Figure 3B)"- did this depend on the type of mutation? There were quite a few early stops in genes, were these also more likely to be expressed? And how about the transcriptional regulators, can you see evidence of their downstream impact?

      Sorry, we didn't investigate the detailed regulatory mechanisms of the 49 mutated genes, which we considered beyond the scope of the present study. Fig. 3B shows the statistical comparison between 3225 and 49 genes. It does not mean that every mutated gene was expressed at a higher level than the others. The following sentences were added to address your concern.

      (L181-185) “As the regulatory mechanisms or the gene functions were supposed to be disturbed by the mutations, the expression levels of individual genes might have been either up- or down-regulated. Nevertheless, the overall expression levels of all mutated genes tended to be increased. One of the reasons was assumed to be the mutation essentiality, which remained to be experimentally verified.”

      • Line 199 onward. The authors used WGCNA to analyze the gene expression data of evolved organisms. They identified distinct gene modules in the reduced genome, and through further analysis, they found that specific modules were strongly associated with key biological traits like growth fitness, gene expression changes, and mutation rates. Did the authors expect that there was variation in mutation rate across their populations? Is variation from 3-16 mutations that they observed beyond the expectation for the wt mutation rate? The genetic causes of mutation rate variation are well understood, but I could not see any dinB, mutT,Y, rad, or pol genes among the discovered mutations. I would like the authors to justify the claim that there was mutation rate variation in the evolved populations.

      Thank you for the insightful comment. We don't think the mutation rates varied significantly across the nine populations, as no mutation occurred in the MMR genes, as you noticed. Our previous study showed that the spontaneous mutation rate of the reduced genome was higher than that of the wild-type genome (Nishimura et al., 2017, mBio). As nonsynonymous mutations were not detected in all nine Evos, the spontaneous mutation rate couldn't be calculated (because it should be evaluated according to the ratio of nonsynonymous and synonymous single-nucleotide substitutions in molecular evolution). Therefore, the mutation rate could not be discussed in the present study. The following sentence was added for a better understanding of the gene modules.

      (L242-245) “These modules M2, M10 and M16 might be considered as the hotspots for the genes responsible for growth fitness, transcriptional reorganization, and mutation accumulation of the reduced genome in evolution, respectively.”

      • Line 254 I get the idea of all roads leading to Rome, which is very fitting. However, describing the various evolutionary strategies and homeostatic and variable consequence does not sound correct - although I am not sure exactly what is meant here. Looking at Figure 7, I will call strategy I "parallel evolution", that is following the same or similar genetic pathways to adaptation and strategy ii I would call divergent evolution. I am not sure what strategy iii is. I don't want the authors to use the terms parallel and divergent if that's not what they mean. My request here would be that the authors clearly describe these strategies, but then show how their results fit in with the results, and if possible, fit with the naming conventions, of evolutionary biology.

      Thank you for your kind consideration and excellent suggestion. It's our pleasure to adopt your idea in our study. The evolutionary strategies were renamed according to your recommendation. Both the main text and Fig. 7 were revised as follows.

      (L285-293) “Common mutations22,44 or identical genetic functions45 were reported in the experimental evolution with different reduced genomes, commonly known as parallel evolution (Fig. 7, i). In addition, as not all mutations contribute to the evolved fitness 22,45, another strategy for varied phenotypes was known as divergent evolution (Fig. 7, ii). The present study accentuated the variety of mutations fixed during evolution. Considering the high essentiality of the mutated genes (Table S3), most or all mutations were assumed to benefit the fitness increase, partially demonstrated previously 20. Nevertheless, the evolved transcriptomes presented a homeostatic architecture, revealing the divergent to convergent evolutionary strategy (Fig. 7, iii).”

      Author response image 1.

      • Line 327 Growth rates/fitness. I don't think this should be called growth fitness- a rate is being calculated. I would like the authors to explain how the times were chosen - do the three points have to be during the log phase? Can you also explain what you mean by choosing three ri that have the largest mean and minor variance?

      Sorry for the confusing term usage. The fitness assay was renamed the growth assay. Three consecutive ri with the largest mean and minor variance were chosen to avoid the occasional large values (blue circle), as shown in the following figure. In addition, the details of the growth analysis can be found at https://doi.org/10.3791/56197 (ref. 59), where the video of experimental manipulation, protocol, and data analysis is deposited. The following sentence was added accordingly.

      Author response image 2.

      (L369-371) “The growth rate was determined as the average of three consecutive ri, showing the largest mean and minor variance to avoid the unreliable calculation caused by the occasionally occurring values. The details of the experimental and analytical processes can be found at https://doi.org/10.3791/56197.”
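
      The selection rule quoted above can be illustrated with a short sketch. Two points are assumptions rather than statements from the manuscript: ri is taken as the log-ratio of consecutive OD600 readings divided by the time interval, and "largest mean and minor variance" is scored as mean minus standard deviation. The function names are hypothetical; the published protocol at the DOI above is the authoritative description.

```python
import math
from statistics import mean, stdev


def interval_rates(times_h, od600):
    """Growth rate per reading interval (assumed form): ln(OD[i+1]/OD[i]) / dt."""
    return [
        math.log(od600[i + 1] / od600[i]) / (times_h[i + 1] - times_h[i])
        for i in range(len(od600) - 1)
    ]


def growth_rate(times_h, od600):
    """Average of the three consecutive r_i with a large mean and a small spread.

    The score (mean minus standard deviation) is a hypothetical way to combine
    the two criteria; it penalises windows containing an occasional outlier.
    """
    rates = interval_rates(times_h, od600)
    windows = [rates[i:i + 3] for i in range(len(rates) - 2)]
    best = max(windows, key=lambda w: mean(w) - stdev(w))
    return mean(best)
```

      With readings every 15 minutes, times_h would be multiples of 0.25 h, and at least four OD600 readings are needed to form one window of three rates.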

      • Line 403 Chromosomal periodicity analysis. The windows chosen for smoothing (100kb) seem big. Large windows make sense for some things - for example looking at how transcription relates to DNA replication timing, which is a whole-genome scale trend. However, here the authors are looking for the differences after evolution, which will be local trends dependent on specific genes and transcription factors. 100kb of the genome would carry on the order of one hundred genes and might be too coarse-grained to see differences between evos lineages.

      Thank you for the advice. We agree that the present analysis focused on the global trend of gene expression and that varying the window size may lead to different patterns. Additional analysis was performed according to your comment. The results showed that changes in window size (1, 10, 50, 100, and 200 kb) didn't alter the periodicity of the reduced genome, which agreed with a previous study reporting conserved periodicity in a different reduced genome, MDS42 (Ying et al., 2013, BMC Genomics). The following sentence was added in the Materials and Methods.

      (L460-461) “Note that altering the moving average did not change the max peak.”
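
      The robustness check described above can be mimicked on synthetic data. The sketch below is only an illustration under stated assumptions: the expression profile is simulated (a six-period sine plus seeded noise over 1-kb bins), the smoothing is a circular moving average, and the dominant period count is read from a plain discrete Fourier transform, which stands in for the authors' periodicity analysis, whose details are not given in this response. All function names are hypothetical.

```python
import cmath
import math
import random


def synthetic_expression(n_bins=3900, periods=6, noise=0.5, seed=0):
    """Simulated per-bin expression with a fixed number of periods (not real data)."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * periods * i / n_bins) + rng.gauss(0.0, noise)
        for i in range(n_bins)
    ]


def circular_moving_average(values, window):
    """Moving average that wraps around the ends, as on a circular chromosome."""
    n = len(values)
    half = window // 2
    return [
        sum(values[(i - half + k) % n] for k in range(window)) / window
        for i in range(n)
    ]


def dominant_period_count(values, max_k=20):
    """Number of periods per genome with the largest DFT magnitude (k = 1..max_k)."""
    n = len(values)
    avg = sum(values) / n
    best_k, best_power = 0, -1.0
    for k in range(1, max_k + 1):
        coeff = sum(
            (v - avg) * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(values)
        )
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    return best_k
```

      Applying dominant_period_count to circular_moving_average(profile, w) for w in (1, 10, 50, 100, 200) bins shows whether the peak position depends on the smoothing window in this toy setting.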

      • Figures - the figures look great. Figure 7 needs a legend.

      Thank you. The following legend was added.

      (L774-777) “Three evolutionary strategies are proposed. Pink and blue arrowed lines indicate experimental evolution and genome reduction, respectively. The size of the open cycles represents the genome size. Black and grey indicate the ancestor and evolved genomes, respectively.”

      Response to Reviewer #2:

      Thank you for reviewing our manuscript and for your fruitful comments. We agree that our study leaned towards elaborating observed findings rather than explaining the detailed biological mechanisms. We focused on the genome-wide biological features rather than the specific biological functions. The underlying mechanisms indeed remained unknown, leaving the questions as you commented. We didn't perform the fitness assay on reconstituted (single and combinatorial) mutants because the research purpose was not to clarify the regulatory or metabolic mechanisms. This is why the RNA-Seq analysis provided the findings on genome-wide patterns and chromosomal view, which were supposed to be biologically valuable. We understand your concern that the conclusions lacked biological meaning, as ALE studies commonly aim to identify a specific gene regulation or an improved pathway, which was not the aim of the present study.

      For this reason, our revision may not address all these concerns. Considering your comments, we tried our best to revise the manuscript. The changes made were highlighted. We sincerely hope the revision and the following point-to-point response are acceptable.

      Major remarks:

      (1) The authors outlined the significance of ALE in genome-reduced organisms and important findings from published literature throughout the Introduction section. The description in L65-69, which I believe pertains to the motivation of this study, seems vague and insufficient to convey the novelty or necessity of this study i.e. it is difficult to grasp what aspects of genome-reduced biology that this manuscript intends to focus/find/address.

      Sorry for the unclear writing. The sentences were rewritten for clarity as follows.

      (L64-70) “Although the reduced growth rate caused by genome reduction could be recovered by experimental evolution, it remains unclear whether such an evolutionary improvement in growth fitness was a general feature of the reduced genome and how the genome-wide changes occurred to match the growth fitness increase. In the present study, we performed the experimental evolution with a reduced genome in multiple lineages and analyzed the evolutionary changes of the genome and transcriptome.”

      (2) What is the rationale behind the lineage selection described in Figure S1 legend "Only one of the four overnight cultures in the exponential growth phase (OD600 = 0.01~0.1) was chosen for the following serial transfer, highlighted in red."?

      The four wells (cultures of different initial cell concentrations) were measured every day, and only the well that showed OD600=0.01~0.1 (red) was transferred with four different dilution rates (e.g., 10, 100, 1000, and 10000 dilution rates). This resulted in four wells of different initial cell concentrations. Multiple dilutions ensured that at least one of the wells would show an OD600 within the range of 0.01 to 0.1 after the overnight culture. That well was then used for the next serial transfer. Fig. S1 provides the details of the experimental records. The experimental evolution was strictly controlled within the exponential phase, quite different from the commonly conducted ALE that transfers a single culture at a fixed dilution rate. Serial transfer with multiple dilution rates was previously applied in our evolution experiments and well described in Nishimura et al., 2017, mBio; Lu et al., 2022, Comm Biol; Kurokawa et al., 2022, Front Microbiol, etc. The following sentence was added in the Materials and Methods.

      (L344-345) “Multiple dilutions changing in order promised at least one of the wells within the exponential growth phase after the overnight culture.”

      (3) The measured growth rate of the end-point 'F2 lineage' shown in Figure S2 seemed comparable to the rest of the lineages (A1 to H2), but the growth rate of 'F2' illustrated in Figure 1B indicates otherwise (L83-84). What is the reason for the incongruence between the two datasets?

      Sorry for the unclear description. The growth rates shown in Fig. S2 were obtained during the evolution experiment using the daily transfer's initial and final OD600 values. The growth rates shown in Fig. 1B were obtained from the final population (Evos) growth assay and calculated from the growth curves (biological replication, N=4). Fig. 1B shows the precisely evaluated growth rates, and Fig. S2 shows the evolutionary changes in growth rates. Accordingly, the following sentence was added to the Results.

      (L84-87) “As the growth increases were calculated according to the initial and final records, the exponential growth rates of the ancestor and evolved populations were obtained according to the growth curves for a precise evaluation of the evolutionary changes in growth.”

      (4) Are the differences in growth rate statistically significant in Figure 1B?

      Eight out of nine Evos were significant, except F2. The sentences were rewritten and associated with the revised Fig. 1B, indicating significance.

      (L87-90) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper). However, the magnitudes of growth improvement were considerably varied, and the evolutionary dynamics of the nine lineages were somehow divergent (Fig. S2).”

      (5) The evolved lineages showed a decrease in their maximal optical densities (OD600) compared to the ancestral strain (L85-86). ALE could accompany changes in cell size and morphologies, (doi: 10.1038/s41586-023-06288-x; 10.1128/AEM.01120-17), which may render OD600 relatively inaccurate for cell density comparison. I suggest using CFU/mL metrics for the sake of a fair comparison between Anc and Evo.

      The methods evaluating the carrying capacity (i.e., cell density, population size, etc.) do not change the results. CFU counting is also biased against living cells that cannot form colonies and is likewise affected by changes in cell size. Optical density (OD600) provides us with the temporal changes of cell growth at 15-minute intervals, which results in an exact evaluation of the growth rate in the exponential phase. CFU is poor at recording the temporal changes of the population, which tends to result in an inappropriate growth rate. Taken together, we believe that our method was reasonable and reliable. We hope you can accept this alternative approach.

      (6) Please provide evidence in support of the statement in L115-119. i.e. statistical analysis supporting that the observed ratio of essential genes in the mutant pool is not random.

      The statistical test was performed, and the following sentence was added.

      (L139-141) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008).”
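
      A contingency-table test of this kind can be sketched as follows. The layout of the 2x2 table is an assumption, since the response does not state how the table was built: here the 49 mutated genes (11 essential, 38 non-essential) are compared with the 3290 total genes (286 essential, 3004 non-essential). The sketch computes a Pearson chi-square without continuity correction and is not expected to reproduce the reported p value exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). With one degree of freedom, the chi-square survival
    function equals erfc(sqrt(chi2 / 2)), so no external library is needed.
    """
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Counts from the response; the table layout itself is an assumption.
chi2, p = chi_square_2x2(11, 49 - 11, 286, 3290 - 286)
```

      Whether the mutated genes are treated as a separate group (as here) or as a subset of the genome changes the table and therefore the p value, so the choice should be stated alongside the result.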

      (7) The assumption that "mutation abundance would correlate to fitness improvement" described in L120-122: "The large variety in genome mutations and no correlation of mutation abundance to fitness improvement strongly suggested that no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome" is not easy to digest, in the sense that (i) the effect of multiple beneficial mutations are not necessarily summative, but are riddled with various epistatic interactions (doi: 10.1016/j.mec.2023.e00227); (ii) neutral hitchhikers are of common presence (you could easily find reference on this one); (iii) hypermutators that accumulate greater number of mutations in a given time are not always the eventual winners in competition games (doi: 10.1126/science.1056421). In this sense, the notion that "mutation abundance correlates to fitness improvement" in L120-122 seems flawed (for your perusal, doi: 10.1186/gb-2009-10-10-r118).

      Sorry for the improper description and confusing writing, and thank you for the fruitful knowledge on molecular evolution. The sentence was deleted, and the following one was added.

      (L145-146) “Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (8) Could it be possible that the large variation in genome mutations in independent lineages results from a highly rugged fitness landscape characterized by multiple fitness optima (doi: 10.1073/pnas.1507916112)? If this is the case, I disagree with the notion in L121-122 "that no mutations were specifically responsible or crucially essential" It does seem to me that, for example, the mutations in evo A2 are specifically responsible and essential for the fitness improvement of evo A2 in the evolutionary condition (M63 medium). Fitness assessment of individual (or combinatorial) mutants reconstituted in the Ancestral background would be a bonus.

      Thank you for the intriguing thinking. The sentence was deleted. Please allow us to adapt your comment to the manuscript as follows.

      (L143-145) “The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38.”

      (9) L121-122: "...no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome". Strictly speaking, the authors should provide a reference case of wild-type E. coli ALE in order to reach definitive conclusions that the observed mutation events are exclusive to the genome-reduced strain. It is strongly recommended that the authors perform comparative analysis with an ALEed non-genome-reduced control for a more definitive characterization of the evolutionary biology in a genome-reduced organism, as it was done for "JCVI-syn3.0B vs non-minimal M. mycoides" (doi: 10.1038/s41586-023-06288-x) and "E. coli eMS57 vs MG1655" (doi: 10.1038/s41467-019-08888-6).

      The improper description was deleted in response to comments 7 and 8. The mentioned references were cited in the manuscript (refs 21 and 23). Thank you for the experimental advice. We are sorry that the comparison of wild-type and reduced genomes was not in the scope of the present study and will probably be reported soon in our future work.

      (10) L146-148: "The homeostatic periodicity was consistent with our previous findings that the chromosomal periodicity of the transcriptome was independent of genomic or environmental variation" A Previous study also suggested that the amplitudes of the periodic transcriptomes were significantly correlated with the growth rates (doi: 10.1093/dnares/dsaa018). Growth rates of 8/9 Evos were higher compared to Anc, while that of Evo F2 remained similar. Please comment on the changes in amplitudes of the periodic transcriptomes between Anc and each Evo.

      Thank you for the suggestion. The correlation between the growth rates and the amplitudes of chromosomal periodicity was statistically insignificant (p>0.05). It might be a result of the limited data points. Compared with the only nine data points in the present study, the previous study analyzed hundreds of transcriptomes associated with the corresponding growth rates, which are suitable for statistical evaluation. In addition, the changes in growth rates were larger in the previous study than in the present study, which might influence the significance. This is why we did not discuss the periodic amplitude.

      (11) Please elaborate on L159-161: "It strongly suggested the essentiality mutation for homeostatic transcriptome architecture happened in the reduced genome.".

      Sorry for the improper description. The sentence was rewritten as follows.

      (L191-193) “The essentiality of the mutations might have participated in maintaining the homeostatic transcriptome architecture of the reduced genome.”

      (12) Is FPKM a valid metric for between-sample comparison? The growing consensus in the community adopts Transcripts Per Kilobase Million (TPM) for comparing gene expression levels between different samples (Figure 3B; L372-379).

      Sorry for the unclear description. The FPKM indicated here was globally normalized, statistically equivalent to TPM. The following sentence was added to the Materials and Methods.

      (L421-422) “The resulting normalized FPKM values were statistically equivalent to TPM.”
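
      The relation between the two units can be made explicit with a short sketch. It relies on the standard identity that TPM is FPKM rescaled so that each sample sums to one million; whether this is the exact global normalization used in the manuscript is an assumption, and the gene names in the usage note are hypothetical.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to 1e6 (TPM)."""
    total = sum(fpkm.values())
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}
```

      For example, fpkm_to_tpm({"geneA": 10.0, "geneB": 30.0}) keeps the 1:3 ratio between the two genes while fixing the per-sample total, which is what makes values comparable between samples.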

      (13) Please provide % mapped frequency of mutations in Table S3.

      They were all 100%. The partially fixed mutations were excluded in the present study. The following sentence was added to the caption of Table S3.

      (Supplementary file, p 9) “Note that the entire population held the mutations, i.e., 100% frequency in DNA sequencing.”

      (14) To my knowledge, M63 medium contains glucose and glycerol as carbon sources. The manuscript would benefit from discussing the elements that impose selection pressure in the M63 culture condition.

      Sorry for the missing information on M63, which contains 22 mM glucose as the only carbon source. The medium composition was added in the Materials and Methods, as follows.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      (15) The RNA-Seq datasets for Evo strains seemed equally heterogenous, just as their mutation profiles. However, the missing element in their analysis is the directionality of gene expression changes. I wonder what sort of biological significance can be derived from grouping expression changes based solely on DEGs, without considering the magnitude and the direction (up- and down-regulation) of changes? RNA-seq analysis in its current form seems superficial to derive biologically meaningful interpretations.

      We agree that most studies often discuss the direction of transcriptional changes. The present study aimed to capture a global view of the magnitude of transcriptome reorganization. Thus, the analyses focused on the overall features, such as the abundance of DEGs, instead of the details of the changes, e.g., the up- and down-regulation of DEGs. The biological meaning of the DEGs' overview was how significantly the genome-wide gene expression fluctuated, which might be short of an in-depth view of individual gene expression. The following sentence was added to indicate the limitation of the present analysis.

      (L199-202) “Instead of an in-depth survey on the directional changes of the DEGs, the abundance and functional enrichment of DEGs were investigated to achieve an overview of how significant the genome-wide fluctuation in gene expression, which ignored the details of individual genes.”

      Minor remarks

      (1) L41: brackets italicized "(E. coli)".

      It was fixed as follows.

      (L40) “… Escherichia coli (E. coli) cells …”

      (2) Figure S1. It is suggested that the x-axis of ALE monitor be set to 'generations' or 'cumulative generations', rather than 'days'.

      Thank you for the suggestion. Fig. S1 describes the experimental procedure, so "day" was used. Fig. S2 presents the evolutionary process, so "generation" was used, as you recommended here.

      (3) I found it difficult to digest through L61-64. Although it is not within the job scope of reviewers to comment on the language style, I must point out that the manuscript would benefit from professional language editing services.

      Sorry for the unclear writing. The sentences were revised as follows.

      (L60-64) “Previous studies have identified conserved features in transcriptome reorganization, despite significant disruption to gene expression patterns resulting from either genome reduction or experimental evolution 27-29. The findings indicated that experimental evolution might reinstate growth rates that have been disrupted by genome reduction to maintain homeostasis in growing cells.”

      (4) Duplicate references (No. 21, 42).

      Sorry for the mistake. It was fixed (leaving ref. 21).

      (5) Inconsistency in L105-106: "from two to 13".

      "From two to 13" was adopted from the language editing. It was changed as follows.

      (L119) “… from 2 to 13, …”

      Response to Reviewer #3:

      Thank you for reviewing our manuscript and for the helpful comments, which improved the strength of the manuscript. The recommended statistical analyses that directly supported the statements in the manuscript were performed, and those that would constitute new results within the scope of further studies were not conducted. The changes made in the revision were highlighted. We sincerely hope the revised manuscript and the following point-to-point response meet your concerns. You will find all your suggested statistical tests in our future work, which reports an extensive study on the experimental evolution of an assortment of reduced genomes.

      (1) Line 106 - "As 36 out of 45 SNPs were nonsynonymous, the mutated genes might benefit the fitness increase." This argument can be strengthened. For example, the null expectation of nonsynonymous SNPs should be discussed. Is the number of observed nonsynonymous SNPs significantly higher than the expected one?

      (2) Line 107 - "In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase." Instead of just listing examples, a regression analysis can be added.

      Yes, it's significant. Random mutations lead to ~33% of nonsynonymous SNPs in a rough estimation. Additionally, a regression would be unreliable because there was no statistically significant correlation between the number of mutations and the magnitude of fitness increase. Accordingly, the corresponding sentences were revised with additional statistical tests.

      (L123-129) “As 36 out of 45 SNPs were nonsynonymous, which was highly significant compared to random mutations (p < 0.01), the mutated genes might benefit fitness increase. In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase. There was no significant correlation between the number of mutations and the growth rate in a statistical view (p > 0.1). Even from an individual close-up viewpoint, the abundance of mutations poorly explained the fitness increase.”

      (3) Line 114 - "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Here, the information mentioned in line 153 ("the ratio of essential to all genes (302 out of 3,290) in the reduced genome.") can be used. Then a statistical test for a contingency table can be used.

      (4) Line 117 - "the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." What is the expected number of fixed mutations in essential genes vs non-essential genes? Is the observed number statistically significantly higher?

      Sorry for the improper and insufficient information on the essential genes. Yes, it's significant. The statistical test was additionally performed. The corresponding part was revised as follows.

      (L134-146) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome. The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable. The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38. Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (5) The authors mentioned no overlapping in the single mutation level. Is that statistically significant? The authors can bring up what the no-overlap probability is given that there are in total x number of fixed mutations observed (either theory or simulation is good).

      Sorry, we are confused by this comment. It's unclear to us why it needs to be statistically simulated. Firstly, the mutations were experimentally observed. The result that no overlapped mutated genes were detected was an experimental fact, not a computational prediction. We are afraid that our finding may have been over-interpreted as an evolutionary rule, which would always require a statistical test of its reliability. We didn't conclude that the evolution had no overlapped mutations. Secondly, considering that 65 random mutations occurred in a ~3.9 Mb sequence, the statistical test would be meaningful only if overlapped mutations had been found experimentally. How often random mutations overlap across parallel evolutionary lineages as the number of lineages increases is an interesting question, but it seems to be out of the scope of the present study. We are happy to include the analysis in our ongoing study on the experimental evolution of reduced genomes.

      (6) The authors mentioned no overlapping in the single mutation level. How about at the genetic level? Some fixed mutations occur in the same coding gene. Is there any gene with a significantly enriched number of mutations?

      No mutations were fixed in the same gene of biological function, as shown in Table S3. If coding regions are considered, the only exception is the IS sequences, well known as transposable sequences without genetic function. The following description was added.

      (L119-122) “The number of mutations largely varied among the nine Evos, from 2 to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      (7) Line 151-156- It seems like the authors argue that the expression level differences can be just explained by the percentage of essential genes that get fixed mutations. One further step for the argument could be to compare the expression level of essential genes with vs without fixed mutations. Also, the authors can compare the expression level of non-essential genes with vs without fixed mutations. And the authors can report whether the differences in expression level became insignificant after the control of the essentiality.

      It's our pleasure that the essentiality intrigued you. Thank you for the analytical suggestion, which is exciting and valuable for our studies. As only 11 essential genes were detected here and "Mutation in essentiality" was an indication but not the conclusion of the present study, we would like to apply the recommended analysis to the datasets of our ongoing study to demonstrate this statement. Thank you again for your fruitful analytical advice.

      (8) Line 169- "The number of DEGs partially overlapped among the Evos declined significantly along with the increased lineages of Evos (Figure 4B). " There is a lack of statistical significance here while the word "significantly" is used. One statistical test that can be done is to use re-sampling/simulation to generate a null expectation of the overlapping numbers given the DEGs for each Evo line and the total number of genes in the genome. The observed number can then be compared to the distribution of the simulated numbers.

      Sorry for the inappropriate usage of the term. Whether it's statistically significant didn't matter here. The word "significant" was deleted as follows.

      (L205--206) “The number of DEGs partially overlapped among the Evos declined along with the increased lineages of Evos (Fig. 4B).”
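
      The re-sampling test proposed by the reviewer could be sketched roughly as follows. The sketch assumes that each lineage's DEGs are drawn uniformly at random from the genome, which is a simplification; `set_sizes`, `n_genes`, and both function names are hypothetical inputs and labels, not values or methods taken from the manuscript.

```python
import random


def simulate_common_counts(set_sizes, n_genes, n_trials=2000, seed=0):
    """Null distribution of the number of genes shared by ALL randomly drawn gene sets."""
    rng = random.Random(seed)
    universe = range(n_genes)
    counts = []
    for _ in range(n_trials):
        common = None
        for size in set_sizes:
            # Draw one random gene set per lineage, matching its observed DEG count.
            drawn = set(rng.sample(universe, size))
            common = drawn if common is None else common & drawn
        counts.append(len(common))
    return counts


def empirical_p_at_least(observed, null_counts):
    """One-sided empirical p-value for seeing at least `observed` shared genes."""
    exceed = sum(1 for c in null_counts if c >= observed)
    # The +1 correction keeps the estimate away from an impossible p-value of zero.
    return (exceed + 1) / (len(null_counts) + 1)
```

      The observed number of common DEGs would then be passed to `empirical_p_at_least` together with the simulated counts; repeating this for increasing numbers of lineages gives a null expectation for the trend in Fig. 4B.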

      (9) Line 177-179- "In comparison, 1,226 DEGs were induced by genome reduction. The common DEGs of genome reduction and evolution varied from 168 to 540, fewer than half of the DEGs responsible for genome reduction in all Evos" Is the overlapping number significantly lower than the expectation? The hypergeometric test can be used for testing the overlap between two gene sets.

      There is no expectation for how many DEGs would be reasonable. Not every experimentally obtained number needs to be tested for statistical significance; such testing is more commonly essential in computational and data science.
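
      For reference, the hypergeometric test mentioned by the reviewer can be written with the standard library alone. In this sketch, `N` is the number of genes in the genome, `K` the size of the first gene set, `n` the size of the second, and `k` the observed overlap; the function names are illustrative, and no value used here other than the 1,226 genome-reduction DEGs appears in the text above.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(overlap == k) when n genes are drawn from N, of which K belong to the other set."""
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0  # outside the support
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tails(k, N, K, n):
    """Return (P(overlap <= k), P(overlap >= k)) for depletion and enrichment tests."""
    lo = max(0, n - (N - K))
    hi = min(K, n)
    lower = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    upper = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    return lower, upper
```

      With `K = 1226` and an observed overlap `k` for one Evo, the lower tail would address whether the overlap is smaller than expected by chance; the expected overlap under the null is `n * K / N`.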

      (10) The authors should give more information about the ancestral line used at the beginning of experimental evolution. I guess it is one of the KHK collection lines, but I can not find more details. There are many genome-reduced lines. Why is this certain one picked?

      Sorry for the insufficient information on the reduced genome used for the experimental evolution. The following descriptions were added in the Results and the Materials and Methods, respectively.

      (L75-79) “The E. coli strain carrying a reduced genome, derived from the wild-type genome W3110, showed a significant decline in its growth rate in the minimal medium compared to the wild-type strain 13. To improve the genome reduction-mediated decreased growth rate, the serial transfer of the genome-reduced strain was performed with multiple dilution rates to keep the bacterial growth within the exponential phase (Fig. S1), as described 17,20.”

      (L331-334) “The reduced genome has been constructed by multiple deletions of large genomic fragments 58, which led to an approximately 21% smaller size than its parent wild-type genome W3110.”

      (11) How was the saturated density in Figure 1 actually determined? In particular, the fitness assay of growth curves is 48h. But it seems like the experimental evolution is done for ~24 h cycles. If the Evos never experienced a situation like a stationary phase between 24-48h, and if the author reported the saturated density 48 h in Figure 1, the explanation of the lower saturated density can be just relaxation from selection and may have nothing to do with the increase of growth rate.

      Sorry for the unclear description. Yes, you are right. The evolution was performed within the exponential growth phase (keeping cell division constant), which means the Evos never experienced the stationary phase (saturation). The final evolved populations were subjected to the growth assay to obtain the entire growth curves for calculating the growth rate and the saturated density. Whether the decreased saturated density and the increased growth rate were in a trade-off relationship remained unclear. The corresponding paragraph was revised as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somehow consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection30,31 was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32 33,34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      (12) What annotation of essentiality was used in this paper? In particular, the essentiality can be different in the reduced genome background compared to the WT background.

      Sorry for the unclear definition of the essential genes. They are strictly limited to the 302 essential genes experimentally determined in the wild-type E. coli strain. Detailed information can be found at the following website: https://shigen.nig.ac.jp/ecoli/pec/genes.jsp. We agree that the essentiality could differ between the WT and reduced genomes. Identifying the essential genes in the reduced genome would be an exhaustingly large undertaking. The information on the essential genes defined in the present study was added as follows.

      (L134-139) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome.”

      (13) The fixed mutations in essential genes are probably not rarely observed in experimental evolution. For example, fixed mutations related to RNA polymerase can be frequently seen when evolving to stressful environments. I think the author can discuss this more and elaborate more on whether they think these mutations in essential genes are important in adaptation or not.

      Thank you for your careful reading and the suggestion. As you mentioned, we noticed that the mutations in RNA polymerases (rpoA, rpoB, and rpoD) were identified in three Evos. As they were not shared across all Evos, we didn't discuss the contribution of these mutations to evolution. Instead of the individual functions of the mutated essential genes, we focused on the enriched gene functions related to the transcriptome reorganization because they were the common feature observed across all Evos and linked to the whole metabolic or regulatory pathways, which are supposed to be more biologically reasonable and interpretable. The following sentence was added to clarify our thinking.

      (L268-273) “In particular, mutations in the essential genes, such as RNA polymerases (rpoA, rpoB, rpoD) identified in three Evos (Table S3), were supposed to participate in the global regulation for improved growth. Nevertheless, the considerable variation in the fixed mutations without overlaps among the nine Evos (Table 1) implied no common mutagenetic strategy for the evolutionary improvement of growth fitness.”

      (14) In experimental evolution to new environments, several previous studies also show that the long-term transcriptome changes in experimental evolution are not consistent with, or even revert, the short-term response; short-term responses were rather considered an emergency plan. They seem to echo what the authors found in this manuscript. I think the authors can refer to some of those studies more and provide a more thorough discussion of short-term vs long-term responses in evolution.

      Thank you for the advice. It's unclear to us what the short-term and long-term responses mentioned in this comment refer to. The term "Response" usually denotes the phenotypic or transcriptional changes within a few hours after environmental fluctuation, which are generally non-genetic (no mutation). In comparison, long-term or short-term experimental "Evolution" is associated with genetic changes (mutations). Concerning the Evolution (not the Response), the long-term experimental evolution (>10,000 generations) was performed only with the wild-type genome, and the short-term experimental evolution (500~2,000 generations) was more often conducted with both wild-type and reduced genomes, to our knowledge. Previous landmark studies have intensively discussed the comparison of the wild-type and reduced genomes. Our study was restricted to the reduced genome, which was constructed differently from the reduced genomes used in the reported studies. The experimental evolution of those reduced genomes has been performed in the presence of additives, e.g., antibiotics, alternative carbon sources, etc. That is, neither the genomic backgrounds nor the evolutionary conditions were comparable. Comparing studies with nothing in common seems unproductive. We sincerely hope the recommended topics can be applied in our future work.

      Some minor suggestions

      • Figures S3 & Table S2 need an explanation of the abbreviations of gene categories.

      Sorry for the missing information. Figure S3 and Table S3 were revised to include the names of gene categories. The figure is pasted below for quick reference.

      Author response image 3.

      • I hope the authors can re-consider the title; "Diversity for commonality" does not make much sense to me. For example, it can be simply just "Diversity and commonality."

      Thank you for the suggestion. The title was simplified as follows.

      (L1) “Experimental evolution for the recovery of growth loss due to genome reduction.”

      • It is not easy for me to locate and distinguish the RNA-seq vs DNA-seq files in DRA013662 at DDBJ. Could you make some notes on what RNA-seq actually are, vs what DNA-seq files actually are?

      Sorry for the mistakes in the DRA number of DNA-seq. DNA-seq and RNA-seq were deposited separately with the accession IDs of DRA013661 and DRA013662, respectively. The following correction was made in the revision.

      (L382-383) “The raw datasets of DNA-seq were deposited in the DDBJ Sequence Read Archive under the accession number DRA013661.”

    1. Author response:

      eLife assessment

      In this valuable study, Kumar et al., provide evidence suggesting that the p130Cas drives the formation of condensates that sprout from focal adhesions to cytoplasm and suppress translation. Pending further substantiation, this study was found to be likely to provide previously unappreciated insights into the mechanisms linking focal adhesions to the regulation of protein synthesis and was thus considered to be of broad general interest. However, the evidence supporting the proposed model was incomplete; additional evidence is warranted to substantiate the relationship between p130Cas condensates and mRNA translation and establish corresponding functional consequences.

      We thank the eLife editorial team for their positive assessment of the broad significance of our manuscript. We fully agree that the functional consequences need to be explored in more detail. We feel that many of the criticisms are valid points that are not easily addressed via available tools and thus should be considered limitations of present approaches. We hope that readers appreciate that identification of a new class of liquid-liquid phase separations calls for much more work to fully explore their characteristics, regulation and function, which will likely advance many areas of cell biology and perhaps even medicine.

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrated the phenomenon of p130Cas, a protein primarily localized at focal adhesions, and its formation of condensates. They identified the constituents within the condensates, which include other focal adhesion proteins, paxillin, and RNAs. Furthermore, they proposed a link between p130Cas condensates and translation.

      Strengths:

      Adhesion components undergo rapid exchange with the cytoplasm for some unclear biological functions. Given that p130Cas is recognized as a prominent mechanical focal adhesion component, investigating its role in condensate formation, particularly its impact on the translation process, is intriguing and significant.

      We thank the reviewer for recognizing the functional significance of the work.

      Weaknesses:

      The authors identified the disordered region of p130Cas and investigated the formation of p130Cas condensate. They attempted to demonstrate that p130Cas condensates inhibit translation, but the results did not fully support this assertion. There are several comments below:

      (1) Despite isolating p130Cas-GFP protein using GFP-trap beads, the authors cannot conclusively eliminate the possibility of isolating p130Cas from focal adhesions. While the characterization of the GFP-tagged pulls can reveal the proteins and RNAs associated with p130Cas, they need to clarify their intramolecular mechanism of localization within p130Cas droplets. Whether the protein condensates retain their liquid phase or these GFP-p130Cas pulls represent protein aggregate remains uncertain.

      We agree, the isolation from cell lysates does not distinguish between focal adhesions and cytoplasmic LLPS. We note that p130Cas in focal adhesions also appears to be in LLPS. But there are no methods available to isolate them separately. We acknowledge this is a limitation of the study.

      (2) The authors utilized hexanediol and ammonium acetate to highlight the phenomenon of p130Cas condensates. Although hexanediol is an inhibitor for hydrophobic interactions and ammonium acetate is a salt, a more thorough explanation of the intramolecular mechanisms underlying p130Cas protein-protein interaction is required. Additionally, given that the size of p130Cas condensates can exceed 100 µm², classification is needed to differentiate between p130Cas condensates and protein aggregation.

      Ammonium acetate, which works by promoting hydrophobic interactions and weak van der Waals forces, has been widely used in phase separation studies to change ionic strength without altering intracellular pH. Conversely, hexanediol weakens the hydrophobic/van der Waals interactions that commonly mediate phase separation of IDRs. In the case of p130Cas, the multiple tyrosines within the scaffolding domain are obvious targets. If the reviewer is asking us to resolve the detailed hydrophobic interactions within the scaffolding domain, this is far beyond the scope of the current paper.

      Protein aggregates are defined by their characteristics (e.g., irreversibility, departure from a spherical shape), not by size. Older, larger droplets remain circular and show slower but still measurable rates of exchange. Moreover, droplets are essentially absent after trypsinizing and replating cells. All these results argue against aggregates.

      (3) The connection between p130Cas condensates and translation inhibition appears tenuous. The data only suggests a correlation between p130Cas expression and translation inhibition. Further evidence is required to bolster this hypothesis.

      The optogenetic experiment shows that triggering LLPS by dimerizing p130Cas results in inhibition of translation. This is a causal not a correlative experiment. The reviewer may be thinking that dimerizing p130Cas could stimulate focal adhesion signaling, activating FAK or a src family kinase or other signals. However, none of these signals has been linked to inhibition of cell growth or migration. Thus, we agree that this is a limitation but consider it a low probability mechanism.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Kumar et al., report on a previously unappreciated mechanism of translational regulation whereby p130Cas induces LLPS condensates that then traffic out from focal adhesion into the cytoplasm to modulate mRNA translation. Specifically, the authors employed EGFP-tagged p130Cas constructs, endogenous p130Cas, and p130Cas knockouts and mutants in cell-based systems. These experiments in conjunction with various imaging techniques revealed that p130Cas drives assembly of LLPS condensates in a manner that is largely independent of tyrosine phosphorylation. This was followed by in vitro EGFP-tagged p130Cas-dependent induction of LLPS condensates and determination of their composition by mass spectrometry, which revealed enrichment of proteins involved in RNA metabolism in the condensates. The authors excluded the plausibility that p130Cas-containing condensates co-localize with stress granules or p-bodies. Next, the authors determined mRNA compendium of p130Cas-containing condensates which revealed that they are enriched in transcripts encoding proteins implicated in cell cycle progression, survival, and cell-cell communication. These findings were followed by the authors demonstrating that p130Cas-containing condensates may be implicated in the suppression of protein synthesis using puromycylation assay. Altogether, it was found that this study significantly advances the knowledge pertinent to the understanding of molecular underpinnings of the role of p130Cas and more broadly focal adhesions on cellular function, and to this end, it is likely that this report will be of interest to a broad range of scientists from a wide spectrum of biomedical disciplines including cell, molecular, developmental and cancer biologists.

      Strengths:

      Altogether, this study was found to be of potentially broad interest inasmuch as it delineates a hitherto unappreciated link between p130Cas, LLPS, and regulation of mRNA translation. More broadly, this report provides unique molecular insights into the previously unappreciated mechanisms of the role of focal adhesions in regulating protein synthesis. Overall, it was thought that the provided data sufficiently supported most of the authors' conclusions. It was also thought that this study incorporates an appropriate balance of imaging, cell and molecular biology, and biochemical techniques, whereby the methodology was found to be largely appropriate.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Two major weaknesses of the study were noted. The first issue is related to the experiments establishing the role of p130Cas-driven condensates in translational suppression, whereby it remained unclear whether these effects are affecting global mRNA translation or are specific to the mRNAs contained in the condensates. Moreover, some of the results in this section (e.g., experiments using cycloheximide) may be open to alternative interpretation. The second issue is the apparent lack of functional studies, and although the authors speculate that the described mechanism is likely to mediate the effects of focal adhesions on e.g., quiescence, experimental testing of this tenet was lacking.

      We appreciate the reviewer’s insights. Assessing translational inhibition for specific genes rather than global measurement of translation is an important direction for future work.

      Regarding the cycloheximide experiments, we are unsure what the reviewer means. We used it as a control for puromycin labeling but this is a very standard approach. It seems more likely that the question concerns Fig 5G, where we used it to sequester mRNAs on ribosomes to deplete from other pools. In this case, p130cas condensates decrease after 2 minutes. The reviewer may be suggesting that this effect could be due to blocked translation per se and loss of short-lived proteins. We acknowledge that this is possible but given the very rapid effect (2 min), we think it unlikely.

      Lastly, we agree with the reviewer that further functional studies in quiescence or senescence are warranted; however, these are extensive, open-ended studies and we will not be able to include them as part of the current paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the authors investigate the transcriptional landscape of tuberculous meningitis, revealing important molecular differences contributed by HIV co-infection. Whilst some of the evidence presented is compelling, the bioinformatics analysis is limited to a descriptive narrative of gene-level functional annotations, which are somewhat basic and fail to define aspects of biology very precisely. Whilst the work will be of broad interest to the infectious disease community, validation of the data is critical for future utility.

      We appreciate eLife’s positive assessment, although we challenge the conclusion that we ‘fail to define aspects of biology very precisely’. Our stated objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis, and the eLife assessment affirms we have investigated ‘the transcriptional landscape of tuberculous meningitis’. Defining aspects of the biology more precisely will require another study with a different design and methods.

      Reviewer #1 (Public Review):

      Summary:

      Tuberculous meningitis (TBM) is one of the most severe forms of extrapulmonary TB. TBM is especially prevalent in people who are immunocompromised (e.g. HIV-positive). Delays in diagnosis and treatment could lead to severe disease or mortality. In this study, the authors performed the largest-ever host whole blood transcriptomics analysis on a cohort of 606 Vietnamese participants. The results indicated that TBM mortality is associated with increased neutrophil activation and decreased T and B cell activation pathways. Furthermore, increased angiogenesis was also observed in HIV-positive patients who died from TBM, whereas activated TNF signaling and down-regulated extracellular matrix organisation were seen in the HIV-negative group. Despite similarities in transcriptional profiles between PTB and TBM compared to healthy controls, inflammatory genes were more active in HIV-positive TBM. Finally, 4 hub genes (MCEMP1, NELL2, ZNF354C, and CD4) were identified as strong predictors of death from TBM.

      Strengths:

      This is a really impressive piece of work, both in terms of the size of the cohort, which took years of effort to recruit, sample, and analyse, and also the meticulous bioinformatics performed. The biggest advantage of obtaining a whole blood signature is that it allows an easier translational development into a test that can be used in the clinic with a minimally invasive sample. Furthermore, the data from this study has also revealed important insights into the mechanisms associated with mortality and the differences in pathogenesis between HIV-positive and HIV-negative patients, which would have diagnostic and therapeutic implications.

      Weaknesses:

      The data on blood neutrophil count is really intriguing and seems to provide a very powerful yet easy-to-measure method to differentiate survival vs. death in TBM patients. It would be quite useful in this case to perform predictive analysis to see if neutrophil count alone, or in combination with gene signature, can predict (or better predict) mortality, as it would be far easier for clinical implementation than the RNA-based method. Moreover, genes associated with increased neutrophil activation and decreased T cell activation both have significantly higher enrichment scores in TBM (Figure 9) and in mortality (Figure 8). While I understand the basis of selecting hub genes in the significant modules, they often do not represent these biological pathways (at least not directly associated in most cases). If genes were selected based on these biologically relevant pathways, would they have better predictive values?

      We conducted a sensitivity analysis including blood neutrophil as a potential predictor in the multivariate Cox elastic-net regression model for important predictor selection (Table S14). In this analysis, all six selected important predictors (genes and clinical risk factors) identified in the original analysis (Table S13) were also selected, together with blood neutrophil number. Additionally, we evaluated the predictive value of blood neutrophil alone, which demonstrated poor performance, with an optimism-corrected AUC of 0.63 for all TBM, 0.67 for HIV-negative TBM, and 0.70 for HIV-positive TBM. Even when combined with identified gene signatures, blood neutrophil did not improve the overall performance of predictive model (optimism-corrected AUC of 0.79 for all TBM, 0.76 for HIV-negative TBM, and 0.80 for HIV-positive). These results indicate that identified hub genes exhibit better predictive values compared to blood neutrophil alone or in combination. These findings have been incorporated into our manuscript results.
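
      As background for the optimism-corrected AUC values quoted above, the general bootstrap idea can be sketched as follows. This is a deliberately simplified illustration for a binary outcome, not the Cox elastic-net procedure used in the study; `fit` is a hypothetical callable that takes training data and returns a scoring function, and both function names are illustrative.

```python
import random


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_auc(X, y, fit, n_boot=200, seed=0):
    """Apparent AUC minus the mean bootstrap optimism (simplified binary-outcome sketch)."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = auc([model(x) for x in X], y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that contain a single class
        boot_model = fit(Xb, yb)
        auc_boot = auc([boot_model(x) for x in Xb], yb)  # performance on the resample
        auc_orig = auc([boot_model(x) for x in X], y)    # same model on the original data
        optimism.append(auc_boot - auc_orig)
    if not optimism:
        return apparent
    return apparent - sum(optimism) / len(optimism)
```

      The correction subtracts how much better a model refitted on each resample looks on its own training data than on the original data, which is why the corrected value is usually lower than the apparent AUC.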

      To test whether pathway representative genes have better predictive values than hub genes, we included all these genes in the analysis for important predictor selection. Pathway representative genes comprised ANXA3 and CXCR2 representing neutrophil activation and IL1b representing acute inflammatory response. We observed that all hub genes (MCEMP1, NELL2, ZNF354C, and CD4) consistently emerged as the most important genes with the highest selection in the models, compared to the rest, in both the HIV-negative TBM and HIV-positive TBM cohorts. Additionally, these identified hub genes were still selected when testing together with other hub genes representing relevant biological pathways associated with TBM mortality, such as CYSTM1 involved in neutrophil activation, TRAF5 involved in NF-kappa B signaling pathway, CD28 and TESPA1 involved in T cell receptor signaling. These results show that selected genes based on known biologically relevant pathways did not give better predictive values than the identified hub genes in the significant modules.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the analysis of blood transcriptomic data from patients with TB meningitis, with and without HIV infection, with some comparison to those of patients with pulmonary tuberculosis and healthy volunteers. The objectives were to describe the comparative biological differences represented by the blood transcriptome in TBM associated with HIV co-infection or survival/mortality outcomes and to identify a blood transcriptional signature to predict these outcomes. The authors report an association between mortality and increased levels of acute inflammation and neutrophil activation, but decreased levels of adaptive immunity and T/B cell activation. They propose a 4-gene prognostic signature to predict mortality.

      Strengths:

      Biological evaluations of blood transcriptomes in TB meningitis and their relationship to outcomes have not been extensively reported previously.

      The size of the data set is a major strength and is likely to be used extensively for secondary analyses in this field of research.

      Weaknesses:

      The bioinformatic analysis is limited to a descriptive narrative of gene-level functional annotations curated in GO and KEGG databases. This analysis cannot be used to make causal inferences. In addition, the functional annotations are limited to 'high-level' terms that fail to define biology very precisely. At best, they require independent validation for a given context. As a result, the conclusions are not adequately substantiated. The identification of a prognostic blood transcriptomic signature uses an unusual discovery approach that leverages weighted gene network analysis that underpins the bioinformatic analyses. However, the main problem is that authors seem to use all the data for discovery and do not undertake any true external validation of their gene signature. As a result, the proposed gene signature is likely to be overfitted to these data and not generalisable. Even this does not achieve significantly better prognostic discrimination than the existing clinical scoring.

      As explained in response to the eLife assessment, our objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis. We agree that ‘This analysis cannot be used to make causal inferences’: that would require a different study design and approaches. The proposed gene signature has higher AUC values than the existing clinical model alone or in combination with clinical risk factors (Table 4). We agree that independent validation of the gene signature will be a crucial next step for future utility. We have performed qPCR in another sample set and have added these results in the revision (Table 4 and supplementary figure S8).

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional comments most of which are relatively minor:

      (1) Can the authors please clarify if all the PTB cases are also HIV-negative?

      This has been added to the methods section.

      (2) For Table 1, can the authors please list the total number of patients with microbiologically confirmed TB regardless of the methods used? And for the two TBM groups, was the positive microbiology based on CSF findings?

      The total number of patients with microbiologically confirmed TB is presented in Table 2 under the definite TBM group, defined as TB microbiologically confirmed by microscopy, culture, or Xpert testing of cerebrospinal fluid (CSF) samples. We have updated the note in Table 2 to provide clarity on the definition.

      (3) How was the discovery and validation set selected? Was it based on randomisation?

      We randomly split the TBM data into two datasets, a discovery cohort (n=142) and a validation cohort (n=139), to ensure reproducibility of the data analysis. We described this in the methods section.
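
      A random split of this kind could look like the sketch below. It assumes a simple unstratified shuffle with a fixed seed; whether the authors stratified by HIV status or outcome is not stated here, and `split_cohort` and the integer sample identifiers are hypothetical.

```python
import random


def split_cohort(sample_ids, n_discovery, seed=0):
    """Shuffle sample IDs reproducibly and split them into discovery and validation sets."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_discovery], shuffled[n_discovery:]


# 281 placeholder IDs reproduce the cohort sizes reported above (142 and 139).
discovery, validation = split_cohort(range(281), 142)
```

      Fixing the seed makes the allocation reproducible, so the same samples land in the same cohort each time the analysis is rerun.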

      (4) Line 107 can be better clarified by stating that the overall 3-month mortality rate is 21.7% for TBM regardless of HIV status.

      Thank you, we have restated this sentence in the results section.

      (5) The authors stated that samples were collected at enrolment when patients would have received less than 6 days of anti-tubercular treatment. Is there information on the median and IQR on the number of days that the patients would have received Rx, especially between the groups? Did the authors control for this variable when analysing for DEGs?

      One of the criteria for enrolling participants in the LAST-ACT and ACT-HIV trials is that they must have received fewer than 6 consecutive days of two or more drugs active against M. tuberculosis. However, the number of days of treatment that patients had received before enrollment was not recorded, so we could not control for this variable in the differential expression analysis. This has been clarified further in the methods section: ‘The samples were taken at enrollment, when patients could not have received more than 6 consecutive days of two or more drugs active against M. tuberculosis.’

      (6) I am a little bit concerned with the reads mapping accuracy (57%) to the human genome, which is fairly low. Did the authors investigate the reasons behind this low accuracy?

      Thank you. It was indeed a typo. We have corrected it in the results section.

      (7) On Tables S2-S4, can the authors please clarify what the last column (labelled as "B") shows?

      Tables S2-S4 have now been renumbered as S3-S5. We have updated the legends of these tables to clarify the meaning of the last column.

      Reviewer #2 (Recommendations For The Authors):

      If the authors wish to revise their manuscript, I suggest the following amendments:

      (1) Provide a consort diagram for the selection of samples included in the present analysis (from parent study cohorts), allocation to test and validation splits for bioinformatics analysis, and outcomes.

      We have provided our consort diagram in supplementary Figure S10.

      (2) Provide details of inclusion criteria for pulmonary TB cohort, and how samples from this cohort were selected for inclusion in the present analysis. Please clarify whether this cohort excluded HIV-positive participants by design or by chance.

      The inclusion criteria for the pulmonary TB cohort were described in the methods section. Due to the very low prevalence of HIV in this prospective observational study, HIV-positive participants were excluded. We have clarified in the amended manuscript that the pulmonary TB cohort only included HIV-negative participants.

      (3) Baseline characteristics of HIV-positive participants (Table 1) should include CD4 count, HIV viral load, and whether anti-retroviral therapy was naïve or experienced.

      We have included pre-treatment CD4 cell count, information on anti-retroviral therapy, and HIV viral load data in Table 1, and have described this information in the results section.

      (4) I note that the TBM samples were derived from RCTs of adjunctive steroid therapy, but not stratified in the present analysis by treatment arm allocation. Clearly, this may affect the survival/mortality outcomes that are the central focus of this manuscript. Therefore, they should be included in the models for differential gene expression analysis and prognostic signature discovery. To do so, the authors may need to wait until they are able to unblind the trial metadata.

      With permission from the trial investigators, we were able to adjust the analyses for treatment with corticosteroids. The investigators remained blind to the allocation and we have not reported any direct effects of corticosteroids on outcome; such an analysis can only be done once the LAST-ACT trial has been reported (which won’t be until the end of 2024). Treatment outcome and effect were blinded by extracting only the fold-change difference between survival and death from the linear regression model, in which gene expression was the outcome and survival and treatment were covariates.
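
      The adjustment described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the authors' pipeline: it uses plain ordinary least squares on invented, noise-free values for a single gene, and the response does not say which software or model variant was used. Only the survival coefficient is read out, mirroring the statement that the treatment effect itself was not reported.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

# Invented design: (death, treatment) indicators for eight hypothetical patients.
design = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
# Invented log2 expression: baseline 5, +1.5 in deaths, +0.8 with treatment.
expression = [5.0 + 1.5 * d + 0.8 * t for d, t in design]

X = [[1.0, float(d), float(t)] for d, t in design]  # intercept, death, treatment
beta = ols(X, expression)
log2_fold_change_death = beta[1]  # the only coefficient that is extracted
print(round(log2_fold_change_death, 6))
```

      Because treatment sits in the model as a covariate, the survival coefficient is estimated conditional on allocation even though the allocation effect (`beta[2]`) is never inspected.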

      (5) I understood from the methods (lines 460-461) that batch correction of the RNAseq data was necessary. However, it is not clear how the samples were batched. PCA of the transcriptomes before and after batch correction with batch and study group labels should be provided. I would also advocate for a sensitivity analysis to check the robustness of the main findings without batch correction. I assume Fig2A represents batch-corrected data, but this is not clear.

      We have now added information about the RNA sequencing batches and the batch correction approach to the methods section, where we also state that the analyses and data visualizations used batch-corrected data. We have also updated the results related to batch correction in Fig. 2A and Supplementary Figure S9.

      (6) I would encourage the authors to include a differential gene expression analysis to directly compare the transcriptome of TBM to that of pulmonary TB. I think it would add additional value to their focus on describing the transcriptome in TBM.

      We thank the reviewer for this suggestion. Conducting a differential gene expression analysis to compare the transcriptome of TBM with that of PTB is beyond the scope of this manuscript, and we will examine this question separately.

      (7) I don't really understand the purpose of splitting their data set into test and validation for the purposes of showing that WGCNA analysis is mostly reproduced in the two halves of the data. I would advocate that they scrap this approach to maximise the statistical power of their analysis in the descriptive work.

      As mentioned in response to reviewer #1 in question #3, the purpose of splitting the data is to ensure the reproducibility of the data analysis, as suggested by Langfelder et al. (PMID: 21283776). This approach served two purposes: (i) to affirm the existence of functional modules in an independent cohort and (ii) to validate the association of the modules of interest or their hub genes with survival outcomes.

      (8) The authors should soften the confidence in their interpretation of the GO/KEGG annotations of WGCNA modules. At least, they should include a paragraph that explicitly details the limitations of their analyses, including (i) the accuracy GO/KEGG annotations are not validated in this context (if at all), (ii) that none of the data can be used to make causal inferences and (iii) that peripheral blood assessments that are obviously impacted by changes in cellular composition of peripheral blood do not necessarily reflect immunopathogenesis at the site of disease - in fact if circulating cells are being recruited to the site of disease or other immune compartments, then quite the opposite interpretations may be true.

      We appreciate the reviewer's comment. (i) In our analysis, we initially confirmed the existence of Weighted Gene Co-expression Network Analysis (WGCNA) modules in the discovery cohort and validated the association of these modules with mortality outcomes in the validation cohort. We then applied GO/KEGG annotations to define the biological functions involved in the WGCNA modules. Finally, we performed Qusage analysis to directly test the association of the top-hit pathways of each WGCNA module with mortality outcomes (see supplementary S6). This approach helped to identify and validate modules and biological pathways associated with TBM mortality in this context, avoiding potential false positives in the GO/KEGG annotations of WGCNA modules. (ii) We agree with the assessment that 'This analysis cannot be used to make causal inferences,' as that would require a different study design and approach. (iii) The focus of this study is to investigate the pathogenesis of TBM in the systemic immune system. We have highlighted this focus in the title and the aim of the manuscript.

      (9) For the prognostic signature discovery and validation, I strongly recommend the authors include more robust validation. For example, to undertake an 80:20 split for sequential discovery (for feature selection and derivation of a prognostic model), followed by validation of a 'locked' model in data that made no contribution to discovery. In two separate sensitivity analyses. I also suggest they split their dataset (i) by treatment allocation in the RCT and (ii) by HIV status. In addition, their method for feature selection has to be clearer- precisely how they select hub genes from their WGCNA analysis as candidate predictors is not explained. Since this is such a prominent output of their manuscript, the results of this analysis should really be included in the main manuscript, and all performance metrics for discrimination should include confidence intervals.

      Employing an 80:20 split for training and testing models is a good approach for internal validation. However, we addressed the issue of overestimating the performance of a prognostic model by using the bootstrap sampling approach proposed by Steyerberg et al. (PMID: 11470385). This approach has been proven to provide stable estimates with low bias. The overall model performance for discrimination, reported in our manuscript, was corrected for “optimism” to ensure internal validity. This adjustment was achieved through a 1000-times bootstrapping approach, which effectively accounted for estimation uncertainty. As such, there is no need to present confidence intervals for these metrics.
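
      The optimism correction mentioned above can be sketched as follows. This is an illustrative sketch only: the data are simulated, and the class-mean-difference scorer in `fit` is a deliberately simple stand-in (an assumption) for whatever prognostic model was actually fitted. The general recipe is to refit the model in each bootstrap resample, measure how much better it performs on the resample than on the original data, and subtract the average of that gap from the apparent AUC.

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(X, y):
    """Stand-in model (assumption): weight each feature by its class-mean difference."""
    def mean(cls, j):
        vals = [x[j] for x, l in zip(X, y) if l == cls]
        return sum(vals) / len(vals)
    return [mean(1, j) - mean(0, j) for j in range(len(X[0]))]

def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]

def optimism_corrected_auc(X, y, n_boot=1000, seed=0):
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit(X, y), X), y)
    optimism, used = 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip resamples that miss a class
            continue
        w = fit(Xb, yb)
        # Gap between performance on the resample and on the original data.
        optimism += auc(predict(w, Xb), yb) - auc(predict(w, X), y)
        used += 1
    return apparent, apparent - optimism / used

# Simulated data: 80 hypothetical patients, 5 features, only the first is informative.
rng = random.Random(1)
y = [1 if i % 4 == 0 else 0 for i in range(80)]
X = [[rng.gauss(1.0 * label, 1.0)] + [rng.gauss(0, 1) for _ in range(4)] for label in y]
apparent, corrected = optimism_corrected_auc(X, y, n_boot=200)
print(round(apparent, 3), round(corrected, 3))
```

      The corrected value is a point estimate of out-of-sample discrimination; the spread of the per-resample gaps is a separate quantity that this sketch does not report.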

      Moreover, in our revision, to confirm prognostic signatures independently, we have evaluated the predictive value of identified gene signatures using qPCR in another set of samples. The results have been added in Table 4, supplementary Figure S8 and the results section.

      For the reasons given above (comment 4), we are unable to split our dataset by treatment allocation in this analysis. But as described, we have adjusted the analysis for corticosteroid treatment. Once the primary results of the LAST ACT trial have been published, we will examine the impact of corticosteroids on TBM pathophysiology and outcomes, seeking to better understand the mechanisms by which steroids have their therapeutic effects.

      Given the difference in pathogenesis and immune response with HIV co-infection, we stratified our analysis by HIV status. As the reviewer suggested, we have provided additional details in the methods section regarding the selection of hub genes from associated WGCNA modules and the feature selection process for predictive modeling.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We extend our sincere gratitude for the invaluable comments provided by the reviewers and yourself, along with the constructive suggestions to enhance the quality of our manuscript. In response to this invaluable feedback, we have diligently revised and resubmitted our paper as an article, introducing five primary figures, seven supplementary figures, and two supplementary data files. Importantly, this work represents a significant contribution to the field, presenting novel findings for the first time without any prior publication.

      Within the enclosed document, we have provided a comprehensive response to the editor and reviewer comments, addressing each point meticulously and specifically. We extend our heartfelt thanks to the reviewers and yourself for your diligent examination of our manuscript and for offering insightful recommendations.

      In our latest revision, we have taken great care to address every comment, ensuring that we clarify the manuscript and provide robust evidence where required. We have meticulously highlighted the modifications within the manuscript in yellow for your convenience, while also including the modifications made in response to each specific comment. The primary focus of these revisions was to provide additional context regarding the relationship between PARP-1 and mono-methylated histones. Substantial modifications were made to our discussion section to address this point.

      Another concern raised was regarding the discrepancy in the relationship of PR-SET7 and PARP-1 between our study and the recent study by Estève et al. (PMID: 36434141). We have revised the results and discussion sections to discuss this concern.

      To address Reviewer 2’s concern about the potential indirect role of PARP1 in the regulation of some metabolic genes, despite its direct binding to loci coding for metabolic genes, we revised the discussion section to highlight this possibility.

      Enclosed, you will find a detailed, point-by-point response to each of the editor’s and reviewers' comments, showcasing our commitment to addressing their concerns with precision.

      We firmly believe that our revisions successfully resolve all the concerns raised by the editor and the reviewers, and we are confident that this improved version of our manuscript contributes significantly to the scientific discourse. Once again, we thank you for considering our work, and please feel free to contact me if you require any additional information.

      In the revised manuscript, most of the concerns raised by the reviewers have been addressed satisfactorily. However, as suggested by reviewer#2, it would have been more significant, if the PARP1-mediated reading of global mono-methylation of histone could be addressed. At least the mechanisms of selectivity of PARP1 need further convincing discussion.

      We thank the editor for their valuable comments. We have extended our discussion section to discuss in more detail the relationship between PARP1 and mono-methylated histones. In our refined Discussion section, we have endeavored to articulate more clearly how PARP-1 may be selectively recruited to active chromatin domains through its interaction with mono-methylated histone marks. We propose a model wherein PARP-1 actively participates in the turnover process, contributing to the maintenance of an active chromatin environment. This mechanism entails PARP-1 selectively binding to mono-methylated active histone marks associated with highly transcribed genes. Upon activation, PARP-1 undergoes automodification, leading to its release from chromatin and facilitating the reassembly of nucleosomes carrying the mono-methylated marks. Subsequently, the enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) cleaves pADPr, enabling the restoration of PARP-1's binding affinity to mono-methylated active histone marks. This proposed hypothesis is consistent with existing research across various model organisms and aligns with the known association of PARP-1 with highly expressed genes, as well as its role in mediating nucleosome dynamics and assembly.

      Our modified Discussion section unfolds as follows:

      "Finally, highly transcribed genes have been reported to present a high turnover of mono-methylated modifications, maintaining a state of low methylation (50). Moreover, our previous study revealed that PARP1 preferentially binds to highly active genes (34).  Consequently, our findings suggest an active involvement of PARP-1 in the turnover process to maintain an active chromatin environment. This proposed mechanism unfolds in the following steps: 1) PARP-1 selectively binds to mono-methylated active histone marks associated with highly transcribed genes. 2) Upon activation, PARP-1 undergoes automodification and subsequently disengages from chromatin, facilitating the reassembly of nucleosomes carrying the mono-methylated marks. 3) The enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) cleaves pADPr, restoring PARP-1's binding affinity to mono-methylated active histone marks. This proposed hypothesis is consistent with existing research conducted across various model organisms, including mice, Drosophila, and Humans (7, 24, 30, 51-53). Notably, previous studies have consistently demonstrated that PARP-1 predominantly associates with highly expressed genes and plays a crucial role in mediating nucleosome dynamics and assembly. Thus, our proposed model provides a molecular framework that may contribute to understanding the relationship between PARP-1 and the epigenetic regulation of gene expression."

      We trust that these revisions effectively address the editor’s comment and enhance the overall strength and clarity of our manuscript.

      Furthermore, recent developments in the area are omitted, as an important publication hasn't been discussed anywhere in the work (PMID: 36434141).

      We appreciate the editor's thorough review of our revised manuscript and the responses to the previous reviewer's comments. To address this important concern, we have carefully investigated the levels of PR-SET7 in parp1 hypomorphic conditions.

      Supplemental Fig. S4 and S5 demonstrate that in the absence of Parp1, there were no significant changes in PR-SET7 RNA or protein levels, respectively. This finding supports the conclusion that Parp1 is not directly involved in the regulation of PR-SET7 in Drosophila, in contrast to the findings of Estève et al. (PMID: 36434141). This discrepancy may arise from differing relationships between PARP-1 and PR-SET7, which could cooperate in the context of Drosophila development while playing antagonistic roles in specific cell lines or under particular conditions.

      We have updated the Results section to explicitly mention this observation:

      "Interestingly, in the absence of PARP-1, neither PR-SET7 RNA nor protein levels were affected (Supplemental Fig.S4-5), indicating that PARP-1 is not directly implicated in the regulation of pr-set7. This finding contrasts with recent evidence demonstrating PARP1-induced degradation of PR-SET7/SET8 in human cells (16)."

      Furthermore, we have modified the discussion section to address this discrepancy:

      "A recent study demonstrated that in human cells overexpressing PARP-1, PR-SET7/SET8 is degraded, whereas depletion of PARP-1 leads to an increase in PR-SET7/SET8 levels (16). However, in our study involving parp-1 mutant in Drosophila third-instar larvae revealed a nuanced scenario: we detected a minor but not significant reduction in both PR-SET7 RNA and protein levels (Supplemental Fig.S4 and S5). This outcome stands in stark contrast to the previous study's findings. The discrepancy could be due to the distinct experimental approaches used: the previous research focused on mammalian cells and in vitro experiments, whereas our study examined the functions of PARP-1 in whole Drosophila third-instar larvae during development. Consequently, while PARP-1 may cooperate with PR-SET7 in the context of Drosophila development, it could exhibit antagonistic roles against PR-SET7 in specific cell lines and under certain biological or developmental conditions."

      We believe that these modifications effectively address the raised concern and provide a more comprehensive understanding of the relationship between PARP1 and PR-SET7 in our study. We hope these clarifications enhance the overall robustness and clarity of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study from Bamgbose et al. identifies a new and important interaction between H4K20me and Parp1 that regulates inducible genes during development and heat stress. The authors present convincing experiments that form a mostly complete manuscript that significantly contributes to our understanding of how Parp1 associates with target genes to regulate their expression.

      Strengths:

      The authors present 3 compelling experiments to support the interaction between Parp1 and H4K20me, including:

      (1) PR-Set7 mutants remove all H4K20me and phenocopy Parp mutant developmental arrest and defective heat shock protein induction.

      (2) PR-Set7 mutants have dramatically reduced Parp1 association with chromatin and reduced poly-ADP ribosylation.

      (3) Parp1 directly binds H4K20me in vitro.

      Weaknesses:

      (1) The RNAseq analysis of Parp1/PR-Set7 mutants is reasonable, but there is a caveat to the author's conclusion (Line 251): "our results indicate H4K20me1 may be required for PARP-1 binding to preferentially repress metabolic genes and activate genes involved in neuron development at co-enriched genes." An alternative possibility is that many of the gene expression changes are indirect consequences of altered development induced by Parp1 or PR-Set7 mutants. For example, Parp1 could activate a transcription factor that represses metabolic genes. The authors counter this model by stating that Parp1 directly binds to "repressed" metabolic genes. While this argument supports their model, it does not rule out the competing indirect transcription factor model. Therefore, they should still mention the competing model as a possibility.

      We appreciate Reviewer 2's insightful comments during both rounds of revision, which have significantly enriched the quality of our manuscript. The binding of PARP1 to loci encoding metabolic genes indeed suggests a direct role of PARP1 in their regulation. However, we acknowledge Reviewer 2's point that some of these targets might be regulated indirectly, with PARP1 potentially modulating the expression of intermediary transcription factors.

      To address this possibility, we have revised the discussion section of our manuscript accordingly:

      "Remarkably, our observations indicate a notable affinity of PARP-1 for binding to the gene bodies of these metabolic genes (34), suggesting a direct involvement of PARP1 in their regulation. Nonetheless, it remains plausible that certain genes may be indirectly regulated by PARP1 through intermediary transcription factors."

      We trust that this modification adequately addresses Reviewer 2's concern.

      (2) The section on inducibility of heat shock genes is interesting but missing an important control that might significantly alter the author's conclusions. Hsp23 and Hsp83 (group B genes) are transcribed without heat shock, which likely explains why they have H4K20me without heat shock. The authors made the reasonable hypothesis that this H4K20me would recruit Parp-1 upon heat shock (line 270). However, they observed a decrease of H4K20me upon heat shock, which led them to conclude that "H4K20me may not be necessary for Parp1 binding/activation" (line 275). However, their RNA expression data (Fig4A) argues that both Parp1 and H4K20me are important for activation. An alternative possibility is that group B genes indeed recruit Parp1 (through H4K20me) upon heat shock, but then Parp1 promotes H3/H4 dissociation from group B genes. If Parp1 depletes H4, it will also deplete H4K20me1. To address this possibility, the authors should also do a ChIP for total H4 and plot both the raw signal of H4K20me1 and total H4 as well as the ratio of these signals. The authors could also note that Group A genes may similarly recruit Parp1 and deplete H3/H4 but with different kinetics than Group B genes because their basal state lacks H4K20me/Parp1. To test this possibility, the authors could measure Parp association, H4K20methylation, and H4 depletion at more time points after heat shock at both classes of genes.

      We sincerely appreciate Reviewer 2 for their insightful comment on our manuscript. Your hypothesis regarding the potential induction of H3/H4 dissociation from group B genes by PARP-1, leading to a reduction in H4K20me1, offers a thought-provoking perspective. However, our findings suggest an alternative interpretation.

      Our data indicate that while H4K20me1 is indeed present under normal conditions at group B genes, its reduction following heat shock does not seem to impede PARP-1's role in transcriptional activation (Fig. 4A, C, and E). Instead, we propose that this decrease in H4K20me1 might signify a regulatory shift in chromatin structure, facilitating transcriptional activation during heat shock, with PARP-1 playing an independent facilitating role. Moreover, existing studies have highlighted the dual role of H4K20me1, acting as a promoter of transcription elongation in certain contexts and as a repressor in others.

      The elevated enrichment of H4K20me1 in group B genes under normal conditions may indeed indicate a repressive state that requires alleviation for transcriptional activation. Additionally, we cannot discount the possibility of unique regulatory functions associated with PR-SET7, extending beyond its recognized role as a histone methylase. Non-catalytic activities and potential interactions with non-histone substrates might contribute to the nuanced control exerted by PR-SET7 on group B genes during heat stress.

      Furthermore, our exploration of pr-set720 and ParpC03256 mutants reveals distinct roles for PARP-1 and H4K20me1 in modulating gene expression (Fig 3E). This reinforces the notion that the interplay between PR-SET7 and PARP-1 involves a multifaceted regulatory mechanism.

      To address these points, we have revised the discussion section of our manuscript accordingly:

      "Another plausible explanation could be that the recruitment of PARP-1 to group B genes loci promotes H4 dissociation and then leads to a reduction of H4K20me1. However, our findings suggest an alternative interpretation: the decrease in H4K20me1 at group B genes during heat shock does not seem to impede PARP-1's role in transcriptional activation, (Fig.4A, C and E). Rather than disrupting PARP-1 function, we propose that this reduction in H4K20me1 may signify a regulatory shift in chromatin structure, priming these genes for transcriptional activation during heat shock, with PARP-1 playing an independent facilitating role. Moreover, existing studies have highlighted the dual role of H4K20me1, acting as a promoter of transcription elongation in certain contexts and as a repressor in others (13, 26, 39, 40, 42-46). The elevated enrichment of H4K20me1 in group B genes under normal conditions may indicate a repressive state that requires alleviation for transcriptional activation. Additionally, we cannot discount the possibility of unique regulatory functions associated with PR-SET7, extending beyond its recognized role as a histone methylase. Non-catalytic activities and potential interactions with non-histone substrates might contribute to the nuanced control exerted by PR-SET7 on group B genes during heat stress (47, 48). Furthermore, our exploration of pr-set720 and parp-1C03256 mutants reveals distinct roles for PARP-1 and H4K20me1 in modulating gene expression (Fig 3E). This reinforces the notion that the interplay between PR-SET7 and PARP-1 involves a multifaceted regulatory mechanism. Understanding the intricate relationship between these molecular players is crucial for elucidating the complexities of gene expression modulation under heat stress conditions."

      We believe that this modification enhances the clarity of our conclusions and adequately addresses Reviewer 2's concerns regarding the intricate relationship between PARP-1, H4K20me1, and PR-SET7 in transcriptional regulation under heat stress conditions.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The endocannabinoid system (ECS) components are dysregulated within the lesion microenvironment and systemic circulation of endometriosis patients. Using endometriosis mouse models and genetic loss of function approaches, Lingegowda et al. report that canonical ECS receptors, CNR1 and CNR2, are required for disease initiation, progression, and T-cell dysfunction.

      Strengths:

      The approach uses genetic approaches to establish in vivo causal relationships between dysregulated ECS and endometriosis pathogenesis. The experimental design incorporates bulk RNAseq approaches, as well as imaging mass spectrometry to characterize the mouse lesions. The identification of immune-related and T-cell-specific changes in the lesion microenvironment of CNR1 and CNR2 knockout (KO) mice represents a significant advance.

      Weaknesses:

      Although the mouse phenotypic analyses involve a detailed molecular characterization of the lesion microenvironment using genomic approaches, detailed measurements of lesion size/burden and histopathology would provide a better understanding of how CNR1 or CNR2 loss contributes to endometriosis initiation and progression. The cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although this aspect of the approach is recognized as a major limitation, global CNR1 and CNR2 KO may affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or lead to preexisting alterations in host or donor tissues, which could affect lesion establishment and development in the surgically induced, syngeneic mouse model of endometriosis.

      We appreciate the reviewer's thoughtful and constructive feedback. We agree that additional measurements of lesion size/burden and histopathology would provide valuable insights into the specific contributions of CNR1 and CNR2 to endometriosis progression. However, the focus of this study was on assessing the alterations in the complex immune microenvironment caused by the absence of CNR1 and CNR2, given their close involvement in regulating immune cell populations. We plan to incorporate these measurements in future studies to further strengthen the understanding of the disease pathogenesis. Regarding the potential effects of global knockout, the reviewer raises a valid concern. To address this, we will explore cell- and/or tissue-specific knockout models in future experiments to better isolate the direct effects of CNR1 and CNR2 on the disease process, while minimizing potential confounding factors from systemic alterations.

      Reviewer #2 (Public Review):

      Summary:

      The endocannabinoid system (ECS) regulates many critical functions, including reproductive function. Recent evidence indicates that dysregulated ECS contributes to endometriosis pathophysiology and the microenvironment. Therefore, the authors further examined the dysregulated ECS and its mechanisms in endometriosis lesion establishment and progression using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. The authors presented differential gene expressions and altered pathways, especially those related to the adaptive immune response in CNR1 and CNR2 ko lesions. Interestingly, the T-cell population was dramatically reduced in the peritoneal cavity lacking CNR2, and the loss of proliferative activity of CD4+ T helper cells. Imaging mass cytometry analysis provided spatial profiling of cell populations and potential relationships among immune cells and other cell types. This study provided fundamental knowledge of the endocannabinoid system in endometriosis pathophysiology.

      Strengths:

      Dysregulated ECS and its mechanisms in endometriosis pathogenesis were assessed using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. Not only endometriotic lesions, but also peritoneal exudate (and splenic) cells were analyzed to understand the specific local disease environment under the dysregulated ECS.

      Providing the results of transcriptional profiles and pathways, immune cell profiles, and spatial profiles of cell populations support altered immune cell population and their disrupted functions in endometriosis pathogenesis via dysregulation of ECS.

      In line 386: Role of CNR2 in T cells. The finding that nearly absent CD3+ T cells in the peritoneal cavity of CNR2 ko mice is intriguing.

      The interpretation of the results is well-described in the Discussion.

      Weaknesses:

      The study was terminated and characterized 7 days after EM induction surgery without the details for selecting the time point to perform the experiments.

      The authors also mentioned that altered eutopic endometrium contributes to the establishment and progression of endometriosis. This reviewer agrees with lines 324-325. If so, DEGs are likely identified between eutopic endometrium (with/without endometriosis lesion induction) and ectopic lesions. It would be nice to see the data (even though using publicly available data sets).

      Figure 7 CDEF. The results of the statistical analyses and analyzed sample numbers should be added. Lines 444-450 cannot be reviewed without them.

      This reviewer agrees with lines 498-500. In contrast, retrograded menstrual debris is not decidualized. The section could be modified to avoid misunderstanding.

      We would like to thank the reviewer for insightful comments, suggestions and acknowledging the importance of the work presented in this manuscript.

      Regarding the 7-day time point, we have provided a rationale in lines 479-481, but agree that it isn’t sufficient, and hence we have provided additional details on the selection of the 7-day time point for the experiments in the Methods section (Mouse model of EM). We have also noted the suggestion on providing a comparison of differentially expressed genes in the eutopic endometrium vs ectopic lesions. Since there are publications comparing the eutopic vs ectopic gene expression patterns (PMIDs: 33868805 and 18818281), including a study exploring the ECS genes in the endometrium throughout different menstrual cycles (PMID: 35672435), we believe additional analysis using the same dataset may not yield new information. However, we see the value in the reviewer’s comment, and we will look at the gene expression patterns in the uterine vs endometriosis-like lesions in our future studies with tissue- or cell-specific CNR1 and CNR2 knockout models to understand the functional relevance of ECS in endometriosis initiation.

      Since the IMC study was exploratory for proof of concept, we did not have enough biological replicates for meaningful statistical validation (n = 2-3). We have clarified this information in the methods, results, and figure legends for appropriately representing the limitations of the current setup.

      Finally, we appreciate the feedback on the section discussing retrograded menstrual debris. Even though the menstrual debris may not be decidualized, some endometriotic lesions have the ability to decidualize based on their response to estrogen and progesterone in a cycling manner (PMID: 26450609), similar to the endometrium in the uterine cavity. We have clarified this in the revised MS.

    1. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      Casas-Tinto et al. present convincing data that injury of the adult Drosophila CNS triggers transdifferentiation of glial cells and even the generation of neurons from glial cells. This observation opens up the possibility of getting a handle on the molecular basis of neuronal and glial generation in the vertebrate CNS after traumatic injury caused by stroke or crush injury. The authors use an array of sophisticated tools to follow the development of glial cells at the injury site in very young and mature adults. The results in mature adults reveal a remarkable plasticity in the fly CNS and dispel the notion that repair after injury may be only possible in nerve cords which are still developing. The observation of so-called VC cells which do not express the glial marker repo could point to the generation of neurons by former glial cells.

      Conclusion:

      The authors present an interesting story that is technically sound and could form the basis for an in-depth analysis of the molecular mechanism driving repair after brain injury in Drosophila and vertebrates.

      Strengths:

      The evidence for transdifferentiation of glial cells is convincing. In addition, the injury to the adult CNS shows an inherent plasticity of the mature ventral nerve cord which is unexpected.

      Weaknesses:

      Traumatic brain injury in Drosophila has been previously reported to trigger mitosis of glial cells and generation of neural stem cells in the larval CNS and the adult brain hemispheres. Therefore this report adds to but does not significantly change our current understanding. The origin and identity of VC cells is unclear.

      The Reviewer correctly points out that traumatic brain injury has been reported to trigger the generation of neural stem cells. However, according to previous reports, those cells were quiescent Dpn+ neuroblasts. We now report that already differentiated adult neuropil glia transdifferentiate into neurons, which is a new mechanism not previously reported.

      We agree with the reviewer regarding the identity of VC neurons, although according to the results of the G-TRACE experiments their origin is clear: they originate from neuropil glia (i.e. astrocyte-like glia and ensheathing glia). We will use a battery of antibodies previously reported to identify specific subtypes of neurons to identify these newly generated neurons.

      Reviewer #2:

      Summary:

      Casas-Tinto et al. provide new insight into glial plasticity using a crush injury paradigm in the ventral nerve cord (VNC) of adult Drosophila. The authors find that both astrocyte-like glia (ALG) and ensheathing glia (EG) divide under homeostatic conditions in the adult VNC and identify ALG as the glial population that specifically ramps up proliferation in response to injury, whereas the number of EGs decreases following the insult. Using lineage-tracing tools, the authors interestingly observe the interconversion of glial subtypes, especially of EGs into ALGs, which occurs independent of injury and is dependent on the availability of the transcription factor Prospero in EGs, adding to the plasticity observed in the system. Finally, when tracing the progeny of differentiated glia, Casas-Tinto and colleagues detect cells of neuronal identity and provide evidence that such glia-derived neurogenesis is specifically favored following ventral nerve cord injury, which puts forward a remarkable way in which glia can respond to neuronal damage.

      Numerous experiments have been carried out in 7-day-old flies, showing that the observed plasticity is not due to residual developmental remodeling or a still immature VNC.

      By elegantly combining different genetic tools, the authors show glial divisions with mitotic-dependent tracing and find that the number of generated glia is refined by apoptosis later on.

      The work identifies Prospero in glia as an important coordinator of glial cell fate, from development to the adult context, which draws further attention to the upstream regulatory mechanisms.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Weaknesses:

      Although the authors do use a variety of methods to show glial proliferation, the EdU data (Figure 1B) could be more informative by displaying images of non-injured animals and providing quantifications or the mention of these numbers based on results previously acquired in the system.

      We appreciate the Reviewer’s comment. We believed that adding images of non-injured animals would not add new information, as we already quantified the increase of glial proliferation upon injury in Losada-Perez et al. 2021. Besides, the purpose of this experiment was to determine whether the dividing cells were astrocyte-like glia, rather than to count the dividing cells. Comparing independent experiments can be tricky, but if we compare the quantifications of G2-M glia (repo>fly-Fucci) done in Losada-Perez et al. 2021 (fig 1C) with the quantifications of G2-M neuropil glia done in this work (fig 1C), we can see that the numbers are comparable.

      The experiments relying on the FUCCI cell cycle reporter suggested considerable baseline proliferation for EGs and ALGs, but when using an independent method (Twin Spot MARCM), mitotic marking was only detected for ALGs. This discrepancy could be addressed by assessing the co-localization of the different glia subsets using the identified driver lines with mitotic markers such as PH3.

      In our understanding, this discrepancy could be explained by the magnitude of proliferation. The lower proliferation rate of EG (as indicated by the fly-Fucci experiments), combined with the incomplete efficiency of MARCM clone induction, considerably reduces the chances of finding EG MARCM clones. PH3 is a mitotic marker, but it is also found in apoptotic cells (Kim and Park 2012. DOI: 10.1371/journal.pone.0044307); however, we can do the suggested experiment and quantify the results.

      The data in Figure 1C would be more convincing in combination with images of the FUCCI Reporter as it can provide further information on the location and proportion of glia that enter the cell cycle versus the fraction that remains quiescent.

      We will add the suggested images.

      The analyses of inter-glia conversion in Figure 3 are complicated by the fact that Prospero RNAi is both used to suppress EG-to-ALG conversion and as a marker to establish ALG nature. Clarification of whether the GFP+ cells still expressed Pros or were classified as NP-like GFP cells is required here.

      As described in the text, Pros is a marker for ALG, and the results suggest that Prospero expression is required for the EG-to-ALG transition. We will clarify these concepts in the text accordingly. In Figure 3 we showed images of NP-like cells originating from EG that are Prospero+, thereby supporting the transdifferentiation from EG to ALG.

      The conclusion that ALG and EG glial cells can give rise to cells of neuronal lineage is based on glial lineage information (GFP+ cells from glial G-trace) and staining for the neuronal marker Elav. The use of other neuronal markers apart from Elav or morphological features would provide a more compelling case that GFP+ cells are mature neurons.

      We completely agree with the reviewer's observation regarding the identity of VC neurons. We will try to determine the identity of these cells using previously described antibodies that identify neuronal populations. We would also appreciate any suggestions regarding the antibodies we can use.

      Although the text discusses in which contexts glial plasticity is observed or increased upon injury, the figures are less clear regarding this aspect. A more systematic comparison of injured VNCs versus homeostatic conditions, combined with clear labelling of the injury area, would facilitate the understanding of the panels.

      We appreciate the Reviewer’s observation. We will carefully check all figures in order to increase their clarity.

      Context/Discussion

      The study finds that glia in the ventral cord of flies have latent neurogenic potential. Such observations have not been made regarding glia in the fly brain, where injury is reported to drive glial divisions or the proliferation of undifferentiated progenitor cells with neurogenic potential.

      Discussing this different strategy for cell replacement adopted by glia in the VNC and pointing out differences to other modes seems fascinating. Highlighting differences in the reactiveness of glia in the VNC compared to the brain also seems highly relevant as they may point to different properties to repair damage.

      Based on the assays employed, the study points to a significant amount of glial "identity" changes or interconversions, which is surprising under homeostatic conditions. The significance of this "baseline" plasticity remains undiscussed, although glia unarguably show extensive adaptations during nervous system development.

      It would be interesting to know if the "interconversion" of glia is determined by the needs in the tissue or would shift in the context of selective ablation/suppression of a glial type.

      We deeply appreciate the Reviewer’s enthusiasm on this subject; it is indeed fascinating. We kept the discussion brief in order to fit the eLife Short Report requirements, but the specific conditions that trigger glial interconversion are of great interest to us. Compromising EG or ALG viability and evaluating the behaviour of the remaining glial cells is of great interest for developmental biology and regeneration, but the precise scenario in which to carry out these experiments is not well defined. In this report, we aim to reproduce an injury in the Drosophila brain, and this model should serve to analyze cellular behaviours. The scenario where we deplete one specific subpopulation of glial cells is conceptually attractive, but far beyond the scope of this report.

      Reviewer #3:

      In this manuscript, Casas-Tintó et al. explore the role of glial cells in the response to a neurodegenerative injury in the adult brain. They used Drosophila melanogaster as a model organism and found that glial cells are able to generate new neurons through the mechanism of transdifferentiation in response to injury.

      This paper provides a new mechanism in regeneration and gives an understanding of the role of glial cells in the process.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Huang and colleagues explored the role of iron in bacterial therapy for cancer. Using proteomics, they revealed the upregulation of bacterial genes that uptake iron, and reasoned that such regulation is an adaptation to the iron-deficient tumor microenvironment. Logically, they engineered E. coli strains with enhanced iron-uptake efficiency, and showed that these strains, together with iron scavengers, suppress tumor growth in a mouse model. Lastly, they reported that tumor suppression by IroA-E. coli provides immunological memory via CD8+ T cells. In general, I find the findings in the manuscript novel and the evidence convincing.

      (1) Although the genetic and proteomic data are convincing, would it be possible to directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) tumor microenvironment? This will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the reviewer’s comment regarding the precise quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing a Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. To circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting IroA-E. coli’s iron-scavenging capability against nutritional immunity.

      (2) Related to 1, the experiment to study the synergistic effect of CDG and VLX600 (lines 139-175) is very nice and promising, but one flaw here is a lack of the measurement of iron concentration. Therefore, a possible explanation could be that CDG acts in another manner, unrelated to iron uptake, that synergizes with VLX600's function to deplete iron from cancer cells. Here, a direct measurement of iron concentration will show the effect of CDG on iron uptake, thus complementing the missing link.

      We appreciate the reviewer’s comment and would like to point the reviewer to our results in Figure S3, which shows that the expression of CDG enhances bacterial survival in the presence of LCN2 proteins, reflecting the competitive relationship between CDG and enterobactin for LCN2 proteins as previously shown by Li et al. [Nat Commun 6:8330, 2015]. We regret to inform the reviewer that direct measurement of iron concentration was attempted to no avail due to the limited sensitivity of iron-detecting assays. We do acknowledge that CDG may exert different effects in addition to enhancing iron uptake, particularly the potentiation of the STING pathway. We pointed out this effect in Fig 2c, which shows enhanced macrophage stimulation by the CDG-expressing bacteria. We would like to accentuate, however, that a primary objective of the experiment is to show that the manipulation of nutritional immunity for promoting anticancer bacterial therapy can be achieved by combining bacteria with the iron chelator VLX600. The multifaceted effects of CDG prompted us to focus on IroA-E. coli in subsequent experiments to examine the role of nutritional immunity in bacterial therapy. We have updated the associated text to better convey our experimental design principle.

      Lines 250-268: Although statistically significant, I would recommend the authors characterize the CD8+ T cells a little more, as the mechanism now seems quite elusive. What signals or memories do CD8+ T cells acquire after IroA-E. coli treatment to confer their long-term immunogenicity?

      We apologize for the overinterpretation of the immune memory response in our previous manuscript and appreciate the reviewer’s recommendation to further characterize CD8+ T cells post-IroA-E. coli treatment. Our findings, which show robust tumor inhibition in rechallenge studies, indicate establishment of anticancer adaptive immune responses. As the scope of the present work is aimed at demonstrating the value of engineered bacteria for overcoming nutritional immunity, expounding on the memory phenotypes of the resulting cellular immunity is beyond the scope of the study. We do acknowledge that our initial writing overextended our claims and have revised the manuscript accordingly. The revised manuscript highlights induction of anticancer adaptive immunity, attributable to CD8+ T cells, following the bacterial therapy.

      (3) Perhaps this goes beyond the scope of the current manuscript, but how broadly applicable is the observed iron-transport phenomenon in other tumor models? I would recommend the authors to either experimentally test it in another model or at least discuss this question.

      We highly appreciate the reviewer’s suggestion regarding the generalizability of the iron-transport phenomenon in diverse tumor models. To address this, we extended our investigations beyond the initial model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate the superiority of IroA-E. coli over WT bacteria in tumor inhibition. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide strong evidence that bacteria, such as E. coli, compete with tumor cells for iron resources and consequently reduce tumor growth. When sequestration between LCN2 and enterobactin is blocked by upregulating CDG (DGC-E. coli) or salmochelin (IroA-E. coli), E. coli increase iron uptake from the tumor microenvironment (TME) and restrict iron availability for tumor cells. Long-term remission in IroA-E. coli treated mice is associated with enhanced CD8+ T cell activity. Additionally, systemic delivery of IroA-E. coli shows a synergistic effect with the chemotherapy reagent oxaliplatin to reduce tumor growth.

      Strengths:

      It is important to identify the iron-related crosstalk between E. coli and TME. Blocking LCN2-enterobactin sequestration by different strategies consistently reduces tumor growth.

      Weaknesses:

      As engineered E. coli upregulate their function to uptake iron, they may increase the likelihood of escaping from nutritional immunity (LCN2 becomes unable to sequester iron from the bacteria). Would this raise the chance of developing sepsis? Do the authors think that it is safe to administer these engineered bacteria in mice or humans?

      We appreciate the reviewer’s comment on the safety evaluation of the iron-scavenging bacteria. To address the concern, we assessed the potential risk of sepsis development by measuring the bacterial burden and performing whole blood cell analyses following intravenous injection of the engineered bacteria. As illustrated in Figures 3k and 3l, our findings indicate that the administration of these engineered bacteria does not elevate the risk of sepsis. The blood cell analysis suggests that mice treated with the bacteria eventually return to baseline levels comparable to untreated mice, supporting the safety of this approach in our experimental models.

      Reviewer #3 (Public Review):

      Summary:

      Based on their observation that tumor has an iron-deficient microenvironment, and the assumption that nutritional immunity is important in bacteria-mediated tumor modulation, the authors postulate that manipulation of iron homeostasis can affect tumor growth. They show that iron chelation and engineered DGC-E. coli have synergistic effects on tumor growth suppression. Using engineered IroA-E. coli that presumably have more resistance to LCN2, they show improved tumor suppression and survival rate. They also conclude that the IroA-E. coli treated mice develop immunological memory, as they are resistant to repeat tumor injections, and these effects are mediated by CD8+ T cells. Finally, they show synergistic effects of IroA-E. coli and oxaliplatin in tumor suppression, which may have important clinical implications.

      Strengths:

      This paper uses straightforward in vitro and in vivo techniques to examine a specific and important question of nutritional immunity in bacteria-mediated tumor therapy. They are successful in showing that manipulation of iron regulation during nutritional immunity does affect the virulence of the bacteria, and in turn the tumor. These findings open future avenues of investigation, including the use of different bacteria, different delivery systems for therapeutics, and different tumor types.

      Weaknesses:

      • There is no discussion of the cancer type and why this cancer type was chosen. Colon cancer is not one of the more prominently studied cancer types for LCN2 activity. While this is a proof-of-concept paper, there should be some recognition of the potential different effects on different tumor types. For example, this model is dependent on significant LCN production, and different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type? For example, breast cancer aggressiveness has been shown to be influenced by FPN levels and labile iron pools.

      We highly appreciate the reviewer’s insightful comment on the varying LCN2 activities across different tumor types. In light of the reviewer’s suggestion, we extended our investigations beyond the initial colon cancer model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate that IroA-E. coli consistently outperforms WT bacteria in tumor inhibition. We acknowledge the reviewer’s comment regarding LCN2 being more prominently examined in breast cancer and have highlighted this aspect in the revised manuscript. For colon and melanoma cancers, several reports have pointed out the correlation of LCN2 expression and the aggressiveness of these cancers [Int J Cancer. 2021 Oct 1;149(7):1495-1511][Nat Cancer. 2023 Mar;4(3):401-418], albeit to a lesser extent. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments. The manuscript has been revised to reflect the reviewer’s insightful comment.

      • Are the effects on tumor suppression assumed to be from E. coli virulence, i.e. Does the higher number of bacteria result in increased immune-mediated tumor suppression? Or are the effects partially from iron status in the tumor cells and the TME?

      We appreciate the reviewer’s question regarding the therapeutic mechanism of IroA-E. coli. Bacterial therapy exerts its anticancer action through several different mechanisms, including bacterial virulence, nutrient and ecological competition, and immune stimulation. Decoupling one mechanism from another would be technically challenging and beyond the scope of the present work. With the objective of demonstrating that iron-scavenging bacteria can elevate anticancer activity by circumventing nutritional immunity, we highlight our data in Fig. S6, which shows that IroA-E. coli administration resulted in higher bacterial colonization within solid tumors compared to WT-E. coli on Day 15. This increased bacterial presence supports our iron-scavenging bacteria design, and we highlight a few anticancer mechanisms mediated by the engineered bacteria. Firstly, as shown in Fig. 4d, IroA-E. coli induces an elevated iron stress response in tumor cells, as the treated tumor cells show increased expression of transferrin receptors. Secondly, our experiments involving CD8+ T cell depletion indicate that IroA-E. coli establishes a more robust anticancer CD8+ T cell response than WT bacteria. Both immune-mediated responses and alterations in iron status within the tumor microenvironment are demonstrated to contribute to the enhanced anticancer activity of IroA-E. coli in the present study.

      • If the effects are iron-related, could the authors provide some quantification of iron status in tumor cells and/or the TME? Could the proteomic data be queried for this data?

      We appreciate the reviewer’s query regarding the quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing a Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. Consequently, to circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting IroA-E. coli’s iron-scavenging capability against nutritional immunity.

      Reviewing Editor:

      The authors provide compelling technically sound evidence that bacteria, such as E. coli, can be engineered to sequester iron to potentially compete with tumor cells for iron resources and consequently reduce tumor growth. Long-term remission in IroA-E.coli treated mice is associated with enhanced CD8+ T cell activity and a synergistic effect with chemotherapy reagent oxaliplatin is observed to reduce tumor growth. The following additional assessments are needed to fully evaluate the current work for completeness; please see individual reviews for further details.

      We appreciate the editor’s positive comment.

      (1) The premise is one of translation, yet the authors have not demonstrated that manipulating bacteria to sequester iron does not create a potential for sepsis, nor provided other evidence that this does not increase the competitiveness of bacteria relative to the host. Only tumor volume was provided rather than animal survival and cause of death, but bacterial virulence is enhanced, including the possibility of septic demise. Alternatively, the authors postulate that tumor volume is decreased due to iron sequestration, but they do not directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) tumor microenvironment. These important endpoints will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the editor’s comment and have added substantial data to support the translational potential of the iron-scavenging bacteria. In particular, we added evidence that the iron-scavenging bacteria do not increase the risk of sepsis (Fig. 3k, l), evidence of increased bacterial competitiveness and survival in tumors (Fig. S6), and evidence of the iron-scavenging bacteria’s superior anticancer ability and survival benefit across 3 different tumor models (Fig. 3e-j; Fig. S5). While direct measurement of iron concentration in the tumor environment is technically difficult due to the challenge of differentiating Fe2+ and Fe3+ with available techniques, we added a colorimetric CAS assay to demonstrate that the iron-scavenging bacteria can utilize Fe more effectively than WT bacteria in the presence of LCN2 (Fig. 3b). These results substantiate the translational relevance of the engineered bacteria.

      (2) There is no discussion of the cancer type and why this cancer type was chosen. If the current tumor modulation system is dependent on LCN2 activity, there would need to be some recognition that different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type?

      We appreciate the comment and added relevant text and citations describing clinical relevance of LCN2 expression associated with the tumor types used in the study (breast cancer, melanoma, and colon cancer). Elevated LCN2 has been associated with higher aggressiveness for all three cancer types.

      (3) To demonstrate that long-term anti-cancer memory was established through enhancement of CD8+ T cell activity (Fig 5c), the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IroA mice, since CD8+ T cells may play a role in tumor suppression regardless of whether or not iron regulation is being manipulated. It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We acknowledge that our prior writing may have overstated our claim on immunological memory. Our intention is to show that upon treatment and tumor eradication by iron-scavenging bacteria, adaptive immunity mediated by CD8 T cells can be elicited. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression. We have modified our text to reflect our intended message.

      Reviewer #1 (Recommendations For The Authors):

      All the figures seem to be in low resolution and pixelated. Please upload high-resolution ones.

      We have updated figures to high-resolution ones.

      Reviewer #2 (Recommendations For The Authors):

      Some specific comments towards experiments:

      (1) For Fig 2 f/ Fig 3f/ Fig 5d/Fig6c, the survival rate is based on the tumor volume (the mouse was considered dead when the tumor volume exceeded 1,500 mm3). Did the mice die from the experiment (how many from each group)? If it only reflects the tumor size, do these figures deliver the same information as the tumor growth figure?

      We appreciate the reviewer’s comment. The survival rate is indeed based on tumor volume, and we used a cutoff of 1500 mm3. No death event was observed prior to the tumors reaching 1500 mm3. Although the survival figures overlap with some of the information conveyed by the tumor volume tracking, they offer additional temporal resolution of tumor progression. Presenting both tumor volume and survival tracking is a commonly adopted way to depict tumor progression. We have added the protocol regarding survival monitoring to the materials and method section.
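      The volume-based survival endpoint described above can be illustrated with a minimal sketch. It assumes each mouse is scored as an event on the first measurement day its tumor exceeds the 1500 mm3 cutoff and is otherwise treated as censored. The function names and all measurements below are hypothetical and are not taken from the study.

```python
def survival_day(days, volumes_mm3, cutoff_mm3=1500.0):
    """Return the first day the tumor volume exceeds the cutoff, or None if it never does."""
    for day, volume in zip(days, volumes_mm3):
        if volume > cutoff_mm3:
            return day
    return None  # censored: tumor never exceeded the cutoff


def fraction_surviving(event_days, day):
    """Fraction of mice whose tumors have not exceeded the cutoff by `day`."""
    alive = sum(1 for d in event_days if d is None or d > day)
    return alive / len(event_days)


# Hypothetical tumor volumes (mm3) for three mice, measured every three days.
days = [0, 3, 6, 9, 12]
mice = [
    [50, 200, 700, 1600, 2400],  # exceeds the cutoff on day 9
    [50, 150, 400, 900, 1550],   # exceeds the cutoff on day 12
    [50, 100, 80, 40, 0],        # regresses, never exceeds the cutoff
]
event_days = [survival_day(days, volumes) for volumes in mice]
curve = [(d, fraction_surviving(event_days, d)) for d in days]
```

      In this sketch the survival curve only changes on days when a tumor crosses the cutoff, so it is a thresholded summary of the same volume measurements rather than an independent readout.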

      (2) Fig 3a, not sure if entE is a good negative control for this experiment. Neg. Ctrl should maintain its CFU/ml at a certain level regardless of Lcn2 conc. However, entE conc. is at 100 CFU/ml throughout the experiment suggesting there is no entE in media or if it is supersensitive to Lcn2 that bacteria die at the dose of 0.1nM?

      We appreciate the reviewer’s comment. The △entE-E. coli was indeed observed to be highly sensitive to LCN2. We included the control to highlight the competitive relationship between entE and LCN2 for iron chelation, which has been previously reported in the literature [Biometals 32, 453–467 (2019)].

      (3) Fig 4, the authors harvested bacteria from the tumor by centrifuging homogenized samples at different speeds. Internal controls confirming sample purity (positive for bacteria and negative for cells for panels a,b,c; or vice versa for panel d) may be necessary. This comment may also apply to samples from Fig 1.

      We acknowledge the reviewer’s concern and would like to point out that the proteomic analysis was performed using a highly cited protocol that provides reference and normalization standards for E. coli proteins [Mol Cell Proteomics. 2014 Sep; 13(9): 2513–2526]. The reference is cited in the Materials and Method section associated with the proteomic analysis.

      (4) To demonstrate long-term anti-cancer memory was established through enhancement of CD8+ T cell activity, the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IronA mice.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We apologize for overstating our claim in the previous manuscript draft.

      Minor suggestions:

      (1) Please include the tumor re-challenge experiment in the method section.

      The re-challenge experiment has been added to the method section as instructed.

      (2) Please cite others' and your previous work. E.g. line 281, 282, line 306-307.

      We have added the citations as instructed.

      (3) Line 448, BL21 is bacteria, not cells.

      We have made the correction accordingly.

      Reviewer #3 (Recommendations For The Authors):

      • The authors postulate that IroA-E. coli is more potent than DGC-E. coli in resisting LCN2 activity, and that this potency is the cause of the increased tumor suppression of this engineered strain. If so, Fig 3a should include DGC-E. coli for direct comparison.

      We thank the reviewer for the comment and would like to clarify that we intended to construct IroA-E. coli as a more specific iron-scavenging strategy, which can aid the discussion of nutritional immunity and minimize confounding factors from the immune-stimulatory effect of CDG. We have modified our text to clarify our stance.

      • The data refers to the effects of WT bacteria-mediated tumor suppression, e.g. Figure 3e shows that even WT bacteria have a significant suppressive effect on tumor growth. Could the authors provide background on what is known about the mechanism of this tumor suppression, outside of tumor targeting and engineerability? They only reference "immune system stimulation."

      We appreciate the reviewer’s comment and would like to refer the reviewer to our recently published article [Lim et al., EMBO Molecular Medicine 2024; DOI: 10.1038/s44321-023-00022-w], which shows that in addition to immune system stimulation, WT bacteria can also be perceived as an invading species in the tumor that can exert differential selective pressure against cancer cells. Competition for nutrients is highlighted as a major contributor to containing tumor growth. In fact, the nutrient competition that we observed in the prior article inspired the design of the iron-scavenging bacteria towards overcoming nutritional immunity. We have cited this recently published article in the revised manuscript to enrich the background.

      • The authors claim that there is immunologic memory because of tumor resistance in re-challenged mice after IroA-E. coli treatment (Fig 5c). It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to claim that the adaptive immunity stemmed from IroA-E. coli only, and we intend to build upon current literature that has reported CD8+ T cell elicitation by bacterial therapy. The IroA-E. coli is shown to enhance adaptive immunity. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression.

      • The authors claim that CD8+ T cells are mechanistically important in the effects of iron status manipulation in E. coli-mediated tumor suppression (Fig 5). In order to show this, it seems that Fig 5c should include WT-E. coli and WT-E. coli+CD8 ab groups, as it may be that CD8+ T cells play a role in tumor suppression regardless of whether or not iron regulation is being manipulated.

      We apologize for the confusion from our prior writing. We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to convey that CD8+ T cells are mechanistically important in the effects of iron status manipulation.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the editorial team and reviewers for their continued contributions to improve our work.

      Below we have addressed the final recommendations to the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I asked previously why the suppression depth should vary based on the contrast change speed. I now understand that the authors expect this variation from a working model based on neural adaptation (lines 274-277 and 809-820). I suggest the authors specify this prediction also on lines 473-479, where there is room for improved clarity (the words/phrases 'impact,' 'be sensitive to,' and 'covary' are non-directional).

      We have now specified this prediction to improve clarity:

      Line 475 – 486

      “In the context of the tCFS method, the steady increases and decreases in the target’s actual strength (i.e., its contrast) should, respectively, boost its emergence from suppression (bCFS) and facilitate its reversion to suppression (reCFS) as it competes against the mask. Whether construed as a consequence of neural adaptation or error signal, we surmise that these cycling state transitions defining suppression depth should be sensitive to the rate of contrast change of the monocular target. Specifically, the slower the contrast change, the greater the amount of accrued adaptation, which will contract the range between breakthrough and suppression thresholds according to an adapting reciprocal inhibition model. For fast contrast change, there will be less accrual of adaptation meaning that the range between breakthrough and suppression thresholds will exhibit less contraction. Expressed in operational terms, the depth of suppression should be positively related to the rate of target change. Experiment 3 tested this supposition using three rates of contrast change.”

      Line 108: 'By comparing the thresholds for a target to transition into (reCFS) and out of awareness (bCFS)'-are 'into' and 'out of' reversed?

      They were, thank you, these have now been corrected.

      Lines 696-698 read, 'Figure 3 shows that polar patterns tend to emerge from suppression at slightly lower contrasts than do gratings.' In the same paragraph, lines 716-171 read, 'Figure 3 shows that bCFS and reCFS thresholds are very similar for all image categories.' There is a statistically significant effect of category in these results; meanwhile, the differences among categories are arguably small. Which side do the authors intend to emphasize? Are the readers meant to interpret this as a glass-half-full, half-empty situation?

      We have now revised this paragraph. We emphasise that the small differences do not support ‘preferential processing’ of the magnitude that would be expected from category specific neural CRFs.

      From Line 702

      “Next we turn to another question raised about our conclusion concerning invariant depth of suppression. If a certain image type had overall lower bCFS and reCFS contrast thresholds relative to another image type (despite equivalent suppression depth), would that imply the former image enjoyed “preferential processing” relative to the latter? And, what would determine the differences in bCFS and reCFS thresholds? Figure 3 shows that polar patterns tend to emerge from suppression at slightly lower contrasts than do gratings and that polar patterns, once dominant, tend to maintain dominance to lower contrasts than do gratings and this happens even though the rate of contrast change is identical for both types of stimuli. But while rate of contrast change is identical, the neural responses to those contrast changes may not be the same: neural responses to changing contrast will depend on the neural contrast response functions (CRFs) of the cells responding to each of those two types of stimuli, where the CRF defines the relationship between neural response and stimulus contrast. CRFs rise monotonically with contrast and typically exhibit a steeply rising initial response as stimulus contrast rises from low to moderate values, followed by a reduced growth rate for higher contrasts. CRFs can vary in how steeply they rise and at what contrast they achieve half-max response. CRFs for neurons in mid-level vision areas such as V4 and FFA (which respond well to polar stimuli and faces, respectively) are generally steeper and shifted towards lower contrasts than CRFs for neurons in primary visual cortex (which respond well to gratings). Therefore, the effective strength of the contrast changes in our tCFS procedure will depend on the shape and position of the underlying CRF, an idea we develop in more detail in Supplementary Appendix 1, comparing the case of V1 and V4 CRFs. 
      The comparison of V1 and V4 CRFs yields two interesting predictions: (i) that V4 CRFs should produce much lower bCFS and reCFS thresholds than V1 CRFs, and (ii) that V4 CRFs should produce much more suppression than V1 CRFs. Our data do not support either prediction: bCFS and reCFS thresholds for the polar shape are not ‘much lower’ than those for gratings (Fig. 3) and neither is there ‘much more’ suppression depth for the polar form. There is no room in these results to support the claim that certain images are special and receive “preferential processing” or processing outside of awareness. Instead, the similar data patterns for all image types are most parsimoniously explained by a single mechanism processing all images (see Appendix 1), although there are many other kinds of images still to be tested in tCFS and exceptions may yet be found. As a first step in exploring this idea, one could use standard psychophysical techniques (e.g., (Ling & Carrasco, 2006)) to derive CRFs for different categories of patterns and then measure suppression depth associated with those patterns using tCFS.”
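      The CRF properties described in the quoted passage (monotonic rise, a half-max contrast, and a variable steepness) can be made concrete with a minimal sketch. It assumes the Naka-Rushton form, a commonly used descriptive function for contrast responses; the quoted text does not name a specific function, so this choice, the function names, and the "V1-like" and "V4-like" parameter values are all hypothetical illustrations, not the model in the authors' Appendix 1.

```python
def naka_rushton(contrast, r_max=1.0, c50=0.3, n=2.0):
    """Response at a given contrast: r_max * c^n / (c^n + c50^n)."""
    if contrast <= 0:
        return 0.0
    cn = contrast ** n
    return r_max * cn / (cn + c50 ** n)


def contrast_for_response(target, r_max=1.0, c50=0.3, n=2.0):
    """Invert the function: contrast needed to reach `target` (0 < target < r_max)."""
    return c50 * (target / (r_max - target)) ** (1.0 / n)


# Hypothetical parameter sets: the second is steeper and shifted to lower contrast.
v1_like = dict(r_max=1.0, c50=0.3, n=2.0)
v4_like = dict(r_max=1.0, c50=0.1, n=3.0)

# Contrast at which each hypothetical CRF reaches a fixed response criterion.
criterion = 0.5
c_v1 = contrast_for_response(criterion, **v1_like)
c_v4 = contrast_for_response(criterion, **v4_like)
```

      Under these assumed parameters the steeper, left-shifted function reaches the same response criterion at a lower contrast, which is the intuition behind the prediction that such a CRF would yield lower thresholds.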

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers praised multiple aspects of our study. Reviewer 1 noted that “the work aligns well with current research trends and will greatly interest researchers in the field.” Reviewer 2 highlighted the unique capability of our imaging approach, which “allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry.” Reviewer 3 commented that “the experiments are beautifully executed” and “are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before.”

      In addition to the positive feedback, the reviewers also provided useful criticisms and suggestions, some of which may not be fully addressed in a single study. For instance, questions regarding whether dopamine axons encode the valence or specific identity of the stimuli, or the most salient aspects of the environment, remain open. At the same time, as all the reviewers agreed, our report on the diversity of dopamine axonal responses using a novel imaging design introduces significant new insights to the neuroscience community. Following the reviewers’ recommendations, we have refrained from making interpretations that could be perceived as overinterpretation, such as concluding that “dopamine axons are involved in aversive processing.” This has necessitated extensive revisions, including modifying the title of our manuscript to make clear that the novelty of our work is revealing ‘functional diversity’ using our new imaging approach.

      Below, we respond to the reviewers’ comments point by point.

      eLife assessment

      This valuable study shows that distinct midbrain dopaminergic axons in the medial prefrontal cortex respond to aversive and rewarding stimuli and suggest that they are biased toward aversive processing. The use of innovative microprism based two-photon calcium imaging to study single axon heterogeneity is solid, although the experimental design could be optimized to distinguish aversive valence from stimulus salience and identity in this dopamine projection. This work will be of interest to neuroscientists working on neuromodulatory systems, cortical function and decision making.

      Reviewer #1

      Summary:

      In this manuscript, Abe and colleagues employ in vivo 2-photon calcium imaging of dopaminergic axons in the mPFC. The study reveals that these axons primarily respond to unconditioned aversive stimuli (US) and enhance their responses to initially-neutral stimuli after classical association learning. The manuscript is well-structured and presents results clearly. The utilization of a refined prism-based imaging technique, though not entirely novel, is well-implemented. The study's significance lies in its contribution to the existing literature by offering single-axon resolution functional insights, supplementing prior bulk measurements of calcium or dopamine release. Given the current focus on neuromodulator neuron heterogeneity, the work aligns well with current research trends and will greatly interest researchers in the field.

      However, I would like to highlight that the authors could further enhance their manuscript by addressing study limitations more comprehensively and by providing essential details to ensure the reproducibility of their research. In light of this, I have a number of comments and suggestions that, if incorporated, would significantly contribute to the manuscript's value to the field.

      Strengths:

      • Descriptive.

      • Utilization of a well-optimized prism-based imaging method.

      • Provides valuable single-axon resolution functional observations, filling a gap in existing literature.

      • Timely contribution to the study of neuromodulator neuron heterogeneity.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      (1) It's important to fully discuss the fact that the measurements were carried out only on superficial layers (30-100um), while major dopamine projections target deep layers of the mPFC as discussed in the cited literature (Vander Weele et al., 2018) and as illustrated in FigS1B,C. This limitation should be explicitly acknowledged and discussed in the manuscript, especially given the potential functional heterogeneity among dopamine neurons in different layers. This potential across-layer heterogeneity could also be the cause of discrepancy among past recording studies with different measurement modalities. Also, mentioning technical limitations would be informative. For example: how deep the authors can perform 2p-imaging through the prism? was the "30-100um" maximum depth the authors could get?

      Thank you for pointing out this important issue about layer differences.

      It is possible that the mesocortical pathway has layer-specific channels, with some neurons targeting supragranular layers and others targeting infragranular ones. Alternatively, it is also plausible that the axons of the same neurons branch into both superficial and deep layers. This is a critical issue that has not been investigated in anatomical studies and will require single-cell labeling of dopamine neurons (Matsuda et al 2009 and Aransay et al 2015). We now discuss this issue in the Discussion.

      As for the imaging depth of 30–100 m, we were unable to visualize deeper axons in a live view mode. Our imaging system has already been optimized to detect weak signals (e.g., we have employed an excitation wavelength of 980 nm, dispersion compensation, and a hybrid photodetector). It is possible that future studies using improved imaging approaches may be able to visualize deeper layers. Importantly, sparse axons in the supragranular layers are advantageous in detecting weak signals; dense labeling of axons would increase the background fluorescence relative to signals. We now reference this layer issue in the Results and Discussion sections.

      (2) In the introduction, it seems that the authors intended to refer to Poulin et al. 2018 regarding molecular/anatomical heterogeneity of dopamine neurons, but they inadvertently cited Poulin et al. 2016 (a general review on scRNAseq). Additionally, the statement that "dopamine neurons that project to the PFC show unique genetic profiles (line 85)" requires clarification, as Poulin et al. 2018 did not specifically establish this point. Instead, they found at least the Vglut2/Cck+ population projects into mPFC, and they did not reject the possibility of other subclasses projecting to mPFC. Rather, they observed denser innervation with DAT-cre, suggesting that non-Vglut2/Cck populations would also project to mPFC. Discuss the potential molecular heterogeneity among mPFC dopamine axons in light of the sampling limitation mentioned earlier.

      We thank the reviewer for pointing this out. Genetic profiles of PFC-projecting DA neurons are still being investigated, so describing them as “unique” was misleading. We have edited the Introduction accordingly, and now discuss this issue in detail in the Discussion.

      (3) I find the data presented in Figure 2 to be odd. Firstly, the latency of shock responses in the representative axons (right panels of G, H) is consistently very long - nearly 500ms. It raises a query whether this is a biological phenomenon or if it stems from a potential technical artifact, possibly arising from an issue in synchronization between the 2-photon imaging and stimulus presentation. My reservations are compounded by the notable absence of comprehensive information concerning the synchronization of the experimental system in the method section.

      The synchronization of the stimulus and data acquisition is accomplished at a sub-millisecond resolution. We use a custom-made MATLAB program that sends TTL commands to standard imaging software (ThorImage or ScanImage) and a stimulator for electrical shocks. All events are recorded as analogue inputs to a different DAQ to ensure synchronization. We have provided additional details regarding the configuration in the Methods section.

      We consider that the long latency of shock response is biological. For instance, a similar long latency was found after electrical shock in a photometry imaging study (Kim, …, Deisseroth, 2016).

      Secondly, there appear to be irregularities in Panel J. While the authors indicate that "Significant axons were classified as either reward-preferring (cyan) or aversive-preferring (magenta), based on whether the axons are above or below the unity line of the reward/aversive scatter plot (Line 566)," a cyan dot slightly but clearly deviates above the unity line (around coordinates (x, y) = (20, 21)). This needs clarification. Lastly, when categorizing axons for analysis of conditioning data in Fig3 (not Fig2), the authors stated "The color-coded classification (cyan/magenta) was based on k-means clustering, using the responses before classical conditioning (Figure 2J)". I do not understand why the authors used different classification methods for two almost identical datasets.

      We thank the reviewer for pointing out these insufficient descriptions. We classified the axons using k-means clustering, and the separation of the two clusters happened to roughly coincide with the unity line of the reward/aversive scatter plot in Fig 2J. In other words, we did not use the unity line to classify the data points (which is why the color separation of the histogram is not at 45 degrees). We have clarified this point in the Methods section.
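      The distinction between a k-means split and a unity-line split can be sketched as follows. This is a minimal two-cluster Lloyd's algorithm on hypothetical (reward response, aversive response) pairs; the data, the starting centroids, and the function name are assumptions for illustration and do not reproduce the authors' analysis.

```python
def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2 + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical (reward response, aversive response) pairs for seven axons.
points = [(2, 20), (3, 25), (5, 30), (4, 18), (15, 4), (18, 6), (20, 3)]
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
```

      With two clusters, the decision boundary is the perpendicular bisector of the segment joining the two centroids, so it only coincides with the 45-degree unity line if the centroids happen to be mirror images across that line.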

      (4) In connection with Point 3, conducting separate statistical analyses for aversive and rewarding stimuli would offer a fairer approach. This could potentially reveal a subset of axons that display responses to both aversive and appetitive stimuli, aligning more accurately with the true underlying dynamics. Moreover, the characterization of Figure 2J as a bimodal distribution while disregarding the presence of axons responsive to both aversive and appetitive cues seems somewhat arbitrary and circular logic. A more inclusive consideration of this dual-responsive population could contribute to a more comprehensive interpretation.

      We also attempted k-means clustering with additional dimensions (e.g., temporal domains as shown in Fig. 3I, J), but no additional clusters were evident. We note that the lack of other clusters does not exclude the possibility of their existence, which may only become apparent with a substantial increase in the number of samples. In the current report, we present the clusters that were the easiest/simplest for us to identify.

      Additionally, we have revised our manuscript to reflect that many axons respond to both reward and aversive stimuli, and that aversive-preferring axons do not exclusively respond to the aversive stimulus.

      (5) The contrast in initialization to novel cues between aversive and appetitive axons mirrors findings in other areas, such as the tail-of-striatum (TS) and ventral striatum (VS) projecting dopamine neurons (Menegas et al., 2017, not 2018). You might consider citing this very relevant study and discussing potential collateral projections between mPFC and TS or VS.

      Thank you for pointing this out. We have now included Menegas et al., 2017, and also discuss the possibility of collaterals to these areas. In addition, we also referred to Azcorra et al., 2023 - this was published after our initial submission.

      (6) The use of correlation values (here >0.65) to group ROIs into axons is common but should be justified based on axon density in the FOV and imaging quality. It's important to present the distribution of correlation values and demonstrate the consistency of results with varying cut-off values. Also, provide insights into the reliability of aversive/appetitive classifications for individual ROIs with high correlations. Importantly, if you do the statistical testing and aversive/appetitive classifications for individual ROIs with above-threshold high correlation (to be grouped into the same axon), do they always fall into the same category? How many false positives/false negatives are observed?


      "Our results remained similar for different correlation threshold values (Line 556)" (data not shown) is obsolete.

      We have conducted additional analyses using correlation thresholds of 0.5 and 0.3, which resulted in a smaller number of axon terminals. In essence, the relationship between reward responses and aversive responses remained very similar to Fig. 2J, K.

      Author response image 1.
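      The effect of the correlation cut-off on ROI grouping can be sketched in a few lines. This minimal example assumes ROIs are merged transitively (single linkage via union-find) whenever their pairwise Pearson correlation exceeds the threshold; the merging rule, the function names, and the traces are hypothetical, since the exact grouping procedure is not described in this response.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length traces (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def group_rois(traces, threshold=0.65):
    """Group ROI indices whose traces correlate above `threshold` (transitively)."""
    parent = list(range(len(traces)))

    def find(i):
        # Follow parent links to the root, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if pearson(traces[i], traces[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(traces)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical fluorescence traces for four ROIs (two putative axons).
traces = [
    [0, 1, 0, 2, 0, 1],
    [0, 2, 0, 4, 0, 2],
    [1, 0, 2, 0, 1, 0],
    [2, 0, 4, 0, 2, 0],
]
axons = {thr: group_rois(traces, threshold=thr) for thr in (0.65, 0.5, 0.3)}
```

      Lowering the threshold can only merge groups under this rule, never split them, which is consistent with a lower cut-off yielding fewer putative axons.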

      Reviewer #2 (Public Review):

      Summary:

      This study aims to address existing differences in the literature regarding the extent of reward versus aversive dopamine signaling in the prefrontal cortex. To do so, the authors chose to present mice with both a reward and an aversive stimulus during different trials each day. The authors used high spatial resolution two-photon calcium imaging of individual dopaminergic axons in the medial PFC to characterize the response of these axons to determine the selectivity of responses in unique axons. They also paired the reward (water) and an aversive stimulus (tail shock) with auditory tones and recorded across 12 days of associative learning.

      The authors find that some axons respond to both reward and aversive unconditioned stimuli, but overall, there is a strong preference to respond to aversive stimuli consistent with expectations from prior studies that used other recording methods. The authors find that both of their two auditory stimuli initially drive responses in axons, but that with training axons develop more selective responses for the shock associated tone indicating that associative learning led to changes in these axons' responses. Finally, the authors use anticipatory behaviors during the conditioned stimuli and facial expressions to determine stimulus discrimination and relate dopamine axons signals with this behavioral evidence of discrimination. This study takes advantage of cutting-edge imaging approaches to resolve the extent to which dopamine axons in PFC respond to appetitive or aversive stimuli. They conclude that there is a strong bias to respond to the aversive tail shock in most axons and weaker more sparse representation of water reward.

      Strengths:

      The strength of this study is the imaging approach that allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry which provide a measure of the average population activity. The use of appetitive and aversive stimuli to probe responses across individual axons is another strength.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      A weakness of this study is the design of the associative conditioning paradigm. The use of only a single reward and single aversive stimulus makes it difficult to know whether these results are specific to the valence of the stimuli versus the specific identity of the stimuli. Further, the reward presentations are more numerous than the aversive trials making it unclear how much novelty and habituation account for results. Moreover, the training seems somewhat limited by the low number of trials and did not result in strong associative conditioning. The lack of omission responses reported may reflect weak associative conditioning. Finally, the study provides a small advance in our understanding of dopamine signaling in the PFC and lacks evidence for if and what might be the consequence of these axonal responses on PFC dopamine concentrations and PFC neuron activity.

      We thank the reviewer for the suggestions.

      We agree that interpreting the response change during classical conditioning is not straightforward. Although the reward and aversive stimuli we employed are commonly used in the field, future studies with more sophisticated paradigms will be necessary to address whether dopamine axons encode the valence of the stimuli, the specific identity of the stimuli, or novelty and habituation. In our current manuscript, we refrain from concluding that distinct groups of neurons encode different valences. In fact, many axons respond to both stimuli, at different ratios. We have removed descriptions that may suggest exclusive coding of reward or aversive processing. Additionally, we have extensively discussed possible interpretations.

      In terms of the strength of the conditioning association, behavioral results indicated that the learning plateaued – anticipatory behaviors did not increase during the last two phases when the conditioned span was divided into six phases (Figure 3–figure supplement 1).

      Our goal in the current manuscript is to provide new insight into the functional diversity of dopamine axons in the mPFC. Investigating the impact of dopamine axons on local dopamine concentration and neural activity in the mPFC is important but falls beyond the scope of our current study. In particular, given the functional diversity of dopamine axons, interpreting bulk optogenetic or chemogenetic axonal manipulation experiments would not be straightforward. As suggested, measuring the dopamine concentration through two-photon imaging of dopamine sensors and monitoring the activity of dopamine recipient neurons (e.g., D1R- or D2R-expressing neurons) is a promising approach that we plan to undertake in the near future.

      Reviewer #3 (Public Review):

      Summary:

      The authors image dopamine axons in medial prefrontal cortex (mPFC) using microprism-mediated two-photon calcium imaging. They image these axons as mice learn that two auditory cues predict two distinct outcomes, tailshock or water delivery. They find that some axons show a preference for encoding of the shock and some show a preference for encoding of water. The authors report a greater number of dopamine axons in mPFC that respond to shock. Across time, the shock-preferring axons begin to respond preferentially to the cue predicting shock, while there is a less pronounced increase in the water-responsive axons that acquire a response to the water-predictive cue (these axons also increase non-significantly to the shock-predictive cue). These data lead the authors to argue that dopamine axons in mPFC preferentially encode aversive stimuli.

      Strengths:

      The experiments are beautifully executed and the authors have mastered an impressively complex technique. Specifically, they are able to image and track individual dopamine axons in mPFC across days of learning. This technique is used the way it should be: the authors isolate distinct dopamine axons in mPFC and characterize their encoding preferences and how this evolves across learning of cue-shock and cue-water contingencies. Thus, these experiments are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before. This is timely and important.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      The overarching conclusion of the paper is that dopamine axons preferentially encode aversive stimuli. This is prevalent in the title, abstract, and throughout the manuscript. This is fundamentally confounded. As the authors point out themselves, the axonal response to stimuli is sensitive to outcome magnitude (Supp Fig 3). That is, if you increase the magnitude of water or shock that is delivered, you increase the change in fluorescence that is seen in the axons. Unsurprisingly, the change in fluorescence that is seen to shock is considerably higher than water reward.

      We agree that the interpretation of our results is not straightforward. Our current manuscript now focuses on our strength, which is reporting the functional diversity of dopamine axons. Therefore, we avoid using the word ‘encode’ when describing the response.

      We believe that our results could reconcile the apparent discrepancy as to why some previous studies reported only aversive responses while others reported reward responses. In particular, if the reward volume were very small, the reward response could go undetected.

      Further, when the mice are first given unexpected water delivery and have not yet experienced the aversive stimuli, over 40% of the axons respond [yet just a few lines below the authors write: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards", which seems inconsistent with their own data].

      We always recorded the reward and aversive response together, which might have confused the reviewer. Therefore, there is no inconsistency in our data. We have clarified our methods and reasoning accordingly.

      Given these aspects of the data, it could be the case that the dopamine axons in mPFC encode different types of information and delegate preferential processing to the most salient outcome across time.

      This is certainly an exciting interpretation, so we have included it in our discussion. Meanwhile, ‘the most salient outcome’ alone cannot fully capture the diverse response patterns of the dopaminergic axons, particularly reward-preferring axons. We discuss our findings in more detail in the revised manuscript.

      The use of two similar-sounding tones (9 kHz and 12 kHz) for the reward- and aversive-predicting cues is likely to enhance this, as it requires a fine-grained distinction between the two cues in order to learn effectively. There is considerable literature on mPFC function across species that would support such a view. Specifically, theories of mPFC function (in particular prelimbic cortex, which is where the axon images are mostly taken) generally center around resolution of conflict in what to respond to, learn about, and attend to. That is, mPFC is important for devoting the most resources (learning, behavior) to the most relevant outcomes in the environment. These data, then, provide a mechanism for this to occur in mPFC. That is, dopamine axons signal to the mPFC the most salient aspects of the environment, which should be preferentially learned about and responded towards. This is also consistent with the absence of a negative prediction error during omission: the dopamine axons show increases in responses during receipt of unexpected outcomes, but do not encode negative errors. This supports a role for this projection in helping to allocate resources to the most salient outcomes and their predictors, and not learning per se. Below are just a few references from the rich literature on mPFC function (some consider rodent mPFC analogous to DLPFC, some mPFC), which advocate for a role for this region in allocating attention and cognitive resources to the most relevant stimuli, and do not indicate preferential processing of aversive stimuli.

      Distinguishing between 9 kHz and 12 kHz sound tones may not be that difficult, considering anticipatory licking and running are differentially manifested. In addition, previous studies have shown that mice can distinguish between two sound tones when they are separated by 7% (de Hoz and Nelken 2014). Nonetheless, we agree with the attractive interpretation that “the mPFC devotes the most resources (learning, behavior) to the most relevant outcomes in the environment” and that dopamine is a mechanism for this. Therefore, we discuss this interpretation in the revised text.

      References:

      (1) Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual review of neuroscience, 24(1), 167-202.

      (2) Bissonette, G. B., Powell, E. M., & Roesch, M. R. (2013). Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behavioural brain research, 250, 91-101.

      (3) Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, 18(1), 193-222.

      (4) Sharpe, M. J., Stalnaker, T., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2019). An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual review of psychology, 70, 53-76.

      (5) Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. Science, 306(5695), 443-447.

      (6) Nee, D. E., Kastner, S., & Brown, J. W. (2011). Functional heterogeneity of conflict, error, task-switching, and unexpectedness effects within medial prefrontal cortex. Neuroimage, 54(1), 528-540.

      (7) Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by monkey medial frontal cortex. Nature neuroscience, 10(2), 240-248.

      Reviewer #1 (Recommendations For The Authors):

      Specific Suggestions and Questions on the Methods Section:

      In general, the methods part is not well documented and sometimes confusing. Thus, as it stands, it hinders reproducible research. Specific suggestions/questions are listed in the following section.

      (1) Broussard et al. 2018 introduced axon-GCaMP6 instead of axon-jGCaMP8m. The authors should provide details about the source of this material. If it was custom-made, a description of the subcloning process would be appreciated. Additionally, consider depositing sequence information or preferably the plasmid itself. Furthermore, the introduction of the jGCaMP8 series by Zhang, Rozsa, et al. 2023 should be acknowledged and referenced in your manuscript.

      We thank the reviewer for pointing this out. We have now included details on how we prepared the axon-jGCaMP8m, which was based on plasmids available at Addgene. Additionally, we have deposited our construct to Addgene (https://www.addgene.org/216533/). We have also cited Janelia’s report on jGCaMP8, Zhang et al.

      (2) The authors should elaborate on the approach taken for experimental synchronization. Specifically, how was the alignment achieved between 2-photon imaging, treadmill recordings, aversive/appetitive stimuli, and videography? It would be important to document the details of the software and hardware components employed for generating TTLs that trigger the pump, stimulator, cameras, etc.

      We have now included a more detailed explanation about the timing control. We utilize a custom-made MATLAB program that sends TTL square waves and analogue waves via a single National Instruments board (USB-6229) to control two-photon image acquisition, behavior camera image acquisition, water syringe movement, current flow from a stimulator, and sound presentation. We also continuously recorded at 30 kHz, via a separate National Instruments board (PCIe-6363), the frame timing of two-photon imaging, the frame timing of the behavior camera, copies of command waves (sent to the syringe pump, the stimulator, and the speaker), and signals from the treadmill corresponding to running speed.
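
      A minimal sketch of how signals recorded on a shared 30 kHz clock could be aligned offline. This is an illustration only, not the MATLAB program described above: the function names, the 2.5 V threshold, and the toy traces are all hypothetical. It detects rising edges in digitized TTL traces and assigns a stimulus onset to the imaging frame that was being acquired at that time.

```python
from bisect import bisect_right

SAMPLE_RATE_HZ = 30_000  # acquisition rate stated in the response above


def rising_edges(trace, threshold=2.5):
    """Return sample indices where the trace crosses the threshold upward."""
    return [
        i
        for i in range(1, len(trace))
        if trace[i - 1] < threshold <= trace[i]
    ]


def to_seconds(sample_indices, rate_hz=SAMPLE_RATE_HZ):
    """Convert sample indices to seconds on the shared acquisition clock."""
    return [i / rate_hz for i in sample_indices]


def frame_at(event_time_s, frame_times_s):
    """Index of the last frame that started at or before the event, or None."""
    idx = bisect_right(frame_times_s, event_time_s) - 1
    return idx if idx >= 0 else None


# Toy traces (hypothetical): a frame pulse every 1000 samples, one stimulus pulse.
frame_clock = [0.0] * 5000
for start in range(500, 5000, 1000):
    for k in range(start, start + 100):
        frame_clock[k] = 5.0
stimulus = [0.0] * 5000
for k in range(2600, 2700):
    stimulus[k] = 5.0

frame_times = to_seconds(rising_edges(frame_clock))
stim_times = to_seconds(rising_edges(stimulus))
stim_frame = frame_at(stim_times[0], frame_times)
```

      Because every signal is sampled by the same board, no clock-drift correction is needed in this sketch; recordings made on separate clocks would need an additional synchronization step.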

      (3) The information regarding the cameras utilized in the study presents some confusion. In one instance, you mention, "To monitor licking behavior, the face of each mouse was filmed with a camera at 60 Hz (CM3-U3-13Y3M-CS, FLIR)" (Line 488). However, there's also a reference to filming facial expressions using an infrared web camera (Line 613). Could you clarify whether the FLIR camera (which is an industrial CMOS not a webcam) is referred to as a webcam? Alternatively, if it's a different camera being discussed, please provide product details, including pixel numbers and frame rate for clarity.

      We thank the reviewer for pointing this out. This was a mistake on our end. The camera used in the current project was a CM3-U3-13Y3M-CS, not a web camera. We have now corrected this.

      (4) Please provide more information about the methodology employed for lick detection. Specifically, did the authors solely rely on videography for this purpose? If so, why was an electrical (or capacitive) detector not used? It would provide greater accuracy in detecting licking.

      Lick detection was performed offline based on videography, using DeepLabCut. As licking occurs at a frequency of ~6.5 Hz (Xu, …, O’Connor Nature Neurosci, 2022), the movement can be detected at a frame rate of 60 Hz. Initially, we used both a lick sensor and videography. However, we favored videography because it could potentially provide non-binary information.

      Other Minor Points:

      (5) Ensure consistency in the citation format; both Vander Weele et al. 2018 and Weele et al. 2019 share the same first author.

      Thank you for pointing this out. Endnote processes the first author’s name differently depending on the journal. We fixed the error manually. The first paper (2018) is an original research paper, and the second one (2019) is a review about how dopamine modulates aversive processing in the mPFC. We cited the second one in three instances where we mentioned review papers.

      (6) The distinction between "dashed vs dotted lines" in Figure 3K and 3M appears to be very confusing. Please consider providing a clearer visualization/labeling to mitigate this confusion.

      We have now changed the line styles.

      (7) Additionally plotting mean polar angles of aversive/appetitive axons as vectors in the Cartesian scatter plots (2J, 3I,J) would make interpretation easier.

      We have now made this change to Figures 2, 3, 4.
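
      For readers who want to reproduce this kind of overlay, a mean polar angle can be computed as a circular mean and drawn as an arrow in Cartesian coordinates. The sketch below assumes that reward responses are plotted on the x-axis and aversive responses on the y-axis; the axis assignment, function names, and example values are hypothetical and are not taken from the manuscript.

```python
import math


def polar_angle(reward_resp, aversive_resp):
    """Angle (radians) of one axon in the reward (x) vs aversive (y) plane."""
    return math.atan2(aversive_resp, reward_resp)


def mean_direction(angles):
    """Circular mean angle and resultant length of a list of angles (radians)."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.atan2(s, c), math.hypot(c, s)


# Hypothetical (reward, aversive) response pairs for three axons.
axons = [(0.2, 1.0), (0.1, 0.8), (0.3, 0.9)]
angles = [polar_angle(x, y) for x, y in axons]
mean_angle, resultant = mean_direction(angles)

# Endpoint of a unit-length arrow to overlay on the Cartesian scatter plot.
arrow = (math.cos(mean_angle), math.sin(mean_angle))
```

      Averaging unit vectors rather than raw angles avoids wrap-around artifacts near ±π, and the resultant length (between 0 and 1) indicates how tightly the angles cluster around the mean.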

      (8) Data and codes should be shared in a public database. This is important for reproducible research and we believe that "available from the corresponding author upon reasonable request" is outdated language.

      We have uploaded the data to GitHub, https://github.com/pharmedku/2024-elife-da-axon.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors don't show which mouse each axon's data come from, making it hard to know if differences arise from inter-mouse differences vs differences in axons. The best way to address this point is to show similar plots as Figure 2J & K but broken down by mouse to show whether each mouse had evidence of these two clusters.

      We have now made this change to Figure 2-figure supplement 3.

      (2) Line 166: Should this sentence point to panels 2F, G, H rather than 2I which doesn't show a shock response?

      We thank the reviewer for pointing this out. We have fixed the incorrect labels.

      (3) Line 195: The population-level bias to aversive stimuli was shown previously using photometry, so it is not justified to say "for the first time" regarding this statement.

      We have adjusted this sentence so that the claim of “for the first time” is not associated with the population-level bias.

      (4) The paper lacks a discussion of the potential role that novelty plays in the amplitude of the responses given that tail shocks occur less often than rewards. Is the amplitude of the first reward of the day larger than subsequent rewards? Would tail shock responses decay if they occurred in sequential trials?

      Following the reviewer's suggestion, we conducted a comparison of individual axonal responses to both conditioned and unconditioned stimuli across the first trial and subsequent trials. Our findings reveal a notable trend: aversive-preferring axons exhibited attenuation in response to CSreward, yet enhancement in response to CSaversive. Conversely, the response of these axons to USreward was attenuated, with no significant change observed for USaversive. In contrast, reward-preferring axons displayed an invariable activity pattern from the initial trial, highlighting the functional diversity present within dopamine axons. This analysis has been integrated into Figure 3-figure supplement 4 and is elaborated upon in the Discussion section.

      (5) Fix typo in Figure 1 - supplement 1. Shift

      We have now corrected this. Thank you.

      (6) The methods section needs information about trial numbers. Please indicate how many trials were presented to each mouse per day.

      We have now added the information about trial numbers to the Methods section.

      Reviewer #3 (Recommendations For The Authors):

      In line with the public review, my recommendation is for the authors to remain as objective about their data as possible. There are many points in the manuscript where the authors seem to directly contradict their own data. For example, they first detail that dopamine axons respond to unexpected water rewards. Indeed, they find that there are 40% of dopamine axons that respond in this way. Then, a few paragraphs later they state: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards". As detailed above, I do not think these data support an idea that dopamine axons in mPFC preferentially encode aversive outcomes. If the authors wanted to examine a role for mPFC in preferential encoding of aversive stimuli, you would first have to equate the outcomes by magnitude and then compare how the axons acquire preferences across time. Alternatively, a prediction of a more general process that I detail above would predict that you could give mice two rewards that differ in magnitude (e.g., lots of food vs. small water) and you would see the same results that the authors have seen here (i.e., a preference for the food, which is the larger and more salient outcome). Without other tests of how dopamine axons in mPFC respond to situations like this, I don't think any conclusion around mPFC in favoring aversive stimuli can be made.

      As suggested, we have made the current manuscript as objective as possible, removing interpretation aspects regarding what dopamine axons encode and emphasizing their functional diversity. In particular, we remove the word ‘encode’ when describing the response of dopamine axons.

      Although it may have appeared unclear, there was no contradiction within our data regarding the response to reward and aversive stimuli. We have now improved the readability of the Results and Methods sections. Concerning the interpretation of what exactly the mPFC dopamine axons encode, we have rewritten the discussion to be as objective about our data as possible, as suggested. We also have edited our title and abstract accordingly. Meanwhile, we wish to emphasize that our reward and aversive stimuli are standard paradigms commonly used in the field. We believe, and all the reviewers agreed, that reporting the diversity of dopamine axonal responses with a novel imaging design constitutes new insight for the neuroscience community. Therefore, we have decided to leave the introduction of new behavioral tasks for future studies and instead expanded our discussion.

      As mentioned, I think the experiments are executed really well and the technological aspects of the authors' methods are impressive. However, there are also some aspects of the data presentation that could be improved. Some of the graphs took a considerable amount of effort to unpack. For example, Figure 4 is hard going. Is there a way to better illustrate the main points that this figure wants to convey? Some of this might be helped by a more complete description in the figure captions about what the data are showing. It would also be great to see how the response of dopamine axons changes across trials within a session to the shock and water-predictive cues. Supp Figure 1 should be in the main text with standard error and analyses across time. Clarifying these aspects of the data would make the paper more relevant and accessible to the field.

      We thank the reviewer for pointing out that the legend of Figure 4 was incomplete. We have fixed it, along with improving the presentation of the figure. We have also prepared a new figure (Figure 3–figure supplement 4) to compare CSaversive and CSreward signals for the first and rest of the trials within daily sessions, revealing further functional diversity in dopamine axons. We have decided to keep Figure 1–figure supplement 2 as a figure supplement with an additional analysis, as another reviewer pointed out that the design is not completely new. Furthermore, as eLife readers can easily access figure supplements, we believe it is appropriate to maintain it in this way.

      Minor points:

      (1) What is the control period for the omission test? Was omission conducted for the shock?

      The control period for reward omission is a 2-second period just before the CS onset. We did not include shock omission, because a sufficient number of trials (> 6 trials) for the rare omission condition could not be achieved within a single day.

      (2) The authors should mention how similar the tones were that predicted water and shock.

      According to de Hoz and Nelken (2014), a frequency difference of 4–7% is enough for mice to discriminate between tones. In addition, anticipatory licking and running confirmed that the mice could discriminate between the frequencies. We have now included this information in the Discussion.
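
      As a quick check of the margin involved, the separation between the 9 kHz and 12 kHz cues can be expressed in the same terms as the cited threshold. This is a sketch only: measuring the percentage relative to the lower tone is an assumption about how the threshold is defined, and the function names are hypothetical.

```python
import math


def relative_difference_percent(f_low_hz, f_high_hz):
    """Frequency difference as a percentage of the lower tone."""
    return 100.0 * (f_high_hz - f_low_hz) / f_low_hz


def interval_octaves(f_low_hz, f_high_hz):
    """Frequency interval in octaves."""
    return math.log2(f_high_hz / f_low_hz)


diff_pct = relative_difference_percent(9_000, 12_000)
octaves = interval_octaves(9_000, 12_000)
print(f"{diff_pct:.1f}% difference, {octaves:.3f} octaves")
# prints "33.3% difference, 0.415 octaves"
```

      Under this definition the two cues differ by roughly a third of the lower frequency, several times the 4–7% figure quoted above.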

      (3) I realize the viral approach used in the current studies may not allow for an idea of where in VTA the dopamine neurons that project to mPFC are located. Are there data in the literature that speak to this? This is particularly important as we now know that there is considerable heterogeneity in dopamine neuronal responses, which is often captured by differences in medial/lateral position within VTA.

      Some studies have suggested that mesocortical dopamine neurons are located in the medial posterior VTA (e.g., Lammel et al., 2008). However, in mouse anterograde tracing, it is not possible to spatially confine the injection of conventional viruses/tracers. We now refer to Lammel et al., 2008 in the Introduction.

    1. Author response:

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we will shortly prepare a revised version of this paper. Intended changes to the revised manuscript are marked up in bold font in the detailed responses below, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”. We acknowledge that of course it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at this point in time, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”. The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript. Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, when it becomes computationally tractable to do so, will have to include the substrate as a possible protonation site; for the present moment, we will amend our discussion to alert the reader to the possibility that the substrate could be an intermediate to proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve, and thus it may be the case that, for now, only a semi-quantitative conclusion can be obtained.

      • “errors with the chosen constant pH MD method for membrane proteins”. We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such we will add a cautionary note to our paper. We will also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we will promote this validation which was in the supplementary figures into the main text in the revised version). We therefore remain reasonably confident in the results presented with regards to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”. We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD and metadynamics for path generation, and find this improvement again for PepT2 in this study. We will address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”. In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate form a minimal set of sorts within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We will make our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.

      As for the additional suggested changes in presentation, we will provide the requested details on the CpHMD analysis. Furthermore, we will use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we will opt to not do so in figures S10 and S15 which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions in single reference plots, respectively, to avoid overcrowding those necessarily busy figures). We are also changing the colour schemes of these plots in our revision to improve accessibility.
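
      For orientation, a 1D reprojection of a 2D PMF is conventionally obtained by Boltzmann-weighted marginalization over the second coordinate, and a simple error bar can be taken as the per-bin spread across profiles estimated from different portions of the data. The sketch below illustrates that generic procedure; it is not a description of the exact analysis used in the manuscript, and the 310 K temperature, function names, and uniform binning are assumptions.

```python
import math

KT = 2.577  # kJ/mol at 310 K (assumed temperature)


def reproject(pmf_2d, kt=KT):
    """Marginalize a 2D PMF F[i][j] over its second axis.

    Returns F(x_i) = -kT * ln(sum_j exp(-F[i][j] / kT)), shifted so min is 0.
    Uniform bin widths are assumed, so they only add a constant offset.
    """
    profile = []
    for row in pmf_2d:
        lowest = min(row)  # factor out the minimum for numerical stability
        z = sum(math.exp(-(f - lowest) / kt) for f in row)
        profile.append(lowest - kt * math.log(z))
    base = min(profile)
    return [f - base for f in profile]


def spread(profiles):
    """Per-bin sample standard deviation across profiles from different blocks."""
    n = len(profiles)
    out = []
    for values in zip(*profiles):
        mean = sum(values) / n
        out.append(math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)))
    return out
```

      Applying `reproject` to the 2D PMF from each data block and passing the resulting profiles to `spread` gives one error estimate per bin of the 1D profile.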

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be as higher-dimensional free energy landscapes obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regards to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer:

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)). However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.”

      Given that even sampling the proton transfer step itself in an essentially static protein conformation would be pushing the boundaries of what has been achieved in the field, we believe that, considering the current state of the art, a fully coupled investigation of large-scale conformational changes and proton-transfer reactions is not yet feasible in a realistic/practical time frame. We also note this limitation already when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”.

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we will expand on our discussion of the reasoning behind employing a non-reactive approach and the limitations that it imposes on what questions can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we will make this clear in the appropriate figure captions in our revision.
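
      To make the arbitrariness of this superposition concrete, the sketch below shifts each 1D PMF so that a chosen end-state basin sits at zero. The profiles, labels, and the choice of the last bin as the OF basin are hypothetical; only the shape of each curve carries information, not the vertical offset between curves.

```python
def align_to_basin(pmf, basin_index):
    """Shift a 1D PMF so that its value at the chosen basin bin is zero."""
    ref = pmf[basin_index]
    return [f - ref for f in pmf]


# Hypothetical profiles (kJ/mol) along an OCC -> OF coordinate; last bin = OF basin.
pmfs = {
    "H87 / D342-": [0.0, 12.0, 20.0],
    "H87H / D342H": [15.0, 10.0, 0.0],
}
aligned = {label: align_to_basin(p, basin_index=-1) for label, p in pmfs.items()}
```

      After alignment all curves coincide at the OF basin by construction, so their relative heights there say nothing about the relative stability of the different protonation states.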

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not take up or release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the current version indicate explicitly that this may involve the substrate. We will make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We will make note of this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).”

      We do not assume that it is our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We will discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way:

      In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity is different at the E56 deprotonated vs protonated protein. This is currently figure S20, though in the revised version we will move this piece of validation into a new panel of figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
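      The link between the ABFE results and a pKa shift can be made explicit by closing the thermodynamic cycle: the difference in substrate affinity between the deprotonated and protonated protein equals RT ln(10) times the pKa shift on binding. The following is a minimal sketch of that arithmetic only; the function name and the numerical inputs are hypothetical and are not values from the study.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def pka_shift_from_binding(dg_bind_deprot, dg_bind_prot, temperature=300.0):
    """Return pKa(holo) - pKa(apo) implied by closing the thermodynamic cycle.

    dg_bind_deprot, dg_bind_prot: substrate binding free energies (kcal/mol)
    to the deprotonated and protonated protein, respectively.
    """
    rt_ln10 = R_KCAL_PER_MOL_K * temperature * math.log(10.0)
    return (dg_bind_deprot - dg_bind_prot) / rt_ln10


# Hypothetical inputs for illustration only (not results from the study).
shift = pka_shift_from_binding(dg_bind_deprot=-5.0, dg_bind_prot=-6.4)
print(f"implied pKa shift on substrate binding: {shift:+.2f}")
```

      A stronger affinity for the protonated protein (a more negative binding free energy) gives a positive shift, i.e. the residue becomes easier to protonate when the substrate is bound.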

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we found few protonation-state transitions taking place, and the system often got stuck in coupled minima of protonation state and local conformation (which need to interconvert through rearrangements of the salt bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state, and Author response image 2 shows how noisy attempts at pKa estimation from this data turn out to be, necessitating the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1
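      To illustrate why sparse protonation-state transitions make pKa estimation noisy, here is a minimal sketch of one common way to extract a pKa from constant-pH data: fitting the deprotonated fraction at each pH to a Hill (generalised Henderson-Hasselbalch) curve in its linearised form. This is an assumption for illustration, not necessarily the fitting procedure used for the images above; the function name and the synthetic data are hypothetical.

```python
import math


def fit_hill(ph_values, deprot_fractions, eps=1e-6):
    """Fit f = 1 / (1 + 10**(n * (pKa - pH))) and return (pKa, n).

    Uses the linearised form log10(f / (1 - f)) = n * (pH - pKa) with
    ordinary least squares. Points with f at (or numerically at) 0 or 1
    carry no usable information in this form and are skipped.
    """
    xs, ys = [], []
    for ph, f in zip(ph_values, deprot_fractions):
        if eps < f < 1.0 - eps:
            xs.append(ph)
            ys.append(math.log10(f / (1.0 - f)))
    if len(set(xs)) < 2:
        raise ValueError("need at least two pH values with 0 < f < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx
    if n == 0:
        raise ValueError("flat titration curve; pKa is undefined")
    pka = mean_x - mean_y / n
    return pka, n


# Synthetic, noise-free titration data (hypothetical pKa of 6.5, n = 1).
ph_grid = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
fractions = [1.0 / (1.0 + 10 ** (6.5 - ph)) for ph in ph_grid]
pka, hill_n = fit_hill(ph_grid, fractions)
print(f"fitted pKa = {pka:.2f}, Hill coefficient = {hill_n:.2f}")
```

      When a trajectory stays trapped in one protonation state, the sampled fractions sit at or near 0 or 1 for most pH windows, so few points constrain the fit and the estimate becomes highly sensitive to noise.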

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate ion pairs with hsPepT2 through its N- & C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which, as noted in our response to the comment above, we will acknowledge explicitly in revision. This potential limitation for the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt bridge network in an apo-like state. The consequence would be that while the direction of shift in E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable for obtaining better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protonatable protein residue, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes tens to hundreds of ns in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We will discuss such considerations in the revision.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths:

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1 ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses:

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this and denote it with question marks in the mechanistic overview we give in Figure 8, and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (i.e., full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state to lie at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD, where in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and will add details to the latter sentence in revision to help clarify better the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct, lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10 ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). We recover from those the effect of substrate binding on the E56 protonation state on both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 of the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we will add more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.
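      The replicate-based comparison described above can be sketched as follows. This is a minimal illustration that assumes “beyond the combined error bars” means the difference in means exceeds the sum of the two standard deviations; that reading, the function names, and all numbers are assumptions, not the study's analysis code or data.

```python
import statistics


def replicate_summary(pka_estimates):
    """Return (mean, sample standard deviation) over replicate pKa estimates."""
    return statistics.mean(pka_estimates), statistics.stdev(pka_estimates)


def exceeds_combined_error(apo_estimates, holo_estimates):
    """True if the apo/holo means differ by more than the summed error bars."""
    apo_mean, apo_sd = replicate_summary(apo_estimates)
    holo_mean, holo_sd = replicate_summary(holo_estimates)
    return abs(holo_mean - apo_mean) > (apo_sd + holo_sd)


# Hypothetical triplicate pKa estimates for two residues (illustration only).
residue_a_apo, residue_a_holo = [6.0, 6.2, 6.1], [7.4, 7.6, 7.5]
residue_b_apo, residue_b_holo = [5.0, 5.8, 5.4], [5.6, 6.4, 6.0]

print("residue A distinct:", exceeds_combined_error(residue_a_apo, residue_a_holo))
print("residue B distinct:", exceeds_combined_error(residue_b_apo, residue_b_holo))
```

      With only three replicates per condition the standard deviation is itself poorly determined, which is one reason to cross-check against a pooled histogram of per-chunk estimates.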

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.

      Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      The reviewer is right to point out that the statement and Figure S3 as they stand do not adequately support our decision to exclude the K64-D317 salt-bridge in our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, does indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we will include the appropriate time-series of the salt bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We will also remake the data for this plot as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are however almost identical, and our conclusions remain.
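      The distinction between a pooled occupancy and a time-resolved picture can be sketched as below. This is an illustrative example only: the 0.4 nm distance cutoff, the function names, and the synthetic distance traces are assumptions and do not come from the study.

```python
def occupancy(distances, cutoff=0.4):
    """Fraction of frames in which the salt-bridge distance is within cutoff."""
    return sum(1 for d in distances if d <= cutoff) / len(distances)


def blockwise_occupancy(distances, n_blocks, cutoff=0.4):
    """Occupancy in consecutive time blocks; the last block takes any remainder."""
    block_size = len(distances) // n_blocks
    if block_size == 0:
        raise ValueError("more blocks than frames")
    blocks = []
    for i in range(n_blocks):
        start = i * block_size
        end = len(distances) if i == n_blocks - 1 else start + block_size
        blocks.append(occupancy(distances[start:end], cutoff))
    return blocks


# Two synthetic traces (nm) with identical pooled occupancy but opposite
# histories: one starts formed and breaks, the other forms late and stays.
breaks_late = [0.3] * 70 + [0.8] * 30
forms_late = [0.8] * 30 + [0.3] * 70

print(occupancy(breaks_late), blockwise_occupancy(breaks_late, 10))
print(occupancy(forms_late), blockwise_occupancy(forms_late, 10))
```

      Both traces give the same pooled value, so a pooled distribution alone cannot distinguish a contact that breaks from one that forms late; the per-block values or the full time series can.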

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We will revise the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we will also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work.

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling.

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Methods section and Figure S8), before moving to 2D-umbrella sampling (Figure 3). Here, we think, the reviewer’s point is pertinent. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end-states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regards to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and combination of several MEMENTO paths in 2D-PMFs comes in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we will replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      (1) Given the low trial numbers, and the point of sequential vs clustered reactivation mentioned in the public review, it would be reassuring to see an additional sanity check demonstrating that future items that are currently not on-screen can be decoded with confidence, and if so, when in time the peak reactivation occurs. For example, the authors could show separately the decoding accuracy for near and far items in Fig. 5A, instead of plotting only the difference between them.

      We have now added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have also chosen to replace Figure 5B with the new figure, as we think it provides more information than the previous Figure 5B, which we have moved to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      (2) The non-sequential reactivation analyses often use a time window of peak decodability, and it was not entirely clear to me what data this time window is determined on, e.g., was it determined based on all future reactivations irrespective of graph distance? This should be clarified in the methods.

      Thank you for raising this. We now clarify this in the relevant section to read: “First, we calculated a time point of interest by computing the peak probability estimate of decoders across all trials, i.e., the average probability for each timepoint of all trials (except previous onscreen items) of all distances, which is equivalent to the peak of the differential reactivation analysis”
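      The selection of the time point of interest described above amounts to averaging the decoded probability over trials at each time point and taking the maximum of that average. A minimal sketch follows; the function name, the array layout (one list of probabilities per trial), and the toy numbers are assumptions for illustration, not the study's analysis code or data.

```python
def peak_timepoint(trial_probabilities, times):
    """Return (time of peak, mean trace) for per-trial decoded probabilities.

    trial_probabilities: one list per trial, each with len(times) values.
    """
    n_trials = len(trial_probabilities)
    mean_trace = [
        sum(trial[t] for trial in trial_probabilities) / n_trials
        for t in range(len(times))
    ]
    peak_index = max(range(len(times)), key=mean_trace.__getitem__)
    return times[peak_index], mean_trace


# Toy data: three trials sampled at five time points (ms after cue onset).
times_ms = [200, 210, 220, 230, 240]
trials = [
    [0.10, 0.12, 0.20, 0.15, 0.11],
    [0.09, 0.14, 0.18, 0.19, 0.10],
    [0.11, 0.10, 0.22, 0.16, 0.12],
]
peak_time, mean_trace = peak_timepoint(trials, times_ms)
print("time point of interest (ms):", peak_time)
```

      Because the peak is chosen from the average over all distances, the choice of time point does not itself favour near over distant items.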

      (3) Fig 4 shows evidence for forward and backward sequential reactivation, suggesting that both forward and backward replay peak at a lag of 40-50msec. It would be helpful if this counterintuitive finding could be picked up in the discussion, explaining how plausible it is, physiologically, to find forward and backward replay at the same lag, and whether this could be an artifact of the TDLM method.

      This is an important point and we agree that it appears counterintuitive. However, we would highlight that this exact time range has been reported in previous studies, though never for both forward and backward replay. We now include a discussion of this finding. The section now reads:

      “[…] Even though we primarily focused on the mean sequenceness scores across time lags, there appears to be a (non-significant) peak at 40-60 milliseconds. While simultaneous forward and backward replay is theoretically possible, we acknowledge that it is somewhat surprising and, given our paradigm, could relate to other factors such as autocorrelations (Liu, Dolan, et al., 2021).”

      (4) It is reported that participants with below 30% decoding accuracy are excluded from the main analyses. It would be helpful if the manuscript included very specific information about this exclusion, e.g., was the criterion established based on the localizer cross-validated data, the temporal generalisation to the cued item (Fig. 2), or only based on peak decodability of the future sequence items? If the latter, is it applied based on near or far reactivations, or both?

      We now clarify this point to include more specific information, which reads:

      “[…] Therefore, we decided a priori that participants with a peak decoding accuracy of below 30% would be excluded from the analysis (nine participants in all) as obtained from the cross-validation of localizer trials”

      (5) Regarding the low amount of data for the reactivation analysis, the manuscript should be explicit about the number of trials available for each participant. For example, Supplemental Fig. 1 could provide this information directly, rather than the proportion of excluded trials.

      We have adapted the plot in the supplement to show the absolute number of rejected epochs per participant, in addition to the ratio.

      (6) More generally, the supplements could include more detailed information in the legends.

      We agree and have added more extensive explanations of the plots in the supplement legends.

      (7) The choice of comparing the 2 nearest with all other future items in the clustered reactivation analysis should be better motivated, e.g., was this based on the Wimmer et al. (2020) study?

      We have added our motivation for taking the two nearest items and contrasting them with the items further away. The paragraph reads:

      “[…] We chose to combine the following two items for two reasons: First, this doubled the number of included trials; secondly, using this approach the number of trials for each category (“near” and “distant”) was more balanced. […]”

      Reviewer 2

      (1) Focus exclusively on retrieval data (and here just on the current image trials).

      If I understand correctly, you focus all your analyses (behavioural as well as MEG analyses) on retrieval data only and here just on the current image trials. I am surprised by that since I see some shortcomings due to that. These shortcomings can likely be addressed by including the learning data (and predecessor image trials) in your analyses.

      a) Number of trials: During each block, you presented each of the twelve edges once. During retrieval, participants then did one "single testing session block". Does that mean that all your results are based on max. 12 trials? Given that participants remembered, on average, 80% this means even fewer trials, i.e., 9-10 trials?

      This is correct and a limitation of the paper. However, while we used only correct trials for the reactivation analysis, the sequential analysis was conducted using all trials regardless of response behaviour. To retain comparability with previous studies we mainly focused on data from after a consolidation phase. Nevertheless, despite the trial limitation we consider the results robust and worth reporting. Additionally, based on the suggestion of the referee, we now include results from learning blocks (see below).

      b) Extend the behavioural and replay/reactivation analysis to predecessor images.

      Why do you restrict your analyses to the current image trials? Especially given that you have such a low trial number for your analyses, I was wondering why you did not include the predecessor trials (except the non-deterministic trials, like the zebra and the foot according to Figure 2B) as well.

      We agree it would be desirable to increase power by adding the predecessor images to the current image cue analysis (excluding the ambiguous trials). We did not do so initially, as we considered that the underlying retrieval processes of these trial types are not the same and therefore cannot simply be combined. Nevertheless, we have performed the suggested analysis to check whether it increases our power. We found that the reactivation effect is robust and significant at the same time point of 220-230 ms. However, the effect size actually decreased: while peak differential reactivation was previously at 0.13, it is now at 0.07. This in fact makes conceptual sense. We suspect that the two processes that are elicited by showing a single cue and by showing a second, related, cue are distinct insofar as the predecessor image acts as a primer for the current image, potentially changing the time course/speed of retrieval. Given our concerns that the two processes are not actually the same, we consider it important to avoid mixing these data.

      We have added a statement to the manuscript discussing this point. The section reads:

      “Note that we only included data from the current image cue, and not from the predecessor image cue, as we assume the retrieval processes differ and should not be concatenated.”

      c) Extend the behavioural and replay/reactivation analysis to learning trials.

      Similar to point 1b, why did you not include learning trials in your analyses?

      Including (correct and incorrect) learning trials has the advantage that you do not have to exclude 7 participants due to ceiling performance (100%).

      Further, you could actually test the hypothesis that you outline in your discussion: "This implies that there may be a switch from sequential replay to clustered reactivation corresponding to when learned material can be accessed simultaneously without interference." Accordingly, you would expect to see more replay (and less "clustered" reactivation) in the first learning blocks compared to retrieval (after the rest period).

      Tracking reactivation and replay over the course of learning is a great idea. We have given a lot of thought as to how to integrate these findings but have not found a satisfying solution, as analysis of the learning data turned out to be quite tricky: we decided that each participant should perform as many blocks as necessary to reach at least 80% (with a limit of six and a lower bound of two, see Supplement figure 4). Indeed, some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). With the benefit of hindsight, we realise our design means that different blocks are not directly comparable between participants. In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation as memory traces become consolidated/stronger. However, it is unclear when replay should emerge and when precisely a switch to clustered reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper.

      Nevertheless, to provide some insight into the learning process, and to see how consolidation impacts differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track processes on a block basis, it does offer potential (albeit limited) insight into the hypothesis we outline in the discussion.

      For reactivation, we see a clear increase, further strengthening the outlined hypothesis. For replay, however, the evidence is less clear, as we do not know over how many learning blocks replay is expected.

      We calculated individual trajectories of how reactivation and replay change from learning to retrieval and related these to performance. Indeed, we see that an increase in reactivation is nominally associated with higher learning performance, while an increase in replay strength is associated with lower performance (both non-significant). However, due to the above-mentioned reasons, we think it would be premature to add this weak evidence to the paper.

      To mitigate problems of experimental design in relation to this question, we are currently implementing a follow-up study, in which we aim to normalize the learning process across participants and index how replay/reactivation changes over the course of learning and after consolidation.

      We have added plots showing clustered reactivation and sequential replay measures during learning (Figure 5D and Supplement 8).

      The added section(s) now read:

      “To provide greater detail on how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures across learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the same time point we found significant in the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D). […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), due to experimental design features our data do not enable us to test for an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      d) Introduction (last paragraph): "We examined the relationship of graph learning to reactivation and replay in a task where participants learned a ..." If all your behavioural analyses are based on retrieval performance, I think that you do not investigate graph learning (since you exclusively focus the analyses on retrieving the graph structure). However, relating the graph learning performance and replay/reactivation activity during learning trials (i.e., during graph learning) to retrieval trials might be interesting but beyond the scope of this paper.

      We agree. We have changed the wording to be more accurate. Indeed, we do not examine graph learning but instead examine retrieval from a graph, after graph learning. The mentioned sentence now reads:

      “[…] relationship of retrieval from a learned graph structure to reactivation [...]”

      e) It is sometimes difficult to follow what phase of the experiment you refer to since you use the terms retrieval and test synonymously. Not a huge problem at all but maybe you want to stick to one term throughout the whole paper.

      Thank you for pointing this out. We have now adapted the manuscript to exclusively refer to “retrieval” and not to “test”.

      (2) Is your reactivation clustered?

      In Figure 5A, you compare the reactivation strength of the two items following the cue image (i.e., current image trials) with items further away on the graph. I do not completely understand why your results are evidence for clustered reactivation in contrast to replay.

      First, it would be interesting to see the reactivation of near vs. distant items before taking the difference (time course of item probabilities).

      (copied answer from response to Reviewer 1, as the same remark was raised)

      We have added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have chosen to replace Figure 5B with the new figure as we think that it offers more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      Second, could it still be that the first item is reactivated before the second item? By averaging across both items, it becomes not apparent what the temporal courses of probabilities of both items look like (and whether they follow a sequential pattern). Additionally, the Gaussian smoothing kernel across the time dimension might diminish sequential reactivation and favour clustered reactivation. (In the manuscript, what does a Gaussian smoothing kernel of σ = 1 refer to?). Could you please explain in more detail why you assume non-sequential clustered reactivation here and substantiate this with additional analyses?

      We apologise for the unclear description. Note that the Gaussian kernel is in fact only used for the reactivation analysis and not the replay analysis, so any small temporal successions would have been picked up by the sequential analysis. We now clarify this in the respective section of the sequential analysis and also explain the parameter of σ = 1 in the reactivation analysis section. The paragraph now reads:

      “[…] As input for the sequential analysis, we used the raw probabilities of the ten classifiers corresponding to the stimuli. [...]

      […] Therefore, to address this we applied a Gaussian smoothing kernel (using scipy.ndimage.gaussian_filter with the default parameter of σ=1 which corresponds approximately to taking the surrounding timesteps in both direction with the following weighting: current time step: 40%, ±1 step: 25%, ±2 step: 5%, ±3 step: 0.5%) [...]”
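
      As an illustration only (this is not the code used in the study), the approximate weighting quoted above can be checked by constructing the normalized weights of a discrete Gaussian kernel with σ = 1. The helper name and the truncation at four standard deviations are assumptions made for this sketch; the latter is intended to mirror the default truncation that SciPy's Gaussian filters apply.

```python
import math


def gaussian_kernel_weights(sigma=1.0, truncate=4.0):
    """Return normalized weights of a discrete 1-D Gaussian kernel.

    Hypothetical helper for illustration: the kernel spans
    +/- round(truncate * sigma) time steps around the current step.
    """
    radius = int(truncate * sigma + 0.5)
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


weights = gaussian_kernel_weights(sigma=1.0)
center = len(weights) // 2

# Print the weight given to the current time step and its neighbours.
for offset in range(4):
    print(f"+/-{offset} step(s): {100 * weights[center + offset]:.1f}%")
```

      Under these assumptions the weights come out close to the rounded percentages given in the quoted paragraph, with almost all of the mass falling within ±2 time steps.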

      (3) Replay and/or clustered reactivation?

      The relationship between the sequential forward replay, differential reactivation, and graph reactivation analysis is not really apparent. Wimmer et al. demonstrated that high performers show clustered reactivation rather than sequential reactivation. However, you did not differentiate in your differential reactivation analysis between high vs. low performers. (You point out in the discussion that this is due to a low number of low performers.)

      We agree that a split into high vs low performers would have been preferable for our analysis. However, there is one major obstacle that made us opt for a correlational analysis instead: We employed criteria learning, rendering a categorical grouping conceptually biased. Even though not all participants reached the criterion of 80%, our sample did not naturally split between high and low performers but was biased towards higher performance, leaving the groups uneven. The median performance was 83% (mean ~81%), with six of our subjects (~1/4th of included participants) having this exact performance. This makes a median or mean split difficult, as either binning assignment choice would strongly affect the results. We have added a limitations section in which we extensively discuss this shortcoming and our reasoning for not performing a median split as in Wimmer et al. (2020). The section now reads:

      “There are some limitations to our study, most of which originate from a suboptimal study design. [...], as we performed criteria learning, a sub-group analysis as in Wimmer et al., (2020) was not feasible, as median performance in our sample would have been 83% (mean 81%), with six participants exactly at that threshold. [...]”

      It might be worth trying to bring the analysis together, for example by comparing sequential forward replay and differential reactivation at the beginning of graph learning (when performance is low) vs. retrieval (when performance is high).

      Thank you for the suggestion to include the learning segments, which we think improves the paper quite substantially. However, analysis of the learning data turned out to be quite tricky: we had decided that each participant should perform as many blocks as necessary to reach at least 80% accuracy (with a limit of six and a lower bound of two, see Supplement figure 4). Some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). In hindsight, this is an unfortunate design feature in relation to learning, as it means different blocks are not directly comparable between participants.

      In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation, as memory traces get consolidated/stronger. However, it is unclear when replay would emerge and when the switch to reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper at all.

      Nevertheless, to give some insight into the learning process and to see how consolidation affects differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track measures of interest on a block basis, it gives some (albeit limited) insight into the hypothesis outlined in our discussion.

      For reactivation, we see a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less obvious, potentially due to the fact that we do not know across how many learning blocks replay is to be expected.

      The added section(s) now read:

      “To examine how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures during learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the time point we found significant during the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D).

      […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), our data does not enable us to show an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      Additionally, the main research question is not that clear to me. Based on the introduction, I thought the focus was on replay vs. clustered reactivation and high vs. low performance (which I think is really interesting). However, the title is more about reactivation strength and graph distance within cognitive maps. Are these two research questions related? And if so, how?

      We agree we need to be clearer on this point. We have added two sentences to the introduction, which should address this point. The section now reads:

      “[…] In particular, the question remains how the brain keeps track of graph distances for successful recall and whether the previously found difference between high and low performers also holds true within a more complex graph learning context.”

      (4) Learning the graph structure.

      I was wondering whether you have any behavioural measures to show that participants actually learn the graph structure (instead of just pairs or triplets of objects). For example, do you see that participants chose the distractor image that was closer to the target more frequently than the distractor image that was further away (close vs. distal target comparison)? It should be random at the beginning of learning but might become more biased towards the close target.

      Thanks, this is an excellent suggestion. Our analysis indeed shows that people take the near lure more often than the far lure in later blocks, while it is random in the first block.

      Nevertheless, we have decided to put these data into the supplement and reference it in the text. This is because analysis of the learning blocks is challenging and biased in general. Each participant had a different number of learning blocks based on their learning rate, and this makes it difficult to compare learning across participants. We have tried our best to accommodate and explain these difficulties in the figure legend. Nevertheless, we thank the referee for guidance here and this analysis indeed provides further evidence that participants learned the actual graph structure.

      The added section reads:

      “Additionally, we have included an analysis showing how wrong answers participants provided were random in the first block and biased towards closer graph nodes in later blocks. This is consistent with participants actually learning the underlying graph structure as opposed to independent triplets (see figure and legend of Supplement 6 for details).”

      (5) Minor comments

      a) "Replay analysis relies on a successive detection of stimuli where the chance of detection exponentially decreases with each step (e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting the replay event). " Could you explain in more detail why 30% is a good threshold then?

      Thank you. We have further clarified the section. As we are working mainly with probabilities, it is useful to keep in mind that accuracy is a class metric that only provides a rough estimate of classifier ability. Alternatively, something like a Top-3-Accuracy would be preferable, but also slightly silly in the context of 10 classes.

      Nevertheless, subtle changes in probability estimates are present and can be picked up by the methods we employ. Therefore, the 30% is a rough lower bound that was chosen based on pilot data showing that clean MEG data from attentive participants can usually reach this threshold. The section now reads:

      “(e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting a replay event). However, one needs to bear in mind that accuracy is a “winner-takes-all” metric indicating whether the top choice also has the highest probability, disregarding subtle, relative changes in assigned probability. As the methods used in this analysis are performed on probability estimates and not class labels, one can expect that the 30% is a rough lower bound and that the actual sensitivity within the analysis will be higher. Additionally, based on pilot data, we found that attentive participants were able to reach 30% decodability, allowing us to use decodability as a data quality check.”
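
      The arithmetic behind the 9% figure can be sketched as follows. This is a simplified illustration that assumes each stimulus in a sequence is detected independently with the same per-item chance; the function name is hypothetical and not part of the analysis code.

```python
def chance_of_detecting_sequence(per_item_chance, n_items):
    """Chance of detecting all n_items in a row, assuming independent detections."""
    return per_item_chance ** n_items


# The chance shrinks exponentially with each additional step in the sequence.
for n_items in (1, 2, 3):
    chance = chance_of_detecting_sequence(0.30, n_items)
    print(f"{n_items} successive item(s): {100 * chance:.1f}%")
```

      With a per-item chance of 30%, two successive items yield 0.3 × 0.3 = 0.09, which is the 9% quoted in the section.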

      b) Could you make explicit how your decoders were designed? Especially given that you added null data, did you train individual decoders for one class vs. all other classes (n = 9 + null data) or one class vs. null data?

      We added detail to the decoder training. The section now reads:

      “Decoders were trained using a one-vs-all approach, which means that for each class, a separate classifier was trained using positive examples (target class) and negative examples (all other classes) plus null examples (data from before stimulus presentation, see below). In detail, null data was.”
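
      To make the one-vs-all scheme concrete, the following minimal sketch shows how binary training targets could be built for each class when null examples are included as additional negatives. It is an illustration under stated assumptions rather than the decoder code of the study: the label value used for null data, the function name, and the toy label list are all hypothetical.

```python
NULL_LABEL = -1  # hypothetical marker for pre-stimulus (null) examples


def one_vs_all_targets(labels, n_classes):
    """Build binary targets per class: 1 for the target class, 0 otherwise.

    Examples from all other classes and null examples are both negatives.
    """
    return {
        target: [1 if label == target else 0 for label in labels]
        for target in range(n_classes)
    }


# Toy example with three stimulus classes and two null examples.
labels = [0, 1, 2, NULL_LABEL, 1, NULL_LABEL, 0]
targets = one_vs_all_targets(labels, n_classes=3)
print(targets[1])
```

      Each entry of `targets` would then serve as the label vector for one separate binary classifier trained on the same sensor data.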

      c) Why did you choose a ratio of 1:2 for your null data?

      Our choice of a higher ratio was based upon previous publications reporting better sensitivity of TDLM using higher ratios, as spatial sensor correlations are decreasing. Nevertheless, this choice was not well investigated beforehand. We have added more information on this to the manuscript.

      d) You could think about putting the questionnaire results into the supplement if they are sanity checks.

      We have added the questionnaire results. However, due to the size of the tables, we have decided to add them as Excel files to the supplementary files of the code repository. We have mentioned the existence of these files in the publication.

      e) Figure 2. There is a typo in D: It says "Precessor Image" instead of "Predecessor Image".

      Fixed typo in figure.

      f) You write "Trials for the localizer task were created from -0.1 to 0.5 seconds relative to visual stimulus onset to train the decoders and for the retrieval task, from 0 to 1.5 seconds after onset of the second visual cue image." But the Figure legend 3D starts at -0.1 seconds for the retrieval test.

      We have now clarified this. For the classifier cross-validation, the transfer sanity check, and the clustered analysis we used trials from -0.1 to 0.5 seconds, whereas for the sequenceness analysis of the retrieval we used trials from 0 to 1.5 seconds.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study advances our understanding of how past and future information is jointly considered in visual working memory by studying gaze biases in a memory task that dissociates the locations during encoding and memory tests. The evidence supporting the conclusions is convincing, with state-of-the-art gaze analyses that build on a recent series of experiments introduced by the authors. This work, with further improvements incorporating the existing literature, will be of broad interest to vision scientists interested in the interplay of vision, eye movements, and memory.

      We thank the Editors and the Reviewers for their enthusiasm and appreciation of our task, our findings, and our article. We also wish to thank the Reviewers for their constructive comments that we have embraced to improve our article. Please find below our point-by-point responses to this valuable feedback, where we also state relevant revisions that we have made to our article.

      In addition, please note that we have now also made our data and code publicly available.

      Reviewer 1, Comments:

      In this study, the authors offer a fresh perspective on how visual working memory operates. They delve into the link between anticipating future events and retaining previous visual information in memory. To achieve this, the authors build upon their recent series of experiments that investigated the interplay between gaze biases and visual working memory. In this study, they introduce an innovative twist to their fundamental task. Specifically, they disentangle the location where information is initially stored from the location where it will be tested in the future. Participants are tasked with learning a novel rule that dictates how the initial storage location relates to the eventual test location. The authors leverage participants' gaze patterns as an indicator of memory selection. Intriguingly, they observe that microsaccades are directed toward both the past encoding location and the anticipated future test location. This observation is noteworthy for several reasons. Firstly, participants' gaze is biased towards the past encoding location, even though that location lacks relevance to the memory test. Secondly, there's a simultaneous occurrence of an increased gaze bias towards both the past and future locations. To explore this temporal aspect further, the authors conduct a compelling analysis that reveals the joint consideration of past and future locations during memory maintenance. Notably, microsaccades biased towards the future test location also exhibit a bias towards the past encoding location. In summary, the authors present an innovative perspective on the adaptable nature of visual working memory. They illustrate how information relevant to the future is integrated with past information to guide behavior.

      Thank you for your enthusiasm for our article and findings as well as for your constructive suggestions for additional analyses that we respond to in detail below.

      This short manuscript presents one experiment with straightforward analyses, clear visualizations, and a convincing interpretation. For their analysis, the authors focus on a single time window in the experimental trial (i.e., 0-1000 ms after retro cue onset). While this time window is most straightforward for the purpose of their study, other time windows are similarly interesting for characterizing the joint consideration of past and future information in memory. First, assessing the gaze biases in the delay period following the cue offset would allow the authors to determine whether the gaze bias towards the future location is sustained throughout the entire interval before the memory test onset. Presumably, the gaze bias towards the past location may not resurface during this delay period, but it is unclear how the bias towards the future location develops in that time window. Also, the disappearance of the retro cue constitutes a visual transient that may leave traces on the gaze biases which speaks again for assessing gaze biases also in the delay period following the cue offset.

      Thank you for raising this important point. We initially focused on the time window during the cue given that our central focus was on gaze-biases associated with mnemonic item selection. By zooming in on this window, we could best visualize our main effects of interest: the joint selection (in time) of past and future memory attributes.

      At the same time, we fully agree that examining the gaze biases over a more extended time window yields a more comprehensive view of our data. To this end, we have now also extended our analysis to include a wider time range that includes the period between cue offset (1000 ms after cue onset) and test onset (1500 ms after cue onset). We present these data below. Because we believe our future readers are likely to be interested in this as well, we have now added this complementary visualization as Supplementary Figure 4 (while preserving the focus in our main figure on the critical mnemonic selection period of interest).

      Author response image 1.

      Supplementary Figure 4. Gaze biases in extended time window as a complement to Figure 1 and Supplementary Figure 2. This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset, the gaze bias towards the future location persists (panel a) and that while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus (panel b).

      This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset (consistent with our prior reports of this bias), the gaze bias towards the future location persists. Moreover, as revealed by the data in panel b above, while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus.

      We now also call out these additional findings and figure in our article:

      Page 2 (Results): “Gaze biases in both axes were driven predominantly by microsaccades (Supplementary Fig. 2) and occurred similarly in horizontal-to-vertical and vertical-to-horizontal trials (Supplementary Fig. 3). Moreover, while the past bias was relatively transient, the future bias continued to increase in anticipation of the test stimulus and increasingly incorporated eye-movements beyond the microsaccade range (see Supplementary Fig. 4 for a more extended time range)”.

      Moreover, assessing the gaze bias before retro-cue onset allows the authors to further characterize the observed gaze biases in their study. More specifically, the authors could determine whether the future location is considered already during memory encoding and the subsequent delay period (i.e., before the onset of the retro cue). In a trial, participants encode two oriented gratings presented at opposite locations. The future rule indicates the test locations relative to the encoding locations. In their example (Figure 1a), the test locations are shifted clockwise relative to the encoding location. Thus, there are two pairs of relevant locations (each pair consists of one stimulus location and one potential test location) facing each other at opposite locations and therefore forming an axis (in the illustration the axis would go from bottom left to top right). As the future rule is already known to the participants before trial onset it is possible that participants use that information already during encoding. This could be tested by assessing whether more microsaccades are directed along the relevant axis as compared to the orthogonal axis. The authors should assess whether such a gaze bias exists already before retro cue onset and discuss the theoretical consequences for their main conclusions (e.g., is the future location only jointly used if the test location is implicitly revealed by the retro cue).

      Thank you – this is another interesting point. We fully agree that additional analysis looking at the period prior to retrocue onset may also prove informative. In accordance with the suggested analysis, we have therefore now also analysed the distribution of saccade directions (including in the period from encoding to retrocue) as a function of the future rule (presented below, and now also included as Supplementary Fig. 5). Complementary recent work from our lab has shown how microsaccade directions can align to the axis of memory contents during retention (see de Vries & van Ede, eNeuro, 2024). Based on this finding, one may predict that if participants retain the items in a remapped fashion, their microsaccades may align with the axis of the future rule, and this could potentially already happen prior to cue onset.

      These complementary analyses show that saccade directions are predominantly influenced by the encoding locations rather than the test locations, as seen most clearly by the saccade distribution plots in the middle row of the figure below. To obtain time-courses, we categorized saccades as occurring along the axis of the future rule or along the orthogonal axis (bottom row of the figure below). Like the distribution plots, these time course plots also did not reveal any sign of a bias along the axis of the future rule itself.

      Importantly, note how this does not argue against our main findings of joint selection of past and future memory attributes, as for that central analysis we focused on saccade biases that were specific to the selected memory item, whereas the analyses we present below focus on biases in the axes in which both memory items are defined; not only the cued/selected memory item.

      Author response image 2.

      Supplementary Figure 5. Distribution of saccade directions relative to the future rule from encoding onset. (Top panel) The spatial layouts in the four future rules. (Middle panel) Polar distributions of saccades during 0 to 1500 ms after encoding onset (i.e., the period between encoding onset and cue onset). The purple quadrants represent the axis of the future rule and the grey quadrants the orthogonal axis. (Bottom panel) Time courses of saccades along the above two axes. We did not observe any sign of a bias along the axis of the future rule itself.
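
      The axis-based categorization of saccades described above can be sketched as follows. This is a hedged illustration, not the analysis code: it assumes saccade directions and axis orientations are given in degrees, that an axis is undirected (so directions are compared modulo 180 degrees), and that each axis covers two opposing 90-degree quadrants. The function name and the example angles are hypothetical.

```python
def is_along_axis(saccade_deg, axis_deg, half_width_deg=45.0):
    """Return True if a saccade direction falls in the quadrants around an axis.

    Axes are undirected, so a saccade at axis_deg + 180 counts as along the axis.
    """
    diff = (saccade_deg - axis_deg) % 180.0
    angular_distance = min(diff, 180.0 - diff)
    return angular_distance < half_width_deg


# Example: an axis running from bottom left to top right (45 degrees).
future_rule_axis = 45.0
for direction in (50.0, 230.0, 135.0):
    label = "future-rule axis" if is_along_axis(direction, future_rule_axis) else "orthogonal axis"
    print(f"saccade at {direction:.0f} deg -> {label}")
```

      Counting saccades in each category per time bin would then give time courses of the kind shown in the bottom row of the figure.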

      We agree that these additional results are important to bring forward when we interpret our findings. Accordingly, we now mention these findings at the relevant section in our Discussion:

      Page 5 (Discussion): “First, memory contents could have directly been remapped (cf. 4,24–26) to their future-relevant location. However, in this case, one may have expected to exclusively find a future-directed gaze bias, unlike what we observed. Moreover, using a complementary analysis of saccade directions along the axis of the future rule (cf. 24), we found no direct evidence for remapping in the period between encoding and cue (Supplementary Fig. 5)”.

      Reviewer 2, Comments:

      The manuscript by Liu et al. reports a task that is designed to examine the extent to which "past" and "future" information is encoded in working memory that combines a retro cue with rules that indicate the location of an upcoming test probe. An analysis of microsaccades on a fine temporal scale shows the extent to which shifts of attention track the location of the encoded item (past) and the location of the future item (test probe). The locations of the encoded grating and of the test probe were always on orthogonal axes (horizontal, vertical) so that biases in microsaccades could be used to track shifts of attention to one or the other axis (or mixtures of the two). The overall goal here was then to (1) create a methodology that could tease apart memory for the past and future, respectively, (2) to look at the time-course attention to past/future, and (3) to test the extent to which microsaccades might jointly encode past and future memoranda. Finally, some remarks are made about the plausibility of various accounts of working memory encoding/maintenance based on the examination of these time courses.

      Strengths:

      This research has several notable strengths. It has a clear statement of its aims, is lucidly presented, and uses a clever experimental design that neatly orthogonalizes "past" and "future" as operationalized by the authors. Figure 1b-d shows fairly clearly that saccade directions have an early peak (around 300ms) for the past and a "ramping" up of saccades moving in the forward direction. This seems to be a nice demonstration that the method can measure shifts of attention at a fine temporal resolution and differentiate past from future-oriented saccades due to the orthogonal cue approach. The second analysis, shown in Figure 2, reveals a dependency in saccade direction such that saccades toward the probe future were more likely also to be toward the encoded location than away from the encoded direction. This suggests saccades are jointly biased by both locations "in memory".

      Thank you for your overall appreciation of our work and for highlighting the above strengths. We also thank you for your constructive comments and call for clarifications that we respond to below.

      Weaknesses:

      (1) The "central contribution" (as the authors characterize it) is that "the brain simultaneously retains the copy of both past and future-relevant locations in working memory, and (re)activates each during mnemonic selection", and that: "... while it is not surprising that the future location is considered, it is far less trivial that both past and future attributes would be retained and (re)activated together. This is our central contribution." However, to succeed at the task, participants must retain the content (grating orientation, past) and probe location (future) in working memory during the delay period. It is true that the location of the grating is functionally irrelevant once the cue is shown, but if we assume that features of a visual object are bound in memory, it is not surprising that location information of the encoded object would bias processing as indicated by microsaccades. Here the authors claim that joint representation of past and future is "far less trivial"; this needs to be evaluated from the standpoint of prior empirical data on memory decay in such circumstances, or some reference to the time-course of the "unbinding" of features in an encoded object.

      Thank you. We agree that our participants have to use the future rule – as otherwise they do not know to which test stimulus they should respond. This was a deliberate decision when designing the task. Critically, however, this does not require (nor imply) that participants have to incorporate and apply the rule to both memory items already prior to the selection cue. It is at least as conceivable that participants would initially retain the two items at their encoded (past) locations, then wait for the cue to select the target memory item, and only then consider the future location associated with the target memory item. After all, in every trial, there is only 1 relevant future location: the one associated with the cued memory item. The time-resolved nature of our gaze markers argues against such a scenario, by virtue of our observation of the joint (simultaneous) consideration of past and future memory attributes (as opposed to selection of past-before-future). These temporal dynamics are central to the insights provided by our study.

      In our view, it is thus not obvious that the rule would be applied at encoding. In this sense, we do not assume that the future location is part of both memory objects from encoding, but rather ask whether this is the case – and, if so, whether the future location takes over the role of the past location, or whether past and future locations are retained jointly.

      Our statements regarding what is “trivial” and what is “less trivial” regard exactly this point: it is trivial that the future is considered (after all, our task demanded it). However, it is less trivial that (1) the future location was already available at the time of initial item selection (as reflected in the simultaneous engagement of past and future locations), and (2) that in presence of the future location, the past location was still also present in the observed gaze biases.

      Having said that, we agree that an interesting possibility is that participants remap both memory items to their future-relevant locations ahead of the cue, but that the past location is not yet fully “unbound” by the time of the cue. This may trigger a gaze bias not only to the new future location but also to the “sticky” (unbound) past location. We now acknowledge this possibility in our discussion (also in response to comment 3 below) where we also suggest how future work may be able to tap into this:

      Page 6 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      (2) The authors refer to "future" and "past" information in working memory and this makes sense at a surface level. However, once the retrocue is revealed, the "rule" is retrieved from long-term memory, and the feature (e.g. right/left, top/bottom) is maintained in memory like any other item representation. Consider the classic test of digit span. The digits are presented and then recalled. Are the digits of the past or future? The authors might say that one cannot know, because past and future are perfectly confounded. An alternative view is that some information in working memory is relevant and some is irrelevant. In the digit span task, all the digits are relevant. Relevant information is relevant precisely because it is thought to be necessary in the future. Irrelevant information is irrelevant precisely because it is not thought to be needed in the immediate future. In the current study, the orientation of the grating is relevant, but its location is irrelevant; and the location of the test probe is also relevant.

      Thank you for this stimulating reflection. We agree that in our set-up, past location is technically “task-irrelevant” while future location is certainly “task-relevant”. At the same time, the engagement of the past location suggests to us that the brain uses past location for the selection – presumably because the brain uses spatial location to help individuate/separate the items, even if encoded locations are never asked about. Therefore, whether something is relevant or irrelevant ultimately depends on how one defines relevance (past location may be relevant/useful for the brain even if technically irrelevant from the perspective of the task). In comparison, the use of “past” and “future” may be less ambiguous.

      It is also worth noting how we interpret our findings in relation to demands on visual working memory, inspired by dynamic situations whereby visual stimuli may be last seen at one location but expected to re-appear at another, such as a bird disappearing behind a building (the example in our introduction). Thus, past for us does not refer to the memory item per se (like in the digit span analogue) but, rather, quite specifically to the past location of a dynamic visual stimulus in memory (which, in our experiment, was operationalised by the future rule, for convenience).

      (3) It is not clear how the authors interpret the "joint representation" of past and future. Put aside "future" and "past" for a moment. If there are two elements in memory, both of which are associated with spatial bindings, the attentional focus might be a spatial average of the associated spatial indices. One might also view this as an interference effect, such that the location of the encoded location attracts spatial attention since it has not been fully deleted/removed from working memory. Again, for the impact of the encoded location to be exactly zero after the retrieval cue, requires zero interference or instantaneous decay of the bound location information. It would be helpful for the authors to expand their discussion to further explain how the results fit within a broader theoretical framework and how it fits with empirical data on how quickly an irrelevant feature of an object can be deleted from working memory.

      Thank you also for this point (that is related to the two points above). As we stated in our reply to comment 1 above, we agree that one possibility is that the past location is merely “sticky” and pulls the task-relevant future bias toward the past location. If so, our time courses suggest that such “pulling” occurs only until approximately 600 ms after cue onset, as the past bias is only transient. An alternative interpretation is that the past location may not be merely a residual irrelevant trace, but actually be useful and used by the brain.

      For example, the encoded (past) item locations provide a coordinate system in which to individuate/separate the two memory items. While the future locations also provide such a coordinate system, the brain may benefit from holding onto both coordinate systems at the same time, rendering our observation of joint selection in both frames. Indeed, in a recent VR experiment in which we had participants (rather than the items) rotate, we also found evidence for the joint use of two spatial frames, even if neither was technically required for the upcoming task (see Draschkow, Nobre, van Ede, Nature Human Behaviour, 2022). Though highly speculative at this stage, such reliance on multiple spatial frames may make our memories more robust to decay and/or interference. Moreover, while past location was never explicitly probed in our task, in daily life the past location may sometimes (unexpectedly) become relevant, hence it may be useful to hold onto it, just in case. Thus, considering the past location merely as an “irrelevant feature” (that takes time to delete) may not do sufficient justice to the potential roles of retaining past locations of dynamic visual objects held in working memory.

      As also stated in response to comment 1 above, we now added these relevant considerations to our Discussion:

      Page 5 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      Reviewer 3, Comments:

      This study utilizes saccade metrics to explore, what the authors term the "past and future" of working memory. The study features an original design: in each trial, two pairs of stimuli are presented, first a vertical pair and then a horizontal one. Between these two pairs comes the cue that points the participant to one target of the first pair and another of the second pair. The task is to compare the two cued targets. The design is novel and original but it can be split into two known tasks - the first is a classic working memory task (a post-cue informs participants which of two memorized items is the target), which the authors have used before; and the second is a classic spatial attention task (a pre-cue signal that attention should be oriented left or right), which was used by numerous other studies in the past. The combination of these two tasks in one design is novel and important, as it enables the examination of the dynamics and overlapping processes of these tasks, and this has a lot of merit. However, each task separately is not new. There are quite a few studies on working memory and microsaccades and many on spatial attention and microsaccades. I am concerned that the interpretation of "past vs. future" could mislead readers to think that this is a new field of research, when in fact it is the (nice) extension of an existing one. Since there are so many studies that examined pre-cues and post-cues relative to microsaccades, I expected the interpretation here to rely more heavily on the existing knowledge base in this field. I believe this would have provided a better context of these findings, which are not only on "past" vs. "future" but also on "working memory" vs. "spatial attention".

      Thank you for considering our findings novel and important, while at the same time reminding us of the parallels to prior tasks studying spatial attention in perception and working memory. We fully agree that our task likely engages both attention to the (past) memory item as well as spatial attention to the upcoming (future) test stimulus. At the same time, there is a critical difference in spatial attention for the future in our task compared with ample prior tasks engaging spatial cueing of attention for perception. In our task, the cue never directly cues the future location. Rather, it exclusively cues the relevant memory item. It is the memory item that is associated with the relevant future location, according to the future rule. This integration of the rule-based future location into the memory representation is distinct from classical spatial-attention tasks in which attention is cued directly to a specific location via, for example, a spatial cue such as an arrow.

      Thus, if we wish to think about our task as engaging cueing of spatial attention for perception, we have to at least also invoke the process of cueing the relevant location via the appropriate memory item. We feel it is more parsimonious to think of this as attending to both the past and future location of a dynamic visual object in working memory.

      If we return to our opening example, when we see a bird disappear behind a building, we can keep in working memory where we last saw it, while anticipating where it will re-appear to guide our external spatial attention. Here too, spatial attention is fully dependent on working-memory content (the bird itself) – mirroring the dynamic setting in our study. Thus, we believe our findings contribute a fresh perspective, while of course also extending established fields. We now contextualize our finding within the literature and clarify our unique contribution in our revised manuscript:

      Page 5 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

      Reviewer 2, Recommendations:

      It would be helpful to set up predictions based on existing working memory models. Otherwise, the claim that the joint coding of past/future is "not trivial" is simply asserted, rather than contradicting an existing model or prior empirical results. If the non-trivial aspect is simply the ability to demonstrate the joint coding empirically through a good experimental design, make it clear that this is the contribution. For example, it may be that prevailing models predict exactly this finding, but nobody has been able to demonstrate it cleanly, as the authors do here. So the non-triviality is not that the result contradicts working memory models, but rather relates to the methodological difficulty of revealing such an effect.

      Thank you for your recommendation. First, please see our point-by-point responses to the individual comments above, where we also state relevant changes that we have made to our article, and where we clarify what we meant by “non trivial”. As we currently also state in our introduction, our work took as a starting point the framework that working memory is inherently about the past while being for the future (cf. van Ede & Nobre, Annual Review of Psychology, 2023). By virtue of our unique task design, we were able to empirically demonstrate that visual contents in working memory are selected via both their past and their future-relevant locations – with past and future memory attributes being engaged together in time. By “not trivial” we merely intend to make clear that there are viable alternatives to the findings we observed. For example, past could have been replaced by the future, or it could have been that item selection (through its past location) was required before its future-relevant location could be considered (i.e. past-before-future, rather than joint selection as we reported). We outline these alternatives in the second paragraph of our Discussion:

      Page 5 (Discussion): “Our finding of joint utilisation of past and future memory attributes emerged from at least two alternative scenarios of how the brain may deal with dynamic everyday working memory demands in which memory content is encoded at one location but needed at another.

      First, [….]”

      Our work was not motivated by a particular theoretical debate and did not aim to challenge ongoing debates in the working-memory literature, such as: slot vs. resource, active vs. silent coding, decay vs. interference, and so on. To our knowledge, none of these debates makes specific claims about the retention and selection of past and future visual memory attributes – despite this being an important question for understanding working memory in dynamic everyday settings, as we hoped to make clear by our opening example.

      Reviewer 3, Recommendations:

      I recommend that the present findings be more clearly interpreted in the context of previous findings on working memory and attention. The task design includes two components - the first (post-cue) is a classic working memory task and the second (the pre-cue) is a classic spatial attention design. Both components were thoroughly studied in the past and this previous knowledge should be better integrated into the present conclusions. I specifically feel uncomfortable with the interpretation of past vs. future. I find this framework to be misleading because it reads like this paper is on a topic that is completely new and never studied before, when in fact this is a study on the interaction between working memory and spatial attention. I recommend the authors minimize this past-future framing or be more explicit in explaining how this new framework relates to the more common terminology in the field and make sure that the findings are not presented in a vacuum, as another contribution to the vibrant field that they are part of.

      Thank you for these recommendations. Please also see our point-by-point responses to the individual comments above. There, we explained our logic behind using the terminology of past vs. future (in addition, see also our response to point 2 of reviewer 2). There, we also stated relevant changes that we have made to our manuscript to explain how our findings complement – but are also distinct from – prior tasks that used pre-cues to direct spatial attention to an upcoming stimulus. As we explained above, in our task, the cue itself never contained information about the upcoming test location. Rather, the upcoming test location was a property of the memory item (given the future rule). Hence, we referred to this as a “future attribute” of the cued memory item, rather than as the “cued location” for external spatial attention. Still, we agree the future bias likely (also) reflects spatial allocation to the upcoming test array, and we explicitly acknowledge this in our discussion. For example:

      Page 5 (Discussion): “This signal may reflect either of two situations: the selection of a future-copy of the cued memory content or anticipatory attention to the anticipated location of its associated test-stimulus. Either way, by the nature of our experimental design, this future signal should be considered a content-specific memory attribute for two reasons. First, the two memory contents were always associated with opposite testing locations, hence the observed bias to the relevant future location must be attributed specifically to the cued memory content. Second, we cued which memory item would become tested based on its colour, but the to-be-tested location was dependent on the item’s encoding location, regardless of its colour. Hence, consideration of the item’s future-relevant location must have been mediated by selecting the memory item itself, as it could not have proceeded via cue colour directly.”

      Page 6 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

    1. Author response:

      Factual error in the eLife assessment to be corrected:

      In the eLife assessment, "ribosomal protein H59" should be changed to "helix 59 of the 28S ribosomal RNA" to make this factually correct.

      Provisional author response

      We thank the reviewers for their thorough and thoughtful readings of the manuscript. Our responses to the four suggestions made in their public reviews are below.

      Reviewer #1 (Public Review):

      Major points:

      (1) The identification of RAMP4 is a pivotal discovery in this paper. The sophisticated AlphaFold prediction, de novo model building of RAMP4's RBD domain, and sequence analyses provide strong evidence supporting the inclusion of RAMP4 in the ribosome-translocon complex structure.

      However, it is crucial to ensure the presence of RAMP4 in the purified sample. Particularly, a validation step such as western blotting for RAMP4 in the purified samples would strengthen the assertion that the ribosome-translocon complex indeed contains RAMP4. This is especially important given the purification steps involving stringent membrane solubilization and affinity column pull-down.

      As suggested, we will revise the manuscript to include Western blots showing that RAMP4 is retained at secretory translocons (and not multipass translocons) after solubilisation, affinity purification, and recovery of ribosome-translocon complexes.

      (2) Despite the comprehensive analyses conducted by the authors, it is challenging to accept the assertion that the extra density observed in TRAP class 1 corresponds to calnexin. The additional density in TRAP class 1 appears to be less well-resolved, and the evidence for assigning it as calnexin is insufficient. The extra density there can be any proteins that bind to TRAP. It is recommended that the authors examine the density on the ER lumen side. An investigation into whether calnexin's N-globular domain and P-domain are present in the ER lumen in TRAP class 1 would provide a clearer understanding.

      We agree that the Calnexin assignment is less confident than the other assignments in this manuscript, and that further support would be ideal. We have exhaustively searched our maps for any unexplained density connected with the putative Calnexin TMD, and have found none. This is consistent with Calnexin's lumenal domain being flexibly linked to its TMD, and thus not being resolved in a ribosome-aligned reconstruction.

      Our assignment of this TMD to Calnexin was based on existing biochemical data (referenced in the paper) favouring this as the best working hypothesis by far: Calnexin is TRAP’s only abundant co-purifying factor, and their interaction is sensitive to point mutations in the Calnexin TMD. Recognising that this is not conclusive, we will ensure that the text and figures consistently describe this assignment as provisional or putative.

      (3) In the section titled 'TRAP competes and cooperates with different translocon subunits,' the authors present a compelling explanation for why TRAP delta defects can lead to congenital disorders of glycosylation. To enhance this explanation, it would be valuable if the authors could provide additional analyses based on mutations mentioned in the references. Specifically, examining whether these mutations align with the TRAP delta-OSTA structure models would strengthen the link between TRAP delta defects and the observed congenital disorders of glycosylation.

      We agree that mapping disease-causing point mutants to the TRAP delta structure could be potentially informative. Unfortunately, the referenced TRAP delta disease mutants act by simply impairing TRAP delta expression, and thus admit no such fine-grained analyses. However, sequence conservation is our next best guide to mutant function. We note in the text that the contact site charges on TRAP delta and RPN2 are conserved, and that the closest-juxtaposed interaction pair (K117 on TRAPδ and D386 on RPN2) is also the most conserved.

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript contains numerous novel structural analyses and their potential functional implications. While all findings are exciting, the highlight is the discovery of RAMP4/SERP1 near the Sec61 lateral gate. Overall, the strength is the thorough and extensive structural analysis of the different high-resolution RTC classes as well as the expert bioinformatic evolutionary analysis.

      Weaknesses:

      A minor downside of the manuscript is the sheer volume of analyses and mechanistic hypotheses, which makes it sometimes difficult to follow. The authors might consider offloading some analyses based on weaker evidence to the supplement to maximize impact.

      We agree that the manuscript is long, and we will seek ways to streamline it in revision while avoiding the undesirable side effect of making important findings undiscoverable via literature searches (an unfortunate consequence of many supplemental data). Indeed, we chose eLife for its flexibility regarding article length and suitability for extended and detailed analyses.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      We are grateful for the overall positive feedback from the reviewer.

      We agree with the reviewer that our data showing cellular co-localization between PRC1 and BIN1 requires further investigation in future studies. However, we are confident that, in its current form, our manuscript already presents multiple lines of evidence for the role of BIN1 in mitotic processes. We would like to emphasize that PRC1 is not the sole BIN1 partner that connects it to mitotic processes, but only one out of more than a dozen that we identified in our study. Furthermore, the mitotic connection with BIN1 is not entirely novel, as BIN1 levels fluctuate mildly during the cell cycle, similar to other proteins involved in the regulation of the cell cycle (Santos et al., 2015), and because DNM2 is also a well-accepted actor during mitosis (Thompson et al., 2002).

      The less marked co-localization between BIN1 and PRC1 compared to the strong co-localization between BIN1 and DNM2 can be a consequence of their weaker affinity and their partial binding. Yet, this does not necessarily imply that stronger interactions have more biological significance. For example, weaker affinities can be compensated by local concentrations to achieve an even higher degree of cellular complexes than of strongly binding interactions that are separated within the cell. Furthermore, even the degree of complex formation cannot be used intuitively to estimate the biological significance of a complex because complexes can trigger very important biological processes even at very low abundances, e.g. by catalyzing enzymatic reactions. Deciding what is and what is not “biologically significant” among the identified interactions remains to be answered in the future, once we are able to overview complex biological processes in a holistic manner.
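      The point that local concentration can compensate for weaker affinity can be illustrated with the standard 1:1 binding isotherm. The sketch below is only an illustration under stated assumptions: simple two-state equilibrium binding, the partner in excess so that its free concentration approximates its total local concentration, and purely hypothetical dissociation constants and concentrations that are not taken from the study.

```python
def fraction_bound(partner_conc_um, kd_um):
    """Fraction of a protein in complex for simple 1:1 equilibrium binding.

    Assumes the partner is in excess, so free partner ~ local partner
    concentration. Both arguments are in micromolar.
    """
    return partner_conc_um / (partner_conc_um + kd_um)

# Hypothetical numbers, chosen only to illustrate the argument.
weak_but_colocalised = fraction_bound(partner_conc_um=100.0, kd_um=10.0)
strong_but_separated = fraction_bound(partner_conc_um=0.01, kd_um=0.1)

print(f"weak affinity, high local concentration: {weak_but_colocalised:.2f}")
print(f"strong affinity, low local concentration: {strong_but_separated:.2f}")
```

      With these illustrative values, the weaker interaction yields the larger fraction of complex, which is the qualitative behaviour the argument relies on.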

      In the revised version, we implemented minor changes to further clarify the raised points.

      Reviewer #2:

      We thank the reviewer for the careful assessment and we are pleased to see the positive enthusiasm regarding our affinity interactomic strategy.

      The reviewer points out that affinities were only measured with a single technique, which is relatively unproven. While it is true that our work uses two techniques building on the same holdup concept, we rather believe that this approach is well-proven. The original holdup method was described almost 20 years ago and since then, it has been used in more than 10 publications for quantitative interactomics. Over the years, at least five distinct generations of the assay were developed, all building on the expertise of the preceding one. In the past, we extensively proved that the resulting affinities show excellent agreement with affinities measured with other methods, such as fluorescence polarization, isothermal titration calorimetry, or surface plasmon resonance (for example in Vincentelli et al. Nat. Meth. 2015; Gogl et al. 2020 Structure; Gogl et al. 2022 Nat.Com.). However, it is true that the most recent variation of this method family, called native holdup, is a fairly new approach published just a bit more than a year ago and this is only the third work that utilizes this method. Yet, in our original work describing the method, we demonstrated good agreement with the results of previous holdup experiments, as well as with orthogonal affinity measurements (Zambo et al. 2022).

      Importantly, the reviewer raises concerns regarding the number of replicates used in our study, as well as the reliability of our methodology. We are glad for such a comment as it allows us to explain our motives behind experimental design which is most often left out from scientific works to save space and keep focus on results. The reason why we use technical replicates instead of the typical biological replicates lies in the nature of the holdup assay. In a typical interactomic assay, such as immunoprecipitation, a lot of variables can perturb the outcome of the measurement, such as bait immobilization, or captured prey leakage during washing steps. The output of such an experiment is a list of statistically significant partners and to minimize these variabilities, biological replicates are used. In the case of a native holdup approach, a panel of an equal amount of resins, all saturated with different baits or controls, is mixed with an equal amount of cell extract, taken from a single tube, and after a brief incubation, the supernatant of this mixture is analyzed. The output of such an experiment is a list of relative concentrations of prey and to maximize its accuracy, we use technical replicates. Using an ideal analytical method, such as fluorescence, it is not necessary to use technical replicates to reach accurate results. For example, the general accuracy of a holdup experiment coupled with a robust analytical approach can be seen clearly in our fragmentomic holdup data shown in Figure 7C where mutant domains that do not have any impact on the interactome show extreme agreement in affinities. Unfortunately, mass spectrometry is less accurate as an analytical method, hence we use technical triplicates to compensate for this. Finally, in the case of BIN1, an independent nHU measurement was also performed using a less capable mass spectrometer. 
Not counting the 117 partners of BIN1 that were detected in only one of these proteomic measurements, 29 partners were identified as common significant partners in both measurements, showing nearly identical affinities with a mean standard deviation between measured pKapp values of 0.18, meaning that the obtained dissociation constants are within a <2.5-fold range with >95% probability. There were also 61 BIN1 partners that were detected in both proteomic measurements but were identified as a significant interaction partner in only one of these experiments. Yet many of them show binding in both assays, although they were found not to be significant in one of them. For example, CDC20 shows 66% depletion in one assay (significant binding) while it shows 54% depletion in the other (not significant binding), and CKAP2 shows 58% depletion in one assay (significant binding) while it shows 41% depletion in the other (not significant binding). We hope that these examples show that statistical significance in nHU experiments signifies how certain we are in a particular affinity measurement rather than the accuracy of the affinity measurement itself. While there are true discrepancies between some of the affinity measurements in these experiments, which could be clarified with more experimental replicates, the raw data presented in our work clearly demonstrate the strength and robustness of a fully quantitative interactomic assay.
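
      The arithmetic behind these statements can be sketched as follows. This is only an illustration and not the analysis pipeline used in the study: it assumes that the pKapp scatter is roughly normal, so that about 95% of values lie within 1.96 standard deviations, and it assumes the simple holdup relation Kd,app = [bait] x (1 - BI) / BI for a bait in large excess, where BI is the fractional depletion. The function names are hypothetical. Under the second assumption the bait concentration cancels when two measurements of the same partner are compared, so no bait concentration needs to be guessed.

```python
import math


def fold_range(sd_pk, z=1.96):
    """Fold-change in Kd corresponding to z standard deviations in pK units.

    Assumes normally distributed pK values (z = 1.96 covers ~95%).
    """
    return 10 ** (z * sd_pk)


def log_odds(depletion):
    """log10 of BI / (1 - BI) for a fractional depletion BI (0 < BI < 1)."""
    return math.log10(depletion / (1.0 - depletion))


def delta_pk(depletion_a, depletion_b):
    """Difference in apparent pKd implied by two depletion values.

    Assumes Kd_app = [bait] * (1 - BI) / BI with the same bait
    concentration in both experiments, so [bait] cancels out.
    """
    return log_odds(depletion_a) - log_odds(depletion_b)


if __name__ == "__main__":
    print("fold range for SD 0.18:", round(fold_range(0.18), 2))
    # Depletion values quoted in the response for the two nHU measurements.
    for name, a, b in [("CDC20", 0.66, 0.54), ("CKAP2", 0.58, 0.41)]:
        d = delta_pk(a, b)
        print(name, "delta pK:", round(d, 2), "fold:", round(10 ** d, 2))
```

      Under these assumptions, a standard deviation of 0.18 pK units and both quoted depletion pairs correspond to differences in apparent Kd that stay below 2.5-fold, even though each pair falls on opposite sides of the significance threshold.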

      In the revised version, we clarified the number of replicates in the text, in the figure legends, and included some of this discussion in the method section.

      The reviewer had some very useful comments regarding affinity differences between short fragments and full-length proteins. In his comment, he possibly made a typo, as we find that full-length proteins typically interact with higher, not weaker, affinities compared to short PxxP motif fragments in isolation. The reviewer also comments that we explain this difference with cooperativity. In a previous preprint version, which the reviewer may have seen, this was indeed the case, but since we realized that we did not have sufficient evidence supporting this model, we did not discuss this in detail in the last version submitted to eLife. To clarify this, we included more discussion about the observed differences in the affinities between fragments and full-length proteins, but since we have limited data to make solid conclusions, we do not go into details about underlying models.

      Instead of cooperativity, the reviewer suggests that the observed differences may originate from additional residues that were not included in our peptides. Indeed, many similar experiments fail because of suboptimal peptide library design. Our peptide library was constructed as 15-mer, xxxxxxPxxPxxxxx motifs and we do not see a strong contribution of residues at the far end of these peptides. Specificity logo reconstructions are expected to identify all key residues that participate in SH3 domain binding, and based on this, all key residues of the identified motifs can be included in shorter 10-mer, xxxPxxPxxx motifs. Therefore, it is unlikely that residues outside our peptide regions will greatly contribute to the site-specific interactions of SH3 domains. It is however possible that other sites, that are sequentially far away from the studied PxxP motifs, are also capable of binding to SH3 through a different surface, but in light of the small size of an isolated SH3 domain, we believe it is very unlikely. It is also possible that BIN1 could also interact with other types of SH3 binding motifs that were not included in our peptide library. We think a more likely explanation is some sort of cooperativity. Cooperativity, or rather synergism between different sites can be easily explained in typical situations, such as in the case of a bimolecular interaction that is mediated by two independent sites. In such an event, once one site is bound, the second binding event will likely also occur because of the high effective local concentration of the binding sites. However, cooperativity can also form in atypical conditions and a molecular explanation for these events is rather elusive. As BIN1 contains a single SH3 domain, its binding to targets containing more binding sites can be challenging to interpret. 
If these sites are part of a greater Pro-rich region, such as in the case of DNM2, it is possible that the entire region adopts a fuzzy, malleable, yet PPII-like helical conformation. Once the SH3 domain is recruited to this helical region, it can freely trans-locate within this region via lateral diffusion and it will pause on optimal PxxP motifs. As an alternative to this sliding mechanism, a diffusion-limited cooperative binding can also occur. If the two motifs are not part of the same Pro-rich region, but are relatively close in space, such as in the case of ITCH or PRC1, once a BIN1 molecule dissociates from one site, it has a higher chance to rebind to the second site due to higher local concentrations. Such an event can more likely occur if a transient, but relatively stable encounter complex exists between the two molecules, from which complex formation can occur at both sites (A+B↔AB*; AB*↔ABsite1; AB*↔ABsite2). However, this large effective local concentration in this encounter complex is only temporary because diffusion rapidly diminishes it, although weak electrostatic interactions can increase the lifetime of such encounter complexes. In contrast, the large effective local concentration in conventional multivalent binding is time-independent and only determined by the geometry of the complex. Finally, it may also occur that our empirical bait concentration estimation for immobilized biotinylated proteins is less accurate than the concentration estimation of peptide baits because we approximate this value based on peptide baits. For this technical reason, which was discussed in detail in the original paper describing the nHU approach, we are carefully using apparent affinities for nHU experiments. Nevertheless, even without accurate bait concentrations, our nHU experiment provides precise relative affinities and, thus partner ranking. 
Either of the mechanisms underlying the interactions we study would be difficult to further explore experimentally, especially at the proteomic level.
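
      The peptide layout described in this response (15-mer xxxxxxPxxPxxxxx windows and the shorter 10-mer xxxPxxPxxx cores) can be illustrated with a short sketch. This is not the authors' library-design code: the function name, the flank parameters, and the toy sequence are all hypothetical and serve only to show how windows of this shape could be cut around every PxxP core of a protein sequence.

```python
import re


def pxxp_windows(sequence, n_flank=6, c_flank=5):
    """Return (start, peptide) for every PxxP core that has full flanks.

    The defaults give the 15-mer layout xxxxxxPxxPxxxxx;
    n_flank=3, c_flank=3 gives the 10-mer layout xxxPxxPxxx.
    """
    windows = []
    # A zero-width lookahead reports overlapping PxxP cores as well.
    for match in re.finditer(r"(?=P..P)", sequence):
        start = match.start() - n_flank
        end = match.start() + 4 + c_flank
        # Skip cores too close to either terminus for a full window.
        if start >= 0 and end <= len(sequence):
            windows.append((start, sequence[start:end]))
    return windows


if __name__ == "__main__":
    toy = "AAAAAAPAAPAAAAA"  # made-up sequence with a single PxxP core
    print(pxxp_windows(toy))                        # 15-mer layout
    print(pxxp_windows(toy, n_flank=3, c_flank=3))  # 10-mer layout
```

      Comparing the two layouts on the same core makes it easy to see which flanking residues are dropped when moving from the 15-mer to the 10-mer.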

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      The data is poorly dealt with, and the figures are shown poorly. For example, Figure 2A is not even shown totally.

      We apologize for any difficulties that the reviewer encountered while attempting to view the figures. We have confirmed that all figures, including all panels of Figure 2, display correctly in the HTML and PDF versions of the article hosted at bioRxiv. The HTML and PDF versions generated by eLife also appear to contain all figures and panels in their entirety.

      Reviewer #2 (Recommendations For The Authors):

      Please refer to the public review for possible revisions.

      We thank Reviewer #2 for the summary and thoughtful comments provided in the Public Review. We note the point of possible revision raised in the Public Review: “It can be informative to directly demonstrate DPYD promoter-enhancer interactions. However, the genetic variants support the integration of regulatory activities.” In Figure 4, we provide evidence for direct promoter-enhancer interaction through the use of 3C. We furthermore demonstrate that these interactions are dependent upon genotype at rs4294451, as stated by the reviewer. We have highlighted the promoter-enhancer interaction in the revised manuscript, lines 323-325. The role of genotype in this interaction is also specifically discussed in lines 378-381.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Gap junction channels establish gated intercellular conduits that allow the diffusion of solutes between two cells. Hexameric connexin26 (Cx26) hemichannels are closed under basal conditions and open in response to CO2. In contrast, when forming a dodecameric gap junction, channels are open under basal conditions and close with increased CO2 levels. Previous experiments have implicated Cx26 residue K125 in the gating mechanism by CO2, which is thought to become carbamylated by CO2. Carbamylation is a labile post-translational modification that confers negative charge to the K125 side chain. How the introduction of a negative charge at K125 causes a change in gating is unclear, but it has been proposed that carbamylated K125 forms a salt bridge with the side chain at R104, causing a conformational change in the channel. It is also unclear how overall gating is controlled by changes in CO2, since there is significant variability between structures of gap-junction channels and the cytoplasmic domain is generally poorly resolved. Structures of WT Cx26 gap-junction channels determined in the presence of various concentrations of CO2 have suggested that the cytoplasmic N-terminus changes conformation depending on the concentration of the gas, occluding the pore when CO2 levels are high.

      In the present manuscript, Deborah H. Brotherton and collaborators use an intercellular dye-transfer assay to show that Cx26 gap-junction channels containing the K125E mutation, which mimics carbamylation caused by CO2, are constitutively closed even at CO2 concentrations where WT channels are open. Several cryo-EM structures of WT and mutant Cx26 gap junction channels were determined at various conditions and using classification procedures that extracted more than one structural class from some of the datasets. Together, the features on each of the different structures are generally consistent with previously obtained structures at different CO2 concentrations and support the mechanism that is proposed in the manuscript. The most populated class for K125E channels determined at high CO2 shows a pore that is constricted by the N-terminus, and a cytoplasmic region that was better resolved than in WT channels, suggesting increased stability. The K125E structure closely resembles one of the two major classes obtained for WT channels at high CO2. These findings support the hypothesis that the K125E mutation biases channels towards the closed state, while WT channels are in an equilibrium between open and closed states even in the presence of high CO2. Consistently, a structure of K125E obtained in the absence of CO2 appeared to also represent a closed state but at lower resolution, suggesting that CO2 has other effects on the channel beyond carbamylation of K125 that also contribute to stabilizing the closed state. Structures determined for K125R channels, which are constitutively open because arginine cannot be carbamylated, and would be predicted to represent open states, yielded apparently inconclusive results.

      A non-protein density was found to be trapped inside the pore in all structures obtained using both DDM and LMNG detergents, suggesting that the density represents a lipid rather than a detergent molecule. It is thought that the lipid could contribute to the process of gating, but this remains speculative. The cytoplasmic region in the tentatively closed structural class of the WT channel obtained using LMNG was better resolved. An additional portion of the cytoplasmic face could be resolved by focusing classification on a single subunit, which had a conformation that resembled the AlphaFold prediction. However, this single-subunit conformation was incompatible with a C6-symmetric arrangement. Together, the results suggest that the identified states of the channel represent open states and closed states resulting from interaction with CO2. Therefore, the observed conformational changes illuminate a possible structural mechanism for channel gating in response to CO2.

      Some of the discussion involving comparisons with structures of other gap junction channels is relatively hard to follow as currently written, especially for a general readership. Also, no additional functional experiments are carried out to test any of the hypotheses arising from the data. However, structures were determined in multiple conditions, with results that were consistent with the main hypothesis of the manuscript. No discussion is provided, even if speculative, to explain the difference in behavior between hemichannels and gap junction channels. Also, no attempt was made to measure the dimensions of the pore, which is relevant because of the importance of identifying if the structures indeed represent open or closed states of the channel.

      We have considerably revised the manuscript in an attempt to make it more tractable. We respond to the individual comments below.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Brotherton et al. describes a structural study of connexin-26 (Cx26) gap junction channel mutant K125E, which is designed to mimic the CO2-inhibited form of the channel. In the wild-type Cx26, exposure to CO2 is presumed to close the channel through carbamylation of the residue K125. The authors mutated K125 to a negatively charged residue to mimic this effect, and they observed by cryo-EM analysis of the mutated channel that the pore of the channel is constricted. The authors were able to observe conformations of the channel with resolved density for the cytoplasmic loop (in which K125 is located). Based on the observed conformations and on the position of the N-terminal helix, which is involved in channel gating and in controlling the size of the pore, the authors propose the mechanisms of Cx26 regulation.

      Strengths:

      This is a very interesting and timely study, and the observations provide a lot of new information on connexin channel regulation. The authors use state-of-the-art cryo-EM analysis and 3D classification approaches to tease out the conformations of the channel that can be interpreted as "inhibited", with important implications for our understanding of how the conformations of the connexin channels are controlled.

      Weaknesses:

      My fundamental question about the premise of this study is: to what extent can K125 carbamylation be recapitulated by a simple K125E mutation? Lysine has a large side chain, and its carbamylation would make it even slightly larger. While the authors make a compelling case for E125-induced conformational changes focusing primarily on the negative charge, I wonder whether they considered the extent to which their observation with this mutant may translate to the carbamoylated lysine in the wild-type Cx26, considering not only the charge but also the size of the modified side-chain.

      This is an important point. We agree that the difference in size will have a different effect on the structure. For kinases, aspartate or glutamate are often used as mimics of phosphorylated serine or threonine and these will have the same issues. The fact that we cannot resolve the relevant side-chains in the density may be indicative that the mutation doesn’t give the whole story. It may be able to shift the equilibrium towards the closed conformation, but not stably trap the molecule in that conformation. We include a comment to this effect in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The mechanism underlying the well-documented CO2-regulated activity of connexin 26 (Cx26) remains poorly understood. This is largely due to the labile nature of CO2-mediated carbamylation, making it challenging to visualize the effects of this reversible posttranslational modification. This paper by Brotherton et al. aims to address this gap by providing structural insights through cryo-EM structures of a carbamylation-mimetic mutant of the gap junction protein.

      Strengths:

      The combination of the mutation, elevated PCO2, and the use of LMNG detergent resulted in high-resolution maps that revealed, for the first time, the structure of the cytoplasmic loop between transmembrane helix (TM) 2 and 3.

      Weaknesses:

      The presented maps merely reinforce their previous findings, wherein wildtype Cx26 favored a closed conformation in the presence of high PCO2. While the structure of the TM2-TM3 loop may suggest a mechanism for stabilizing the closed conformation, no experimental data was provided to support this mechanism. Additionally, the cryo-EM maps were not effectively presented, making it difficult for readers to grasp the message.

      We have extensively revised the manuscript so that the novelty of this study is more apparent. There are three major points:

      (1) The carbamylation mimetic pushes the conformation towards the closed conformation. Previously we just showed that CO2 pushes the conformation towards this conformation. Though we could show this was not due to pH, and could speculate this was due to carbamylation as suggested by previous mutagenesis studies, our data did not provide any mechanism whereby Lys125 was involved.

      (2) In going from the open to closed conformations, not only is a conformational change in TM2 involved, as we saw previously, but also a conformational change in TM1, the linker to the N-terminus and the cytoplasmic loop. Thus there is a clear connection between Lys125 and the conformation of the pore-closing N-terminus.

      (3) We observe for the first time in any connexin structure, density for the cytoplasmic loop. Since this loop is important in regulation, knowing how it might influence the positions of the transmembrane helices is important information if we are to understand how connexins can be regulated.

      Reviewing Editor:

      The reviewers have agreed on a list of suggested revisions that would improve the eLife assessment if implemented, which are as follows:

      (1) For completeness, Figure 1 could be supplied with an example of how the experiment would look in the presence of CO2 - for the wild-type and for the K125E mutant. Presumably for the wild-type this has been done previously in exactly this assay format, but this control would be an important part of characterization for the mutant. Page 4, lines 105-106; "unsurprisingly, Cx26K125E gap junctions remain closed at a PCO2 of 55 mmHg." The data should be presented in the manuscript.

      We have now included the data with a PCO2 of 55 mmHg. This is now Figure 4 in our revised manuscript.

      (2) Would AlphaFold predictions show any interpretable differences in the E125 mutant, compared to the K125 (the wild-type)?

      We tried this in response to the reviewer’s suggestion. We did not see any interpretable differences. In general AlphaFold is not recognised as giving meaningful information around point mutations.

      (3) The K125R mutant appears to be a more effective control for extracting significant features from the K125E maps. Given that the use of a buffer containing high PCO2 is essential for obtaining high-resolution maps, wildtype Cx26 is unsuitable as an appropriate control. The K125R map, obtained at a high resolution (2.1Å), supports its suitability as a robust control.

      Though we are unsure what the referee is referring to here, we have rewritten this section and compare against the K125R map (figure 5a) as well as that derived from the wild-type protein. The important point is that the K125E mutant causes a structural change that is consistent with the closure of the gap junctions that we observe in the dye-transfer assays.

      (4) Likewise, the rationale for using wildtype Cx26 maps obtained in DDM is unclear. Wildtype Cx26 seems to yield much better cryo-EM maps in LMNG. We suggest focusing the manuscript on the higher-quality maps, and providing supporting information from the DDM maps to discuss consistency between observations and the likely possibility that the non-protein density in the pore is lipid and not detergent.

      The rationale for comparing the mutants against the wt Cx26 maps obtained in DDM was because the mutants were also solubilised in DDM. However, taking the lead from the referees’ comments, we have now rewritten the manuscript so that we first focus on the data we obtain from protein solubilised in LMNG. We feel this makes our message much clearer.

      (5) In general, the rationale for utilizing cryo-EM maps with the entire selected particles is unclear. Although the overall resolutions may slightly improve in this approach, the regions of interest, such as the N-terminus and the cytoplasmic loop, appear to be better ordered after further classifications. The paper would be more comprehensible if it focuses solely on the classes representing the pore-constricting N-terminus (PCN) and the pore-open flexible N-terminus (POFN) conformations. Also, the nomenclatures used in the manuscript, such as "WT90-Class1", "K125E90-1", "LMNG90-class1", "LMNG90-mon-pcn" are confusing.

      LMNG90s are also wildtype; K125E-90-1 is in Class1 for this mutant and is similar to WT90-Class2, which represents the PCN conformation. More consistent and intuitive nomenclatures would be helpful.

      We agree with the referees’ comments. This should now be clearer with our rewritten manuscript where we have simplified this considerably. We now call the conformations NConst (N-terminus defined and constricting the pore) and NFlex (N-terminus not visible) and keep this consistent throughout.

      (6) A potential salt bridge between the carbamylated K125 and R104 is proposed to account for the prevalence of Class-1 (i.e., PCN) in the majority of cryo-EM particles. However, the side chain densities are not well-defined, suggesting that such an interaction may not be strong enough to trap Cx26 in a closed conformation. Furthermore, the absence of experimental data to support this mechanism makes it unclear how likely this mechanism may be. Combining simple mutagenesis, such as R104E, with a dye transfer assay could offer support for this mechanism. Are there any published experimental results that could help address this question without the need for additional experimental work? Alternatively, as acknowledged in the discussion, this mechanism may be deemed as an "over-simplification." What is an alternative mechanism?

      R104 has been mutated to alanine in gap junctions and tested in a dye transfer assay, as now mentioned in the text (Nijar et al, J Physiol 2021), supporting this role. In hemichannels, R104 has been mutated to both alanine and glutamate and tested through dye loading assays (Meigh et al, eLife 2013). Also in hemichannels, R104 and K125 have been mutated to cysteines, allowing them to be cross-linked through a disulphide bond. This mutant responds to a change in redox potential in a similar way to which the wild type protein responds to CO2 (Meigh et al, Open Biol 2015). Therefore, there is no doubt that the residues are important for the mechanism, and the salt-bridge interaction seems a plausible mechanism to reconcile the mutagenesis data. However, we cannot be sure that there are not other interactions involved that are necessary for closure. This information has now been included in the text.

      (7) The cryo-EM maps presented in the manuscript propose that gap junctions are constitutively open under normal PCO2 as the flexible N-terminus clears the solute permeation pathway in the middle of the channel. However, hemichannels appear to be closed under normal PCO2. It is puzzling how gap junctions can open when hemichannels are closed under normal PCO2 conditions. If this question has been addressed in previous studies, the underlying mechanism should be explicitly described in the introduction. If it remains an open question, differences in the opening mechanisms between hemichannels and gap junctions should be investigated.

      We suspect this is due to the difference in flexibility of gap junctions relative to hemichannels. However, a discussion of this is beyond this paper and would be complete speculation based on hemichannel structures of other connexins, performed in different buffering systems. There are no high resolution structures of Cx26 hemichannels.

      (8) A mystery density likely representing a lipid is abruptly introduced, but the significance of this discovery is unclear. It is hard to place the lipid on Figure S6 in the wider context of everything else that is discussed in the text. It would be helpful for readers if a figure were provided to show where the density is located in relation to all the other regions that are extensively discussed in the text.

      In the revised text this section has been completely rewritten. We have now included a more informative view in a new figure (Figure 1 – figure supplement 3).

      (9) Including and displaying even tentative pore-diameter measurements for the different states - this would be helpful for readers and provide a more direct visual cue as to the difference between open and closed states.

      We have purposely avoided giving precise measurements of the pore diameter, since this depends on how we model the N-terminus. The first three residues are difficult to model into the density without causing steric clashes with the neighbouring subunits.

      (10) Given that no additional experiments for channel function were carried out, it would be useful to provide a more detailed discussion of additional mutagenesis results from the literature that are related to the experimental results presented.

      We have amplified this in the discussion (see answer to point 6).

      The reviewers also agreed that improvements in the presentation of the data would strengthen the manuscript. Here is a summary list of suggestions by reviewers aimed at helping improve how the data is presented:

      (1) Why is the pipette bright green in the top image, but rather weakly green in the bottom image in Figure 1 - is this the case for all images?

      (Now figure 4) This depends on whether the pipette was in the focal plane of view or not. The important point of these images is the difference in intensity of the donor vs the recipient cell. The graphs in figure 4c illustrate clearly the difference between the wild-type and the mutant gap junctions.

      (2) In figures 2-5, labels would help a lot in understanding what is shown - while the legends do provide the information on what is presented, it would help the reader to see the models/maps with labels directly in the panel. For example, Figure 2a/b - just indicating "WT90 Cx26" in pink and "K125E90" in blue directly in the panel would reduce the work for the reader.

      We have extensively modified the labels in the figures to address this issue.

      (3) Figure 4 - magenta and pink are fairly close, and to avoid confusion it might be useful to use a different color selection. This is especially true when structures are overlayed, as in this figure - the presentation becomes rather complicated, so the less confusion the color code can introduce, the better.

      (Now Figure 2) We have now changed pink to blue.

      (4) Figure 5 - a remarkably under-labelled figure.

      We have now added labels.

      (5) Figure 6 - it would be interesting to add a comparison to Cx32 here as well for completeness, since the structure has been published in the meantime.

      Cx32 has now been included.

      (6) Figure 7 - please add equivalent labels on both sides of the model, left and right. Add the connecting lines for all of the tubes TM helices - this will help trace the structural elements shown. The legend does not quite explain the colors.

      We have modified the figure as suggested and explained the colours in the legend.

      (8) Fig.1 legend; Unclear what mCherry fluorescence represents. State that Cx26 was expressed as a translational fusion with mCherry.

      Now figure 4. We have now written “Montages each showing bright field DIC image of HeLa cells with mCherry fluorescence corresponding to the Cx26K125E-mCherry fusion superimposed (leftmost image) and the permeation of NBDG from the recorded cell to coupled cells.”

      (9) Fig. 3 b); Show R104 in the figure. Also E129-R98/R99 interaction is hard to acknowledge from the figure. It seems that the side chain density of E129 is not strong enough to support the modeled orientation.

      This is now Figure 1c. While the density in this region is sufficient to be confident of the main chain, we agree that the side chain density for the E129-R98/R99 interaction is not sufficiently clear to draw attention to and have removed the associated comment from the figure legend. The density is focussed on the linker between TM1 and the N-terminus and the KVRIEG motif. We prefer to omit R104, in order to keep the focus on this region. As described in the manuscript, the density for the R104 side chain is poor.

      (10) Fig. 3 c); Label the N-terminus and KVRIEG motif in the figure.

      Now Figure 1b. We have labelled the N-terminus. The KVRIEG motif is not visible in this map.

      (11) Page 9, lines 246-248; Restate, "We note, however, density near to Lys125, between Ser19 in the TM1-N-term linker, Tyr212 of TM4 and Tyr97 on TM3 of the neighbouring subunit, which we have been unable to explain with our modelling."

      We have reworded this.

      (12) Page 14, line 399; Patch clamp recording is not included in the manuscript.

      Patch clamp recordings were used to introduce dye into the donor cell.

      (13) On the same Figure 2, clashes are mentioned but these are hard to appreciate in any of the figures shown. Perhaps would be useful to include an inset showing this.

      We have modified Figure 2b slightly and added an explanation to highlight the clash. It is slightly confusing because the residues involved belong to neighbouring subunits.

      (14) The discussion related to Figure 6 is very hard to follow for readers who are not familiar with the context of abbreviations included on the figure labels. This figure could be improved to allow a general readership to identify more clearly each of the features and structural differences that are discussed in the text.

      We have extensively changed the text and updated the labels on the figure to make it much easier for the reader to follow.

      Below, you can find the individual reviews by each of the three reviewers.

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 2d-e, the text discusses differences between K125E 90-1 and WT 90-class2 (7QEW), yet the figure compares K125E with 7QEQ. I suggest including a figure panel with a comparison between the two structures discussed in the manuscript text.

      This has been changed in the revised manuscript.

      Other comments have been addressed above.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      The reviewers’ thoughtful comments have helped us make the manuscript both more comprehensive and clearer. Thank you for your time and effort. We know that this is a long and technical paper. In our responses we refer to three documents:

      • Original: the first original submission

      • Revision: the revised document (02 MillardFranklinHerzog2023 v2.pdf)

      • Difference: a document that shows the changes made to text (but not figures or tables) from the original to revision (03 MillardFranklinHerzog2023 diff.pdf).

      Reviewer #1 (Recommendations For The Authors):

      (1) In general, the paper is well written and addresses important questions of muscle mechanics and muscle modeling. In the current version, the model limitations are briefly summarized in the abstract. However, the discussion needs a more complete description of limitations as well as a discussion of types of data (in vivo, ex vivo, single fiber, whole muscle, MTU, etc.) that can be modeled using this approach.

      Please see the response to comment 23 for more details of the limitations that have been added to the revised document.

      (2) The choice of a model with several tendon parameters for simulating single muscle fiber experiments is not well justified.

      A rigid-tendon model with a slack length of zero was, in fact, used for these simulations for both the VEXAT and Hill models. In case this is still not clear: a rigid-tendon model of zero length is equivalent to no tendon at all. The text that first mentions the tendon model has now been modified to make it clearer that the parameters of the model were set to be consistent with no tendon at all:

      Please see the following text:

      Original:

      • page 17, column 1, line 28 ”... rigid tendon of zero length,”

      • page 17, column 1, line 51 ”... rigid tendon of zero length.”

      Revision:

      • page 19, column 1, line 19 ”... we used a rigid-tendon of zero length (equivalent to ignoring the tendon)”

      • page 19, column 1, line 38 ”... coupled with a rigid-tendon of zero-length.”

      Difference:

      • page 21, column 1, line 19 ”... we used a rigid-tendon... ”

      • page 21, column 1, line 45 ”... rigid-tendon of zero length ...”
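
      The equivalence between a rigid tendon of zero length and no tendon at all can be sketched as follows, assuming the usual Hill-type path geometry (path length = fiber length × cos(pennation) + tendon length). The function name and the numerical values are illustrative and are not taken from the paper's code repository.

```python
import math


def fiber_length_rigid_tendon(path_length, tendon_slack_length=0.0,
                              pennation_angle=0.0):
    """Fiber (CE) length for a rigid tendon.

    Assumed geometry (a common Hill-type convention, not quoted from the paper):
        path_length = fiber_length * cos(pennation_angle) + tendon_length
    A rigid tendon never stretches, so tendon_length == tendon_slack_length.
    """
    return (path_length - tendon_slack_length) / math.cos(pennation_angle)


# Hypothetical normalized path length of a fibril preparation.
path_length = 1.7

# Rigid tendon with zero slack length: the fiber takes up the whole path,
# which is exactly what a model with no tendon would give.
fiber_no_tendon = fiber_length_rigid_tendon(path_length)

# For contrast, a rigid tendon with a non-zero slack length shortens the fiber.
fiber_with_tendon = fiber_length_rigid_tendon(path_length,
                                              tendon_slack_length=0.5)
```

      With a slack length of zero every change in path length is passed directly to the fiber, so no tendon parameter influences the simulated fiber response.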

      (3) A table that clarifies how all model parameters were estimated needs to be included in the main part of the manuscript.

      Two tables have been added to the manuscript that detail the parameters of the elastic-tendon cat soleus model (in the main body of the text) and the rabbit psoas fibril model (in an appendix). Each table includes:

      • A plain language parameter name

      • The mathematical symbol for the parameter

      • The value and unit of the parameter

      • A coded reference to the data source that indicates both the experimental animal and how the data was used to evaluate the parameter.

      Please see the following text:

      Revision:

      • page 11

      • page 42

      Difference:

      • page 11

      • page 46

      (4) The supplemental information is not properly referenced in the main text. There are a number of smaller issues that also need to be addressed.

      Thank you for your attention to detail. The following problems related to Appendix referencing have been fixed:

      • Appendices are now parenthetically referenced at the end of a sentence. However, a few references to figures (that are contained within an Appendix) still appear in the body of the sentence since moving these figure references makes the text difficult to understand.

      • All Appendices are now referenced in the main body of the text.

      (5) Abstract, line 6: While it is commonly assumed that the short range stiffness of muscle is due to cross bridges, Rack & Westbury (1974) noted that it occurs over a distance of 25-35 nm, and that many cross-bridges must be stretched even farther than this distance (their p. 348 middle). It seems unlikely that cross-bridges alone can actually account for the short-range stiffness.

      There are three parts to our response to this comment:

      (a) Rack & Westbury’s definition of short-range-stiffness and unrealistic cross-bridge stretches

      (b) Rack & Westbury’s definition of short-range-stiffness vs. linear-time-invariant system theory

      (c) Updates to the paper

      a. Rack & Westbury’s definition of short-range-stiffness and unrealistic cross-bridge stretches.

      As you note, on page 348, Rack and Westbury write that ”If the short range stiffness is to be explained in terms of extension of cross-bridges, then many of them must be extended further than the 25-35 nm mentioned above.” Having re-read the paper, it’s not clear how these three factors are being treated in the 25−35 nm estimate:

      • the elasticity of the tendon and aponeurosis,

      • the elasticity of actin and myosin filaments,

      • and the cycling rate of the cross-bridges.

      Obviously the elasticity of the tendon, aponeurosis, actin, and myosin filaments will reduce the estimated amount of crossbridge strain during Rack and Westbury’s experiments. A potentially larger factor is the cycling rate of each cross-bridge. If each crossbridge cycles faster than 11 Hz (the maximum frequency Rack and Westbury used), then no single crossbridge would stretch by 25-35 nm. So why didn’t Rack and Westbury consider the cycling rate of crossbridges?

      Rack and Westbury reasoned that a perfectly elastic work loop would necessarily mean that all crossbridges stayed attached: as soon as a crossbridge cycles it would release its stored elastic energy and the work loop would no longer be elastic. Since Rack and Westbury measured some nearly perfect elastic work loops (the smallest loops in Figs. 2, 3, and 4), I guess they assumed crossbridges remained attached during the 25-35 nm crossbridge stretch estimate. However, even Rack and Westbury note that none of the work loops they measured were perfectly elastic and so there is room to entertain the idea that crossbridges are cycling.

      Fortunately, for this discussion, crossbridge cycling rates have been measured.

      In-vitro measurements by Uyeda et al. show that crossbridges are cycling at 30 Hz when moving at 0.5-1.2 length/s. At this rate, there would be enough time for a single crossbridge to cycle nearly 2.72 times for every cycle of the 11 Hz sinusoidal perturbations, reducing its expected strain from 25-35 nm down to 9.2−12.9 nm. This effect becomes even more pronounced if crossbridge cycling rate is used to explain the difference in sliding velocity between Uyeda et al.’s in-vitro data (0.5-1.2 length/s) and the maximum contraction velocity of an in-situ cat soleus (4.65 lengths/s, Scott et al.).
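
      The arithmetic behind this estimate can be reproduced with the short sketch below. It assumes, as the paragraph above does implicitly, that the total stretch is shared evenly among the attachment cycles that fit into one perturbation cycle. The function name is illustrative.

```python
def strain_per_attachment(total_stretch_nm, crossbridge_rate_hz,
                          perturbation_hz):
    """Stretch borne by one crossbridge attachment.

    Assumption: the stretch over one perturbation cycle is divided evenly
    among the crossbridge cycles that occur during that perturbation cycle.
    """
    cycles_per_perturbation = crossbridge_rate_hz / perturbation_hz
    return total_stretch_nm / cycles_per_perturbation


CROSSBRIDGE_RATE_HZ = 30.0  # cycling rate attributed to Uyeda et al. in the text
PERTURBATION_HZ = 11.0      # maximum frequency used by Rack and Westbury

cycles = CROSSBRIDGE_RATE_HZ / PERTURBATION_HZ  # a little over 2.7
low_nm = strain_per_attachment(25.0, CROSSBRIDGE_RATE_HZ, PERTURBATION_HZ)
high_nm = strain_per_attachment(35.0, CROSSBRIDGE_RATE_HZ, PERTURBATION_HZ)

print(f"cycles per perturbation: {cycles:.2f}")
print(f"strain per attachment: {low_nm:.1f} to {high_nm:.1f} nm")
```

      The result agrees with the range quoted above to within rounding of the cycle ratio, and stays in nanometres because the original 25-35 nm estimate is only divided by a dimensionless ratio.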

      b. Rack & Westbury’s definition of short-range-stiffness vs. linear-time-invariant system theory

      Rack and Westbury defined short-range-stiffness to describe a specific kind of force response of the muscle to cyclical length changes:

      • muscle force is linear with length change,

      • and independent of velocity.

      Rack and Westbury’s definition therefore fails when viscous forces become noticeable, because viscous forces are velocity dependent.

      On line 6 of the abstract the term ‘short-range-stiffness’ is not used because Rack and Westbury’s definition is too narrow for our purposes. Instead we are using the more general approach of approximating muscle as a linear-time-invariant (LTI) system, where it is assumed that

      • the response of the system is linear

      • and time invariant.

      To unpack that a little, a muscle is considered in the ‘short-range’ in our work if it meets the criteria of a linear time-invariant (LTI) system:

      • the force response of muscle can be accurately described as a linear function of its length and velocity (its state)

      • and its response is not a function of time (which means constant stimulation, and no fatigue).

      In contrast to Rack and Westbury’s definition, the ‘short-range’ in linear systems theory is general enough to accommodate both elastic and viscous forces. In physical terms, the range that counts as ‘small’ for an LTI approximation of muscle is larger than the short-range defined by Rack and Westbury: an LTI system can include velocity dependence, while short-range-stiffness ends when velocity dependence begins.
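
      The difference between the two definitions can be illustrated with a linear spring-damper, which is the simplest system whose force is a linear function of length change and velocity. The sketch below is only an illustration under that assumption: the stiffness and damping values are hypothetical and the function name is not from the paper's code.

```python
def spring_damper_force(delta_length, velocity, stiffness, damping):
    """Force change of a linear spring-damper (linear in length and velocity)."""
    return stiffness * delta_length + damping * velocity


K, BETA = 4.0, 0.5  # hypothetical stiffness and damping coefficients

# Linearity: the response to a sum of inputs equals the sum of the responses.
f_a = spring_damper_force(0.01, 0.2, K, BETA)
f_b = spring_damper_force(0.02, -0.1, K, BETA)
f_sum = spring_damper_force(0.01 + 0.02, 0.2 + (-0.1), K, BETA)
superposition_error = abs(f_sum - (f_a + f_b))

# Short-range stiffness in Rack and Westbury's sense is the special case with
# no velocity dependence: the same length change gives the same force at any
# velocity.
f_elastic_slow = spring_damper_force(0.01, 0.1, K, 0.0)
f_elastic_fast = spring_damper_force(0.01, 1.0, K, 0.0)

# With damping present the system is still linear, but the force now depends
# on velocity, so it is outside the short-range-stiffness definition.
f_viscous_slow = spring_damper_force(0.01, 0.1, K, BETA)
f_viscous_fast = spring_damper_force(0.01, 1.0, K, BETA)
```

      Time invariance enters through the constant coefficients: `K` and `BETA` do not change during the perturbation, which corresponds to constant stimulation and no fatigue.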

      c. Updates to the paper

      To make the differences between Rack and Westbury’s ‘short-range-stiffness’ and LTI system theory clearer:

      • We have removed all occurrences of ‘short-range’ that were associated with Kirsch et al. and have replaced this phrase with ‘small’.

      • On the first mention of Kirsch’s work we have made the wording more specific

      Revision:

      • page 1, column 1, lines 4,5

      • page 1, column 2, lines 14-21 ”Under constant activation ...”

      Difference: page 1, column 2, line 19-26

      • page 1, column 1, lines 4,5

      • page 1, column 2, lines 20-27 ”Under constant activation ...”

      • A footnote has been added to contrast the definition of ‘small’ in the context of a linear time-invariant system to ‘short-range’ in the context of Rack and Westbury’s definition of short-range-stiffness.

      Revision: page 1, column 2, bottom

      Difference: page 1, column 2, bottom

      • In addition, we have added a brief overview of LTI system theory to make the analysis and results more easily understood:

      Revision: Figure 4 paragraph beginning on page 10, column 2, line 15 ”As long as ...”

      Difference: Figure 4 paragraph beginning on page 12, column 1, line 46 ”As long as ...”

      (6) Page 3, lines 6-8: It also seems unlikely that 25% of cross-bridges are attached at one time (Howard, 1997) even for supramaximal isometric stimulation. The number should be less than 20%. What would the ratio of load path stiffness be for low force movements such as changing the direction of a frictionless manipulandum or slow walking? The range of relative stiffnesses is of more interest than the upper limit.

      We have made the following updates to address this comment:

      • A 20% duty cycle now defines the upper bound stiffness of the actin-myosin load path.

      • We have also evaluated the lower bound actin-myosin stiffness when a single crossbridge is attached.

      • The stiffness of titin from Kellermayer et al. has been digitized at a length of 2 µm and 4 µm to more accurately capture the length dependence of titin’s stiffness.

      • We have added a new figure (Figure 14) to make it easier to compare the range of actin-myosin stiffness to titin-actin stiffness.

      • The text in the main body of the paper and the Appendix has been updated.

      • The script ’main ActinMyosinAndTitinStiffness.m’ used to perform the calculations and generate the figure is now a part of the code repository.

      Please see the following text:

      Revision

      • The paragraph beginning at page 2, column 2, line 45 ”The addition of a titin element ...”

      • Appendix A

      • Figure 14 (in Appendix A)

      Difference

      • The paragraph beginning at page 3, column 1, line 6: ”The addition of a titin element ...”

      • Appendix A

      • Figure 14 (in Appendix A)

      (7) Page 5, line 12: A word seems to be missing here, ”...together to further...”.

      Thank you for your attention to detail. The sentence has been corrected.

      Please see the following text:

      • Revision: page 4, column 2, line 40 ”... into a single ...”

      • Difference: page 5, column 1, line 18

      (8) Page 5, line 24-27: These ”theories” are not mutually exclusive, and it is misleading to suggest they are. There is evidence for binding of titin to actin at multiple locations and there is no reason why evidence supporting one binding location must detract from the evidence supporting other binding locations.

      The text has been modified to make it clear to readers that the different titin-actin binding locations are not mutually exclusive. Please see the following text:

      • Revision: page 5, column 1, lines 17-19, the sentence beginning ”As previously mentioned, ...”

      • Difference: page 5, column 1, lines 41-44

      (9) Page 5, lines 48-51: Should cite Kellermayer and Granzier (1996) not Kellermayer et al. (1997).

      The reference to ‘Kellermayer et al.’ has been changed to ‘Kellermayer and Granzier’. The comment that the year of the reference should be changed from (1997) to (1996) is confusing: the 1996 paper is being referenced.

      For further details please see:

      • Revision: page 5, column 1, 39-40

      • Difference: page 5, column 2, line 19-22

      (10) Also, Dutta et al. (2018) should be cited as further showing that N2A titin by itself slows actin motility on myosin.

      Thank you for the suggestion. The sentence has been modified to include Dutta et al.:

      For further details please see:

      • Revision: page 5, column 1, 40

      • Difference: page 5, column 2, line 19-22

      (11) Figure 2 legend and elsewhere: it is odd to say that experiments used ”a cat soleus” when more than one cat soleus was used. Change to ”cat soleus”. See also page 15, line 15.

      Thank you for your attention to detail. All occurrences of ‘a cat soleus’ have been changed, with some sentence revision, to ‘cat soleus’.

      (12) Page 6, line 10: It is not clear why an MTU was used to simulate single muscle fiber experiments. What is the justification for choosing this particular model? Also, the choice of model might explain why the version with stiff tendon performs better than the version with an elastic tendon, but this is never mentioned. Why not use a muscle model with no tendon (e.g., Wakeling et al., 2021 J. Biomech.)?

      Please see the response to comment 2.

      (13) Millard et al.’s activation dynamics model also fails to capture the length-dependence of activation dynamics (Shue and Crago, 1998; Sandercock and Heckman, 1997), which should be noted in the discussion along with other limitations.

      An additional limitations paragraph is in the revised manuscript that addresses this comment specifically. However, we have used Stephenson and Wendt as a reference for the shift in peak isometric force that comes with submaximal activation. In addition, we also reference Chow and Darling for the property that the maximum shortening velocity is reduced with submaximal activations.

      • Revision: page 22, column 1, line 41 ”Finally, the VEXAT model ...”

      • Difference: page 24, column 2, line 12 ”Finally, the VEXAT model ...”

      In addition, please see the response to comment 23.

      (14) Page 6, line 22: ”An underbar...”.

      Thank you for your attention to detail, this correction has been made.

      (14) Page 7, lines 27-32: This and other issues should be described in the Discussion under a heading of model limitations.

      Please see the response to comment 23.

      (15) Page 7, lines 43-44: Numerous papers from the last author’s laboratory contradict the claim that there is no force enhancement on the ascending limb by demonstrating that force enhancement does occur on the ascending limb (see e.g., Leonard & Herzog 2002, Peterson et al., 2004 and several papers from the Rassier laboratory).

      Thank you for your attention to detail. This statement is in error and has been removed. To improve this section of the paper, a paragraph has been added to briefly mention the experimental observations of residual force enhancement before proceeding to explain how this phenomenon is represented by the model.

      Please see the following text:

      Revision:

      • the paragraph starting on page 7, column 2, line 43 ”When active muscle is lengthened, ...”

      • and the following paragraph starting on page 8, column 1, line 3 “To develop RFE, ”

      Difference:

      • the paragraph starting on page 8, column 2, line 15

      • and the following paragraph starting on page 9, column 1, line 6

      (17) Figure 3 legend and elsewhere: The authors use Prado et al. (2005) to determine several titin parameters, however the simulations seem to focus on cat soleus, but Prado et al.’s paper is on rabbits. More clarity is needed about which specific results from which species and muscles were used to parameterize the model.

      The new parameter table includes coded entries to indicate the literature source for experimental data, the animal it came from, and how the data was used. For example, the ‘ECM fraction’ has a source of ‘R[57]’ to show that the data came from rabbits from reference 57. For further details, please see the response to comment #3.

      Please see the following text:

      • Revision: page 11, column 2, table section H: ‘ECM fraction’.

      • Difference: page 11, column 2, table section H: ‘ECM fraction’.

      To address this comment in a little more detail, we have had to use Prado et al. (2005) to give us estimates for only one parameter: P, the fraction of the passive force-length relation that is due to titin. Prado et al.’s measurements relating to P are unique to our knowledge: these are the only measurements we have to estimate P in any muscle, cat soleus or otherwise. Here we use the average of the values for P across the 5 muscles measured by Prado et al. as a plausible default value for all of our simulations.

      (18) Figure 4 seems unnecessary.

      Figure 4 has been removed.

      (19) Page 10, lines 17-18: provide the abbreviation (VAF) here with the definition (variance accounted for).

      Thank you for your attention to detail. The abbreviation has been added.

      Please see these parts of the manuscripts for details:

      • Revision: page 12, column 2, line 13

      • Difference: page 13, column 2, line 32
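
      For readers unfamiliar with the metric, the sketch below computes variance accounted for under a commonly used definition, VAF = 1 − var(measured − fitted) / var(measured). This definition is assumed here for illustration and the exact expression used in the manuscript should be taken from the manuscript itself. The data values are hypothetical.

```python
from statistics import pvariance


def variance_accounted_for(measured, fitted):
    """VAF under the assumed definition 1 - var(residual) / var(measured)."""
    residual = [m - f for m, f in zip(measured, fitted)]
    return 1.0 - pvariance(residual) / pvariance(measured)


# Hypothetical force samples.
measured = [1.0, 2.0, 3.0, 4.0]

# A fit that reproduces the data exactly accounts for all of the variance.
vaf_perfect = variance_accounted_for(measured, measured)

# A fit that only reproduces the mean accounts for none of the variance.
mean_only = [2.5, 2.5, 2.5, 2.5]
vaf_mean_only = variance_accounted_for(measured, mean_only)
```

      A VAF near 1 therefore means that the spring-damper of best fit reproduces almost all of the variation in the force response.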

      (20) Page 11, lines 2-3: Here and elsewhere, it is clear that some model parameters have been optimized to fit the model. The main paper should include a table that lists all model parameters and how they were chosen or optimized, including but not limited to the information in Table 1 of the supplemental information section.

      See response to comment 3.

      (20) Page 17, lines 45 -49: Again, a substantial number of ad hoc adjustments to the model appear to be required. These should be described in the Discussion under limitations, and accounted for in the parameters table. See also legends to Fig. 12 and 13, page 19, lines 23-26.

      Please see the response to comment #3: a coded entry now appears to indicate the data source, the animal used in the experiment, and the method used to process the data. This includes entries for parameters which were estimated ‘E’ so that the model produced acceptable results in the simulations presented. In addition, the new discussion paragraph includes a number of sentences that use the adjustment to the active-titin-damping coefficient as an opening to discuss the limitations of the VEXAT’s titin-actin bond model and the circumstances under which the model’s parameters would need to be adjusted.

      Please see responses to comments 3 and 23 for additional details. In addition, please see the specific discussion text mentioning the change to βoPEVK:

      • Revision: page 22, column 1, line 30 ”In Sec. 3.3 we had ...”

      • Difference: page 24, column 1, line 49

      (22) Page 20, lines 50-11: It should be noted here that Tahir et al.’s (2018) model has both series and parallel elastic elements, provided by superposition of rotation (series) and translation (parallel) of a pulley.

      While it is true that Tahir et al.’s (2018) model has series and parallel elements, as do the other models mentioned, these models do not have the correct structure to yield a gain and phase response that mimics biological muscle. The text that I originally wrote attempted to explain this without going into the details. As you note, this explanation leaves something to be desired. The original text commenting on the models of Forcinito et al., Tahir et al., Haeufle et al., and Günther et al. has been updated to be more specific. Please see the parts of the following manuscripts for details:

      • Revision: page 22, column 2, line 20, the paragraph beginning ”The models of Forcinito ...”

      • Difference: page 24, column 2, line 44

      (23) Discussion: This section should include a description of model limitations, including the relatively large number of ad hoc modifications and how many parameters must be found by optimization in practice. The authors should discuss what types of data are most compatible for use with the model (ex vivo, in vivo, single fiber, whole muscle, MTU), requirements for applying the model to different types of data, and impediments to using the model on different types of data.

      An additional limitations paragraph has been added to the discussion.

      Please see the following text:

      • Revision: the paragraph beginning on page 22, column 1, line 11 ”Both the viscoelastic ...”

      • Difference: the paragraph beginning on page 24, column 1, line 27.

      Reviewer #2 (Recommendations For The Authors):

      (1) If it is possible to compare the output of this model to other more contemporary models which incorporate titin but are also simple enough to implement in whole-body simulation (such as the winding filament model), this would seem to greatly strengthen the paper.

      That’s an excellent idea, though beyond the scope of this already lengthy paper. Even though the Hill model we evaluated is a bit old it is widely used, and so, many readers will be interested in seeing the benchmark results. As benchmarking work is both difficult to fund and undertake, we do hope that others will evaluate their own models using the code and data we have provided.

      (2) I’m a little unclear on the basis for the transition between short- and mid-range length changes, both in reality and in the model. And also about the range of strains that qualify as ”short”. It seems like there is potential for short range stiffness, although I would have thought more in the range of 1-2% strains than >3%, to be due to currently attached crossbridges. There is clear evidence that active titin is responsible for the low stiffness at very large strains that exceed actin-myosin overlap. But I am not clear on how a transitional stiffness on the descending limb of the force-length relationship is implemented in the model, and what aspect of physiology this is replicating. It may be helpful to clarify this further and indicate where in the model this stiffness arises.

      This question has several parts to it which I will paraphrase here:

      A Short-range stiffness acts over smaller strains than 3.8%. How is short-range defined?

      B Where is the transition made between short-range and mid-range force response, both in reality and in the model. Also how does this change on the descending limb?

      C What components in the model contribute to the stiffness of the CE?

      A. Short-range stiffness acts over smaller strains than 3.8%. How is short-range defined?

      The response to Reviewer 1’s comment # 5 directly addresses this question.

      B. Where is the transition made between short-range and mid-range force response, both in reality and in the model. Also how does this change on the descending limb? We are going to rephrase the question because of changes in terminology that we have made in response to Reviewer 1’s comment #5.

      (i) What is the basis for the transition between the muscle behaving like an LTI system? Both in reality, and in the model. (ii) What happens outside the LTI range? (iii) Also how does this change on the descending limb?

      We will address this question one part at a time:

      (i) What is the basis for the transition between the muscle behaving like an LTI system? Both in reality, and in the model.

      A system’s response can be approximated as a linear-time-invariant (LTI) system as long as it is time-invariant, and its output can be expressed as a linear function of its input. In the context of Kirsch et al.’s experiment, the ‘system’ is the muscle, the ‘input’ is the time series of length data, and the ‘output’ is the time series of force data. Due to the requirement for time-invariance, two experimental conditions must be met to approximate muscle as an LTI system:

      • the nominal length of the muscle stays constant over long periods of time,

      • and the nominal activation of the muscle stays constant.

      These conditions were met by default in Kirsch et al.’s experiment, and also in our simulations of this experiment. The one remaining condition to assess is whether or not the muscle’s response is linear.

      To evaluate whether the muscle’s force is a linear function of the length change, Kirsch et al. evaluated $(C_{xy})^2$, the coherence squared between the length and force time-series data. Even though the mathematical underpinnings of $(C_{xy})^2$ are complicated, the interpretation of $(C_{xy})^2$ is simple: muscle can be accurately approximated as a linear system if $(C_{xy})^2$ is close to 1, but the accuracy of this approximation becomes poor as $(C_{xy})^2$ approaches 0. Kirsch et al. used $(C_{xy})^2$ to identify a bandwidth in which the response of the muscle to the 1−3.8% $\ell_o^M$ length changes was sufficiently linear for analysis: a lower bound of 4 Hz was identified using $(C_{xy})^2$ and the bandwidth of the input signal (15 Hz, 35 Hz, or 90 Hz) set the upper bound. In Fig. 3 of Kirsch et al. the $(C_{xy})^2$ at 4 Hz has a value of at least 0.67 for the 15 Hz and 90 Hz signals. To minimize error in our analysis and yet be consistent with Kirsch et al., we analyze the bandwidth common to both $(C_{xy})^2 \geq 0.67$ and Kirsch et al.’s defined range. Though the bandwidth defined by the criterion $(C_{xy})^2 \geq 0.67$ is usually larger than the one defined by Kirsch et al., there are some exceptions where the lower frequency bound of the models is higher than 4 Hz (now reported in Tables 4D and 5D).
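
      To make the interpretation of coherence squared concrete, the sketch below estimates it at a single frequency bin by averaging cross- and auto-spectra over non-overlapping segments. This is a simplified estimator written for illustration (no windowing, no overlap, synthetic random signals) and is not the procedure used by Kirsch et al. or by the paper's scripts. Only the 0.67 threshold is taken from the text above.

```python
import cmath
import math
import random


def dft_bin(segment, k):
    """Single DFT coefficient of `segment` at bin index k."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n)
               for i, v in enumerate(segment))


def coherence_squared(x, y, segment_length, k):
    """Magnitude-squared coherence of x and y at bin k (segment-averaged)."""
    sxx = syy = 0.0
    sxy = 0j
    for start in range(0, len(x) - segment_length + 1, segment_length):
        xk = dft_bin(x[start:start + segment_length], k)
        yk = dft_bin(y[start:start + segment_length], k)
        sxx += abs(xk) ** 2
        syy += abs(yk) ** 2
        sxy += xk.conjugate() * yk
    return abs(sxy) ** 2 / (sxx * syy)


random.seed(0)
N_SEGMENTS, SEGMENT_LENGTH, BIN = 32, 64, 5
x = [random.gauss(0.0, 1.0) for _ in range(N_SEGMENTS * SEGMENT_LENGTH)]

# Output that is an exactly linear function of the input.
y_linear = [2.0 * v for v in x]
# Output that has nothing to do with the input.
y_unrelated = [random.gauss(0.0, 1.0) for _ in x]

c_linear = coherence_squared(x, y_linear, SEGMENT_LENGTH, BIN)
c_unrelated = coherence_squared(x, y_unrelated, SEGMENT_LENGTH, BIN)

THRESHOLD = 0.67  # linearity criterion quoted in the text
print("linear output passes:", c_linear >= THRESHOLD)
print("unrelated output passes:", c_unrelated >= THRESHOLD)
```

      An output that is a purely linear function of the input gives a coherence squared of 1 at every frequency, while noise or nonlinearity pulls the value toward 0, which is why a threshold on this quantity can be used to select the analysis bandwidth.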

      (ii) What happens outside the LTI range?

      When a muscle’s output cannot be considered that of an LTI system, it means either that its length or activation is time-varying, or that the relationship between length and force is no longer linear. In short, the muscle is behaving as one would normally expect: time-varying and nonlinear. The wonderful part of Kirsch et al.’s work is that they found a surprisingly large region in the frequency domain where muscle behaves linearly and can be analyzed using the powerful tools of linear systems and signals.

      (iii) Also how does this change on the descending limb?

      Since the nominal length of Kirsch et al.’s experiments is $\ell_o^M$, it is not clear how the results of the perturbation experiments will change if the nominal length is moved firmly to the descending limb. However, we can see how the stiffness and damping values will change by examining Figures 9C and 9D, which show the calculated stiffness and damping of the VEXAT and Hill models as $\ell^M$ is lengthened from $\ell_o^M$ down the descending limb: the stiffness and damping of the VEXAT model do not change much, while the Hill model’s stiffness changes sign and its damping coefficient changes substantially. What cannot be seen from Figures 9C and 9D is how the bandwidth over which the models are considered linear changes.

      We have made a number of updates to the text to more clearly communicate these details of our response to part (i):

      • Text has been edited so that the terms ’short-range stiffness’ and ’small’ from Rack and Westbury’s work are not confused with ’stiffness’ and ’small’ from the LTI system analysis. Please see our response to comment # 5 for details.

      • We have added text to the main body of the paper to explain how the coherence squared metric was used to select a bandwidth in which the response of the system is approximately linear:

      – Revision: the paragraph that starts on page 11, column 1, line 3 ”Kirsch et al. used system identification ...”

      – Difference: page 13, column 2, line 1

      – Coherence is defined in Appendix D

      – Coherence is now also included in the example script ‘main SystemIdentificationExample.m’

      • The bandwidth over which model output can be considered linear (coherence squared > 0.67) has been added to Tables 4 and 5

      – Revision: see Table 4D, and Table 5D in Appendix E

      – Difference: see Table 4D, and Table 5D in Appendix E

      • Figures 6 and 16 are now annotated if the plotted signal does not meet the linearity requirement of $(C_{xy})^2 > 0.67$.

      C. What components in the model contribute to the stiffness of the CE?

      There are three components that contribute to the stiffness of the CE which are pictured in Figure 1, appear in Eqn. 15, and are listed explicitly in Eqn. 76:

      (a) The XE, as represented by the $a\,f^{L}(\tilde{\ell}^{S}+\tilde{L}^{M})\,\tilde{k}_{o}^{X}$ term in Eqn. 15.

      (b) The elasticity of the distal segment of titin, $f^{2}(\tilde{\ell}^{2})$. Only $f^{2}(\tilde{\ell}^{2})$ appears in Eqn. 15 because $\tilde{\ell}^{1}$ is a model state.

      (c) The extracellular matrix, as represented by the $f^{ECM}(\tilde{\ell}^{ECM})$ term.

      There is also a compressive element $f^{KE}$, but it plays no role in the simulations presented in this work because it only begins to produce force at extremely short CE lengths ($\tilde{\ell}^{M} < 0.1\,\ell_o^M$).

      We have made the following changes to make these components clearer:

      Figure 1A has been updated:

      – The symbols for a spring and a damper are now defined in Figure 1A

      – The ECM now has a spring symbol. Now all springs and dampers have the correct symbol in Figure 1A.

      – The caption now explicitly lists the rigid, viscoelastic, and elastic elements in the model

      The equations for the VEXAT’s CE stiffness and damping are now compared and contrasted with the Hill model’s stiffness and damping in Sec. 3.1.

      – Revision: starting at page 14, column 2, line 1: Eqn. 28 and Eqn. 29 and surrounding text

      – Difference: page 17, column 1, line 22

      (3) This model appears to be an amalgamation of a phenomenological (force-length and force-velocity relationships) and a mechanistic (crossbridge and titin stiffness and damping) model. While this may improve predictions, and so potentially be useful, it also seems like it limits the interpretation of physiological underpinnings of any findings. It may be helpful to explore in greater detail the implications of this approach.

      We have added a limitations paragraph to the discussion which addresses this comment and can be found in:

      • Revision: the paragraph beginning on page 22, column 1, line 11 “Both the viscoelastic ...”

      • Difference: the paragraph beginning on page 24, column 1, line 27

      (4) As a biologist, I found the interpretation of phase and gain a little difficult and it may help the reader to show in greater detail the time series data and model predictions to highlight conditions under which the models do not accurately capture the magnitude and timing of force production.

      It is important that the ideas of phase and gain are understood, especially because little information can be gleaned from the time series data directly. There is some time series data in the paper already that compares each model’s response to its spring-damper of best fit: plots of the force response of each model and its spring-damper of best fit can be found in Figures 6A, 6D, 6G, 6J, 16A, 16D, 16G, and 16J in the revised manuscript. While it is clear that models with a higher VAF more closely match the spring-damper of best fit, there is not much more that can be taken from time series data: the systematic differences, particularly in phase, are not visually apparent in the time-domain but are clear in gain and phase plots in the frequency-domain.

      To make the meaning of phase and gain plots clearer, Figure 4 (Figure 5 in the first submission) has been completely re-made and includes plots that illustrate the entire process of going from two length and force time-domain signals to gain and phase plots in the frequency-domain. Included in this figure is a visual representation of transforming a signal from the time to the frequency domain (Fig. 4B and 4C), and also an illustration of the terms gain and phase (Fig. 4D). In addition, a small example file ‘main SystemIdentificationExample.m’ has been added to the MATLAB code repository in the elife2023 branch to accompany Appendix D, which goes through the mathematics used to transform input and output time-domain signals into gain and phase plots of the input-output relation. Small updates have been made to Figures 6 and 16 in the revised paper (Figures 7 and 18 in the first submission) to make the time-domain signals from the spring-damper of best fit and the model output clearer. Finally, I have re-calculated the gain and phase profiles using a more advanced numerical method that trades off some resolution in frequency for more accuracy in the magnitude. This has allowed me to make Figures 6 and 16 easier to follow because the gain and phase responses are now lines rather than a scattering of points. We hope that these additions make the interpretation of gain and phase clearer.

      Please see

      Revision:

      – Figure 4 and caption on page 12

      – The opening 2 paragraphs of Sec 3.1 starting on page 10, column 2, line 4 “In Kirsch et al.’s ...”

      – Figure 6 & 16: spring damper and model annotation added, plotted the gain and phase as lines

      – Appendix D: Updated to include coherence and the more advanced method used to evaluate the system transfer function, gain, and phase.

      Difference:

      – Figure 4 and caption on page 12

      – The opening 2 paragraphs of Sec 3.1 starting on page 12, column 1, line 34 and ending on page 13, column 2, line 29

      – Figure 6 & 16: spring damper and model annotation added

      – Appendix D
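For readers new to gain and phase, the step from time-domain length and force signals to a gain and a phase at one frequency can be sketched as follows. This is a simplified illustration, not the method in Appendix D or the MATLAB example script: the stiffness, damping, amplitude, and frequency values are hypothetical, and the helper names are assumptions. The sketch relies only on the standard result that an ideal spring-damper has the frequency response K + iωβ.

```python
import cmath
import math


def spring_damper_response(stiffness, damping, freq_hz):
    """Analytic gain and phase (degrees) of an ideal spring-damper, K + i*omega*beta."""
    omega = 2.0 * math.pi * freq_hz
    h = complex(stiffness, damping * omega)
    return abs(h), math.degrees(cmath.phase(h))


def single_bin(signal, freq_hz, dt):
    """Project a sampled signal onto a complex exponential at one frequency."""
    return sum(
        v * cmath.exp(-2j * math.pi * freq_hz * i * dt) for i, v in enumerate(signal)
    )


def estimate_gain_phase(length, force, freq_hz, dt):
    """Estimate gain and phase (degrees) of force relative to length at one frequency."""
    h = single_bin(force, freq_hz, dt) / single_bin(length, freq_hz, dt)
    return abs(h), math.degrees(cmath.phase(h))


# Hypothetical values chosen only for illustration.
K, BETA = 4.0, 0.02          # stiffness and damping
FREQ, DT, N = 10.0, 0.001, 1000  # 10 Hz over 1 s, i.e. an integer number of cycles
AMP = 0.01
omega = 2.0 * math.pi * FREQ

# Length perturbation and the force a spring-damper would produce in response.
length = [AMP * math.sin(omega * i * DT) for i in range(N)]
force = [
    K * AMP * math.sin(omega * i * DT) + BETA * AMP * omega * math.cos(omega * i * DT)
    for i in range(N)
]

analytic_gain, analytic_phase = spring_damper_response(K, BETA, FREQ)
estimated_gain, estimated_phase = estimate_gain_phase(length, force, FREQ, DT)
```

In this idealized case the estimated and analytic values agree: the gain is the ratio of force amplitude to length amplitude, and the positive phase shows the force leading the length change because of the damping term.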

      (5) The actin-myosin and actin-titin load pathways are depicted as distinct in the model. However, given titin’s position in the center of myosin and the crossbridge connections between actin and myosin, this would seem to be an oversimplification. It seems worth considering whether the separation of these pathways is justified if it has any effect on the conclusions or interpretation.

      We have reworked one of the discussion paragraphs to focus on how our simulations would be affected by two mechanisms (Nishikawa et al.’s winding filament theory and DuVall et al.’s titin entanglement hypothesis) that make it possible for crossbridges to do mechanical work on titin.

      • Revision: the paragraph beginning on page 21, column 2, line 42 “The active titin model ...”

      • Difference: the paragraph beginning on page 23, column 2, line 48

      References

      Nishikawa KC, Monroy JA, Uyeno TE, Yeo SH, Pai DK, Lindstedt SL. Is titin a ‘winding filament’? A new twist on muscle contraction. Proceedings of the Royal Society B: Biological Sciences. 2012 Mar 7;279(1730):981-90.

      DuVall M, Jinha A, Schappacher-Tilp G, Leonard T, Herzog W. I-Band Titin Interaction with Myosin in the Muscle Sarcomere during Eccentric Contraction: The Titin Entanglement Hypothesis. Biophysical Journal. 2016 Feb 16;110(3):302a.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Naseri et al. present a new strategy for identifying human genetic variants with recessive effects on disease risk by the genome-wide association of phenotype with long runs-of-homozygosity (ROH). The key step of this approach is the identification of long ROH segments shared by many individuals (termed "shared ROH diplotype clusters" by the authors), which is computationally intensive for large-scale genomic data. The authors circumvented this challenge by converting the original diploid genotype data to (pseudo-)haplotype data and modifying the existing positional Burrows-Wheeler transform (PBWT) algorithms to enable an efficient search for haplotype blocks shared by many individuals. With this method, the authors identified over 1.8 million ROH diplotype clusters (each shared by at least 100 individuals) and 61 significant associations with various non-cancer diseases in the UK Biobank dataset.

      Overall, the study is well-motivated, highly innovative, and potentially impactful. Previous biobank-based studies of recessive genetic effects primarily focused on genome-wide aggregated ROH content, but this metric is a poor proxy for homozygosity of the recessive alleles at causal loci. Therefore, searching for the association between phenotype and specific variants in the homozygous state is a key next step towards discovering and understanding disease genes/alleles with recessive effects. That said, I have some concerns regarding the power and error rate of the methods, for both identification of ROH diplotype clusters and subsequent association mapping. In addition, some of the newly identified associations need further validation and careful consideration of potential artifacts (such as cryptic relatedness and environment sharing).

      1) Identification of ROH diplotype clusters.

      The practice of randomly assigning heterozygous sites to a homozygous state is expected to introduce errors, leading to both false positives and false negatives. An advantage that the authors claim for this practice is to reduce false negatives due to occasional mismatch (possibly due to genotyping error, or mutation), but it's unclear how much the false positive rate is reduced compared to traditional ROH detection algorithm. The authors also justified the "random allele drawing" practice by arguing that "the rate of false positives should be low" for long ROH segments, which is likely true but is not backed up with quantitative analysis. As a result, it is unclear whether the trade-off between reducing FNs and introducing FPs makes the practice worthwhile (compared to calling ROHs in each individual with a standard approach first followed by scanning for shared diplotypes across individuals using BWT). I would like to see a combination of back-of-envelope calculation, simulation (with genotyping errors), and analysis of empirical data that characterize the performance of the proposed method.

      In particular, I find the high number of ROH clusters in MHC alarming, and I am not convinced that this can be fully explained by a high density of SNPs and low recombination rate in this region. The authors may provide further support for their hypothesis by examining the genome-wide relationship between ROH cluster abundance and local recombination rate (or mutation rate).

      Thanks for this insightful comment. Through additional experiments, we confirmed that the excessive number of ROH clusters in the MHC region is due to the higher density of markers per centimorgan. As discussed above at Essential Revision 2, we took this opportunity to modify our code to search for clusters with the minimum length in terms of cM instead of sites. We have also provided the genetic distance for reported clusters in the MHC region with significant association (genetic length (cM) column in Tables 1 and 2). We include the following in the main text:

      “We searched for ROH clusters using a minimum target length of 0.1 cM (Figure 3–figure supplement 1). As shown in the figure, there is no excessive number of ROH clusters in chromosome 6 as was spotted using a minimum number of variant sites.”

      Methods section, ROH algorithm subsection:

      “We implemented ROH-DICE to allow direct use of genetic distances in addition to variant sites for L. The program can take minimum target length L directly in cM and detect all ROH clusters greater than or equal to the target length in cM. The program holds a genetic mapping table for all the available sites, and cPBWT was modified to work directly with the genetic length instead of the number of sites.”
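The idea of measuring a candidate segment in cM rather than in number of sites can be sketched as below. This is an illustrative assumption about how a genetic map lookup might work, not the ROH-DICE or cPBWT implementation: the map values, the linear interpolation, and the function names are hypothetical. Only the 0.1 cM minimum length comes from the text.

```python
from bisect import bisect_right


def genetic_position(bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) for a physical position (bp)."""
    if bp <= map_bp[0]:
        return map_cm[0]
    if bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, bp) - 1
    frac = (bp - map_bp[i]) / (map_bp[i + 1] - map_bp[i])
    return map_cm[i] + frac * (map_cm[i + 1] - map_cm[i])


def meets_min_length(start_bp, end_bp, map_bp, map_cm, min_cm=0.1):
    """True if the segment spans at least min_cm of genetic distance."""
    span = genetic_position(end_bp, map_bp, map_cm) - genetic_position(
        start_bp, map_bp, map_cm
    )
    return span >= min_cm


# Hypothetical map: a low-recombination stretch followed by a high-recombination one.
MAP_BP = [0, 1_000_000, 2_000_000]
MAP_CM = [0.0, 0.05, 1.05]

low_recombination_segment = meets_min_length(0, 1_000_000, MAP_BP, MAP_CM)
high_recombination_segment = meets_min_length(1_000_000, 1_500_000, MAP_BP, MAP_CM)
```

With this toy map, a 1 Mb segment in the low-recombination stretch spans only 0.05 cM and is rejected, while a 0.5 Mb segment in the high-recombination stretch spans 0.5 cM and is kept. A site-count threshold would instead favor marker-dense, low-recombination regions.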

      2) Power of ROH association. Given that the authors focused on long segments only (which is a limitation of the current method), I am concerned about the power of the association mapping strategy, because only a small fraction of causal alleles are expected to be present in long, homozygous haplotypes shared by many individuals. It would be useful to perform a power analysis to estimate what fraction of true causal variants with a given effect size can be detected with the current method. To demonstrate the general utility of this method, the authors also need to characterize the condition(s) under which this method could pick up association signals missed by standard GWAS with recessive effects considered. I suspect some variants with truly additive effects can also be picked up by the ROH association, which should be discussed in the manuscript to guide the interpretation of results.

      We added a new experiment in the Results section “Evaluation of ROH clusters in simulated data” under Power of ROH-DICE in association studies. We compared the power of the ROH cluster with additive, recessive, and dominant models. Our simulation shows that using ROH clusters outperforms standard GWAS when a phenotype is associated with a set of consecutive homozygous sites. We added the following text:

      “...We calculated the p-values for both ROH clusters and all variant sites. We used a p-value cut-off of 0.05 divided by the number of tests for each phenotype to determine whether the calculated p-value was smaller than the threshold, indicating an association. For GWAS, only one variant site within the ROH cluster, contributing to the phenotype, was required. We tested for all additive, dominant, and recessive effects (Figure 1–figure supplement 3). The figure demonstrates that ROH-DICE outperforms GWAS when a phenotype is associated with a set of consecutive homozygous sites. The maximum effect size of 0.3 resulted in ROH clusters achieving a power of 100%, whereas the additive model only achieved 11%, and the dominant and recessive models achieved 52% and 70%, respectively. The GWAS with recessive effect yields the best results among other GWAS tests, however, its power is still lower than using ROH clusters.”
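The significance threshold described in the quoted passage (0.05 divided by the number of tests) is a Bonferroni correction, which can be written as a short sketch. The function names and the p-values below are hypothetical and serve only to show the arithmetic.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cut-off that controls the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Return the p-values that pass the Bonferroni-corrected threshold."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p for p in p_values if p < cutoff]


# Hypothetical p-values for four tests of one phenotype.
example_p_values = [0.001, 0.02, 0.2, 0.0125]
example_hits = significant(example_p_values)
```

With four tests the cut-off is 0.0125, so only the first p-value passes; a value exactly equal to the cut-off does not.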

      3) False positives of ROH association. GWAS is notoriously prone to confounding by population and environmental stratification. Including leading principal components in association testing alleviates this issue but is not sufficient to remove the effects of recent demographic structure and local environment (Zaidi and Mathieson 2020 eLife). Similar confounding likely applies to homozygosity mapping and should be carefully considered. For example, it is possible that individuals who share a lot of ROH diplotypes tend to be remotely related and live near each other, thus sharing similar environments. Such scenarios need to be excluded to further support the association signals.

      We acknowledge that there could be confounding factors that may affect the association results. To address this, we utilized principal component (PC) values and additional covariates while using PHESANT after our initial Chi-square tests. We also included your comments in our Discussion section:

      “We used age, gender, and genetic principal components as confounding variables in the association analysis. Genetic principal components can reduce the confounding effect brought on by population structure but it may be insufficient to completely eliminate the effects of recent demographic structure and the local environment [45]. For example, individuals sharing excessive ROH diplotypes may share similar environments since they are closely related and reside close to one another. Since we did not rule out related individuals, some of the reported GWAS signals may not be attributable to ROH.”

      4) Validation of significant associations. It is reassuring that some of the top associations are indirectly corroborated by significant GWAS associations between the same disease and individual SNPs present in the ROH region (Tables 1 and 2). However, more sanity checks should be done to confirm consistency in direction of effect size (e.g., risk alleles at individual SNPs should be commonly present in risk-increasing ROH segment, and vice versa) and the presence of dominance effect.

      The beta values for effect size are now included in all reported tables. All beta values for ROH-DICE are positive, indicating that carriers of these ROH diplotypes may be at increased risk of certain non-cancer diseases. Moreover, we conducted the suggested sanity check to confirm the consistency of the direction of risk-inducing ROH diplotypes and risk alleles.

      We also computed D’ as a measure of linkage between the reported GWAS results and ROH clusters. We found that most of the GWAS results and ROH clusters are strongly correlated. However, in a few cases, D' is small or close to zero. In such cases, the reported p-value from GWAS was also insignificant, while the ROH cluster indicated a significant association. We included these points in the Results section.
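For reference, the standard definition of D′ between two binary markers can be sketched as follows. How the authors encoded ROH cluster membership and GWAS risk alleles is not stated in the response, so treating each as a binary marker with a known frequency is an assumption here, as are the function name and example frequencies. The sketch returns the absolute value of D′, which is one common reporting convention.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute normalized linkage disequilibrium |D'| between two binary markers.

    p_ab is the joint frequency of carrying both A and B;
    p_a and p_b are the marginal frequencies of A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one marker is monomorphic, so D' is undefined
    return abs(d) / d_max


# Hypothetical frequencies: complete association versus independence.
complete = d_prime(p_ab=0.2, p_a=0.2, p_b=0.2)
independent = d_prime(p_ab=0.04, p_a=0.2, p_b=0.2)
```

Values near 1 indicate that the two markers almost always occur together, and values near 0 indicate that they are close to independent.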

      Reviewer #3 (Public Review):

      A classic method to detect recessive disease variants is homozygosity mapping, where affected individuals in a pedigree are scanned for the presence of runs of homozygosity (ROH) intersecting in a given region. The method could in theory be extended to biobanks with large samples of unrelated individuals; however, no efficient method was available (to the best of my knowledge) for detecting overlapping clusters of ROH in such large samples. In this paper, the authors developed such a method based on the PBWT data structure. They applied the method to the UK biobank, finding a number of associations, some of them not discovered in single SNP associations.

      Major strengths:

      • The method is innovative and algorithmically elegant and interesting. It achieves its purpose of efficiently and accurately detecting ROH clusters overlapping in a given region. It is therefore a major methodological advance.

      • The method could be very useful for many other researchers interested in detecting recessive variants associated with any phenotype.

      • The statistical analysis of the UK biobank data is solid and the results that were highlighted are interesting and supported by the data.

      Major weaknesses:

      • The positions and IDs of the ROH clusters in the UK biobank are not available for other researchers. This means that other researchers will not be able to follow up on the results of the present paper.

      We included the SNP IDs, positions, and consensus alleles for all reported loci in the main tables. Moreover, additional information including beta and D’ values was added. The current information should allow researchers to follow up on the results. Supplementary File 2 contains beta and D’ values for all reported clusters.

      Supplementary File 3 contains the SNP IDs and consensus alleles for all reported clusters in Tables 1 and 2. The consensus allele denotes the allele with the highest occurrence in the reported clusters.

      • The vast majority of the discoveries were in regions already known to be associated with their respective phenotypes based on standard GWAS.

      We agree that a majority of the ROH regions are indeed consistent with GWAS. However, some regions were missed by standard GWAS (e.g. chr6:25969631-26108168, hemochromatosis). Our message is that our method is a complementary approach to standard GWAS and will not replace standard GWAS analysis. See our response to Reviewer #2 Point Six.

      • The running time seems rather long (at least for the UK biobank), and therefore it will be difficult for other researchers to extensively experiment with the method in very large datasets. That being said, the method has a linear running time, so it is already faster than a naïve algorithm.

      Thank you for your input. The algorithm used to locate matching blocks is efficient, and the run time originally reported was the total CPU time consumed. Since the program consumes very little memory and few resources, it can be executed simultaneously for all chromosomes. We also noticed that a significant amount of time was being spent parsing the input file, and we slightly modified our script to improve the parsing. We then re-ran it for all chromosomes in parallel and now report the elapsed time, which was only 18 hours and 54 minutes.

      “This was achieved by running the ROH-DICE program, with a wall clock time of 18 hours and 54 minutes where the program was executed for all chromosomes in parallel (total CPU hours of ~ 242.5 hours). The maximum residence size for each chromosome was approximately 180 MB.”

    1. Author response:

      Reviewer #1 (Public Review):

      The authors investigated the role of OBOX4 in zygotic genome activation (ZGA) in mice. Obox4 genes form an array of duplicated genes; they were identified as a candidate ZGA factor based on expression patterns during early development. The role of OBOX4 was subsequently studied in embryonic stem cells and early embryos. It was found that transcriptional activation mediated by OBOX4 has similar features to that of DUX, which was previously identified as a zygotic transcription factor involved in ZGA and a major activator of the zygotic expression program. It was, however, unexpected that Dux knock-out did not impair embryonic development. The work by Guo et al. provides several lines of evidence that OBOX4-mediated activation of gene expression considerably overlaps with that of DUX and that this redundancy might explain the loss of early developmental phenotype in Dux mutants. Consistent with this model, double mutants of Obox4 and Dux show impaired development. Given the difficulties with investigating details of the genetic model in double mutants at the preimplantation embryo stage, the authors not only crossed genetic mutants, but also used (1) nuclear transfer of mutated nuclei of ESCs, which could be characterized on their own in separate experiments, and (2) antisense oligonucleotide (ASO) microinjection, which included a rescue control demonstrating that reintroducing OBOX4 is sufficient to rescue the phenotype caused by blocking both Dux and Obox4.

      This work is important for the field because it reveals functional redundancy and plasticity of the zygotic genome activation in mammals, where the mouse model stands as a remarkable example of genome activation, which massively integrated long terminal repeat (LTR)-derived enhancers from retrotransposons and now two of the key activating zygotic factors appear to be encoded by tandemly duplicated clusters of different phylogenetic age. Identification of OBOX4 as a second factor partially redundant with DUX now allows us to decipher what constitutes the essential part of the ZGA program.

      We are grateful for the reviewer’s appreciation of our work, particularly the technical difficulty of knocking out two multicopy genes and the value of the rescue experiment.

      Reviewer #2 (Public Review):

      In this study, Guo et al. screened a few homeobox transcription factors and identified that Obox4 can induce the 2-cell-like state in mouse embryonic stem cells (mESCs) (Fig. 1 and 2). The authors also compared in detail how Obox4 and Dux activate 2C repeats and genes in mESCs (Fig. 3). Compared to Dux, Obox4 activates fewer 2C genes (Fig. 2). In addition, although both Obox4 and Dux bind to MERVL elements, Obox4 additionally binds to ERVK (Fig. 3). The authors then used three different approaches (i.e., SCNT-mediated KO, ASO-mediated KD, and genetic KO) to study how Obox4 and Dux regulate zygotic genome activation in embryos. Although there are some inconsistencies among the different approaches, the authors were able to show that loss of both Obox4 and Dux causes more severe consequences than loss of either protein alone in embryonic development and zygotic genome activation (Fig. 4 and 5).

      Overall, this is a comprehensive study that addresses an important question that puzzles the community. However, some comparisons to the recent work by Ji et al. (PMID: 37459895) are highly recommended. Ji et al. knocked out the entire Obox cluster (including Obox4) in mice and found that Obox cluster KO causes 2-4 cell arrest without affecting Dux. That said, Obox proteins seem more critical than Dux in regulating ZGA, and Obox cluster KO cannot be compensated by Dux. Ji et al. also reported that maternal (Obox1, 2, 5, 7) and zygotic (Obox3, 4) Obox proteins redundantly regulate embryogenesis because loss of either is compatible with development. Consistent with Ji's work, Obox4 KO embryos generated in this study can develop to adulthood and are fertile. Since these two studies are highly relevant, some comparisons of Obox4 KO and Obox4/Dux DKO with the previous Obox cluster KO will greatly benefit the community.

      We thank the reviewer for appreciating the value of our study. We are aware of the high-quality work by Ji et al. and have included a comparison between our data and the data by Ji et al. in the revised manuscript. Despite repeated attempts, various crossing strategies failed to produce Obox4KO/DuxKO mating pairs that could be used to produce the large number of Obox4KO/DuxKO embryos required for in-depth transcriptome analysis. Based on the quality of the RNA-seq, we decided to perform comparative analysis using our ASO KD data and showed that Obox4 has distinct regulatory targets from those of other Obox family members, which is consistent with the phylogenetic distance within the family.

    1. Author response:

      A general comment was that this study left several key questions unanswered, in particular the causal mechanism for the reported ribosomal distributions. We have been interested in the evolution of asymmetric bacterial growth and aging for many years. However, a motivational difference is that we are more interested in the evolutionary process, and evolution by natural selection works on the phenotype. Thus, we wanted to start with the phenotype closest to fitness, appropriately defined for the conditions, and work downwards. We examined first the asymmetry of elongation rates in single cells, then gene products, and now ribosomes. As we have pointed out, our demonstration of ribosomal asymmetry shows that the phenomenon was not peculiar and unique to the gene products we examined. Rather, the asymmetry is acting higher up in the metabolic network and likely affecting all genes. We find such conceptual guidance to be important. In an ideal world, of course, we would have liked to have worked out the causal mechanisms in one fell swoop. In a less than ideal situation, it is a subjective decision as to where to stop. We believe that the publication of this manuscript is more than appropriate at this juncture. We work at the interface of evolutionary theory and microbiology. Our results could appeal to both fields. If we attract new researchers, progress could be accelerated. Could the delay caused by publishing only completed stories slow the rate of discovery? These questions are likely as old as science (e.g., https://telliamedrevisited.wordpress.com/2021/01/28/how-not-to-write-a-response-to-reviewers/).

      We present below our response to specific comments by reviewers. We have not added a new discussion of papers suggested by Reviewer #1 because we feel that the speculations would have been too unfocused. We were already criticized for speculation in the Discussion about a link between aggregate size and ribosomal density.

      Response to major comments by Reviewer #1.

      (a) Fig. 1 only shows 2 divisions (rather than 3 as per Rev1) to avoid an overly elaborate figure. We have added text to the figure legend that the old and new poles and daughters in the subsequent 3, 4, 5, 6, and 7 generations can be determined by following the same notations and tracking we presented for generations 1 and 2 in Fig. 1. For example, if we know the old and new poles of any of the four daughters after 2 divisions (as in Fig. 1), and allow that daughter to elongate, become a mother, and divide to produce 2 “grand-daughters”, the polarity of the grand-daughters can also be determined.

      (b) Because division times were normalized and analyzed as quartiles, the raw values were never used. Rather than annotating unused values, we have provided the mean division times in the Material and Methods section on normalization to provide representative values.

      (c) We did not quantify in our study the changes over generations for three reasons. First, the sample sizes for the first generations (cohorts of 1, 2, 4, and 8 cells) are statistically small. Second, and most importantly, cells on an agar pad on a microscope slide, despite being inoculated as fresh exponentially growing cells, experience a growth lag, as do all cells transferred to a new physiological condition. Thus, to be safe, we do not collect data from cohorts 1, 2, 4, and 8, to ensure that our cells are as physiologically uniform as possible. Lastly, as we noted in the Material and Methods, cells also slow down after 7 generations (128 cells). Thus, we have collected ribosome and length measurements primarily from cohorts 16, 32, 64, and 128. Measurable cells from the 128 cohort are actually rare because a colony with that many cells often starts to form double layers, which are not measurable. Most of our measurements came from the 16, 32, and 64 cohorts, in which case a time series would not be meaningful. Some of these details were not included in our manuscript but have been added to the Material and Methods (Microscopy and time-lapse movies). For these reasons we have not added a time series as requested by the reviewer.

      (d) We have added the additional figure as requested, but as a supplement rather than in the main article (Supplemental Materials Fig. S1). This figure shows the normalized density of ribosomes along the normalized length of old and new daughters. The density is continuous rather than binned into quartiles. This figure was included in the original manuscript, but readers recommended that it be removed because all the data had been analyzed with quartiles. Readers felt misled and confused.
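The pole bookkeeping described in response (a) can be sketched in a few lines. This is an illustrative model, not the authors' tracking code: the `Cell` class, the lineage labels ("O" for old daughter, "N" for new daughter), and the assumption that the founder's old pole has age 1 are all hypothetical. It uses only the rule that each daughter inherits one maternal pole and receives a freshly formed pole at the division plane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    lineage: str           # e.g. "ON" = new daughter of the founder's old daughter
    old_pole_age: int      # divisions since the inherited pole was formed
    new_pole_age: int = 0  # the pole formed at the most recent division


def divide(mother):
    """Return (old daughter, new daughter) of a dividing mother cell."""
    # The old daughter keeps the mother's old pole; the new daughter keeps the
    # mother's new pole. Both inherited poles are one division older.
    old_daughter = Cell(mother.lineage + "O", mother.old_pole_age + 1)
    new_daughter = Cell(mother.lineage + "N", mother.new_pole_age + 1)
    return old_daughter, new_daughter


def grow(founder, generations):
    """Let every cell divide once per generation and return the final cohort."""
    cohort = [founder]
    for _ in range(generations):
        cohort = [daughter for cell in cohort for daughter in divide(cell)]
    return cohort


# Assumed founder: old pole of age 1, new pole of age 0.
founder = Cell(lineage="", old_pole_age=1)
cohort_after_two = grow(founder, 2)
cohort_after_seven = grow(founder, 7)
```

Applying the same rule repeatedly assigns a polarity to every cell in later cohorts, which is why only two divisions need to be drawn to define the notation.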

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We greatly appreciate the comments from the editor and the reviewers, based on which we have made the revisions. We have responded to all the questions and summarized the revisions below. The changes are also highlighted in the manuscript.

      Additionally, we’ve noticed a few typos in the manuscript presented on the eLife website, which were not there in our originally submitted file.

      (1) In both the “Full text” presented on the eLife website and the pdf file generated after clicking “Download”: the last FC1000 in the second paragraph of the “Extensive induction curves fitting of TetR mutants” section should be FC1000WT.

      (2) In the pdf file generated after clicking “Download”: the brackets are all incorrectly formatted in the captions of Figure 4 and Figure 3—figure supplement 6.

      eLife assessment

      The fundamental study presents a two-domain thermodynamic model for TetR which accurately predicts in vivo phenotype changes brought about as a result of various mutations. The evidence provided is solid and features the first innovative observations with a computational model that captures the structural behavior, much more than the current single-domain models.

      We appreciate the supportive comments by the editor and reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors’ earlier deep mutational scanning work observed that allosteric mutations in TetR (the tetracycline repressor) and its homologous transcription factors are distributed across the structure instead of along the presumed allosteric pathways as commonly expected. In addition, the loss of allosteric communication caused by those mutations was rescued by additional distributed mutations. Now the authors develop a two-domain thermodynamic model for TetR that explains these compelling data. The model is consistent with the in vivo phenotypes of the mutants with changes in parameters, which permits quantification. Taken together, their work connects intra- and inter-domain allosteric regulation, which correlates with structural features. This leads the authors to suggest broader applicability to other multidomain allosteric proteins. Here the authors follow their first innovative observations with a computational model that captures the structural behavior, aiming to make it broadly applicable to multidomain proteins. Altogether, an innovative and potentially useful contribution.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      None that I see, except that I hope that in the future, if possible, the authors would follow with additional proteins to further substantiate the model and show its broad applicability. I realize however the extensive work that this would entail.

      We thank the reviewer for the supportive comments and the suggestion to extend the model to other proteins, which we indeed plan to pursue in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This combined experimental-theoretical paper introduces a novel two-domain statistical thermodynamic model (primarily Equation 1) to study allostery in generic systems but focusing here on the tetracycline repressor (TetR) family of transcription factors. This model, building on a function-centric approach, accurately captures induction data, maps mutants with precision, and reveals insights into epistasis between mutations.

      Strengths:

      The study contributes innovative modeling, successful data fitting, and valuable insights into the interconnectivity of allosteric networks, establishing a flexible and detailed framework for investigating TetR allostery. The manuscript is generally well-structured and communicates key findings effectively.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The only minor weakness I found was that I still don’t have a good sense of (a) the intuition behind and (b) the mathematical derivation of Equation 1, which is so central to the work. I would recommend that the authors provide this early on in the main text.

      We thank the reviewer for the suggestion. The full mathematical derivation of Equation 1 is given in the first section of the supplementary file. Given the length of the derivation, we think it’s better to keep it in the supplementary file rather than the main text. In the main text, the first subsection (overview of the two-domain thermodynamic model of allostery) of the Results section and the paragraph right before Equation 1 are meant for providing intuitive understandings of the two-domain model and the derivation of Equation 1, respectively.

      We would also like to point the reviewer to Figure 2-figure supplement 2 and Equations (12) to (18) in the supplementary file for an alternative derivation. They show that the equilibria among all molecular species containing the operator are dictated by the binding free energies, the ligand concentration, and the allosteric parameters. The probability of an unbound operator (proportional to the probability that the promoter is bound by an RNA polymerase, or the gene expression level) can thus be calculated using Equation (12), which then leads to main text Equation 1 following the derivation given there.

      Additionally, we’ve added a paragraph to the main text (line 248-260) to aid an intuitive understanding of Equation 1.

      “The distinctive roles of the three biophysical parameters on the induction curve as stipulated in Equation 1 could be understood in an intuitive manner as well. First, the value of εD controls the intrinsic strength of binding of TetR to the operator, or the intrinsic difficulty for ligand to induce their separation. Therefore, it controls how tightly the downstream gene is regulated by TetR without ligands (reflected in leakiness) and affects the performance limit of ligands (reflected in saturation). Second, the value of εL controls how favorable ligand binding is in free energy. When εL increases, the binding of ligand at low concentrations becomes unfavorable, where the ligands cannot effectively bind to TetR to induce its separation from the operator. Therefore, the fold-change as a function of ligand concentration only starts to noticeably increase at higher ligand concentrations, resulting in larger EC50. Third, as discussed above, γ controls the level of anti-cooperativity between the ligand and operator binding of TetR, which is the basis of its allosteric regulation. In other words, γ controls how strongly ligand binding is incompatible with operator binding for TetR, hence it controls the performance limit of ligand (reflected in saturation).”

      We hope that the reviewer will find this explanation helpful.
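
      The three curve features named in the quoted paragraph (leakiness, saturation, and EC50) can be made concrete with a small sketch. The function below is a generic Hill-type induction curve, not the manuscript's Equation 1, which is not reproduced in this response; the function name, the Hill coefficient n, and the numerical values are illustrative assumptions only.

```python
def hill_fold_change(c, leakiness, saturation, ec50, n=2.0):
    """Generic Hill-type induction curve (illustrative; NOT Equation 1).

    c          -- ligand concentration (arbitrary units, must be >= 0)
    leakiness  -- fold change with no ligand
    saturation -- fold change at saturating ligand
    ec50       -- concentration giving the half-maximal response
    n          -- Hill coefficient (hypothetical default)
    """
    if c < 0:
        raise ValueError("concentration must be non-negative")
    # Fraction of the maximal response reached at concentration c.
    frac = c**n / (ec50**n + c**n) if c > 0 else 0.0
    return leakiness + (saturation - leakiness) * frac


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the curve shape.
    for conc in (0.0, 1.0, 10.0, 100.0, 1000.0):
        print(conc, round(hill_fold_change(conc, 0.05, 0.9, 10.0), 4))
```

      In this toy curve, raising ec50 shifts the rise to higher ligand concentrations, while leakiness and saturation set the lower and upper plateaus; how εD, εL, and γ map onto these features is specified by Equation 1 in the manuscript, not by this sketch.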

      Reviewer #3 (Public Review):

      Summary:

      Allosteric regulations are complicated in multi-domain proteins and many large-scale mutational data cannot be explained by current theoretical models, especially for those that are neither in the functional/allosteric sites nor on the allosteric pathways. This work provides a statistical thermodynamic model for a two-domain protein, in which one domain contains an effector binding site and the other domain contains a functional site. The authors build the model to explain the mutational experimental data of TetR, a transcriptional repressor protein that contains a ligand and a DNA-binding domain. They incorporate three basic parameters, the energy change of the ligand and DNA binding domains before and after binding, and the coupling between the two domains to explain the free energy landscape of TetR’s conformational and binding states. They go further to quantitatively explain the in vivo expression level of the TetR-regulated gene by fitting into the induction curves of TetR mutants. The effects of most of the mutants studied could be well explained by the model. This approach can be extended to understand the allosteric regulation of other two-domain proteins, especially to explain the effects of widespread mutants not on the allosteric pathways.

      Strengths:

      The effects of mutations that are neither in the functional or allosteric sites nor in the allosteric pathways are difficult to explain and quantify. This work develops a statistical thermodynamic model to explain these complicated effects. For simple two-domain proteins, the model is quite clean and theoretically solid. For the real TetR protein that forms a dimeric structure containing two chains with each of them composed of two domains, the model can explain many of the experimental observations. The model separates intra and inter-domain influences that provide a novel angle to analyse allosteric effects in multi-domain proteins.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      As mentioned above, the TetR protein is not a simple two-domain protein, but forms a dimeric structure in which the DNA binding domain in each chain forms contacts with the ligand-binding domain in the other chain. In addition, the two ligand-binding domains have strong interactions. Without considering these interactions, especially those mutants that are on these interfaces, the model may be oversimplified for TetR.

      We thank the reviewer for this valid concern and acknowledge that TetR is a homodimer. However, we’ve deliberately chosen to simplify this complexity in our model for the following reasons.

      (1) In this work, we aim to build a minimalist model for two-domain allostery with only the most essential parameters for capturing experimental data. The simplicity of the model helps promote its mechanistic clarity and potential transferability to other allosteric systems.

      (2) Fewer parameters are needed in a simpler model. Our two-domain model currently uses only three biophysical parameters, which are all demonstrated to have distinct influences on the induction curve (see the main text section “System-level ramifications of the two-domain model”). This enables the inference of parameters with high precision for the mutants, and the quantification of the most essential mechanistic effects of their mutations, provided that the model is shown to accurately recapitulate the comprehensive dataset. Thus, we found it was unnecessary to add another parameter for explicitly describing inter-chain coupling, which would likely incur uncertainty in the inference of parameters due to the redundancy of their effects on induction data, and prevent the model from making faithful predictions.

      (3) From a more biological point of view, TetR is an obligate dimer, meaning that the two chains must synchronize for function, supporting the two-domain simplification of TetR for binding concerns.

      Additionally, as shown in the subsection “Inclusion of single-ligand-bound state of repressor” of section 1 of the supplementary file, incorporating the dimeric nature of TetR in our model by allowing partial ligand binding does not change the functional form of main text equation 1 in any practical sense. Therefore, considering all the factors stated above, we think that increasing the complexity of the two-domain model will only be necessary if additional data emerge to suggest the limitation of our model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an excellent work. I have only one suggestion for the authors. Interestingly, the authors also note that the epistatic interactions that they obtain are consistent with the structural features of the protein, which is not surprising. Within this framework, have the authors considered rescue mutations? Please see for example PMID: 18195360 and PMID: 15683227. If I understand right, this might further extend the applicability of their model. If so, the authors may want to add a comment to that effect.

      We thank the reviewer for the supportive comments and for pointing us to the useful references. We have added some comments to the main text regarding this point in line 332-336: “The diverse mechanistic origins of the rescuing mutations revealed here provide a rational basis for the broad distributions of such mutations. Integrating such thermodynamic analysis with structural and dynamic assessment of allosteric proteins for efficient and quantitative rescuing mutation design could present an interesting avenue for future research, particularly in the context of biomedical applications (PMID: 18195360, PMID: 15683227).”

      Reviewer #3 (Recommendations For The Authors):

      The authors should try to build a more realistic dimeric model for TetR to see if it could better explain experimental data. If it were too complicated for a revision, more discussions on the weakness of the current model should be given.

      We thank the reviewer for this valid concern and for the suggestion. The reasons for refraining from increasing the complexity of the model are fully discussed in our response to the reviewer’s public review given above. Primarily, we think that the value of a simple physical model (e.g., the paradigmatic Ising model in statistical physics and the classic MWC model) is two-fold: first, its mechanistic clarity and potential transferability make it a useful conceptual framework for understanding complex systems and establishing universal rules by comparing seemingly unrelated phenomena; second, it provides useful insights and design principles of specific systems if it can quantitatively capture the corresponding experimental data. Thus, given the current experimental data set, we believe it is justified to keep the two-domain model in its current form, while additional experimental data could necessitate a more complex model for TetR allostery in the future. Relevant discussions are added to the main text (line 443-446) and section 8 of the supplementary file.

      “It’s noted that the homodimeric nature of TetR is ignored in the current two-domain model to minimize the number of parameters, and additional experimental data could necessitate a more complex model for TetR allostery in the future (see supplementary file section 8 for more discussions).”

      Minor issues:

      (1) There is an error in Figure 3A, the 13th and 14th subgraphs are the same and should be corrected.

      We thank the reviewer for capturing this error, which has been corrected in the revised manuscript.

      (2) The criteria for the selection of mutants for analysis should be clearly given. Apart from deleting mutants that are in direct contact with the ligand or DNA, how many mutants are left, and how far are they from the two sites? In line 257, what are the criteria for selecting these 15 mutants? Similarly, in line 332, what are the criteria for selecting these 8 mutants?

      We thank the reviewer for this comment. The data selection criteria are now added in section 7 of the supplementary file. The distances to the DNA operator and ligand of the 21 residues under mutational study are now added in Table 1 (Figure 3-figure supplement 9). The added materials are referenced in the main text where relevant.

      “7. Mutation selection for two-domain model analysis

      In this work, there are 24 mutants studied in total including the WT, and they contain mutations at 21 WT residues. We did not perform model parameter inference for the mutant G102D because of its flat induction curve (see the second subsection of section 2 and main text Figure 2—figure Supplement 3). Therefore, there are 23 mutants analyzed in main text Figure 5.

      Measuring the induction curve of a mutant involves a significant amount of experimental effort and is therefore hard to extend to a large number of mutants. Nonetheless, we aim to compose a set of comprehensive induction data here for validating our two-domain model for TetR allostery. To this end, we picked 15 individual mutants in the first round of induction curve measurements, which contain mutations spanning different regions in the sequence and structure of TetR (main text Figure 3—figure Supplement 1). Such broad distribution of mutations across LBD, DBD and the domain interface could potentially lead to diverse induction curve shapes and mutant phenotypes for validating the two-domain model. Indeed, as discussed in the main text section "Extensive induction curves fitting of TetR mutants", the diverse effects on the induction curve from mutations perturbing different allosteric parameters, as predicted by the model, are successfully observed in these 15 experimental induction curves. Additionally, 5 of the 15 mutants contain a dead-rescue mutation pair, which helps us validate the model prediction that a dead mutation could be rescued by rescuing mutations that perturb the allosteric parameters in various ways.

      Eight mutation combinations were chosen for the second round of induction curve measurement for studying epistasis, where we paired up C203V and Y132A with mutations from different regions of the TetR structure. Such choice is largely based on two considerations. 1. As both C203V and Y132A greatly enhance the allosteric response of TetR, we want to probe why they cannot rescue a range of dead mutations as observed previously (PMID: 32999067). 2. C203V and Y132A are the only two mutants that show enhanced allosteric response in the first round of analysis. Combining detrimental mutations of allostery in a combined mutant could potentially lead to near flat induction curve, which is less useful for inference (see the second subsection of section 2).”

      Since the number of hotspots identified by DMS is not very large, why not analyze them all?

      We thank the reviewer for this comment. There are 41 hotspot residues in TetR (PMID: 36226916), which have 41*19=779 possible single mutations. It’s unfeasible to perform induction curve measurements for all of these 779 mutants in our current experiment. However, we agree that it would be helpful if we can obtain such a dataset in an efficient way.

      In line 257, there are 15 mutants mentioned, while in Figure 5, there are 23 mutants mentioned, in Figure 3-figure supplement 1, there are 21 mutants mentioned, and in line 226 of the supplementary file, there are 24 mutants mentioned, which is very confusing. Therefore, the data selection criteria used in this article should be given.

      We thank the reviewer for this comment. The data selection criteria are now given in section 7 of the supplementary file, which should clarify this confusion.

      (3) In Figure 4 of the Exploring epistasis between mutations section, the 6 weights of the additive models corresponding to each mutation combination are different. On one hand, it seems that there are no universal laws in these experimental data. On the other hand, unique parameters of a single mutation combination were not validated in other mutation combinations, which somewhat weakened the conclusions about the potential physical significance of these additive weights.

      We thank the reviewer for this comment. We admit that a quantitative universal law for tuning the 6 weights of the additive model does not manifest in our data, which indicates the mutation-specific nature of epistatic interactions in TetR as hinted in the different rescuing mutation distributions of different dead mutations (PMCID: PMC7568325). However, clear common trends in the weight tuning of combined mutants that contain common mutations do emerge, which comply with the structural features of the protein and provide explanations as to why C203V and Y132A don’t rescue a range of dead mutations (main text section “Exploring epistasis between mutations”). Additionally, the lack of a quantitative universal rule for tuning the 6 weights in our simple model doesn’t exclude the possibility of the existence of universal law for epistasis in TetR in another functional form, a point that could be explored in the future with more extensive joint experimental and computational investigations.

      In Eq. (27) of the supplementary file, the prior distribution of inter-domain coupling γ is given as a Gaussian distribution centered at 5 kBT. Since the absolute value of γ is important, can the authors explain why the prior distribution of γ is set to this value and what happens if other values are used?

      We thank the reviewer for the question. As explained in the corresponding discussions of Eq. (27) in the supplementary file, the prior of γ is chosen to serve as a soft constraint on its possible values based on the consideration that 1. inter-domain energetics for a TetR-like protein should be on the order of a few kBT; and 2. the prior distribution should reflect the experimental observation in the literature that γ has a small probability of adopting negative values upon mutations. Given our thorough validation of the statistical model and computational algorithm (see section 3 of the supplementary file), and the high precision in the parameter fitting results using experimental data (Figure 3 and Figure 4-figure supplement 2), we conclude that 1. the physical range of parameters encoded in their chosen prior distributions agrees well with the value reflected in the experimental data; 2. the inference results are predominantly informed by the data. Thus, changing the mean of the prior distribution of γ should not affect the inference results significantly given that it remains in the physical range.

      This point is explicitly shown in the added Table 2 (Figure 3-figure supplement 10), where we compare the current Bayesian inference results with those obtained after increasing the standard deviation of the Gaussian prior of γ from 2.5 to 5 kBT. As shown in the table, most inference results stay virtually unchanged at the use of this less informative prior, which confirms that they are predominantly informed by the data. The only exceptions are the slight increase of the inferred γ values for C203V, C203V-Y132A and C203V-G102D-L146A, reflecting the intrinsic difficulty of precise inference of large γ values with our model, as is already discussed in the second subsection of section 3 of the supplementary file. However, such observations comply with the common trend of epistatic interactions involving C203V presented in the main text and don’t compromise the ability of our model to accurately capture the induction curves of mutants. Relevant discussions are now added to the second subsection of section 3 of the supplementary file (line 368-385).

      “In our experimental dataset, such inference difficulty is only observed in the case of C203V, Y132A-C203V and C203V-G102D-L146A due to their large γ and γ + εL values (see main text Figure 3, Figure 3—figure Supplement 10 and Figure 4). As shown in main text Figure 3—figure Supplement 10, the inference results for the other 20 mutants stay highly precise and virtually unchanged after increasing the standard deviation of the Gaussian prior of γ (gstdγ ) from 2.5 to 5 kBT. This demonstrates that the inference results for these mutants are strongly informed by the induction data and there is no difficulty in the precise inference of the parameter values. On the other hand, the inferred γ values (especially the upper bound of the 95% credible region) for C203V, Y132A-C203V and C203V-G102D-L146A increased with gstdγ . This is because the induction curves in these cases are not sensitive to the value of γ given that it’s large enough as discussed above. Hence, when unphysically large γ values are permitted by the prior distribution, they could enter the posterior distribution as well. Such difficulty in the precise inference of γ values for these three mutants, however, doesn’t compromise the ability of our model in accurately capturing the comprehensive set of induction data (see part iv below). Additionally, the increase of the inferred γ value of C203V at the use of larger gstdγ complies with the results presented in main text Figure 4, which show that the effect of C203V on γ tends to be compromised when combined with mutations closer to the domain interface.”
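
      The prior-sensitivity argument above can be illustrated with a toy calculation. The sketch below uses a textbook conjugate Gaussian update for a single parameter with known measurement noise; it is not the authors' Bayesian inference procedure. Only the prior mean (5 kBT) and the two prior standard deviations (2.5 and 5 kBT) come from the response; the data values, noise levels, and function name are hypothetical.

```python
import math


def gaussian_posterior(prior_mean, prior_std, data, noise_std):
    """Posterior mean and std of a Gaussian mean with known noise_std.

    Standard normal-normal conjugate update; a toy stand-in, not the
    statistical model described in the supplementary file.
    """
    prior_prec = 1.0 / prior_std**2          # precision of the prior
    data_prec = len(data) / noise_std**2     # precision contributed by the data
    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec + sum(data) / noise_std**2) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical, tightly clustered "measurements" of a parameter (in kBT).
informative = [7.9, 8.1, 8.0, 8.2, 7.8]

# Data-dominated case: widening the prior barely moves the posterior.
narrow = gaussian_posterior(5.0, 2.5, informative, noise_std=0.2)
wide = gaussian_posterior(5.0, 5.0, informative, noise_std=0.2)

# Weakly informative case: one noisy observation, so the prior matters more.
weak_narrow = gaussian_posterior(5.0, 2.5, [12.0], noise_std=10.0)
weak_wide = gaussian_posterior(5.0, 5.0, [12.0], noise_std=10.0)

if __name__ == "__main__":
    print("informative data:", narrow, wide)
    print("weak data:       ", weak_narrow, weak_wide)
```

      When the data constrain the parameter tightly, the two priors give nearly the same posterior; when the data are weakly informative, the wider prior lets the posterior drift further from the prior mean, which is qualitatively the behavior the response describes for the three mutants with large γ.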

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study provides potentially fundamental insight into the function and evolution of daily rhythms. The authors investigate the function of the putative core circadian clock gene Clock in the cnidarian Nematostella vectensis. While in parts still incomplete, the evidence suggests that, in contrast to mice and fruit flies, Clock in this species is important for daily rhythms under constant conditions, but not under a rhythmic light/dark cycle, suggesting that the major role of the circadian oscillator in this species could be a stabilizing function under non-rhythmic environmental conditions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this nice study, the authors set out to investigate the role of the canonical circadian gene Clock in the rhythmic biology of the basal metazoan Nematostella vectensis, a sea anemone, which might illuminate the evolution of the Clock gene functionality. To achieve their aims the team generated a Clock knockout mutant line (Clock-/- ) by CRISPR/Cas9 gene deletion and subsequent crossing. They then compared wild-type (WT) with Clock-/- animals for locomotor activity and transcriptomic changes over time in constant darkness (DD) and under light/dark cycles to establish these phenotypes under circadian control and those driven by light cycles. In addition, they used Hybridization Chain Reaction-In situ Hybridization (HCR-ISH) to demonstrate the spatial expression of Clock and a putative circadian clock-controlled gene Myh7 in whole-mounted juvenile anemones.

      The authors demonstrate that under LD both WT and Clock-/- animals were behaviourally rhythmic but under DD the mutants lost this rhythmicity, indicating that Clock is necessary for endogenous rhythms in activity. With altered LD regimes (LD6:6) they show also that Clock is light-dependent. RNAseq comparisons of rhythmic gene expression in WT and Clock-/- animals suggest that Clock KO has a profound effect on the rhythmic genome, with very little overlap in rhythmic transcripts between the two phenotypes; of the rhythmic genes in both LD and DD in WT animals (220, termed clock-controlled genes, CCGs) 85% were not rhythmic in Clock-/- animals in either light condition. In silico gene ontology (GO) analysis of CCGs reflected processes associated with circadian control. Correspondingly, those genes rhythmic in KO animals under DD (here termed neoCCGs) were not rhythmic in WT, lacked upstream E-box motifs associated with circadian regulation, and did not display any GO enrichment terms. 'Core' circadian genes (as identified in previous literature) in WT and Clock-/- animals were only rhythmic under entrainment (LD) conditions whilst Clock-/- displayed altered expression profiles under LD compared to WT. Comparing CCGs with previous studies of cycling genes in Nematostella, the authors selected a gene from 16 rhythmic transcripts. One of these, Myh7, was detectable by both RNAseq and HCR-ISH and considered a marker of the circadian clock by the authors.

      The authors claim that the study reveals insights into the evolutionary origin of circadian timing; Clock is conserved across distant groups of organisms, having a function as a positive regulator of the transcriptional translational feedback loop at the heart of daily timing, but is not a central element of the core feedback loop circadian system in this basal species. Their behavioural and transcriptomic data largely support the claims that Clock is necessary for endogenous daily activity but that the putative molecular circadian system is not self-sustained under constant darkness (this was known already for WT animals)- rather it is responsive to light cycles with altered dynamics in Clock-/- specimens in some core genes under LD. In the main, I think the authors achieved their aims and the manuscript is a solid piece of important work. The Clock-/- animal is a useful resource for examining time-keeping in a basal metazoan.

      The work described builds on other transcriptomic-based works on cnidaria, including Nematostella, and does probe into the molecular underpinnings with a loss-of-function in a gene known to be core in other circadian systems. The field of chronobiology will benefit from the evolutionary aspect of this work and the fact that it highlights the necessity to study a range of non-model species to get a fuller picture of timing systems to better appreciate the development and diversity of clocks.

      Strengths:

      The generation of a line of Clock mutant Nematostella is a very useful tool for the chronobiological community and coupled with a growing suite of tools in this species will be an asset. The experiments seem mostly well conceived and executed (NB see 'weaknesses'). The problem tackled is an interesting one and should be an important contribution to the field.

      Weaknesses:

      I think the claims about shedding light on the evolutionary origin of circadian time maintenance are a little bold. I agree that the data do point to an alternative role for Clock in this animal in light responsiveness, but this doesn't illuminate the evolution of time-keeping more broadly in my view. In addition, these are transcriptomic data and so should be caveated- they only demonstrate the expression of genes and not physiology beyond that. The time-course analysis is weakened by its low resolution, particularly for the RAIN algorithm when 4-hour intervals constrain the analysis. I accept that only 24h rhythms were selected in the analysis from this but, it might be that detail was lost - I think a preferred option would be 2 or 3-hour resolution or 2 full 24h cycles of analysis.

      The authors discount the possibility of the observed 12h rhythmicity in Clock-/- animals by exposing them to LD6:6 cycles before free-running them in DD. I suggest that LD cycles are not a particularly robust way to entrain tidal animals as far as we know. Recent papers show inundation/mechanical agitation are more reliable cues (Kwiatkowski ER, et al. Curr Biol. 2023, 2;33(10):1867-1882.e5. doi: 10.1016/j.cub.2023.03.015; Zhang L., et al Curr Biol. 2013, 23;19, 1863-1873 doi.org/10.1016/j.cub.2013.08.038.) and might be more effective in revealing endogenous 12h rhythms in the absence of 24h cues.

      Response: We removed the suggestion that we used 6:6h LD to perform tidal entrainment. We generated this ultradian light condition to address the 24h rhythmicity observed in the NvClk1-/- in 12:12h LD.

      Reviewer #2 (Public Review):

      This manuscript addresses an important question: what is the role of the gene Clock in the control of circadian rhythms in a very primitive group of animals: Cnidaria. Clock has been found to be essential for circadian rhythms in several animals, but its function outside of Bilaterian animals is unknown. The authors successfully generated a severe loss-of-function mutant in Nematostella. This is an important achievement that should help in understanding the early evolution of circadian clocks. Unfortunately, this study currently suffers from several important weaknesses. In particular, the authors do not present their work in a clear fashion, neither for a general audience nor for more expert readers, and there is a lack of attention to detail. There are also important methodological issues that weaken the study, and I have questions about the robustness of the data and their analysis. I am hoping that the authors will be able to address my concerns, as this work should prove important for the chronobiology field and beyond. I have highlighted below the most important issues, but the manuscript needs editing throughout to be accessible to a broad audience, and referencing could be improved.

      Major issues:

      (1) Why do the authors make the claim in the abstract that CLOCK function is conserved with other animals when their data suggest that it is not essential for circadian rhythms? dCLK is strictly required in Drosophila for circadian rhythms. In mammals, there are two paralogs, CLOCK and NPAS2, but without them, there are no circadian rhythms either. Note also that the recent claim of BMAL1-independent rhythms in mammals by Ray et al., quoted in the discussion to support the idea that rhythms can be observed in the absence of the positive elements of the circadian core clock, had to be corrected substantially, and its main conclusions have been disputed by both Abruzzi et al. and Ness-Cohn et al. This should be mentioned.

      Response: According to our Behavioral and Transcriptomic data, CLOCK function is conserved in constant light condition. In LD context, the rhythmicity is maintained probably by the light-response pathway in Nematostella. We modified our rhythmic transcriptomic analysis and considered the context of the contested results by Ray et al., and discussed it in the revised manuscript.

      (2) The discussion of CIPC on line 222 is hard to follow as well. How does mRNA rhythm inform the function of CIPC, and why would it function as a "dampening factor"? Given that it is "the only core clock member included in the Clock-dependent CCGs," (220) more discussion seems warranted. Discussing work done on this protein in mammals and flies might provide more insight.

      Response: The initial sentence was unclear. Furthermore, since we restricted our rhythmic analysis to genes only found rhythmic with a p<0.01 with RAIN combined with JTK, NvCipc was no longer defined as rhythmic in free running.

      (3) The behavioral arrhythmicity seen with their Clock mutation is really interesting. However, what is shown is only an averaged behavior trace and a single periodogram for the entire population. This leaves open the possibility that individual animals are poorly synchronized with each other, rather than arrhythmic. I also note that in DD there seem to be some residual rhythms, though they do not reach significance. Thus, it is also possible that at least some individual animals retain weak rhythms. The authors should analyze behavioral rhythms in individual animals to determine whether behavioral rhythmicity is really lost. This is important for the solidity of their main conclusions.

      Response: Fig. 1 has been modified. We have separated the data for WT and NvClk1-/- animals to provide clarity on the average behavior pattern for each genotype. While the LSP analysis on the population average informs us about the synchronization of the population, it is true that it does not provide insight into individual rhythmicity. To address this, we analyzed individuals in all conditions using the Discorhythm website (Carlucci et al., 2019).

      In the revised figure, we have included a comparison plot of the acrophase of 24-hour rhythmic animals between genotypes using Cosinor analysis, which is most suitable for acrophase detection. This plot indicates the number of animals detected as significantly rhythmic, providing direct visual input to the reader regarding individual rhythmicity. Additionally, we have added Table 1, which contains the Cosinor period analysis (24 and 12 hours) of individuals for all genotypes and conditions, further enhancing the clarity of our findings.
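
      For readers unfamiliar with the method, the single-component Cosinor test applied to each animal can be sketched as follows. This is a minimal illustration and not the Discorhythm implementation: the function names `cosinor_fit` and `_solve`, the sampling grid, and the synthetic trace in the usage note are assumptions introduced here. The model y(t) = M + a·cos(2πt/T) + b·sin(2πt/T) is fitted by least squares at a fixed period T, the acrophase is the time of the fitted peak, and the zero-amplitude F-test (2 and n-3 degrees of freedom) has a closed-form p-value.

```python
import math


def _solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = mesor + a*cos(wt) + b*sin(wt) at a fixed period (hypothetical helper)."""
    n = len(values)
    if n <= 3:
        raise ValueError("need more than 3 samples for a 3-parameter fit")
    w = 2.0 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, a, b = _solve(xtx, xty)

    mean_y = sum(values) / n
    fitted = [mesor + a * r[1] + b * r[2] for r in rows]
    rss = sum((y - f) ** 2 for y, f in zip(values, fitted))
    tss = sum((y - mean_y) ** 2 for y in values)
    # Zero-amplitude F-test with (2, n-3) degrees of freedom.
    # For 2 numerator df the survival function is (1 + 2F/v)^(-v/2),
    # which simplifies to (rss/tss)^(v/2) with v = n - 3.
    ratio = min(1.0, rss / tss) if tss > 0 else 1.0
    p_value = ratio ** ((n - 3) / 2.0)

    return {
        "mesor": mesor,
        "amplitude": math.hypot(a, b),
        "acrophase_h": (math.atan2(b, a) / w) % period_h,  # time of fitted peak
        "p_value": p_value,
    }


# Synthetic trace (assumption): sampled every 2 h for 72 h, peak at hour 6.
times = list(range(0, 72, 2))
trace_24h = [10 + 3 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
result = cosinor_fit(times, trace_24h, period_h=24.0)
```

      Because the period is fixed in advance, such a fit reports whether an animal is significantly rhythmic at 24 h or 12 h, but it does not estimate the animal's own period.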

      (4) There is no mention in the results section of the behavior of heterozygotes. Based on supplement figure 2A, there is a clear reduction in amplitude in the heterozygous animals. Perhaps this might be because there is only half a dose of Clock, but perhaps this could be because of a dominant-negative activity of the truncated protein. There is no direct functional evidence to support the claim that the mutant allele is nonfunctional, so it is important to discuss carefully studies in other species that would support this claim, and the heterozygous behavior since it raises the possibility that the mutant allele acts as a dominant negative.

      Response: Extended Data Fig. 1 has been modified. For the NvClk1+/- population, we now show normalized locomotion over time in DD, a comparison of individual normalized behavior amplitude, the LSP of the population average, and the individual acrophases of the 24h-rhythmic individuals only. Indeed, we cannot discriminate a dominant-negative allele from a non-functional allele.

      (5) I do not understand what the bar graphs in Figure 2E and 3B represent - what does the y-axis label refer to?

      Response: Not relevant to the revised manuscript.

      (6a) I note that RAIN was used, with a p<0.05 cut-off. I believe RAIN is quite generous in calling genes rhythmic, and the p-value cut-off is also quite high. What happens if the stringency is increased, for example with a p<0.01.

      Response: We acknowledge your concern regarding the stringency of our statistical analysis. To address this, we opted to combine both RAIN and JTK methods and applied a more stringent p-value cut-off of p<0.01.
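
      The selection rule described above can be sketched as a simple intersection of two per-gene p-value tables. This is an illustration only: the gene identifiers and p-values below are hypothetical placeholders, not values from our dataset, and `rhythmic_in_both` is a name introduced here. It assumes the RAIN and JTK p-values have already been computed elsewhere.

```python
def rhythmic_in_both(rain_p, jtk_p, alpha=0.01):
    """Return genes whose p-value is below alpha with BOTH algorithms."""
    shared = rain_p.keys() & jtk_p.keys()  # genes tested by both methods
    return sorted(g for g in shared if rain_p[g] < alpha and jtk_p[g] < alpha)


# Hypothetical p-values for illustration only.
rain_p = {"gene_a": 0.002, "gene_b": 0.008, "gene_c": 0.030, "gene_d": 0.004}
jtk_p = {"gene_a": 0.001, "gene_b": 0.020, "gene_c": 0.005, "gene_d": 0.009}

selected = rhythmic_in_both(rain_p, jtk_p)
```

      A gene called rhythmic by only one of the two algorithms (here `gene_b` and `gene_c`) is excluded, which is what makes the combined criterion more stringent than either method alone.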

      (6b) It would be worth choosing a few genes called rhythmic in different conditions (mutant or wild-type. LD or DD), and using qPCR to validate the RNAseq results. For example, in Figure 3D, Myh7 RNAseq data are shown, and they do not look convincing. I am surprised this would be called a circadian rhythm. In wild-type, the curve seems arrhythmic to me, with three peaks, and a rather large difference between the first and second ZT0 time point. In the Clock mutants, rhythms seem to have a 12hr period, so they should not be called rhythmic according to the material and methods, which says that only ca 24hr period mRNA rhythms were considered rhythmic. Also, the result section does not say anything about Myh7 rhythms. What do they tell us? Why were they presented at all?

      Response: Regarding the suggestion for independent verification of our RNAseq results, we agree that such validation would enhance the robustness of our findings. To address this, we chose to overlap our identified rhythmic genes under WT LD conditions with those from another transcriptomic study that shared similarities in experimental design. Notably, the majority of overlapping rhythmic genes between the studies are candidate pacemaker genes. We believe that this replication of biologically significant rhythmic genes strengthens the validity and reliability of our results (see Extended Data Fig. 2).

      Furthermore, we have decided to remove the NvMhc-st (mistakenly named Myh7, only rhythmic in WT DD in the new analysis) as it does not contribute substantively to the revised version of the manuscript.

      (7) The authors should explain better why only the genes that are both rhythmic in LD and DD are considered to be clock-controlled genes (CCGs). In theory, any gene rhythmic in DD could be a CCG. However, Leach and Reitzel actually found that most genes in DD1 do not cycle the next day (DD2)? This suggests that most "rhythmic" genes might show a transient change in expression due to prolonged obscurity and/or the stress induced by the absence of a light-dark cycle, rather than being clock controlled. Is this why the authors saw genes rhythmic under both LD and DD as actual CCGs? I would suggest verifying that in DD the phase of the oscillation for each CCG is similar to that in LD. If a gene is just responding to obscurity, it might show an elevated expression at the end of the dark period of LD, and then a high level in the first hours of DD. Such an expression pattern would be very unlikely to be controlled by the circadian clock.

      Response: As we modified our transcriptomic analysis, we no longer analyze LD+DD rhythmic genes but rather any gene rhythmic (RAIN and JTK p<0.01) in each condition. As such, we end up with four lists of genes, one for each experimental condition.

      (8) Since there are still rhythms in LD in Clock mutants, I wonder whether there is a paralog that could be taking Clock's place, similar to NPAS2 in mammals.

      Response: see response to (1) > The only NPAS2 ortholog identified in Nematostella, NPAS3, showed marginal significance (p=0.013) with RAIN in LD WT, suggesting a regulation similar to that of the candidate pacemaker genes. As such, we included it in our list of candidate pacemaker genes.

      (9) I do not follow the point the authors try to make in lines 268-272. The absence of anticipatory behavior in Drosophila Clk mutants results from disruption of the circadian molecular clock, due to the loss of Clk's circadian function. Which light-dependent function of Clock are the authors referring to, then? Also, following this, it should be kept in mind that clock mutant mice have a weakened oscillator. The effect on entrainment is secondary to the weakening of the oscillator, rather than a direct effect on the light input pathway (weaker oscillators have increased response to environmental inputs). The authors thus need to more clearly explain why they think there is a conservation of circadian and photic clock function.

      Response: Following the changes in our statistical analysis, we reframed the discussion and now directly address the circadian and the photic clock functions (the latter is called the light-response pathway in the manuscript).

      Recommendations for the authors:

      We suggest the following improvements:

      (1) Please undertake a serious effort to make this work more accessible to non-marine chronobiologists. This includes better explanations, and schemes of the animal when images of staining are shown (e.g. Fig.1b) which include the labeling of relevant morphological structures mentioned in the text (like "tentacle endodermis and mesenteries" (line 132)). Similar issues for mentioned life cycle stages like "late planula stage" (line 133), "bisected physa" (line 149).

      Response: In Fig. 1b, we outlined the shape of the animal and added two arrows to locate the tentacle endodermis and mesenteries. We replaced the term "late planula stage" with "larvae" and rephrased "bisected physa" as "tissue sampling".

      Please attend to details. This includes:

      • Wrong referrals to figures (currently line 151 refers to EDF2- but should be EDF 1 instead, there is a Fig.3f mentioned in the text, but there is no such Fig.).

      Response: Fixed

      • Mentioning of ZTs when the HCR stainings were performed.

      Response: Fixed

      • Fig.1 a shows a rather incomplete and thus potentially confusing phylogenetic tree. Vertebrates have at least two Clk orthologs (NPAS2 and CLK), please include both, use an outgroup, and root the tree.

      Response: Identifying NPAS2 and CLK orthologs in all species added more confusion to the conclusion. However, we followed the suggestion to add an outgroup, using a CLK ortholog sequence identified in the sponge Amphimedon queenslandica, and rooted the tree. Thank you for the suggestion.

      • What do the y-axis labels in Figure 2E and 3B refer to exactly? Y-axis label annotations in Fig.3a,d are entirely missing- what do the numbers refer to?

      Response: not relevant in the revised manuscript

      • Fig.2D- is the Go term enrichment referring to LD or DD?

      Response: To DD. We made this clear in Figure 5.

      • Wording: "Clock regulates genetic pathways." What is meant by "genetic pathways"? There are no "non-genetic pathways". Could one simply say: "Clock regulates a variety of transcripts".

      Response: We modified our threshold to use only p.adj<0.01, which reduced the GO term numbers. We removed “genetic pathways” and now address the specific pathways: cell-cycle and neuronal.

      The use of the term "epistatic" is confusing (line 219), i.e. that light is epistatic to Clock. In genetics, epistasis is defined as the effect of gene interactions on phenotypes. To a geneticist, this implies that there is a second gene impacting on the phenotype of the Clock mutants. Please re-word.

      Response: “light is epistatic on Clock” has been re-phrased.

      The provided Supplementary tables are not well annotated. Several of them need guess-work about what is shown. For instance, for Supplementary Table 1, the Ns are unclear, which in total can go up to almost 200 per condition-genotype, but only about 30 animals for each were tested. Thus, where do the high totals in the LSP table come from? What do the numbers of each periodicity mean? Initially one might assume it was the number of animals that showed a periodogram peak at a given periodicity, but it seems that cannot be. Maybe it counted any period bin over statistical significance? Please clarify with better descriptions and labels.

      Response: Supplementary tables are now clearly annotated on their first tabs. Regarding Fig. 1, we already addressed this point in the public review.

      Albeit not essential, it would be more reader-friendly to also add a summary table with average period and SD, power and SD, and percentage rhythmicity to the main figure.

      Response: Table 1 has been added; it contains the count of individual animals found rhythmic (24h and 12h) with Cosinor. However, Discorhythm requires a specific period to be specified. Thus, we can only provide the number of animals that are significantly rhythmic for a given period value, not an estimate of each animal's own period.

      (2) Some of the terminology is quite confusing, in particular the double meaning of the word "clock" (i.e the pacemaker and the transcription factor). This is not a specific problem to this manuscript, but it would be helpful for the readability to try to improve this.

      Could the gene/transcript/protein be spelled: clk and Clk?

      Alternatively, for clarity- how about talking about "core pacemaker genes," "CLOCK-dependent rhythmic genes" and "CLOCK-independent rhythmic genes"?

      Response:

      Clock/CLOCK > NvClk / NvCLK and the mutant is NvClk1-/-

      Core clock genes > candidate pacemaker genes.

      CLOCK-dependent CCG > this notion no longer exists in the revised manuscript.

      CLOCK-independent CCG > this notion no longer exists in the revised manuscript.

      (3) The dismissal of the 12h rhythmicity in Clock-/- animals is not really convincing and should be reconsidered. LD6:6 cycles (before free-running animals in DD) is likely a not particularly robust way to entrain tidal animals. Recent papers show inundation/mechanical agitation are more reliable cues (Kwiatkowski ER, et al. Curr Biol. 2023, 2;33(10):1867-1882.e5. doi: 10.1016/j.cub.2023.03.015; Zhang L., et al Curr Biol. 2013, 23;19, 1863-1873 doi.org/10.1016/j.cub.2013.08.038.) and might be more effective in revealing endogenous 12h rhythms in the absence of 24h cues.

      Response: We removed the proposal that the 6:6 h LD cycle serves as tidal entrainment. Instead, the LD 6:6 experiment reveals the direct light dependency of the NvClk1-/- mutant.

      (4) There are significant questions raised on the validity of BMAL1-independent rhythms in mammals as suggested by the Ray et al study. See DOI: 10.1126/science.abe9230 and DOI: 10.1126/science.abf0922

      These technical comments should also be taken into account and the discussion adjusted accordingly to better reflect the ongoing discussions in the chronobiology field.

      Response: We modified our rhythmic analysis. As we cannot use BHQ or adjusted p-values, which resulted in very few genes, we defined genes as 24h-rhythmic if p<0.01 with two different algorithms (RAIN and JTK). We propose this compromise to reduce the risk of false positives. Furthermore, we discussed our methodology in light of the significant questions raised by the papers you cited. We thank the reviewer for this important point.

      (5) The HCR stainings for clk are not very convincing. Normally, HCR should have more dots. In principle, the logic of HCR is such that it detects individual mRNA molecules in the cell. Thus, having only one strong dot/cell like in Fig.1b doesn't make much sense.

      Response: We were the first to be surprised by this single-dot signal. We are experienced users of HCRv.3 across different species. We decided to remove the close-up (pending further investigation) but to keep the whole-animal signal. According to our approach, it is a convincing signal. However, the dotted nature of the signal makes it difficult to render highly visible in a picture at whole-animal scale. We did our best to make the mRNA signal visible without altering the pattern.

      Furthermore, the controls for the HCR in situ hybridization are unclear. In the methods, there are two Clock probes described (B3 & B5) and two control probes (B1 & B3), however, in the negative control image, a combination of one Clock (B1) and one control (B3) probes is used and is unclear what "redundant detection" means in the legend of figure S2.

      Response: Considering the nature of the signal (a single dot or a few dots), we decided to use two probes with two different fluorophores. Noise is by nature random. Our hypothesis was that only overlapping fluorescent dots are true NvClk mRNA signal.

      For control probes, we used two zebrafish probes labelling hypothalamic peptides.
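
      The overlap criterion stated in this response can be illustrated with a short sketch. It is not the image-analysis procedure used for the figure: the spot coordinates, the pixel units, the distance threshold `max_dist`, and the function name `colocalized_spots` are all assumptions made for illustration. The idea is simply that a dot detected in one fluorophore channel is retained only if a dot from the second channel lies within a small distance of it.

```python
import math


def colocalized_spots(channel_a, channel_b, max_dist=2.0):
    """Keep spots from channel_a that have a channel_b spot within max_dist."""
    kept = []
    for spot in channel_a:
        if any(math.dist(spot, other) <= max_dist for other in channel_b):
            kept.append(spot)
    return kept


# Hypothetical spot centers (x, y) in pixels, one list per fluorophore.
probe_1 = [(10.0, 10.0), (50.0, 50.0), (80.0, 20.0)]
probe_2 = [(11.0, 10.0), (90.0, 90.0), (80.5, 21.0)]

true_signal = colocalized_spots(probe_1, probe_2)
```

      Under this rule, a random dot present in only one channel (here the spot at (50, 50)) is treated as noise.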

      Based on the experience with non-Drosophila, non-mouse animal model systems the reviewers assume that non-sense mediated mRNA decay (NMD) is not strongly initiated upon Crispr-induced premature STOP-codons. If this assumption is correct it would be worth to mention it. Alternatively, it would be worth testing if Nematostella induces NMD, as this would be a great control for the HCR and the mutation itself. At which ZT was the HCR done?

      Response: We performed the HCR at ZT10 when NvClk is described to be at peak. It is now indicated in the Fig. 1b. The RNAseq detected a higher quantity of NvClk1 mRNA in the NvClk1-/- (see Fig. 4a). mRNA quantity regulation involves transcription, stabilization, and degradation. At this stage, we cannot identify which specific step is affected.

      For Fig.1c- please provide the binding site and sequence in the figure, simply include EDF 1 in the main figure.

      Response: The new Fig. 1c and EDF 1b now clearly indicate the protein domains, the CRISPR binding site, and the consequences for the DNA and amino-acid sequences.

      (6) Please provide the individual trace data for the behavioral analyses either as supplementary files or as a link to an openly accessible database like DRYAD (see also comment 7 in the public review of reviewer 2). Maybe this is what is shown in Supplementary Table 1, but it is really not clear what is actually shown.

      Response: Fig. 1 has been updated and Table 1 has been added. Supplementary Table 1 contains the individual normalized locomotor data of each polyp for each genotype and light condition. Supplementary Table 2 contains the individual Cosinor analysis of rhythmic behavior based on Supplementary Table 1.

      (7) It is not really clear if the mutation is a true loss-of-function or could also be dominant negative. While this is raised in the discussion, it should be more carefully considered. The reason why a dominant negative would be unlikely is unclear. More specifically also see comment 8) in the public review of reviewer 2.

      Response: Indeed, the results cannot tell us if it is a true loss of function, a dominant negative or non-functional allele. We addressed it in the first part of the discussion.

      (8) The pretty small overlap of rhythmic transcripts in LD and DD could reflect the true biology of a more core clock driven-process under constant conditions and a more light-driven process under LD. But still- wouldn't one expect that similar processes should be rhythmic? If not, why not?

      It would certainly add strength to the data if for one or two transcripts these results were independently verified by qPCR from an independent sampling. This could even be done for just two time points with the most extreme differences.

      Response: We appreciate the reviewer's comments and concerns regarding the overlap of rhythmic transcripts in different conditions. In response to the reviewer's query, we revised our interpretation of the transcriptomic data, acknowledging the limited overlap between light and genotype conditions in our study. This prompted us to reconsider the underlying biological processes driving rhythmic gene expression under constant conditions versus light-dark cycles.

      Regarding the suggestion for independent verification of our RNAseq results, we agree that such validation would enhance the robustness of our findings. To address this, we chose to overlap our identified rhythmic genes under WT LD conditions with those from another transcriptomic study that shared similarities in experimental design. Notably, the majority of overlapping rhythmic genes between the studies are candidate pacemaker genes. We believe that this replication of biologically significant rhythmic genes strengthens the validity and reliability of our results (see Extended Data Fig. 2).

      (9) Expression of myh7 : Checking for co-expression should be pretty straightforward by HCR. This is what this type of staining technique is really good for. Please do clk and myh7 co-staining if you want to claim co-expression. Otherwise don't make such a claim.

      Response: We agree that checking for co-expression should be straightforward by HCR. However, due to time constraints during the revision period, we are unable to conduct the double in-situ experiment. Additionally, upon careful consideration, we recognize that including myhc-st (mistakenly named myh7) staining and co-expression analysis would not significantly contribute to the main conclusions of our study. Therefore, we have decided to remove this analysis from the revised manuscript.

      (10) Missing methodological details:

      • The false discovery rate for each analysis should be included (see Hughes et al.,: "Guidelines for Genome-Scale Analysis of Biological Rhythms," 2017).

      Response: The FDR is indicated for each gene in Supplementary Table 3.

      • Fig.1f- continuous light- please provide a spectrum (if there is no good spectrophotometer available, please provide at least manufacturer information).

      Response: Unfortunately, we did not have a good spectrophotometer available during the revision. We added the reference of the lamp to the Methods. We found the light spectrum provided by the supplier; however, we did not add it to the revised manuscript.

      Author response image 1.

      Spectrum of the Aquastar t8

      Also, it would be easier for the reader, if the measurements of light intensity are provided in photons, because this is what the light receptors ultimately measure.

      Response: Modified.

      • Fig.2E- please add the consensus sequence used for circadian E-box vs. E-box to the figure.

      Response: In Fig. 4c of the revised manuscript, we show which E-box motifs we extracted for our promoter analysis. We also changed our analysis and no longer use HOMER; instead, we directly extracted promoter sequences, searched for the canonical E-box (CANNTG) and the circadian E-box (CACGTG), and generated a circadian E-box enrichment output per gene promoter.
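
      The motif search described in this response can be sketched as follows. This is a minimal illustration rather than the exact pipeline: the function name `ebox_counts`, the example sequence, and in particular the definition of the per-promoter output as the fraction of E-boxes that are circadian are assumptions made here. Both motifs are their own reverse complement, so scanning a single strand is sufficient; a lookahead is used so that overlapping motifs are all counted.

```python
import re

# Canonical E-box CANNTG; the lookahead lets overlapping matches be reported.
EBOX_PATTERN = re.compile(r"(?=(CA[ACGT]{2}TG))")
CIRCADIAN_EBOX = "CACGTG"


def ebox_counts(promoter_seq):
    """Count canonical and circadian E-boxes in one promoter sequence."""
    seq = promoter_seq.upper()
    hits = [match.group(1) for match in EBOX_PATTERN.finditer(seq)]
    n_canonical = len(hits)
    n_circadian = sum(hit == CIRCADIAN_EBOX for hit in hits)
    # Assumed summary statistic: share of E-boxes that are circadian.
    fraction = n_circadian / n_canonical if n_canonical else 0.0
    return {
        "canonical": n_canonical,
        "circadian": n_circadian,
        "circadian_fraction": fraction,
    }


# Hypothetical promoter fragment for illustration only.
example = "ttCACGTGaaCAGCTGccCATATGgg"
counts = ebox_counts(example)
```

      Note that every circadian E-box is also a canonical E-box, so the circadian count is always a subset of the canonical count.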

      (11) There has been some discussion about the evolutionary statement as stated by the authors. It appears that depending on the background of the reader, this can be misunderstood. We thus suggest to more clearly point out where the author thinks there is evolutionary conservation (a function for clk in the circadian oscillator under constant light or dark conditions) versus where there is no apparent evolutionary conservation (the situation under light-dark conditions).

      Response: In the revised manuscript we proposed a conserved function of NvCLK in constant darkness, and a light-response pathway compensating in LD conditions in the mutant.

      Please also consider the major comments 8 and 9 of the common review from reviewer 2.

      Reviewer #1 (Recommendations For The Authors):

      The hybridization chain-reaction ISH is OK but, I'm not sure I understand the control condition-this should be clarified. I would also welcome the use of Clock-/- animals in HCR as another, more direct level of control. In addition, the authors state that the Myh7 probes hybridise in anatomical regions resembling those for Clock (Fig 3e). It would be better to duplex these two probe sets with different fluors for a better representation of the relative spatial distributions of each transcript.

      Response: We agree that checking for co-expression should be straightforward by HCR. However, due to time constraints during the revision period, we are unable to conduct the double in-situ experiment. Additionally, upon careful consideration, we recognize that including myhc-st (mistakenly named myh7) staining and co-expression analysis would not significantly contribute to the main conclusions of our study. Therefore, we have decided to remove this analysis from the revised manuscript.

      We clarified in the methods the control probes design.

      Minor points:

      Figure legends do not all convey sufficient detail. For instance, Figure 1c needs a better explanation. Figure 3e- are these images both WT? Fig 3f doesn't exist and other figure text references do not align with figures and need an overhaul.

      Response: All errors have been fixed.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      (1) The authors need to introduce their model system better for a broad audience. What are the tissues/cells that express Clock at a higher level? What is their function, does this provide a potential explanation for their specific Clock expression, and how CLOCK might regulate behavior? Terms such as "tentacle endodermis and mesenteries" (line 132), "late planula stage" (line 133), "bisected physa" (line 149) would need some explanation.

      Response: We changed terms such as "planula" to "larvae" and "bisected physa" to "tissue samples".

      2) Some of the terminology used is quite confusing, because of the double-meaning of the word "clock" (i.e the pacemaker and the transcription factor). The authors use terms such as "clock-controlled genes", "core clock genes", "CLOCK-dependent clock-controlled genes", "neo-clock-controlled genes". Is there any way to help the reader? Here are several suggestions: "core pacemaker genes," "CLOCK-dependent rhythmic genes" and "CLOCK-independent rhythmic genes".

      Response: All the terminology has been clarified; see previous comments.

      3) Also in the abstract, there is mention of "hierarchal light- and Clock-signaling" (52-3) - is this related to the statement on line 219 that light is epistatic to Clock? I do not quite understand what epistatic would mean here. Who is upstream of whom? LD modifies rhythmicity in Clock mutant animals, but Clock mutations also impact rhythmicity in LD. Also, as epistasis is defined as the effect of gene interactions on phenotypes - what is the secondary gene impacting the phenotype of the Clock mutants? I am not sure the term epistatic is appropriate in the present context.

      Response: Indeed, "epistatic" is a genetic term that might be unclear in this context. We removed it.

      4) The control for the in situ hybridization is unclear. In the methods, there are two Clock probes described (B3 & B5) and two control probes (B1 & B3), however, in the negative control image, a combination of one Clock (B1) and one control (B3) probe is used, I am not sure what "redundant detection" means in the legend of figure S2. Also, the sequences of each Clock probe should be provided. It might be worth testing the Clock mutant the authors generated. Clock mRNA could be reduced due to non-sense, mediated RNA decay, since the mutation causes a premature stop codon. This would be a great additional control for the in situ hybridization. Even better would be if, by chance, the probes target the mutated sequence. The signal should then be completely lost.

      Response: HCR uses tiling probes, meaning that the target transcript is covered by dozens of successive "primer-like" DNA sequences, which is what enables the HCRv.3 technology. We cannot design a mutant-specific probe with this technology.

      (5) I have concerns with rhythmic-expression calls, particularly as there is so little overlap between LD and DD, and that a completely different set of rhythmic genes is observed in Clock mutant and wild-type animals. I am not an expert in whole-genome expression studies, so I hope one of my colleague reviewers can weigh in.

      When describing rhythmicity analysis in the Methods, it states that Benjamini-Hochberg corrections were applied to account for multiple comparisons. However, the false discovery rate for each analysis should be included (see Hughes et al.,: "Guidelines for Genome-Scale Analysis of Biological Rhythms," 2017).

      Response: As explained before, we cannot use Benjamini-Hochberg corrections, as only a few genes (mostly oscillator genes) pass the threshold. As such, we combined two different algorithms (RAIN and JTK) with p<0.01 to confidently detect rhythmic genes while reducing the risk of false positives.
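
      To make the contrast with an adjusted p-value concrete, the Benjamini-Hochberg step-up adjustment can be sketched as below. The p-values are hypothetical and chosen only to show how the adjustment behaves; `benjamini_hochberg` is a name introduced here and is not taken from RAIN, JTK, or any other package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)

n_raw = sum(p < 0.01 for p in raw)
n_adj = sum(q < 0.01 for q in adj)
```

      Because each adjusted value is at least as large as the raw one, thresholding on adjusted p-values can never select more genes than thresholding on raw p-values at the same cut-off, and with thousands of tested genes it typically selects far fewer.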

      Minor issues:

      (1) Environmental inputs are not "circadian", as written in the title.

      Response: Title modified

      (2) In the abstract, the description of the Clock mutant behavioral phenotypes is hard to follow, with no mention of whether or not Clock mutant animals are behaviorally rhythmic or arrhythmic in constant conditions.

      Response: corrected

      (3) Abstract: A 6/6 h LD cycle is not a compressed tidal cycle as written in the abstract. Light is not an input to tidal rhythms.

      Response: corrected

      (4) Line 101: timeout is not a core clock gene in animals.

      Response: we removed it from the candidate pacemaker genes.

      (5) What is the evidence for the role of PAR-Zip proteins in the Nematostella clock? The reference provided does not mention those.

      Response: There are no functional data in Nematostella yet to support their role within the pacemaker. However, based on their rhythmicity in LD and protein conservation, we included them in the list of candidate pacemaker genes. The references have been corrected.

      (6) Line 125. should refer to Fig 1C when describing the Clock protein.

      Response: corrected

      (7) Line 143-4. based on the figure, the region targeted by gRNA was not "close to the 5' end" as stated, it is closer to the middle of the gene sequence as shown in Figure 1C. A more accurate description would be a region in between the PAS domains.

      Response: Indeed we modified the figure and the text.

      (8) Line 150. The mutant allele is described as Clock1 initially, then for the rest of the paper as Clock-. Since it is not clear that the allele is a null (see major comment #8), Clock1 should be used throughout the manuscript.

      Response: the allele is named NvClk1 in the revised manuscript

      (9) Figure 2A, the second CT/ZT0 is misplaced.

      Response: Fig. 2 modified in the revised manuscript

      (10) Figure legend for 2E and 3B. "The 1000bp upstream ATG" is unclear. I guess it means that 1000bp upstream of the putative initiation codon was used.

      Response: Right, and in the revised version we analyzed 5 kb upstream of the putative ATG.

      (11) Line 164. The authors write "We discovered..." , but wasn't it already known that these animals are behaviorally rhythmic?

      Response: Fixed

      (12) It would be worth mentioning in the results section the reduced amplitude of rhythms in LL compared to DD (in WT and seemingly also in Clock mutants).

      Response: Indeed, we observed a significant reduction in the mean amplitude in the NvClk1-/- in DD and LL compared with WT and NvClk1-/- in LD, DD and LL. However, as rhythmicity is lost in virtually all mutants in LL and DD, we do not think these results add to the current interpretation of the gene function.

      (13) Please correct the figure numbers in the main text, there are several mistakes.

      Response: Done

      (14) Line 196, most genes in the quoted study did not cycle on day 2, so whether they are truly clock controlled is questionable.

      Response: We agree; identifying free-running cycling genes in cnidarians remains a challenge. One limitation of this study was the detection of genes that are rhythmic in LD and remain rhythmic in DD. However, considering different transcriptomic studies (cited in the discussion), it seems that in the phylum Cnidaria the genes rhythmic in LD are not necessarily those identified as rhythmic in DD.

      (15) Line 204-206 needs to be rephrased. It is confusing.

      Response: rephrased

      (16) Line 216. Rephrase to something like: "A similar finding was made for."

      Response: rephrased

      (17) "Clock regulates genetic pathways" sounds quite odd. Do you mean it regulates preferentially specific genetic (or maybe better, molecular) pathways?

      Response: rephrased

      (18) Figure 4 and legend: Dashed lines indicating threshold are missing. Do the black and red dots represent WT and Clock-/-, as indicated in the legend, or up/down, as indicated in the figures?

      Response: Fig. 5 has been modified accordingly. Colors in the volcano plot indicate up- (black) versus down- (red) regulated genes. This is now consistent within the figure.

      (19) Legend for Extended figure 1. "Immature peptide sequence" is incorrect.

      Response: rephrased

      (20) Extended data Figure 4. What the asterisks label is unclear.

      Response: EDF4 was modified and became EDF2, with different content. The * indicates NvClk mRNA.

      (21) Line 228. Gene "isoforms". I guess the authors mean "paralogs".

      Response: corrected.

      (22) Line 232-3/Figure 3e. Please include a comparable image of the Clk ISH to facilitate the comparison of the spatial expression pattern. In addition, where and what is the "analysis" referred to - "the spatial expression pattern of Myh7 closely resembled that of Clock, as evidenced by our analysis"?

      Response: The analysis has been removed from the revised manuscript because we currently cannot perform the double ISH.

      (23) Line 282-3. As mentioned above, it is difficult to be sure that circadian behavior is lost, if only looking at a population of animals.

      Response: Fig.1 corrected

      (24) Line 301-5. Rephrase.

      Response: Rephrased

      (25) Line 325. I am not convinced that the author can say that their mutant is amorphic. See Major comment 8.

      Response: corrected.

      (26) Line 351 "simplifying interactions with the environment". Please explain what is meant here.

      Response: this confusing sentence has been removed from the revised manuscript

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figures 1B, S4, and S5, Tibia sections would be more informative and promising as the growth plate is flat. Otherwise, histology of the knee would be preferred.

      We have added the tibia section images in Figures 1B, S4, and S5 (New Figure 1B, Figure 2-figure supplement 3A, and Figure 3-figure supplement 1A).

      (2) Figure 1C, The authors performed immunostaining for vimentin, alpha-SMA, Col1a1 and Col1a2. The authors should use adjacent sections for the immunostaining for different antibodies. It would avoid region-specific variations in the size and shape of sections and the data would be more reliable. Please correct and revise.

      We have provided immunostaining results using consecutive sections at similar locations of the external ear (Figure 1C).

      (3) Figure 2A and throughout the manuscript where authors performed p-smad1/5/9 fluorescent immunostaining, the authors should also show non-phospho levels of p-smad1/5/9. Please correct and revise.

      We have tried different anti-Smad1/5/9 antibodies and the signals have very high background and are not presentable. We instead did a western blot on auricle samples and the results are in Figure 2-figure supplement 1A, suggesting that ablation of Bmpr1a led to loss of activation of Smad1/5/9 without affecting their expression. For different segments of external ear, we also provided WB results in Figure 2-figure supplement 4B. In addition, we added RNA-seq data regarding the Smad1,5,9 mRNA levels, which were not affected by Bmpr1a ablation (Figure 4-figure supplement 1B). Overall, these results suggest that Bmpr1a ablation does not affect the expression of Smad1/5/9.

      (4) Result 2, lines 131-134, the authors mentioned in the text that they observed no ear phenotype of Prrx1CreERT or Bmpr1af/f mice compared with wild-type mice (Figures S2A and S2B). However, the figures did not show histology pictures of wild-type mice. Please correct and revise.

      We have provided histological pictures of wild type mice (Figure 2-figure supplement 2C).

      (5) Result 5, lines 173-174 "We generated....Bmpr1a floxed mice". How did authors generate Col1a2-CreERT; Bmpr1af/f mice by crossing Prrx1Cre-ERT and Bmpr1af/f mice? Please correct and revise.

      It is a typo and has been corrected.

      (6) In the previous study by Soma Biswas et al., (Scientific Reports 2018, PMID 29855498) the authors mentioned in the result section that the mice with deletion of Bmpr1a using Prx1Cre looked morphologically normal. They did not mention the ear phenotype/microtia. Please explain how this study differs from current work and what are the limitations in the discussion.

      We did not observe an obvious ear phenotype in the adult transgenic Prrx1-CreERT; Bmpr1af/f mice. The reason could be that the transgene labels too few auricle chondrocytes, as has been the case for endosteal and periosteal bones in adult mice (Liu et al. Nat Genet 2022; Wilk, K. et al. Stem Cell Rep 2017; Julien A et al. J Bone Miner Res 2022). The difference is likely caused by the fact that the transgenic CreERT line was driven by a 2.3 kilobase promoter of Prrx1 that was inserted at an unknown location in the genome. Since we no longer carry the transgenic line, we cannot directly test its labelling efficiency in the auricle. We have discussed this point in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Chondrocytes are present in many parts of the body; some components are replaced by osteoblast cells, but others stay with their morphology. These cells are in different morphological and cellular conditions throughout the body. Is there any human variant study of Prrx1 and its association with auricle chondrocytes?

      We searched the literature and found no study on Prrx1 in auricle chondrocytes in humans.

      Do auricle chondrocytes have Prrx1+ through their developmental stage, and what's the expression situation of Prrx1+ at articular cartilage and growth plates throughout development? Only a small population is positive throughout the development, or they lose as they develop.

      We traced Prrx1 lineage cells in Prrx1-CreERT; R26tdTomato mice that received TAM at E8.5, E13.5, or p21. We found that auricle chondrocytes were Tomato+ under these conditions, even when only one dose of TAM (1/10 of the dose for adult mice) was given to the pregnant mice at E8.5 or E13.5 (Figure 1-figure supplement 1). However, while E8.5 mice showed Tomato+ chondrocytes at both articular cartilage and growth plate, E13.5 or p21 mice showed much fewer Tomato+ chondrocytes at articular cartilage and growth plate (Figure 1-figure supplement 1). These results indicate that Prrx1 expression differs in cartilages during development, growth, and maintenance.

      What's your rationale for studying Bmpr1a ablation at the adult stage?

      Organ development and maintenance are different processes, especially for slow-turnover tissues. Organ maintenance is also important since it accounts for 90% of the lifetime of mice. While previous studies have uncovered essential roles for BMP signaling in chondrogenic differentiation during development, it remains unclear whether BMP signaling plays a role in cartilage maintenance in adult mice.

      Line no 128: Chondrocytes are shrunken but still have normal proliferation; what's the author's thought about it?

      Sorry that we did not make it clear enough. Actually there were very few cells undergoing proliferation in auricle cartilage and Bmpr1a ablation did not alter that. We have rephrased these sentences.

      Do chondrocytes have protein trafficking defects or ER/Golgi stress?

      We checked the expression of proteins involved in protein trafficking and found that some were up-regulated and some were down-regulated (Figure 4-figure supplement 1D), which may reflect the shift from chondrocytes to osteoblasts and warrants further investigation. However, the expression of ER or Golgi stress-related genes, which play critical roles in chondrocyte differentiation and survival (Wang et al. 2018; Horigome et al. 2020), was not altered by Bmpr1a ablation (Figure 4-figure supplement 1E and 1F).

      How many Prrx paralogs are there in the system? Are all associated with auricle chondrocytes and similar mechanisms?

      There is one Prrx1 paralog, Prrx2. While Prrx1-/- mice lived for up to 24 hours after birth with low-set ears (Martin JF et al. Genes Dev. 1995), Prrx2-/- mice are perfectly normal. Prx1-/-Prx2-/- double mutant mice died within an hour after birth and the pups showed no external ears (ten Berge D. et al. Development. 1998). We have added this information into the revised manuscript.

      Extracellular matrix (ECM) provides cell-to-cell interaction and environment for cell growth. Does Bmpr1a ablation lead to any changes in ECM at the auricle or growth plate chondrocytes?

      Our analysis showed that the expression of many ECM proteins was down-regulated in auricle cartilage of Prrx1-CreERT; Bmpr1af/f mice (Figure 4-figure supplement 1A). This may reflect the shift from chondrocytes to osteoblasts and warrants further investigation. However, immunostaining revealed that the expression of Aggrecan and Col10 in the growth plates was unaltered in adult Prrx1-CreERT; Bmpr1af/f mice compared to control mice (Figure 4-figure supplement 1C), likely due to the lack of marking of chondrocytes in growth plates.

      Microtia usually develops during the first trimester of pregnancy in humans. What's your view about studying at the adult stage compared to intrauterine development?

      Congenital microtia is a problem with the formation of external ear whereas microtia development in adult mice is a problem with the maintenance of the auricle chondrocytes. Organ maintenance is also an important process as it starts from 3 months of age and lasts for 90% of the lifetime of mice.

      In RNA sequencing protocol, Wikipedia pages keep updating, so it is very strange to cite the Wikipedia pages. Cite a research article for it.

      We have replaced this reference.

      Why do the authors have a very low FDR value for this study? How does this value strengthen the study?

      It was a typo that has been corrected.

      It needs further validation to show that Prrx1 marked cells are a good model for auricular chondrocyte-related studies.

      We show that Prrx1 marks auricle chondrocytes but few growth plate or articular chondrocytes in adult mice, suggestive of its specificity. However, the use of the Prrx1-CreERT line in auricle cartilage studies is complicated by the labelling of dermal cells in the external ear by Prrx1. We have discussed this point in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors address a fundamental unresolved question in cerebellar physiology: do synapses between granule cells (GCs) and Purkinje cells (PCs) made by the ascending part of the axon (AA) have different synaptic properties from those made by parallel fibers? This is an important question, as GCs integrate sensorimotor information from numerous brain areas with a precise and complex topography.

      Summary:

      The authors argue that GCs located close to PCs essentially contact PC dendrites via the ascending part of their axons. They demonstrate that joint high-frequency (100 Hz) stimulation of distant parallel fibers and local GCs potentiates AA-PC synapses, while parallel fiber-PC synapses are depressed. On the basis of paired-pulse ratio analysis, they concluded that evoked plasticity was postsynaptic. When individual pathways were stimulated alone, no LTP was observed. This associative plasticity appears to be sensitive to timing, as stimulation of parallel fibers first results in depression, while stimulation of the AA pathway has no effect. NMDA, mGluR1 and GABAA receptors are involved in this plasticity.

      Strengths:

      Overall, the associative modulation of synaptic transmission is convincing, and the experiments carried out support this conclusion. However, weaknesses limit the scope of the results.

      Weaknesses:

      One of the main weaknesses of this study is the suggestion that high-frequency parallel-fiber stimulation cannot induce long term potentiation unless combined with AA stimulation. Although we acknowledge that the stimulation and recording conditions were different from those of other studies, according to the literature (e.g. Bouvier et al 2016, Piochon et al 2016, Binda et al, 2016, Schonewille et al 2021 and others), high-frequency stimulation of parallel fibers leads to long-term postsynaptic potentiation under many different experimental conditions (blocked or unblocked inhibition, stimulation protocols, internal solution composition). Furthermore, in vivo experiments have confirmed that high-frequency parallel fibers are likely to induce long-term potentiation (Jorntell and Ekerot, 2002; Wang et al, 2009). This article provides further evidence that long-term plasticity (LTP and LTD) at this connection is a complex and subtle mechanism underpinned by many different transduction pathways. It would therefore have been interesting to test different protocols or conditions to explain the discrepancies observed in this dataset.

      Even though this is not the main result of this study, we acknowledge that the control experiments done on PF stimulation add a puzzling result to an already contradictory literature. High frequency parallel fibre stimulation (in isolation) has been shown to induce long term potentiation in vitro, but not always, and most importantly, this has been shown in vivo. This was in fact the reason for choosing that particular stimulation protocol. Examination of in vitro studies, however, shows that the results are variable and even contradictory. Most were done in the presence of GABAA receptor antagonists, including the SK channel blocker Bicuculline, whereas in the study by Binda (2016), LTP was blocked by GABAA receptor inhibition. In some studies also, LTP was under the control of NMDAR activation only, whereas in Binda (2016), it was under the control of mGluR activation. Moreover, most experiments were done in mice, whereas our study was done in rats. Our results reveal intricate mechanisms working together to produce plasticity, which are highly sensitive to in vitro conditions. We designed our experiments to be close to physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to reproduce PF-LTP, but it was not the aim of this study to dissect the subtleties of the different experimental protocols and models. We will modify the Discussion to describe that point fully, including differences in experimental conditions.

      Another important weakness is the lack of evidence that the AAs were stimulated. Indeed, without filling the PC with fluorescent dye or biocytin during the experiment, and without reconstructing the anatomical organization, it is difficult to assess whether the stimulating pipette is positioned in the GC cluster that is potentially in contact with the PC with the AAs. According to EM microscopy, AAs account for 3% of the total number of synapses in a PC, which could represent a significant number of synapses. Although the idea that AAs repeatedly contact the same Purkinje cell has been propagated, to the best of the review author's knowledge, no direct demonstration of this hypothesis has yet been published. In fact, what has been demonstrated (Walter et al 2009; Spaeth et al 2022) is that GCs have a higher probability of being connected to nearby PCs, but are not necessarily associated with AAs.

      We fully agree with the reviewer that we have not identified morphologically ascending axon synapses, and we stress this fact both in the first paragraph of the Results section, and again at the beginning of Discussion. Our point is mainly topographical, given the well documented geometrical organisation of the cerebellar cortex, and strictly speaking, inputs are local (including ascending axon) or distal (parallel fibre). Similarly, the studies by Isope and Barbour (2002) and Walter et al. (2009), just like Sims and Hartell (2005 and 2006), have coined the term ‘ascending axon’ when drawing conclusions about locally stimulated inputs. Moreover, our results do not rely on or assume multiple contacts, stronger connections, or higher probability of connections between ascending axons and Purkinje cells. Our results only demonstrate a different plasticity outcome for the two types of inputs. Therefore, our manuscript could be rephrased with the terms ‘local’ and ‘distal’ granule cell inputs, but this would have no more implication for the results or the computation performed in Purkinje cells. In our experience, however, this is more confusing to the reader, and as we already stress this point in the manuscript, we do not wish to make this modification. We will nevertheless modify the abstract of the manuscript to clarify that point.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a form of synaptic plasticity at synapses from granule cells onto Purkinje cells in the mouse cerebellum, which is specific to synapses proximal to the cell body but not to distal ones. This plasticity is induced by the paired or associative stimulation of the two types of synapses because it is not observed with stimulation of one type of synapse alone. In addition, this form of plasticity is dependent on the order in which the stimuli are presented, and is dependent on NMDA receptors, metabotropic glutamate receptors and to some degree on GABAA receptors. However, under all experimental conditions described, there is a progressive weakening or run-down of synaptic strength. Therefore, plasticity is not relative to a stable baseline, but relative to a process of continuous decline that occurs whether or not there is any plasticity-inducing stimulus.

      As highlighted by the reviewer, we observed a postsynaptic rundown of the EPSC amplitude for both input pathways. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation, and the progressive decrease of the EPSC amplitude during the course of an experiment leads to an underestimate of the absolute potentiation. We have taken the view to provide a strong set of control data rather than selecting experiments based on subjective criteria or applying a cosmetic compensation procedure. We have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown. Comparison shows a highly significant potentiation of the ascending axon EPSC. Depression of the parallel fibre EPSC, on the other hand, was not significantly different from rundown, and we have not spoken of parallel fibre long term depression. The data thus show very clearly that ascending axon and parallel fibre synapses behave differently following the costimulation protocol.

      Strengths:

      The focus of the authors on the properties of two different synapse-types on cerebellar Purkinje cells is interesting and relevant, given previous results that ascending and parallel fiber synapses might be functionally different and undergo different forms of plasticity. In addition, the interaction between these two synapse types during plasticity is important for understanding cerebellar function. The demonstration of timing and order-dependent potentiation of only one pathway, and not another, after associative stimulation of both pathways, changes our understanding of potential plasticity mechanisms. In addition, this observation opens up many new questions on underlying intracellular mechanisms as well as on its relevance for cerebellar learning and adaptation.

      Weaknesses and suggested improvements:

      A concern with this study is that all recordings demonstrate "rundown", a progressive decrease in the amplitude of the EPSC, starting during the baseline period and continuing after the plasticity-induction stimulus. In the absence of a stable baseline, it is hard to know what changes in strength actually occur at any set of synapses. Moreover, the issues that are causing rundown are not known and may or may not be related to the cellular processes involved in synaptic plasticity. This concern applies in particular to all the experiments where there is a decrease in synaptic strength.

      We have provided an answer to that point directly below the summary paragraph. Moreover, if the phenomenon causing rundown was involved in plasticity, it should affect plasticity of both inputs, which was not the case, clearly distinguishing the ascending axon and parallel fibre inputs.

      The authors should consider changes in the shape of the EPSC after plasticity induction, as in Fig 1 (orange trace) as this could change the interpretation.

      Figure 1 shows an average response composed of evoked excitatory and inhibitory synaptic currents. The third section of Supplementary material (supplementary figure 3) shows that this complex shape is given by an EPSC followed by a delayed disynaptic IPSC. We would like to point out that while separating EPSC from IPSC might appear difficult from average traces due to the averaged jitter in the onset of the synaptic currents, boundaries are much clearer when analysing individual traces. In the same section we discuss the results of experiments in which transient applications of SR 95531 before and after the induction protocol allowed us to measure the EPSC, while maintaining the experimental conditions during induction. Analysis of the kinetics of the EPSCs during gabazine application at the beginning and end of experiments showed that there is no change in the time to peak of either the AA or the PF response. The decay times of the AA and PF EPSCs are slightly longer at the end of the experiment, even if the difference is not significant for AA inputs (we will add this analysis to the revised version of the paper). Our analysis, which uses as a template the EPSC kinetics measured at the beginning and at the end of the experiments, takes these changes directly into account. The results show clearly that the presence of disynaptic inhibition doesn’t significantly affect the measure of the peak EPSC after the induction protocol nor the estimate of plasticity.

      In addition, the inconsistency with previous results is surprising and is not explained; specifically, that no PF-LTP was induced by PF-alone repeated stimulation.

      In our experimental conditions, PF-LTP was not induced when stimulating PF only, the only condition that reproduces experiments in the literature. As discussed in our response to reviewer 1, a close look at the literature, however, reveals variabilities and contradictions behind seemingly similar results. They reveal intricate mechanisms working together to produce plasticity, which are sensitive to in vitro conditions. We designed our experiments to be close to physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to observe PF-LTP. We will modify the discussion section to discuss that point fully in the context of past results.

      The authors test the role of NMDARs, GABAARs and mGluRs in the phenotype they describe. The data suggest that the form of plasticity described here is dependent on any one of the three receptors. However, the location of these receptors varies between the Purkinje cells, granule cells and interneurons. The authors do not describe a convincing hypothetical model in which this dependence can be explained. They suggest that there is crosstalk between AA and PF synapses via endocannabinoids downstream of mGluR or NO downstream of NMDARs. However, it is not clear how this could lead to the long-term potentiation that they describe. Also, there is no long-lasting change in paired-pulse ratio, suggesting an absence of changes in presynaptic release.

      We suggest in the result section that the transient change in paired pulse ratio (PPR) is linked to a transient presynaptic effect only, which has been reported by others. This suggests that the long lasting changes observed are postsynaptic, like other reports with similar trains of stimulation, and we will modify the manuscript to state this clearly.

      Concerning the involvement of multiple molecular pathways, investigators often tested for the involvement of NMDAR or mGluRs in cerebellar plasticity, rarely both. Here we showed that both pathways are involved. The conjunctive requirement for NMDAR and mGluR activation can easily be explained based on the dependence of cerebellar LTP and LTD on the concentrations of both NO and postsynaptic calcium (Coesmans et al., 2004; Safo and Regehr, 2005; Bouvier et al., 2016; Piochon et al., 2016). NO production has been linked to the activation of NMDARs in granule cell axons (Casado et al., 2002; Bidoret et al., 2009; Bouvier et al., 2016), occasionally in molecular layer interneurones (Kono et al., 2019). NO diffuses to activate guanylate cyclase in the Purkinje cell. Based on the literature also, different mechanisms can feed a calcium increase, including mGluR activation. Therefore NMDARs and mGluRs can reasonably cooperate to control postsynaptic plasticity. The associative nature of AA-LTP, i.e. the requirement for co-activation of AA and PF inputs, is more complex to explain and indicates a necessary cross talk between synaptic sites. We propose that either one of the receptors is absent from AA synapses, and a signal needs to propagate from PF to AA synapses, or that both receptors are present but a signal is required to activate one of the receptors at AA synapses.

      We also observed an effect of GABAergic inhibition. GABAergic inhibition was elegantly shown by Binda (2016) to regulate calcium entry together with mGluRs, and control plasticity induction. A similar mechanism could contribute to our results, although inhibition might have additional effects. We will modify the discussion of the manuscript and add a diagram to highlight the links between the different molecular pathways and potential cross talk mechanisms, and the location of receptors.

      Is the synapse that undergoes plasticity correctly identified? In this study, since GABAergic inhibition is not blocked for most experiments, PF stimulation can result in both a direct EPSC onto the Purkinje cell and a disynaptic feedforward IPSC. The authors do address this issue with Supplementary Fig 3, where the impact of the IPSC on the EPSC within the EPSC/IPSC sequence is calculated. However, a change in waveform would complicate this analysis. An experiment with pharmacological blockade will make the interpretation more robust. The observed dependence of the plasticity on GABAA receptors is an added point in favor of the suggested additional experiments.

      We did consider that due to long recording times there might be kinetic changes, and that’s the reason why the experiments of Supplementary figure 3 were done with pharmacological blockade of GABAAR with gabazine, both before and again after LTP induction. The estimate of the amplitude of the EPSC is based on the actual kinetics of the response at both times.

      A primary hypothesis of this study is that proximal, or AA, and distal, or PF, synapses are different and that their association is specifically what drives plasticity. The alternative hypothesis is that the two synapse-types are the same. Therefore, a good control for pairing AA with PF would be to pair AA with AA and PF with PF, thereby demonstrating that pairing with each other is different from pairing with self.

      Pairing AA with AA would be difficult because stimulation of AA can only be made from a narrow band below the PC and we would likely end up stimulating overlapping sets of synapses. However, Figure 5 shows the effect of stimulating PF and PF, while also mimicking the sparse and dense configuration of the usual experiment. It shows that sparse PF do not behave like AA. Sims and Hartell (2006) also performed an experiment with sparse PF inputs and observed clear differences between sparse local (AA) and sparse distal (PF) synapses.

      It is hypothesized that the association of a PF input with an AA input is similar to the association of a PF input with a CF input. However, the two are very different in terms of cellular location, with the CF input being in a position to directly interact with PF-driven inputs. Therefore, there are two major issues with this hypothesis: 1) how can sub-threshold activity at one set of synapses affect another located hundreds of micrometers away on the same dendritic tree? 2) There is evidence that the CF encodes teaching/error or reward information, which is functionally meaningful as a driver of plasticity at PF synapses. The AA synapse on one set of Purkinje cells is carrying exactly the same information as the PF synapses on another set of Purkinje cells further up and down the parallel fiber beam. It is suggested that the two inputs carry sensory vs. motor information, which is why this form of plasticity was tested. However, the granule cells that lead to both the AA and PF synapses are receiving the same modalities of mossy fiber information. Therefore, one needs to presuppose different populations of granule cells for sensory and motor inputs or receptive field and contextual information. As a consequence, which granule cells lead to AA synapses and which to PF synapses will change depending on which Purkinje cell you're recording from. And that's inconsistent with there being a timing dependence of AA-PF pairing in only one direction. Overall, it would be helpful to discuss the functional implications of this form of plasticity.

      We do not hypothesise that association of the AA and PF inputs is similar to the association of PF and climbing fibre inputs. We compare them because it is the only other known configuration triggering associative plasticity in Purkinje cells. We conclude that ‘The climbing fibre is not the only key to associative plasticity’, and it is indeed interesting to observe that even if the inputs are very small compared to the powerful climbing fibre input, they can be effective at inducing plasticity. Physiologically, the climbing fibre signal has been clearly linked to error and reward signals, but reward signals are also encoded by granule cell inputs (Wagner et al., 2017). We will modify the discussion to make sure that we do not suggest equivalence with CF induced LTD.

      Moreover, we fully agree that AA and PF synapses made up by a given granule cell carry the same information, and cannot encode sensory and motor information at the same time. Yet, these synapses carry different information. AA synapses from a local granule cell deliver information about the local receptive field, but PF synapses from the same granule cell will deliver contextual information about that receptive field to distant Purkinje cells. In the context of sensorimotor learning, movement is learnt with respect to a global context, not in isolation, therefore learning a particular association must be relevant. The associative plasticity we describe here could help explain this functional association. Difference in timing of the inputs therefore should represent difference in the timing of activation of different granule cells which receive either local information or information from different receptive fields. We will modify the discussion to make sure we do not suggest association between sensory and motor inputs, and clarify our view of local receptive field and context about ongoing activity.

      Reviewer #3 (Public Review):

      Granule cells' axons bifurcate to form parallel fibers (PFs) and ascending axons (AAs). While the significance of PFs on cerebellar plasticity is widely acknowledged, the importance of AAs remains unclear. In the current paper, Conti and Auger conducted electrophysiological experiments in rat cerebellar slices and identified a new form of synaptic plasticity in the AA-Purkinje cell (PC) synapses. Upon simultaneous stimulation of AAs and PFs, AA-PC EPSCs increased, while PFs-EPSCs decreased. This suggests that synaptic responses to AAs and PFs in PCs are jointly regulated, working as an additional mechanism to integrate motor/sensory input. This finding may offer new perspectives in studying and modeling cerebellum-dependent behavior. Overall, the experiments are performed well. However, there are two weaknesses. First, the baseline of electrophysiological recordings is influenced significantly by run-down, making it difficult to interpret the data quantitatively. The amplitude of AA-EPSCs is relatively small and the run-down may mask the change. The authors should carefully reexamine the data with appropriate controls and statistics. Second, while the authors show AA-LTP depends on mGluR, NMDA receptors, and GABA-A receptors, which cell types express these receptors and how they contribute to plasticity is not clarified. The recommended experiments may help to improve the quality of the manuscript.

      As highlighted by the reviewer and developed above in response to reviewer 2, we observed a postsynaptic rundown of the EPSC amplitude. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation. Moreover, we have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown, and provide a baseline. Comparison shows a highly significant potentiation of the ascending axon EPSC, relative to baseline and relative to these control experiments. Depression of the parallel fibre EPSC on the other hand was not significantly different from rundown. For that reason we have not spoken of parallel fibre long term depression. The data, however, show that ascending axon and parallel fibre synapses behave very differently following the costimulation protocol.

      We have discussed above in our response to reviewer 2 the potential involvement of mGluRs, NMDARs and GABAARs. We will modify the discussion of the manuscript and add a diagram to highlight the links between the different molecular pathways and potential cross talk mechanisms, and the location of receptors.

    1. Author Response:

      We greatly appreciate the insightful feedback provided by the reviewers and the editor on our manuscript titled "Automated workflow for the cell cycle analysis of non-adherent and adherent cells using a machine learning approach". We will provide a revised version of the manuscript aiming to address the comments and recommendations provided by the reviewers to enhance the quality and clarity of our work. In detail:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript proposes a series of steps using the FIJI environment. The authors have created a plugin for the initial steps of the process: merging images into an RGB stack, converting to HSV, and then using brightness for reference and hue to distinguish the phases of the cycle. Then, the well-known TrackMate plugin was used to identify single cells and extract intensities. The data were further post-processed in R, where a series of steps (smoothing, scaling, and handling of missing frames) was used to train a random forest. Hard-coded values of hue were used to distinguish G1, S, and G2/M. The process was validated with a score comparing the quality of the tracks, and the authors reported the successful measurement of the cell cycles.
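      The hue-based step summarized above can be illustrated with a minimal sketch. It is not the authors' plugin: the hue cutoffs, the brightness floor, and the function name `classify_phase` are hypothetical, and the color-to-phase mapping assumes the commonly described FUCCI scheme (red in G1, both reporters present around G1/S, green in S/G2/M).

```python
import colorsys

# Hypothetical hue cutoffs in degrees (assumption: not the manuscript's values).
# Each entry is (label, lower bound inclusive, upper bound exclusive).
HUE_CUTOFFS = (
    ("G1", 0.0, 40.0),        # mostly red signal
    ("G1/S", 40.0, 80.0),     # red and green coexist (yellowish hue)
    ("S/G2/M", 80.0, 180.0),  # mostly green signal
)

# Hypothetical brightness floor below which no phase is assigned.
MIN_BRIGHTNESS = 0.1


def classify_phase(red, green, blue=0.0):
    """Assign a phase label from normalized channel intensities in [0, 1]."""
    hue, _saturation, value = colorsys.rgb_to_hsv(red, green, blue)
    if value < MIN_BRIGHTNESS:
        # Too dim to trust the hue, e.g. when both reporters are weak.
        return "unassigned"
    hue_deg = hue * 360.0
    for label, lower, upper in HUE_CUTOFFS:
        if lower <= hue_deg < upper:
            return label
    return "unassigned"


if __name__ == "__main__":
    for rgb in [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.02, 0.03)]:
        print(rgb, classify_phase(*rgb))
```

      The brightness floor is included because dim frames make the hue unreliable; where to place it, and where to place the hue cutoffs, would depend on the acquisition conditions.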

      Strengths:

      The implementation of the pipeline seems easy, although it requires two separate platforms: Fiji and R. A similar approach could be implemented in a single programming environment like Python or Matlab and there would not be any need to export from one to the other. However, many labs have similar setups and that is not necessarily a problem.

      Weaknesses:

      I found two important weaknesses in the proposal:

      (1) The pipeline relies on a large number of hard-coded conditions: size of Gaussian blur (Gaussian should be written in uppercase), values of contrast, size of filters, levels of intensity, etc. Presumably, the authors followed a heuristic approach and tried values of these and concluded that the ones proposed were optimal. A proper sensitivity analysis should be performed. That is, select a range of values of the variables and measure the effect on the output.

      (2) Linked to the previous comments. Other researchers that want to follow the pipeline would have either to have exactly the same acquisition conditions as the manuscript or start playing with values and try to compensate for any difference in their data (cell diameter, fluorescent intensity, etc.) to see if they can match the results of the manuscript.
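      The kind of sensitivity analysis requested in point (1) can be sketched as a parameter sweep. The example below is a toy under stated assumptions: it uses a one-dimensional intensity profile instead of an image, a hand-written Gaussian blur, and a simple threshold-based object count; all function names and parameter values are hypothetical and are not taken from the manuscript.

```python
import math
from itertools import product


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Gaussian blur of a 1D signal, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    smoothed = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += weight * signal[j]
        smoothed.append(acc)
    return smoothed


def count_objects(signal, sigma, threshold):
    """Count contiguous runs above the threshold after smoothing."""
    count, inside = 0, False
    for value in smooth(signal, sigma):
        above = value > threshold
        if above and not inside:
            count += 1
        inside = above
    return count


def sensitivity_table(signal, sigmas, thresholds):
    """Object count for every (sigma, threshold) combination."""
    return {(s, t): count_objects(signal, s, t)
            for s, t in product(sigmas, thresholds)}


# Toy profile with three bright "cells" separated by dark gaps.
profile = [0] * 5 + [10] * 4 + [0] * 6 + [10] * 4 + [0] * 6 + [10] * 4 + [0] * 5
table = sensitivity_table(profile, sigmas=[0.5, 1.0, 4.0], thresholds=[2.0, 5.0])

if __name__ == "__main__":
    for (sigma, threshold), n in sorted(table.items()):
        print(f"sigma={sigma:<4} threshold={threshold:<4} objects={n}")
```

      On this toy profile the count is stable for light smoothing and changes once the blur width becomes comparable to the spacing between objects. Reporting such a table for each hard-coded parameter would show over which range the output is robust.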

      We thank Reviewer #1 for the insightful comments. We acknowledge the importance of ensuring the reproducibility and robustness of our pipeline across different sample types and acquisition conditions and, consequently, image S/N ratios and resolutions. To address the concerns regarding the reliance on hard-coded conditions and the impact of varying parameter values on the output, we will complete the Methods section of the manuscript and the “Usage” section of the README file in the GitHub repository (https://github.com/ieoresearch/cellcycle-image-analysis), providing a summary of best practices that should be applied in the pre-processing part of the analysis. As an example, the usable image filter types and their settings for cells of different sizes, fluorescence intensities and acquisition conditions will be analysed in detail and general guidelines will be provided.

      Moreover, we will provide detailed documentation on the acquisition conditions required for reproducibility in the README file and Methods section.

      For the Tracking Analysis part, we will refer to the well documented TrackMate tutorial to adapt the tracking analysis to different cell types, image resolution and intensities.

      Reviewer #2 (Public Review):

      Summary:

      This paper presents an automated method to track individual mammalian cells as they progress through the cell cycle using the FUCCI system. The method is applied to different tumor cell lines that grow in suspension to determine their cell cycle profile and the effect of drugs that directly affect the cell cycle on progression through the cell cycle over a 72-hour period.

      Strengths:

      This is a METHODS paper. The one potentially novel finding is that they can identify cells that are at the G1-S transition by the change in color as one protein starts to go up and the other one goes down, similar to the change seen as cells enter G2/M.

      Weaknesses:

      They did not clearly indicate whether the G1/S cells are identified automatically or need to be identified by the person reviewing the data. In Figures 1 and S1, the movie shows cells with no color at a time corresponding to what is about the G1/S transition. Their assigned cell cycle phase is shown in Figure 1 but not in Figure S1. None of these pictures show the G1/S cells that they talk about being able to detect with a different color.

      Thank you for your valuable feedback regarding the identification of G1/S cells in our pipeline. To clarify, the G1/S phase identification process is entirely automated within our pipeline. We apologize for any confusion caused by the lack of explicit indication in our manuscript. We will ensure to update the manuscript to clearly state that the identification of G1/S cells is performed automatically by our algorithm, eliminating the need for manual intervention.

      Regarding the visualization of G1/S cells in Figures 1 and S1, we will revise the figures to include all the available frames corresponding to the G1/S transition. It is important to note that during this transition, fluorescence intensities for both the green and the red channels are dimmer than their intensity levels during the G2/M transition. This can result in frames that seem visually darker, despite both colors coexisting at the same time point. In our revised figures, we will include all available frames relevant to the G1/S transition and provide a clearer representation of this phenomenon.

      In response to Reviewer #2's recommendation, we plan to conduct additional experiments to further validate our observations. We will utilize the EdU technology to highlight the S-phase in FUCCI cells, allowing for better discrimination between the red and green fluorescence of the FUCCI reporter during the initial S-phase.

      Additionally, we acknowledge that the link to the Docker container (https://hub.docker.com/r/emanuelsoda/rf_semi_sup) was not included in the manuscript. We apologize for this oversight, and it will be included in the revised version of the paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture, including by scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement) together with their subsets, based on their transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand upon vaccination. Thus, the authors concluded that highly-responding S-reactive T cells were established by vaccination from rare clonotypes.

      An account of the major strengths and weaknesses of the methods and results.

      Strengths:

      Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses were properly addressed in the revised manuscript, and I do not have any additional concerns.

      We appreciate the reviewer for the constructive comments and recommendations, which were a great help for us to improve our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells in people who receive SARS-CoV2 mRNA vaccines. To do this, the authors recruited a cohort of Covid-19 naive individuals who received the SARS-CoV2 mRNA vaccines and collected sera and PBMC samples at different timepoints. They then generated three main sets of data: 1) Anti-S protein antibody titers at all timepoints. 2) A single-cell RNAseq/TCRseq dataset for divided T cells after stimulation by S protein for 10 days. 3) Corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: A) Individuals having a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset. B) S-reactive T cells do exist before the vaccination, but they seem to be unable to respond to Covid-19 vaccination properly.

      The paper's strength is that it uses a very systematic and thorough strategy to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship of Tfh clonotypes/sustained antibody and about the S-reactive clones that exist before the vaccination. The conclusion is solid in general but some claims are overstated. My suggestion is that the authors should further limit their claims in the abstract, for example,

      "Even before vaccination, S-reactive CD4+ T cell clonotypes did exist, most of which (MAY) cross-reacted with environmental or symbiotic bacteria" -- The paper does not have experimental evidence to show that these TCR clones respond to these epitopes.

      We thank the reviewer for pointing out the insufficient demonstration of experimental evidence. We have added the relevant data to Fig. S5 in the newly revised manuscript.

      "These results suggest that de novo acquisition of memory Tfh-like cells upon vaccination (LIKELY) contributes to the longevity of anti-S antibody titers." -- Given the small sample size and the lack of statistical significance, this claim is overstated.

      "S-reactive T cell clonotypes detected immediately after 2nd vaccination polarized to follicular helper T (Tfh)-like cells (UNDER IN VITRO CULTURE)". -- The conclusion was based on in vitro cultured cells, which is a limitation.

      We thank the reviewer for the helpful suggestion. We have corrected some sentences in line with these suggestions in the newly revised manuscript.

      Recommendations for the authors:

      Please note: Though most of the overstatement was removed from the original manuscript, authors still need to modify some of the statements in "Abstract".

      We thank the reviewer for carefully reading our manuscript and giving us detailed suggestions. We have modified these statements in “Abstract” accordingly in the newly revised manuscript.

    1. Author Response

      The following is the authors’ response to the current reviews.

      At this stage the referees had only minor comments. Referee #1 asked whether archerfish indeed generalize in egocentric rather than allocentric coordinates. The current results may not rule out the possibility that archerfish, being unaware of changes in body position, simply continue with previously successful actions, which would appear as egocentric generalization. We agree with referee #1, and we have updated lines 255-260 in the Results and added text in lines 329-336 of the Discussion that mentions this possibility. Referee #2 mentioned that a portion of the fish did not make it to the final test, which raises the question of whether all individuals are able to solve the task. We agree with referee #2 and have added a paragraph to the Discussion section to mention this point (lines 384-388). We also added the salinity of the water in the water tanks (line 98), as suggested by Referee #2. Referee #2 suggested using a different term than “washout” in the behavioral experiments. Since the term “washout” is standard in the field, we have kept the term in the text.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful study explores how archerfish adapt their shooting behavior to environmental changes, particularly airflow perturbations. It will be of interest to experts interested in mechanisms for motor learning. While the evidence for an internal model for adaptation is solid, evidence for adaptation to light refraction, as initially hypothesized, is inconclusive. As such, the evidence supporting an egocentric representation might be caused by alternative mechanisms to airflow perturbations.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors examined whether archerfish have the capacity for motor adaptation in response to airflow perturbations. Through two experiments, they demonstrated that archerfish could adapt. Moreover, when the fish flipped its body position with the perturbation remaining constant, it did not instantaneously counteract the error. Instead, the archerfish initially persisted in correcting for the original perturbation before eventually adapting, consistent with the notion that the archerfish's internal model has been adapted in egocentric coordinates.

      Evaluation:

      The results of both experiments were convincing, given the observable learning curve and the clear aftereffect. The ability of these fish to correct their errors is also remarkable. Nonetheless, certain aspects of the experiment's motivation and conclusions temper my enthusiasm.

      (1) The authors motivated their experiments with two hypotheses, asking whether archerfish can adapt to light refractions using an innate look-up table as opposed to possessing a capacity to adapt. However, the present experiments are not designed to arbitrate between these ideas. That is, the current experiments do not rule out the look-up table hypothesis, which predicts, for example, that motor adaptation may not generalize to de novo situations with arbitrary action-outcome associations. Such look-up table operations may also show set-size effects, whereas other mechanisms might not. Whether their capacity to adapt is innate or learned was also not directly tested, as noted by the authors in the discussion. Could the authors clarify how they see their results positioned in light of the two hypotheses noted in the Introduction?

      We agree with the referee that look up tables only confuse the issue. The question we tested is whether or not the fish uses adaptation mechanisms to correct its shooting. We have now changed the introduction both to eliminate the entire question of look up tables and also to clarify that both innate mechanisms and learning mechanisms can contribute to fish shooting, and that our research focuses on the question of whether the fish can adapt to a perturbation in its shooting caused by a change in its physical environment.

      (2) The authors claim that archerfish use egocentric coordinates rather than allocentric coordinates. However, the current experiments do not make clear whether the archerfish are "aware" that their position was flipped (as the authors noted, no visual cues were provided). As such, for example, if the fish were "unaware" of the switch, can the authors still assert that generalization occurs in egocentric coordinates? Or simply that, when archerfish are ostensibly unaware of changes in body position, they continue with previously successful actions.

      The fish has access to information about the body position switch: there are cues in the water tank that can help the fish orient inside it. In contrast, there are no cues to the presence or direction of the air flow above the water tank. Moreover, previous experience has shown that the fish is sensitive to visual cues and uses them to achieve consistent orientation within the tank when possible. These points have been added to the main text [lines 143-144, 254-257].

      (3) The experiments offer an opportunity to examine whether archerfish demonstrate any savings from one session to another. Savings are often attributed to a faster look-up table operation. As such, if archerfish do not exhibit savings, it might indicate a scenario where they do not possess a refined look-up table and must rely on implicit mechanisms to relearn each time.

      This is an important question. Indeed, we looked for the ‘saving’ effect in the data, but its noisy nature prevented us from drawing a concrete conclusion. We now mention this in lines 247-249.

      We have also eliminated the discussion of look up tables from the article.

      (4) The authors suggest that motor adaptation in response to wind may hint at mechanisms used to adapt to light refraction. However, how strong of a parallel can one draw between adapting to wind versus adapting to light refraction? This seems important given the claims in this paper regarding shared mechanisms between these processes. As a thought experiment, what would the authors predict if they provided a perturbation more akin to light refraction (e.g., a film that distorts light in a new direction, rather than airflow)?

      This is an important point. Indeed, our project started by looking for options to distort the refraction index or distort the light in a new direction. However, given the available ways of distorting the light to a new direction, it is hard to achieve that on the technical level. Initially, we tried using prism goggles, however the archerfish found it hard to shoot with the heavy load on the head. We have also explored oil on the water surface. However, given the available oils and the width of the film above water, it is hard to achieve considerable perturbation.

      Fish response to the perturbation matches the response to what would be expected for a change in light refraction. Light refraction perturbation does not change with the change in fish body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      (5) The number of fish excluded was greater than those included. This raises the question as to whether these fish are merely elite specimens or representative of the species in general.

      The filtering of the fish was in the training stage. The requirements were quite strict: the fish had to produce enough shots each day in the experimental setup. Very few fish succeeded. But all fish that got to the stage of perturbation exhibited the adaptation effect. We do not see a reason to think that the motivation to shoot will have a strong interaction with the shooting adaptation mechanisms.

      Reviewer #2 (Public Review):

      Summary:

      The work of Volotsky et al presented here shows that adult archerfish are able to adjust their shooting in response to their own visual feedback, taking consistent alterations of their shot, here by an air flow, into account. The evidence provided points to an internal mechanism of shooting adaptation that is independent of external cues, such as wind. The authors provide evidence for this by forcing the fish to shoot from 2 different orientations to the external alteration of their shots (the airflow). This paper thus provides behavioral evidence of an internal correction mechanism that underlies adaptive motor control of this behavior. It does not provide direct evidence of refractive index-associated shooting adjustment.

      Strengths:

      The authors have used a high number of trials and strong statistical analysis to analyze their behavioral data.

      Weaknesses:

      While the introduction, the title, and the discussion are associated with the refractive index, the latter was not altered, and neither was the position of the target. The "shot" was altered; this is a simple motor adaptation task and not a question related to the refractive index. The title, abstract, and the introduction are thus misleading. The authors appear to deduce from their data that the wind is not taken into account and thus conclude that the fish perceive a different refractive index. This might be based on the assumption that fish always hit their target, which is not the case. The airflow does not alter the position of the target, thus the airflow does not alter the refractive index. The fish likely does not perceive the airflow, thus alteration of its shooting abilities is likely assumed to be an "internal problem" of shooting. I am sorry but I am not able to understand the conclusion they draw from their data.

      This is an important point. Indeed, our project started by looking for options to distort the refraction index or distort the light in a new direction. However, given the available ways of distorting the light to a new direction, it is hard to achieve that on the technical level. Initially, we tried using prism goggles, however the archerfish found it hard to shoot with the heavy load on the head. We have also explored oil on the water surface. However, given the available oils and the width of the film above water, it is hard to achieve considerable perturbation.

      Fish response to the perturbation matches the response to what would be expected for a change in light refraction. Light refraction perturbation does not change with the change in fish body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      Reviewer #2 (Recommendations For The Authors):

      I have had a hard time trying to understand how the authors concluded that the RI is important here as it is not altered. Thus I did not understand the conclusions drawn from this paper. The experiments are well described, but the conclusions are not to me. Maybe schematics would help to clarify. I am from outside the field and represent a naïve reader with an average intellect. The authors need to do a better job of explaining their results if they want others to understand their conclusions.

      See response to the public comments.

      Minor comments:

      Line 9: omit the "an".

      Done.

      Line 11: this sentence would fit way better if it followed the next one.

      Done.

      Line 15: and all the rest of the paper: washout is a strange term and for me associated with pharmacological manipulations - might only be me. I suggest using recovery instead throughout the manuscript.

      The term ‘washout’ is often used in the field of motor adaptation to describe the return to original condition. For example:

      Kluzik J, Diedrichsen J, Shadmehr R, Bastian AJ (2008) Reach adaptation: what determines whether we learn an internal model of the tool or adapt the model of our arm? J Neurophysiol 100:1455-64. doi: 10.1152/jn.90334.2008

      Donchin O, Rabe K, Diedrichsen J, Lally N, Schoch B, Gizewski ER, Timmann D (2012) Cerebellar regions involved in adaptation to force field and visuomotor perturbation. J Neurophysiol 107:134-47

      Line 19: the fish does not expect the flow, it expects that it shoots too short- no?

      Done.

      Line 35: fix the citation - in your reference manager.

      Done.

      Line 52: provide some examples of the mechanisms you think of or papers of it for naive readers. Otherwise, this sentence is not helpful for the reader.

      Done.

      Line 183: it's unclear which parameter you mean. Rephrase.

      Done.

      Line 197: should read to test "the" - same sentence: you repeat yourself- rephrase the sentence.

      Done.

      Figure 4: it was unclear to me why the figure was differentiating between fishes until I read the legend. Why not include direct information in the figure? A schematic maybe? Legend: you have a double "that" in C.

      We added the title for each column with the information about the direction of air.

      Figures: in all figures, perturbation is wrongly spelled! Change the term washout to recovery.

      Done. We kept the term ‘washout’.

    1. Author response:

      We are grateful to reviewer #1 for the positive evaluation of our work and for providing valuable comments that will significantly enhance the presentation of our results. We understand reviewer #2's negative assessment because we did not discuss an alternative model of dosage compensation in Drosophila. We will address this omission in the Introduction section of the revised manuscript and remove any controversial statements from other parts of the text. However, it is important to clarify that our study does not focus on the mechanisms of dosage compensation. The main goal of the manuscript was to investigate the assembly of the MSL complex and its specific binding to the Drosophila X chromosome. We utilized male survival data to demonstrate the efficacy of MSL complex binding to the X chromosome, a relationship that has been supported by numerous independent studies. We understand that Reviewer #2 agrees that disruption of the MSL complex binding results in male lethality. As far as we understand, Reviewer #2 suggests that the MSL complex does not activate transcription of X chromosome genes, but instead facilitates the recruitment of MOF protein and potentially other general transcription factors to the X chromosome. This could explain the decrease in autosomal gene expression due to a reduction in activating factors like MOF at autosomal promoters. In the upcoming revision, we aim to strike a balance between the two models that elucidate dosage compensation in Drosophila. We appreciate your feedback and look forward to enhancing the clarity and coherence of our manuscript based on your insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      A deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila was performed.

      Strengths:

      The deletion analysis of the MSL1 protein and the tests of interaction with MSL2 are adequate.

      We thank the reviewer for the positive assessment of the experimental work done.

      This reviewer does not adhere to the basic premise of the authors that the MSL complex is the primary mediator of dosage compensation of the X chromosome of Drosophila.

      We completely agree with this reviewer's claim. In the Introduction section we’ll attempt to make clear that there are two models for the functional role of specific recruitment of the MSL complex to the X chromosome in males.

      Several lines of evidence from various laboratories indicate that it is involved in sequestering the MOF histone acetyltransferase to the X chromosome but there is a constraint on its action there. When the MSL complex is disrupted, there is no overall loss of compensation but there is an increase in autosomal expression. Sun et al (2013, PNAS 110: E808-817) showed that ectopic expression of MSL2 does not increase expression of the X and indeed inhibits the effect of acetylation of H4Lys16 on gene expression. Aleman et al (2021, Cell Reports 35: 109236) showed that dosage compensation of the X chromosome can be robust in the absence of the MSL complex. Together, these results indicate that the MSL complex is not the primary mediator of X chromosome dosage compensation. The authors use sex-specific lethality as a measure of disruption of dosage compensation, but other modulations of gene expression are the likely cause of these viability effects.

      Sun et al (2013, PNAS 110: E808-817) showed that recruitment of the MSL complex-specific subunit MSL2 or the MOF protein to the UAS promoter resulted in recruitment of the entire MSL complex in males but not transcriptional activation. This important result argues that the MSL complex does not activate transcription. However, it must be taken into account that the GAL4 DNA binding region used to recruit the chimeric MSL2 protein to the UAS promoter was directly fused to the MSL2 RING domain, which is critical for interaction of MSL2 with MSL1 and its ubiquitination activity (this activity could potentially be involved in transcription activation). It also remains poorly understood what happens to the MSL complex after recruitment to the promoters or HAS on the X chromosome. The MSL1/MSL3/MOF subcomplex can acetylate TF and H4K16 during RNA polymerase II elongation, resulting in increased transcription. Separate roles of MSL2 and MSL1 in the activation of transcription at gene promoters have also been shown. Sun et al. showed that in females, recruitment of MOF to the UAS promoter leads to a strong increase in transcription, which is associated with the inclusion of MOF in the non-specific lethal (NSL) complex, which is bound to promoters and is required for strong transcription activation. In males, MOF is preferentially recruited to the UAS promoter in the full MSL complex or perhaps in the MSL1/MSL3/MOF subcomplex, which stimulates transcription during RNA polymerase II elongation much less strongly than the NSL complex. The same result was obtained by Prestel et al. 2010 (Mol Cell 38:815-26). In this study the GAL4 binding sites were inserted upstream of the lacZ and mini-white genes. Activation of transcription after recruitment of GAL4-MOF to the GAL4 sites was studied in males and females. As in Sun et al. 2013, strong activation of the reporter was observed in females. A weak transcriptional activation of the reporter gene in males was shown, and the MOF protein was detected not only on the promoter, but also in the coding and 3’ regions of the reporter.

      We do not understand how the paper by Aleman et al (Cell Reports 35: 109236, 2021) is consistent with the hypothesis that the MSL complex is not involved in the transcriptional activation of X chromosomal genes. The main conclusions of this paper are: 1) Inactivation of Mtor leads to selective activation of the male X chromosome. 2) Mtor-driven attenuation of male X occurs in broad domains linked by the MSL complex. 3) Mtor genetically interacts with MSL components and reduces male mortality. 4) Mtor restrains dose-compensated expression at the level of nascent transcription. Thus, the paper shows that the MSL complex has an activator activity that is partially inhibited by Mtor. Accordingly, inactivation of Mtor only partially restored the survival of males in which dosage compensation was not completely inactivated.

      A detailed explanation was provided by Birchler and Veitia (2021, One Hundred Years of Gene Balance: How stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenetics and Genome Research 161: 529-550).

      We agree that an alternative model of the dosage compensation mechanism is reasonable. We can assume that both mechanisms can function jointly to provide effective dosage compensation in Drosophila males. Following the reviewer's suggestion to reconsider the entire context of the article, we will make many small changes throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Overall, I found the text well written and the figures logically organized (especially Figure 5, which had the potential to confuse). The authors especially excelled in bringing together the decades of literature in the Discussion.

      I offer several suggestions to improve the readability:

      Consider presenting the coiled-coil domain homology in Figure 1A as a contrast for the N-terminal region, which the authors claim is poorly conserved.

      We’ll add the coiled-coil domain homology to Figure 1A in the new version of the manuscript.

      It is difficult to visualize the red MSL2 in Figure 2; the green and red panels should be presented separately in the main text, as they are in the Supplemental Figure 2.

      We’ll prepare Figure 2 with separate green and red panels.

      The ChIP-seq experiments for MSL proteins are well presented, but in my opinion, add little to the overall conclusions:

      Figure 6 mostly recapitulates what has already been published and utilized by several groups, most recently the authors themselves (Tikhonova 2019): that MSL expressed in females targets the X/HAS, similar to in males. While these are nice supporting data for the female transgenic system, I do not believe this figure should be prominently featured as if this is a novelty of the current study.

      We fully agree with the reviewer's comment about the limitation of scientific novelty in Figure 6. It has an auxiliary meaning. Therefore, we decided to transfer this figure to Supplementary material.

      The ChIP experiments in Figure 7 agree with the conclusions in Figures 2 and 3 (polytene chromosome immunostaining) when it comes to X/autosome localization. I believe it would help with the flow of the paper if these experiments were combined or at least placed closer together in the narrative, rather than falling at the end.

      We’ll move Figure 7 closer to the polytene chromosome immunostaining. We agree with the reviewer that this placement of the figure will make it easier to follow the article as a whole.

      I find Figure 8 difficult to understand, especially since the "clusters" are not annotated in the figure, but are described in the text. I struggled to follow the authors' conclusions based on these data. The authors could clarify the figure with annotations, although to be honest I do not currently see the value of this analysis/figure.

      In the new version of the article, we will try to make this figure more understandable: we will add explanations to the figure and a legend to it, and we will also try to place emphasis more clearly in the text of the article.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I have only a few comments that I think will improve the manuscript and help readers better appreciate the context of the reported results.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible.

      One paradox, that the authors point out, is that the drastic effects of TALK-1 L114P on plasma membrane potential do not result in a complete loss of insulin secretion. One important consideration is the role of intracellular stores in insulin secretion at physiological levels of hyperglycemia. This needs to be discussed more thoroughly, especially in the light of recent papers like Postic et al 2023 AJP and others. The authors do show an upregulation of IP3-induced Ca release. It is not clear whether they think this is a direct or indirect effect on the ER. Is there more IP3? More IP3R? Are the stores more full?

      The reviewer brings up an important point. Although we see a significant reduction in glucose-stimulated depolarization in most islets from TALK-1 L114P mice, some glucose-stimulated calcium influx is still present (especially from female islets); this suggests that a subset of islet β-cells are still capable of depolarization. Because our original membrane potential recordings were done in whole islets without identification of the cell type being recorded, we have now repeated these electrical recordings in confirmed β-cells (see Supplemental figure 6). The new data show that 33% of TALK-1 L114P β-cells show action potential firing in 11 mM glucose, which would be predicted to stimulate insulin secretion from a third of all TALK-1 L114P β-cells; this could be responsible for the remaining glucose-stimulated insulin secretion observed from TALK-1 L114P islets. However, ER calcium store release could also allow for some of the calcium response in the TALK-1 L114P islets. We have now detailed this in the discussion, including the Postic et al. study showing that glucose-stimulated beta-cell calcium increases involve ER calcium release, as they occur in the presence of voltage-dependent calcium channel inhibition. Future studies can assess this using SERCA inhibitors and determining whether glucose-stimulated calcium influx in TALK-1 L114P islets is lost. We also find that muscarinic-stimulated calcium influx from ER stores is greater in TALK-1 L114P mice. We currently do not have data to support the mechanism for this enhancement of muscarinic-induced islet calcium responses from islets expressing TALK-1 L114P. Our hypothesis is that greater TALK-1 current on the ER membrane is enhancing ER calcium release in response to IP3R activation. There is an equivalent IP3R expression in control and TALK-1 L114P islets based on transcriptome analysis, which is now included in the manuscript. However, whether there is greater IP3 production, greater ER calcium storage, and/or greater ER calcium release requires further analysis. Because this finding was not directly related to the metabolic characterization of this TALK-1 L114P MODY mutation, we are planning to examine the ER functions of TALK-1 L114P thoroughly in a future manuscript.

      The authors point to the possible roles of TALK-1 in alpha and delta cells. A limitation of the global knock-in approach is that the cell type specificity of the effects can't easily be determined. This should be more explicitly described as a limitation.

      We thank the reviewer for this suggestion and have added this to the discussion. This is now included in a paragraph at the end of the discussion detailing the limitations of this manuscript.

      The official gene name for TALK-1 is KCNK16. This reviewer wonders whether it wouldn't be better for this official name to be used throughout, instead of switching back and forth. The official name is used for Abcc8 for example.

      We thank the reviewer for this suggestion and have revised the manuscript to include Kcnk16 L114P. The instances of TALK-1 L114P that remain in the manuscript are in cases where the text specifically discusses TALK-1 channel function.

      There are several typos and mistakes in editing. For example, on page 5 it looks like "PMID:11263999" has not been inserted. I suggest an additional careful proofreading.

      We have revised this reference, thoroughly proofread the revised manuscript, and corrected typos.

      The difference in lethality between the strains is fascinating. Might be good to mention other examples of ion channel genes where strain alters the severe phenotypes? Additional speculation on the mechanism could be warranted. It also offers the opportunity to search for genetic modifiers. This could be discussed.

      We thank the reviewer for this suggestion and have added details on mutations where strain alters lethality.

      The sex differences are interesting. Of course, estrogen plays a role as mentioned at the bottom of page 16, but there have been more involved analyses of islet sex differences, including a recent paper from the Rideout group. Is there a sex difference in the islet expression of KCNK16 mRNA or protein, in mice or humans?

      We thank the reviewer for the important comments on the TALK-1 L114P sex differences. We have revised the manuscript to include greater discussion about female β-cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by the reviewer (PMID: 36690328). Because these sex differences in islet function were examined in mice, we looked at KCNK16 expression in mouse beta-cells. There is a trend for greater KCNK16 expression in sorted male beta-cells (average RPKM 6296.25 +/- 953.84) compared to sorted female beta-cells (5148.25 +/- 1013.22). Similarly, there was a trend toward greater KCNK16 expression in male HFD treated mouse beta-cells (average RPKM 8020.75 +/- 1944.41) compared to female HFD treated mouse beta-cells (average RPKM 7551 +/- 2952.70). We have now added this to the text.

      Page 15-16 "Indeed, it has been well established that insulin signaling is required for neonatal survival; for example, a similar neonatal lethality phenotype was observed in mice without insulin receptors (Insr-/-) where death results from hyperglycemia and diabetic ketoacidosis by P3 (40)." Formally, the authors are not examining insulin signaling. A better comparison is that of the Ins1/Ins2 double knockout model of complete hypoinsulinemia.

      We thank the reviewer for suggesting this as the appropriate comparison model and have now revised the manuscript to detail the 48-hour average life expectancy of Ins1/Ins2 double knockout mice (PMID: 9144203).

      There are probably too many abbreviations in the paper, making it harder to read by nonspecialists. I recommend writing out GOF, GSIS, WT, K2P, etc.

      We thank the reviewer for this suggestion and have revised the manuscript to reduce the use of most abbreviations.

      Reviewer #2:

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened the manuscript and are summarized below.

      (1) The authors perform an RNA-sequencing showing that the cAMP amplifying pathway is upregulated. Is this also true in humans with this mutation? Other follow-up comments and questions from this observation:

      a) Will this mean that the treatment with incretins will improve glucose-stimulated insulin secretion and Ca2+ signalling and lower blood glucose? The authors should at least present data on glucose-stimulated insulin secretion and/or Ca2+ signalling in the presence of a compound increasing intracellular cAMP.

      b) Will an OGTT give different results than the IPGTT performed due to the fact that the cAMP pathway is upregulated?

      c) Is the increased glucagon area and glucagon secretion a compensatory mechanism that increases cAMP? What happens if glucagon receptors are blocked?

      We thank the reviewer for the suggestions. Although cAMP pathways were upregulated in the TALK-1 L114P islets, the changes in expression were only modest as examined by qRT-PCR. Thus, we are not sure if this plays a role in secretion. Only a small number of patients with this mutation have been identified, and no islets have been isolated from them. Therefore, we do not know whether the cAMP amplifying pathway is upregulated in humans with the MODY-associated TALK-1 L114P mutation. We have performed the suggested experiment assessing calcium from TALK-1 L114P islets in response to liraglutide (see Supplemental figure 10); there was no liraglutide response in TALK-1 L114P islets. We have also performed the OGTT experiments as suggested and these have now been added to the manuscript (see Supplemental figure 3). We do not believe that the increased glucagon is a compensatory response, because: 1. TALK-1 deficient islets have less glucagon secretion due to reduced SST secretion (see PMID: 29402588); 2. There is no change in insulin secretion at 7 mM glucose; however, glucagon secretion is significantly elevated from islets isolated from TALK-1 L114P mice; 3. TALK-1 is highly expressed in delta-cells, and in these cells TALK-1 L114P would be predicted to cause significant hyperpolarization and significant reductions in calcium entry as well as SST secretion. Thus, reduced SST secretion may be responsible for the elevation of glucagon secretion. We plan to investigate delta-cells within islets from TALK-1 L114P mice in future studies to determine if changes in SST secretion are responsible for the elevated glucagon secretion from TALK-1 L114P islets.

      (2) The performance of measurements in both male and female mice is praiseworthy. However, despite differences in the response, the authors do not investigate the potential reason for this. Are hormonal differences of importance?

      We thank the reviewer for this important point. It is indeed becoming clear that there are many differences between male and female islet function and responses to stress. Thus, we have revised the manuscript to include greater discussion about these differences such as female β-cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by reviewer 1 (PMID: 36690328). While the differences in islet function and GTT between male and female L114P mice are clear, they both show diminished islet calcium handling, defective hormone secretion, and development of glucose intolerance. This manuscript was intended to demonstrate how the MODY-causing TALK-1 L114P mutation causes glucose dyshomeostasis, which we have determined in both male and female mice. The differences between male and female mice and islets with TALK-1 L114P could have multiple potential causes (as detailed in PMID: 36690328); thus, we believe that comprehensive studies are required to thoroughly determine how the TALK-1 L114P mutation differently impacts male and female mice and islets, which we plan to complete in a future manuscript.

      (3) MINOR: Page 5 .." channels would be active at resting Vm PMID:11263999.." The actual reference has not been added using the reference system.

      We thank the reviewer for noticing this mistake, which has now been corrected.

      Reviewer #3:

      The manuscript is overall clearly presented and the experimental data largely support the conclusions. However, there are a number of issues that need to be addressed to improve the clarity of the paper.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened and improved the clarity of the manuscript.

      Specific comments:

      (1) Title: The terms "transient neonatal diabetes" and "glucose dyshomeostasis in adults" are used to describe the TALK-1 L114P mutant mice. Transient neonatal diabetes gives the impression that diabetes is resolved during the neonatal period. The authors should clarify the criteria used for transient neonatal diabetes, and the difference between glucose dyshomeostasis and MODY. Longitudinal plasma glucose and insulin data would be very informative and help readers to follow the authors' narrative.

      We appreciate the helpful comment and have added longitudinal plasma glucose from neonatal mice to address this (see Supplemental figure 2). The new data show that the TALK-1 L114P mutant mice undergo transient hyperglycemia that resolves by P10 and then occurs again at week 15. Insulin secretion from P4 islets is also included, showing that male animals homozygous for the TALK-1 L114P mutation have the largest impairment in glucose-stimulated insulin secretion, followed by male heterozygous TALK-1 L114P P4 islets that also have impaired insulin secretion (see Figure 1). The amount of hyperglycemia correlates with the defects in neonatal islet insulin secretion.

      (2) Another concern for the title is the term "α-cell overactivity." This could be taken to mean that individual α-cells are more active and/or that there are more α-cells to secrete glucagon. The study does not provide direct evidence that individual α-cells are more active. This should be clarified.

      We appreciate the helpful comment and have revised the manuscript title accordingly.

      (3) In the Introduction, it is stated that because TALK-1 activity is voltage-dependent, the GOF mutation is less likely to cause neonatal diabetes, yet the study shows the L114P TALK-1 mutation actually causes neonatal diabetes by completely abolishing glucose-stimulated Ca2+ entry. This seems to imply TALK-1 activity (either in the plasma membrane or ER membrane) has more impact on Vm or cytosolic Ca2+ in neonates than initially predicted. Some discussion on this point is warranted.

      These are important points and we have added details to the discussion about this. For example, the discussion now states that, “This suggests a greater impact of TALK-1 L114P in neonatal islets compared to adult islets. Future studies during β-cell maturation are required to determine if TALK-1 activity is greater on the plasma membrane and/or ER membrane compared with adult β-cells.” The introduction has also been revised to clarify the voltage dependence of TALK-1.

      (4) What is the relative contribution of defects in plasma membrane depolarization versus ER Ca2+ handling on defective insulin secretion response?

      We thank the reviewer for bringing up this important point. TALK-1 L114P islets show blunted glucose-stimulated depolarization and glucose-stimulated calcium entry; however, the L114P islets show equivalent Ca2+ entry as control islets in response to high KCl (Figure 5G and H). As the KCl-stimulated Ca2+ influx is similar between control and TALK-1 L114P islets, this indicates that plasma membrane TALK-1 L114P has a hyperpolarizing role that significantly blunts glucose-stimulated depolarization and reduces activation of voltage-dependent calcium channels. We have further tested this by looking at glucose-stimulated β-cell membrane potential depolarization in TALK-1 L114P islets, which is significantly blunted (Figure 4A and B; Supplemental figure 6). However, 33% of TALK-1 L114P β-cells showed glucose-stimulated electrical excitability (Supplemental figure 6), which likely accounts for the modest GSIS from TALK-1 L114P islets. New data have also been included showing that KCl stimulation causes a significant depolarization of β-cells from TALK-1 L114P islets (Supplemental figure 6). Because plasma membrane TALK-1 L114P is largely responsible for the hyperpolarized membrane potential and blunted glucose-stimulated Ca2+ entry, this suggests that TALK-1 L114P on the plasma membrane is primarily responsible for the altered insulin secretion. The discussion has been revised to reflect this.

      (5) The Jacobson group has previously shown that another K2P channel TASK-1 is also involved in ER Ca2+ homeostasis and that TASK inhibitors restored ER Ca2+ in TASK-1 expressing cells. Is TASK-1 expressed in β-cell ER membrane? Can the mishandling of Ca2+ caused by TALK-1 L114P be reversed by TASK-1 inhibitors?

      We thank the reviewer for bringing up this important point in relation to ER calcium handling by K2P channels. We have found that TASK-1 channels expressed in alpha-cells enhance ER calcium release and that inhibitors of TASK-1 channels elevate alpha-cell ER calcium storage. We did not observe any significant changes in the gene (Kcnk3) encoding TASK-1 between islets from control or TALK-1 L114P mice, which has now been added to the manuscript. However, the TALK-1 L114P-mediated reduction of glucose-stimulated depolarization and inhibition of calcium entry are both prevented in the presence of high KCl (see Figure X); this strongly suggests that TALK-1 L114P K+ flux at the membrane is hyperpolarizing the membrane potential and limiting depolarization and calcium entry. This suggests that TALK-1 L114P control of ER calcium handling is not the primary contributor to the blunted glucose-stimulated calcium handling. Furthermore, acetylcholine stimulation of islets from both control and TALK-1 L114P mice elicited ER calcium release, which indicates that for the most part ER calcium release is still responsive to cues that control release, although the responses are altered. Taken together, this suggests that the TALK-1 L114P impact on ER calcium is not the primary mediator of blunted glucose-stimulated islet calcium entry and insulin secretion.

      (6) The electrical recording experiments were conducted using whole islets. The authors should comment on how the cells were identified as β-cells, especially in mutant islets in which there is an increased number of α-cells.

      The reviewer brings up an important point. As indicated, the original membrane potential recordings were conducted using whole islets. While the recorded cells could mostly be β-cells based on mouse islets typically containing >80% β-cells, there is a possibility that some of the cells included in these recordings were α-cells or δ-cells (especially because of the noted α-cell hyperplasia in TALK-1 L114P islets). Thus, we have now included data from β-cells that were identified with an adenoviral construct containing a rat insulin promoter driving a fluorescent reporter. This allowed the fluorescent β-cells to be monitored with electrophysiological membrane potential recordings. The new data (see Supplemental figure 6) show a significant reduction in glucose-stimulated depolarization in 67% of β-cells with the L114P mutation compared to controls.

      Minor:

      (1) Some references need formatting.

      The references have been revised accordingly.

      (2) Please define glucose-stimulated phase 0 Ca2+ response for non-expert readers.

      This has been defined accordingly.

      (3) Page 14 bottom: The sentence "Unlike the only other MODY-associated.........., TALK-1 is not inhibited by sulfonylureas" seems out of place and lacks context.

      We thank the reviewer for this suggestion and have deleted this sentence.

      (4) Figure 6: It would be helpful to provide a protein name for the genes shown in panel D.

      The protein names for the genes have now been included in the discussion of these genes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the thoughtful review of our manuscript by the reviewers, along with their valuable suggestions for enhancing our work. In response to these suggestions, we conducted additional experiments and made significant revisions to both the text and figures. In the following sections, we first highlight the major changes made to the manuscript, and thereafter address each reviewer's comments point-by-point. We hope these additional data and revisions have improved the robustness and clarity of the study and manuscript. Please note that as part of a suggested revision we have changed the manuscript title to be: Bacterial vampirism mediated through taxis to serum.

      Major revisions and new data:

      (1) We conducted additional experiments testing taxis to serum using a swine ex vivo enterohemorrhagic lesion model in which we competed wildtype versus chemotaxis deficient strains (Fig. 8). We selected swine for these experiments due to their similarity in gastrointestinal physiology to humans. In these experiments we see that chemotaxis, and the chemoreceptor Tsr, mediate localization to, and migration into, the lesion. We also tested, and confirmed, taxis to serum from swine and serum from horse, supporting that serum attraction is relevant in other host-pathogen systems.

      (2) We present additional experimental data and quantification of chemotaxis responses to human serum treated with serine-racemase (Fig. S3). This treatment reduces wildtype chemoattraction and the wildtype no longer possesses an advantage over the tsr strain, providing further evidence that L-serine is the specific chemoattractant responsible for Tsr-mediated attraction to serum.

      (3) We present additional data in the form of 17 videos of chemotaxis experiments with norepinephrine and DHMA showing null-responses under various conditions. These data provide additional support to the conclusion that these chemicals are not responsible for bacterial attraction to serum. We have included these raw data as a new supplementary file (Data S1) for those in the field that are interested in these chemicals.

      (4) Based on comments from Reviewer 2 regarding whether the position of the ligand and ligand-binding site residues in the previously-reported EcTsr LBD structure are incorrect, or whether these differences are due to the proteins being from different organisms, we performed paired crystallographic refinements to determine which positions result in model improvement (Fig. 7J). Altering the EcTsr structure to have the ligand and ligand-binding site positions from our new higher resolution and better-resolved structure of Salmonella Typhimurium Tsr results in a demonstrably better model, with both Rwork and Rfree lower by about 1% (Fig. 7J). These data support our conclusion that the correct positions for both structures are as we have modeled them in the S. Typhimurium Tsr structure. We also solved an additional crystal structure of SeTsr LBD captured at neutral pH (7-7.5) that confirms our structure captured with elevated pH (7.5-9.7) has no major changes in structure or ligand-binding interactions (Fig. S6, Table S2).

      (5) Based on comments from Reviewer 2 on the accuracy of the diffusion calculations, we present a new analysis (Fig. S2) comparing the experimentally-determined diffusion of A488 compared to its calculated diffusion. We found that:

      [line 111]: “As a test case of the accuracy of the microgradient modeling, we compared our calculated values for A488 diffusion to the normalized fluorescence intensity at time 120 s. We determined the concentration to be accurate within 5% over the distance range 70-270 µm (Fig. S2). At smaller distances (<70 µm) the measured concentration is approximately 10% lower than that predicted by the computation. This could be due to advection effects near the injection site that would tend to enhance the effective local diffusion rate.”
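
      As a rough illustration of the kind of check described in this passage, the sketch below compares a measured concentration profile with a calculated one and lists the distances at which they disagree by more than a tolerance. It is not the model used in the manuscript: the continuous point-source solution for an unbounded medium, the diffusion coefficient, the source rate, the example measurements, and all function names are assumptions chosen for illustration. Only the 120 s time point and the 5% tolerance come from the text.

```python
import math


def point_source_concentration(r_um, t_s, d_um2_per_s, rate):
    """Concentration at distance r from a continuous point source switched on at t = 0.

    Assumed textbook solution for an unbounded 3D medium (not the manuscript's model):
        C(r, t) = rate / (4 * pi * D * r) * erfc(r / (2 * sqrt(D * t)))
    """
    if r_um <= 0 or t_s <= 0:
        raise ValueError("r_um and t_s must be positive")
    spread = 2.0 * math.sqrt(d_um2_per_s * t_s)
    return rate / (4.0 * math.pi * d_um2_per_s * r_um) * math.erfc(r_um / spread)


def relative_deviation(measured, predicted):
    """Signed fractional difference of a measurement from the prediction."""
    return (measured - predicted) / predicted


def distances_outside_tolerance(distances_um, measured, predicted, tolerance=0.05):
    """Return the distances at which |measured - predicted| / predicted exceeds tolerance."""
    return [
        r
        for r, m, p in zip(distances_um, measured, predicted)
        if abs(relative_deviation(m, p)) > tolerance
    ]


if __name__ == "__main__":
    D = 400.0    # um^2/s, assumed value for a small fluorescent dye
    RATE = 1.0   # arbitrary units; cancels out in the relative comparison
    T = 120.0    # s, the time point quoted in the passage

    distances = [30.0, 50.0, 70.0, 150.0, 270.0]
    predicted = [point_source_concentration(r, T, D, RATE) for r in distances]

    # Hypothetical measurements: about 10% low near the source, within 5% elsewhere.
    factors = [0.90, 0.90, 0.97, 1.02, 0.98]
    measured = [p * f for p, f in zip(predicted, factors)]

    print(distances_outside_tolerance(distances, measured, predicted))
```

      Working with relative deviations means the absolute source strength drops out, which matches the use of normalized fluorescence intensity in the quoted comparison.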

      (6) Both reviewers asked us to better justify why we focused on the chemoreceptor Tsr, and had questions about why we did not investigate Tar. The low concentration of Asp in serum suggests Tar could have some effect, but less so than Trg or Tsr (see Fig. 4A). We have revised the text throughout to better convey that we agree multiple chemoreceptors are involved in the response and clarify our rationale for studying the role of Tsr:

      [line 178]: “We modeled the local concentration profile of these effectors based on their typical concentrations in human serum (Fig. 4B). Of these, by far the two most prevalent chemoattractants in serum are glucose (5 mM) and L-serine (100-300 µM) (Fig. 4B-F). This suggested to us that the chemoreceptors Trg and/or Tsr could play important roles in serum attraction.”

      [line 186]: “Since tsr mutation diminishes serum attraction but does not eliminate it, we conclude that multiple chemoattractant signals and chemoreceptors mediate taxis to serum. To further understand the mechanism of this behavior we chose to focus on Tsr as a representative chemoreceptor involved in the response, presuming that serum taxis involves one, or more, of the chemoattractants recognized by Tsr that is present in serum: L-serine, NE, or DHMA.”

      [line 468] “Serum taxis occurs through the cooperative action of multiple bacterial chemoreceptors that perceive several chemoattractant stimuli within serum, one of these being the chemoreceptor Tsr through recognition of L-serine (Fig. 4).”

      Point-by-point responses to reviewer comments:

      Reviewer #1:

      (1) Presumably in the stomach, any escaping serum will be removed/diluted/washed away quite promptly? This effect is not captured by the CIRA assay but perhaps it might be worth commenting on how this might influence the response in vivo. Perhaps this could explain why, even though the chemotaxis appears rapid and robust, cases of sepsis are thankfully relatively rare.

      To clarify, the Enterobacteriaceae species we have tested here are colonizers of the intestines, not the stomach, and cases of bacteremia from these species are presumably due to bloodstream entry through intestinal lesions. Whether or not intestinal flow acts as a barrier to bloodstream entry is not something we test here, and so we have not commented on this idea in the manuscript. We do demonstrate that attraction to serum occurs within seconds-to-minutes of exposure. We expect that the major protective effects against sepsis are the host antibacterial factors in serum, which are well-described in other work. We have been careful to state throughout the text that we see attraction responses, and growth benefits, to serum that is diluted in aqueous media, which is different than bacterial growth in 100% serum or in the bloodstream.

      (2) The authors refer to human serum as a chemoattractant numerous times throughout the study (including in the title). As the authors acknowledge, human serum is a complex mixture and different components of it may act as chemoattractants, chemo-repellents (particularly those with bactericidal activities) or may elicit other changes in motility (e.g. chemokinesis). The authors present convincing evidence that cells are attracted to serine within human serum - which is already a well-known bacterial chemoattractant. Indeed, their ability to elucidate specific elements of serum that influence bacterial motility is a real strength of the study. However, human serum itself is not a chemoattractant and this claim should be re-phrased - bacteria migrate towards human serum, driven at least in part by chemotaxis towards serine.

      Throughout the text we have changed these statements, including in the title, to either be ‘taxis to serum’ or ‘serum attraction.’ On the timescales we tested, our data support that chemotaxis, not chemokinesis or other forms of directed motility, is what drives rapid serum attraction, since a motile but non-chemotactic cheY mutant cannot localize to serum (Fig. 4). We present evidence of one of these chemotactic interactions (L-Ser).

      (3) Linked to the previous point, several bacterial species (including E. coli - one of the bacterial species investigated here) are capable of osmotaxis (moving up or down gradients in osmolality). Whilst chemotaxis to serine is important here, could movement up the osmotic gradient generated by serum injection play a more general role? It could be interesting to measure the osmolality of the injected serum and test whether other solutions with similar osmolality elicit a similar migratory response. Another important control here would be to treat human serum with serine racemase and observe how this impacts bacterial migration.

      As addressed above, we have added additional experiments of serum taxis treated with serine racemase showing competition between WT and cheY, and WT and tsr (Fig. S3). These data support a role for L-serine as a chemoattractant driving attraction to serum. The idea of osmotaxis is interesting, but outside the scope of this work since we focus on chemoattraction to L-serine as one of the mechanisms driving serum attraction, and have multiple lines of evidence to support that.

      (4) The migratory response of E. coli looks striking when quantified (Fig. 6C) but is really unclear from looking at Panel B - it would be more convincing if an explanation was offered for why these images look so much less striking than analogous images for other species (E.g. Fig. 6A).

      We agree that the E. coli taxis to serum response is less obvious. We have brightened those panels to make them easier to interpret (more cells in the field of view over time). Also, as stated in the y-axes of these plots, this quantification was performed by enumerating the number of cells in the field of view, and the Citrobacter and Escherichia responses are shown on separate y-axes (now Fig. 8C). As indicated, the experiments have different numbers of starting motile cells, which we presume accounts for the difference in attraction magnitude. When investigating diverse bacterial systems we found differences in motility under the culturing and experimental conditions we employed, for multiple reasons, and so for these data we thought it best to report raw cell numbers rather than data normalized to the starting number of bacteria, as we do elsewhere. In the specific case of these E. coli responding to serum, please view Supplementary Movie S3, which clearly shows both the attraction response and that the bacteria grew in a longer, semi-filamentous form that seems to impair their swimming speed.

      (5) It is unclear why the fold-change in bacterial distribution shows an approximately Gaussian shape with a peak at a radial distance of between 50 -100 um from the source (see for example Fig. 2H). Initially, I thought that maybe this was due to the presence of the microcapillary needle at the source, but the CheY distribution looks completely flat (Fig. 3I). Is this an artifact of how the fold-change is being calculated? Certainly, it doesn't seem to support the authors' claim that cells increase in density to a point of saturation at the source. Furthermore, it also seems inappropriate to apply a linear fit to these non-linear distributions (as is done in Fig. 2H and in the many analogous figures throughout the manuscript).

      We have revised the text to address this point, and removed the comment about cells increasing in density to a point of saturation: [Line 138] “We noted that in some experiments the population peak is 50-75 µm from the source, possibly due to a compromise between achieving proximity to nutrients in the serum and avoidance of bactericidal serum elements, but this behavior was not consistent across all experiments. Overall, our data show S. enterica serovars that cause disease in humans are exquisitely sensitive to human serum, responding to femtoliter quantities as an attractant, and that distinct reorganization at the population level occurs within minutes of exposure (Fig. 3, Movie 2).”

      We can confirm that this is not an artifact of quantification. Please refer to the videos of these responses, which demonstrate this point (Movies 1-5).

      (6) The authors present several experiments where strains/ serovars competed against each other in these chemotaxis assays. As mentioned, these are a real strength of the study - however, their utility is not always clear. These experiments are useful for studying the effects of competition between bacteria with different abilities to climb gradients.

      However, to meaningfully interpret these effects, it is first necessary to understand how the different bacteria climb gradients in monoculture. As such, it would be instructive to provide monoculture data alongside these co-culture competition experiments.

      Thank you for this suggestion. We agree that the coculture experiments showing strains competing for the same source of effector give a different perspective than monoculture. These experiments allow us to confirm taxis deficiencies or advantages with greater sensitivity, and ensure that the bacteria in competition have experienced the same gradient. This type of competition experiment is often used in in vivo experimentation for the same advantages. We note that in the gut the bacteria are not in monoculture and chemotactic bacteria do have to compete against each other for access to nutrients. Repeating all of the experiments we present to show both the taxis responses in coculture and monoculture would be an extraordinary amount of work that we do not believe would meaningfully change the conclusions of this study.

      (7) Linked to the above point, it would be especially instructive to test a tsr mutant's response in monoculture. Comparing the bottom row of Fig. 3G to Fig. 3I suggests that when in co-culture with a cheY mutant, the tsr mutant shows a higher fold-change in radial distribution than the WT strain. Fig. 4G shows that a tsr mutant can chemotaxis towards aspartate at a similar, but reduced rate to WT. This could imply that (like the trg mutant), a tsr mutant has a more general motility defect (e.g. a speed defect), which could explain why it loses out when in competition with the WT in gradients of human serum, but actually seems to migrate strongly to human serum when in co-culture with a cheY mutant. This should be resolved by studying the response of a tsr mutant in monoculture.

      Addressed above.

      (8) In Fig. 4, the response of the three clinical serovars to serine gradients appears stronger than the lab serovar, whilst in Fig. 1, the response to human serum gradients shows the opposite trend with the lab serovar apparently showing the strongest response. Can the authors offer a possible explanation for these slightly confusing trends?

      We suspect this relates to the fact that pure L-serine is a chemoattractant, whereas treatment with serum exposes the bacteria both to chemoattractants and, likely, chemorepellents. Strains may navigate the landscape of these stimuli differently for a variety of reasons that are not simple to tease apart. The final magnitude of change in bacterial localization depends on multiple factors including swimming speed, adaptation, sensitivity of chemoattraction, and cooperative signaling of the chemoreceptor nanoarray. Thus, we cannot state with certainty how and why these strains are different across all experiments, but we can state that they are attracted to both serum and L-serine.

      (9) In Fig. S2, it seems important to present quantification of the effect of serine racemase and the reported lack of response to NE and DHMA - the single time-point images shown here are not easy to interpret.

      As suggested, we present quantification of the serine racemase-treated samples (now Fig. S3). To assist in the interpretation of these max projections, Fig. S3 now notes the chemotactic response (chemoattraction for L-serine, null-response for NE/DHMA). Further, we revised the text to state: [line 209] “We observed robust chemoattraction responses to L-serine, evident by the accumulation of cells toward the treatment source (Fig. S3E, Movie 4), but no response to NE or DHMA, with the cells remaining randomly distributed even after 5 minutes of exposure (Fig. S3F-I, Movie 5, Movie S1).”

      (10) Importantly, the authors detail how they controlled for the effects of pH and fluid flow (Line 133-136). Did the authors carry out similar controls for the dual-species experiments where fluorescent imaging could have significantly heated the fluid droplet driving stronger flow forces?

      Most of our microfluidics experiments were performed in a temperature-controlled chamber (see Methods). Since the strains in the coculture experiments experienced the same experimental conditions, we have no evidence that fluorescence-imaging-induced temperature changes have impacted whether or not the bacteria are attracted to serum or the effectors we investigated.

      (11) The inference of the authors' genetic analysis combined with the migratory response of E. coli and C. koseri to human serum shown in Fig. 6 is that Tsr drives movement towards human serum across a range of Enterobacteriaceae species. The evidence for the importance of Tsr here is currently correlative - more causal evidence could be presented by either studying the response of tsr mutants in these two species (certainly these should be readily available for E. coli) or by studying the response of these two species to serine gradients.

      We have revised the text to state: [line 402] “Without further genetic analyses in these strain backgrounds, the evidence for Tsr mediating serum taxis for these bacteria remains circumstantial. Nevertheless, taxis to serum appears to be a behavior shared by diverse Enterobacteriaceae species and perhaps also Gammaproteobacteria priority pathogen genera that possess Tsr such as Serratia, Providencia, Morganella, and Proteus (Fig. 8B).”

      We note that other work has thoroughly investigated E. coli serine taxis.

      Figure Suggestions

      (1) Fig. 2 - The inset bar charts in panels H-J and the font size in their axes labels are too small - this suggestion also applies to all analogous figures throughout the manuscript.

      We have increased the size of the text for these inset plots. We have also broken up some of the larger figures.

      (2) Panel 2F - the cartoon bacterial cell and 'number of bacteria' are confusing and seem to contradict the y-axis label. This also applies to several other figures throughout the manuscript where the significance of this cartoon cell is quite hard to interpret.

      As suggested, we have removed this cartoon.

      (3) Panels G-I in Fig. 3 are currently tricky to interpret - it would be easier if the authors were to use three different colours for the three different strains shown across these panels.

      We have broken up Figure 2 (which also had these types of plots) so that hopefully these labels are more clear. For the Figure in question (now Fig. 4), due to the many figures and different types of data and comparisons it was difficult to find a color scheme for these strains that would be consistent across the manuscript. These colors also reflect the fluorescence markers. We note that not only do we use color to indicate the strain but also text labels.

      (4) Panels 3B-F would be best moved to a supplementary figure as this figure is currently very busy. Similarly, I would potentially consider presenting only the bottom row of panels in Panels G-I in the main figure (which would then be consistent with analogous data presented elsewhere).

      We have opted to keep these panels in the main text (now Fig. 4) as they are relevant to understanding (1) our justification for why to pursue certain chemoeffector-chemoreceptor interactions and not others, and (2) how the chemoattraction response can be understood both in terms of bacterial population distribution and relevant cells over time.

      (5) Fig. 4 and possibly elsewhere - perhaps best not to use Ser as an abbreviation for Serine here because it could potentially be confused with an abbreviation for serum.

      It is unfortunate that these two words are so similar. However, Ser is the canonical abbreviation for the amino acid serine. Serum does not have a canonical abbreviation.

      (6) Fig. 4 - I would move panels H - K to a separate supplementary figure - currently, they are too squished together and it is hard to make out the x-axis labels. I would also consider moving panels E-G to supplementary as well so that the microscopy images presented elsewhere in the figure can be presented at an appropriate size.

      Since we are allowed more figures, we could also break some of these figures up into multiple ones.

      (7) Similarly, I would move some panels from Fig. 5 to supplementary as the figure is currently quite busy.

      We have rearranged the figure (now Fig. 7) to move the bioinformatics data to Fig. 8 to allow more space for the panels.

      Other suggestions

      (8) Line 179 - how do the concentrations quote for serine and glucose compare to aspartate? This would be helpful to justify the authors' decision not to investigate Tar as a potential chemoreceptor.

      This is addressed in our comments above and in Fig. 4A and Fig. 4B-F. L-Asp is present in human serum at a much lower concentration (about 20-fold lower).

      (9) Line 282 - Serine levels in serum are quantified at 241 uM, but this is only discussed in the context of serum growth effects. Could this information be better used to design/ inform the serine gradients that were tested in chemotaxis assays?

      We tested a wide range of serine concentrations and show that sources of serine much lower than what is present in serum are sufficient for chemoattraction. Also, the K1/2 for serine is 105 uM (Fig. S4), which is surpassed by the concentration in serum (Fig. S5).
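
      The comparison between the K1/2 and the serum concentration can be made concrete with a small calculation. The sketch below assumes a simple hyperbolic (non-cooperative) dose-response, which is an illustrative assumption and not necessarily the curve fitted in Fig. S4; the function name `fractional_response` is hypothetical. Only the two concentrations (105 uM and 241 uM) come from the text above.

```python
def fractional_response(conc_um: float, k_half_um: float) -> float:
    """Fraction of the maximal response at a given concentration.

    Assumes a hyperbolic dose-response, response = C / (C + K1/2).
    This functional form is an assumption made for illustration only.
    """
    if conc_um < 0 or k_half_um <= 0:
        raise ValueError("concentration must be >= 0 and K1/2 must be > 0")
    return conc_um / (conc_um + k_half_um)


K_HALF_UM = 105.0     # K1/2 for serine quoted in the response (Fig. S4)
SERUM_SER_UM = 241.0  # serum serine concentration quoted by the reviewer

serum_fraction = fractional_response(SERUM_SER_UM, K_HALF_UM)
print(f"{serum_fraction:.2f}")  # prints 0.70
```

      Under this assumed form, a source at the serum concentration would sit at roughly 70% of the maximal response, consistent with the statement that the serum concentration surpasses the K1/2.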

      (10) The word 'potent' in the title might be too vague, especially as the strength of the response varies between strains/species. It may perhaps be more useful to focus on the rapidity/sensitivity of the response. However, presumably the sensitivity of the response will be driven by the sensitivity of the response to serine (which is already known for E. coli at least). Also, as noted in the public review, human serum itself is not a chemoattractant so I would consider re-phasing this in the title and elsewhere.

      As suggested, and discussed above, we have implemented this change.

      (11) Typo line 59 'context of colonizing of a healthy gut'.

      Addressed.

      (12) Typo line 538 - there is an extra full stop here.

      Addressed.

      Reviewer #2:

      (1) This study is well executed and the experiments are clearly presented. These novel chemotaxis assays provide advantages in terms of temporal resolution and the ability to detect responses from small concentrations. That said, it is perhaps not surprising these bacteria respond to serum as it is known to contain high levels of known chemoattractants, serine certainly, but also aspartate. In fact, the bacteria are shown to respond to aspartate and the tsr mutant is still chemotactic. The authors do not adequately support their decision to focus exclusively on the Tsr receptor. Tsr is one of the chemoreceptors responsible for observed attraction to serum, but perhaps, not the receptor. Furthermore, the verification of chemotaxis to serum is a useful finding, but the work does not establish the physiological relevance of the behavior or associate it with any type of disease progression. I would expect that a majority of chemotactic bacteria would be attracted to it under some conditions. Hence the impact of this finding on the chemotaxis or medical fields is uncertain.

      We agree that the data we show are mostly mechanistic and further work is required to learn whether this bacterial behavior is relevant in vivo and during infections. We present new data using an ex vivo intestinal model which supports the feasibility of serum taxis mediating invasion of enterohemorrhagic lesions (Fig. 8).

      (2) The authors also state that "Our inability to substantiate a structure-function relationship for NE/DHMA signaling indicates these neurotransmitters are not ligands of Tsr." Both norepinephrine (NE) and DHMA have been shown previously by other groups to be strong chemoattractants for E. coli (Ec), and this behavior was mediated by Tsr (e.g. single residue changes in the Tsr binding pocket block the response). Given the 82% sequence identity between the Se and Ec Tsr, this finding is unexpected (and potentially quite interesting). To validate this contradictory result the authors should test E. coli chemotaxis to DHMA in their assay. It may be possible that Ec responds to NE and DHMA and Se doesn't. However, currently, the data is not strong enough to rule out Tsr as a receptor to these ligands in all cases. At the very least the supporting data for Tsr being a receptor for NE/DHMA needs to be discussed.

      Addressed above. The focus of this study is serum attraction and the mechanisms thereof. We never saw any evidence to support the idea that NE/DHMA drive attraction to serum, or that they are chemoeffectors for Salmonella, and we provide these null-results in Data S2.

      (3) The authors also determine a crystal structure of the Se Tsr periplasmic ligand binding domain bound to L-Ser and note that the orientation of the ligand is different than that modeled in a previously determined structure of lower resolution. I agree that the SeTsr ligand binding mode in the new structure is well-defined and unambiguous, but I think it is too strong to imply that the pose of the ligand in the previous structure is wrong. The two conformations are in fact quite similar to one another and the resolution of the older structure, is, in my view, insufficient to distinguish them. It is possible that there are real differences between the two structures. The domains do have different sequences and, moreover, the crystal forms and cryo-cooling conditions are different in each case. It's become increasingly apparent that temperature, as manifested in differential cooling conditions here, can affect ligand binding modes. It's also notable that full-length MCPs show negative cooperativity in binding ligands, which is typically lost in the isolated periplasmic domains. Hence ligand binding is sensitive to the environment of a given domain. In short, the current data is not convincing enough to say that a previous "misconception" is being corrected.

      Thank you for this comment, which spurred us to investigate this idea more rigorously. As described above we performed new refinements of the E. coli structure edited to have the positions of the ligand and ligand-binding site as modeled in our new Tsr structure from Salmonella (Fig. 7J). The best model is obtained with these poses. Along with the poor fit of the E. coli model to the density, the best interpretations for these positions, for both structures, are as we have modeled them in the Salmonella Tsr structures.

      Figure suggestions

      (1) Figure 2 looks busy and unorganized. Fig 2C could be condensed into one image where there are different colored rings coming from the source point that represent different time points.

      Addressed above. Fig. 2 has been broken apart to help improve clarity.

      (2) What is the second (bottom) graph of 2D? I think only the top graph is necessary.

      We have added an explanation to the figure legend that the top graph shows the means and the bottom shows SEM. The plots cannot easily be overlaid.

      (3) Similarly, Fig 2E doesn't need to have so many time points. Perhaps 4 at maximum.

      As the development of the response over time is a key take-home of the study, we do not wish to reduce the timepoints shown.

      (4) The legend for Figure 2F uses the unit 'µM' to mean micrometers but should use 'µm'.

      Corrected.

      (5) In Figures 2H-J, the lime green text is difficult to read. The word "serum" does not need to be at the top of each panel. I recommend shortening the y-axis titles on the graphs so you can make the graphs themselves larger.

      Addressed above.

      (6) In Figures 2H-J, I am confused about what is being shown in the inset graph. The legend says it's the AUC for the data shown. However, in the third panel (S. Typhimurium vs. S. Enteriditus) the data appears to be much more disparate than the inset indicates. I don't think that this inset is necessary either.

      The point of this inset graph is to quantify the response through integration of the curve, i.e., area under the curve, which is a common way to quantify complex curves and compare responses as single values. We are using this method to calculate the statistical significance of the response compared to a null response. We have added further clarification to the figure legend regarding these plots: Inset plots show fold-change AUC of strains in the same experiment relative to an expected baseline of 1 (no change). p-values shown are calculated with an unpaired two-sided t-test comparing the means of the two strains, or one-sided t-test to assess statistical significance in terms of change from 1-fold (stars).
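
      The kind of summary described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used for the figures: the helper names, the normalization (dividing by the radial span so that a flat curve at 1 gives a value of 1), and the replicate values are all hypothetical.

```python
import math
import statistics


def trapezoid_auc(x, y):
    """Area under y(x) by the trapezoidal rule."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2.0
               for i in range(len(x) - 1))


def fold_change_auc(radial_um, fold_change):
    """AUC of a fold-change curve divided by the AUC of a flat baseline of 1.

    A value of 1.0 therefore means no change (normalization is an assumption).
    """
    return trapezoid_auc(radial_um, fold_change) / (radial_um[-1] - radial_um[0])


def one_sample_t(values, mu):
    """t statistic for testing whether the mean of values differs from mu."""
    n = len(values)
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))


# Hypothetical replicate curves: fold-change versus radial distance (um).
radial_um = [0.0, 50.0, 100.0, 150.0]
replicates = [
    [2.0, 3.0, 1.5, 1.0],
    [1.8, 2.6, 1.4, 1.0],
    [2.2, 3.1, 1.7, 1.1],
]
aucs = [fold_change_auc(radial_um, curve) for curve in replicates]
t_stat = one_sample_t(aucs, mu=1.0)  # compare against the 1-fold baseline
print(aucs, t_stat)
```

      Converting the t statistic into a p-value requires the t-distribution; a statistics library can supply this (for example, SciPy provides `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind`).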

      (7) Line 154, change "relevant for" to "observed in".

      Changed.

      (8) Line 171, according to the Mist4 database, Salmonella enterica has seven chemoreceptors. Why are only Tar, Tsr, and Trg mentioned? Why were only Tsr and Trg tested?

      Addressed above.

      (9) Line 192, be clear that you are referring to genes and not proteins, as italics are used.

      Revised to make this distinction clear.

      (10) Line 193, have other studies found a Trg deletion strain to be non-chemotactic? If so, cite this source here.

      We state that the Trg deletion strain had deficiencies in motility, and also have revised the text to include the clarification that this was not noted in earlier work with this strain: [line 173]: We were surprised to find that the trg strain had deficiencies in swimming motility (data not shown). This was not noted in earlier work but could explain the severe infection disadvantage of this mutant 34. Because motility is a prerequisite for chemotaxis, we chose not to study the trg mutant further, and instead focused our investigations on Tsr.

      (11) Why wasn't a Tar deletion mutant also analyzed? The authors say that based on the known composition of serum, serine and glucose are the most abundant. However, the serum does have aspartate at 10s of micromolar concentrations.

      Addressed above.

      (12) “The Tsr deletion strain still exhibits an obvious chemoattraction to serum. There are other protein(s) involved in chemoattraction to serum but the text does not discuss this.”

      Addressed above.

      (13) “In Figure 3B-F, the text is very difficult to read even when zoomed in on.”

      We have increased the font size of these panels.

      (14) “All of the text in Figure 5 is extremely small and difficult to read.”

      Addressed above. We split this figure in two to help improve clarity.

      (15) “I wonder about the accuracy of the concentration modeling. It seems like there are a lot of variables that could affect the diffusion rates, including the accuracy of the delivery system. Could the concentrations be verified by the dye experiments?”

      Addressed above. We provide a new analysis comparing experimental diffusion of A488 dye compared to calculations (Fig. S2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) It is nice that the authors compared their model to the one "without lookahead" in Figure 4, but this comparison requires more evidence in my opinion, as I explain in this comment. The model without lookahead is closely related or possibly equivalent to the standard predictive coding. In predictive coding, one can make the network follow the stimulus rapidly by reducing the time constant tau. However, as the time constant decreases, the network would become unstable both in simulations (due to limited integration time step) and physical implementation (due to noise). Therefore I wonder if the proposed model has an advantage over standard predictive coding with an optimized time constant. Hence I suggest to also add a comparison between the proposed model, and the predictive coding with parameters (such as tau) optimized independently for each model. Of course, we know that the time-constant of biological neurons is fixed, but biological neurons might have had different time constants (by changing leak conductance) and such analysis could shed light on the question of why the neurons are organized the way they are.

      The comparison with a predictive network for which the neuronal time constants shrink towards 0 is in fact helpful. We added two new subsections in the SI that formally compare the NLA with other approaches, Equilibrium Propagation and the Latent Equilibrium, with a version of Equilibrium Propagation also covering the standard predictive coding you describe (SI, Sect. C and D). Subsection C concludes: “In the Equilibrium propagation we cannot simply take the limit τ → 0 since then the dynamics either disappears (when τ remains on the left, τ Δu → 0) or explodes (when τ is moved to the right, dt/τ → ∞), leading to either too small or too big jumps.”

      We have also expanded the passage on predictive coding in the main text, comparing our instantaneous network processing (up to a remaining time constant τ_in) with experimental data from humans (see page 10 of the revised ms). The new paragraph ends with:

      “Notice that, from a technical perspective, making the time constants of individual cortical neurons arbitrarily short leads to network instabilities and is unlikely the option chosen by the brain (see SI Sect. C, Comparison to the Equilibrium Propagation).”
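
      The simulation side of this instability, which the reviewer attributes to the limited integration time step, can be illustrated with a single leaky unit. The sketch below is a hypothetical toy example, not the network from the paper: it integrates tau * du/dt = -u + drive with forward Euler, for which each step multiplies the distance to the fixed point by (1 - dt/tau), so the scheme diverges once dt/tau exceeds 2. The parameter values are chosen for illustration only.

```python
def simulate_leaky_unit(tau, dt, steps, drive=1.0, u0=0.0):
    """Forward-Euler integration of tau * du/dt = -u + drive."""
    u = u0
    trace = [u]
    for _ in range(steps):
        u += (dt / tau) * (-u + drive)
        trace.append(u)
    return trace


DT = 0.1  # integration step (arbitrary time units, hypothetical value)

# dt/tau = 0.1: the error shrinks by a factor 0.9 per step and u converges to 1.
stable = simulate_leaky_unit(tau=1.0, dt=DT, steps=200)

# dt/tau = 2.5: the error is multiplied by -1.5 per step and u oscillates and grows.
unstable = simulate_leaky_unit(tau=0.04, dt=DT, steps=200)

print(stable[-1], unstable[-1])
```

      Shrinking tau at a fixed step size therefore eventually breaks the integration, and shrinking the step size to compensate raises the simulation cost accordingly.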

      A new formal definition of the moving equilibrium in the Methods (Sect. F) helps to understand this notion of being in a balanced equilibrium state during the dynamics. This formal definition directly leads to the contraction analysis in the SI, Sect. D, showing why the Latent Equilibrium is always contractive, while the current form of the NLA may show jumps at the corner of a ReLu (since a second order derivative of the transfer function enters in the error propagation).

      The reviewer perhaps has additional simulations in mind that compare the robustness of the different models. However, as this paper is more about presenting a novel concept with a comprehensive theory (summing up to 45 pages), we prefer to not add more than the simulations necessary to check the statements of the theorems.

      (2) I found this paper difficult to follow, because the Results sections went straight into details, and various elements of the model were introduced without explaining why they are necessary. Furthermore, the neural implementation was introduced after the model simulations. I suggest reorganizing the manuscript, to describe the model following Marr's levels of description and then presenting the results of simulations. In particular, I suggest starting the Results section by explaining what computation the network is trying to achieve (describe the setup, function L, define its integral over time, and explain that the goal is to find a model minimizing this integral). Then, I suggest presenting the algorithm the neurons need to employ to minimize this integral, i.e. their dynamics and plasticity (I wonder if r=rho(u) + tau rho(u)' is a consequence of action minimization or a necessary assumption - please clarify it). Next please explain how the algorithms could be implemented in biological neurons. Afterward please present the results of the simulation.

      We are sorry to realize that we could not convey the main message clearly enough. After rewriting the paper and straightening the narrative, we hope it is simpler to understand now.

      The paper does not suggest a new model to solve a task, and writing down the function to be minimized is not enough. The point of the NLA is that the time integral of our Lagrangian is minimized with respect to the prospective coordinates, i.e. the discounted future voltage. It is about the question of how dynamic equations in biology are derived. Of course, we also solve these equations, prove theorems and perform simulations. But the main point is that biology seems to deal with time differently than physics does. Biology “thinks” in terms of future quantities, physics “thinks” in terms of current quantities. We have tried to explain this better in the Introduction, the Results (e.g. after Eq. 5) and the Methods.
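
      The effect of "thinking ahead" can be illustrated with the lookahead rate quoted by the reviewer, r = rho(u) + tau rho(u)'. The sketch below is a toy example under explicit assumptions and is not the implementation used in the paper: the downstream integration is modeled as a first-order low-pass filter with the same time constant tau, derivatives are forward differences, and the signal and parameter values are hypothetical.

```python
import math


def lookahead_rate(p, tau, dt):
    """r_n = p_n + tau * (p_{n+1} - p_n) / dt, a discrete form of rho + tau * rho'."""
    return [p[n] + tau * (p[n + 1] - p[n]) / dt for n in range(len(p) - 1)]


def low_pass(r, tau, dt, s0):
    """Forward-Euler low-pass filter: tau * ds/dt = -s + r."""
    s = [s0]
    for r_n in r:
        s.append(s[-1] + (dt / tau) * (r_n - s[-1]))
    return s


TAU, DT = 10.0, 0.1  # time constant and step in ms (hypothetical values)
p = [1.0 + math.sin(2.0 * math.pi * n * DT / 50.0) for n in range(1001)]

# Filtering the lookahead rate recovers the original signal without delay.
with_lookahead = low_pass(lookahead_rate(p, TAU, DT), TAU, DT, s0=p[0])

# Filtering the plain rate yields a lagged and attenuated copy.
without_lookahead = low_pass(p[:-1], TAU, DT, s0=p[0])

err_lookahead = max(abs(a - b) for a, b in zip(with_lookahead, p))
err_plain = max(abs(a - b) for a, b in zip(without_lookahead, p))
print(err_lookahead, err_plain)
```

      In this toy setting the lookahead term exactly cancels the delay introduced by the filter, whereas the plain rate arrives late. That is the sense in which a quantity defined in terms of the near future allows a response in real time.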

      (3) Understanding the paper requires background knowledge that most readers of eLife are unlikely to have, even if they are mathematically minded. For example, I am from the field of computational neuroscience, and I have never heard about Least Action principle from physics or the Euler-Lagrange equation. I felt lost after reading this paper, and to be able to write this review I needed to watch videos on the Euler-Lagrange equation. To help other readers, I have two suggestions: First, I feel that Eq 4-6 could be moved to the methods, because I found the concept of u~ difficult to understand, and it does not appear in the algorithm. Second, I advise to write in the Introduction, what knowledge is required to follow this paper, and point the readers to resources where they can find the required information. The authors may specify what background is required to follow the main text, and what is required to understand the methods.

      We hope that after explaining the rationale better, it becomes clear that we cannot skip the equations for the prospective coordinates. Likewise, the Euler-Lagrange equations need to be presented in the abstract form, since these are the equations that are eventually transformed into the “model”. We tried to give the basic intuition for this in the main text. As we explained above, the equations we were asked to skip represent the essence of the proposal, which is about how to derive the model equations.

      Moreover, we give more explanations in the Methods to help readers follow the derivations, and we refer to the specific sections in the SI for further details. We are aware that a full understanding of the theory requires some basic knowledge of the calculus of variations.

      We hesitate to state in the Introduction what type of knowledge is required to understand the paper. Understanding can occur on various levels, and which materials are helpful depends on the reader's background: for some it is a YouTube video, for some Wikipedia, and for others a textbook from which specific ingredients can be extracted. We do, however, cite two textbooks in the Results and more in the SI, Sect. F, including weblinks, when referring to the principle of least action in physics and mathematics.

      Minor comments

      Eq.3: The Authors refer to this equation as a Lagrangian. Could you please clarify why? Is the logic to minimize the energy subject to a constraint that Cost = 0?

      Thanks for asking. The cost is not really a constraint, it is globally minimized, in parallel steps. We are explaining this right after Eq. 3. “We `prospectively' minimize L locally across a voltage trajectory, so that, as a consequence, the local synaptic plasticity for W will globally reduce the cost along the trajectory (Theorem 1 below).”

      We added two sentences that explain why the function in Eq. 3 is called a Lagrangian: “While in classical energy-based approaches L is called the total energy, we call it the `Lagrangian' because it will be integrated along real and virtual voltage trajectories as done in variational calculus (leading to the Euler-Lagrange equations, see below and SI, Sect. F)”

      p.4, below Eq. 5 - Please explain the rationale behind NLA, i.e. why is it beneficial that "the trajectory u˜(t) keeps the action A stationary with respect to small variations δu˜"? I guess you wish to minimize L integrated over time, but this is not evident from the text.

      Hmm, yes and no. We wish to minimize the cost, and on the way there minimize the action. Since the global minimization of C is technically difficult, one looks for a stationary trajectory as defined in the cited sentence, while minimizing L with respect to W, to eventually minimize the cost.

      In the text we now explain after Eq. 5:

      “The motivation to search for a trajectory that keeps the action stationary is borrowed from physics. The motivation to search for a stationary trajectory by varying the near-future voltages ũ instead of u is assigned to the evolutionary pressure in biology to 'think ahead of time'. To not react too late, internal delays involved in the integration of external feedback need to be considered and eventually need to be overcome. In fact, only for the 'prospective coordinates' defined by looking ahead into the future, even when only virtually, will a real-time learning from feedback errors become possible (as expressed by our Theorems below).”

      Bottom of page 8. The authors say that in the case of single equilibrium and strong nudging the model reduced to the Least Control Principle. Does it also reduce to Predictive coding for supervised learning? If so, it would be helpful to state so.

      Yes, in this case the prediction error in the apical dendrite becomes the one of predictive coding. We are stating this now right at the end of the cited sentence:

      “In the case of strong nudging and a single steady-state equilibrium, the NLA principle reduces to the Least-Control Principle (Meulemans et al., 2022) that minimizes the mismatch energy E^M for a constant input and a constant target, with the apical prediction error becoming the prediction error from standard predictive coding (Rao & Ballard, 1999).”

      In the Discussion we also added a further point (iv) to compare the NLA principle with predictive coding. Both “improve” the sensory representation, but the NLA does so in favor of an output, and predictive coding in favor of the sensory prediction itself (see Discussion).

      Whenever you refer to supplementary materials, please specify the section, so it is easier for the reader to find it.

      Done. Sorry not to have done it earlier. We now also indicate specific sections when referring to the Methods.

      Reviewer #2 (Recommendations For The Authors):

      There are no major issues with this article, but I have several considerations that I think would greatly improve the impact, clarity, and validity of the claims.

      (1) Unifying the narrative. There are many many ideas put forward in what feels like a deluge. While I appreciate the enthusiasm, as a reader I found it hard to understand what it was that the authors thought was the main breakthrough. For instance, the abstract, results, introduction, and discussion all seem to provide different answers to that question. The abstract seems to focus on the motor error idea. The introduction seems to focus on the novel prospective+predictive setup of the energy function. The discussion lists the different perks of the theory (delay compensation, moving equilibrium, microcircuit) without referring to the prospective+predictive setup of the energy function.

      Thanks very much for these helpful hints. Yes, the paper became an agglomerate of many ideas, also owing to the fact that we wish to show how the NLA principle can be applied to explain various phenomena in neuroscience. We have now simplified the narrative to this one point of providing a novel theoretical framework for neuroscience, and explaining why this is novel and why it “suddenly works” (the prospective minimization of the energy).

      As you can see from the dominating red in the revised pdf, we did fully rewrite Abstract, Introduction and Discussion under the narrative of the NLA and prospective coding.

      (2) Laying out the organization of the notation clearly. There are quite a few subtle distinctions of what is meant by the different weight matrices (omnibus matrix then input vs recurrent then layered architecture), different temporal horizon formalisms (bar, not bar, tilde), different operators (L, curly L, derivative version, integral version). These different levels are introduced on the fly, which makes it harder to grasp. The fact that there are many duplicate notations for the same quantities does not help the reader. For instance u_0 becomes equal to u_N at one point (above Eq 25). Another example is the constant flipping between integrated and 'current input' pictures. So laying out the multiple layers early, making a table or a figure for the notation, or sticking with one level would help convey the idea to a wide readership.

      Thanks for the hints. We included the table you suggested, but put it in the SI as it became a full page by itself. We removed the curly L that abbreviated the look-ahead operator.

      The “change of notation” you are alluding to is tricky, though. In a recurrent layer, the index of the output neuron is called o. In a forward network with N layers, the index of the output neurons becomes the last layer N. One has to introduce the layer index l anyway for the deeper layers l < N, and we found it more consistent to explain that, when switching from the recurrent to the forward network, the voltage of the output layer now becomes u_o = u_N. There are more of these examples, like the weight matrix W splitting into an intrinsic network part W_net across which errors backpropagate, and a part conveying the input, W_in, that has to be excluded when writing the backpropagation formula for general networks. Again, in the case of the feedforward networks, the notation reduces to W_l, with index l coding for the layer. Presenting the general approach and a specific example may appear as if we duplicate notations – we haven’t found a solution here.

      (3) Separate the algorithm from the implementation level. I particularly struggled with separating the ideas that belonged to the algorithm level (cost function, optimization objectives) and the biophysics. The two are interwoven in a way that does not have to be. Particularly, some of the normative elements may be implemented by other types of biophysics than the authors have in mind. It is for this reason that I think that separating more clearly what belongs to the implementation and algorithm levels would help make the ideas more widely understood. On this point, a trigger point for me was the definition of the 'prospective input rates' e_i, which comes in the second paragraph.

      We are very sorry to have made you think that the 'prospective input rates' would be e_i. The prospective input rates are r_i. The misunderstanding likely arose from an unclear formulation on our side that is now corrected (see first and second paragraph of the Results where we introduce r_i and e_i).

      From a biophysical perspective, it is quite arbitrary to define the input to be the difference between the basal input and the somatic (prospective) potential. It sounds like it comes from some unclear normative picture at this point. But the authors seem to have in mind to use the fact that the somatic potential is the sum of apical and basal input, that's the biophysical picture.

      We hope to have disentangled the normative and biophysical view in the 2nd and 3rd paragraph of the Results, respectively. We introduce the prospective error e_i as an abstract notion in the first paragraph, while explaining that it will be interpreted as the somato-dendritic mismatch error in neuron i in the next paragraph. The second paragraph contains the biophysical details with the apical and basal morphology.

      (4) Experts and non-experts would appreciate an explanation of why/how the choice of state variables matters in the NLA. The prospective coding state variables cannot be said to be the naïve guess. Why does the simple u, dot{u} not work as state variables applied on the same energy function, as would be a naïve application of the Lagrangian ideas?

      We are very glad for this hint to present an intuition behind the variation of the action with respect to a prospective state, instead of the state itself. The simple L(u, dot{u}) does not work because one does not obtain the first-order voltage dynamics compatible with the biophysics. We made an effort to explain the intuition to non-experts and experts in an additional paragraph right after presenting the voltage and error dynamics (Eq. 7 on page 4).

      Here is how the paragraph starts (not displaying the formulas here):

      “From the point of view of theoretical physics, where the laws of motion derived from the least-action principle contain an acceleration term (as in Newton's law of motion, like … for a harmonic oscillator), one may wonder why no second-order time derivative appears in the NLA dynamics. As an intuitive example, consider driving into a bend. Looking ahead in time helps us to reduce the lateral acceleration by braking early enough, as opposed to braking only when the lateral acceleration is already present. This intuition is captured by minimizing the neuronal action A with respect to the discounted future voltages ũi instead of the instantaneous voltages ui.

      Keeping up an internal equilibrium in the presence of a changing environment requires to look ahead and compensate early for the predicted perturbations.

      Technically, …”

      More details are given in the Methods after Eq. 20. Moreover, in the last part of the SI, Sect. F, we have made the link to the least-action principle in physics more explicit. There we show how the voltage dynamics can be derived from the physical least-action principle by including the Rayleigh dissipation (Eq. 92 and 95).
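
      As a point of orientation for the dissipation argument, the textbook damped oscillator can be sketched as follows. The symbols q, m, k and c and the functions L_phys and R are illustrative assumptions taken from classical mechanics; they are not the manuscript's notation and do not reproduce its Eq. 92 or 95.

```latex
% Euler-Lagrange equation extended by a Rayleigh dissipation function R
\frac{d}{dt}\frac{\partial L_{\mathrm{phys}}}{\partial \dot q}
  - \frac{\partial L_{\mathrm{phys}}}{\partial q}
  + \frac{\partial R}{\partial \dot q} = 0,
\qquad
L_{\mathrm{phys}} = \tfrac{1}{2} m \dot q^{2} - \tfrac{1}{2} k q^{2},
\qquad
R = \tfrac{1}{2} c\, \dot q^{2}

% Inserting L_phys and R gives the damped harmonic oscillator
m \ddot q + c\, \dot q + k q = 0

% In the overdamped limit (m -> 0) the acceleration term drops out,
% leaving first-order relaxation dynamics
c\, \dot q = -k q
```

      In this standard example the second-order term disappears only in the overdamped limit, which is one way to see why a first-order voltage dynamics calls for a dissipation term or a different choice of coordinates.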

      (5) Specify that the learning rules have not been observed. Though the learning rules are Hebbian, the details of the rules have not to my knowledge been observed. Would be worth mentioning as this is a sticking point of most related theories.

      We agree, and we do now explicitly write in the Discussion that the learning rule still awaits to be experimentally tested.

      (6) Some relevant literature. Chalk et al. PNAS (2018) have explored the relationship between temporal predictive coding and Rao & Ballard predictive coding based on the parameters of the cost function. Harkin et al. eLife (2023) have shown that 'prospective coding' also takes place in the serotonergic system, while Kim ... Ma (2021) have put forward similar ideas for dopamine, both may participate in setting the cost function. Instantaneous voltage propagation is also a focus of Greedy et al. (2023). The authors cite Zenke et al. for spiking error propagation, but there are biological references to that end.

      Thanks very much for these hints. We now cite the book of Gerstner & Kistler on spiking neurons, and more specifically the spike-based approach for learning to represent signals (Brendel, .., Machens, Denève, PLoS CB, 2020). Otherwise, we had difficulty incorporating the other literature, which seems to us not directly related to our approach, even when related notions come up (like predictive coding and temporal processing in Chalk et al. (2018), where the coding efficiency of various temporal coding schemes is studied as a function of the signal-to-noise ratio, or the apical activities in Greedy et al. (2022), where bursting, multiplexing and synaptic facilitation arise). We felt that citing these papers as well would confuse more than help (we already cite 95 papers).

      (7) In the main text, theorem two is presented as proof without assumptions on the level of nudging, but the actual proof uses strong assumptions in that respect, relying on numerical ad hoc observations for the general case.

      Thanks for pointing this out. We agree it is better style to state all the critical assumptions in the Theorem itself, rather than deferring them to the Methods. We now state: “Then, for suitable top-down nudging, learning rates, and initial conditions, the ….weights …evolve such that…”.

      (8) In the discussion regarding error-backpropagation, it seems to me that it could be clarified that the current algorithm asks for a weight alignment between FF and FB matrices as well as between FB and interneuron circuit matrices. Whether all of these matrices can be learned together remains to be shown; neither Akrout, Kunin nor Max et al. have shown this explicitly. Particularly when there are other inputs to the apical dendrites from other areas.

      Yes, it is difficult to learn to align all of them in parallel. Nevertheless, our simulations do in fact align the lateral and vertical circuits, as is also claimed in Theorem 2. Yet this holds, as specified in the theorem, only “for suitable learning rates” (which were all the same, but were commonly reduced after some training time, as previously explained in the Methods, Details for Fig. 5).

      In the Discussion we now emphasize that, in general, simulating all the circuitries jointly from scratch in a single phase is tricky. We write:

      “A fundamental difficulty arises when the neuronal implementation of the Euler-Lagrange equations requires an additional microcircuit with its own dynamics. This is the case for the suggested microcircuit extracting the local errors. Formally, the representation of the apical feedback errors first needs to be learned before the errors can teach the feedforward synapses on the basal dendrites. We showed that this error learning can itself be formulated as minimizing an apical mismatch energy. What the lateral feedback through interneurons cannot explain away from the top-down feedback remains as apical prediction error.

      Ideally, while the network synapses targetting the basal tree are performing gradient descent on the global cost, the microcircuit synapses involved in the lateral feedback are performing gradient descent on local error functions, both at any moment in time.

      The simulations show that this intertwined system can in fact learn simultaneously with a common learning rate that is properly tuned. The cortical model network of inter- and pyramidal neurons learned to classify handwritten digits on the fly, with 10 digit samples presented per second. Yet, the overall learning is more robust if the error learning in the apical dendrites operates in phases without output teaching but with corresponding sensory activity, as may arise during sleep (see e.g. Deperrois et al., 2022 and 2023).”

      (9) The short-term depression model is assuming a slow type of short-term depression, not the fast types that are the focus of much recent experimental literature (like Campagnola et al. Science 2022).

      This assumption should be specified.

      Thanks for pointing us to this literature, which we were not aware of. We now cite the release-independent plasticity (Campagnola et al. 2022) in the context of our synaptic depression model.

      (10) There seems to be a small notation issue: Eq 21 combines vectors of the size of the full network (bar{e}) and the size of the readout network (bar{e}star).

      Well, for notational convenience we set the target error to e*=0 for non-output neurons. This way we can write the total error for an arbitrary network neuron as the sum of the backpropagated error plus the putative target error (if the neuron is an output neuron). Otherwise we would always have to distinguish between network neurons that may be output neurons and those that are not. We did say this in the main text, but now repeat it right after Eq. 21. Notations are often the result of a trade-off.
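
      The zero-padding convention described above can be sketched in a few lines. This is a minimal illustration, not the manuscript's Eq. 21: the function name `total_error`, its arguments, and the example numbers are all hypothetical, and the backpropagated error is simply taken as given.

```python
from typing import List, Sequence


def total_error(
    backprop_error: Sequence[float],
    target_error: Sequence[float],
    output_indices: Sequence[int],
) -> List[float]:
    """Sum a network-sized backpropagated error and an output-sized target error.

    The target error is embedded into a network-sized vector that is zero
    for every non-output neuron (the e* = 0 convention), so both terms can
    be added without treating output neurons as a special case.
    """
    if len(target_error) != len(output_indices):
        raise ValueError("one target error per output neuron is required")

    # Network-sized target error: zero everywhere except at output neurons.
    padded_target = [0.0] * len(backprop_error)
    for idx, err in zip(output_indices, target_error):
        padded_target[idx] = err

    return [b + t for b, t in zip(backprop_error, padded_target)]


# Hypothetical three-neuron network in which only neuron 2 is an output neuron.
example = total_error([0.5, 0.25, -0.5], [1.0], [2])
print(example)
```

      With this convention, neurons 0 and 1 keep their backpropagated error unchanged, and only the output neuron receives the additional target term.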

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript presents a compelling model to explain the impact of mosaicism in preimplantation genetic testing for aneuploidies.

      Strengths:

      A new view of mosaicism is presented with a computational model, that brings new insights into an "old" debate in our field. It is a very well-written manuscript.

      Weaknesses:

      Although the manuscript is very well written, this is in a way that assumes that the reader has existing knowledge about specific terms and topics. This was apparent through a lack of definitions and minimal background/context to the aims and conclusions for some of the author's findings.

      There is a need for some examples to connect real evidence and scenarios from clinical reports with the model.

      We thank the reviewer for their assessment. Some background was condensed for space, and we wrote the manuscript to be understood by readers with existing reproductive genetics background. We will add more detail and explain terminology more clearly. There are a number of published case studies that can link real-life clinical data with the model’s findings. We will include a summary of them in the text.

      Reviewer #2 (Public Review):

      Summary:

      Although an oversimplification of the biological complexities, this modeling work does add, in a limited way, to the current knowledge on the theoretical difficulties of detecting mosaicism in human blastocysts from a single trophectoderm biopsy in PGT. However, many of the premises that the modeling was built on are theoretical and based on unproven biological and clinical assumptions that could yet lead to be untrue. Therefore, the work should be considered only as a simplified model that could assist in further understanding of the complexities of preimplantation embryo mosaicism, but assumptions of real-world application are, at this stage, premature and should not be considered as evidence in favour of any clinical strategies.

      Strengths:

      The work has presented an intriguing theoretical model for elaborating on the interpretation of complex and still unclear biological phenomena such as chromosomal mosaicism in preimplantation embryos.

      We thank the reviewer for this detailed review, and for seeing the value of theoretical modelling. We agree that this model makes simplifications; we took this simplified approach to focus on the core contradiction between clinical experience and previous modelling. Expanding the model to consider additional aspects of balanced mitotic nondisjunctions and technical accuracy is something we want to address; we are discussing whether this can practically be added to this manuscript, or whether it will involve enough work that it should be developed as a further study.

      Weaknesses:

      Lines 134-138: The spatial modeling of mitotic errors in the embryo was oversimplified in this manuscript. There is only limited (and non-comprehensive) evidence that meiotic errors leading to chromosome mosaicism arise from chromosome loss or gain only (e.g. anaphase lag). This work did not take into account the (more recognised) possibility of mitotic nondisjunction where following the event there would be clones of cells with either one more or one less of the same chromosome. Although addressed in the discussion (lines 572-574), not including this in the most basic of modeling is a significant oversight that, based on the simple likelihood, could significantly affect results.

      As above, we certainly plan to address this in future modelling; developing the model to account for this while also incorporating the issue of technical uncertainty in the state of each cell in the biopsy from sequencing.

      General comment: the premise of the manuscript is that an embryologist (embryology laboratory) is aware of and can accurately quantify the number of cells in a blastocyst or TE biopsy. The reality is that it is not possible to accurately do this without the destruction of the sample which is obviously not clinically applicable. Based on many assumptions the findings show that taking small biopsies poorly classifies mosaic embryos, which is not disputed. However, extrapolating this to the clinic and making suggestions to biopsy a certain amount of cells (lines 539-540) is careless and potentially harmful by suggesting the introduction of potential change in clinical practice without validation. Additionally, no embryologist in the field can tell how many cells are present in a clinical TE biopsy, making this suggestion even more impractical.

      We will revise this to make the technical limitations of clinical TE biopsies clearer.

      On a more general clinical consideration, the authors should acknowledge that when reporting findings of unproven clinical utility and unknown predictive values this inevitably results in negative consequences for infertile couples undergoing IVF. It is proven and established that when couples face the decision on how to manage a putative mosaicism finding, the vast majority decide on embryo disposal. It was recently reported in an ESHRE survey that about 75% of practitioners in the field consider discarding or donating to research embryos with reported mosaicism. A prospective clinical trial showed that about 30% live birth rate reduction can be expected if mosaic embryos are not considered (Capalbo et al., AJHG 2021). The real-world experience is that when mosaicism is reported, embryos with almost normal reproductive potential are discarded. The authors should be more careful with the clinical interpretation and translation of these theoretical findings.

      The clinical potential of mosaic embryos is much more nuanced than a simple ‘they should be discarded’ or ‘they should be treated like euploid embryos’. While the study mentioned by the reviewer (Capalbo et al., AJHG 2021) does indeed suggest that embryos with putative low level mosaicism have good potential, it also suggests that embryos with putative high level mosaicism are largely to be considered aneuploid and should therefore be discarded. Therefore, even the mentioned study supports a ‘ranking’ of embryos by their mosaic result. Furthermore, large controlled retrospective studies have indicated that even high level mosaic embryos have reproductive potential (Viotti Fertility & Sterility 2021 and Viotti F&S 2023). Recent case reports have shown that mosaicism can occasionally persist from embryo to late gestation and even birth, at times associating with negative medical findings. Therefore, while the true clinical potential of embryos classified as mosaic is still being defined, here we are merely suggesting that from a modelling standpoint, the features of mosaicism detected with PGT-A can help guide clinical decisions (complementing the observations reported in the clinical studies).

      There is a robust consensus within the field of clinical genetics and genomics regarding the necessity to exclusively report findings that possess well-established clinical validity and utility. This consensus is grounded in the imperative to mitigate misinterpretation and ineffective actions in patient care. However, the clinical framework delineated in this manuscript diverges from the prevailing consensus in clinical genetics. Clinical genetics and genomics prioritize the dissemination of findings that have undergone rigorous validation processes and have demonstrated clear clinical relevance and utility. This emphasis is crucial for ensuring accurate diagnosis, prognosis, and therapeutic decision-making in patient care. By adhering to established standards of evidence and clinical utility, healthcare providers can minimize the potential for misinterpretation and inappropriate interventions. The framework proposed in this manuscript appears to deviate from the established principles guiding clinical genetics practice. It is imperative for clinical frameworks to align closely with the consensus guidelines and recommendations set forth by professional organizations and regulatory bodies in the field. This alignment not only upholds the integrity and reliability of genetic testing and interpretation but also safeguards patient well-being and clinical outcomes.

      References:

      ACMG Board of Directors. (2015). Clinical utility of genetic and genomic services: a position statement of the American College of Medical Genetics and Genomics. Genetics in Medicine, 17(6), 505-507. https://doi.org/10.1038/gim.2014.194.

      Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., ... ACMG Laboratory Quality Assurance Committee. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine, 17(5), 405-424. https://doi.org/10.1038/gim.2015.30

      We will update where necessary to match these references.

      Line 61: "Self correction" - This terminology is unfortunately indiscriminately used in the field for PGT when referring to mosaicism and implies that the embryo can actively correct itself from a state of inherent abnormality. Apart from there being no evidence to suggest that there is an active process by which the embryo itself can correct chromosomal errors, most presumed euploid/aneuploid mosaic embryos will have been euploid zygotes and therefore "self-harm" may be a better explanation. True self-correction in the form of meiotic trisomy/monosomy rescue is of course theoretically possible but not at all clinically significant. The concept being conveyed in this part of the manuscript is not disputed but it is strongly suggested that the term "self correction" is not used in this context, nor in the rest of the manuscript, to prevent the perpetuation of misinformation in the field and instead use a better description.

      This is a good point. We have used ‘self correction’ as a shorthand, but the reality is more nuanced. It will often be a passive process in which aneuploid cell lineages fail to proliferate over time (‘aneuploidy depletion’). The idea of ‘self harm’ is interesting; aneuploidy arising from a healthy euploid embryo. We can also see a further situation where the gametes suffered damage (e.g. DNA fragmentation, unresolved crossovers, persistence of meiotic breaks) leading to mitotic errors. In that case, the embryo would suffer the consequences of harm in the gametes, and ‘aneuploidy rescue’ may be a useful term also. We will discuss this further and reword the terminology along these lines.

      Lines 69-73: The ability to quantify aneuploidy in known admixtures of aneuploid cells is indeed well established. However, the authors claim that the translation of this to embryo biopsy samples is inferred with some confidence and that if a biopsy shows an intermediate chromosome copy number (ICN), that the biopsy and the embryo are mosaic. There are no references provided here and indeed the only evidence in the literature relating to this is to the contrary. Multifocal biopsy studies have shown that an ICN result in a single biopsy is often not seen in other biopsies from the same embryo (Capalbo et al 2021; Kim et al., 2022; Girardi et al., 2023; Marin, Xu, and Treff 2021). Multifocal biopsies showing reciprocal gain and loss which would provide stronger validation for the presence of true mosaicism are also rare. In this work, the entire manuscript is based on the accuracy of ICN in a biopsy being reflective of mosaicism in the embryo. The evidence however points to a large proportion of ICN detected in embryo biopsy potentially being technical artifacts (misdiagnosing both constitutionally normal and abnormal (meiotic aneuploid) embryos as mosaic). Therefore, although results from the modelling provide insight into theoretical results, these cannot be used to inform clinical decision-making at all.

      We thank the reviewer for raising this important conceptual point, which needs to be addressed. The fact that mosaicism is often not observed in serial biopsies of the same embryo is precisely an inherent feature of mosaicism and is an invalid argument to discount the original diagnosis as false. The detection of ICN is not trivial and certain PGT-A platforms might not have the capability to discern noise from true ICN, hence the need for proper validation of the technology. The most stringent validation method for mosaicism detection remains the admixture experiment, such that when ICN patterns are detected the most obvious conclusion is that the biopsy contained a mosaic mix of cells. We aim to add wording regarding these points in the manuscript.

      Lines 87-89: The authors make the claim that emerging evidence is suggestive that the majority of embryos are mosaic to some degree. If in fact, mosaicism is the norm, the clinical importance may be limited.

      If the majority of embryos are mosaic to some degree, it is important to understand the impacts that this may have on PGT-A biopsies and how informative such biopsies may be. Returning to the point the reviewer made above about mitotic aneuploidies as an important consideration: a mitotic nondisjunction at the first cleavage would result in an embryo that was entirely aneuploid. A mitotic nondisjunction occurring at the second cleavage would result in an embryo with 50% aneuploid cells, and at the third cleavage, 25% aneuploid cells. If these aneuploid cells fail to proliferate, or are removed (either actively or passively), the level of aneuploidy will fall over time. While mosaicism is binary (an embryo is or is not a mosaic of karyotypes), even if most embryos are mosaic, the clinical importance will depend on the level of aneuploidy.
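
      The percentages above follow from simple counting, which can be sketched as below. The sketch assumes a single nondisjunction event in one cell, synchronous divisions, and no subsequent loss or slower proliferation of aneuploid cells; the function name `aneuploid_fraction` is hypothetical and not part of the manuscript's model.

```python
from fractions import Fraction


def aneuploid_fraction(cleavage: int) -> Fraction:
    """Fraction of aneuploid cells right after one nondisjunction.

    `cleavage` is 1-based: 1 is the division of the zygote. A nondisjunction
    in one dividing cell yields two aneuploid daughters (one gain, one loss),
    while the embryo holds 2**cleavage cells once that division is complete.
    """
    if cleavage < 1:
        raise ValueError("cleavage must be 1 or greater")
    cells_after_division = 2 ** cleavage
    return Fraction(2, cells_after_division)


for k in (1, 2, 3):
    print(f"cleavage {k}: {float(aneuploid_fraction(k)):.0%} aneuploid")
```

      Under these assumptions the fraction halves with every later cleavage, and any depletion of the aneuploid lineage would lower it further.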

      Line 102-103: The statement that data shows that the live birth rate per ET is generally lower in mosaic embryos than euploid embryos is from retrospective cohort studies that suffer from significant selection bias. The authors have ignored non-selection study results (Capalbo et al., AJHG 2021) that suggest that putative mosaicism has limited predictive value when assessed prospectively and blinded.

      We will add the referenced multifocal biopsy study, but in contrast to the reviewer we see the data it contains as supporting our position in this paper. Capalbo et al. performed rebiopsies of trophectoderm and a biopsy of inner cell mass and found that high level mosaic or aneuploid trophectoderm tended to correlate with abnormal karyotypes in the inner cell mass while low level mosaics correlated with a normal inner cell mass. This supports our point that measuring levels of aneuploidy in the trophectoderm is relevant, and that this gives useful information for ranking embryos.

      Lines 94-98: The authors have misrepresented the works they have presented as evidence for biopsy result accuracy (Kim et al., 2023; Victor et al 2019; Capalbo et al., 2021; Girardi et al., 2023, and any others). These studies show that a mosaic biopsy is not representative of the whole embryo and can actually be from embryos where the remainder of the embryo shows no evidence of mosaicism. There is also a missing key reference of Capalbo et al, AJHG 2021, and Girardi et al., HR 2023 where multifocal biopsies were taken.

      As above, we will add more information on these multifocal biopsy studies; we believe these studies also support our position: that individual biopsies are not predictive of aneuploidy level in an embryo. If mosaicism is detected in the biopsy, then the embryo is mosaic, but if the remainder of the embryo is euploid then that single biopsy was not an accurate representation of the embryo. This could also apply in reverse - if mosaicism is not detected in the biopsy, it does not mean there is no mosaicism in the embryo, only that mosaicism could not be identified.
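
      The asymmetry described above (a negative biopsy does not rule out mosaicism) can be illustrated with a simple sampling calculation. This is only a sketch under a strong assumption that the manuscript's spatial model does not make: biopsy cells are drawn uniformly at random, with no clustering of aneuploid cells. The function name `prob_biopsy_detects` and the cell counts are hypothetical.

```python
from math import comb


def prob_biopsy_detects(total_cells: int, aneuploid_cells: int, biopsy_size: int) -> float:
    """Probability that a random biopsy contains at least one aneuploid cell.

    Uses the hypergeometric distribution: the chance of drawing only
    euploid cells is C(euploid, biopsy_size) / C(total_cells, biopsy_size).
    """
    if not 0 <= aneuploid_cells <= total_cells:
        raise ValueError("aneuploid_cells must lie between 0 and total_cells")
    if not 0 <= biopsy_size <= total_cells:
        raise ValueError("biopsy_size must lie between 0 and total_cells")

    euploid_cells = total_cells - aneuploid_cells
    # comb returns 0 when biopsy_size exceeds euploid_cells, so the
    # "all euploid" outcome correctly becomes impossible in that case.
    p_all_euploid = comb(euploid_cells, biopsy_size) / comb(total_cells, biopsy_size)
    return 1.0 - p_all_euploid


# Hypothetical 200-cell embryo and 5-cell biopsy at several aneuploidy levels.
for aneuploid in (10, 50, 100):
    p = prob_biopsy_detects(200, aneuploid, 5)
    print(f"{aneuploid / 200:.0%} aneuploid: detection probability {p:.2f}")
```

      Even under random sampling, low-level mosaicism is missed in a sizeable share of biopsies; spatial clustering of aneuploid cells would change these probabilities.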

      Lines 371-372: "Selecting the embryo with the lowest number of aneuploid cells in the biopsy for transfer is still the most sensible decision". Where is the evidence for this other than the modeling which is affected by oversimplification and unproven assumptions? Although the statement seems logical at face value, there is no concrete evidence that the proportion of aneuploid cells within a biopsy is valuable for clinical outcomes, especially when co-evaluated with other more relevant clinical information.

      We made this statement as part of a thought experiment to explain the difference between the concepts of absolute measurements versus embryo ranking. This section is not a result of the model, or clinical advice; it is a statement that in the specific example embryos given, the embryo with the fewest aneuploid cells in the biopsy would still be the embryo with the fewest aneuploid cells overall, and thus transferring this embryo (in the absence of any other differences of embryo quality) would remain sensible.

      Lines 431-463: In this section, the authors discuss clinical outcome data from the transfer of putative mosaic embryos and make conclusions about the relationship between ICN level in biopsy and successful pregnancy outcomes. The retrospective and selective nature of the data used in forming the results has the potential to lead to incorrect conclusions when applied to prospective unselected data.

      We believe the clinical data is a useful biological reality check, and we are discussing how to integrate it better with the modelling.

      Reviewer #3 (Public Review):

      Unfortunately, this study fails to incorporate the most important variable impacting the ability to predict mosaicism, the accuracy of the test. The fact is that most embryos diagnosed as mosaic are not mosaic. There may be 4 cases out of thousands and thousands of transfers where a confirmation was made. Mosaicism has become a category of diagnosis in which embryos with noisy NGS profiles are placed. With VeriSeq NGS it is not possible to routinely distinguish true mosaicism from noise. An analysis of NGS noise levels (MAPD) versus the rate of mosaics by clinic using the registry will likely demonstrate this is the case. Without accounting for the considerable inaccuracy of the method of testing the proposed modeling is meaningless.

      We disagree with the reviewer that the modelling is meaningless; we disagree that mosaicism is rare (see our other points). However, if we grant that mosaicism is rare, that almost all embryos are euploid or aneuploid, and that technical noise is the primary factor generating intermediate copy number values, then it is still important to understand how to interpret such intermediate values. Low-level mosaics would more likely represent miscalled euploid embryos, and high-level mosaics would more likely represent miscalled aneuploid embryos. We demonstrate that ranking on these intermediate values correlates with implantation rates and live birth rates, supporting their use. We do agree that technical accuracy of the NGS is an important consideration, and we will be incorporating this into our modelling in the future.

      Recent data using more accurate methods of identifying mosaicism indicate that the prevalence of true preimplantation embryonic mosaicism is only 2%, which is also consistent with findings made post-implantation. This model fails to account for the possibility that, because so few embryos are actually mosaic, there is actually no relevance to clinical care whatsoever. In fact, differences in clinical outcomes of embryos designated as mosaic could be entirely attributed to poor embryo quality resulting in noise levels that make NGS results fall into the "mosaic" category.

      As we also wrote in the point above, we disagree; it is possible that a euploid embryo may be misinterpreted as a mosaic. It is also possible that an aneuploid embryo is misinterpreted as a mosaic. Whether the intermediate copy number values arise through biological or technical reasons, they contain information that is useful to decisions on whether to transfer. We also note a recent paper that performed single-cell dissociation of trophectoderm versus inner cell mass which found that mosaicism in human embryos is very common (Chavli et al, 2024, DOI:10.1172/JCI174483).

      Additional comments:

      “Indeed, as more data emerges, it appears that the majority of embryos from both healthy and infertile couples are mosaic to some degree (Coticchio et al., 2021; Griffin et al., 2022).”

      This statement should be softened as all embryos will be considered mosaic when a method with a 10% false positive rate is applied to 10 more parts of the same embryo. The distinction between artifact and true mosaicism cannot be made with nearly all current methods of testing. When virtually no embryos display uniform aneuploidy in a rebiopsy study, there should be great concern over the accuracy of the testing used. The vast majority of aneuploidy is meiotic in origin.
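
      The arithmetic behind this comment can be sketched as follows. This is only an illustration: it assumes each biopsy is an independent test with the same 10% false positive rate, and the function name is hypothetical rather than taken from the manuscript or the review.

```python
def prob_any_false_positive(per_test_rate: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests is a false positive."""
    return 1.0 - (1.0 - per_test_rate) ** n_tests

# A truly uniform embryo sampled 1, 5, 10 and 11 times,
# with an assumed 10% false positive rate per biopsy.
for n in (1, 5, 10, 11):
    print(n, round(prob_any_false_positive(0.10, n), 3))
```

      Under these assumptions the chance of at least one spurious "mosaic" call rises steeply with the number of biopsies taken from the same embryo.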

      We note that reviewer 2 wrote that mitotic aneuploidy was the key concern, whereas reviewer 3 states meiotic aneuploidy is more common; we argue that both are relevant; a recent study by McCoy et al, 2023 (DOI:10.1186/s13073-023-01231-1) found that both drive arrest of human IVF embryos.

      “Experimental data provides strong evidence that, for the most part, the biopsy result obtained accurately represents the chromosome constitution of the rest of the embryo (Kim et al., 2022; Navratil et al., 2020; Victor et al., 2019).”

      This statement is incorrect given published systematic review of the literature indicates a 10% false positive rate based on rebiopsy results.

      This shows that accurately classifying a mosaic embryo based on a single biopsy is not robust.

      This is exactly why the practice of designating embryo mosaics with intermediate copy numbers should not exist.

      We agree that accurately classifying a mosaic embryo based on a single biopsy is not robust. That is one of the main messages of this paper. What we show here is that biopsies from a mosaic embryo are indeed likely to disagree with each other - but we find that there is still enough information at a population level for this to be an indicator of embryo outcomes. We have not yet performed modelling to explore the effect of technical error, so we will not speculate on the impact, but we reiterate a point made earlier: the most stringent validation method for mosaicism detection remains the admixture experiment, such that when intermediate copy number patterns are detected the most obvious conclusion is that the biopsy contained a mosaic mix of cells.
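
      The point that single biopsies disagree while the population-level signal survives can be illustrated with a minimal sampling sketch. It is not the authors' model: it assumes a well-mixed embryo, so a biopsy is a simple random sample without replacement, and the cell number, aneuploid fraction, biopsy size and function name are all hypothetical choices made for illustration.

```python
import random

def simulate_biopsies(n_cells=200, aneuploid_fraction=0.3, biopsy_size=5,
                      n_biopsies=10000, seed=0):
    """Draw repeated biopsies from one embryo; return the aneuploid count per biopsy.

    Assumption: cells are well mixed, so each biopsy is a random sample
    without replacement. Real embryos may show clustering of aneuploid cells.
    """
    rng = random.Random(seed)
    n_aneuploid = round(n_cells * aneuploid_fraction)
    embryo = [1] * n_aneuploid + [0] * (n_cells - n_aneuploid)
    return [sum(rng.sample(embryo, biopsy_size)) for _ in range(n_biopsies)]

counts = simulate_biopsies()
# Share of biopsies showing 0, 1, ..., 5 aneuploid cells: individual biopsies vary.
distribution = {k: counts.count(k) / len(counts) for k in range(6)}
# Averaged over many biopsies, the aneuploid fraction tracks the true value.
mean_fraction = sum(counts) / (len(counts) * 5)
print(distribution)
print(round(mean_fraction, 3))
```

      In this sketch any one biopsy can land far from the true aneuploid fraction, while the average over many biopsies stays close to it. That is the distinction between absolute measurement of one embryo and ranking across a population.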

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The manuscript by Jingsong Zhou and colleagues tries to uncover the reasons for the resistance of extraocular muscles (EOMs) to degenerative changes induced by amyotrophic lateral sclerosis (ALS). The findings of the study offer valuable information that EOMs are spared in ALS because they produce protective factors for the NMJ and, more specifically, factors secreted by EOM-derived satellite cells. While most of the experimental approaches are convincing, the use of sodium butyrate (NaBu) in this study needs further investigation, as NaBu might have a variety of biological effects. Overall, this work may help develop future therapeutic interventions for patients with ALS.

      We agree with the editor that NaBu has a variety of biological effects that require further investigation. Our team has previously explored the effect of NaBu treatment on intestinal microbiota and intestinal epithelial permeability (DOI: 10.1016/j.clinthera.2016.12.014), on the mitochondrial respiratory function of NSC-34 motor neuron cell line overexpressing hSOD1G93A (DOI: 10.3390/biom12020333) and on the mitochondrial function of skeletal muscle myofibers of G93A mice (DOI: 10.3390/ijms22147412). Other research teams have also explored the role of NaBu (or HDAC inhibition) in neuronal survival and axonal transport (DOIs: 10.1073/pnas.0907935106; 10.1038/s41467-017-00911-y; 10.15252/embj.2020106177; 10.1093/hmg/ddt028).

      Since the theme of this manuscript is the transcriptomic characteristics of EOM SCs, including data on how NaBu affects cellular/molecular processes of other tissues would deviate somewhat from the theme. It would be more appropriate to develop a separate manuscript focusing on other tissues.

      We appreciate the feedback from the Editors and reviewers. We realized that our previous description of butyrate’s beneficial role might be overstated in the Abstract. We have made two changes to avoid potential overstatement of our findings: (1) We modified the Abstract to state that “the NaBu-induced transcriptomic changes resembling the patterns of EOM SCs ‘may contribute to’ (instead of ‘underlie’) the beneficial effects observed in G93A mice” (Page 1, Line 29); (2) We have edited the corresponding paragraph in the Discussion section to emphasize that the effect of NaBu treatment is multi-faceted (Page 11, Line 459-461).

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Lines 388-389. The sentence has been corrected but is still not clear. What do the authors mean by "...resulting in higher proportion of COX-deficient myofibers than other muscles"? What other muscles do they refer to?

      Other muscles refer to muscles whose stem cells remain dormant under physiological conditions (uninjured, innervated), such as EDL. We have edited the sentence accordingly. (Page 10, Line 431-432)

      In reference to the results shown in Fig. 2, 7, 8 and 9. Since the experimenters were not blinded, this should be explicitly stated in the Methods section.

      We have added the disclaimer in the current “Data analysis and statistics” section in Methods as follows: “The experimenters were not blinded to the samples in data collection and analysis.” (Page 15, Line 636)

      Figure 7C has been amended, but now the inserted ANOVA values interfere with the correct visualization of Fig. 7D. Can panel D be moved down so that it is better separated from the panels in Fig. 7C?

      Thanks for the comment and we have edited Figure 7 accordingly.

      Reviewer #4 (Recommendations For The Authors):

      The authors have revised the manuscript per the reviewer's comments in this study. While most of the concerns were addressed, a few concerns remain.

      The molecular basis of how AAV-mediated delivery of Cxcl12 improves the phenotype of satellite cells is still unclear.

      Thanks for the comment. Cxcl12 is one of the earliest discovered chemokines, and the chemotactic role of the Cxcl12-Cxcr4 axis on cells and cellular processes (such as axons) has been comprehensively investigated in different types of tissues using functional assays ranging from overexpression and protein application to inhibitor application and shRNA knockdown. A few examples are: the establishment of the correct routing trajectories of mammalian motor axons and oculomotor axons during embryonic development (DOIs: 10.1016/j.neuron.2005.08.011; 10.1167/iovs.18-25190); the regeneration of injured motor axon terminals guided by terminal Schwann cells in adult mice (DOI: 10.15252/emmm.201607257); the migration of neural crest cells to sympathetic ganglia during formation of the sympathetic nervous system in embryogenesis (DOI: 10.1523/JNEUROSCI.0892-10.2010); and the migration of myoblasts in the process of fusion into myotubes (DOIs: 10.1242/jcs.066241; 10.1111/boc.201200022; 10.1074/jbc.M706730200).

      Because so many detailed mechanistic studies already exist, our goal for this manuscript is not to identify a novel mechanism of how Cxcl12-mediated chemotaxis is achieved. Rather, we used it as one of the proof-of-concept mechanisms contributing to the resistance of EOMs against ALS and the benefits of NaBu treatment. Certainly, it is not the sole mechanism.

      To address the reviewer’s concern, we have expanded the discussion of previous studies on the chemotactic effect of Cxcl12 in the Discussion section. (Page 10, Line 435-436, Page 11, Line 445-446)

      The NaBu experiments may need additional support from other approaches. NaBu effects may not be directly related to satellite cells or muscle cells. Thus, the animal experiment results need to be carefully interpreted.

      We agree that NaBu has a variety of biological effects that require further investigation. Our team has previously explored the effect of NaBu treatment on intestinal microbiota and intestinal epithelial permeability (DOI: 10.1016/j.clinthera.2016.12.014), on the mitochondrial respiratory function of NSC-34 motor neuron cell line overexpressing hSOD1G93A (DOI: 10.3390/biom12020333) and on the mitochondrial function of skeletal muscle myofibers of G93A mice (DOI: 10.3390/ijms22147412). Other research teams have also explored the role of NaBu (or HDAC inhibition) in neuronal survival and axonal transport (DOIs: 10.1073/pnas.0907935106; 10.1038/s41467-017-00911-y; 10.15252/embj.2020106177; 10.1093/hmg/ddt028).

      Since the theme of this manuscript is the transcriptomic characteristics of EOM SCs, including data on how NaBu affects cellular/molecular processes of other tissues would deviate somewhat from the theme. It would be more appropriate to develop a separate manuscript specifically addressing the impact of NaBu on other tissues.

      We appreciate the feedback from the reviewers. We realized that our previous description of butyrate’s beneficial role might be overstated in the Abstract. In response, we have made two changes to avoid potential overstatement of our findings: (1) We modified the Abstract to state that “the NaBu-induced transcriptomic changes resembling the patterns of EOM SCs ‘may contribute to’ (instead of ‘underlie’) the beneficial effects observed in G93A mice” (Page 1, Line 29); (2) We edited the corresponding paragraph in the Discussion section to emphasize that the effect of NaBu treatment is multi-faceted (Page 11, Line 459-461).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ngo et al. report a peculiar effect where a single base mismatch (CC) can enhance the mechanical stability of a nucleosome. In previous studies, the same group used a similar state-of-the-art fluorescence-force assay to study the unwrapping dynamics of 601-DNA from the nucleosome and observed that force-induced unwrapping happens more slowly for DNA that is more bendable because of changes in sequence or chemical modification. This manuscript appears to be a sequel to this line of projects, where the effect of CC is tested. The authors confirmed that CC is the most flexible mismatch using the FRET-based cyclization assay and found that unwrapping becomes slower when CC is introduced at three different positions in the 601 sequence. The CC mismatch only affects the local unwrapping dynamics of the outer turn of nucleosomal DNA.

      Strengths:

      These results are in good agreement with the previously established correlation between DNA bendability and nucleosome mechanical stability by the same group. This well-executed, technically sound, and well-written experimental study contains novel nucleosome unwrapping data specific to the CC mismatch and 601 sequence, the cyclizability of DNA containing all base pair mismatches, and the unwrapping of 601-DNA from Xenopus and yeast histones. Overall, this work will be received with great interest by the biophysics community and is definitely worth attention.

      Weaknesses:

      The scope and impact of this study are somewhat limited due to the lack of sequence variation. Whether the conclusion from this study can be generalized to other sequences and other bendability-enhancing mismatches needs further investigation.

      Major questions:

      (1) As pointed out by the authors, the FRET signal is not sensitive to nucleosome position; therefore, the increasing unwrapping force in the presence of CC can be interpreted as the repositioning of the nucleosome upon perturbation. It is then also possible that CC-containing DNA is not positioned exactly the same as normal DNA from the start upon nucleosome assembly, leading to different unwrapping trajectories. What is the experimental evidence that supports identical positioning of the nucleosomes before the first stretch?

      We added the following and refer to our recent publication1 to address this question.

      “This is consistent with a previous single nucleotide resolution mapping of dyad position from a library of mismatches in all possible positions along the 601 sequence or a budding yeast native sequence which showed that a single mismatch (A-A or T-T) does not affect the nucleosome position27.”

      (2) The authors chose a constant stretching rate in this study. Can the authors provide a more detailed explanation or rationale for why this rate was chosen? At this rate, the authors found hysteresis, which indicates that stretching is faster than quasi-static. But it must have been slow and weak enough to allow for reversible unwrapping and wrapping of a CC-containing DNA stretch longer than one helical turn. Otherwise, such a strong effect of CC at a single location would not be seen. I am also curious about the biological relevance of the magnitude of the force. Can such force arise during nucleosome assembly in vivo?

      To address the comment about the magnitude of force, we added the following paragraph to the Introduction. “RNA polymerase II can initiate transcription at 4 pN of hindering force2 and its elongation activity continues until it stalls at ~ 10 pN of hindering force3,4. Therefore, the transcription machinery can generate piconewtons of force on chromatin as long as both the machinery and the chromatin segment in contact are tethered to stationary objects in the nucleus. Another class of motor protein, chromatin remodeling enzymes, was also shown to induce processive and directional sliding of single nucleosomes when the DNA is under a similar amount of tension (~ 5 pN)5. Therefore, measurements of nucleosomes at a few pN of force will expand our knowledge of the physiological roles of nucleosome structure and dynamics.”

      To address the comment about the stretching rate, we added the following to Results. We note that the physiological loading rate has been challenging to determine for any biomolecular interactions, and the only quantitative measurement we are aware of is that of an integrin that we are citing.

      “The force increases nonlinearly and the loading rate, i.e. the rate at which the force increases, was approximately in the range of 0.2 pN/s to 6 pN/s, similar to the cellular loading rates for a mechanosensitive membrane receptor6.”

      (3) In this study, the CC mismatch is the only change made to the 601 sequence. For readers to truly appreciate its unique effect on unwrapping dynamics as a base pair defect, it would be nice to include the baseline effects of other minor changes to the sequence. For example, how robust is the unwrapping force or dynamics against a single-bp change (e.g., AT to GC) at the three chosen positions?

      Unfortunately, we are unable to perform the suggested unwrapping experiment in a timely manner because the instrument has been disassembled during our recent move. However, we previously performed unwrapping experiments not only as a function of sequence but also as a function of cytosine modification and showed that we can detect even more subtle effects7,8. In addition, please note that we are not claiming that simply changing a base pair at the chosen sites changes the mechanical stability of a nucleosome, so we do not believe the requested experiment is necessary.

      (4) The last section introduces yeast histones. Based on the theme of the paper, I was expecting to see how the effect of CC is or is not preserved with a different histone source. Instead, the experiment only focuses on differences in the unwrapping dynamics. Although the data presented are important, it is not clear how they fit or support the narrative of the paper without the effect of CC.

      We apologize for giving the reviewer a wrong impression. We included the data because we believe that information on how the histone core can determine the translation of DNA mechanics into nucleosome mechanical stability will be of interest to the readers of this manuscript. We now mention explicitly that the observation was made using intact DNA, i.e. no mismatch, in the abstract and elsewhere.

      (5) It is stated that tRNA was excluded in experiments with yeast-expressed nucleosomes. What is the reason for excluding it for yeast nucleosomes? Did the authors rule out the possibility that tRNA causes the measured difference between the two nucleosome types?

      We normally include tRNA because we found that it reduces sticking of beads to the surface over several hours of experiments. In yeast nucleosomes, we found that tRNA causes the nucleosome to disassemble. Therefore, we did not include tRNA in yeast nucleosome experiments. We now mention this in Methods as reproduced below.

      “tRNA, which we normally include to reduce sticking of beads to the surface over the hours of single molecule experiments in a sealed chamber, was excluded in experiments with yeast-expressed nucleosomes because tRNA induced disassembly of nucleosomes assembled using yeast histones.”

      We cannot formally rule out the possibility that tRNA causes the measured difference between Xenopus and yeast nucleosomes. However, we have shown in our previous publication7 that the asymmetric unwrapping in Xenopus nucleosomes was modulated by the DNA sequence. When we swapped the sequence of the inner turn between the two sides, while tRNA was included in all experiments, we observed stochastic unwrapping instead. As part of our response to another reviewer’s comments, we also added the following on the relevant differences between the species in Discussion.

      “The crystal structure of the yeast nucleosome suggests that yeast nucleosome architecture is subtly destabilized in comparison with nucleosomes from higher eukaryotes9. Yeast histone protein sequences are not well conserved relative to vertebrate histones (H2A, 77%; H2B, 73%; H3, 90%; H4, 92% identities), and this divergence likely contributes to differences in nucleosome stability. Substitution of three residues in yeast H3 α3-helix (Q120, K121, K125) very near the nucleosome dyad with corresponding human H3.1/H3.3 residues (QK…K replaced with MP…Q) caused severe growth defects, elevated nuclease sensitivity, reduced nucleosome positioning and nucleosome relocation to preferred locations predicted by DNA sequence alone 10. The yeast histone octamer harboring wild type H3 may be less capable of wrapping DNA over the histone core, leading to reduced resistance to the unwrapping force for the more flexible half of the 601 positioning sequence.”

      Reviewer #2 (Public Review):

      Summary:

      Mismatches occur as a result of DNA polymerase errors, chemical modification of nucleotides, during homologous recombination between near-identical partners, as well as during gene editing on chromosomal DNA. Under some circumstances, such mismatches may be incorporated into nucleosomes but their impact on nucleosome structure and stability is not known. The authors use the well-defined 601 nucleosome positioning sequence to assemble nucleosomes with histones on perfectly matched dsDNA as well as on dsDNA with defined mismatches at three nucleosomal positions. They use the R18, R39, and R56 positions situated in the middle of the outer turn, at the junction between the outer turn and inner turn, and in the middle of the inner turn, respectively. Most experiments are carried out with CC mismatches and Xenopus histones. Unwrapping of the outer DNA turn is monitored by single-molecule FRET in which the Cy3 donor is incorporated on the 68th nucleotide from the 5'-end of the top strand and the Cy5 acceptor is attached to the 7th nucleotide from the 5' end of the bottom strand. Force is applied to the nucleosomal DNA as FRET is monitored to assess nucleosome unwrapping. The results show that a CC mismatch enhances nucleosome mechanical stability. Interestingly, yeast and Xenopus histones show different behaviors in this assay. The authors use FRET to measure the cyclization of the dsDNA substrates to test the hypothesis that mismatches enhance the flexibility of the 601 dsDNA fragment and find that CC, CA, CT, TT, and AA mismatches decrease looping time, whereas GA, GG, and GT mismatches had little to no effect. These effects correlate with the results from DNA buckling assays reported by Euler's group (NAR 41, 2013) using the same mismatches as an orthogonal way to measure DNA kinking. The authors discuss that substitution rates are higher towards the middle of the nucleosome, suggesting that mismatches/DNA damage at this position are less accessible for repair, consistent with the nucleosome stability results.

      Strengths:

      The single-molecule data show clear and consistent effects of mismatches on nucleosome stability and DNA persistence length.

      Weaknesses:

      It is unclear in the looping assay how the cyclization rate relates to the reported looping time. The biological significance and implications such as the effect on mismatch repair or nucleosome remodelers remain untested. It is unclear whether the mutational pattern reflects the behavior of the different mismatches. Such a correlation could strengthen the argument that the observed effects are relevant for mutagenesis.

      Reviewer #3 (Public Review):

      Summary:

      The mechanical properties of DNA wrapped in nucleosomes affect the stability of nucleosomes and may play a role in the regulation of DNA accessibility in eukaryotes. In this manuscript, Ngo and coworkers study how the stability of a nucleosome is affected by the introduction of a CC mismatched base pair, which has been reported to increase the flexibility of DNA. Previously, the group has used a sophisticated combination of single-molecule FRET and force spectroscopy with an optical trap to show that the more flexible half of a 601 DNA segment provides for more stable wrapping as compared to the other half. Here, it is confirmed with a single-molecule cyclization assay that the introduction of a CC mismatch increases the flexibility of a DNA fragment. Consistent with the previous interpretation, it also increased the unwrapping force for the half of the 601 segment in which the CC mismatch was introduced, as measured with single-molecule FRET and force spectroscopy. Enhanced stability was found up to 56 bp into the nucleosome. The intricate role of mechanical stability of nucleosomes was further investigated by comparing force-induced unwrapping profiles of yeast and Xenopus histones. Intriguingly, asymmetric unwrapping was more pronounced for yeast histones.

      Strengths:

      (1) High-quality single-molecule data.

      (2) Novel mechanism, potentially explaining the increased prominence of mutations near the dyads of nucleosomes.

      (3) A clear mechanistic explanation of how mismatches affect nucleosome stability.

      Weaknesses:

      (1) Disconnect between mismatches in nucleosomes and measurements comparing Xenopus and yeast nucleosome stability.

      (2) Convoluted data in cyclization experiments concerning the phasing of mismatches and biotin site.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:

      In Figure 1 legend, "the black diamonds on the DNA bends represent the mismatch position with R18 and R39 on minor grooves and R56 on a major groove." Minor and major grooves should be phrased as histone-facing minor and major grooves.

      We fixed the problem.

      In Materials and Methods, the sentence that describes the stretching rate cites reference 1, which does not seem to be relevant.

      We fixed the problem.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction, the authors should also discuss the context of mismatches occurring during homologous recombination in meiosis or somatic cells in non-allelic recombination between near identical repeats.

      Introduction now has the following.

      “DNA base-base mismatches are generated by nucleotide misincorporation during DNA synthesis, meiotic recombination, somatic recombination between nearly identical repeats, or chemical modification such as hydrolytic deamination of cytosine.”

      (2) Generally, it seems counter-intuitive in terms of biology that mismatches containing nucleosomes are more stable, as mismatches require repair and/or detection for heteroduplex rejection during recombination. Some discussion of this apparent paradox should be added.

      To address this comment, we added the following to Discussion.

      “The higher frequency of substitutions in the nucleosomal DNA may be attributed to the difficulty of accessing the extra-stable nucleosomes. We also note that even without an enhanced stability, a mismatch within a nucleosome would be more difficult to detect for mismatch repair machineries compared to a mismatch in a non-nucleosomal DNA. Because mismatch repair machineries accompany the replisome, most nascent mismatches may be detected for repair before nucleosome deposition. Therefore, the decrease in accessibility predicted based on our data here may be important only in rare cases where a mismatch is not detected prior to the deposition of a nucleosome on the nascent DNA or in cases where a mismatch is generated via a non-replicative mechanism.”

      (3) The authors discuss that the substitution rate is higher while the indel (insertion and deletion) rate is lower nearer the center of a positioned nucleosome. Are the differences between individual mismatches reported in Figure 6 reflected in the mutagenic profile?

      We cannot currently compare them because the mutagenic profile, even when it is available, is a complex convolution of mismatch generation, mismatch repair and selection. Mismatch generation occurs through several different processes, and how they are affected by nucleosomes and their mismatch type and sequence context is unknown. The mismatch repair process itself depends on mismatch type and sequence context, as recently shown by a high-throughput in vivo study11. And because the population genetics does not simply reflect de novo mutation profiles due to selection, comparison between mismatch-induced DNA mechanical changes and mutagenic profiles is further complicated. We added the following to the revision.

      “If and how the mismatch type-dependent DNA mechanics affects the sequence-dependent mismatch repair efficiency in vivo, as recently determined in a high-throughput study in E. coli11, remains to be investigated. Comparison of mismatch-type dependent DNA mechanics to population genetics data is challenging because mutation profiles reflect a combined outcome of mismatch-generation, mismatch repair and selection in addition to other mutational processes.”

      (4) The looping assay should be explained better, especially how the cyclization rate is related to the reported looping time.

      We modified Figure 5 to include examples of looping time determination through fitting of the looped fraction vs time, and added the following to the figure caption.

      “To calculate the looping time, the fraction of looped molecules (high FRET) as a function of time is fitted to an exponential function, $e^{-t/(\text{looping time})}$ (right panel for one run of experiments).”

      Furthermore, we added the following sentence to Results.

      “The rate of loop formation, which is the inverse of looping time determined from an exponential fitting of loop fraction vs time, was used as a measure of apparent DNA flexibility influenced by a mismatch 12,13.”
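
      The relationship between looping time and looping rate can be sketched as below. This is a simplified illustration, not the authors' analysis code: it takes the functional form quoted in the caption at face value, estimates the time constant by a log-linear least-squares fit with no amplitude or offset term, and uses synthetic noise-free data. The function name, time units and the 30-minute value are hypothetical.

```python
import math

def fit_looping_time(times, fractions):
    """Estimate tau in y(t) = exp(-t / tau) by least squares on ln(y) through the origin."""
    pairs = [(t, math.log(y)) for t, y in zip(times, fractions) if y > 0 and t > 0]
    slope = sum(t * ly for t, ly in pairs) / sum(t * t for t, _ in pairs)
    return -1.0 / slope

# Synthetic, noise-free example with a hypothetical looping time of 30 minutes.
true_tau = 30.0
times = [5.0 * i for i in range(1, 13)]
fractions = [math.exp(-t / true_tau) for t in times]

tau = fit_looping_time(times, fractions)
rate = 1.0 / tau  # apparent looping rate, the inverse of the looping time
print(round(tau, 3), round(rate, 5))
```

      A shorter fitted looping time gives a higher looping rate, which is the sense in which the rate serves as a readout of apparent DNA flexibility. With real data a nonlinear fit with amplitude and baseline terms would likely be more appropriate.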

      Reviewer #3 (Recommendations For The Authors):

      I have some concerns that, when addressed upon revision, would improve the manuscript:

      (1) Page 6 and Supplementary Figure S1C: Though the FRET levels are the same for all nucleosomes, the distribution between the two levels is not. The nucleosomes with CC mismatches appear to have a larger fraction in the low-FRET population. This seems to contradict the higher mechanical stability. A comment on this should clarify it, or make this conundrum explicit.

      Thank you for the comment. The low FRET population also includes the nucleosomes that do not have an active acceptor, the fraction of which varies between preparations. We now note this in the supplementary figure caption.

      (2) It is intriguing that a more stable nucleosome forms after several pulling cycles and it is argued that this might be due to shifting of the nucleosome. This seems reasonable and has important consequences both for the interpretation of the current experimental data and for the general mechanisms involved in nucleosome maintenance and remodeling. It is puzzling though how this would work mechanistically since it only seems to happen when nucleosomes are half-wrapped and when the unwrapped half contains the mismatch. From the previous work of the group and the current manuscript, it seems that shift does not occur in DNA without mismatches (Correct?). Does shifting happen for the 601-R18 and 601-R56 nucleosomes as well?

      The mismatch-containing half is the half that is mechanically less stable in an intact, mismatch-free 601 nucleosome. So indeed, that is the half that is unwrapped in an intact nucleosome. But because the introduction of mismatch makes that half more mechanically stable, it can stay wrapped until higher forces, and the resulting structural distortion may cause the shift although we acknowledge that this interpretation remains speculative. Shifting occurs for all three constructs with a mismatch but not for the intact nucleosome without a mismatch.

      (3) Could the shifting be related to the differences in sub-population distribution observed in Supplementary Figure S1C?

      See our response to comment (1) above.

      (4) The paper would have more impact if the mechanism of possible shifting could be clarified. This can be done experimentally with a fluorescent histone, as suggested in the manuscript. But having a FRET pair on positions in the DNA that would shift to closer proximity upon shifting, either at the ED2 or at the ED1 site will also work, is in line with the current experiments and seems feasible.

      We revised the text as follows in order not to exclude labeling configurations with both fluorophores on the DNA while reporting on the shift. We are also happy to add an appropriate reference if the reviewer can help us identify an existing study that measured dyad position shifts through such a labeling configuration.

      “However, since the FRET values in our DNA construct are not sensitive to the nucleosome position, further experiments with fluorophores conjugated to strategic positions that allow discrimination between different dyad positions14 will be required to test this hypothesis.”

      (5) Figures 5 and 6: To appreciate the quality of the data, state the number of molecules that contributed to the cyclization assay, or better, share a figure of the number of looped molecules as a function of time as supplementary data.

      We added the requested figures to Figure 5 and a new supplementary Figure 2, and added the following to Methods.

      “Approximately 2500 – 3500 molecules were quantified at each timestamp during the experiment, and three independent experiments were performed for each sequence (Supplemental Figure S2).”

      (6) Page 8/9: A control is added to confirm that the phasing of the biotin relative to the end affects the observed cyclization rate. However, the mismatch sites were chosen such that they included 5 bp phase shifts. This convolutes the outcomes, as the direction of flexibility due to the phasing of the mismatch relative to the biotin may also influence the rate. Was this checked?

      We would like to clarify that the relevant phasing of the biotin is with respect to the full molecule rather than with respect to the end. Static curvature and poloidal angle associated with the DNA molecule (which are ultimately determined by the full chemical composition of the molecule, including its sequence and the mismatch) could make the molecule prefer a looped configuration where the biotin points towards the “inside” of the molecule. Such a configuration would be sterically unfavoured during the single molecule looping reaction where the biotin is attached to a surface via avidin. However, if the biotin is moved by half the helical repeat (or an odd multiple of half the helical repeat, essentially 16 nt as done in the manuscript), it would now point to the “outside” of the molecule. Therefore, to make sure that the difference between the looping rates of any two DNA constructs (say the 601-RH and 601-R18-RH) is a better reflection of differences in dynamic flexibility, we ensure that the difference persists even when the biotin is moved by an odd multiple of half the helical repeat. We revised the section as follows.

      “For example, moving the location of the biotin tether by half the helical repeat (~ 5 bp) can lead to a large change in cyclization rate15, likely due to the preferred poloidal angle of a given DNA16 that determines whether the biotin is facing towards the inside of the circularized DNA, thereby hindering cyclization due to steric hindrance caused by surface tethering.”
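
      The phasing argument above comes down to simple arithmetic, sketched here. The helical repeat of 10.5 bp per turn is an assumption (a commonly used textbook value for B-form DNA, consistent with the “~ 5 bp” half repeat quoted above), and the function names and the 45° tolerance are hypothetical choices made only for illustration.

```python
HELICAL_REPEAT_BP = 10.5  # assumed helical repeat of B-form DNA, bp per turn


def rotation_deg(shift_bp, helical_repeat_bp=HELICAL_REPEAT_BP):
    """Rotation of the biotin around the helix axis after moving it by shift_bp."""
    return (360.0 * shift_bp / helical_repeat_bp) % 360.0


def faces_opposite(shift_bp, tolerance_deg=45.0):
    """True if the shift turns the biotin to roughly the opposite face (~180 degrees)."""
    return abs(rotation_deg(shift_bp) - 180.0) <= tolerance_deg


angle_16 = rotation_deg(16)  # the 16 nt shift used in the manuscript
angle_21 = rotation_deg(21)  # two full turns under the assumed repeat
```

      Under this assumption a 16 nt shift rotates the biotin by roughly 189°, close to the opposite face of the helix, whereas a shift of a whole number of turns (for example 21 bp) leaves its orientation unchanged.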

      (7) Page 9/10: The comparison of yeast vs Xenopus is interesting, albeit a bit disconnected. Since the single-molecule statistics are relatively small, did the nucleosomes show similar bulk FRET distributions, or did they also show a shift in FRET levels?

      We included the data because we believe that information on how the histone core can determine the translation of DNA mechanics into nucleosome mechanical stability will be of interest to the readers of this manuscript. The FRET values were similarly distributed.

      (8) The discussion calls for a more detailed analysis of the structural differences of the histones of the two species to rationalize the observed asymmetry in flexibility dependence: why would yeast nucleosomes be less sensitive to sequence asymmetries?

      We added the following to Discussion to address this comment.

      “The crystal structure of the yeast nucleosome suggests that yeast nucleosome architecture is subtly destabilized in comparison with nucleosomes from higher eukaryotes9. Yeast histone protein sequences are not well conserved relative to vertebrate histones (H2A, 77%; H2B, 73%; H3, 90%; H4, 92% identities), and this divergence likely contributes to differences in nucleosome stability. Substitution of three residues in yeast H3 α3-helix (Q120, K121, K125) very near the nucleosome dyad with corresponding human H3.1/H3.3 residues (QK…K replaced with MP…Q) caused severe growth defects, elevated nuclease sensitivity, reduced nucleosome positioning and nucleosome relocation to preferred locations predicted by DNA sequence alone10. The yeast histone octamer harboring wild type H3 may be less capable of wrapping DNA over the histone core, leading to reduced resistance to the unwrapping force for the more flexible half of the 601 positioning sequence.”

      (9) It would also be interesting if the increased stability due to the introduction of mismatches observed on Xenopus nucleosomes holds in yeast. Or does the reduced stability remove this effect? This is relevant to substantiate the broad claims in the context of evolution and cancer that are discussed in the manuscript.

      Unfortunately, we are unable to perform the suggested unwrapping experiment in a timely manner because the instrument has been disassembled during our recent move. However, in terms of cancer relevance, our mismatch dependence experiments were performed using vertebrate nucleosomes (Xenopus) so repeating this for yeast nucleosomes would not provide relevant information.

      Minor comments:

      (1) Supplementary Figure S1 misses the label '(C)' in its caption.

      We fixed it.

      (2) The supplementary data sequences for the fleezer measurements contain entries 'R39 construct' and miss the positions of the Cy3 and Cy labels; the color code (levels of grey) is not explained.

      We fixed the labeling mistake and added detailed annotations of the highlighted features.

      References

      (1) Park, S., Brandani, G.B., Ha, T. & Bowman, G.D. Bi-directional nucleosome sliding by the Chd1 chromatin remodeler integrates intrinsic sequence-dependent and ATP-dependent nucleosome positioning. Nucleic Acids Res 51, 10326-10343 (2023).

      (2) Fazal, F.M., Meng, C.A., Murakami, K., Kornberg, R.D. & Block, S.M. Real-time observation of the initiation of RNA polymerase II transcription. Nature 525, 274-7 (2015).

      (3) Galburt, E.A., Grill, S.W., Wiedmann, A., Lubkowska, L., Choy, J., Nogales, E., Kashlev, M. & Bustamante, C. Backtracking determines the force sensitivity of RNAP II in a factor-dependent manner. Nature 446, 820-3 (2007).

      (4) Schweikhard, V., Meng, C., Murakami, K., Kaplan, C.D., Kornberg, R.D. & Block, S.M. Transcription factors TFIIF and TFIIS promote transcript elongation by RNA polymerase II by synergistic and independent mechanisms. Proc Natl Acad Sci U S A 111, 6642-7 (2014).

      (5) Kim, J.M., Carcamo, C.C., Jazani, S., Xie, Z., Feng, X.A., Yamadi, M., Poyton, M., Holland, K.L., Grimm, J.B., Lavis, L.D., Ha, T. & Wu, C. Dynamic 1D Search and Processive Nucleosome Translocations by RSC and ISW2 Chromatin Remodelers. bioRxiv (2024).

      (6) Jo, M.H., Meneses, P., Yang, O., Carcamo, C.C., Pangeni, S. & Ha, T. Determination of single-molecule loading rate during mechanotransduction in cell adhesion. Science (in press).

      (7) Ngo, T.T., Zhang, Q., Zhou, R., Yodh, J.G. & Ha, T. Asymmetric unwrapping of nucleosomes under tension directed by DNA local flexibility. Cell 160, 1135-44 (2015).

      (8) Ngo, T.T., Yoo, J., Dai, Q., Zhang, Q., He, C., Aksimentiev, A. & Ha, T. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun 7, 10813 (2016).

      (9) White, C.L., Suto, R.K. & Luger, K. Structure of the yeast nucleosome core particle reveals fundamental changes in internucleosome interactions. EMBO J 20, 5207-18 (2001).

      (10) McBurney, K.L., Leung, A., Choi, J.K., Martin, B.J., Irwin, N.A., Bartke, T., Nelson, C.J. & Howe, L.J. Divergent Residues Within Histone H3 Dictate a Unique Chromatin Structure in Saccharomyces cerevisiae. Genetics 202, 341-9 (2016).

      (11) Kayikcioglu, T., Zarb, J.S., Lin, C.-T., Mohapatra, S., London, J.A., Hansen, K.D., Rishel, R. & Ha, T. Massively parallel single molecule tracking of sequence-dependent DNA mismatch repair in vivo. bioRxiv, 2023.01.08.523062 (2023).

      (12) Jeong, J., Le, T.T. & Kim, H.D. Single-molecule fluorescence studies on DNA looping. Methods 105, 34-43 (2016).

      (13) Jeong, J. & Kim, H.D. Base-Pair Mismatch Can Destabilize Small DNA Loops through Cooperative Kinking. Phys Rev Lett 122, 218101 (2019).

      (14) Blosser, T.R., Yang, J.G., Stone, M.D., Narlikar, G.J. & Zhuang, X. Dynamics of nucleosome remodelling by individual ACF complexes. Nature 462, 1022-7 (2009).

      (15) Basu, A., Bobrovnikov, D.G., Qureshi, Z., Kayikcioglu, T., Ngo, T.T.M., Ranjan, A., Eustermann, S., Cieza, B., Morgan, M.T., Hejna, M., Rube, H.T., Hopfner, K.P., Wolberger, C., Song, J.S. & Ha, T. Measuring DNA mechanics on the genome scale. Nature 589, 462-467 (2021).

      (16) Yoo, J., Park, S., Maffeo, C., Ha, T. & Aksimentiev, A. DNA sequence and methylation prescribe the inside-out conformational dynamics and bending energetics of DNA minicircles. Nucleic Acids Res 49, 11459-11475 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Beyond my general review, some descriptions of the results and methods could be further clarified, which I've outlined below:

      (1) Page 3, Line 118-120: Based on results from Fig 1A, the authors reported 15 nanobodies neutralized both delta and BA.1 out of the 41 tested. However, I only counted 14. Could the authors double check?

      We recounted the nanobodies and confirmed there are 15 as follows:

      (1) RBD-15

      (2) RBD-22

      (3) RBD-24

      (4) RBD-9

      (5) S1-4

      (6) S1-35

      (7) RBD-6

      (8) RBD-5

      (9) RBD-21

      (10) RBD-16

      (11) S1-46

      (12) S1-49dimer

      (13) S2-10dimer

      (14) S2-3

      (15) S2-62

      (2) Page 5, Lines 134-135: the authors described that the heatmap reflects the neutralizing strength of the representative nanobodies from each group. For groups where multiple nanobodies were selected for visualization, how was the neutralization strength calculated? Was the IC50 averaged first before being converted into the neutralization strength?

      This has been made clear in the legend for Fig. 1 as follows: “For groups with multiple nanobodies, the average -log10(IC50) is first calculated for the nanobodies within that group, then normalized to a neutralization score within the 0–100 range using the min and max average -log10(IC50) for that group. A higher score indicates more potent neutralization of the variant relative to the wild type.”
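
      One plausible reading of the scoring described in this legend is sketched below. It assumes that the minimum and maximum are taken over the per-variant group averages; the function name, the input layout, and the IC50 values are hypothetical and serve only to illustrate the two steps (group averaging, then min-max scaling to 0–100).

```python
import math


def neutralization_scores(ic50_by_variant):
    """Map variant -> 0-100 score for one nanobody group.

    ic50_by_variant: dict of variant name -> list of IC50 values (same units),
    one value per nanobody in the group.
    Step 1: average -log10(IC50) over the nanobodies, per variant.
    Step 2: min-max normalize those averages to the 0-100 range.
    """
    averages = {
        variant: sum(-math.log10(v) for v in ic50s) / len(ic50s)
        for variant, ic50s in ic50_by_variant.items()
    }
    lo, hi = min(averages.values()), max(averages.values())
    if hi == lo:
        # All variants neutralized equally well; assign the top score to each.
        return {variant: 100.0 for variant in averages}
    return {v: 100.0 * (a - lo) / (hi - lo) for v, a in averages.items()}


# Hypothetical IC50 values (in M) for a group of two nanobodies.
group = {
    "wild type": [1e-8, 1e-9],
    "delta": [1e-8, 1e-8],
    "BA.1": [1e-7, 1e-6],
}
scores = neutralization_scores(group)
```

      Averaging on the log scale before normalizing keeps a single very weak or very strong nanobody from dominating the group value.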

      (3) Page 5, Lines 138-139: What was the authors' rationale for selecting certain nanobodies over others for structural modeling and visualizing the neutralization heatmap in Fig 1B? Does it introduce bias to the neutralizing epitope map on the spike protein?

      We only focused on nanobodies for which we had enough epitope mapping data to unambiguously generate docked nanobody-spike models, as explained in our previous study (Mast et al., eLife 2021). When multiple nanobodies within the same group had sufficient epitope mapping data available, we selected only representative candidates that had better binding affinity and/or neutralization potency. As epitope mapping via escape mutants relied largely on random point mutagenesis of Spike, there should be little introduced bias.

      Overall, groups I-VII cover an exhaustive set of target areas on the RBD (including the lone glycan site in Group-II), while groups VII and IX are representative areas on NTD and S2. Using group-average IC50s and suitable normalization as mentioned in point 3 above further prevents potential biases due to unequal numbers of Nbs modeled from each group.

      We have modified the text with the following:

      “For computational epitope modeling, we selected nanobody candidates using a series of experimentally obtained structural restraints, as described in Mast, Fridy et al. 2021.”

      (4) Page 5, Lines 161-167: It would be good to include Fig S1 as a main figure as it places the epitope landscape of nanobodies being investigated in this manuscript into the broader context of clinically approved monoclonal antibody therapeutics for COVID-19.

      We have amended the Figures to accommodate the reviewer’s suggestion. Figure S1 is now Figure 2.

      (5) Page 6, Lines 173-175: The neutralization breadth for S1-46 is quite encouraging. Any speculations on why this particular nanobody is so broadly targeting? Any additional thoughts on why its high binding affinity (nM) did not translate into strong neutralization (as it is in the 0.1-1 uM range)?

      S1-46 binds a region on spike that is conserved across all variants observed to date. Its epitope is difficult to access unless the RBD is in the up conformation, which may explain why monoclonal antibodies rarely bind. We state this in the text as follows:

      “S1-46 binds a region on spike that is conserved across all variants to date, but which may be relatively inaccessible and is not targeted by any of the mAbs that previously received EUA by the FDA (Cox, Peacock et al. 2023).”

      Relating neutralization activity to binding activity requires more insight into the mechanisms of binding and activity. Nonetheless, we are also encouraged by S1-46’s breadth and numerous avenues can be pursued to greatly improve its neutralizing activity (e.g. synergistic combinations).

      (6) Page 6, Lines 173-175: For the remaining two nanobodies S1-31 and S1-RBD-11 in group VII, the target epitopes on the spike proteins of either delta or BA.1 do not seem to bear any mutations, at least based on the mutation maps in Fig 1B. Yet their neutralizing capacities against delta and BA.1 variants were abolished. Do the authors have any idea about what is going on here?

      For group VII, only the epitope of S1-46 was mapped whereas S1-31 and S1-RBD-11 were assigned to group VII based on our lower resolution binning experiments. Thus, without knowing precisely where they bind, we can make only limited conclusions at this time. In the absence of supporting structural information, we speculate that the epitopes of RBD-11 and S1-31 may be in a region that overlaps with or is in close proximity to a mutation that could affect the binding of the nanobody enough to result in loss of neutralizing ability.

      (7) Page 7, Line 195-200: Please provide PRNT50 or logPRNT50 for the five nanobodies selected for BA.4/5 PRNT assay.

      We have added this suggested information. Additionally, a supporting table (Table S1) is now provided.

      (8) Page 8, Lines 223-224: Similar to comment 3, what was the rationale here for choosing certain nanobodies over others for structural modeling and visualizing the binding heatmap in Fig 2B?

      The set of nanobodies chosen for structural modeling and visualization of neutralization data is identical to the set of anti-RBD nanobodies chosen for binding.

      (9) Page 11, Lines 326-328: Can the authors include mutation maps as part of Fig 4C to show the mutation distributions on the XBB/BQ.1/BQ.1.1 spikes?

      We have updated and added a supplemental figure to accompany Fig. 5 (called “supplement for Figure 5”) showing the mutation maps.

      (10) Page 14, Line 409-418: This paragraph is well considered. Given the large number of nanobodies assessed in this manuscript, it would be helpful if the authors could highlight some candidate nanobodies as lead candidates for further optimization.

      While our intention in this manuscript was not to provide targeted recommendations for lead candidates, but rather to reiterate the collective potential of a Nb pool originally targeted towards the 2019 Wuhan variant, the reviewer’s point is interesting. We speculate that any of the Nbs we have demonstrated to show pan-VoC activity would be prime candidates for further optimization.

      We have added a statement to this effect as follows: “We propose that any of the Nbs we have demonstrated to show pan-VoC activity, would be prime candidates for further optimization.”

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) The main message of the article is the prediction that nanobodies that retain binding to the different SARS-CoV-2 variants including early Omicron strains will retain binding and neutralization against currently circulating strains such as XBB and BQ. However, no evidence either via modeling or experimental testing has been provided for that prediction. The study will benefit from mapping amino acid mutations in RBD of XBB and BQ lineages compared to BA.4/5 and demonstrating via computational docking that epitopes of the five nanobodies that retain binding to BA.4/5 RBD are not affected. For example, the crystal structure of XBB.1 RBD PDB:8OIV is available. Binding/neutralization experiment with currently circulating SARS-CoV-2 strains would still be the gold standard test given the fact that only five out of 41 nanobodies retained binding and neutralization to BA.4/5 lineage. Loss of neutralization ability against BA.4/5 without a significant decrease in binding affinity for nanobodies S1-46 and S1-RBD-22 further indicates that neutralization of XBB and BQ lineage should be performed.

      The docking protocol used to predict the spike epitopes uses a C-alpha resolution to represent protein residues, and is data-driven, i.e. it assumes that binding happens in the first place, and then utilizes experimentally obtained structural restraints. So, concluding possible binding from such a docking protocol alone would be noisy. In our revised manuscript we have a new Figure 3B, which shows epitopes of 4 out of the 5 pan-VoC nanobodies, i.e. S1-RBD-{9, 22, 40} and S1-46, mapped to the RBD structures of XBB.1 (8IOU) and BQ.1.1 (8FXC), and we have updated Figure 4 with a supplemental figure showing the mutation maps.

      (2) Described nanobodies are positioned as very potent neutralizers of SARS-CoV-2. However, they are much less potent in neutralization of ancestral strain as well as early VOCs compared to the mAbs that were approved for COVID-19 treatment. For example, IC50 for casirivimab and imdevimab are 37.4 pM and 42.1 pM, respectively. That is about 27-fold more than IC50 for the most potent nanobody reported in the article, S1-RBD-15.

      This comparison is fraught for several reasons. (1) Experimental differences in pseudovirus assay systems usually result in significant differences in reported IC50s, as IC50 is not an absolute measure and is not ultimately comparable to clinical IC50 values. For this reason, in our original publication (Mast et al., 2021) we tested other nanobodies in our experimental set-up as benchmarks. (2) A typical monoclonal antibody has two binding sites and a large structural Fc linker, which combined make it ~10 times the size of a nanobody. In a therapeutic setting where monoclonal therapy is provided in g per kg of patient body weight, there is therefore a 5-fold excess of nanobody binding capacity over antibody binding capacity at equal mass. (3) We have previously shown that dimerizing our nanobodies (to produce two antigen binding sites) can dramatically increase potency, by over 100-fold (Mast et al., 2021).

      In order to make this even clearer in the manuscript, we have added the following: “We note that IC50s are not directly comparable across different experimental set-ups because measured values are highly dependent on the experimental conditions. For this reason, we included other published nanobodies as benchmarks in our original publication and have subsequently maintained standard experimental conditions (Mast, Fridy et al. 2021)”.
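
      The 5-fold figure in the second point above follows from simple bookkeeping, sketched here. The 15 kDa nanobody mass is the value quoted by the reviewer elsewhere in these comments, and the antibody mass is taken as ten times that, following the “~10 times” estimate above. The names are hypothetical, and the calculation ignores affinity, avidity, and pharmacokinetics.

```python
NANOBODY_KDA = 15.0               # nanobody mass quoted by the reviewer
MAB_KDA = 10.0 * NANOBODY_KDA     # "~10 times the size of a nanobody" (approximation)


def binding_sites_per_kda(sites, mass_kda):
    """Antigen-binding sites delivered per unit mass of protein."""
    return sites / mass_kda


nanobody_capacity = binding_sites_per_kda(1, NANOBODY_KDA)  # one site per nanobody
mab_capacity = binding_sites_per_kda(2, MAB_KDA)            # two sites per antibody

# Excess of nanobody binding sites over antibody binding sites at equal dosed mass.
fold_excess = nanobody_capacity / mab_capacity
```

      At equal mass, ten nanobodies (ten binding sites) weigh about as much as one antibody (two binding sites), which gives the 5-fold excess.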

      (3) Figure 1A. If each dot represents an independent measurement of the same nanobody, IC50 variation seems too high. For some nanobodies it ranges for almost a log of magnitude, e.g., S1-RBD-24, S1-RBD-46, S2-3. Why is that?

      We have deliberately explored the full range of effects that could contribute to experimental variability in our pseudovirus assay, using different batches of nanobody and pseudovirus in each replicate to provide as impartial and comprehensive an analysis as possible. While the activity of some nanobodies is remarkably stable from batch to batch, others show the variation noticed by the Reviewer, which is why we performed multiple replicates to define the average IC50 value for our nanobodies.

      (4) The drop in IC50 for BA.1 neutralization is about one log for the majority of tested nanobodies. This should be outlined in the text. For example, for the most potent neutralizer, S1-RBD-15, the drop in IC50 for BA.1 is about 100-fold compared to IC50 for the Delta and Wuhan strains. It is important to note that, out of the 9 nanobodies for which the drop in neutralizing capacity against the BA.1 and Delta variants is less than one log, 2 have epitopes in the S2 domain of the SARS-CoV-2 spike. Resistance of mAbs targeting the S2 part of the spike has been extensively described in the literature as being due to the highly conserved structure of this region that facilitates membrane fusion. Presented data demonstrate that >80% of the nanobody repertoire is affected by mutations on spike protein. Additionally, it can be helpful for readers if the fold-change in IC50 between Wuhan, Delta, and BA.1 is presented in the text or added to Figure 1 or a table.

      We agree with the Reviewer and to make this more explicit we have made the following change: “In comparison, groups I, I/II, I/IV, V, VII, VIII and the anti-S2 nanobodies contained the majority of omicron BA.1 neutralizers, though here the neutralization potency of many nanobodies was generally decreased tenfold compared to wild-type (emphasis added).”

      (5) The authors should either present the results of the formal correlation analysis or avoid using misleading verbiage such as: "the decrease in neutralization potency largely correlates with the accumulation of omicron BA.1 specific mutations throughout the RBD" or "significant decrease in binding affinity correlated to decreases neutralization potency".

      We thank the Reviewer for this constructive feedback. To address this question, we have performed a correlation analysis using Pearson and Spearman's methods to quantitatively assess the relationship between nanobody neutralization potency (IC50) and binding affinity (KD) across SARS-CoV-2 variants, including the wildtype, delta, and omicron BA.1 variants. Our results indicate a statistically significant correlation for the delta variant (Pearson's PCC: 0.71, p-value: 0.01; Spearman's rho: 0.63, p-value: 0.07), supporting our statement regarding the correlation between decreased neutralization potency and reduced binding affinity for this variant. However, for the wildtype and omicron BA.1 variants, the correlations were not statistically significant (wildtype Pearson's: 0.10, p-value: 0.70; omicron BA.1 Pearson's: 0.27, p-value: 0.31), which we acknowledge does not fully align with the verbiage used in the manuscript. Therefore, we have revised the manuscript to present the correlation analysis data accurately and ensure the discussion is reflective of the statistical evidence as follows:

      “SPR binding assessments to the spike S1 domain or RBD of delta revealed a pattern: nanobodies maintaining binding affinity generally also neutralized the virus with a statistically significant correlation between binding affinity and neutralization efficacy (Pearson's Correlation Coefficient: 0.71, p-value: 0.01; Spearman's rho: 0.63, p-value: 0.07). However, this correlation was not statistically significant for omicron BA.1 (Pearson's Correlation Coefficient: 0.27, p-value: 0.31) (Fig. 3A, Table 1). Notably, while some nanobodies bound to the variants, they did not consistently neutralize them, suggesting additional factors influence neutralization beyond mere binding.”
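
      For readers unfamiliar with the two coefficients reported above, the following minimal sketch shows how Pearson's coefficient and Spearman's rho are computed. It is not the authors' analysis code. The input values are hypothetical log-transformed affinities and potencies, and p-values are omitted; in practice a statistics package would normally be used to obtain both the coefficients and their significance.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical log10(KD) and log10(IC50) values for five nanobodies.
log_kd = [-9.0, -8.5, -8.0, -7.0, -6.5]
log_ic50 = [-8.0, -8.2, -7.0, -6.5, -6.0]

r = pearson(log_kd, log_ic50)
rho = spearman(log_kd, log_ic50)
```

      Pearson's coefficient measures linear association on the chosen scale, whereas Spearman's rho depends only on rank order, so the two can differ when the relationship is monotonic but not linear.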

      (6) Figure 3 shows approximated curves for live virus neutralization assay with quite a broad 90% CI. It will be helpful to present, at least, in supplementary, primary data for live-virus neutralization that were used to perform non-linear regression.

      We have added the reviewer’s suggestion.

      (7) It is not clear what are the "variant-specific nanobody groups" exactly? A definition/description of the term is not provided. If the nanobody library was generated with the Wuhan strain, how did strain-specific nanobodies that bind/neutralize only Delta, BA.1 or BA.4/5 appear in the repertoire and were isolated? This statement also contradicts data in Table 4 where all nanobodies listed bind and neutralize Wuhan strain.

      We agree with the reviewer. All nanobodies tested bind/neutralize the Wuhan strain as they were selected from our original repertoire of 116 nanobodies (Mast, et al., 2021). To clarify, variant-specific nanobodies are nanobodies that bind only one variant that arose from the original Wuhan strain. They were categorized into variant-specific groups based on whether they were able to bind each variant (other than Wuhan).

      We have thus added to the manuscript, “we define variant-specific nanobodies as nanobodies that bind a single additional variant alongside the original Wuhan strain...”

      (8) Describing the categorization of nanobody epitope groups presented in Figure 4, the authors state that binding to Wuhan, Delta, BA.1, and BA.4/5 predicts that these nanobodies will be "effective binders against current circulating strains of the virus including XBB and BQ lineages". How exactly is this conclusion corollary to the data shown?

      The epitopes of XBB and BQ.1 are not divergent enough within the regions we propose the nanobodies to bind, to suggest that nanobodies that bind in those regions will lose binding ability. We hypothesize that the region at which these nanobodies bind represents regions on spike that are vulnerable to our specified nanobodies in Fig. 4. We have generated a new Fig. 3B and added a supporting figure for Fig. 4 to address this.

      (9) Figures 4C and 6 describe how the nanobodies will retain binding to currently circulating strains of XBB lineage. However, epitopes are mapped on the same Wuhan, Delta, BA.1, and BA.4/5 virus strains. The predicted binding of nanobodies to XBB lineage RBD is not actually shown in Figure 6. It is clear from the figure that the nanobody binding footprint (red area) decreases with antigenic distance in every spike projection from Wuhan through the BA.4/5 strain. It is unclear how this indicates that nanobodies will remain active against even more distant XBB, BQ, EU, and CH strains accumulating more mutations in spike protein.

      We have added the following to the manuscript to clarify: “Strikingly, we have in our cohort 8 nanobodies able to bind delta, and the omicron lineages BA.1/BA.4/BA.5/XBB/BQ.1.1 (Fig. 5B). We further predict these 8 nanobodies will be effective binders against current circulating strains of the virus including omicron EG.5 and HV.1 as the epitope regions (or predicted epitopes) of these nanobodies do not vary significantly from omicron lineages XBB and BQ.1.1 (Fig. 5C and Supplement to Fig. 5).”

      (10) Despite major advances in the development of nanobodies as therapeutic molecules there are only a few nanobody-based drugs that have so far been approved for clinical use and all of them are nanobody fusions to immunoglobulin Fc fragment. It is dictated by the small size of the nanobody itself, 15 kDa molecule, that leads to rapid kidney clearance within hours post-injection, and also by the necessity of having antibody effector functions allowing for example killing of malignant cells. It is hard to predict how each individual nanobody will tolerate multimerization and if it will still retain binding ability as its size dramatically increases. It should be noted that IC50 for BA.4/5 is in the submicromolar range for the 5 nanobodies retaining neutralization of this strain. From a therapeutic perspective, this is quite a high IC50 that dictates a high dosage to achieve a therapeutic effect. Furthermore, it can be expected that additional mutations in the SARS-CoV-2 spike will further affect binding affinity and therefore reduce the neutralization ability of these nanobodies resulting in even higher doses required to achieve therapeutic effect. Therefore, authors should discuss the limitations of the nanobody approach as a therapeutic intervention more granularly.

      While Fc fusions are not strictly required for clinical use (for instance Caplacizumab is not an Fc fusion, being a multimer containing an albumin-binding nanobody), we agree that reformulation would indeed be required to optimize pharmacokinetics for eventual clinical use. Increased valency through multimerization is in fact one of several strategies, which also include synergistic combinations, for significantly enhancing effective IC50. Preclinical nanobody engineering is not within the scope of this paper, but we acknowledge this challenge.

      Minor points:

      (1) Table S1 is missing.

      This is an .xlsx file uploaded as Supplementary File 3. Labeled now as “Figure 6–Source data 2. Neutralization data from synergy experiment”.

      (2) Because Table 1 summarizes all neutralization and binding data, it will be helpful to refer to it while describing data presented in Figure 1.

      This has been added to the revised manuscript.

      (3) Live SARS-CoV-2 PRNT is not described in Materials and Methods.

      This has been added to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1:

      Summary:

      The Roco proteins are a family of GTPases characterized by the conserved presence of an ROC-COR tandem domain. How GTP binding alters the structure and activity of Roco proteins remains unclear. In this study, Galicia C et al. took advantage of conformation-specific nanobodies to trap CtRoco, a bacterial Roco, in an active monomeric state and determined its high-resolution structure by cryo-EM. This study, in combination with the previous inactive dimeric CtRoco, revealed the molecular basis of CtRoco activation through GTP-binding and dimer-to-monomer transition.

      Strengths:

      The reviewer is impressed by the authors' deep understanding of the CtRoco protein. Capturing Roco proteins in a GTP-bound state is a major breakthrough in the mechanistic understanding of the activation mechanism of Roco proteins and shows similarity with the activation mechanism of LRRK2, a key molecule in Parkinson's disease. Furthermore, the methodology the authors used in this manuscript - using conformation-specific nanobodies to trap the active conformation, which is otherwise flexible and resistant to single-particle average - is highly valuable and inspiring.

      Weakness:

      Though written with good clarity, the paper will benefit from some clarifications.

      (1) The angular distribution of particles for the 3D reconstructions should be provided (Figure 1 - Sup. 1 & Sup. 2).

      Figure 1 – Figure supplements 1 and 2 now contain particle distribution plots.

      (2) The B-factors for protein and ligand of the model, Map sharpening factor, and molprobity score should be provided (Table 1).

      Table 1 now contains B-factors and molprobity scores.

      The map used to interpret the model was post-processed by density modification, and therefore no data concerning sharpening factors are provided in the output.

      (3) A supplemental Figure to Figure 2B, illustrating how a0-helix interacts with COR-A&LRR before and after GTP binding in atomic details, will be helpful for the readers to understand the critical role of a0-helix during CtRoco activation.

      This is now illustrated in the new Figure 2 – Figure Supplement 1.

      (4) For the following statement, "On the other hand, only relatively small changes are observed in the orientation of the Roc a3 helix. This helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022), is located at the interface of the Roc and CORB domains and harbors the residues H554 and Y558, orthologous to the LRRK2 PD mutation sites N1337 and R1441, respectively." It is not surprising the a3-helix of the ROC domain only has small changes when the ROC domain is aligned (Figure 2E). However, in the study by Zhu et al (DOI: 10.1126/science.adi9926), it was shown that a3-helix has a "see-saw" motion when the COR-B domain is aligned. Is this motion conserved in CtRoco from inactive to active state?

      We indeed describe the conformational changes from the perspective of the Roc domain. When using the COR-B domain for structural alignment, a rotational movement of Roc (including a “seesaw”-like movement of the α3-helix around His554) with respect to COR-B is correspondingly observed.

      This is now added to Figure 2E. Additionally, the text was adapted to:

      “Interestingly, this rotational movement of CORB seems to use the H554-Y558-Y804 triad on the interface of Roc and CORB as a pivot point (Figure 2E). Mutation of either of the corresponding residues in LRRK2 (N1437, R1441, Y1699, respectively) is associated with PD and leads to LRRK2 activation. Residues H554 and Y558 are located on the Roc a3 helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022). Indeed, while the orientation of the a3 helix with respect to the rest of the Roc domain only undergoes small changes upon GTPgS binding, it can be observed that this helix undergoes a “seesaw-like” movement with respect to the CORB domain. A similar rearrangement was previously also observed for Rab29-mediated activation of human LRRK2 (Störmer et al., 2023; Zhu et al., 2022).”

      (5) A supplemental figure showing the positions of and distances between NbRoco1 K91 and Roc K443, K583, and K611 would help the following statement. "Also multiple crosslinks between the Nbs and CtRoco, as well as between both nanobodies were found. ... NbRoco1-K69 also forms crosslinks with two lysines within the Roc domain (K583 and K611), and NbRoco1-K91 is crosslinked to K583".

      A figure displaying these crosslinks is now provided as Figure 4–figure supplement 1. However, in interpreting these crosslinks it should be taken into consideration that the additive length of the DSSO spacer and the lysine side chains leads to a theoretical upper limit of ∼26 Å for the distance between the α carbon atoms of cross-linked lysines (and even a cut-off distance of 35 Å when taking into account protein dynamics).
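
      The distance criterion described above can be sketched as a simple check. This is only an illustration: the function names and the example coordinates are hypothetical, and only the two cutoffs (about 26 Å and 35 Å between α carbon atoms) are taken from the response.

```python
import math

# Cutoffs quoted in the response for DSSO crosslinks (Cα-Cα distances, in Å).
STRICT_CUTOFF = 26.0   # theoretical upper limit (spacer plus two lysine side chains)
RELAXED_CUTOFF = 35.0  # cut-off when protein dynamics are taken into account


def ca_distance(a, b):
    """Euclidean distance in Å between two Cα positions given as (x, y, z)."""
    return math.dist(a, b)


def classify_crosslink(a, b):
    """Label a crosslinked lysine pair by how its Cα-Cα distance compares to the cutoffs."""
    d = ca_distance(a, b)
    if d <= STRICT_CUTOFF:
        return "within theoretical limit"
    if d <= RELAXED_CUTOFF:
        return "compatible only if dynamics are allowed for"
    return "exceeds both cutoffs"


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the deposited model.
    lys_nb = (0.0, 0.0, 0.0)
    lys_roc = (18.0, 24.0, 0.0)
    print(round(ca_distance(lys_nb, lys_roc), 1), classify_crosslink(lys_nb, lys_roc))
```

      In practice the coordinates would be read from the refined model; a pair that exceeds both cutoffs would suggest either a different conformation in solution or a mis-assigned crosslink.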

      (6) It would be informative to show the position of CtRoco-L487 in the NF and GTP-bound state and comment on why this mutation favors GTP hydrolysis.

      L487 is located in Switch 1, which is a critical region for nucleotide binding and hydrolysis. Unfortunately, most probably due to flexibility, the Switch 1 region could not be entirely modeled (in neither nucleotide state). Since L487 is located on the edge of the interpretable portion of the Switch 1 in both structures (see Author response image 1 below), any interpretation regarding the role of this residue would be highly speculative.

      Author response image 1.

      The following text was added to the Results section:

      “Also the Switch 1 loop could not be fully modeled in our structure, presumably indicating some flexibility in this region despite the presence of a GTP analogue. Interestingly, the Switch 1 loop harbors the site of the PD-analogous L487A mutation that leads to a stabilization of the CtRoco dimer with a concomitant decrease in GTPase activity (Deyaert et al., 2019). Unfortunately, an exact interpretation of this effect of the L487A mutation is hampered by the lack of a well resolved Switch 1 loop.”

      Reviewer #2:

      Summary

      The manuscript by Galicia et al. describes the structure of the bacterial GTPyS-bound CtRoco protein in the presence of nanobodies. The major relevance of this study is in the fact that the CtRoco protein is a homolog of the human LRRK2 protein with mutations that are associated with Parkinson's disease. The structure and activation mechanisms of these proteins are very complex and not well understood. Especially lacking is a structure of the protein in the GTP-bound state. Previously the authors have shown that two conformational nanobodies can be used to bring/stabilize the protein in a monomeric GTPyS-bound state. In this manuscript, the authors use these nanobodies to obtain the GTPyS-bound structure and importantly discuss their results in the context of the mammalian LRRK2 activation mechanism and mutations leading to Parkinson's disease. The work is well performed and clearly described. In general, the conclusions on the structure are reasonable and well-discussed in the context of the LRRK2 activation mechanism.

      Strengths:

      The strong points are the innovative use of nanobodies to stabilize the otherwise flexible protein and the new GTPyS-bound structure that helps enormously in understanding the activation cycle of these proteins.

      Weakness:

      The strong point of the use of nanobodies is also a potential weak point; these nanobodies may have induced some conformational changes in a part of the protein that will not be present in a GTPyS-bound protein in the absence of nanobodies.

      Two major points need further attention.

      (1) Several parts of the protein are very flexible during the monomer-dimer activity cycle. This flexibility is crucial for protein function, but obviously hampers structure resolution. Forced experiments to reduce flexibility may allow better structure resolution, but at the same time may impede the activation cycle. Therefore, careful experiments and interpretation are very critical for this type of work. This especially relates to the influence of the nanobodies on the structure that may not occur during the "normal" monomer-dimer activation cycle in the absence of the nanobodies (see also point 2). So what is the evidence that the nanobody-bound GTPyS-bound state is biochemically a reliable representative of the "normal" GTP-bound state in the absence of nanobodies, and therefore that the obtained structure can be confidently used to interpret the activation mechanism as done in the manuscript?

      See below for an answer to remarks 1 and 2.

      (2) The obtained structure with two nanobodies reveals that the nanobodies NbRoco1 and NbRoco2 bind to parts of the protein by which a dimer is impossible, respectively to the a0-helix of the linker between Roc-COR and LRR, and to the cavity of the LRR that in the dimer binds to the dimerizing domain CORB. It is likely the open monomer GTP-bound structure is recognized by the nanobodies in the camelid, suggesting that overall the open monomer structure is a true GTP-bound state. However, it is also likely that the binding energy of the nanobody is used to stabilize the monomer structure. It is not automatically obvious that in the details the obtained nanobody-Roco-GTPyS structure will be identical to the "normal" Roco-GTPyS structure. What is the influence of nanobody-binding on the conformation of the domains where they bind; the binding energy may be used to stabilize a conformation that is not present in the absence of the nanobody. For instance, NbRoco1 binds to the a0 helix of the linker; what is here the "normal" active state of the Roco protein, and is e.g. the angle between RocCOR and LRR also rotated by 135 degrees? Furthermore, nanobody NbRoco2 in the LRR domain is expected to stabilize the LRR domain; it may allow a position of the LRR domain relative to the rest of the protein that is not present without nanobody in the LRR domain. I am convinced that the observed open structure is a correct representation of the active state, but many important details have to be supported by e.g. their CX-MS experiments, and in the end probably need confirmation by more structures of other active Roco proteins or confirmation by a more dynamic sampling of the active states by e.g. molecular dynamics or NMR.

      Recently, nanobodies have increasingly been used successfully to obtain structural insights in protein conformational states (reviewed in Uchański et al, Curr. Opin. Struc. Biol. 2020). As Reviewer #2 points out, the concern is sometimes raised that antibodies could distort a protein into non-native conformations. Here, it is important to note that the nanobodies were raised by immunizing a llama with the fully native CtRoco protein bound to a non-hydrolysable GTP analogue, after which the nanobodies were selected by phage display using the same fully native and functional form of the protein. As clearly explained in Manglik et al. Annu Rev Pharmacol Toxicol. 2017, the probability of an in vivo matured nanobody inducing a non-native conformation of the antigen is low, although it is possible that it selects a high-energy, low-population conformation of a dynamic protein. Immature B cells require engagement of displayed antibodies with antigen to proliferate and differentiate during clonal selection. Antibodies that induce non-native conformations of the antigen pay a substantial energetic penalty in this process, and B cell clones displaying such antibodies will have a significantly lower probability of proliferation and differentiation into mature antibody-secreting B lymphocytes. Hence, many recent experiments and observations give credence to the notion that nanobodies bind antigens primarily by conformational selection and not induced fit (e.g. Smirnova et al. PNAS 2015).

      Extrapolated to the case of CtRoco, which is clearly very flexible in its GTP-bound form, this means that the nanobodies are able to trap and stabilize one conformational state that is representative of the “active state” ensemble of the protein. In this respect, it is clear from our experiments (XL-MS, affinity and effect on GTPase activity) that the effects of NbRoco1 and NbRoco2 are additive (or even cooperative), meaning that both nanobodies recognize different features of the same CtRoco “active state”. Correspondingly, the monomeric, elongated “open” conformation is also observed in the structure of CtRoco bound to NbRoco1 only (Figure 1 – Figure supplement 2), although this structure still displays more flexibility. The monomerization and conformational changes that we observe and describe in the current paper at high resolution are also in very good agreement with earlier observations for CtRoco in the GTP-bound form in the absence of any nanobodies, including negative stain EM (Deyaert et al. Nature Commun, 2017), hydrogen-deuterium exchange experiments (Deyaert et al. Biochem. J. 2019) and native MS (Leemans et al. Biochem J. 2020).

      In the revised manuscript we added the following text to the discussion:

      “To decrease this flexibility, we have now used two previously developed conformation-specific nanobodies (NbRoco1 and NbRoco2) to stabilize the protein in the GTP-state (Leemans et al., 2020), allowing us to solve its structure using cryo-EM (Figure 1). Recently, Nbs have successfully been used to obtain structural insights in the conformational states of a number of highly dynamic proteins (Uchański et al, 2020). These studies established that Nbs bind antigens primarily by conformational selection rather than by induced fit (Manglik et al., 2017; Smirnova et al., 2015). Since NbRoco1 and NbRoco2 were generated by immunization with fully native CtRoco bound to a non-hydrolysable GTP analogue, and subsequently selected by phage display using the same functional protein, it is thus safe to assume that these Nbs bind to and stabilize a relevant conformation that is present within the “active” CtRoco conformational space (Leemans et al., 2020). Moreover, our current structures are also in very good agreement with previous biochemical studies and data from HDX-MS and negative stain EM (Deyaert et al., 2019; Deyaert, Wauters, et al., 2017).”

      Recommendations for the authors:

      Reviewer #1:

      (1) Figure 2C: please label the residues with meshes (switch 2).

      Labels have been added to figure 2C.

      (2) A supplemental figure for the following statement will be helpful "A remarkable feature of the CtRoco dimer structure was the dimer-stabilized orientation of the P-loop, which would hamper direct nucleotide binding on the dimer. Correspondingly, in the current structure, the P-loop changes orientation, allowing GTPgS to bind, although the EM map does not allow unambiguous placement of the entire P-loop. Surprisingly, also the Switch 1 loop could not be fully modeled, which could indicate some flexibility in this region despite the presence of a GTP analog".

      An additional Figure 2–figure supplement 2 has been added to illustrate this.

      (3) A supplemental figure for the following statement will be helpful "A final important observation in the Roc domain concerns the very C-terminal part of Switch 2 (residues 520 to 533), which could not be modeled in our GTP bound structure due to flexibility, while in the nucleotide-free dimer structure this region is structured and located at the interface of the Roc domain with the LRR-Roc linker and CORA. In this way, the conformational changes induced by GTPgS binding could be relayed via the Switch 2 toward the LRR and CORA domains, and vice versa."

      An additional Figure 2–figure supplement 2 has been added to illustrate this.

      (4) A structural comparison of each domain (LRR, ROC, COR) between NF and GTP-bound states will be greatly useful to understand statements in the manuscript, such as "In addition to the C-terminal dimerization part of CORB that becomes unstructured, also other large conformational changes are observed in the CORA and CORB domains of CtRoco upon GTPgS binding."

      We would like to clarify that with this statement we refer to changes in the relative orientation of the domains between the nucleotide-free and GTPgS-bound states, rather than to conformational changes within each domain. These changes in relative orientation are illustrated in Figure 2 and the associated Figure supplements.

      (5) The statement "to a lesser extent, also between CDR1 and the LRR-Roc linker" is not clearly illustrated in Figure 3B.

      The reviewer is correct, and we now also show CDR1 in Figure 3B.

      (6) Extra panels can be added in Figure 1 Sup. 4 to illustrate the following statement "In the density map NbRoco2 can easily be identified and placed on the concave side of the LRR domain... N-terminal and C-terminal b-strands interacting with the very C-terminal repeat of the LRR".

      We believe the density map corresponding to NbRoco2 is clearly shown in Figure 1 – supplement 4A. A reference to this figure panel is now added to the main text.

      (7) "In the presence of both Nbs, the hydrolysis rate was increased 4-fold compared to CtRoco-L487A alone and 2-fold compared to CtRoco-L487A in the presence of NbRoco1 only, again illustrating a collaboration between the Nbs (Figure 5C)" Here, is it 6-fold instead of 4-fold?

      The reviewer is correct. We changed this accordingly in the manuscript.

      Reviewer #2:

      (1) At many places in the manuscript the lack of structural details is explained by the assumed local flexibility of the protein. This may be true for many cases (such as linker regions), but is probably not always correct; several other explanations are possible to get no local structural details.

      See our answer to point 2, below.

      (2) At several other places in the manuscript the high flexibility is used to explain the lack of structural details (so the reasoning is reversed compared to point 1); this would require that a priori it is known that the region is flexible and therefore no structure can be expected. An example is found mid-page 8: "A final important observation in the Roc domain concerns the very C-terminal part of Switch 2 (residues 520 to 533), which could not be modeled in our GTP bound structure due to flexibility, while in the nucleotide-free dimer structure this region is structured and located at the interface of the Roc domain with the LRR-Roc linker and CORA." As written there must be a reference to experiments showing the "due to flexibility".

      The reviewer is correct that additional factors might affect the interpretability of the map, such as the small size of the regions used for the focused refinements (around 50 kDa each) or a preferential distribution of orientation of the particles in the grid. Particle distribution plots are now shown in Figure 1 – Figure supplements 1 and 2. However, due to the intrinsic flexible nature of the Switch 1 and Switch 2 regions, we assume this flexibility to be the major cause of lack of features in the EM maps, especially since some of the neighboring regions display well-resolved maps.

      Nevertheless, in the manuscript we reworded our statements to be more careful. For example, on page 8:

      “Also the Switch 1 loop could not be fully modeled in our structure, presumably indicating some flexibility in this region despite the presence of a GTP analogue.”

      “… potentially due to flexibility of this region in the new position of the Switch 2…”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The aim of the present work is to evaluate the role of BMP9 and BMP10 in liver by depleting Bmp9 and Bmp10 from the main liver cell types (endothelial cells (EC), hepatic stellate cells (HSC), Kupffer cells (KC) and hepatocytes (H)) using cell-specific cre recombinases. They show that HSCs are the main source of BMP9 and BMP10 in the liver. Using transgenic ALK1 reporter mice, they show that ALK1, the high affinity type 1 receptor for BMP9 and BMP10, is expressed on KC and EC. They have also performed bulk RNAseq analyses on whole liver, and cell-sorted EC and KC, and showed that loss of Bmp9 and Bmp10 decreased KC signature and that KC are replaced by monocyte-derived macrophages. EC derived from these Bmp9fl/flBmp10fl/flLratCre mice also lost their identity and transdifferentiated into continuous ECs. Liver iron metabolism and metabolic zonation were also affected in these mice. In conclusion, this work supports that BMP9 and BMP10 produced by HSC play a central role in mediating liver cell-cell crosstalk and liver homeostasis.

      We appreciate the comprehensive summary of reviewer 1.

      Strengths:

      This work further supports the role of BMP9 and BMP10 in liver homeostasis. Using a specific HSC-Cre recombinase, the authors show for the first time that it is the BMP9 and BMP10 produced by HSC that play a central role in mediating liver cell-cell crosstalk to maintain a healthy liver. Although the overall message of the key role of BMP9 in liver homeostasis has been described by several groups, the role of hepatic BMP10 has not been studied before. Thus, one of the novelties of this work is to have used liver cell-specific Cre recombinase to delete hepatic Bmp9 and Bmp10. The second novelty is the demonstration of the role of BMP9 and BMP10 in KC differentiation/homeostasis, which has already been slightly addressed by this group by knocking out ALK1, the high affinity receptor of BMP9 and BMP10 (Zhao et al. JCI, 2022).

      We appreciate the positive comment of reviewer 1.

      Weaknesses:

      This work remains rather descriptive and the molecular mechanisms are barely touched upon and could have been more explored. Some references should be added; in particular, a work that has already demonstrated, using a different approach (in situ hybridization RNAscope), that in the liver BMP9 and BMP10 are expressed by HSC (Tillet et al., J Biol Chem 2018). Another publication (Bouvard et al., Cardiovasc Res, 2021) has previously shown that deletion of Bmp9 and Bmp10 leads to liver fibrosis and could have thus been cited. There is also a reference that is not correctly cited. Ref 26 (Herrera et al., 2014) does not say that "BMP10 is mostly expressed in the heart, followed by the liver" or that "BMP9 and BMP10 also bind to ALK2" as cited in the manuscript.

      We agree with the comment of reviewer 1 that the molecular mechanisms were barely investigated in our work. Indeed, it has been reported that BMP9/10 induce the expression of ID1/3 in KCs and of GATA4 and Maf in liver ECs in an in vitro culture system. These master regulators play an important role in the differentiation of the two cell types. Thus, we think that the reduced expression of these master regulators can explain the phenotype in KCs and ECs observed in Bmp9fl/flBmp10fl/flLratCre mice. In addition, according to the reviewer’s suggestion, these references will be added or corrected in our revised manuscript.

      The gating strategies for cell sorting, which are used for bulk RNAseq and FACS analyses, should be better described in order to better follow the manuscript. This point is particularly important for KC gating as the authors show that Tim4 is very strongly decreased in Bmp9fl/flBmp10fl/flLratCre (Fig 2c), yet it seems that this marker is used for gating macrophages (Suppl fig4). Same question with F4/80, which is strongly decreased in Bmp9fl/flBmp10fl/flLratCre (Fig 2d) and also used for gating. It is important to show the gating strategy for both Control and Bmp9fl/flBmp10fl/flLratCre mice.

      The authors should explain how they selected the genes shown on each heatmap and add references that can justify the choice of the genes.

      Thank you for your suggestion. In our study, we used CD45+ Ly6C- F4/80+ CD64+ cells to define liver macrophages. We will delete the Tim4 FACS plot from Suppl fig4 to avoid misunderstanding. Although F4/80-positive cells were reduced in the livers of Bmp9fl/flBmp10fl/flLratCre mice, double staining with anti-F4/80 and anti-CD64 fluorescent antibodies can still clearly distinguish liver macrophages based on the above gating strategy. The gating strategy for both control and Bmp9fl/flBmp10fl/flLratCre mice will be presented in our revised manuscript.
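
      The marker combination above can be expressed as a simple boolean gate. The sketch below is only an illustration and assumes each event has already been called positive or negative for every marker; the thresholds used for those calls, the dictionary layout, and the function name are hypothetical and not taken from the manuscript.

```python
# Markers named in the response: CD45+ Ly6C- F4/80+ CD64+ defines liver macrophages.
REQUIRED_POSITIVE = ("CD45", "F4/80", "CD64")
REQUIRED_NEGATIVE = ("Ly6C",)


def is_liver_macrophage(event):
    """Return True if an event matches the CD45+ Ly6C- F4/80+ CD64+ gate.

    `event` maps a marker name to True (positive) or False (negative).
    How positivity is called from fluorescence intensity is an assumption
    left outside this sketch.
    """
    return all(event[m] for m in REQUIRED_POSITIVE) and not any(
        event[m] for m in REQUIRED_NEGATIVE
    )


if __name__ == "__main__":
    # Hypothetical events for illustration only.
    events = [
        {"CD45": True, "Ly6C": False, "F4/80": True, "CD64": True},
        {"CD45": True, "Ly6C": True, "F4/80": True, "CD64": True},
        {"CD45": False, "Ly6C": False, "F4/80": True, "CD64": True},
    ]
    print([is_liver_macrophage(e) for e in events])
```

      Note that Tim4 does not appear in the gate, which is consistent with the point that a marker altered by the knockout should not be used to define the population.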

      Quantifications of Immunostaining and FACS data should be added as well as statistical analyses.

      Quantitative data will be added in our revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the contribution of BMP9/BMP10 expression/secretion from all different hepatic cell types and analysed their impact on the other cell types. They are able to show that HSC derived BMP9/BMP10 controls Kupffer cell and EC differentiation and functions.

      We appreciate the comprehensive summary of reviewer 2.

      Strengths:

      This is the first study to my knowledge to comprehensively analyze the contribution of BMP9/BMP10 expression in such systematic fashion in vivo. This study therefore is a significant contribution to the field and further supports previous studies that have already implied BMP9 and BMP10 in Kupffer cell and EC functions but did not unravel the intercellular cross talk in such detailed fashion.

      We appreciate the positive comment of reviewer 2.

      Weaknesses:

      Several findings such as the impact of BMP9/10 on Kupffer cells and EC were already known. So these findings are not innovative, however I still believe that the elucidation of the cellular crosstalk makes this publication highly interesting to a broad scientific community.

      Overall the authors achieved their aims and the results are well supporting the conclusions and discussion.

      We appreciate the positive comment of reviewer 2. We agree with the comment of reviewer 2 that although some findings in our paper are somewhat expected, the detailed investigation of the crosstalk between different liver cell types is still needed and beneficial to this field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Responses to Reviewer 1:

      It wouldn't be very surprising to identify the association between PhenoAgeAccel and cancer risk, since PhenoAgeAccel was constructed as a predictor of mortality, much of which is attributable to cancer. Although cancer is an essential mediator for the association, sensitivity analyses using cancer-free mortality may provide an additional angle.

      As suggested, we retrained the PhenoAge in cancer-free participants based on mortality and recalculated PhenoAgeAccel in the UK Biobank. As expected, the re-calculated PhenoAgeAccel was still significantly associated with an increased risk of overall cancer in both men and women. The relevant results have been added to Appendix 1-table 6.

      It would be interesting to see, to what extent, PhenoAgeAccel could be reversed by environmental or lifestyle factors. G by E for PhenoAgeAccel might be worth a try.

      As suggested, we performed interaction analysis between genetic and lifestyle factors on PhenoAgeAccel, and added the methods and results in the revision as follows:

      “55 independent PhenoAgeAccel-associated SNPs (P < 5 × 10⁻⁸) and corresponding effect sizes were derived from a large-scale PhenoAgeAccel GWAS including 107,460 individuals of European ancestry (Kuo, Pilling, Liu, Atkins, & Levine, 2021). A PhenoAgeAccel PRS was created using an additive model as previously described (Dai et al., 2019). In short, the genotype dosage of each risk allele for each individual was summed after multiplying by its respective effect size of PhenoAgeAccel.” (Page 6)
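
      The additive PRS described above is a weighted sum of risk-allele dosages. A minimal sketch follows, assuming dosages and effect sizes are supplied in the same SNP order; the function name and the example values are hypothetical and not taken from the study.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over SNPs of (risk-allele dosage x effect size).

    `dosages` holds one individual's risk-allele dosages (0 to 2 per SNP);
    `effect_sizes` holds the per-SNP effect sizes from the GWAS, in the same order.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must cover the same SNPs")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


if __name__ == "__main__":
    # Hypothetical three-SNP example; the study used 55 SNPs.
    print(polygenic_risk_score([0, 1, 2], [0.5, 0.2, 0.1]))
```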

      “We performed additive interaction analysis between genetic risk (defined by CPRS) and PhenoAgeAccel on overall cancer risk, as well as genetic risk (defined by PhenoAgeAccel PRS) and lifestyle on PhenoAgeAccel using two indexes: the relative excess risk due to interaction (RERI) and the attributable proportion due to interaction (AP).” (Page 9)

      “However, we did not observe any interaction between genetic risk and lifestyle on PhenoAgeAccel in either men or women (Appendix 1-table 11).” (Page 13)
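
      The two additive-interaction indexes named above have standard definitions in terms of relative risks: RERI = RR11 - RR10 - RR01 + 1 and AP = RERI / RR11, where RR11 is the relative risk with both exposures, and RR10 and RR01 are the relative risks with only one. The sketch below applies these textbook formulas; the function name and the numbers are hypothetical, and how the study estimated the relative risks and their confidence intervals is not shown here.

```python
def additive_interaction(rr10, rr01, rr11):
    """Return (RERI, AP) from relative risks against the doubly unexposed group.

    rr10: relative risk with only the first exposure
    rr01: relative risk with only the second exposure
    rr11: relative risk with both exposures
    """
    reri = rr11 - rr10 - rr01 + 1.0  # relative excess risk due to interaction
    ap = reri / rr11                 # attributable proportion due to interaction
    return reri, ap


if __name__ == "__main__":
    # Hypothetical relative risks for illustration only.
    reri, ap = additive_interaction(rr10=1.5, rr01=2.0, rr11=3.5)
    print(reri, round(ap, 3))
```

      A RERI and AP of zero correspond to no additive interaction, which is the pattern the quoted result describes.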

      Responses to Reviewer 2:

      Since the UK biobank has a large sample size, it should have enough power to split the dataset into discovery and validation sets. Why did the authors use 10-fold cross-validation instead of splitting the dataset?

      There may have been a misunderstanding of our Methods. The 10-fold cross-validation was applied to select biomarkers when PhenoAge was calculated in the previous study (Levine et al., 2018). In this study, we analyzed the association between PhenoAgeAccel and incident cancer risk by dividing participants into ten groups based on the deciles of PhenoAgeAccel and assessed the associations of each group compared to the lowest decile. To avoid any confusion, we have removed the description of 10-fold cross-validation from the Methods section (Page 5).
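
      The decile grouping described above can be sketched as follows. This is only an illustration using the Python standard library; the function name is hypothetical, and the exact quantile convention used in the study is not stated, so the default of `statistics.quantiles` is an assumption.

```python
import bisect
import statistics


def decile_groups(values):
    """Assign each value to a decile group from 1 (lowest) to 10 (highest).

    Nine cut points split the distribution into ten groups; group 1 would
    serve as the reference in the comparisons described above.
    """
    cuts = statistics.quantiles(values, n=10)  # nine cut points
    return [bisect.bisect_right(cuts, v) + 1 for v in values]


if __name__ == "__main__":
    # Hypothetical PhenoAgeAccel-like values for illustration only.
    groups = decile_groups(list(range(1, 101)))
    print(groups[0], groups[-1], groups.count(5))
```

      This is distinct from 10-fold cross-validation: the ten groups here are exposure categories compared against a reference, not data folds used for model selection.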

      Recommendations for the authors:

      In addition, there is extant literature on the role of Phenotypic Age Acceleration in cancer risk and mortality that should be reviewed. Please also address possible overlap with previous work that used the UK Biobank cohort study (PMCID: PMC9958377).

      As suggested, we have reviewed the association of Phenotypic Age Acceleration with cancer risk, and added it into the Discussion section as follows:

      “Recently, several studies have confirmed the associations between PhenoAgeAccel and cancer risk. Mak et al. explored three measures of biological age, including PhenoAge, and assessed their associations with the incidence of overall cancer and five common cancers (breast, prostate, lung, colorectal, and melanoma) (Mak et al., 2023). In our previous study, we investigated the association between PhenoAgeAccel and lung cancer risk and analyzed the joint and interactive effects of PhenoAgeAccel and genetic factors on the risk of lung cancer (Ma et al., 2023). In comparison to these studies, our analysis expanded the range of cancers to 20 types and further explored the associations in different genetic and lifestyle contexts. Moreover, we also evaluated the potential implications of PhenoAge in population-level cancer screening.” (Page 15).

      Other minor comments:

      Line 216, "-4.35 to -1.25" or "-4.35, -1.25" may be better.

      As suggested, we have adjusted text accordingly.

      Line 260, please clarify the PRS used for G by E interaction testing. It could be site-specific PRS or CPRS.

      We used CPRS for G by E interaction testing, and we have changed the description of our methods as follows:

      “We performed additive interaction analysis between genetic risk (defined by CPRS) and PhenoAgeAccel on overall cancer risk, as well as genetic risk (defined by PhenoAgeAccel PRS) and lifestyle on PhenoAgeAccel using two indexes: the relative excess risk due to interaction (RERI) and the attributable proportion due to interaction (AP).” (Page 9)

      Line 223, The discussion/interpretation for "while negatively associated with risk of prostate cancer" is lacking.

      As suggested, we have discussed this as follows:

      “In addition, we observed a negative association between PhenoAgeAccel and prostate cancer risk. The unexpected association may have been confounded by diabetes and altered glucose metabolism, both of which are closely linked to aging. When we removed HbA1c and serum glucose from the biological age algorithms, the association became non-statistically significant. Similar findings were also reported by Mak et al. (Mak et al., 2023) and Dugue et al. (Dugue et al., 2021).” (Page 15).

      It is not clear how to define "biologically older" and "biologically younger". Whether the individuals fall in the "middle area" will impact the results.

      We defined "biologically older" and "biologically younger" based on Phenotypic Age Acceleration (PhenoAgeAccel), which was defined as the residual obtained from a linear model when regressing Phenotypic Age on chronological age. We categorized individuals with PhenoAgeAccel > 0 as biologically older and those with PhenoAgeAccel < 0 as biologically younger.
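
      The definition above can be sketched with an ordinary least-squares fit of Phenotypic Age on chronological age, taking the residual as PhenoAgeAccel. This is a minimal illustration only: the function names and example ages are hypothetical, and any additional covariates or software used in the study are not represented.

```python
def pheno_age_accel(chron_age, pheno_age):
    """Residuals from a simple linear regression of pheno_age on chron_age."""
    n = len(chron_age)
    mean_x = sum(chron_age) / n
    mean_y = sum(pheno_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, pheno_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(chron_age, pheno_age)]


def aging_label(accel):
    """Classify by the sign of PhenoAgeAccel, as in the response."""
    if accel > 0:
        return "biologically older"
    if accel < 0:
        return "biologically younger"
    return "neither"


if __name__ == "__main__":
    # Hypothetical ages for illustration only.
    chron = [40, 50, 60, 70]
    pheno = [42, 48, 63, 67]
    for a in pheno_age_accel(chron, pheno):
        print(round(a, 2), aging_label(a))
```

      Because the residuals of a least-squares fit with an intercept sum to zero, roughly balanced groups of biologically older and younger individuals are expected by construction.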

      Compared with individuals at low accelerated aging (the bottom quintile of PhenoAgeAccel), we found that those in the "middle area" (quintiles 2 to 4) and those at high accelerated aging (the top quintile) had a significantly higher risk of overall cancer (Table 2). Individuals falling in the "middle area" also had a moderate risk of overall cancer when accelerated aging levels were reclassified according to quartiles or tertiles of PhenoAgeAccel (Appendix 1-table 2).
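
      The three-level grouping by quintiles can be illustrated as below. This is a hedged sketch only: the function name, the toy values, the handling of values that fall exactly on a cut point, and the default interpolation method of `statistics.quantiles` are all assumptions rather than details reported by the authors.

```python
from statistics import quantiles


def accel_category(values):
    """Label values as low (bottom quintile), middle (quintiles 2 to 4) or high (top quintile)."""
    cuts = quantiles(values, n=5)  # four cut points separating five groups
    low_cut, high_cut = cuts[0], cuts[-1]
    labels = []
    for v in values:
        if v <= low_cut:      # assumption: boundary values go to the lower group
            labels.append("low")
        elif v > high_cut:
            labels.append("high")
        else:
            labels.append("middle")
    return labels


# Toy PhenoAgeAccel values for illustration only, not study data.
toy_accel = [-6.0, -4.0, -3.0, -1.5, -0.5, 0.0, 0.8, 2.0, 3.5, 7.0]
labels = accel_category(toy_accel)
```

      Passing `n=4` or `n=3` to `quantiles` would give the quartile- or tertile-based cut points used in the sensitivity analysis.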

      Do men and women have distinct biological ages, so they were analyzed separately?

      We found that men (median PhenoAgeAccel: 0.34, IQR: -2.42 to 3.53) have higher biological ages than women (median PhenoAgeAccel: -1.38, IQR: -4.26 to 1.96) (P < 0.0001). In addition, men and women have different cancer incidence patterns (Rubin, 2022). Therefore, we conducted separate analyses to investigate the associations of PhenoAgeAccel with cancer risk in men and women.

      Dai, J., Lv, J., Zhu, M., Wang, Y., Qin, N., Ma, H., . . . Shen, H. (2019). Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir Med, 7(10), 881-891. doi: 10.1016/S2213-2600(19)30144-4

      Dugue, P. A., Bassett, J. K., Wong, E. M., Joo, J. E., Li, S., Yu, C., . . . Milne, R. L. (2021). Biological Aging Measures Based on Blood DNA Methylation and Risk of Cancer: A Prospective Study. JNCI Cancer Spectr, 5(1). doi: 10.1093/jncics/pkaa109

      Kuo, C. L., Pilling, L. C., Liu, Z., Atkins, J. L., & Levine, M. E. (2021). Genetic associations for two biological age measures point to distinct aging phenotypes. Aging Cell, 20(6), e13376. doi: 10.1111/acel.13376

      Levine, M. E., Lu, A. T., Quach, A., Chen, B. H., Assimes, T. L., Bandinelli, S., . . . Horvath, S. (2018). An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY), 10(4), 573-591. doi: 10.18632/aging.101414

      Ma, Z., Zhu, C., Wang, H., Ji, M., Huang, Y., Wei, X., . . . Shen, H. (2023). Association between biological aging and lung cancer risk: Cohort study and Mendelian randomization analysis. iScience, 26(3), 106018. doi: 10.1016/j.isci.2023.106018

      Mak, J. K. L., McMurran, C. E., Kuja-Halkola, R., Hall, P., Czene, K., Jylhava, J., & Hagg, S. (2023). Clinical biomarker-based biological aging and risk of cancer in the UK Biobank. Br J Cancer, 129(1), 94-103. doi: 10.1038/s41416-023-02288-w

      Rubin, J. B. (2022). The spectrum of sex differences in cancer. Trends Cancer, 8(4), 303-315. doi: 10.1016/j.trecan.2022.01.013

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We wish to thank the Reviewers for their critical analysis of the article and for their suggestions and comments.

      In addition to the point-by-point answers to the Reviewers, we wish to emphasize three essential points that have been raised. First, we never intended (nor pretended) to address the impact of the two EHT cell emergence processes on downstream fate, after release from the aortic floor (see for example the last paragraph of our initially submitted manuscript). We only wished to bring evidence on the cell biological heterogeneity of the HE, particularly relying on cell polarity control and polarity reestablishment/reinforcement in the case of EHT pol+ cells, thus leading to emergence morphodynamic complexity. In the general context of cell extrusion, in which all polarity features are generally downregulated, these are remarkable features.

      Second, we inform the Reviewers that we have performed a major revision of the work on the Pard3 proteins, the outcome of which, we hope, significantly substantiates the idea of a tuning of cell polarity features in the HE and all along the EHT time-window, supporting the EHT pol- and EHT pol+ types of emergence. To achieve this, we entirely revised the experimental strategy to increase the specificity and sensitivity of detection of Pard3 protein isoforms expressed in the vascular system, based on endothelial FACS-sorting, qRT-PCR and single-molecule whole mount in situ hybridization using RNAscope. Importantly, we wish to stress that, by addressing Pard3 proteins, we initially aimed at substantiating our observations on the localization of our podxl2 construct (del-podxl2) used to label apical membranes. Hence, we sought to bring correlative evidence on the variation of expression of polarity proteins at early and later time points of the EHT time-window (suggesting tightly regulated expression control of polarity determinants, possibly at the mRNA level). This was clearly written and justified in the text, lines 227 or 303 of the initial manuscript. Also, this may have led to the identification of (a) specific isoform(s), including splicing variants as initially addressed.

      As the Reviewers will see, while performing the revision of our work, we have now been able to point at a specific isoform of Pard3, namely Pard3ba, whose mRNA expression level, in aortic cells and at single-cell resolution, is uniquely and specifically enhanced in cells contacting emergence ‘hot spots’. Using our Runx1 mutant fish line (dt-Runx1), we also show that expression of Pard3ba mRNAs, in these specific aortic regions, is sensitive to interference with Runx1 activity (i.e., dt-Runx1 increases Pard3ba expression). Altogether, our new results strongly support our initially proposed idea on the regulation of polarity features during EHT; they indicate intercellular coordination, through cooperative cross-talk between aortic and HE/EHT cells. This is compatible with the idea of a ‘tuning’ of apico-basal polarity during the entire EHT time-window (including maturation of the HE to become competent for emergence and the emergence process per se, whose morphodynamic complexity relies on regulating apico-basal polarity associated functions (ex: for controlling the specific junctional recycling modes of EHT pol+ and EHT pol- cells, as we suggest using JAM proteins that we have chosen owing to their function in the recruitment of Pard3 proteins for apico-basal polarity establishment)). This complements nicely our work and highlights the relevance of studying the interplay between aortic and HE/EHT cells (which we have started to dissect in the second part of our manuscript). Further work is obviously required to address local, dynamic variations of mRNAs encoding this specific isoform of Pard3 as well as specific interference with its functions at the spatial and temporal levels (hence on live tissues), which is far beyond the scope of our currently submitted work.

      Finally, this emphasizes the importance of the aortic context, at the mesoscopic level, in the regulation of the EHT.

      Third, based on these major points and the Reviewers' suggestions, we propose to take into account the fact that the heterogeneity in emergence morphodynamics was not highlighted and propose the following title:

      ‘Tuning apicobasal polarity and junctional recycling in the hemogenic endothelium orchestrates the morphodynamic complexity of emerging pre-hematopoietic stem cells’

      Regarding Results and Figures, the previous Figures 3 and 4 have been entirely revised, with the support of Supplement Figures (3 and 4 supplement figures, respectively, as well as a supplement video to Figure 3). Supplement Figures have also been included in the revised version, for nearly all results that appeared as data not shown (Figure 1 – figure supplement 2: illustrating the maintenance of EHT pol+ and EHT pol- cells after division; Figure 1 – figure supplement 3: illustrating the expression of the hematopoietic marker CD41 by EHT pol+ and EHT pol- cells). Also, a new supplemental figure, Figure 7 – figure supplement 7, has been added to substantiate the impact of interfering with ArhGEF11/PDZ-RhoGEF alternative splicing on hematopoiesis. Finally, a Figure for the Reviewers is added at the end of this file that shows that virtually 100% of aortic floor cells that we consider as hemogenic cells are positive for the hematopoietic marker Gata2b, which is upstream of Runx1 (using RNAscope, which allows achieving cellular resolution unambiguously).

      Reviewer #1 (Public Review):

      Summary:

      In this research article, the authors utilized the zebrafish embryo to explore the idea that two different cell types emerge with different morphodynamics from the floor of the dorsal aorta based on their apicobasal polarity establishment. The hypothesis that the apical-luminal polarity of the membrane could be maintained after EHT and confer different functionality to the cell is exciting, however, this could not be established. There is a general lack of data supporting several of the main statements and conclusions. In addition, the manuscript is difficult to follow and needs refinement. We present below some questions and suggestions with the goal of guiding the authors to improve the manuscript and solidify their findings.

      Here, we wish to emphasize that we do not make the hypothesis that ‘…the apical-luminal polarity of the membrane could be maintained after EHT …’ but that the apico-basal polarity establishment/maintenance controls the type of emergence and their associated cell biological features (EHT pol+ and EHT pol- cellular morphodynamics, establishment of membrane domains). Hence, our work suggests that these emergence modes, as a consequence of their intrinsic characteristics and differences, might have an impact on cellular behavior after the release (to place the work in the broader context of hematopoietic cell fate and differentiation). More specifically, the difference in the biological features of the luminal versus abluminal membrane for the two EHT types (ex: membrane signaling territories, membrane pools devoted to specific functions), might endow the cells with specific functional properties, after the release. What happens to those cells thereafter, except for illustrating the evolution of the luminal membrane for pol+ EHT cells, is beyond the scope of this paper. Here, we analyze and characterize some of the cell biological features of the EHT process per se (the emergence from the aortic floor), including the dynamic interface with adjoining endothelial cells.

      Strengths:

      New transgenic zebrafish lines developed. Challenging imaging.

      Weaknesses:

      (1) The authors conclude that the truncated version of Podxl2 fused to a fluorophore is enriched within the apical site of the cell. However, based on the images provided, an alternative interpretation is that the portion of the membrane within the apical side is less stretched than in the luminal side, and therefore the fluorophore is more concentrated and easier to identify by confocal. This alternative interpretation is also supported by data presented later in the paper where the authors demonstrate that the early HE is not polarized (membranes are not under tension and stretched yet). Could the authors confirm their interpretation with a different technique/marker like TEM?

      The argument of the apparent enrichment, or exclusion, of a marker depending on membrane stretching (and hence molecular packing) would be valid for any type of molecule embedded in these membranes, including of course endogenous ones (this is one of the general biophysical principles leading to the establishment of membrane domains, structurally and functionally speaking); hence, using another marker would not solve the issue because it would depend on its behavior with regard to packing (in particular lipid packing), which is difficult to anticipate and is a topic of its own (especially in this system, which has been poorly investigated with regard to its biophysical and biochemical properties in vivo (including its exposure to hemodynamics)).

      If we follow the logic of the Reviewer, it appears that it is not consistent with our results on the maturing HE. Indeed, in our dt-Runx1 mutants, mKate2-podxl2 is enriched at the luminal membrane of HE cells (HE cells are elongated, and the two membrane domains have relatively equal surface and bending); in comparison, HE cells have the same morphology in control animals as in mutants but, in controls, eGFP-podxl2 and mKate2-podxl2 are equally partitioned between the luminal and abluminal membranes (see Figure 3 – figure supplement 2 (for mKate2-podxl2) and Figure 2 – figure supplement 1 and 2 (for eGFP-podxl2)). In addition, we took care while designing the eGFP and mKate2 fusions to keep the natural podxl2 sequence containing critical cysteine residues to maintain assembly properties and distance from the transmembrane segment (hence the fluorescent protein per se is not directly exposed to membrane stretching).

      Finally, electron microscopy is not the approach to use for this issue because it requires tissue fixation, which always carries the risk of significantly modifying membrane properties. Along this line, when we fix embryos (and hence membranes, see our new Figure 4 and its Supplemental Figures), we do not appear to maintain obvious EHT pol+ and pol- cell shapes. In addition, to be conclusive, the work would require not TEM but immuno-EM to be able to visualize the marker(s), which is another challenge with this system.

      (2) Could the authors confirm that the engulfed membranes are vacuoles as they claimed, using, for example, TEM? Why is it concluded that "these vacuoles appear to emanate from the abluminal membrane (facing the sub-aortic space) and not from the lumen?" This is not clear from the data presented.

      The same argument regarding electron microscopy made in the previous point is valid here (in addition, if technically feasible at all, it would require serial sectioning to make sure not to miss the very tiny connection that may only suggest ultimate narrowing down of the facing adjacent bilayers, which is quite challenging). The term vacuole, which we use with caution (in fact, more often, we use the term pseudo-vacuoles in the initial manuscript, lines 140, 146, 1467 (legend to Figure 1 – figure supplement 1 or apparent vacuole-like in the same legend lines 1465 and 1476)), is legitimate here because we cannot say that they are portions of the invaginated luminal membrane, as we could be accused of not showing that these membranes are still connected to the luminal surface; we are here at the limit of the resolution that in vivo imaging currently allows with this system, and we draw the attention of the Reviewer to the fact that we are reaching here a sub-cellular level, which is already a challenge by itself.

      In addition, if vacuoles (or pseudo-vacuoles) were not formed at some point in this system (membrane-bounded organelles), it would be difficult to conceive how, after release of the cell, the fluid inherited from the aortic lumen would efficiently be chased from these membranes/organelles (see also our model Figure 1 – figure Supplement 1B).

      Why is it concluded that "these vacuoles appear to emanate from the abluminal membrane (facing the sub-aortic space) and not from the lumen?" This is not clear from the data presented.

      This is not referring to our data but to the Sato et al 2023 work. For EHT undergoing cells leading to aortic clusters in mammals and avians, vacuolar structures indeed appear to emanate from the ab-luminal side facing the sub-aortic space (we cannot call it basal because we do not know the polarity status of these cells). In the Revised version of the manuscript, we have moved this paragraph referring to the Sato et al work to the Discussion, which gives the possibility to expand a bit on this issue, for more clarity (see the second paragraph of our new Discussion).

      (3) It is unclear why the authors conclude that "their dynamics appears to depend on the activity of aquaporins and it is very possible that aquaporins are active in zebrafish too, although rather in EHT cells late in their emergence and/or in post-EHT cells, for water chase and vacuolar regression as proposed in our model (Figure 1 - figure supplement 1B)." In our opinion, these figures do not confirm this statement.

      This part of the text has been upgraded and moved to the Discussion (see our answer to point 2), to address the Reviewers' concern about the clarity of the Results section and to allow us to elaborate a bit more on this issue. We only wished to draw attention to the described presence of intracellular vacuolar structures recently addressed in the Sato et al 2023 paper showing EHT cell vacuoles that are proposed to contribute to cellular deformation during the emergence. We take this example to rationalize the regression of the vacuolar structures described in Figure 1 - figure supplement 1B, which is why we have written ‘… it is very possible that aquaporins are active in zebrafish too’; the first part of the sentence refers to the Sato et al 2023 paper.

      (4) Could the authors prove and show data for their conclusions "We observed that both EHT pol+ and EHT pol- cells divide during the emergence"; "both EHT pol+ and EHT pol- cells express reporters driven by the hematopoietic marker CD41 (data not shown), which indicates that they are both endowed with hematopoietic potential"; and "the full recovery of their respective morphodynamic characteristics (not shown)?".

      To the new version of our manuscript, we have added new Supplemental information to Figure 1 (two new Supplemental Figures):

      • Figure 1 - figure Supplement 2 that illustrates that both EHT pol+ and EHT pol- cells divide during the emergence as well as the maintenance of morphology for both EHT cell types. We wish also to add here that the maintenance of the EHT pol+ morphology is the most critical point, showing that dividing cells in this system do not necessarily lead to EHT pol- cells.

      • Figure 1 - figure Supplement 3 that shows that both EHT cell types express CD41.

      (5) The authors do not demonstrate the conclusion traced from Fig. 2B. Is there a fusion of the vacuoles to the apical side in the EHT pol+ cells? Do the cells inheriting less vacuoles result in pol- EHT? It looks like the legend for Fig. 2-fig supp is missing.

      As said previously, showing fusion here is not technically possible, but indeed, this is the idea, which fits with the images corresponding to timing points 0-90 minutes (Figure 2A), showing (in particular for the right cell) a large pseudo-vacuole whose membrane is heavily enriched with the polarity marker podxl2 (based on fluorescence signal in a membrane-bounded organelle that, based on its curvature radius, should be more under tension than the more convoluted EHT pol+ cell luminal membrane). Also, EHT pol- cells may be born from HE cells that either inherit fewer intracellular vesicles after division or are less (or not) exposed to polarity-dependent signaling (see our data presented in the new Figure 4 and the new version of the Discussion, paragraphs ‘Characteristics of the HE and complexity of pre-hematopoietic stem cell emergence’ and ‘Spatially restricted control of Pard3ba mRNAs by Runx1’).

      Finally, the cartoon Figure 2B is a hypothetical model, consistent with our data, that is meant to help the reader understand the idea extrapolated from images that may not be so easy to interpret for people not working on this system. In the legend of Figure 2 that describes this issue in the first version of our manuscript (lines 1241-1243), we were cautious and wrote, in parentheses: ‘note that exocytosis of the large vacuolar structure may have contributed to increase the surface of the apical/luminal membrane (the green asterisk labels the lumen of the EHT pol + cell)’.

      The legend to Figure 2 – figure supplement 1 is not missing (see lines 1492 – 1499 of the first manuscript). The images of this supplement are not extracted from a time-lapse sequence and show that as early as 30hpf (shortly after the beginning of the EHT time-window – around 28hpf), cells on the aortic floor already exhibit podxl2-containing pseudo-vacuolar structures (which we propose is a prerequisite for HE cell maturation into EHT competent cells; see also Figure 2 – figure supplement 2).

      (6) The title of the paper "Tuning apico-basal polarity and junctional recycling in the hemogenic endothelium orchestrates pre-hematopoietic stem cell emergence complexity" could be interpreted as functional heterogeneity within the HSCs, which is not demonstrated in this work. A more conservative title denoting that there are two types of EHT from the DA could avoid misinterpretations and be more appropriate.

      There was no ambiguity, throughout our initial manuscript, on what we meant when using the word ‘emergence’; it refers only to the extrusion process from the aortic floor.

      Reducing our title only to the 2 types of EHT cells would be very reductionist with regard to our work, which also addresses essential aspects of the interplay between hemogenic cells, cells undergoing extrusion (EHT pol+ and pol- cells), and their endothelial neighbors (not to mention what we show in terms of the cell biology of the maturing HE and the regulation of its interface with endothelial cells (evidence for vesicular trafficking, specific regulation of HE-endothelial cell intercalation required for EHT progression, etc.)). However, to take this specific comment into account, we propose a slightly changed title saying that there are emergences differentially characterized by their morphodynamic characteristics:

      ‘Tuning apicobasal polarity and junctional recycling in the hemogenic endothelium orchestrates the morphodynamic complexity of emerging pre-hematopoietic stem cells’

      (7) There are several conclusions not supported by data: "Finally, we have estimated that the ratio between EHT pol+ and EHT pol- cells is of approximately 2/1". "We observed that both EHT pol+ and EHT pol- cells divide during the emergence and remain with their respective morphological characteristics". "We also observed that both EHT pol+ and EHT pol- cells express reporters driven by the hematopoietic marker CD41 (data not shown), which indicates that they are both endowed with hematopoietic potential." These conclusions are key in the paper, and therefore they should be supported by data.

      Most of the requests of the Reviewer in this point have already been asked in point 4 and were added to the revised version.

      Regarding the EHT pol+/pol- ratio, we will keep the ratio at approximately 2/1. The Reviewer should be aware that quantification of EHT cells is a tricky issue and a source of important variability, as can be assessed by the quantifications that we have been performing (see for example figures in which we compare the dt-Runx1 phenotype with Ctrl). This is inherent to this system, more specifically because the EHT process is asynchronous, ranging from approx. 28 hpf to 3 days post fertilization (we have even observed EHT at 5 dpf). We systematically observed heterogeneity in EHT numbers and EHT types between animals and also between experiments (some days we observe EHTs at 48 hpf, others more around 55 hpf or even later). In addition, emergence also proceeds on the lateral side of the aorta and, while it is relatively easy to identify EHT pol+ cells because of their highly characteristic morphology, it is more difficult for EHT pol- cells, which can be mistaken for round HE cells preparing for division. In the current revision of our work, we provide additional facts and potential explanations on the mechanisms that control this asynchrony and the apparent stochasticity of the EHT process (see results of new Figures 3 and 4).

      Reviewer #2 (Public Review):

      In this study, Torcq and colleagues make careful observations of the cellular morphology of haemogenic endothelium undergoing endothelial to haematopoietic transition (EHT) to become stem cells, using the zebrafish model. To achieve this, they used an extensive array of transgenic lines driving fluorescent markers, markers of apico-basal polarity (podocalyxin-FP fusions), or tight junction markers (jamb-FP fusions). The use of the runx truncation to block native Runx1 only in endothelial cells is an elegant tool to achieve something akin to tissue-specific deletion of Runx1. Overall, the imaging data is of excellent quality. They demonstrate that differences in apico-basal polarity are strongly associated with different cellular morphologies of cells undergoing EHT from HE (EHT pol- and EHT pol+) which raises the exciting possibility that these morphological differences reflect the heterogeneity of HE (and therefore HSCs) at a very early stage. They then overexpress a truncated form of Runx1 (just the runt domain) to block Runx1 function and show that more HE cells abort EHT and remain associated with the embryonic dorsal aorta. They identify pard3aa and pard3ab as potential regulators of cell polarity. However, despite showing that loss of runx1 function leads to (late) decreases in the expression of these genes, no evidence for their role in EHT is presented. The FRAP experiments and the 2d-cartography, albeit very elegant, are difficult to interpret and not very clearly described throughout the text, making interpretation difficult for someone less familiar with the techniques. Finally, while it is clear that ArhGEF11 is playing an important role in defining cell shapes and junctions between cells during EHT, there is very little statistical evidence to support the limited data presented in the (very beautiful) images.

      As mentioned in the response to reviewer 1, we revised our whole strategy for the analysis of the role of Pard3 proteins in regulating the emergence of hematopoietic precursors. Our new data, obtained using refined gene expression analysis by qRT-PCR on FACS sorted populations and by in situ gene expression analysis at the single-cell resolution using RNAscope, show first that a unique Pard3 isoform (Pard3ba) is sensitive to runx1 activity, and that its expression is specifically localized in aortic cells contacting hemogenic(HE)/EHT cells. We show a clear correlation between the densification of Pard3ba mRNAs and the presence of contacting HE/EHT cells, suggesting a key role for Pard3ba in a cross talk between aortic and hemogenic cells. Furthermore, we show that our dt-runx1 mutant impacts on the maturation of HE cells; when this mutant is expressed, we observe, in comparison to control, an accumulation of HE cells that are abnormally polarized as well as unusually high numbers of EHT pol+ cells. This strongly suggests that the polarity status of HE cells controls the mode of emergence. Overall, our work shows that regulation of apico-basal polarity features is essential for the maturation of the HE and the proper proceeding of the EHT.

      We made efforts to explain more clearly the FRAP experiments as well as the analysis of 2D-cartography throughout the text to facilitate readers' comprehension. 2D-cartographies are an invaluable tool to precisely discriminate between endothelial and hemogenic cells, and their usage was essential during the FRAP sessions, to point at specific junctional complexes accurately. Performing FRAP at cellular junctions during aortic development was extremely challenging technically and the outcome was subject to quite significant variability (which often leads to quantitative results at the limit of statistical significance, which is why we speak of tendencies in our results section reporting on this type of experiment). Apart from constant movement and drifting of the embryos, which are sources of variability, the EHT process per se evolves over time and does so at a heterogeneous pace (for example, the apical closure of EHT pol+ cells is characterized by a succession of contraction and stabilization phases, see Lancino et al. 2018), which is an additional source of variability in the measurements. Despite all this, our data collectively and consistently suggest a differential regime of junctional dynamics between EHT cell types and support the critical function of ArhGEF11/PDZ-RhoGEF in the control of junctional turnover at the interface between HE and aortic cells as well as between HE cells to regulate cell-cell intercalation.

      There is a sense that this work is both overwhelming in terms of the sheer amount of imaging data, and the work behind it to generate all the lines they required, and at the same time that there is very little evidence supporting the assertion that pard3 (and even ArhGEF11) are important mediators of cell morphology and cell fate in the context of EHT. For instance, the pard3 expression data, and levels after blocking runx1 (part of Figure 3 and Figure 4) don't particularly add to the manuscript beyond indicating that the pard3 genes are regulated by Runx1.

      We thank the reviewer for the comment on the Pard3 data particularly because it led us to reconsider our strategy to address with more precision and at the cellular resolution the potential function of this protein family during the time-window of the EHT. As summarized in the header of the Public Review, we identified one specific isoform of Pard3 in the zebrafish - Pard3ba – whose sensitivity to runx1 interference and spatial restriction in expression reinforce the idea of a fine control of apico-basal polarity features and associated functions while EHT is proceeding. Our new data also reinforce the interplay between HE/EHT cells and their direct endothelial neighbors.

      Weaknesses

      The writing style is quite convoluted and could be simplified for clarity. For example, there is plenty of discussion and speculation throughout the presentation of the results. A clearer separation of the results from this speculation/discussion would help with understanding. Figures are frequently presented out of order in the text; modifying the figures to accommodate the flow of the text (or the other way around) - would make it much easier to follow the narrative. While the evidence for the different cellular morphologies of cells undergoing EHT is strong, the main claim (or at least the title of the manuscript) that tuning apico-basal polarity and junctional recycling orchestrate stem cell emergence complexity is not well supported by the data.

      We refined our text when necessary, in particular taking care of transferring and substantiating the arguments that appeared in the Results section, to the Discussion. We also made efforts, on several occasions and for clarity, to describe more precisely the results presented in the different panels of the Figures.

      As mentioned in the header of the text of the Public Review and the response to the 6th point of the Public Review of Reviewer 1, we modified slightly the title to avoid ambiguity. In addition, we added a new paragraph to the beginning of our discussion that summarizes the impact of our findings and, we believe, legitimates our title.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Embryonic stages should be indicated in all images presented for clarification.

      We thank the reviewer for this point; we have added stages where they were missing on the figures (Figure 1, Figure 1 - Figure supplement 1, Figure 2, Figure 2 - Figure supplement 1, Figure 5, Figure 6, Figure 6 - Figure supplement 1, Figure 7 - Figure supplement 3, Figure 7 - Figure supplement 5, Figure 7 - Figure supplement 6).

      (2) In which anatomical site/s were images from Fig 1C and D taken? The surrounding environment looks different, for example, cells in Fig1D seem to be surrounded by other cells, resembling the endothelial plexus at the CHT, while the cells in Fig. 1C seem to be in the dorsal aorta. Is there a spatial difference depending on where cells are budding off? The authors state that there are no differences, but no quantification or data demonstrating that statement is provided.

      As mentioned in the figure legend (lines 1206-1209 of the original manuscript), images for Figure 1C and 1D were both taken at the boundary between the end of the AGM and the entry into the caudal hematopoietic tissue. As the images were acquired from different embryos, the labelling of the underlying vein differs between the two panels, with venous tissues being more sparsely labelled in panel C than in panel D. These images were chosen to illustrate the clearly opposite morphology between the two EHT types that we describe. However, for the rest of the paper, all images and all analyses were exclusively acquired / performed in the dorsal aorta in the AGM, in a region spanning approximately 10-12 inter-segmentary vessels, starting from the end of the elongated yolk up to the start of the balled yolk. In light of the work from the lab of Zilong Wen showing that only cells emerging anteriorly exhibit long-term replenishment potential (Tian et al. 2017), we specifically chose to limit our comparative analysis to the AGM region and did not quantitatively investigate emergences occurring in the caudal region of the aorta. Additionally, although we routinely observe both types of emergences occurring in the caudal region of the dorsal aorta, we did not quantify the frequency of either EHT event in this region.

      Finally, the EHT pol+ cells that we show in Figure 1C are of the highest quality we have ever obtained; one reason is that these two cells emerge at the entry of the CHT, a region that is much easier to image at high resolution than the trunk because the sample is less thick and because imaging is less perturbed by heartbeats.

      (3) Which figure shows "EHT pol- cells were observed in all other Tg fish lines that we are routinely imaging, including the Tg(Kdrl:Gal4;UAS:RFP) parental line that was used for transgenesis, thus excluding the possibility that these cells result from an artefact due to the expression of a deleted form of Podxl2 and/or to its overexpression."? It would be informative to include this figure.

      Other examples of EHT pol- cells were shown in Figure 5C as well as Figure 6B using the Tg(kdrl:Jam3b-eGFP; kdrl:nls-mKate2) fish line, which was routinely used for junctional dynamic analyses by FRAP. Furthermore, we now add a new figure (new Figure 1 – figure supplement 3) to illustrate the presence of EHT pol- cells using the Tg(CD41:eGFP) transgenic background, additionally illustrating that EHT pol- cells are CD41 positive.

      (4) Are the spinning disk confocal images a single plane? Or maximum projections? Sometimes this is not specified.

      We took this remark into account and went through all figure legends to specify the type of images presented (Figure 1 – figure supplement 1, Figure 2, Figure 2 – figure supplement 1, Figure 2 – figure supplement 2, Figure 7 – figure supplement 3); when relevant, we also added this information directly to the figure panels (Figure 6A – 6B).

      (5) Could the expression data by RT-qPCR for the Pard3 isoforms be shown? Additionally, it would be appreciated if this expression data could be complemented using Daniocell (https://daniocell.nichd.nih.gov/).

      As mentioned in the first paragraph of our response to the Public Reviews, and based on the reviewers’ comments, we revised our strategy for investigating the expression of pard3 proteins in the vascular system, their potential role in EHT and their sensitivity to runx1. First, we used FACS sorting as well as tissue dissection to enrich for aortic endothelial cells and perform our qPCR analyses (see the new Figure 4 – figure supplement 1A and Figure 4 – figure supplement 3A for the strategy). As asked by the reviewers and for more transparency, we show the expression relative to the housekeeping gene ef1a in our different control samples (new Figure 4 – figure supplement 1C). Furthermore, we used single-molecule FISH to precisely characterise in situ the expression of several of the Pard3 isoforms (Pard3aa, Pard3ab and Pard3ba, which, based on qPCR, were the most relevant for our investigation in the vascular system) (see lines 386 to 412 in the text relative to Figure 4 – figure supplement 2). This new addition shows the different expression patterns of 3 of the zebrafish Pard3 isoforms in the trunk of 2dpf embryos, outlining interesting specificities of each isoform’s expression in different tissues.

      We thank the reviewer for the suggestion to complement our data with the published Daniocell dataset. However, and potentially due to the poor annotation of the different pard3 genes in public databases, gene expression information was absent for two of our isoforms of interest (pard3aa and pard3ba), which we ultimately show to be the most enriched in the vascular system in the trunk. Daniocell gene expression data for the Pard3ab isoform show expression in the pronephric duct at 48-58hpf, as well as in intestine progenitors and neuronal progenitors, which is consistent with our in situ observations using RNAscope. However, pard3ab is poorly detected within the hematopoietic and vascular clusters. This observation is consistent with our data, which do not show any enrichment of this isoform in vascular tissues compared to other structures. On the other hand, pard3bb does not seem to be particularly enriched in vascular/hematopoietic clusters at 48-58hpf in the Daniocell dataset, in accordance with what we observe with our qPCR. Finally, in the Daniocell dataset, all of the pard3 variants (pard3ab, pard3bb, PARD3 and PARD3 (1 of many)) seem to be either scarcely detected or not detected in the hematopoietic/vascular system. In our case, for all the isoforms we studied in control conditions (pard3aa, pard3ab and pard3ba), and although the technique is only semi-quantitative due to the presence of an amplification step, RNAscope assays seem to indicate a very low expression in aortic cells (with sometimes as little as one mRNA copy per cell); this explains the low detection in single-cell RNAseq datasets and is consistent with the Daniocell dataset.

      (6) It would be informative to add in the introduction some information on apico-basal polarity, tight junctions, JAMs (ArhGEF11/PDZ-RhoGEF).

      We modified the introduction so as to add relevant information on Pard3 proteins, their link with our JAMs reporters in the context of polarity establishment, as well as the role of ArhGEF11/PDZ-RhoGEF and its alternative splicing variants in regulating junctional integrity in the context of epithelial-to-mesenchymal transition (lines 99 to 127). This modification of the introduction also allowed us to lighten some parts of the result section (lines 222 to 224, 345 to 349 and 454 to 456 of the original manuscript).

      Reviewer #2 (Recommendations For The Authors):

      (1) There is lots of data (and lots of work) in this paper; I feel that the pard3 data doesn't substantially add to the paper, and at the same time there is data missing (see point 10, point 11 below for an example).

      To add clarity and substantiate our findings on Pard3, we entirely revised our investigation strategy, as mentioned in previous paragraphs. We refined the characterization of the expression of Pard3 isoforms in the vascular tissue, using both cell enrichment by FACS for gene expression analysis and single-molecule FISH (RNAscope) to access spatial information on the expression of pard3 isoforms, reaching sub-cellular resolution.

      This new strategy allowed us to show the unexpected localization of Pard3ba mRNAs in mRNA-enriched regions in the vicinity of HE/EHT cells (new Figure 4, and paragraph Interfering with Runx1 activity unravels its function in the control of Pard3ba expression and highlights heterogeneous spatial distribution of Pard3ba mRNAs along the aortic axis, see the new manuscript). Overall, the new spatial analysis we performed allowed us to substantiate our findings on Pard3ba and suggests a direct interplay between hemogenic cells and their endothelial aortic neighbors; this interplay supposedly relies on apico-basal polarity features that are at least in part regulated by runx1 in the context of HE maturation and EHT.

      (2) Labelling of the figures could be substantially improved. In many instances, the text refers to a figure (e.g. Fig 6A), but it has several panels that are not well annotated (in the case of Fig 6A, four panels) or labelled sparsely in a way that makes it easy to follow the text and identify the correct panel in the figure. Even supplementary figures are sparsely labelled. Labelling to include embryonic stages, which transgenic is being used, etc should be added to the panels to improve clarity for the reader.

      We revised the figures to add relevant information, including stages, types of images and annotations to facilitate comprehension, including Figure 6A – 6B, Figure 5B – 5C (see response to Reviewer 1, first comment, for a more complete list of all revised figures, transgenic fish lines and embryonic stages annotations). Furthermore, we revised the entire manuscript to fit the figures as closely as possible and added some annotations to more easily link the text to the figures and panels.

      (3) The current numbering of supplementary figures is quite confusing to follow.

      We revised the manuscript so as to make sure all principal and supplementary figures were called in the right order and that supplementary figures appearance was coherent with the unfolding of the text. For Figure 7 only, the majority of the supplemental figures are called before the principal figure, as they relate to our experimental strategy that we comment on before describing the results.

      (4) Graphs in Fig 4, Fig 7 supplement 1 and some of the supplementary figures miss statistical info for some comparison (I assume when non-significant), and sometimes present a p-value of a statistical test being done between samples across stages - but these are not dealt with in the text. Throughout all graphs, the font size used in graphs for annotation (labelling of samples, x-axis, and in some cases the p values) is very small and difficult to read.

      For Figure 7 - figure supplement 1, non-significant p-values of statistical tests were not displayed (as mentioned in the Figure legend, line 1614 of the original manuscript). For the new Figure 4, all p-values are displayed. For new Figure 4 - figure Supplement 1, statistical tests were only performed to compare RFP+ and RFP- cells in the trunk condition (3 biological replicates) and not in the whole embryo condition, for which we did not perform enough replicates for statistical analysis (biological duplicates).

      (5) The results are generally very difficult to follow, with a fair amount of discussion included but then very little detail of the experiments per se.

      We thank the reviewers for these comments that helped us improve the clarity of the manuscript.

      The Results section was revised to move some of the paragraphs to the introduction (see response to Reviewer 1, 6th comment), and some of them to the Discussion (such as lines 149 to 156 or 410 to 416 in the first version of the manuscript referring to vacuolar structures or to the recycling modes of JAMs in EHT pol+ and EHT pol- cells).

      (6) The truncated version of runx1 is introduced but its expected effect is not explained until the discussion. Related to this, is it expected that blocking runx1 with this construct (leading to accumulation of cells in the aorta before they undergo EHT) then leads to increased numbers of T-cell progenitors in the thymus? Abe et al (2005, J Immunol) have used the same strategy to overexpress the runt domain in thymocytes and found a decrease in these cells, rather than an increase. Can you explain this apparent discrepancy?

      We thank the reviewer for this interesting point on the effect of runx1 interference. This phenotype (increased number of thymic cells) seems to be in agreement with the phenotype that was described in zebrafish using homozygous runx1 mutants (Sood et al. 2010 PMID: 20154212), in which the authors show an increase of lymphoid progenitors in the kidney marrow of adult runx1W84X/W84X mutants compared to controls as well as a similar number of intra-thymic lck:eGFP cells in mutants and controls. Notably, the T-lymphoid lineage seems to be the only lineage spared by the mutation of runx1. This could suggest that in this case either the T-lymphoid lineage can develop independently of runx1 or that a compensation phenomenon (for example by another protein of the runx family) occurs to rescue the generation of T-lymphocytes.

      Although our data show an impact on T-lymphopoiesis, we do not elucidate the exact mechanism leading to an increased number of thymic cells. In our case, we do not know the half-life of our dt-runx1 protein in newly generated hematopoietic cells when our transgene, expressed under the control of the kdrl vascular promoter, ceases to be produced after emergence. The effect we observe could be direct, due to the presence of our mutant protein after 3 days in thymic cells, or indirect, due to the impact of our mutant on the HE, which could lead to the preferential generation of lymphoid-biased progenitors. Similarly, we do not know whether the cells we observe at this stage in the thymus are generated from long-term HSCs or short-term progenitors. Indeed, cell tracing analysis from the lab of Zilong Wen (Tian et al. 2017, see our Ref list) shows the simultaneous presence of short-term PBI-derived and long-term AGM-derived thymic cells at 5dpf. Based on this, we can imagine, for example, that the supernumerary cells we observe in the thymus are transient populations that could multiply faster in the absence of definitive populations. Conversely, based on our observation of an accumulation of EHT pol+ events, we can imagine that the EHT pol+ and EHT pol- cells are indeed differentially fated and that EHT pol+ cells may be biased toward a lymphoid lineage. We also know that, at the stage we observe (5dpf), the RNAscope assay for runx1 shows that a vast majority of thymic cells do not express runx1 (our preliminary data), suggesting that the effect we observe would be an indirect one caused by upstream events rather than by direct interference with the endogenous expression of runx1 in thymic cells.

      The article referred to by the reviewer (Sato et al. 2005, PMID: 16177090) investigates the role of runx1 during TCR selection for thymic cell maturation and shows that runx1 signaling lowers the apoptotic sensitivity of double-positive thymocytes when artificially activated, leading to a reduced number of single-positive thymic cells. Furthermore, this paper references another study from the same lab (Hayashi et al. 2000, PMID: 11120804) that used the same strategy to study the role of runx1 in the positive and negative selection steps of T lymphocyte maturation. This paper, although showing that runx1 is important for later stages of T lymphocyte differentiation (the double-positive to single-positive stage maturation), also shows a relative increase in the number of double-negative and double-positive thymocytes, which could be consistent with our observations. Indeed, in our case, although we show an increased number of thymic cells, we do not know the relative proportion of the different thymocyte subsets. We could explain the increased number of thymic cells by an increased number of DN/DP thymocytes, which would not preclude a decrease in single-positive thymocytes. Finally, the cells we observe in the thymus of our dt-runx1 mutants may also be different lymphoid populations, namely ILCs, that would react differently to runx1 interference.

      (7) Lines 154-155 refer to aquaporins but are missing a reference. This is a bit of speculation right in the results section and I struggled to understand what the point of it was.

      To clarify the argument and ease the flow of the text, as suggested by the reviewers, we transferred this paragraph (lines 149 to 156 of the initial manuscript) to the Discussion section (lines 763-789). We additionally made sure to add the missing reference (Sato et al. 2023, see our Ref list).

      (8) Lines 173-175, indicating that both EHTpol+ and pol- express the CD41 transgenic marker - would be useful to show this data.

      We provide a new supplement Figure (Figure 1 – figure supplement 3), where, using an outcross of the CD41:eGFP and kdrl:mKate2-podxl2 transgenic lines, we show unambiguously and for multiple cells that both polarized EHT pol+ cells and non-polarized EHT pol- cells are CD41 positive. In addition, but not commented on in the main text, we can also see that an HE cell, characterized by its elongated morphology (in the middle of the field), its thickened nucleus and its position on the aortic floor, is also CD41 positive.

      (9) Lines 181-201 - it's not clear how HE cells were identified in the first place - was it just morphology? Or were they identified retrospectively?

      HE cells were identified solely on morphological and spatial criteria (as mentioned in the Methods section, lines 1073-1082 and 1108-1111 of the first manuscript). Furthermore, a recent investigation by the lab of Zilong Wen (Zhao et al. 2022, see our Ref list), questioning the common origin of HE cells and of endothelial cells as well as their respective capacity to extrude from the aorta to generate hematopoietic cells, showed by single-cell tracing that 96% of floor cells are indeed hemogenic endothelial cells. In addition, as mentioned in the response to the 8th point, we show in Figure 1 – figure supplement 3 that all floor cells express CD41. Finally, we also used an alternative method to validate the true hemogenic identity of aortic floor cells and show, using RNAscope, that virtually 100% of floor cells that we consider as typical HE cells indeed express a hematopoietic transcription factor upstream of Runx1, namely Gata2b (see Author response image 1).

      Author response image 1.

      All cells from the aortic floor, at 48hpf, express the hematopoietic marker Gata2b. 48 hpf Tg(Kdrl:eGFP) fixed embryos were used for RNAscope using a probe designed to detect Gata2b mRNAs. Subsequently, images were taken using spinning disk confocal microscopy. The image in the top panel is a z-projection of the entire aortic volume of one embryo and shows the full portion of the dorsal aorta from the anterior part (left side, at the limit of the balled yolk) down to the urogenital orifice (UGO, right side). The 4 boxes (1 - 4) delineate regions that have been magnified beneath (2X). The 2X images corresponding to each box are z-projections (top views) or z-sections (bottom views). The bottom views make it possible to visualize the aortic floor and to mark its position on the top views. Pink arrows point at HE cells (elongated in the anteroposterior direction) and at EHT cells (ovoid/round cells; EHT pol+ cell morphology is not preserved after fixation and RNAscope; thus, it cannot be distinguished from ovoid/round EHT pol- cells). Pink dots = RNAscope spots of various sizes. The green cells in the subaortic space that are marked by RNAscope spots are newly born hematopoietic stem and progenitor cells (see for example box 1). This embryo is representative of n = 5 embryos treated and imaged.

      (1) Line 276 - the difference between the egfp-podxl2 and mKate-podxl2 - could that be due to the fluorophore used? Also, it would be good to label Fig 3 supplement 2 better and to see a control alongside the runt overexpression.

      Line 276 does not point at a difference in control conditions between eGFP-podxl2 and mKate-podxl2 (see in new Figure 1 – figure supplement 3, Figure 2 or in new Figure 3 - figure supplement 2 several examples of non-polarized HE cells in control conditions using both fluorophores) but between control and dt-runx1 conditions, both expressing the mKate2-podxl2 transgene. Similarly, the new example that we now provide in the CD41 figure (Figure 1 – figure supplement 3) clearly shows that mKate-podxl2 is enriched at the apical/luminal membrane of EHT pol+ cells while no such enrichment is observed for EHT pol- cells. We note that EHT cells are not always the most typical in shape, in particular because cells can be squeezed by underlying tissues, for example the vein, or from the luminal side by flow and tensions on the aortic wall caused by the heartbeat (the further up the trunk we image, the more difficult the imaging and the less stable the cell shape during long time-lapse sequences). To also take into account the reviewer’s comments, we added a control condition next to the dt-runx1 condition in the new Figure 3 – figure supplement 2A.

      (2) There is no quantitation data on the number of excess EHT pol+ cells in the DA, or in the thymus data (Figs 3 Supp1 and Fig 3 Supp 3). Can you quantify this data? This would better support the claim that tuning apico-basal polarity alters the morphology of the emerging HE cells.

      We added quantifications relative to both the emergence process itself, showing the accumulation of HE and EHT pol+ cells (new Figure 3B), and to hematopoiesis per se (new Figure 3 – figure supplement 1). Indeed, we show a reduction in the number of newly generated cmyb+ cells in the sub-aortic space. Furthermore, we improved our quantification of the later phenotype in the thymus (new Figure 3 – figure supplement 3), using improved segmentation methods, which indeed validates the increased number of thymic cells that we described.

      (3) The observed changes in pard3 isoforms are just reading out changes in their expression in the runt1 transgenics, rather than demonstrating a role in apico-basal polarity.

      We entirely revised our strategy regarding Pard3 expression analyses (see also the text at the beginning of this file, for the Public Review). However, we wish to stress that we did not initially intend to show directly a role of Pard3 proteins in controlling apico-basal polarity in the system; we only intended to provide correlative evidence supporting our observations with the polarity marker podxl2 (by interfering with their function, as written in the text, apico-basal polarity, which is essential for aortic lumenization and maintenance, would have been impaired, blurring interpretations).

      During the revision, we obtained the unexpected finding, using RNAscope, that one Pard3 isoform, namely Pard3ba, is the one Pard3 that is expressed non-homogeneously along the aortic axis and, in the vast majority of cases, by aortic cells in the direct vicinity of emergence domains of the aortic floor (see the new Figure 4 and Figure 4 – figure supplements 2, 3).

      This correlative relation between expression of Pard3ba in aortic endothelial cells neighbouring HE/EHT cells suggests, as we propose, that a cross talk occurs between hemogenic and aortic cells, and that this cross talk relies, at least in part, on the expression of key components of apico-basal polarity and their associated functional features. In addition, we show that junctional recycling differs between both EHT types, based on our observations on the different dynamics in the turnover of JAM molecules, in the two EHT types. As JAM molecules are also required for the recruitment of Pard3, which initiates the establishment of apico-basal polarity, these different dynamics suggest that the control of apico-basal polarity is involved in supporting the morphodynamic complexity of EHT cell types.

      (4) There is a Fig 5, Supp 2 that is neither mentioned nor described anywhere in the manuscript.

      Figure 5 - figure Supplement 2 is mentioned in lines 366-370 of the original manuscript, to describe the initial validation that was performed for our eGFP-JAM constructs in multiple cell types using a ubiquitous heat-shock promoter. We expanded our description of this supplemental figure in the new manuscript (lines 504 to 514).

      (5) Lines 445-456 - these read like a bit of discussion, not results. There are other similar parts of the results section that also read like a discussion (e.g. 526-533)

      Although we decided to keep this paragraph in the Results section, as it justifies the rationale behind the choice of ArhGEF11/PDZ-RhoGEF, we took the reviewer’s comment into account and, as mentioned in the response to Reviewer 1, 6th comment, lightened the Results section by transferring some of the paragraphs to the Introduction or Discussion sections.

      (6) The description of Fig 7A (from line 505) is missing the stages at which the experiments were performed (also not labelled on the figure).

      The stages at which the experiments were performed are stated in the figure legend (line 1366) as well as in the Methods section of the original manuscript (line 1033). We added the information on top of panels A and B for more clarity.

      (7) Some figures have multiple panels (e.g. Fig 7Aa'), so when referred to in the text, it remains unclear which panel is being referred to.

      We modified the text so as to refer more clearly to the different panels when mentioned in the text, particularly with regard to Figures 7 and 8 but also for all the other figures.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents valuable data on the antigenic properties of neuraminidase proteins of human A/H3N2 influenza viruses sampled between 2009 and 2017. The antigenic properties are found to be generally concordant with genetic groups. Additional analyses have strengthened the revised manuscript, and the evidence supporting the claims is solid.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      The authors investigated the antigenic diversity of recent (2009-2017) A/H3N2 influenza neuraminidases (NAs), the second major antigenic protein after haemagglutinin. They used 27 viruses and 43 ferret sera and performed NA inhibition. This work was supported by a subset of mouse sera. Clustering analysis determined 4 antigenic clusters, mostly in concordance with the genetic groupings. Association analysis was used to estimate important amino acid positions, which were shown to be more likely close to the catalytic site. Antigenic distances were calculated and a random forest model used to determine potential important sites.

      This revision has addressed many of my concerns of inconsistencies in the methods, results and presentation. There are still some remaining weaknesses in the computational work.

      Strengths

      (1) The data cover recent NA evolution and a substantial number (43) of ferret (and mouse) sera were generated and titrated against 27 viruses. This is laborious experimental work and is the largest publicly available neuraminidase inhibition dataset that I am aware of. As such, it will prove a useful resource for the influenza community.

      (2) A variety of computational methods were used to analyse the data, which give a rounded picture of the antigenic and genetic relationships and link between sequence, structure and phenotype.

      (3) Issues raised in the previous review have been thoroughly addressed.

      Weaknesses

      (1) Some inconsistencies and missing data in experimental methods

      Two ferret sera were boosted with H1N2, while recombinant NA protein was used for the others. This, and the underlying reason, are clearly explained in the manuscript. The authors note that boosting with live virus did not increase titres. Additionally, one homologous serum (A/Kansas/14/2017) was not generated, although this would not necessarily have impacted the results.

      We agree with the reviewer and this point was addressed in the previous rebuttal.

      (2) Inconsistency in experimental results

      Clustering of the NA inhibition results identifies three viruses which do not cluster with their phylogenetic group. Again, this is clearly pointed out in the paper and is consistent with the two replicate ferret sera. Additionally, A/Kansas/14/2017 is in a different cluster based on the antigenic cartography vs the clustering of the titres.

      We agree with the reviewer and this point was addressed in the previous rebuttal.

      (3) Antigenic cartography plot would benefit from documentation of the parameters and supporting analyses

      a. The number of optimisations used

      We used 500 optimizations. This information is now included in the Methods section.

      b. The final stress and the difference between the stress of the lowest few (e.g. 5) optimisations, or alternatively a graph of the stress of all the optimisations. Information on the stress per titre and per point, and whether any of these were outliers

      The stress was obtained from 1, 5, 500, or even 5000 optimizations (resulting in stress values of 1366.47, 1366.47, 2908.60, and 3031.41, respectively). Besides the limited variation or non-convergence of the stress values after optimization, the obtained maps were consistent across multiple runs. The map was obtained by keeping the best optimization (stress value 1366.47, selected using the keepBestOptimization() function).
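
      To illustrate what running several optimizations and keeping the best one means, the following is a minimal sketch and not the software used for the map. It assumes a plain least-squares stress (the sum of squared differences between target and map distances) minimised by gradient descent from random 2D starting layouts; the function names, step size and number of steps are hypothetical choices made for the example only.

```python
import math
import random


def stress(coords, target):
    """Sum of squared differences between target and map distances."""
    total = 0.0
    for (i, j), d_target in target.items():
        d_map = math.dist(coords[i], coords[j])
        total += (d_target - d_map) ** 2
    return total


def optimise_once(n_points, target, rng, steps=500, lr=0.01):
    """One gradient-descent run from a random 2D starting layout."""
    coords = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(n_points)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n_points)]
        for (i, j), d_target in target.items():
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            d_map = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            # Derivative of (d_target - d_map)^2 with respect to point i;
            # point j receives the opposite contribution.
            g = -2.0 * (d_target - d_map) / d_map
            grads[i][0] += g * dx
            grads[i][1] += g * dy
            grads[j][0] -= g * dx
            grads[j][1] -= g * dy
        for point, grad in zip(coords, grads):
            point[0] -= lr * grad[0]
            point[1] -= lr * grad[1]
    return coords, stress(coords, target)


def best_of(n_runs, n_points, target, seed=0):
    """Run several independent optimisations and keep the lowest-stress layout."""
    rng = random.Random(seed)
    runs = [optimise_once(n_points, target, rng) for _ in range(n_runs)]
    stresses = sorted(s for _, s in runs)
    best_coords, best_stress = min(runs, key=lambda run: run[1])
    return best_coords, best_stress, stresses
```

      Comparing the sorted stress values of the runs gives a quick check of how close the lowest few optimizations are to each other, which is the kind of information requested in point b.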

      Author response image 1.

      The stress per point is presented in the heat map below.

      The heat map indicates stress per serum (x-axis) and strain (y-axis) in blue to red scale.

      c. A measure of uncertainty in position (e.g. from bootstrapping)

      Bootstrap was performed using 1000 repeats and 100 optimizations per repeat. The uncertainty is represented in the blob plot below.

      Author response image 2.
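
      The bootstrap principle behind this uncertainty estimate can be sketched in its simplest form: resample the observations with replacement, recompute the quantity of interest, and report the spread over repeats. The example below is an illustration under stated assumptions, not the procedure used for the map (which re-optimizes the whole map for each repeat); it bootstraps a one-dimensional statistic, and the function name and the input values are made up for the example.

```python
import random
import statistics


def bootstrap_interval(values, stat=statistics.mean, repeats=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` computed on `values`."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(repeats):
        # Resample the observations with replacement, same size as the original.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * repeats)]
    upper = estimates[int((1 - alpha / 2) * repeats) - 1]
    return lower, upper


# Made-up measurements, for illustration only.
example = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5]
low, high = bootstrap_interval(example)
```

      A wide interval indicates that the estimate depends strongly on which observations happen to be included, which is what a large blob conveys for a point on the map.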

      (4) Random forest

      The full dataset was used for the random forest model, including tuning the hyperparameters. It is more robust to have a training and test set to be able to evaluate overfitting (there are 25 features to classify 43 sera).

      Explicit cross-validation is not necessary for random forests, as the out-of-bag (OOB) procedure across multiple trees implicitly covers cross-validation. In the randomForest function in R, each tree is grown on a bootstrap sample of the observations, drawn with replacement (the same observation can be sampled multiple times), and the mtry argument sets the number of variables randomly sampled as candidates at each split. RF then automatically uses the observations that were not drawn for a given tree as the test set for that tree. Overfitting may happen when all data are used for training, but the RF method implicitly uses a test set and does not use all data to train each tree.

      Code:

      library(randomForest)
      rf <- randomForest(X, y = Y, ntree = 1500, mtry = 25, keep.forest = TRUE, importance = TRUE)
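
      The out-of-bag bookkeeping referred to above can be sketched as follows. This is a minimal illustration and not the randomForest implementation: it assumes the standard bagging scheme in which each learner is trained on a bootstrap sample of the observations, and it replaces decision trees with a deliberately simple stand-in learner (1-nearest-neighbour on a single feature). The function name and the toy data in the usage note are hypothetical.

```python
import random


def oob_predictions(xs, ys, n_learners=200, seed=0):
    """Bagged 1-nearest-neighbour regression with out-of-bag (OOB) predictions.

    Only the bootstrap/OOB bookkeeping mirrors a random forest; the base
    learner is a stand-in, not a decision tree.
    """
    rng = random.Random(seed)
    n = len(xs)
    sums = [0.0] * n   # sum of OOB predictions per observation
    counts = [0] * n   # number of learners for which the observation was OOB
    for _ in range(n_learners):
        # Bootstrap sample of observation indices, drawn with replacement.
        in_bag = {rng.randrange(n) for _ in range(n)}
        for i in range(n):
            if i in in_bag:
                continue  # observation was used to train this learner
            # Predict observation i from its nearest in-bag neighbour.
            nearest = min(in_bag, key=lambda j: abs(xs[j] - xs[i]))
            sums[i] += ys[nearest]
            counts[i] += 1
    preds = [s / c if c else None for s, c in zip(sums, counts)]
    return preds, counts
```

      With 43 observations, calling `oob_predictions(list(range(43)), [2 * x for x in range(43)])` leaves each observation out of roughly a third of the bootstrap samples, so every prediction comes only from learners that never saw that observation. Whether this replaces a separate held-out test set when hyperparameters are tuned on the same data is the point under discussion with the reviewer.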

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the antigenicity of the N2 protein of 43 selected A(H3N2) influenza A viruses isolated from 2009-2017 using ferret and mouse immune sera. Four antigenic groups were identified, which the authors claimed to be correlated with their respective phylogenetic/genetic groups. Among the 102 amino acid positions that differed among the 44 selected N2 proteins, the authors identified residues that differentiate the antigenicity of the four groups and constructed a machine-learning model that provides antigenic distance estimation. Three recent A(H3N2) vaccine strains were tested in the model but there was no experimental data to confirm the model prediction results.

      Strengths:

      This study used N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 and generated corresponding panels of ferret and mouse sera to react with the selected strains. The amount of experimental data for N2 antigenicity characterization is large enough for model building.

      Weaknesses:

      The main weakness is that the strategy of selecting 43 A(H3N2) viruses from 2009-2017 was not explained. It is not clear if they represent the overall genetic diversity of human A(H3N2) viruses circulating during this time. In response to the reviewer's comment, the authors have provided a N2 phylogenetic tree using180 randomly selected N2 sequences from human A(H3N2) viruses from 2009-2017. While the 43 strains seems to scatter across the N2 tree, the four antigenic groups described by the author did not correlated with their respective phylogenic/ genetic groups as shown in Fig. 2. The authors should show the N2 phylogenic tree together with Fig. 2 and discuss the discrepancy observed.

      The discrepancy between the provided N2 phylogenetic tree built from 180 selected N2 sequences and the tree in Figure 2 was primarily due to visualization. In the tree presented in Figure 2, the phylogeny was ordered by decreasing branch length. Further, the tree presented in the rebuttal was built with PhyML 3.0 using the JTT substitution model, while the tree in Figure 2 was built in CLC Workbench 21.0.5 using the Bishop-Friday substitution model. The tree below was built using the same methodology as Figure 2, including branch size ordering. No discrepancies are observed.

      Phylogenetic tree representing relatedness of N2 head domain. N2 NA sequences were ordered according to the branch length and phylogenetic clusters are colored as follows: G1: orange, G2: green, G3: blue, and G4: purple. NA sequences that were retained in the breadth panel are named according to the corresponding H3N2 influenza viruses. The other NA sequences are coded.

      Author response image 3.

      The second weakness is the use of double-immune ferret sera (post-infection plus immunization with recombinant NA protein) or mouse sera (immunized twice with recombinant NA protein) to characterize the antigenicity of the selected A(H3N2) viruses. Conventionally, NA antigenicity is characterized using ferret sera after a single infection. Repeated influenza exposure in ferrets has been shown to enhance antibody binding affinity and may affect the cross-reactivity to heterologous strains (PMID: 29672713). The increased cross-reactivity is supported by the NAI titers shown in Table S3, as many of the double-immune ferret sera showed the highest reactivity not against their own homologous virus but against heterologous strains. In response to the reviewer's comment, the authors agreed that the use of double-immune ferret sera may be a limitation of the study. It would be helpful if the authors could discuss, in the manuscript, the potential effect of using double-immune ferret sera on antigenicity characterization.

      Our study was designed to understand the breadth of the anti-NA response after the incorporation of NA as a vaccine antigen. Our data do not allow us to conclude whether increased breadth of protection is merely due to increased antibody titers or whether an NA boost immunization was able to induce antibody responses against epitopes that were not previously recognized by the primary response to infection. However, we now mention this possibility in the discussion and cite Kosikova et al. CID 2018 in this context.

      Another weakness is that the authors used the newly constructed model to predict the antigenic distance of three recent A(H3N2) viruses but there are no experimental data to validate their prediction (e.g., whether these viruses are indeed antigenically deviating from group 2 strains as concluded by the authors). In response to the comment, the authors have taken two strains out of the dataset and used them for validation. The results are shown as Fig. R7. However, it may be useful to include this in the main manuscript to support the validity of the model.

      The removal of 2 strains was performed to illustrate the predictive performance of the RF modeling. However, Random Forest does not require cross-validation. The reason is that RF modeling already uses an out-of-bag evaluation which, in short, consists of using only a fraction of the data for the creation of each decision tree (about 2/3 of the data), obviating the need to set aside a test set:

      “…In each bootstrap training set, about one-third of the instances are left out. Therefore, the out-of-bag estimates are based on combining only about one-third as many classifiers as in the ongoing main combination. Since the error rate decreases as the number of combinations increases, the out-of-bag estimates will tend to overestimate the current error rate. To get unbiased out-of-bag estimates, it is necessary to run past the point where the test set error converges. But unlike cross-validation, where bias is present but its extent unknown, the out-of-bag estimates are unbiased…” from https://www.stat.berkeley.edu/%7Ebreiman/randomforest2001.pdf
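
      As a brief aside on the "about one-third" figure in the passage quoted above: it follows from bootstrap sampling itself. Assuming each tree is grown on a bootstrap sample of $n$ observations drawn with replacement from $n$ training observations (an assumption about the default setup, not a statement about the exact configuration used here), the probability that a given observation is left out of one tree's sample is

      $$P(\text{out of bag}) = \left(1 - \frac{1}{n}\right)^{n} \;\longrightarrow\; e^{-1} \approx 0.368 \quad (n \to \infty)$$

      so each tree sees roughly 63% of the distinct observations, and the remaining ~37% can serve as held-out data for that tree.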

      Reviewer #3 (Public Review):

      Summary:

      This paper by Portela Catani et al examines the antigenic relationships (measured using monotypic ferret and mouse sera) across a panel of N2 genes from the past 14 years, along with the underlying sequence differences and phylogenetic relationships. This is a highly significant topic given the recent increased appreciation of the importance of NA as a vaccine target, and the relative lack of information about NA antigenic evolution compared with what is known about HA. Thus, these data will be of interest to those studying the antigenic evolution of influenza viruses. The methods used are generally quite sound, though there are a few addressable concerns that limit the confidence with which conclusions can be drawn from the data/analyses.

      Strengths:

      • The significance of the work, and the (general) soundness of the methods.

      • Explicit comparison of results obtained with mouse and ferret sera

      Weaknesses:

      • Approach for assessing influence of individual polymorphisms on antigenicity does not account for potential effects of epistasis (this point is acknowledged by the authors).

      We agree with the reviewer and this point was addressed in the previous rebuttal.

      • Machine learning analyses neither experimentally validated nor shown to be better than simple, phylogenetic-based inference.

      We respectfully disagree with the reviewer. This point was addressed in the previous rebuttal as follows.

      This is a valid remark and indeed we have found a clear correlation between NAI cross-reactivity and phylogenetic relatedness. However, besides achieving good prediction of the experimental data (as shown in Figure 5 and in Figure R7), machine learning analysis has the potential to rank or indicate major antigenic divergences based on available sequences before they have consolidated as a new clade. ML can also support the selection and design of more broadly reactive antigens.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Discuss the discrepancy between Fig. 2 and the newly constructed N2 phylogenetic tree with 180 randomly selected N2 sequences of A(H3N2) viruses from 2009-2017. Specifically, please explain why the antigenic vs. phylogenetic relationship observed in Fig. 2 was not observed in the large N2 phylogenetic tree.

      Discrepancies were due to different methods and visualization. A new tree was provided.

      (2) Include a sentence to discuss the potential effect of the use of double-immune ferret sera on antigenic characterization.

      We prefer not to speculate on this.

      (3) Include the results of the exercise run (with the use of Swe17 and HK17) in the manuscript as a way to validate the model.

      The exercise was performed to illustrate the predictive potential of the RF modeling to the reviewer. However, cross-validation is not a usual requirement for random forest, since it uses out-of-bag calculations. We prefer not to include the exercise runs within the main manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript titled "Disease modeling and pharmacological rescue of autosomal dominant Retinitis Pigmentosa associated with RHO copy number variation" the authors describe the use of patient iPSC-derived retinal organoids to evaluate the pathobiology of a RHO-CNV in a family with dominant retinitis pigmentosa (RP). They find significantly increased expression of rhodopsin, especially within the photoreceptor cell body, and defects in photoreceptor cell outer segment formation/maturation. In addition, they demonstrate how an inhibitor of NR2E3 (a rod transcription factor required for inducing rhodopsin expression), can be used to rescue the disease phenotype.

      Strengths:

      The manuscript is very well written, the illustrations and data presented are compelling, and the authors' interpretation/discussion of their findings is logical.

      Weaknesses:

      A weakness, which the authors have addressed in the discussion section, is the lack of an isogenic control, which would allow for direct analysis of the RHO-CNV in the absence of the other genetic sequence contained within the duplicated region. As the authors suggest, CRISPR correction of a large CNV in the absence of inducing unwanted on-target editing events in patient iPSCs is often very challenging. Given that they have used a no-disease iPSC line obtained from a family member, controlled for organoid differentiation kinetics/maturation state, and that no other complete disease-causing gene is contained within the duplicated region, it is unlikely that the addition of an isogenic control would yield significantly different results.

      Aims and conclusions:

      This reviewer is of the opinion that the authors have achieved their aims and that their results support their conclusions.

      Discussion:

      The authors have provided adequate discussion on the utility of the methods and data as well as the impact of their work on the field.

      We thank the reviewer for their insightful and encouraging review of our work, which has taken several years to reach its current stage.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kandoi et al. describes a new 3D retinal organoid model of a mono-allelic copy number variant of the rhodopsin gene that was previously shown to induce autosomal dominant retinitis pigmentosa via a dominant negative mechanism in patients. With advancements in the low-cost genomics application to detect copy number variations, this is a timely article that highlights a potential disease mechanism that goes beyond the retina field. The evidence is relatively strong that the rod photoreceptor phenotype observed in an adult patient with RP in vivo is similar to that phenotype observed in human stem cell-derived retinal organoids. Increases in RHO expression detected by qPCR, RNA-seq, and IHC support this phenotype. Importantly, the amelioration of photoreceptor rhodopsin mislocalization and related defects using the small molecule drug photoregulin demonstrates an important potential clinical application.

      Overall, the authors succeeded in providing solid evidence that copy number variation via a genomic RHO duplication leads to abnormalities in rod photoreceptors that can be partially blocked by photoregulin. However, there are several points that should be addressed that will enhance this paper.

      Strengths:

      • The use of patient-derived organoids from patients that have visual defects is a major strength of this work and adds relevance to the disease phenotype.

      • The rod phenotype assessed by qPCR, RNA-seq, and IHC supports a phenotype that shares similarities with the patient.

      • The use of a small molecule drug that selectively targets rod photoreceptors, as opposed to cones, is a noteworthy strength.

      We thank the reviewers for highlighting the key strengths of the paper.

      Weaknesses:

      (1) The chromosomal segment that was duplicated had 3 copies of RHO in addition to three copies of each of the flanking genes (IFT122, HIF100, PLXND1). Discussion of the involvement of these genes would be helpful. Would duplication of any of these genes alone cause or contribute to adRP? As an example, a missense mutation in IFT122 was previously implicated in photoreceptor loss (PMID: 33606121 PMCID: PMC8519925).

      Thank you for your comment. The contribution of the other duplicated genes is an interesting question. Of these, IFT122 is particularly interesting, as pointed out. We did a thorough survey of the literature and of the database of our genetic testing partner, BluePrint Genetics. We did not find any human retinal degeneration cases with variants in IFT122. IFT122 has been shown to cause a recessive phenotype in dogs and in a complete knockout zebrafish model, but dominant inheritance or overexpression has not been shown to produce a phenotype. Interestingly, recessive biallelic IFT122 mutation can cause Cranioectodermal Dysplasia (Sensenbrenner syndrome, PMID: 24689072) and none of these patients exhibited retinal dystrophy. HIF100 is an epigenetic modifier gene while PLXND1 is expressed in endothelial cells. We will include a discussion on this in the revised manuscript.

      (2) Related to #1, have the authors considered inserting extra copies of RHO (and/or the flanking genes) at a genomic safe harbor site? Although not required, this would allow one to study cells with isogenic-matched genetic backgrounds and would partially address the technical challenge of repairing a 188kb duplication, which as the authors note would be difficult to do. Demonstrating the effect of excess copy numbers in different genetic backgrounds would be a huge contribution to the field. At a minimum, a discussion of the role of the nearby genes should be included.


      Thank you for your suggestion. In our future mechanistic studies, we plan to test the relative role of 1-3 extra copies of RHO driven by an NRL promoter, so that expression is restricted to rods. We will include a discussion on the potential role of the other genes in the revised manuscript.

      (3) In the patient, the central foveal region was spared suggesting that cones were normal. Was there a similar assessment that cones are unaffected in retinal organoids? 


      We will include this data in our revised manuscript but overall did not see a cone defect in RHO CNV organoids. Additionally, although it is true that the central foveal region was relatively spared in this patient, the cones are definitely not normal. The macular cones that remain have been damaged by chronic edema, and photoreceptor and RPE atrophy has progressed into the macula, sparing only the foveal cones.

      (4) Pathway analysis indicated that glycosylation was perturbed and this was proposed as an explanation as to why rhodopsin was mislocalized. Have the authors verified that there is an actual decrease in glycosylation? 


      These studies are ongoing. We are currently looking into the details of cellular pathophysiology, focusing on RHO trafficking in RHO-CNV, including the role of glycosylation and other post-translational modification defects.

      (5) Line 182: by what criteria are the authors able to state that " there were no clear visible anatomical changes in apical-basal retinal cell type distribution during the early differentiation timeframe (data not shown)." Was this based on histological staining with antibodies, nuclear counter-staining, or some other evaluation?


      This was based on both IHC for various cell type markers and nuclear (DAPI) staining.

      (6) Figure 2C - the appearance of the inner segments in RC and RM looks very different from one another. Have the authors ruled out the possibility that the RC organoid cell isn't a cone? In addition, the RM structure has what appears to be a well-defined OLM which would suggest well-formed Muller glia. Do these structures also exist in RC organoids? Typically the OLM does form in older organoids. In addition, was this representative in numerous EM preparations?


      For clarification on EM data, we will include additional images in the revision as supplementary data. We have not carefully compared OLM between the patient and control organoids but do observe them in both conditions in the older organoids. The EM preparations were made from multiple organoids from two different batches with consistent results.

      (7) What criteria were used to assess cell loss? Has any TUNEL labeling been performed to confirm cell loss? From the existing data, it seems that rod outer segments appear to be affected in organoids. However, it's not clear if the photoreceptors themselves actually die in this model.

      TUNEL was used to assess cell loss and it was not significantly different between the control and patient organoids at the timepoints examined. We did not expect a change as the disease in the patient developed over decades.

      (8) Figure 5B. The RHO staining in the vehicle-treated sample is perturbed relative to the PR3 treatments as indicated in the text. In the vehicle-treated sample, the number of DAPI-positive cells that are completely negative proximal to the inner segments suggests that there might be non-rod cells there. Have the authors confirmed whether these are cones? Labels would be helpful in the left vehicle panel as the morphology looks very different than the treated samples.


      Thank you very much for the various suggestions and these will be included in the revised manuscript version. A number of the cells in the negative regions are OTX2+/NRL- and likely to be cones (Figure 4 A and B). Unfortunately, we do not have a very good cone nuclear marker as RXRγ does not consistently stain mature cones.

      (9) It is interesting that in addition to increases in RHO, and photo-transduction, there are also increases in PTPRT which is related to synaptic adhesion. Is there evidence of ectopic neurites that result from PTPRT over-expression?

      You are absolutely correct that the PTPRT data are very interesting. PTPRT requires PTMs similar to those of RHO in photoreceptors for its synaptic localization. We did not specifically look at ectopic neurites and will test that in the revision. It will be interesting to follow up on its expression pattern to see if it gets processed or localized normally, if we can find a working antibody. It is also possible that the gene-expression increase is due to feedback upregulation secondary to improper protein processing.

      Reviewer #3 (Public Review):

      This manuscript reports a novel pedigree with four intact copies of RHO on a single chromosome which appears to lead to overexpression of rhodopsin and a corresponding autosomal dominant form of RP. The authors generate retinal organoids from patient- and control-derived cells, characterize the phenotypes of the organoids, and then attempt to 'treat' aberrant rhodopsin expression/mislocalization in the patient organoids using a small molecule called photoregulin 3 (PR3). While this novel genetic mechanism for adRP is interesting, the organoid work is not compelling. There are multiple problems related to the technical approaches, the presentation of the results, and the interpretations of the data. I will present my concerns roughly in the order in which they appear in the manuscript.

      Major concerns:

      (1) Individual human retinal organoids in culture can show a wide range of differentiation phenotypes with respect to the expression of specific markers, percentages of given cell types, etc. For this reason, it can be very difficult to make rigorous, quantitative comparisons between 'wild-type' and 'mutant' organoids. Despite this difficulty, the author of the present manuscript frequently presents results in an impressionistic manner without quantitation. Furthermore, there is no indication that the investigator who performed the phenotypic analyses was blind with respect to the genotype. In my opinion, such blinding is essential for the analysis of phenotypes in retinal organoids. To give an example, in lines 193-194 the authors write "we observed that while the patient organoids developing connecting cilium and the inner segments similar to control organoids, they failed to extend outer segments". Outer segments almost never form normally in human retinal organoids, even when derived from 'wild-type' cells. Thus, I consider it wholly inadequate to simply state that outer segment formation 'failed' without a rigorous, quantitative, and blinded comparison of patient and control organoids.

      We agree it is challenging to generate outer segments in retinal organoids but we are not the first to show this. This has been demonstrated by multiple independent labs (Mayerl et al., PMID: 36206764; Wahlin et al., PMID: 28396597; West et al., PMID: 35334217), including ours (Chirco et al., PMID: 34653402). To clarify, we did not observe any OS-like tissue in the patient organoids across multiple EM preps of a number of organoids from two independent 300+ day experiments, which matched the phase microscopy data presented in Fig. 2B.

      (2) The presentation of qPCR results in Figure 3A is very confusing. First, the authors normalize expression to that of CRX, but they don't really explain why. In lines 210-211, they write "CRX, a ubiquitously expressing photoreceptor gene maintained from development to adulthood." Several parts of this sentence are misleading or incomplete. First, CRX is not 'ubiquitously expressed' (which usually means 'in all cell types') nor is it photoreceptor-specific: CRX is expressed in rods, cones, and bipolar cells. Furthermore, CRX expression levels are not constant in photoreceptors throughout development/adulthood. So, for these reasons alone, CRX is a poor choice for the normalization of photoreceptor gene expression.

      As you are aware, all housekeeping genes have shortcomings when used for normalizing PCR data. We went with CRX because, within the timepoints chosen, it is not expected to change much and thus represents a good equalizer for relative photoreceptor numbers between the organoids and conditions. While we agree that CRX is weakly expressed in bipolar cells (Yamamoto et al 2020), it is not expected to bias the data too much, as neither we nor others have reported a large relative difference in bipolar cell number in organoids. We also confirm this by showing equivalent expression of OTX2, RCVRN and NRL between all conditions.
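
      For readers unfamiliar with reference-gene normalization, the calculation can be sketched as follows. This assumes the standard $2^{-\Delta\Delta C_t}$ method with CRX as the reference gene and near-100% amplification efficiency; the response does not state which quantification method was actually used.

      $$\Delta C_t = C_t^{\text{target}} - C_t^{\text{CRX}}, \qquad \Delta\Delta C_t = \Delta C_t^{\text{patient}} - \Delta C_t^{\text{control}}, \qquad \text{fold change} = 2^{-\Delta\Delta C_t}$$

      Under this scheme, any change in the reference gene between conditions shifts every normalized value, which is why the stability of CRX matters for the comparison.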

      Second, the authors' interpretation of the qPCR results (lines 216-218) is very confusing. The authors appear to be saying that there is a statistically significant increase in RHO levels between D120 and D300. However, the same change is observed in both control and patient organoids and is not unexpected, since the organoids are more mature at D300. The key comparison is between control and patient organoids at D300. At this time point, there appears to be no difference between control and patient. The authors don't even point this out in the main text.

      Thank you for the comment and we apologize if this confused you. However, as can be seen in the graph in Figure 3A, we do compare expression of genes including RHO between control and patient organoids at two different time points. There are four conditions: D120-RC, D120-RM, D300-RC and D300-RM with individual data points and error bars for each condition. There is a statistically significant increase at both time points upon comparing the control and patient organoids for RHO. We compared RHO expression between patient organoids at the two time points and it was not statistically different.

      Third, the variability in the number of photoreceptor cells in individual organoids makes a whole-organoid comparison by qPCR fraught with difficulty. It seems to me that what is needed here is a comparison of RHO transcript levels in isolated rod photoreceptors.

      We agree that this makes it challenging. This was the exact reasoning for using CRX for normalization since it is predominantly present in photoreceptors. This was validated by the data showing no difference in expression of photoreceptor markers OTX2, RCVRN or NRL between the organoids.

      (3) I cannot understand what the authors are comparing in the bulk RNA-seq analysis presented in the paragraph starting with line 222 and in the paragraph starting with line 306. They write "we performed bulk-RNA sequencing on 300-days-old retinal organoids (n=3 independent biological replicates). Patient retinal organoids demonstrated upregulated transcriptomic levels of RHO... comparable to the qRT-PCR data." From the wording, it suggests that they are comparing bulk RNA-seq of patients and control organoids at D300. However, this is not stated anywhere in the main text, the figure legend, or the Methods. Yet, the subsequent line "comparable to the qRT-PCR data" makes no sense, because the qPCR comparison was between patient samples at two different time points, D120 and D300, not between patient and control. Thus, the reader is left with no clear idea of what is even being compared by RNA-seq analysis.

      We apologize if the conditions were not obvious and will clarify this in the revised version. The conditions compared are control and patient organoids at D300. Regarding comparison to RT-PCR, as stated above, the comparison shown is between patient and control organoids at two different timepoints.

      Remarkably, the exact same lack of clarity as to what is being compared is found in the second RNA-seq analysis presented in the paragraph starting with line 306. Here the authors write "We further carried out bulk RNA-sequencing analysis to comprehensively characterize three different groups of organoids, 0.25 μM PR3-treated and vehicle-treated patient organoids and control (RC) organoids from three independent differentiation experiments. Consistent with the qRT-PCR gene expression analysis, the results showed a significant downregulation in RHO and other rod phototransduction genes." Here, the authors make it clear that they have performed RNA-seq on three types of samples: PR3-treated patient organoids, vehicle-treated patient organoids, and control organoids (presumably not treated). Yet, in the next sentence, they state "the results showed a significant downregulation in RHO", but they don't state what two of the three conditions are being compared! Although I can assume that the comparison presented in Fig. 6A is between patient vehicle-treated and PR3-treated organoids, this is nowhere explicitly stated in the manuscript.

      Thank you for the comment and we will explicitly state various comparisons in the revised version.

      (4) There are multiple flaws in the analysis and interpretation of the PR3 treatment results. The authors wrote (lines 289-2945) "We treated long-term cultured 300-days-old, RHO-CNV patient retinal organoids with varying concentrations of PR3 (0.1, 0.25 and 0.5 μM) for one week and assessed the effects on RHO mRNA expression and protein localization. Immunofluorescence staining of PR3-treated organoids displayed a partial rescue of RHO localization with optimal trafficking observed in the 0.25 μM PR3-treated organoids (Figure 5B). None of the organoids showed any evidence of toxicity post-treatment."

      There are multiple problems here. First, the results are impressionistic and not quantitative. Second, it's not clear that the investigator was blinded with respect to the treatment condition. Third, in the sections presented, the organoids look much more disorganized in the PR3-treated conditions than in the control. In particular, the ONL looks much more poorly formed. Overall, I'd say the organoids looked considerably worse in the 0.25 and 0.5 microM conditions than in the control, but I don't know whether or not the images are representative. Without rigorously quantitative and blinded analysis, it is impossible to draw solid conclusions here. Lastly, the authors state that "none of the organoids showed any evidence of toxicity post-treatment," but do not explain what criteria were used to determine that there was no toxicity.

      Thank you for your critical insight. The RHO localization data are qualitative, as it is very difficult to accurately quantify rhodopsin trafficking within the cell in the organoid. Thus, for quantitative comparison, we have provided expression level changes. Regarding toxicity, we analyzed the organoids by morphology and TUNEL and did not observe a significant difference between the conditions. This closely mimics mouse data on PR3, which suppressed rod function in mice following IP injection without any obvious toxicity.

      (5) qPCR-based quantitation of rod gene expression changes in response to PR3 treatment is not well-designed. In lines 294-297 the authors wrote "PR3 drove a significant downregulation of RHO in a dose-dependent manner. Following qRT-PCR analysis, we observed a 2-to-5 log2FC decrease in RHO expression, along with smaller decreases in other rod-specific genes including NR2E3, GNAT1 and PDE6B." I assume these analyses were performed on cDNA derived from whole organoids. There are two problems with this analysis/interpretation. First, a decrease in rod gene expression can be caused by a decrease in the number of rods in the treated organoids (e.g., by cell death) or by a decrease in the expression of rod genes within individual rods. The authors do not distinguish between these two possibilities. Second, as stated above, the percentage of cells that are rods in a given organoid can vary from organoid to organoid. So, to determine whether there is downregulation of rod gene expression, one should ideally perform the qPCR analysis on purified rods.

      The reviewer is correct in pointing out the potential reasons for the reduction in RHO levels following PR3 treatment. Thus, we have provided NRL expression levels in the graph to show that this key rod-specific gene does not change, suggesting an equivalent number of rod photoreceptor cells. The suggestion of using purified rods is not practical here, as we do not have any way to sort human rods due to the lack of a rod-specific cell surface marker.
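
      To put the log2 fold changes quoted by the reviewer on a linear scale (simple arithmetic, not additional data):

      $$\text{fold change} = 2^{\log_2 \text{FC}}: \qquad 2^{-2} = \tfrac{1}{4} = 0.25, \qquad 2^{-5} = \tfrac{1}{32} \approx 0.031$$

      That is, a 2-to-5 log2FC decrease corresponds to RHO transcript levels falling to between one quarter and roughly one thirty-second of the level in the comparison condition.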

      (6) In Figure 4B 'RM' panels, the authors show RHO staining around the somata of 'rods' but the inset images suggest that several of these cells lack both NRL and OTX2 staining in their nuclei. All rods should be positive for NRL. Conversely, the same image shows a layer of cells scleral to the cells with putative RHO somal staining which do not show somal staining, and yet they do appear to be positive for NRL and OTX2. What is going on here? The authors need to provide interpretations for these findings.

      Since RHO is a cytoplasmic marker and photoreceptors are tightly packed, it is difficult to make a 1:1 comparison between the NRL/OTX2 nuclear markers and RHO. Additionally, as the RHO+ cytoplasm moves towards the scleral surface, it is expected to pass adjacent to other nuclei. A few of the rods do still have normal rhodopsin trafficking and it is likely these will not have somal RHO, similar to control conditions. We do rarely observe these cells, as highlighted by the occasional RHO in IS/OS of RM organoids in the figure. We do agree that the NRL staining in Figure 4B (>D250) is not extremely crisp and we will include an updated figure in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a new and valuable theoretical account of spatial representational drift in the hippocampus. The evidence supporting the claims is convincing, with a clear and accessible explanation of the phenomenon. Overall, this study will likely attract researchers exploring learning and representation in both biological and artificial neural networks.

      We would like to ask the reviewers to consider elevating the assessment due to the following arguments. As noted in the original review, the study bridges two different fields (machine learning and neuroscience), and does not only touch a single subfield (representational drift in neuroscience). In the revision, we also analysed data from four different labs, strengthening the evidence and the generality of the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors start from the premise that neural circuits exhibit "representational drift" -- i.e., slow and spontaneous changes in neural tuning despite constant network performance. While the extent to which biological systems exhibit drift is an active area of study and debate (as the authors acknowledge), there is enough interest in this topic to justify the development of theoretical models of drift.

      The contribution of this paper is to claim that drift can reflect a mixture of "directed random motion" as well as "steady state null drift." Thus far, most work within the computational neuroscience literature has focused on the latter. That is, drift is often viewed to be a harmless byproduct of continual learning under noise. In this view, drift does not affect the performance of the circuit nor does it change the nature of the network's solution or representation of the environment. The authors aim to challenge the latter viewpoint by showing that the statistics of neural representations can change (e.g. increase in sparsity) during early stages of drift. Further, they interpret this directed form of drift as "implicit regularization" on the network.

      The evidence presented in favor of these claims is concise. Nevertheless, on balance, I find their evidence persuasive on a theoretical level -- i.e., I am convinced that implicit regularization of noisy learning rules is a feature of most artificial network models. This paper does not seem to make strong claims about real biological systems. The authors do cite circumstantial experimental evidence in line with the expectations of their model (Khatib et al. 2022), but those experimental data are not carefully and quantitatively related to the authors' model.

      We thank the reviewer for pushing us to present stronger experimental evidence. We now analysed data from four different labs. Two of those are novel analyses of existing data (Karlsson et al, Jercog et al). All datasets show the same trend - increasing sparsity and increasing information per cell. We think that the results, presented in the new figure 3, allow us to make a stronger claim on real biological systems.

      To establish the possibility of implicit regularization in artificial networks, the authors cite convincing work from the machine-learning community (Blanc et al. 2020, Li et al., 2021). Here the authors make an important contribution by translating these findings into more biologically plausible models and showing that their core assumptions remain plausible. The authors also develop helpful intuition in Figure 4 by showing a minimal model that captures the essence of their result.

      We are glad that these translation efforts are appreciated.

      In Figure 2, the authors show a convincing example of the gradual sparsification of tuning curves during the early stages of drift in a model of 1D navigation. However, the evidence presented in Figure 3 could be improved. In particular, 3A shows a histogram displaying the fraction of active units over 1117 simulations. Although there is a spike near zero, a sizeable portion of simulations have greater than 60% active units at the end of the training, and critically the authors do not characterize the time course of the active fraction for every network, so it is difficult to evaluate their claim that "all [networks] demonstrated... [a] phase of directed random motion with the low-loss space." It would be useful to revise the manuscript to unpack these results more carefully. For example, a histogram of log(tau) computed in panel B on a subset of simulations may be more informative than the current histogram in panel A.

      The previous figure 3A was indeed confusing. In particular, it lumped together many simulations without proper curation. We redid this figure (now Figure 4), and added supplementary figures (Figures S1, S2) to better explain our results. It is now clear that the simulations with a large number of active units were either due to non-convergence, slow timescale of sparsification or simulations featuring label noise in which the fraction of active units is less affected. Regarding the log(tau) calculation, while it could indeed be an informative plot, it could not be calculated in a simple manner for all simulations. This is because learning curves are not always exponential, but sometimes feature initial plateaus (see also Saxe et al 2013, Schuessler et al 2020). We added a more detailed explanation of this limitation in the methods section, and we believe the current figure exemplifies the effect in a satisfactory manner.
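
      To make the limitation concrete, here is a minimal sketch (not the analysis code used in the manuscript) of a timescale estimate obtained from a log-linear fit. The function name `fit_timescale`, the synthetic curves, and the value tau = 50 are illustrative assumptions. The sketch only shows why a single exponential timescale is well defined for a purely exponential curve and becomes biased when the curve starts with a plateau.

```python
import math


def fit_timescale(values):
    """Least-squares fit of log(value) = log(A) - t / tau; returns tau.

    Hypothetical helper for illustration; assumes all values are positive.
    """
    ts = list(range(len(values)))
    logs = [math.log(v) for v in values]
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    cov = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    return -1.0 / slope


# Purely exponential decay with a known timescale (tau = 50 steps).
exponential = [math.exp(-t / 50.0) for t in range(200)]

# The same decay preceded by a 100-step plateau.
plateau = [1.0] * 100 + exponential

tau_exp = fit_timescale(exponential)      # recovers the true timescale
tau_plateau = fit_timescale(plateau)      # biased upward by the plateau
print(tau_exp, tau_plateau)
```

      In this toy setting the fit recovers the true timescale for the exponential curve and overestimates it once a plateau is present, which is the kind of ambiguity that prevents a uniform log(tau) histogram across all simulations.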

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Representational drift as a result of implicit regularization" the authors study the phenomenon of representational drift (RD) in the context of an artificial network that is trained in a predictive coding framework. When trained on a task for spatial navigation on a linear track, they found that a stochastic gradient descent algorithm led to a fast initial convergence to spatially tuned units, but then to a second very slow, yet directed drift which sparsified the representation while increasing the spatial information. They finally show that this separation of timescales is a robust phenomenon and occurs for a number of distinct learning rules.

      Strengths:

      This is a very clearly written and insightful paper, and I think people in the community will benefit from understanding how RD can emerge in such artificial networks. The mechanism underlying RD in these models is clearly laid out and the explanation given is convincing.

      We thank the reviewer for the support.

      Weaknesses:

      It is unclear how this mechanism may account for the learning of multiple environments.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      The process of RD through this mechanism also appears highly non-stationary, in contrast to what is seen in familiar environments in the hippocampus, for example.

      The non-stationarity noted by the reviewer is indeed a major feature of our observations, and is linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid phase consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Public Review):

      Summary:

      Single-unit neural activity tuned to environmental or behavioral variables gradually changes over time. This phenomenon, called representational drift, occurs even when all external variables remain constant, and challenges the idea that stable neural activity supports the performance of well-learned behaviors. While a number of studies have described representational drift across multiple brain regions, our understanding of the underlying mechanism driving drift is limited. Ratzon et al. propose that implicit regularization - which occurs when machine learning networks continue to reconfigure after reaching an optimal solution - could provide insights into why and how drift occurs in neurons. To test this theory, Ratzon et al. trained a Feedforward Network to perform the oft-utilized linear track behavioral paradigm and compared the changes in hidden layer units to those observed in hippocampal place cells recorded in awake, behaving animals.

      Ratzon et al. clearly demonstrate that hidden layer units in their model undergo consistent changes even after the task is well-learned, mirroring representational drift observed in real hippocampal neurons. They show that the drift occurs across three separate measures: the active proportion of units (referred to as sparsification), spatial information of units, and correlation of spatial activity. They continue to address the conditions and parameters under which drift occurs in their model to assess the generalizability of their findings.

      However, the generalizability results are presented primarily in written form: additional figures are warranted to aid in reproducibility.

      We added figures, and a GitHub repository with all the code to allow full reproducibility.

      Last, they investigate the mechanism through which sparsification occurs, showing that the flatness of the manifold near the solution can influence how the network reconfigures. The authors suggest that their findings indicate a three-stage learning process: 1) fast initial learning followed by 2) directed motion along a manifold which transitions to 3) undirected motion along a manifold.

      Overall, the authors' results support the main conclusion that implicit regularization in machine learning networks mirrors representational drift observed in hippocampal place cells.

      We thank the reviewer for this summary.

      However, additional figures/analyses are needed to clearly demonstrate how different parameters used in their model qualitatively and quantitatively influence drift.

      We now provide additional figures regarding parameters (Figures S1, S2).

      Finally, the authors need to clearly identify how their data supports the three-stage learning model they suggest.

      Their findings promise to open new fields of inquiry into the connection between machine learning and representational drift and generate testable predictions for neural data.

      Strengths:

      (1) Ratzon et al. make an insightful connection between well-known phenomena in two separate fields: implicit regularization in machine learning and representational drift in the brain. They demonstrate that changes in a recurrent neural network mirror those observed in the brain, which opens a number of interesting questions for future investigation.

      (2) The authors do an admirable job of writing to a large audience and make efforts to provide examples to make machine learning ideas accessible to a neuroscience audience and vice versa. This is no small feat and aids in broadening the impact of their work.

      (3) This paper promises to generate testable hypotheses to examine in real neural data, e.g., that drift rate should plateau over long timescales (now testable with the ability to track single-unit neural activity across long time scales with calcium imaging and flexible silicon probes). Additionally, it provides another set of tools for the neuroscience community at large to use when analyzing the increasingly high-dimensional data sets collected today.

      We thank the reviewer for these comments. Regarding the hypotheses, these are partially confirmed in the new analyses we provide of data from multiple labs (new Figure 3 and Table 3) - indicating that prolonged exposure to the environment leads to more stationarity.

      Weaknesses:

      (1) Neural representational drift and directed/undirected random walks along a manifold in ML are well described. However, outside of the first section of the main text, the analysis focuses primarily on the connection between manifold exploration and sparsification without addressing the other two drift metrics: spatial information and place field correlations. It is therefore unclear if the results from Figures 3 and 4 are specific to sparseness or extend to the other two metrics. For example, are these other metrics of drift also insensitive to most of the Feedforward Network parameters as shown in Figure 3 and the related text? These concerns could be addressed with panels analogous to Figures 3a-c and 4b for the other metrics and will increase the reproducibility of this work.

      We note that the results from figures 3 and 4 (original manuscript) are based on abstract tasks, while in figure 2 there is a contextual notion of spatial position. Spatial position metrics are not applicable to the abstract tasks, as these are simple random mappings of inputs and there isn’t necessarily an underlying latent variable such as position. This transition between task types is better explained in the text now. In essence, the spatial information and place field correlation changes are simply signatures of the movements in parameter space. In the abstract tasks their change becomes trivial, as the spatial information becomes strongly correlated with sparsity and place fields are simply the activity vectors of units. These are guaranteed to change as long as there are changes in the activity statistics. We present here the calculation of these metrics averaged over simulations for completeness.

      Author response image 1.

      (A) PV correlation between training time points averaged over 362 simulations. (B) Mean SI of units normalized to first time step, averaged over 362 simulations. Red line shows the average time point of loss convergence, the shaded area represents one standard deviation.
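
      For readers unfamiliar with the PV (population vector) correlation, the following is a minimal sketch of one common way to compute it, not necessarily the exact procedure used for the image above. The data layout `maps[unit][position]`, the helper names, and the toy activity maps are assumptions made for illustration. For each position (or input), the vector of activity across units at one time point is correlated with the corresponding vector at another time point, and the result is averaged over positions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def pv_correlation(maps_a, maps_b):
    """Mean over positions of the across-unit correlation between two time points.

    maps_a and maps_b are indexed as maps[unit][position] (assumed layout).
    Positions with an undefined correlation are skipped.
    """
    n_positions = len(maps_a[0])
    values = []
    for pos in range(n_positions):
        pv_a = [unit[pos] for unit in maps_a]
        pv_b = [unit[pos] for unit in maps_b]
        r = pearson(pv_a, pv_b)
        if not math.isnan(r):
            values.append(r)
    return sum(values) / len(values)


# Toy example: three units, three positions.
early = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
late = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # every field has shifted

same = pv_correlation(early, early)        # identical maps
shifted = pv_correlation(early, late)      # fully remapped fields
print(same, shifted)
```

      In this toy case identical maps give a correlation of 1, and the fully shifted maps give -0.5, so any change in the activity vectors of the units lowers the PV correlation.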

      (2) Many caveats/exceptions to the generality of findings are mentioned only in the main text without any supporting figures, e.g., "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify" (lines 116-117). Supporting figures are warranted to illustrate which findings are "qualitatively different" from the main model, which are not different from the main model, and which of the many parameters mentioned are important for reproducing the findings.

      We have now added figures (S1, S2) that show this exactly. We also added a GitHub repository to allow full reproduction.

      (3) Key details of the model used by the authors are not listed in the methods. While they are mentioned in reference 30 (Recanatesi et al., 2021), they need to be explicitly defined in the methods section to ensure future reproducibility.

      The details of the simulation are given in the Methods section. We also added a GitHub repository to allow full reproducibility.

      (4) How different states of drift correspond to the three learning stages outlined by the authors is unclear. Specifically, it is not clear where the second stage ends, and the third stage begins, either in real neural data or in the figures. This is compounded by the fact that the third stage - of undirected, random manifold exploration - is only discussed in relation to the introductory Figure 1 and is never connected to the neural network data or actual brain data presented by the authors. Are both stages meant to represent drift? Or is only the second stage meant to mirror drift, while undirected random motion along a manifold is a prediction that could be tested in real neural data? Identifying where each stage occurs in Figures 2C and E, for example, would clearly illustrate which attributes of drift in hidden layer neurons and real hippocampal neurons correspond to each stage.

      Thanks for this comment, which urged us to better explain these concepts.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.
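
      The parallel processes described above (a fast reduction in loss and a slow reduction in the Hessian) can be illustrated with a minimal sketch. This is a two-parameter toy problem chosen here for illustration, not the network or task used in the manuscript, and all names and parameter values are assumptions. The model predicts `a * b` for a target of 1 and is trained by gradient descent on a target corrupted by label noise. All points with `a * b = 1` have zero loss, and at those points the trace of the Hessian of the loss `0.5 * (a * b - 1) ** 2` is `a**2 + b**2`. Each update multiplies `a**2 - b**2` by `(1 - lr**2 * error**2)`, so the imbalance shrinks slowly and the parameters move along the zero-loss set towards its flattest point.

```python
import random


def run_label_noise_sgd(a=4.0, b=0.25, target=1.0, lr=0.02,
                        noise=1.0, steps=20000, seed=0):
    """Gradient descent on 0.5 * (a * b - noisy_target) ** 2 (toy model).

    Label noise is uniform in [-noise, noise]; all values are illustrative.
    Returns the final parameters (a, b).
    """
    rng = random.Random(seed)
    for _ in range(steps):
        noisy_target = target + rng.uniform(-noise, noise)
        error = a * b - noisy_target
        # Simultaneous update: both gradients use the old values of a and b.
        a, b = a - lr * error * b, b - lr * error * a
    return a, b


a_start, b_start = 4.0, 0.25                 # already on the zero-loss set
a_end, b_end = run_label_noise_sgd(a_start, b_start)

trace_start = a_start ** 2 + b_start ** 2    # Hessian trace at the start
trace_end = a_end ** 2 + b_end ** 2          # Hessian trace after training
imbalance_start = abs(a_start ** 2 - b_start ** 2)
imbalance_end = abs(a_end ** 2 - b_end ** 2)
print(trace_start, trace_end, a_end * b_end)
```

      Because the starting point already solves the task, the loss does not improve during the run, yet the Hessian trace keeps decreasing. This is the sense in which the second phase is directed while performance stays essentially unchanged.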

      Recommendations for the authors:

      The reviewers have raised several concerns. They concur that the authors should address the specific points below to enhance the manuscript.

      (1) The three different phases of learning should be clearly delineated, along with how they are determined. It remains unclear in which exact phase the drift is observed.

      This is now clearly explained in the new Table 1 and Figure 4C. Note that the different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      (2) The term "sparsification" of unit activity is not fully clear. Its meaning should be more explicitly explained, especially since, in the simulations, a significant number of units appear to remain active (Fig. 3A).

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.
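
      As a rough illustration of how the two measures can diverge, here is a minimal sketch. The definitions below are assumptions made for this example (the exact definitions are those in the Methods section): Active Fraction is taken as the fraction of all (unit, input) pairs with non-zero activity, and Fraction Active Units as the fraction of units that respond to at least one input. The data layout `activity[unit][input]` and the toy matrices are hypothetical.

```python
def active_fraction(activity, threshold=0.0):
    """Fraction of (unit, input) entries whose activity exceeds the threshold."""
    total = sum(len(row) for row in activity)
    active = sum(1 for row in activity for a in row if a > threshold)
    return active / total


def fraction_active_units(activity, threshold=0.0):
    """Fraction of units that exceed the threshold for at least one input."""
    active_units = sum(1 for row in activity if any(a > threshold for a in row))
    return active_units / len(activity)


# activity[unit][input]: four units, four inputs (toy data).
silenced = [[1, 1, 1, 1],
            [1, 1, 1, 1],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]    # half of the units became silent

sharpened = [[1, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 1]]   # every unit stays active, but for fewer inputs

print(active_fraction(silenced), fraction_active_units(silenced))
print(active_fraction(sharpened), fraction_active_units(sharpened))
```

      In the first matrix both measures equal 0.5. In the second, the fraction of active units stays at 1.0 while the active fraction drops to 0.25, which is the kind of dissociation described for label noise, where the fraction of active units did not reduce but the activity of the units did sparsify.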

      (3) While the study primarily focuses on one aspect of representational drift-the proportion of active units-it should also explore other features traditionally associated with representational drift, such as spatial information and the correlation between place fields.

      This absence of features is related to the abstract nature of some of the tasks simulated in our paper. In our original submission the transition between a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We now clarified the motivation for this transition.

      Both the initial simulation and the new experimental data analysis include spatial information (Figures 2,3). The following simulations (Figure 4) with many parameter choices use more abstract tasks, for which the notion of correlation between place cells and spatial information loses its meaning as there is no spatial ordering of the inputs, and every input is encountered only once. Spatial information becomes strongly correlated with the inverse of the active fraction metric. The correlation between place cells is also directly linked to increase in sparseness for these tasks.
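
      The link between spatial information and the inverse of the active fraction can be seen in a minimal sketch. It assumes the widely used Skaggs-style information measure in bits per spike, which may differ in detail from the measure in the Methods section, and the function name and toy rate maps are illustrative. For a unit that fires at a uniform rate in a fraction f of equally visited bins and is silent elsewhere, this measure reduces to log2(1/f).

```python
import math


def spatial_information(rates, occupancy=None):
    """Skaggs-style spatial information in bits per spike.

    rates[i] is the mean activity in bin i; occupancy[i] is the probability
    of being in bin i (uniform if omitted). Assumed definition for illustration.
    """
    n = len(rates)
    if occupancy is None:
        occupancy = [1.0 / n] * n
    mean_rate = sum(p * r for p, r in zip(occupancy, rates))
    if mean_rate == 0:
        return 0.0
    info = 0.0
    for p, r in zip(occupancy, rates):
        if r > 0:  # bins with zero activity contribute nothing
            ratio = r / mean_rate
            info += p * ratio * math.log2(ratio)
    return info


si_all = spatial_information([1, 1, 1, 1, 1, 1, 1, 1])   # active everywhere
si_two = spatial_information([1, 1, 0, 0, 0, 0, 0, 0])   # active in 2 of 8 bins
si_one = spatial_information([1, 0, 0, 0, 0, 0, 0, 0])   # active in 1 of 8 bins
print(si_all, si_two, si_one)
```

      The three toy units carry 0, 2 and 3 bits per spike, i.e. log2(8/8), log2(8/2) and log2(8/1), so under these assumptions the information grows as the active fraction shrinks.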

      (4) There should be a clearer illustration of how labeling noise influences learning dynamics and sparsification.

      This was indeed confusing in the original submission. We removed the simulations with label noise from Figure 4, and added a supplementary figure (S2) illustrating the different effects of label noise.

      (5) The representational drift observed in this study's simulations appears to be nonstationary, which differs from in vivo reports. The reasons for this discrepancy should be clarified.

      We added experimental results from three additional labs demonstrating a change in activity statistics (i.e. increase in spatial information and increase in sparseness) over a long period of time. We suggest that such a change long after the environment is already familiar is an indication of the second phase, and stress that this change seems to saturate at some point, and that most drift papers start collecting data after this saturation, hence this effect was missed in previous in vivo reports. Furthermore, these effects are becoming more abundant with the advent of new calcium imaging methods, as the older electrophysiological recording methods did not usually allow recording of large numbers of cells for long periods of time. The new Table 3 surveys several experimental papers, emphasizing the degree of familiarity with the environment.

      (6) A distinctive feature of the hippocampus is its ability to learn different spatial representations for various environments. The study does not test representational drift in this context, a topic of significant interest to the community. Whether the authors choose to delve into this is up to them, but it should at least be discussed more comprehensively, as it's only briefly touched upon in the current manuscript version.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (7) The methods section should offer more details about the neural nets employed in the study. The manuscript should be explicit about the terms "hidden layer", "units", and "neurons", ensuring they are defined clearly and not used interchangeably.

      We changed the usage of these terms to be more coherent and made our code publicly available. Specifically, “units” refer to artificial networks and “neurons” to biological ones.

      In addition, each reviewer has raised both major and minor concerns. These are listed below and should be addressed where possible.

      Reviewer #1 (Recommendations For The Authors):

      I recommend that the authors edit the text to soften their claims. For example:

      In the abstract "To uncover the underlying mechanism, we..." could be changed to "To investigate, we..."

      Agree. Done

      On line 21, "Specifically, recent studies showed that..." could be changed to "Specifically, recent studies suggest that..."

      Agree. Done

      On line 100, "All cases" should probably be softened to "Most cases" or more details should be added to Figure 3 to support the claim that every simulation truly had a phase of directed random motion.

      The text was changed in accordance with the reviewer’s suggestion. In addition, the figure was changed and only includes simulations in which we expected unit sparsity to arise (without label noise). We also added explanations and supplementary figures for label noise.

      Unless I missed something obvious, there is no new experimental data analysis reported in the paper. Thus, line 159 of the discussion, "a phenomenon we also observed in experimental data" should be changed to "a phenomenon that was recently reported in experimental data."

      We thank the reviewer for drawing our attention to this. We now analyzed data from three other labs, two of which are novel analyses on existing data. All four datasets show the same trends of sparseness with increasing spatial information. The new Figure 3 and text now describe this.

      On line 179 of the Discussion, "a family of network configurations that have identical performance..." could be softened to "nearly identical performance." It would be possible for networks to have minuscule differences in performance that are not detected due to stochastic batch effects or limits on machine precision.

      The text was changed in accordance with the reviewer’s suggestion.

      Other minor comments:

      Citation 44 is missing the conference venue, please check all citations are formatted properly.

      Corrected.

      In the discussion on line 184, the connection to remapping was confusing to me, particularly because the cited reference (Sanders et al. 2020) is more of a conceptual model than an artificial network model that could be adapted to the setting of noisy learning considered in this paper. How would an RNN model of remapping (e.g. Low et al. 2023; Remapping in a recurrent neural network model of navigation and context inference) be expected to behave during the sparsifying portion of drift?

      We now clarified this section. The conceptual model of Sanders et al includes a specific prediction (Figure 7 there) which is very similar to ours - a systematic change in robustness depending on duration of training. Regarding the Low et al model, using such mechanistic models is an exciting avenue for future research.

      Reviewer #2 (Recommendations For The Authors):

      I only have two major questions.

      (1) Learning multiple representations: Memory systems in the brain typically must store many distinct memories. Certainly, the hippocampus, where RD is prominent, is involved in the ongoing storage of episodic memories. But even in the idealized case of just two spatial memories, for example, two distinct linear tracks, how would this learning process look? Would there be any interference between the two learning processes or would they be largely independent? Is the separation of time scales robust to the number of representations stored? I understand that to answer this question fully probably requires a research effort that goes well beyond the current study, but perhaps an example could be shown with two environments. At the very least the authors could express their thoughts on the matter.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (2) Directed drift versus stationarity: I could not help but notice that the RD illustrated in Fig.2D is not stationary in nature, i.e. the upper right and lower left panels are quite different. This appears to contrast with findings in the hippocampus, for example, Fig.3e-g in (Ziv et al, 2013). Perhaps it is obvious that a directed process will not be stationary, but the authors note that there is a third phase of steady-state null drift. Is the RD seen there stationary? Basically, I wonder if the process the authors are studying is relevant only as a novel environment becomes familiar, or if it is also applicable to RD in an already familiar environment. Please discuss the issue of stationarity in this context.

      The non-stationarity noted by the reviewer is indeed a major feature of our observations, and is indeed linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid, phase consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Recommendations For The Authors):

      Most of my general recommendations are outlined in the public review. A large portion of my comments regards increasing clarity and explicitly defining many of the terms used which may require generating more figures (to better illustrate the generality of findings) or modifying existing figures (e.g., to show how/where the three stages of learning map onto the authors' data).

      Sparsification is not clearly defined in the main text. As I read it, sparsification is meant to refer to the activity of neurons, but this needs to be clearly defined. For example, lines 262-263 in the methods define "sparseness" by the number of active units, but lines 116-117 state: "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify." If the fraction of active units (defined as "sparseness") did not change, what does it mean that the activity of the units "sparsified"? If the authors mean that the spatial activity patterns of hidden units became more sharply tuned, this should be clearly stated.

      We now defined precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.

      Likewise, it is unclear which of the features the authors outlined - spatial information, active proportion of units, and spatial correlation - are meant to represent drift. The authors should clearly delineate which of these three metrics they mean to delineate drift in the main text rather than leave it to the reader to infer. While all three are mentioned early on in the text (Figure 2), the authors focus more on sparseness in the last half of the text, making it unclear if it is just sparseness that the authors mean to represent drift or the other metrics as well.

      The main focus of our paper is on the non-stationarity of drift. Namely, that features (such as these three) systematically change in a directed manner as part of the drift process. The new analyses of experimental data show sparseness and spatial information.

      The focus on sparseness in the second half of the paper is because we move to more abstract tasks. These are also easy to study in the more abstract tasks in the second part of the paper. In our original submission the transition between a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We now clarified the motivation for this transition.

      It is not clear if a change in the number of active units alone constitutes "drift", especially since Geva et al. (2023) recently showed that both changes in firing rate AND place field location drive drift, and that the passage of time drives changes in activity rate (or # cells active).

      Our work did not deal with purely time-dependent drift, but rather focused on experience-dependence. Furthermore, Geva et al. study the stationary phase of drift, where we do not expect a systematic change in the total number of active cells. They report changes in the average firing rate of active cells in this phase, as a function of time - which does not contradict our findings.

      "hidden layer", "units", and "neurons" seem to be used interchangeably in the text (e.g., line 81-85). However, this is confusing in several places, in particular in lines 83-85 where "neurons" is used twice. The first usage appears to refer to the rate maps of the hidden layer units simulated by the authors, while the second "neurons" appears to refer to real data from Ziv 2013 (ref 5). The authors should make it explicit whether they are referring to hidden layer units or actual neurons to avoid reader confusion.

      We changed the usage of these terms to be more consistent. Specifically, “units” refers to artificial networks and “neurons” to biological ones.

      The authors should clearly illustrate which parts of their findings support their three-phase learning theory. For example, does 2E illustrate these phases, with the first tenth of training time points illustrating the early phase, time 0.1-0.4 illustrating the intermediate phase, and 0.4-1 illustrating the last phase? Additionally, they should clarify whether the second and third stages are meant to represent drift, or is it only the second stage of directed manifold exploration that is considered to represent drift? This is unclear from the main text.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      Line 45 - It appears that the acronym ML is not defined above here anywhere.

      Added.

      Line 71: the ReLU function should be defined in the text, e.g., sigma(x) = x if x > 0 else 0.

      Added.
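      As a minimal sketch of the definition requested above, sigma(x) = x if x > 0 else 0, written as runnable Python (the function name `relu` is illustrative and not taken from the manuscript):

```python
def relu(x: float) -> float:
    """Rectified linear unit: returns x for positive x, otherwise 0."""
    return x if x > 0 else 0.0
```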

      106-107: Figures (or supplemental figures) to demonstrate how most parameters do not influence sparsification dynamics are warranted. As written, it is unclear what "most parameters" mean - all but noise scale. What about the learning rule? Are there any interactions between parameters?

      We have now removed the label noise from Figure 4 and added two supplementary figures to clearly explain the effect of parameters. Figure 4 itself was also redone to clarify this issue.

      2F middle: should "change" be omitted for SI?

      The panel was replaced by a new one in Figure 3.

      116-119: A figure showing how results differ for label noise is warranted.

      This is now done in Figures S1 and S2.

      124: typo, The -> the

      Corrected.

      127-129: This conclusion statement is the first place in the text where the three stages are explicitly outlined. There does not appear to be any support or further explanation of these stages in the text above.

      We now explain this earlier at the end of the Introduction section, along with the new Table 1 and marking on Figure 4C.

      132-133 seems to be more of a statement and less of a prediction or conclusion - do the authors mean "the flatness of the loss landscape in the vicinity of the solution predicts the rate of sparsification?"

      We thank the reviewer for this observation. The sentence was rephrased:

      Old: As illustrated in Fig. 1, different solutions in the zero-loss manifold might vary in some of their properties. The specific property suggested from theory is the flatness of the loss landscape in the vicinity of the solution.

      New: As illustrated in Fig. 1, solutions in the zero-loss manifold have identical loss, but might vary in some of their properties. The authors of [26] suggest that noisy learning will slowly increase the flatness of the loss landscape in the vicinity of the solution.

      135: typo, it's -> its

      Corrected.

      Line 135-136 "Crucially, the loss on the entire manifold is exactly zero..." This appears to contradict the Figure 4A legend - the loss appears to be very high near the top and bottom edges of the manifold in 4A. Do the authors mean that the loss along the horizontal axis of the manifold is zero?

      The reviewer is correct. The manifold mentioned in the sentence is indeed the horizontal axis. We changed the text and the figure to make it clearer.

      Equation 6: This does not appear to agree with equation 2 - should there be an E_t term for an expectation function?

      Corrected.

      Line 262-263: "Sparseness means that a unit has become inactive for all inputs." This should also be stated explicitly as the definition of sparseness/sparsification in the main text.

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General comments

      All three experts have raised excellent ideas and made important suggestions to extend the scope of our study and provide additional information. While we fully acknowledge that these points are valid and would provide exciting new knowledge, we also should not lose track of the fact that a single study cannot cover all bases. Sulfated steroids, for example, are clearly essential components of mouse urine. Unfortunately, however, all chemical analysis approaches are limited and the one we opted for is not suitable for analysis of such signaling molecules. Future studies should certainly focus on these aspects. The same holds true for the fact that we do not know which of the identified compounds are actually VSN ligands. These are inherent limitations of the approach, and we are not claiming otherwise.

      Reviewer #1 (Public Review):

      (1) In this manuscript, Nagel et al. sought to comprehensively characterize the composition of urinary compounds, some of which are putative chemosignals. They used urines from adult males and females in three different strains, including one wild-derived strain. By performing mass spectrometry of two classes of compounds: volatile organic compounds and proteins, they found that urines from inbred strains are qualitatively similar to those of a wild strain. This finding is significant because there is a high degree of genetic diversity in wild mice, with chemosensory receptor genes harboring many polymorphisms.

      We agree and thank the Reviewer for his / her positive assessment.

      (2) In the second part of this work, the authors used calcium imaging to monitor the pattern of vomeronasal neuron responses to these urines. By performing pairwise comparisons, the authors found a large degree of strain-specific response and a relatively minor response to sex-specific urinary stimuli. This is a finding generally in agreement with previous calcium imaging work by Ron Yu and colleagues in 2008. The authors extend the previous work by using urines from wild mice. They further report that the concentration diversity of urinary compounds in different urine batches is largely uncorrelated with the activity profiles of these urines. In addition, the authors found that the patterns of vomeronasal neuron response to urinary cues are not identical when measured using different recipient strains. This fascinating finding, however, requires an additional control to exclude the possibility that this is not due to sampling error.

      We thank Reviewer 1 for pointing this out. We agree that this is truly a “fascinating finding.” Reviewer 1 emphasizes that we need to add an “additional control to exclude […] that this is not due to sampling error”, and he / she elaborates on the required control in his / her Recommendations For The Authors (see below). Reviewer 1 states that “for Fig. 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.” Importantly, we believe that this is already controlled for. In fact, for each experiment, we routinely prepare VNO slices along the organ’s entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen “twists” toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, in which we analyzed whether the “same urine activates a different population of VSNs in two different strains”, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we conclude that it is very unlikely that the considerably different response profiles measured in different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also refer to this point in the Results.

      (3) There are several weaknesses in this manuscript, including the lack of analysis of the compositions of sulfated steroids and other steroids, which have been proposed to be the major constituents of vomeronasal ligands in urines and the indirect (correlational) nature of their mass spectrometry data and activity data.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a feature that we consider a strength of the current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
      A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (4) Overall, the major contribution of this work is the identification of specific molecules in mouse urines. This work is likely to be of significant interest to researchers in chemosensory signaling in mammals and provides a systematic avenue to exhaustively identify vomeronasal ligands in the future.

      We thank the Reviewer for his / her generally positive assessment.

      Reviewer #2 (Public Review):

      (1) This manuscript by Nagel et al provides a comprehensive examination of the chemical composition of mouse urine (an important source of semiochemicals) across strain and sex, and correlates these differences with functional responses of vomeronasal sensory neurons (an important sensory population for detecting chemical social cues). The strength of the work lies in the careful and comprehensive imaging and chemical analyses, the rigor of quantification of functional responses, and the insight into the relevance of olfactory work on lab-derived vs wild-derived mice.

      We thank the Reviewer for his / her generally positive assessment.

      (2) With regard to the chemical analysis, the reader should keep in mind that a difference in the concentration of a chemical across strain or sex does not necessarily mean that that chemical is used for chemical communication. In the most extreme case, the animals may be completely insensitive to the chemical. Thus, although the repertoire of proteins and volatiles could potentially allow sex and/or strain discrimination, it is unclear to what degree both are used in different situations.

      Reviewer 2 is correct to point out that sex- and/or strain-dependent differences in urine molecular composition do not automatically attribute a signaling function to those molecules. We concur and, in fact, stress this point many times throughout the manuscript. In the Results, for example, we point out (i) that “in female urine, BALB/c-specific proteins are substantially underrepresented, a fact not reflected by VSN response profiles”, (ii) that “as observed in C57BL/6 neurons, the skewed distributions of protein concentration indices were not reflected by BALB/c generalist VSN profiles”, and (iii) that “VSN population response profiles do not reflect the global molecular content of urine, suggesting that the VNO functions as a rather selective molecular detector.” Moreover, in the Discussion, we state (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”

      In the revised manuscript, we now aim to even more strongly emphasize the point made by Reviewer 2. In the Discussion, we have deleted a sentence that read: “Sex- and strain-specific chemical profiles give rise to unique VSN activity patterns.” Moreover, we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      Reviewer #3 (Public Review):

      (1) One of the primary objectives in this study is to ascertain the extent to which the response profiles of VSNs are specific to sex and strain. The design of these Ca2+ imaging experiments uses a simple stimulus design, using two interleaved bouts of stimulation with pairs of urine (e.g. male versus female C57BL/6, male C57BL/6 versus male BALB/c) at a single dilution factor (1:100). This introduces two significant limitations: (1) the "generalist" versus "specialist" descriptors pertain only to the specific pairwise comparisons made and (2) there is no information about the sensitivity/concentration-dependence of the responses.

      Reviewer 3 points to two limitations of our VSN activity assay. He / she is correct to mention that characterizing a VSN as generalist or specialist based on a “pairwise comparison” should not be the basis of attributing such a “generalist” or “specialist” label in general (i.e., regarding the global stimulus space). We acknowledge this point, but we do not regard this as a limitation of our study since we are not investigating rather broad (i.e., multidimensional) questions of selectivity. All we are asking in the context of this study is whether VSNs - when being challenged with pairs of sex- or strain-specific urine samples - act as rather selective semiochemical detectors. Of course, one can always think of a study design that provides more information. However, we here opted for an assay that - in our hands - is robust, “low noise” (i.e., displays low intrinsic signal variability as evident from reliability index calculations), ensures recovery from VSN adaptation (Wong et al., 2018), and, importantly, answers the specific question we are asking.

      Regarding the second point (“there is no information about the sensitivity/concentration-dependence of the responses”), we would like to emphasize that this was not a focus of our study either. In fact, concentration-dependence of VSN activity has been a major focus of several previous studies referenced in our manuscript (e.g., Leinders-Zufall et al., 2000; He et al., 2008), albeit with contradictory results. In our study, we ask whether a pair of stimuli that we have shown to display, in part, strikingly different chemical composition (both absolute and relative) preferentially activates the same or different VSNs. With this question in mind, we believe that our assay (and its results) are highly informative.

      (2) The functional measurements of VSN tuning to various pairs of urine stimuli are consistently presented alongside mass spectrometry-based comparisons. Although it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis, the juxtaposition of VSN tuning measurements with independent molecular diversity measurements gives the appearance to readers that these experiments were integrated (i.e., that the diversity of ligands was underlying the diversity of physiological responses). This is a hypothesis raised by the parallel studies, not a supported conclusion of the work. This data presentation style risks confusing readers.

      As Reviewer 3 points out correctly, “it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis.” In the figures, we try to make the distinction between VSN response statistics and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts.

      We now also made an extra effort to avoid “confusing readers” by stating in the Discussion (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.” Moreover, we have deleted a sentence that read: “sex- and strain-specific chemical profiles give rise to unique VSN activity patterns”, and we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      However, we believe that there is value in presenting “VSN tuning measurements” next to “independent molecular diversity measurements.” While these are independent measurements, their similarity or, quite frequently, lack thereof are informative. We are sure that by taking the above “precautions” we have now mitigated the risk of “confusing readers.”

      (3) The impact of mass spectrometry findings is limited by the fact that none of these molecules (in bulk, fractions, or monomolecular candidate ligands) were tested on VSNs. It is possible that only a very small number of these ligands activate the VNO. The list of variably expressed proteins - especially several proteins that are preferentially found in female urine - is compelling, but, again, there is no evidence presented that indicates whether or not these candidate ligands drive VSN activity. It is noteworthy that the largest class of known natural ligands for VSNs are small nonvolatiles that are found at high levels in mouse urine. These molecules were almost certainly involved in driving VSN activity in the physiology assays (both "generalist" and "specialist"), but they are absent from the molecular analysis.

      Reviewer 3 is right, of course, that at this point we have not tested the identified molecules on VSNs. This is clearly beyond the scope of the present study. We believe that the data we present will be the basis of (several full-length) future studies that aim to identify specific ligands and - best case scenario - receptor-ligand pairs. We find it hard to concur that our study, which provides the necessary basis for those future endeavors, is regarded as “incomplete”. By design, all studies are somewhat incomplete, i.e., there are always remaining questions and we are not contesting that.

      It is true, of course, that a class of “known natural ligands for VSNs are small nonvolatiles.” As we replied above, our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
      A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      Reviewer #1 (Recommendations For The Authors):

      (1) I find that the study is highly valuable for researchers in this field. With the finding that wild mouse urines do not elicit significantly more variable responses than urines from inbred strains, researchers can now be reassured to use inbred strains to gain general insights on pheromone signaling.

      A major omission of this study is non-volatile small organic molecules such as steroids. These compounds are the only molecular class in urine that has been identified to stimulate specific vomeronasal receptors to date. It is unclear to me that the specificity of VOCs and proteins alone can fully explain the response specificity of the VSNs that have been monitored in this study. A discussion of this topic would be highly beneficial for the readers.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. Forth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (2) How many different wild mouse urines were tested in this study? Is this sufficient to capture the diversity of wild M. musculus in local (Prague) habitats?

      We thank the reviewer for pointing this out. For the present study, 20 male (M) and 27 female (F) wild mice were caught at six different sites in the broader Prague area (i.e., Bohnice (50.13415N, 14.41421E; 2M+4F), Dolni Brezany (49.96321N, 14.4585E; 3M+4F), Hodkovice (49.97227N, 14.48039E; 5M+6F), Písnice (49.98988N, 14.46625E; 3M+6F), Lhota (49.95369N, 14.43087E; 1M+2F), and Zalepy (49.9532N, 14.40829E; 6M+5F)). 18 of the 27 wild females were caught pregnant. The remaining 9 females were mated with males caught at the same site and produced offspring within a month. When selecting 10 male and 10 female individuals from first-generation offspring for urine collection, we ensured that all six capture sites were represented and that age-matched animals displayed similar weight (~17 g). We believe that this capture / breeding strategy sufficiently represents “the diversity of wild M. musculus in local (Prague) habitats.” In the revised manuscript, we have now included these details in the Materials and Methods.
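
      As a quick consistency check on the numbers above, the per-site counts can be tallied against the stated totals. This is only a minimal sketch: the dictionary layout and the helper name `tally` are illustrative assumptions, and the counts are copied from the site list in this response.

```python
# Capture counts per site as (males, females), copied from the response above.
# The data structure and function name are illustrative, not part of the study.
capture_sites = {
    "Bohnice": (2, 4),
    "Dolni Brezany": (3, 4),
    "Hodkovice": (5, 6),
    "Písnice": (3, 6),
    "Lhota": (1, 2),
    "Zalepy": (6, 5),
}


def tally(sites):
    """Return the total (males, females) summed over all capture sites."""
    males = sum(m for m, _ in sites.values())
    females = sum(f for _, f in sites.values())
    return males, females


total_males, total_females = tally(capture_sites)

# Females caught pregnant plus females mated after capture.
pregnant, mated_after_capture = 18, 9
```

      With these inputs, the tally should reproduce the 20 males and 27 females reported, and the pregnant and subsequently mated females should add up to the female total.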

      (3) I found Figure 1e and figures in a similar format confusing - one panel describes the response statistics of VSNs, and other panels show the number of compounds found in different MS profiling, which is not immediately obvious from the figures. Is the y-axis legend correct (%)?

      We now try to make the distinction between VSN “response statistics” and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts. Moreover, we thank the Reviewer for pointing out the mislabeling of the y-axis. Accordingly, we have deleted “%” in all corresponding figures.

      (4) For Figure 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.

      We thank Reviewer 1 for pointing this out. Importantly, we believe that this is already controlled for (see our response to the Public Review). In fact, for each experiment, we routinely prepare VNO slices along the entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen ‘‘twists’’ toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we can thus exclude that the considerably different response profiles that we measured using different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also mention this point in the Results.

      Reviewer #2 (Recommendations For The Authors):

      (1) Pg 5 Lines 3-16: This summary paragraph contains too much detail given that the reader has not read the paper yet, which makes it bewildering. This should be condensed.

      We agree and have substantially condensed this paragraph.

      (2) Pg 6 Line 5-8: This summary of the experimental design is obtuse and should be edited for clarity.

      We have edited the relevant passage for clarity.

      (3) Pg 6 Line 11: "VSNs were categorized..." Specialist vs generalist is defined as responding to one or both stimuli. This definition is placed right after saying that the cells were also tested with KCl. The reader might think that specialist vs generalist was defined in relation to KCl.

      We have edited this sentence, which now reads: “Dependent on their individual urine response profiles, VSNs were categorized as either specialists (selective response to one stimulus) or generalists (responsive to both stimuli).”

      (4) Pg 6 Line 13: "we recorded urine-dependent Ca2+ signals from a total of 16,715 VSNs". Is a "signal" a response? Did all 16,715 VSNs respond to urine? What was the total of KCl responsive cells recorded?

      We edited the corresponding passage for clarification. The text now reads: “Overall, we recorded >43,000 K+-sensitive neurons, of which a total of 16,715 VSNs (38.4%) responded to urine stimulation. Of these urine-sensitive neurons, 61.4% displayed generalist profiles, whereas 38.6% were categorized as specialists (Figure 1c,d).”
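
      The categorization described in points (3) and (4) can be sketched as follows. This is a hedged illustration, not the analysis code used in the study: the function names, the boolean response flags, and the toy data are all assumptions. Each K+-sensitive neuron is represented by whether it responded to each of the two urine stimuli in a binary pair.

```python
from collections import Counter


def categorize(responds_a: bool, responds_b: bool) -> str:
    """Label one K+-sensitive neuron from its binary urine response profile."""
    if responds_a and responds_b:
        return "generalist"      # responsive to both stimuli
    if responds_a or responds_b:
        return "specialist"      # selective response to one stimulus
    return "non-responsive"      # K+-sensitive, but no urine response


def summarize(profiles):
    """Return response fractions for a list of (responds_a, responds_b) pairs."""
    counts = Counter(categorize(a, b) for a, b in profiles)
    n_total = len(profiles)
    n_urine = counts["generalist"] + counts["specialist"]
    return {
        # Fraction of all K+-sensitive neurons that respond to urine.
        "urine_responsive": n_urine / n_total if n_total else 0.0,
        # Fractions relative to urine-sensitive neurons only.
        "generalist": counts["generalist"] / n_urine if n_urine else 0.0,
        "specialist": counts["specialist"] / n_urine if n_urine else 0.0,
    }


# Made-up toy data: 8 K+-sensitive neurons, 4 of which respond to urine.
toy_profiles = [
    (True, True), (True, False), (False, False), (True, True),
    (False, True), (False, False), (False, False), (False, False),
]
toy_summary = summarize(toy_profiles)
```

      Note that the sketch reports generalist and specialist fractions relative to urine-sensitive neurons, as in the sentence quoted above; rates expressed relative to all K+-sensitive neurons would use `n_total` as the denominator instead.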

      (5) Pg 7 Line 6: The repeated use of the word "pooled" is confusing as it suggests a variation in the experiment. The authors should establish once in the Methods and maybe in the Results that stimuli were pooled across animals. Then they should just refer to the stimulus as male or female or BALB/c rather than "pooled" male etc.

      We acknowledge the reviewer’s argument. Accordingly, we now introduce the experimental use of pooled urine once in the Methods and in the introductory paragraph of the Results. All other references to “pooled” urine in the Results and Captions have been deleted.

      (6) Pg 7 Line 10: "...detected in >=3 out of 10 male..." For the chemical analysis, were these samples not pooled?

      Correct. We deliberately did not pool samples for chemical analysis, but instead analyzed all individual samples separately (i.e., 60 samples were subjected to both proteomic and metabolomic analyses). Thus, the criterion that a VOC or protein must be detected in at least 3 of the 10 individual samples from a given sex/strain combination for a ‘present’ call (and in at least 6 of the 10 samples to be called ‘enriched’) ensures that the molecular signatures we identify are not “contaminated” by unusual aberrations within single samples. For clarification, we now explicitly outline this procedure in the Methods (Experimental Design and Statistical Analysis – Proteomics and metabolomics).
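
      The detection criterion above amounts to a simple threshold rule per compound and sex/strain group. The following is a minimal sketch under stated assumptions: the function name `call_compound`, the label "absent", and the example flags are hypothetical, while the thresholds (3 of 10 and 6 of 10) are taken from the response.

```python
PRESENT_MIN = 3   # detected in at least 3 of 10 individual samples
ENRICHED_MIN = 6  # detected in at least 6 of 10 individual samples


def call_compound(detected, present_min=PRESENT_MIN, enriched_min=ENRICHED_MIN):
    """Classify one compound from per-animal detection flags of one group.

    `detected` holds one boolean per individual urine sample. An 'enriched'
    compound also satisfies the 'present' criterion; the stricter label wins.
    """
    n_detected = sum(bool(d) for d in detected)
    if n_detected >= enriched_min:
        return "enriched"
    if n_detected >= present_min:
        return "present"
    return "absent"


# Hypothetical detection flags for one compound in 10 individual samples.
example_flags = [True, True, False, True, False, False, True, False, False, False]
example_call = call_compound(example_flags)
```

      Requiring detection in several individual samples, rather than in a single pooled one, is what keeps an aberration in one animal from producing a call for the whole group.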

      (7) Pg 7 Line 23: In line 7, the specialist rate was defined as 5% in reference to the total KCl responsive cells. Here the specialist rate is defined from responsive cells. This is confusing.

      We apologize for the confusion. In both cases, the numbers (%) refer to all K+-sensitive neurons. We have added this information to both relevant sentences (l. 7 as well as ll. 23-24). Note that the rate in ll. 23-24 refers to generalists.

      (8) Pg 7 Line 25: Concentration index should be defined before its use here.

      We have revised the corresponding sentence, which now reads: “By contrast, analogously calculated concentration indices (see Materials and Methods) that can reflect potential disparities are distributed more broadly and non-normally (Figure 1h).”

      (9) Pg 7 Line 29: change "trivially" to "simply".

      Done

      (10) Pg 7 Line 30: What is meant by a "generalist" ligand? The neurons are generalists. Probably should read "common ligands"

      We have changed the text accordingly.

      (11) Pg 7 Line 31: What is meant by "global observed concentration disparities" ?

      We have changed the text to “…represented by the observed general concentration disparities.”

      (12) Pg 8 Lines 7-11: This section needs to be edited for clarity as it is very difficult to follow. For example, the definition of "enriched" is buried in a parenthetical. Also, it is very difficult to figure out what a "sample" is in this paper. Is it a pooled stimulus, or is it urine from an individual animal?

      We apologize for the confusion. Throughout the paper a “sample” is a pooled stimulus (from all 10 individuals of a given sex/strain combination) for all physiological experiments. For chemical analysis a “sample” refers to urine from an individual animal.

      (13) Pg 8 Line 11: "abundant proteins" Does this mean absolute concentration or enriched in one sample vs another?

      We changed the term “abundant” to “enriched” as this descriptor has been defined (present in ≥6 of 10 individual samples) in the previous sentence.

      (14) Pg 8 Line 18: "While 32.9% of all..." Please edit for clarity. What is the point?

      The main point here is that, for VOCs, the vast majority of compounds (91.3%) are either generic mouse urinary molecules or are sex/strain-specific.

      (15) Pg 10 Line 18: "Increased VSN selectivity..." This title is misleading as it suggests a change in sensitivity with animal exposure. I think the authors are trying to say "VSNs are more selective for strain than for sex". The authors should avoid the term "exposure to" when they mean "stimulation with" as the former suggests chronic exposure prior to testing.

      We thank the reviewer for the advice and have changed the title accordingly. We also edited the text to avoid the term "exposure to" throughout the manuscript.

      (16) Pg 12 Line 10: "we recorded hardly any..." Hardly any in comparison to what? BALB/c?

      We apologize for the confusion. We have edited the text for clarity, which now reads: “In fact, (i) compared to an average specialist rate of 11.2% ± 6.6% (mean ± SD) calculated over all 13 binary stimulus pairs (n = 26 specialist types), we observed only few specialist responses upon stimulation with urine from wild females (2% and 3%, respectively), and…”

      Reviewer #3 (Recommendations For The Authors):

      (1) Related to the pairwise stimulus-response experimental design and analysis: there is precedent in the field for studies that explore the same topic (sex- and strain-selectivity), but measure VSN sensitivity across many urine stimuli, not just two at a time. This has been done both in the VNO (He et al, Science, 2008; Fu, et al, Cell, 2015) and in the AOB (Tolokh, et al, Journal of Neuroscience, 2013). The current manuscript does not cite these studies.

      Reviewer 3 is correct and we apologize for this oversight. We now cite the two VSN-related studies by He et al. and Fu et al. in the Introduction.

      (2) The findings of the mass spectrometry-based profiling of mouse urine - especially for volatiles - are only accessible through repositories, making it difficult for readers to understand which molecules were found to be highly divergent between sexes/strains. There is value in the list of ligands to further investigate, but this information should be made more accessible to readers without having to comb through the repositories.

      We agree that there “is value in the list of ligands to further investigate” and, accordingly, we now provide a table (Table 1) that lists the top-5 VOCs that – according to sPLS-DA – display the most discriminative power to classify samples by sex (related to Figure 2c) or strain (related to Figure 2d). For ease of identification, all entries list internal mass spectrometry identifiers, identifiers extracted from MS analysis database, the sex or strain that drives separation, which two-dimensional component / x-variate represents the most discriminative variable, PubChem chemical formula, PubChem common or alternative names, Chemical Entities of Biological Interest or PubChem Compound Identification, and the VOC’s putative origin.

      (3) There is a long precedent for integrating molecular assessments and physiological recordings to identify specific ligands for the vomeronasal system:

      • nonvolatiles (e.g., Leinders-Zufall, et al., Nature, 2000)

      • peptides (e.g., Kimoto et al., Nature, 2005; Leinders-Zufall et al. Science, 2004; Riviere et al., Nature, 2009; Liberles, et al., PNAS, 2009)
      • proteins (e.g., Chamero et al., Nature, 2007; Roberts et al., BMC Biology, 2010)

      • excreted steroids and bile acids (Nodari et al., Journal of Neuroscience, 2008; Fu et al., Cell, 2015; Doyle, et al., Nature Communications, 2016)

      The Leinders-Zufall (2000), Roberts, and Nodari papers are referenced, but the broader efforts by the community to find specific drivers of vomeronasal activity are not fully represented in the manuscript. The focus of this paper is fully related to this broader effort, and it would be appropriate for this work to be placed in this context in the introduction and discussion.

      We now refer to all of the studies mentioned in the Introduction (except the article published by Liberles et al. in 2009, since the authors of that study do not identify vomeronasal ligands).

      (4) Throughout the manuscript (starting in Fig. 1h) the figure panels and captions use the term "response index" whereas the methods define a "preference index." It seems to be the case that these two terms are synonymous. If so, a single term should be consistently used. If not, this needs to be clarified.

      We now consistently use the term “response index” throughout the manuscript.

      (5) It would be useful to provide a table associated with Figure 2 - figure supplement 1 that lists the common names and/or chemical formulas for the volatiles that were found to be of high importance.

      We agree and, accordingly, we now provide a table (Table 2) that lists the VOCs that – according to Random Forest classification and resulting Gini importance scores – display the most discriminative power to classify samples by sex (related to Figure 2 - figure supplement 1a) or strain (related to Figure 2 - figure supplement 1b). Notably, it is generally reassuring that several VOCs are listed in both Table 1 and Table 2, emphasizing that two different supervised machine learning algorithms (i.e., sPLS-DA (Table 1) and Random Forest (Table 2)) yield largely congruent results.

      (6) The use of the term "comprehensive" for the molecular analysis is a little bit misleading, as volatiles and proteins are just two of the many categories of molecules present in mouse urine.

      We have now deleted most mentions of the term "comprehensive" when referring to the molecular analysis.

      (7) Page 11, lines 24-27: The sentences starting "We conclude..." and ending in "semiochemical concentrations." These two sentences do not make sense. It is not known how many of the identified proteins are actual VSN ligands. Moreover, there is abundant evidence from other studies that individual VSN activity provides information about distinct semiochemical concentrations.

      We have substantially edited and rephrased this paragraph to better reflect that different scenarios / interpretations are possible. The relevant text now reads: “We conclude that VSN population response strength might not be so strongly affected by strain-dependent concentration differences among common urinary proteins. In that case, it would appear somewhat unlikely that individual VSN activity provides fine-tuned information about distinct semiochemical concentrations. Alternatively, as some (or even many) of the identified proteins could not serve as vomeronasal ligands at all, generalist VSNs might sample information from only a subset of compounds which, in fact, are secreted at roughly similar concentrations.”

      (8) The explanation of stimulus timing is mentioned several times but not defined clearly in methods. Page 19, lines 14-19 have information about the stimulus delivery device, but it would be helpful to have stimulus timing explicitly stated.

      In addition to the relevant captions, we now explicitly state stimulus timing (i.e., 10 s stimulations at 180 s inter-stimulus intervals) in the Results.

      (9) Typos: Page 10, line 7: "male biased" → "male-biased" for clarity

      Wilcoxon "signed-rank" test is often misspelled "Wilcoxon singed ranked test" or "Wilcoxon signed ranked test"

      In the Fig. 3 legend, the asterisk meaning is unspecified.

      "(im)balances" → imbalances (page 27, line 24; page 37, line 16; page 38, line 16)

      In Figure 2 - figure supplement 1 and in Figure 2 - figure supplement 2, the units of the box-and-whisker plots are not specified in the graph or legend.

      We have made all required corrections.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study utilizes a virus-mediated short hairpin RNA (shRNA) approach to investigate in a novel way the role of the wild-type PHOX2B transcription factor in critical chemosensory neurons in the brainstem retrotrapezoid nucleus (RTN) region for maintaining normal CO2 chemoreflex control of breathing in adult rats. The solid results presented show blunted ventilation during elevated inhaled CO2 (hypercapnia) with knockdown of PHOX2B, accompanied by a reduction in expression of Gpr4 and Task2 mRNA for the proposed RTN neuron proton sensor proteins GPR4 and TASK2. These results suggest that maintained expression of wild-type PHOX2B affects respiratory control in adult animals, which complements previous studies showing that PHOX2B-expressing RTN neurons may be critical for chemosensory control throughout the lifespan and with implications for neurological disorders involving the RTN. When some methodological, data interpretation, and prior literature reference issues further highlighting novelty are adequately addressed, this study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study investigated the role of the PHOX2B transcription factor in neurons in the key brainstem chemosensory structure, the retrotrapezoid nucleus (RTN), for maintaining proper CO2 chemoreflex responses of breathing in the adult rat in vivo. PHOX2B has an important transcriptional role in neuronal survival and/or function, and mutations of PHOX2B severely impair the development and function of the autonomic nervous system and RTN, resulting in the developmental genetic disease congenital central hypoventilation syndrome (CCHS) in neonates, where the RTN may not form and is functionally impaired. The function of the wild-type PHOX2B protein in adult RTN neurons that continue to express PHOX2B is not fully understood. By utilizing a viral PHOX2B-shRNA approach for knockdown of PHOX2B specifically in RTN neurons, the authors' solid results show impaired ventilatory responses to elevated inspired CO2, measured by whole-body plethysmography in freely behaving adult rats, that develop progressively over a four-week period in vivo, indicating effects on RTN neuron transcriptional activity and associated blunting of the CO2 ventilatory response. The RTN neuronal mRNA expression data presented suggests the impaired hypercapnic ventilatory response is possibly due to the decreased expression of key proton sensors in the RTN. This study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Strengths:

      (1) The authors used a shRNA viral approach to progressively knock down the PHOX2B protein, specifically in RTN neurons to determine whether PHOX2B is necessary for the survival and/or chemosensory function of adult RTN neurons in vivo.

      (2) To determine the extent of PHOX2B knockdown in RTN neurons, the authors combined RNAScope® and immunohistochemistry assays to quantify the subpopulation of RTN neurons expressing PHOX2B and neuromedin B (Nmb), which have been proposed to be key chemosensory neurons in the RTN.

      (3) The authors demonstrate that knockdown efficiency is time-dependent, with a progressive decrease in the number of Nmb-expressing RTN neurons that co-express PHOX2B over a four-week period.

      (4) Their results convincingly show hypoventilation particularly in 7.2% CO2 only for PHOX2B-shRNA RTN-injected rats after four weeks as compared to naïve and non-PHOX2B-shRNA targeted (NT-shRNA) RTN injected rats, suggesting a specific impairment of chemosensitive properties in RTN neurons with PHOX2B knockdown.

      (5) Analysis of the association between PHOX2B knockdown in RTN neurons and the attenuation of the hypercapnic ventilatory response (HCVR), by evaluating the correlation between the number of Nmb+/PHOX2B+ or Nmb+/PHOX2B- cells in the RTN and the resulting HCVR, showed a significant correlation between HCVR and number of Nmb+/PHOX2B+ and Nmb+/PHOX2B- cells, suggesting that the number of PHOX2B-expressing cells in the RTN is a predictor of the chemoreflex response and the reduction of PHOX2B protein impairs the CO2-chemoreflex.

      (6) The data presented indicate that PHOX2B knockdown not only causes a reduction in the HCVR but also a reduction in the expression of Gpr4 and Task2 mRNAs, suggesting that PHOX2B knockdown affects RTN neurons transcriptional activity and decreases the CO2 response, possibly by reducing the expression of key proton sensors in the RTN.

      (7) Results of this study show that independent of the role of PHOX2B during development, PHOX2B is still required to maintain proper CO2 chemoreflex responses in the adult brain, and its reduction in CCHS may contribute to the respiratory impairment in this disorder.

      Weaknesses:

      (1) The authors found a significant decrease in the total number of Nmb+ RTN neurons (i.e., Nmb+/PHOX2B+ plus Nmb+/ PHOX2B-) in NT-shRNA rats at two weeks post viral injection, and also at the four-week period where the impairment of the chemosensory function of the RTN became significant, suggesting some inherent cell death possibly due to off-target toxic effects associated with shRNA procedures that may affect the experimental results.

      (2) The tissue sampling procedures for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally have not been completely validated to accurately represent the full expression patterns in the RTN under experimental conditions.

      (3) The inferences about RTN neuronal expression of NMB, GPR4, or TASK2 are based on changes in mRNA levels, so it remains speculation that the observed reduction in Gpr4 and Task2 mRNA translates to a reduction in the protein levels and associated reduction of RTN neuronal chemosensitive properties.

      Thank you for sharing the excitement for our study showing novel findings on the contribution of PHOX2B to the chemoreflex response and activity of adult RTN neurons. We believe that reporting the results on cell death following shRNA viral injections, potentially due to some off-target effects, is important to share with the scientific community to help plan experiments of a similar kind in various fields of neuroscience.

      Thank you for pointing out your concerns about cell quantification. We have edited the methods and results sections to add clarity about our analytical procedure.

      As we discussed in the manuscript, we were only able to assess mRNA levels of Nmb, Gpr4, and Task2, as currently available antibodies for the 3 targets are still unreliable. Future studies will benefit from the analysis of changes at protein levels and possibly electrophysiological recordings to verify that chemosensitive properties of RTN neurons are impaired due to reduction of PHOX2B expression. We discuss these limitations in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a short hairpin RNA technique strategy to elucidate the functional activity of neurons in the retrotrapezoid nucleus (RTN), a critical brainstem region for central chemoreception. Dysfunction in this area is associated with the neuropathology of congenital central hypoventilation syndrome (CCHS). The subsequent examination of these rats aimed to shed light on the intricate aspects of RTN and its implications for central chemoreception and disorders like CCHS in adults. They found that using the short hairpin RNA (shRNA) targeting Phox2b mRNA, a reduction of Phox2b expression was observed in Nmb neurons. In addition, Phox2b knockdown did not affect breathing in room air or under hypoxia, but the hypercapnia ventilatory response was significantly impaired. They concluded that Phox2b in the adult brain has an important role in CO2 chemoreception. They thought that their findings provided new evidence for mechanisms related to CCHS neuropathology. The conclusions of this paper are well supported by data, but careful discussion seems to be required for comparison with the results of various previous studies performed by different genetic strategies for the RTN neurons.

      Strengths:

      The most exciting aspect of this work is the modelling of the Phox2b knockdown in one element of the central neuronal circuit mediating respiratory reflexes, that is in the RTN. To date, mutations in the PHOX2B gene are commonly associated with most patients diagnosed with CCHS, a disease characterized by hypoventilation and absence of chemoreflexes, in the neonatal period, which in severe cases can lead to respiratory arrest during sleep. In the present study, the authors demonstrated that the role of Phox2b extends beyond the developmental period, and its reduction in CCHS may contribute to the respiratory impairment observed in this disorder.

      Weaknesses:

      Whereas the most exciting part of this work is the knockdown of the Phox2b in the RTN in adult rodents, the weakness of this study is the lack of a clear physiological, developmental, and anatomical distinction between this approach and similar studies already reported elsewhere (Ruffault et al., 2015, DOI: 10.7554/eLife.07051; Ramanantsoa et al., 2011, DOI: 10.1523/JNEUROSCI.1721-11.2011; Huang et al., 2017, DOI: 10.1016/j.neuron.2012.06.027; Hernandez-Miranda et al., 2018, DOI: 10.1073/pnas.1813520115; Ferreira et al., 2022 DOI: 10.7554/eLife.73130; Takakura et al., 2008 DOI: 10.1113/jphysiol.2008.153163; Basting et al., 2015 DOI: 10.1523/JNEUROSCI.2923-14.2015; Marina et al., 2010 DOI: 10.1523/JNEUROSCI.3141-10.2010). In addition, several conclusions presented in this work are not directly supported by the provided data.

      Thanks for the feedback on our manuscript. We have further highlighted in our discussion the previous developmental work aimed at determining the role of PHOX2B in embryonic development. Our study was triggered by the fascinating observation that, despite its important role in development of the central and peripheral nervous system, PHOX2B is still present in the adult brain and its function in adult neurons is unknown. We thus aimed to investigate its role in the adult RTN by knocking down its expression with a shRNA approach. Therefore, in our model knockdown of PHOX2B does not affect development of the RTN. Previous studies (mentioned by the reviewer, as well as cited in the manuscript) have focused on investigating 1) the role of PHOX2B in the developmental period, 2) the physiological changes associated with the transgenic expression of mutant forms of PHOX2B in relation to CCHS, 3) the killing or the acute silencing/excitation of neuronal activity of PHOX2B+ RTN neurons. Our study had a different aim: to test whether the transcription factor PHOX2B had a physiologically relevant role in adult RTN neurons. In this experimental approach PHOX2B is not altered throughout embryonic or postnatal development. By knocking down PHOX2B in the Nmb+ cells of the RTN our results show a reduction in chemoreflex response and mRNA expression of proton sensors. Hence, we conclude that PHOX2B alters the function of Nmb+ RTN neurons, possibly through transcriptional changes including the reduction in Gpr4 and Task2 mRNA expression.

      Reviewer #3 (Public Review):

      A brain region called the retrotrapezoid nucleus (RTN) regulates breathing in response to changes in CO2/H+, a process termed central chemoreception. A transcription factor called PHOX2B is important for RTN development and mutations in the PHOX2B gene result in a severe type of sleep apnea called Congenital Central Hypoventilation Syndrome. PHOX2B is also expressed throughout life, but its postmitotic functions remain unknown. This study shows that knockdown of PHOX2B in the RTN region in adult rats decreased expression of Task2 and Gpr4 in Nmb-expressing RTN chemoreceptors and this corresponded with a diminished ventilatory response to CO2 but did not impact baseline breathing or the hypoxic ventilatory response. These results provide novel insight regarding the postmitotic functions of PHOX2B in RTN neurons.

      Main issues:

      (1) The experimental approach was not targeted to Nmb+ neurons and since other cells in the area also express Phox2b, conclusions should be tempered to focus on Phox2b expressing parafacial neurons NOT specifically RTN neurons.

      (2) It is not clear whether PHOX2B is important for the transcription of pH sensing machinery, cell health, or both. If PHOX2B knockdown results in loss of RTN neurons this is also expected to decrease Task2 and Gpr4 levels, albeit by a transcription-independent mechanism.

      Although we did not specifically target Nmb+ neurons, we performed viral injections within the area where neurons expressing PHOX2B and Nmb are localized (i.e., the RTN region). We carefully quantified the impact of PHOX2B knockdown on Nmb expressing neurons, as well as the effects on the adjacent TH expressing C1 population and FN neurons (figure 5). As reported in the results section, significant changes in the numbers of PHOX2B expressing neurons were only observed at the site of injection in PHOX2B+/Nmb+ neurons. We did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells coexpressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      Where appropriate, we have substituted “RTN” with “Nmb expressing neurons of the RTN” throughout the manuscript.

      We have clarified in the methods and results section how we quantified Task2 and Gpr4 mRNA expression. The quantification was performed on a pool of single cells (200-250/rat) expressing Nmb. Hence, the overall reduction is not a result of general fluorescence loss in the RTN region, but specifically assessed in single cells expressing Nmb. This is therefore independent of the reduction of the total number of Nmb cells.

      We propose that cell death is not a direct effect of PHOX2B knockdown, but rather it is associated with the injection of the viral constructs that have been already reported to promote some off-target effects (as reported in the manuscript). While modest cell death is observed only in the first two weeks post-infection, it does not increase further between 2 and 4 weeks post infection, when the reduction in PHOX2B (not associated with a further reduction in Nmb+ cells, hence no further cell death in RTN) is evident and the respiratory chemoreflex is impaired. These results suggest that 1) reduction of PHOX2B is not responsible for cell death; 2) it is the reduction of PHOX2B levels that promotes chemoreflex impairment. Given the observation that Nmb cells with no detectable PHOX2B protein show reduced expression of Task2 and Gpr4 mRNA, we propose that one of the possible mechanisms of chemoreflex impairment in PHOX2B shRNA rats is the reduction of Task2 and Gpr4. In the discussion we also suggest possible additional mechanisms that can be investigated in further studies.

      Recommendations for the authors:

      In revising this manuscript, the authors should carefully address the issues raised by the reviewers to substantially improve the manuscript and solidify the reviewers' general assessment of the potential importance of this work.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The cell counts for Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons are a critical component of the study, and it is unclear how the tissue sampling procedures (eight sections per animal) for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally has been validated to accurately represent the full expression patterns in the RTN under the experimental conditions. It is possible that the sampling/quantification procedures used may be adequate, but validation is important. Also, quantification of the CTCF signal for Nmb, Gpr4, and Task2 mRNA is an important component of this study, but only four sections/rats were used.

      Thank you for pointing out your concern on our quantification method. We have clarified in the methods section the procedure for cell counting and quantification of the CTCF signal. We have sampled the area of the RTN in order to identify Nmb cells of RTN.

      We have edited the methods section as follows:

      “To quantify Nmb+/PHOX2B- and Nmb+/PHOX2B+ neurons within the RTN region, we analysed one in every seven sections (210 µm interval; 8 sections/rat in total) along the rostrocaudal distribution of the RTN on the ventral surface of the brainstem and compared total bilateral cell counts of PHOX2B-shRNA rats with non-target control (NT-shRNA) and naïve rats. Cells that expressed Nmb and Phox2b mRNAs but did not show co-localization with PHOX2B protein were considered Nmb+/PHOX2B-.

      The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.

      To quantify Gpr4 and Task2 mRNA expression in Nmb cells of the RTN, we first quantified single cell CTCF for either Gpr4 (200.7 ± 13.2 cells/animal) or Task2 (169.6 ± 10.3 cells/animal) mRNA in Nmb+ RTN neurons in the 3 experimental groups (naïve, NT shRNA and PHOX2B shRNA) independent of their PHOX2B expression. We then compared CTCF values of Gpr4 and Task2 mRNA between Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons in PHOX2B-shRNA rats to address changes in their mRNA expression induced by PHOX2B knockdown.”
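
      The quantification described in the edited methods can be illustrated with a minimal sketch. It assumes the commonly used definition of CTCF (integrated density minus the cell area multiplied by the mean background fluorescence) and takes the RTN/NTS ratio as each RTN cell's CTCF divided by the mean CTCF of the dorsal reference cells. The function names and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def ctcf(integrated_density, cell_area, background_means):
    """Corrected total cell fluorescence for a single cell.

    Assumed definition:
    CTCF = integrated density - (cell area * mean background fluorescence)
    """
    return integrated_density - cell_area * mean(background_means)


def rtn_nts_ratios(rtn_ctcf, nts_ctcf):
    """Divide each RTN cell's CTCF by the mean CTCF of the NTS reference cells."""
    reference = mean(nts_ctcf)
    return [value / reference for value in rtn_ctcf]


# Hypothetical measurements in arbitrary units, not data from the study.
background = [10.0, 12.0, 14.0]  # three background areas from one image
rtn_cells = [ctcf(5000.0, 100.0, background), ctcf(4400.0, 100.0, background)]
nts_cells = [ctcf(4000.0, 100.0, background), ctcf(3600.0, 100.0, background)]
ratios = rtn_nts_ratios(rtn_cells, nts_cells)
```

      Because the ratio is computed per cell, it is independent of how many Nmb+ cells remain in the section, which is the distinction drawn in the response between cell counts and per-cell fluorescence.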

      (2) Furthermore, to evaluate changes in Nmb mRNA expression following PHOX2B knockdown at the level of the RTN, it is stated in Materials and Methods "we compared, on the same tissue section, the fluorescence intensity of RTN Nmb+ cells with the signal of Nmb+ cells in the NTS (Nmb CTCF ratio RTN/NTS)". How this was accomplished is unclear, considering the non-overlapping locations of the RTN and rostral NTS. Providing images would be helpful.

      The first sections containing Nmb cells in the ventral medulla also express few Nmb cells in the dorsal medulla. We used those cells as reference for fluorescence levels since they would not be affected by the viral infection. Similar cells are also present in the brains of mice and reported in the Allen Brain atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal regions in Figure 5.

      (3) The staining for tyrosine hydroxylase (TH) to identify and quantify C1 cells (TH+/PHOX2B+) following shRNA injection provides important information, and it would be useful to show images of histological examples to accompany Fig. 5A.

      We included in figure 5A a sample image of C1 neurons used for our TH quantification.

      Minor:

      (1) Provide animal ns in the text of the Results section for the four weeks of PHOX2B knockdown.

      They have been included.

      (2) Please state in the legends for Figures 2 & 3, which images are superimposition images.

      We have included information about merged images in the figures.

      Reviewer #2 (Recommendations For The Authors):

      This manuscript by Cardani and colleagues attempts to address whether a reduction in Phox2b expression in chemosensitive neuromedin-B (NMB)-expressing neurons in the RTN alters respiratory function. The authors used a short hairpin RNA technique to silence RTN chemosensor neurons. The present study is very interesting, but there are several major concerns that need to be addressed, including the main hypothesis.

      Major

      (1) Page 6, lines 119-121: I did not grasp the mechanistic property described by the authors in this passage, nor did I understand the experiments they conducted to establish a mechanistic link between Phox2b and the chemosensitive property. Could the authors provide further clarification on these points?

      We believe the reviewer refers to this paragraph: “In order to have a better understanding of the role of PHOX2B in the CO2 homeostatic processes we used a non-replicating lentivirus vector of two short-hairpin RNA (shRNA) clones targeting selectively Phox2b mRNA to knockdown the expression of PHOX2B in the RTN of adult rats and tested ventilation and chemoreflex responses. In parallel, we also determined whether knockdown of PHOX2B in adult RTN neurons negatively affected cell survival. Finally, we sought to provide a mechanistic link between PHOX2B expression and the chemosensitive properties of RTN neurons, which have been attributed to two proton sensors, the proton-activated G protein-coupled receptor (GPR4) and the proton-modulated potassium channel (TASK-2).”

      The rationale for running these experiments is based on the fact that it is well known in the literature that PHOX2B is an important transcription factor for the development of several neuronal populations. PHOX2B Knockout mice die before birth and heterozygous mice have some anatomical defects, but respiration is only impaired in the early post-natal period. While many developmental transcription factors are generally downregulated in the post-natal period, PHOX2B is still expressed in some neurons into adulthood. What is the function of PHOX2B in these fully developed neurons? We do not know as we do not yet know the entire set of target genes that PHOX2B regulates in the adult brain. Hence we decided to test what would happen if we knocked down the PHOX2B protein in the Nmb neurons of the RTN, an area that is critical for central chemoreception and involved in the presentation of CCHS. Our results show that reduction of PHOX2B blunts the CO2 chemoreflex response and reduces mRNA expression of Task2 and Gpr4, two pH sensors that have been shown to be key for RTN chemosensitive properties. We also show that the Nmb mRNA and cell survival are not affected by PHOX2B knockdown and we propose that the reduced CO2 chemoreflex may be attributed to a reduction of chemosensory function of Nmb neurons of the RTN due to partial loss of Gpr4 and Task2.

      (2) It is imperative for the authors to enhance the description of their hypothesis, as, from my perspective, the contribution of the data to the field is not clearly articulated. Numerous more selectively designed experiments were conducted to investigate the role of Phox2b-expressing neurons at the RTN level and their involvement in respiratory activity. In summary, the current study appears to lack novelty.

      We respectfully disagree with this statement. We believe we have adequately summarized previous work, although we realize we can’t reference every single publication on this topic. As described above, the developmental role of PHOX2B has been elegantly investigated in mouse embryonic studies (extensively cited in the manuscript). Furthermore, very interesting studies have shown that when the CCHS defining mutant PHOX2B protein (+7Ala PHOX2B) and other mutations linked to CCHS have been transgenically expressed in mice through development, severe anatomical defects are observed and respiratory function is impaired (extensively cited in the manuscript). We have also cited papers relevant to this study that describe the role of PHOX2B/Nmb RTN neurons and the pH protein sensors in the CO2 chemoreflex. If we missed some papers that the reviewer deems essential in the context of this study we will be happy to include them.

      We are not aware of other studies that have investigated the specific role of the PHOX2B protein in the adult RTN in the absence of confounding developmental pathogenesis (i.e. in an otherwise ‘healthy’ animal), and of no other studies that looked at the effects on the RTN proton sensors and Nmb expression following PHOX2B knockdown. Hence we believe that our results are novel and, in our opinion, very interesting.

      (3) On pages 13 and 14 (Results section), I am seeking clarity on the novelty of the findings. Doug Bayliss's prior work has already demonstrated the role of Gpr4 and Task2 on Phox2b neurons in regulating ventilation in conscious rodents.

      Bayliss’ group has elegantly demonstrated that Gpr4 and Task2 are the two proton sensors in the PHOX2B/Nmb neurons of the RTN that have a key role in chemoreception (cited in the manuscript). The novelty of our findings is that we show that a reduction in PHOX2B protein is associated with a reduction of mRNA levels of Gpr4 and Task2. This is a novel finding. Currently, we do not know what transcriptional activity PHOX2B has in adult RTN neurons (i.e., what gene targets PHOX2B has in this cell population and many others) and here we propose that Nmb is not a gene target of PHOX2B while Gpr4 and Task2 are.

      (4) The authors assert that the transcription factor Phox2b remains not fully understood. While I concur, the present study falls short of fully investigating the actual contribution of Phox2b to breathing regulation. In other words, the knockdown of Phox2b neurons did not add much to the knowledge of the field.

      We respectfully disagree with the reviewer. With the exception of very few target genes, the transcriptional role of PHOX2B beyond the embryonic development is poorly understood. No mechanistic connection has been made before between the transcriptional activity of PHOX2B with the expression of proton sensors in the RTN. Other groups have investigated the role of stimulating or depressing the neuronal activity of PHOX2B/NMB neurons in the RTN showing a key role of RTN on respiratory control, but these prior studies did not test whether changing the expression of the PHOX2B protein in these neurons had a role on respiratory control and the central chemoreflex. No other study has investigated the role of the PHOX2B protein within the RTN cells, with the exception of PHOX2B knockout mice or transgenic expression of the mutated PHOX2B that are relevant for CCHS. Again, these previous studies were done on a background of developmental impairment and to the best of our knowledge did not seek to show any association between PHOX2B expression and expression of Gpr4 or Task2.

      (5) I recommend removing the entire section entitled "The role of Phox2b in development and in the adult brain." The authors merely describe Phox2b expression without contextualizing it within the obtained data.

      Because reviewers raised the issue about not including important information about the role of PHOX2B in development and respiratory control we prefer to keep the section.

      (6) Are the authors aware of whether the shRNA in Phox2b/Nmb neurons truly induced cell death or solely depleted the expression of the transcription factor protein? Do the chemosensitive neurons persist?

      This is an excellent question that we tried to address with our study. As we report in figures 2 and 3, we propose that some cell death is occurring as an off-target effect within the first 2 weeks post-infection, likely due to off-target action of the shRNA approach and not dependent on the reduction of PHOX2B expression (discussed in the manuscript). This is further evidenced by our Fig.S1 data in which higher concentrations of shRNA led to more cell death, indicative of off-target effects. We do not believe it is a consequence of our surgical procedure as we do not see similar cell loss when injecting vehicle or other control solutions (unpublished work; Janes et al., 2024).

      During the first 2 weeks post-surgery the proportion of Nmb+/PHOX2B- cells does not change compared to control rats or non-target shRNA (knockdown is not yet visible at protein level). Four weeks post-injection, there is no further cell death (assessed by the total number of NMB cells), whereas the fraction of NMB cells that express PHOX2B is reduced (and the fraction of NMB not expressing PHOX2B is increased), suggesting that the reduction of PHOX2B protein in Nmb cells is not correlated with cell loss/survival whereas the impairment that we observe in terms of central chemoreception is possibly due to the progressive decrease of PHOX2B expression in these neurons.

      (7) In Figures 2 and 3, it is noteworthy that the authors observe peak expression at a very caudal level. In rats, the RTN initiates at the caudal end of the facial, approximately 11.6 mm, and should exhibit a rostral direction of about 2 mm.

      In our experience the Nmb cells on the ventral surface of the medulla peak in number around the caudal tip of the facial nucleus in adult SD rats (Janes et al., 2024). To add clarity to the figure we reported cell count distribution data in relation to the distance from caudal tip of the facial.

      Minor

      (1) I would like to suggest that the authors correct the recurring statement throughout the manuscript that Phox2b is essential only for the development of the autonomic nervous system. In my view, it also plays a crucial role in certain sensory and respiratory systems.

      We have addressed this in the manuscript.

      (2) Page 4, lines 59-60: Out of curiosity, do the data include information from different countries?

      This data refers to information from France and Japan. Currently it is estimated that there are 1000-2000 CCHS patients worldwide.

      (3) Page 7, lines 129-131: In my understanding, the sentence is quite clear; if we knock down the PHOX2B gene, we are expected to reduce or even eliminate the expression of Gpr4 or Task2. Am I right?

      This is what we propose from the results of this study. We would like to point out that the transcriptional activity of PHOX2B (i.e., what genes PHOX2B regulates) in adult neurons has not yet been fully investigated. With the exception of a few target genes (e.g., TH, DBH), the transcriptional activity of PHOX2B in neurons is not yet known. Here we report novel findings that suggest that Gpr4 and Task2 are potential target genes of PHOX2B in RTN neurons.

      (4) The authors mentioned that NT-shRNA also impacts CO2 chemosensitivity. Could this effect be attributed to mechanical damage of the tissue resulting from the injection?

      Just to clarify, we observed some impairment in chemosensitivity when NT-shRNA was injected in a “larger” volume (2x 200ul/side). No impairment was observed in NT-shRNA when we injected smaller volumes (2x 100ul/side). Physical damage could be a possibility, although in our experience (unpublished work; Janes et al., 2024, Acta Physiologica) injections of a similar volume of solution performed by the same investigator in the same brain area and experimental settings did not produce a physical lesion associated with respiratory impairment. Hence we attribute the unexpected results with larger volumes to toxic effects associated with the shRNA viral constructs.

      (5) In the reference section, the authors should review and correct some entries. For instance, Janes, T. A., Cardani, S., Saini, J. K., & Pagliardini, S. (2024). Title: "Etonogestrel Promotes Respiratory Recovery in an In Vivo Rat Model of Central Chemoreflex Impairment." Running title: "Chemoreflex Recovery by Etonogestrel." Some references contain the journal, pages, and volume, while others lack this information entirely.

      We have updated references. Janes et al., 2024 has now been published in Acta Physiologica.

      (6) Why does the baseline have distribution points, whereas the other boxplots do not?

      We have clarified in the figure legend that, to be fair to the presentation of our results, the data points shown in some of the boxplot graphs do not refer to entire baseline data but only the ones that are outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).
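
      The whisker convention described above can be sketched as follows. This is an illustration only: the percentile rule used here (linear interpolation between closest ranks) is an assumption, since the response does not state which rule the plotting software applied, and the sample values are hypothetical.

```python
def percentile(values, q):
    """Percentile (q in 0..100) using linear interpolation between closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def whiskers_and_outliers(values, low_q=10, high_q=90):
    """Return the whisker ends and the points that would be drawn individually."""
    low, high = percentile(values, low_q), percentile(values, high_q)
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers


# Hypothetical values, not data from the study.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
low, high, outliers = whiskers_and_outliers(sample)
```

      With this convention, only values below the 10th or above the 90th percentile appear as dots, which is why a group with a narrow spread can show few or no individual points.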

      Reviewer #3 (Recommendations For The Authors):

      • What is the rationale behind dedicating the first paragraph of results to discussing an artifact?

      We think that it is important to report off target effects of shRNA viral constructs as concentration and volumes of viruses injected in various studies vary considerably and other investigators may attempt to use larger volumes of viruses to obtain more considerable or faster knockdown but would obtain erroneous conclusions if appropriate tests are not performed.

      Furthermore, because some readers could question whether we injected enough virus to knockdown the expression of PHOX2B, and may wonder if with a larger amount of virus we would increase knockdown efficiency, we wanted to show that, in our opinion, we used the maximum amount of virus to knockdown PHOX2B without causing toxic effects or physiological changes that are not dependent on PHOX2B knockdown.

      • All individual data points should be visible in floating bar graphs in Figures 1 and 4. For example, I don't see any dots for naïve animals in any of the panels in Figure 1.

      We have clarified in the figure legend that, to be fair to the presentation of our results, the data points shown in some of the boxplot graphs do not refer to entire baseline data but only the ones that are outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).

      • Please include specific F and T values along with DF.

      We have included a table with all the specific values in the supplementary section as Table 1.

      • The C1 and facial partly overlap with the RTN at this level of the medulla and these cells should appear as Phox2b+/Nmb- cells so it is not clear to me why these cells are not evident in the control tissue in Figures 2B and 3B. Also, some of the bregma levels shown in Figure 5A overlap with Figures 2-3 so again it is not clear to me how this non-cell type specific viral approach was targeted to Nmb cells but not nearby TH+ cells. Please clarify.

      In our experience, C1 TH cells are located slightly medial to the Nmb cells and they spread much more caudally than Nmb cells of the RTN. We focused our small volume injection in the core of the RTN to target Nmb cells but we also assessed PHOX2B knockdown in TH C1 cells by counting the PHOX2B/TH cells across treatment groups. Although we can’t exclude subtle changes in the C1 population, we did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells expressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      • To confirm, Nmb is not expressed in the NTS, and this region was chosen as a background, right?

      In order to systematically analyze Nmb mRNA expression we decided to use measurement of fluorescence relative to Nmb neurons present in the dorsal brainstem. Here cells are sparse but we used them as reference fluorescence since they would not be affected by the ventral shRNA injection. Similar cells are also present in the brains of mice and reported by the Allen Brain atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal in Figure 5.

      • How do you get a loss of Nmb+ neurons (Figs 2-3) with no change in Nmb fluorescence (Fig. 5B)? In the absence of representative images these results are not compelling and should be substantiated by more readily quantifiable approaches like qPCR.

      We have clarified in the methods and results section our analytical procedure to assess PHOX2B and Nmb expression. Figures 2 and 3 display the results of counting numbers of Nmb+ cells in the RTN. Figure 5B reports the average of total cell fluorescence measured inside Nmb+ cells, not an average fluorescence measurement of the area of the ventral medulla. Basically, our results show that we have fewer Nmb cells that express PHOX2B but the overall Nmb mRNA fluorescence (expression) in Nmb cells relative to Nmb fluorescence in cells of the dorsal brainstem is the same.

      We have edited the methods as follows:

      “The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.”

      A single cell qPCR analysis would be definitely ideal but a qPCR from dissected tissue would not help us determine whether within a cell there was a reduction in Nmb mRNA levels.

      • The boxed RTN region in these examples is all over the place. The RTN should be consistently placed along the ventral surface under the facial and at approximately equal distance from the trigeminal and pyramids.

      We have updated the figures to consistently present the areas of interest where Nmb cells are located and images are taken.

      • Fluorescent in situ typically appears as discrete puncta so it is not clear to me why that is not the case here.

      Our images are taken at low magnification (20X), where it is difficult to distinguish single mRNA molecules. However, it is possible to appreciate the differences between the grainy fluorescent signal in the in situ hybridization assay (RNAScope) and the smoother signal of protein detection in the immunofluorescence assay.

      • Can TUNEL staining be done to confirm loss of Nmb neurons is due to death and not re-localization?

      Does the reviewer mean “cell migration” with relocalization? We do not expect that this would occur in our experiments. Although TUNEL in the first week post-infection could be useful to determine cell death in our tissue, we do not expect a cell migration of neurons within the brain as our viral shRNA injections are performed in adult rats when developmental processes are already concluded.

    1. Author response:

      We sincerely thank the editors and reviewers for the rigorous evaluation of our work and the precious time invested. The positive comments resonate with our endeavor to explore the intrinsic role of astrocyte aquaporin in brain water homeostasis. Meanwhile, we greatly appreciate the constructive suggestions of the reviewers to consolidate this study. Here is the provisional response, which briefly outlines our acknowledgement of the reviewers’ suggestions:

      To Reviewer #1:

      • Imaging data will be examined and collected to determine whether AQP4 inhibition has differential effects on astrocyte calcium signals in terms of cellular locations.

      • New analysis will be performed for CSD swelling data to provide additional kinetic information.

      • The mentioned original papers are important, and will be included in the revision.

      To Reviewer #2:

      We agree, a careful revision will improve and better position the study.

      • Echoing Reviewer #1, the introduction and discussion will be strengthened with current scientific contexts, while paying attention to the important advances in the glymphatic system. The limits of the study mentioned in the reviews will be stated.

      • The use of TGN-020 was based on its validation by a wide range of ex vivo and in vivo studies. AER-270(271) was nicely introduced by Farr et al., 2019 (PMID: 30738082). Its validation in vivo in AQP4 KO mice, and the comparison to TGN-020, is reported in a very recent study (Giannetto et al., 2024 - PMID: 38363040) that provides valuable insights.

      • The description of specific methodologies, including the DW-MRI, will be reinforced. The presentation of experiments and statistical analysis will be refined.

      To Reviewer #3:

      • Solenov et al., 2004 (PMID: 14576087) used the calcein quenching assay and KO mice, convincingly showing AQP4 is a functional water channel in cultured astrocytes. AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over comparable time, and also slowed down cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov’s study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, differing from our data of acute blocking. This difference may be due to compensatory mechanisms in chronic AQP4 KO, or reflect the different volume responses in cultured astrocytes from brain slices/in vivo results as noted previously (e.g., Risher et al., 2009 - PMID: 18720409). As suggested, methods for volume recordings will be examined.

      • It is an important point that TGN-020 partially blocks AQP4, implying the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitation needs to be acknowledged.

      • As also pointed out by Reviewer #2, the description and interpretation of DW-MRI data will be improved.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable computational study that applies the machine learning method of bilinear modeling to the problem of relating gene expression to connectivity. Specifically, the author attempts to use transcriptomic data from mouse retinal neurons to predict their known connectivity. The results are promising, although the reviewers felt that demonstration of the general applicability of the approach required testing it against a second data set. Hence the present results were felt to provide borderline incomplete support for a key premise of the paper.

      We thank the reviewers for their insightful and constructive feedback. In response to the reviews, we have undertaken a comprehensive revision of our manuscript, incorporating changes and improvements as outlined below:

      (1) New results have been included showcasing the application of our bilinear model to a second dataset focusing on C. elegans gap junction connectivity. This extension validates our model with a biological context other than mouse retina and facilitates a direct comparison with the spatial connectome model (SCM).

      (2) A new section titled "Previous Approaches" has been added to background, situating our study within the broader landscape of existing modeling methodologies.

      (3) The discussion sections have been expanded to fully incorporate the suggestions and insights offered by the reviewers. This includes a deeper exploration of the implications of our findings, potential applications of our model, and a more thorough consideration of its limitations and future directions.

      (4) To streamline the main text and ensure that the core narrative remains focused and accessible, select figures and tables have been relocated to the "Supplementary Materials" section.

      Reviewer 1 (Public Review):

      Summary of what the author was trying to achieve: In this study, the author aimed to develop a method for estimating neuronal-type connectivity from transcriptomic gene expression data, specifically from mouse retinal neurons. They sought to develop an interpretable model that could be used to characterize the underlying genetic mechanisms of circuit assembly and connectivity.

      Strengths:

      The proposed bilinear model draws inspiration from commonly implemented recommendation systems in the field of machine learning. The author presents the model clearly and addresses critical statistical limitations that may weaken the validity of the model such as multicollinearity and outliers. The author presents two formulations of the model for separate scenarios in which varying levels of data resolution are available. The author effectively references key work in the field when establishing assumptions that affect the underlying model and subsequent results. For example, correspondence between gene expression cell types and connectivity cell types from different references are clearly outlined in Tables 1-3. The model training and validation are sufficient and yield a relatively high correlation with the ground truth connectivity matrix. Seemingly valid biological assumptions are made throughout, however, some assumptions may reduce resolution (such as averaging over cell types), thus missing potentially important single-cell gene expression interactions.

      Thank you for recognizing the strengths of our work, particularly the clarity of the model presentation and its foundation in recommendation systems. In the revised manuscript we have also extended the model’s capabilities to analyze gene interactions for neural connectivity at single-cell resolution, when gene expression and connectivity of each cell are known simultaneously.

      Weaknesses:

      The main results of the study could benefit from replication in another dataset beyond mouse retinal neurons, to validate the proposed method. Dimensionality reduction significantly reduces the resolution of the model and the PCA methodology employed is largely non-deterministic. This may reduce the resolution and reproducibility of the model. It may be worth exploring how the PCA methodology of the model may affect results when replicating. Figure 5, ’Gene signatures associated with the two latent dimensions’, lacks some readability and related results could be outlined more clearly in the results section. There should be more discussion on weaknesses of the results e.g. quantification of what connectivity motifs were not captured and what gene signatures might have been missed.

      We acknowledge the significance of validating our method across different datasets. In line with this, our revised manuscript now includes an expanded analysis utilizing a C. elegans gap junction connectivity dataset, which not only broadens the method’s demonstrated applicability but also underscores its versatility across varied neuronal systems.

      To address the concern of resolution and reproducibility associated with PCA preprocessing, we have conducted a comparative analysis from five replicates of the bilinear model, presenting the results in the revised manuscript (Figure S3). This analysis confirms the consistency of the solutions, as evidenced by the similarity metrics. Furthermore, we discussed alternative methodologies, such as L1 or L2 regularization, to tackle multicollinearity, offering flexibility in preprocessing choices.

      In response to feedback on the original Figure 5’s clarity, we have replaced the original Figure 5e-h with Table S4, which summarizes the gene ontology (GO) enrichment results and quantifies the number of genes associated with aspects of neural development and synaptic organization. This revision aims to improve the interpretability and accessibility of the results, ensuring a clearer presentation of the model’s insights.

      Finally, we have expanded our discussion to address the study’s limitations more comprehensively. This includes exploration of potentially missed connections and gene signatures, such as transcription factors, which might not be captured by a linear model due to its inherent preference for predictors with strong correlations to the target variable.

      The main weakness is the lack of comparison against other similar methods, e.g. methods presented in Barabási, Dániel L., and Albert-László Barabási. "A genetic model of the connectome." Neuron 105.3 (2020): 435-445. Kovács, István A., Dániel L. Barabási, and Albert-László Barabási. "Uncovering the genetic blueprint of the C. elegans nervous system." Proceedings of the National Academy of Sciences 117.52 (2020): 33570-33577. Taylor, Seth R., et al. "Molecular topography of an entire nervous system." Cell 184.16 (2021): 4329-4347.

      We value your suggestion to compare our model with established methods. The revised manuscript now includes a comparative analysis with the spatial connectome model (SCM) using the same C. elegans dataset. In addition, a section reviewing previous approaches has been included in the background part, and the discussion part has been extended for the comparison.

      Appraisal of whether the author achieved their aims, and whether results support their conclusions: The author achieved their aims by recapitulating key connectivity motifs from single-cell gene expression data in the mouse retina. Furthermore, the model setup allowed for insight into gene signatures and interactions, however could have benefited from a deeper evaluation of the accuracy of these signatures. The author claims the method sets a new benchmark for single-cell transcriptomic analysis of synaptic connections. This should be more rigorously proven. (I’m not sure I can speak on the novelty of the method)

      In the revised manuscript, we emphasized the bilinear model’s innovative application in the context of neuronal connectivity analysis, inspired by collaborative filtering in recommendation systems. We present quantitative performance metrics, such as the ROC-AUC score and Pearson correlation coefficient, as well as a comparison with the SCM, to benchmark our model’s efficacy in reconstructing connectivity matrices. We also quantified the overlap of the genetic interactions revealed by the bilinear model and the SCM (using the C. elegans dataset), and reported the percentage of the top genes associated with neural development and synaptic organization (using the mouse retina dataset). These numbers set a precedent for future methodological comparisons.

      Discussion of the likely impact of the work on the field, and the utility of methods and data to the community: This study provides an understandable bilinear model for decoding the genetic programming of neuronal type connectivity. The proposed model leaves the door open for further testing and comparison with alternative linear and/or non-linear models, such as neural network-based models. In addition to more complex models, this model can be built on to include higher resolution data such as more gene expression dimensions, different types of connectivity measures, and additional omics data.

      We are grateful for your recognition of the study’s potential impact. The bilinear model indeed offers a foundation for future explorations, allowing for integration with more complex models, higher-resolution data, and diverse connectivity measures.

      Reviewer 1 (Recommendations For The Authors):

      The inclusion of predicted connectivity (Figure 6) of unknown BC neurons is useful as it shows that this is a strong hypothesis generation tool. This utility should potentially be showcased more as it is also brought up in the abstract, "genetic manipulation of circuit wiring", with an explanation of how the model could be leveraged as such. The discussion may benefit from a summarizing sentence regarding which key gene signatures were identified and are in line with the literature, which key gene signatures/connectivity motifs may have been missed, and which gene signatures are novel.

      Thank you for the insightful recommendation on emphasizing the model’s utility in generating hypotheses, particularly regarding predicting connectivity. In the revised manuscript, we have expanded the discussion on how our model can be leveraged to guide genetic manipulations aimed at altering circuit wiring and highlighted its potential impact in the field.

      We have discussed key gene signatures identified from our model that are in line with existing literature, such as plexins and cadherins, which have been previously recognized for their involvement in synaptic connection formation and maintenance. We have also introduced potential new candidates, such as delta-protocadherins. In the revised manuscript, we summarized potentially missed gene signatures or synaptic connections, to provide a comprehensive view of our findings.

      Reviewer 2 (Public Review):

      Summary:

      In this study, Mu Qiao employs a bilinear modeling approach, commonly utilized in recommendation systems, to explore the intricate neural connections between different pre- and post-synaptic neuronal types. This approach involves projecting single-cell transcriptomic datasets of pre- and post-synaptic neuronal types into a latent space through transformation matrices. Subsequently, the cross-correlation between these projected latent spaces is employed to estimate neuronal connectivity. To facilitate the model training, connectomic data is used to estimate the ground-truth connectivity map. This work introduces a promising model for the exploration of neuronal connectivity and its associated molecular determinants. However, it is important to note that the current model has only been tested with Bipolar Cell and Retinal Ganglion Cell data, and its applicability in more general neuronal connectivity scenarios remains to be demonstrated.
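      The projection-and-compare step summarized above can be illustrated with a minimal sketch. It is not the author's implementation: the matrices X, Y, A, and B below are toy values invented for illustration, and the score is taken to be the inner product of the two latent representations, Z = (X A)(Y B)^T, which is one common way to write a bilinear model.

```python
# Illustrative sketch of a bilinear connectivity score (not the author's code).
# X: pre-synaptic types x genes, Y: post-synaptic types x genes (toy values).
# A, B: hypothetical transformation matrices mapping genes to a 2-D latent space.

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def bilinear_connectivity(X, Y, A, B):
    """Return Z = (X A)(Y B)^T, one score per (pre, post) type pair."""
    pre_latent = matmul(X, A)    # pre-synaptic types projected into latent space
    post_latent = matmul(Y, B)   # post-synaptic types projected into latent space
    return matmul(pre_latent, transpose(post_latent))

X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]            # 2 pre-synaptic types, 3 genes
Y = [[2.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]            # 3 post-synaptic types, 3 genes
A = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]                 # genes -> latent (pre-synaptic side)
B = [[0.0, 1.0],
     [1.0, 0.0],
     [1.0, 1.0]]                 # genes -> latent (post-synaptic side)

Z = bilinear_connectivity(X, Y, A, B)
for row in Z:
    print(row)
```

      In a trained model, A and B would be fitted so that Z approximates the ground-truth connectivity matrix; here they are fixed by hand only to show the shapes involved.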

      Strengths:

      This study introduces a succinct yet promising computational model for investigating connections between neuronal types. The model, while straightforward, effectively integrates single-cell transcriptomic and connectomic data to produce a reasonably accurate connectivity map, particularly within the context of retinal connectivity. Furthermore, it successfully recapitulates connectivity patterns and helps uncover the genetic factors that underlie these connections.

      Thank you for your positive assessment of the paper.

      Weaknesses:

      (1) The study lacks experimental validation of the model’s prediction results.

      We recognize the importance of experimental validation in substantiating the predictions made by computational models. While the primary focus of this study remains computational, we have dedicated a section in the revised manuscript, titled "Experimental Validation of Candidate Genes", to outline proposed methodologies for the empirical verification of our model’s predictions. This section specifically discusses the experimental exploration of novel candidate genes, such as delta-protocadherins, within the mouse retina using AAV-mediated CRISPR/Cas9 genetic manipulation. We plan to collaborate with experimental laboratories to facilitate the validation. Given the extensive nature of experimental work, both in terms of time and resources, it is more pragmatic to present a comprehensive experimental investigation in a follow-up study.

      (2) The model’s applicability in other neuronal connectivity settings has not been thoroughly explored.

      The question of the model’s broader applicability is well-taken. In response, we have expanded our analysis to include additional neuronal data and connectivity settings. Specifically, the revised manuscript includes results where we apply the model to a dataset of C. elegans gap junction connectivity, demonstrating its potential in different neuronal systems. This extension serves to illustrate the model’s adaptability and potential applicability to a broader range of neuronal connectivity studies.

      (3) The proposed method relies on the availability of neuronal connectomic data for model training, which may be limited or absent in certain brain connectivity settings.

      We acknowledge the limitations posed by the model’s dependency on comprehensive connectomic data, which may not be readily available across all research contexts. To address this, we have discussed in the revised manuscript several alternative strategies to adapt our model to the available data. This includes exploring the potential of applying the model to available data such as projectome, and integrating other data modalities such as electrophysiological measurements. These initiatives aim to enhance the model’s applicability and ensure its utility in a broader spectrum of brain connectivity studies, especially in scenarios where detailed connectomic data are not available.

      Reviewer 2 (Recommendations For The Authors):

      Q1. In this work, the author has mainly studied connectivity between retinal neuronal types. It will be interesting to see whether the model also works for other brain regions or other types of neuronal connectivity.

      We value your interest in the model’s applicability to other brain regions and neuronal types. To address this, we have extended our analysis in the revised manuscript to include a study on gap junction connectivity between C. elegans neurons. This extension demonstrates the model’s versatility and its potential applicability across various nervous systems and connectivity types.

      Q2. Can the authors use the same transformation matrices trained from the retina data to predict neuronal connectivity in other brain regions? Or, as an easier case, the connectivity between RGC types and the neuronal types in SC, dLGN, or other post-RGC-synaptic brain regions? As the neuronal connection mechanisms are conserved and widely shared between different neuronal types, one would expect that the same transformation matrices may work in predicting other neuronal type connectivity as well (at least to some extent).

      The idea to use the same transformation matrices for predicting connectivity in other brain regions is intriguing. While direct application of these matrices to different regions remains challenging, we discussed the potential scalability of our model to other brain areas. By applying the model to combined datasets from various regions, we could uncover conserved neuronal connection mechanisms. This approach is theoretically feasible and is supported by the demonstrated scalability of the bilinear model and its deep learning variants in industrial applications.

      Q3. Section 5.2 Connectivity metric generation: in this work, the author uses the stratification profiles of the neurons to estimate the connectivity metric. How reliable is this method? There may be scenarios where two neuronal types project to a similar inner plexiform layer yet have no connection. Have the authors considered combining other experimental data (like electrophysiology data or neuron tracing data)?

      We discussed the reliability of using stratification profiles for estimating connectivity metrics, acknowledging potential limitations. In the revised manuscript, we added discussion on how the integration of additional experimental data, such as electrophysiological and neuron tracing data, could enhance the accuracy of the connectivity metrics.

      Q4. Section 6 Model training and validation: does the author have a potential hypothesis as to why 2 dimensions are the best latent feature spaces dimensionality? One would imagine with more dimensionality, the model will give better results. Could it be that the connectivity data that is used to train the model is only considering the two-dimensional space of the neuronal stratification?

      The selection of two dimensions for the latent feature space was informed by 5-fold cross-validation, aimed at optimizing model generalization to unseen data. While increasing dimensionality improves performance on the training set, it does not necessarily enhance generalization to the validation set. Thus, the choice of two dimensions ensures good performance without overfitting to the training data.
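      The selection procedure described above can be sketched generically as follows. This is an assumption-laden illustration, not the code used in the study: `k_fold_indices`, `select_latent_dim`, and `toy_validation_loss` are hypothetical names, and the toy loss is an arbitrary U-shaped curve standing in for actually fitting the bilinear model, so the dimensionality it selects says nothing about the real data.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Shuffle indices 0..n_items-1 and split them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def select_latent_dim(candidate_dims, n_items, k, validation_loss):
    """Pick the dimensionality with the lowest mean validation loss over k folds.

    validation_loss(train_idx, val_idx, dim) is a caller-supplied (hypothetical)
    function that fits a model on train_idx and returns its loss on val_idx.
    """
    folds = k_fold_indices(n_items, k)
    mean_loss = {}
    for dim in candidate_dims:
        losses = []
        for i, val_idx in enumerate(folds):
            # Training set = every fold except the held-out one.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            losses.append(validation_loss(train_idx, val_idx, dim))
        mean_loss[dim] = sum(losses) / len(losses)
    best_dim = min(mean_loss, key=mean_loss.get)
    return best_dim, mean_loss

def toy_validation_loss(train_idx, val_idx, dim):
    """Synthetic stand-in, NOT real results: an underfitting term that shrinks
    with dim plus an overfitting term that grows with dim."""
    return 1.0 / dim + 0.1 * dim

best_dim, mean_loss = select_latent_dim([1, 2, 3, 4, 5], n_items=20, k=5,
                                        validation_loss=toy_validation_loss)
print(best_dim, mean_loss)
```

      The point of the sketch is the structure: training loss alone would always favor more dimensions, whereas the held-out loss is what identifies where generalization stops improving.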

      Q5. Could the author provide the source code for the analysis? Or could the author make it a python/R package so that non-computational biologists can easily apply the method to their own data?

      We have included a "Data and Code Availability" section in the revised manuscript. This section provides a link to the source code with pointers to datasets used in our study, facilitating the application of our methods by researchers from various backgrounds.

      Q6. I know it may be difficult for the author to do, but is it possible to design and perform some experiments to validate the model prediction results, either connectivity partners of transcriptomically defined RGC types or the function of the key genetic molecules (which hasn’t been discovered before)? The author may consider collaborating with some experimental labs. The author may even consider predicting the connectivity between RGC with some of its post-synaptic neurons in the brain regions, like SC or dLGN, as recently there are a lot of single-cell sequencing data as well as connectivity data.

      We appreciate your suggestion regarding experimental validation. As a future direction, we have discussed potential experimental approaches to validate the model’s predictions in the "Experimental Validation of Candidate Genes" section. Specifically, we propose an experimental design involving the manipulation of delta-protocadherins using AAV-mediated CRISPR/Cas9 and subsequent examination of connectivity phenotypes. We are also open to collaborating with experimental labs to further explore the model’s predictions, particularly in predicting connectivity between RGCs and their post-synaptic neurons in other brain regions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Comment: The fact that there are Arid1a transcripts that escape the Cre system in the Arid1a KO mouse model might complicate the interpretation of the data. The phenotype of the Arid1a knockout is probably masked by the fact that many of the sequencing techniques used here are done on a heterogeneous population of knockout and wild type spermatocytes. In relation to this, I think that the use of the term "pachytene arrest" might be overstated, since this is not the phenotype truly observed. Knockout mice produce sperm, and probably litters, although a full description of the subfertility phenotype is lacking, along with identification of the stage at which cell death is happening by detection of apoptosis.

      Response: As the reviewer indicates, we did not observe a complete arrest at pachynema. In fact, the histology shows the presence of spermatids and sperm in seminiferous tubules and epididymides (Fig. Sup. 3). However, our data argue that the wild-type haploid gametes produced were derived from spermatocyte precursors that have likely escaped Cre-mediated activity (Fig. Sup. 4). Furthermore, diplotene and metaphase-I spermatocytes lacking ARID1A protein by IF were undetectable in the Arid1acKO testes (Fig. S4B). Therefore, although we do not demonstrate a strict pachytene arrest, it is reasonable to conclude that ARID1A is necessary to progress beyond pachynema. We have revised the manuscript to reflect this point (Abstract lines 17,18; Results lines 153,154).

      Comment: It is clear from this work that ARID1a is part of the protein network that contributes to silencing of the sex chromosomes. However, it is challenging to understand the timing of the role of ARID1a in the context of the well-known DDR pathways that have been described for MSCI.

      Response: With respect to the comment on the lack of clarity as to which stage of meiosis we observe cell death, our data do suggest that it is reasonable to conclude that mutant spermatocytes (ARID1A-) undergo cell death at pachynema given their inability to execute MSCI, which is a well-established phenotype.

      Comment: Staining of chromosome spreads with Arid1a antibody showed localization at the sex chromosomes by diplonema; however, analysis of gene expression in Arid1a KO was performed on pachytene spermatocytes. Therefore, it is not very clear how the chromatin remodeling activity of Arid1a in diplonema is affecting gene expression of a previous stage. CUTnRUN showed that ARID1a is present at the sex chromatin in earlier stages, leading to the hypothesis that immunofluorescence with ARID1a antibody might not reflect ARID1a real localization.

      Response: It is unclear what the reviewer means about not understanding how ARID1A activity at diplonema affects gene expression at earlier stages. Our interpretations were not based solely on the observation of ARID1A associations with the XY body at diplonema. In fact, mRNA expression and CUT&RUN analyses were performed on pachytene-enriched populations. ARID1A's association with the XY body is not exclusive to diplonema. Based on both CUT&RUN and IF data, ARID1A associates with XY chromatin as early as pachynema. Only at late diplonema did we observe ARID1A hyperaccumulation on the XY body by IF.

      Reviewer #2 (Public Review):

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: As explained in our response to these comments in the first revision, we respectfully disagree with this reviewer’s conclusions. We have been quantitative by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer #3 (Public Review):

      Comment: The data demonstrate that the mutant cells fail to progress past pachytene, although it is unclear whether this specifically reflects pachytene arrest, as accumulation in other stages of Prophase also is suggested by the data in Table 1. The western blot showing ARID1A expression in WT vs. cKO spermatocytes (Fig. S2) is supportive of the cKO model but raises some questions. The blot shows many bands that are at lower intensity in the cKO, at MWs from 100-250kDa. The text and accompanying figure legend have limited information. Are the various bands with reduced expression different isoforms of ARID1A, or something else? What is the loading control 'NCL'? How was quantification done given the variation in signal across a large range of MWs?

      Response: The loading control is Nucleolin. With respect to the other bands in the range of 100-250 kDa, it is difficult to say whether they represent ARID1A isoforms. The Uniprot entry for Mouse ARID1A only indicates a large mol. wt sequence of ~242 kDa; therefore, the band corresponding to that size was quantified. There is no evidence to suggest that lower molecular weight isoforms may be translated. Although speculative, it is possible that the lower molecular weight bands represent proteolytic/proteasomal degradation products or products of antibody non-specificity. These points are addressed in the revised manuscript (Legend to Fig S2, lines 926-931). Blots were scanned on a LI-COR Odyssey CLx imager and viewed and quantified using Image Studio Version 5.2.5 (Methods, lines 640-642).

      Comment: An additional weakness relates to how the authors describe the relationship between ARID1A and DNA damage response (DDR) signaling. The authors don't see defects in a few DDR markers in ARID1A CKO cells (including a low-resolution assessment of ATR), suggesting that ARID1A may not be required for meiotic DDR signaling. However, as previously noted the data do not rule out the possibility that ARID1A is downstream of DDR signaling and the authors even indicate that "it is reasonable to hypothesize that DDR signaling might recruit BAF-A to the sex chromosomes (lines 509-510)." It therefore is difficult to understand why the authors continue to state that "...the mechanisms underlying ARID1A-mediated repression of the sex-linked transcription are mutually exclusive to DDR pathways regulating sex body formation" (p. 8) and that "BAF-A-mediated transcriptional repression of the sex chromosomes occurs independently of DDR signaling" (p. 16). The data provided do not justify these conclusions, as a role for DDR signaling upstream of ARID1A would mean that these mechanisms are not mutually exclusive or independent of one another.

      Response: The reviewer’s argument is reasonable, and we have made the recommended changes (Results, lines 212-215; Discussion, lines 499-500).

      Comment: A final comment relates to the impacts of ARID1A loss on DMC1 focus formation and the interesting observation of reduced sex chromosome association by DMC1. The authors additionally assess the related recombinase RAD51 and suggest that it is unaffected by ARID1A loss. However, only a single image of RAD51 staining in the cKO is provided (Fig. S11) and there are no associated quantitative data provided. The data are suggestive but it would be appropriate to add a qualifier to the conclusion regarding RAD51 in the discussion which states that "...loss of ARID1a decreases DMC1 foci on the XY chromosomes without affecting RAD51" given that the provided RAD51 data are not rigorous. In the long-term it also would be interesting to quantitatively examine DMC1 and RAD51 focus formation on autosomes as well.

      Response: We agree with the reviewer’s comment and have made the recommended changes (Discussion, lines 518-519).

      Response to non-public recommendations

      Reviewer 2:

      Comment: Meiotic arrest is usually judged based on testicular phenotypes. If mutant testes do not have any haploid spermatids, we can conclude that meiotic arrest is a phenotype. In this case, mutant testes have haploid spermatids and are fertile. The authors cannot conclude meiotic arrest. The mutant cells appear to undergo cell death in the pachytene stage, but the authors cannot say "meiotic arrest."

      Response: We disagree with this comment. By IF, we see that ~70% of the spermatocytes have deleted ARID1A. Furthermore, we never observed diplotene spermatocytes that lacked ARID1A. The conclusion that the absence of ARID1A results in a pachynema arrest and that the escapers produce the haploid spermatids is firm.

      Comment: Fig. S2 and S3 have wrong figure legends.

      Response: The figure legends for Fig. S2 and S3 are correct.

      Comment: The authors do not appear to evaluate independent mice for scoring (the result is about 74% deletion above, Table S1). Sup S2: how many independent mice did the authors examine?

      Response: These were Sta-Put purified fractions obtained from 14-15 WT and mutant mice. It is difficult to isolate pachytene spermatocytes by Sta-Put at the required purity in sufficient yields using one mouse at a time. We used three technical replicates to quantify the band intensity, and the error bars represent the standard error of the mean (S.E.M.) of the band intensity.

      Comment: "Comparison of cKO and wild-type littermate yielded nearly identical results (Avg total conc WT = 32.65 M/ml; Avg total conc cKO = 32.06 M/ml)". This sounds like a negative result (i.e., no difference between WT and cKO).

      Response: This is correct. There is no difference between Arid1aWT and Arid1aCKO sperm production. This is because wild-type haploid gametes produced were derived from spermatocyte precursors that have escaped Cre-mediated activity (Fig. S4). These data merely serve to highlight an inherent caveat of our conditional knockout model and are not intended to support the main conclusion that ARID1A is necessary for pachytene progression.

      Comment: The authors now admit ~ 70 % efficiency in deletion, and the authors did not show the purity of these samples. If the purity of pachytene spermatocytes is ~ 80%, the real proportion of mutant cells can be ~ 56%. It is very difficult to interpret the data.

      Response: The original submission did refer to inefficient Cre-induced recombination. The reviewer asked for the % efficiency, which was provided in the revised version. Also, please refer to Fig. S2, where Western blot analysis demonstrates a significant loss of ARID1A protein levels in CKO relative to WT pachytene spermatocyte populations that were used for CUT&RUN data generation.
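      For reference, the ~56% figure in the comment above follows from multiplying the two proportions. The short sketch below reproduces that arithmetic; the 80% purity is the reviewer's hypothetical value rather than a measured one, and treating deletion efficiency and purity as independent is a simplifying assumption. The function name `mutant_fraction` is ours, for illustration only.

```python
def mutant_fraction(deletion_efficiency, pachytene_purity):
    """Fraction of cells in a sample that are both pachytene and ARID1A-deleted,
    assuming the two proportions are independent (a simplifying assumption)."""
    return deletion_efficiency * pachytene_purity

# ~70% deletion efficiency (reported) and ~80% purity (reviewer's hypothetical).
estimate = mutant_fraction(0.70, 0.80)
print(f"{estimate:.0%}")
```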

      Comment: The authors should not use the other study to justify their own data. The H3.3 ChIP-seq data in the NAR paper detected clear peaks on autosomes. However, in this study, as shown in Fig. S7A, the authors detected only 4 peaks on autosomes based on MACS2 peak calling. This must be a failed experiment. Also, S7A appears to have labeling errors.

      Response: I believe the reviewer is referring to supplementary figure 8A. Here, it is not clear which labeling errors the reviewer is referring to. In the wild type, the identified peaks were overwhelmingly sex-linked intergenic sites. This is consistent with the fact that H3.3 is hyper-accumulated on the sex chromosomes at pachynema.

      The authors of the NAR paper did not perform a peak-calling analysis using MACS2 or any other peak-calling algorithm. They merely compared the coverage of H3.3 relative to input. Therefore, it is not clear on what basis the reviewer says that the NAR paper identified autosomal peaks. Their H3.3 signal appears widely distributed over a 6 kb window centered at the TSS of autosomal genes, which, compared to input, appears enriched. Our data clearly demonstrates a less noisy and narrower window of H3.3 enrichment at autosomal TSSs in WT pachytene spermatocytes, albeit at levels lower than that seen in CKO pachytene spermatocytes (Fig S8B and see data copied below for each individual replicate). Moreover, the lack of peaks does not mean that there was an absence of H3.3 at these autosomal TSSs (Supp. Fig. S8B). Therefore, we disagree with the reviewer’s comment that the H3.3 CUT&RUN was a failed experiment.

      Author response image 1.

      H3.3 Occupancy at genes mis-regulated in the absence of ARID1A

      Comment: If the author wishes to study the function of ARID2 in spermatogenesis, they may need to try other cre-lines to have more robust phenotypes, and all analyses must be redone using a mouse model with efficient deletion of ARID2.

      Response: As noted, we chose Stra8-Cre to conditionally knockout Arid1a because ARID1A is haploinsufficient during embryonic development. The lack of Cre expression in the maternal germline allows for transmission of the floxed allele, allowing for the experiments to progress.

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: In many experiments, we have been quantitative when possible by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer 3:

      Comment: The Methods section refers to antibodies as being in Supplementary Table 3, but the table is labeled as Supplementary Table 2.

      Response: This has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Here we address the major points raised by the reviewers.

      Reviewer #1 (Public Review):

      Weaknesses:

      • The signaling pathway upstream of Maf1 remains unknown. In eukaryotes, Maf1 is a negative regulator of RNA pol III and is regulated by external signals via the TORC pathway. Since TORC components are absent in the apicomplexan lineage, one central question that remains open is how Maf1 is regulated in P. falciparum. Magnesium is probably not the sole stimulus involved, as suggested by the observation that Ile deprivation also down-regulates RNA pol III activity.

      We agree that there is still much to uncover relating to the PfMaf1 signaling pathway. While we still do not know each component, we have been able to link external factors (of course not limited to only magnesium) to the increased nuclear occupancy of PfMaf1. Other protein interactors that potentially regulate PfMaf1, while not confirmed, have been identified in plasma sample as candidates for future experiments to validate their potential involvement in RNA Pol III inhibition.

      • The study does not address why MgCl2 levels vary depending on the clinical state. It is unclear whether plasma magnesium is increased during asymptomatic malaria or decreased during symptomatic infection, as the study does not include control groups with non-infected individuals. Along the same line, MgCl2 supplementation in parasite cultures was done at 3mM, which is higher than the highest concentrations observed in clinical samples.

      This reviewer raised a valid point. The plasma magnesium levels for the wet symptomatic samples (averaging [0.79mM]) were within the normal range of a healthy individual (between [0.75-0.95mM]) while the dry asymptomatic levels were above the normal range (averaging [1.13mM]). Ideally, we would have liked to have control uninfected plasma samples from individuals from The Gambia. Unfortunately, field studies and human volunteer studies do not always have all the ideal controls that in vitro studies have. We recognize that [3mM] is higher than the normal range for magnesium levels, which is why we included a revised Supplementary Figure 3A. This figure shows that magnesium concentrations as low as [1mM] (similar to the levels found in dry asymptomatic samples) reduced the expression of RNA Pol III-transcribed genes.

      • Although the study provides biochemical evidence of Maf1 accumulation in the parasite nuclear fraction upon magnesium addition, this is not fully supported by the immunofluorescence experiments.

      We agree that the resolution of the IFA images is not sufficient to support the WB data. We believe that the importance of the IFA Supplementary Figure is to show that PfMaf1 clusters together in foci, which has not been previously reported.

      Reviewer #2 (Public Review):

      Weaknesses:

      However, most analyses are rather preliminary as only very few (3-5) candidate genes are analyzed by qPCR instead of carrying out comprehensive analyses with a large qPCR panel or RNA-seq experiments with GO term analyses. Data presentation lacks clarity, the number of biological replicates is rather low and the statistical analyses need to be largely revised. Although the in vivo data from wet (mildly symptomatic) and dry (asymptomatic) season parasites with different expression levels of Pol III-regulated genes, var genes, and MgCl2 are interesting, the link between the in vitro data and the in vivo virulence of P. falciparum, which is made in many sections of the manuscript, should be toned down. Especially since (i) the only endothelial receptor studied is CD36, which is associated with parasite binding during mild malaria, and (ii) several studies provide contradictory data on MgCl2 levels during malaria and in different disease states, which is not further discussed, but the authors mainly focused on this external stimulus in their experiments.

      We agree that, ideally, we would have liked to do full RNA-seq on The Gambia samples. However, that was out of the scope of this project. The RNA samples were limited which is why we did not use more primers. We believe that an appropriate number of replicates was done for the experiments. The wet symptomatic samples from this study were from mildly symptomatic individuals, as stated in the manuscript. Therefore, CD36 was a relevant receptor to use for our studies.

      We agree that the published studies about magnesium levels in infected individuals are not always consistent. What these studies do not consider is the time of year, whether the infection occurred during the dry or wet season. These studies were also done in different regions of the world using different technologies. For this reason, we only highlight the difference observed in our field study data from The Gambia.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) The signals upstream of Maf1 remain rather a black box. 4 are tested - heat shock and low-glucose, which seem to suppress ALL transcription; low-Isoleucine and high magnesium, which suppress Pol3. Therefore the authors use Mg supplementation throughout as a 'starvation type' stimulus. They do not discuss why they didn't use amino acid limitation, which could be more easily rationalised physiologically. It may be for experimental simplicity (no need for dropout media) but this should be discussed, and ideally, sample experiments with low-IsoLeu should be done too, to see if the responses (e.g. cytoadhesion) are all the same.

      We agree that deprivation of isoleucine would have been another experimental assay for our study, but it also would not have been as novel as magnesium. While understanding the exact mechanism or involvement of magnesium as a stress condition was beyond the scope of this manuscript, we believe that our data will be valuable in demonstrating that external stimuli act on P. falciparum virulence gene expression via RNA Pol III inhibition. Since we also had plasma level data for magnesium, and not isoleucine, we believed it made for a better external factor to use for our in vitro studies.

      (2) The proteomics, conducted to seek partners of Maf1, is probably the weakest part. From Figure S3: the proteins highlighted in the text are clearly highly selected (as ones that might be relevant, e.g. phosphatases), but many others are more enriched. It would be good to see the whole list, and which GO terms actually came top in enrichment.

      We apologize if the reviewer did not see the attached supplementary Co-IP MS data. The file includes all proteins found in each sample as well as GO term analysis. For the purpose of this work, we highlight proteins potentially involved in the canonical role of Maf1 that have been shown in model organisms to reversibly inhibit RNA Pol III (phosphatases, RNA Pol III subunits).

      (3) Figure 3 shows the Maf1-low line has very poor growth after only 5 days but it is stated that no dead parasites are seen even after 8 cycles and the merozoites number is down only ~18 to 15... is this too small to account for such poor growth (~5-fold reduced in a single cycle, day 3-5)? It would additionally be interesting to see a cell-cycle length assessment and invasion assay, to see if Maf1-low parasites have further defects in growth.

      We agree with the reviewer that the observed reduced merozoite numbers may not be the only cause of the reduced growth rate. Other factors in the PfMaf1 knock-down line may contribute to the observed poor growth.
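
      The reviewer's arithmetic can be made explicit with a short sketch. It uses only the numbers quoted in the comment (about 18 versus 15 merozoites, and a roughly 5-fold growth deficit in one cycle) and assumes, purely for illustration, that multiplication per cycle scales linearly with merozoite number and that invasion efficiency is unchanged. The variable names are hypothetical and the calculation is not taken from the manuscript.

```python
import math

# Merozoites per schizont, as quoted in the reviewer's comment.
merozoites_control = 18
merozoites_knockdown = 15

# Reviewer's rough estimate of the growth deficit within one cycle (day 3-5).
observed_fold_reduction = 5.0

# Assumption: per-cycle multiplication scales linearly with merozoite number.
fold_from_merozoites = merozoites_control / merozoites_knockdown

# Part of the single-cycle deficit left unexplained under that assumption.
unexplained_fold = observed_fold_reduction / fold_from_merozoites

# Cycles a merozoite-only defect would need to build up a 5-fold gap.
cycles_needed = math.log(observed_fold_reduction) / math.log(fold_from_merozoites)

print(f"Fold reduction explained by merozoite number: {fold_from_merozoites:.2f}")
print(f"Fold reduction left unexplained in one cycle: {unexplained_fold:.2f}")
print(f"Cycles needed for merozoite number alone to give 5-fold: {cycles_needed:.1f}")
```

      Under these assumptions the merozoite difference accounts for only about a 1.2-fold deficit per cycle, which is consistent with the view that other factors contribute to the poor growth.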

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The findings in this manuscript are important in the gene editing in human-derived hematopoietic stem and progenitor cells. By optimizing the delivery tool, adding DNA-PK inhibitor and including spacer-breaking silent mutations, the editing efficiency is significantly increased, and the heterozygosity could be tuned. The editing is even across the hematopoietic hierarchy.

      Strengths:

      The precise gene editing is important in gene therapy in vitro and in vivo. The manuscript provides solid evidence showing the efficacy and uniqueness of their gene editing approach.

      Weaknesses:

      There are several extended and unique points shown in this paper but in a specific cell population.

      The findings are indeed in a specific cell lineage, though it should be noted the editing crossed multiple cell types within that lineage. More importantly though, HSPC have substantial relevance to understanding adult stem cell biology, blood formation, and leukemia. Critically, they are also the target cells for a plethora of gene therapies for anemias, immunodeficiencies, metabolic disorders, and are also being explored for use with CAR technologies. Indeed, CRISPR-based gene therapy was recently approved for clinical use. As such, the findings here are of substantial relevance for multiple areas of research including hematology, stem cell biology, cancer, immunology and more.

      Reviewer #2 (Public Review):

      Summary:

      This work by Cloarec-Ung et al. sets out to uncover strategies that would allow for the efficient and precision editing of primitive human hematopoietic stem and progenitor cells (HSPCs). Such effective editing of HSPCs via homology directed repair has implications for the development of tractable gene therapy approaches for monogenic hematopoietic disorders as well as precise engineering of these cells for clinical regenerative and/or cell therapy strategies. In the setting of experimental hematology, precision introduction of disease relevant mutations would also open the door to more robust disease modeling approaches. It has been recognized that to encourage HDR, NHEJ as the dominant mode of repair in quiescent HSPCs must be inhibited. Testing editing of human cord blood HSPCs, the authors first incorporate a prestimulation phase, then identify optimal RNP amounts and donor types/amounts using standard editing culture conditions, identifying optimal concentrations of AAV and short single-stranded oligonucleotide donors (ssODNs) that yield minimal impacts to cell viability while still enabling heightened integration efficiency. They then demonstrate the superiority of AZD7648, an inhibitor of NHEJ-promoting DNA-PK, in allowing for much increased HDR with toxicities imparted by this compound reduced substantially by siRNAs against p53 (mean targeting efficiencies at 57 and 80% for two different loci). Although AAV offered the highest HDR frequencies, differing from ssODN by a factor of ~2, the authors show that spacer breaking sequence mutations introduced into the ssODN to better mimic the disruption of the spacer sequence provided by the synthetic intron in the AAV backbone yielded ssODN HDR frequencies equal to that attained by AAV. 
By examining editing efficiency across specific immunophenotypically identified subpopulations they further suggest that editing efficiency with their improved strategy is consistent across stem and early progenitors and use colony assays to quantify an approximate 4-fold drop in total colony numbers but no skewing in the potentiality of progenitors in the edited HSPC pool. Finally, the authors provide a strategy using mutation-introducing AAV mixed with different ratios of silent ssODN repair templates to enable tuning of zygosity in edited CD34+ cells.

      Strengths:

      The methods are clearly described and the experiments for the most part also appropriately powered. In addition to using state of the art approaches the authors also provided useful insights into optimizing the practicalities of the experimental procedures that will aid bench scientists in effectively carrying out these editing approaches, for example avoiding longer handling times inherent when scaling up to editing over multiple conditions.

      The sum of the adjustments to the editing procedure has yielded important advances towards minimizing editing toxicity while maximizing editing efficiency in HSPCs. In particular, the significant increase in HDR facilitated by the authors' described application of AZD7648 and the preservation of a pool of targeted progenitors is encouraging, as it suggests that functionally valuable cell types can be effectively edited.

      The discovery of the effectiveness of spacer breaking changes in ssODNs allowing for substantially increased targeting efficiency is a promising advance towards democratizing these editing strategies given the ease of designing and synthesizing ssODNs relative to the production of viral donors.

      The ability to zygosity tune was convincingly presented and provides a valuable strategy to modify this HDR procedure towards more accurate disease modelling.
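
      The idea behind zygosity tuning can be illustrated with a deliberately simple model. The sketch below assumes that both alleles of a cell are repaired by HDR and that each allele independently uses the mutation-introducing donor with probability `mutant_share`. Both assumptions, the function name, and the example shares are hypothetical. In practice the share of HDR events that use each donor need not equal the input ratio of AAV to ssODN, so this is not the authors' analysis.

```python
def genotype_fractions(mutant_share):
    """Genotype fractions among cells with both alleles HDR-edited.

    Assumes each allele independently incorporates the mutant donor with
    probability `mutant_share` and the silent donor otherwise.
    """
    if not 0.0 <= mutant_share <= 1.0:
        raise ValueError("mutant_share must lie between 0 and 1")
    p = mutant_share
    return {
        "homozygous_mutant": p * p,
        "heterozygous": 2.0 * p * (1.0 - p),
        "homozygous_silent": (1.0 - p) * (1.0 - p),
    }


# Hypothetical donor-usage shares, chosen only to show the trend.
for share in (0.25, 0.5, 0.75):
    fractions = genotype_fractions(share)
    summary = ", ".join(f"{name}={value:.3f}" for name, value in fractions.items())
    print(f"mutant share {share:.2f}: {summary}")
```

      In this toy model the heterozygous fraction peaks when the two donors are used equally often, and it falls off as either donor dominates.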

      Weaknesses:

      Despite providing convincing evidence that functional progenitors can be successfully edited by their procedure, as the authors acknowledge it remains to be verified to what degree the self-renewal capacity and in vivo regenerative potential of the more primitive fractions is maintained with their strategy.

      As the 53BP1-based editing strategy, which also disrupts DNA-PK, has demonstrated maintained allele frequencies over engraftment time (De Ravin et al. Blood 2021), a transient disruption of DNA-PK shouldn’t compromise regenerative potential. Of course, we strongly agree that maintained regenerative potential is important in any editing strategy. As such, for the version of record we have added clonal LTC-IC assessment using conditions that we’ve previously demonstrated predict long-term repopulating potential (Knapp et al. Nat Cell Bio 2018). This data, which has been added to Figure 3, shows no significant reduction in the frequency of the most potent LTC-IC in edited cells compared to unedited controls.

      Assessment of the potential for off-target effects via the authors' approach was somewhat cursory and would have benefited from a more thorough evaluation.

      Once again in the 53BP1 strategy, the authors of that study already performed CHANGE-seq, long-range PCR, NGS, and SKY with inhibition of this same pathway without obvious increases in off-target editing (as long as HDR donor was present, though they did interestingly observe increased large deletions when HDR donors were absent, De Ravin et al. Blood 2021). Our tests here were designed to confirm that our molecule was similarly not affecting off-target editing rather than to launch a large-scale investigation. We agree, however, that off-targets and particularly structural re-arrangements that could be missed by other approaches remain a concern. We have added in nanopore sequencing of the predicted off-target sites and thus verified more deeply that there was no change (indeed no observable off-target activity) at any of these sites. This data has been added to Figure 2 and to a new supplementary Figure S5. Additionally, while it’s beyond the scope of the current manuscript, a focused follow-up dedicated to structural rearrangements downstream of both single and multiple edits is currently in progress and will be submitted separately later this year.

      Viability was assessed by live cell counting however given the short-term nature of the editing assay, more sensitive readouts of potentially compromised cell health could have provided a more stringent assessment of how the editing methodology impacted cell fitness.

      Of course, we agree that viable cell counting does not fully predict whether the cell is viable in terms of retained proliferative potential or other functional potentials. This point was addressed for myeloid progenitors at least by the CFC assays already in the manuscript, as to form a colony these cells were definitionally viable at input. Indeed, in these tests, we did see a reduction beyond that of the viable counts as already discussed in the text. Similarly, we already inadvertently answered this in the general CD34+CD45RA- population in Figure 4C where we measured clonal growth following editing with different mutant to silent donor ratios. In this instance we observed 30-40% clonogenic frequencies (Figure 4C), though in this case without a specific non-edited control (as this was not the intended question). Nonetheless, this would indicate that any general viability loss was no more than observed in the CFC tests (even if we assume 100% cloning efficiency if the cells had been unedited). Finally, the clonal LTC-IC assays show that while there is perhaps some loss in more committed progenitors, those with the highest self-renewal potential are not compromised in the edited condition compared to control (Figure 3I).
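
      The bound argued above can be written out as a short calculation. The sketch takes the 30-40% clonogenic frequencies quoted in the response and the approximate 4-fold drop in colony numbers mentioned in the public review, and adopts the authors' stated worst-case assumption of 100% cloning efficiency for unedited cells. The function and variable names are hypothetical.

```python
def max_fold_loss(clonogenic_frequency, assumed_unedited_efficiency=1.0):
    """Upper bound on fold loss in clonogenic cells.

    Assumes unedited cells would clone at `assumed_unedited_efficiency`
    (1.0 is the worst case named in the response).
    """
    if not 0.0 < clonogenic_frequency <= assumed_unedited_efficiency:
        raise ValueError("frequency must be positive and not exceed the assumed efficiency")
    return assumed_unedited_efficiency / clonogenic_frequency


# Values quoted in the text: 30-40% clonogenic frequency, ~4-fold CFC drop.
observed_frequencies = (0.30, 0.40)
cfc_fold_drop = 4.0

for frequency in observed_frequencies:
    bound = max_fold_loss(frequency)
    print(f"clonogenic frequency {frequency:.0%}: loss at most {bound:.2f}-fold "
          f"(CFC drop ~{cfc_fold_drop:.0f}-fold)")
```

      Even under the worst-case assumption the implied loss stays below the roughly 4-fold CFC reduction, which matches the reasoning in the response.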

      Recommendations for the authors

      Reviewer #2 (Recommendations For The Authors):

      It will be important to include the author-provided new paragraph in the discussion to contextualize this work in the existing HSPC editing landscape and your unique findings.

      A new paragraph detailing how our manuscript fits with other recently published works is now included in the discussion.

      The legend for Figure 3 needs correction. Panel E is incorrectly labeled as panel D and panel F is incorrectly labeled as panel E.

      Thank you for catching this typo. It has been fixed.

      In Figure 4 axis headings in panel C and D require clarity beyond simply titles of "Mean Frequency".

      These axis labels have been clarified.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this letter, we respond to each of the reviewers’ comments. We support responses by referring to the revised manuscript and, where necessary, by including additional descriptions and analyses that we consider extrinsic to the manuscript itself. In this letter, all changes to the manuscript are shown in blue. As noted, the displayed figures have been added to the manuscript or the SI. We believe that we have successfully addressed all comments and that the quality of our paper has improved significantly.

      Comment 1: In addition to the technical comments by the reviewers, I would encourage the authors to discuss the dependency of their observations, e.g. emergence of microphase separation, not only on the sequence of the polypeptides, but also on the solution conditions. Similarly, the distributions of ions in the condensate bulk, interphase, and diluted phase, and hence the interfacial free energy, are significantly affected both by the chemical composition of the condensate and the salt concentration itself, see: https://pubs.acs.org/doi/10.1021/acs.nanolett.1c03138

      We thank the editor for this suggestion. Here, we have focused on the effect of sequence on condensate organization. We agree that how changes in solution conditions affect condensates, including microphase separation of ELPs, is potentially interesting as well. We note this as a possible future direction at multiple places in the revised Conclusions and Discussion:

      “The simulations successfully reproduced condensate stability variation upon amino acid substitution. While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature [cite] or salt [cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      “Such a microenvironment arises from the collective behavior of many proteins, can deviate from that of individual chains, and is likely sensitive to the solution conditions,[cite] which are held constant in our study. Future work on systems with double amino acid substitutions or changes to salt concentration or temperature could elucidate the generality of the mean field interpretation and the additivity of individual contributions.”

      Response to referee 1

      Comment 0: This is an interesting, informative, and well-designed study that combines theoretical and experimental methodologies to tackle the phenomenon of higher-resolution structures/substructures in model biomolecular condensates. The results should be published. However, there is significant room for improvement in the presentation and interpretation of the results. As it stands, the precise definition of “frustration,” which is a main theme of this manuscript (as emphasized in the title), is not sufficiently well articulated. This situation should be rectified to avoid “frustration” becoming a “catch-all” term without a clear perimeter of applicability rather than a precise, informative description of the physical state of affairs. There are also a few other concerns, e.g., regarding interpretation of correlation of phase-separation critical temperature and transfer free energy of amino acid residues as well as the difference between critical temperature and onset temperature, and the way the simulated configurations are similar to that of gyroids.

      We want to thank the reviewers for their insightful comments. We revised the manuscript extensively to improve its clarity and to address the reviewers’ concerns. In the following, we provide point-to-point responses to all the comments.

      Comment 1: It is accurately pointed out on p.4 that elastin-like polypeptides (ELPs) undergo heat-induced phase separation and therefore exhibit lower critical solution temperatures (LCSTs). But it is not entirely clear how this feature is reproduced by the authors’ simulation. A relationship between simulated surface tension and “transition temperature” is provided in Fig.1C; but is the “transition temperature” (authors cited ref.41 by Urry) the same as critical temperature? Apparently, Urry’s Tt is a “critical onset temperature”, the temperature when phase separation happens at a given polymer concentration. This is different from the (global) critical temperature LCST - though the two may be correlated-or not-depending on the shape of the phase boundary. Moreover, is the MOFF coarse-grained forcefield (first step in the multi-scale simulation), by itself, capable of reproducing heat-induced phase separation in a way similar to the forcefield of Dignon et al., ACS Cent Sci 5, 821-830 (2019)? Or is this temperature-dependent effect appearing only subsequently, after the implementation of the MARTINI and/or all-atom steps? Clarification is needed. To afford a more informative context for the authors’ introductory discussion, the aforementioned Dignon et al. work and the review by Cinar et al. [Chem Eur J 25, 13049-13069 (2019)], both touching upon the physical underpinning of the LCST feature of elastin, should also be cited along with refs.41-43.

      We thank the reviewer for their comment. First, we apologize for the lack of clarity between the global lower critical solution temperature, Tc, and the transition temperature, Tt. We have modified the manuscript to be more explicit that the transition temperature we utilize is dependent on the solution conditions, instead of the global lower critical solution temperature.

      Author response image 1.

      Tt as a function of concentration for ELP[V5A2G3] constructs of different chain lengths. Logarithmic fits to the data for each construct using Eq. 1 are also shown. It is evident that the different curves converge to the critical temperature Tc at the critical concentration Cc. Figure reproduced from ref.[2] CC BY 4.0.

      However, as shown by Chilkoti and coworkers [1, 2] and in Author response image 1, the critical temperature of ELPs Tc is indeed linearly related to Tt with the following relationship

      The above equation highlights the dependence of Tt on the chain length (length) and polymer concentration (conc). The parameter Cc is the corresponding theoretical polypeptide concentration that would be required to achieve Tc, and k is the proportionality constant. Instead of making computationally expensive predictions of condensate critical temperatures, we focused on the surface tension, which can be more readily determined from single constant temperature simulations as detailed in the Methods section. This decision was made so to make it computationally feasible to systematically probe the properties of all 20 amino acids in diblock ELPs in our multiscale model. Furthermore, an expected relationship between the critical temperature and the surface tension can be inferred based on the Flory Huggins theory. In particular, relationships between the Flory Huggins parameter, χ, and interfacial tension (τ) have been investigated, and the relationship can be approximated as

      where α is a positive constant, whose exact value depends on the proximity of χ to the critical value of χ necessary for phase separation (χC).[3, 4] As detailed in new Supplemental Theory of the Supporting Information, for systems undergoing LCST,

      with Therefore, we have

      Several conclusions can be drawn from Eq. 4. First, for α = 1, τ is linearly proportional to Tc. Secondly, τ decreases at larger values for Tc since trend that is consistent with results presented in Figure 1 of the main text. Finally, as detailed in the Supplemental Theory, the inverse relationship between τ and Tc is only expected for systems exhibiting LCSTs. For systems with UCST, τ increases at larger Tc. Therefore, reproducing the correct trend supports the model’s ability to capture the temperature-dependent effect specific to the ELP system.
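
      The dependence of Tt on chain length and concentration described above can be explored numerically. The sketch below assumes the logarithmic form Tt = Tc + (k / length) * ln(Cc / conc), which is consistent with the symbols defined in the response and with the statement that the curves converge to Tc at Cc, but it is written here as an assumption rather than copied from the manuscript. All parameter values are hypothetical and chosen only to show the trend.

```python
import math


def transition_temperature(t_c, k, length, c_c, conc):
    """Tt under the assumed form Tt = Tc + (k / length) * ln(Cc / conc)."""
    if length <= 0 or c_c <= 0 or conc <= 0:
        raise ValueError("length, c_c and conc must be positive")
    return t_c + (k / length) * math.log(c_c / conc)


# Hypothetical parameters (not fitted to any data set).
T_C = 20.0    # critical temperature, degrees Celsius
K = 150.0     # proportionality constant, degrees Celsius x residues
C_C = 1000.0  # critical concentration, micromolar

for length in (30, 60, 120):          # chain length in residues
    for conc in (10.0, 100.0, 1000.0):  # polymer concentration, micromolar
        t_t = transition_temperature(T_C, K, length, C_C, conc)
        print(f"length={length:4d} conc={conc:7.1f} uM -> Tt={t_t:6.2f} C")
```

      With these illustrative values, every chain length returns Tt = Tc when the concentration equals Cc, and shorter chains show a steeper rise in Tt as the solution is diluted.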

      We modified the text to define the physical meaning of Tt more explicitly. Furthermore, we added a new section in the Supporting Information titled Supplemental Theory to detail the relationship between Tt, Tc, the Flory-Huggins parameter χ, and the surface tension τ. The updated text now reads:

      “Utilizing the simulated condensate conformations, we computed various quantities to benchmark against experimental measurements. While the critical temperature has been widely used as a measure for condensate stability, determining it computationally is expensive. As an alternative, we computed the surface tension, τ, using 100-µs-long MARTINI simulations performed with the NPNAT ensemble.[cite] As detailed in the Supplemental Theory in the Supporting Information, an inverse relationship is expected between τ and the critical temperature, Tc, for systems exhibiting LCSTs. We further approximate Tc with the transition temperatures (Tt) of ELP sequences,[cite] which are the temperatures at which ELPs undergo an LCST transition at a specified solution condition. Tt was shown to be linearly proportional to Tc.[cite] As expected, a negative correlation can be readily seen between computed surface tension and experimental Tt (Fig. 1C). This observed negative correlation between Tt and τ supports the simulation approach’s accuracy in reproducing the sequence-dependent changes in ELP phase behavior.”

      The reviewer is correct that MOFF does not explicitly account for temperature-dependent effects in its interaction parameters. But as mentioned above and indicated by the reviewer, the following steps with explicit solvent simulations in the multiscale strategy succeed in capturing sequence-dependent differences in ELP systems, which are evident in both transition temperature and surface tension.

      We cited the two references suggested by the reviewer in the introduction. We further added the following text in the discussion section to suggest explicitly exploring temperature-dependent effects as an interesting future direction.

      “While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature[cite] or salt[cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      Comment 2: “Frustration” and ”frustrated” are used prominently in the manuscript to characterize certain observed molecular configurations (11 times total, in both the title and in the abstract). Apparently, it is the most significant conceptual pronouncement of this work, hence its precise meaning is of central importance to the authors’ thesis. Whereas one should recognize that the theoretical and experimental observations are striking without invocation of the “frustration” terminology, usage of the term can be useful if it offers a unifying conceptual framework. However, as it stands, a clear definition of the term “frustration” is lacking, leaving readers to wonder what molecular configurations are considered “frustrated” and what are not (i.e., is the claim of observation of frustration falsifiable?). For instance, “frustrated microphase separation” appears in both the title and abstract. A logical question one may ask is: “Are all microphase separations frustrated”? If the answer is in the affirmative, does invocation of the term “frustration” add anything to our physical insight? If the answer is not in the affirmative, then how does one distinguish between microphase separations that are frustrated from those that are not frustrated? Presumably all simulated and experimental molecular configurations in the present study are those of lowest free energy for the given temperature. In other words, they are what they are. In the discussion about frustrated phase separation on p.13, for example, the authors appear to refer to the fact that chain connectivity is preventing hydrophobic residues to come together in a way to achieve the most favorable interactions as if there were no chain connectivity (one may imagine in that case all the hydrophobic residues will form a large cluster without microphase separation). Is this what the authors mean by “frustration”? 
If that’s true, isn’t that merely stating the obvious, at least for the observed microphase separation? In general, does “frustration” always mean deviation of actual, physical molecular configurations from certain imagined/hypothetical/reference molecular configurations, and therefore dependent upon the choice of the imagined reference configuration? If this is how the authors apply the term “frustration” in the present work, what is the zero-frustration reference state/configuration for microphase separation? And, similarly, what is the zero-frustration reference state/configuration when frustrated EPS-water interactions are discussed (p.14-p.15, Fig.5)? How do non-frustrated water-protein interactions look like? Is the classic clathrate-like organization of water hydrogen bonds around small nonpolar solute “frustrated”?

      We thank the reviewer for their insightful comment, and agree that the concept of “frustration” is both important to our conclusions and, upon review, is too vague in our previous draft of the manuscript.

      For conceptual simplicity and to maximize transferability to real biological systems, we will focus our discussion of frustration on one specific type, which we term “chain frustration.” Chain frustration occurs in states where tertiary interactions between chemically distinct polymer blocks favor phase separation, while chain connectivity prevents macroscopic phase separation from occurring.[5] This frustration leads to microphase separation with microdomains of different monomers.

      We agree with the reviewer that “all microphase separations” are frustrated, and have revised the title to

      “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Furthermore, we also removed frustration from the abstract to read

      “The interspersion of hydrophilic and hydrophobic residues and a lack of secondary structure formation result in an interfacial environment, which explains both the strong correlation between ELP condensate stability and interfacial hydrophobicity scales, as well as the prevalence of protein-water hydrogen bonds.”

      We have limited our discussion of frustration to the incomplete separation of hydrophobic and hydrophilic groups. As pointed out by the reviewer, in this case, frustration refers to the fact that chain connectivity prevents hydrophobic residues from coming together to achieve the most favorable interactions, as they would if there were no chain connectivity. The reference would be a perfect macroscopic phase separation that partitions hydrophobic from hydrophilic groups.

      While the frustration from chain connectivity is well understood for block copolymers[5], its effect on producing the interfacial solvation environment, to the best of our knowledge, has not been emphasized before. We have revised the text at the point where we mention frustration to clearly define its meaning.

      “Therefore, while microphase separation occurs in ELP condensates, frustration remains in the system. Hydrophilic residues cannot completely separate from hydrophobic ones due to constraints imposed by the amino acid sequence, creating unique microenvironments.”

      When discussing the interactions between ELP and water, we used the hydrogen bond analysis to emphasize the interfacial environment. For example, the hydrophobic residues tend to “repel” water molecules, reducing the hydrogen bond density; on the other hand, hydrophilic residues and the backbone retain water molecules. This difference resulted in the positive and negative correlations with Tt shown in Fig 5C. The behavior of water molecules is, therefore, inhomogeneous inside the condensate. We expect water molecules to become frustrated due to the simultaneous contact with both hydrophobic and hydrophilic chemical groups, and a perfect reference state would be the pure water environment. However, since this point is not central to our study, to avoid confusion, we have avoided mentioning frustration and revised the text to read

      “The water hydrogen bond density also highlights an interfacial environment of blended hydrophobic and hydrophilic regions.”

      After revising the text, frustration only appears three times in the manuscript.

      Comment 3: In the discussion about the correlation of various transfer free energy scales for amino acids and Urry’s critical onset temperature (ref.41) on p.11 and Fig.4, is there any theoretical relationship to be expected between the interactions among amino acids of ELPs and their critical onset temperatures? While a certain correlation may be intuitively expected if the free energy scale ”is working”, is there any theoretical insight into the mathematical form of this relationship? A clarifying discussion is needed because it bears logically on whether the observed correlation or lack thereof for different transfer energy scales is a good indication of the adequacy of the energy scales in describing the actual physical interactions at play. This question requires some prior knowledge of the expected mathematical relationship between interaction parameters and onset temperature.

      We thank the reviewer for their comment. The exact relationship between the interactions between amino acids and their transition temperature can be understood in terms of the Flory-Huggins theory, which describes the thermodynamics of polymer mixtures using a lattice model. The chemical composition of the mixture is built into the polymer-solvent interaction parameter

      where z is the coordination number, T is the temperature, kB is the Boltzmann constant, and {ϵpp, ϵss, ϵps} are the strengths of the polymer-polymer, solvent-solvent, and polymer-solvent interactions, respectively.[6]

      From the original derivation of Flory-Huggins theory, it can be shown that phase separation occurs when χ is greater than its critical value, χC. From this condition, we can derive the critical temperature as

      Δϵ can indeed be interpreted as the free energy cost of transferring a polymer bead from a solution phase to a polymer phase. It corresponds to the change of energy from a mixed state, with contacts between polymer and solvent (ϵps), to the demixed state with only polymer-polymer (ϵpp) and solvent-solvent (ϵss) contacts.

      Therefore, the transfer free energy, and the interactions among amino acids of ELPs, are expected to correlate with the critical temperature. The above discussion has been incorporated into the new section Supplemental Theory in the Supporting Information. There, we also discuss the more general scenario where Δϵ is temperature dependent, which is essential for giving rise to LCST.
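
      For reference, the relations described above can be sketched in their standard lattice form, with z the coordination number. The sign convention chosen for Δϵ and the closed-form χc for a chain of N segments in a monomeric solvent are textbook assumptions here, and may differ in sign or prefactor from the expressions given in the Supplemental Theory:

```latex
\chi = \frac{z}{k_B T}\left[\epsilon_{ps} - \frac{1}{2}\left(\epsilon_{pp} + \epsilon_{ss}\right)\right]
     \equiv \frac{z\,\Delta\epsilon}{k_B T},
\qquad
\chi_c = \frac{1}{2}\left(1 + \frac{1}{\sqrt{N}}\right)^{2},
\qquad
\chi > \chi_c \;\Longleftrightarrow\; T < T_c = \frac{z\,\Delta\epsilon}{k_B\,\chi_c}
\quad (\Delta\epsilon > 0,\ \text{independent of } T).
```

      Under this convention a larger Δϵ gives a higher Tc, and the demixed region lies below Tc; as noted above, an LCST only appears once Δϵ is allowed to depend on temperature.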

      We have modified the main text in the discussions of Figure 4 to better explain these mathematical relationships and their necessary assumptions in order to help interpret our simulations. Here is an excerpt from where we discuss Figure 4:

      “The strong dependence of molecular organization on amino acid hydrophobicity suggests that the solvation environment of individual residues might be a determining factor for condensate stability. Indeed, as shown in the Supplemental Theory of the Supporting Information, the critical temperature is closely related to the free energy cost of transferring polymer beads from a solution state to a polymer-only environment. This transfer free energy is often used to quantify the hydrophobicity of amino acids [cite]. To explore their relationship more quantitatively, we compared the transition temperature for ELP condensates measured by Urry [cite] to several hydrophobicity scales.”

      Comment 4: To provide a more comprehensive context for the present study, it is useful to compare the microphase separation seen in the authors’ simulation with the micelle-like structures observed in recent simulated condensed/aggregated states of hydrophobic-polar (HP) model sequences in Statt et al., J Chem Phys 152, 075101 (2020) [see esp. Fig.6] and Wessén et al., J Phys Chem B 126, 9222-9245 (2022) [see, e.g., Fig.10].

      We thank the reviewer for this suggestion. The results of Statt et al. and Wessén et al. indeed provide a nice comparison to our results. While we capture some of the same behavior they observe, the full array of chemical space in our model seems to give some additional morphologies as well.

      First, as predicted by self-consistent field theory, block copolymers are expected to form primarily lamellar-like micelles that clearly separate the dense and dilute phases when the volume fraction, f, is 0.5 (Response to Comment 5). This prediction is indeed consistent with results from simulations with the HP model, and is consistent with our simulations when the substituted amino acid, X, is sufficiently polar.

      However, this observation is only one of several behaviors we observe. In particular, our simulations also produce gyroid-like structures, which are predicted to emerge at small volume differences, i.e. f ≈ 0.4 or f ≈ 0.6. These different configurations likely emerge due to the more realistic representation of amino acids in our model, which presents more frustration than the HP model. In particular, the backbone atoms are inherently hydrophilic and cannot separate from the hydrophobic side chains. Therefore, under microphase separation, it is inherently difficult to separate the different chemical groups to form lamellar or micelle-like structures. This produces a condensate interior with interfacial properties that may not be captured by the HP model.

      We make note of the micelle-like topologies predicted by HP models in the revised text, citing both Statt et al. and Wessén et al.:

      “Surprisingly, microphase separation did not produce lamellar morphology as expected for block copolymers with equal volume fraction of the two blocks (Fig. S3 in the Supporting Information) [cite]. In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      Comment 5: ”Gyroid-like morphology” is mentioned several times in the manuscript (p.4, p.8, p.17, Fig.S3). This is apparently an interesting observation, but a clear explanation is lacking. A more detailed and specific discussion, perhaps with additional graphical presentations, should be provided to demonstrate why the simulated condensed-phase ELP configurations are similar to the classical description of gyroid as in, e.g., Terrones & Mackay, Chem Phys Lett 207, 45-50 (1993) and Lambert et al., Phil Trans R Soc A 354, 2009-2023 (1996).

      We thank the reviewer for their comment. Gyroids are canonical structures for diblock copolymers.[5, 7, 8, 9] Their stability is predicted using self-consistent field theory (SCFT), and occurs due to the balance of the volume fraction of polymer block A (fA), the length of the polymer (N), and the Flory-Huggins interaction parameter (χ).[8, 9] The prediction from SCFT suggests that gyroids occur at smaller values of χN and values fA near, but not equal to 0.5 (Author response image 2).[10] We hypothesize that these configurations emerge at equal molar fraction of V and X amino acids due to small differences in solvation volume between each half of the polymer chain.

      Our support for gyroid-like structures is mainly from observations of two interpenetrating networks formed by the two ELP blocks. We have revised Figure S4 to clearly highlight the two networks as shown in Author response image 3.

      We have revised the main text to clearly define the gyroid-like structures as interpenetrating networks, and added the theoretical phase diagram of diblock copolymers predicted by SCFT as Figure S3 in the Supporting Information.

      “In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      We note, however, that proving that our observations are indeed gyroid structures requires more sophisticated mathematical analysis that is beyond the scope of the study. It is also possible that these structures are metastable in our simulations. We emphasize these caveats in the updated Discussion Section.

      “Further studies on the thermodynamic stability of these morphologies and comparing them with predictions from the self-consistent field theory shall provide more insights into the driving forces for their emergence [cite].”

      Author response image 2.

      Theoretical phase diagram[8] and corresponding morphologies for diblock copolymers. The phases are labeled as: body centered cubic (BCC), hexagonal cylinders (HEX), gyroid (GYR), and lamellar (LAM). fA is the volume fraction of a single polymer block, denoted A, χ is the Flory-Huggins interaction parameter, and N is the total degree of polymerisation. Figure reproduced from ref.[10] CC BY 4.0.

      Author response image 3.

      Representative configurations of (A) V5F5 and (B) V5L5 condensates from MARTINI simulations. The valine-substituted half of the chain is colored blue (V5) and the X-substituted half of the chain is colored red (X5). To highlight the interpenetrating networks formed by the two halves, only the X-substituted half of the chain is shown on the left. Simulation interfaces are repeated once periodically in the positive x and positive y dimensions for clarity. High-density regions formed by multiple X-substituted chain halves are highlighted with yellow circles, with one of the chains shown in green.

      Response to referee 2

      Comment 1: The experimental characterization relies on BODIPY and SBD reporting, respectively, on viscosity and polarity. The fluorescent signal of these dyes can possibly depend on many other factors, including quenching. Additional controls are required, or a more extensive discussion with additional references, and a mention to potential limitations of this approach.

      We agree with the reviewer that the fluorescence lifetime signal can be affected by many factors. Compared with the fluorescence intensity, the fluorescence lifetime depends mainly on the intrinsic properties of the dye and on environmental factors. BODIPY and SBD have been used in biological systems to detect the microviscosity and micropolarity of condensates. Our group has used the same SBD and BODIPY fluorophores in previous work to quantify the microenvironment of protein aggregates and condensates. The extended data in those studies (ChemBioChem 20:1078–1087. doi: 10.1002/cbic.201800782; Aggregate 4:e301. doi:10.1002/agt2.301; Nat Chem Biol 1–9. doi:10.1038/s41589-023-01477-1) provide evidence that BODIPY is sensitive only to viscosity and SBD only to polarity, and that both are insensitive to other environmental factors. As for quenching, fluorophores with extended pi-rich structures display an aggregation-caused quenching (ACQ) effect at high probe concentration, which lowers the fluorescence lifetime and intensity. We usually labeled ELPs at a 20% molar ratio using NHS-ester fluorophores to prepare stock solutions. Because of the limited labeling efficiency, the actual labeling ratio is much lower than 20%. The labeled ELP stock solution was then mixed with unlabeled ELP to obtain ELP solutions with low labeling fractions. We measured ELPs labeled with different fractions of dyes. The results show that only BODIPY exhibits a slight ACQ effect at a high

      Author response image 4.

      FLIM images of ELP condensates labeled with different fractions of dyes. A) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% BODIPY labels. B) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% fraction of SBD. Droplets were formed with a final concentration of 70 µM ELP labeled with different fractions of BODIPY or SBD in 2 M NaCl solution. Scale bar: 5 µm.

      To largely avoid the potential ACQ effect while retaining sufficient fluorescence signal, we used ELPs labeled with a lower fraction of dyes, 1% BODIPY and 2.5% SBD, for the final FLIM experiments. The data in Figure 3 will be corrected with the following data.

      Author response image 5.

      Structures of NHS-BODIPY and NHS-SBD, and representative FLIM images of V30A30, A30V30, V30G30 and G30V30 labeled with respective fluorophores. The fluorescence lifetime of each image is the average acquired from three independent experiments. Scale bar: 5 µm.

      We revised the text in the section Microphase separation of ELP condensates as follows: “To experimentally test the microphase separation behavior uncovered in simulations, we studied the micro-physicochemical properties of the V-end and X-end of the peptides. We constructed diblock peptides with the combination of 30 pentameric repeats of V block and X (A or G) block, namely V30A30 and V30G30 (Experimental Sequences Section in the Supporting Information). The amino-termini of V30A30 and V30G30 sequences were subsequently labeled with environmentally sensitive BODIPY or SBD fluorophores [cite], whose lifetime could be measured to quantify the viscosity or polarity of the V-end (Fig. 3A, left panel) [cite]. These probes have been reported to be only sensitive to single physicochemical properties.[cite] To avoid artifacts induced by fluorophore labeling, we usually used ELPs labeled with a low fraction of dyes. We also constructed A30V30 and G30V30 diblock peptides, wherein the viscosity or polarity of the A-end or the G-end could be measured by fluorophores that are attached at the amino-terminus (Fig. 3A, right panel). Using FLIM, we found that the lifetime of BODIPY for the V-end (5.43 ns) was longer than that for the A-end (4.35 ns), suggesting that the V-end indeed has a higher microviscosity than the A-end (ηV= 2233.54 cp vs ηA= 969.57 cp). Accordingly, the lifetime of SBD was longer for the V-end (8.75 ns) than the A-end (7.00 ns), indicating that the micropolarity of the V-end was lower than the A-end (ϵV= 13.25 vs ϵA = 18.97). These observations could be largely attributed to the greater extent of dehydration at the V-end due to its higher local peptide density. We further showed that the observed differences are not results of possible artifacts arising from any subtle distinctions between the two sequences V30A30 and A30V30 (Experimental Characterization of ELP Condensates Section in the Supporting Information, Fig. S8-S9 in the Supporting Information). Similar results were observed using the V-G sequences. FLIM experiments revealed that the V-end was more viscous than the G-end (ηV= 2972.72 cp vs ηG= 1958.60 cp) and the V-end was less polar than the G-end (ϵV= 9.14 vs ϵG = 27.50). These experimental observations provided the first line of evidence to support the microphase separation, as suggested by the simulation results.”

      We revised the text in the section Experimental methods as follows

      “The proteins of interest were labeled with NHS ester fluorophores. We used ELPs with 1% BODIPY labels or 2.5% SBD labels to form condensates, which avoids the artifacts induced by fluorophores. Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl for V-A and 1.5 M (NH4)2SO4 for V-G diblocks, respectively. A drop of the droplet-containing solution was placed on a 0.17 mm coverslip with a 500 µm spacer. Images were acquired on a Leica Falcon Fluorescence Microscope equipped with a Wil pulse laser and a 63X/0.12 oil-immersion objective. The BODIPY was excited at 488 nm and the SBD was excited at 448 nm. The fluorescence lifetime fitting and image analysis were performed in LAS X and ImageJ.”

      We also used a lower concentration of free dyes to remeasure the properties of the ELP condensates. The Figure S9 data are corrected as follows. The slight differences between the results are attributed to experimental error and do not affect the conclusion.

      Author response image 6.

      FLIM images of unlabeled ELP condensates. A) Chemical structure of the free fluorophores, which can measure the physicochemical properties of condensates without labeling. B) Representative FLIM images of V30A30 and A30V30. The mix is the mixture of V30A30 (35 µM) and A30V30 (35 µM). Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl solution with 1 µM fluorophore. C) Representative FLIM images of V30G30 and G30V30. Droplets were formed with a final concentration of 70 µM ELP in 1.5 M (NH4)2SO4 solution with 1 µM fluorophore. The mix is the mixture of V30G30 (35 µM) and G30V30 (35 µM). Scale bar, 5 µm. The fluorescence lifetime of each image is the average from three independent measurements.

      We also revised the Sequence dependence of micro-viscosity and polarity section of the Supporting Information as follows

      “Since we used V30X30 and X30V30 to quantify the V- and X-end of the V-X blocks, it is possible that the observed differences arose from the innate property of the V30X30 and X30V30 sequences. To rule out this artifact, we formed the ELP condensates with sequences of V30X30, X30V30, or the V30X30 and X30V30 mixture. The condensates were subsequently treated with the aldehyde-BODIPY and methyl-ester SBD fluorophores without the NHS ester reactive warhead (Fig. S9A in the Supporting Information). After brief incubation, aldehyde-BODIPY and methyl-ester SBD fluorophores were recruited into and homogeneously distributed in the ELP condensates. The fluorescence lifetime of aldehyde-BODIPY was the same for V30A30 (4.96 ns), A30V30 (4.99 ns), and their mixture (4.98 ns) (Fig. S9B in the Supporting Information, upper panel). Interestingly, this value is around the average (4.89 ns) of the A-end (4.35 ns) and the V-end (5.43 ns) labeled with NHS-BODIPY. For the SBD measurement, methyl-ester SBD resulted in almost identical lifetime values for V30A30 (8.25 ns), A30V30 (8.27 ns), and their mixture (8.28 ns) (Fig. S9B in the Supporting Information, lower panel), again around the average value (7.88 ns) of the A-end (7.00 ns) and the V-end (8.75 ns) labeled with NHS-SBD. In addition to the V-A blocks, similar observations were made for the V-G blocks with the V30G30 and G30V30 sequences (Fig. S9C in the Supporting Information). The slight difference between the results is attributed to experimental error. Because the fluorophores did not covalently label the amino-terminus of the ELP peptides, their lifetime reports a value closer to the averaged property of the condensates than to the microscopic property of the V-end or the X-end, provided the number of molecules is sufficient and the molecular distribution has no preference.

      Our results reveal that the V30X30 and X30V30 condensates exhibited similar macroscopic viscosity or polarity, suggesting that the previously observed different viscosity or polarity of V30X30 and X30V30 could be attributed to the microscopic property of the V-end or X-end.”
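
      As a quick arithmetic check of the averages quoted in the revised Supporting Information text, the following minimal Python sketch recomputes the mean of the two end-labeled lifetimes for each dye and compares it with the free-dye values. All numbers are taken from the passage above; the variable names are illustrative only.

```python
# Lifetimes (ns) quoted in the text for the V-A diblocks.
end_labeled = {            # covalently attached NHS probes: (A-end, V-end)
    "BODIPY": (4.35, 5.43),
    "SBD": (7.00, 8.75),
}
free_dye = {               # non-covalent probes: (V30A30, A30V30, mixture)
    "BODIPY": (4.96, 4.99, 4.98),
    "SBD": (8.25, 8.27, 8.28),
}

summary = {}
for dye, (a_end, v_end) in end_labeled.items():
    end_average = (a_end + v_end) / 2                       # mean of the two termini
    free_mean = sum(free_dye[dye]) / len(free_dye[dye])     # mean over the three samples
    summary[dye] = {
        "end_average": end_average,
        "free_mean": free_mean,
        "offset": free_mean - end_average,
    }
    print(f"{dye}: end average {end_average:.3f} ns, "
          f"free-dye mean {free_mean:.3f} ns, "
          f"offset {free_mean - end_average:+.3f} ns")
```

      The end averages come out at 4.89 ns for BODIPY and 7.875 ns for SBD (quoted as 7.88 ns), and the free-dye means lie between the A-end and V-end values in both cases, within about 0.1 ns (BODIPY) and 0.4 ns (SBD) of those averages.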

      FLIM combined with environment-sensitive fluorophores is a powerful tool for investigating the physicochemical properties of the microenvironment within the condensates. However, there are some limitations to this method. Because the fluorophore is attached to the protein, we can only detect the microenvironment immediately surrounding the probe (the distance may be at the ångström level). The fluorescence signals we obtained are statistical averages over the complex microenvironments. The signal from the probes is determined by the sampling position, orientation, and number of fluorescent probes. The quantified values can therefore be compared in relative terms, but they cannot accurately describe the physical or chemical states in different systems. In addition, the resolution of FLIM experiments is not sufficient to directly resolve the microstructure in condensates.

      Comment 2: It is unclear if, after the application of stretching, the micro-structure will eventually return to the original configuration or not. Overall, the point of this experiment remains somewhat unclear.

      We thank the reviewer for this comment. The ELP condensates are viscous fluids that can coalesce into larger droplets within seconds. Due to their high viscosity, ELP condensates show slow fluorescence recovery after photobleaching. When the condensates are stretched, their microstructure changes in response to the external force, and the fluorophores may be pulled out of their microenvironment. For such a dynamic system, we speculate that the microstructure will return to its original state once the condensate re-equilibrates, which may be a slow process. However, it is hard to determine whether the microstructure has completely returned to its original configuration. The purpose of this experiment is to probe the microenvironment properties of each terminus from another angle. The experiment also provides evidence that the microenvironment around the V terminus is denser than that around the A terminus.

      Comment 3: The title is too generic and does not reflect the content of the work. There is no analysis of biological condensates. The results are specific to di-block polypeptides with specific sequences. This should be clearly specified in text and title.

      We have revised the title to “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Comment 4: MD is out of the expertise of this reviewer. However, when looking at the density profiles (Figure S2), the simulation does not seem to be fully converged. The densities fluctuate inconsistently along the Z direction. The authors should comment on assessing simulation convergence. In many cases, the section used for the density values in the plot (i.e., below 0.06 box lengths away from the condensate center) does not seem representative of the dense phase. It should be justified, why these simulations can still be used for density/hydrogen bonding analysis.

      We thank the reviewer for their comment, and agree that convergence of MD simulations is simultaneously important and difficult to control for. To demonstrate the convergence of our simulations, we have taken an example system (V5F5) and reproduced the density profile in 4 unique time windows of 50 ns each (Author response image 7A-D). We find that all distributions are nearly identical, indicating that further extending these simulations is unlikely to change our findings.

      While we agree that the choice of 0.06 box lengths is arbitrary, it was chosen as an approximation for the interior of the condensate, where the more hydrophobic half of the protein chain tends to be at higher concentration. However, this choice is not important to our overall conclusion. Halving (Author response image 7E) or doubling (Author response image 7F) the cutoff maintains the inverse correlation between the protein density of the X5 half of the condensate and experimental transition temperature.

      Finally, in our multiscale simulation approach, the all-atom portion of the simulation is mostly used to examine water structure and protein solvation. We can see that dividing the simulation into four independent time windows does not substantially change these properties, resulting in low standard deviations in Figure 5 and Figure 6. Similarly, our previous work on the dielectric of ELP condensates has shown that choosing different starting structures from MARTINI simulations is unlikely to affect the estimate of similar quantities.[11]
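
      The window-based convergence check and the cutoff-sensitivity test described above can be illustrated with a minimal Python sketch. This is not the analysis script used for the manuscript: the function names are hypothetical, the input is a synthetic stand-in for a trajectory (half-normal distances from the condensate center, equal masses), and only NumPy is assumed.

```python
import numpy as np

def windowed_profiles(z_frac, masses, n_windows=4, n_bins=50):
    """Mass-weighted density profile along |z| for consecutive time windows.

    z_frac : (n_frames, n_atoms) distances from the condensate center,
             in units of the box length (0 to 0.5).
    masses : (n_atoms,) atomic masses.
    """
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    profiles = []
    for frames in np.array_split(z_frac, n_windows, axis=0):
        weights = np.broadcast_to(masses, frames.shape)
        hist, _ = np.histogram(frames.ravel(), bins=edges, weights=weights.ravel())
        profiles.append(hist / hist.sum())  # normalize so windows are comparable
    return edges, np.array(profiles)

def max_window_deviation(profiles):
    """Largest absolute difference between any window and the window average."""
    return float(np.abs(profiles - profiles.mean(axis=0)).max())

def fraction_within(edges, profile, cutoff):
    """Mass fraction located within `cutoff` box lengths of the center."""
    inside = edges[1:] <= cutoff + 1e-12  # tolerance for floating-point bin edges
    return float(profile[inside].sum())

# Synthetic stand-in for a trajectory: 200 frames, 500 atoms (assumed values).
rng = np.random.default_rng(0)
z_frac = np.clip(np.abs(rng.normal(0.0, 0.08, size=(200, 500))), 0.0, 0.5)
masses = np.full(500, 12.0)

edges, profiles = windowed_profiles(z_frac, masses)
deviation = max_window_deviation(profiles)
fractions = [fraction_within(edges, profiles.mean(axis=0), c) for c in (0.03, 0.06, 0.12)]

print("max deviation between windows:", deviation)
print("mass fraction within 0.03, 0.06, 0.12 box lengths:", fractions)
```

      A small deviation between windows indicates that the profile is stationary over the sampled time, and evaluating the same quantity at half and double the cutoff is a simple way to see whether a downstream correlation depends on that arbitrary choice.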

      Author response image 7.

      Checking convergence of all-atom simulations of ELP condensates. (A-D) The relative mass density along the Z-distance from the condensate center is shown for the V-substituted and X-substituted halves of V5F5 in four independent time windows of 50 ns each. The Z−axis is defined as the direction perpendicular to the condensate-water interface. The dashed line represents a Z-distance of 0.06 box lengths away from the condensate center, which was the original cutoff for correlation analysis. E-F) Correlation between the mass fraction of the X5 half of the condensate and transition temperature (Tt) from Urry.[12] The condensate is defined as having a Z-distance of 0.03 box lengths (E) or 0.12 box lengths (F) away from the condensate center. ρ is the Pearson correlation coefficient between the two data sets, and the dashed diagonal line is the best fit line. Error bars represent standard deviations of the mean taken over box length intervals of 0.01.

      References

      (1) McDaniel JR, Radford DC, Chilkoti A (2013) A unified model for de novo design of elastin-like polypeptides with tunable inverse transition temperatures. Biomacromolecules 14:2866–2872.

      (2) Meyer DE, Chilkoti A (2004) Quantification of the effects of chain length and concentration on the thermal behavior of elastin-like polypeptides. Biomacromolecules 5:846–851.

      (3) Helfand E, Tagami Y (1972) Theory of the interface between immiscible polymers. J. Chem. Phys. 56:3592.

      (4) Roe RJ (1975) Theory of the interface between polymers or polymer solutions. I. Two components system. J. Chem. Phys. 62:490–499.

      (5) Shi AC (2021) Frustration in block copolymer assemblies. J. Phys. Condens. Matter 33.

      (6) Flory PJ (1942) Thermodynamics of high polymer solutions. J. Chem. Phys. 10:51.

      (7) Grason GM (2006) The packing of soft materials: Molecular asymmetry, geometric frustration and optimal lattices in block copolymer melts. Phys. Rep. 433:1–64.

      (8) Matsen MW, Bates FS (1996) Unifying weak- and strong-segregation block copolymer theories. Macromolecules 29:1091–1098.

      (9) Matsen MW, Schick M (1994) Stable and unstable phases of a diblock copolymer melt. Phys. Rev. Lett. 72:2660–2663.

      (10) Swann JM, Topham PD (2010) Design and application of nanoscale actuators using block-copolymers. Polymers 2:454–469.

      (11) Ye S et al. (2023) Micropolarity governs the structural organization of biomolecular condensates. Nat. Chem. Biol. pp 1–9.

      (12) Urry DW (1997) Physical chemistry of biological free energy transduction as demonstrated by elastic protein-based polymers. J. Phys. Chem. B 101:11007–11028.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper identifies GABA cells in the preoptic hypothalamus which are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show more calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.

      We thank the reviewer for the detailed feedback and thoughtful comments on how to improve our manuscript. To address the reviewer’s concerns, we revised our discussion and added new data. Below, we address the concerns point by point.

      Points that could be addressed or discussed:

      (1) The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.

      We demonstrate that POA GAD2→TMN cells become more frequently activated as the pressure for REMs builds up, whereas inhibiting these neurons during high REMs pressure leads to a suppression of the REMs rebound. It is not known how POA GAD2→TMN cells encode increased REMs pressure and subsequently influence the REMs rebound. REMs deprivation was shown to change the intrinsic excitability of hippocampal neurons and impact synaptic plasticity (McDermott et al., 2003; Mallick and Singh, 2011; Zhou et al., 2020). We speculate that increased REMs pressure leads to an increase in the excitability of POA→TMN neurons, reflected in the increased number of calcium peaks. The increased excitability of POA GAD2→TMN neurons in turn likely leads to stronger inhibition of downstream REM-off neurons. Consequently, as soon as REMs deprivation stops, there is an increased chance of entering REMs. The time it takes for POA excitability to resettle to its baseline consequently sets a permissive time window during which increased amounts of REMs can recover the lost amount. For future studies, it would be interesting to map how quickly the excitability of POA neurons increases or decays as a function of the lost or recovered amount of REMs and to unravel the cellular mechanisms underlying the elevated activity of POA GAD2→TMN neurons during high REMs pressure, e.g., whether changes in the expression of ion channels contribute to increased excitability of these neurons (Donlea et al., 2014). As we mentioned in the Discussion, the POA also projects to other REMs regulatory brain regions such as the vlPAG and LH. Therefore, it remains to be tested whether POA GAD2→TMN neurons also innervate these brain regions to potentially regulate REMs homeostasis. We explicitly state this now in the revised Discussion.

      (2) The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not logically see how this was a relevant finding to REM rebound or the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?

      We acknowledge that other types of neurons in the TMN may also be involved in the REMs rebound, and therefore inhibition of histamine neurons by POA GAD2 →TMN neurons may not be the sole source of the observed effect. To stress that other neurons within the TMN and/or brain regions may also contribute to the REMs rebound, we have revised the Results section.

      We performed complementary optogenetic inhibition experiments of TMN HIS neurons to investigate if suppression of these neurons is sufficient to promote REMs. We found that SwiChR++ mediated inhibition of TMN HIS neurons increased the amount of REMs compared with recordings without laser stimulation in the same mice and eYFP mice with laser stimulation. Thus, while TMN HIS neurons may not be the only downstream target of GABAergic POA neurons, these data suggest that they contribute to REMs regulation. We have incorporated these results in Fig. S4.

      We further investigated whether the activity of TMN HIS neurons changes between two REMs episodes. Assuming that REMs pressure inhibits the activity of REM-off histamine neurons, their firing rates should be highest right after REMs ends when REMs pressure is lowest, and progressively decay throughout the inter-REM interval, and reach their lowest activity right before the onset of REMs (Park et al., 2021), similar to the activity profile observed for vlPAG REM-off neurons (Weber et al., 2018). We indeed found that TMN HIS neurons display a gradual decrease in their activity throughout the inter-REM interval and thus potentially reflect the build-up of REM pressure (Fig. S2F).

      (3) It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep-inhibiting histamine neurons (Chung et al). And some of these POA cells will also be involved in NREM sleep homeostasis (e.g. Ma et al Curr Biol)? Is NREM sleep rebound necessary before getting REM sleep rebound? Indeed, can these two things (NREM and REM sleep rebound) be separated?

      Previous studies have demonstrated that POA GABAergic neurons, including those projecting to the TMN, are involved in NREMs homeostasis (Sherin et al., 1998; Gong et al., 2004; Ma et al., 2019) . Therefore, we predict that POA neurons that are involved in NREMs homeostasis are a subset of POA GAD2 → TMN neurons in our manuscript.

      Using optrode recordings in the POA, we recently reported that 12.4% of neurons sampled have higher activity during NREMs compared with REMs; in contrast, 43.8% of neurons sampled have the highest activity during REMs compared with NREMs (Antila et al., 2022), indicating that the proportion of NREM max neurons is smaller compared with REM max neurons. These proportions of neurons are in agreement with previous results (Takahashi et al., 2009). Considering fiber photometry monitors the average activity of a population of neurons as opposed to individual neurons, it is possible that we recorded neural activity across heterogeneous populations and therefore our findings may disguise the neural activity of the low proportion of NREMs neurons. We previously reported the spiking activity of POA GAD2→TMN neurons at the single-cell level (Chung et al., 2017). We have noted in the manuscript that while the activity of POA GAD2→TMN neurons is highest during REMs, the neural activity increases at NREMs→REMs transitions, indicating these neurons are also active during NREMs.

      Using our REMs restriction protocol, we selectively restricted REMs leading to the subsequent rebound of REMs without affecting NREMs and consequently we did not find an increase in the amount of NREMs during the rebound or an increase in slow-wave activity, a key characteristic of sleep rebound that gradually dissipates during recovery sleep (Blake and Gerard, 1937; Williams et al., 1964; Rosa and Bonnet, 1985; Dijk et al., 1990; Neckelmann and Ursin, 1993; Ferrara et al., 1999) . However, during total sleep deprivation when subjects are deprived of both NREMs and REMs, isolating NREMs and REMs rebound may not be attainable.

      (4) Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?

      POA can be subdivided into anatomically distinct regions such as medial preoptic area, median preoptic area, ventrolateral preoptic area, and lateral preoptic area (MPO, MPN, VLPO, and LPO respectively). To quantify where the virus expressing GAD2 cells and optic fibers are located within the POA, we overlaid the POA coronal reference images (with red boundaries denoting these anatomically distinct regions) over the virus heat maps and optic fiber tracts from datasets used in Figure 1A. We found that virus expression and optic fiber tracts were located in the ventrolateral POA, lateral POA, and the lateral part of medial POA, and included this description in the text.

      Author response image 1.

      Location of virus expression (A) and optic fiber placement (B) within subregions of POA.

      (5) It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?

      Single-cell RNA-sequencing of POA neurons has revealed an enormous level of molecular diversity, consisting of nearly 70 subpopulations based on gene expression, of which 43 can be clustered into inhibitory neurons (Moffitt et al., 2018). One of the most studied subpopulations of POA sleep-active neurons contains the inhibitory neuropeptide galanin (Sherin et al., 1998; Gaus et al., 2002; Chung et al., 2017; Kroeger et al., 2018; Ma et al., 2019; Miracca et al., 2022). Galanin neurons have been demonstrated to innervate the TMN (Sherin et al., 1998), yet within the galanin neurons 7 distinct clusters exist based on unique gene expression (Moffitt et al., 2018). In addition to galanin, we have previously performed single-cell RNA-seq on POA GAD2→TMN neurons and identified additional neuropeptides such as cholecystokinin (CCK), corticotropin-releasing hormone (CRH), prodynorphin (PDYN), and tachykinin 1 (TAC1) as markers of subpopulations of GABAergic POA sleep-active neurons (Chung et al., 2017; Smith et al., 2023). Like galanin neurons, the neurons expressing these neuropeptides can also be divided into multiple subtypes (Chen et al., 2017; Moffitt et al., 2018). Thus, while the molecular markers for POA neurons are immensely diverse, we agree that characterizing the molecular identity of POA GAD2→TMN neurons and investigating the functional relevance of these neuropeptides in the context of REMs homeostasis would enrich our understanding of a neural circuit involved in REMs homeostasis and can stand as a separate extension of this manuscript.

      Reviewer #2 (Public Review):

      Maurer et al investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that this cell-type studied becomes active shortly (≈40sec) prior to entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells inhibits REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data makes a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.

      We thank the reviewer for the thorough assessment of our study and supportive comments. We have addressed your concerns in the revised manuscript, and our point by point response is provided below.

      The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.

      We thank the reviewer for this comment and agree that it would be interesting to know how REMs changes for a longer period of time throughout the rebound phase. For Fig. 2, we did not record the subsequent hours. For Fig 4, we recorded the subsequent rebound between ZT7.5 and 10.5. When we compare the REMs amount during this 4 hr interval, the SwiChR mice have less REMs compared with eYFP mice with marginal significance (unpaired t-test, p=0.0641). We also plotted the cumulative REMs amount during restriction and rebound phases, and found that the cumulative amount of REMs was still lower in SwiChR mice than eYFP mice at ZT 10.5 (Author response image 2). Therefore, it will be interesting to record for a longer period of time to test when the SwiChR mice compensate for all the REMs that was lost during the restriction period.

      Author response image 2.

      Cumulative amount of REMs during REMs deprivation and rebound combined with optogenetic stimulation in eYFP and SwiChR groups. This data is shown as bar graphs in Figure 4.

      REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear if the role of the manipulated neurons in REM sleep regulation hold at other circadian times of the day.

      Author response image 3.

      Inhibiting POA GAD2→TMN neurons at ZT5-8 reduces REMs. (A) Schematic of optogenetic inhibition experiments. (B) Percentage of time spent in REMs, NREMs and wakefulness with laser in SwiChR++ and eYFP mice. Unpaired t-tests, p = 0.0013, 0.0469 for REMs and wake amount. (C) Duration of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0113 for NREMs duration. (D) Frequency of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0063, 0.0382 for REMs and NREMs frequency.

      REMs propensity is largest towards the end of the light phase (Czeisler et al., 1980; Dijk and Czeisler, 1995; Wurts and Edgar, 2000). As a control, we therefore performed the optogenetic inhibition experiments of POA GAD2→TMN neurons during ZT5-8 (Author response image 3). Similar to our results in Figure 2, we found that SwiChR-mediated inhibition of POA GAD2→TMN neurons attenuated REMs compared with eYFP laser sessions. These findings suggest our results are consistent at other circadian times of the day.

      The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice).

      The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?

      The effect size for REM sleep deprivation is now added in the text.

      It is important to note that these figures are analyzing two different intervals of the REMs restriction. In Fig. S4D, we analyzed the total amount of REMs over the entire 6 hr restriction interval (ZT1.5-7.5). In Fig. 4, we analyzed the amount of REMs only during the last 3 hr of restriction (ZT4.5-7.5) as optogenetic inhibition was performed only during the last 3 hrs when the REMs pressure is high. In Fig. S4D, we looked at the amount of REMs during ZT1.5-4.5 and 4.5-7.5 and found that the amount of REMs during ZT4.5-7.5 (4.46 ± 0.25 %; mean ± s.e.m.) is indeed higher than ZT 1.5-4.5 (1.66 ± 0.62 %), and is comparable to the amount of REMs during ZT4.5-7.5 in eYFP mice (5.95 ± 0.52 %) in Fig. 4. We now clearly state in the manuscript at which time points we analyzed the amount, duration and frequency of REMs.
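
      As a quick consistency check on the figures quoted above, the two 3 hr halves of the restriction interval can be combined into a 6 hr value. This is only a sketch: it uses the two means reported in this response, assumes both halves come from the same animals and contribute equally in time, and the helper name is hypothetical. The result is close to the roughly 3% that the reviewer read from Fig. S4D.

```python
# Mean REMs percentages quoted in the response for the REMs restriction group.
rem_zt1p5_4p5 = 1.66  # % of time in REMs during ZT1.5-4.5
rem_zt4p5_7p5 = 4.46  # % of time in REMs during ZT4.5-7.5


def time_weighted_mean(percentages, durations_hr):
    """Combine per-interval percentages into one percentage over the whole span.

    Hypothetical helper: each interval is weighted by its duration in hours.
    """
    total_time = sum(durations_hr)
    weighted = sum(p * d for p, d in zip(percentages, durations_hr))
    return weighted / total_time


# Both halves last 3 hr, so this reduces to a simple average of the two means.
rem_full_restriction = time_weighted_mean(
    [rem_zt1p5_4p5, rem_zt4p5_7p5], [3.0, 3.0]
)
print(f"REMs over ZT1.5-7.5: {rem_full_restriction:.2f} %")
```

      Under these assumptions, the apparent doubling between Fig. S4D and Fig. 4 follows from comparing a 6 hr average with the higher-REMs last 3 hr.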

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) A few further citations suggested: Discussion "The TMN contains histamine producing neurons and antagonizing histamine neurons causes sleepiness..." It would be appropriate to cite Uygun DS et al 2016 J Neurosci (PMID: 27807161) here. Using the same HDC-Cre mice as used by Maurer et al., Uygun et al found that selectively increasing GABAergic inhibition onto histamine neurons produced NREM sleep.

      We apologize for omitting this important paper. In the revised manuscript, we added this citation.

      (2) Materials and Methods.

      Although the JAX numbers are given for the mouse lines based on researchers generously donating to JAX for others to use, please cite the papers corresponding to the GAD2-ires-Cre and HDC-ires-Cre mouse lines deposited at JAX.

      GAD2-ires-Cre was described in Taniguchi H et al., 2011, Neuron (PMID: 21943598).

      The construction of the HDC-ires-CRE line is described in Zecharia AY et al J Neurosci et al 2012 (PMID: 22993424).

      We have now added these important citations in the revised manuscript.

      (3) Similarly, for the viruses, please provide the citations for the AAV constructs that were donated to Addgene.

      We have now added these citations in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors rely heavily on their conclusions by using an optogenetic tool that inhibits the activity of GAD2+ neurons, however, it is not shown that these neurons are indeed inhibited as expected. An alternative approach to tackle this could be the application of a different technique to achieve the same output (e.g. chemogenetics). However, both experiments (confirmation of inhibition, or using a different technique) would require a significant amount of work, and given the numerous studies out there showing that these optogenetic tools tend to work, may not be necessary. Hence the authors could also cite a similar study that used a likewise construct and where it was indeed shown that this technique works (i.e. similar retrograde optogenetic construct with Cre dependent expression combined with electrophysiological recordings).

      This laser stimulation protocol was designed based on previous reports of sustained inhibition using the same inhibitory opsin and our prior results that recapitulate similar findings as inhibitory chemogenetic techniques (Iyer et al., 2016; Kim et al., 2016; Wiegert et al., 2017; Stucynski et al., 2022). We have now added this description in the Results section.

      Fig1A - Right: the virus expression graphs are great and give a helpful insight into the variability. The image on the left (GCAMP+ cells) is less clear, the GCAMP+ cells don't differentiate well from the background. Perhaps the whole brain image with inset in POA can show the GCAMP expression more convincingly.

      We have added a histology picture showing the whole brain image with inset in the POA in the updated Fig. 1A .

      Statistics: The table is very helpful. Based on the degrees of freedom, it seems that in some instances the stats are run on the recordings rather than on the individual mice (e.g. Fig1). It could be considered to use a mixed model where subjects are taken into account as a factor.

      Author response image 4.

      ΔF/F activity of POA GAD2→TMN neurons during NREMs. The duration of NREMs episodes was normalized in time, ranging from 0 to 100%. Shading, ± s.e.m. Pairwise t-tests with Holm-Bonferroni correction, p = 5.34e-4 between 80 and 100%. Gray bar, intervals where ΔF/F activity was significantly different from baseline (0 to 20%, the first time bin). n = 10 mice. In Fig. 1E, we ran stats based on the recordings. In this data set, we ran stats based on the individual mice, and found that the activity also gradually increased throughout NREMs episodes.
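
      The Holm-Bonferroni step-down correction named in this legend can be sketched as follows. This is a generic illustration and not the authors' analysis code: the function name is hypothetical, and of the four example p-values (one per time bin compared against the 0 to 20% baseline bin) only 5.34e-4 comes from the legend; the other three are made up for the example.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null hypothesis is rejected.

    Step-down procedure: sort p-values in ascending order and compare the k-th
    smallest (k = 1..m) against alpha / (m - k + 1); stop at the first failure.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank = k - 1
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are not rejected either
    return reject


# Hypothetical p-values for bins 20-40, 40-60, 60-80 and 80-100% vs. baseline.
# Only the last value (5.34e-4) is taken from the legend above.
example_p = [0.30, 0.04, 0.012, 5.34e-4]
significant = holm_bonferroni(example_p)
print(significant)
```

      Note that the 40-60% bin in this made-up example has p = 0.04, which would pass an uncorrected 0.05 threshold but not the corrected one; this is the situation the correction is meant to guard against.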

      There is an effect of laser in Fig2 on REM sleep amount, as well as an interaction effect with virus injection (from the table). Therefore, it would be helpful for the reader to also show REM sleep data from the control group (laser stimulation but no active optogenetics construct) in Fig 2.

      To properly control laser and virus effect, we performed the same laser stimulation experiments in eYFP control mice (expressing only eYFP without optogenetic construct, SwiChR++) and the data is provided in Fig 2C .

      Fig3B: At the start of the rebound of REM sleep, there is a massive amount of wakefulness, also reflected in the change of spectral composition. Could you comment on the text about what is happening here?

      We quantified the amount of wakefulness during the first hour of REMs rebound and found that indeed there is no significant difference in wakefulness between REM restriction and baseline control conditions ( Fig. S4H ). Therefore, while the representative image in Fig 3B shows increased wakefulness at the beginning of REMs rebound, we do not think the overall amount of wakefulness is increased.

      Fig 4, supplementary data: it would be helpful for the reader to have mentioned in the text the effect size of the REM sleep restriction protocol (e.g. mean and standard deviation).

      Thank you for this suggestion. We have now added the effect size for the REM sleep restriction experiments in the main text.

      REM sleep restriction and photometry experiment: could be improved by adding within the main body of text that, in order to conduct the photometry experiment in the last hours of REM sleep deprivation, the manual REM sleep deprivation had to be applied, because the vibrating motor technique disturbed the photometry recordings.

      Thank you for this suggestion. We have added the description in the main text.

      Suggestion to build further on the already existing data (not for this paper): you have a powerful dataset to test whether REM sleep pressure builds up during wakefulness or NREM sleep, by correlating when your optogenetic treatment occurs (NREM or wakefulness), with the subsequent rebound in REM sleep (see also Endo et al., 1998; Benington and Heller, 1994; Franken 2001).

      We thank the reviewer for this excellent suggestion. We plan to carry out this experiment in the future.

      References

      Antila, H., Kwak, I., Choi, A., Pisciotti, A., Covarrubias, I., Baik, J., et al. (2022). A noradrenergic-hypothalamic neural substrate for stress-induced sleep disturbances. Proc. Natl. Acad. Sci. 119, e2123528119. doi: 10.1073/pnas.2123528119.

      Blake, H., and Gerard, R. W. (1937). Brain potentials during sleep. Am. J. Physiol.-Leg. Content 119, 692–703. doi: 10.1152/ajplegacy.1937.119.4.692.

      Chen, R., Wu, X., Jiang, L., and Zhang, Y. (2017). Single-Cell RNA-Seq Reveals Hypothalamic Cell Diversity. Cell Rep. 18, 3227–3241. doi: 10.1016/j.celrep.2017.03.004.

      Chung, S., Weber, F., Zhong, P., Tan, C. L., Nguyen, T., Beier, K. T., et al. (2017). Identification of Preoptic Sleep Neurons Using Retrograde Labeling and Gene Profiling. Nature 545, 477–481. doi: 10.1038/nature22350.

      Czeisler, C. A., Zimmerman, J. C., Ronda, J. M., Moore-Ede, M. C., and Weitzman, E. D. (1980). Timing of REM sleep is coupled to the circadian rhythm of body temperature in man. Sleep 2, 329–346.

      Dijk, D. J., Brunner, D. P., Beersma, D. G., and Borbély, A. A. (1990). Electroencephalogram power density and slow wave sleep as a function of prior waking and circadian phase. Sleep 13, 430–440. doi: 10.1093/sleep/13.5.430.

      Dijk, D. J., and Czeisler, C. A. (1995). Contribution of the circadian pacemaker and the sleep homeostat to sleep propensity, sleep structure, electroencephalographic slow waves, and sleep spindle activity in humans. J. Neurosci. Off. J. Soc. Neurosci. 15, 3526–3538. doi: 10.1523/JNEUROSCI.15-05-03526.1995.

      Donlea, J. M., Pimentel, D., and Miesenböck, G. (2014). Neuronal machinery of sleep homeostasis in Drosophila. Neuron 81, 860–872. doi: 10.1016/j.neuron.2013.12.013.

      Ferrara, M., De Gennaro, L., Casagrande, M., and Bertini, M. (1999). Auditory arousal thresholds after selective slow-wave sleep deprivation. Clin. Neurophysiol. Off. J. Int. Fed. Clin. Neurophysiol. 110, 2148–2152. doi: 10.1016/s1388-2457(99)00171-6.

      Gaus, S. E., Strecker, R. E., Tate, B. A., Parker, R. A., and Saper, C. B. (2002). Ventrolateral preoptic nucleus contains sleep-active, galaninergic neurons in multiple mammalian species. Neuroscience 115, 285–294. doi: 10.1016/S0306-4522(02)00308-1.

      Gong, H., McGinty, D., Guzman-Marin, R., Chew, K.-T., Stewart, D., and Szymusiak, R. (2004). Activation of c-fos in GABAergic neurones in the preoptic area during sleep and in response to sleep deprivation. J. Physiol. 556, 935–946. doi: 10.1113/jphysiol.2003.056622.

      Iyer, S. M., Vesuna, S., Ramakrishnan, C., Huynh, K., Young, S., Berndt, A., et al. (2016). Optogenetic and chemogenetic strategies for sustained inhibition of pain. Sci. Rep. 6, 30570. doi: 10.1038/srep30570.

      Kim, H., Ährlund-Richter, S., Wang, X., Deisseroth, K., and Carlén, M. (2016). Prefrontal Parvalbumin Neurons in Control of Attention. Cell 164, 208–218. doi: 10.1016/j.cell.2015.11.038.

      Kroeger, D., Absi, G., Gagliardi, C., Bandaru, S. S., Madara, J. C., Ferrari, L. L., et al. (2018). Galanin neurons in the ventrolateral preoptic area promote sleep and heat loss in mice. Nat. Commun. 9, 4129. doi: 10.1038/s41467-018-06590-7.

      Ma, Y., Miracca, G., Yu, X., Harding, E. C., Miao, A., Yustos, R., et al. (2019). Galanin Neurons Unite Sleep Homeostasis and α2-Adrenergic Sedation. Curr. Biol. CB 29, 3315-3322.e3. doi: 10.1016/j.cub.2019.07.087.

      Mallick, B. N., and Singh, A. (2011). REM sleep loss increases brain excitability: role of noradrenaline and its mechanism of action. Sleep Med. Rev. 15, 165–178. doi: 10.1016/j.smrv.2010.11.001.

      McDermott, C. M., LaHoste, G. J., Chen, C., Musto, A., Bazan, N. G., and Magee, J. C. (2003). Sleep deprivation causes behavioral, synaptic, and membrane excitability alterations in hippocampal neurons. J. Neurosci. Off. J. Soc. Neurosci. 23, 9687–9695. doi: 10.1523/JNEUROSCI.23-29-09687.2003.

      Miracca, G., Anuncibay-Soto, B., Tossell, K., Yustos, R., Vyssotski, A. L., Franks, N. P., et al. (2022). NMDA Receptors in the Lateral Preoptic Hypothalamus Are Essential for Sustaining NREM and REM Sleep. J. Neurosci. 42, 5389–5409. doi: 10.1523/JNEUROSCI.0350-21.2022.

      Moffitt, J. R., Bambah-Mukku, D., Eichhorn, S. W., Vaughn, E., Shekhar, K., Perez, J. D., et al. (2018). Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362. doi: 10.1126/science.aau5324.

      Neckelmann, D., and Ursin, R. (1993). Sleep stages and EEG power spectrum in relation to acoustical stimulus arousal threshold in the rat. Sleep 16, 467–477.

      Park, S.-H., Baik, J., Hong, J., Antila, H., Kurland, B., Chung, S., et al. (2021). A probabilistic model for the ultradian timing of REM sleep in mice. PLOS Comput. Biol. 17, e1009316. doi: 10.1371/journal.pcbi.1009316.

      Rosa, R. R., and Bonnet, M. H. (1985). Sleep stages, auditory arousal threshold, and body temperature as predictors of behavior upon awakening. Int. J. Neurosci. 27, 73–83. doi: 10.3109/00207458509149136.

      Sherin, J. E., Elmquist, J. K., Torrealba, F., and Saper, C. B. (1998). Innervation of histaminergic tuberomammillary neurons by GABAergic and galaninergic neurons in the ventrolateral preoptic nucleus of the rat. J. Neurosci. Off. J. Soc. Neurosci. 18, 4705–4721.

      Smith, J., Honig-Frand, A., Antila, H., Choi, A., Kim, H., Beier, K. T., et al. (2023). Regulation of stress-induced sleep fragmentation by preoptic glutamatergic neurons. Curr. Biol. CB , S0960-9822(23)01585–3. doi: 10.1016/j.cub.2023.11.035.

      Stucynski, J. A., Schott, A. L., Baik, J., Chung, S., and Weber, F. (2022). Regulation of REM sleep by inhibitory neurons in the dorsomedial medulla. Curr. Biol. CB 32, 37-50.e6. doi: 10.1016/j.cub.2021.10.030.

      Takahashi, K., Lin, J.-S., and Sakai, K. (2009). Characterization and mapping of sleep-waking specific neurons in the basal forebrain and preoptic hypothalamus in mice. Neuroscience 161, 269–292. doi: 10.1016/j.neuroscience.2009.02.075.

      Weber, F., Hoang Do, J. P., Chung, S., Beier, K. T., Bikov, M., Saffari Doost, M., et al. (2018). Regulation of REM and Non-REM sleep by periaqueductal GABAergic neurons. Nat. Commun. 9, 1–13. doi: 10.1038/s41467-017-02765-w.

      Wiegert, J. S., Mahn, M., Prigge, M., Printz, Y., and Yizhar, O. (2017). Silencing Neurons: Tools, Applications, and Experimental Constraints. Neuron 95, 504–529. doi: 10.1016/j.neuron.2017.06.050.

      Williams, H. L., Hammack, J. T., Daly, R. L., Dement, W. C., and Lubin, A. (1964). RESPONSES TO AUDITORY STIMULATION, SLEEP LOSS AND THE EEG STAGES OF SLEEP. Electroencephalogr. Clin. Neurophysiol. 16, 269–279. doi: 10.1016/0013-4694(64)90109-9.

      Wurts, S. W., and Edgar, D. M. (2000). Circadian and homeostatic control of rapid eye movement (REM) sleep: promotion of REM tendency by the suprachiasmatic nucleus. J. Neurosci. Off. J. Soc. Neurosci. 20, 4300–4310. doi: 10.1523/JNEUROSCI.20-11-04300.2000.

      Zhou, Y., Lai, C. S. W., Bai, Y., Li, W., Zhao, R., Yang, G., et al. (2020). REM sleep promotes experience-dependent dendritic spine elimination in the mouse cortex. Nat. Commun. 11, 4819. doi: 10.1038/s41467-020-18592-5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Dormancy/diapause/hibernation (depending on how the terms are defined) is a key life history strategy that allows the temporal escape from unfavorable conditions. Although environmental conditions do play a major role in inducing and terminating dormancy (authors call this energy limitation hypothesis), the authors test a mutually non-exclusive hypothesis (life-history hypothesis) that sex-specific selection pressures, at least to some extent, would further shape the timing of these life-history events. Authors use a metanalytic approach to collect data (mainly on rodents) on various life-history traits to test trade-offs among these traits between sexes and how they affect entry and termination of dormancy.

      Strengths:

      I found the theoretical background in the Introduction quite interesting, to the point and the arguments were well-placed. How sex-specific selection pressures would drive entry and termination of diapause in insects (e.g. protandry), especially in temperate butterflies, is very well investigated. Authors attempt to extend these ideas to endotherms and trying to find general patterns across ectotherms and endotherms is particularly exciting. This work and similar evidence could make a great contribution to the life-history theory, specifically understanding factors that drive the regulation of life cycle timing.

      Weaknesses:

      (1) I felt that including 'ectotherms' in the title is a bit misleading as there is hardly any (in fact, any?) data presented on ectotherms. Also, most of the focus of the discussion is heavily mammal (rodent) focussed. I believe saying endotherms in the title as well is a bit misleading as the data is mammal-focused.

      We changed the title to: "Evolutionary trade-offs in dormancy phenology". This is a hybrid article comprising both a meta-analysis and a literature review. Each of these parts brings new elements to the hypotheses presented. The statistical analyses only concern mammals and especially rodent species. But the literature review highlighted links between the evolution of dormancy in ectotherms and endotherms that have not been linked in previous studies. We feel it is important for readers to know that much of the discussion will focus on the comparison of these two groups. But we understand that placing the term ectotherms in the title might suggest a meta-analysis including these two groups.

      In addition, we indicated more specifically in the abstract and at the end of the introduction that the article includes two approaches associated with different groups of animals.

      We also specified in the section "review criteria" that:

      Only one bird species is considered to be a hibernator, and no information is available on sex differences in hibernation phenology (Woods and Brigham 2004, Woods et al. 2019).

      We have also added a "study limitations" section, which explains that although the meta-analysis is limited by the data available in the literature, the information available for the species groups not studied seems to support our results.

      (2) I think more information needs to be provided early on to make readers aware of the diversity of animals included in the study and their geographic distribution. Are they mostly temperate or tropical? What is the span of the latitude as day length can have a major influence on dormancy timings? I think it is important to point out that data is more rodent-centric. Along the line of this point, is there a reason why the extensively studied species like the Red Deer or Soay Sheep and other well-studied temperate mammals did not make it into the list?

      We specified in the abstract and at the end of the introduction that the species studied in the meta-analysis are mainly Holarctic species. We have also added a map showing all the study sites used in the meta-analysis. Finally, we have noted in the methods, and explained in a "study limitations" section added at the end of the discussion, why some species were not studied in the meta-analysis and what this implies for the interpretation of the results.

      The hypotheses developed in this article are based on the survival benefits of seasonal dormancy thanks to a period of complete inactivity lasting several months. The Red Deer or Soay Sheep remain active above ground throughout the year.

      The effect of photoperiod on phenology is one of the mechanisms that has evolved to match an activity with the favorable condition. In this study, we are not interested in the mechanisms but in the evolutionary pressures that explain the observed phenology. Interspecific variation in the effect of photoperiod results from different evolutionary pressures, which we are trying to highlight. It is therefore not necessary to review mechanisms and effects of photoperiod, themselves requiring a lengthy review.

      We also tested the “physiological constraint hypothesis” on several variables. Temperature and precipitation are factors correlated with sex differences in phenology of hibernation. These factors allow consideration of the geographical differences that influence hibernation phenology.

      (3) Isn't the term 'energy limitation hypothesis' which is used throughout the manuscript a bit endotherm-centric? Especially if the goal is to draw generalities across ectotherms and endotherms. Moreover, climate (e.g. interaction of photoperiod and temperature in temperatures) most often induces or terminates diapause/dormancy in ectotherms so I am not sure if saying 'energy limitation hypothesis' is general enough.

      We renamed this hypothesis the "physiological constraint hypothesis" and we have made appropriate changes in the text so as not to focus physiological constraints solely on energy aspects.

      (4) Since for some species, the data is averaged across studies to get species-level trait estimates, is there a scope to examine within population differences (e.g. across latitudes)? This may further strengthen the evidence and rule out the possibility of the environment, especially the length of the breeding season, affecting the timing of emergence and immergence.

      For a given species, data on hibernation phenology are averaged for different populations, but also for the same population when measurements are taken over several years. To test these hypotheses on a population scale, precise data on reproductive effort would be needed for each population tested, but this concerns very few species (less than 5).

      Testing the effects of temperature and precipitation allows us to take into account the effects of climate on phenology.

      (5) Although the authors are looking at the broader patterns, I felt like the overall ecology of the species (habitat, tropical or temperate, number of broods, etc.) is overlooked and could act as confounding factors.

      Yes, that's why we also tested the physiological constraints hypothesis, including the effect of temperature and precipitation. For the life-history hypothesis, we also tested reproductive effort, which takes into account the number of offspring per year.

      (6) I strongly think the data analysis part needs more clarity. As of now, it is difficult for me to visualize all the fitted models (despite Table 1), and the large number of life-history traits adds to this complexity. I would recommend explicitly writing down all the models in the text. Also, the Table doesn't make it clear whether interaction was allowed between the predictors or not. More information on how PGLS were fitted needs to be provided in the main text which is in the supplementary right now. I kept wondering if the authors have fit multiple models, for example, with different correlation structures or by choosing different values of lambda parameter. And, in addition to PGLS, authors are also fitting linear regressions. Can you explain clearly in the text why was this done?

      To simplify the results, we reduced the number of models to just three: one for emergence and two for immergence. In place of Table 1, we have written the structure of the models used. We have added a sentence to the statistics section: “each PGLS model produces a λ parameter representing the effect of phylogeny ranging between 0 (no phylogeny effect) and 1 (covariance entirely explained by co-ancestry)”. We have tested only three PGLS models and the estimated lambda value for these models is 0.
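
      The λ parameter quoted above can be illustrated with a minimal sketch, assuming it is Pagel's λ, which rescales the off-diagonal (shared-ancestry) entries of the phylogenetic covariance matrix and leaves the diagonal untouched. The response does not name the transformation, so this is an assumption; the function name and the three-species matrix are hypothetical and are not taken from the study.

```python
def pagel_lambda_transform(vcv, lam):
    """Multiply off-diagonal covariances by lam; keep the diagonal variances unchanged."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical covariance matrix for three species (illustrative values only).
vcv = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]

no_signal = pagel_lambda_transform(vcv, 0.0)    # species treated as independent
full_signal = pagel_lambda_transform(vcv, 1.0)  # covariance as given by the tree
```

      With λ = 0 the matrix becomes diagonal, so under this assumption a PGLS fit with an estimated λ of 0 treats species as independent observations, as in an ordinary regression.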

      (7) Figure 2 is unclear, and I do not understand how these three regression lines were computed. Please provide more details.

      We tested new models and modified existing figures.

      Reviewer #2 (Public Review):

      Summary:

      An article with lots of interesting ideas and questions regarding the evolution of timing of dormancy, emphasizing mammalian hibernation but also including ectotherms. The authors compare selective forces of constraints due to energy availability versus predator avoidance and requirements and consequences of reproduction in a review of between and within species (sex) differences in the seasonal timing of entry and exit from dormancy.

      Strengths:

      The multispecies approach including endotherms and ectotherms is ambitious. This review is rich with ideas if not in convincing conclusions.

      Weaknesses:

      The differences between the sexes in the physiological requirements for gametogenesis, which affect the timing of heterothermy and the need for euthermy during mammalian hibernation, are significant issues that underlie, but are under-discussed in, this contrast of selective pressures that determine seasonal timing of dormancy. Some additional discussion of the effects of rapid climate change on between and within species phenologies of dormancy would have been interesting.

      Reviewer #2 (Recommendations For The Authors):

      This review provides a very interesting and ambitious among and within-species comparison of the seasonal timing of entry and exit from dormancy, emphasizing literature from hibernating mammals (sans bats and bears) and with attention to ectotherms. The authors test hypotheses related to the timing of food availability (energy) versus life history considerations (requirements for reproduction, avoiding predation) while acknowledging that these are not mutually exclusive. I offer advice for clarifications and description of the limitations of the data (accuracy of emergence and immergence times), but mainly seek more emphasis for small mammalian hibernators on the contrast for requirements for significant periods of euthermy prior to the emergence in males versus females, a contrast that has energetic and timing consequences in both the active and hibernation seasons.

      A consideration alluded to but not fully explained or discussed is the differences in mammals between species and sexes in the timing of what can be called ecological hibernation, which is the seasonal duration that an animal remains sequestered in its burrow or den, and heterothermic hibernation, between the beginning and end of the use of torpor. The two are not synonymous. When "emergence" is the first appearance above ground, there is a significant missing observation key to the energetic contrasts discussed in this review, that of this costly pre-emergence behavior.

      To explain the difference between heterothermic hibernation and ecological hibernation, we have added a passage to the "Review criteria" section of the Materials and Methods:

      “In this study, we addressed what can be called ecological hibernation, i.e. the seasonal duration that an animal remains sequestered in its burrow or den, which is assumed to be directly linked to the reduced risk of predation. In contrast, we did not consider heterothermic hibernation, which corresponds to the time between the beginning and end of the use of torpor. So when we mention hibernation, emergence or immergence, the specific reference is to ecological hibernation.”

      In arctic and other ground squirrel species, males remain at high body temperatures after immerging and remaining in their burrows in the fall for several days to a week, and more consistently and importantly, males that will attempt to breed in the spring end torpor but remain constantly in their burrows for as much as one month at great expense whilst undergoing testicular growth, spermatogenesis, spermiation, and sperm capacitation, processes that require continuous euthermy. Female arctic ground squirrels and non-breeding males do not and typically enter their first torpor bout 1-2 days after immergence and first appear above ground 1-3 days after their last arousal in spring.

      The weeks spent euthermic in a cold burrow in spring by males while undergoing reproductive maturation require a significant energetic investment (can equate to the cost of the previous heterothermic period) that contrasts profoundly with the pre-mating energetic investment by females.

      Males cache food in their hibernacula and extend their active season in late summer/fall in order to do so and feed from these caches in spring after resuming euthermy, often emerging at body weights similar to that at immergence. Similar between-sex differences in the timing of hibernation and heterothermy occur in golden-mantled and Columbian ground squirrels and likely most other Urocitellus spp., though less well described in other species. These differences are related to life histories and requirements for male vs. female gametogenesis and, at the same time, energetic considerations in the costs to males for remaining euthermic while undergoing spermatogenesis and the cost related to whether males undergo gonadal development being dependent on individual body mass and cache size. These issues should be better discussed in this review.

      It is the time required to complete spermatogenesis, spermiation, and maturation of sperm not the time for growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We took this comment into account. As we found little evidence of an increase in testicular maturation time with relative testicular size (apart from table 4 in Kenagy and Trombulak, 1986), we no longer tested the effect of relative testicular size on protandry.

      We examined whether the ability to store food before hibernation might reduce protandry. Although food storage in the burrow may be favored for overcoming harsh environments or predation, model selection did not retain the food-storing factor. Thus, the ability to accumulate food in the burrow was not by itself likely to keep males of some species from emerging earlier (e.g. Cricetus cricetus, protandry: 20 days, Siutz et al., 2016). Early emerging males may benefit from consuming higher quality food or from an advantage in competition with other males (e.g., dominance assertion or territory establishment, Manno and Dobson 2008).

      We developed these aspects in the discussion.

      While it is admirable to include ectotherms in such a broad review and modelling, I can't tell what data from how many ectothermic species contributed to the models and summary data included in the figures.

      Too few data on ectotherms were available to include them in the meta-analysis.

      Some consideration should be made to the limitations of the data extracted from the literature of the accuracy of emergence and immergence dates when derived from only observations or trapping data. The most accurate results come from the use of telemetry for location and data logging reporting below vs. above ground positioning and body temperature.

      We added a "Study limitations" section to the discussion to address the limitations raised in this comment.

      L64 "favor reproduction", better to say "allow reproduction", since there is strong evolutionary pressure to initiate reproduction early, often anticipating favorable conditions for reproduction, to maximize the time available for young to grow and prepare for overwintering themselves.

      Also, generally, it is not how "harsh" an environment is but rather how short the growing season is.

      We took this comment into account.

      L80 More simply, individuals that have amassed sufficient energy reserves as fat and caches to survive through winter may opt to initiate dormancy. This may decrease but not obviate predation, since hibernating animals are dug from their burrows and eaten by predators such as bears and ermine.

      In this sentence, we indicated a gap between dormancy phenology and the growing season, which suggests survival benefits of dormancy other than from a physiological point of view. We've changed the sentence to make it clearer: “However, some animals immerge into dormancy while environmental conditions would allow them (from a physiological point of view) to continue their activity, suggesting other survival benefits than coping with a short growing season”

      L88 other physiological or ecological factors.... (gametogenesis).

      In this study, we examine possible evolutionary pressures and therefore the environmental factors that may influence hibernation phenology. We focus on reproductive effort because, assuming predation pressure, we would expect a trade-off between survival and reproduction.

      L113 beginning early to afford long active seasons to offspring while not compromising the survival of parents.

      We added to the sentence:

      “For females, emergence phenology may promote breeding and/or care of offspring during the most favorable annual period (e.g., a match of the peak in lactational energy demand and maximum food availability, Fig. 1) or beginning early to afford long active seasons to offspring while not compromising the survival of parents.”

      L117 based on adequate preparation for overwintering and enter dormancy....

      We modified the sentence as follows:

      “recovering from reproduction, and after acquiring adequate energy stores for overwintering”

      L123 given that males outwardly invest the least time in reproduction yet generally have shorter hibernation seasons would seem to reject this hypothesis. This changes if you overtly include the time and energy that males expend while remaining euthermic preparing for hibernation, a cost that can be similar to energy expended during heterothermy.

      Males invest a lot of time in reproduction before females emerge (whether for competition or physiological maturation) and some males seem to be subject to long-term negative effects linked to reproductive stress (see Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313). Both processes may contribute to reducing the duration of male hibernation.

      L125 again, costs to support euthermy in males undergoing reproductive development is an investment in reproduction.

      You're right, but it's difficult to quantify. We tested a model that takes into account the reproductive effort during reproduction and prior to reproduction. We also considered the hypothesis that species living in a cold climate might have a low protandry while having a high reproductive effort due to their ability to feed in the burrow (interaction effect between reproductive effort and temperature). We think these changes answer your comment.

      L134 It isn't growing large testes that takes time, but instead completing spermatogenesis and maturation of sperm in the epididymides.

      We removed this part.

      L140 Later immergence in male ground squirrels is related to accumulation and defense of cached food, activities that are related to reproduction the next spring. An experimental analysis that would be revealing is to compare immergence times in females that completed lactation to the independence of their litters vs. females that did not breed or lost their litters. Who immerges first?

      Body mass variation from emergence to the end of mating in males seems to explain the delayed immergence of males in species that don't hide food in their burrows for hibernation. For example, in Spermophilus citellus, males immerge on average more than 3 weeks after females, yet they do not hide food in their burrows for the winter.

      Such a study already exists and shows that non-breeding females immerge earlier than breeding females. We refer to it in the manuscript:

      L386: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      L164 So you examined literature from 152 species but included data from only 29 species? Did you include data from social hibernators (marmots) that mate before emergence?

      With the current models, we have 28 different species. The number is small because very few species have both data on sex differences and information on reproductive effort (especially for males).

      Data on sex differences in hibernation were not available for social hibernating species.

      L169 Were these data from trapping or observation results? How reliable are these versus the use of information from implanted data loggers or collars that definitively document when euthermy is resumed and/or when immergence and first emergence occurs (through light loggers)?

      We focused not on heterothermic hibernation but on ecological hibernation. We have no idea of the margin of error for these types of data, but we have discussed these limitations in the "Study limitations" section.

      L180, again, it is the time required to complete spermatogenesis and spermiation not the time for the growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We removed this part.

      L200 Males that accumulate caches in fall and then feed from those during the spring pre-emergence euthermic interval and after will often be at their seasonal maximum in body mass. Declining from that peak may not be stressful.

      It has been suggested that reproductive effort in Spermophilus citellus might induce long-term negative effects that delay male immergence.

      Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313.

      L210 How about altitude, which affects the length of the growing season at similar latitudes?

      We extracted the location of each study site to determine the temperature and precipitation at that precise location (based on an interpolated climate surface). We therefore take into account differences in growing season (based on temperature) that arise from differences in altitude between sites.

      L267 How did whether males cache food or not figure into these comparisons? Refeeding before mating occurs during the pre-emergence euthermic interval.

      We removed this part.

      L332, 344 not a "proxy" but functionally related to advantages in mating systems with multiple mating males.

      We removed this part.

      L353 The need for a pre-emergence euthermic interval in male ground squirrels requires costs in the previous active season in accumulating and defending a cache and the proximal costs in spring while remaining at high body temperatures prior to emergence with resulting loss in body mass or devouring of the cache.

      You're right, but in this section, we briefly explain the benefits of food caching compared with other species that don't do so.

      L385 This review should discuss why females are not known to cache and contrast as "income breeders" from "capital breeder" males. What advantages of caches are females indifferent to (no need for a prolonged pre-emergence period) and what costs of accumulating caches do they avoid (prolonged activity period and defense of caches).

      We clarified the case of female emergence.

      L321 : “Thus, an early emergence of males may have evolved in response to sexual selection to accumulate energy reserve in anticipation of reproductive effort. Females, on the contrary, are not subject to intraspecific competition for reproduction and may have sufficient time before (generally one week after emergence) and during the breeding period to improve their body condition.”

      L388 I don't understand the logic of the conclusion that "did not ...adequately explain the late male immergence" in this section. The greater mass loss in males over the mating period is afforded by the presence of a cache that requires later immergence.

      We removed this part.

      L412 Not just congeners that invest less in reproduction, but within species individuals that do not attempt to breed in one or more years and thus have no reproductive costs should be an interesting comparison for differences in phenology from individuals that do breed. Non-breeders are often yearlings but can be a significant overall proportion of males that fail to fatten or cache enough to afford a pre-emergence euthermic period.

      L385: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      The sentence refers to individuals who reproduce little or not at all.

      L445 Males that gain weight between emergence and mating may do so by feeding from a cache regardless of how "harsh" an environment is.

      We observe this phenomenon even in species that are not known to hoard food:

      “Gains in body mass observed for some individuals, even in species not known to hoard food, may indicate that the environment allows a positive energy balance for other individuals with comparable energy demands.”

      L492 Some insects retreat to refugia in mid-summer to avoid parasitism (Gynaephora).

      Escape from parasites is also a benefit of dormancy.

      Fig 1 - It is difficult to see the differences in black and green colors, esp if color blind. Maternal effort is front-loaded within the active season (line for "optimal period" shown in midseason).

      Add "energy" underneath c) Prediction (H1) and "reproduction" underneath d) "Prediction (H2). Explain the orange vs black, green colors of triangles.

      We made the necessary changes.

      Fig 2 - I don't buy the regression lines as significant in this figure. The red line, cannot have a regression with two sample points and without the left-hand most dot, nothing is significant.

      We deleted this graph.

      Fig 3 - females only?

      We deleted this graph.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Below I summarize points that should be addressed in a revised version of the manuscript.

      • Page 6, first paragraph: I don't understand why the signals average out to a single state. If the distribution is indeed randomly distributed, a broad signal with low intensity should be present.

      We agree that this statement may cause confusion. We changed the text (marked in bold) to clarify the statement: The mobility of the undocked SBDs will be higher than the diffusion of the whole complex, allowing the sampling of varying interdomain distances within a single burst. However, these dynamic variations are subsequently averaged to a singular FRET value during FRET calculations for each burst, and may appear as a single low FRET state in the histograms.
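
      The averaging described above can be sketched as follows. This is a simplified illustration, not the analysis used in the study: it assumes fully corrected photon counts and the same total photon emission rate in every conformation, so that the burst-wise FRET value is the dwell-time-weighted mean of the efficiencies visited during the burst. The function name and the example dwell times are hypothetical.

```python
def burst_fret(dwells):
    """Return the burst-averaged FRET efficiency.

    dwells: list of (duration, fret_efficiency) pairs visited within one burst.
    Assumes equal total photon rate in every state (a simplification).
    """
    total_time = sum(duration for duration, _ in dwells)
    if total_time <= 0:
        raise ValueError("burst must have a positive total duration")
    return sum(duration * e for duration, e in dwells) / total_time


# Hypothetical burst: 3 time units at E = 0.8, then 7 time units at E = 0.2.
mixed_burst = burst_fret([(3, 0.8), (7, 0.2)])

# A burst that never leaves one state simply reports that state's efficiency.
static_burst = burst_fret([(10, 0.2)])
```

      Under these assumptions, a molecule that samples several interdomain distances within one burst contributes a single intermediate value to the histogram rather than a broad distribution.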

      • Page 6, third paragraph: how can the donor only be detected in the acceptor channel? Is this tailing out?

      Donor-only signal is not detected in the acceptor channel. As described on page 5 and in the Materials & Methods section, the dye stoichiometry value is defined for each burst/dwell using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      When no acceptor fluorophore is present FAA=0 and S=1.

      Some donor photons bleed through into the acceptor channel, but we correct for this by calculating the leakage and crosstalk factors as described in the Materials and Methods (page 20).
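
      The relation between the three photon counts and the stoichiometry can be sketched as below. This is a minimal sketch using the commonly used definitions E = FDA / (FDD + FDA) and S = (FDD + FDA) / (FDD + FDA + FAA), with a simple subtractive correction for donor leakage and direct acceptor excitation. The exact correction scheme and factor values used in the manuscript are not given here, so the parameter names `leakage` and `direct_excitation` and all numbers are assumptions for illustration.

```python
def fret_and_stoichiometry(f_dd, f_da, f_aa, leakage=0.0, direct_excitation=0.0):
    """Return (E, S) for one burst or dwell from the three photon counts.

    f_dd: donor emission after donor excitation
    f_da: acceptor emission after donor excitation
    f_aa: acceptor emission after acceptor excitation
    leakage, direct_excitation: hypothetical correction factors
    """
    # Remove donor bleed-through and directly excited acceptor signal from f_da.
    f_da_corr = f_da - leakage * f_dd - direct_excitation * f_aa
    donor_excited = f_dd + f_da_corr
    total = donor_excited + f_aa
    if total <= 0:
        raise ValueError("no photons in this burst")
    stoichiometry = donor_excited / total
    # E is undefined when no signal follows donor excitation (acceptor-only).
    efficiency = f_da_corr / donor_excited if donor_excited > 0 else None
    return efficiency, stoichiometry


# Donor-only burst: the 10 counts in the acceptor channel are pure bleed-through.
e_donor_only, s_donor_only = fret_and_stoichiometry(100, 10, 0, leakage=0.1)

# Acceptor-only burst: nothing is detected after donor excitation.
e_acc_only, s_acc_only = fret_and_stoichiometry(0, 0, 80)
```

      With FAA = 0 the stoichiometry evaluates to 1, matching the statement above, and an acceptor-only burst gives S = 0.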

      We changed the text (marked in bold) in the manuscript to address the question: The FRET data of both OpuA variants is best explained by a four-state model (Figure 2A,B; fourth and fifth panel) (Supplementary File 3). Two of the four states represent donor-only (S≈1) or acceptor-only (S≈0) dwells. The full bursts belonging to donor-only and acceptor-only molecules were excluded prior to mpH2MM. This means that some molecules transit to a donor-only or acceptor-only state within the burst period, which most likely reflects blinking or bleaching of one of the fluorophores. These donor-only and acceptor-only states were also excluded during further analysis. The other two states reflect genuine FRET dwells that were analyzed by mpH2MM. They represent different conformations of the SBDs.

      • Page 7, "SBD dynamics ..": why was the V149Q mutant only analyzed in the K521C background and not also in the N414C background?

      The two FRET states were best distinguished in OpuA-K521C. Therefore, we decided to focus on OpuA-K521C and not OpuA-N414C. OpuA-V149Q was used to show that reduced docking efficiency does not affect the transition rate constants and relative abundances of the two FRET states, and we regarded it sufficient to test the SBD dynamics in OpuA-K521C only.

      • Page 8, second paragraph: why was the N414C mutant analyzed only from 0 - 600 mM and not also up to 1000 mM?

      In line with the previous answer, our main focus was on OpuA-K521C, since the two FRET states were best distinguished in OpuA-K521C. OpuA-N414C was used to prove that similar states are observed when measuring with fluorophores on the opposite site of the SBD. We studied how the FRET states change in response to different conditions that correspond to different stages of the transport cycle and how it changes in response to different ionic strengths. Initially, 600 mM KCl was used to study the dynamics of the SBD at high ionic strength. Later in this study, we tested a very wide range of different salt concentrations for OpuA-K521C to get detailed insights into the dynamics of the SBDs over a wide ionic strength range. Note that 1 M KCl is a very high, non-physiological ionic strength for the typical habitat of L. lactis and was only used to show that the high FRET state occurs even under very extreme conditions.

      • Page 8, third paragraph: why was the dimer (if it is the source of the FRET signal) only partially disrupted?

      We acknowledge that this is a very good point. However, we purposely did not speculate on this point in the manuscript, because we have limited information on the molecular details of the interaction. As we highlight on page 8, the SBDs experience each other in a very high apparent concentration (millimolar range). This means that the interactions are most likely very weak (low affinity) and not very specific. Such interactions are in the literature referred to as the quinary structure of proteins and they occur at the high macromolecular crowding in the cell and in proteins with tethered domains, and thus at high local concentrations. Such interactions can be screened by high ionic strength. In the revised manuscript, we now present the partially disrupted dimer structure in the context of the quinary structure of a protein (page 11):

      In other words, the high FRET state may comprise an ensemble of weakly interacting states rather than a singular stable conformation, resembling the quinary structure of proteins. The quinary structure of proteins is typically revealed in highly crowded cellular environments and describes the weak interactions between protein surfaces that contribute to their stability, function, and spatial organization (Guin & Gruebele, 2019). Despite the current study being conducted under dilute conditions, the local concentration of SBDs (~4 mM) mimics a densely populated environment and reveals quinary structure.

      • Page 9, second paragraph: according to the EM data processing, only 20% of the particles were used for 3D reconstruction. Why? Does it mean that the remaining 80% were physiologically not relevant? If so, why were the 20% used relevant?

      We note that it is a fundamental part of image processing of single-particle cryo-EM data to remove false positives or low-resolution particles throughout the processing workflow. In particular, when using a very low and therefore generous threshold during automated particle picking, as we did (t=0.01 and t=0.05 for the 50 mM KCl and 100 mM KCl datasets, respectively), the initial set of particles includes a significant number of false positives, a tradeoff made to avoid excluding particles belonging to sparsely populated classes or orientations. It is thus common that more than 50% of ‘particles’ are excluded in the first rounds of 2D classification. In our case, only 30% and 52% of particles were retained after such first clean-up steps. Subsequently, the particle set is further refined, and additional false positives and low-resolution particles are excluded during extensive rounds of 3D classification. We also note that during the final steps, most of the excluded data represent particles of lower quality that do not contribute to a high-resolution reconstruction, or that belong to sparsely populated protein conformations. This does not mean that such a population is not physiologically relevant. In conclusion, having only 5-20% of the initially auto-picked particles contribute to the reconstruction of the final cryo-EM map is common, with the vast majority of excluded particles being false positives.

      • Page 11, third paragraph: the way the proposed model is selected is also my main criticism. All alternative models do not fit the data. Therefore, the proposed model is suggested. However, I do not grasp any direct support for this model. Either I missed it or it is not presented.

      Concerning the specific model in Figure 5, the reviewer is correct. We do not provide direct evidence for a side-ways interaction. However, we have evidence of transient interactions and our data rule out several scenarios of interaction, leaving 5C as the most likely model. This is also the main conclusion of this paper: In conclusion, the SBDs of OpuA transiently interact in a docking competent conformation, explaining the cooperativity between the SBDs during transport. The conformation of this interaction is not fixed but differs substantially between different conditions.

      Because the interaction is very short-lived, it was not possible to visualize molecular details of this interaction. We present Figure 5 to hypothesize the most likely type of interaction, since many possibilities can be excluded with the vast amount of presented data. To make clearer that we discuss models and rule out several possibilities but do not demonstrate a specific interaction between the SBDs, we now write on page 10 (changes marked in bold): We have shown that the SBDs of OpuA come close together in a short-lived state, which is responsive to the addition of glycine betaine (Figure 4A). Although the occurrence of the state varies between different conditions, it was not possible to negate the high-FRET state completely, not even under very high or low KCl concentrations, or in the presence of 50 mM arginine plus 50 mM glutamate (Figure 4A,B). To evaluate possible interdomain interaction scenarios we consider the following: (1) The SBDs of OpuA are connected to the TMDs with very short linkers of approximately 4 nm, which limit their movement and allow the receptor to sample a relatively small volume near its docking site. (2) In low ionic strength conditions, OpuA-K521C displays a high FRET state with mean FRET values of 0.7-0.8, which correspond to inter-dye distances of approximately 4 nm. (3) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (4) The distance between the density centers of the SBDs in the cryo-EM reconstructions (based on particles with a low and high FRET state) is 6 nm, which aligns with the dimensions of an SBD (length: ~6 nm, maximal width: ~4 nm). These findings collectively indicate that two SBDs interact, not necessarily in a singular conformation but possibly as an ensemble of weakly interacting states. Hence, we discuss three possible SBD-SBD interaction models to explain the high-FRET state:

      Reviewer #2 (Recommendations For The Authors):

      In the abstract and elsewhere the authors suggest that the SBDs physically interact with one another, and that this interaction is important for the transport mechanism, specifically for its cooperativity.

      I feel that this main claim is not well established. The authors convincingly demonstrate that the SBDs largely occupy two states relative to one another and that in one of these states, they are closer than in the other. Unless I have missed (or failed to understand) some major details of the results, I did not find any evidence of a physical interaction. Have the authors established that the high FRET state indeed corresponds to the physical engagement of the SBDs? I feel that a direct demonstration of an interaction is much missing.

      Along the same lines, in the low-salt cryo-EM structure, where the SBDs are relatively closer together, the SBDs are still separated and do not interact.

      See also our response to the final comment of reviewer 1. Furthermore, please carefully consider the following: (1) FRET values of 0.7-0.8 correspond to inter-dye distances of approximately 4 nm. (2) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (3) The cryo-EM reconstruction is the average of all the particles in the final dataset, including both the particles with a low and high FRET state. Further, the local resolution of the SBDs in the cryo-EM map is low, indicative of high degree of flexibility. Thus, a potential interaction is possible within the observed range of flexibility. (4) The distance between the density centers is 6 nm, aligning with the dimensions of an SBD (length: 6 nm, maximal width: 4 nm). These factors collectively indicate SBD interactions, and we present these points now more explicitly in Figure 4 and the last part of the results section (page 9).
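
      The relation between FRET efficiency and inter-dye distance invoked in point (1) can be illustrated with a minimal sketch of the standard Förster equation, E = 1 / (1 + (r/R0)^6). The Förster radius used below (5.1 nm) is an assumption chosen for illustration only; it is not taken from the manuscript, and the function names are hypothetical.

```python
# Assumed Foerster radius in nm (illustrative value, NOT from the manuscript).
R0_NM = 5.1


def fret_efficiency_to_distance(efficiency, r0_nm=R0_NM):
    """Invert E = 1 / (1 + (r / R0)**6) to obtain the inter-dye distance r in nm."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def distance_to_fret_efficiency(distance_nm, r0_nm=R0_NM):
    """Forward Foerster relation: efficiency expected at a given distance."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


if __name__ == "__main__":
    # Mean FRET values quoted for the high-FRET state.
    for e in (0.7, 0.8):
        print(f"E = {e:.2f} -> r = {fret_efficiency_to_distance(e):.2f} nm")
```

      Under this assumed R0, efficiencies of 0.7-0.8 map to distances a little above 4 nm; a different dye pair or R0 would shift the numbers accordingly.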

      Once the authors successfully demonstrate that direct physical interaction indeed occurs, they will need to provide data that places it in the context of the transport cycle. Do the SBDs swap ligand molecules between them? Do they bind the ligand and/or the transporter cooperatively? What is the role of this interaction?

      We acknowledge the intriguing nature of the posed questions, but they extend beyond the scope of this study. It is extremely challenging to obtain high-resolution structures of highly dynamic multidomain proteins, like OpuA, and to probe transient interactions as we do here for the SBDs of OpuA. We therefore combined cryo-TEM with smFRET studies and applied advanced, state-of-the-art analysis tools, as acknowledged by reviewer 1. We link our observations on the structural dynamics and interactions of the SBDs to a previous study, where we showed that the two SBDs of OpuA interact cooperatively. We do not have further evidence that connects the physical interactions to the transport cycle. In our view, the collective datasets indicate that the physical interactions between the SBDs reported here increase the transport efficiency.

      As far as I understand, the smFRET data have been interpreted on the basis of a negative observation, i.e., that it is "likely" that none of the FRET states corresponds to a docked SBD. To convincingly show this, a positive observation is required, i.e., observation of a docked state.

      The aim of this study was to study interdomain dynamics and not specifically docking. We have previously shown that docking can be visualized via cryo-EM (Sikkema et al., 2020), however the SBDs of OpuA appear to only dock in specific turnover conditions. We now show that the high FRET state of OpuA cannot represent a docked state, but that the SBDs transiently interact (see our response to the first comment). Importantly, a docked state was also not found in the cryo-EM reconstructions at low ionic strength, representing the smFRET conditions where we observe the interactions between the SBDs. The high FRET state occupies 30% of the dwells in this condition, and such a high percentage of molecules would have become apparent during cryo-EM 3D classification in case they would form a docked state. Therefore, we conclude that docking does not occur in low ionic strength apo condition. We discuss this point and our reasoning on page 11 of the revised manuscript.

      In this respect, I find it troubling that in none of the tested conditions, the authors observed a FRET state which corresponds to the docked state. Such a state, which must exist for transport to occur (as mentioned in the authors' previous publications), needs to be demonstrated. This brings me to my next question: why have the authors not measured FRET between the SBDs and the transporter? Isn't this a very important piece that is missing from their puzzle?

      We agree that investigating docking behavior under varied turnover conditions requires focused experiments on FRET dynamics between the SBDs and the transporter. As noted on page 5, OpuA exists as a homodimer, implying that a single cysteine mutation introduces two cysteines in a single functional transporter. To specifically implement a cysteine mutation in only one SBD and one transmembrane domain, it is necessary to artificially construct a heterodimer. We recently published initial attempts in this direction, and this will be a subject for future research but still requires years of work.

      Additionally, I feel that important controls are missing. For example, how will the data presented in Fig1 look if the transporter is labeled with acceptor or donor only? How do soluble SBDs behave?

      In the employed labeling method, donor and acceptor dyes are mixed in a 1:1 ratio and randomly attached to the two cysteines in the transporter. This automatically yields significant fractions of donor-only and acceptor-only transporters, which are always present during the smFRET recordings. We can visualize those molecules on the basis of the dye stoichiometry, which we calculate by using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      Unfiltered plots look as follows (a dataset of OpuA-K521C at 600 mM KCl):

      Author response image 1.

      Donor only and acceptor only molecules have a very well discernible stoichiometry of 1 and 0, respectively. The filtering procedure is described in the materials and methods section, and these plots can be found in the supplementary database. We did not add them to the main text or supplementary materials of the original manuscript, as this is a very common procedure in the field of smFRET. We now include such a dataset in the revised manuscript.
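
      The stoichiometry-based filtering described above can be sketched as follows. This is a minimal illustration using the uncorrected definition S = (FDD + FDA) / (FDD + FDA + FAA), which gives S = 1 for donor-only and S = 0 for acceptor-only molecules. The threshold values, function names, and photon counts are hypothetical, and the correction factors used in a full analysis are omitted.

```python
def stoichiometry(fdd, fda, faa):
    """Uncorrected stoichiometry S from the three photon counts of one burst."""
    total = fdd + fda + faa
    if total == 0:
        raise ValueError("no photons detected in this burst")
    return (fdd + fda) / total


def apparent_fret(fdd, fda):
    """Uncorrected (apparent) FRET efficiency after donor excitation."""
    if fdd + fda == 0:
        raise ValueError("no photons after donor excitation")
    return fda / (fdd + fda)


def classify(fdd, fda, faa, s_low=0.25, s_high=0.75):
    """Label a burst by stoichiometry; the cut-offs are assumed, not the authors'."""
    s = stoichiometry(fdd, fda, faa)
    if s > s_high:
        return "donor-only"
    if s < s_low:
        return "acceptor-only"
    return "donor+acceptor"


if __name__ == "__main__":
    # Hypothetical bursts given as (FDD, FDA, FAA) photon counts.
    bursts = [(80, 5, 0), (0, 0, 60), (20, 40, 50)]
    for burst in bursts:
        print(burst, round(stoichiometry(*burst), 2), classify(*burst))
```

      Only bursts labelled "donor+acceptor" would be carried forward into the FRET histograms in this sketch.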

      Soluble SBDs of OpuA have been studied previously (e.g. Wolters et al., 2010 & De Boer et al., 2019). For example, we have shown by SEC-MALLS that soluble SBDs do not form dimers, which is consistent with our notion that the SBDs interact with low affinity. It is not possible to study interdomain dynamics between soluble SBDs by smFRET, because the measurements are carried out at picomolar concentrations (monomeric conditions). We emphasize that smFRET measurements with native complexes, with SBDs near each other at apparent millimolar concentrations, are physiologically more relevant.

      Additional comments:

      (1) "It could well be that cooperativity and transient interactions between SBDs is more common than previously anticipated" and a similar statement in the abstract. What evidence is there to suggest that the transient interactions between SBDs are a common phenomenon?

      On page 11, we write: Dimer formation of SBPs has been described for a variety of proteins from different structural clusters of substrate-binding proteins [33–38,51–53]. We cite 9 papers that report SBD/SBP dimers. This suggests to us that the phenomenon of interacting substrate-binding proteins could be more common. Moreover, the concentration of maltose-binding protein and other SBPs in the periplasm of Gram-negative bacteria can reach (sub)millimolar concentrations, and low-affinity interactions may play a role not only in membrane protein-tethered SBDs (like in OpuA) but also in soluble substrate receptors. Such low-affinity interactions are rarely studied in biochemical experiments.

      (2) I think that the data presented in 1B-C better suits the supplementary information.

      Figure 1B-D is already a summary of the supplementary information that describes the optimization of OpuA purification. We think it is valuable to show this part of the figure in the main text. A very clean and highly pure OpuA sample is essential for smFRET experiments. Quality of protein preparations and data analysis are key for the type of measurements we report in this paper.

      (3) "the first peak in the SEC profile corresponds...." The peaks should be numbered in the figure to facilitate their identification.

      We have changed the figure as suggested.

      (4) "smFRET is a powerful tool for studying protein dynamics, but it has only been used for a handful of membrane proteins". With the growing list of membrane proteins studied by smFRET I find this an overstatement.

      We removed this sentence in the new version of the manuscript.

      (5) "We rationalized that docking of one SBD could induce a distance shift between the two SBDs in the FRET range of 3-10 nm (Figure 1E)" How and why was this assumed?

      We realize that this is one of the sentences that caused confusion about the aim of this study. In this part of the manuscript, we should not have used docking as an example and we apologize for that. We replaced the sentence by: These variants are used to study inter-SBD dynamics in the FRET range of 3-10 nm (Figure 1E).

      Also Figure 1E was adjusted to prevent confusion:

      Author response image 2.

      In addition, to avoid any confusion we changed the following sentence on page 4 (changes marked in bold): We designed cysteine mutations in the SBD of OpuA to study interdomain dynamics in the full length transporter.

      (6) "However, the FRET distributions are broader than would be expected from a single FRET state, especially for OpuA-K521C" Have the authors established how a single state FRET of OpuA looks? Is there a control that supports this claim?

      Below we compare two datasets from OpuA-K521C in 600 mM KCl with a typical smFRET dataset from the well-studied substrate-binding protein MBP from E. coli, which resides in a single state. Left: OpuA-K521C; Right: MBP

      Author response image 3.

      We agree that this cannot be assumed from the presented data. Therefore we rewrote this sentence: However, the FRET distributions tail towards higher FRET values, especially OpuA-K521C.

      (7) "V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the intrinsic transport and ATP hydrolysis efficiency intact." I find this statement confusing: How can a mutation reduce docking efficiency yet leave the transport activity unchanged?

      We rewrote the sentences (changes marked in bold): V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the ionic strength sensing in the NBD and the binding of glycine betaine and ATP intact. Accordingly, a reduced docking efficiency should result in a lower absolute glycine betaine-dependent ATPase activity. At the same time the responsiveness of the system to varying KCl, glycine betaine, or Mg-ATP concentrations should not change.

      (8) Along the same lines: "whereas the glycine betaine-, Mg-ATP-, or KCl-dependent activity profiles remain unchanged" vs. "OpuA-V149Q-K521C exhibited a 2- to 3-fold reduction in glycine betaine-dependent ATPase activity".

      See comment at point 7.

      (9) In general, I find the writing wanting at places, not on par with the high standards set by previous publications of this group.

      We recognize the potential ambiguity in our phrasing. We hope that after incorporating the feedback provided by the reviewers our manuscript will convey our findings in a clearer manner.

      Extra changes to the text:

      (1) Title changed: The substrate-binding domains of the osmoregulatory ABC importer OpuA physically transiently interact

      (2) Second part of the abstract changed: We now show, by means of solution-based single-molecule FRET and analysis with multi-parameter photon-by-photon hidden Markov modeling, that the SBDs transiently interact in an ionic strength-dependent manner. The smFRET data are in accordance with the apparent cooperativity in transport and supported by new cryo-EM data of OpuA. We propose that the physical interactions between SBDs and cooperativity in substrate delivery are part of the transport mechanism.

      (3) Page 6, third paragraph and Figure 2B: the wrong rate number was extracted from Table 1. Changed this in the text and figure: 112 s⁻¹ → 173 s⁻¹. It did not affect any of the interpretations or conclusions.

      (4) Page 8, last paragraph, changed: smFRET was also performed in the absence of KCl and with a saturating concentration of glycine betaine (100 µM). The mean FRET efficiency of the high-FRET state of OpuA-K521C increased to 0.78, which corresponds to an inter-dye distance of about 4 nm. This indicates that the dyes at the two SBDs move very close towards each other (Figure 4A) (Table 1) (Supplementary File 34).

      (5) Page 9, second paragraph changed: Due to the inherent flexibility of the SBDs, with respect to both the MSP protein of the nanodisc and the TMDs of OpuA, their resolution is limited. Furthermore, the cryo-EM reconstructions average all the particles in the final dataset, including those with a low and high FRET state. Nevertheless, in both conditions, the densities that correspond to the SBDs can be observed in close proximity (Figure 4D). The distance between the density centers is 6 nm and aligns with the dimensions of an SBD, providing further evidence for physical interactions between the SBDs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful to the Editors for overseeing the review of our manuscript, and to the two reviewers for their thoughtful comments and suggestions for how it can be improved.

      I submit at this time a revision, as well as a detailed response (below) to each of the points raised in the first round of review.

      We feel the manuscript has been significantly improved by taking the reviewers' comments to heart. In a nutshell, we added new key pieces of data (impact of WIN site inhibition on global translation, rRNA production, as well as the requested cell biology analyses showing nucleolar stress), new analyses of the proteomics to counter potential concerns with normalization, and expanded/revised verbiage in key areas to clarify parts of the text that were confusing or problematic. The main figures have not changed; all new material is included in supplements to figures 2 and 3.

      Public Reviews

      Reviewer #1 (Public Review):

      Building on previous work from the Tansey lab, here Howard et al. characterize transcriptional and translational changes upon WIN site inhibition of WDR5 in MLL-rearranged cancer cells. They first analyze whether C16, a newer generation compound, has the same cellular effects as C6, an early generation compound. Both compounds reduce the expression of WDR5-bound RPGs in addition to the unbound RPG RPL22L1. They then investigate differential translation by ribo-seq and observe that WIN site inhibition reduces the translation of RPGs and other proteins related to biomass accumulation (spliceosome, proteasome, mitochondrial ribosome). Interestingly, this reduction adds to the transcriptional changes and is not limited to RPGs whose promoters are bound by WDR5. Quantitative proteomics at two time points confirmed the downregulation of RPGs. Interestingly, the overall effects are modest, but RPL22L1 is strongly affected. Unexpectedly, most differentially abundant proteins seem to be upregulated 24 h after C6 (see below). A genetic screen showed that loss of p53 rescues the effect of C6 and C16 and helped the authors to identify pathways that can be targeted by compounds together with WIN site inhibitors in a synergistic way. Finally, the authors elucidated the underlying mechanisms and analyzed the functional relevance of the RPL22, RPL22L1, p53, and MDM4 axis.

      While this work is not conceptually new, it is an important extension of the observations of Aho et al. The results are clearly described and, in my view, very meaningful overall.

      Major points:

      (1) The authors make statements about the globality/selectivity of the responses in RNA-seq, ribo-seq, and quantitative proteomics. However, as far as I can see, none of these analyses have spike-in controls. I recommend either repeating the experiments with a spike-in control or carefully measuring transcription and translation rates upon WIN site inhibition and normalizing the omics experiments with this factor.

      The reviewer is correct that we did not include spike-in controls in our omics experiments. We would like to emphasize that none of the omics data in this manuscript have been processed in unorthodox ways, and that the major conclusions each have independent corroborating data.

      The selectivity in RPG suppression observed in RNA-Seq, for example, is supported by results from our target engagement (QuantiGene) assays; suppression of RPL22L1 mRNA levels is supported by quantitative and semi-quantitative RT-PCR, by western blotting, and by the results of our proteomic profiling; alternative splicing (and expression) of MDM4—and its dependency on RPL22—is also backed up by similar RT-PCR and western blotting data. The same applies for alternative splicing of RPL22L1.

      That said, we do appreciate the point the reviewer is making here, and have done our best to respond. We do not think it is a prudent investment in resources to repeat the numerous omics assays in the manuscript. We also considered normalizing for bulk transcription and translation rates as suggested, but it is not clear in practice how this would be done, and it could introduce additional variables and uncertainties that may skew the interpretation of results. Instead, to respond to this comment, we made the following changes to the manuscript:

      (1) We now explicitly state, for all omics assays, that spike-in controls were not included. These statements will prompt the reader to make their own assessment of the robustness of each of our findings and interpretations.

      (2) We have added new data to the manuscript (Figure 2—figure supplement 1A–B) measuring the impact of C6 and C16 on bulk translation using the OPP labeling method. These new data demonstrate that WIN site inhibitors induce a progressive yet modest decline in protein synthesis capacity. At 24 hours, there is no significant effect of either agent on protein synthesis levels. By 48 hours, a small but significant effect is observed, and by 96 hours translation levels are ~60% of what they are in vehicle-treated control cells. These new data are important because they support the idea that normalization has not blunted the responses we observe—the magnitude of the effects are consistent between the different assays and tend to cap out at two-fold in terms of RPG suppression, translation efficiency, ribosomal protein levels, and protein synthesis capacity.

      (3) We have included additional analysis regarding the LFQMS, as described below, that specifically addresses the issue of normalization in our proteomics experiments.

      (2) Why are the majority of proteins upregulated in the proteomics experiment after 24 h in C6 (if really true after normalization with general protein amount per cell)? This is surprising and needs further explanation.

      The reviewer is correct in noting that (by LFQMS) ~700 proteins are induced after 24 hours of treatment of MV4;11 cells with C16 (not C6, as stated). The reviewer would like us to examine whether this apparent increase in proteins is a normalization artifact. In response to this comment, we have made the following changes to the manuscript:

      (1) Our new OPP labeling experiments (Figure 2—figure supplement 1A–B) show that there is no significant reduction in overall protein synthesis following 24 hours of C16 treatment. In light of this finding, it is unlikely that normalization artifacts, resulting from diminution of the pool of highly abundant proteins, create the appearance of these 700 proteins being induced. We now explicitly make this point in the text.

      (2) We now clarify in the methods how we seeded identical numbers of cells for DMSO and C16-treated cultures in these experiments, and—consistent with our finding that WIN site inhibitors have little if any effect on protein synthesis or proliferation at the 24 hour timepoint— extracted comparable amounts of proteins from these two treatment conditions (DMSO: 344.75 ± 21.7 µg; C16: 366.50 ± 15.8 µg; [Mean ± SEM]).

      (3) We now include in Figure 3—figure supplement 1A a plot showing the distribution of peptide intensities for each protein detected in each run of LFQMS before and after equal median normalization. This new analysis reveals that the distribution of intensities is not appreciably changed via normalization. Specifically, there is not a reduction in peptide intensities in the unnormalized data from 24 hours of C16 treatment that is reversed or tempered by normalization. This analysis provides further support for the notion that the increase we observe is not a normalization artifact.

      (4) We now include in Figure 3—figure supplement 1B–D a set of new analyses examining the relationship between the initial intensity of proteins in DMSO control samples (a crude proxy for abundance) versus the fold change in response to WIN site inhibitor. This analysis shows that we have as many "highly abundant" (10th decile) proteins increasing as we do decreasing in response to WINi. Thus, it appears as though the wholesale clearance of highly abundant proteins from the cell is not occurring at this early treatment timepoint. In addition, this analysis also shows that ribosomal proteins (RP) are generally the most abundant, most suppressed, proteins and that their fold-change at the protein level at 24 hours is less than two-fold, consistent again with the magnitude of transcriptional effects of C16, as measured by RNA-Seq and QuantiGene. The fact that the drop in RP levels is consistent with expectations based on other analyses provides further empirical support for the notion that protein levels inferred from LFQMS are authentic and not skewed by global changes in the proteome.
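
      The normalization check described in point (3) can be illustrated with a minimal sketch of one common reading of equal median normalization: log2-transform the intensities and shift each run so that all runs share the same median. The exact procedure used for the LFQMS data is not specified here, so the function, run names, and intensities below are hypothetical.

```python
import math
import statistics


def equal_median_normalize(runs):
    """Shift log2 intensities of each run so every run has the same median.

    runs: dict mapping a run name to a list of positive raw intensities.
    Returns a dict with the same keys holding normalized log2 intensities.
    """
    log_runs = {name: [math.log2(v) for v in values] for name, values in runs.items()}
    medians = {name: statistics.median(vals) for name, vals in log_runs.items()}
    # Common target: the mean of the per-run medians (an assumed convention).
    target = statistics.mean(list(medians.values()))
    return {
        name: [v - medians[name] + target for v in vals]
        for name, vals in log_runs.items()
    }


if __name__ == "__main__":
    # Hypothetical intensities for two runs; the second is uniformly 2x brighter.
    runs = {
        "DMSO_rep1": [1000.0, 2000.0, 4000.0, 8000.0, 16000.0],
        "C16_rep1": [2000.0, 4000.0, 8000.0, 16000.0, 32000.0],
    }
    normalized = equal_median_normalize(runs)
    for name, values in normalized.items():
        print(name, round(statistics.median(values), 3))
```

      Comparing the per-run distributions before and after such a shift is one way to see whether normalization materially changes the data, which is the question the new supplement panel addresses.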

      The increase in proteins at this time point, we argue, is thus most likely genuine. It is not surprising that—at a timepoint at which protein synthesis is unaffected—several hundred proteins are induced by a factor of two. How this occurs, we do not know. It may be a transient compensatory mechanism, or it may be an early part of the active response to WIN site inhibitors. Lest the reader be confused by this finding, we have now added text to this section of the manuscript discussing and explaining the phenomenon in more detail.

      (3) The description of the two CRISPR screens (GECKO and targeted) is a bit confusing. Do I understand correctly that in the GECKO screen, the treated cells are not compared with nontreated cells of the same time point, but with a time point 0? If so, this screen is not very meaningful and perhaps should be omitted. Also, it is unclear to me what the advantages of the targeted screen are since the targets were not covered with more sgRNAs (data contradictory: 4 or 10 sgRNAs per target?) than in Gecko. Also, genome-wide screens are feasible in culture for multiple conditions. Overall, I find the presentation of the screening results not favorable.

      In essence, this is a single screen performed in two tiers. In Tier 1, we screened a complete GECKO library (six sgRNA/gene) with the earliest generation (less potent) inhibitor C6, and compared sgRNA representation against the time zero population. This screen would reveal sgRNAs that are specifically associated with response to C6, as well as those that are associated with general cell fitness and viability. We then identified genes connected to these sgRNAs, removed those that are pan essential, and built a custom library for the second tier using sgRNAs from the Brunello library (four sgRNA/gene). We then screened this custom library with both C6 and the more potent inhibitor C16, this time against DMSO-treated cells from the same timepoint.

      We acknowledge that this is not the most streamlined setup for a screen. But our intention was to compare two inhibitors (C6 and C16) and identify high confidence 'hits' that are disconnected from general cell viability, rather than generate an exhaustive list of all genes that, when disrupted, skew the response to WIN site inhibitor. The final result of this screen (Figure 4E) is a gene list that has been validated with two chemically distinct WIN site inhibitors and up to 10 unique sgRNAs per gene. We may not have captured every gene that can modulate response to WIN site inhibitor, but those appearing in Figure 4E are highly validated.

      To answer the reviewer's specific questions: (i) we cannot omit the Tier 1 screen because then there would be no rationale for what was screened in the second Tier; and (ii) the advantage of the custom Tier 2 library is that it allowed us to screen hits from the Tier 1 screen with four completely independent sgRNAs. Although there are not more sgRNAs for each gene in the Tier 2 versus the Tier 1 library, these sgRNAs are different and thus, for C6 at least, hits surviving both screens were validated with up to 10 unique sgRNAs.
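
      The difference between the two tiers lies in the reference population used to score sgRNA representation: time zero in Tier 1, DMSO-treated cells from the same timepoint in Tier 2. A minimal sketch of such a comparison is given below, using depth-normalized counts and a log2 fold change. This is a generic illustration, not the analysis pipeline used in the manuscript; the guide names, counts, and pseudocount are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw sgRNA read counts to counts per million reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {guide: 1e6 * c / total for guide, c in counts.items()}


def log2_fold_change(treated, reference, pseudocount=1.0):
    """log2 enrichment of each sgRNA in `treated` relative to `reference`.

    `reference` is the time-zero sample (Tier 1 style) or the matched
    DMSO sample (Tier 2 style); the pseudocount avoids division by zero.
    """
    t = counts_per_million(treated)
    r = counts_per_million(reference)
    return {
        guide: math.log2((t.get(guide, 0.0) + pseudocount) / (r[guide] + pseudocount))
        for guide in r
    }


if __name__ == "__main__":
    # Hypothetical read counts for three guides.
    dmso = {"RPL22_sg1": 100, "TP53_sg1": 100, "CTRL_sg1": 800}
    winib = {"RPL22_sg1": 400, "TP53_sg1": 300, "CTRL_sg1": 300}
    for guide, lfc in log2_fold_change(winib, dmso).items():
        print(f"{guide}: {lfc:+.2f}")
```

      Scoring against a matched DMSO sample separates guides that alter the drug response from guides that simply affect fitness over time, which is the rationale for the second tier.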

      We apologize that the description of the CRISPR screens was not clearer, and have reworked this section of the manuscript to make our intent and our actions clearer.

      (4) Can re-expression of RPL22 rescue the growth arrest of C6?

      We have not attempted to complement the RPL22 knock out. But we do note that evidence supporting the idea that loss of RPL22 confers resistance to WIN site inhibitor is strong—six (out of six) sgRNAs against RPL22 were significantly enriched in the Tier 1 screen, and independent knock out of RPL22 with the Synthego multi-guide system in MV4;11 and MOLM13 cells increases the GI50 for C16.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Howard et al reports the development of high-affinity WDR5-interaction site inhibitors (WINi) that engage the protein to block the arginine-dependent engagement with its partners. Treatment of MLL-rearranged leukemia cells with high-affinity WINi (C16) decreases the expression of genes encoding most ribosomal proteins and other proteins required for translation. Notably, although these targets are enriched for WDR5-ChIP-seq peaks, such peaks are not universally present in the target genes. High concordance was found between the alterations in gene expression due to C16 treatment and the changes resulting from treatment with an earlier, lower affinity WINi (C6). Besides protein synthesis, genes involved in DNA replication or MYC responses are downregulated, while p53 targets and apoptosis genes are upregulated. Ribosome profiling reveals a global decrease in translational efficiency due to WINi with overall ribosome occupancies of mRNAs ~50% of control samples. The magnitude of the decrements of translation for most individual mRNAs exceeds the respective changes in mRNA levels genome-wide. From these results and other considerations, the authors hypothesize that WINi results in ribosome depletion. Quantitative mass spec documents the decrement in ribosomal proteins following WINi treatment along with increases in p53 targets and proteins involved in apoptosis occurring over 3 days. Notably, RPL22L1 is essentially completely lost upon WINi treatment. The investigators next conduct a CRISPR screen to find moderators and cooperators with WINi. They identify components of p53 and DNA repair pathways as mediators of WINi-inflicted cell death (so gRNAs against these genes permit cell survival). Next, WINi are tested in combination with a variety of other agents to explore synergistic killing to improve their expected therapeutic efficacy. The authors document the loss of the p53 antagonist MDM4 (in combination with splicing alterations of RPL22L1), an observation that supports the notion that WINi killing is p53-mediated.

      Strengths:

      This is a scientifically very strong and well-written manuscript that applies a variety of state-of-the-art molecular approaches to interrogate the role of the WDR5 interaction site and WINi. They reveal that the effects of WINi seem to be focused on the overall synthesis of protein components of the translation apparatus, especially ribosomal proteins, even those that do not bind WDR5 by ChIP (a question left unanswered is how much the WDR5-less genes are nevertheless WINi targeted). They convincingly show that disruption of the synthesis of these proteins is accompanied by DNA damage inferred by H2AX-activation, activation of the p53 pathway, and apoptosis. Pathways of possible WINi resistance and synergies with other antineoplastic approaches are explored. These experiments are all well-executed and strongly invite more extensive pre-clinical and translational studies of WINi in animal studies. The studies also may anticipate the use of WINi as probes of nucleolar function and ribosome synthesis though this was not really explored in the current manuscript.

      Weaknesses:

      A mild deficiency in the current manuscript is the absence of cell biological methods to complement the molecular biological and biochemical approaches so ably employed. Some microscopic observations and confirmation of nucleolar dysfunction and DNA damage would be reassuring.

      We thank the reviewer for their comments. We agree that an absence of cell biological methods was a deficiency in the original manuscript. In response to this comment, we have now added immunofluorescence (IF) analyses, examining the impact of C16 on nucleolar integrity and nucleophosmin (NPM1) distribution (Figure 3—figure supplement 4). These new data clearly show that C16 induces nucleolar stress at 72 hours—as measured by the redistribution of NPM1 from the nucleolus to the nucleoplasm. These new data fill an important gap in the story, and we are grateful to the reviewer for prompting us to perform these experiments.

      As part of the above study, we also probed for gamma-H2AX, expecting that we may see some signs of accumulation in the nucleoli (see comment #4 from Reviewer #2, below). We did not observe this response. Importantly, however, we did see that gamma-H2AX staining occurs only in what are overtly apoptotic cells. This is an important finding, because we had previously speculated that the induction of gamma-H2AX observed by Western blotting reflected part of a bona-fide response to DNA damage elicited by WIN site inhibitors. Instead, the IF data now leads us to conclude that this signal simply reflects the established fact that WIN site inhibitors induce apoptosis in this cell line (Aho et al., 2019). In response to this new finding, we have added additional discussion to the text and have removed or de-emphasized the potential contribution of DNA damage to the mechanism of action of WDR5 WIN site inhibitors. Again, we are grateful for this comment as it has prevented us from continuing to report/pursue erroneous observations.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      There is a typo in "but are are linked to mRNA instability when translation is inhibited".

      Thank you for catching this typo. It has now been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors report that WINi initially (at 24 hrs) increases the expression of most proteins while decreasing ribosomal proteins, but at 72 hours all proteins are depressed. The transient bump-up of non-translation-related proteins seems odd. A simple resolution to this somewhat strange observation is that there is no real increase in the other proteins, but because of the loss of a large fraction of the most abundant cellular proteins (the ribosomal proteins), the relative fraction of all other proteins is increased; that is, the increase of non-ribosomal proteins may be an artifact of normalization to a lower total protein content. Can this be explored?

      We are grateful to the reviewer for this comment. We have tried our best to respond, as detailed above in response to Reviewer #1 Public Comment #2.

      (2) It would be really nice to assess nucleolar status microscopically. Do nucleoli get bigger? Smaller? Do they have abnormal morphology? Is there nucleolar stress? What happens to rRNA synthesis and processing?

      We agree and thank the reviewer for raising this point. As noted in our response to Reviewer #2, above, we have included new IF that shows: (i) no obvious effect on nucleolar integrity, (ii) redistribution of NPM1 to the nucleoplasm (indicative of nucleolar stress), and (iii) induction of gamma-H2AX staining in apoptotic cells (indicative of apoptosis).

      Additionally, in response to this comment, we also looked at the impact of WIN site inhibitors on rRNA synthesis, using AzCyd labeling. These new data appear in Figure 3—figure supplement 3. Interestingly, these new data show that there is a progressive decline in rRNA synthesis, and that by 96 hours of treatment levels of both 18S and 28S rRNAs are reduced, again by about a factor of two. Our interpretation of this finding is that in response to the progressive decline in RPG transcription there is a secondary decrease in rRNA synthesis. This result is perhaps not surprising, but it does again add an important missing piece to our characterization of WIN site inhibitors and is further support for the concept that inhibition of ribosome production is a dominant part of the response to these agents.

      (3) The WINi elicited DNA damage is incompletely characterized, rather it is inferred from H2AX activation. Comet assays would help to confirm such damage.

      As noted in our response to Reviewer #2, our original inference of DNA damage, prompted by gamma-H2AX activation, is erroneous, and due instead to the ability of WIN site inhibitors to induce apoptosis. We thus did not pursue comet assays, etc., and removed discussion of potential DNA damage from the manuscript.

      (4) Staining and microscopic observation of H2AX would be very useful. Is the WINi provoked DNA damage nucleolar-localized? Does the deficiency of ribosomal proteins lead to localized genotoxic nucleolar stress - or alternatively does the paucity of ribosomes and decreased translation lead to imbalances in other cellular pathways, perhaps including some involved in overall genome maintenance which would provoke more global DNA damage and H2AX staining, not limited to the nucleolus.

      Again, please see our response to the Public Comment from Reviewer #2.

      (5) It would be important to assess the influence and effects of WINi on some p53 mutant, p53-/- and p53 wild-type cell lines. Given their prevalence, p53 status may be expected to alter WINi efficacy.

      The issue of how p53 status impacts the response to WINi is interesting and important, but we feel this is beyond the scope of the current manuscript. It is likely that many factors contribute to the response of cancer cells to these agents, and thus simply surveying some cancer lines for their response and linking this to their p53 status is unlikely to be very informative. Making definitive statements about the contribution of p53, and the differences between wild-type, loss-of-function mutants, gain-of-function mutants, and null mutants will require more extensive analyses and is fertile territory for future studies, in our opinion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a useful study examining the determinants and mechanisms of LRMP inhibition of cAMP regulation of HCN4 channel gating. The evidence provided to support the main conclusions is unfortunately incomplete, with discrepancies in the work that reduce the strength of mechanistic insights.

      Thank you for the reviews of our manuscript. We have made a number of changes to clarify our hypotheses in the manuscript and addressed all of the potential discrepancies by revising some of our interpretation. In addition, we have provided additional experimental evidence to support our conclusions. Please see below for a detailed response to each reviewer comment.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors use truncations, fragments, and HCN2/4 chimeras to narrow down the interaction and regulatory domains for LRMP inhibition of cAMP-dependent shifts in the voltage dependence of activation of HCN4 channels. They identify the N-terminal domain of HCN4 as a binding domain for LRMP, and highlight two residues in the C-linker as critical for the regulatory effect. Notably, whereas HCN2 is normally insensitive to LRMP, putting the N-terminus and 5 additional C-linker and S5 residues from HCN4 into HCN2 confers LRMP regulation in HCN2.

      Strengths:

      The work is excellent, the paper well written, and the data convincingly support the conclusions which shed new light on the interaction and mechanism for LRMP regulation of HCN4, as well as identifying critical differences that explain why LRMP does not regulate other isoforms such as HCN2.

      Thank you.

      Reviewer #2 (Public Review):

      Summary:

      HCN-4 isoform is found primarily in the sino-atrial node where it contributes to the pacemaking activity. LRMP is an accessory subunit that prevents cAMP-dependent potentiation of HCN4 isoform but does not have any effect on HCN2 regulation. In this study, the authors combine electrophysiology, FRET with standard molecular genetics to determine the molecular mechanism of LRMP action on HCN4 activity. Their study shows that parts of N- and C-termini along with specific residues in C-linker and S5 of HCN4 are crucial for mediating LRMP action on these channels. Furthermore, they show that the initial 224 residues of LRMP are sufficient to account for most of the activity. In my view, the highlight of this study is Fig. 7 which recapitulates LRMP modulation on HCN2-HCN4 chimera. Overall, this study is an excellent example of using time-tested methods to probe the molecular mechanisms of regulation of channel function by an accessory subunit.

      Weaknesses:

      (1) Figure 5A- I am a bit confused with this figure and perhaps it needs better labeling. When it states Citrine, does it mean just free Citrine, and "LRMP 1-230" means LRMP fused to Citrine which is an "LF" construct? Why not simply call it "LF"? If there is no Citrine fused to "LRMP 1-230", this figure would not make sense to me.

      We have clarified the labelling of this figure and specifically defined all abbreviations used for HCN4 and LRMP fragments in the results section on page 14.

      (2) Related to the above point- Why is there very little FRET between NF and LRMP 1-230? The FRET distance range is 2-8 nm which is quite large. To observe baseline FRET for this construct more explanation is required. Even if one assumes that about 100 amino acids are completely disordered (not extended) polymers, I think you would still expect significant FRET.

      FRET is extremely sensitive to distance (it falls off with the 6th power of distance). The difference in contour length (the maximum length of a peptide if fully extended) between our ~260 aa fragment and our ~130 aa fragments is on the order of 450 Å (45 nm). So, even if the fragments are not extended, it is not hard to imagine that the larger fragments show a weaker FRET signal. In fact, we do see a slightly larger FRET than we do in control (not significant), which is consistent with the idea that the larger fragments just do not result in a large FRET.

      Moreover, this hybridization assay is sensitive to a number of other factors including the affinity between the two fragments, the expression of each fragment, and the orientation of the fluorophores. Any of these factors could also result in reduced FRET.

      We have added a section on the limitations of the FRET 2-hybrid assay in the discussion section on page 20. Our goal with the FRET assay was to provide complementary evidence that shows some of the regions that are important for direct association, and we have edited the text to make sure we are not over-interpreting our results.
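
      To make the distance argument in the response above concrete, the following minimal Python sketch evaluates the standard Förster relation, E = 1 / (1 + (r/R0)^6), alongside a rough contour-length estimate. The Förster radius of 5 nm and the spacing of 0.35 nm per residue are illustrative assumptions, not values taken from the manuscript, and the function names are hypothetical.

```python
def fret_efficiency(r_nm, r0_nm=5.0):
    """Foerster relation: E = 1 / (1 + (r / R0)^6).

    r0_nm = 5.0 is an assumed, illustrative Foerster radius.
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def contour_length_nm(n_residues, nm_per_residue=0.35):
    """Maximum end-to-end length of a fully extended peptide.

    nm_per_residue = 0.35 is an assumed per-residue spacing.
    """
    return n_residues * nm_per_residue


if __name__ == "__main__":
    # Efficiency drops steeply once the separation exceeds R0.
    for r in (2.5, 5.0, 7.5, 10.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")

    # Extra contour length contributed by ~130 additional residues.
    extra = contour_length_nm(260 - 130)
    print(f"extra contour length ~ {extra:.1f} nm")
```

      Under these assumptions, doubling the separation from R0 to 2·R0 reduces the efficiency from 50% to under 2%, so even a partly compacted longer fragment could plausibly fall outside the detectable range.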

      (3) Unless I missed this, have all the Cerulean and Citrine constructs been tested for functional activity?

      All Citrine-tagged LRMP constructs (or close derivatives) were tested functionally by co-expression with HCN (see Table 1 and pages 10-11). Cerulean-tagged HCN4 fragments are of course intrinsically non-functional as they do not include the ion-conducting pore.

      Reviewer #3 (Public Review):

      Summary:

      Using patch clamp electrophysiology and Förster resonance energy transfer (FRET), Peters and co-workers showed that the disordered N-terminus of both LRMP and HCN4 are necessary for LRMP to interact with HCN4 and inhibit the cAMP-dependent potentiation of channel opening. Strikingly, they identified two HCN4-specific residues, P545 and T547 in the C-linker of HCN4, that are close in proximity to the cAMP transduction centre (elbow C-linker, S4/S5-linker, HCND) and account for the LRMP effect.

      Strengths:

      Based on these data, the authors propose a mechanism in which LRMP specifically binds to HCN4 via its isotype-specific N-terminal sequence and thus prevents the cAMP transduction mechanism by acting at the interface between the elbow C-linker, the S4-S5 linker, and the HCND.

      Weaknesses:

      Although the work is interesting, there are some discrepancies between data that need to be addressed.

      (1) I suggest inserting in Table 1 and in the text, the Δ shift values (+cAMP; + LRMP; +cAMP/LRMP). This will help readers.

      Thank you, Δ shift values have been added to Tables 1 and 2 as suggested.

      (2) Figure 1 is not clear, the distribution of values is anomalously high. For instance, in 1B the distribution of values of V1/2 in the presence of cAMP goes from -85 to -115. I agree that in the absence of cAMP, HCN4 in HEK293 cells shows some variability in V1/2 values, that nonetheless cannot be so wide (here the variability spans sometimes even 30 mV) and usually disappears with cAMP (here not).

      With a large N, this is an expected distribution. In 5 previous reports from 4 different groups of HCN4 with cAMP in HEK 293 (Fenske et al., 2020; Liao et al., 2012; Peters et al., 2020; Saponaro et al., 2021; Schweizer et al., 2010), the average expected range of the data is 26.6 mV and 39.9 mV for 95% (mean ± 2SD) and 99% (mean ± 3SD) of the data, respectively. As the reviewer mentions, the expected range from these papers is slightly larger in the absence of cAMP. The average SDs of HCN4 (with/without cAMP) in these papers are 9.9 mV (Schweizer et al., 2010), 4.4 mV (Saponaro et al., 2021), 7.6 mV (Fenske et al., 2020), 10.0 mV (Liao et al., 2012), and 5.9 mV (Peters et al., 2020). Our SD in this paper is roughly in the middle at 7.6 mV. This is likely because we used an inclusive approach to data so as not to bias our results (see the statistics section of the revised manuscript on page 9). We have removed 2 data points that meet the statistical classification of outliers; no measures of statistical significance were altered by this.
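
      The arithmetic behind the ranges quoted above can be sketched in a few lines of Python. The SD values are those listed in the response; the helper name `interval_width` is hypothetical, and the 6.65 mV figure is only a back-calculation from the quoted 26.6 mV range (26.6 / 4), not a number stated in the manuscript.

```python
from statistics import median

# Average SDs (mV) quoted in the response for HCN4 with/without cAMP.
reported_sd_mv = {
    "Schweizer et al., 2010": 9.9,
    "Saponaro et al., 2021": 4.4,
    "Fenske et al., 2020": 7.6,
    "Liao et al., 2012": 10.0,
    "Peters et al., 2020": 5.9,
}
study_sd_mv = 7.6  # SD reported for the present study


def interval_width(sd_mv, k):
    """Width in mV of the interval mean - k*SD to mean + k*SD."""
    return 2 * k * sd_mv


if __name__ == "__main__":
    # The study SD sits at the median of the five literature values.
    print("median literature SD:", median(reported_sd_mv.values()))

    # Spread expected to contain ~95% of points at the study SD.
    print("mean +/- 2SD width:", round(interval_width(study_sd_mv, 2), 1))

    # Back-calculated SD (assumption) reproducing the quoted ranges.
    implied_sd = 6.65
    print("2SD range:", round(interval_width(implied_sd, 2), 1))
    print("3SD range:", round(interval_width(implied_sd, 3), 1))
```

      With an SD of 7.6 mV, the mean ± 2SD interval is about 30 mV wide, which is the same order as the spread the reviewer noted in the figure.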

      This problem is spread throughout the manuscript, and the measured mean effects are indeed always at the limit of statistical significance. Why so? Is this a problem with the analysis, or with the recordings?

      The exact P-values are NOT typically at the limit of statistical significance; about two-thirds would meet the stringent P < 0.0001 cut-off. We have clarified in the statistics section (page 10) that any comparison meeting our significance threshold (P < 0.05) or a stricter criterion is treated equally in the figure labelling. Exact P-values are provided in Tables 1-3.

      There are several other problems with Figure 1 and in all figures of the manuscript: the Y scale is very narrow while the mean values are marked with large square boxes. Moreover, the exemplary activation curve of Figure 1A is not representative of the mean values reported in Figure 1B, and the values of 1B are different from those reported in Table 1.

      Y-axis values for mean plots were picked such that all data points are included and are consistent across all figures. They have been expanded slightly (-75 to -145 mV for all HCN4 channels and -65 to -135 mV for all HCN2 channels). The size of the mean value marker has been reduced slightly. Exact midpoints for all data are also found in Tables 1-3.

      The GV curves in Figure 1B (previously Fig. 1A) are averages with the ±SEM error bars smaller than the symbols in many cases owing to relatively high n’s for these datasets. These curves match the midpoints in panel 1C (previously 1B). For example, the midpoint of the average curve for HCN4 control in panel A is -117.9 mV, essentially the same as the -117.8 mV average for the individual fits in panel B.

      We made an error in the text based on a previous manuscript version about the ordering of the tables that has now been fixed so these values should now be aligned.

      On this ground, it is difficult to judge the conclusions and it would also greatly help if exemplary current traces would be also shown.

      Exemplary current traces have been added to all figures in the revised manuscript.

      (3) "....HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP. Thus, LRMP appears to regulate HCN4 by altering the interactions between the C-linker, S4-S5 linker, and N-terminus at the cAMP transduction centre."

      Although this is an interesting theory, there are no data supporting it. Indeed, P545 and T547 at the tip of the C-linker elbow (fig 6A) are crucial for LRMP effect, but these two residues are not involved in the cAMP transduction centre (interface between HCND, S4-S5 linker, and C-linker elbow), at least for the data accumulated till now in the literature. Indeed, the hypothesis that LRMP somehow inhibits the cAMP transduction mechanism of HCN4 given the fact that the two necessary residues P545 and T547 are close to the cAMP transduction centre, remains to be proven.

      Moreover, I suggest analysing the putative role of P545 and T547 in light of the available HCN4 structures. In particular, T547 (elbow) points towards the underlying shoulder of the adjacent subunit and, therefore, is in a key position for the cAMP transduction mechanism. The presence of bulky hydrophobic residues (very different nature compared to T) in the equivalent position of HCN1 and HCN2 also favours this hypothesis. In this light, it will be also interesting to see whether a single T547F mutation is sufficient to prevent the LRMP effect.

      We agree that testing this hypothesis would be very interesting. However, it is challenging. Any mutation we make that is involved in cAMP transduction makes measuring the LRMP effect on cAMP shifts difficult or impossible.

      Our simple idea, now clarified in the discussion, is that if you look at the regions involved in cAMP transduction (HCND, C-linker, S4-S5), there are very few residues that differ between HCN4 and HCN2. When we mutate the 5 non-conserved residues in the S5 segment and the C-linker, along with the NT, we are able to render HCN2 sensitive to LRMP. Therefore, something about the small sequence differences in this region confer isoform specificity to LRMP. We speculate that this happens because of small structural differences that result from those 5 mutations. If you compare the solved structures of HCN1 and HCN4 (there is no HCN2 structure available), you can see small differences in the distances between key interacting residues in the transduction centre. Also, there is a kink at the bottom of the S4 helix in HCN4 but not HCN1. This points a putatively important residue for cAMP dependence in a different direction in HCN4. We hypothesize in the discussion that this may be how LRMP is isoform specific.

      Moreover, previous work has shown that the HCN4 C-linker is uniquely sensitive to di-cyclic nucleotides and magnesium ions. We are hypothesizing that it is the subtle change in structure that makes this region more prone to regulation in HCN4.

      Reviewing Editor (recommendations for the Authors):

      (1) Exemplar recordings need to be shown and some explanation for the wide variability in the V-half of activation.

      Exemplar currents are now shown for each channel. See the response to Reviewer 3’s public comment 2.

      (2) The rationale for cut sites in LRMP for the investigation of which parts of the protein are important for blocking the effect of cAMP is not logically presented in light of the modular schematics of domains in the protein (N-term, CCD, post-CCD, etc).

      There is limited structural data on LRMP and the HCN4 N-terminus. The cut sites in this paper were determined empirically. We made fragments that were small enough to work for our FRET hybridization approach and that expressed well in our HEK cell system. The residue numbering of the LRMP modules is based on updated structural predictions using AlphaFold, which was released after our fragments were designed. This has been clarified in the methods section on pages 5-6 and the Figure 2 legend of the revised manuscript.

      (3) Role of the HCN4 C-terminus. Truncation of the unstructured HCN4 C-terminus distal to the CNBD (Fig. 4 A, B) partially reverses the impact of LRMP (i.e. there is now a significant increase in cAMP effect compared to full-length HCN4). The manuscript is written in a manner that minimizes the potential role of the C-terminus and it is, therefore, eliminated from consideration in subsequent experiments (e.g. FRET) and the discussion. The model is incomplete without considering the impact of the C-terminus.

      We thank the reviewer for this comment as it was a result that we too readily dismissed. We have added discussion around this point and revised our model to suggest that not only can we not eliminate a role for the distal C-terminus, our data is consistent with it having a modest role. Our HCN4-2 chimera and HCN4-S719x data both suggest the possibility that the distal C-terminus might be having some effect on LRMP regulation. We have clarified this in the results (pages 12-13) and discussion (page 19).

      (4) For FRET experiments, it is not clear why LF should show an interaction with N2 (residues 125-160) but not NF (residues 1-160). N2 is contained within NF, and given that Citrine and Cerulean are present on the C-terminus of LF and N2/NF, respectively, residues 1-124 in NF should not impact the detection of FRET because of greater separation between the fluorophores as suggested by the authors.

      This is a fair point but FRET is somewhat more complicated. We do not know the structure of these fragments and it’s hard to speculate where the fluorophores are oriented in this type of assay. Moreover, this hybridization assay is sensitive to affinity and expression as well. There are a number of reasons why the larger 1-260 fragment might show reduced FRET compared to 125-260. As mentioned in our response to reviewer 2’s public comment 2, we have added a limitation section that outlines the various caveats of FRET that could explain this.

      (5) For FRET experiments, the choice of using pieces of the channel that do not correlate with the truncations studied in functional electrophysiological experiments limits the holistic interpretation of the data. Also, no explanation or discussion is provided for why LRMP fragments that are capable of binding to the HCN4 N-terminus as determined by FRET (e.g. residues 1-108 and 110-230, respectively) do not have a functional impact on the channel.

      As mentioned in the response to comment 2, the exact fragment design is a function of which fragments expressed well in HEK cells. Importantly, because FRET experiments do not provide atomic resolution for the caveats listed in the revised limitations section on pages 20-21, small differences in the cut sites do not change the interpretation of these results. For example, the N-terminal 1-125 construct is analogous to experiments with the Δ1-130 HCN4 channel.

      We suspect that residues in both fragments are required and that the interaction involves multiple parts. This is stated in the results “Thus, the first 227 residues of LRMP are sufficient to regulate HCN4, with residues in both halves of the LRMP N-terminus necessary for the regulation” (page 11). We have also added discussion on this on page 21.

      (6) A striking result was that mutating two residues in the C-linker of HCN4 to amino acids found in HCN channels not affected by LRMP (P545A, T547F), completely eliminated the impact of LRMP on preventing cAMP regulation of channel activation. However, a chimeric channel, (HCN4-2) in which the C-linker, the CNBD, and the C-terminus of HCN4 were replaced by that of HCN2 was found to be partially responsive to LRMP. These two results appear inconsistent and not reconciled in the model proposed by the authors for how LRMP may be working.

      As stated in our answer to your question #3, we have revised our interpretation of these data. If the more distal C-terminus plays some role in the orientation of the C-linker and the transduction centre as a whole, these data can still be viewed consistent with our model. We have added some discussion of this idea in our discussion section.

      (7) Replacing the HCN2 N-terminus with that from HCN4, along with mutations in the S5 (MCS/VVG) and C-linker (AF/PT) recapitulated LRMP regulation on the HCN2 background. The functional importance of the S5 mutations is not clear as no other experiments are shown to indicate whether they are necessary for the observed effect.

      We have added our experiments on a midpoint HCN2 clone that includes the S5 mutants and the C-linker mutants in the absence of the HCN4 N-terminus (i.e., HCN2 MCSAF/VVGPT) (Fig. 7). We have also discussed our rationale for the S5 mutations, as we believe they may be responsible for the different orientations of the S4-S5 linker in HCN1 and HCN4 structures that are known to impact cAMP regulation.

      Reviewer #1 (Recommendations For The Authors):

      A) Comments:

      (1) Figure 1: Please show some representative current traces.

      Exemplar currents are now shown for each channel in the manuscript.

      (2) Figure 1: There appears to be a huge number of recordings for HCN4 +/- cAMP as compared to those with LRMP 1-479Cit. How was the number of recordings needed for sufficient statistical power decided? This is particularly important because the observed slowing of deactivation by cAMP in Fig. 1C seems like it may be fairly subtle. Perhaps a swarm plot would make the shift more apparent? Also, LRMP 1-479Cit distributions in Fig. 1B-C look like they are more uniform than normal, so please double-check the appropriateness of the statistical test employed.

      We have revised the methods section (page 7) to discuss this. Briefly, we performed regular control experiments throughout this project to ensure that a normal cAMP response was occurring. Our minimum target for sufficient power was 8-10 recordings. We have expanded the statistics section (page 9) to discuss tests of normality and the use of a log scale for deactivation time constants, which is why the shifts in Fig. 1D (revised) are less apparent.

      (3) It would be helpful if the authors could better introduce their logic for the M338V/C341V/S345G mutations in the HCN4-2 VVGPT mutant.

      See response to the reviewing editor’s comment 7.

      B) Minor Comments:

      (1) pg. 9: "We found that LRMP 1-479Cit inhibited HCN4 to an even greater degree than the full-length LRMP, likely because expression of this tagged construct was improved compared to the untagged full-length LRMP, which was detected by co-transfection with GFP." Co-transfection with GFP seems like an extremely poor and a risky measure for LRMP expression.

      We agree that the exact efficiency of co-transfection is contentious, although some papers and manufacturer protocols indicate high co-transfection efficiency (Xie et al., 2011). In this paper we used both co-transfection and tagged proteins with similar results.

      (2) pg 9: "LRMP 1-227 construct contains the N-terminus of LRMP with a cut-site near the N-terminus of the predicted coiled-coil sequence". In Figure 2 the graphic shows the coiled-coil domain starting at 191. What was the logic for splitting at 227 which appears to be the middle of the coiled-coil?

      See response to the reviewing editor’s comment 2.

      (3) Figure 5C: Please align the various schematics for HCN4 as was done for LRMP. It makes it much easier to decipher what is what.

      Fig. 5 has been revised as suggested.

      (4) pg 12: I assume that the HCN2 fragment chosen aligns with the HCN4 N2 fragment which shows binding, but this logic should be stated if that is the case. If not, then how was the HCN2 fragment chosen?

      This is correct. This has been explicitly stated in the revised manuscript (page 14).

      (5) Figure 7: Add legend indicating black/gray = HCN4 and blue = HCN2.

      This has been stated in the revised figure legend.

      (6) pg 17: Conservation of P545 and T547 across mammalian species is not shown or cited.

      This sentence is not included in the revised manuscript, however, for the interest of the reviewer we have provided an alignment of this region across species here.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is not clear whether in the absence of cAMP, LRMP also modestly shifts the voltage-dependent activity of the channels. Please clarify.

      We have clarified that LRMP does not shift the voltage-dependence in the absence of cAMP (page 10). In the absence of cAMP, LRMP does not significantly shift the voltage-dependence of activation in any of the channels we have tested in this paper (or in our prior 2020 paper).

      (2) Resolution of Fig. 8b is low.

      We ultimately decided that the cartoon did not provide any important information for understanding our model and it was removed.

      (3) Please add a supplementary figure showing the amino acid sequence of LRMP to show where the demarcations are made for each fragment as well as where the truncations were made as noted in Fig 3 and Fig 4.

      A new supplementary figure showing the LRMP sequence has been added and cited in the methods section (page 5). Truncation sites have been added to the schematic in Fig. 2A.

      (4) In the cartoon schematic illustration for Fig. 3 and Fig.4, the legend should include that the thick bold lines in the C-Terminal domain represent the CNBD, while the thick bold lines in the N-Terminal domain represent the HCN domain. This was mentioned in Liao 2012, as you referenced when you defined the construct S719X, but it would be nice for the reader to know that the thick bold lines you have drawn in your cartoon indicate that it also highlights the CNBD or the HCN domain.

      This has been added to figure legends for the relevant figures in the revised manuscript.

      (5) On page 12, missing a space between "residues" and "1" in the parenthesis "...LRMP L1 (residues1-108)...".

      Fixed. Thank you.

      (6) Which isoform of LRMP was used? What is the NCBI accession number? Is it the same one from Peters 2020 ("MC228229")?

      This information has been added to the methods (page 5). It is the same as Peters 2020.

      Reviewer #3 (Recommendations For The Authors):

      (1) "Truncation of residues 1-62 led to a partial LRMP effect where cAMP caused a significant depolarizing shift in the presence of LRMP, but the activation in the presence of LRMP and cAMP was hyperpolarized compared to cAMP alone (Fig. 3B, C and 3E; Table 1). In the HCN4Δ1-130 construct, cAMP caused a significant depolarizing shift in the presence of LRMP; however, the midpoint of activation in the presence of LRMP and cAMP showed a non-significant trend towards hyperpolarization compared to cAMP alone (Fig. 3C and 3E; Table 1)".

      This means that sequence 62-185 is necessary and sufficient for the LRMP effect. I suggest a competition assay with this peptide (synthetic, or co-expressed with HCN4 full-length and LRMP to see whether the peptide inhibits the LRMP effect).

      We respectfully disagree with the reviewer’s interpretation. Our results strongly suggest that other regions such as residues 25-65 (Fig. 3C) and C-terminal residues (Fig. 6) are also necessary. The use of a peptide could be an interesting future experiment; however, it would be very difficult to control relative expression of a co-expressed peptide. We think that our results in Fig. 7E-F where this fragment is added to HCN2 are a better controlled way of validating the importance of this region.

      (2) "Truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation. In the presence of both LRMP and cAMP the activation of HCN4-S719X was still significantly hyperpolarized compared to the presence of cAMP alone (Figs. 4A and 4B; Table 1). And the cAMP-induced shift in HCN4-S719X in the presence of LRMP (~7mV) was less than half the shift in the absence of LRMP (~18 mV)."

      On the basis of the partial effects reported for the truncations of the N-terminus of HCN4 Δ1-62 and Δ1-130 (Fig 3B and C), I do not think it is possible to conclude that "truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation". Indeed, cAMP-induced shift in HCN4 Δ1-62 and Δ1-130 in the presence of LRMP were 10.9 and 10.5 mV, respectively, way more than the ~7mV measured for the HCN4-S719X mutant.

      As you rightly stated at the end of the paragraph: "Together, these results show significant LRMP regulation of HCN4 even when the distal C-terminus is truncated, consistent with a minimal role for the C-terminus in the regulatory pathway". I would better discuss this minimal role of the C-terminus. It is true that deletion of the first 185 aa of HCN4 N-terminus abolishes the LRMP effect, but it is also true that removal of the very C-terminus of HCN4 does affect LRMP. This unstructured C-terminal region of HCN4 contains isotype-specific sequences. Maybe they also play a role in recognizing LRMP. Thus, I would suggest further investigation via truncations, even internal deletions of HCN4-specific sequences.

      Please see the response to the reviewing editor’s comment 3.

      (3) Figure 5: The N-terminus of LRMP FRETs with the N-terminus of HCN4.

      Why didn't you test the same truncations used in Fig. 3? Indeed, based on Fig 3, sequences 1-25 can be removed. I would have considered peptides 26-62 and 63-130 and 131-185 and a fourth (26-185). This set of peptides will help you connect binding with the functional effects of the truncations tested in Fig 3.

      Please see the response to the reviewing editor’s comment 2 and 5.

      Why didn't you test the C-terminus (from 719 till the end) of HCN4? This can help with understanding why truncation of the HCN4 C-terminus does affect LRMP regulation, though only partially (Fig. 4A).

      Please see the response to the reviewing editor’s comment 3.

      (4) "We found that a previously described HCN4-2 chimera containing the HCN4 N-terminus and transmembrane domains (residues 1-518) with the HCN2 C-terminus (442-863) (Liao et al., 2012) was partially regulated by LRMP (Fig. 7A and 7B)".

      I do not understand this partial LRMP effect on the HCN4-2 chimera. In Fig. 6 you have shown that the "HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP". How can this be reconciled with the HCN4-2 chimera? HCN4-2, "containing" P545A/T547F mutations, should not perceive LRMP.

      Please see the response to the reviewing editor’s comment 6.

      (5) "we next made a targeted chimera of HCN2 that contains the distal HCN4 N-terminus (residues 1-212) and the HCN2 transmembrane and C-terminal domains with 5 point mutants in non-conserved residues of the S5 segment and C-linker elbow (M338V/C341V/S345G/A467P/F469T)......Importantly, the HCN4-2 VVGPT channel is insensitive to cAMP in the presence of LRMP (Fig. 7C and 7D), indicating that the HCN4 N-terminus and cAMP-transduction centre residues are sufficient to confer LRMP regulation to HCN2".

      Why did you insert also the 3 mutations of S5? Are these mutations somehow involved in the cAMP transduction mechanism?

      You have already shown that in HCN4 only P545 and T547 (C-linker) are necessary for the LRMP effect. I suggest trying, at least, the chimera of HCN2 with only A467P/F469T. They should work without the 3 mutations in S5.

      Please see the response to the reviewing editor’s comment 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Pan DY et al. discovered that the clearance of senescent osteoclasts can lead to a reduction in sensory nerve innervation. This reduction is achieved through the attenuation of Netrin-1 and NGF levels, as well as the regulation of H-type vessels, resulting in a decrease in pain-related behavior. The experiments are well-designed. The results are clearly presented, and the legends are also clear and informative. Their findings represent a potential treatment for spine pain utilizing senolytic drugs.

      Strengths:

      Rigorous data, well-designed experiments as well as significant innovation make this manuscript stand out.

      Weaknesses:

      Quantification of histology and detailed statistical analysis will further strengthen this manuscript.

      I have the following specific comments.

      (1) Since defining senescent cells solely based on one or two markers (SA-β-gal and p16) may not provide a robust characterization, it would be advisable to employ another well-established senescence marker, such as γ-H2AX or HMGB1, to corroborate the observed increase in senescent osteoclasts following LSI and aging.

      We value the comments provided by the reviewer. In accordance with your suggestion, we have performed co-staining of HMGB1 with Trap in Supplementary Figure 1 to corroborate the observed augmentation of senescent osteoclasts following LSI and aging.

      Author response image 1.

      (2) The connection between heightened Netrin-1 secretion by senescent osteoclasts following LSI or aging and its relevance to pain warrants thorough discussion within the manuscript to provide a comprehensive understanding of the entire narrative.

      We appreciate the reviewer's insightful comments. We have thoroughly addressed the entire narrative in the revised manuscript, as outlined below:

      During lumbar spine instability (LSI) or aging, endplates undergo ossification, leading to elevated osteoclast activity and increased porosity (1-4). The progressive porous transformation of endplates, accompanied by a narrowed intervertebral disc (IVD) space, is a hallmark of spinal degeneration (4,5). Considering that pain arises from nociceptors, it is plausible that low back pain (LBP) may be attributed to sensory innervation within endplates. Additionally, porous endplates exhibit higher nerve density compared to normal endplates or degenerative nucleus pulposus (6). Netrin-1, a crucial axon guidance factor facilitating nerve protrusion, has been implicated in this process (7-9). The receptor mediating Netrin-1-induced neuronal sprouting, deleted in colorectal cancer (DCC), was found to co-localize with CGRP+ sensory nerve fibers in endplates after LSI surgery (10,11). In summary, during LSI or aging, osteoclastic lineage cells secrete Netrin-1, inducing extrusion and innervation of CGRP+ sensory nerve fibers within the spaces created by osteoclast resorption. This Netrin-1/DCC-mediated pain signal is subsequently transmitted to the dorsal root ganglion (DRG) or higher brain levels.

      (3) It appears that the quantitative data for TRAP staining in Figure 1j is missing.

      We appreciate the reviewer's comments. We have added the statistical data of TRAP staining (Figure 1p) to Figure 1 in the revised manuscript.

      Author response image 2.

      (4) Regarding Figure 6, could you please specify which panels were analyzed using a t-test and which ones were subjected to ANOVA? Alternatively, were all the panels in Figure 6 analyzed using ANOVA?

      We appreciate the reviewer’s comments here. Upon careful review, we have ensured that quantitative data in panels b, c, and f are analyzed using t-tests, while panels d, e, and g are subjected to one-way ANOVA. These updates have been reflected in the revised figure legend.
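
      The split between t-tests and one-way ANOVA described above can be illustrated with a minimal sketch. This is not the analysis code used in the study: the group values below are hypothetical placeholders, and the two helper functions are written out with the standard library only for illustration. The sketch shows that for a two-group panel the one-way ANOVA F statistic equals the square of the pooled-variance t statistic, while ANOVA extends naturally to panels with three or more groups.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def student_t(a, b):
    """Return the pooled-variance (Student) t statistic for two samples."""
    na, nb = len(a), len(b)
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


# Hypothetical placeholder measurements, NOT data from the manuscript.
sham = [2.0, 2.2, 1.9, 2.4]
pbs = [4.1, 3.8, 4.5, 4.0]
abt = [2.9, 3.1, 2.7, 3.3]

t = student_t(pbs, abt)                      # two-group panel: t-test
f_two = one_way_anova_f([pbs, abt])          # same two groups through ANOVA
f_three = one_way_anova_f([sham, pbs, abt])  # three-group panel: ANOVA
```

      In practice a statistics package would also supply the p-values and any post hoc comparisons; the sketch only makes the relationship between the two tests concrete.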

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examined the underlying mechanisms between senescent osteoclasts (SnOCs) and lumbar spine instability (LSI) or aging. They first showed that greater numbers of SnOCs are observed in mouse models of LSI or aging, and these SnOCs are associated with induced sensory nerve innervation, as well as the growth of H-type vessels, in the porous endplate. Then, the deletion of senescent cells by administration of the senolytic drug Navitoclax (ABT263) results in significantly less spinal hypersensitivity, spinal degeneration, porosity of the endplate, sensory nerve innervation, and H-type vessel growth in the endplate. Finally, they also found that there is greater SnOC-mediated secretion of Netrin-1 and NGF, two well-established sensory nerve growth factors, compared to non-senescent OCs. The study is well conducted and data strongly support the idea. However, some minor issues need to be addressed.

      (1) In Figure 2C, "Number of SnCs/mm2", SnCs should be SnOCs.

      We apologize for the oversight. This has been rectified in the revised manuscript.

      Author response image 3.

      (2) In Figure 3A-E, is there any statistical difference between groups Young and Aged+PBS?

      We appreciate the reviewer's comments. Following your recommendation, we conducted additional statistical analyses to compare the young and PBS-treated aged mice, and we have incorporated these findings into the revised manuscript. The data reveal a significantly increased paw withdrawal frequency (PWF) in aged mice treated with PBS compared with young mice, particularly at 0.4 g rather than 0.07 g (Figure 3a, 3b). Moreover, aged mice treated with PBS exhibited a significant reduction in both distance traveled and active time when compared to young mice (Figure 3d, 3e). Additionally, PBS-treated aged mice demonstrated a significantly shortened heat response time relative to young mice (Figure 3c).

      Author response image 4.

      (3) Again, is there any statistical difference between the Young and Aged+PBS groups in Figure 4F-K?

      We appreciate the reviewer's comments. As per your suggestion, we conducted a thorough analysis to determine the statistical differences between the young and aged+PBS groups, and these statistical results have been implemented in the revised manuscript. The caudal endplates of L4/5 in PBS-treated aged mice exhibited a significant increase in endplate porosity (Figure 4f) and trabecular separation (Tb.Sp) (Figure 4g) compared to young mice.

      Additionally, PBS-treated aged mice showed a significant elevation in endplate score (Figure 4h), as well as an increased distribution of MMP13 and ColX within the endplates when compared to young mice (Figure 4i, 4j). Furthermore, TRAP staining revealed a significant increase in TRAP+ osteoclasts within the endplates of PBS-treated aged mice as compared to young mice (Figure 4k).

      Author response image 5.

      (4) What is the figure legend of Figure 7?

      The legend for Figure 7 (as below) is included in a separate PDF file labeled 'Figures and Legends.' We have carefully checked the revised manuscript and made sure all the legends are included.

      “Fig. 7. (a) Representative images of immunofluorescent analysis of CD31, an angiogenesis marker (green), Emcn, an endothelial cell marker (red) and nuclei (DAPI; blue) of adult sham, LSI and aged mice injected with PBS or ABT263. (b) Quantitative analysis of the intensity mean value of CD31 per mm2 in sham, LSI mice treated with PBS or ABT263. (c) Quantitative analysis of the intensity mean value of CD31 per mm2 in aged mice treated with PBS or ABT263. (d) Quantitative analysis of the intensity mean value of Emcn per mm2 in sham, LSI mice treated with PBS or ABT263. (e) Quantitative analysis of the intensity mean value of Emcn per mm2 in aged mice treated with PBS or ABT263. n ≥ 4 per group. Statistical significance was determined by one-way ANOVA, and all data are shown as means ± standard deviations.”

      (5) In "Mice" section, an Ethical code is suggested to be added.

      We appreciate the reviewer's comments. In accordance with your suggestion, we have included the Johns Hopkins University animal protocol number in the revised manuscript. The relevant paragraph has been updated to read: “All mice were maintained at the animal facility of The Johns Hopkins University School of Medicine (protocol number: MO21M276).”

      (6) In "Methods" section, please indicate the primers of GAPDH.

      We apologize for the absence of the GAPDH primers. Upon review, the GAPDH primers used were as follows: forward primer 5'-ATGTGTCCGTCGTGGATCTGA-3' and reverse primer 5'-ATGCCTGCTTCACCACCTTCTT-3'. These primer sequences have been included in the revised manuscript.

      (7) Preosteoclasts are regarded to be closely related to H-type vessel growth, so do the authors have any comments on this? Any difference or correlation between SnCs and preosteoclasts?

      The pre-osteoclast plays a crucial role in secreting anabolic growth factors that facilitate H-type vessel formation, osteoblast chemotaxis, proliferation, differentiation, and mineralization. The osteoclast represents the terminal differentiation phase, ultimately leading to the induction of resorption.

      Senescent cells, including senescent osteoclasts, are characterized by permanent cell cycle arrest and changes in their secretory profile, which can impact their function. In the context of osteoclasts, senescence can lead to a reduction in bone resorption capacity and impaired bone remodeling. Senescent osteoclasts are believed to contribute to age-related bone loss and bone-related diseases, such as osteoporosis.

      Reviewer #3 (Public Review):

      Summary:

      This research article reports that a greater number of senescent osteoclasts (SnOCs), which produce Netrin-1 and NGF, are responsible for innervation in the LSI and aging animal models.

      Strengths:

      The research is based on previous findings in the authors' lab and the fact that the IVD structure was restored by treatment with ABT263. The logic is clear and clarifies the pathological role of SnOCs, suggesting the potential utilization of senolytic drugs for the treatment of LBP. Generally, the study is of good quality and the data is convincing.

      Weaknesses:

      There are some points that can be improved:

      (1) Since this work primarily focuses on ABT263, it resembles a pharmacological study for this drug. It is preferable to provide references for the ABT263 concentration and explain how the administration was determined.

      Thank you for your comment. ABT263 has been extensively employed in diverse research studies (12-15). The concentration and administration of ABT263 followed the protocol outlined in the published paper (13). The reference on how to use ABT263 is cited in the method section: “ABT263 was administered to mice by gavage at a dosage of 50 mg per kg body weight per day (mg/kg/d) for a total of 7 days per cycle, with two cycles conducted and a 2-week interval between them (39)”.

      (2) It would strengthen the study to include at least 6 mice per group for each experiment and analysis, which would provide a more robust foundation.

      Thank you for your comment here. In response, we conducted a new set of experiments, augmenting the majority of the sample size to six, and updated the corresponding statistical data in the revised manuscript.

      (3) In Figure 4, either use "adult" or "young" consistently, but not both. Additionally, it's important to define "sham," "young," and "adult" explicitly in the methods section.

      Thank you for your comment. We have addressed the inconsistency in the labeling of Figure 4. Additionally, we have explicitly defined "sham," "young," and "adult" in the methods section as follows: The control group (sham group) for the LSI group refers to C57BL/6J mice that did not undergo LSI surgery, while the control group (young group) for the Aged group refers to 4-month-old C57BL/6J mice.

      Author response image 6.

      (4) Assess the protein expression of Netrin 1 and NGF.

      Thank you for your comment here. We employed ELISA to assess the protein expression of Netrin-1 and NGF in the L3 to L5 endplates. The data revealed that, compared to the young sham mice, LSI was associated with significantly greater protein expression of Netrin-1 and NGF, which was substantially attenuated by ABT263 treatment in LSI mice (Supplementary Fig. 2a, 2b).

      Author response image 7.

      References

      (1) Bian, Q. et al. Excessive Activation of TGFbeta by Spinal Instability Causes Vertebral Endplate Sclerosis. Sci Rep 6, 27093, doi:10.1038/srep27093 (2016).

      (2) Bian, Q. et al. Mechanosignaling activation of TGFbeta maintains intervertebral disc homeostasis. Bone Res 5, 17008, doi:10.1038/boneres.2017.8 (2017).

      (3) Papadakis, M., Sapkas, G., Papadopoulos, E. C. & Katonis, P. Pathophysiology and biomechanics of the aging spine. Open Orthop J 5, 335-342, doi:10.2174/1874325001105010335 (2011).

      (4) Rodriguez, A. G. et al. Morphology of the human vertebral endplate. J Orthop Res 30, 280-287, doi:10.1002/jor.21513 (2012).

      (5) Taher, F. et al. Lumbar degenerative disc disease: current and future concepts of diagnosis and management. Adv Orthop 2012, 970752, doi:10.1155/2012/970752 (2012).

      (6) Fields, A. J., Liebenberg, E. C. & Lotz, J. C. Innervation of pathologies in the lumbar vertebral end plate and intervertebral disc. Spine J 14, 513-521, doi:10.1016/j.spinee.2013.06.075 (2014).

      (7) Hand, R. A. & Kolodkin, A. L. Netrin-Mediated Axon Guidance to the CNS Midline Revisited. Neuron 94, 691-693, doi:10.1016/j.neuron.2017.05.012 (2017).

      (8) Moore, S. W., Zhang, X., Lynch, C. D. & Sheetz, M. P. Netrin-1 attracts axons through FAK-dependent mechanotransduction. J Neurosci 32, 11574-11585, doi:10.1523/JNEUROSCI.0999-12.2012 (2012).

      (9) Serafini, T. et al. Netrin-1 is required for commissural axon guidance in the developing vertebrate nervous system. Cell 87, 1001-1014, doi:10.1016/s0092-8674(00)81795-x (1996).

      (10) Forcet, C. et al. Netrin-1-mediated axon outgrowth requires deleted in colorectal cancer-dependent MAPK activation. Nature 417, 443-447, doi:10.1038/nature748 (2002).

      (11) Shu, T., Valentino, K. M., Seaman, C., Cooper, H. M. & Richards, L. J. Expression of the netrin-1 receptor, deleted in colorectal cancer (DCC), is largely confined to projecting neurons in the developing forebrain. J Comp Neurol 416, 201-212, doi:10.1002/(sici)1096-9861(20000110)416:2<201::aid-cne6>3.0.co;2-z (2000).

      (12) Born, E. et al. Eliminating Senescent Cells Can Promote Pulmonary Hypertension Development and Progression. Circulation 147, 650-666, doi:10.1161/CIRCULATIONAHA.122.058794 (2023).

      (13) Chang, J. et al. Clearance of senescent cells by ABT263 rejuvenates aged hematopoietic stem cells in mice. Nat Med 22, 78-83, doi:10.1038/nm.4010 (2016).

      (14) Lim, S. et al. Local Delivery of Senolytic Drug Inhibits Intervertebral Disc Degeneration and Restores Intervertebral Disc Structure. Adv Healthc Mater 11, e2101483, doi:10.1002/adhm.202101483 (2022).

      (15) Yang, H. et al. Navitoclax (ABT263) reduces inflammation and promotes chondrogenic phenotype by clearing senescent osteoarthritic chondrocytes in osteoarthritis. Aging (Albany NY) 12, 12750-12770, doi:10.18632/aging.103177 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      The authors should include experiments such as Cryo-EM and genetically modified animals to demonstrate the physiological importance of the TMEM81 complex.

      While we intend to pursue cryo-EM studies of the putative complex (or subcomplexes thereof), this is clearly not a straightforward endeavor and goes beyond the scope of the present manuscript. Concerning the generation of genetically modified animals, we would like to underline that the majority of the proteins that we used for AlphaFold-Multimer complex predictions were precisely chosen based on the fact that - as detailed in the publications referenced in the Introduction - ablation of the respective genes caused sex-specific infertility due to defects in gamete fusion (the other criterion used for inclusion being structural similarity to IZUMO1 coupled with expression in the testis (IZUMO2-4 and TMEM81), or evidence from other kinds of experiments in the case of human-specific MAIA). Concerning TMEM81, experimental evidence for a direct involvement in gamete fusion is described in the referenced preprint by Daneke et al., which was submitted to bioRxiv concomitantly with the present work.

      Reviewer #2

      I believe that the manuscript would benefit from the authors providing more information about the systematic search (Figure 4). For example, by indicating for each pair tested the average pDock score in a 2D plot (or table) and as raw data in the supplementary information.

      Figure 4 has been modified to report both the top and the mean ranking scores for every interaction. Furthermore, additional metrics for the systematic search summarized in Figure 4, including pDockQ scores, are provided in this manuscript revision as supplementary Table S1.

      A global search, such as including all membrane proteins expressed in eggs or sperm, could not only be more informative but could also allow the reader to understand the pDock score discrimination power for this particular subset.

      The possibility of carrying out a global search was evaluated by performing preliminary computational experiments on an extended ensemble of sperm and egg proteins. In order to do so, we compiled a list of sperm membrane proteins by referring to 4 proteomic datasets (PMIDs 36384108, 36896575, 31824947, 24082039) and identifying ~600 proteins that were found in at least two of them; among these, 250 were single-pass type I or type II membrane proteins, or GPI-anchored proteins. Similarly, a list of 160 egg surface membrane proteins, excluding multipass and secreted ones, was obtained by comparing oocyte cDNA library NIH_MGC_257_N (Express Genomics, USA) with 4 proteomic datasets (PMIDs 35809850, 36042231, 29025019, 27215607). As we briefly commented at the beginning of the section “Prediction of interactions between human proteins associated with gamete fusion” of the revised manuscript, the tests carried out using the resulting list of sperm and egg proteins suggested that interpreting the results of a global search would be severely complicated by a relatively large number of putative false positives. Moreover, the tests showed that performing a complete systematic search would be beyond our current access to computing power. Based on these observations, we preferred to maintain the present study limited to proteins that had been previously clearly implicated in gamete fusion and/or matched specific structural features of IZUMO1.
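
      The candidate-list construction described above (keep proteins reported in at least two of four proteomic datasets, then retain only selected membrane topologies) can be sketched as follows. This is an illustrative sketch, not the script used for the study: the protein identifiers `P1` to `P6`, the topology labels, and the function names are hypothetical placeholders, and real input would come from the cited datasets and a curated annotation source.

```python
from collections import Counter


def found_in_at_least(datasets, min_count=2):
    """Return identifiers present in at least `min_count` of the datasets."""
    counts = Counter()
    for proteins in datasets:
        # set() guards against duplicate entries within a single dataset.
        counts.update(set(proteins))
    return sorted(p for p, c in counts.items() if c >= min_count)


def keep_topologies(proteins, topology, allowed):
    """Keep only proteins whose annotated topology is in `allowed`."""
    return [p for p in proteins if topology.get(p) in allowed]


# Hypothetical placeholder datasets and annotations, NOT real proteomic data.
datasets = [
    ["P1", "P2", "P3"],
    ["P1", "P4"],
    ["P2", "P4", "P5"],
    ["P6"],
]
topology = {
    "P1": "single-pass type I",
    "P2": "multi-pass",
    "P4": "GPI-anchored",
}
allowed = {"single-pass type I", "single-pass type II", "GPI-anchored"}

shared = found_in_at_least(datasets)
candidates = keep_topologies(shared, topology, allowed)
```

      With these placeholder inputs, `shared` holds the identifiers seen in two or more lists and `candidates` drops the multi-pass entry, mirroring the two filtering steps in the text.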

      Figure 5 could be improved in clarity by schematically indicating to which cell each protein is anchored.

      This has been done in the revised version of the manuscript.

      Reviewer #3

      Major comments

      (1) In Figure 1, how the protein of mouse/human IZUMO1 and JUNO is purified is not mentioned in the main text nor in the Methods. Are the mouse IZUMO1-His and mouse JUNO-His transfected together or separately? Are human JUNO-His and human IZUMO1-Myc transfected together into HEK293 cells? And purified by IMAC?

      Transfection information has been included in the Methods section “Protein expression, purification and analysis” (previously “Protein expression and purification”). Concerning the purification procedure, we had already stated in the legend of Figure 1 that human JUNOE-His/IZUMO1E-Myc had been purified by IMAC before SEC, and have now done the same for mouse JUNOE-His and IZUMO1E-His.

      (2) It would be easier to understand the figure if the author could run a WB to indicate which band above JUNO is specifically IZUMO1-Myc in Figure 1.

      This has been done and reported in a new Figure S1 (with the original Figure S1 having now become Figure S2). Details about the antibodies used for immunoblot have been included in both Methods section “Protein expression, purification and analysis” and the Key Resources Table.

      (3) Figure 4: Analysis of more proteins that have been suggested as possible candidates for sperm-egg interaction will help to highlight the following results. Also, providing a score for the possibility of interaction might help in selecting those proteins in Figures 5 and 6.

      Please refer to the answer to the first question of Reviewer #2.

      (4) Figure 7: The authors take advantage of the latest developments in protein structure and interaction prediction to model protein complex formation. However, some experiments, such as Co-IP or pull-down assays, are necessary to verify at least some of these predicted interactions.

      We agree with the reviewer; however, for the reasons we discussed during our comparison of the biochemical properties of the JUNO/IZUMO1 interaction between mouse and human, pursuing this line of inquiry will likely necessitate an extensive set of parallel experiments using proteins from different species. This work is being planned and will be the focus of future studies. However, as we mentioned at the end of the Abstract, one should also consider that some of these complexes are likely to be highly transient. Because of this, while they may have important regulated roles in vivo (function at a specific time and place), they could be very challenging to detect using standard approaches in vitro. We thus see this as a significant advance that structural modeling could contribute to the identification of such functionally important but transient interactions.

      Minor points

      (1) In the abstract, "three sperm (IZUMO1, SPACA6 and TMEM81) "should be "three sperm proteins."

      The Abstract has been condensed to fit within the suggested 200-word limit and, as part of this, the sentence has been changed to “complex involving sperm IZUMO1, SPACA6, TMEM81 and egg JUNO, CD9”.

      (2) How do the predictions of the binary complex IZUMO1/CD9 (Figure S1B) or IZUMO1/CD81 (Figure S1C) suggest "the two egg tetraspanins are interchangeable"? Was it because they are quite similar? Please provide more explanation for this speculation. Interchangeable by function or for complex formation? To support the conclusion, biochemical data is required. Otherwise, it needs to be toned down.

      This is because, in the AlphaFold-Multimer predictions of the pentameric complex, CD9 and CD81 are placed in essentially the same way relative to the other subunits.

      We have now clarified this at the end of page 6:

      “(...) suggest that the two egg tetraspanins are interchangeable because they are predicted to bind to the same region of IZUMO1; (...)”

      (3) It would be more reader-friendly if the author could label the name of each protein in the figure in Figure S1, especially when the name is not written in the figure legend.

      This has been done in Figure S2 of the revised manuscript (corresponding to original Figure S1).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study examined a universal fractal primate brain shape. However, the paper does not seem well structured and is not well written. It is not clear what the purpose of the paper is, and there is a lack of explanation for why the proposed analysis is necessary. As a result, it is challenging to clearly understand what the novelty of the paper is and what the main findings are.

      We have now restructured the paper, including a summary of the main purpose and findings as follows:

      “Compared to previous literature, we can summarise our main contribution and advance as follows:

      (i) We are showing for the first time that representative primate species follow the exact same fractal scaling – as opposed to previous work showing that they have a similar fractal dimension [Hofman1985, Hofman1991], i.e. slope, but not necessarily the same offset, as previous methods had no consistent way of comparing offsets.

      (ii) Previous work could also not show direct agreement in morphometrics between the coarse-grained brains of primate species and other non-primate mammalian species.

      (iii) Demonstrating in proof-of-principle that multiscale morphometrics, in practice, can have much larger effect sizes for classification applications. This moves beyond our previous work where we only showed the scaling law across [Mota2015] and within species [Wang2016], but all on one (native) scale with comparable effect sizes for classification applications [Wang2021].

      In simple terms: we know that objects can have the same fractal dimension, but differ greatly in a range of other shape properties. However, we demonstrate here that representative primate brains and mammalian brains indeed share a range of other key shape properties, on top of agreeing in fractal dimension. This suggests a universal blueprint for mammalian brain shape and a common set of mechanisms governing cortical folding. As a practical additional outcome of our study, we could show that our novel method of deriving multiscale metrics of brain shape can differentiate subtle shape changes much better than the metrics we have been using so far at a single native scale.”

      We plan to use the second paragraph as a plain-language summary of our work.

      Additionally, several terms are introduced without adequate explanation and contextualization, further complicating comprehension.

      We have now made sure that potential jargon is introduced with context and explanation. For example in Introduction: “This scaling law, relating powers of cortical thickness and surface area metrics, […]”

      Does the second section, "2. Coarse-graining procedure", serve as an introduction or a method?

      We have now renamed this section to “Coarse-graining Method” to indicate that this is a section about methods. However, to describe the methods adequately, we also expanded this section with introductory texts around the history and motivation of the method to provide context and explanations, as the reviewer rightly requested.

      Moreover, the rationale behind the use of the coarse-graining procedure is not adequately elucidated. Overall, it is strongly recommended that the paper undergoes significant improvements in terms of its structure, explanatory depth, and overall clarity to enhance its comprehensibility.

      To specifically explain the rationale behind the coarse-graining method, we added several clarifications, including the following paragraph:

      “As a starting point for such a coarse-graining procedure, we suggest turning to a well-established method that measures fractal dimension of objects: the so-called box-counting algorithm [Kochunov2007, Madan2019]. Briefly, this algorithm fills the object of interest (say the cortex in our case) with boxes, or voxels, of increasingly larger sizes and counts the number of boxes in the object as a function of box size. As the box size increases, the number of boxes decreases; and in a log-log plot, the slope of this relationship indicates the fractal dimension of the object. In our case, this method would not only provide us with the fractal dimension of the cortex, but, with increasing box size, the filled cortex would also contain less and less detail of the folded shape of the cortex. Intuitively, with increasing box size, the smaller details, below the resolution of a single box, would disappear first, and increasingly larger details will follow -- precisely what we require from a coarse-graining method. We therefore propose to expand the traditional box-counting method beyond its use to measure fractal dimension, but to also analyse the reconstructed cortices as different realisations of the original cortex at the specified spatial scale.”
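
      The box-counting idea in the paragraph above can be sketched in a few lines. This is a textbook grid variant written for illustration, not the pipeline used in the study, which operates on cortical surface reconstructions: the object is assumed to be a set of integer voxel coordinates, and the function names and the flat test sheet are hypothetical. Counting occupied boxes at each box size gives the log-log relationship whose slope estimates the fractal dimension, and the set of occupied boxes is itself the coarse-grained object at that scale.

```python
import math


def coarse_grain(voxels, box_size):
    """Return the set of grid boxes (by index) that contain any voxel."""
    return {(x // box_size, y // box_size, z // box_size) for x, y, z in voxels}


def box_count(voxels, box_size):
    """Number of boxes of edge `box_size` needed to cover the object."""
    return len(coarse_grain(voxels, box_size))


def fractal_dimension(voxels, box_sizes):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(voxels, s)) for s in box_sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical test object: a flat 32 x 32 sheet of voxels embedded in 3D.
sheet = {(x, y, 0) for x in range(32) for y in range(32)}
sizes = [1, 2, 4, 8, 16]

dimension = fractal_dimension(sheet, sizes)  # a flat sheet should give 2
coarse = coarse_grain(sheet, 8)              # the sheet seen at box size 8
```

      A smooth sheet yields a dimension of 2; a folded cortex would be expected to give a value between 2 and 3, and inspecting `coarse_grain` at successive box sizes corresponds to viewing progressively less detailed versions of the same object.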

      Reviewer #2 (Public Review):

      In this manuscript, Wang and colleagues analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The work builds on the scaling law introduced previously by co-author Mota, and Herculano-Houzel. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between the brains of young (~20 year old) and old (~80) humans. My main criticism of this manuscript is that the findings are presented in too abstract a manner for the scientific contribution to be recognized.

      We recognise that our work is at the intersection of complex mathematical concepts and a perplexing biological phenomenon. Therefore, our paper has to strike a balance between scientifically accurate and succinct descriptions whilst giving sufficient space to provide context and explanations.

      Throughout, we have now added text to provide more context, and we also repeat key statements in plain-English terms.

      For example, we added the following text to highlight our key contributions.

      “In simple terms: we know that objects can have the same fractal dimension, but differ greatly in a range of other shape properties. However, we demonstrate here that representative primate brains and mammalian brains indeed share a range of other key shape properties, on top of agreeing in fractal dimension. This suggests a universal blueprint for mammalian brain shape and a common set of mechanisms governing cortical folding. As a practical additional outcome of our study, we could show that our novel method of deriving multiscale metrics of brain shape can differentiate subtle shape changes much better than the metrics we have been using so far at a single native scale.”

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1, constitute a novel procedure, but it is not strongly motivated, and it produces image segmentations that do not resemble real brains.

      To specifically explain the rationale behind the coarse-graining method, we added several clarifications, including the following paragraph:

      “As a starting point for such a coarse-graining procedure, we suggest to turn to a well-established method that measures fractal dimension of objects: the so-called box-counting algorithm [Kochunov2007, Madan2019]. Briefly, this algorithm fills the object of interest (say the cortex in our case) with boxes, or voxels of increasingly larger sizes and counts the number of boxes in the object as a function of box size. As the box size increases, the number of boxes decreases; and in a log-log plot, the slope of this relationship indicates the fractal dimension of the object. In our case, this method would not only provide us with the fractal dimension of the cortex, but, with increasing box size, the filled cortex would also contain less and less detail of the folded shape of the cortex. Intuitively, with increasing box size, the smaller details, below the resolution of a single box, would disappear first, and increasingly larger details will follow -- precisely what we require from a coarse-graining method. We therefore propose to expand the traditional box-counting method beyond its use to measure fractal dimension, but to also analyse the reconstructed cortices as different realisations of the original cortex at the specified spatial scale.”
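
      The box-counting idea in the quoted paragraph can be sketched in a few lines. This is a minimal Python illustration, not the authors' MATLAB implementation: the function names `box_count` and `box_counting_dimension` are hypothetical, and the test object (a flat square sampled on a grid) is an assumption chosen because its dimension is known to be 2.

```python
import math

def box_count(points, box_size):
    """Number of distinct boxes of side `box_size` that contain at least one point."""
    occupied = {tuple(math.floor(c / box_size) for c in p) for p in points}
    return len(occupied)

def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(s) against log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical test object: a flat unit square embedded in 3D, sampled on a 64 x 64 grid.
n_side = 64
square = [(i / n_side, j / n_side, 0.0) for i in range(n_side) for j in range(n_side)]
sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
dimension = box_counting_dimension(square, sizes)
```

      As the box size grows, the set of occupied boxes is also a coarser stand-in for the original object, which is the property the coarse-graining procedure builds on.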

      We also note in several places in the text that the coarse-grained brains are not to be understood as exact reconstructions of actual brains, but serve the purpose of a model:

      “[…] nor are the coarse-grained versions of human brains supposed to exactly resemble the location/pattern/features of gyri and sulci of other primates. The similarity we highlighted here are on the level of summary metrics, and our goal was to highlight the universality in such metrics to point towards highly conserved quantities and mechanisms.”

      “Note, of course, that the coarse-grained brain surfaces are an output of our algorithm alone and not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains is purely on the level of morphometrics across the whole cortex.”

      The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel (section S2). This causes the cortical segmentation, such as the bottom row of Figure 1B, to increase in thickness with successive melting steps, to unrealistic values. For the rightmost figure panel, the cortex consists of several 4.9-sided voxels and thus a >2 cm thick cortex. A structure with these morphological properties is not consistent with the anatomical organization of a typical mammalian neocortex.

      Specifically on the point on increasing cortical thickness with increased level of coarse-graining, we have now added the following paragraph:

      “The observation that with increasing voxel sizes, the coarse-grained cortices tend to be smoother and thicker is particularly interesting: the scaling law in Eq. 1 can be understood as thicker cortices (T) form larger folds (or are smoother i.e. less surface area At) when brain size is kept constant (Ae). This way of understanding has also been vividly illustrated by using the analogy of forming paper balls with papers of varying thickness in [Mota2015]: to achieve the same size of a paper ball (Ae), the one that uses thicker paper (T) will show larger folds (or is smoother i.e. less surface area At) than the one using thinner paper. The scaling law can therefore be understood as a physically and biologically plausible statement, and here, we are encouraged that our algorithm yields results in line with the scaling law.”
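
      The paper-ball reasoning can be checked numerically. The sketch below assumes that Eq. 1 takes the form published in [Mota2015], A_t · T^(1/2) = k · A_e^(5/4), which matches the slope of 1.25 quoted elsewhere in this response; the function names and all numerical values (k, A_e, T) are arbitrary illustrations, not measurements.

```python
import math

def total_area(exposed_area, thickness, k):
    """Total surface area A_t implied by A_t * sqrt(T) = k * A_e**1.25."""
    return k * exposed_area ** 1.25 / math.sqrt(thickness)

def offset_K(total, exposed, thickness):
    """Offset K = log10(A_t) + 0.5*log10(T) - 1.25*log10(A_e), i.e. log10 of k."""
    return (math.log10(total)
            + 0.5 * math.log10(thickness)
            - 1.25 * math.log10(exposed))

# Arbitrary illustrative values: same exposed area and k, two thicknesses.
k = 0.2
exposed = 1.0e5
area_thin = total_area(exposed, thickness=2.0, k=k)
area_thick = total_area(exposed, thickness=4.0, k=k)
```

      With the exposed area held fixed, doubling the thickness reduces the total area by a factor of √2, so the thicker sheet is the smoother one, in line with the analogy.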

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group possesses more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4c. It seems this additional spacing between gyri in 80-year-olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20-year-olds. A case could be made that the familiar way of describing brain tissue - cortical volume, white matter volume, thickness, etc. - is a more direct and intuitive way to describe differences between young and old adult brains than the obscure shape metric described in this manuscript. At a minimum, a demonstration of an advantage of the Figure 4a and 4b analyses over current methods for interpreting age-related differences would be valuable.

      We have demonstrated the utility of our new shape metrics in a separate paper [Wang2021]. However, we agree with the reviewer that, in this specific instance, it is much easier to understand the key message without considering the less traditional metrics. We have therefore completely revised this part of the Results section to highlight the advantage of multiscale morphometrics, and used the traditional metric of surface area to illustrate the point. The reasoning in surface area is much easier to follow, both visually and conceptually, exactly as the reviewer described.

      (3) In Discussion lines 199-203, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. First, the authors do not show, (and it would be surprising if it were true) that self-similarity is observed for length scales smaller than the acquired MRI data for any of the datasets analyzed. The analysis is restricted to coarse (but not fine)-graining.

      To clarify this point, we have added a supplementary section and the following sentence: “Note this method has also no direct dependency on the original MR image resolution, as the inputs are smooth grey and white matter surface meshes reconstructed from the images using strong (bio-)physical assumptions and therefore containing more fine-grained spatial information than the raw images (also see Suppl. Text 3).”

      We are indeed sampling at resolutions down to 0.2mm, which is below MR image resolution. The reviewer is, however, correct that we are only coarse-graining, not “fine-graining”. Coarse-graining here means coarser than the smooth surface meshes, not coarser than the MR image.

      Therefore, self-similarity on all length scales would seem to be too strong a constraint. Second, it is hard to imagine how this test could be used in practice. Specific examples of how gyrification mechanisms support or fail to support the generation of self-similarity across any length scale, would strengthen the authors' argument.

      We agree that spatial scales much below 0.2mm resolution may not be of interest, as these scales are only measuring the fractal properties, or “bumpiness”, of the surface meshes at the vertex level. We have therefore revised our statement in Discussion and clarified it with an example: “Finally, this dual universality is also a more stringent test for existing and future models of cortical gyrification mechanisms at relevant scales, and one that moreover is applicable to individual cortices. For example, any models that explicitly simulate a cortical surface could be directly coarse-grained with our method and compared to actual human and primate data provided here.”

      Some additional, specific comments are as follows:

      (4) The definition of the term A_e as the "exposed surface" was difficult to follow at first. It might be helpful to state that this parameter is operationally defined as the convex hull surface area.

      We agree and introduced this term now at first use: “The exposed surface area can be thought of as the surface area of a piece of cling film wrapped around the brain. Mathematically, for the remaining paper it is the convex hull of the brain surface.”

      Also, for the pial surface, A_t, there are several who advocate instead for the analysis of a cortical mid-thickness surface area, as the pial surface area is subject to bias depending on the gyrification index and the shape of the gyri. It would be helpful to understand if the same results are obtained from mid-thickness surfaces.

      This point is indeed being investigated independently of this study. Our provisional understanding is that in healthy human brains, at native scale, using the mid (or the white matter) surface introduces a systematic offset shift in the scaling law, but does not affect the scaling slope of 1.25. However, this requires a more in-depth investigation in a range of other conditions, and in the context of the coarse-grained shapes, which is ongoing. Nevertheless, the scaling law has used the pial surface area since its first introduction [Mota2015], and all subsequent follow-up studies followed this convention. To make our paper here accessible and directly comparable, we therefore used the same metric. Future work will investigate the utility of other metrics.

      (5) In Figure 2c, the surfaces get smaller as the coarse-graining increases, making it impossible to visually assess the effects of coarse-graining on the shapes. Why aren't all cortical models shown at the same scale?

      The purpose of rescaling the surfaces is to match the scaling plot (Fig 2A) directly, which shows shrinking surface areas Ae and At with increasing coarse-graining. Here, we are effectively keeping the size of the box constant and resizing the cortical surface instead, which is mathematically equivalent to changing the box size and keeping the cortical surface constant.

      An alternative interpretation of the “shrinking” is, therefore, that with increasingly smaller cortical surfaces, the folding details disappear, as we require from our coarse-graining method. This is also visually apparent, as the reviewer points out. We have added this to the explanation in the text.

      If we, however, changed the box size instead, the scaling law plot would be meaningless: for example, Ae would barely change with coarse-graining. We would therefore have needed to introduce more complexity in our analysis in terms of how we can measure the scaling law. Thus, we opted to present the simpler method and interpretation here.

      Nevertheless, we agree that a direct comparison would be beneficial and have thus added the videos for each species in supplementary under this link: https://bit.ly/3CDoqZQ Upon completed peer-review, we hope to integrate these directly into eLife’s interactive displays for this figure.

      (6) Text in Section 3.2 emphasizes that K is invariant with scale (horizontal lines in Figure 3), and asserts this is important for the formation of all cortices. However, I might be mistaken, but it appears that K varies with scale in Figure 4a, and the text indicates that differences in the S dependence are of importance for distinguishing young vs. old brains. Is this an inconsistency?

      We agree that it may be confusing to emphasise a “constant K” in the first set of results across species, and then later highlight a changing K in the human ageing results. To clarify, in the first set of results, we find a constant K relative to a changing S: the range in K across melted primate brains is less than 0.1, whereas in S it is over 1.2. In other words, S changes are an order of magnitude higher than K changes. Hence, we described K as “constant” relative to S.

      Nevertheless, K shows subtle changes within individuals, which is what we were describing in the human ageing results. These changes are within the range of K values described in the across species results.

      However, in the interest of clarity, we followed the reviewer’s suggestion of simplifying the last set of results on human ageing and therefore the variable K in human ageing now only appears in Supplementary. We have now added clarifications to the supplementary on this point.

      Reviewer #3 (Public Review):

      Summary:

      Through a detailed methodology, the authors demonstrated that within 11 different primates, the shape of the brain matched a fractal of dimension 2.5. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain.

      Strengths:

      • New hierarchical way of expressing cortical shapes at different scales derived from the previous report through the implementation of a coarse-graining procedure.

      • Positioning of results in comparison to previous works reinforcing the validity of the observation.

      • Illustration of scale-dependence of effects of brain aging in the human.

      Weaknesses:

      • The impact of the contribution should be clarified compared to previous studies (implementation of new coarse graining procedure, dimensionality of primate brain vs previous studies, and brain aging observations).

      We have now made these changes, particularly by adding two paragraphs to the start of the Discussion: one summarising the main contributions above previous work, and one paraphrasing the former in plain English for accessibility.

      • The rather small sample sizes, counterbalanced by the strength of the effect demonstrated.

      We have now increased the sample size of the human ageing analysis substantially to over 100 subjects and observe the same trends, but with an even stronger effect. We therefore believe that this revision serves as an additional internal validation of our data and methods.

      • The use of either averaged or individual brains for the different sub-studies could be made clearer.

      We have now added this to our Suppl methods: with the exception of the Marmoset, all brain surface data were derived from healthy individual brains.

      • The model discussed hypothetically in the discussion is not very clear, and may not be state-of-the-art (axonal tension driving cortical folding? cf. https://doi.org/10.1115/1.4001683).

      We have now added this citation to our Discussion and given it context:

      “Indeed, our previously proposed model [Mota2015] for cortical gyrification is very simple, assuming only a self-avoiding cortex of finite thickness experiencing pressures (e.g. exerted by white matter pulling, or by CSF pressure). The offset K, or 'tension term', precisely relates to these pressures, leading us to speculate that subtle changes in K correlate with changes in white matter property [Wang2016, Wang2021]. In the same vein of speculation, the scale-dependence of K shown in this work might therefore be related to different types of white matter that span different length scales, such as superficial vs. deep white matter, or U-fibres vs. major tracts. However, there are also challenges to the axonal tension hypothesis [Xu2010]. Indeed, white matter tension differentials in the developed brain may not explain location of folds, but instead white matter tension may contribute to a whole-brain scale 'pressure' during development that drives the folding process overall.”

      Reviewer #3 (Recommendations For The Authors):

      Many thanks to the authors for this elegant article. I will only report here on the cosmetics of the article.

      We thank the reviewer for their kind words and attention to detail and have made all the suggested changes and revised the paper generally for readability, grammar and spelling.

      p2: last line of abstract: 'for a range of conditions in the future'.

      p3 l.37: I would not self-describe this method as elegant as this is a subjective property.

      p3 l.38: 'that will render' -> I wouldn't use the future here.

      p.4 l.59: double spacing before ref [9]?

      p.6 l.99: 'approximate a fractal' -> why is 'a' italicized?

      p.7 fig.2: I would expect the colours to be detailed in the legend. Are there two data points per species because both hemispheres are treated separately?

      p.9 l.134-135: 'similar to and in terms of the universal law 'as valid as' -> please add commas for reading comfort: 'similar to, and, in terms of the universal law, 'as valid as'.

      p.9 l. 141: For all the cortices we analysed.

      p.9 Fig 3: I find the colours a bit confusing in Figs B and C. I find Fig C a bit confusing: what are all the lines representative of, and more specifically, the two lower lines with a different trajectory?

      p.10 l.155: '1̃500' -> '~1500'.

      p.13 l. 209: either 'speculate that' or 'wonder if'.

      p.14 l.232: 'neuron numbers' -> 'number of neurons'.

      p.26 S2 second paragraph: 'gryi' -> 'gyri'.

      p.30 l.3: please refrain from starting a sentence with I.e..

      p.30 last line before S3.2: 'The algorithmic implementation in MATLAB can be found on Zenodo: TBA' - I guess this is linked to you disclosing the code upon acceptance, but please complete before final submission.

      p.34 middle/bottom of page: 'The scheme described in Sec. S3.1' -> double spacing before S3.1?

      p.35 l.1: 'We simply replace' -> 'we simply replace' (no capital).

      p.36 Fig S5.1: make the shared colouring of the points and boxes explicit in the legend.

      p.38 Fig. S6.1: briefly describe the use of colours in the legend.

      p.39 Fig. S7.1: detail colours in the legend.

      p.41 Fig. S7.3: detail colours in the legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Figure 1

      • The "matched primary tumors" from TCGA include n=424 from cutaneous melanoma; but it is unclear where this is coming from; the PanCan Atlas for melanoma shows n=81 primary and 367 metastatic tumors. There are also additional large cohorts of ICI-treated metastatic tumors with RNAseq data (e.g. a metastatic melanoma cohort with 100+ patients https://doi.org/10.1038/s41591-019-0654-5) that would increase the numbers here.

      We thank the reviewer for their observation. We have replaced references to “primary” cancers with “TCGA” cancers as appropriate. While the TCGA analyses included metastatic samples, the majority of the TCGA tumors in most cohorts correspond to primary cancers or local metastases, a point which we added to the text. We retained Fig. 1D as the representative examples are actual primary samples. We have decided to defer analysis of additional melanoma cohorts for future inquiry.

      Figure 2

      • What is the basis for the split between high and low Dux4 expressing tumors at 1 TPM? Is it arbitrary, or based on some structure in the distribution? (e.g. bimodal distribution)

      Our previous analyses of RNA-seq datasets derived from early embryogenesis samples (PMID: 3132774, 28459457) showed that physiologic levels of DUX4 range from approximately 2 to 10 TPM. We added a description in the methods section, under “Genome annotations, gene expression, and Gene Ontology (GO) enrichment analyses,” of our conservative choice for the threshold: DUX4-positivity defined as expression levels > 1 TPM.

      Figure 3

      • Overall claim is that Dux4 expression is associated with worse survival in metastatic urothelial carcinomas treated with PD-L1 inhibitor. However, the rationale for the choice of split (Dux4 expression < 0.5 and > 1 TPM) to show is unclear (is this the 25th percentile? 75th percentiles?), and the rationale/interpretation of the "partial adjustment" for TMB by removing the bottom quartile of TMB feels non-rigorous and prone to bias. It doesn't feel like Fig 3bc contributes very much; Figure 4 really is the more rigorous analysis.

      We thank the reviewer for these comments and suggestions. We adjusted the analyses in Fig. 3C and Fig. S3 to be consistent with Fig. 1 and Fig. 2, in terms of the choice of split. We also clarified in the text how our initial, crude TMB adjustment served as an important indication for us to pursue more rigorous statistical approaches.

      Figure 4

      • Dux4 expression is independently associated with worse survival considering other clinical and molecular characteristics

      • I would include TGFB in the features considered in the table (in the supplementary but not the main table or forest plots, not sure why not?)

      • The choice of Dux4 expression split ( < 0.25 and > 1 TPM) feels arbitrary and is different than the split in Figure 3; what is the rationale for this? Also, how many patients does this exclude? (TPM between 0.25 and 1). What does the continuous value or median split for Dux4 expression give you for the CoxPH model?

      • Re: building a predictive model, excluding patients (e.g. between <0.25 and > 1 Dux4 TPM) makes the model difficult to apply (e.g. cannot apply to patients with Dux4 levels in the missing interval); a better predictive model would include all patients in the cohort.

      We thank the reviewer for their other suggestions. We have clarified in the text that our choice to define DUX4-negative samples as those with DUX4 expression levels < 0.25 TPM was made to preemptively address potential misclassifications due to decreased sensitivity of bulk RNA-seq at very low expression levels (PMID: 18516045). We believe our classifications with the new scheme are more reliable. We have also now specified in the text that our categorization excludes 126 patients. We have decided not to pursue the addition of TGFB or exploration of the use of an alternative split or continuous version of DUX4 expression in the Cox Proportional Hazards analyses but appreciate the suggestions, which we will keep in mind for future studies.
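
      The categorization described in this response (DUX4-positive above 1 TPM, DUX4-negative below 0.25 TPM, intermediate samples excluded) can be written out as a short rule. This is a minimal Python sketch rather than the authors' code; the function name `dux4_status` and the example TPM values are hypothetical.

```python
def dux4_status(tpm, negative_below=0.25, positive_above=1.0):
    """Classify a sample by DUX4 expression in TPM.

    Returns "positive", "negative", or None for samples in the
    intermediate interval, which are excluded from the analysis.
    """
    if tpm > positive_above:
        return "positive"
    if tpm < negative_below:
        return "negative"
    return None

# Hypothetical expression values, for illustration only.
example_tpm = [0.0, 0.1, 0.25, 0.6, 1.0, 1.5, 8.0]
labels = [dux4_status(v) for v in example_tpm]
excluded = sum(1 for label in labels if label is None)
```

      Samples that fall exactly on a threshold land in the excluded interval here, because both comparisons are strict, as in the thresholds stated above.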

      Figure 5

      • An RSF (random survival forest) model predicts survival in Dux4+ vs Dux4- patients, and the Shapley values for landmark time analyses show time-varying effects of different features.

      • In some sense, the authors have already demonstrated that Dux4+ is associated with survival differences in ICI treated patients; so a model that predicts survival applied to Dux4+ and Dux4- patients that shows a difference in survival is unsurprising (even in a training/test set setting given that there is a difference in survival across the entire cohort). The quantified marginal effect (from a predictive perspective) of different features is what is interesting here. In that light, I'd like to see more validation of the model up front, specifically how close the predicted survival is to the actual survival of patients (e.g. the survival curves in Fig 5a but with actual survival of the Dux4- and Dux4+ cohorts superimposed on the predicted probabilities).

      We thank the reviewer for this suggestion. We have added a plot showing the superimposed survival probability estimates over time for the RSF and KM models for patients assigned to either the test or training sets in Fig. 5.

      SFig 5

      • Unclear how the authors got estimates of the # of expected deaths associated with covariates (e.g. "...we measured an increase in the number of predicted deaths associated with DUX4-positivity by approximately 16, over DUX4-negative status (Fig S5F-G).") from Shapley values as shown in the indicated figure - is this 16 out of the entire cohort? At a given time point? Would recommend perhaps showing the inferred absolute change in mortality (e.g. 8% absolute increase in mortality)

      Mortality is the expected number of deaths for the cohort over the observation window, measured as the sum of the CHF over time. We have clarified this in the Methods section, under “Random Survival Forest, feature importance, and partial dependence.” We have also changed the quantification to show the absolute mortality differences comparing patients with DUX4-negative and -positive tumors; we thank the reviewer for this suggestion. We have also clarified in the text that adjusted mortality was estimated via partial dependence, which operates using the correct units, as opposed to Shapley values, where attribution is scaled. Finally, we changed the referenced figure when discussing changes in mortality associated with TMB and DUX4 status (Fig. S5H-I); we appreciate the reviewer pointing out this error.
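
      The definition of mortality as the sum of the cumulative hazard function (CHF) over time can be illustrated on a toy cohort. The sketch below is an assumption-laden stand-in, not the authors' pipeline: it uses a plain Nelson-Aalen estimate of the CHF in place of a random survival forest ensemble, the function names are hypothetical, and the follow-up times are invented.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen CHF evaluated at each distinct event time.

    `events[i]` is 1 for an observed death and 0 for a censored observation.
    Returns a list of (time, cumulative_hazard) pairs in time order.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    chf, cumulative = [], 0.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        cumulative += deaths / at_risk
        chf.append((t, cumulative))
    return chf

def mortality(chf):
    """Sum of the CHF over the distinct event times."""
    return sum(h for _, h in chf)

# Invented follow-up data: five patients, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
chf = nelson_aalen(times, events)
toy_mortality = mortality(chf)
```

      Comparing this quantity between two groups gives a difference in expected deaths, which is the kind of absolute contrast the response describes.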

      Figure S1B-C

      • The authors argue that Dux4 expression is not an artifact of FFPE tissue by analyzing a mixed tumor cohort sequenced with both poly-A and hybrid probe capture in matched flash-frozen and FFPE tumor samples, showing that 1) it is detectable in both FFPE and flash-frozen tissue and 2) higher levels are detected in polyA sequencing/frozen tissue. However, the reference for this section (D. Robinson et al 2015) is a study of a cohort of prostate cancers with polyA bulk RNAseq sequencing; is this correct/is the data coming from a different study?

      • Analysis of scRNAseq (if available) would strengthen their analyses by better delineating the expression and response of interferon-gamma and downstream (e.g. antigen presentation) pathways in specific cell compartments, and potential differences in cell-cell interactions (e.g. using CellPhoneDB) associated with Dux4+ vs Dux4- tumors.

      • Do the investigators find similar findings in primary and metastatic tumors sequenced the same way (e.g. TCGA primary vs met melanoma, albeit most of the met melanoma are Stage III lymph nodes)?

      We thank the reviewer for finding the citation error. We have corrected the manuscript to reflect the correct study we analyzed (PMID: 28783718). We also thank the reviewer for their additional suggestions, which undoubtedly would strengthen the current study. However, we have respectfully decided to defer these additional analyses for future study.

      Reviewer #2:

      It is strange as a statistician to see BIC and AIC represented as barplots, e.g. Figure 4B. There is no knowledge to be gained through this visual representation that would not otherwise be conveyed by just giving the numbers.

      We thank the reviewer for this suggestion. We understand that simply stating the numbers would be equally informative. However, we respectfully decided to retain our current versions of Figures 4 and S4 so that the numbers can be illustrated in a visual manner in the figures, rather than just stated in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Line 144, after eq. (1). Vectors d_i need to be defined. Are these the mapping of vectors e_i due to the active deformation? It would be useful to state then that d_3 is aligned with r'.

      Thank you for your suggestion, and the definition has been added to lines 146-149 for a better understanding of the model.

      • Line 144. Authors state a_i(0,0,Z)=0. Shouldn't this be true also for any angle, i.e., a_i(0,Theta, Z)=0?

      Thank you, we have revised it in line 144.

      • Line 156. G_0 is defined as Diag(1,g_0(t), 1), which seems to be using cylindrical coordinates. Previously, in line 147, vector argument X of \chi is defined with Cartesian coordinates (X,Y,Z). Shouldn't these be also cylindrical?

      We are very sorry for this error. Our initial configuration is defined with cylindrical coordinates, and we have corrected this in line 151 of the manuscript.

      • Line 162. "where alpha and beta lie in the range [-pi/2, pi/2]" has already been indicated.

      Thank you for pointing this out; we have deleted the duplicate information in line 166.

      • Line 171. W is defined as the strain energy density, while in equation (2), symbol W is the total energy (which depends on the previous W). Letters for total elastic and strain energy must be distinguished.

      Thank you, we have changed the letter for total energy in Eq.(2).

      • Line 176. "we take advantage of the weakness of" -> "we take advantage of the small value of".

      We have revised it in line 179.

      • Line 177. Why is there a subscript i in p_i? If these do not correspond to penalty p, but to parameters in eqn (3), the latter should have been introduced before this line.

      We have corrected this error in line 180.

      • Line 186. "as the overall elongation \zeta". This parameter, axial extension, has not been defined yet.

      Thank you for pointing this out; the definition of \zeta is now given in line 146.

      • Figure 4. Why are the values of g_0 from the elastic model and equations (30)-(32) so non-smooth? Clarify what is being fit and what is the input in the latter equations. Final external radius R_3? Final internal radius R_1'?

      (1) To mimic the embryo, we consider a multi-layered cylindrical body so that the shear modulus of each layer is different. The continuity of both deformations and stresses is imposed (see Eq.(26)-Eq.(30)). This is the usual treatment for complex morpho-elastic systems. Obviously, $g_0$ originates from the actomyosin cortex, so it appears only in the corresponding layer. Finally, all physical quantities such as deformations and stresses must be continuous.

      (2) The final outer radius is R_3, which represents the outer radius of C. elegans embryos. In addition to R_3, what we need to consider in this model are R_1’=0.7, R_1’=0.768, R_2=0.8 and R_2’=0.96; these definitions have been added to the caption of Appendix 2—figure 1.

      • Line 663, equation (19). Parameter mu is multiplying penalisation term with p, while in equation (2) mu is only affecting the elastic part.

      These two different ways of expressing the energy function will ultimately affect the value of p, but the two p are not the same quantities, so they will not affect our results. To avoid misunderstandings, we will replace p in equation (19) with q.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my public summary, I find the writing really not adequate. I provide here a list of specific points that the authors should in my opinion address. As a general comment, I would delete many instances of 'the'.

      First, there are figures and whole paragraphs that do not seem to bring anything to the understanding of the phenomenon of C. elegans elongation, notably, Figs. 2, 3C-H, 5m, and 6. Figures 6G and 7 are the only figures containing results it seems. Some elements of the figures are repeated, for example, the illustration of the system's cross-section in Figs 3 and 5.

      Thank you for your suggestion. We have made some adjustments to our images to remove some of the duplicate information.

      Second, and this is my most important criticism: the mechanism of elongation by releasing elastic stress introduced by muscle contraction is not explained in clear terms anywhere in the text. At least, I was unable to understand it. On p 10 you write "This energy exchange causes the torsion-bending energy to convert into elongation energy, (...)" How this is done is not explained. I assume that the reference state is somehow changed through muscle contraction. The new reference state probably has a longer axis than the one before, but this would then be a plastic deformation and not purely elastic as claimed by the authors (ll 76: "This work aims to answer this paradox within the framework of finite elasticity without invoking cell plasticity (...)"). Is torsion important for this process or is it 'just' another way to store elastic energy in the system?

      We explain most of the exchange of energy between bending, torsion and elongation: indeed, we quantify all aspects of this transformation, such as the elastic elongation energy and the dissipation processes, which cost energy. The dissipation evaluated here concerns the rotation of the worm due to the muscle geometry and the viscous friction at the inner surface of the egg. Torsion seems to appear in the late stages and only in some cases. As we show, it comes from a torque induced by the muscles, which are not vertical. Finally, the quantitative predictions of our model recover most of the published experimental results.

      Third, there are a number of strange phrasings and the notation is not helpful in places.

      We apologize for that; the manuscript is now more precise.

      Fourth, the title promises to explain how cyclic muscle contractions reinforce acto-myosin motors. I can't see this done in this work.

      The fact that the acto-myosin is reorganized between two sequences of contraction justifies the title. The complete reorganization of the actomyosin network would require a chemico-mechanical model that is not achieved here, perhaps in future work as data become available.

      In addition:

      We have chosen to respond globally rather than point by point to the referee’s recommendations.

      Typographic errors and vocabulary

      All English corrections and typos are now included in the main text.

      Figures and captions:

      Figures and captions have been improved.

      • Figure 1: Make the caption and the illustration more coherent. For example, only two cell types are distinguished; in the caption, you mention lateral cells, in the sketch seam cells. What is the difference between acto-myosin and muscle contraction? Muscle contraction is also acto-myosin-based.

      (1) The caption for Fig.1 is revised.

      (2) From a mechanical point of view, actomyosin bundles in C. elegans are orthoradial, whereas muscles are essentially parallel to the main axis of the body, so the geometry is completely different and of extreme importance for deformation. Muscle contractions are quasi-periodic, and we do not know the dynamics of the attached myosin molecular motors. So of course, both contain actin and myosin (not exactly the same proteins), but our model is sensitive to more macroscopic properties.

      • Figure 2: I do not find this figure helpful. I might expect such a figure in a grant proposal, but much less in an article.

      Figure 2 shows the strategy of our work, we hope that readers can see at a glance what kind of analysis has been done through this figure: since our work is divided into several parts, readers can also unravel the logic through this scheme after reading the whole manuscript. So, this diagram is a guide, and it may be helpful and necessary.

      • Figure 3: Figure 3 A, right: What is the dashed line? B You indicate fibers, but your model does not contain fibers, does it? How do I get from the cube to the deformed object? What is the relation of C-H with the rest of the work? Furthermore, you mention seam cells in Fig. 1, but they are absent here. Why can you neglect them? Why introduce them in the first place? E What is a plant vine? F-H What rods are you referring to? Plants do not have muscles, right?

      We have modified this figure, and the original Figure 3 now corresponds to Figures 3 and 4.

      (1) The dashed line is the centerline after deformation.

      (2) The referee is wrong: our model represents the fibers by a higher shear modulus for the actomyosin cortex and for the muscles (see Table Appendix 1) and G_1 reflects the activities of the muscle and actin fibers.

      (3) The cube in Figure 3 is a mathematical 3D volume element that is subjected to stresses. Hyperelasticity modelling is based on such a representation.

      (4) C-H(new version: Fig.4 A-F): These images show similar deformations: bending and torsion as our C. elegans study. These figures indicate that such deformations are quite common in nature, even if the underlying mechanism is different.

      (5) This is a point we have already mentioned: we ignore the difference between the different types of epidermal cells and average their role in the early and second stages of elongation.

      (6) The plant vine is the 'botanical vine', see Goriely's article and book.

      (7) F-H(new version: Fig.4 D-F) do not have fixed rods, we set a curvature and torsion to fit the actual biological behavior.

      (8) Plants do not have muscles, but they grow, and our formalism for growth, pre-strain and material plasticity is very similar to the hyper-elasticity formalism.

      • Figure 4: Fig. 4 A: "The central or inner part (0 < 𝑅 < 𝑅2, shear modulus 𝜇𝑖) except the muscles which are stiffer." I do not understand.

      In the new version, this figure corresponds to Fig.5. The shear modulus of the intrinsic part is very small, but the muscles are harder so we have to consider them separately, we have revised this sentence to avoid misunderstanding.

      • Figure 5: Fig 5 A and D: The schematic of the cross-section has appeared already in the previous figure. No need to repeat it here. The same holds for the schematic of the cylindrical embryo. Caption: "But, the yellow region is not an actual tissue layer and it is simply to define the position of muscles." Why do you introduce the yellow region at all? I do not think that it clarifies anything. "Deformation diagram, when left side muscles M_1 and M_2." Something seems to be missing here. Similarly in the next sentence. "the actin fiber orientation changes from the 'loop' to the 'slope'" Do the rings break up and form a helix?

      In the new version, this figure corresponds to Fig.6.

      (1) We have made revisions to these figures.

      (2) The yellow part can show the accurate location of four muscles, which is important for our model and further calculations.

      (3) We have revised this sentence in the caption of Fig. 6.

      (4) Actin rings do not change to a helix pattern, they will be only sloping.

      • Figure 6: Fig 6 A-C These panels do not go beyond Fig 5B. Fig 6D: what are these images supposed to show? They are not really graphs, but microscopy images. The caption is not helpful to understand, what the reader is supposed to see here. Fig 6F: do you really want to plot a linear curve?

      In the new version, Fig.5 and Fig.6 respectively correspond to Fig.6 and Fig.7.

      (1) Fig.6 shows the simulated images, whereas Fig.7 A-C shows the actual calculation results; they are different.

      (2) Fig.7 D shows the actual configuration during late elongation of C. elegans; here, we want to show the torsion of the embryo.

      (3) Yes, it is our result.

      Discussions concerning the biological referee questions:

      Ll 75: “how the muscle contractions couple to the acto-myosin activity" Again I find this misleading because muscle contraction relies on acto-myosin activity. Probably, you can find a better expression to refer to the activity of the actomyosin network in the epidermis. Do you propose any mechanism for how muscle contraction increases epidermal contractility? This does not seem to be the mechanism that you propose for elongation, is it?

      The actomyosin activity will not stop because of the muscle contraction. Obviously, these two processes cannot be independent. The energy released by a muscle contraction event can and must contribute to the reorganization of the actomyosin network that occurs during the elongation process. Indeed, despite the fact that the embryo elongates, the density of actin cables appears to be maintained, which automatically requires a redistribution of actin monomers. We propose a scenario in which muscle contraction increases actomyosin contractility via energy conversion. We show that after unilateral contraction there is an energy release for this once all dissipation factors are eliminated. We invite the reviewer to re-examine Figure 2 and invite biologists to seriously evaluate the density of molecular motors attached to the circumferential actin cable throughout the stretch process.

      Ll 133: "we decide to simplify the geometrical aspect because of the mechanical complexity" This is hardly a justification. Why is it appropriate?

      Yes, we would like to offer the reader the simplest model, with limited technical complexity and a limited number of unknown parameters.

      L 135: "active strains" Why not active stress?

      The two are equivalent, the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      L 170: "hyperelastic" Please, explain this term.

      It is the elasticity of very soft samples subjected to large deformations. For classic references, see the books of Ogden, Holzapfel and Goriely, all of which are mentioned in our paper.

      Major criticism

      Eq. 3 and Ll 227: "𝑝1 is the ratio between the free available myosin population and the attached ones divided by the time of recruitment" Why is the time of recruitment the same for all motors? "inverse of the debonding time" Is it the same as the unbinding rate? Why use the symbol p_2 for it? What is p_3?

      The model proposed to justify the increase in the activity of the actomyosin motors during the first phase is a mean-field model: thus all quantities are averaged: we are not considering the theory of a single molecular motor, but a collection in a dynamic environment, so we do not need stochasticity here. Equation (3) concerns the compressive pre-strain, which by definition is a quantity varying between $0$ and $1$ and $X_g=1-G$. ... The debonding time is not the same as the debonding rate. The term $p_3$ indicates saturation and is derived from the law of mass action. The good agreement with the experimental data is shown in Fig.5 (A) and (B). An equivalent model has been developed by (M. Serra et al.).

      Serra M, Serrano Nájera G, Chuai M, et al. A mechanochemical model recapitulates distinct vertebrate gastrulation modes[J]. Science Advances, 2023, 9(49)

      Ll 275: "This energy exchange causes the torsion-bending energy to convert into elongation energy, leading to a length increase during the relaxation phase, as shown in Fig.1 of Appendix 5." You have posed the puzzle of how contraction leads to elongation, and now that you resolve the puzzle, you simply say that torsion and bending energy are converted into elongation. How? Usually, if I deform an elastic object, it will return to its original configuration after releasing the external forces. Why is this not the case here?

      Furthermore, the central result of your work is presented in an Appendix!?

      We agree with the referee that an elastic object will return to its initial configuration by releasing stress, i.e. by giving up its accumulated elastic energy to the environment. But the elastic energy has to go somewhere, such as heat. We do not dare to say that the temperature of the worm increases during the muscle contractions.

      In fact, the referee's comment also assumes that full relaxation of the stresses is possible, so the object is not a multi-layered specimen and/or it is not enclosed in a box. Most living species are under stress, usually called residual stress. Our skin is under stress. Our fingerprints result from an elastic instability of the epidermis occurring during foetal life, as do our brain circumvolutions and our villi. So, it is obvious that stresses are maintained in multilayered living systems. Closer to the case of C. elegans, the existence of stresses has been demonstrated by experiments with laser ablation fractures in the first stage. The fact that the fractures open proves the existence of stress: without stress, there would be no opening and only a straight line.

      Ll 379: "Although a special focus is made on late elongation, its quantitative treatment cannot avoid the influence of the first stage of elongation due to the acto-myosin network, which is responsible for a prestrain of the embryo." This statement is made repeatedly through the manuscript, but I do not understand, why you could not use an initial state without pre-strain.

      This is the basic concept of hyperelasticity. The reference state must be free of stress, so we cannot evaluate the first muscle contraction without treating the first elongation stage.

      Grammar, vocabulary and writing errors

      ll 31: "the influence of mechanical stresses (...) becomes more complex to be identified and quantified" Is the influence of mechanical stress too complex or too difficult to be identified/quantified?

      We have revised it in line 31, “The superposition of mechanical stresses, cellular processes (e.g., division, migration), and tissue organization is often too complex to identify and quantify.”

      Ll 41: "The embryonic elongation of C. elegans represents an attractive model of matter reorganization without a mass increase before hatching." Maybe "Embryonic elongation of C. elegans before hatching represents an attractive model of matter reorganization in the absence of growth.".

      We have revised it in line 41.

      L 42: "It happens after the ventral enclosure (...)" Maybe "It happens after ventral enclosure (...)".

      We have revised it in line 42.

      Ll 52: "The transition is well defined since the muscle participation makes the embryo rather motile impeding any physical experiments such as laser ablation (...)" Ablation of what?

      We have revised it in line 53: "The transition is well defined, because the muscle involvement makes the embryo rather motile, and any physical experiments such as laser fracture ablation of the epidermis, which could be performed and achieved in the first period (\cite{vuong2017interplay}), become difficult."

      Ll 59: "a hollow cylinder composed of four parts (seam and dorso-ventral cells)" It is not clear, what the four parts are - in the parenthesis, two are mentioned.

      We have revised it in line 59. Fig.1 shows the whole structure: dorsal, ventral and seam cells form the four parts of the epidermis.

      L 78: "several important issues at this stage remain unsettled" At which stage?

      It means the late elongation stage, we have added this information in line 78.

      Ll 85: "but how it works at small scales remains a challenge." Maybe "but how it works at small scales remains to be understood.".

      We have revised it in line 86.

      Ll 99: "the osmolarity of the interstitial fluid" This comes out of the blue. Before you only talked about mechanics, why now osmolarity? Also, the interstitial fluid is only mentioned now. It is important for the dissipative effects that you discuss later, right? If yes, then you should probably introduce it earlier.

      For a better understanding, we have changed osmolarity into viscosity in line 99.

      l 120: "The cortex is composed of three distinct cells" Maybe "distinct cell types".

      Thank you, and we have revised it in line 120.

      L 121: "cytoskeleton organization and actin network configurations" What is the difference between cytoskeleton organization and actin network configuration? Also, either both should be plural or both singular, I guess.

      (1) Cytoskeleton (which involves microtubules) forms the epidermis of C. elegans embryos, and the actin network surrounds the epidermis.

      (2) Thank you for your suggestion, we have revised it in line 121.

      L 130: "which will be introduced hereafter" Maybe "which will be used hereafter".

      We have revised it in line 130.

      Ll 148: "The geometric deformation gradient" You usually denote vectors in bold face, so \chi should be bold, right? Define d_i in Eq.(1).

      Yes, we have added this information in line 147.

      L 172: "auxiliary energy density" Please, explain this term.

      We have changed "auxiliary energy density" into "associated energy density" in line 175. Energy density is the amount of energy stored in a given system or region of space per unit volume, the associated energy density in our manuscript can help us to do some calculations.

      Ll 188: "Similar active matter can be found in biological systems, from animals to plants as illustrated in Fig.3(C)-(E), they have a structure that generates internal stress/strain when growing or activity. (...)" Why such a general statement during the presentation of the results? The second part of the sentence seems to be incomplete.

      We would like to show that our method is general and can be used in many situations. We have revised the incomplete sentence in line 192.

      Ll 243: "a bending deformation occurs on the left for active muscles localized on left" Maybe "bending to the left occurs if muscles on the left are activated".

      Thank you, we have revised it in line 247.

      L 250: "we assume them are perfectly synchronous" Maybe "we assume them to contract simultaneously".

      We have revised it in line 252.

      L 258: "the muscle and acto-myosin activities are assumed to work almost simultaneously." Before it was simultaneously, now only almost!? What does almost mean?

      Sorry, we meant to express the same meaning in these two sentences; we have deleted the word ‘almost’ in line 261.

      Ll 294: "one can hypothesize several scenarios" After that, only one scenario is described it seems.

      Thank you, we have revised this sentence in line 299.

      L 341: "and then is more viscous than water" Maybe "and that is more viscous than water".

      We have revised it in line 345.

      L 373: "before the egg hatch" Maybe "before the embryo (or larva) hatches"?

      We have revised the sentence in line 367.

      L 409: "elephant trunk elongated" maybe "elephant trunk elongation".

      We have revised it in line 412.

      Ll 417: "As one imagines, it is far from triviality (...)" Does this remark help in any way to understand better C. elegans elongation? Also maybe "it is far from trivial".

      We have revised it in line 423.

      Ll 428: "can map the initial stress-free state B_0 to a state B_1, which reflects early elongation process" Maybe: "maps the initial stress-free state B_0 to a state B_1, which describes early elongation".

      We have revised it in line 428.

      L 429: "After in the residually stressed (...)" Maybe "Subsequently, we impose an incremental strain field G_1 that maps the state B_1 to the state B_2, which represents late elongation".

      We have revised it in line 429.

      l 763: "Modelling details of without pre-strain case" Maybe "Case without pre-strain" or "Modelling in the absence of pre-strain" Similarly for l 784.

      We have revised them in line 763 and line 784.

      Some questions of definition and understanding

      Ll 71: "We can imagine that once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side." I do not understand the sentence. Muscle activation leads to contraction, there is nothing to imagine here. Maybe you hypothesize that the muscles are attached to the epidermis such that muscle contraction leads to epidermis deformation?

      Yes, four muscle bands are attached to the epidermis, as shown in Fig.1. The deformation does not concern only the epidermis but the whole embryo during the bending events. We have modified the sentence to avoid misunderstanding; it now reads “Once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side.” in line 71.

      Ll 110: "However, it is less widely known that its internal striated muscles share similarities with skeletal muscles found in vertebrates in terms of both function and structure" Is it important for what you report, whether this fact is widely known?

      Yes, it is our opinion.

      Ll 112: "the role of the four axial muscles (...) is nearly contra-intuitive" Is it or is it not? If yes, why?

      Yes, it is. Muscles exert contractions, and hence compressive deformations. They are localized along the axis of symmetry (up to a small deviation), so they cannot mechanically produce the expected elongation, contrary to the orthoradial actomyosin network.

      However, elongation of C. elegans is observed experimentally, so yes, we think the result is counterintuitive.

      L 116: "fully heterogeneous cylinder" What is this?

      It means that the C. elegans embryo does not have the same elastic properties in different parts (or layers).

      L 129: "will collaborate to facilitate further elongation" To facilitate or to drive? If the former, what drives elongation?

      Contraction of muscles and actin bundles together drive elongation.

      Ll 141: "the deformation in each section can be quantified since the circular geometry is lost with the contractions" The deformation could also be quantified if the sections remained circular, right?

      Yes. However, circularity is lost during each bending event.

      Ll 151: "we need to evaluate the influence of the C. elegans actin network during the early elongation before studying the deformation at the late stage. So, the deformation gradient can be decomposed into: (...) where (...) is the muscle-actomyosin supplementary active strain in the late period" I thought you were now studying the early stage?

      In this part, we are outlining how we can study the whole elongation (early and late), not just the early elongation stage. To evaluate the deformation induced by the first contraction of the muscles, we need to know the state of stress of the worm prior to this event, so we also need to recover the early period using the same formalism for the same structure.
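
      The two-stage treatment described above can be written compactly. The following is a sketch only, assuming the standard multiplicative decomposition of finite elasticity; the symbol F_e for the elastic part is an assumed notation, while G_0 and G_1 follow the description given elsewhere in this response (G_0 maps the stress-free state B_0 to B_1, and G_1 maps B_1 to B_2).

```latex
% Sketch under stated assumptions; F_e is an assumed symbol.
% F   : geometric deformation gradient (observed deformation)
% F_e : elastic part of the deformation
% G_0 : pre-strain of early elongation, mapping B_0 to B_1
% G_1 : incremental active strain of late elongation, mapping B_1 to B_2
\begin{equation*}
  \mathbf{F} = \mathbf{F}_e \, \mathbf{G},
  \qquad
  \mathbf{G} = \mathbf{G}_1 \, \mathbf{G}_0 .
\end{equation*}
```

      Because G_0 acts first, the stress state left by early elongation is already present when the first muscle contraction is evaluated through G_1, which is why the early stage cannot be skipped.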

      L 160: "When considering a filamentary structure with different fiber directions" Which filamentary structure are you talking about?

      Fig.3 B shows this model and the filamentary structure, which contains the actin and muscle fibers.

      Ll 174: "When the cylinder involves several layers with different shear modulus 𝜇 and different active strains, the integral over 𝑆 covers each layer" I do not understand this sentence. Also, you should probably write 'moduli' instead of modulus.

      This implies that when integrating over the whole cross-section S, we need to take into account each layer independently with its own shear modulus and sum the results.
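
      As an illustration of this layer-by-layer integration, here is a minimal Python sketch. It assumes concentric annular layers and a purely radial energy density; the layer radii, shear moduli, and the function names `layer_integral` and `cross_section_energy` are hypothetical and are not taken from the manuscript.

```python
import math


def layer_integral(mu, r_in, r_out, density, n=1000):
    """Integrate mu * density(R) over the annulus r_in < R < r_out.

    Uses the midpoint rule on the radial coordinate with the polar
    area element 2*pi*R*dR. `density` is a hypothetical radial
    energy density per unit shear modulus.
    """
    dr = (r_out - r_in) / n
    total = 0.0
    for k in range(n):
        r = r_in + (k + 0.5) * dr
        total += density(r) * 2.0 * math.pi * r * dr
    return mu * total


def cross_section_energy(layers, density):
    """Sum the contributions of all layers, each with its own shear modulus."""
    return sum(layer_integral(mu, a, b, density) for mu, a, b in layers)


# Hypothetical two-layer section: a soft inner part and a stiffer outer shell.
# Each entry is (shear modulus, inner radius, outer radius), arbitrary units.
layers = [(1.0, 0.0, 0.7), (10.0, 0.7, 1.0)]

# With a uniform density of 1, each layer contributes mu * (area of the annulus).
energy = cross_section_energy(layers, lambda r: 1.0)
```

      Treating each layer separately keeps the shear modulus constant inside every integral, so a discontinuity in stiffness between layers never has to be resolved numerically.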

      L 176: "weakness of 𝜀" Do you mean \epsilon << 1?

      Yes

      Ll 178: "Given that the Euler-Lagrange equations and the boundary conditions are satisfied at each order, we can obtain solutions for the elastic strains at zero order 𝐚(𝟎) and at first order 𝐚(𝟏)." Are you thinking about different orders in an \epsilon expansion or the early and the late stages of elongation?

      Different orders are considered only for the late elongation study; the early elongation is treated exactly and so does not need a correction in \epsilon.

      L 197: "fracture ablation" Please, define.

      This is an experiment in which a laser is used to make a cut in a small-scale object of study and then the internal stresses are obtained based on the morphology of the cut, please see the Ref ‘Assessing the contribution of active and passive stresses in C. elegans elongation’. We have added this definition in line 200.

      Ll 203: What motivated your choice of notations for the radii R_2'? The inner part of the cylinder is fluid? But above you wrote about a solid cylinder. Why should the inner part be compressible?

      (1) We need to define the location of actin cables, which concentrate at the outer periphery.

      (2) Our model is a hollow cylinder, and the inner part of the cylinder contains internal organs, tissues, fluids, and so on, so we consider it to be a compressible extremely soft material (Line 213).

      Ll 212: "𝑟(𝑅) is the radius after early elongation." And during?

      R is variable, r(R) depends on R but also on time t, it represents the radius of C. elegans embryos after the onset of elongation, i.e., after acto-myosin and muscle activities begin.

      L 232: \tau_p is probably t_p?

      Yes.

      L 240: "quite simultaneously" Please, be precise.

      In practice, it is difficult to define the concept of simultaneous occurrence unless there is rigorous experimental data to show it, but all we can get in the Ref ‘Remodelage des jonctions sous stress mécanique’, is that it occurs almost simultaneously, which we define as quite simultaneously.

      Ll 246: "a short period" What does short mean? Why is it relevant?

      From the experimental observations and data, we know that each contraction occurs very rapidly: a few seconds so we define a short period for one contraction.

      L 263: "the bending of the model will be increased" Is it really the model that is bent?

      Yes, we mean the bending deformation predicted by the model; we have revised this in line 266.

      Ll 265: "we observed a consistent torsional deformation (Fig.6(E)) that agrees with the patterns seen in the video" In which sense do these configurations agree? I do not see any similarity between panels D and E.

      Both show a torsion deformation.

      L 267: "torsion as the default of symmetry of the muscle axis" I do not understand.

      We discuss two cases in this research, one where the muscle follows the axis of the C. elegans in the initial configuration, and the other where the muscle has a slight angle of deflection, and we have added more information in the manuscript (line 270).

      Ll 274: "Each contraction of a pair increases the energy of the system under investigation, which is then rapidly released to the body." Do you mean the elastic energy stored in the epidermis and central part of the embryo?

      Yes, the whole body.

      Ll 284: "The activation of actin fibers 𝑔𝑎1 after muscle relaxation can be calculated and determined by our model." Have you done it?

      Yes, we can obtain the value of g_a1, and then calculate the elongation.

      Ll 286 I do not understand, why you write about mutants at this place. Am I supposed to have already understood the basic mechanism of elongation? Why do you now write about the first stage?

      We would like to show that our formalism can model both wild-type and mutant C. elegans, and the comparison results are good.

      L 302: "The result is significantly higher than our actual size 210𝜇𝑚." How was significance assessed? Your actual size is probably more than 210µm.

      Here, we have considered two situations, one is that the accumulated energy is totally applied to the elongation so that the length will be much larger than the experimental result of 210 µm, the length value that we have obtained by calculation. In the other case, we have considered the energy dissipation, which leads to 210 µm.

      L 433: "where 𝜆 is the axial extension due to the pre-strained" Maybe "where 𝜆 is the axial extension due to the pre-stress".

      In our manuscript, we define the pre-strain, not the pre-stress.

      L 438: "active filamentary tensor" Please, define.

      The active filamentary tensor is the tensor representing the activity of a cylindrical model composed of fibers with different orientations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study presents careful biochemical experiments to understand the relationship between LRRK2 GTP hydrolysis parameters and LRRK2 kinase activity. The authors report that incubation of LRRK2 with ATP increases the KM for GTP and decreases the kcat. From this, they suppose an autophosphorylation process is responsible for enzyme inhibition. LRRK2 T1343A showed no change, consistent with it needing to be phosphorylated to explain the changes in G-domain properties. The authors propose that phosphorylation of T1343 inhibits kinase activity and influences monomer-dimer transitions.

      Strengths:

      The strengths of the work are the very careful biochemical analyses and the interesting result for wild-type LRRK2.

      Weaknesses:

      A major unexplained weakness is why the mutant T1343A starts out with so much lower activity--it should be the same as wild-type, non-phosphorylated protein. Also, if a monomer-dimer transition is involved, it should be either all or nothing. Other approaches would add confidence to the findings.

      We thank the reviewer for these suggestions. We are aware that T1343A generally has a lower activity than the wild type. Therefore, we would like to emphasize that this mutant is the only one not showing an increase in Km values after ATP treatment. Other mutants that also have lower kcat values, like T1503A, still show this characteristic change in Km. Our favored explanation for the lower kcat of T1343A is that this mutation lies within a critical region of the Roc domain, the so-called P-loop, and is very likely not structurally neutral. Concerning the dimer-monomer transition, we are convinced that more than one factor is involved in this equilibrium, most likely including, but not limited to, other LRRK2 domains (e.g. the WD40 domain), binding of co-factors (e.g. Rab29/Rab32 or 14-3-3) and membrane binding. Consistently, even with stapled peptides targeting the Roc or Cor domains, we were not able to shift the equilibrium completely to the monomer (Helton et al., ACS Chem Biol. 2021, 16:2326-2338; Pathak et al. ACS Chem Neurosci. 2023, 14(11):1971-1980). We have addressed these points in a revised version of the manuscript.
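
      To make the roles of Km and kcat in this discussion concrete, here is a minimal Python sketch. It assumes simple Michaelis-Menten kinetics for GTP hydrolysis, which is an assumption made for illustration; the parameter values and the function names `mm_rate` and `catalytic_efficiency` are hypothetical and are not measurements from the study.

```python
def mm_rate(kcat, km, enzyme, substrate):
    """Michaelis-Menten rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat / Km, the low-substrate rate constant."""
    return kcat / km


# Hypothetical parameters (arbitrary units), NOT values from the manuscript:
# an ATP pre-incubation that raises Km and lowers kcat.
before = {"kcat": 1.0, "km": 100.0}
after = {"kcat": 0.5, "km": 400.0}

efficiency_before = catalytic_efficiency(**before)
efficiency_after = catalytic_efficiency(**after)

# Rate at a fixed, hypothetical GTP concentration with unit enzyme.
rate_before = mm_rate(before["kcat"], before["km"], enzyme=1.0, substrate=100.0)
rate_after = mm_rate(after["kcat"], after["km"], enzyme=1.0, substrate=100.0)
```

      In this sketch a higher Km and a lower kcat both reduce kcat/Km, so the two reported changes act in the same direction on hydrolysis at sub-saturating GTP.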

      Reviewer #2 (Public Review):

      This study addresses the catalytic activity of a Ras-like ROC GTPase domain of LRRK2 kinase, a Ser/Thr kinase linked to Parkinson's disease (PD). The enzyme is associated with gain-of-function variants that hyper-phosphorylate substrate Rab GTPases. However, the link between the regulatory ROC domain and activation of the kinase domain is not well understood. It is within this context that the authors detail the kinetics of the ROC GTPase domain of pathogenic variants of LRRK2, in comparison to the WT enzyme. Their data suggest that LRRK2 kinase activity negatively regulates the ROC GTPase activity and that PD variants of LRRK2 have differential effects on the Km and catalytic efficiency of GTP hydrolysis. Based on mutagenesis, kinetics, and biophysical experiments, the authors suggest a model in which autophosphorylation shifts the equilibrium toward monomeric LRRK2 (locked GTP state of ROC). The authors further conclude that T1343 is a crucial regulatory site, located in the P-loop of the ROC domain, which is necessary for the negative feedback mechanism. Unfortunately, the data do not support this hypothesis, and further experiments are required to confirm this model for the regulation of LRRK2 activity.

      Specific comments are below:

      • Although a couple of papers are cited, the rationale for focusing on the T1343 site is not evident to readers. It should be clarified that this locus, and perhaps other similar loci in the wider ROCO family, are likely important for direct interactions with the GTP molecule.

      To clarify this point: we have not focused only on this specific locus, but instead systematically mutated all known auto-phosphorylation sites within the RocCOR domain (see supplemental information). Furthermore, it has been shown that this site, at least in the RCKW (Roc to WD40) construct, is quantitatively phosphorylated (Deniston et al., Nature 2020, 588:344-349). We are aware that the T1343 residue is located within the P-loop and that this can impact nucleotide binding capacities (see response to reviewer 1).

      We have clarified and addressed these points in a revised version of the manuscript.

      • Similar to the above, readers are kept in the dark about auto-phosphorylation and its effects on the monomer/dimer equilibrium. This is a critical aspect of this manuscript and a major conceptual finding that the authors are making from their data. However, the idea that auto-phosphorylation is (likely) to shift the monomer/dimer equilibrium toward monomer, thereby inactivating the enzyme, is not presented until page 6, AFTER describing much of their kinetics data. This is very confusing to readers, as it is difficult to understand the meaning of the data without a conceptual framework. If the model for the LRRK2 function is that dimerization is necessary for the phosphorylation of substrates, then this idea should be presented early in the introduction, and perhaps also in the abstract. If there are caveats, then they should be discussed before data are presented. A clear literature trail and the current accepted (or consensus) mechanism for LRRK2 activity is necessary to better understand the context for these data.

      We agree with the reviewer. We have revised the introduction accordingly and added a paragraph on page 3 starting from line 27.

      • Following on the above concepts, I find it interesting that the authors mention monomeric cytosolic states, and kinase-active oligomers (dimers??), with citations. Again here, it would be useful to be more precise. Are dimers (oligomers?) only formed at the membrane? That would suggest mechanisms involving lipid or membrane-attached protein interactions. Also, what do the authors mean by oligomers? Are there more than dimers found localized to the membrane?

      Multiple studies have shown that LRRK2 is mainly monomeric in the cytosol, while it forms mainly dimeric or higher oligomeric states at the membrane (James et al., Biophys. J. 2012, 102, L41–L43; Berger et al., Biochemistry, 2010, 49, 5511–5523). However, we agree with the reviewer that it remains to be determined whether the dimeric form or a higher oligomeric state is the most active state at the membrane, especially since a recent study shows that LRRK2 can form active tetramers only when bound to Rab29 (Zhu et al., bioRxiv, 2022, DOI: 10.1101/2022.04.26.489605). We have clarified these points in the introduction of the revised version of the manuscript (page 3, line 27ff).

      • Fig 5 is a key part of their findings, regarding the auto-phosphorylation induced monomer formation of LRRK2. From these two bar graphs, the authors state unequivocally that the 'monomer/dimer equilibrium is abolished', and therefore, that the underlying mechanism might be increased monomerization (through maintenance of a GTP-locked state). My view is that the authors should temper these conclusions with caveats. One is that there are still plenty of dimers in the auto-phosphorylated WT, and also in the T1343A mutant. Why is that the case? Can the authors explain why only perhaps a 10% shift is sufficient? Secondly, the T1343A mutant appears to have fewer overall dimers to begin with, so it appears to readers that 'abolition' is mainly due to different levels prior to ATP treatment at 30 deg. I feel these various issues need to be clarified in a revised manuscript, with additional supporting data. Finally, on a minor note, I presume that there are no statistically significant differences between the two sets of bar graphs on the right panel. It would be wise to place 'n.s.' above the graphs for readers, and in the figure legend, so readers are not confused.

      Starting with the monomer-dimer equilibrium, we are convinced that more than the phosphorylation of T1343 is involved (see response to reviewer 1). Therefore, a 10% shift in our assay most likely underestimates the effect seen in cells. Consistent with this, the T1343A mutants show an increase in the Rab10 phosphorylation assay similar to that of the G2019S mutant. This shows that the identified feedback mechanism plays an important role in a cellular context. We have addressed this point in the revised manuscript on page 6, line 8ff. As far as the significance indicators in the bar charts are concerned, we agree with the reviewer. In order not to overload the figure, we decided to include all pairwise comparisons (post-hoc tests) in the supplement.

      • Figure 6B, Westerns of phosphorylation, the lanes are not identified and it is unclear what these data mean.

      We apologize for this mistake and have added the correct labeling in the revised version of the manuscript.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The author studies a family of models for heritable epigenetic information, with a focus on enumerating and classifying different possible architectures. The key aspects of the paper are:

      • Enumerate all 'heritable' architectures for up to 4 constituents.

      • A study of whether permanent ("genetic") or transient ("epigenetic") perturbations lead to heritable changes

      • Enumerated the connectivity of the "sequence space" formed by these heritable architectures

      • Incorporating stochasticity, the authors explore stability to noise (transient perturbations)

      • A connection is made with experimental results on C. elegans.

      The study is timely, as there is a renewed interest in the last decade in non-genetic, heritable heterogeneity (e.g., from single-cell transcriptomics). Consequently, there is a need for a theoretical understanding of the constraints on such systems. There are some excellent aspects of this study: for instance, the attention paid to how one architecture "mutates" into another. Unfortunately, the manuscript as a whole does not succeed in formalising nor addressing any particular open questions in the field. Aside from issues in presentation and modelling choices (detailed below), it would benefit greatly from a more systematic approach rather than the vignettes presented.

      Despite being foundational, this work was systematic in that (1) for the simple architectures modeled using ordinary differential equations (ODEs) with continuity assumptions, parameters that support steady states were systematically determined for each architecture and then every architecture was explored using genetic changes exhaustively, although epigenetic perturbations were not examined exhaustively because of their innumerable variety; and (2) for the more realistic modeling of architectures as Entity-Sensor-Property systems, the behavior of systems with respect to architecture as well as parameter space that lead to particular behaviors (persistence, heritable epigenetic change, etc.) was systematically explored. A more extensive exploration of parameter space that also includes the many ways that the interaction between any two entities/nodes could be specified using an equation is a potentially ever-expanding challenge that is beyond the scope of any single paper.

      Specific aspects that remain to be addressed include the application of multiple notions of heritability to real networks of arbitrary size, considering different types of equations for change of each entity/node, and classifying different behavioral regimes for different sets of parameters.

      The key contribution of the paper is an articulation of the crucial questions to ask of any regulatory architecture in living systems rather than the addressing of any question that a field has recognized as ‘open’. Specifically, through the exhaustive listing of small regulatory architectures that can be heritable and the systematic analysis of arbitrary Entity-Sensor-Property systems that more realistically capture regulatory architectures in living systems, this work points the way to constrain inferences after experiments on real living systems. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this paper will raise awareness in both communities and lead to more constrained models that minimize biases introduced by incomplete knowledge of the network, which is always the case when analyzing living systems.

      Terminology

      The author introduces a terminology for networks of interacting species in terms of "entities" and "sensors" -- the former being nodes of a graph, and the latter being those nodes that receive inputs from other nodes. In the language of directed graphs, "entities" would seem to correspond to vertices, and "sensors" those vertices with positive indegree and outdegree. Unfortunately, the added benefit of redefining accepted terminology from the study of graphs and networks is not clear.

      The Entities-Sensors-Property (ESP) framework is based on underlying biology and not graph theory, making an ESP system not entirely equivalent to a network or graph, which is much less constrained. The terms ‘entity’, ‘sensor’, and ‘property’ were defined and justified in a previous paper (Jose, J R. Soc. Interface, 2020). While nodes of a network can be parsed arbitrarily and the relationship between them can also be arbitrary, entities and sensors are molecules or collections of molecules that are constrained such that the sensors respond to changes in particular properties of other entities and/or sensors. When considered as digraphs, sensors can be seen as vertices with positive indegree and outdegree. The ESP framework can be applied across any scale of organization in living systems and this specific way of parsing interactions also discretizes all changes in the values of any property of any entity. In short, ESP systems are networks, but not all networks are ESP systems. Therefore, the results of network theory that remain applicable for ESP systems need further investigation.
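
      The digraph reading of a sensor given above (a vertex with positive indegree and positive outdegree) can be sketched in a few lines. This is only an illustration: the three-node graph and the function name `find_sensors` are hypothetical and are not taken from the paper or its code.

```python
# Hypothetical toy digraph: each key regulates the nodes listed as its values.
edges = {
    "x": ["y"],  # x regulates y
    "y": ["z"],  # y regulates z
    "z": [],     # z regulates nothing
}


def find_sensors(edges):
    """Return the nodes that have positive indegree and positive outdegree."""
    indegree = {node: 0 for node in edges}
    for source, targets in edges.items():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1
    return sorted(
        node
        for node in indegree
        if indegree[node] > 0 and len(edges.get(node, [])) > 0
    )


sensors = find_sensors(edges)
print(sensors)
```

      In this toy graph only `y` both receives and sends regulation, so it is the only vertex that qualifies under the digraph description; the additional biological constraints of the ESP framework are not captured by this check.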

      The key utility of the ESP framework is that it is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles.

      Model

      The model seems to suddenly change from Figure 4 onwards. While the results presented here have at least some attempt at classification or statistical rigour (i.e. Fig 4 D), there are suddenly three values associated with each entity ("property step, active fraction, and number"). Furthermore, the system suddenly appears to be stochastic. The reader is left unsure what has happened, especially after having made the effort to deduce the model as it was in Figs 1 through 3. No respite is to be found in the SI, either, where this new stochastic model should have been described in sufficient detail to allow one to reproduce the simulation.

      The Supplementary Information section titled ‘Simulation of simple ESP systems’ provides the requested detailed information and revisions to the writing provide the biologically grounded justification for parsing interacting regulators as ESP systems.

      Perturbations

      Inspired especially by experimental manipulations such as RNAi or mutagenesis, the author studies whether such perturbations can lead to a heritable change in network output. While this is naturally the case for permanent changes (such as mutagenesis), the author gives convincing examples of cases in which transient perturbations lead to heritable changes. Presumably, this is due to the underlying multistability of many networks, in which a perturbation can pop the system from one attractor to another.
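
      The multistability argument can be made concrete with a deliberately simple toy model. The sketch below is not one of the architectures from the paper: it assumes a single self-promoting entity with dx/dt = x^2/(1 + x^2) - decay*x, a hypothetical decay rate of 0.4 (which gives stable states at x = 0 and x = 2), and a hypothetical transient knockdown modelled as extra decay between t = 10 and t = 20.

```python
def simulate(perturb, dt=0.01, t_end=40.0, x0=2.0):
    """Euler-integrate dx/dt = x^2/(1 + x^2) - decay*x and return the final x.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    base_decay = 0.4   # assumed; yields stable fixed points at x = 0 and x = 2
    extra_decay = 2.0  # assumed strength of the transient knockdown
    x = x0
    for step in range(int(round(t_end / dt))):
        t = step * dt
        decay = base_decay
        if perturb and 10.0 <= t < 20.0:
            decay += extra_decay  # the perturbation acts only in this window
        x += dt * (x * x / (1.0 + x * x) - decay * x)
        x = max(x, 0.0)  # levels cannot be negative
    return x


control = simulate(perturb=False)
perturbed = simulate(perturb=True)
print(f"unperturbed: {control:.3f}, after transient knockdown: {perturbed:.3f}")
```

      Under these assumptions the unperturbed system stays in the high state, while the system that experienced the transient knockdown settles in the low state and remains there after the perturbation is removed.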

      Unfortunately, there appears to be no attempt at a systematic study of outcomes, nor a classification of when a particular behaviour is to be expected. Instead, there is a long and difficult-to-read description of numerical results that appear to have been sampled at random (in terms of both the architecture and parameter regime chosen). The main result here appears to be that "genetic" (permanent) and "epigenetic" (transient) perturbations can differ from each other -- and that architectures that share a response to genetic perturbation need not behave the same under an epigenetic one. This is neither surprising (in which case even illustrative evidence would have sufficed) nor is it explored with statistical or combinatorial rigour (e.g. how easy is it to mistake one architecture for another? What fraction share a response to a particular perturbation?)

      As an additional comment, many of the results here are presented as depending on the topology of the network. However, each network is specified by many kinetic constants, and there is no attempt to consider the robustness of results to changes in parameters.

      The systematic study of all arbitrary regulatory architectures is beyond the scope of this paper and, indeed, beyond the scope of any one paper. Nevertheless, 225,000 arbitrary Entity-Sensor-Property systems were systematically explored and collections of parameters that lead to different behaviors provided (e.g., 78,285 are heritable). These ESP systems more closely mimic regulation in living systems than the coupled ODE-based specification of change in a regulatory architecture.

      The example questions raised here are not only difficult to answer, but subjective and present a moving target for future studies. One, ‘how easy is it to mistake one architecture for another?’. Mistaking one architecture for another clearly depends on the number of different types of experiments one can perform on an architecture and the resolution with which changes in entities can be measured to find distinguishing features. Two, ‘What fraction share a response to a particular perturbation?’. ‘Sharing a response’ also depends on the resolution of the measurement after perturbation.

      DNA analogy

      At two points, the author makes a comparison between genetic information (i.e. DNA) and epigenetic information as determined by these heritable regulatory architectures. The two claims the author makes are that (i) heritable architectures are capable of transmitting "more heritable information" than genetic sequences, and (ii) that, unlike DNA, the connectivity (in the sense of mutations) between heritable architectures is sparse and uneven (i.e. some architectures are better connected than others).

      In both cases, the claim is somewhat tenuous -- in essence, it seems an unfair comparison to consider the basic epigenetic unit to be an "entity" (e.g., an entire transcription factor gene product, or an organelle), while the basic genetic unit is taken to be a single base-pair. The situation is somewhat different if the relevant comparison was the typical size of a gene (e.g., 1 kb).

      Considering every base being the unit of stored information in the DNA sequence results in the maximal possible storage capacity of a genome of given length. Any other equivalence between entity and units within the genome (e.g., 1 kb gene) will only reduce the information stored in the genome.

      Nevertheless, the claim was modified to say that the information content of an ESP system can [italics added] be more extensive than the information content of the genome. This accounts for the possibility of an organism that has an inordinately large genome such that maximal information that can be stored in a particular genome sequence exceeds that stored in a particular configuration of all the contents in a cell.

      I thank the reviewer for providing further explanation of this misunderstanding in the second round of review, which helps draw future readers to the sections in the paper that discuss this important point (also see response to Recommendations for the authors).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I thank the author for their efforts in replying to the comments. I have updated my review accordingly; in particular, I have:

      (1) Removed my complaint that Heritability is nowhere defined

      (2) Removed issues with the presentation of the ODE model in the supplementary information.

      I thank the reviewer for raising these issues and acknowledging the improvements made.

      However, given that the manuscript is broadly unchanged from the initial one, many of my prior comments remain justified. Some key points:

      (1) The manuscript continues to be difficult to read, for the same reasons as I mentioned when reviewing the paper previously.

      (2) The utility of the "ESP" formalism is still unclear.

      • As the author notes, continuous ODEs are of course an idealisation of a system with discrete copy number.

      • However, discussing this is standard fare in any textbook dealing with chemical dynamics and stochastic processes -- see, for instance, the standard textbook by van Kampen.

      • This seems little reason to reject ODEs and implement a poorly defined formalism/simulation scheme.

      (3) The author claims that many questions raised are "beyond the scope of this study". Indeed, answering all of these questions is beyond the scope of any one study. However, as I initially wrote, the paper would be much stronger if it focused on a particular problem rather than the many vignettes depicted.

      The broad scope of this foundational paper necessitates addressing many issues, which may make it a difficult read for some readers. I hope that future work where each paper focuses on one of the aspects raised here will enable the extensive treatment of limited scope as suggested by the reviewer.

      The utility of ODEs is much appreciated and was indeed a computationally efficient way of exploring the vast space of regulatory architectures. As stated in the response to the public reviews, the Entity-Sensors-Property framework provides a biologically grounded way of parsing interacting regulators. This approach is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles.

      On a final note, on the subject of the comparison with DNA:

      Perhaps I have misunderstood something. I simply meant that comparing the "maximal information" with 4 HRAs (12.45 bits) is certainly more than the "maximal information" with 4 basepairs (8 bits), but definitely less than the "maximal information" for four 1-kb genes (4^(4000) combinations, so 8000 bits...)
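
      The arithmetic in this comparison can be checked directly. The sketch below assumes that the 12.45 bits quoted for 4 interactors is the base-2 logarithm of 5604 (the count quoted elsewhere in this response for 4 interactors) and that each base carries 2 bits; the variable names are illustrative only.

```python
import math

n_hras_4_interactors = 5604  # count quoted for 4 interactors (assumed basis of 12.45 bits)

bits_hras = math.log2(n_hras_4_interactors)  # information in choosing one architecture
bits_4_bases = 4 * math.log2(4)              # 4 base pairs at 2 bits per base
bits_four_1kb_genes = 4000 * math.log2(4)    # four 1-kb genes, i.e. 4^4000 sequences

print(f"4 interactors: {bits_hras:.2f} bits")
print(f"4 base pairs: {bits_4_bases:.0f} bits")
print(f"four 1-kb genes: {bits_four_1kb_genes:.0f} bits")
```

      The outcome of the comparison therefore depends on which genetic unit is matched to an entity, which is the point under discussion here.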

      Perhaps the author means that the growth in information of HRAs is faster than exponential. If so, that should be shown and then remarked on.

      For this reason, I maintain my comment that the comparison is tenuous.

      This issue was addressed once in the results section and again in the discussion section.

      The results section states that “The combinatorial growth in the numbers of HRAs with the number of interactors can thus provide vastly more capacity for storing information in larger HRAs compared to that afforded by the proportional growth in longer genomes.”

      The discussion section states that “Despite imposing heritability, regulated non-isomorphic directed graphs soon become much more numerous than unregulated non-isomorphic directed graphs as the number of interactors increase (125 vs. 5604 for 4 interactors, Table 1). With just 10 interactors, there are >3x10^20 unregulated non-isomorphic directed graphs [60] and HRAs are expected to be more numerous. This tremendous variety highlights the vast amount of information that a complex regulatory architecture can represent and the large number of changes that are possible despite sparsity of the change matrix (Fig. 3).”

      Thus, indeed as the reviewer surmises, the combinatorial explosion in information of HRAs with increases in interacting entities is faster than the proportional growth in information of genome sequence with increases in length.

      In summary, I thank the reviewers and editors for their help in improving the paper and would like to make the current manuscript the Version of Record.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The author studies a family of models for heritable epigenetic information, with a focus on enumerating and classifying different possible architectures. The key aspects of the paper are:

      • Enumerate all 'heritable' architectures for up to 4 constituents.

      • A study of whether permanent ("genetic") or transient ("epigenetic") perturbations lead to heritable changes.

      • Enumerated the connectivity of the "sequence space" formed by these heritable architectures.

      • Incorporating stochasticity, the authors explore stability to noise (transient perturbations).

      • A connection is made with experimental results on C. elegans.

      The study is timely, as there has been a renewed interest in the last decade in non-genetic, heritable heterogeneity (e.g., from single-cell transcriptomics). Consequently, there is a need for a theoretical understanding of the constraints on such systems. There are some excellent aspects of this study, for instance:

      • The attention paid to how one architecture "mutates" into another, establishing the analogue of a "sequence space" for network motifs (Fig 3).

      • The distinction is drawn between permanent ("genetic") and transient ("epigenetic") perturbations that can lead to heritable changes.

      • The interplay between development, generational timescales, and physiological time (as in Fig. 5).

      I thank the reviewer for highlighting these aspects of the work.

      The manuscript would be very interesting if it focused on explaining and expanding these results. Unfortunately, as a whole, it does not succeed in formalising nor addressing any particular open questions in the field. Aside from issues in presentation and modelling choices (detailed below), it would benefit greatly from a more systematic approach rather than the vignettes presented.

      This first paper is foundational and therefore cannot be expected to solve all aspects of the problem of heredity. The work was nevertheless systematic in that (1) for the simple architectures modeled using ordinary differential equations (ODEs) with continuity assumptions, parameters that support steady states were systematically determined for each architecture and then every architecture was explored using genetic changes exhaustively, although epigenetic perturbations were not examined exhaustively because of their wide variety; and (2) for the more realistic modeling of architectures as Entity-Sensor-Property systems, the behavior of systems with respect to architecture as well as parameter space that lead to particular behaviors (persistence, heritable epigenetic change, etc.) was systematically explored. A more extensive exploration of parameter space that also includes the many ways that the interaction between any two entities/nodes could be specified using an equation is a potentially ever-expanding challenge that is beyond the scope of any single paper (see response to additional comments below).

      Specific aspects that remain to be addressed include the application of multiple notions of heritability to real networks of arbitrary size, considering different types of equations for change of each entity/node, and classifying different behavioral regimes for different sets of parameters. As is evident from this list of combinatorial possibilities, the space to be explored is vast and beyond the scope of this foundational paper.

      The key contribution of the paper is an articulation of the crucial questions to ask of any regulatory architecture in living systems rather than the addressing of any question that a field has recognized as ‘open’. Specifically, through the exhaustive listing of small regulatory architectures that can be heritable and the systematic analysis of arbitrary Entity-Sensor-Property systems that more realistically capture regulatory architectures in living systems, this work points the way to constrain inferences after experiments on real living systems. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this paper will raise awareness in both communities and lead to more constrained models that minimize biases introduced by incomplete knowledge of the network, which is always the case when analyzing living systems.

      Terminology

      The author introduces a terminology for networks of interacting species in terms of "entities" and "sensors" -- the former being nodes of a graph, and the latter being those nodes that receive inputs from other nodes. In the language of directed graphs, "entities" would seem to correspond to vertices, and "sensors" those vertices with positive indegree and outdegree. Unfortunately, the added benefit of redefining accepted terminology from the study of graphs and networks is not clear.

      The Entities-Sensors-Property (ESP) framework is based on underlying biology and not graph theory, making an ESP system not entirely equivalent to a network or graph, which is much less constrained. The terms ‘entity’, ‘sensor’, and ‘property’ were defined and justified in a previous paper (Jose, J R. Soc. Interface, 2020). While nodes of a network can be parsed arbitrarily and the relationship between them can also be arbitrary, entities and sensors are molecules or collections of molecules that are constrained such that the sensors respond to changes in particular properties of other entities and/or sensors. When considered as digraphs, sensors can be seen as vertices with positive indegree and outdegree. The ESP framework can be applied across any scale of organization in living systems and this specific way of parsing interactions also discretizes all changes in the values of any property of any entity. In short, ESP systems are networks, but not all networks are ESP systems. Therefore, the results of network theory that remain applicable for ESP systems need further investigation. This justification is now repeated in the paper.

      The key utility of the ESP framework is that it is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles. In addition, the language of digraphs, where sensors can be seen as vertices with positive indegree and outdegree, has also been added to aid readers who are familiar with graph theory.

      Heritability

      The primary goal of the paper is to analyse the properties of those networks that constitute "heritable regulatory architectures". The definition of heritability is not clearly stated anywhere in the paper, but it appears to be that the steady-state of the network must have a non-zero expression of every entity. As this is the heart of the paper, it would be good to have the definition of heritable laid out clearly in either the main text or the SI.

      I have now defined the term as used in this paper early, which is indeed as surmised by the reviewer simply the preservation of the architecture and non-zero levels of all entities. I have also highlighted additional notions of heredity that are possible, which will be the focus of future work. These can range from precise reproduction of the concentration and the localization of every entity to a subset of the entities being reproduced with some error while the rest keep varying from generation to generation (as illustrated in Fig. 2 of Jose, BioEssays, 2018). Importantly, it is currently unclear which of these possibilities reflects heredity in real living systems.

      Model

      As described in the supplementary, but not in the main text, the author first chooses to endow these networks with simple linear dynamics; something like $\partial_t \vec{x} = A \vec{x} - T \vec{x}$, where the vector $\vec{x}$ is the expression level of each entity, $A$ has the structure of the adjacency matrix of the directed graph, and $T$ is a diagonal matrix with positive entries that determines the degradation or dilution rate of each entity. From a readability standpoint, it would greatly aid the reader if the long list of equations in the SI were replaced with the simple rule that takes one from a network diagram to a set of ODEs.

      I have abridged the description by eliminating the steady state expression for every HRA as suggested and simply pointed to the earlier version of the paper for those readers who might prefer the explicit derivations of these simple expressions. An overview is now provided for going from any network diagram to a set of ODEs.
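
      As a rough illustration of going from a network diagram to a set of ODEs, the linear rule described by the reviewer (rate of change = A x - T x) can be written as a single function. The two-entity loop, the weights, and the name `derivatives` below are hypothetical and do not reproduce the equations in the supplement.

```python
def derivatives(x, adjacency, degradation):
    """Return dx_i/dt = sum_j adjacency[i][j] * x[j] - degradation[i] * x[i].

    adjacency[i][j] is the (signed) weight of the regulation of entity i by entity j;
    degradation[i] is the positive loss rate of entity i (the diagonal of T).
    """
    n = len(x)
    return [
        sum(adjacency[i][j] * x[j] for j in range(n)) - degradation[i] * x[i]
        for i in range(n)
    ]


# Hypothetical two-entity positive loop: entity 0 promotes entity 1 and vice versa.
adjacency = [
    [0.0, 1.0],  # row 0: input to entity 0 from entity 1
    [1.0, 0.0],  # row 1: input to entity 1 from entity 0
]
degradation = [1.0, 1.0]

print(derivatives([2.0, 2.0], adjacency, degradation))  # balanced production and loss
print(derivatives([1.0, 0.0], adjacency, degradation))  # entity 0 decays, entity 1 is produced
```

      Each arrow in the diagram contributes one nonzero entry to `adjacency`, so the diagram and the list of loss rates together specify the whole system under this linear assumption.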

      The implementation of negative regulation is manifestly unphysical if the "entities" represent the expression level of, say, gene products. For instance, in regulatory network E, the value of the variable z can go negative (for instance, if the system starts with z= and y=0, and x > 0).

      Negative values for any entity were avoided in simulations by explicitly setting all such values to zero. This constraint has been added as a note in the section describing the equations for the change of each node/entity in each regulatory network. Specifically, the level of each entity/sensor was set to zero during any time step when the computed value for that entity/sensor was less than zero. This bounding of the function allows for any approach to zero while avoiding negative values. I apologize for the omission of this constraint from the supplemental material in the last submission. This constraint was used in all the simulations and therefore this change does not affect any of the results presented. In this way, it is ensured that the presence of negative regulation does not lead to negative values.
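
      The bounding described above can be sketched as follows. This is a minimal illustration that assumes a simple Euler update; the function name `euler_step_clamped`, the levels, and the rates are hypothetical, and the actual simulation code may differ.

```python
def euler_step_clamped(levels, rates, dt):
    """Advance each level by rate * dt, then set any negative result to zero."""
    return [max(0.0, level + rate * dt) for level, rate in zip(levels, rates)]


levels = [0.05, 1.0]  # hypothetical levels of two entities
rates = [-1.0, 0.5]   # the first entity is strongly inhibited in this step
levels = euler_step_clamped(levels, rates, dt=0.1)
print(levels)
```

      Without the clamp, the first entity would be assigned a negative level in this step; with it, the level reaches zero and stays non-negative.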

      Formally, the promotion or inhibition of an entity or sensor can be modeled using any function that is either increasing (for promotion) or decreasing (for inhibition). This diversity of possibilities is one of the challenges that prevents exhaustive exploration of all functions. In fact, the use of ODEs after assuming a continuous function is an idealization that facilitates understanding of general principles but is not in keeping with the discreteness of entities or step changes in their values (amount, localization, etc.) observed in living systems. Other commonly used continuous functions include Hill functions for the rate of production of y, given as $x^n/(k + x^n)$ for x activating y, which increases to ~1 as x increases, or as $k/(k + x^n)$ for x inhibiting y, which decreases to ~0 as x increases. Increasing values of ‘n’ result in steeper sigmoidal curves. In reality, levels of all entities/sensors are expected to be discretized by measurement in living systems and the form of the function for any regulation needs empirical measurement in vivo (see response to comment below).
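
      For readers who want to see the shape of these Hill functions, a minimal sketch follows. The function names and the values of k and n are illustrative assumptions; these functions were mentioned only as an alternative and are not the ones used in the simulations.

```python
def hill_activation(x, k, n):
    """Production rate of y when x activates y: x^n / (k + x^n)."""
    return x**n / (k + x**n)


def hill_inhibition(x, k, n):
    """Production rate of y when x inhibits y: k / (k + x^n)."""
    return k / (k + x**n)


for x in (0.0, 0.5, 1.0, 2.0, 100.0):
    print(
        f"x={x:6.1f}  "
        f"activation(n=1)={hill_activation(x, 1.0, 1):.3f}  "
        f"activation(n=4)={hill_activation(x, 1.0, 4):.3f}  "
        f"inhibition(n=4)={hill_inhibition(x, 1.0, 4):.3f}"
    )
```

      With k = 1, the n = 4 curve is below the n = 1 curve for x < 1 and above it for x > 1, which is the steeper sigmoidal behavior noted above.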

      The model seems to suddenly change from Figure 4 onwards. While the results presented here have at least some attempt at classification or statistical rigour (i.e. Fig 4 D), there are suddenly three values associated with each entity ("property step, active fraction, and number"). Furthermore, the system suddenly appears to be stochastic. The reader is left unsure of what has happened, especially after having made the effort to deduce the model as it was in Figs 1 through 3. No respite is to be found in the SI, either, where this new stochastic model should have been described in sufficient detail to allow one to reproduce the simulation.

      While ODEs are easier to simulate and understand, they are less realistic as explained above. I have now added more explanation justifying the need for the subsequent simulation of Entity-Sensor-Property systems. I have also expanded the information provided for each aspect of the model (previously outlined in Fig. 4A and detailed within the code) in a Supplementary Information section titled ‘Simulation of simple ESP systems’.

      Perturbations

      Inspired especially by experimental manipulations such as RNAi or mutagenesis, the author studies whether such perturbations can lead to a heritable change in network output. While this is naturally the case for permanent changes (such as mutagenesis), the author gives convincing examples of cases in which transient perturbations lead to heritable changes. Presumably, this is due to the underlying multistability of many networks, in which a perturbation can pop the system from one attractor to another.

      Unfortunately, there appears to be no attempt at a systematic study of outcomes, nor a classification of when a particular behaviour is to be expected. Instead, there is a long and difficult-to-read description of numerical results that appear to have been sampled at random (in terms of both the architecture and parameter regime chosen). The main result here appears to be that "genetic" (permanent) and "epigenetic" (transient) perturbations can differ from each other -- and that architectures that share a response to genetic perturbation need not behave the same under an epigenetic one. This is neither surprising (in which case even illustrative evidence would have sufficed) nor is it explored with statistical or combinatorial rigour (e.g. how easy is it to mistake one architecture for another? What fraction share a response to a particular perturbation?)

      The systematic study of all arbitrary regulatory architectures is beyond the scope of this paper and, as stated earlier, beyond the scope of any one paper. Nevertheless, 225,000 arbitrary Entity-Sensor-Property systems were systematically explored, and collections of parameters that lead to particular behaviors are provided (e.g., 78,285 are heritable). These ESP systems more closely mimic regulation in living systems than the coupled ODE-based specification of change in a regulatory architecture.

      The example questions raised here are not only difficult to answer, but subjective and present a moving target for future studies. One, ‘how easy is it to mistake one architecture for another?’. Mistaking one architecture for another clearly depends on the number of different types of experiments one can perform on an architecture and the resolution with which changes in entities can be measured to find distinguishing features. Two, ‘What fraction share a response to a particular perturbation?’. ‘Sharing a response’ also depends on the resolution of the measurement of entities after perturbation.

      As an additional comment, many of the results here are presented as depending on the topology of the network. However, each network is specified by many kinetic constants, and there is no attempt to consider the robustness of results to changes in parameters.

      The interpretations presented are conservative determinations of heritability based on the topology of the architecture. In other words, they identify architectures that can be heritable for some set of parameters. Of course, parameter sets can be found that make any regulatory architecture not heritable. As stated earlier, exploring all parameters for even one architecture is beyond the scope of a single study because of the infinitely many ways that the interaction between any two entities can be specified.

      DNA analogy

      At two points, the author makes a comparison between genetic information (i.e. DNA) and epigenetic information as determined by these heritable regulatory architectures. The two claims the author makes are that (i) heritable architectures are capable of transmitting "more heritable information" than genetic sequences, and (ii) that, unlike DNA, the connectivity (in the sense of mutations) between heritable architectures is sparse and uneven (i.e. some architectures are better connected than others).

      In both cases, the claim is somewhat tenuous -- in essence, it seems an unfair comparison to consider the basic epigenetic unit to be an "entity" (e.g., an entire transcription factor gene product, or an organelle), while the basic genetic unit is taken to be a single base-pair. The situation is somewhat different if the relevant comparison was the typical size of a gene (e.g., 1 kb).

      Considering every base being the unit of stored information in the DNA sequence results in the maximal possible storage capacity of a genome of given length. Any other equivalence between entity and units within the genome (e.g., 1 kb gene) will only reduce the information stored in the genome.

      Nevertheless, the claim has been modified to say that the information content of an ESP system can [italics added] be more extensive than the information content of the genome. This accounts for the possibility of an organism that has an inordinately large genome such that maximal information that can be stored in a particular genome sequence exceeds that stored in a particular configuration of all the contents in a cell.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript uses an interesting abstraction of epigenetic inheritance systems as partially stable states in biological networks. This follows on previous review/commentary articles by the author. Most of the molecular epigenetic inheritance literature in multicellular organisms implies some kind of templating or copying mechanisms (DNA or histone methylation, small RNA amplification) and does not focus on stability from a systems biology perspective. By contrast, theoretical and experimental work on the stability of biological networks has focused on unicellular systems (bacteria), and neglects development. The larger part of the present manuscript (Figures 1-4) deals with such networks that could exist in bacteria. The author classifies and simulates networks of interacting entities, and (unsurprisingly) concludes that positive feedback is important for stability. This part is an interesting exercise but would need to be assessed by another reviewer for comprehensiveness and for originality in the systems biology literature. There is much literature on "epigenetic" memory in networks, with several stable states and I do not see here anything strikingly new.

      The key utility of the initial part of the paper is the exhaustive enumeration of all small heritable regulatory architectures. The implication for the abundance of ‘network motifs’, and more generally for any part of a network proposed to perform a particular function, is that all such parts need to be compatible with heredity. This principle is generally not followed in the literature, resulting in incomplete networks being interpreted as having motifs or modules with autonomous function. Therefore, while the need for positive feedback for stability is indeed obvious, it is not consistently applied by all. For example, the famous synthetic circuit ‘the repressilator’ (Elowitz and Leibler, “A synthetic oscillatory network of transcriptional regulators”, Nature, 2000), which is presented as an example of ‘rational network design’, has three transcription factors that each inhibit the production of the next in turn, forming a feedback loop of inhibitory interactions. Therefore, the contributions of the factors that promote the expression of each entity are unknown and yet essential for heritability. The comprehensive listing of the heritable regulatory architectures that are simple provides the basis for true synthetic biology where the contributing factors for observed behavior of the network are explicitly considered only after constraining for heredity. Using this principle, the minimal autonomous architecture that can implement the repressilator is the HRA ‘Z’ (Fig. 1).

      An interesting part is then to discuss such networks in the framework of a multicellular organism rather than dividing unicellular organisms, and Figure 5 includes development in the picture. Finally, Figure 6 makes a model of the feedback loops in small RNA inheritance in C. elegans to explain differences in the length of inheritance of silencing in different contexts and for different genes and their sensitivity to perturbations. The proposed model for the memory length is distinct from a previously published model by Karin et al. (ref 49).

      I thank the reviewer for appreciating this aspect of the paper.

      Strengths:

      A key strength of the manuscript is to reflect on conditions for epigenetic inheritance and its variable duration from the perspective of network stability.

      I thank the reviewer for appreciating the importance of the overall topic.

      Weaknesses:

      • I found confusing the distinction between the architecture of the network and the state in which it is. Many network components (proteins and RNAs) are coded in the genome, so a node may not disappear forever.

      I have added language to clarify the many states of a network versus its architecture (also illustrated in Fig. 4 for ESP systems). Even loss of expression below a threshold can lead to permanent loss if there is not sufficient noise to induce re-expression. For example, consider the simple case of a transcription factor that binds to its own promoter, requiring 10 molecules for the activation of the promoter and thus production of more of the same transcription factor. If an epigenetic change (e.g., RNA interference) reduces the levels to fewer than 10 molecules and if the noise in the system never results in the numbers of the transcription factor increasing beyond 10, the transcription factor has been effectively lost permanently. In this way, reduction of a regulator can lead to permanent change despite the presence of the DNA. Many papers in the field of RNA silencing in C. elegans have provided strong experimental evidence to support this assertion.
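
      The threshold example above can be sketched as a toy discrete simulation. It is an illustration of the argument only, not the ESP simulation from the paper: the update rule, the production and loss amounts, and the noise model are all assumptions.

```python
import random


def simulate_tf(n0, threshold=10, production=5, noise=0, steps=100, seed=0):
    """Toy self-activating transcription factor (TF).

    Each step: `production` new molecules are made only if at least
    `threshold` molecules are present; a quarter of the molecules
    (at least one, if any exist) are lost; a random integer between
    -noise and +noise is added. All rules here are hypothetical.
    """
    rng = random.Random(seed)
    n = n0
    for _ in range(steps):
        produced = production if n >= threshold else 0
        lost = max(1, n // 4) if n > 0 else 0
        jitter = rng.randint(-noise, noise) if noise else 0
        n = max(0, n + produced - lost + jitter)
    return n


if __name__ == "__main__":
    print(simulate_tf(20))  # starts above the threshold
    print(simulate_tf(8))   # transiently reduced below the threshold
```

      Without noise, a start at or above the threshold settles at a stable level, while a start below the threshold decays to zero and never recovers, even though the capacity to produce the factor (the ‘DNA’ in this analogy) is unchanged. Raising `noise` makes re-expression possible, which is the condition stated in the text.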

      • From the Supplementary methods, the relationship between two nodes seems to be all in the form of dx/dt = Kxy . Y, which is just one way to model biological reactions. The generality of the results on network architectures that are heritable and robust/sensitive to change is unclear. Other interactions can have sigmoidal effects, for example. Is there no systems biology study that has addressed (meta)stability of networks before in a more general manner?

      Indeed, the relationship between any two entities can in principle be modeled using any function. Extensive exploration of the behavior of any regulatory architecture, even the simplest ones, requires simplifications. For example, early work by Stuart Kauffman explored Boolean networks (see ref. 10 in the paper for history and extensive explanations). However, allowing all possible ways of specifying the interactions between components of a network makes analysis both a computational and conceptual challenge.

      • Why is auto-regulation neglected? As this is a clear cause of metastable states that can be inherited, I was surprised not to find this among the networks.

      Auto-regulation in the sense of some molecule/entity ultimately leading to the production of more of itself is present in every heritable regulatory architecture. Specifically, all auto-regulatory loops rely on a sequence of interactions between two or more kinds of molecules. For example, a transcription factor (TF) binding to the promoter of its own gene sequence, resulting in the production of more TF protein is a positive feedback loop that relies on many interacting factors (transcription, translation, nuclear import, etc.) and can be considered as ‘auto-regulation’ as it is sometimes referred to in the literature. In this sense, every HRA (A through Z) includes ‘auto-regulation’ or more appropriately positive feedback loops. For example, in the HRA ‘A’, x ‘auto-regulates’ itself via y.

      • I did not understand the point of using the term "entity-sensor-property". Are they the same networks as above, now simulated in a computer environment step by step (thus allowing delays)?

      Please see response to the other reviewer regarding the need for the Entity-Sensor-Property framework and how it is distinct from generic networks. Briefly, the ODE-based simple networks, while easy to analyze, are not realistic because of the assumptions of continuity. In contrast, ESP systems are more realistic, with measurement discretizing changes in property values as is expected in real living systems.

      • The final part applies the network modeling framework from above to small RNA inheritance in C. elegans. Given the positive feedback, what requires explanation is how fast the system STOPs small RNA inheritance. A previous model (Karin et al., ref. 49) builds on the fact that factors involved in inheritance are in finite quantity hence the different small RNAs "compete" for amplification and those targeting a given gene may eventually become extinct.

      The present model relies on a simple positive feedback that in principle can be modulated, and this modulation remains outside the model. A possibility is to add negative regulation by factors such as HERI-1, that are known to limit the duration of the silencing.

      The duration of silencing differs between genes. To explain this, the author introduces again outside the model the possibility of piRNAs acting on the mRNA, which may provide a difference in the stability of the system for different transcripts. At the end, I do not understand the point of modeling the positive feedback.

      The previous model (Karin et al., Cell Systems, 2023) can describe populations of genes that are undergoing RNA silencing but cannot explain the dynamics of silencing particular genes. Furthermore, this model also cannot explain cases of effectively permanent silencing of genes that have been reported (e.g., Devanapally et al., Nature Communications, 2021 and Shukla et al., Current Biology, 2021). Finally, the observations of susceptibility to, recovery from, and even resistance to trans silencing (e.g., Fig. 5a in Devanapally et al., Nature Communications, 2021) require an explanation that includes modulation of the HRDE-1-dependent positive feedback loop that maintains silencing across generations.

      The specific qualitative predictions regarding the relationship between piRNA-mediated regulation genome-wide and HRDE-1-dependent silencing of a particular gene across generations could guide the discovery of potential regulators of heritable RNA silencing. The equations (4) and (5) in the paper for the extent of modulation needed for heritable epigenetic change provide specific quantitative predictions that can be tested experimentally in the future. I have also revised the title of the section to read ‘Tuning of positive feedback loops acting across generations can explain the dynamics of heritable RNA silencing in C. elegans’ to emphasize the above points.

      • From the initial analysis of abstract networks that do not rely on templating, I expected a discussion of possible examples from non-templated systems and was a little surprised by the end of the manuscript on small RNAs.

      The heritability of any entity relies on regulatory interactions regardless of whether a templated mechanism is also used or not. For example, DNA replication relies on the interactions between numerous regulators, with only the sequence being determined by the template DNA. The field of small RNA-mediated silencing facilitates analysis of epigenetic changes at single-gene resolution (Chey and Jose, Trends in Genetics, 2022). It is therefore likely to continue to provide insights into heritable epigenetic changes and how they can be modulated. Unfortunately, there are currently no known cases of epigenetic inheritance where the role of any templated mechanism has been conclusively excluded. Future research will improve our understanding of epigenetic states and their modulation in terms of changes in positive feedback loops as proposed in this study and potentially lead to the discovery of such mechanisms that act entirely independent of any template-dependent entity.

      Recommendations for the authors:

      I thank the reviewers for their specific suggestions to improve the paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper has many long paragraphs that attempt to explain results, make illustrations, and give intuition. Unfortunately, these are difficult to read. It would aid the reader greatly if these were, say, converted into cartoons (even if only in the SI), or made more accessible in some other way.

      I agree with the importance of making the material accessible to readers in multiple ways. I have now added a figure with schematics in the SI titled ‘Illustrations of key concepts’ (new Fig. S2), which collects concepts that are relevant throughout the paper and might aid some readers.

      The bulk of the supplementary is currently a collection of elementary mathematics results: to wit, pages 26 to 33 of the combined manuscript carry no more information than a quick description of the general model and the diagrams in Fig 1. Similarly, pages 34 to 39 (non-zero dilution rate), and pages 39 through 58 (response to permanent changes) each express a trivial mathematical point that is more than sufficiently made with one illustrative example.

      I agree with the reviewer and have condensed these pages as suggested. I have added a pointer to the earlier version as containing further details for the readers who might prefer the explicit listing of these equations.

      Overall, the paper appears to be a collection of numerical results obtained from different models, united by uncertain terminology that is not fully defined in this paper. The most promising aspects of the paper lie either in (a) combinatorially complete enumeration of all regulatory architectures, or (b) relating experimental manipulations in C. elegans to possible underlying regulatory architectures. Focusing on one or the other might improve the readability of the paper.

      The two sections of the paper are complementary and when presented together help with the integration of concepts rather than the siloed pursuit of theory versus experimental analysis. When this work was presented at meetings before submission, it was clear that different researchers appreciated different aspects. This divergence is also apparent in the two reviews, with each reviewer appreciating different aspects. I have repeated the definitions and justifications from the earlier paper (Jose, J R Soc Interface, 2020) to provide a more fluid transition between the two complementary sections of the paper. Knowing both sides could aid in the development of models that are not only consistent with measurable quantities (e.g., anything that can be considered an entity) but are also logically constrained (e.g., entities matched with sensors while avoiding any entities that do not have a source of production – i.e., avoiding nodes with indegree = 0).

      However, having said that many results of these types are well-known in models of regulatory networks, and it is unclear what precisely warrants the new framework that the author is proposing. Indeed, it would be good to understand in what way the framework here is novel, and how it is distinguished from prior studies of regulatory networks.

      The key novelty of the work is the consideration of heritability for any regulation. With the explicit definition of the heritability for a regulatory architecture and the acknowledgement that there can be more than one notion of heredity, this paper now sets the foundation for examining many real networks in this light. I hope that the added justifications for the current framework in the revised paper strengthen these arguments. Future literature reviews on networks in general and how they address heritability or persistence will better define the prevalence of these considerations. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this work will raise awareness in both communities and lead to more constrained models that acknowledge incomplete knowledge of the network, which is always the case when analyzing living systems.

      Reviewer #2 (Recommendations For The Authors):

      Minor points/clarity

      • page 1 line 57: "transgenerational waveforms that preserve form and function" is unclear.

      This phrase was expanded upon in a previous paper (Jose, BioEssays, 2020). I have now added more explanation in this paper for completeness. The section now reads ‘For example, the localization and activity of many kinds of molecules are recreated in successive generations during comparable stages [1-3]. These recurring patterns can change throughout development such that following the levels and/or localizations of each kind of molecule over time traces waveforms that return in phase with the similarity of form and function across generations [2].’

      • page 7 line 3-6: the sentence has an ambiguous structure.

      I have now edited this long sentence to read as follows: ‘For systematic analysis, architectures that could persist for ~50 generations without even a transient loss of any entity/sensor were considered HRAs. Each HRA was perturbed (loss-of-function or gain-of-function) after five different time intervals since the start of the simulation (i.e., phases). The response of each HRA to such perturbations were compared with that of the unperturbed HRA.’

      • page 9 lines 25-27: the sentence is convoluted: are you defining epigenetic inheritance?

      I have simplified this sentence describing prior work by others (Karin et al., Cell Systems, 2023) and moved a clause to the subsequent sentence. This section now reads: ‘Recent considerations of competition for regulatory resources in populations of genes that are being silenced suggest explanations for some observations on RNA silencing in C. elegans [49]. Specifically, based on Little’s law of queueing, with a pool of M genes silenced for an average duration of T, new silenced genes arise at a rate λ that is given by M = λT’. I have also provided more context by preceding this section with: ‘Although the release of shared regulators upon loss of piRNA-mediated regulation in animals lacking PRG-1 could be adequate to explain enhanced HRDE-1-dependent transgenerational silencing initiated by dsRNA in prg-1(-) animals, such a competition model alone cannot explain the observed alternatives of susceptibility, recovery and resistance (Fig. 6A).’
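
      Little’s law relates the steady-state pool size M to the arrival rate (written λ here) and the mean duration T as M = λT. A minimal sketch that checks this relation by direct counting is given below; the function name and the numbers of genes and generations are assumptions made for illustration and do not come from Karin et al.

```python
def steady_state_pool(arrivals_per_generation, duration, generations):
    """Count silenced genes after a number of generations.

    Every generation, `arrivals_per_generation` genes become newly silenced
    and each stays silenced for exactly `duration` generations.
    """
    remaining = []  # generations of silencing left for each silenced gene
    for _ in range(generations):
        # Genes in their last generation of silencing drop out of the pool.
        remaining = [r - 1 for r in remaining if r > 1]
        # Newly silenced genes join the pool.
        remaining.extend([duration] * arrivals_per_generation)
    return len(remaining)


if __name__ == "__main__":
    lam, T = 5, 3
    print(steady_state_pool(lam, T, generations=20), lam * T)
```

      Once at least T generations have elapsed, the counted pool equals λT, so a measured pool size and mean duration together fix the rate at which newly silenced genes arise.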

      • page 13 lines 51-53. This last sentence of the discussion is ambiguous/unclear.

      I have now rephrased this sentence to read: ‘This pathway for increasing complexity through interactions since before the origin of life suggests that when making synthetic life, any form of high-density information storage that interacts with heritable regulatory architectures can act as the ‘genome’ analogous to DNA.’

      • Figure 2: the letters in the nodes are hard to read; the difference between full and dotted lines in the graphs also.

      I have enlarged the nodes and widened the gap in the dotted lines to make them clearer. I have also similarly edited Fig. 1 and Fig. S3 to Fig. S9.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a noncanonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multihierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

      Public Reviews:

      Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious than that of ASCT2.

      We greatly value the reviewer's efforts and feedback in reviewing our manuscript. We acknowledge the reviewer's observation that the changes indicated by our proteomic results are not markedly pronounced. To reinforce our findings, we have incorporated an analysis of gene alterations at the single-cell level (snRNA-seq) from the publicly accessible IRI mouse model data (Figure supplement 7). The snRNA-seq data align with our proteomic data in terms of the general trend of gene/protein alterations, but reveal more substantial changes in both ASCT2 and SMCTs. These discrepancies might stem from the different quantification methods used, suggesting a possible underestimation in our label-free proteomic quantification. The differences we see between the functional changes in transporters and their quantification in proteomics can be explained by the unique challenges posed by membrane proteins. Post-translational modifications and the complex nature of multiple transmembrane domains often impact the accurate measurement of these proteins in proteomic studies. This complexity can lead to a mismatch between the actual functional changes occurring in the transporters and their perceived abundance or alterations as detected by proteomic methods (Figure 4A) (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). However, this label-free quantitative proteomics approach is well-suited for our study, given its screening efficiency, compatibility with animal models, and the absence of a labeling requirement. We may consider incorporating alternative quantitative proteomic methods in future for a more thorough comparison. We have included these considerations in lines 351-356 of the revised manuscript.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      Regarding the roles of ASCT2 and SMCTs in renal D-serine transport, snRNA-seq showed that ASCT2 expression in the controls is less than 10% of the cell population. We suggest that ASCT2 contributes to D-serine reabsorption because of its high affinity and SMCTs (SMCT1 and SMCT2) would play a role in D-serine reabsorption in the cells without ASCT2 expression. In addition, we included other factors (the turnover rate and the presence of local canonical substrates) that may determine the capability of D-serine reabsorption. We have included this suggestion in the Discussion lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
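
      To make the affinity difference concrete, the sketch below computes the fraction of maximal transport rate for each transporter, assuming simple Michaelis-Menten kinetics (v/Vmax = S/(Km + S)). Only the two Km values come from the text above; the Michaelis-Menten assumption, the function name, and the substrate concentrations are illustrative, and the result says nothing about absolute flux because Vmax and expression levels are not included.

```python
KM_ASCT2_UM = 167.0    # Km of ASCT2 for D-serine, in µM (from the text)
KM_SMCT1_UM = 3390.0   # Km of SMCT1 for D-serine, 3.39 mM expressed in µM


def fraction_of_vmax(substrate_um, km_um):
    """Michaelis-Menten fractional rate v/Vmax = S / (Km + S)."""
    return substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    # Hypothetical D-serine concentrations in µM, chosen only for illustration.
    for s in (10.0, 167.0, 3390.0):
        asct2 = fraction_of_vmax(s, KM_ASCT2_UM)
        smct1 = fraction_of_vmax(s, KM_SMCT1_UM)
        print(f"S = {s:7.1f} µM  ASCT2: {asct2:.3f}  SMCT1: {smct1:.3f}")
```

      At a substrate concentration equal to the Km of ASCT2, ASCT2 would run at half of its maximal rate while SMCT1 would run at roughly 5% of its own, which is why abundance and turnover rate matter when weighing the contribution of the low-affinity transporters.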

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1, and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

      Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The experimental approach and the identification of D-serine as a new substrate for SMCT1 merit publication in eLife.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).

      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      We sincerely appreciate the reviewer’s comments on the manuscript. Several factors contribute to the alteration of D-/L-serine ratios, including protein expression levels at both the apical and basolateral sides, properties of the transporters (e.g. transport affinities, substrate stereoselectivities), and the expression of DAAO (D-amino acid oxidase), which selectively degrades D-amino acids. Moreover, the mechanism becomes more complicated when the transport systems of the L- and D-enantiomers are different and have distinct stereoselectivities, as in the case of serine. Future studies are required to fully elucidate the mechanism. Here, we explore the mechanism based on current knowledge.

      From this study, we identified ASCT2 and SMCTs (SMCT1 and SMCT2) as D-serine transport systems. We showed that SMCT1 prefers D-serine. Although we did not analyze ASCT2 stereoselectivity, previous studies indicate that ASCT2 recognizes both D- and L-serine with high affinities and slightly prefers the L-enantiomer (Km of 18.4 µM for L-serine in an oocyte expression system (Utsunomiya-Tate et al. J Biol Chem 1996) and 167 µM for D-serine in an oocyte expression system (Foster et al. PLOS ONE 2016); IC50 of 0.7 mM for L-serine and 4.9 mM for D-serine in HEK293 expression systems (Foster et al. PLOS ONE 2016)). The proteomics showed an increase of ASCT2 (1.6-fold increase) and a decrease of SMCTs (1.7-fold decrease in SMCT1, and 1.3-fold decrease in SMCT2) in IRI conditions. The table below summarizes D-serine transport by ASCT2 and SMCTs.

      In the case of L-serine, ASCT2 and B0ATs (in particular B0AT3) have been revealed as L-serine transport systems in the kidneys (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Proteomics showed that B0ATs have higher expression levels than ASCT2, supporting the idea that B0ATs are the main L-serine transport system (Table S1: Abundance of B0AT1 = 1.34E+09, B0AT3 = 2.13E+08, ASCT2 = 1.46E+07). In IRI conditions, B0AT3 decreased 1.8-fold and B0AT1 decreased 1.1-fold. Based on these results, we included the contribution of B0ATs to L-serine transport in Author response table 1.

      Author response table 1.

      Taken together, we suggest that high ratios of D-/L-serine in IRI conditions are a combined result of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction and 2) decrease of L-serine reabsorption by B0ATs. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomic analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”
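
      The B0AT3 change is reported above in two forms: as IRI/sham ratios (0.56 and 0.65) and as a fold decrease (1.8-fold). The following minimal Python sketch shows how the two relate, assuming that a fold decrease is simply the reciprocal of the IRI/sham ratio. The function name is hypothetical and introduced only for illustration; it is not part of the authors' analysis.

```python
def fold_decrease(iri_over_sham):
    """Convert an IRI/sham abundance ratio (0 < ratio <= 1) to a fold decrease.

    Assumption for illustration: fold decrease = 1 / (IRI/sham ratio).
    """
    if not 0 < iri_over_sham <= 1:
        raise ValueError("expected a ratio in (0, 1] for a decrease")
    return 1.0 / iri_over_sham


# B0AT3 IRI/sham ratios as quoted from Table S1 in the text above.
b0at3_ratios = {"4h": 0.56, "8h": 0.65}

for timepoint, ratio in b0at3_ratios.items():
    print(f"B0AT3 {timepoint}: IRI/sham = {ratio:.2f}, "
          f"fold decrease = {fold_decrease(ratio):.1f}")
```

      Under this convention, the 4h ratio of 0.56 corresponds to the 1.8-fold decrease cited in the response.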

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a thorough study that was reviewed previously under the old system. I think the authors have strengthened their findings and have no further suggestions.

      We thank Reviewer 1 for their effort and comments, which greatly contributed to improving this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The experiments seem to me to have been well performed and the data are readily available.

      Weaknesses:

      More than weakness I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.

      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.

      Following the reviewer’s comment, we have included the explanation of the micromolecule and membrane transport proteins in lines 41-43.

      Manuscript lines 41-43

      “Membrane transport proteins function to transport micromolecules such as nutrients, ions, and metabolites across membranes, thereby playing a pivotal role in the regulation of micromolecular homeostasis.”

      (2) In line 91, I suggest specifying that this is a renal IRI model.

      Following the reviewer’s comment, we have added the information that it is a renal IRI model of AKI (lines 90-92).

      Manuscript lines 90-92

      “We applied 2D-HPLC to quantify the plasma and urinary enantiomers of 20 amino acids of renal ischemia-reperfusion injury (IRI) mice, a model of AKI and AKI-to-CKD transition (Sasabe et al., 2014; Fu et al., 2018).”

      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?

      To our knowledge, there is no report of ASCT2 expression in other nephron segments. Our immunofluorescent data of the ASCT2 staining in the whole kidney at the low magnification and another region of Figure 3 (below) as well as immunohistochemistry from Human Protein Atlas (update: Jun 9th, 2023) did not show a strong signal of ASCT2 expression in other regions besides the proximal tubules. Thus, we conclude that ASCT2 is mainly expressed in proximal tubules, but not in other nephron regions.

      Author response image 1.

      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?

      This experiment was done by expressing the candidate genes in the presence of endogenous ASCT2. We have added the information in lines 225-227 to emphasize this process.

      Manuscript lines 225-227

      “Based on this finding, we utilized cell growth determination assay as the screening method even in the presence of endogenous ASCT2 expression. HEK293 cells were transfected with human candidate genes without ASCT2 knockdown.”

      (5) Lines 254-255: why was D-serine transport enhanced by ASCT2 knockdown in FlpInTRSMCT1 or 2 cells?

      We thank the reviewer for pointing out these data, and we apologize for the confusion in the text. The total amount of D-serine uptake in the cells was not enhanced, but the net uptake (uptake minus background) increased. This enhancement results from the lower background caused by ASCT2 knockdown. We have revised the text and explained this result in more detail (lines 256-258).

      Manuscript lines 256-258

      “In the cells with ASCT2 knockdown, the background level was lower, thereby enhancing the D-[3H]serine transport contributed by both SMCT1 and SMCT2 (the net uptake after subtracted with background) (Figure 5C).”

      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.

      We acknowledge the reviewer’s concern about the low affinity of SMCT1. However, a Km in the mM range is widely accepted for several low-affinity amino acid transporters such as the proton-coupled amino acid transporter PAT1 (Km = 2 – 5 mM; Miyauchi et al. Biochem J 2010), the cationic amino acid transporter CAT2A (Km = 3 – 4 mM; Closs et al. Biochem 1997), and the large-neutral amino acid transporter LAT4 (Km = 17 mM; Bodoy et al. J Biol Chem 2005). In the kidneys, many compounds are well known to be reabsorbed by low-affinity but high-capacity (high-expression) transporters. Similarly, D-serine was reported to be reabsorbed by a low-affinity transporter (Kragh-Hansen and Sheikh, J Physiol 1984; Shimomura et al. BBA 1988; Silbernagl et al. Am J Physiol Renal Physiol 1999). Moreover, the amino acid profile showed urinary D-serine in the range of 100 – 200 µM (Figure supplement 2). This concentration range could drive SMCT1 function (Figure 5). Combined with the high and ubiquitous expression of SMCT1, we propose that SMCT1 is a low-affinity but high-capacity D-serine transporter in the kidneys.

      snRNA-seq is a method that can directly compare the expression levels between different genes within the same cells. From Figure supplement 7, expression of SMCT1 is much more abundant than that of ASCT2. ASCT2 was present in less than 10% of the cell population. It is possible that the 90% of cells that do not express ASCT2 use SMCT1 to reabsorb D-serine.

      We have revised the Discussion regarding this comment (lines 386-404).

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
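
      The affinity-versus-abundance argument in this response can be illustrated with a rough Michaelis-Menten calculation. The Python sketch below uses only the Km values, sham chromatogram intensities, and urinary D-serine range (100 – 200 µM) quoted above. It rests on two strong assumptions that are not claims of the study: that both transporters have equal turnover rates, and that mass spectrometry intensity is proportional to the amount of transporter (the quoted text itself notes that such cross-protein comparisons are not precise). The function names are hypothetical.

```python
# Illustrative sketch only. Assumptions (not results of the study):
#   1) equal turnover rate (kcat) for both transporters;
#   2) MS chromatogram intensity is proportional to transporter amount.

KM_UM = {"ASCT2": 167.0, "SMCT1": 3390.0}        # Km for D-serine in µM
INTENSITY_AU = {"ASCT2": 1.5e7, "SMCT1": 2.9e9}  # sham intensities in AU


def fractional_saturation(substrate_um, km_um):
    """Michaelis-Menten term v/Vmax = [S] / (Km + [S])."""
    return substrate_um / (km_um + substrate_um)


def relative_capacity(transporter, substrate_um):
    """Abundance proxy times fractional saturation (arbitrary units)."""
    saturation = fractional_saturation(substrate_um, KM_UM[transporter])
    return INTENSITY_AU[transporter] * saturation


# Urinary D-serine range reported in the response (µM).
for s in (100.0, 200.0):
    for name in ("ASCT2", "SMCT1"):
        sat = fractional_saturation(s, KM_UM[name])
        cap = relative_capacity(name, s)
        print(f"[S] = {s:.0f} µM  {name}: saturation = {sat:.3f}, "
              f"abundance-weighted capacity = {cap:.2e} AU")
```

      Under these assumptions, SMCT1 is far less saturated than ASCT2 at 100 – 200 µM, yet its abundance-weighted capacity remains larger. This is only a qualitative illustration of the low-affinity, high-capacity argument, since real turnover rates and competing substrates would change the numbers.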

      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?

      We agree with the reviewer that transporters at the basolateral membrane are important to complete the D-serine reabsorption in the kidney, and have included this issue in the original manuscript. We stated that transport systems at the basolateral side are necessary to be analyzed in order to complete the picture of D-serine transport systems in the kidney (lines 481-483 of the revised manuscript). However, we did not have any strong candidates for basolateral D-serine transport systems. Because we analyzed the proteome of BBMV, which concentrates on the apical membrane proteins, the analysis did not detect several transporters at the basolateral side.

      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NH3. I suggest rewording this sentence.

      Thank you for your comment. The proximal tubule (PT) is the first and main region that maintains acid-base homeostasis in the kidney. In PT cells, NH3 secretes H+ to titrate luminal HCO3- and creates CO2, which is absorbed into PT cells and produces "new intracellular HCO3-", which is subsequently reabsorbed into the blood. Although ion fluxes in the PT serve to maintain pH homeostasis, pH regulation in both the lumen and the intracellular compartment of PT cells is highly dynamic. We agree with the reviewer and have accordingly revised the text to emphasize the pH around PT segments, rather than the final urine pH, leaving the discussion open to the possibility of PAT1 function in the PT of normal kidneys (lines 474-481).

      Manuscript lines 474-481

      “PAT1, a low-affinity proton-coupled amino acid transporter (Km in mM range), has been found at both sub-apical membranes of the S1 segment and inside of the epithelia (The Human Protein Atlas: https://www.proteinatlas.org; updated on Dec 7th, 2022) (Sagné et al., 2001; Vanslambrouck et al., 2010). PAT1 exhibits optimum function at pH 5 - 6 but very low activity at pH 7 (Miyauchi et al., 2005; Bröer, 2008b). Future research is required to address the significance of PAT1 on D-serine transport in the proximal tubule segments where pH regulation is known to be highly dynamic (Boron, 2006; Nakanishi et al., 2012; Bouchard and Mehta, 2022; Imenez Silva and Mohebbi, 2022).”

      Reviewer #3 (Recommendations For The Authors):

      The authors proposed that the increased expression of ASCT2, even together with the decreased expression of SMCT1/2, causes the increased renal reabsorption of D-serine that occurs in IRI. In the discussion, the main argument to sustain this hypothesis is the higher apparent affinity for D-serine of ASCT2 (<200 uM Km) versus SMCT1 (3.4 mM Km). In the Discussion section (page 18- 1st complete paragraph), the authors indicate that the Mass Spec intensities of SMCT1 and 2 are two and one order of magnitude higher respectively than that of ASCT2. This suggests that SMCT1 is clearly more expressed than ASCT2 in control conditions. IRI increments ASCT2 protein expression in brush-border membrane vesicle from kidney 1.6 folds and decreases that of SMCT1 0.6 folds. How this fold changes, even taking into account the lower Km of ASCT2 versus SMCT1 would explain the dramatic changes in the D-/L-serine ratios in plasma and urine in IRI? The authors might discuss whether other transport characteristics, even unknown (e.g., a higher turnover rate of ASCT2 vs SMCT1), would also contribute to the higher D-serine reabsorption in IRI.

      SMCT1 shows some enantiomer selectivity for D- vs L-serine. At 50 uM concentration the transport is almost double for D- vs L-serine, but is ASCT2 stereoselective between the two enantiomers of serine? Some of the authors of this manuscript showed in a previous paper that the basolateral transporter Asc1 also participates in the accumulation of D-serine in serum caused by renal tubular damage. (Serum D-serine accumulation after proximal renal tubular damage involves neutral amino acid transporter Asc-1. Suzuki M et al. Sci Rep. 2019 Nov 13;9(1):16705 (PMID: 31723194)). Asc1 shows no stereoselectivity between L- and D-serine. Can the authors discuss possible mechanisms resulting in increased renal reabsorption of D-serine than L-serine in IRI with the participation of transporters with modest stereoselectivity for D- vs L-serine?

      We appreciate the reviewer’s comments on the degree of protein alteration in proteomics, the functional contributions of ASCT2 and SMCTs, and the alteration of D/L ratios. We have included the possibilities of the technical concerns and the discussion on the roles of ASCT2 and SMCTs as follows.

      • Regarding the expression levels, proteomics and snRNA-seq showed the same tendency: ASCT2 increases and SMCTs decrease in IRI conditions. However, the degrees of alteration are more pronounced in snRNA-seq. This may be due to the difference in quantification methods and probably points to underestimated quantification of membrane transport proteins in label-free proteomics. The accuracy of protein quantification in label-free proteomics is often affected by the presence of post-translational modifications and multiple trans-membrane domains, as in the case of membrane transport proteins (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). Alternative methods of quantitative proteomics may be added in the future for a more thorough comparison. We have added this issue in lines 351-356 of the revised version.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      • For the functional contributions of ASCT2 and SMCTs in the kidney, we acknowledge the reviewer’s concern about the low affinity of SMCT1. Following the reviewer’s comment, we have included other factors besides transport affinities, e.g. expression levels and turnover rates of the transporters. From the results of both proteomics and snRNA-seq, ASCT2 expression is significantly lower than that of SMCTs in normal conditions. snRNA-seq showed that ASCT2 was present in less than 10% of the cell population (Figure supplement 7). We propose that most of the cells that do not express ASCT2 may use SMCT1 to reabsorb D-serine. This topic was included in the revised manuscript lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      • As for the dramatic alterations of D/L-serine ratios juxtaposed with minimal changes in ASCT2 and SMCTs expression levels, we cautiously refrain from drawing a definitive conclusion regarding the entire mechanism. This caution reflects the fact that a comprehensive elucidation of both L-serine and D-serine transport systems at both the apical and basolateral membranes is still lacking. Nevertheless, we would like to suggest a mechanism at the apical membrane based on current knowledge.

      For D-serine transport systems, we found ASCT2 and SMCTs contributions in this study. Meanwhile, L-serine transport was previously reported to be mediated mainly by the neutral amino acid transporters B0ATs (in particular B0AT3; Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Hence, the mechanism behind the alterations of D/L-serine ratios should include B0AT3 functions as well. In IRI conditions, B0AT3 decreased 1.8-fold. We suggest that high ratios of D-/L-serine in IRI conditions are a combined outcome of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction, and 2) decrease of L-serine reabsorption by B0AT3. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested the differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomics analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      • In the case of Asc-1, it was reported to be a D-serine transporter in the brain (Rosenberg et al. J Neurosci 2013). Suzuki et al. 2019 showed the increase of Asc-1 in cisplatin-induced tubular injury. Notably, the mRNA of Asc-1 is predominantly found in Henle’s loop, distal tubules, and collecting ducts but not in proximal tubules, and its protein expression level is dramatically low in the kidney (Human Protein Atlas: update on Jun 19, 2023). Furthermore, in this study, Asc-1 expression was not detected in the brush border membrane proteome. Consequently, we have decided not to include Asc-1 in the Discussion of this study, which primarily focuses on the proximal tubules.
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) More explanation/description of Fig 3C and 3D would be helpful for readers, including the color code of 3D and black lines shown in both panels.

      We have added more description to the legend of Figure 3, and we have used the same color code as in Figure 2, which we now specifically note in the figure legend as well.

      (2) Differences between cranial and trunk NCC could be experimentally shown or discussed. Fig 4C shows some differences between these two populations, but in situ, results using Dlc1/Sp5/Pak3 probes in the trunk region may be informative, like Fig 5 supplement 2 for cranial NCCs.

      This is an important point. The focus of our study was on cranial neural crest cells, and the single cell sequencing data is therefore truly reflective of only cranial neural crest cells. We have not functionally tested for the roles of Dlc1/Sp5/Pak3 in trunk neural crest cells, however, based on the expression and loss-of-function phenotypes of Sp5 or Pak3 knockout mice, we predict they individually may not play a significant role. It remains plausible that Dlc1 could play an important role in the delamination of trunk neural crest cells, but we have not tested that definitively. Nonetheless, Sabbir et al 2010 showed in a gene trap mouse mutant that Dlc1 is expressed in trunk neural crest cells. Regarding the similarities and differences between cranial and trunk neural crest cells as noted by the reviewer with respect to Figure 4, it’s important to recognize the temporal differences illustrated in Figure 4. Neural crest cell delamination proceeds in a progressive wave from anterior to posterior, and the analysis was designed to quantify cell cycle status before and during neural crest cell delamination. We have compared cranial and trunk neural crest cells in more detail in the discussion and also speculate what might happen in the trunk based on what we know from other species.

      (3) Discussion can be added about the potential functions of Dlc1 for NCC migration and/or differentiation based on available info from KO mice.

      We have added specific details regarding the published Dlc1 knockout mouse phenotype to the discussion, particularly with respect to the craniofacial anomalies which included frontonasal prominence and pharyngeal arch hyperplasia, and defects in neural tube closure and heart development. Although the study didn’t investigate the mechanisms underpinning the Dlc1 knockout phenotype, the craniofacial morphological anomalies would be consistent with a deficit in neural crest cell delamination reducing the number of migrating neural crest cells, as we observed in our Dlc1 knockdown experiments.

      Reviewer #2 (Recommendations For The Authors):

      The authors used the (Tg(Wnt1-cre)11Rth Tg(Wnt1-GAL4)11Rth/J) line but work from the Bush lab (see Lewis et al., 2013) has demonstrated fully penetrant abnormal phenotypes that affect the midbrain neuroepithelium, increased CyclinD1 expression and overt cell proliferation as measured by BrdU incorporation. The authors should explain why they used this mouse line instead of the Wnt1-Cre2 mice (129S4-Tg(Wnt1-cre)1Sor/J) in the Jackson Laboratory (which lacks the phenotypic effects of the original Wnt1-Cre line), or a "Cre-only" control, or at a minimum explain the steps they took to ensure there were no confounding effects on their study, especially since cell proliferation was a major outcome measure.

      This is an important point, and we thank the reviewer for raising it. Yes, it has been reported that the original Wnt1Cre mice exhibit a midbrain phenotype (Ace et al. 2013). However, it has also been noted that Wnt1Cre2 can exhibit recombination in the male germline leading to ubiquitous recombination (Dinsmore et al., 2022). Therefore, to avoid any potential for bias, we used an equal number of cells derived from the Wnt1 and F10N transgenic line embryos in our scRNA-seq, and this included multiple non-Cre embryos. Our scRNA-seq analysis was therefore not dependent upon Wnt1-Cre, and we also used whole heads rather than fluorescence-sorted cells. However, Wnt1-Cre lineage tracing was advantageous from a computational perspective to help define, in concert with Mef2c-lacZ, cells that were premigratory and migratory based on their expression of YFP, LacZ or both. We note these specifics more clearly in the methods.

      The Results section (line 122) states that scRNA-seq was performed on dissociated cranial tissues but the Methods section (lines 583-584) implies that whole E8.5 mouse embryos were dissociated. Which was dissociated, whole embryos or just cranial tissues? Obviously, the latter would be a better strategy to enrich for cranial neural crest, but the authors also examine the trunk neural crest. This should be clarified in the text.

      We apologize that some of the details regarding the tissue isolation were confusing and we have clarified this in the methods and the text. For the record, after isolating E8.5 embryos, we then dissected the head from those embryos, and performed scRNA-seq on dissociated cranial tissues. As the reviewer correctly noted, this approach strategically enriches for cranial neural crest cells.

      The authors do not justify why they chose a knockdown strategy, which has its limitations including its systemic injection into the amniotic cavity, its likely global and more variable effects, and its need to be conducted in culture. Why the authors did not instead use a Wnt1-Cre-mediated deletion of Dlc1, which would have been "cleaner" and more specific to the neural crest, is not clear (maybe so they could specifically target different Dlc1 isoforms?). Also, the authors use Sox10 as a marker to count neural crest cells, but Sox10 may only label a subset of neural crest cells and thus some unaffected lineages may not have been counted. The authors should mention what is known about the regulation of Dlc1 by Sox10 in the neural crest. Although the data are persuasive, a second marker for counting neural crest cells following knockdown would make the analysis more robust. Can the authors explain why they did not simply use the Mef2c-F10N-LacZ line and count LacZ-positive cells (if fluorescence signal was required for the quantification workflow, then could they have used an anti-beta Galactosidase antibody to label cells)?

      We thank the reviewer for raising these important considerations. It has previously been noted that although Wnt1-Cre is the gold standard for conditional deletion analyses in neural crest cell development, especially migration and differentiation, it is not a good tool for functional studies of the specification and delamination of neural crest cells due to the timing of Wnt1 expression and Cre activation and excision (see Barriga et al., 2015). Therefore, we chose a knockdown strategy instead, and also because it allows us to more rapidly evaluate gene function. We agree that there are limitations to the approach with respect to variability, however, this is outweighed by the ability to repeatedly perform the knockdown at multiple and more relevant temporal stages such as E7.5 (which is prior to the onset of Wnt1-Cre activity), as well as target different isoforms, and also treat large numbers of embryos for quantitative analyses. The advantage of using Sox10 as a marker for counting neural crest cells is that at the time of analysis, cranial neural crest cells are still migrating towards the frontonasal prominences and pharyngeal arches, and the overwhelming majority of these cells are Sox10 positive. Moreover, we can therefore assay every Dlc1 knockdown embryo for Sox10 expression and count the number of migrating neural crest cells. The limitation of using the Mef2c-F10N-LacZ line is that this transgenic line is maintained as a heterozygote, and thus only half the embryos in a litter could reasonably be expected to be lacZ+. But combining Sox10 and Mef2c-F10N-LacZ fluorescent immunostaining for similar analyses in the future is a great idea.

      Reviewer #3 (Recommendations For The Authors):

      The putative intermediate cells differentially express mRNAs for genes involved in cell adhesion, polarity, and protrusion relative to bona fide premigratory cells (Fig. 2E). This is persuasive evidence, but only differentially expressed genes are shown. Discussing those markers that have not yet changed, e.g. Cdh1 or Zo1 (?), would be instructive and help to clarify the order of events.

      We thank the reviewer for this suggestion and have provided more detail about adherens junctions and tight junctions. Cdh1 is not expressed, and although Myh9 and Myh10 are expressed, we did not detect any significant changes. ZO1 is a tight junction protein encoded by the gene Tjp1, which, along with other genes encoding tight junction proteins, is downregulated in intermediate NCCs, as shown in Figure 2E.

      It is unclear whether the two putative intermediate state clusters differ other than their stage of the cell cycle. Based on the trajectory analysis in Fig. 3C-D, the authors state that these two populations form simultaneously and independently but then merge into a single population. However, without further differential expression, it seems more plausible that they represent a single population that is temporarily bifurcated due to cell cycle asynchrony.

      We have addressed the cell cycle question in the discussion by noting that while it is possible the transition states represent a single population that is temporarily bifurcated due to cell cycle asynchrony, if this were true, then we should expect S phase inhibition to eliminate both transition state groups. Instead, our trajectory analyses suggest that the transition states are initially independent, and furthermore, S phase inhibition did not affect delamination of the other population of neural crest cells.

      The authors do not present an in-depth comparison of these neural crest intermediate states to previously reported cancer intermediate states. This analysis would reveal how similar the signatures are and thus how extrapolatable these and future findings in delaminating neural crest are to different types of cancer.

      We have also added more detail to the discussion to address the potential for similarities and differences in neural crest intermediate states compared to previously reported cancer intermediate states. The challenge, however, is that none of the cancer intermediate states have been characterized at a molecular level. Nonetheless, with the limited molecular markers available, we have not identified any similarities so far, but our datasets are now available for comparison with future cancer EMP datasets.

      The reduction in SOX10+ cells may be in part or wholly attributable to inhibition of proliferation AFTER delamination. Showing that there are premigratory NCCs in G2/M at ~E8.0 would bolster the argument that this population is present from the earliest stages.

      The presence of premigratory neural crest cells in G2/M is shown by the scRNA-seq data and cell cycle staining data in the neural plate border.

      Lines 248-249: The pseudo-time analysis in Fig 3C/D does indicate that the two most mature cell clusters (pharyngeal arch and frontonasal mesenchyme) may arise from common or similar migratory progenitors. However, given the decades of controversy about fate restriction of neural crest cells, the statement that "EMT intermediate NCC and their immediate lineages are not fate restricted to any specific cranial NCC derivative at this timepoint" should be toned down so as to not give the impression that they have identified common progenitors of ectomesenchyme and neuro/glial/pigment derivatives.

      We appreciate this comment, because as the reviewer noted, there has been considerable literature and debate about the fate restriction and plasticity of neural crest cells, and indeed we did not intend to imply we have identified common progenitors of ectomesenchyme and neuro/glial/pigment derivatives. That can only be truly functionally demonstrated by clonal lineage tracing analyses. Rather, we interpret our pseudo-time analyses to indicate that irrespective of cell cycle status at the time of delamination, these two populations come together with equivalent mesenchymal and migratory properties, but in the absence of fate determination in the collective of cells. This does not mean that individual cells are common progenitors of both ectomesenchyme and neuro/glial/pigment derivatives. The nuance is important, and we address this more carefully in the text.

      Lines 320-321: "...this overlap in expression was notably not observed in older embryos in areas where EMT had concluded". It is unclear whether the markers no longer overlap in older embryos (i.e. segregate to distinct populations) or are simply no longer expressed.

      The data in Figure 5 demonstrates the dynamic and overlapping expression of Dlc1, Sp5 and Pak3 in the different clusters of cells as they transition from being neuroepithelial to mesenchymal. In contrast to Sp5 and Pak3, Dlc1 is not expressed by premigratory neural crest cells but is expressed at high levels in all EMT intermediate stage neural crest cells. Later as Dlc1 continues to be expressed in migrating neural crest cells, Pak3 and Sp5 are downregulated. But the absence of overlapping expression in the dorsolateral neural plate at the conclusion of EMT coincides with their downregulation in that territory.

      In the final results section on Dlc1, the previously published mutant mouse lines are referenced as having "craniofacial malformation phenotypes". The lack of detail given on what those malformations are (assuming descriptions are available) makes the argument that they may be related to insufficient delamination less persuasive. The degree of knockdown correlates so well with the percentage reduction in migratory neural crest (Fig. 6) that one would imagine a null mutant to have a very severe phenotype.

      The inference from the reviewer is correct and indeed Dlc1 null mutant mice do have a severe phenotype. We have added more specific details regarding the craniofacial and other phenotypes of the Dlc1 mutant mice to the discussion. Of note, the frontonasal prominences and the pharyngeal arches are hypoplastic in E10.5 Dlc1 mutant embryos, which would be consistent with a neural crest cell deficit. Although a deficit in neural crest cells can be caused by multiple distinct mechanisms, our Dlc1 knockdown analyses suggest that the phenotype is due to an effect on neural crest cell delamination, which diminishes the number of migrating neural crest cells.

      Use the same y-axis for Fig. 4C/D

      This has been corrected.

      Fig. 6C: Please note in the panel which gene is being measured by qPCR

      This has been corrected to denote Dlc1.

      Lines 108-117: More concise language would be appropriate here.

      As requested, we were more succinct in our language and have shortened this section.

      The SABER-FISH images are very dim. I realize the importance of not saturating the pixels, but the colors are difficult to make out.

      We thank the reviewer for pointing this out and have endeavored to make the SABER-FISH images brighter and easier to see.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors report a molecular mechanism for recruiting syntaxin 17 (Syn17) to the closed autophagosomes through the charge interaction between enriched PI4P and the C-terminal region of Syn17. How to precisely control the location and conformation of proteins is critical for maintaining autophagic flux. Particularly, the recruitment of Syn17 to autophagosomes remains unclear. In this paper, the authors describe a simple lipid-protein interaction model beyond previous studies focusing on protein-protein interactions. This represents a conceptual advance.

      We would like to thank Reviewer #1 for the positive evaluation of our study.

      Reviewer #2 (Public Review):

      Summary:

      Syntaxin17 (STX17) is a SNARE protein that is recruited to mature (i.e., closed) autophagosomes, but not to immature (i.e., unclosed) ones, and mediates the autophagosome-lysosome fusion. How STX17 recognizes the mature autophagosome is an unresolved interesting question in the autophagy field. Shinoda and colleagues set out to answer this question by focusing on the C-terminal domain of STX17 and found that PI4P is a strong candidate that causes the STX17 recruitment to the autophagosome.

      Strengths:

      The main findings are: 1) Rich positive charges in the C-terminal domain of STX17 are sufficient for the recruitment to the mature autophagosome; 2) Fluorescence charge sensors of different strengths suggest that autophagic membranes have negative charges and the charge increases as they mature; 3) Among a battery of fluorescence biosensors, only PI4P-binding biosensors distribute to the mature autophagosome; 4) STX17 bound to isolated autophagosomes is released by treatment with Sac1 phosphatase; 5) By dynamic molecular simulation, STX17 TM is shown to be inserted to a membrane containing PI4P but not to a membrane without it. These results indicate that PI4P is a strong candidate that STX17 binds to in the autophagosome.

      We would like to thank Reviewer #2 for pointing out these strengths.

      Weaknesses:

      • It was not answered whether PI4P is crucial for the STX17 recruitment in cells because manipulation of the PI4P content in autophagic membranes was not successful for unknown reasons.

      As we explained in the initial submission, we tried to deplete PI4P in autophagosomes by multiple methods but did not succeed. In this revised manuscript, we added the result of an experiment using the PI 4-kinase inhibitor NC03 (Figure 4–figure supplement 1), which shows no significant effect on the autophagosomal PI4P level and STX17 recruitment.

      Author response image 1.

      The PI 4-kinase inhibitor NC03 failed to suppress autophagosomal PI4P accumulation and STX17 recruitment. HEK293T cells stably expressing mRuby3–STX17TM (A) or mRuby3–CERT(PHD) (B) and Halotag-LC3 were cultured in starvation medium for 1 h and then treated with and without 10 μM NC03 for 10 min. Representative confocal images are shown. STX17TM- or CERT(PHD)-positive rates of LC3 structures per cell (n > 30 cells) are shown in the graphs. Solid horizontal lines indicate medians, boxes indicate the interquartile ranges (25th to 75th percentiles), and whiskers indicate the 5th to 95th percentiles. Differences were statistically analyzed by Welch’s t-test. Scale bars, 10 μm (main), 1 μm (inset).
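      The summary statistics named in this legend (median, interquartile range, 5th to 95th percentile whiskers, and Welch's t-test) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' analysis code; the function names `percentile`, `box_summary`, and `welch_t` are hypothetical, and linear interpolation between ranks is an assumed percentile convention. It returns only the t statistic and the Welch-Satterthwaite degrees of freedom; a p-value would additionally require the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation (assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def box_summary(values):
    """Median, interquartile range, and 5th/95th percentile whiskers for one group."""
    return {
        "median": percentile(values, 50),
        "q25": percentile(values, 25),
        "q75": percentile(values, 75),
        "whisker_low": percentile(values, 5),
        "whisker_high": percentile(values, 95),
    }


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to share a variance.
    """
    var_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    var_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df
```

      Here each input list would hold one value per cell, such as the STX17TM-positive rate of LC3 structures, for the treated and untreated groups.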

      • The molecular simulation study did not show whether PI4P is necessary for the STX17 TM insertion or whether other negatively charged lipids can play a similar role.

      As the reviewer suggested, we performed the molecular dynamics simulation using membranes with phosphatidylinositol, a negatively charged lipid. STX17 TM approached the PI-containing membrane but was not inserted into the membrane within a time scale of 100 ns in simulations of all five structures. These data suggest that PI4P, which is more negatively charged than PI, is required for STX17 insertion. Thus, we have included these data in Figure 5E and F and added the following text to Lines 242–244. “Moreover, if the membrane contained phosphatidylinositol (PI) instead of PI4P, STX17 approached the PI-containing membrane but was not inserted into the membrane (Figure 5E, F, Video 3).”

      Author response image 2.

      (E) An example of a time series of simulated results of STX17TM insertion into a membrane consisting of 70% phosphatidylcholine (PC), 20% phosphatidylethanolamine (PE), and 10% phosphatidylinositol (PI). STX17TM is shown in blue. Phosphorus in PC, PE and PI are indicated by yellow, cyan, and orange, respectively. Short-tailed lipids are represented as green sticks. The time evolution series are shown in Video 3. (F) Time evolution of the z-coordinate of the center of mass (z_cm) of the transmembrane helices of STX17TM in the case of membranes with PI. Five independent simulation results are represented by solid lines of different colors. The gray dashed lines indicate the locations of the lipid heads. A scale bar indicates 5 nm.
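      The quantity plotted in panel F, the z-coordinate of the center of mass of the transmembrane helices, is a mass-weighted average of atomic z-coordinates. The sketch below shows that calculation for a single simulation frame. It is an illustration only: the function names are hypothetical, and the insertion criterion (z_cm lying between the two lipid head planes) is an assumption made here to mirror the dashed lines in the figure, not a definition taken from the manuscript.

```python
def center_of_mass_z(masses, z_coords):
    """Mass-weighted mean z-coordinate for one frame (same units as z_coords)."""
    if not masses or len(masses) != len(z_coords):
        raise ValueError("masses and z_coords must be non-empty and equal in length")
    total_mass = sum(masses)
    return sum(m * z for m, z in zip(masses, z_coords)) / total_mass


def is_between_head_planes(z_cm, lower_head_z, upper_head_z):
    """Assumed criterion: the helices count as inserted when z_cm lies between the lipid head planes."""
    return lower_head_z < z_cm < upper_head_z
```

      Applying `center_of_mass_z` to every frame of a trajectory would give one time series per independent run, comparable to the colored lines in the plot.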

      • The question that the authors posed in the beginning, i.e., why is STX17 recruited to the mature (closed) autophagosome but not to immature autophagic membranes, was not answered. The authors speculate that the seemingly gradual increase of negative charges in autophagic membranes is caused by an increase in PI4P. However, this was not supported by the PI4P fluorescence biosensor experiment that showed their distribution to the mature autophagosome only. Here, there are at least two possibilities: 1) The increase of negative charges in immature autophagic membranes is derived from PI4P. However the fluorescence biosensors do not bind there for some reason; for example, they are not sensitive enough to recognize PI4P until it reaches a certain level, or simply, their binding does not occur in a quantitative manner. 2) The negative charge in immature membranes is not derived from PI4P, and PI4P is generated abundantly only after autophagosomes are closed. In either case, it is not easy to explain why STX17 is recruited to the mature autophagosome only. For the first scenario, it is not clear how the PI4P synthesis is regulated so that it reaches a sufficient level only after the membrane closure. In the second case, the mechanism that produces PI4P only after the autophagosome closure needs to be elucidated (so, in this case, the question of the temporal regulation issue remains the same).

      We thank the reviewer for pointing this out. While the probe for weakly negative charges (1K8Q) labeled both immature and mature autophagosomes, the probes for intermediate charges (5K4Q and 3K6Q) and PI4P labeled only mature autophagosomes (Figure 2F, Figure 2–figure supplement 1B). Thus, we think that the autophagosomal membrane rapidly and drastically becomes negatively charged, and at the same time, PI4P is enriched. Although immature membranes may have weak negative charges, we did not examine which lipids contribute to the negative charges. Thus, we have added the following sentences to the Discussion section.

      “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283) “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to address the question of how the SNARE protein Syntaxin 17 senses autophagosome maturation by being recruited to autophagosomal membranes only once autophagosome formation and sealing is complete. The authors discover that the C-terminal region of Syntaxin 17 is essential for its sensing mechanism that involves two transmembrane domains and a positively charged region. The authors discover that the lipid PI4P is highly enriched in mature autophagosomes and that electrostatic interaction with Syntaxin 17's positively charged region with PI4P drives recruitment specifically to mature autophagosomes. The temporal basis for PI4P enrichment and Syntaxin 17 recruitment to ensure that unsealed autophagosomes do not fuse with lysosomes is a very interesting and important discovery. Overall, the data are clear and convincing, with the study providing important mechanistic insights that will be of broad interest to the autophagy field, and also to cell biologists interested in phosphoinositide lipid biology. The author's discovery also provides an opportunity for future research in which Syntaxin 17's c-terminal region could be used to target factors of interest to mature autophagosomes.

      Strengths:

      The study combines clear and convincing cell biology data with in vitro approaches to show how Syntaxin 17 is recruited to mature autophagosomes. The authors take a methodical approach to narrow down the critical regions within Syntaxin 17 required for recruitment and use a variety of biosensors to show that PI4P is enriched on mature autophagosomes.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      There are no major weaknesses, overall the work is highly convincing. It would have been beneficial if the authors could have shown whether altering PI4P levels would affect Syntaxin 17 recruitment. However, this is understandably a challenging experiment to undertake and the authors outlined their various attempts to tackle this question.

      We thank Reviewer #3 for pointing this out. Please see our above response to Reviewer #2 (Public Review).

      In addition, clear statements within the figure legends on the number of independent experimental repeats that were conducted for experiments that were quantitated are not currently present in the manuscript.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      Reviewer #1 (Recommendations For The Authors):

      This paper is well written and all experiments were conducted with a high standard. Several minor issues should be addressed before final publication.

      (1) To further confirm the charge interaction, a charge screening experiment should be performed for Fig. 2A.

      We have asked Reviewer #1 through the editor what this experiment meant and understood that it was to see the effects of high salt concentrations. We monitored the association of GFP-STX17TM with liposomes in the presence or absence of 1 M NaCl and found that it was blocked in a high ionic buffer. This data supports the electrostatic interaction of STX17 with membranes. We have included this data in Figure 2B and added the following sentences to Lines 124–126.

      “The association of STX17TM with PI4P-containing membranes was abolished in the presence of 1 M NaCl (Figure 2B). These data suggest that STX17 can be recruited to negatively charged membranes via electrostatic interaction independent of the specific lipid species.”

      Author response image 3.

      GFP–STX17TM translated in vitro was incubated with rhodamine-labeled liposomes containing 70% PC, 20% PE and 10% PI4P in the presence of 1 M NaCl or 1.2 M sucrose. GFP intensities of liposomes were quantified and shown as in Figure 1C (n > 30).
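      As background for the high-salt experiment, the range of electrostatic interactions in a salt solution is often estimated with the Debye screening length. The sketch below evaluates the standard Debye-Hückel expression for a 1:1 electrolyte such as NaCl. It is not part of the authors' analysis; the relative permittivity of water (78.4 at 25 °C) is an assumed value, the function name is hypothetical, and Debye-Hückel theory is a dilute-solution approximation, so the figure at 1 M should be read only as a rough indication that charges are screened over much shorter distances than at physiological ionic strength.

```python
import math

# Physical constants in SI units.
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19     # elementary charge, C
N_A = 6.02214076e23            # Avogadro constant, 1/mol


def debye_length_nm(ionic_strength_molar, temperature_k=298.15, rel_permittivity=78.4):
    """Debye screening length in nm for a given ionic strength in mol/L.

    rel_permittivity defaults to an assumed value for water at 25 degrees C.
    """
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    ionic_strength = ionic_strength_molar * 1000.0  # convert mol/L to mol/m^3
    numerator = rel_permittivity * EPSILON_0 * K_B * temperature_k
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator) * 1e9
```

      Under these assumptions the screening length is roughly 0.3 nm at 1 M and roughly 0.8 nm at 0.15 M ionic strength.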

      (2) The authors claim that "Autophagosomes become negatively charged during maturation", based on experiments using membrane charge probes. Since it's mainly about the membrane, it's better to refine the claim to "The membrane of autophagosomes becomes...", which would be more precise and close to the topic of this paper.

      We would like to thank the reviewer for pointing this out. This point is valid. As recommended, we have changed the phrase “Autophagosomes become negatively charged during maturation” to “The membrane of autophagosomes becomes negatively charged during maturation” (Lines 72, 118, 262, 969 (title of Figure 2), 1068 (title of Figure 2–figure supplement 1)).

      (3) The authors should add more discussion regarding the "specificity" for recruiting Syn17 through the charge interaction. Particularly, how Syn17 could be maintained before the closure of autophagosomes? For the MD simulations in Fig. 5, the current results don't add much to the manuscript. The cell biology experiments have demonstrated the conclusion. The authors could try to find more details about the insertion by analyzing the simulation movies. Do membrane packing defects play a role during the insertion process? A similar analysis was conducted for alpha-synuclein (https://pubmed.ncbi.nlm.nih.gov/33437978/).

      Regarding the mechanism of STX17 maintenance in the cytosol, we do not think that other molecules, such as chaperones, are essential because purified recombinant mGFP-STX17TM used in this study is soluble. However, this does not rule out such a mechanism, which would be a subject for future study.

      In the paper by Liu et al. (PMID: 33437978), small liposomes with diameters of 25–50 nm are used. Therefore, there are packing defects in the highly curved membranes, into which alpha-synuclein helices are inserted in a curvature-dependent manner. On the other hand, autophagosomes are much larger (~1 μm in diameter) and almost flat for STX17 molecules, so we think it is unlikely that STX17 recognizes the packing defect.

      Reviewer #2 (Recommendations For The Authors):

      • The two (and other) possibilities with regards to the interpretation of the negative charge/PI4P result in autophagic membranes are hoped to be discussed.

      As mentioned above, we have added the following sentences to the Discussion section. “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283)

      “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      • Fluorescence biosensors are convenient to give an overview of the intracellular distribution of various lipids, but some of them show false-negative results. For example, evectin-2-PH for PS binds to endosomes but not to the plasma membrane, even though the latter contains abundant PS. With regards to PI4P, some biosensors illuminate both the Golgi and autophagosome, while others do not appear to bind the Golgi. Moreover, fluorescence biosensors for PI(3,5)P2 and PI(3,4)P2, which are also candidates for the STX17 insertion issue, are less reliable than others (e.g., those for PI3P and PI(4,5)P2). These problems need to be considered.

      We agree with Reviewer #2 that fluorescence biosensors are not perfect for detecting specific lipids. Based on the Reviewer’s suggestion, we have included a comment on this in the Discussion section as follows (Lines 265–268).

      “Given the possibility that fluorescence lipid probes may give false-negative results, a more comprehensive biochemical analysis, such as lipidomics analysis of mature autophagosomes, would be imperative to elucidate the potential involvement of other negatively charged lipids.”

      • A negative control for the PI4P biosensor, i.e., a mutant lacking the PI4P binding ability, is better to be tested to confirm the presence of PI4P in autophagosomes.

      We would like to thank the Reviewer for this comment. We conducted the suggested experiment and confirmed that the CERT(PHD)(W33A) mutant, which is deficient for PI4P binding (Sugiki et al., JBC. 2012), was diffusely present in the cytosol and did not localize to STX17-positive autophagosomes. This data supports our conclusion that PI4P is indeed present in autophagosomes. We have included this data in Figure 3–figure supplement 2A and explained it in the text (Lines 164–166).

      Author response image 4.

      Mouse embryonic fibroblasts (MEFs) stably expressing GFP–CERT(PHD)(W33A) and mRuby3–STX17TM were cultured in starvation medium for 1 h. Bars indicate 10 μm (main images) and 1 μm (insets).

      • As a control to the molecular dynamic simulation study, STX17 TM insertion into a membrane containing other negative charge lipids, especially PI, needs to be tested. PI is a negative charge lipid that is likely to exist in autophagic membranes (as suggested by the authors' past study).

      We thank the reviewer for this suggestion. As mentioned above (Reviewer #2, Public Review), we performed the molecular dynamics simulation using membranes containing PI and added the results in Figure 5E and F and Video 3.

      • If the putative role of PI4P could be shown in the cellular context, the authors' conclusion would be much strengthened. I wonder if overexpression of PI4P fluorescence biosensors, especially those that appear to bind to the autophagosome almost exclusively, may suppress the recruitment of STX17 there.

      We would like to thank the Reviewer for asking this question. In MEFs stably overexpressing PI4P probes driven by the CMV promoter, STX17 recruitment was not affected. Thus, simple overexpression of PI4P probes does not appear to be effective in masking PI4P in autophagosomes.

      Another idea is to use an appropriate molecule (e.g., WIPI2, ATG5) and to recruit Sac1 to autophagic membranes by using the FRB-FKBP system or the like. I hope these and other possibilities will be tested to confirm the importance of PI4P in the temporal regulation of STX17 recruitment.

      We tried the FRB-FKBP system using the phosphatase domain of yeast Sac1 fused to FKBP and LC3 fused to FRB, but unfortunately, this system failed to deplete PI4P from the autophagosomal membrane.

      Reviewer #3 (Recommendations For The Authors):

      A few areas for suggested improvement are:

      (1) It would be helpful if the authors could clarify for all figures how many independent experiments were conducted for all experiments, particularly those that have quantitation and statistical analyses.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      The authors made several attempts to modulate PI4P levels on autophagosomes although understandably this proved to be challenging. A couple of suggestions are provided to address this area:

      (2) Given the reported role of GABARAPs in PI4K2a recruitment and PI4P production on autophagosomes, as well as autophagosome-lysosome fusion (Nguyen et al (2016) J Cell Biol) it would be worthwhile to assess whether GABARAP TKO cells have reduced PI4P and reduced Stx17 recruitment

      According to the Reviewer’s suggestion, we examined the localization of STX17 TM and the PI4P probe CERT(PHD) in ATG8 family (LC3/GABARAP) hexa KO HeLa cells that were established by the Lazarou lab (Nguyen et al., JCB 2016). As in WT cells, STX17 TM and CERT(PHD) were still colocalized with each other in hexa KO cells, suggesting that neither STX17 recruitment nor PI4P enrichment depends on ATG8 family proteins (note: the size of autophagosomes in HeLa cells is smaller than in MEFs, making it difficult to observe autophagosomes as ring-shaped structures). We have included this result in Figure 3–figure supplement 2(F) and explained it in the text (Lines 194–196, 198).

      Author response image 5.

      (F) WT and ATG8 hexa KO HeLa cells stably expressing GFP–STX17TM and transiently expressing mRuby3–CERT(PHD) were cultured in starvation medium. Bars indicate 10 μm (main images) and 1 μm (insets).

      (3) Can the authors try fusing Sac1 to one of the PI4P probes (CERT(PHD)) that were used, or alternatively to the c-terminus of Syntaxin 17? This approach would help to recruit Sac1 only to mature autophagosomes and could therefore prevent the autophagosome formation defect observed when fused to LC3B that targeted Sac1 to autophagosomes as they were forming. Understandably, this approach might seem a bit counterintuitive since the phosphatase is removing PI4P which is what is recruiting it but it could be a viable approach to keep PI4P levels low enough on mature autophagosomes so that Syntaxin 17 is no longer recruited. A Sac1 phosphatase mutant might be needed as a control.

      We would like to thank the Reviewer for these suggestions. We tried the phosphatase domain of yeast Sac1 or human SAC1 fused with STX17TM, but unfortunately, these fusion proteins did not deplete PI4P from autophagosomes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To resolve and further test the claim that TBI did not induce cell proliferation:

      How many brains did they analyse? Sample sizes must be provided in Figure S1.

      As per the reviewer’s suggestion, we removed one of the unsupported claims shown in Figure S1. The original Figure S1 is shown below with the sample numbers added.

      Author response image 1.

      The authors could either improve the TBI method or the detection of cells in S-phase, mitosis or cycling. They could use PCNA-GFP or BrdU, EdU or FUCCI instead and at least provide evidence that they can detect cells in S-phase in intact brains. Timing is critical (ie cell cycle is longer than in larvae) so multiple time points should be tested. Or they could use pH3 but test more time points and rather large sample sizes. If they are not able to provide any evidence, then their lack of evidence is no evidence. The authors should consider removing pH3 and PCNA-GFP related claims instead.

      We have removed pH3 and PCNA-GFP related results and claims.

      Other unsupported claims:

      Figure 2A-C is not very clear what they are showing, but it is not evidence of astrocyte hypertrophy. It does not have cellular resolution and does not show the cell size, membranes, nor number

      (1) We have avoided the term “hypertrophy” and changed the description throughout the text to “astrocyte swelling”.

      (2) Images in the resolution of Figure 2E and 2F were able to show the enlarged soma of astrocytes, suggesting swelling.

      What is the point of using RedStinger in Figure 2?

      We used RedStinger to label the astrocyte nuclei.

      Figure S5 is not convincing, as anti-Pvr does not look localised to specific cells. Instead, it looks like uniform background. If they really think the antibody is localised, they should do double stainings with cell type specific markers. If the antibody does not work, then remove the data and the claim. They could test with RNAi knock-down in specific cell types and qRT-PCR which cells express pvr instead.

      We have removed the claim that “Pvr is predominantly expressed in astrocytes” and changed the description to “Immunostainings using the anti-Pvr antibodies revealed that endogenous Pvr expression is low in the control brains, yet significantly enhanced upon TBI. Reducing Pvr expression, but not Pvr overexpression, in astrocytes blocked the TBI-induced increase of Pvr expression (Figure S5)”.

      Figure S6: it is unclear what they are trying to show, but these data do not demonstrate that astrocytes do not engulf debris after TBI, as there isn't sufficient cellular resolution to make such claim. Firstly, they analyse one single cell per treatment. Secondly, the cell projections are not visible in these images, and therefore engulfment cannot be seen. The authors could remove the claim or visualise whether astrocytes phagocytose debris or not either using clones or with TEM.

      We agree with the reviewer that our images do not have the resolution to make this claim. We have removed Figure S6 and corresponding text description.

      On statistics:

      The statistical analysis needs revising as it is wrong in multiple places, eg Fig.1F,G,H; Figure 2D. They only use Student t-tests. These can only be used when data are continuous, distributed uniformly and only two samples are compared; if more than 2 samples, distributed uniformly, then use One-Way ANOVA and multiple comparisons tests. If data are categorical, use Chi-Square.

      We have double-checked our analysis. Throughout the study, each experimental group was compared to the control separately using Student's t-test.
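      For reference, the omnibus test the reviewer mentions for more than two groups can be sketched as follows. This is a minimal illustration under the usual assumptions of roughly normal, equal-variance groups, not the analysis used in the study; the function name and the example values are hypothetical. It computes only the F statistic and its degrees of freedom. Obtaining a p-value additionally requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of measurements, one list per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for a control and two experimental groups.
control = [1.0, 2.0, 3.0]
treatment_a = [2.0, 3.0, 4.0]
treatment_b = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([control, treatment_a, treatment_b])
print(f_stat, df_b, df_w)
```

      A significant omnibus result would then be followed by a multiple-comparisons test, which is the step that separate pairwise t-tests do not correct for.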

      Other points for improvement:

      Figure 2E,F: what are GFP puncta and how are they counted?

      I. Each GFP punctum looks like a small circle, likely representing a functional or dysfunctional structure. The biology of the GFP puncta is currently unknown.

      II. We used ImageJ to quantify the GFP puncta:

      (1) Image > Type > 8-bit

      (2) Process > Subtract Background (rolling ball radius: 10)

      (3) Image > Adjust > Threshold > Apply

      (4) Analyze > Measure, with Set Measurements set to “Area” and “Limit to threshold”, then OK

      (5) Count the number of puncta in the chosen area.

      (6) Calculate the number of puncta per square micron.
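      The thresholding, counting, and density steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not the ImageJ implementation: it omits the rolling-ball background subtraction, treats each 4-connected group of above-threshold pixels as one punctum, and uses the whole image as the measured area. The function name, threshold, pixel size, and toy image are all assumptions made for the example.

```python
from collections import deque


def count_puncta(image, threshold, pixel_size_um):
    """Return (puncta_count, puncta_per_square_micron) for a 2D intensity grid."""
    rows, cols = len(image), len(image[0])
    # Binarize: a pixel belongs to a punctum if it exceeds the threshold.
    mask = [[value > threshold for value in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    count = 0

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New punctum found: flood-fill all pixels connected to it.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

    # Area of the analyzed region in square microns (here, the full image).
    area_um2 = rows * cols * pixel_size_um ** 2
    return count, count / area_um2


# Toy 8-bit image with three separate bright spots.
image = [
    [0, 0, 0, 0, 0],
    [0, 200, 210, 0, 0],
    [0, 0, 0, 0, 180],
    [190, 0, 0, 0, 0],
]
count, density = count_puncta(image, threshold=100, pixel_size_um=0.5)
print(count, density)
```

      Dividing by the area of a selected region of interest rather than the full image would correspond more closely to step (5) if puncta were counted within a chosen area only.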

      All genotypes must be provided (including for MARCM clones), currently they are not.

      We have shown the full genotype in the corresponding legend.

      Figure 7O,P: indicate on the figure that these are RNAi.

      We have revised the labels to RNAi in Figure 7O,P.

      Reviewer #2 (Recommendations For The Authors):

      Several typos are present in the text.

      We have read the manuscript carefully and corrected typos throughout.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the roles of the axon growth regulator Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. The evidence supporting the claims of the authors is solid, although further work directly testing the roles of different sema7a isoforms would strengthen the analysis. The work will be of interest to developmental neuroscientists studying circuit formation.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They attempt to also outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, an autocrine role for Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes.

      Finally, critical controls are absent from the overexpression paradigm.

      Comments: Thank you for your valuable comments. We have analyzed the hair cell scRNA transcriptome data of zebrafish neuromasts from published works and have not identified known expression of receptors of the Sema7A protein, particularly PlexinC1 and Integrin β1 molecules (reference 4 and 15) in hair cells. This result suggests that the Sema7A protein molecule, either secreted or membrane-bound, does not possess its cognate receptor to elicit an autocrine function on the hair cells. Moreover, the GPI-anchored Sema7A lacks a cytosolic domain. So it is unlikely that Sema7A signaling directly induces the formation of presynaptic ribbons. We propose that the decrease in average number and area of synaptic aggregates likely reflects decreased stability of the synaptic structures owing to lack of contact between the sensory axons and the hair cells, which has been identified in zebrafish neuromasts (reference 38).

      Thank you for pointing out the missing critical control experiments. Additional control experiments (lines 333-346) with a new figure (Figure 5) have been added.

      These issues weaken the claims made by the authors including the statement that they have identified differential roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

      Comments: We have rephrased our statement and argue in lines 428-430 that our experiments “suggest a potential mechanism for hair cell innervation in which a local Sema7Asec diffusive cue likely consolidates the sensory arbors at the hair cell cluster and the membrane-anchored Sema7A-GPI molecule guides microcircuit topology and synapse assembly.”

      The manuscript itself would benefit from the inclusion of details in the text to help the reader interpret the figures, tools, data, and analysis.

      Comments: We have made significant revisions to the text and figures to improve clarity and consistency of the manuscript.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows one to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below).

      Comments: Thank you for making this point. To investigate the presence of both sema7a transcripts in the hair cells of the lateral-line neuromasts, we used the Tg(myo6b:actb1EGFP) transgenic fish to capture the labeled hair cells by fluorescence-activated cell sorting (FACS) and isolated total RNA. Using transcript-specific DNA oligonucleotide primers, we have identified the presence of both sema7a transcript variants in the hair cells of the neuromast. Even though we have not developed transcript-specific knockout animals, we speculate that the presence of both transcript variants in the hair cells implies that they function in distinct fashions. We have changed our interpretation in lines 32-34 to “Our findings propose that Sema7A likely functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development.”

      In the future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us determine the role of the membrane-bound Sema7A-GPI molecule as well as the Sema7Asec in sensory arborization and synaptic assembly.

      In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Comments: Thank you for this insightful comment. In a previous eLife publication from our laboratory, we utilized the serial block-face scanning electron microscopy (SBFSEM) technique to characterize the connectome of the neuromast microcircuit, where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory-axon collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axons associate with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capture collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Reviewer #3 (Public Review):

      Summary:

      This study demonstrates that the axon guidance molecule Sema7a patterns the innervation of hair cells in the neuromasts of the zebrafish lateral line, as revealed by quantifying gain- and loss-of-function effects on the three-dimensional topology of sensory axon arbors over developmental time. Alternative splicing can produce either a diffusible or membrane-bound form of Sema7a, which is increasingly localized to the basolateral pole of hair cells as they develop (Figure 1). In sema7a mutant zebrafish, sensory axon arbors still grow to the neuromast, but they do not form the same arborization patterns as in controls, with many arbors overextending, curving less, and forming fewer loops even as they lengthen (Figure 2,3). These phenotypes only become significant later in development, indicating that Sema7a functions to pattern local microcircuitry, not the gross wiring pattern. Further, upon ectopic expression of the diffusible form of Sema7a, sensory axons grow towards the Sema7a source (Figure 4). The data also show changes in the synapses that form when mutant terminals contact hair cells, evidenced by significantly smaller pre- and post-synaptic punctae (Figure 5). Finally, by replotting single cell RNA-sequencing data (Figure 6), the authors show that several other potential cues are also produced by hair cells and might explain why the sema7a phenotype does not reflect a change in growth towards the neuromast. In summary, the data strongly indicate that Sema7a plays a role in shaping connectivity within the neuromast.

      Strengths:

      The main strength of this study is the sophisticated analysis that was used to demonstrate fine-level effects on connectivity. Rather than asking "did the axon reach its target?", the authors asked "how does the axon behave within the target?". This type of deep analysis is much more powerful than what is typical for the field and should be done more often. The breadth of analysis is also impressive, in that axon arborization patterns and synaptic connectivity were examined at three stages of development and in three dimensions.

      Weaknesses:

      The main weakness is that the data do not cleanly distinguish between activities for the secreted and membrane-bound forms of Sema7a, which the authors speculate may influence axon growth and synapse formation respectively. The authors do not overstate the claims, but it would have been nice to see some additional experimentation along these lines, such as the effects of overexpressing the membrane-bound form,

      Comments: We have accepted this useful suggestion. In lines 333-346 and in Figure 5 we have demonstrated the impact of overexpressing the membrane-bound transcript variant on the arborization pattern of the sensory axons.

      Some analysis of the distance over which the "diffusible" form of Sema7a might act (many secreted ligands are not in fact all that diffusible), or

      Comments: We have reported this in lines 311-317 and in Figure 4F,G.

      Some live-imaging of axons before they reach the target (predicted to be the same in control and mutants) and then within the target (predicted to be different).

      Comments: We have accepted this useful suggestion. We demonstrate the dynamics of the sensory arbors that are attracted to an ectopic Sema7Asec source in lines 325-332, Figure 4I,J; Figure 4—figure supplement 2A, and Videos 13-16.

      Clearly, although the gain-of-function studies show that Sema7a can act at a distance, other cues are sufficient. Although the lack of a phenotype could be due to compensation, it is also possible that Sema7a does not actually act in a diffusible manner within its natural context. Overall, the data support the authors' carefully worded conclusions. While certain ideas are put forward as possibilities, the authors recognize that more work is needed. The main shortcoming is that the study does not actually distinguish between the effects of the two forms of Sema7a, which are predicted but not actually shown to be either diffusible or membrane linked (the membrane linkage can be cleaved). Although the study starts by presenting the splice forms, there is no description of when and where each splice form is transcribed.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript-specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead, Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the absence of a signal.

      We therefore utilized fluorescence-activated cell sorting (FACS) to capture the labeled hair cells and isolated total RNA to perform RT-PCR using transcript-specific DNA oligonucleotide primers. We identified the presence of both the secreted and the membrane-bound transcripts in four-day-old neuromasts (lines 80-84, Figure 1B-D).

      Additionally, since the mutants are predicted to disrupt both forms, it is a bit difficult to disentangle the synaptic phenotype from the earlier changes in circuit topology - perhaps the change at the level of the synapse is secondary to the change in topology.

      Comments: Thank you for the insightful suggestion. We have analyzed the relationship between the sensory arbor network topology and the distribution of postsynaptic structures (lines 384-403, Figure 6G-I). We identified that the distribution of the postsynaptic aggregates is closely associated with the topological attributes of the sensory circuit. We further clarify the potential origin of disrupted synaptic assemblies in sema7a-/- mutants in lines 380-382 and lines 417-420.

      Further, the authors do not provide any data supporting the idea that the membrane bound form of Sema7a acts only locally. Without these kinds of data, the authors are unable to attribute activities to either form.

      Comments: We have accepted this useful suggestion and have prepared Figure 5 with the necessary details.

      The main impact on the field will be the nature of the analysis. The field of axon guidance benefits from this kind of robust quantification of growing axon trajectories, versus their ability to actually reach a target. This study highlights the value of more careful analysis and as a result, makes the point that circuit assembly is not just a matter of painting out paths using chemoattractants and repellants, but is also about how axons respond to local cues. The study also points to the likely importance of alternative splice forms and to the complex functions that can be achieved using different forms of the same ligand.

      Reviewer #4 (Public Review):

      Summary:

      The work by Dasgupta et al. identifies Sema7a as a novel guidance molecule in hair cell sensory systems. The authors use both the genetic and imaging power of the zebrafish lateral-line system for their research. Based on expression data and immunohistochemistry experiments, the authors demonstrate that Sema7a is present in lateral line hair cells. The authors then examine a sema7a mutant. In this mutant, Sema7a protein levels are nearly eliminated. Importantly, the authors show that when Sema7a is absent, afferent terminals show aberrant projections and fewer contacts with hair cells. Lastly the authors show that ectopic expression of the secreted form of Sema7a is sufficient to recruit aberrant terminals to non-hair cell targets. The sema7a innervation defects are well quantified. Overall, the paper is extremely well written and easy to follow.

      Strengths:

      (1) The axon guidance phenotypes in sema7a mutants are novel, striking and thoroughly quantified.

      (2) By combining both loss of function sema7a mutants and ectopic expression of the secreted form of Sema7a, the authors demonstrate that Sema7a is both necessary and sufficient to guide sensory axons.

      Weaknesses:

      (1) Control. There should be an uninjected heatshock control to ensure that heatshock itself does not cause sensory afferents to form aberrant arbors. This control would help support the hypothesis that exogenously expressed Sema7a (via a heatshock driven promoter) is sufficient to attract afferent arbors.

      Comments: Thank you for the suggestion. We have added the uninjected heatshock control experiment in Figure 5 and described experimental details in the text, lines 343-345.

      (2) Synapse labeling. The numbers obtained for postsynaptic labeling in controls do not match up with the published literature - they are quite low. Although there are clear differences in postsynaptic counts between sema7a mutants and controls, it is worrying that the numbers are so low in controls. In addition, the authors do not stain for complete synapses (pre- and post-synapses together). This staining is critical to understand how Sema7a impacts synapse formation.

      Comments: Thank you for raising this issue. We believe the low average numbers of the postsynaptic punctae in control neuromasts arise from lack of formation of postsynaptic aggregates beneath the immature hair cells, which are abundant in early stages of neuromast maturation. We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We do not think analyzing the complete synapse structure will add much to our understanding of how Sema7A influences synapse formation and maintenance.

      (3) Hair cell counts. The authors need to provide quantification of hair cell counts per neuromast in mutant and control animals. If the counts are different, certain quantification may need to be normalized.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Developmental delay. It is possible that loss of Sema7a simply delays development. The latest stage examined was 4 dpf, an age that is not quite mature in control animals. The authors could look at a later age, such as 6 dpf to see if the phenotypes persist or recover.

      Comments: The homozygous sema7a-/- mutants are unviable and die at 6 dpf. We therefore restricted our analysis to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons extend long projections that get farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). If the phenotypes in the sema7a-/- mutants were due to developmental delay, we would expect the disrupted arborization patterns to recover over time. Instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but are a specific outcome of the loss of Sema7A protein.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      Issue 1: One of the most interesting conclusions in this manuscript is the function of the GPI-anchored vs. secreted form of Sema7a in axon structure and synapse formation. In lines 357-360 of the discussion (for example) the authors state that they have shown that the GPI-anchored form of Sema7a is responsible for contact-mediated synapse formation while the secreted form functions as a chemoattractant for axon arbor structure. "We have discovered dual modes of Sema7A function in vivo: the chemoattractive diffusible form is sufficient to guide the sensory arbors toward their target, whereas the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses." However, the data do not support this conclusion. Specifically, no analysis is done showing unique expression of either isoform in hair cells and no functional analysis is done to conclusively determine which isoform is important for either phenotype.

      Comments: We have shown that both sema7a transcripts are expressed in the hair cells of four-day-old neuromasts (lines 78-84, Figure 1C,D). Ectopic expression of the sema7asec transcript variant robustly attracts the lateral-line sensory arbors toward itself, whereas ectopic expression of the sema7aGPI variant fails to impart sensory guidance from a distance, suggesting that the membrane-bound form likely participates in contact-mediated neural guidance. These experiments decisively show, for the first time in zebrafish, the dual modes of Sema7A function in vivo. However, we agree that the sema7aGPI transcript-specific knockout animal would be essential to conclusively prove that the membrane-attached form is primarily involved in forming accurate neural circuitry and contact-mediated formation and maintenance of synapses. Hence, we have very carefully stated in lines 427-428 that “the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses”. We will follow up on this suggestion in our upcoming manuscript that will incorporate transcript-specific genetic ablations.

      Though the authors present RT-PCR analysis of sema7a isoforms, it is not interpretable. The second reverse primer will also recognize the full-length transcript (from what I can gather) so it does not simply show the presence of the secreted form. Is there a unique 3'UTR for the short transcript that can be used? Additionally, for the GPI-anchored version can you use a forward primer that is not present in the short isoform? This would shed some light on the respective levels of both transcripts.

      Comments: The C-termini of the two transcript variants are distinct and we have designed distinct primers that will selectively bind to each transcript (lines 503-511). Since we have not performed quantitative polymerase chain reaction (qPCR), relative levels of each transcript are hard to determine.

      Alternatively, and perhaps of more use, in situ hybridization using unique probes for each isoform would allow you to determine which are actually present in hair cells.

      Comments: We have tried this approach and explained the point earlier (refer to lines 203-212 of this response letter).

      To decisively state that these isoforms have unique functions in axon terminal structure and synapse formation, other experiments are also essential. For example, RNA-mediated rescue analyses using both isoforms would tell you which can rescue the axonal structure and synapse size/number phenotypes. Overexpression of the GPI-anchored form, like the secreted form in Figure 4, would allow you to determine if only the secreted form can cause abnormal axon extension phenotypes. Expression of both forms in hair cells (using a myo6b promoter for example) would allow assessment of their role in presynapse formation.

      Comments: We have ectopically expressed the sema7aGPI transcript variant near the sensory arbor network and observed that Sema7A-GPI fails to impart sensory axon guidance from a distance.

      Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      For the overexpression experiments, expression of mKate alone (with and without heat shock) is also a critical control to include.

      Comments: We have incorporated two control experiments: (1) larvae injected with hsp70:sema7asec-mKate2 plasmid that were not heat shocked and (2) uninjected larvae that were heat shocked. We think these two controls are sufficient to demonstrate that the abnormal arborization patterns are not artifacts generated by plasmid injection or heat shock.

      Issue 2: A second concern is the lack of data showing support cell and hair cell formation and function is unaffected. Analysis of support and hair cell number with loss of Sema7a as well as simple analyses of mechanotransduction (FM4-64) would help alleviate concerns that phenotypes are due to disrupted neuromast formation and basic hair cell function rather than a specific role for Sema7a in this process.

      Comments: We have measured the hair cell numbers in both control and sema7a-/- mutants across developmental stages. We have added this to our submitted raw data.

      We have utilized the styryl fluorophore FM4-64 to test the mechanotransduction function of the hair cells in sema7a-/- mutants. We have detailed our finding in lines 137-141 and in Figure 2—figure supplement 1C,D.

      Expression analysis of Sema7a receptors would also help strengthen the argument for a specific effect on lateral line afferent axons.

      Comments: Thank you for this suggestion. Currently, we do not possess an RNA transcriptome dataset for the lateral line ganglion. This deficit limits a systematic screen for lateral-line sensory neuronal gene expression either through antibody stains or via HCR-mediated in situ techniques. In the future we plan to develop an RNA transcriptome for the lateral-line ganglion and identify potential binding partners for Sema7A.

      Issue 3: The manuscript could also be improved to include more detail in some areas and less in others. In general, each section has a fairly long lead up but lacks important experimental details that would help the reader interpret the data. For example:

      Figure 1: What is the label for the lateral line axons? Is it a specific transgenic? The legend states that 3 asterisks indicate p<0.0001. What about the other asterisk combinations?

      Comments: We have clarified these issues in lines 118-121 and in lines 906-907.

      Figure 2: For the network analysis, are the traces for all axons that branch to innervate the neuromast?

      Comments: Yes, we have traced the entire arbor containing all the axons that branched from the lateral line nerve and extended toward the clustered hair cells. The three-dimensional traces depict a skeletonized representation of the arbor network.

      Can the tracing method distinguish individual axons?

      Comments: No, our goal is to understand how the axon collective behaves around the clustered hair cells during development.

      How do you know where an end is versus continued looping?

      Comments: We have categorically defined the topological attributes in lines 187-191 and in Figure 3A.
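      As an illustration only, the distinction between a loop and a bare terminal can be made concrete if the skeletonized arbor is treated as an undirected graph of branch points and endpoints. The sketch below is not the definition given in the manuscript (lines 187-191, Figure 3A); it assumes that a bare terminal is a node with a single connection, excluding the point where the arbor leaves the nerve, and that the number of independent loops is the cycle rank (edges minus nodes plus connected components). The function name, node labels, and example arbor are hypothetical.

```python
from collections import defaultdict


def arbor_topology(edges, root):
    """Return (loop_count, bare_terminal_count) for a skeleton graph.

    `edges` is a list of (node, node) pairs without self-loops;
    `root` is the node where the arbor joins the nerve.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = set(adjacency)

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)

    # Cycle rank: number of independent closed loops in the graph.
    unique_edges = {frozenset(edge) for edge in edges}
    loops = len(unique_edges) - len(nodes) + components

    # Bare terminals: dead ends other than the root on the nerve.
    terminals = sum(
        1 for node in nodes if node != root and len(adjacency[node]) == 1
    )
    return loops, terminals


# Hypothetical arbor: one closed loop (A-B-C) and one dead-end branch (D).
edges = [("nerve", "A"), ("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
loops, terminals = arbor_topology(edges, root="nerve")
print(loops, terminals)
```

      Under these assumptions, a branch that closes back onto the network raises the loop count, whereas a branch that ends freely adds a bare terminal.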

      Also, are all neuromasts similarly affected or is there a divergence based on which organ you are imaging? What neuromast was imaged in this and other figures?

      Comments: Yes, all the neuromasts in the trunk and tail regions were affected similarly by the sema7a mutation. We did not observe any region-specific phenotypic outcome. We consistently imaged the trunk neuromasts, particularly the second, third, and fourth neuromasts.

      Discussion: The short discussion failed to put these findings into context or to discuss how this unique topological arrangement of axon terminals impacts function.

      Comments: We have added a new segment, lines 432-448, in the discussion section which mentions the potential role of the topological features in arranging the distribution pattern of the postsynaptic densities and thereby potentially influencing the network’s ability to gather sensory inputs through properly placed postsynaptic aggregates.

      Can you speculate on how the looping structure may alter number of synaptic contacts per axon for instance? For this, it would be useful to know if normally the synapses form on loops versus bare terminals.

      Comments: Thank you for this insightful suggestion. We have performed detailed analysis, as mentioned in lines 384-397, to characterize the distribution of the postsynaptic densities between the two topological attributes.

      Does this looping facilitate single axons contacting more hair cells of the same polarity? Would that be beneficial?

      Comments: Looping behaviors indeed facilitate the contact between the axons and the hair cells. As we have observed, the primary topological attribute that the sensory arbor network underneath the clustered hair cells adopts is a loop. The bare terminals are predominantly projected transverse to the clustered hair cells and lack contact with them. Whether a single axon, being part of a loop, preferentially contacts hair cells of the same polarity is yet to be determined. We can address this question by mosaic labeling a single axon in the arbor network and determining its association with the hair cells. We intend to do these experiments in our upcoming manuscript.

      Minor concerns:

      (1) For the stacked charts quantifying topological features, I found interpreting them challenging. Is it possible to put these into overlapping histograms or line graphs to better compare wild type to mutant directly?

      Comments: Thank you for your suggestion. We tried several ways to represent our data and found that the stacked charts best convey our analysis and depict the characteristic phenotypic differences between the control and the sema7a-/- mutants.

      (2) There are numerous strong statements throughout not directly supported by the data, e.g. lines 110-113; 206-208; 357-360 and others. These should be tempered.

      Comments: For lines 110-113, we have updated this section with new experiments and the new segment is represented in lines 115-126.

      For lines 206-208, we have updated the statement to “This result suggests that the stereotypical circuit topology observed in the mature organ may emerge through transition of individual arbors from forming bare terminals to forming closed loops encircling topological holes” in lines 225-227.

      Reviewer #2 (Recommendations For The Authors):

      The authors should be careful about making any assumptions about which form of sema7a is active in NMs. Their RT-PCR demonstrates the presence of both isoforms in a whole animal; however, whether they are similarly present in HCs is not investigated here.

      Comments: We have addressed this concern and have updated the manuscript with new experiments, detailed in lines 78-84.

      Also, there is an issue of translation and trafficking to the membrane with subsequent secretion. An important experiment that would address this question is expressing two sema7a isoforms in mutant HCs and asking whether this can suppress the mutant phenotype.

      Comments: Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      Presumably, sema7a is trafficked to the membrane during HC maturation. This is consistent with the authors' observation that sema7a localization is changing as NM mature. However, actin-sema7a co-labeling does not actually show whether sema7a is on the membrane. Labeling HCs with a membrane marker (transgene) would be much more convincing. Alternatively, can the authors show sema7a localization actually correlates with the presence of sensory axon terminals? They already have immunos that label both. Thus, this should be pretty straightforward.

      Comments: Thank you for these suggestions. We have addressed these issues in lines 112-114, and in lines 119-126.

      Figure 2 should have a control panel, so the reduced sema7a staining can be compared to the control side-by-side.

      Comments: We have depicted Sema7A staining in control neuromasts in multiple images, including Figure 1E, Figure 1H, and in Figure 2—figure supplement 1B. We have kept the control panel in the supplementary figure due to space restrictions in Figure 2.

      Arborization topology: While I appreciate the very careful characterization of the topology for wild-type and mutant NMs, I think it would be much more informative to mark individual axons and then analyze their topology. The main reason is that the authors cannot really distinguish whether some aspects of the topology they describe are due to the densely packed overlapping terminals of multiple axons or are a characteristic, higher-order organization of individual axons. Because of this, they cannot be certain what is really happening with sema7a mutant terminals. Related to the point above: while it is clear that the overall topology is abnormal in the mutant, the authors should be careful in concluding that sema7a regulates specific aspects of it. The overall structure is probably highly interconnected; perturbing one parameter would likely affect all the others.

      Comments: Thank you for this comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron microscopy (SBFSEM) technique to characterize the connectome of the neuromast microcircuit, where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference number 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory axon-collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axon-collective associates itself with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capturing collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Experiments with the secreted sema7a isoform would be much more informative if they were compared/contrasted to the GPI anchored isoform.

      Comments: We added a new section, lines 338-351, and a new Figure 5 to address this issue.

      The phenotype of ectopic projections in sema7a overexpression experiments is pretty dramatic, especially given the fact that these were performed in wild-type animals. Does this mean that the phenotype would be even more dramatic in sema7a mutants, as they have more bare axon terminals according to the authors' analysis? Have the authors attempted this type of experiment?

      Comments: That is an interesting suggestion. We have not tested that yet. Our guess is that in the sema7a-/- mutants, the abundant bare terminals will be far more sensitive to an ectopic source of Sema7A. But even in the sema7a-/- mutants, other chemotropic cues are still functional, which may impart certain restrictions on how many bare terminals are allowed to leave the neuromast region.

      Reviewer #3 (Recommendations For The Authors):

      (1) No raw data are shown, such that it is difficult to assess variability across animals or within animals, just the overall trends within the whole dataset. Raw data need to be shown for every measurement, at least in supplemental figures. It would also be useful to reliably show control next to mutant in the same plot, as it is a bit hard to compare across panels, which occurs in several figures.

      Comments: We have uploaded all the raw data related to each experiment.

      (2) Given the focus on the two possible forms of Sema7a, the authors should use HCR or another form of reliable in situ hybridization to show the spatiotemporal pattern of expression of each isoform.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the lack of a signal.

      (3) The authors should explain the criteria used to select the 22 embryos used to analyze the effects of expressing diffusible Sema7a.

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      (4) Although arbors were imaged in live embryos, time is never presented as a variable, so I cannot tell whether axon topology was changing as the images were collected. This needs to be clarified.

      Comments: We imaged the trunk neuromasts of both control and sema7a-/- mutant live zebrafish larvae at 2, 3, and 4 dpf. We imaged the control and the sema7a-/- mutants of each developmental stage in parallel, within a span of two hours, and repeated these experiments multiple times to gather almost a hundred larvae from each genotype. Even though the sensory arbor network is dynamic, we believe that imaging both genotypes in parallel within a span of two hours, and averaging almost a hundred larvae from each genotype, minimizes the temporal variability observed in the arbor architecture.

      (5) Ideally, the authors should use CRISPR/cas-9 to create a mutation in the C-terminus that would prevent production of the GPI-anchored form and not of the diffusible form. I understand if this is too much work to do in a short time, and would be satisfied with another experiment that could distinguish roles for at least one isoform more clearly. For instance, it would be interesting to see an analysis of how far an axon can be from a source to detect diffusible Sema7a (live imaging would be ideal for this) and then to show that the effect is different when the membrane bound form is expressed.

      Comments: Thank you for this comment. We are currently working on generating transcript-specific knockout animals.

      We have added live timelapse video microscopy data in lines 330-337, Figure 4H-J, Figure 4—figure supplement 2, and Videos 15 and 16.

      We have added a new segment analyzing the membrane-bound transcript variant in lines 338-351.

      Reviewer #4 (Recommendations For The Authors):

      Feedback to authors

      Overall, this is a very important and novel study. Currently the manuscript does need revision.

      Major concerns:

      (1) Controls. For the ectopic expression of Sema7a, injection of a construct expressing Sema7a under a heatshock promoter is used to drive ectopic expression. No heatshocked (uninjected) animals are used as a control. In many systems heatshock can impact neuron morphology, and heatshock proteins are required for normal neurite and synapse formation. Please examine sensory axons in uninjected wildtype animals with heatshock.

      Comments: We have added this control experiment in a new segment, explained in detail in lines 348-350 and Figure 5.

      (2) Synapse staining - regarding Figure 5 and related supplement

      Understanding whether guidance defects ultimately impact synapse formation is an important aspect of this paper. Therefore, it is necessary to have accurate measurements of the number of complete synapses, and the overall numbers of pre- and postsynaptic components. Currently the data plotted in Figure 5 are extensive, but the way the data are laid out, the relevant comparisons are challenging to make. Perhaps include this quantification in the supplement, and move the data from the supplement to the main figure? The quantifications in the supplement are easier to follow and easier to compare between genotypes.

      Comments: We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We believe that showing only the average numbers will not reveal the changes in the distribution of the synaptic structures during development and across genotypes.

      Looking at the data itself, there seem to be some discrepancies between the synaptic counts and published work. While the CTBP numbers seem in order, the Maguk numbers do not. In both mutant and control there are many hair cells without any Maguk puncta/aggregates, leading to 0.75-1 postsynapses per hair cell (Figure 5 supplement H-I). Typically, the numbers should be more comparable to what was obtained for CTBP, 3-4 puncta per cell (Figure 5 supplement B-C), especially by 3-4 dpf. The figure of 3-4 CTBP or Maguk puncta per cell is based on previously published immunostaining and EM work.

      The Maguk immunostaining, especially at early stages (2-3 dpf) is challenging. To compound a challenging immunostain, around 2019 Neuromab began to outsource the purification of their Maguk antibody. After this outsourcing our lab was no longer able to get reliable label with the Maguk antibody from Neuromab.

      Millipore sells the same monoclonal antibody and it works well: https://www.emdmillipore.com/US/en/product/Anti-pan-MAGUK-Antibody-clone-K2886,MM_NF-MABN72

      I would recommend this source.

      Comments: Thank you for suggesting the new MAGUK antibody. We have utilized this new MAGUK antibody from Millipore and added a new segment in lines 389-408. In future publications we will utilize this antibody to capture the postsynaptic densities in the sensory arbors.

      The discrepancies in the postsynaptic punctae number in our control larvae may arise due to the reliability of the Neuromab MAGUK antibody. We have utilized this same antibody to stain the sema7a-/- mutants and have observed a significant decrease in MAGUK punctae number and area. On grounds of keeping parity between the control and the sema7a-/- mutants, we have decided to keep our experimental results in the manuscript.

      In addition to a more accurate Maguk label, a combined pre- and post-synaptic label is essential to understand whether synapses pair properly in the sema7a mutants. This can be accomplished using subtype specific antibodies using goat anti-mouse IgG1/Maguk and goat anti-mouse IgG2a/CTBP secondaries.

      Comments: Thank you for suggesting this. We are preparing another manuscript in which we will utilize this technique along with other suggestions to tease apart the role of distinct transcript variants in regulating neural guidance and synapse formation.

      (3) Does sema7a lesion impact the number of hair cells per neuromast? If hair cell numbers are reduced several of the quantifications could be impacted.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells, and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Could innervation just be developmentally delayed in sema7a mutants? At 4 dpf the sensory system is just starting to come online and could still be in the process of refinement. Did you look at slightly older ages, after the sensory system is functional behaviorally, for example, 6 dpf? Do the core phenotypes (synapse defects and excess arbors) persist at 6 dpf in the sema7a mutants?

      Comments: The homozygous sema7a-/- mutants are unviable and start to die at 6 dpf. We therefore restricted our analysis to stages up to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons extend long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). If the phenotypes in the sema7a-/- mutants were due to developmental delays, we should have seen a recovery of the disrupted arborization patterns over time. Instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but are a specific outcome of the loss of Sema7A protein.

      Minor comments to address:

      Results

      Page 4 lines 89-91. For the readers, explain why you examined levels of Sema7a in rostral and caudal hair cells. Also, this sentence is, in general, a little bit misleading: on first reading, it suggests that there is no difference in Sema7a at 1.5-4 dpf.

      Comments: In lines 44-48, we explain that the hair cells in the neuromast contain mechanoreceptive hair cells of opposing polarities that help them detect water currents from opposing directions. In lines 93-106, we tested whether the Sema7A level varies between the two polarities. We observed that the Sema7A level is similar between the two polarities of hair cells, but the average Sema7A intensity increases significantly over the developmental period of 2 dpf to 4 dpf in both rostrally and caudally polarized hair cells.

      Page 10-11 Lines 263-270. What was the frequency of these 2 outcomes- out of the 22 cases with ectopic expression?

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      Discussion

      Page 14 Lines 359-360. There is not enough evidence provided in this work to suggest that the membrane attached form of Sema7a is playing a role. Both the secreted and membrane form are gone in the sema7a mutants. If the membrane attached form was specifically lesioned, and resulted in a phenotype, then there would be sufficient evidence. Currently there is strong evidence for a distinct role for the secreted form. Although the authors qualify the outlined statement with the word 'likely', stating this possibility in the discussion take-home is misleading.

      Comments: In the future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us differentiate between the roles of the membrane-bound Sema7AGPI molecule and the secreted Sema7Asec in sensory arborization and synaptic assembly.

      It might be interesting in either the intro or discussion to reference the role Sema3F in axon guidance in the mouse auditory epithelium. https://elifesciences.org/articles/07830

      Comments: We have added this reference in lines 61-64.

      Figures

      Please indicate on one of your Figures where the mutation is (roughly) in the sema7a mutant (in addition to stating it in the results).

      Comments: We have added this information in Figure 2—figure supplement 1A.

      Either state or indicate in a Figure where the epitope used to make the Sema7a antibody is located, to show that the antibody is predicted to recognize both isoforms.

      Comments: We have stated the details of the epitope in lines 528-529.

      Figure 2-S1 what is the scale in panel A, is it different between mutant and wildtype?

      Comments: We have updated the images. New images are depicted in Figure 2—figure supplement 1A.

      Methods

      What were the methods used to quantify synapse number and area?

      Comments: We have added a new section in lines 702-708 to explain the measurement techniques.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in fundamental and clinical aspects of consciousness.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Luppi et al. apply the recently developed integrated information decomposition to the question of how the architecture of information processing changes when consciousness is lost. They explore fMRI data from two different populations: healthy volunteers undergoing reversible anesthesia, and patients who have long-term disorders of consciousness. They show that, in both populations, synergistic integration of information is disrupted in common ways. These results are interpreted in the context of the SAPHIRE model (recently proposed by this same group), which describes information processing in the brain as being composed of several distinct steps: 1) gatekeeping (where gateway regions introduce sensory information to the global synergistic workspace), where 2) it is integrated or "processed" before 3) being broadcast back to the brain.

      I think that this paper is an excellent addition to the literature on information theory in neuroscience, and consciousness science specifically. The writing is clear, the figures are informative, and the authors do a good job of engaging with existing literature. While I do have some questions about the interpretations of the various information-theoretic measures, all in all, I think this is a significant piece of science that I am glad to see added to the literature.

      One specific question I have is that I am still a little unsure about what "synergy" really is in this context. From the methods, it is defined as that part of the joint mutual information that is greater than the maximum marginal mutual information. While this is a perfectly fine mathematical measure, it is not clear to me what that means for a squishy organ like the brain. What should these results mean to a neuro-biologist or clinician?

      Right now the discussion is very high level, equating synergy to "information processing" or "integrated information", but it might be helpful for readers not steeped in multivariate information theory to have some kind of toy model that gets worked out in detail. On page 15, the logical XOR is presented in the context of the single-target PID, but 1) the XOR is discrete, while the data analyzed here are continuous BOLD signals w/ Gaussian assumptions and 2) the XOR gate is a single-target system, while the power of the Phi-ID approach is the multi-target generality. Is there a Gaussian analog of the single-target XOR gate that could be presented? Or some multi-target, Gaussian toy model with enough synergy to be interesting? I think this would go a long way to making this work more accessible to the kind of interdisciplinary readership that this kind of article will inevitably attract.

      We appreciate this observation. We now clarify that:

      “redundancy between two units occurs when their future spontaneous evolution is predicted equally well by the past of either unit. Synergy instead occurs when considering the two units together increases the mutual information between the units’ past and their future – suggesting that the future of each is shaped by its interactions with the other. At the microscale (e.g., for spiking neurons) this phenomenon has been suggested as reflecting “information modification” 36,40,47. Synergy can also be viewed as reflecting the joint contribution of parts of the system to the whole, that is not driven by common input48.”

      In the Methods, we have also added the following example to provide additional intuition about synergy in the case of continuous rather than discrete variables:

      “As another example for the case of Gaussian variables (as employed here), consider a 2-node coupled autoregressive process with two parameters: a noise correlation c and a coupling parameter a. As c increases, the system is flooded by “common noise”, making the system increasingly redundant because the common noise “swamps” the signal of each node. As a increases, each node has a stronger influence both on the other and on the system as a whole, and we expect synergy to increase. Therefore, synergy reflects the joint contribution of parts of the system to the whole that is not driven by common noise. This has been demonstrated through computational modelling (Mediano et al 2019 Entropy).”

      See below for the relevant parts of Figures 1 and 2 from Mediano et al (2019 Entropy), where Psi refers to the total synergy in the system.

      Author response image 1.
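
      The two-node Gaussian autoregressive example quoted above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the manuscript: the coupling matrix has every entry equal to a, the noise has unit variance with correlation c, and the decomposition is the simple single-target "minimum mutual information" scheme that matches the reviewer's description (synergy as the joint past-to-future mutual information in excess of the largest single-source term). It is not the full PhiID used in the paper, and the function names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def stationary_cov(A, Q, iters=500):
    """Iterate S <- A S A^T + Q; converges when the process is stable."""
    S = [row[:] for row in Q]
    At = transpose(A)
    for _ in range(iters):
        ASA = matmul(matmul(A, S), At)
        S = [[ASA[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    return S

def mmi_pid(a, c):
    """Single-target MMI decomposition of I(past; future), in nats.

    Assumed toy model: X_t = A X_{t-1} + e_t with A = [[a, a], [a, a]]
    and noise covariance [[1, c], [c, 1]]. Requires |a| < 0.5 and |c| < 1.
    """
    A = [[a, a], [a, a]]
    Q = [[1.0, c], [c, 1.0]]
    S = stationary_cov(A, Q)
    AS = matmul(A, S)  # Cov(X_t, X_{t-1})
    # Joint term: the future given the whole past has covariance Q.
    i_joint = 0.5 * math.log(det(S) / det(Q))
    marginals = []
    for i in range(2):
        col = [AS[0][i], AS[1][i]]  # Cov(X_t, past of node i)
        cond = [[S[r][s] - col[r] * col[s] / S[i][i] for s in range(2)]
                for r in range(2)]
        marginals.append(0.5 * math.log(det(S) / det(cond)))
    return {
        "joint": i_joint,
        "redundancy": min(marginals),
        "synergy": i_joint - max(marginals),
    }

if __name__ == "__main__":
    for a, c in [(0.2, 0.2), (0.4, 0.2), (0.3, 0.1), (0.3, 0.8)]:
        print(a, c, mmi_pid(a, c))
```

      Under these assumptions, raising a at fixed c increases the synergy term, and raising c at fixed a increases the redundancy term, which is the qualitative behaviour the quoted passage describes. Other redundancy functions or coupling structures could give different numbers.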

      Strengths

      The authors have a very strong collection of datasets with which to explore their topic of interest. By comparing fMRI scans from patients with disorders of consciousness, healthy resting state, and various stages of propofol anesthesia, the authors have a very robust sample of the various ways consciousness can be perturbed, or lost. Consequently, it is difficult to imagine that the observed effects are merely a quirk of some biophysical effect of propofol specifically, or a particular consequence of long-term brain injury, but do in fact reflect some global property related to consciousness. The data and analyses themselves are well-described, have been previously validated, and are generally strong. I have no reason to doubt the technical validity of the presented results.

      The discussion and interpretation of these results is also very nice, bringing together ideas from the two leading neurocognitive theories of consciousness (Global Workspace and Integrated Information Theory) in a way that feels natural. The SAPHIRE model seems plausible and amenable to future research. The authors discuss this in the paper, but I think that future work on less radical interventions (e.g. movie watching, cognitive tasks, etc) could be very helpful in refining the SAPHIRE approach.

      Finally, the analogy between the PID terms and the information provided by each eye redundantly, uniquely, and synergistically is superb. I will definitely be referencing this intuition pump in future discussions of multivariate information sharing.

      We are very grateful for these positive comments, and for the feedback on our eye metaphor.

      Weaknesses

      I have some concerns about the way "information processing" is used in this study. The data analyzed, fMRI BOLD data, are extremely coarse, both in spatial and temporal terms. I am not sure I am convinced that this is the natural scale at which to talk about information "processing" or "integration" in the brain. In contrast to measures like sample entropy or Lempel-Ziv complexity (which just describe the statistics of BOLD activity), synergy and Phi are presented here as quasi-causal measures: as if they "cause" or "represent" phenomenological consciousness. While the theoretical arguments linking integration to consciousness are compelling, is this the right data set to explore them in? For example, in the work by Newman, Beggs, and Sherrill (née Faber), synergy is associated with "computation" performed in individual neurons: the information about the future state of a target neuron that is only accessible when knowing both inputs (analogous to the synergy in computing the sum of two dice). Whether one thinks that this is a good approach to neural computation or not, it fits within the commonly accepted causal model of neural spiking activity: neurons receive inputs from multiple upstream neurons, integrate those inputs and change their firing behavior accordingly.

      In contrast, here, we are looking at BOLD data, which is a proxy measure for gross-scale regional neural activity, which itself is a coarse-graining of millions of individual neurons to a uni-dimensional spectrum that runs from "inactive to active." It feels as though a lot of inferences are being made from very coarse data.

      We appreciate the opportunity to clarify this point. It is not our intention to claim that Phi-R and synergy, as measured at the level of regional BOLD signals, represent a direct cause of consciousness, or are identical to it. Rather, our work is intended to use these measures similarly to the use of sample entropy and LZC for BOLD signals: as theoretically grounded macroscale indicators, whose empirical relationship to consciousness may reveal the relevant underlying phenomena. In other words, while our results do show that BOLD-derived Phi-R tracks the loss and recovery of consciousness, we do not claim that they are the cause of it: only that an empirical relationship exists, which is in line with what we might expect on theoretical grounds. We have now clarified this in the Limitations section of our revised manuscript, as well as revising our language accordingly in the rest of the manuscript.

      We also clarify that the meaning of “information processing” that we adopt pertains to “intrinsic” information that is present in the system’s spontaneous dynamics, rather than extrinsic information about a task:

      “Information decomposition can be applied to neural data from different scales, from electrophysiology to functional MRI, with or without reference to behaviour 34. When behavioural data are taken into account, information decomposition can shed light on the processing of “extrinsic” information, understood as the translation of sensory signals into behavioural choices across neurons or regions 41,43,45,47. However, information decomposition can also be applied to investigate the “intrinsic” information that is present in the brain’s spontaneous dynamics in the absence of any tasks, in the same vein as resting-state “functional connectivity” and methods from statistical causal inference such as Granger causality 49. In this context, information processing should be understood in terms of the dynamics of information: where and how information is stored, transferred, and modified 34.”

      References:

      (1) Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition. Entropy 24, 930 (2022).

      Reviewer #2 (Public Review):

      The authors analysed functional MRI recordings of brain activity at rest, using state-of-the-art methods that reveal the diverse ways in which information can be integrated in the brain. In this way, they found brain areas that act as (synergistic) gateways for the 'global workspace', where conscious access to information or cognition would occur, and brain areas that serve as (redundant) broadcasters from the global workspace to the rest of the brain. The results are compelling and consistent with the already assumed role of several networks and areas within the Global Neuronal Workspace framework. Thus, in a way, this work comes to stress the role of synergy and redundancy as complementary information processing modes, which fulfill different roles in the big context of information integration.

      In addition, to prove that the identified high-order interactions are relevant to the phenomenon of consciousness, the same analysis was performed in subjects under anesthesia or with disorders of consciousness (DOC), showing that indeed the loss of consciousness is associated with a deficient integration of information within the gateway regions.

      However, there is something confusing in the redundancy and synergy matrices shown in Figure 2. These are pair-wise matrices, where the PID was applied to identify high-order interactions between pairs of brain regions. I understand that synergy and redundancy are assessed in the way the brain areas integrate information in time, but it is still a little contradictory to speak about high-order interactions in pairs of areas. When talking about a "synergistic core", one expects that all or most of the areas belonging to that core are simultaneously involved in some (synergistic) information processing, and I do not see this being assessed with the currently presented methodology. Similarly, if redundancy is assessed only in pairs of areas, it may be due to simple correlations between them, so it is not a high-order interaction. Perhaps it is a matter of language, or about the expectations that the word 'synergy' evokes, so a clarification about this issue is needed. Moreover, as the rest of the work is based on these 'pair-wise' redundancy and synergy matrices, it becomes a significant issue.

      We are grateful for the opportunity to clarify this point. We should highlight that PhiID is in fact assessing four variables: the past of region X, the past of region Y, the future of region X, and the future of region Y. Since X and Y each feature both in the past and in the future, we can re-conceptualise the PhiID outputs as reflecting the temporal evolution of how X and Y jointly convey information: the persistent redundancy that we consider corresponds to information that is always present in both X and Y; whereas the persistent synergy is information that X and Y always convey synergistically. In contrast, information transfer would correspond to the phenomenon whereby information was conveyed by one variable in the past, and by the other in the future (see Luppi et al., 2024 TICS; and Mediano et al., 2021 arXiv for more thorough discussions on this point). We have now added this clarification in our Introduction and Results, as well as adding the new Figure 2 to clarify the meaning of PhiID terms.
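
      To make the distinction between redundant and synergistic information concrete, the following is a minimal toy sketch. It is not the authors' PhiID pipeline: it uses ordinary PID on two discrete sources and one target, and it assumes the minimum-mutual-information definition of redundancy purely for illustration. The names `mutual_information`, `mmi_pid`, `xor_gate` and `copy_gate` are hypothetical.

```python
from collections import defaultdict
from math import log2


def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits. `joint` maps outcome tuples to probabilities;
    a_idx and b_idx select which tuple positions form A and B."""
    p_a, p_b, p_ab = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        p_a[a] += p
        p_b[b] += p
        p_ab[(a, b)] += p
    return sum(p * log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_ab.items() if p > 0)


def mmi_pid(joint):
    """Decompose I(X,Y;Z) for outcomes ordered (x, y, z).
    Assumption: redundancy = min(I(X;Z), I(Y;Z))."""
    i_x = mutual_information(joint, (0,), (2,))
    i_y = mutual_information(joint, (1,), (2,))
    i_xy = mutual_information(joint, (0, 1), (2,))
    redundancy = min(i_x, i_y)
    return {
        "redundancy": redundancy,
        "unique_x": i_x - redundancy,
        "unique_y": i_y - redundancy,
        "synergy": i_xy - i_x - i_y + redundancy,
    }


# Z = X XOR Y: neither source alone says anything about Z, together they fix it.
xor_gate = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# X = Y = Z: both sources carry the same single bit about Z.
copy_gate = {(b, b, b): 0.5 for b in (0, 1)}

xor_terms = mmi_pid(xor_gate)
copy_terms = mmi_pid(copy_gate)
```

      In this toy setting the XOR system is purely synergistic and the copy system purely redundant, even though only a pair of sources is involved in each case.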

      We would also like to clarify that all the edges that we identify as significantly changing are indeed simultaneously involved in the difference between consciousness and unconsciousness. This is because the Network-Based Statistic differs from other ways of identifying edges that are significantly different between two groups or conditions, because it does not consider edges in isolation, but only as part of a single connected component.
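
      The idea that edges are evaluated as part of a connected component, rather than in isolation, can be sketched as follows. This is only the component-finding step under assumed inputs (a symmetric matrix of edge-wise test statistics and a primary threshold); the permutation testing that a full Network-Based Statistic analysis requires is omitted, and `suprathreshold_components` is a hypothetical name.

```python
def suprathreshold_components(stat, threshold):
    """Group edges whose statistic exceeds `threshold` into connected
    components, returned as lists of edges sorted by size (largest first)."""
    n = len(stat)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if stat[i][j] > threshold]
    for i, j in edges:
        parent[find(i)] = find(j)

    components = {}
    for i, j in edges:
        components.setdefault(find(i), []).append((i, j))
    return sorted(components.values(), key=len, reverse=True)


# Toy statistics for five regions (values are made up for illustration).
stat = [[0.0] * 5 for _ in range(5)]
for (i, j), t in {(0, 1): 4.0, (1, 2): 3.5, (3, 4): 5.0, (0, 3): 1.0}.items():
    stat[i][j] = stat[j][i] = t

components = suprathreshold_components(stat, threshold=3.0)
```

      Here the weak edge between regions 0 and 3 is discarded, leaving one component of two edges and one of a single edge; it is the size of such components, not any individual edge, that would then be compared against a permutation null.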

      Reviewer #3 (Public Review):

      The work proposes a model of neural information processing based on a 'synergistic global workspace,' which processes information in three principal steps: a gatekeeping step (information gathering), an information integration step, and finally, a broadcasting step. The authors determined the synergistic global workspace based on previous work and extended the role of its elements using 100 fMRI recordings of the resting state of healthy participants of the HCP. The authors then applied network analysis and two different measures of information integration to examine changes in reduced states of consciousness (such as anesthesia and after-coma disorders of consciousness). They provided an interpretation of the results in terms of the proposed model of brain information processing, which could be helpful to be implemented in other states of consciousness and related to perturbative approaches. Overall, I found the manuscript to be well-organized, and the results are interesting and could be informative for a broad range of literature, suggesting interesting new ideas for the field to explore. However, there are some points that the authors could clarify to strengthen the paper. Key points include:

      (1) The work strongly relies on the identification of the regions belonging to the synergistic global workspace, which was primarily proposed and computed in a previous paper by the authors. It would be great if this computation could be included in a more explicit way in this manuscript to make it self-contained. Maybe include some table or figure being explicit in the Gradient of redundancy-to-synergy relative importance results and procedure.

      We have now added the new Supplementary Figure 1 to clarify how the synergistic workspace is identified, as per Luppi et al (2022 Nature Neuroscience).

      (2) It would be beneficial if the authors could provide further explanation regarding the differences in the procedure for selecting the workspace and its role within the proposed architecture. For instance, why does one case use the strength of the nodes while the other case uses the participation coefficient? It would be interesting to explore what would happen if the workspace was defined directly using the participation coefficient instead of the strength. Additionally, what impact would it have on the procedure if a different selection of modules was used? For example, instead of using the RSN, other criteria, such as modularity algorithms, PCA, Hidden Markov Models, Variational Autoencoders, etc., could be considered. The main point of my question is that, probably, the RSN are quite redundant networks, whereas other methods, such as PCA, generate independent networks. It would be helpful if the authors could offer some comments on their intuition regarding these points without necessarily requiring additional computations.

      We appreciate the opportunity to clarify this point. Our rationale for the procedure used to identify the workspace is to find regions where synergy is especially prominent. This is due to the close mathematical relationship between synergistic information and integration of information (see also Luppi et al., 2024 TICS), which we view as the core function of the global workspace. This identification is based on the strength ranking, as per Luppi et al (2022 Nature Neuroscience), which demonstrated that regions where synergy predominates (i.e., our proposed workspace) are also involved with high-level cognitive functions and anatomically coincide with transmodal association cortices at the confluence of multiple information streams. This is what we should expect of a global workspace, which is why we use the strength of synergistic interactions to identify it, rather than the participation coefficient. Subsequently, to discern broadcasters from gateways within the synergistic workspace, we seek to encapsulate the meaning of a “broadcaster” in information terms. We argue that this corresponds with making the same information available to multiple modules. Sameness of information corresponds to redundancy, and multiplicity of modules can be reflected in the network-theoretic notion of participation coefficient. Thus, a broadcaster is a region in the synergistic workspace (i.e., a region with strong synergistic interactions) that in addition has a high participation coefficient for its redundant interactions.
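
      As an illustration of the network measure referred to above, here is a minimal sketch of the standard participation coefficient, P_i = 1 - sum over modules m of (k_im / k_i)^2, where k_im is the strength of node i's interactions with module m and k_i its total strength. The inputs are assumptions for the example: a small symmetric matrix standing in for redundant interactions, and arbitrary module labels standing in for resting-state networks.

```python
def participation_coefficient(weights, modules):
    """Return P_i = 1 - sum_m (k_im / k_i)^2 for every node i.

    weights: square symmetric list-of-lists of non-negative edge weights.
    modules: module label of each node (e.g. a resting-state network name).
    """
    coefficients = []
    for i, row in enumerate(weights):
        strength_by_module = {}
        for j, w in enumerate(row):
            if j != i:  # ignore self-connections
                label = modules[j]
                strength_by_module[label] = strength_by_module.get(label, 0.0) + w
        total = sum(strength_by_module.values())
        if total == 0:
            coefficients.append(0.0)  # isolated node: no participation
            continue
        coefficients.append(
            1.0 - sum((s / total) ** 2 for s in strength_by_module.values())
        )
    return coefficients


# Toy example: nodes 0 and 2 split their strength evenly across modules A and B,
# nodes 1 and 3 interact with a single module only.
weights = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
modules = ["A", "A", "B", "B"]
pc = participation_coefficient(weights, modules)
```

      A node whose interactions are spread across many modules scores close to 1, whereas a node confined to one module scores 0, which is why a high value on the redundancy network fits the notion of a broadcaster described above.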

      Pertaining specifically to the use of resting-state networks as modules, indeed our own (Luppi et al., 2022 Nature Neuroscience) and others’ research has shown that each RSN entertains primarily redundant interactions among its constituent regions. This is not surprising, since RSNs are functionally defined: their constituent elements need to process the same information (e.g., pertaining to a visual task in case of the visual network). We used the RSNs as our definition of modules, because they are widely understood to reflect the intrinsic organisation of brain activity into functional units; for example, Smith et al., (2009 PNAS) and Cole et al (2014 Neuron) both showed that RSNs reflect task-related co-activation of regions, whether directly quantified from fMRI in individuals performing multiple tasks, or inferred from meta-analysis of the neuroimaging literature. This is the aspect of a “module” that matters from the global workspace perspective: modules are units with distinct function, and RSNs capture this well. This is therefore why we use the RSNs as modules when defining the participation coefficient: they provide an a-priori division into units with functionally distinct roles.

      Nonetheless, we also note that RSN organisation is robustly recovered using many different methods, including seed-based correlation from specific regions-of-interest, or Independent Components Analysis, or community detection on the network of inter-regional correlations - demonstrating that they are not merely a function of the specific method used to identify them. In fact, we show significant correlation between participation coefficient defined in terms of RSNs, and in terms of modules identified in a purely data-driven manner from Louvain consensus clustering (Figure S4).

      (3) The authors acknowledged the potential relevance of perturbative approaches in terms of PCI and quantification of consciousness. It would be valuable if the authors could also discuss perturbative approaches in relation to inducing transitions between brain states. In other words, since the authors investigate disorders of consciousness where interventions could provide insights into treatment, as suggested by computational and experimental works, it would be interesting to explore the relationship between the synergistic workspace and its modifications from this perspective as well.

      We thank the Reviewer for bringing this up: we now cite several studies that in recent years have applied perturbative approaches to induce transitions between states of consciousness.

      “The PCI is used as a means of assessing the brain’s current state, but stimulation protocols can also be adopted to directly induce transitions between states of consciousness. In rodents, carbachol administration to frontal cortex awakens rats from sevoflurane anaesthesia120, and optogenetic stimulation was used to identify a role of central thalamus neurons in controlling transitions between states of responsiveness121,122. Additionally, several studies in non-human primates have now shown that electrical stimulation of the central thalamus can reliably induce awakening from anaesthesia, accompanied by the reversal of electrophysiological and fMRI markers of anaesthesia 123–128. Finally, in human patients suffering from disorders of consciousness, stimulation of intra-laminar central thalamic nuclei was reported to induce behavioural improvement 129, and ultrasonic stimulation 130,131 and deep-brain stimulation are among potential therapies being considered for DOC patients 132,133. It will be of considerable interest to determine whether our corrected measure of integrated information and topography of the synergistic workspace are also restored by these causal interventions.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would appreciate it if the authors could revisit the figures and make sure that:

      (1) All fonts are large enough to be readable for people with visual impairments (for ex. the ranges on the colorbars in Fig. 2 are unreadably small).

      Thank you: we have increased font sizes.

      (2) The colormaps are scaled to show meaningful differences (Fig. 2A)

      We have changed the color scale in Figure 2A and 2B.

      Also, the authors may want to revisit the references section: some of the papers that were pre-prints at one point have now been published and should be updated.

      Thank you: we have updated our references.

      Minor comments:

      • In Eqs. 2 and 3, the unique information term uses the bar notation ( | ) that is typically indicative of "conditioned on." Perhaps the authors could use a slash notation (e.g. Unq(X ; Z / Y)) to avoid this ambiguity? My understanding of the Unique information is that it is not necessarily "conditioned on", so much as it is "in the context of".

      Indeed, the “|” sign of “conditioning” could be misleading; however, the “/” sign could also be misleading, if interpreted as division. Therefore, we have opted for the “\” sign of “set difference”, in Eq 2 and 3, which is conceptually more appropriate in this context.

      • The font on the figures is a little bit small - for readers with poor eyes, it might be helpful to increase the wording size.

      We have increased font sizes in the figures where relevant.

      • I don't quite understand what is happening in Fig. 2A - perhaps it is a colormap issue, but it seems as though it's just a big white square? It looks like redundancy is broadly correlated with FC (just based on the look of the adjacency matrices), but I have no real sense of what the synergistic matrix looks like, other than "flat."

      We have now changed the color scale in Figure 2.

      Reviewer #2 (Recommendations For The Authors):

      Besides the issues mentioned in the Public review, I have the following suggestions to improve the manuscript:

      • At the end of the introduction, a few lines could be added explaining why the study of DOC patients and subjects under anesthesia will be informative in the context of this work.

      By comparing functional brain scans from transient anaesthetic-induced unconsciousness and from the persistent unconsciousness of DOC patients, which arises from brain injury, we can search for common brain changes associated with loss of consciousness – thereby disambiguating what is specific to loss of consciousness.

      • On page and in general the first part of Results, it is not evident that you are working with functional connectivity. Many times the word 'connection' is used and sometimes I was wondering whether they were structural or functional. Please clarify. Also, the meaning of 'synergistic connection' or 'redundant connection' could be explained in lay terms.

      Thank you for bringing this up. We have now replaced the word “connection” with “interaction” to disambiguate this issue, further adding “functional” where appropriate. We have also provided, in the Introduction, an intuitive explanation of what synergy and redundancy mean in the context of spontaneous fMRI signals.

      • Figure 2 needs a lot of improvement. The matrix of synergistic interactions looks completely yellow-ish with some vague areas of white. So everything is above 2. What does it mean? Pretty uninformative. The matrix of redundant connections looks mostly black, with some red here and there. So everything is below 0.6. Also, what are the meaning and units of the colorbars?

      We agree: we have increased font sizes, added labels, and changed the color scale in Figure 2. We hope that the new version of Figure 2 will be clearer.

      • Caption of Figure 2 mentions "... brain regions identified as belonging to the synergistic global workspace". It was not clear to me how you define these areas. Are they just the sum of gateways and broadcasters, or is there another criterion?

      Regions belonging to the synergistic workspace are indeed the set comprising gateways and broadcasters; they are the regions that are synergy-dominated, as defined in Luppi et al., 2022 Nature Neuroscience. We have now clarified this in the figure caption.

      • In the first lines of page 7, it is said that data from DOC and anesthesia was parcellated in 400 + 54 regions. However, it was said in a manner that made me think it was a different parcellation than the other data. Please make it clear that the parcellation is the same (if it is).

      We have now clarified that the 400 cortical regions are from the Schaefer atlas, and 54 subcortical regions from the Tian atlas, as for the other analysis. The only other parcellation that we use is the Schaefer-232, for the robustness analysis. This is also reported in the Methods.

      • Figure 3: the labels in the colorbars cannot be read, please make them bigger. Also, the colorbars and colorscales should be centered in white, to make it clear that red is positive and blue is negative. Or at least maintain consistency across the panels (I can't tell because of the small numbers).

      Thank you: we have increased font sizes, added labels, indicated that white refers to zero (so that red is always an increase, and blue is always a decrease), and changed the color scale in Figure 2.

      • The legend of Figure 4 is written in a different style, interpreting the figure rather than describing it. Please describe the figure in the caption, in order to let the reader know what they are looking at.

      We have endeavoured to rewrite the legend of Figure 4 in a style that is more consistent with the other figures.

      • In several parts the 'whole-minus-sum' phi measure is mentioned and it is said that it did not decrease during loss of consciousness. However, I did not see any figure about that nor any conspicuous reference to that in Results text. Where is it?

      We apologise for the confusion: this is Figure S3A, in the Supplementary. We have now clarified this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In the same direction, regarding Fig. 2, in my opinion, it does not effectively aid in understanding the selection of regions as more synergistic or redundant. In panels A) and B), the color scales could be improved to better distinguish regions in the matrices (panel A) is saturated at the upper limit, while panel B) is saturated at the lower limit). Additionally, I suggest indicating in the panels what is being measured with the color scales.

      Thank you: we have increased font sizes, added labels, and changed the color scale in Figure 2.

      (2) When investigating the synergistic core of human consciousness and interpreting the results of changes in information integration measures in terms of the proposed framework, did the authors consider the synergistic workspace computed in HCP data? If the answer is positive, it would be helpful for the authors to be more explicit about it and elaborate on any differences that may be found, as well as the potential impact on interpretation.

      This is correct: the synergistic workspace, including gateways and broadcasters, is identified from the Human Connectome Project dataset. We now clarify this in the manuscript.

      Minors:

      (1) I would suggest improving the readability of figures 2 and 3, considering font size (letters and numbers) and color bars (numbers and an indication of what is measured with this scale). In Figure 1, the caption defines steps instead of the stages that are indicated in the figure.

      Thank you: we have increased font sizes, added labels, and replaced steps with “stages” in Figure 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reply to comments:

      (1) It was not clear why the phylogenetic analysis included non-validated GPCRs that clustered with the validated peptidergic receptors. Would restricting the phylogenetic analyses only to confirmed peptidergic GPCRs alter the topology of the tree and subsequent conclusions of independent expansion?

      Thank you for this comment. In general, phylogenetic analyses become more robust if a larger diversity and fuller complement of sequences are included. With very sparse sampling, sequences that are homologous but not orthologous may be misleadingly grouped together, because intermediate sequences have been left out. For tree building, we thus did not want to focus only on experimentally validated receptors but also on all receptors that are phylogenetically related to the validated receptors. Only this approach can ensure a comprehensive exploration of the relationship of peptidergic receptors. The broader phylogenetic approach was also essential to identify orthologs to the experimentally validated Nematostella receptors across other cnidarian species.

      (2) Clearly, other neuropeptide signaling systems in cnidarians remain to be discovered but this paper represents a huge step forward.

      We appreciate this assessment of the paper. We agree that many systems remain to be discovered. Our paper will also help with the identification of further receptors both in Nematostella as well as other cnidarian species. Please note that we have made specific receptor-ligand predictions for several cnidarian species based on our phylogenetic analysis. Our phylogenies could also help prioritize the study of the remaining orphan Nematostella GPCRs.

      (3) There are limitations in what can be interpreted from single cell transcriptomic data but the data nevertheless provide the foundations for future studies involving i). detailed anatomical analysis of neuropeptide and neuropeptide receptor expression in N. vectensis using mRNA in situ hybridization and/or immunohistochemical methods and ii). functional analysis of the physiological/behavioral roles of neuropeptide signaling systems in N. vectensis

      We fully agree with this comment. The analysis of the available single-cell sequence resources clearly represents only the first step of anatomical and functional analyses. Our aim was to place the identified peptide-receptor interactions into a whole-organism context with cell type resolution, to highlight the potential complexity of peptidergic signaling in this organism and to facilitate the exploration and conceptualisation of our biochemical screen.

      Comments to authors

      (1) In future, when preparing manuscripts, please use page and line numbers; it makes the task below for reviewers much easier!

      We appreciate the suggestion and will do this for future manuscripts.

      (2) In the abstract the term "extensively wired" is used. In the context of neuropeptide mediated volume transmission this may not be an appropriate term to use because use of the word "wired" is likely to be associated with point-to-point type classical synaptic transmission; "extensively connected" would be better.

      Thank you for this comment. We have changed the text in the abstract to “extensively connected”.

      (3) Introduction: Please change "seven-transmembrane proteins and show a slower evolutionary rate than proneuropeptide..." to "seven-transmembrane proteins that show a slower evolutionary rate than proneuropeptide..."

      Changed.

      (4) Under the section "Creation of a Nematostella neuropeptide library", what is meant by "our regular expressions"? This needs to be rephrased to make it clearer what is meant.

      We have now rephrased the relevant sentence to make our approach clearer.

      “This predicted secretome was filtered with regular expressions to detect sequences with the repetitive dibasic cleavage sites (K and R in any combination) and amidation sites, using a custom script from a previous publication (Thiel et al., 2021).”

      and later:

      “Based on the MS data, we included the additional, non-dibasic N-terminal cleavage sites into our script that uses regular expressions to search for repetitive cleavage sites (Thiel et al., 2024) and re-screened the predicted secretome.”
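
      The kind of regular-expression filter described in the two quoted sentences can be sketched as below. This is not the script from Thiel et al.; the motif (a glycine acting as amidation donor followed by two basic residues), the minimum number of repeats, and the toy sequences are all assumptions made for illustration, and the function names are hypothetical.

```python
import re

# Assumed motif: glycine (amidation donor) followed by a dibasic
# cleavage site, i.e. K and R in any combination (KK, KR, RK, RR).
AMIDATED_DIBASIC_SITE = re.compile(r"G[KR]{2}")


def count_cleavage_sites(protein):
    """Count non-overlapping occurrences of the assumed motif."""
    return len(AMIDATED_DIBASIC_SITE.findall(protein))


def candidate_precursors(secretome, min_sites=3):
    """Return names of sequences carrying the motif at least `min_sites` times.

    secretome: dict mapping sequence name to amino-acid string.
    min_sites: assumed cut-off for calling the motif 'repetitive'.
    """
    return [name for name, seq in secretome.items()
            if count_cleavage_sites(seq) >= min_sites]


# Made-up sequences, for illustration only.
toy_secretome = {
    "toy_precursor": "MKLLAVSQGRFGKRAEPQGRFGRRSDQGRFGKKTT",
    "toy_enzyme": "MSTAVLGKDEEGRLLPAAK",
}
hits = candidate_precursors(toy_secretome)
```

      Extending the search to additional, non-dibasic cleavage sites, as the second quoted sentence describes, would amount to broadening the pattern; the specific sites used are not given here, so they are not reproduced in the sketch.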

      (5) Under the section "Creation of a Nematostella neuropeptide library" the phrase "differ in the length of their N-terminus" needs to be changed to "differ in the length of their N-terminal region". The N-terminus is, as its name implies, one end of the peptide/protein so it can't have a length as such.

      Changed.

      (6) Under the section "Analysis of metazoan class A GPCRs and selection of N. vectensis neuropeptide-receptor candidates",

      Change:

      "For a more detailed analysis, we then reduced our sampled species to the cnidarian, the bilaterian with experimentally confirmed GPCRs and Petromyzon marinus, and the two placozoan species (Figure 2B)."

      To

      "For a more detailed analysis, we then reduced our sampled species to cnidarians, bilaterians with experimentally confirmed GPCRs and Petromyzon marinus, and two placozoan species (Figure 2B)."

      Changed.

      (7) Under the section "Analysis of metazoan class A GPCRs and selection of N. vectensis neuropeptide-receptor candidates" - change "We re-run" to "We re-ran"

      Changed.

      (8) Throughout the paper reference is made to a variety of neuropeptides that have or are predicted to have an N-terminal pyroglutamate. However, these are referred to without indicating this post-translational modification e.g. QGRFamide.

      This should be corrected throughout the paper, in the text, and figures. Two abbreviations for pyroglutamate are used in the literature:

      pQ, which shows that the encoded amino acid is Q (Glutamine)

      pE, which shows that the post-translationally modified amino-acid is glutamate (E)

      In the neuropeptide field, pQ seems to be more widely used than pE, so our recommendation would be to use pQ.

      In the revised version we now write pyroQ whenever we refer to the actual peptide. We now only use the peptide name without indicating this modification when we refer to the precursor of these peptides.

      (9) The title for Figure 5 is rather short and vague. A title like "Tissue-specific expression of neuropeptide precursors and receptors in Nematostella" seems more appropriate

      We appreciate the reviewer's input, and we have made the change accordingly. The revised figure legend now reads: “Tissue-specific expression of neuropeptide precursors and receptors (GPCRs) in N. vectensis.”

      (10) All of the figures in the paper have been saved in bitmap format (e.g. tiff), which means that the resolution of the figures may end up being poor in the published article. All of the figures in this paper should be saved in vector format (e.g. eps) so that there is no loss of resolution when the size of the file/figure is reduced.

      We have now uploaded all figures in vector format (.eps or .pdf) to prevent any loss of resolution.

      (11) In Figure 3 - supplement 2 - the neuropeptides are referred to here as PRGamides and GPRGamides. Some consistency is needed here. And in Figure B, the G of one of the GPRGamides is not shown in black.

      Thank you for spotting this mistake. We now give the correct peptide sequence in parenthesis as "GPRGamide". We also highlighted the missing GPRGamide in the figure.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Chen et al. entitled, "The retina uncouples glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle", the authors look to provide insight on retinal metabolism and substrate utilization by using a murine explant model with various pharmacological treatments in conjunction with metabolomics. The authors conclude that photoreceptors, a specific cell within the explant, which also includes retinal pigment epithelium (RPE) and many other types of cells, are able to uncouple glycolytic and Krebs-cycle metabolism via three different pathways: 1) the mini-Krebs-cycle, fueled by glutamine and branched-chain amino acids; 2) the alanine-generating Cahill-cycle; and 3) the lactate-releasing Cori-cycle. While intriguing if determined to be true, these cell-specific conclusions are called into question due to the ex vivo experimental setup with the inclusion of RPE, the fact that the treatments were not cell-specific nor targeted at an enzyme specific to a certain cell within the retina, and no stable isotope tracing nor mitochondrial function assays were performed. Hence, without significant cell-specific methods and future experimentation, the primary claims are not supported.

      Strengths:

      This study attempts to improve on the issues that have limited the results obtained from previous ex vivo retinal explant studies by culturing in the presence of the RPE, which is a major player in the outer retinal metabolic microenvironment. Additionally, the study utilizes multiple pharmacologic methods to define retinal metabolism and substrate utilization.

      Weaknesses:

      A major weakness of this study is the lack of in vivo supporting data. Explant cultures remove the retina from its dual blood supply. Typically, retinal explant cultures are done without RPE. However, the authors included RPE in the majority of experimental conditions herein. However, it is unclear if the metabolomics samples included the RPE or not. The inclusion of the RPE, which is metabolically active and can be altered by the treatments investigated herein, further confounds the claims made regarding the neuroretina. Considering the pharmacologic treatments utilized with the explant cultures are not cell-specific and/or have significant off-target effects, it is difficult to ascertain that the metabolic changes are secondary to the effects on photoreceptors alone, which the authors claim. Additionally, the explants are taken at a very early age when photoreceptors are known to still be maturing. No mention or data is presented on how these metabolic changes are altered in retinal explants after photoreceptors have fully matured. Likewise, significant assumptions are made based on a single metabolomics experiment with no stable isotope tracing to support the pathways suggested. While the authors use immunofluorescence to support their claims at multiple points, demonstrating the presence of certain enzymes in the photoreceptors, many of these enzymes are present throughout the retina and likely the RPE. Finally, the claims presented here are in direct contradiction to recent in vivo studies that used cell-specific methods when examining retinal metabolism. No discussion of this difference in results is attempted.

      Response: We agree with the reviewer that in vivo studies could be very interesting indeed. However, technologically it will be extremely difficult to (repeatedly/continuously) sample the retina of an experimental animal and to combine this with an interventional study, with a subsequent metabolomic analysis. We do not currently have access to such technology nor are we aware of any other lab in the world capable of doing such studies. Moreover, virtually all prior studies on retinal metabolism have been done on explanted retina without RPE. This includes the seminal studies by Otto Warburg in the 1920s. As opposed to this, our retinal samples for all the metabolomic analyses also included the RPE, except for the no RPE condition that was used as a comparator for the earlier investigations.

      We note that our metabolomic analysis was done for all five experimental conditions where each condition included at least five independent samples (each derived from different animals).

      The reviewer is correct to say that our organotypic explant cultures are early post-natal, with explantation performed at post-natal day 9 and culturing until day 15. Since our retinal explant system has been validated extremely well over more than three decades of pertinent research (see for instance: Caffe et al., Curr Eye Res. 8:1083-92, 1989), we are confident that photoreceptors mature in vitro in ways that are very similar to the in vivo situation. As far as studies in adult retina (i.e. three months or older) are concerned, this is indeed an important question that will be addressed in future studies. Studies employing stable isotope labelling may also be very informative and are planned for the future, also in order to properly determine fluxes. This will likely require an extension to our NMR hardware with a 15N channel probe, something that we plan on implementing in the future.

      We are aware that a number of questions relating to retinal metabolism are controversial and that the use of other methodology or experimental systems may lead to alternative interpretations. We have now included citations of other studies that use, for example, conditional and/or inducible knock-outs or in vivo blood sampling (e.g. Wang et al., IOVS 38:48-55, 1997; Yu et al., Invest Ophthalmol Vis Sci. 46:4728-33, 2005; Swarup et al., Am J Physiol Cell Physiol. 316:C121-C133, 2019; Daniele et al., FASEB Journal 36:e22428, 2022) and discuss the pros and cons of such approaches (e.g. in Lines 376-384; 454-472).

      Reviewer #2 (Public Review):

      Summary:

      The authors aim to learn about retinal cell-specific metabolic pathways, which could substantially improve the way retinal diseases are understood and treated. They culture ex vivo mouse retinas for 6 days with 2 - 4 days of various drug treatments targeting different metabolic pathways or by removing the RPE/choroid tissue from the neural retina. They then look at photoreceptor survival, stain for various metabolic enzymes, and quantify a broad panel of metabolites. While this is an important question to address, the results are not sufficient to support the conclusions.

      Strengths:

      The questions the authors are exploring are extremely valuable and I commend the authors for working to learn more about retina metabolism. The different sensitivity of the cones to various drugs is interesting and may suggest key differences between rods and cones. The authors also provide a thoughtful discussion of various metabolic pathways in the context of previous publications.

      Weaknesses:

      As the authors point out, ex vivo culture models allow for control over multiple aspects of the environment (such as drug delivery) not available in vivo. Ex vivo cultures can provide good hints as to what pathways are available between interacting tissues. However, there are many limitations to ex vivo cultures, including shifting to a very artificial culture media condition that is extremely different from the native environment of the retina. It is well appreciated that cells have flexible metabolism and will adapt to the conditions provided. Therefore, observations of metabolic responses obtained under culture conditions need to be interpreted with caution: they indicate what the tissue is doing under those specific conditions (which include cells adapting and dying).

      Chen et al use pharmacological interventions to examine the impact of various metabolic pathways on photoreceptor survival and "long term" metabolic changes. The dose and timing of these drug treatments are not examined though. It is also hard to know how these drugs penetrate the tissue and it needs to be validated that the intended targets are being accurately hit. These relatively long-term treatments should be causing numerous downstream changes to metabolism, cell function, and survival, which makes looking at a snapshot of metabolite levels hard to interpret. It would be more valuable to look at multiple time points after drug treatment, especially early time points (closer to 1 hr). The authors use metabolite ratios to make conclusions about pathway activity. It would be more valuable to directly measure pathway activity by looking at metabolite production rates in the media and/or with metabolic tracers, again on time scales closer to minutes and hours instead of days.

      It is not clear from the text if the ex vivo samples with RPE/choroid intact are analyzed for metabolomics with the RPE/choroid still intact or if this is removed. If it is not removed, the comparison to the retina without RPE/choroid needs to be re-interpreted for the contribution of metabolites from the added tissue. The composition of the tissue is different and cannot be disentangled from the changes to the neural retina specifically.

      While the data is interesting and may give insights into some rod and cone-specific metabolic susceptibility, more work is needed to validate these conclusions. Given the limitations of the model the authors have over-interpreted their findings and the conclusions are not supported by the results. They need to either dramatically limit the scope of their conclusions or validate these hypotheses with additional models and tools.

      Response: We thank the reviewer for the insightful comments and agree that some of our interpretations may have been phrased too determinedly. We have therefore rephrased and toned down our conclusions in many instances in the text, and changed the manuscript title to now read “Retinal metabolism: Evidence for uncoupling of glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle”.

      Nevertheless, when considering the major known metabolic pathways and their possible impact on metabolite patterns after the experimental manipulations used here, we believe our interpretations to be consistent with the data obtained. Conversely, the previously suggested retinal aerobic glycolysis cannot explain most of the data we have obtained. Moreover, a predominant use of the classical “full” Krebs-cycle/OXPHOS would also not explain the metabolite patterns found (e.g. alanine, N-acetylaspartate (NAA)). While this does not in itself mean that our interpretations are all correct, they seem plausible in view of the data at hand and will hopefully stimulate further research on retinal energy metabolism using complementary technologies that were not available to us for the purpose of this study.

      We comment that our organotypic retinal explant cultures, while they do contain their very own, native RPE, do not comprise the choroidal vasculature (in our explantation procedure the RPE readily detaches from the choroid).

      As far as the drugs used on retinal explants are concerned, we note that:

      (1) all three compounds used are extremely well validated, with literally thousands of studies and decades of research to their credit (i.e., 1,9-dideoxyforskolin: >270 publications since 1984; Shikonin: >1000 publications since 1977; FCCP: >2800 publications since 1967),

      (2) all experimental conditions show clear and differential drug effects, as shown, for instance, by the principal component analysis in Figure 1I and the cluster analysis in Figure 2A,

      (3) the response patterns observed for key metabolites match the anticipated drug effects (e.g. decreased glucose consumption with 1,9-dideoxyforskolin; decreased lactate levels with Shikonin; lactate accumulation with FCCP).

      One can therefore be reasonably certain that these drugs did penetrate the explanted retina and that their respective drug targets were hit. Assessing dose-responses would certainly be interesting, however, the aim of this initial study was not pharmacodynamics but a general manipulation of energy metabolism. Moreover, given the extensive validation of these drugs, off-target effects seem not very likely at the concentrations used.

      We agree with the reviewer that using a longitudinal, time-series type of analysis could give additional insights. We note that each additional time-point will require retinae from 25 animals and a very resource-intensive and time-consuming metabolomic analysis, together with a significantly more complex multivariate analysis (metabolite, experimental condition, time). This is a completely new undertaking that is simply not feasible as an extension of the present study.

      To look at pathway activity in more direct ways is a very good idea. To this end, we aim to implement in the future an idea put forward by the reviewers, namely 13C-labeling and additionally 15N-labeling and tracing for specific metabolic fuels (e.g. glucose, lactate and anaplerotic amino acids such as glutamate and branched chain amino acids).

      The reviewer is of course correct to say that the culture condition is somewhat artificial and that this may have introduced changes in the metabolism. However, as noted above in the first response to reviewer #1, the organotypic retinal culture system, using a defined medium, free of serum and antibiotics, has been extremely well studied and validated for decades (cf. Caffé et al., Curr Eye Res. 8:1083-92, 1989). Importantly, this system allows retinal viability, histotypic organization, and function to be maintained over many weeks in culture. Moreover, most previous studies on retinal metabolism have also used explanted retina – acute or cultured – i.e. experimental approaches that are similar to what we have used and that may be liable to their own artefactual changes in metabolism. This includes the seminal, 1920s studies by Otto Warburg, or the 1980s studies by Barry Winkler, the results of which the reviewers do not seem to doubt.

      We further agree that studying retinal metabolism in a situation closer to in vivo conditions would be thrilling, however to our knowledge to date there is no retina model that fully mimics the complex interplay of the blood metabolome with metabolic tissue activity. This likely means that for each metabolic condition to study (e.g. hyperglycemia, cachexia, etc.), a fairly large number of animals will need to be sacrificed for the molecular investigation of ex vivo retinal biopsies, which would mean a tremendous animal burden.

      We hope the reviewer will appreciate that the revised manuscript now includes numerous improvements, along with new, additional datasets and figures, references to further relevant literature, and – as mentioned above – a more cautious phrasing of our interpretations and conclusions, including a more careful wording for the manuscript title.

      Reviewer #3 (Public Review):

      Summary:

      The neural retina is one of the most energetically active tissues in the body and research into retinal metabolism has a rich history. Prevailing dogma in the field is that the photoreceptors of the neural retina (rods and cones) are heavily reliant on glycolysis, and as oxygen tension at the level of photoreceptors is very low, these specialized sensory neurons carry out aerobic glycolysis, akin to the Warburg effect in cancer cells. It has been found that this unique metabolism changes in many retinal diseases, and targeting retinal metabolism may be a viable treatment strategy. The neural retina is composed of 11 different cell types, and many research groups over the past century have contributed to our current understanding of cell-specific metabolism of retinal cells. More recently, it has been shown in mouse models and co-culture of the mouse neural retina with human RPE cultures that photoreceptors are reliant on the underlying retinal pigment epithelium for supplying nutrients. Chen and colleagues add to this body of work by studying an ex vivo culture of the developing mouse retina that maintained contact with the retinal pigment epithelium. They exposed such ex vivo cultures to small molecule inhibitors of specific metabolic pathways, performing targeted metabolomics on the tissue and staining the tissue with key metabolic enzymes to lay the groundwork for what metabolic pathways may be active in particular cell types of the retina. The authors conclude that rod and cone photoreceptors are reliant on different metabolic pathways to maintain their cell viability - in particular, that rods rely on oxidative phosphorylation and cones rely on glycolysis. Further, their data support multiple mechanisms whereby glycolysis may occur simultaneously with anaplerosis to provide abundant energy to photoreceptors.

      The data from metabolomics revealed several novel findings in retinal metabolism, including the use of glutamine to fuel the mini-Krebs cycle, the utilization of the Cahill cycle in photoreceptors, and a taurine/hypotaurine shuttle between the underlying retinal pigment epithelium and photoreceptors to transfer reducing equivalents from the RPE to photoreceptors. In addition, this study provides robust quantitative metabolomics datasets that can be compared across experiments and groups. The use of this platform will allow for rapid testing of novel hypotheses regarding the metabolic ecosystem in the neural retina.

      Strengths:

      The data on differences in the susceptibility of rods and cones to mitochondrial dysfunction versus glycolysis provides novel hypothesis-generating conjectures that can be tested in animal models. The multiple mechanisms that allow anaplerosis and glycolysis to run side-by-side add significant novelty to the field of retinal metabolism, setting the stage for further testing of these hypotheses as well.

      Weaknesses:

      Almost all of the conclusions from the paper are preliminary, based on data showing enzymes necessary for a metabolic process are present and the metabolites for that process are also present. However, to truly prove whether these processes are happening, C13 labeling or knock-out or over-expression experiments are necessary. Further, while there is good data that RPE cultures in vitro strongly recapitulate RPE phenotypes in vivo, ex vivo neural retina cultures undergo rapid death. Thus, conclusions about metabolism from explants should either be well correlated with existing literature or lead to targeted in vivo studies. This paper currently lacks both.

      Response: As mentioned above in the first answers to reviewers #1 and #2, we think of our study as a starting point that may provide novel directions for a whole series of investigations into retinal energy metabolism. In particular, novel technologies may in the future make it possible to decipher the different metabolic phenotypes of the 100+ distinct retinal cell types by in situ spatial metabolomics and lipidomics. Currently, we still have to limit the scope of our studies to only certain aspects of this topic. We thus agree that some of our interpretations need to be formulated more carefully and we have done so in the revised version of our manuscript. We also agree with the reviewer that carbon (13C) labelling and tracing studies will be very informative and will engage in such studies in the future. Besides 13C, we aim to further employ 15N labelled substrates, which are especially suitable for studying the fate of amino acids.

      As far as our organotypic retinal explant system is concerned, it is arguably one of the best validated such systems available (see responses to reviewers #1 and #2). While the reviewer is correct to say that the neuroretina without RPE degenerates relatively quickly in vitro, in our system, with the neuroretina and its native RPE cultured together, we can routinely culture the retina for four weeks or more, without major cell loss (Söderpalm et al., IOVS 35:3910-21, 1994; Belhadj et al., JoVE 165, 2020). Thus, our retinal cultures with RPE do not undergo rapid death. Within the time-frame of the present study (6 days in vitro) culturing-induced cell death is minimal and unlikely to influence our analyses. For further, more detailed answers to the reviewers’ questions please see our detailed point-by-point response below.

      We agree with the reviewer that eventually in vivo studies will be important to confirm our interpretations. As mentioned in our initial response to reviewer #1, such studies will be very challenging and new technologies may need to be developed before in vivo investigations can deliver the answers to the questions at hand (see answer to question Rev#3.17 below), especially if the interplay between substrate availability from the blood metabolome and the retinal metabolic pathway activity is to be studied.

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      Rev#1.1. The animals should be screened for and lack rd8.

      Response: This is a pertinent question from the reviewer. Ever since we first became aware, around 2010, of the presence of rd8 mutations in certain mouse lines from major vendors (e.g. Charles River, Jackson Labs), we have set up regular screening of all our mouse lines for this Crb1 mutation. Accordingly, the mouse lines used in this study were confirmed to be free of the rd8 / Crb1 mutation. A corresponding remark has now been inserted into the SI materials and methods section (Lines 37-38).

      Rev#1.2. GLUT1 looks significantly different from in vivo to in vitro. Recommend co-staining with RHO and cone markers (PNA or CAR) to further delineate where it is being expressed. The in vitro cultures appear to have much shorter outer segments (OS). Considering OS biosynthesis is thought to drive a good deal of metabolic adaptations, how relevant is the in vitro model system to what is truly occurring in vivo?

      Response: The GLUT1 staining shown in Figure 1 displays the in vivo situation. Since this may not have been entirely clear from the previous figure legend, we have now labelled this as “in vivo retina” and distinguish it from “in vitro” samples in the legend to Figure 1 (Lines 774-778). As far as the comparison of GLUT1 staining in vivo (Figure 1A3) vs. in vitro (Figure S1C3) is concerned, in both situations a strong RPE labelling is clearly visible, with essentially no GLUT1 label within the neuroretina.

      Nevertheless, to better delineate the expression of GLUT1 in the outer retina, we have now performed an additional co-staining with rhodopsin (RHO) as rod marker and peanut agglutinin (PNA) as cone marker, as suggested by the reviewer (new supplemental Figure S1). In brief, this co-staining confirms the strong expression of GLUT1 in the RPE, while there is essentially no GLUT1 detectable in rod or cone photoreceptors.

      Retinal explants in long-term cultures do indeed have somewhat shorter outer segments compared to same age in vivo counterparts (Caffe et al., Curr Eye Res. 8:1083-1092, 1989). However, in the short-term cultures (6 DIV) and at the age studied here (P15) outer segments have only just started to grow out and are around 10 - 12 µm long, both in vitro and in vivo (cf. LaVail, JCB 58:650-661, 1978). Thus, the metabolism required for outer segment synthesis should be equivalent when in vitro and in vivo situations are compared. For considerations on outer segments in retinal explant cultures see also Rev#3.2 and Rev#3.29.

      Rev#1.3. Also, recent publications have shown that GLUT1 is expressed in the neuroretina including rods, cones, and Müller glia. Was GLUT1 not appreciated in these cells in your ex vivo samples and if so, why? Likewise, these same studies previously demonstrated GLUT1 resulted in rod degeneration but not cone. The results presented here differ significantly. Why the difference in results and is it secondary to the in vitro vs. in vivo setting? Furthermore, the authors state that they thought the no RPE situation would be similar to the GLUT1 inhibitor experimental condition but instead, they were vastly different. Is this secondary to the fact that GLUT1 is expressed outside the RPE?

      Response: We are aware that there is a controversy regarding GLUT1 expression in the neuroretina, please see also our response to question Rev#3.1 below. As far as our immunostaining for GLUT1 on in vivo retina is concerned, we find an unambiguous and very marked expression of GLUT1 in RPE cells, at both basal and apical sides. Compared to the RPE, the neuroretina appears devoid of GLUT1 staining. However, at very high gamma values a faint staining in the neuroretina becomes visible, a staining which from its appearance – processes spanning the entire width of the retina – is most compatible with Müller glia cells. Under normal circumstances we would have dismissed such a faint staining as background and false positive. Given the sometimes highly contradictory reports in the literature, we cannot fully exclude a weak expression of GLUT1 also in cells other than the RPE, with Müller glial cells perhaps being the most likely candidate. At any rate, GLUT1 expression in the neuroretina can only be much weaker than in the RPE, making its relevance for overall retinal metabolism unclear.

      As far as recent publications studying GLUT1 in the retina are concerned, we know of the study by Daniele et al. (FASEB Journal 36:e22428, 2022), which used a rod-specific, conditional knock-out of GLUT1 and found a relatively slow rod degeneration. We are not aware of a selective GLUT1 knock-out in cones, nor are we aware of conditional GLUT3 knock-outs in the retina. For further discussion of the Daniele et al. study please see Rev#3.13.

      The reviewer is right, initially we were thinking that, since GLUT1 was expressed only (predominantly) in RPE, the metabolic response to GLUT1 inhibition should look similar to the no RPE situation. However, this initial hypothesis did not consider a key fact: The RPE builds the blood retinal barrier and the tight-junction coupled RPE cells are a barrier to any larger molecule, including glucose. Removing the barrier by removing the RPE dramatically increases the availability of glucose to the retina, a phenomenon that is likely exacerbated by the expression of the high affinity/high capacity GLUT3 on photoreceptors (cf. Figure S1A). In other words, when the RPE is removed the outer retina is “flooded” with glucose and we believe that this is probably the main factor that explains why the metabolic response to GLUT1 inhibition (1,9-DDF group) is so different from the no RPE condition.

      We have now included an additional corresponding explanation in the discussion (Lines 422-429). Furthermore, we have added an entire new subchapter to the discussion to debate the expression of glucose transporters in the outer retina (Lines 454-472).

      Rev#1.4. Shikonin's mechanism of action via protein aggregation and lack of specificity for PKM2 vs PKM1 at 4 µM is an experimental limitation that needs to be taken into account. All treatments utilized are not cell-specific.

      Response: While the reviewer is correct to say that Shikonin may have multiple cellular targets and a diverse range of possible applications as an anti-inflammatory, antimicrobial, or anticancer agent (cf. Guo et al., Pharmacol. Res. 149:104463, 2019), numerous studies support its specificity for PKM2 over PKM1, at concentrations ranging from 1 – 10 µM (Chen et al., Oncogene 30:4297-306, 2011; Zhao et al., Sci. Rep. 8:14517, 2018; Traxler et al., Cell Metab. 34:1248-1263, 2022). We settled for 4 μM as an intermediate concentration, considering its effectiveness and specificity in previous studies. We have now inserted references detailing the specificity and concentration range of Shikonin into the SI Materials and Methods section (Line 62).

      The concern that “all treatments” are not cell-specific is debatable. Certainly, any given compound may have off-target effects, yet, since the compounds we used in our study have all been studied for decades (see above, initial response to Reviewer #2), their off-target profiles are well established and unlikely to play an important role here. Moreover, in our study the cell specificity does not come from the compounds used but from where their targets are expressed. As shown in Figure 1A and in Figure S1C, Shikonin's target PKM2 is almost exclusively expressed in photoreceptor inner segments. Hence, it seems very reasonable to expect that the vast majority of the metabolomic changes observed after Shikonin treatment are related to photoreceptors. We note that this assertion would still be true even if there was a low-level expression of PKM2 in other retinal cell types and/or if Shikonin had moderate off-target effects on other enzymes since the bulk of the effect on the quantitative metabolomic dataset would still originate from PKM2 inhibition in photoreceptors.

      Rev#1.5. What was the method of cone counting in Figure 1?

      Response: Cones were counted per 100 µm of retinal circumference based on an arrestin-3 staining (cone arrestin, CAR).

      This information is now included in the SI Materials and Methods section under “Microscopy, cell counting, and statistical analysis” (Lines 99-100).
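
      The normalization described above (cones per 100 µm of retinal circumference) can be sketched as follows. This is only a minimal illustration: the function name, the per-section averaging, and all numbers are hypothetical and are not taken from the study.

```python
def cones_per_100_um(cone_count: int, section_length_um: float) -> float:
    """Normalize a raw cone count to cones per 100 µm of retinal circumference."""
    if section_length_um <= 0:
        raise ValueError("section_length_um must be positive")
    return cone_count * 100.0 / section_length_um


# Hypothetical example values (count, measured length in µm); not data from the study.
sections = [(18, 450.0), (22, 500.0), (15, 400.0)]

# One density per section, then a simple mean across sections.
densities = [cones_per_100_um(count, length) for count, length in sections]
mean_density = sum(densities) / len(densities)

print(densities, mean_density)
```

      Normalizing to a fixed length makes sections of different sizes comparable; how sections are pooled per animal is an assumption here and would follow the SI Materials and Methods.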

      Rev#1.6. How do you know that FCCP is not altering RPE ox phos, disrupting the outer retinal microenvironment and leading to cell death, and therefore, the effects seen are not photoreceptor-specific but rather downstream from the initial insult in RPE?

      Response: We propose that FCCP will be acting on both photoreceptors and RPE cells (and all other retinal cell types) at essentially the same time, over the experimental time-frame. Thus, OXPHOS should be inhibited in all cells simultaneously. However, FCCP will primarily affect cells that actually use OXPHOS to a large extent, while cells relying on other metabolic pathways (e.g. glycolysis) will hardly be affected.

      We believe the very strong effect of FCCP, seen exclusively in rod photoreceptors, to be a direct drug effect. While we cannot fully exclude an indirect effect via the RPE – as proposed by the reviewer – we think this to be unlikely because:

      (1) RPE viability was not compromised by FCCP treatment.

      (2) If the reviewer's hypothesis were correct, then cone photoreceptors should also have been affected (e.g. because now the RPE consumes all glucose, leaving nothing for cones). However, cones were essentially unaffected by the FCCP treatment, making a dependence on RPE OXPHOS unlikely. Especially so, because blocking GLUT1 and glucose import on the RPE with 1,9-DDF had only relatively minor effects on rod photoreceptor viability but strongly affected cones. This indicates that the RPE is mainly shuttling glucose through to photoreceptors, especially to cones, and this function does not seem to be impaired by FCCP treatment.

      (3) We found that enzymes required for Krebs-cycle and OXPHOS activity (i.e. citrate synthase, fumarase, ATP synthase γ) are predominantly expressed in photoreceptors but virtually absent from RPE (Figure 3D, see also answer to following question).

      (4) The density of mitochondria (i.e. the target for FCCP) is far lower in RPE than in photoreceptors, as evidenced also by the COX staining shown in Figure 1A. Hence, photoreceptors are far more likely to be hit by FCCP treatment than RPE cells.

      To accommodate the reviewer's concern, we have now added a further comment into the discussion (Lines 440-442).

      Rev#1.7. While Figure 3D is interesting, it offers no significant insight into mechanisms as the enzyme levels are not being compared to control nor is mitochondrial fitness in these conditions being assessed, which would provide greater insight than just showing that these enzymes are present in the inner segments, which are known to be rich in mitochondria. Additionally, stating that the low ATP is secondary to decreased Krebs cycle activity and ox phos based on merely ATP levels is not supported by metabolite levels minus citrate nor ox phos enzyme levels or oxygen consumption. Also, citrate is purported to be decreased in the table in Figure 2 in the no RPE condition; however, Supplemental Figure 2 demonstrates this change is not significant, and then the same data is presented in Supplemental Figure 3, where it is statistically significant again. Why the difference in data and why is the same data being shown multiple times?

      Response: The immunostaining shown in Figure 3D shows the in vivo retina, or in other words the localization of enzymes in the native situation. Since this may not have been obvious in the previous manuscript version, we have added a corresponding comment to the legend of Figure 3 (Line 806). The localization of the Krebs-cycle/OXPHOS enzymes citrate synthase, fumarase, and ATP synthase mainly to photoreceptors, but not (or much less) to RPE, is another piece of evidence supporting the idea that OXPHOS is predominantly performed by photoreceptors (see also answer to previous question Rev#1.6).

      The decreased ATP levels (together with citrate, aspartate, NAA) shown in Figure 3 in the no RPE group, are an indication that photoreceptor Krebs-cycle activity may be decreased but not abolished in the absence of RPE. Importantly, GTP levels are not reduced in the no RPE group (Figure 2). Since large amounts of GTP can only be synthesized by either SUCLG-1 in the Krebs-cycle or by NDK-mediated exchange with ATP, the most plausible interpretation is that Krebs-cycle dependent ATP-synthesis was decreased in the no RPE situation, but that the (mini) Krebs-cycle or Cahill-cycle, notably the step from succinyl-CoA to succinate, was running. Since there is no RPE in this group, this strongly suggests important Krebs-cycle/OXPHOS activity in photoreceptors where the majority of the corresponding enzymes are located (see above).

      We thank the reviewer for pointing out that the information on group comparisons may not have been presented with sufficient clarity. In the figures mentioned by the reviewer the data is shown and compared in different contexts: the table in Figure 2B and the data in Figure S3 (now renumbered to Figure S5) refer to two-way comparisons of treatment condition to control, to elucidate individual treatment effects. Meanwhile Figure S2 (now supplementary Figure S3) refers to a 5-way comparison for a general overview that puts all five groups in context with each other. These differences in comparisons and normalization to the respective common standards entail the use of different statistical tools, resulting in different p-values. The statistical testing approaches and thresholds are now disclosed in the figure legends, and additionally in the SI Materials and Methods section (Lines 145-155).
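
      One generic reason why the same measurement can pass a significance threshold in a two-way comparison but not in a five-group overview is the number of comparisons being accounted for. The sketch below illustrates this with a Bonferroni adjustment; this is purely an assumption for illustration, since the tests and corrections actually used are specified only in the figure legends and SI Materials and Methods, and the raw p-value is hypothetical.

```python
from math import comb


def bonferroni(p_raw: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1.0."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_raw * n_tests)


n_groups = 5                         # control plus four experimental conditions
pairwise_tests = comb(n_groups, 2)   # all pairwise comparisons among five groups

p_raw = 0.02                         # hypothetical raw p-value, not from the study
p_two_way = bonferroni(p_raw, 1)                 # single treatment-vs-control test
p_five_way = bonferroni(p_raw, pairwise_tests)   # adjusted for all pairwise tests

alpha = 0.05
print(p_two_way < alpha, p_five_way < alpha)
```

      Under these assumptions the two-way test falls below the threshold while the adjusted five-way value does not, even though the underlying data are identical.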

      Rev#1.8. When were the ex vivo samples taken for metabolomics, and if taken when significant TUNEL staining and cell death have occurred, are the changes in metabolism due to cell death or a true indication of differential metabolism? Furthermore, it is unclear if the metabolomics samples included the RPE or not. Considering these treatments will affect most cells in the retina and the RPE, which is included in the ex vivo samples, it is difficult to ascertain that these changes are secondary to the effects on photoreceptors alone.

      Response: The samples for metabolomics included the RPE (except for the no RPE condition) and were taken at the same time as the tissues for histological preparations and TUNEL assays, i.e. they were all taken at post-natal day 15. This has now been clarified in the SI Materials and Methods section (Lines 108-110).

      We cannot entirely exclude an effect of ongoing cell death caused by the different drug treatments on the retinal metabolome. However, since in the experimental treatments cell death was still comparatively low (even in the FCCP condition, overall cell death was only around 10% of the total retina), and the metabolomic analysis considered the entire tissue, the impact of cell death per se on the total metabolome will be comparatively minor (≤ 10%, i.e. within the typical error margin of the metabolomic analysis).
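
      The bound stated above can be made explicit with a simple mixing model. As a sketch only, assume the whole-tissue level of a metabolite is a cell-fraction-weighted average of its level in surviving and dying cells; the function names and values below are hypothetical. Under this model, the bulk change stays within the dying fraction (here 10%) as long as the per-cell level in dying cells lies between zero and twice that of surviving cells, and it would exceed that bound otherwise.

```python
def bulk_level(fraction_dying: float, level_live: float, level_dying: float) -> float:
    """Whole-tissue metabolite level as a weighted average over cell fractions."""
    if not 0.0 <= fraction_dying <= 1.0:
        raise ValueError("fraction_dying must lie between 0 and 1")
    return (1.0 - fraction_dying) * level_live + fraction_dying * level_dying


def relative_change(fraction_dying: float, level_live: float, level_dying: float) -> float:
    """Relative change of the bulk level versus a tissue with no dying cells."""
    bulk = bulk_level(fraction_dying, level_live, level_dying)
    return (bulk - level_live) / level_live


f = 0.10  # about 10% of cells dying, as in the FCCP condition described above

# Hypothetical per-cell levels, in arbitrary units.
print(relative_change(f, 1.0, 0.0))  # dying cells contribute nothing
print(relative_change(f, 1.0, 0.5))  # dying cells hold half the normal level
print(relative_change(f, 1.0, 2.0))  # dying cells hold twice the normal level
```

      The relative change equals the dying fraction multiplied by the relative difference between dying and surviving cells, so a metabolite that accumulates more than twofold in dying cells would shift the bulk signal by more than 10%.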

      As mentioned above, the drug treatments should in principle affect all retinal cells at the same time. However, only cells that express the drug targets (i.e. 1,9-DDF targets GLUT1 in RPE cells, Shikonin targets PKM2 in photoreceptors; cf. Figure 1A) should react to the treatment. Even FCCP, in the paradigm employed, will only affect those cells that rely heavily on OXPHOS. Our data indicate that this is almost certainly the case for rods, whereas cones, RPE cells, and essentially all of the inner retina are not affected by FCCP treatment, strongly suggesting that OXPHOS is of minor importance for these cell populations.

      Rev#1.9. Why were the FCCP and no RPE groups compared? If they have similar metabolite patterns as noted in Figure 2, would that suggest that FCCP's greatest effect is on the ox phos of RPE and the metabolite patterns are secondary to alterations in RPE metabolism? Also, the increase in citrate and decrease in NAD may be related to effects on RPE mitochondrial metabolism when comparing these groups, and the disruption of RPE metabolism may then result in PARP staining of photoreceptors.

      Response: The reason for the pair-wise comparison of the no RPE and FCCP groups initially was indeed the similarity in metabolite patterns. This has now been rephrased accordingly in the results section “Photoreceptors use the Krebs-cycle to produce GTP” (Lines 218-219). The interpretation that the reviewer proposed here is interesting, but is not consistent with the data analysis of this and other group comparisons.

      Instead, the similarity between the metabolic patterns found in the no RPE and FCCP groups further supports the idea that a lack of RPE decreases retinal OXPHOS and increases glycolysis. This interpretation is based on the following observations:

      (1) Mitochondrial density in the RPE is far lower than in photoreceptors (see COX staining in Figure 1A), thus quantitatively the metabolite pattern caused by a disruption of OXPHOS (via FCCP treatment) will be dominated by metabolites generated by photoreceptors. For the same reason the depletion of retinal NAD+, and the concomitant increase in photoreceptor PAR accumulation after FCCP treatment, is unlikely to be due to changes in RPE.

      (2) Similarly, citrate synthase (CS) was found to be almost exclusively expressed in photoreceptor inner segments, with little expression in RPE (Figure 3D). Hence, the quantitative increase of citrate levels after FCCP treatment can only originate in photoreceptors.

      (3) The comparison of the control (with RPE) against the no RPE group suggested an increase in (aerobic) glycolysis in the absence of RPE, evidenced notably by a retinal accumulation of lactate, BCAAs, and glutamate (Figure 3A). The very same metabolite pattern is seen for the FCCP treatment (Figure 1B), indicating a marked upregulation of glycolysis (Figure 6C). The latter observation suggests that photoreceptors, after disruption of OXPHOS, switch to an exclusively glycolytic metabolism, which, however, rods cannot sustain (Figure 1C, D).

      (4) Glucose consumption and lactate release are increased in the no RPE group vs. control (new Supplementary Figure 4). A similar increase in glucose consumption and lactate production is seen in the FCCP group, suggesting that the no RPE situation also disrupts OXPHOS in photoreceptors.

      Rev#1.10. The conclusions being reached are difficult to interpret secondary to the experimental procedures and the fact that the treatments are not cell-specific and RPE is included with the neuroretina as well. Likewise, stating FCCP is altering the Krebs cycle in the neuroretina is difficult to believe as there are no changes in the Krebs cycle when compared to the control, which also has RPE.

      Response: We agree with the reviewer that some of the conclusions may have been somewhat speculative. Accordingly, we have toned down our conclusions in several instances in the text, notably in the abstract, introduction, and discussion.

      When it comes to Krebs cycle intermediates a key limitation of our study is indeed the lack of carbon-tracing and metabolic flux analysis as noted by the reviewers, a limitation that we now highlight more strongly in the discussion of the revised manuscript (Lines 545-549). While it is highly probable that the flux of Krebs cycle intermediates is altered by FCCP, our steady-state data does not show significant changes in the metabolites citrate, fumarate, and succinate. However, our study does show a highly significant decrease in GTP levels, which as explained above, is a key indicator of Krebs cycle activity/inactivity. Moreover, while GTP levels were reduced also in the no RPE group, GTP was still significantly higher in the no RPE group compared to the FCCP treatment. Our interpretation of this finding is that there is Krebs-cycle/OXPHOS activity in the neuroretina, which is abolished by FCCP.

      Rev#1.11. Supplemental Figure 4C and D states that GAC inhibition affected only photoreceptors, but GAC is expressed throughout the retina and so the inhibition is altering glutamine-glutamate homeostasis throughout the retina. Clearly, based on histology, one can see that the architecture of the retina, especially at the highest dose, is lost likely because all cells are being affected. So it is not photoreceptor-specific and even at low doses one can see that the inner retina is edematous. Moreover, with such a high amount of TUNEL staining in the ONL, are rods more affected than cones?

      Response: In our hands the immunostaining for Glutaminase C (GAC) labelled predominantly cone inner segments, the OPL, and perhaps bipolar cells (Figure S1A). The deleterious effects mentioned by the reviewer are only seen at the highest concentration of the GAC inhibitor compound 968. This concentration (10 µM) is 100-fold higher than the dose that produces a significant loss of cones in the outer retina (0.1 µM). We therefore think that this data points to the extraordinary reliance of cones on glutamine and glutamate. As can be seen from the images (Figure S4C) illustrating the effects of 0.1 and 1 µM Compound 968 treatment, the ONL thickness is not significantly reduced by the GAC inhibitor. This strongly indicates that at these doses the rods are not affected by GAC inhibition.

      Rev#1.12. The no RPE vs 1,9 DDF data may be interpreted as preventing glucose transport in the RPE increases BCAA catabolism by the RPE, which has been shown to utilize BCAA in culture systems. To this end, when the RPE is not present, the BCAA is increased as compared to the control with RPE.

      Response: Our original interpretation of this data was that after GLUT1 inhibition and a correspondingly reduced retinal glucose uptake, the retina switched to an increasing use of anaplerotic substrates, including BCAAs. This is supported by the concomitant upregulation of the Cahill-cycle product alanine and the mini-Krebs-cycle product N-acetylaspartate (NAA). Yet, we agree with the reviewer that BCAAs could also be consumed by the RPE. We have now changed our conclusion at the end of the results chapter “Reduced retinal glucose uptake promotes anaplerotic metabolism” to also highlight this possibility (Lines 261-262).

      Rev#1.13. It is unclear why so much effort is comparing the no RPE group to the treatment groups and not comparing the control group to the different treatment groups.

      Response: Previous studies – including the seminal studies of Otto Warburg from the early 1920s – had always used retina without RPE. This “no RPE” situation is therefore something of a reference for our entire study, which is why we dedicated more effort to its analysis. We have now inserted a corresponding remark into the manuscript (Lines 182-184).

      Rev#1.14. The conclusions are significantly overstated especially with regards to rods versus cones as these are not cell-specific treatments. For example, the control vs 1,9 DDF vs FCCP clearly shows that there is mitochondrial dysfunction due to decreased NAD, increased AMP/ATP ratio, decreased Asp but increased Gln, and a compensatory increase in lactate production.

      Response: We agree with the reviewer and have tried to phrase our statements in a more measured fashion. Notably, we have toned down our statements in the title, abstract, results, discussion, and several of the subchapter headings.

      Rev#1.15. While metabolic conclusions are drawn on serine/lactate ratio, this ratio is driven by the drastic changes in lactate and not so much serine in the treatment conditions as it was rather stable. Likewise, substrates beyond glucose have the potential to fuel the TCA cycle and make GTP via SUCLG1, such as fatty acids, other AAs, etc. Therefore, this ratio may not tell the entire story about anaplerotic metabolism. Furthermore, knowing that RPE utilize BCAAs to fuel their TCA cycle, the no RPE condition may simply have increased BCAAs due to lack of metabolism by the RPE, which drives the GTP/BCAA ratio. To state that the neuroretina was utilizing BCAAs for anaplerosis is not well supported based on the current data. Similarly, what is to say that the GTP/lactate ratio in the no RPE situation is not driven by the fact that the RPE is no longer present to act as acceptor of retinal lactate production or that more glucose is reaching the retina since the RPE is not present to accept and utilize that produced. Glucose uptake was not assessed to further address these issues.

      Response: We agree with the reviewer that metabolite ratios may not tell the full story underlying retinal metabolism. However, given the robustness of quantitative and highly reproducible NMR data, they are an important part of the metabolomics toolbox. The reviewer correctly observed that the changes in lactate levels are more dramatic than those in serine. Still, serine was also significantly increased in the no RPE, 1,9-DDF, and Shikonin groups. Together with the lactate changes (same or opposite direction) the resulting serine/lactate ratios display marked alterations.
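
      To make the point about which metabolite drives a ratio more concrete, the change in a ratio can be split on a log scale into a numerator part and a denominator part. The following is a minimal sketch only; the function name and the concentrations are hypothetical placeholders chosen for illustration and are not values from the study.

```python
import math


def log2_ratio_change(num_ctrl, num_treat, den_ctrl, den_treat):
    """Split the log2 change of a ratio (num/den) into two contributions.

    Returns (numerator_part, denominator_part, total), where
    total = log2((num_treat / den_treat) / (num_ctrl / den_ctrl)).
    """
    numerator_part = math.log2(num_treat / num_ctrl)
    # A rise in the denominator lowers the ratio, hence the minus sign.
    denominator_part = -math.log2(den_treat / den_ctrl)
    return numerator_part, denominator_part, numerator_part + denominator_part


# Hypothetical concentrations (arbitrary units), NOT measured data:
# serine rises modestly, lactate rises fourfold.
serine_part, lactate_part, total = log2_ratio_change(
    num_ctrl=1.0, num_treat=1.25, den_ctrl=10.0, den_treat=40.0
)
print(f"serine contribution:  {serine_part:+.3f}")
print(f"lactate contribution: {lactate_part:+.3f}")
print(f"total log2 change of serine/lactate: {total:+.3f}")
```

      In such a decomposition a significant but modest serine change and a large lactate change both appear explicitly, so the ratio can be reported together with the share each metabolite contributes.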

      When it comes to the supply of other potential energy substrates mentioned by the reviewer, i.e. fatty acids or amino acids other than BCAAs, these are only supplied in minimal amounts in the defined, serum free R16 medium (Romijn, Biology of the Cell, 63, 263-268, 1988) and – if used to any important extent – would be rapidly depleted by the retina. Thus, for a culture period of 2 days in vitro between medium changes these energy sources are not available and thus cannot be used by the retina.

      Our conclusion that the retina is using anaplerosis is based not only on the observations made in the no RPE group but also on, for instance, the metabolite ratios seen in the 1,9-DDF treatment group. In this group decreased glycolytic activity may correspond to increased serine synthesis and anaplerosis.

      As far as glucose uptake is concerned, we have analysed the medium samples at P15 (equivalent to the retina tissue collection time point) and now present data that addresses this question more directly via the consumption of glucose from and release of lactate to the culture medium (New Supplementary Figure 4C, D). This new dataset provides another independent observation showing that:

      (1) Glucose consumption/lactate release (i.e. aerobic glycolysis) is high in the no RPE situation but low in the control situation. In other words, retinal aerobic glycolysis is most likely stimulated by the absence of RPE.

      (2) 1,9-DDF treatment decreases glucose consumption/lactate release as would be expected from a GLUT1 blocker. Since ATP and GTP production are high nonetheless, this indicates that other substrates (i.e. anaplerosis) were used for retinal energy production, in agreement with the analysis shown in Figure 6C.

      (3) The FCCP treatment, which disrupts oxidative ATP-production, increases glucose consumption/lactate release in a way similar to the no RPE situation. Yet, the no RPE retina can still generate sizeable amounts of GTP but not ATP. Together, this provides further evidence that neuroretinal OXPHOS is decreased in the absence of RPE.

      Rev#1.16. The evidence for the mini-Krebs cycle is intriguing but weak considering it is based on certain enzymes being expressed in the photoreceptors, which had already been shown to be present in other publications, and a single ratio of metabolites that is increased in FCCP. One would expect this ratio to be increased under FCCP regardless. There is no stable isotope tracing with certain fuels to confirm the existence of the mini-Krebs cycle.

      Response: We thank the reviewer for this suggestion. We agree that our evidence for the mini-Krebs-cycle (and the Cahill-cycle) may be to some extent circumstantial and additional technologies would help to obtain further supportive data. Still, here we would like to invite the reviewer to a thought experiment where he/she could try and interpret our data without considering the Cahill- or the mini-Krebs-cycle. At least we ourselves, when we engaged into such thought experiments, were unable to explain the data observed without these alternative energy-producing cycles. Most notably, we were unable to explain the strong accumulation of either alanine or N-acetyl-aspartate (NAA) when only considering glycolysis and (full) Krebs-cycle metabolism. Of course, this may still be considered “weak” evidence, and we expect that future studies including complementary technologies will either confirm or expand our interpretation of the existing data set.

      The suggestion to perform stable isotope-labelled tracing with potential alternative fuels (e.g. glutamate, glutamine, pyruvate, etc.) is very attractive indeed. While such studies are likely to shed further light on the metabolic pathways proposed, this will entail very extensive experimental work, with multiple different conditions and concentrations and a variety of analysis methods (e.g. a 1.7 mm NMR probe equipped with a 15N channel), which is currently not feasible as an extension of the present manuscript. Nevertheless, we will certainly consider this approach for future follow-up studies once such techniques are available and will screen for suitable collaboration partners. A corresponding comment on such future possibilities has now been inserted into the discussion (Lines 545-549).

      Rev#1.17. The discussion does not mention how this data contradicts a recent in vivo study looking at Glut1 knockout in the retina (Daniele et al. FASEB. 2022) or previous in vivo studies that suggest cones may be less sensitive to changes in glucose levels (Swarup et al. 2019). This is a key oversight.

      Response: We thank the reviewer for pointing this out. We have now included these studies in the revised discussion in a new subchapter on the expression of glucose transporters in the outer retina (Lines 454-472). For a critical review of the Daniele et al., 2022 study please also see our more detailed response to question Rev#3.13 below.

      Rev#1.18. GAC is expressed in more than just cones so making cell-specific statements regarding fuel utilization is not well supported.

      Response: Our immunostaining for GAC revealed a strong expression in cone inner segments (Figure S1A3). While this does not exclude (relatively minor) expression in other retinal cell types, cones are likely to be more reliant on GAC activity than other cell types. See also answer above.

      Rev#1.19. Suggesting that rods utilize the mini-Krebs cycle based on AAT2 being seen in the inner segments without at least co-staining for RHO or PNA is weak evidence for such a cycle. AAT looks to be expressed in the inner segments of all photoreceptors.

      Response: We have taken up this suggestion from the reviewer and now provide an additional co-staining for AAT1 and AAT2 with rhodopsin. Note that in response to a pertinent comment from Reviewer #3 we have changed the abbreviation for aspartate aminotransferase from “AAT” to the more commonly used “AST” throughout the manuscript.

      New images showing a co-staining for AST1 and AST2 with rhodopsin now replace the former image set in Figure 7D. In brief, the new images show the expression of both AST1 and AST2 across the retina, with, notably an expression in the inner segments of photoreceptors but not in the outer segments, where rhodopsin is expressed.

      Reviewer #3 (Recommendations For The Authors):

      Rev#3.1. The staining for the glucose transporters GLUT1 and GLUT3 does not reflect what has previously been published by two different groups that were validated by cell-specific knockout mice. As mentioned by the author GLUT1 and GLUT3 have differences in transport kinetics, which would affect their metabolism. Therefore, the lack of GLUT1 in photoreceptors would suggest that photoreceptor metabolism is not faithfully replicated in this system. This difference from the previous literature should be discussed in the discussion.

      Response: As the reviewer pointed out, the expression of GLUT1 in the retina is somewhat controversial, with much older literature showing expression on the RPE, while some more recent studies claim GLUT1 expression in photoreceptors. For a brief discussion of our GLUT1 immunostaining please see also our answer to question Rev#1.3 above.

      Although the retinal expression of GLUT1 was outside the focus of our study, we feel we must address this point in more detail: In the brain the generally accepted setup for GLUT1 and GLUT3 expression is that low-affinity GLUT1 (Km = 6.9 mM) is expressed on glial cells, which contact blood vessels, while high-affinity GLUT3 (Km = 1.8 mM) is expressed on neurons (Burant & Bell, Biochemistry 31:10414-20, 1992; Koepsell, Pflügers Archiv 472, 1299–1343, 2020). This setup matches decreasing glucose concentration with increasing transporter affinity, for an efficient transport of glucose from blood vessels, to glial cells, to neurons. In the retina, the cells that contact the choroidal blood vessels are the tight-junction-coupled RPE cells. As shown by us and many others, RPE cells strongly express GLUT1 (cf. Figure 1A-3.). To warrant an efficient glucose transport from the RPE to photoreceptors, photoreceptors must express a glucose transporter with higher glucose affinity than GLUT1. We show that this is indeed the case with photoreceptors expressing GLUT3 (cf. Supplemental Figure 1-5.). While a partly teleological explanation does not per se prove that our data is correct, at least our data is plausible. In contrast, the glucose transporter setup sometimes claimed in the literature is biochemically implausible, i.e. it would require the flow of metabolites (glucose) to go against a gradient of transporter affinities, and we are not aware of an example of such a setup occurring anywhere in nature.

      However, at this point we cannot exclude low levels of GLUT1 expression on Müller glia cells or even photoreceptors. This expression could, for instance, be relevant in cases where cells were shuttling excess glucose – perhaps produced through gluconeogenesis – onwards to other retinal cells. Still, GLUT1 expression can only be minor when compared to RPE since a major expression would destroy the glucose affinity gradient (see above) required for efficient glucose shuttling into the energy hungry photoreceptors.
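
      The affinity argument above can be illustrated with a short calculation. This is a minimal sketch, assuming simple Michaelis-Menten kinetics and using only the Km values quoted above (6.9 mM for GLUT1, 1.8 mM for GLUT3); the glucose concentrations in the loop are hypothetical and were not measured in this study, and real carrier kinetics (transporter density, Vmax, asymmetry) are not modelled.

```python
# Km values as quoted in the response text (mM).
KM_GLUT1_MM = 6.9
KM_GLUT3_MM = 1.8


def fractional_saturation(glucose_mm, km_mm):
    """Fraction of maximal transport rate (v / Vmax) under Michaelis-Menten kinetics."""
    if glucose_mm < 0 or km_mm <= 0:
        raise ValueError("glucose must be >= 0 and Km must be > 0")
    return glucose_mm / (km_mm + glucose_mm)


# Hypothetical glucose concentrations (mM), chosen only for illustration.
for glucose in (0.5, 1.8, 5.0):
    glut1 = fractional_saturation(glucose, KM_GLUT1_MM)
    glut3 = fractional_saturation(glucose, KM_GLUT3_MM)
    print(f"{glucose:4.1f} mM glucose: GLUT1 {glut1:.2f} of Vmax, GLUT3 {glut3:.2f} of Vmax")
```

      Under these assumptions the lower-Km transporter always operates closer to its maximal rate at a given glucose concentration, and the difference is proportionally largest at low concentrations, i.e. in the situation expected further away from the blood supply.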

      To address this request by the reviewer (and also reviewer #1) we now discuss the question of glucose transporter expression in the outer retina in a new subchapter of the discussion (Lines 454-472).

      Rev#3.2. Photoreceptor metabolism and aerobic glycolysis are tied to photoreceptor function, as demonstrated by Dr. Barry Winkler. The authors should provide data or mention (if previously published) about photoreceptor OS growth and function in this system.

      Response: The studies of Barry Winkler (e.g. Winkler, J Gen Physiol. 77, 667-692, 1981) confirmed the original work of Otto Warburg and expanded on the idea that the neuroretina was using aerobic glycolysis. Importantly, Winkler used an experimental setup very similar to the one Warburg had used, namely explanted rat retina without RPE. In light of our data where we compare metabolism of mouse retina with and without RPE – where retina cultured without RPE confirms the data of Warburg and Winkler – it appears most likely that the purported aerobic glycolysis occurs mostly in the absence of RPE but only to a lower extent in the native retina.

      Photoreceptor outer segment outgrowth is somewhat slower in the organotypic retinal explant cultures compared to the in vivo situation (cf. Caffe et al., Curr Eye Res. 8:1083-1092, 1989 with LaVail, JCB 58:650-661, 1978; see also answer to reviewer #1). Importantly, organotypic retinal explant cultures and their photoreceptors are fully functional and remain so for extended periods in culture (Haq et al., Bioengineering 10:725, 2023; Tolone et al., IJMS 24:15277, 2023). This information has now been added to the manuscript discussion section, into the new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395).

      Rev#3.3. It is unclear from the description of the experiment in both the results and methods if 1,9DDF, Shikonin, and FCCP were added to both apical and basal media compartments or one or the other and should be specified. The details of what was on the apical compartment would be helpful, as the model is supposed to allow for only nutrients from the basal compartment (as indicated by the authors themselves). Is the apical compartment just exposed to air? How does this affect survival?

      Response: In organotypic retinal explant cultures the RPE rests on the permeable culturing membrane such that the basal side is in contact with the membrane and the medium below (for a schematic drawing see Figure S1B), while the apical side is covered by a thin film of medium created by the surface tension of water (Caffe et al., Curr Eye Res. 1989; Belhadj et al., JoVE, 2020). This thin liquid film ensures sufficient oxygenation and is an important factor that allows the retinal explant to remain viable for several weeks in culture. If the retinal cultures were submerged by the medium, their viability – especially that of the photoreceptors – would drop dramatically and would typically be below 3-5 days. Therefore, in the retinal organotypic explant cultures used here, the nutrients and the drugs applied do indeed reach the outer retina from the basal side, i.e. similar to how they would in vivo.

      To address this question from the reviewer, corresponding clarifications have been inserted into the SI Materials and Methods section (Lines 64-66).

      Rev#3.4. As the metabolomic data obtained was quantitative, several metabolites discussed should be analyzed in terms of ratios, for example, Glutathione and glutathione disulfide should be reported as a ratio. In addition as ATP, ADP, and AMP were measured, they can be used to calculate the energy charge of the tissue.

      Response: We thank the reviewer for these suggestions and have created corresponding graphs for GSH / GSSG ratio and energy charge. These new graphs have now been added to the SI datasets, to the new Supplementary Figure 4. To accommodate other requests from the Reviewers, this new Figure also contains additional new datasets on glucose and lactate concentrations (see further comments above and below). Please note that all later SI Figures have been renumbered accordingly.

      In brief, the ratios for GSH/GSSG show no significant changes between control and the different experimental groups. Meanwhile, the adenylate energy charge of the retinal tissues shows a significant decrease for the Shikonin group and the FCCP group. Note that in the new Supplementary Figure 4A, the dotted lines indicate the energy charge window typical for most healthy cells (0.7 – 0.95).
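
      For readers unfamiliar with the measure, the adenylate energy charge is commonly computed with Atkinson's formula, (ATP + 0.5 × ADP) / (ATP + ADP + AMP). The short sketch below assumes this standard definition; the function names and the example concentrations are hypothetical and are not data from the study, while the 0.7 – 0.95 window is the one indicated in the new Supplementary Figure 4A.

```python
def adenylate_energy_charge(atp, adp, amp):
    """Atkinson's adenylate energy charge; all inputs in the same concentration unit."""
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("total adenylate pool must be positive")
    return (atp + 0.5 * adp) / total


def in_typical_window(charge, low=0.7, high=0.95):
    """True if the charge lies inside the window quoted for most healthy cells."""
    return low <= charge <= high


# Hypothetical concentrations (arbitrary units), NOT measured values.
example = adenylate_energy_charge(atp=8.0, adp=1.5, amp=0.5)
print(f"energy charge = {example:.3f}, within window: {in_typical_window(example)}")
```

      The value ranges from 0 (all adenylate present as AMP) to 1 (all present as ATP), which is why a drop in the Shikonin and FCCP groups reads directly as a loss of energy status.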

      Rev#3.5. I think a missed opportunity when discussing the possible taurine/hypotaurine shuttle would be the impact on the osmosis of the subretinal space as taurine has been hypothesized as a major osmolyte.

      Response: This is another interesting recommendation from the reviewer. To address this point, we have now introduced a corresponding paragraph and references in the discussion of the manuscript (Lines 503-504; 512-514).

      Rev#3.6. In Figure 3, the distribution of these enzymes should also be studied under the no RPE condition as the culture treatment took several days for these metabolic changes to occur.

      Response: The images shown in Figure 3D are from the in vivo retina. Since this may not have been very clear in the previous manuscript version, we have now added a corresponding explanation to the legend of Figure 3. As far as we can tell, the expression and localization of neuroretinal enzymes does not change in cultured retina, during the culture period (compare Figure 1A with Supplementary Figure S1C). However, when it comes to the metabolite taurine its production (localization) changes dramatically in the no RPE situation where taurine is essentially undetectable by immunostaining (not shown but see metabolite data in Figure 2A, Figure 3A).

      Rev#3.7. In Figures 4 and 5, it is unclear why the experimental groups were not compared to the control and requires further explanation. Furthermore, the authors should justify the concentrations of drugs used as the cell death could have risen from toxicity to the drugs and not due to disruption of metabolism.

      Response: The reviewer is right, the rationale for these comparisons may not have been laid out with sufficient clarity. In Figure 4 the no RPE and FCCP groups are compared because both groups showed similar metabolite changes towards the control situation. The no RPE to FCCP comparison thus focussed on the details of the – at first seemingly minor – differences between these two groups. This has now been clarified in the corresponding part of the results (Lines 218-219).

      In Figure 5A, B we compare the no RPE and 1,9-DDF groups with each other, notably because the data obtained seemingly contradicted our initial expectation that these two groups should show similar metabolite patterns. Also here, we have now inserted an additional explanation for this choice of comparisons (Lines 252-253).

      In Figure 5C, D we compare the Shikonin and FCCP groups with each other. The idea behind this comparison was that in the first group glycolysis was blocked while in the second group OXPHOS was inhibited; in other words, we compared what happened when the two opposing ends of energy metabolism were manipulated in opposite directions. This reasoning is now given in the results section (Lines 265-268).

      As far as the choice of drugs and concentrations is concerned, we used only compounds that have been extremely well validated through up to five decades of scientific research (see initial response to Reviewer #2 above). We therefore are confident that at the concentrations employed the results obtained stem from drug effects on metabolism and not from generic, off-target toxicity. Then again, as we show, prolonged (i.e. 4 days) block of energy metabolism pathways does cause cell death.

      Rev#3.8. In line 203, the authors discuss GTP as being primarily a mitochondrial metabolite, however, photoreceptors would require a localized source of GTP synthesis in the outer segments as part of phototransduction, and therefore GTP in photoreceptors cannot be a mitochondrial-specific reaction in photoreceptors. Furthermore, the authors mentioned NDK as being a possible source of GTP, but they do not show NDK localization despite it being reported in the literature to be localized in the OS.

      Response: The question as to the source of GTP in photoreceptor outer segments is indeed highly relevant. For GTP production in mitochondria see the answer to the next question below (Rev#3.9). An early study showed nucleoside-diphosphate kinases (NDK) to be expressed on the rod outer segments of bovine retina (Abdulaev et al., Biochemistry 37:13958-13967, 1998). More recently NDK-A was shown to be strongly expressed in photoreceptor inner segments (Rueda et al., Molecular Vision 22:847-885, 2016). We now refer to both studies in the results section of the manuscript (Lines 227-228).

      Rev#3.9. In the "Impact on glycolytic activity, serine synthesis pathway, and anaplerotic metabolism" section, the authors claim in the no RPE group glycolytic activity was higher due to a depressed GTP-to-lactate ratio. However, this reviewer is under the impression that GTP production in photoreceptors is not mitochondrial specific, so this ratio doesn't make sense (I could be mistaken, however). A better ratio would have been pyruvate/lactate or glucose/lactate when discussing increased glucose consumption.

      Response: We appreciate the reviewer’s comment, yet we do indeed believe we can show that GTP-production in our experimental context is mainly mitochondrial. As explained in the manuscript results section (“Photoreceptors use the Krebs-cycle to produce GTP”), there are essentially only two possibilities for a photoreceptor to produce sizeable amounts of GTP: in the mitochondria via SUCLG1 – i.e. an enzyme highly expressed in photoreceptor inner segments (Figure 5D) – and in the cytoplasm via NDK from excess ATP. The claim about the depressed GTP-to-lactate ratio in the no RPE situation takes this into account. Importantly, since in the no RPE situation ATP-levels are significantly lower than GTP, here GTP can only be produced via SUCLG1 and OXPHOS. Moreover, this contrasts with the FCCP group where mitochondrial OXPHOS is disrupted and both ATP and GTP are depleted.

      As far as ratios with pyruvate and glucose are concerned, we agree that these could potentially be very interesting to analyse. Unfortunately, in our retinal tissue 1H-NMR spectroscopy-based metabolomics analysis the levels of both pyruvate and glucose were below the detection limits, which likely reflects their rapid metabolic turnover (cf. table S1). While this might be attributable to the marked consumption of these metabolites within the tissue, it does not allow us to calculate the suggested ratios to lactate. Then again, in the supernatant medium, which was collected at the same time point as the retina tissue, we can readily detect glucose and lactate levels; for this data please see the new Supplementary Figure 4.

      Rev#3.10. Aspartate aminotransferase should be abbreviated as AST, as it is more commonly noted.

      Response: In response to this comment from the reviewer, we have changed the abbreviation for aspartate aminotransferase from AAT to AST throughout the manuscript.

      Rev#3.11. In the discussion the assumptions of the ex vivo culture systems should be clearly stated. One that was not mentioned, but affects the implications of the data, is that the retinas used in this study are from the developing mouse eye. Another important assumption that was made in this paper was that the changes in retinal metabolism were due to photoreceptors even though the whole neural retina was included.

      Response: The reviewer is correct; we have added these two points to the discussion section of the manuscript. Notably, we have now included a new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395) to present different in vitro and in vivo test systems.

      Rev#3.12. Starting at line 347: As the authors know, the RPE has been shown to be highly reliant on mitochondrial function, and disruption of RPE mitochondrial metabolism leads to photoreceptor degeneration (numerous papers have shown this). Furthermore, the lower levels of lactate detected in their explants when RPE was present suggests that lactate is actively transported out of the neural retina by the RPE.

      Response: The reviewer is right about lactate being exported from the retina to the blood stream in vivo, or, in our in vitro study, to the culture medium. In the new dataset showing glucose and lactate concentrations in the culture medium (new Supplementary Figure 4C, D), we show that without RPE (no RPE group) the retina releases significantly more lactate into the medium than control retina with RPE. At the same time the no RPE retina consumes more glucose than control retina.

      Rev#3.13. Line 360: Again, in mouse photoreceptors (by bulk RNAseq and scRNAseq), there is no GLUT3 expression (encoded by slc2a3). It was also recently shown by Dr. Nancy Philp's lab that rod photoreceptors express GLUT1, encoded by slc2a1 (PMCID: PMC9438481). The differences reported in this study and previous studies should be discussed.

      Response: Although this comment may not make us very popular, we are somewhat sceptical of RNAseq data (especially single cell RNAseq) since the underlying methodology – at the current level of technological development – is notoriously unreliable when it comes to the assessment of low abundance transcripts and suffers from poor batch reproducibility, compared to NMR based metabolomics. Due to methodological constraints RNAseq data have a propensity to display erroneously high or low expression. Moreover, and perhaps even more important, dissociated cells in scRNAseq studies undergo rapid gene expression changes that can significantly falsify the image obtained (Rajala et al., PNAS Nexus 2:1-12, 2023). Finally, it cannot be emphasized enough that mRNA expression profiles DO NOT equate to protein expression and there are numerous examples of divergent expression profiles when mRNA and protein are compared.

      The Daniele et al. study (FASEB Journal 36:e22428, 2022; PMCID: PMC9438481) used in situ hybridization to study the mRNA expression of GLUT1 (slc2a1) and GLUT3 (slc2a3). In line with our comment just above, the Daniele et al. study may provide an example of divergence between mRNA and protein expression, since it seemingly showed only minor expression of GLUT1/slc2a1 in the RPE, i.e. precisely in the one cell type that is well-known for its very strong GLUT1 protein expression.

      Furthermore, Daniele et al. used a conditional GLUT1 knock-out in photoreceptors induced by repeated Tamoxifen injections. The photoreceptor GLUT1 knock-out led to a relatively mild phenotype with only about 45% of the outer nuclear layer lost over a 4-month time course. This is in stark contrast with the FCCP or the 1,9-DDF treatment, which would ablate nearly all rod photoreceptors in under one or two weeks, respectively.

      As a side note, Tamoxifen is an oestrogen receptor antagonist (with partial agonistic behaviour) with a long history of causing retinal and photoreceptor damage. Notably, oestrogen receptor signalling is important for maintaining photoreceptor viability (Nixon & Simpkins, IOVS 53:4739-47, 2012; Xiong et al., Neuroscience 452:280-294, 2021). Therefore, the relatively minor effects of the conditional GLUT1 KO in photoreceptors found in Daniele et al. may have been confounded by direct tamoxifen photoreceptor toxicity. On a wider level, this possible confounding factor related to the use of Tamoxifen points to general problems associated with certain forms of genetic manipulations.

      We now mention the controversy around the expression of glucose transporters in the retina, including the Daniele et al. study in a new subchapter of the discussion on "Expression of glucose transporters in the outer retina” (Lines 454-472).

      Rev#3.14. Lines 370-372: FCCP caused a strong cell death phenotype in rods, however under stress rods upregulate the secretion of RdCVF, which leads to cone photoreceptor survival by the upregulation of aerobic glycolysis in cones. The data should be re-interpreted in the context of this previous literature.

      Response: We thank the reviewer for this comment; however, we could not find a reference that would state that “…under stress rods upregulate the secretion of RdCVF”. What we did find was a reference stating that similar factors such as thioredoxins (TRX80) are secreted from blood monocytes under stress (Sahaf & Rosén, Antioxid Redox Signal 2:717-26, 2000). However, we consider these cells to be too dissimilar to rod photoreceptors to warrant a corresponding comment. Moreover, the research group who discovered RdCVF originally showed that rod-secreted RdCVF cannot prevent cone degeneration if the corresponding Nxnl1 gene is knocked-out in cones, arguing for a cell-autonomous mechanism of RdCVF-dependent cone protection (Mei et al., Antioxid Redox Signal. 24:909-23, 2016).

      Since it is very possible that we may have missed the correct reference(s), we would welcome further guidance by the reviewer.

      Rev#3.15. Line 374: 1,9-DDF caused a 90% loss of cones, however, previous studies by Dr. Nancy Philp have shown glucose deprivation in the outer retina affects primarily rod photoreceptors. The differences should be discussed.

      Response: We thank the reviewer for directing us to these studies. As mentioned above (Rev#3.13.) the Daniele et al. 2022 study yielded only relatively mild effects for a rod-specific conditional GLUT1 KO on photoreceptor viability. Similarly, in an earlier study (Swarup et al., Am J Physiol Cell Physiol. 316: C121–C133, 2019) the Philp group found that a GLUT1 KO in the RPE also caused only a minor phenotype in the photoreceptor layer. We would argue that if glucose, and by extension aerobic glycolysis, were indeed of major importance for (rod) photoreceptor survival, the degenerative effect of these genetic GLUT1 ablations should have been devastating and should have destroyed most of the outer retina in a matter of days. The fact that this was not seen in either study is another piece of independent evidence that rod photoreceptors do not rely to any major extent on glycolytic metabolism.

      The two studies from the Philp lab (Swarup et al., 2019; Daniele et al., 2022) are now cited in the discussion (Lines 417-419 and 458-460).

      Rev#3.16. Line 375: Yes Dr. Claudio Punzo and Dr. Leveillard Thierry along with other groups have shown glycolysis is required to maintain cone survival when under stress, however, the authors should emphasize that it is under stress that this is observed.

      Response: In response to this comment we have now specifically extended our corresponding remark in the discussion of the manuscript (Lines 446-447).

      Rev#3.17. The section "Cone photoreceptors use the Cahill-cycle". The presence of ALT in photoreceptors was surprising and suggests alternatives to the Cori reaction. However, previous measurements of glucose and lactate from localized in vivo cannulation of animal eyes suggest the majority of glucose taken up by the retina is released back to the blood as lactate. Again, this section should discuss this idea in terms of the previous literature.

      Response: Here, we believe the reviewer is referring to studies performed in the late 1990s where, in anaesthetized cats, the lactate concentration in blood samples obtained from choroidal vein cannulation was compared against that in blood samples obtained from femoral arteries (Wang et al., IOVS 38:48-55, 1997). We note that a more relevant in vivo measurement of retinal glucose consumption and lactate production would likely require the simultaneous cannulation of the central retinal artery (CRA) and the central retinal vein (CRV). This would need to be combined with repeated (online) blood sampling, drug applications, and subsequent metabolomic analysis. We are not aware of any in vivo studies where such procedures have been successfully performed and further miniaturization and increased sensitivity of metabolomic analytic equipment will likely be required before such an undertaking may become feasible. Even so, such studies may not be feasible in small rodents (mice, rats) and may instead require larger animal species (e.g. dog, monkey) to overcome limitations in eye and blood sample size.

      We have now extended the discussion of our manuscript with a new subchapter on “The retina as an experimental system for studies into neuronal energy metabolism”. Within this new subchapter we now present two different in vivo experimental approaches that addressed retinal energy metabolism (Lines 376-384). Moreover, we now present new data on retinal lactate release to the culture medium, showing, for instance, a strong increase in lactate release in the no RPE condition compared to control (new Supplementary Figure 4).

      Rev#3.18. Lines 431-433: The study cited suggested that the mitochondrial AST was detected in other cells, in agreement with the data shown. However, the authors' statements in this section are misleading as they do not take into consideration the contribution of AST from other cell types.

      Response: The reviewer is right, we found both AST1 and AST2 to be expressed not only in photoreceptor inner segments but also in the inner retina, especially in the inner plexiform layer (new Figure 6D). Since this might indicate mini-Krebs-cycle activity also in retinal synapses, we have added a corresponding comment to the discussion (Lines 540-543).

      Grammatical and wording fixes:

      Rev#3.19. Line 98 - "the recycling of the photopigment, retinal."

      Response: We have inserted a comma after “photopigment”.

      Rev#3.20. Results section and Figure 1 start without providing context for the model system where staining is being done.

      Response: We have added this information to the beginning of the results section (Lines 105-106).

      Rev#3.21. Supplementary Figure 2 is not mentioned in the main text - there is no context for this figure.

      Response: Supplementary Figure 2 was originally referenced in the legend to Figure 2. We now mention supplementary figure 2 (now renumbered to supplementary figure S3) also in the main text, in the results section under “Experimental retinal interventions produce characteristic metabolomic patterns” (Line 148).

      Rev#3.22. Volcano plot in Supplementary Figures 3, 5, 6, 7, and 8 don't indicate what Log2(FC) is in reference to.

      Response: The log2 fold change (FC) is calculated as follows: log2 (fold change) = log2 (mean metabolite concentration in condition A) - log2 (mean metabolite concentration in condition B) where condition A and condition B are two different experimental groups being compared. This is now explained in the SI Materials and Methods (Lines 145-147) and indicated in abbreviated form in the figure legends. Please note that supplemental figures have now been renumbered due to the insertion of an additional, new Figure.
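
      As an illustration of the log2 fold change definition given above, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the study: the function name `log2_fold_change` and the example concentrations are hypothetical, and the only element taken from the response is the formula itself (difference of the log2 of the group means).

```python
import math
from statistics import mean


def log2_fold_change(condition_a, condition_b):
    """Return log2(mean of A) - log2(mean of B) for one metabolite.

    condition_a, condition_b: concentrations measured in two experimental
    groups (hypothetical inputs; units must match between groups).
    """
    mean_a = mean(condition_a)
    mean_b = mean(condition_b)
    # log2 is undefined for zero or negative means, so fail loudly.
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("mean concentrations must be positive")
    return math.log2(mean_a) - math.log2(mean_b)


# Hypothetical example values, not data from the manuscript.
treated = [8.0, 8.0, 8.0]
control = [2.0, 2.0, 2.0]
fold_change = log2_fold_change(treated, control)
```

      A positive value means the metabolite is more abundant in condition A than in condition B; swapping the two groups only flips the sign, which is why a volcano plot needs to state which group is the reference.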

      Rev#3.23. Line 331 - "and allowed to analyze the..." is incorrect phrasing.

      Response: This phrasing was changed.

      Rev#3.24. Line 343 "c“cled" ”

      Response: This phrasing was changed.

      Rev#3.25. Line 446 is misworded.

      Response: This phrasing was changed.

      Technical questions:

      Rev#3.26. At what point after explant was the IHC done in Supplemental Figure 1? If early, but experiments are done later, there's a chance things are more disorganized at the end of the experiment.

      Response: Staining and metabolomics analysis were both done at the end of each experiment, at the same time, at P15. This is now mentioned in the SI materials and methods section (Lines 67, 108-110).

      Rev#3.27. FCCP affects plasma membrane permeability, which is particularly critical in neurons that undergo repolarization and depolarization - how do we know FCCP on cell death via metabolism? See: https://www.sciencedirect.com/science/article/pii/S2212877813001233

      Response: The reviewer is correct, a significant permeabilization of cell membranes in general would likely cause extensive neuronal cell death, unrelated to a disruption of OXPHOS. However, the FCCP concentration used here (5 µM) is at the lower end of what was used in the mentioned Kenwood et al. study (Mol Metab. 3:114-123, 2014) and the effect on cell membrane permeability in tissue culture is likely to be rather small, as opposed to what was seen by Kenwood et al. in cultures of individual cells. This view is supported by the fact that in our FCCP treatments, we did not observe any significant increases of cell death in any retinal cell type (including RPE) other than in rod photoreceptors. Together with the fact that only photoreceptors strongly express Krebs-cycle/OXPHOS related enzymes, this strongly suggests that the FCCP effects seen by us were due to disruption of OXPHOS.

      Rev#3.28. Numerous metabolite comparisons are being made throughout the manuscript – what type of multiple hypothesis testing corrections are utilized? Only certain figures mention multiple hypothesis testing (e.g. Figure 6).

      Response: In general, in this manuscript we used two different statistical methods: 1) For two-group comparisons, we used an unpaired, two-tailed t-test, which reports a p-value with a 95% confidence interval, without additional correction for multiple hypothesis testing (e.g. in Figure 2, Suppl. Figures 4, 6, 7, 8). 2) For multiple-group comparisons, we used a one-way ANOVA with Tukey’s multiple comparisons post-hoc test (except Suppl. Figure 9, where Fisher’s LSD post-hoc test was used). The information on which statistical test was used for which dataset is now given in the figure legends and in the SI Material and Methods section.
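
      For readers less familiar with the two-group comparison mentioned above, the test statistic of an unpaired t-test can be sketched as follows. This is a minimal sketch under stated assumptions: the response does not say whether equal variances were assumed, so the pooled-variance (Student's) form is used here purely for illustration; the function name `unpaired_t_statistic` and the example values are hypothetical; and the two-tailed p-value, which requires the t distribution, is left to a statistics package.

```python
import math
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for an unpaired Student's t-test.

    Assumption (not stated in the response): equal variances, so the
    sample variances are pooled.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Hypothetical example values, not data from the manuscript.
t_value, dof = unpaired_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

      Because each metabolite is tested separately in such a scheme, the number of comparisons grows with the number of metabolites, which is the background to the reviewer's question about multiple hypothesis testing.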

      Rev#3.29. For Figure 3, how do we know that the removal of RPE is causing the metabolite changes due to RPE-PR coupling? How do you rule out the fact that it isn’t just: I – a thicker physical barrier between media and the neural retina that is causing the changes, or II – removal of RPE from PR causes OS shearing and a stress response that alters metabolism?

      Response: We believe these concerns can be ruled out: The RPE cells are linked by tight junctions and are not “just a thicker barrier” but a barrier that is almost impermeable for most metabolites unless they are carried by specific transporters. Outer segment shearing via RPE removal would indeed be a concern if we had used adult retina. However, we explanted the retina at P9 when it does not possess any sizeable outer segments yet. As a matter of fact, photoreceptors grow out outer segments only after P9.

      Rev#3.30. While 1,9-dideoxyforskolin blocks GLUT1, it is known to have other effects, including on potassium channels. How do we know the effects of 1,9-dideoxyforskolin are specific to GLUT1? Utilizing a GLUT1 KO and showing no additional effects when adding 1,9-dideoxyforskolin would be helpful as a control.

      Response: This is a good suggestion from the reviewer. We note that this is technically not easy to achieve as it would require an RPE-specific knock-out that should be inducible at a given experimental time-point, in a quantitative manner. The study by Swarup et al. (see above Rev#3.13.) used an RPE specific knock-out that was, however, not inducible. Moreover, if the corresponding inducible knock-out animals could be generated, then the stochastic nature of the inducing treatment would probably affect only a limited number of cells within a given cell population. In our experimental context, a less than quantitative knock-out would significantly complicate interpretation of results, even to the point that no additional insight might be gained.

      Rev#3.31. The analysis in Figure 6, even with attempts to control drug treatments, is highly speculative. One really needs animals with predominately cones vs. predominately rods to do this analysis (e.g. with NRL mice).

      Response: The reviewer is right, the analysis shown in Figure 6 was an explorative approach to try and deduce features of rod and cone metabolism. This is now mentioned in the results section (Lines 282-284). Since the experiments were not initially intended to address such questions, by necessity the interpretations remain speculative. The comparison of mouse mutants in which there are either no cones (e.g. cpfl1 mouse) or no rods (e.g. NRL knock-out mouse) may make it possible to disentangle the metabolic contributions of rods and cones. We appreciate the suggestion from the reviewer and have now inserted a related suggestion for future studies into the discussion section (Lines 450-452).

      Rev#3.32. Overall, much of the paper suggests intriguing pathways, but without C13 tracing or relevant genetic knock-outs, the pathways would have to be speculative rather than definitive.

      Response: We agree with the reviewer that further research, including 13C and 15N-tracing studies, will be necessary to evaluate which pathway(s) are used by which retinal cell type under which condition. Still, the high robustness and quantitative nature of the NMR metabolomics data allows us to draw pathway conclusions based on metabolites that are unique to specific pathways/cell types or using ratios. We now refer to the advantages of such carbon-tracing studies in the discussion of the manuscript (Lines 545-549).

      Stylistic suggestions:

      Rev#3.33. This is a very dense paper to read. It would be helpful for each figure to have a summary diagram of the relevant metabolite changes and how they fit together. Further, for those not metabolism-inclined, defining the mini-Kreb’s, Cahill, and Cori cycles and their brief implications at some point early in the manuscript would be helpful.

      Response: We have been thinking a lot about how we could add in the suggested summary diagrams into each figure. Unfortunately, whatever idea we contemplated would have significantly increased the complexity of the figures, while the actual benefit in terms of improved understandability was unclear.

      However, we did include the suggestion from the reviewer to present the terms Cori-, Cahill-, and mini-Krebs-cycle already in the introduction and we hope that this has improved the understandability of the manuscript overall (Lines 79-92).

      Rev#3.34. More discussion about the step-by-step ways that the mini-Kreb’s reaction “uncouples” glycolysis from the Kreb’s cycle would be helpful. What do you mean by “uncouple” in this context?

      Response: We thank the reviewer for this suggestion. Uncoupling in this context means that glycolysis and the Krebs cycle are not metabolically coupled to each other via pyruvate. Instead, both pathways can run independently from each other and in parallel, as long as the Krebs-cycle uses glutamate, BCAAs or other amino acids as fuels. We now also address this point already in the introduction of the manuscript (Lines 87-90).

      Conceptual questions:

      Rev#3.35. As the proposal that PR undergo heavy amounts of OXPHOS is controversial, it would be helpful for the authors to review the literature on lactate production by the retina and what studies have shown previously about retina use of lactate, specifically lactate making its way into TCA cycle intermediates, suggesting OXPHOS, in PRs.

      Response: In response to this question we have added several new references to the introduction and discussion of the manuscript. The question of lactate production (aerobic glycolysis) vs. the use of OXPHOS is now discussed in Lines 77-81, Lines 367-384.

      Rev#3.36. Why would cones die more in the no RPE condition? The authors suggest this has something to do with GLUT1 expression on RPE and the transport of glucose to cones. Even if we accept that cones are highly glycolytic, loss of RPE should expose the neural retina to even more glucose in your experimental set-up.

      Response: This is a very interesting question from the reviewer. Indeed, loss of the RPE and blood-retinal barrier function should increase photoreceptor access to glucose, even more so if they are expressing high affinity GLUT3. In the discussion (Lines 420-424), we speculate that this may trigger the Crabtree effect, shutting down OXPHOS and causing the cells to exclusively rely on glycolysis. This, however, will likely not yield sufficient ATP to maintain their viability, so that they “starve” to death even in the presence of ample glucose. Since cones require at least twice as much ATP as rods, they may be more sensitive to a Crabtree-dependent shut-down of OXPHOS. However, if this speculation were correct, then the question remains why the FCCP treatment, which abolishes OXPHOS more directly, does not cause cone death. Here, we again can only speculate that high glucose may have additional toxic effects on cones that are independent of OXPHOS. We now try to present this reasoning in the discussion (Lines 426-429).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their comments, as well as for the time dedicated to making useful suggestions that have contributed to improving the manuscript. We have responded to the concerns raised by the reviewers, and after that, we have also responded to the different points highlighted in the Recommendations for the authors:

      Reviewer #1

      While in vivo injury was used to assess regeneration from subsets of PNS neurons, different in vitro neurite growth or explant assays were used for further assessments. However, the authors did not assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro. Such results will be important in interpreting the results.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Intriguingly, even in individual groups of PNS neurons, not all neurons regenerate to the same extent. It is known that the distance between the cell body and the lesion site affects neuronal injury responses. It would be interesting to test this in the observed regeneration.

      Although it is true that the distance can affect the outcome, here we used a physiological model where all neurons are lesioned at the same point in the nerve. Not only is the distance different for motoneurons, but so is the microenvironment surrounding their somas; therefore, the direct comparison of these neurons with sensory neurons is limited. We extended the discussion on this matter in the new manuscript.

      Fig 1: The authors quantified the number of regenerating axons at two different time points. However, the total numbers of neurons/axons in each subset are different. The authors should use these numbers to normalize their regenerative axons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. It would be informative for the authors to analyze/discuss this further.

      From our point of view, these two options can both be considered differential responses to injury and could potentially be used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      Fig 6: Is it possible to assess the regenerative effects of knockdown Med12 after in vivo injury?

      It is possible, but it is out of the scope of this work. Here, we aimed to describe the regenerative response and validate our data by testing a potential target for specific regeneration. Future studies will focus on the modulation of this specific regeneration both in vitro and in vivo.

      Reviewer #2

      It seems that the most intriguing outcome of this paper revolves around the role of Med12 in nerve regeneration. The authors should prioritize this finding. Drawing a conclusion regarding Med12's role in proprioceptor regeneration based solely on this in vitro model may be insufficient. This noteworthy result requires further investigation using more animal models of nerve regeneration.

      The main goal of this work was to compare the regenerative responses of different neuron subpopulations. We modulated Med12 to validate our data and the potential of our findings. Unfortunately, investigating in depth the role of Med12 in regeneration is out of the scope of this paper. For this reason, we did not prioritise this finding here. As this finding was striking, we strongly agree that the next step should be studying how it modulates regeneration.

      One critique revolves around the authors' examination of only a single time point within the dynamic and continuously evolving process of regeneration/reinnervation. Given that this process is characterized by dynamic changes, some of which may not be directly associated with active axon growth during regeneration, and encompasses a wide range of molecular alterations throughout reinnervation, concentrating solely on a single time point could result in the omission of critical molecular events.

      We agree that this is probably the main limitation of this study, as we discussed in the text. We chose 7 days postinjury as a standard time point widely described in literature and to have a correlate with our histological data. Although the main aim was to compare populations, analyzing an additional time point after injury could add valuable information.

      Reviewer #3

      No concerns were expressed by that reviewer.

      Recommendations for the authors:

      The authors should assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Optional:

      It will be interesting to test if the distances between the cell body and the lesion site contribute to the observed differences in individual subsets of PNS neurons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. At least the authors should discuss these.

      From our point of view, these two options can both be considered differential responses to injury and could potentially be used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      While the paper is technically well-executed, the conclusions and some of the findings appear to be incomplete and challenging to draw meaningful conclusions from. This manuscript presents some interesting findings, but the title is quite broad and may suggest that the authors have unveiled fundamental mechanisms explaining the varying regenerative abilities of peripheral axons. However, the results do not substantiate such a conclusion. Further comments and suggestions follow.

      We eliminated the word “regenerative (response)” from the title, as it could lead readers to think that all changes seen in these neurons are related only to regeneration. We think that “Neuron-specific RNA-sequencing reveals different responses in peripheral neurons after nerve injury” highlights the differences between neurons that we found without misleading readers into thinking that we described regenerative mechanisms in all neurons.

      What's notably absent here is the validation of certain genes found with the ribosomes, especially those highlighted in the subsequent figures. The question arises as to whether the changes depicted in the figures align with changes in the DRGs in vivo. Is there concordance between the presence of these genes and their transcriptional changes? It would greatly enhance the study's value if the authors could show evidence of upregulation or downregulation of certain genes over time in tissue sections, utilizing techniques such as in situ hybridization or immunocytochemistry.

      We selected some factors that were specifically upregulated in subsets of neurons to corroborate these findings by immunohistochemistry. Changes in the immunofluorescence of P75 in motoneurons and ATF2 in cutaneous mechanoreceptors were evaluated in controls and animals that received a nerve crush one week before. Supplementary figures with the images have been added.

      The authors discovered intriguing distinctions, such as the presence of specific signaling pathways unique to neurons projecting to muscle as opposed to those projecting to the skin. Among these pathways were those associated with receptor tyrosine kinases like VEGF, erbB, and neurotrophin signaling among others. The question now arises: do these pathways play a role in natural peripheral regeneration processes? To answer this, it is imperative to conduct in vivo studies. However, the authors employed an in vitro DRG neurite outgrowth assay to demonstrate that various types of neurons exhibit different responses to the presence of different neurotrophins. This does not reflect what actually happens in vivo. While neurotrophins indeed play a role in neuron survival and axon extension during development, their role in postnatal periods changes over time, and it remains unclear whether they play any role in the natural regenerative processes of the peripheral nerve. Therefore, this experiment may not be directly relevant in this case, especially during the early axon extension period of the regenerating axons. if the authors aim to establish a causal link with neurotrophin signaling, it becomes crucial to conduct in vivo experiments by manipulating the expression of key molecules like the receptors.

      It has been widely described that different types of peripheral neurons have a differential expression of Trk receptors, even in the adult, and that these respond differentially to neurotrophins. In our study, we do not establish a causal relationship between the expression of Trk and neurite extension, but instead we show (as many others) that distinct neurons respond differentially to these neurotrophins. The fact that in vivo studies fail to show a clear effect does not necessarily mean that neurotrophins are not specific. It might mean that their effect is not strong enough to be a useful guide in the complex microenvironment found after an injury. For instance, NGF acts on TrkA (present in some neurons), but in vivo it has been shown to accelerate the clearance of myelin debris in Schwann cells (Li et al., 2020), which could facilitate regeneration of all types of axons, masking any potential specific effect on the subtypes of neurons expressing TrkA. In contrast, in an in vitro setting on neuronal cultures, the specific neuronal effect can be more evident.

      Additionally, it's worth noting that another paper utilizing the same methodology and experimental setup (PMID: 29756027, "Translatome Regulation in Neuronal Injury and Axon Regrowth" by Rozenbaum et al.) exists. Are there any significant differences or shared findings with that study?

      This study shows the transcriptomic response 4, 12 and 24 hours after an injury in a very similar experimental setup. The authors focus on comparing the neuronal vs the glial response to the injury, using a Ribotag line that tags ribosomes from all neurons in the DRG rather than specific neuron subtypes. As the time postinjury (24h vs 7 days) and the cell types studied are different, we could not directly compare our results. We did see an upregulation in both datasets of previously described growth-associated genes (Jun, Atf3, Sox11, Sprr1a, Gal…). We included the article in the references for its relevance to the topic.

      It would be helpful for readers to illustrate the finding of the fastest axon regeneration of nociceptors by showing fluorescence micrographs of the nerve samples in addition to the graphs shown in Fig. 1 C/D.

      In figure 1B, we show fluorescence micrographs of the nerves 7 days postinjury. As explained in the results, we counted the number of axons at 2 distances from the injury; we did not analyse the fastest axon. This is due to technical reasons: 7 days after the injury the fastest axon has surpassed our evaluation point, which was the furthest distance that we could assess in our experimental setting in a consistent manner. If the reviewer thinks that we need to include more images from our evaluations (from 9 dpi for example), we could prepare a new figure.

      The labeling in Fig. 2B is confusing. Is the CHAT immunoreactivity shown in the last panel illustrated by green or red signals? Is the red signal counterstaining with beta-tubulin?

      The labelling was changed in the figure to increase clarity.

      The references to the supplementary data throughout the manuscript are confusing. For example, where can the "Supp data 2" be found? (mention on p. 14 in the merged pdf file). Are they referring to the Excel spreadsheets?

      We divided the supplementary material into supplementary figures/table (found in the pdf) and supplementary data. Supplementary data refers to Excel spreadsheets found outside the pdf file. We hope this will be clearer after the final formatting of the article.

      What does the following statement on p. 14 mean?: "The caveat in these analyses was that molecular classification by these approaches may be arbitrary, and not reflective of protein repurposing." This reviewer notes that these databases consider the fact that components participate in different pathways.

      Indeed, we aimed to explain that many proteins participate in different pathways, and this is a limitation of the enrichment analysis. We modified the sentence in the text.

      First paragraph on p. 15: The PPAR and AMPK pathways have much broader roles, and are not only "related to fatty acid metabolism". This factual inaccuracy should be corrected in the manuscript.

      The sentence has been corrected.

      The authors should consider showing increased TGF-beta signaling in their neurons after downregulation of Med12 given the previous implication of TGF-beta signaling in axon regeneration.

      We tried to demonstrate the effect of our knockdown on the TGF-beta pathway by analyzing the expression of typical targets of this pathway by qPCR in our cultures. However, we could not detect any difference. We think that this can have two explanations: (1) as only a few cells upregulate Med12 whereas many cells downregulate it, the effect is masked (presumably only proprioceptors will have a significant difference in this pathway and, thus, it would be very difficult to see the effect), or (2) Med12 is not exerting its effect through this pathway. We added a supplementary figure with these data and discussed it in the manuscript.

      It would be helpful to eliminate typos and improve syntax/grammar/style.

      We revised the text to improve style.

    1. Author Response

      Public Reviews:

      Reviewer #1

      Strengths:

      Overall, the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg starts producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, a more likely explanation of abeta and tau oligomers disrupting the activity of VIP neurons is plausible.

      The Reviewer correctly points out that 3xTg-AD mice typically do not exhibit plaques before 6 months of age, with limited amounts even up to 12 months, particularly in the hippocampus. To the best of our knowledge, the 6E10 antibody binds to an epitope in APP (682-687) that is also present in the Abeta (3-8) peptide. Consequently, 6E10 detects full-length APP, α-APP (soluble alpha-secretase-cleaved APP), and Abeta (LaFerla et al., 2007). Nonetheless, we concur with the Reviewer's observation that the detected signal includes Abeta oligomers and the C99 fragment, which is currently considered an early marker of AD pathology (Takasugi et al., 2023; Tanuma et al., 2023). Studies have demonstrated intracellular accumulation of C99 in 3-month-old 3xTg mice (Lauritzen et al., 2012), and its binding to the Kv7 potassium channel family, which results in inhibiting their activity (Manville and Abbott, 2021). If a similar mechanism operates in IS-3 cells, it could explain the changes in their firing properties observed in our study. Consequently, we will revise the manuscript to include this crucial information in both the Results and Discussion sections.

      (2) Authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70micron thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

      We appreciate the Reviewer’s perspective on the techniques used for imaging synaptic connections. While we acknowledge the limitations of confocal microscopy for resolving pre- and post-synaptic structures in thick sections, we respectfully disagree regarding the exclusive suitability of electron microscopy (EM). Our approach involved confocal 3D image acquisition using a 63x objective at 0.2 µm lateral resolution and 0.25 Z-step, providing valuable quantitative insights into synaptic bouton density. Despite the challenges posed by thick sections, this method together with automatic analysis allows for careful quantification. Although EM offers unparalleled resolution, it presents challenges in quantification. We will make sure to include the important details regarding image acquisition and analysis in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      3xTg mice show early Aß accumulation in VIP-positive interneurons.

      3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      We are thankful to the reviewer for their insightful recognition of our efforts and their enthusiasm for the results of this research.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

      RE: We completely agree with the reviewer's observation regarding the lack of demonstration of causality in our results. Investigating causality in the relationship between deficits in VIP physiological properties and differences in network activity is indeed a crucial aspect of this project. However, achieving this goal will require a significant amount of time and dedicated manipulations in a new mouse model (VIP-Cre-3xTg). We appreciate the importance of this line of investigation and consider it as a priority for our future research endeavors.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their constructive comments. The following are our point-by-point responses.

      Reviewer #1 (Recommendations For The Authors):

      Point 1- Abstract: advanced morning peak « opposite » to pdf/pdfr mutants. To my knowledge, the alteration of PDF/PDFR suppresses the morning peak. I am not sure that an advance of the peak is « opposite » to its inhibition?

      Mutants with disruptions in CNMa or CNMaR display advanced morning activity, indicating an enhanced state. Mutants with disruptions in Pdf or Pdfr exhibit no morning anticipation, suggesting a promoting role of these genes in morning anticipation. Therefore, our revised version is: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-51)

      Point 2- Fig 1K-L: the authors should show the sleep phenotype of the homozygous nAChRbeta2 mutant (if not lethal) for a direct comparison with the FRT/FLP genotype and thus evaluate the efficiency of the system.

      We have incorporated sleep profiles of the nAChRbeta2 mutant and w1118 into Fig. 1K-L. nAChRbeta2 mutants (red) exhibited a sleep amount comparable to that of pan-neural nAChRbeta2 knockout flies (dark red), as shown below.

      Author response image 1.

      Point 3- Dh31-EGFP-FRT expression patterns look different in figS1 A (or fig1 H) and J. why that?

      We re-examined the original data. Both samples (with R57C10-GAL4; Fig. S1A, right, and Fig. S1J, left) are Dh31EGFP.FRT and are displayed below; they show consistent primary expression subsets. Any observed disparities in region "e" could potentially be attributed to variations during dissection.

      Author response image 2.

      Point 4- The knockdown experiments with the elav-switch (RU486) system (fig S2) do not seem to be as efficient as the HS-FLP system (fig 1H-J). The conclusions on the efficiency should be toned down.

      We have revised accordingly: "Near Complete Disruption of Target Genes by GFPi and Flp-out Based cCCTomics" (Line 130): "Knocking out at the adult stage using either hsFLP driven Flp-out (Golic and Lindquist, 1989) (Fig. 1H-1J) or neural (elav-Switch) driven shRNAGFP (Nicholson et al., 2008; Osterwalder et al., 2001) (Fig. S2A-S2I), also resulted in the elimination of most, though not all, GFP signals." (Line 145-149)

      Point 5- Fig 2H-J: the LD behavioral phenotype of pdfr pan-neuronal CRISPR does not seem to correspond to what is described in the literature for the pdfr mutant (han), see hyun et al 2005 (no morning anticipation and advanced evening peak). I understand that the activity index is lower than controls but fig2H shows a large anticipatory activity that seems really unusual, and no advanced evening peak is observed. I think that the authors should show the CRISPR flies and pdfr mutants together, to better compare the phenotypes.

      Thank you for pointing out that, in the pan-neuronal knockout of Pdfr by unmodified Cas9 (Fig. 2H-2I of the previous version), morning anticipation still exists (Fig. 2H of the previous manuscript), and that the significant decrease in the morning anticipation index (Fig. 2I of the previous manuscript) and the advanced evening activity are not as pronounced as those observed in han5304 (Fig. 3C in Hyun et al., 2005).

      First, we have separated the activity plots of Fig. 2H of the previous manuscript, as shown below. In Pdfr knockout flies, activity tends to decrease from ZT18 to ZT21 and to increase from ZT21 to ZT24. The lowest activity before dawn occurs at about ZT21, and the activity at ZT18 is comparable to the activity at ZT24. This differs significantly from the two control groups, whose activity tends to increase from ZT18 to ZT24, with an activity peak at ZT24.

      The activity from ZT6 to ZT12 increased much faster in Pdfr knockout flies and reached a plateau at about ZT11, whereas the two control groups showed a slower increase from ZT6 to ZT12, with no plateau but an activity peak at ZT12.

      Author response image 3.

      Second, we have compared the phenotype of the Pdfr mutants we previously generated (Pdfr-attpKO; Deng et al., 2019) with that of the Pdfr pan-neuronal knockout by Cas9.HC. This mutant lacks all seven transmembrane regions of Pdfr (a). The phenotypes are very similar between Pdfr-attpKO flies and Pdfr pan-neuronal knockout flies. In this experimental repeat, we observed a much more obvious advanced evening activity peak in both pan-neuronal knockout flies and Pdfr-attpKO flies.

      To further analyze the phenotypes of Pdfr pan-neuronal knockout flies by Cas9.HC, we referred to the literature. The activity pattern at ZT18 to ZT24 (activity tends to decrease from ZT18 to ZT21 and tends to increase from ZT21 to ZT24, with the lowest activity before dawn occurring at about ZT21, and activity at ZT18 comparable to activity at ZT24) is also reported in Pdfr knockout flies, such as Fig. 3C and 3H in Hyun et al., 2005, Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, Fig. 5A in Guo et al., 2014, and Fig. 5B in Goda et al., 2019. Additionally, the less pronounced advanced evening activity peak compared to han5304 (Fig. 3C in Hyun et al., 2005) is also reported in Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, and Fig. 5B in Goda et al., 2019. We consider that this difference is more likely to be caused by environmental conditions or recording strategies (DAM system vs. video tracing).

      Therefore, we revised the text to: “Pan-neuronal knockout of Pdfr resulted in a tendency towards advanced evening activity and weaker morning anticipation compared to control flies (Fig. 2H-2I), which is similar to Pdfr-attpKO flies. These phenotypes were not as pronounced as those reported previously, when han5304 mutants exhibited a more obvious advanced evening peak and no morning anticipation (Hyun et al., 2005)”.

      Author response image 4.

      Point 6-The authors should provide more information about the DD behavior (power is low, but how about the period of rhythmic flies, which is shortened in pdf (renn et al) and pdfr (hyun et al) mutants).

      We have incorporated period data into Fig. 2I. Indeed, conditional knock out of Pdfr by Cas9.HC driven by R57C10-GAL4 shortens the period length, as shown below (previous data), also in Fig. 2I of the revised version.

      In the revised Fig. 2I, we tested 45 Pdfr-attpKO flies during DD condition (3 out of 48 flies died during video tracing in DD condition), and only one fly was rhythmic. In contrast, 9 out of 48 Pdfr pan-neuronal knockout flies were rhythmic.

      Author response image 5.

      Point 7- P15 and fig6. The authors indicate that type II CNMa neurons do not show advanced morning activity as type I do, but Figs 6 I and K seem to show some advance although less important than type I. I am not sure that this supports the claim that type I is the main subset for the control of morning activity. This should be toned down.

      We have re-organized Fig. 6 and revised the summary of these results as: “However, Type II neurons-specific CNMa knockout (CNMa ∩ GMR91F02) showed weaker advanced morning activity without advanced morning peak (Fig. 6N), while Type I neurons-specific CNMa knockout did (Fig. 6J), indicating a possibility that these two type I CNMa neurons constitute the main functional subset regulating the morning anticipation activity of fruit fly”. (Line 400-405)

      Point 8- Figs 6M and N: is power determined from DD data? if yes, how about the period and arrhythmicity? Please also provide the LD activity profiles for the mutants and rescued pdfr genotypes.

      Yes, the power was determined from the DD data. In the new version of the manuscript, we have included the activity plots for the LD phase in supplementary Fig S13, as well as shown below (A, B), and the period and arrhythmicity data for the DD phase in Fig. 6S and Table S7. We have also refined the related description as follows: “Moreover, knocking out Pdfr by GMR51H05, GMR79A11 and CNMa GAL4, which cover type I CNMa neurons, decreased morning anticipation of flies (Fig. 6T, Fig. S13B). However, the decrease in morning anticipation observed in the Pdfr knockout by CNMa-GAL4 was not as pronounced as with the other two drivers. Because the presumptive main subset of functional CNMa is also PDFR-positive, there is a possibility that CNMa secretion is regulated by PDF/PDFR signal”. (Line 413-419)

      Author response image 6.

      Point 9- Fig 7: does CNMaR affect DD behavior? This should be tested.

      We analyzed CNMaR-/- activity in the dark-dark (DD) condition over a span of six days. Results revealed a higher power in CNMaR mutants compared to control flies (Power: 93.5±41.9 (CNMaR-/-, n=48) vs 47.3±31.6 (w1118, n=47); Period: 23.7±0.3 h (CNMaR-/-, n=46) vs 23.7±0.3 h (w1118, n=47); arrhythmic rate 2/48 (CNMaR-/-) vs 0/47 (w1118)). Because mutating CNMa had no obvious effect on DD behavior, any effect of CNMaR on DD behavior cannot be attributed to the CNMa signal; we therefore did not repeat or further analyze the DD behavior of the CNMaR mutant. We believe this raises another question beyond the scope of our current discussion.

      Reviewer #2 (Recommendations For The Authors):

      Point 1-One major concern is the apparent discrepancies in clock network gene expression using the Flp-Out and split-LexA approaches compared to what is known about the expression of several transmitter and peptide-related genes. For example, it is well established that the 5th-sLNv expresses CHAT (along with a single LNd), yet there appears to be no choline acetyltransferase (ChAT) signal in the 5th-sLNv as assayed by the Split-LexA approach (Fig. 4). This approach also suggests that DH31 is expressed in the s-LNvs, which, as one of the most intensely studied clock neuron are known to express PDF and sNPF, but not DH31. The results also suggest that the sLNvs express ChAT, which they do not. Remarkably PDF is not included in the expression analysis, this peptide is well known to be expressed in only two subgroups of clock neurons, and would therefore be an excellent test case for the expression analysis in Fig. 4. PDF should therefore be added to analysis shown in Fig. 4. Another discrepancy is PdfR, which split LexA suggests is expressed in the Large LNvs but not the small LNvs, the opposite of what has been shown using both reporter expression and physiology. The authors do acknowledge that discrepancies exist between their data and previous work on expression within the clock network (lines 237 and 238). However, the extent of these discrepancies is not made clear and calls into question the accuracy of Flp-Out and Split LexA approaches.

      The concerns mentioned above are:

      (1) sLNvs express PDF and sNPF but not Dh31;

      (2) ChAT presents in 5th-sLNv and one LNd but not in other sLNvs;

      (3) PDFR presents in sLNvs but not l-LNvs.

      (4) PDF is not included in the analysis.

      To verify the accuracy of these intersection analyses, all of which relate to PDF-positive neurons (except the 5th-sLNv and LNds), we stained for PDF and examined the co-localization between PDF-positive LNvs and the respective drivers ChAT-KI-LexA, Pdfr-KI-LexA, Dh31-KI-LexA, and Pdf-KI-LexA.

      First, Dh31-KI-LexA labeled four s-LNvs, as shown below (also in Fig. S9A). Therefore, the results of the intersection analysis of Dh31-KI-LexA with Clk856-GAL4 are correct. We attribute the difference from the previous literature to Dh31-KI-LexA labeling different neurons than the previous driver or antibody.

      Second, no s-LNv was labeled by ChAT-KI-LexA, as shown below. We rechecked our intersection data and found that, of the 10 ChAT-KI-LexA∩Clk856-GAL4 brains analyzed, only two showed positive s-LNvs. To enhance the accuracy of the intersection analysis results, we flagged all positive-signal records in which a positive subset was found in fewer than 1/3 of the total analyzed brains (Table S4).

      Third, one l-LNv and at least two s-LNvs were labeled by Pdfr-KI-LexA, as shown below (also in Fig. S9B). Fourth, Pdf-KI-LexA labels all PDF-positive neurons, but the intersection analysis with Pdf-KI-LexA and Clk856-GAL4 showed only scattered signals, as shown below (D, also in Fig. S9C). In these cases, some expected positive signals were not observed in our dissection. A possible reason is the inefficiency of LexAop-FRT-myr::GFP driven by LexA. Therefore, our intersection results must miss some positive signals.
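      For readers who wish to reproduce the flagging rule described above (positive subsets seen in fewer than 1/3 of the analyzed brains are marked), a minimal sketch follows. The function name, the exact-fraction comparison, and the choice not to flag subsets with zero positive brains are our illustrative assumptions; this is not the script used to build Table S4.

```python
from fractions import Fraction


def flag_low_penetrance(positive_brains, total_brains, threshold=Fraction(1, 3)):
    """Return True when a subset is positive in some, but fewer than
    `threshold`, of the analyzed brains (hypothetical helper).

    Exact fractions avoid floating-point surprises right at the 1/3 cut-off.
    """
    if total_brains <= 0:
        raise ValueError("total_brains must be positive")
    if not 0 <= positive_brains <= total_brains:
        raise ValueError("positive_brains must lie between 0 and total_brains")
    if positive_brains == 0:
        return False  # assumption: nothing to flag when no brain is positive
    return Fraction(positive_brains, total_brains) < threshold


# The ChAT-KI-LexA∩Clk856-GAL4 case from the text: 2 positive brains out of 10.
print(flag_low_penetrance(2, 10))
```

      Under these assumptions, the 2-of-10 case falls below the 1/3 cut-off and would be flagged.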

      Author response image 7.

      Finally, we revised the text to (Line 286-317):

      To assess the accuracy of expression profiles using CCT drivers, we compared our dissection results with previous reports. Initially, we confirmed the expression of CCHa1 in two DN1s (Fujiwara et al., 2018), sNPF in four s-LNvs and two LNds (Johard et al., 2009), and Trissin in two LNds (Ma et al., 2021), aligning with previous findings. Additionally, we identified the expression of nAChRα1, nAChRα2, nAChRβ2, GABA-B-R2, CCHa1-R, and Dh31-R in all or subsets of LNvs, consistent with suggestions from studies using ligands or agonists in LNvs (Duhart et al., 2020; Fujiwara et al., 2018; Lelito and Shafer, 2012; Shafer et al., 2008) (Table S4).

      Regarding the previously reported Nplp1 expression in two DN1as (Shafer et al., 2006), we found approximately five DN1s positive for Nplp1-KI-LexA, indicating a broader expression than previously reported. A similar pattern emerged in our analysis of Dh31-KI-LexA, where four DN1s, four s-LNvs, and two LNds were identified, contrasting with the two DN1s found in immunocytochemical analysis (Goda et al., 2016). Colocalization analysis of Dh31-KI-LexA and anti-PDF revealed labeling of all PDF-positive s-LNvs but not l-LNvs (Fig S9A), suggesting that the differences may arise from the broader labeling of 3' end knock-in LexA drivers or the amplification effect of the binary expression system. Low protein levels might go undetected in immunocytochemical analysis. This aligns with transcriptome analysis findings showing Nplp1 positive in DN1as, a cluster of CNMa-positive DN1ps, and a cluster of DN3s (Ma et al., 2021), which is more consistent with our dissection.

      Despite the well-known expression of PDF in LNvs and PDFR in s-LNvs (Renn et al., 1999; Shafer et al., 2008), we did not observe stable positive signals for both in Flp-out intersection experiments, although both Pdf-KI-LexA and Pdfr-KI-LexA label LNvs as expected (Fig S9B-S9C). We also noted fewer positive neurons in certain clock neuron subsets compared to previous reports, such as NPF in three LNds and some LNvs (Erion et al., 2016; He et al., 2013; Hermann et al., 2012; Johard et al., 2009; Lee et al., 2006) and ChAT in four LNds and the 5th s-LNv (Johard et al., 2009; Duhart et al., 2020) (Table S4). We attribute this limitation to the inefficiency of LexAop-FRT-myr::GFP driven by LexA, acknowledging that our intersection results may miss some positive signals.

      Point 2-Related to this, the authors rather inaccurately suggest that the field's understanding of PdfR expression within the clock neuron network is "inconsistent" and "variable" (lines 368-377). This is not accurate. It is true that the first attempts to map PdfR expression with antisera and GAL4s were inaccurate. However, subsequent work by several groups has produced strong convergent evidence that, with the exception of the l-LNvs after several days post-eclosion, PdfR is expressed in the Cryptochrome-expressing subset of the clock neuron network. This section of the study should be revised.

      We thank the reviewer for pointing this out. As we have already addressed and revised the related part in the RESULTS section (Line 308-317), we have now removed this part from the DISCUSSION section of the revised version.

      Point 3-One minor issue, which if addressed would avoid unnecessary confusion for readers familiar with the circadian literature, is the way that activity profiles are plotted in the study. The authors have centered their averaged activity profiles on the 12h of darkness. This is the opposite of the practice of the field, and it leads to some initial confusion in the examination of the morning and evening peak data. The authors may wish to avoid this by centering their activity plots on the 12h light phase, which would put the morning peak on the left and the evening peak on the right. This is the way the field is accustomed to examining locomotor activity profiles.

      The averaged activity profiles are centered on the 12 h of darkness to highlight the phenotype of advanced morning activity. To prevent any confusion among readers, we have included a sentence in the figure legend explaining how our activity profiles differ from those in the previous literature: "Activity profiles were centered of the 12 h darkness in all figures with evening activity on the left and morning activity on the right, which is different from general circadian literatures. (Fig. 2H legend)" (Line 957-959)

      Point 4-The authors conclude that the loss of PDF and CNMa have opposite effects on the morning peak of locomotor activity (line 392). But they also acknowledge, briefly, that things are not that simple: loss of CNMa causes a phase advance, but loss of PDF causes a loss or reduction in the anticipatory peak. It is still significant to find a peptide transmitter with the clock neuron network that regulates morning activity, but the authors should revise their conclusion regarding the opposing actions of PDF and CNMa, which is not well supported by the data.

      We have revised the relevant parts.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      Point 5-The authors should acknowledge, cite, and incorporate the substantive discussion of CNMa peptide and the DN1p neuronal class in Reinhard et al. 2022 (Front Physiol. 13: 886432).

      We have revised the text accordingly and cited this paper: “Type I with two neurons whose branches projecting to the anterior region, as in CNMa∩GMR51H05, CNMa∩Pdfr, and CNMa∩GMR79A11 (Fig. 6E, 5G, 6H), and type II with four neurons branching on the posterior side with few projections to the anterior region, as in CNMa∩GMR91F02 (Fig. 6F). These two types of DN1ps’ subsets were also reported and profound discussed previously (Lamaze et al., 2018; Reinhard et al., 2022)”. (Line 393-397)

      Reviewer #3 (Recommendations For The Authors):

      Point 1-Throughout the manuscript figure legends (axis, genotypes, etc) are too small to be appreciated. Fig. 1. Panel A. The labels are very difficult to read.

      We have attempted to enlarge the font as much as possible in the revised version.

      Point 2-Fig. 1. H-J Why is efficiency not mentioned in all the examples?

      The results of Fig. 1H-1J are now discussed in the revised manuscript (Line 145-147). We did not calculate the exact efficiency because the GFP intensity is not stable enough: it might change with dissection, mounting, or laser intensity during our experimental process. Therefore, in all results related to GFP signal (Fig. 1B-1J, Fig. S1, Fig. S2, Fig. 2B-2D), we relied on qualitative judgment rather than quantitative judgment, unless the GFP signal was easily quantifiable (such as in cases with limited cells or no GFP signal in the experimental group).

      Point 3-Fig. 1. Panel L, left (light phase): the statistical comparisons are not clearly indicated (the same happens in Figs 3Q and 3R).

      We have now re-arranged Fig. 1L and Fig. 3Q-3R to make the statistical comparisons clear in the new version.

      Point 4-Line 792. Could induced be introduced?

      Yes, we have now corrected this typo.

      Point 5-Fig. S1. Check labels for consistency. GMR57C10 Gal4 driver is most likely R57C10.

      We have now revised the labels (Fig. S1).

      Point 6-Fig. S2. If the experiments were repeated and several brains were observed, the authors should include the efficiency and the number of flies as reported in Fig. S1.

      We have now added the number of flies in Fig. S2 as reported in Fig. S1. As mentioned in our response to Point 2, due to the instability of the GFP signal, we are unable to provide a quantitative efficiency in this context.

      Point 7-Fig S4. The fig legend describes panels I-J which are not shown in the current version of the manuscript.

      We have now deleted them.

      Point 8-Fig 2I. Surprising values for morning anticipation indexes even for controls (0.5 would indicate "no anticipation"; in controls, the expected values would be >>0.5, as most of the activity is concentrated right before the transition). Could the authors explain this unexpected result?

      We have revised the description of the calculation in the Methods section (Line 612). After calculating the ratio of the activity in the last three hours to the total activity over six hours, we subtracted 0.5 from the result. Therefore, the index should be ≤0.5. An index equal to 0 indicates no morning anticipation.
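      As an illustration of the calculation described above, a minimal sketch is given below. It assumes that activity is supplied as six hourly bins for the six hours before lights-on, oldest first; the function name, the binning, and the handling of flies with no activity are our assumptions for illustration, not the analysis code used in the study.

```python
def morning_anticipation_index(hourly_activity):
    """Ratio of activity in the last 3 h to activity over the whole 6 h,
    minus 0.5 (hypothetical helper).

    `hourly_activity` holds six hourly activity counts, oldest first.
    Returns None when there is no activity at all, because the ratio
    is then undefined (this handling is an assumption).
    """
    if len(hourly_activity) != 6:
        raise ValueError("expected exactly six hourly bins")
    total = sum(hourly_activity)
    if total == 0:
        return None
    last_three_hours = sum(hourly_activity[3:])
    return last_three_hours / total - 0.5


# Evenly spread activity gives 0 (no anticipation); activity confined
# to the last three hours gives the maximum value of 0.5.
print(morning_anticipation_index([1, 1, 1, 1, 1, 1]))
print(morning_anticipation_index([0, 0, 0, 1, 2, 3]))
```

      Because the ratio lies between 0 and 1, the index defined this way ranges from -0.5 to 0.5, which is consistent with the statement that it cannot exceed 0.5.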

      Point 9-Fig 2K/L. The authors mention that not all genes are effectively knocked out with their strategy. Could this be accounted for the specific KD strategy, its duration, or the promotor strength? It is surprising no explanation is provided in the text (page 9 line 179).

      In our pursuit of establishing a broadly effective method for gene editing, Fig. 2H-2L and Fig. 2D revealed that previous attempts have fallen short of achieving this objective. The observed inefficiency may be attributed to the intensity of the promoter, resulting in inadequate expression. Alternatively, the insufficient duration of the operation may also contribute to the lack of success. However, in the context of sleep and rhythm research applications, the age of the fruit fly tests is typically fixed, limiting the potential to enhance efficiency by extending the manipulation time. Moreover, increasing the expression level may pose challenges related to cytotoxicity, as reported in previous studies (Port et al., 2014). We refrain from offering specific explanations, as we lack a definitive plan and cannot provide additional robust evidence to support the above speculations. Consequently, in our ongoing efforts, we aim to enhance the efficiency of the tool system while operating within the current constraints.

      Point 10-Page 9, line 179. Can the authors include a brief description of the reason for the different modifications? Only one was referenced.

      We have revised the related part in the manuscript (Line 223-231):

      Cas9.M9: We fused a chromatin-modulating peptide (Ding et al., 2019), HMGN1 (High mobility group nucleosome binding domain 1), at the N-terminus of Cas9 and HMGB1 (High mobility group protein B1) at its C-terminus with a GGSGP linker, termed Cas9.M9.

      Cas9.M6: We also obtained a modified Cas9.M6 with HMGN1 at the N-terminus and an undefined peptide (UDP) at the C-terminus. (Note: the UDP was obtained by accident.)

      Cas9.M0: We replaced the STARD linker between Cas9 and NLS in Cas9.HC with the GGSGP linker (Zhao et al., 2016), termed Cas9.M0.

      Point 11-The authors tested the impact of KO nAChR2 across the different versions of conditional disruption (Fig 1K-L, Fig 2L, Fig 3R). It is surprising they observe a difference in daytime sleep upon knocking down with Cas9.HC (2L) but not with Cas9.M9 (3R) and the reverse is seen for night-time sleep. Could the authors provide an explanation? Efficiency is not the issue at stake, is it?

      In Fig. 2K, the day sleep of flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; UAS-Cas9/+) was significantly decreased compared to flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; +/+), but not when compared to flies (R57C10-GAL4/+; UAS-Cas9/+). Our criterion for asserting a difference is that the experimental group must show a significant distinction from both control groups. Therefore, we concluded that there was no significant difference between the experimental group and the control groups in Fig. 2K.

      Point 12-Fig. 4. Which of the two strategies described in A-B was employed to assemble the expression profile of CCT genes in clock neurons shown in C? This information should be part of the fig legend.

      We have now revised the legend as follows: “(A-B) Schematic of intersection strategies used in Clk856 labelled clock neurons dissection, Flp-out strategy (A) and split-LexA strategy (B). The exact strategy used for each gene is annotated in Table S5.”

      Point 13-Similarly, how many brains were analyzed to give rise to the table shown in C?

      We have now revised the legend of Table S4 to address this concern. As indicated in: “The largest N# for each gene in Table S4 is the brain number analyzed for each gene”.

      Point 14-Finally, the sentence ¨The figure is...¨ requires revision.

      We have now revised it: “The exact cell number for each subset is annotated in Table S4”.

      Point 15-Legend to Table S3. The authors have done an incredible job testing many gRNAs for each gene potentially relevant for communication. However, there is very little information to make the most out of it; for instance, the legend does not inform why many of the targeted genes do not appear to have been tested any further. It would be useful to the reader to discern whether despite being the 3 most efficient gRNAs, they were still not effective in targeting the gene of interest, or whether they showed off-targets, or it was simply a matter of testing the educated guesses. This information would be invaluable for the reader.

      First, we designed and generated transgenic UAS-sgRNA fly lines for all these sgRNAs. We randomly selected 14 receptor genes, known for their difficulty in editing based on our experience, to assess the efficiency of our strategy, as depicted in Fig. 3M-3P, Fig. S5, and Fig. S6. We believe these results are representative and indicative of the efficiency of sgRNAs designed using our process and applied with the modified Cas9.

      Secondly, we acknowledge your valid concern. While we selected sgRNAs with no predicted off-target effects through various prediction models (outlined in the Methods under C-cCCTomics sgRNA design), we did not conduct whole-genome sequencing. Consequently, we can only assert that the off-target possibility is relatively low. To address potential misleading effects arising from off-target concerns, it is essential to validate these results through mutants, RNAi, or alternative UAS-sgRNAs targeting the same gene.

      Point 16-Table S4. Some of the data presented derives from observations made in 1-2 brains for a specific cluster; isn´t it too little to base a decision on whether a certain gene is (or not) expressed? It is surprising since the same CCT line was observed/analysed in more brains for other clusters. Can the authors explain the rationale?

      The N# number represents the GFP positive number, and we have revised the legend of Table S4. The largest N# number denotes the total number of brains analyzed for a specific CCT line. It's possible that, due to variations in our dissection or mounting process, some clusters were only observed in 1-2 brains out of the total brains analyzed. To enhance the accuracy of intersection analysis results, we marked all positive signal records when positive subsets were found in less than 1/3 of the total analyzed brains (Table S4).

      Point 17-The paragraph describing this data in the results section needs revising (lines 233-243).

      We have now revised this. (Line 286-317)

      Point 18-While it is customary for authors to attempt to improve the description of the activity patterns by introducing new parameters (i.e. MAPI and EAPI, lines 253-258) it would be interesting to understand the difference between the proposed method and the one already in use (which compares the same parameter, i.e., the slope (defined as ¨the slope of the best-fitting linear regression line over a period of 6 h prior to the transition¨, i.e., Lamaze et al. 2020 and many others). Is there a need to introduce yet another one?

      This approach is necessary. The slope defined by Lamaze et al. utilizes data from only 2 time points, which may not accurately capture the pattern within a period before lights-on or lights-off. Linear regression is not well-suited for a single fly due to the high variability in activity at each time point, making it challenging to fit the model at the individual level. The parameters we have introduced (MAPI and EAPI) in this paper are concise and can be applied at the individual level, effectively reflecting the morning or evening anticipation characteristics of each fly.

      As an alternative, the activity plot of a certain fly line could be represented by an average of all flies' activity in one experiment. This would make linear regression easier to fit. However, several independent experiments are required for statistical robustness, necessitating the inclusion of hundreds of flies for each strain in a single analysis.
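      For comparison, the group-level alternative described above (averaging all flies in one experiment, then fitting a line) might look like the following sketch. It is not taken from the manuscript or from any published analysis pipeline: the function names, the half-hour default bin width, and the use of an ordinary least-squares slope over the six hours before the transition are assumptions for illustration.

```python
from typing import Sequence


def mean_profile(flies: Sequence[Sequence[float]]) -> list[float]:
    """Average activity per time bin across all flies in one experiment."""
    if not flies:
        raise ValueError("need at least one fly")
    n_bins = len(flies[0])
    if any(len(fly) != n_bins for fly in flies):
        raise ValueError("all flies must have the same number of bins")
    return [sum(fly[i] for fly in flies) / len(flies) for i in range(n_bins)]


def ols_slope(activity: Sequence[float], bin_hours: float = 0.5) -> float:
    """Least-squares slope (activity per hour) of activity against time."""
    n = len(activity)
    if n < 2:
        raise ValueError("need at least two bins to fit a line")
    times = [i * bin_hours for i in range(n)]
    mean_t = sum(times) / n
    mean_y = sum(activity) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, activity))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


# Two hypothetical flies, twelve half-hour bins before the transition.
flies = [
    [0, 1, 0, 2, 1, 3, 2, 4, 5, 4, 7, 9],
    [1, 0, 2, 1, 3, 2, 4, 3, 6, 5, 8, 8],
]
print(ols_slope(mean_profile(flies)))
```

      Each experiment yields a single slope per genotype under this scheme, which is why several independent experiments would be needed for statistics, whereas the index is computed for every fly.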

      Point 19-In general, the legends of supplementary figures are a bit too brief. S7 and S8: it is not clear which of the two intersectional strategies were used (it would benefit whoever is interested in replicating the experiments). Legend to Fig S8 should read ¨similar to Fig S7¨.

      We have now revised the legend and included “The exact strategy used for each gene is annotated in Table S5” in the legend.

      Point 20-The legend in Table S6 should clearly state the genotypes examined. What does the marking in bold refer to?

      We have now revised the annotation of Table S6. Bold marking refers to results that fall outside one SD of the control group.

      Point 21-Line 314. The sentence needs revision.

      We have revised these sentences.

      Point 22-Line 391 (and also in the results section). The authors attempt to describe the CNMa phenotype as the opposite of pdf/pdfr mutant phenotypes. However, no morning anticipation/advanced morning anticipation are not necessarily opposite phenotypes.

      We have revised the related description.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      References

      Deng, B., Li, Q., Liu, X., Cao, Y., Li, B., Qian, Y., Xu, R., Mao, R., Zhou, E., Zhang, W., et al. (2019). Chemoconnectomics: mapping chemical transmission in Drosophila. Neuron 101, 876-893.e874.

      Ding, X., Seebeck, T., Feng, Y., Jiang, Y., Davis, G.D., and Chen, F. (2019). Improving CRISPR-Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J 2, 51-63.

      Duhart, J.M., Herrero, A., de la Cruz, G., Ispizua, J.I., Pírez, N., and Ceriani, M.F. (2020). Circadian Structural Plasticity Drives Remodeling of E Cell Output. Curr Biol 30, 5040-5048.e5045.

      Erion, R., King, A.N., Wu, G., Hogenesch, J.B., and Sehgal, A. (2016). Neural clocks and Neuropeptide F/Y regulate circadian gene expression in a peripheral metabolic tissue. eLife 5, e13552.

      Fujiwara, Y., Hermann-Luibl, C., Katsura, M., Sekiguchi, M., Ida, T., Helfrich-Förster, C., and Yoshii, T. (2018). The CCHamide1 neuropeptide expressed in the anterior dorsal neuron 1 conveys a circadian signal to the ventral lateral neurons in Drosophila melanogaster. Front Physiol 9, 1276.

      Goda, T., Tang, X., Umezaki, Y., Chu, M.L., Kunst, M., Nitabach, M.N.N., and Hamada, F.N. (2016). Drosophila DH31 neuropeptide and PDF receptor regulate night-onset temperature preference. J Neurosci 36, 11739-11754.

      Goda, T., Umezaki, Y., Alwattari, F., Seo, H.W., and Hamada, F.N. (2019). Neuropeptides PDF and DH31 hierarchically regulate free-running rhythmicity in Drosophila circadian locomotor activity. Sci Rep 9, 838.

      Guo, F., Cerullo, I., Chen, X., and Rosbash, M. (2014). PDF neuron firing phase-shifts key circadian activity neurons in Drosophila. eLife 3.

      He, C., Cong, X., Zhang, R., Wu, D., An, C., and Zhao, Z. (2013). Regulation of circadian locomotor rhythm by neuropeptide Y-like system in Drosophila melanogaster. Insect Mol Biol 22, 376-388.

      Hermann, C., Yoshii, T., Dusik, V., and Helfrich-Förster, C. (2012). Neuropeptide F immunoreactive clock neurons modify evening locomotor activity and free-running period in Drosophila melanogaster. J Comp Neurol 520, 970-987.

      Hyun, S., Lee, Y., Hong, S.T., Bang, S., Paik, D., Kang, J., Shin, J., Lee, J., Jeon, K., Hwang, S., et al. (2005). Drosophila GPCR Han is a receptor for the circadian clock neuropeptide PDF. Neuron 48, 267-278.

      Johard, H.A., Yoshii, T., Dircksen, H., Cusumano, P., Rouyer, F., Helfrich-Förster, C., and Nässel, D.R. (2009). Peptidergic clock neurons in Drosophila: ion transport peptide and short neuropeptide F in subsets of dorsal and ventral lateral neurons. J Comp Neurol 516, 59-73.

      Lamaze, A., Krätschmer, P., Chen, K.F., Lowe, S., and Jepson, J.E.C. (2018). A Wake-Promoting Circadian Output Circuit in Drosophila. Curr Biol 28, 3098-3105.e3093.

      Lear, B.C., Zhang, L., and Allada, R. (2009). The neuropeptide PDF acts directly on evening pacemaker neurons to regulate multiple features of circadian behavior. PLoS Biol 7, e1000154.

      Lee, G., Bahn, J.H., and Park, J.H. (2006). Sex- and clock-controlled expression of the neuropeptide F gene in Drosophila. Proc Natl Acad Sci USA 103, 12580-12585.

      Lelito, K.R., and Shafer, O.T. (2012). Reciprocal cholinergic and GABAergic modulation of the small ventrolateral pacemaker neurons of Drosophila's circadian clock neuron network. J Neurophysiol 107, 2096-2108.

      Ma, D., Przybylski, D., Abruzzi, K.C., Schlichting, M., Li, Q., Long, X., and Rosbash, M. (2021). A transcriptomic taxonomy of Drosophila circadian neurons around the clock. eLife 10.

      Port, F., Chen, H.M., Lee, T., and Bullock, S.L. (2014). Optimized CRISPR/Cas tools for efficient germline and somatic genome engineering in Drosophila. Proc Natl Acad Sci USA 111, E2967-2976.

      Reinhard, N., Schubert, F.K., Bertolini, E., Hagedorn, N., Manoli, G., Sekiguchi, M., Yoshii, T., Rieger, D., and Helfrich-Förster, C. (2022). The Neuronal Circuit of the Dorsal Circadian Clock Neurons in Drosophila melanogaster. Front Physiol 13, 886432.

      Renn, S.C., Park, J.H., Rosbash, M., Hall, J.C., and Taghert, P.H. (1999). A pdf neuropeptide gene mutation and ablation of PDF neurons each cause severe abnormalities of behavioral circadian rhythms in Drosophila. Cell 99, 791-802.

      Shafer, O.T., Helfrich-Förster, C., Renn, S.C., and Taghert, P.H. (2006). Reevaluation of Drosophila melanogaster's neuronal circadian pacemakers reveals new neuronal classes. J Comp Neurol 498, 180-193.

      Shafer, O.T., Kim, D.J., Dunbar-Yaffe, R., Nikolaev, V.O., Lohse, M.J., and Taghert, P.H. (2008). Widespread receptivity to neuropeptide PDF throughout the neuronal circadian clock network of Drosophila revealed by real-time cyclic AMP imaging. Neuron 58, 223-237.

      Zhang, L., Chung, B.Y., Lear, B.C., Kilman, V.L., Liu, Y., Mahesh, G., Meissner, R.A., Hardin, P.E., and Allada, R. (2010). DN1(p) circadian neurons coordinate acute light and PDF inputs to produce robust daily behavior in Drosophila. Curr Biol 20, 591-599.

      Zhao, P., Zhang, Z., Lv, X., Zhao, X., Suehiro, Y., Jiang, Y., Wang, X., Mitani, S., Gong, H., and Xue, D. (2016). One-step homozygosity in precise gene editing by an improved CRISPR/Cas9 system. Cell Res 26, 633-636.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study provides an unprecedented understanding of the roles of different combinations of NaV channel isoforms in nociceptors' excitability, with relevance for the design of better strategies targeting NaV channels to treat pain. Although the experimental combination of electrophysiological, modeling, imaging, molecular biology, and behavioral data is convincing and supports the major claims of the work, some conclusions need to be strengthened by further evidence or discussion. The work may be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Xie, Prescott, and colleagues have reevaluated the role of Nav1.7 in nociceptive sensory neuron excitability. They find that nociceptors can make use of different sodium channel subtypes to reach equivalent excitability. The existence of this degeneracy is critical to understanding neuronal physiology under normal and pathological conditions and could explain why Nav subtype-selective drugs have failed in clinical trials. More concretely, nociceptor repetitive spiking relies on Nav1.8 at DIV0 (and probably under normal conditions in vivo), but on Nav1.7 and Nav1.3 at DIV4-7 (and after inflammation in vivo).

      The conclusions of this paper are mostly well supported by data, and these findings should be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Strengths:

      (1.1) The authors have employed elegant electrophysiology experiments (including specific pharmacology and dynamic clamp) and computational simulations to study the excitability of a subpopulation of DRGs that would very likely match with nociceptors (they take advantage of using transgenic mice to detect Nav1.8-expressing neurons). They make a strong point showing the degeneracy that occurs at the ion channel expression level in nociceptors, adding this new data to previous observations in other neuronal types. They also demonstrate that the different Nav subtypes functionally overlap and are able to interchange their "typical" roles in action potential generation. As Xie, Prescott, and colleagues argue, the functional implications of the degenerate character of nociceptive sensory neuron excitability need to be seriously taken into account regarding drug development and clinical trials with Nav subtype-selective inhibitors.

      Weaknesses:

      (1.2) The next comments are minor criticisms, as the major conclusions of the paper are well substantiated. Most of the results presented in the article have been obtained from experiments with DRG neuron cultures, and surely there is a greater degree of complexity and heterogeneity in the degeneracy of nociceptor excitability in the "in vivo" condition. Indeed, the authors show in Figures 7 and 8 data that support their hypothesis and an increased influence of Nav1.7 on nociceptor excitability after inflammation, but also a higher variability in the nociceptors' spiking responses. On the other hand, DRG neurons targeted in this study (YFP (+) after crossing with Nav1.8-Cre mice) are >90% nociceptors, but not all nociceptors express Nav1.8 in vivo. As shown by Li et al., 2016 ("Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity"), there is a high heterogeneity of neuron subtypes within sensory neurons. Therefore, some caution should be taken when translating the results obtained with the DRG neuron cultures to the more complex "in vivo" panorama.

      We agree that most but not all Nav1.8+ DRG cells are nociceptors and that not all nociceptors express Nav1.8. We targeted small neurons that also express (or at some point expressed) Nav1.8, thus excluding larger neurons that express Nav1.8. This allowed us to home in on a relatively homogeneous set of neurons, which is crucial when testing different neurons to compare between conditions (as opposed to testing longitudinally in the same neuron, which is not feasible). We expect all neurons are degenerate but likely on the basis of different ion channel combinations. Indeed, even within small Nav1.8+ neurons, other channels that we did not consider likely contribute to the degenerate regulation (as now better reflected in the revised Discussion).

      That said, there are multiple sources of heterogeneity. We suspect that heterogeneity increases more after inflammation than after axotomy because all DRG neurons experience axotomy when cultured, whereas neurons experience inflammation differently in vivo depending on whether their axon innervates the inflamed area (now explained on lines 214-215). This is not so much about whether the insult occurs in vivo or in vitro, but about how homogeneously neurons are affected by the insult. Granted, neurons are indeed more likely to be heterogeneously affected in vivo since conditions are more complex. But our goal in testing PF-71 in behavioral tests (Fig. 8) was to show that changes observed in nociceptor excitability in Figure 7, despite heterogeneity, were predictive of changes in drug efficacy. In short, we establish Nav interchangeability by comparing neurons in culture (Figs 1-6), but we then show that similar Nav shifts can develop in vivo (Fig 7) with implications for drug efficacy (Fig 8). Such results should alert readers to the importance of degeneracy for drug efficacy (which is our main goal) even without a complete picture of nociceptor degeneracy or DRG neuron heterogeneity. Additions to the Discussion (lines 248-259, 304-308) are intended to highlight these considerations.

      (1.3) Although the authors have focused their attention on Nav channels, it should be noted that degeneracy concerning other ion channels (such as potassium ion channels) could also impact the nociceptor excitability. The action potential AHP in Figure 1, panel A is very different comparing the DIV0 (blue) and DIV4-7 examples. Indeed, the conductance density values for the AHP current are higher at DIV0 than at DIV7 in the computational model (supplementary table 5). The role of other ion channels in order to obtain equivalent excitability should not be underestimated.

      We completely agree. We focused on Nav channels because of our initial observation with TTX and because of industry’s efforts to develop Nav subtype-selective inhibitors, whose likelihood of success is affected by the changes we report. But other channels are presumably changing, especially given observed changes in the AHP shape (now mentioned on lines 304-308). Investigation should be expanded to include these other channels in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The authors have noted in preliminary work that tetrodotoxin (TTX), which inhibits NaV1.7 and several other TTX-sensitive sodium channels, has differential effects on nociceptors, dramatically reducing their excitability under certain conditions but not under others. Partly because of this coincidental observation, the aim of the present work was to re-examine or characterize the role of NaV1.7 in nociceptor excitability and its effects on drug efficacy. The manuscript demonstrates that a NaV1.7-selective inhibitor produces analgesia only when nociceptor excitability is based on NaV1.7. More generally and comprehensively, the results show that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes (NaV 1.3/1.7 and 1.8). This can cause widespread changes in the role of a particular subtype over time. The degenerate nature of nociceptor excitability shows functional implications that make the assignment of pathological changes to a particular NaV subtype difficult or even impossible.

      Thus, the analgesic efficacy of NaV1.7- or NaV1.8-selective agents depends essentially on which NaV subtype controls excitability at a given time point. These results explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      Strengths:

      (2.1) The above results are clearly and impressively supported by the experiments and data shown. All methods are described in detail, presumably allow good reproducibility, and were suitable to address the corresponding question. The only exception is the description of the computer model, which should be described in more detail.

      We failed to report basic information such as the software, integration method and time step in the original text. This information is now provided on lines 476-477. Notably, the full code is available on ModelDB plus all equations including the values for all gating parameters are provided in Supplementary Table 5 and values for maximal conductance densities for DIV0 and DIV7 models are provided in Supplementary Table 6. Changes in conductance densities to simulate different pharmacological conditions are reported in the relevant figure legends (now shown in red). We did not include model details in the main text to avoid disrupting the flow of the presentation, but all the model details are reported in the Methods, tables and/or figure legends.

      (2.2) The results showing that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and expression of different NaV subtypes are of great importance in the fields of basic and clinical pain research and sodium channel physiology and pharmacology, but also for a broad readership and community. The degenerate nature of nociceptor excitability, which is clearly shown and well supported by data has large functional implications. The results are of great importance because they may explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      In summary, the authors achieved their overall aim to enlighten the role of NaV1.7 in nociceptor excitability and the effects on drug efficacy. The data support the conclusions, although the clinical implications could be highlighted in a more detailed manner.

      Weaknesses:

      As mentioned before, the results that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes are impressive. However, there is some "gap" between the DRG culture experiments and acutely dissociated DRGs from mice after CFA injection. In the extensive experiments with cultured DRG neurons, different time points after dissociation were compared. Although it would have been difficult for functional testing to examine additional time points (besides DIV0 and DIV4-7), at least mRNA and protein levels should have been determined at additional time points (DIV) to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly.

      Characterizing the time course of NaV expression changes is worthwhile but, insofar as such details are not necessary to establish that excitability is degenerate, it was not included in the current study. Furthermore, since mRNA levels do not parallel the functional changes in Nav1.7 (Figure 6A), we do not think it would be helpful to measure mRNA levels at intermediate time points. Measuring protein levels would be more informative; however, as now explained on lines 362-369, neurons were recorded at intermediate time points in initial experiments and showed a lot of variability. Methods that could track fluorescently-tagged NaV channels longitudinally (i.e. at different time points in the same cell) would be well suited for this sort of characterization, but will invariably lead to more questions about membrane trafficking, phosphorylation, etc. We agree that a thorough characterization would be interesting but we think it is best left for a future study.

      It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. This would better link the following data demonstrating that in acutely dissociated nociceptors after CFA injection, the inflammation-induced increase in NaV1.7 membrane expression enhances the effect of (or more neurons respond to) the NaV1.7 inhibitor PF-71, whereas fewer CFA neurons respond to the NaV1.8 inhibitor PF-24.

      These are some of the many good questions that emerge from our results. We are not particularly keen to investigate what happens over several days in culture, since this is not so clinically relevant, but it would be interesting to compare changes induced by nerve injury in vivo (which usually involves neuroinflammatory changes) and changes induced by inflammation. Many previous studies have touched on such issues but we are cautious about interpreting transcriptional changes, and of course all of these changes need to be considered in the context of cellular heterogeneity. It would be interesting to decipher if changes in NaV1.7 and NaV1.8 are directly linked so that an increase in one triggers a decrease in the other, and vice versa. But of course many other channels are also likely to change (as discussed above), and they too warrant attention, which makes the problem quite difficult. We look forward to tackling this in future work.

      The results shown explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have important implications for the future development of Nav-selective analgesics. However, this point, which is also evident from the title of the manuscript, is discussed only superficially with respect to clinical outcomes. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how?

      We wish to avoid speculating on which particular clinical results are better explained because our study was not designed for that. Instead, our take-home message (which is well supported; see Discussion on lines 309-321) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. At the end of the results (line 235), which is, we think, what prompted the reviewer’s comment, we point to the Discussion. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is difficult. We certainly don’t have all the answers, but we hope our results will point readers in a new direction to help answer such questions.

      Another point directly related to the previous one, which should at least be discussed, is that all the data are from rodents, or in this case from mice, and this should explain the clinical data in humans. Even if "impediment to translation" is briefly mentioned in a slightly different context, one could (as mentioned above) discuss in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans.

      We are not aware of human data that speak directly to nociceptor degeneracy but degeneracy has been observed in diverse species; if anything, human neurons are probably even more degenerate based on progressive expansion of ion channel types, splice variants, etc. over evolution. Of course species differences extend beyond degeneracy and are always a concern for translation, because of a species difference in the drug target itself or because preclinical pain testing fails to capture the most clinically important aspects of pain (which we mention on line 35). Line 39 now reiterates that these explanations for translational difficulties are not mutually exclusive, but that degeneracy deserves greater consideration than it has hitherto received. Indeed, throughout our paper we imply that degeneracy may contribute to the clinical failure of Nav subtype-specific drugs, but those failures are certainly not evidence of degeneracy. In the Discussion (lines 320-321), we now cite a recent review article on degeneracy in the context of epilepsy, and point out how parallels might help inform pain research. We wish we had a more direct answer to the reviewer’s request; in the absence of this, we hope our results motivate readers to seek out these answers in future research.

      Although speculative, it would be interesting for readers to know whether a treatment regimen based on "time since injury" with NaV1.7 and NaV1.8 inhibitors might offer benefits. Based on the data, could one hypothesize that NaV1.7 inhibitors are more likely to benefit (albeit in the short term) in patients with neuropathic pain with better patient selection (e.g., defined interval between injury and treatment)?

      We like that our data prompt this sort of prediction. However, this is potentially complicated since the injury may be subtle, which is to say that the exact timing may not be known. There are scenarios (e.g. postoperative pain) where the timing of the insult is known, but in other cases (e.g. diabetic neuropathy) the disease process is quite insidious, and different neurons might have progressed through different stages depending on how they were exposed to the insult. Our own experiments with CFA are a case in point. Notwithstanding the potential difficulties about gauging the time course, any way of predicting which Nav subtype is dominant could help more strategically choose which drug to use.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors used patch-clamp to characterize the implication of various voltage-gated Na+ channels in the firing properties of mouse nociceptive sensory neurons. They report that depending on the culture conditions NaV1.3, NaV1.7, and NaV1.8 have distinct contributions to action potential firing and that similar firing patterns can result from distinct relative roles of these channels. The findings may be relevant for the design of better strategies targeting NaV channels to treat pain.

      Strengths:

      The paper addresses the important issue of understanding, from an interesting perspective, the lack of success of therapeutic strategies targeting NaV channels in the context of pain. Specifically, the authors test the hypothesis that different NaV channels contribute in a plastic manner to action potential firing, which may be the reason why it is difficult to target pain by inhibiting these channels. The experiments seem to have been properly performed and most conclusions are justified. The paper is concisely written and easy to follow.

      Weaknesses:

      (1) The most critical issue I find in the manuscript is the claim that different combinations of NaV channels result in equivalent excitability. For example, in the Abstract it is stated that: "...we show that nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8". The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same. I think that the culprit of this issue is that the authors reach their conclusion from the comparison of the (average) firing rate determined over 1 s current stimulation in distinct conditions. However, this is not the only parameter that determines how sensory neurons convey information. For instance, the time dependence of the instantaneous frequency, the actual firing pattern, may be important too. Moreover, the use of 1 s of current stimulation might not be sufficient to characterize the firing pattern if one wants to obtain conclusions that could translate to clinical settings (i.e., sustained pain). A neuron in which NaV1.7 is the main contributor is expected to have a damping firing pattern due to cumulative channel inactivation, whereas another depending mainly on NaV1.8 is expected to display more sustained firing. This is actually seen in the results of the modelling.

      This concern seems to boil down to how equivalent is equivalent? The spike shape or the full input-output curve for a DIV0 neuron (Nav1.8-dominant) is never equivalent to what’s seen in a DIV4-7 neuron (Nav1.7-dominant), but nor are any two DIV0 neurons strictly equivalent, and likewise for any two DIV4-7 neurons. Our point is that DIV0 and DIV4-7 neurons are far more similar (less discriminable) in their excitability than expected from the qualitative difference in their TTX sensitivity (and from repeated claims in the literature that Nav1.7 is necessary for spike generation in nociceptors). Nav isoforms need not be identical to operate similarly; for instance, Nav1.8 tends to activate at “suprathreshold” voltages, but this depends on the value of threshold; if threshold increases, Nav1.8 can activate at subthreshold voltages (see Fig 5). We have modified lines 155-175 to help clarify this.

      We completely agree that firing rate is not the only way to convey sensory information, and of course injecting current directly into the cell body via a patch pipette is not a natural stimulus. These are all factors to keep in mind when interpreting our data. Nonetheless, our data show that excitability is similar between DIV0 and DIV4-7, so much so that data from any one neuron (without pharmacological tests or capacitance measurements) would likely not reveal if that cell is DIV0 or DIV4-7; this “indiscriminability” qualifies as “equivalent” for our purposes, and is consistent with phrasing used by other authors studying degeneracy. Notably, not every DIV4-7 neuron exhibits spike height attenuation (see Fig. 1A), likely because of concomitant changes in the AHP that were not captured in our computer model or directly tested in our experiments. This highlights that other channel changes may also contribute to degeneracy and the maintenance of repetitive spiking.

      (2) In Fig. 1, is 100 nM TTX sufficient to inhibit all TTX-sensitive NaV currents? Values more commonly used in the literature to fully inhibit these currents are between 300 and 500 nM. The currents shown as TTX-sensitive in Fig. 1D look very strange (not like the ones at Baseline DIV4-7). It seems that 100 nM TTX was not enough, leading to an underestimation of the amplitude of the TTX-sensitive currents.

      As now summarized in Supplementary Table 3 (which is newly added), 100 nM TTX is >20x the EC50 for Nav1.3 and Nav1.7 (but is still far below the EC50 for Nav1.8). Based on this, TTX-sensitive channels are definitely blocked in our TTX experiments.
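
      The concentration argument above can be made concrete with a short sketch. It is an illustration only, not the authors' analysis: it assumes a simple one-site Hill relation with a Hill coefficient of 1, and the function name `fraction_blocked` and the numerical values are hypothetical, chosen only to reproduce the ratios mentioned in the text (20x the half-maximal concentration, and a concentration far below it).

```python
def fraction_blocked(conc_nM, half_max_nM, hill=1.0):
    """Fraction of channels blocked under a simple Hill relation.

    Assumption: one-site binding (hill = 1) unless stated otherwise.
    """
    ratio = (conc_nM / half_max_nM) ** hill
    return ratio / (1.0 + ratio)


# Hypothetical half-maximal value chosen so that 100 nM is exactly 20x above it.
print(round(fraction_blocked(100.0, 5.0), 3))      # about 0.952

# Hypothetical resistant channel: 100 nM is 100x below the half-maximal value.
print(round(fraction_blocked(100.0, 10000.0), 3))  # about 0.01
```

      Under these assumptions, a concentration at least 20-fold above the half-maximal value blocks roughly 95% or more of the sensitive channels, while a channel whose half-maximal value lies far above the applied concentration is left almost untouched.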

      (3) Page 8, the authors conclude that "Inflammation caused nociceptors to become much more variable in their reliance of specific NaV subtypes". However, how did the authors ensure that all neurons tested were affected by the CFA model? It could be that the heterogeneity in neuron properties results from distinct levels of effects of CFA.

      We agree with the reviewer. We also believe that variable exposure to CFA is the most likely explanation for the heightened variability in TTX-sensitivity reported in Figure 7 (now more clearly explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the CFA and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing), and that demonstration is true even if some neurons do not switch. Subsequent testing in Figure 8 shows that enough neurons switch to have a meaningful effect in terms of the behavioral pharmacology. So, notwithstanding tangential concerns, we think our CFA experiments succeeded in showing that Nav channels can switch in vivo and that this impacts drug efficacy.

      Recommendations for the authors:

      All reviewers agreed that these results are solid and interesting. However, the reviewers also raised several concerns that should be addressed by the authors to improve the strength of the evidence presented. Revisions considered to be essential include:

      (1) Discuss how degeneracy concerning other ion channels (such as potassium ion channels) could also impact nociceptor excitability (reviewer #1). Additionally, the translation of results from DRG neuron cultures to "in vivo" nociceptors should be better discussed.

      We have added a new paragraph to the Discussion (line 248-259) to remind readers that despite our focus on Nav channels, other ion channels likely also change (and that these changes involve diverse regulatory mechanisms that require further investigation). Likewise, despite our focus on the changes caused by culturing neurons, we remind readers that subtler, more clinically relevant in vivo perturbations can likewise cause a multitude of changes. We end that paragraph by emphasizing that although accounting for all the contributing components is required to fully understand a degenerate system, meaningful progress can be made by studying a subset of the components. We want to emphasize this because there is some middle ground between focusing on one component at a time (which is the norm) vs. trying to account for everything (which is an infeasible ideal). Additional text on lines 304-308 also addresses related points.

      (2) Discuss how different combinations of NaV channels result in equivalent excitability, in the context of the experimental conditions used (see main comment by reviewer #3). It should also be discussed in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans (reviewer #2).

      Regarding the first part of this comment, reviewer 3 wrote in the public review that “The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same.” Differences in gating properties are commonly used to argue that different Nav subtypes mediate different phases of the spike, for example, that Nav1.7 initiates the spike whereas Nav1.8 mediates subsequent depolarization because Nav1.7 and Nav1.8 activate at perithreshold and suprathreshold voltages, respectively (see lines 134-135, now shown in red). But such comparison is overly simplistic insofar as it neglects the context in which ion channels operate. For instance, if Nav1.7 is not expressed or fully inactivates, voltage threshold will be less negative, enabling Nav1.8 to contribute to spike initiation; in other words, previously “suprathreshold” voltages become “perithreshold”. Figure 5 is dedicated to explaining this context-sensitivity; specifically, we demonstrate with simulations how Nav1.8 takes over responsibility for initiating a spike when Nav1.7 is absent or inactivated. Text on lines 155-184 has been edited to help clarify this. Regarding the second part of this comment, we are not aware of any direct evidence from human sensory neurons that different sodium channels produce equivalent excitability, but that is certainly what we expect. We suggest that failure of Nav subtype-specific drugs is, at least in part, because of degeneracy, but such failures do not demonstrate degeneracy unless other contributing factors can be excluded (which they can’t). Recognizing degeneracy is difficult, and so variability that might be explained by degeneracy will go unexplained or attributed to other factors unless, by design or serendipity, experiments quantify the effects of degeneracy (as we have attempted to do here).
      We now cite a recent review article on degeneracy and epilepsy (line 320), which addresses relevant themes that might help inform pain research; for instance, most existing antiseizure medications act on multiple targets whereas more recently developed single-target drugs have proven largely ineffective. This is similar to, but better documented than, the situation for analgesics. With this in mind, we revised the text to emphasize the circumstantial nature of existing evidence and the need to test more directly for degeneracy (lines 320-323).

      (3) Extend the discussion about the poor clinical outcomes with the use of subtype-selective NaV inhibitors. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how? (reviewer #2)

      As discussed above, we are cautious to avoid speculating on which clinical results are attributable to degeneracy. Instead, our take-home message (see Discussion, lines 309-323) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is not trivial. Interpreting clinical data is also complicated by the fact that we are either dealing with genetic mutations (with unclear compensatory changes) or pharmacological results (where NaV1.7-selective drugs have a multitude of problems that might contribute to their lack of efficacy, separate from effects of degeneracy). We have striven to contextualize our results (e.g. last paragraph of results, lines 222-235). We think this is the most we can reasonably say based on the limitations of existing clinical data.

      (4) Provide a clearer and more detailed description of the computational model (reviewers #2 and #3).

      We added important details on line 476-477 but, in our honest opinion, we think our computational model is thoroughly explained. The issue seems to boil down to whether details are included in the Results vs. being left for the Methods, tables and figure legends. We prefer the latter.

      (5) Better clarify the effects of the CFA model, to provide further evidence relating inflammation with nociceptors variability (reviewers #2 and #3)

      As explained in response to a specific point by reviewer #3, we believe that variable exposure to CFA explains the heightened variability in TTX-sensitivity reported in Figure 7 (now explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the inflammation and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing); that demonstration holds true even if some neurons do not switch. Subsequent testing (Fig 8) shows that enough neurons switch to impact drug efficacy assessed behaviorally. This is emphasized with new text on lines 225-227. Overall, we think our CFA experiments succeed in showing that Nav channels can switch in vivo and, despite variability, that this occurs in enough neurons to impact drug efficacy.

      (6) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews.

      Detailed responses are provided below for all feedback and changes to the text were made whenever necessary, as identified in our responses.

      Reviewer #1 (Recommendations For The Authors):

      Minor points/recommendations:

      Protein synthesis inhibition by cercosporamide could be the direct cause of a smaller-than-expected increase in Nav1.7 levels at DIV5. But for Nav1.8, there is a mitigation in the increased levels at DIV5, that only could be explained by several indirect mechanisms, including membrane trafficking and posttranslational modifications (phosphorylation, SUMOylation, etc.) on Nav1.8 or protein regulators of Nav1.8 channels. The authors suggest that "translational regulation is crucial", but also insinuate that other processes (membrane trafficking, etc.) could contribute to the observed outcome. It is difficult to assess the relative importance of these different explanations without knowing the exact mechanisms that are acting here.

      We agree. We relied on electrophysiology (and pharmacology) to measure functional changes, but we wanted to verify those data with another method. We expected mRNA levels to parallel the functional changes but, when that did not pan out, we proceeded to look at protein levels. Perhaps we should have stopped there, but by blocking protein translation, we show that there is not enough Nav1.7 protein already available that can be trafficked to the membrane. That does not explain why Nav1.8 levels drop. Our immunohistochemistry could not tease apart membrane expression from overall expression, which limits interpretation. We have enhanced the text to discuss this (lines 200-204), but further experiments are needed. Though admittedly incomplete, our initial findings help set the stage for future experiments on this matter.

      Page 15, typo: "contamination from genomic RNA" -> "contamination from genomic DNA" (appears twice).

      This has been corrected on lines 420 and 421.

      Page 17: I could not find the computer code at ModelDB (http://modeldb.yale.edu/267560). It seems to be an old web link. It should be available at some web repository.

      We confirmed that the link works. Entry is password-protected (password = excitability; see line 476). Password protection will be removed once the paper is officially published.

      Page 19, reference 36, typo: "Inhibitio of" -> "Inhibition of".

      This has been corrected (line 557).

      Page 33, typo: "are significantly larger than differences at DIV1" -> "are significantly larger than differences at DIV0".

      This has been corrected (line 796).

      Page 35, figure 6 legend. The number of experiments (n) is not indicated for panel C data.

      N = 3 is now reported (line 828).

      Reviewer #2 (Recommendations For The Authors):

      p. 3/4 and Data of Fig. 6: It should be commented on why days 1-3 were not investigated. An investigation of the time course (by higher frequency testing) would certainly have an added value because it would be possible to deduce whether the changes develop slowly and gradually, or whether the excitability induced by different NaVs changes suddenly. At least mRNA and protein levels should be determined at additional time points to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly. It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. Or is the latter question clear in the literature?

      We now explain (lines 362-369) that intermediate time points (DIV1-3) were tested in initial current clamp recordings. Those data showed that TTX-sensitivity stabilized by DIV4 and differed from the TTX-insensitivity observed at DIV0. TTX-sensitivity was mixed at DIV1-3 and cross-cell variability complicated interpretation. Subsequent experiments were prioritized to clarify why NaV1.7 is not always critical for nociceptor excitability, contrary to past studies. Our efforts to measure mRNA and protein levels were primarily to validate our electrophysiological findings; we are also interested in deciphering the underlying regulatory processes but this is an entire study on its own. Unfortunately, the existing literature does not help or point to an explanation for the Nav1.7/1.8 shift we observed.

      Our evidence that mRNA levels do not parallel functional changes argues against pursuing transcriptional changes in Nav1.7, though transcriptional changes in other factors might be important. Interpretation of immuno quantification would be complicated by the high variability we observed with the physiology at intermediate time points and, furthermore, we cannot resolve surface expression from overall expression based on available antibodies. Methods conducive to longitudinal measurements would be more appropriate (as now mentioned on lines 367-369). In short, a lot more work is required to understand the mechanisms involved in the switch, but we think the existing demonstration suffices to show that NaV1.7 and NaV1.8 protein levels vary, with crucial implications for which Nav subtype controls nociceptor excitability, and important implications for drug efficacy. Explaining why and how quickly those protein levels change will be no small feat and is best left for a future study.

      p. 4 and following: In order to enable the interpretation of the used concentration of PF-24, PF-71, and ICA, the respective IC50 should be indicated.

      A table (now Supplementary Table 3; line 861) has been added to report EC50 values for all drugs for blocking NaV1.7, NaV1.8 and NaV1.3. The concentrations we used are included on that table for easy comparison.

      p. 5, end of the middle paragraph: Here it should be briefly explained -for less familiar readers- why NaV1.1 cannot be causative (ICA inhibits NaV1.1 and 1.3).

      We now explain (lines 117-120) that NaV1.1 is expressed almost exclusively in medium-diameter (A-delta) neurons whereas NaV1.3 is known to be upregulated in small-diameter neurons, and so the effect we observe in small neurons is most likely via blockade of NaV1.3.

      p. 6, lines 4/5: At least once it should read computer model instead of model.

      “Computer” has been added the first time we refer to DIV0 or DIV4-7 computer models (lines 138-139).

      p. 6: the difference between Fig. 4B and Fig. 4 - Figure suppl. 1 should be mentioned briefly.

      We now explain (lines 150-154) that Fig. 4B involves replacing a native channel with a different virtual channel (to demonstrate their interchangeability) whereas Fig. 4 - Figure supplement 1 involves replacing a native channel with the equivalent virtual channel (as a positive control).

      p. 6/7: the text and the conclusions regarding Figure 5 are difficult to follow. Somewhat more detailed explanations of why which data demonstrate or prove something would be helpful.

      The text describing Figure 5 (lines 155-175) has been revised to provide more detail.

      p. 7, last sentence of the first paragraph: How is this supported by the data? Or should this sentence be better moved to the discussion?

      This sentence (now lines 182-184) is designed as a transition. The first half – “a subtype’s contribution shifts rapidly (because of channel inactivation)” – summarizes the immediately preceding data (Figure 5). The second half – “or slowly (because of [changes in conductance density])” – introduces the next section. The text shown in square brackets has been revised. We hope this will be clearer based on revisions to the associated text.

      p. 7, second paragraph, line 3: Please delete one "at both".

      Corrected

      p. 7, second paragraph: Please explain why different time points (DIV4-7, DIV5, or DIV7) were used or studied.

      Initial electrophysiological experiments determined that TTX sensitivity stabilized by DIV 4 (see response to opening point) and we did not maintain neurons longer than 7 days, and so neurons recorded between DIV4 and 7 were pooled. If non-electrophysiological tests were conducted on a specific day within that range, we report the specific day, but any day within the DIV4-7 range is expected to give comparable results. This is now explained on lines 365-367.

      p. 8: the text regarding Fig. 7 should also include the important data (e.g. percentage of neurons showing repetitive spiking) mentioned in the legend.

      This text (lines 216-220) has been revised to include the proportion of neurons converted by PF-71 and PF-24 and the associated statistical results.

      Fig. 1: third panel (TTX-sensitive current...) of D & Fig. 2 subpanel of A (Nav1.8 current...). These panels should be explained or mentioned in the text and/or legends.

      We now explain in the figure legends (lines 708-710; 714-715; 736-738) how those currents are found through subtraction.

      Fig. 2 - figure supplement 2. One might consider taking Panel A to Fig. 2 so that the comparison to DIV0 is apparent without switching to Suppl. Figs.

      We left this unchanged so that Figures 2 and 3 are equivalently organized, with negative control data left to the supplemental figures. Elife formatting makes it easy to reach the supplementary figure from the main figure, so we hope this won’t be an impediment to readers.

      Fig. 6 C, middle graph (graph of Nav1.7): Please re-check, whether DIV5 none vs. 24 h and none vs. 120 h are really significantly different with such a low p-value.

      We re-checked the statistics and the difference pointed out by the reviewer is significant at p=0.007. We mistakenly reported p<0.001 for all comparisons, and so this p value has been corrected; all the other p values are indeed <0.001. Notably, the data are summarized as median ± quartile because of their non-Gaussian distribution; this is now explained on line 827 (as a reminder to the statement on lines 461-462). Quartiles are more comparable to SD than to SEM (in that quartiles and SD represent the distribution rather than confidence in estimating the mean, like SEM), and so medians can differ very significantly even if quartiles overlap, as in this case.
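
      The last point, that medians can differ significantly even when quartiles overlap, can be illustrated with a minimal sketch. This is not the authors' data or their statistical test: the two groups below are synthetic, and the rank-based comparison (a Mann-Whitney U test using the normal approximation without tie correction) is an assumption chosen for illustration because it needs only the standard library.

```python
import math
import statistics


def quartiles(values):
    """Return the lower and upper quartiles (Q1, Q3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q1, q3


def mann_whitney_p(a, b):
    """Two-sided p-value for a Mann-Whitney U test.

    Assumption: normal approximation, no tie or continuity correction.
    """
    # U counts pairs where a exceeds b; ties count as one half.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Synthetic groups: identical spread, second group shifted upward by 8.
group_a = list(range(30))
group_b = [x + 8 for x in group_a]

print(quartiles(group_a), quartiles(group_b))
print(mann_whitney_p(group_a, group_b))
```

      In this synthetic case the interquartile ranges overlap substantially, yet the rank-based test still reports a small p-value, because quartiles describe the spread of the data rather than the uncertainty of the central estimate.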

      Reviewer #3 (Recommendations For The Authors):

      (1) A critical issue in the manuscript is the use of teleological language. It is likely that this is not the intention, but careful revision of the language should be done to avoid the use of expressions that confer purpose to a biological process. Please, find below a list of statements that I consider require correction.

      • In the Abstract, the first sentence: "Nociceptive sensory neurons convey pain signals to the CNS using action potentials". Neurons do not really "use" action potentials, they have no will or purpose to do so. Action potentials are not tools or means to be "used" by neurons. Other examples of misuse of the verb "use" are found in several other sentences:

      "...nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8"

      "Flexible use of different NaV subtypes - an example of degeneracy - compromises..."

      "Nociceptors can achieve equivalent excitability using different sodium channel subtypes" "...degeneracy - the ability of a biological system to achieve equivalent function using different components..."

      "...nociceptors can achieve equivalent excitability using different sodium channel subtypes..."

      "Our results show that nociceptors can achieve similar excitability using different NaV channels" "...the spinal dorsal horn circuit can achieve similar output using different synaptic weight combinations..."

      "Contrary to the view that certain ion channels are uniquely responsible for certain aspects of neuronal function, neurons use diverse ion channel combinations to achieve similar function" "In summary, our results show that nociceptors can achieve equivalent excitability using different NaV subtypes"

      “Use” can mean to put into action (without necessarily implying intention). Based on definitions of the word in various dictionaries, we feel we are well within the realm of normal usage of this term. In trying to achieve a clear and succinct writing style, we have stuck with our original word choice.

      • At the end of page 5 and in the legend of Fig. 7, the word "encourage" is not properly used in the sentence "The ability of NaV1.3, NaV1.7 and NaV1.8 to each encourage repetitive spiking is seemingly inconsistent with the common view...". Encouraging is really an action of humans or animals on other humans or animals.

      Like for “use”, we verified our usage in various dictionaries and we do not think that most readers will be confused or disturbed by our word choice. We use “encourage” to explain that increasing NaV1.3, NaV1.7 or NaV1.8 can increase the likelihood of repetitive spiking; we avoided “cause” because the probability of repetitive spiking is not raised to 100%, since other factors must always be considered.

      • In the Abstract and other places in the manuscript, the word "responsibility" seems to be wrongly employed. It is true that one can say, for instance, on page 4 last paragraph "we sought to identify the NaV subtype responsible for repetitive spiking at each time point". However, to confer channels with the human quality of having "responsibility" for something does not seem appropriate. See also page 8 last paragraph, the first paragraph of the Discussion, and the three paragraphs of page 11.

      Again, we must respectfully disagree with the reviewer. We appreciate that this reviewer does not like our writing style but we do not believe that our style violates English norms.

      (2) In the first sentence of the Abstract, nociceptive sensory neurons do not convey "pain signals". Pain is a sensation that is generated in the brain.

      “Pain” is used as an adjective for “signal” and is used to help identify the type of signal. Nonetheless, since the word count allowed for it, we now refer to “pain-related signals” (line 10).

      (3) I do not see the point of plotting the firing rate as a function of relative stimulus amplitude (normalized to the rheobase, e.g., Fig. 1A bottom panels, Fig. 2B, bottom-right, Fig. 2 Supp2A right, Fig. 3 B bottom panels, etc) instead of as a function of the actual stimulus amplitude. I have the impression that this maneuver hides information. This is equivalent to plotting the current amplitudes as a function of the voltage normalized by the voltage threshold for current activation, which is obviously not done.

      This is how the experiments were performed, so it would be impossible to perform the statistical analysis using the absolute amplitudes post-hoc; specifically, stimulus intensities were tested at increments defined relative to rheobase rather than in absolute terms. There are pros and cons to each approach, and both approaches are commonly used. Notably, we report the value of rheobase on the figures so that readers can, with minimal arithmetic, convert to absolute stimulus intensities. No information is hidden by our approach.

      (4) On page 4 it is stated that "We show later that similar changes develop in vivo following inflammation with consequences for drug efficacy assessed behaviourally (see Fig. 8), meaning the NaV channel reconfiguration described above is not a trivial epiphenomenon of culturing". However, what happens in culture may have nothing in common with what happens in vivo during inflammation. Thus, the latter data may not serve to answer whether the culture conditions induce artifacts or not. I suggest tuning down this statement by changing "meaning" to "suggesting".

      On line 97, we now write “suggesting”.

      (5) Page 5, first paragraph, I miss a clear description of the mathematical models. Having to skip to the Methods section to look for the details of the models as the artifices introduced to simulate different conditions is rather inconvenient.

      So as not to disrupt the flow of the presentation with methodological details, we only provide a short description of the model in the Results. We have slightly expanded this to point out that the conductance-based model is also single-compartment (line 111). We provide a very thorough description of our model in the Methods, especially considering all the details provided in Supplementary Tables 1, 5 and 6. We also report conductance densities and % changes in figure legends (lines 722, 747-748; now shown in red). This is also true for Figure 3-figure supplement 2 (lines 756-759). We tried very hard to find a good balance that we hope most readers will appreciate.

      (6) Page 6, second paragraph, simulations do not serve to "measure" currents.

      The sentence has been revised to indicate that simulations were used to “infer” currents during different phases of the spike (line 155).

      (7) Page 7, regarding the title of the subsection "Control of changes in NaV subtype expression between DIV0 and DIV4-7", the authors measured the levels of expression, but not really the mechanisms "controlling" them. I suggest writing "changes in NaV subtype expression between DIV0 and DIV4-7"

      We have removed “control of” from the section title (line 185).

      (8) What was the reason for adding a noise contribution in the model?

      We now explain that noise was added to reintroduce the voltage noise that is otherwise missing from simulations (line 474). For instance, in the absence of noise, membrane potential can approach voltage threshold very slowly without triggering a spike, which does not happen under realistically noisy conditions. Of course membrane potential fluctuates noisily because of stochastic channel opening and a multitude of other reasons. This is not a major issue for this study, and so we think our short explanation should suffice.

      (9) Please, define the concept of degeneracy upon first mention.

      Degeneracy is now succinctly defined in the abstract (line 20).

    1. Author Response

      The following is the authors’ response to the current reviews.

      Our answer to the final point(s) raised is as follows:

      "We thank the reviewer for the comment. We checked our datasets accordingly. Typically, the n of cells showed deviations of maximally 20% from experiment to experiment (e.g. 16-24 cells per experiment). Additionally, experiments were performed using different passages of the cells. Moreover, data were validated at different time-points during the study using newly thawed cell lines."


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Bischof et al present a carefully prepared study on a very interesting and relevant topic: the role of ion channels (here a Ca2+-activated K+ channel BK) in regulating mitochondrial metabolism in breast cancer cells. The potential impact of these and similar observations made in other tumor entities has only begun to be appreciated. That being said, the authors pursue in my view an innovative approach to understanding breast cancer cell metabolism. Considering the following points would further strengthen the manuscript:

      We thank reviewer #1 for the overall positive feedback on our study.

      Methods:

      (1) The authors use an extracellular Ca2+ concentration (2 mM) in their Ringer's solutions that is almost twice as high as the physiologically free Ca2+ concentration (ln 473). Moreover, the free Ca2+ concentration of their pipette solution is not indicated (ln 487).

      Indeed, we utilized 2 mM of Ca2+ in the physiologic live-cell imaging buffer. This concentration could actually be a little lower than the total Ca2+ concentration in the body (usually ranging from 2.2 to 2.6 mM), while the free Ca2+ concentration is typically about half of that. Nevertheless, multiple studies other than ours have used 2 mM Ca2+ for live-cell experiments. Please check the following studies, which represent only a small selection:

      https://doi.org/10.1038/s41598-019-49070-8

      https://doi.org/10.1016/j.bpj.2020.08.045

      https://doi.org/10.1016/j.redox.2022.102319

      However, to ensure that the applied conditions are physiologically relevant, we repeated the experiments using MMTV-PyMT WT and MMTV-PyMT BK-KO cells and compared cytosolic Ca2+ concentrations over time in response to cell stimulation with ATP, either in the presence of 1.0 mM (Author response image 1A) or 2.0 mM extracellular Ca2+ (Author response image 1B). The respective graphs are attached below for the reviewer’s inspection. As expected, the intracellular Ca2+ concentration in MMTV-PyMT WT and BK-KO cells depended on the extracellular Ca2+ concentration. Importantly, however, irrespective of the exact Ca2+ concentration applied, we observed a similar difference in basal cytosolic Ca2+ between MMTV-PyMT WT and BK-KO cells (Author response image 1C).

      Author response image 1.

      Cytosolic Ca2+ concentrations over time in the presence of 1.0 mM or 2.0 mM extracellular Ca2+.

      Concerning the Ca2+ concentration in the patch-pipette – we are very glad that you uncovered an error in our description and apologize for the mistake. Actually, the information the reviewer is referring to was already given in the previous version of the manuscript, but unclear because a comma was shifted (see line 487 in the originally submitted manuscript). The Ca2+ concentration of the patch-pipette was 0.1 mM in the presence of 0.6 mM EGTA, which should (according to Ca-EGTA calculator, https://somapp.ucdmc.ucdavis.edu/pharmacology/bers/maxchelator/CaEGTA-NIST.htm) be equivalent to ~30 nM of free Ca2+ in the patch pipette. We corrected the mistake in the manuscript and thank the reviewer again for spotting this inaccuracy.
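
      As a rough plausibility check of the ~30 nM figure, the free Ca2+ concentration can be sketched with a single-site binding equilibrium between Ca2+ and EGTA. The apparent dissociation constant of 150 nM used here is an assumed, illustrative value; the true apparent constant depends on pH, temperature and ionic strength, which the cited calculator takes into account and this sketch does not.

```python
import math


def free_calcium_molar(total_ca, total_egta, kd_app):
    """Free Ca2+ (mol/L) for one Ca2+ buffer with 1:1 binding.

    Solves (total_ca - b) * (total_egta - b) = kd_app * b for the bound
    concentration b and returns total_ca - b.
    """
    s = total_ca + total_egta + kd_app
    bound = (s - math.sqrt(s * s - 4.0 * total_ca * total_egta)) / 2.0
    return total_ca - bound


# Pipette solution from the text: 0.1 mM total Ca2+, 0.6 mM EGTA.
# kd_app = 150 nM is an assumption made for this illustration only.
free_nM = free_calcium_molar(0.1e-3, 0.6e-3, 150e-9) * 1e9
```

      Under this assumption the result is close to 30 nM, in line with the value quoted above; a different apparent constant would shift it proportionally.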

      (2) Ca2+i measurements: The authors use ATP to elicit intracellular Ca2+ signals. Is this then a physiological stimulus for Ca2+ signaling in breast cancer? What is the rationale for using ATP? Moreover, it would be nice to see calibrated baseline values of Ca2+i.

      We thank the reviewer for the comment and suggestion. Importantly, it was demonstrated recently that all of the utilized cell lines respond to treatment with extracellular ATP with a prominent increase in [Ca2+]i, most probably indicating the expression of purinergic receptors, which was a prerequisite for observing ATP-induced changes in [Ca2+]i.

      https://doi.org/10.1038/s41419-022-05329-z

      https://doi.org/10.1093/carcin/bgt493

      https://doi.org/10.1038/s41598-018-26459-5

      Furthermore, ATP plays a crucial role in the tumor microenvironment, where high rates of cell death occur. Hence, ATP is of pathophysiologic relevance for the utilized cancer cell lines.

      https://doi.org/10.1038/s41568-018-0037-0

      https://doi.org/10.3390/cells9112496

      https://doi.org/10.1002/jcp.30580

      Following the suggestions by Reviewer #1 (and #2), we included calibrations of Ca2+cyto and Ca2+mito in the manuscript, by depleting the intracellular Ca2+ stores using ionomycin in the absence of extracellular Ca2+ (EGTA) to validate the basal difference in Ca2+cyto and Ca2+mito. Additionally, Ca2+cyto was calibrated under basal and inhibitor-treated conditions, and values in nM are given in the text (p. 5, lines 185-190, 193-195 and 199-200, in the tracked changes version of the MS). The new data can be found in new Figure S2F – Figure S2J and new Figure S2R – Figure S2V. Moreover, we calculated basal [Ca2+]cyto in the different BKCa-proficient and BKCa-deficient cell lines and under inhibitor-treated conditions. We additionally added information about the pathophysiologic relevance of ATP in the tumor microenvironment in lines 175-178 in the tracked changes version of the manuscript.
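
      For readers unfamiliar with ratiometric calibration, one widely used way of converting a Fura-2 ratio into a concentration is the Grynkiewicz equation, sketched below. The response does not state which formula was applied, so this is an assumption, and every number in the example (ratios, scaling factor, dissociation constant) is hypothetical.

```python
def fura2_calcium_nM(ratio, r_min, r_max, beta, kd_nM):
    """Grynkiewicz-type conversion of a Fura-2 ratio to [Ca2+] in nM.

    ratio: measured 340/380 ratio
    r_min: ratio at zero Ca2+ (e.g. ionomycin plus EGTA)
    r_max: ratio at saturating Ca2+
    beta:  380 nm fluorescence at zero Ca2+ divided by that at saturation
    kd_nM: dissociation constant of the dye in nM
    """
    if not (r_min <= ratio < r_max):
        raise ValueError("ratio must lie in [r_min, r_max)")
    return kd_nM * beta * (ratio - r_min) / (r_max - ratio)


# Hypothetical values, chosen only to show the arithmetic.
example_nM = fura2_calcium_nM(ratio=0.8, r_min=0.3, r_max=5.0,
                              beta=4.0, kd_nM=224.0)
```

      The conversion is only defined between the two calibration end points, which is why both a depleted and a saturated reading are needed.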

      (3) Membrane potential measurements: It would be nice to see a calibration of the potential measurements; this would allow us to correlate the IV relationship with membrane potential. Without calibration, it is hard to compare unless the identical uptake of the dye is shown. Does paxilline or IbTx also induce depolarization?

      We thank the reviewer for the suggestion. Indeed, membrane potential calibrations/measurements using the membrane potential-sensitive dye DiBAC4(3) would be interesting, but are technically hardly feasible. The reason is that the dye reports membrane potential through potential-dependent differences in uptake and is not ratiometric, unlike most other dyes/sensors used. Considering this limitation, we decided to measure membrane potential by patch-clamp analysis. Additionally, we performed these experiments upon inhibition of PM-located BKCa by IBTX. Current-clamp experiments confirmed the difference in basal membrane potential between MMTV-PyMT WT and BK-KO cells (consult new Figure S1C and lines 127-130 in the tracked changes version of the manuscript). Interestingly, IBTX treatment depolarized the PM potential to the BK-KO cell level, which validates that BK activity and PM potential are connected. In addition to this approach, we utilized our recently developed genetically encoded K+ sensors, which revealed basal differences in [K+]cyto between MMTV-PyMT WT and BK-KO cells. This difference between the genotypes was also equalized by IBTX, as the treatment increased [K+]cyto only in WT cells, which most likely explains the cause of PM depolarization (consult lines 130-135 in the tracked changes version of the manuscript and new Figure S1D and Figure S1E).

      (4) Mito-potential measurements: Why did the authors use such a long time course and preincubate cells with channel blockers overnight? Why did they not perform paired experiments and record the immediate effect of the BK channel blockers in the mito potential?

      We thank the reviewer for the suggestion. We performed TMRM-based experiments with MMTV-PyMT WT cells in response to short-term exposure to paxilline, which did not significantly affect the mitochondrial membrane potential, at least within 15 minutes of treatment (Author response image 2). This indicates that processes downstream of (mito)BKCa inhibition affect the mitochondrial membrane potential (MMP), most probably including remodeling of the respiratory chain, mitochondrial ion homeostasis or glycolytic activity, which ultimately also delivers reducing equivalents to mitochondria. Our final goal was to validate potential differences between BKCa-proficient and BKCa-deficient cell models, whereby the latter cells have lacked the BKCa channel from the outset. Hence, “long-term” (~12 h) BKCa inhibition as performed in our experiments more closely reflects the BK-KO cell situation. Taken together with the new experiment (Author response image 2), we can now state that the effect of BK inhibition on the MMP is, at least, not the consequence of an acute (within minutes) channel blockade.

      Author response image 2.

      Mitochondrial membrane potential, as measured using TMRM, in response to acute short-term administration of 5 µM paxilline, followed by mitochondrial depolarization using FCCP.

      (5) MTT assays are also based on mitochondrial function - since modulation of mito function is at the core of this manuscript, an alternative method should be used.

      We thank the reviewer for the important comment. We performed additional, immunofluorescence-based experiments using Ki-67 staining to assess cell proliferation rates. The newly added data can be found in the text, lines 409-412 in the tracked changes version of the manuscript and new Figure S6D-F. The results obtained confirm the MTT-based results (Fig. 6H-I).

      Results:

      (1) Fig. 5G: The number of BK "positive" mitoplasts is surprisingly low - how does this affect the interpretation? Did the authors attempt to record mitoBK current in the "whole-mitoplast" mode? How does the mitoBK current density compare with that of the plasma membrane? Is it possible to theoretically predict the number of mitoBK channels per mitochondrion to elicit the observed effects? Can these results be correlated with the immuno-localization of mitoBK channels?

      Indeed, the number of BKCa-positive mitoplasts appears low at first glance. However, as these experiments were performed in mitoplast-attached mode, it is important to keep in mind that only a very small area of the mitoplast is investigated with each patch. If no channel was detected in such a region, the patch was scored as “empty”, as presented in Fig. 5G, which does not, however, mean that the entire mitochondrion was BKCa-negative. Hence, the density of BKCa in the IMM might be higher than our experiments suggest. Nevertheless, earlier results using glioblastoma cell lines – considered to be among the cell lines most enriched in mitoBKCa – already demonstrated a quite low density of the BKCa β4 regulatory subunit in mitochondria – please see figure 2B in the following paper: 10.1371/journal.pone.0068125 – which (based on a 1:1 stoichiometry of α and β subunits) also suggests that the density of the α subunit of BKCa might be low in this compartment.
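
      The sampling argument above can be made concrete with a small sketch. It assumes, purely for illustration, that channels are scattered independently and uniformly over the membrane, so that the number of channels under a patch follows a Poisson distribution; this assumption, the patch area and the density are all hypothetical and not taken from the study.

```python
import math


def probability_empty_patch(density_per_um2, patch_area_um2):
    """P(no channel under the patch) for Poisson-distributed channels."""
    return math.exp(-density_per_um2 * patch_area_um2)


def density_from_empty_fraction(empty_fraction, patch_area_um2):
    """Invert the relation: channel density implied by the empty fraction."""
    if not (0.0 < empty_fraction <= 1.0):
        raise ValueError("empty_fraction must be in (0, 1]")
    return -math.log(empty_fraction) / patch_area_um2


# Hypothetical numbers: 0.5 channels per square micrometre, 1 um^2 patch.
p_empty = probability_empty_patch(0.5, 1.0)
implied_density = density_from_empty_fraction(p_empty, 1.0)
```

      Under these assumptions a sizeable share of patches is empty even though every mitoplast carries channels, which is the point made in the reply.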

      Author response image 3.

      Schematic representation of mitoplast-attached patch-clamp experiments.

      Theoretically, density predictions of mitoBK compared to PM localized BKCa would be possible if whole-mitoplast experiments were performed, however, we are unsure what added value this information would actually burst, allowing the pharmacologic modulation of structures originally located within the mitochondrial matrix. Please also consult Author response image 3. According to the most recent models, even if there are other views on this, mitoBKCa is oriented in a way, that the C-terminus with its Ca2+ binding bowl is located within the mitochondrial matrix. Hence, to allow Ca2+ sensitivity experiments of the channel, broken up (by swelling) mitoplasts are required to make the Ca2+ binding bowl accessible for Ca2+ manipulations in the bath solution. This approach does not allow us to compare the channel density to that of the PM.

      Finally, to the best of our knowledge, a combination of immunofluorescence with mitoplast patch-clamp experiments is not feasible yet, and would probably be impossible due to the low density of the mitoBKCa as well as the lack of highly sensitive and specific antibodies.

      (2) There are also reports about other mitoK channels (e.g. Kv1.3, KCa3.1, KATP) playing an important role in mitochondrial function. Did the authors observe them, too? Can the authors speculate on the relative importance of the different channels? Is it known whether they are expressed organ-/tumor-specifically?

      Author response image 4.

      Representative single channels different to mitoBKCa detected in MDAMB-453 mitoplasts.

      The reviewer is right: other K+ channels have been found in mitochondria, and these also play a role in tumor cells. This is consistent with our data (Fig. 5G), where we observed other channels in the mitoplasts of BCCs as well. This was the case in all four cell lines tested. According to their conductance and our expectations from the literature, these channels may include, e.g., mitoIKCa, mitoSKCa, mitoKATP or others (10.1146/annurev-biophys-092622-094853). As we focused, however, on patches containing a mitoBKCa, we did not pharmacologically characterize these channels further. Two examples of channels we found in these mitoplasts besides BKCa are presented for the reviewers’ inspection (Author response image 4). As our manuscript focuses on mitoBKCa, we did not classify these channels into smaller subgroups according to their conductance, as we feel that a differentiation between BKCa (~210 pS), channels showing a conductance ≤150 pS, and channels showing a conductance ≤100 pS is sufficient. Furthermore, this additional information would dilute our story too much, making it difficult for the (non-specialist) reader to follow the common thread of the study. We nevertheless added the respective information to the manuscript. Please consult lines 365-366 in the tracked changes version of the manuscript.

      Reviewer #1 is right, the different K+ channels observed might of course be organ- or tumor-specific. For example, it has been reported that the expression of K+ channels differs between various cancer cells and cell lines (https://doi.org/10.2174/13816128113199990032, 10.1016/j.pharmthera.2021.107874, 10.1038/nrc3635), a fact which, also according to our study, might be exploited for pharmacological manipulation aiming to affect proliferation/apoptosis of cancer cells. Further, a recently published single-cell and spatially resolved atlas of human breast cancer implies that the expression of different K+ channels (such as mitoIKCa, mitoSKCa, mitoKATP) might even differ between cancer and non-cancer cells within a single tumour (https://doi.org/10.1038/s41588-021-00911-1).

      Reviewer #2 (Public Review):

      Summary:

      The large-conductance Ca2+ activated K+ channel (BK) has been reported to promote breast cancer progression, but it is not clear how. The present study carried out in breast cancer cell lines, concludes that BK located in mitochondria reprograms cells towards the Warburg phenotype, one of the metabolic hallmarks of cancer.

      Strengths:

      The use of a wide array of modern complementary techniques, including metabolic imaging, respirometry, metabolomics, and electrophysiology. On the whole, experiments are astute and well-designed and appear carefully done. The use of BK knock-out cells to control for the specificity of the pharmacological tools is a major strength. The manuscript is clearly written.

      There are many interesting original observations that may give birth to new studies.

      Weaknesses:

      The main conclusion regarding the role of a BK channel located in mitochondria is not sufficiently supported. Other perfectible aspects are the interpretation of co-localization experiments and the calibration of Ca2+ dyes. These points are discussed in more detail in the following paragraphs:

      We thank reviewer #2 for the thorough assessment of our study.

      (1) May the metabolic effects be ascribed to a BK located in mitochondria? Unfortunately not, at least with the available evidence. While it is clear these cells have a BK in mitochondria (characteristic K+ currents detected in mitoplasts) and it is also well substantiated that the metabolic effects in intact cells are explained by an intracellular BK (paxilline effects absent in the BK KO), it does not follow that both observations are linked. Given that ectopic BKDEC appeared at the surface, a confounding factor is the likely expression of BK in other intracellular locations such as ER, Golgi, endosomes, etc. To their credit, authors acknowledge this limitation several times throughout the text ("...presumably mitoBK...") but not in other important places, particularly in the title and abstract.

      We thank the reviewer for this important comment and amended the title and abstract, respectively. The title of the manuscript was changed to “mitoBKCa is functionally expressed in murine and human breast cancer cells and potentially contributes to metabolic reprogramming.” Additionally, we changed appropriate passages in the text, to emphasize that mitoBKCa potentially mediates the metabolic reprogramming, but other intracellular channels could also contribute to these processes.

      (2) MitoBK subcellular location. Pearson correlations of 0.6 and about zero were obtained between the locations of mitoGREEN on one side, and mRFP or RFP-GPI on the other (Figs. 1G and S1E). These are nice positive and negative controls. For BK-DECRFP however, the Pearson correlation was about 0.2. What is the Z resolution of apotome imaging? Assuming an optimum optical section of 600 nm, as obtained by a 1.4 NA objective with a confocal, that mitochondria are typically 100 nm in diameter and that BK-DECRFP appears to stain more structures than mitoGREEN, the positive correlation of 0.2 may not reflect colocalization. For instance, it could be that BK-DECRFP is not just in mitochondria but in a close underlying organelle e.g. the ER. Along the same line, why did BK-RFP also give a positive Pearson? Isn't that unexpected? Considering that BK-DEC was found by patch clamping at the plasma membrane, the subcellular targeting of the channel is suspect. Could it be that the endogenous BK-DEC does actually reside exclusively in mitochondria (a true mitoBK), but overflows to other membranes upon overexpression? Regarding immunodetection of BK in the mitochondrial Percoll preparation (Fig. S5), the absence of NKA demonstrates the absence of plasma membrane contamination but does not inform about contamination by other intracellular membranes.

      Indeed, it seems that BKCa-DEC is not an exclusive mitoBKCa, at least not upon (over)expression in MCF-7 cells. It is known from the literature that mitochondrial K+ channels are encoded by the nuclear genome, as no obvious gene for a K+ channel is found in the mitochondrial genome. Channel proteins are synthesized by cytosolic ribosomes and likely translocated into mitochondria via the TOM/TIM system. Although some K+ channels possess a mitochondrial targeting sequence at the N-terminus, their import is far from following a general mechanism, and this also seems to be true for BK channels. In the case of the K+ channel Kv1.3, an even more complex scenario is hypothesized, as the channel located in the PM could be transferred to mitochondria via mitochondria-associated membrane (MAM) structures of the ER (https://doi.org/10.3390/ijms20030734). Yet, the detailed mechanism of BK shuttling to mitochondria is not fully understood. Possibly, overflow is exactly what is happening, due to very high levels of BK-DEC expression upon transfection. However, that the channel translocates to the IMM upon transfection is not surprising and was also demonstrated for other cell models including HEK293 – see e.g. 10.1038/s41598-021-90465-3. Unfortunately, the transfection efficiency of MCF-7 is quite low compared to HEK293 – hence, quantitative statements from mito-patches upon transfection are difficult.

      To ensure that the mitochondrial colocalization is not a matter of poor microscope resolution, we repeated these experiments using confocal imaging on a Zeiss LSM980 with an Airyscan 2 detector, yielding z resolutions of ~450 nm. These experiments confirmed the increased colocalization of BKCa-DEC with mitochondria compared to BKCa lacking the DEC exon. Furthermore, this imaging at higher resolution demonstrated that, unfortunately, colocalization might not be the best analysis, as especially fragmented mitochondria showed a clear MitoGREEN-stained matrix, surrounded by red fluorescence derived from BKCa-DECRFP present in the IMM (revised Fig. 1G).
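
      Why a pixel-wise Pearson score can understate such an arrangement is easy to see with a toy example. The sketch below computes the Pearson coefficient for two hypothetical intensity profiles drawn across a single mitochondrion, one peaking in the matrix and one peaking at the surrounding membrane. The profiles are invented numbers, not measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical line profiles across one mitochondrion (arbitrary units).
matrix_profile = [0, 1, 5, 9, 5, 1, 0]     # matrix stain, peaks in the centre
membrane_profile = [0, 6, 3, 1, 3, 6, 0]   # membrane stain, peaks at the rim

r_toy = pearson(matrix_profile, membrane_profile)
```

      In this toy case the coefficient is negative although both signals belong to the same organelle, which shows how a ring-around-matrix geometry can lower the score once the resolution is high enough to separate the two.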

      To validate the results derived from immunoblotting, we additionally stained the membranes for TMX1, a marker for the ER membrane. This analysis confirmed the high purity of the mitochondrial isolation without ER-membrane contamination after percoll purification, and hence validated the presence of BKCa in the mitochondrial membrane (revised Fig. S5D). The additional information can be found in lines 156-159 in the tracked changes version of the manuscript.

      (3) Calibration of fluorescent probes. The conclusion that BK blockers or BK expression affects resting Ca2+ levels should be better supported. Fluorescent sensors and dyes provide signals or ratios that need to be calibrated if comparisons between different cell types or experimental conditions are to be made. This is implicitly acknowledged here when monitoring ER Ca2+, with an elaborate protocol to deplete the organelle in order to achieve a reading at zero Ca2+.

      We thank the reviewer for the important comment. Please note that at no point in the manuscript do we aim to compare different cell lines with respect to their intracellular Ca2+ concentration; we only compare the same cell lines after the different treatments, as we are aware of this limitation of fluorescent probes. However, to validate the differences in intracellular Ca2+ concentrations, we calibrated the signals derived from Fura-2 and 4mtD3cpV using ionomycin in combination with cellular Ca2+ depletion/saturation. The newly added data can be found in the text, lines 185-190, 192-195, 199-200, and 228-230 in the tracked changes version of the manuscript, as well as new Figure S2F – Figure S2J and new Figure S2R – Figure S2V.

      Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).

      Author response image 5.

      Dot plot for data shown in Figure 2I.

      The reviewer is right, it seems that BKCaRFP may also affect [Ca2+]mito. However, the effect is not significant, with p>0.999 using the Kruskal-Wallis test followed by Dunn’s multiple comparison test, which we chose because the data are not normally distributed. By contrast, p=0.0002 for ctrl vs. BKCa-DECRFP and p=0.0022 for BKCaRFP vs. BKCa-DECRFP. We added a scatter dot plot of the respective data as Author response image 5 for the reviewer’s inspection. Additionally, first, even with a less conservative analysis that compares only ctrl vs. BKCaRFP using the Mann-Whitney test, the result is not significant (p=0.4467), and second, we performed the requested Ca2+ calibration using ionomycin under these conditions, which confirmed the difference between ctrl cells and BKCa-DECRFP-expressing cells, but not BKCaRFP-expressing ones. Please see Figure S2V.
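
      For orientation, the rank-based pairwise comparison mentioned above can be sketched as follows. This is a bare-bones illustration of the Mann-Whitney U statistic with a normal approximation that omits tie and continuity corrections, so its p-values will differ somewhat from those of a statistics package. The input numbers are made up and are not the data behind Figure 2I.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U_x, U_y); ties contribute 0.5 to each side."""
    u_x = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    u_y = len(xs) * len(ys) - u_x
    return u_x, u_y


def approx_two_sided_p(xs, ys):
    """Two-sided p-value from the normal approximation of U.

    No tie or continuity correction is applied (illustrative only).
    """
    n1, n2 = len(xs), len(ys)
    u_x, _ = mann_whitney_u(xs, ys)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


# Made-up example groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
u_a, u_b = mann_whitney_u(group_a, group_b)
p_approx = approx_two_sided_p(group_a, group_b)
```

      Because the test uses only the ordering of the values, it makes no assumption of normality, which is the reason given above for choosing rank-based tests.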

      Reviewer #3 (Public Review):

      The original research article, titled "mitoBKCa is functionally expressed in murine and human breast cancer cells and promotes metabolic reprogramming" by Bischof et al, has demonstrated the underlying molecular mechanisms of alterations in the function of Ca2+ activated K+ channel of large conductance (BKCa) in the development and progression of breast cancer. The authors also proposed that targeting mitoBKCa in combination with established anti-cancer approaches, could be considered as a novel treatment strategy in breast cancer treatment.

      The paper is clearly written, and the reported results are interesting.

      Strengths:

      Rigorous biophysical experimental proof in support of the hypothesis.

      Weaknesses:

      A combinatorial synergistic study is missing.

      We thank reviewer #3 for the positive summary of our study. Indeed, we propose that targeting of mitoBKCa in combination with established anti-cancer drugs may represent a novel anti-cancer treatment strategy. Unfortunately, we feel that the manuscript is very condensed already, and that adding respective required experiments and data to support this hypothesis will make the flow of the manuscript more complex or even incomprehensible. As no attempts linking mitoBKCa activity with anti-cancer therapies have been made so far, we removed the respective information from the abstract and only discuss this aspect.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Statistics: Legends have to contain information about the number of biological replicates (N) and cells analysed (n). Statistics must be calculated with the averages of the replicates.

      Author response image 6.

      Representative single cell responses of Fura-2 loaded MMTV-PyMT WT cells.

      We thank the reviewer for the comment and added the missing details to all figure legends.

      We feel that using each cell represents exactly the power of high-resolution live-cell imaging, as there is no better biological replicate than a single separated cell, which is observed by fluorescence microscopy. This analysis is also able to visualize cell-to-cell differences in the microscopy area, similarly to patch-clamp experiments, where each single cell or mitoplast patched is used as a single replicate. Please find a representative dataset derived from fluorescence microscopy of different responses of neighboring single cells in Author response image 6.
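
      The two ways of counting discussed in this exchange, every cell as one data point versus one average per experiment, can be set side by side in a short sketch. The experiment names and values are hypothetical; the code only shows how the sample size changes with the choice of unit.

```python
from statistics import mean


def replicate_means(cells_by_experiment):
    """One value per experiment: the mean of its single-cell readings."""
    return {name: mean(values) for name, values in cells_by_experiment.items()}


def pooled_cells(cells_by_experiment):
    """All single-cell readings pooled, ignoring which experiment they came from."""
    return [v for values in cells_by_experiment.values() for v in values]


# Hypothetical single-cell readings from two independent experiments.
example = {"experiment_1": [1.0, 2.0, 3.0], "experiment_2": [4.0, 6.0]}

per_replicate = replicate_means(example)   # N = 2 data points
per_cell = pooled_cells(example)           # n = 5 data points
```

      Reporting both N and n in the legends, as now done, lets readers see which unit underlies each test.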

      (2) Fig. 1G: This is a poor resolution figure, mostly because of its far too small size; in its current form it bears very little information.

      We agree with reviewer #1 and reperformed the imaging experiments using high resolution confocal imaging and exchanged the respective images. We feel that this increased the quality of the images significantly. Unfortunately, we were not able to increase the size of the images in the main figure, hence, we added magnifications of the respective images as new Figure S1I.

      (3) Fig. 1H: What do the dotted grey lines and the labels stand for?

      We believe Reviewer #1 is probably referring to Figure 1G. As indicated in the figure panel and in the text, the grey dotted lines and labels indicate the colocalization scores of mtRFP and RFP-GPI with MitoGREEN, respectively. These data are also shown in Figure S1H, including error bars and statistics. We added additional information in the text to make the meaning of the lines clearer to the reader. Please consult lines 149 – 150 in the tracked changes version of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) May the metabolic effects be ascribed to a BK located in mitochondria? Short of a way to tackle BK function and metabolism specifically in mitochondria, the conclusion may best be toned down to "intracellular BK". For the time being the term "mitoBK" appears too ambitious.

      We feel that you are right and that our previous overstatement requires adaptation, as a clear (100%) attribution of the observed metabolic effects solely to mitoBKCa is not possible. We have therefore amended all relevant passages in the entire MS accordingly.

      (2) MitoBK subcellular location. Please address the points raised in the Public Review.

      As stated above we addressed the point raised in the public review accordingly (please consult new Figure S1I and revised Figure 1G).

      (3) Calibration of fluorescent probes. Please provide calibrations for cytosolic and mitochondrial Ca2+, for example, the standard high Ca2+/ionophore/metabolic inhibition treatment to reach saturation followed by Ca2+ chelation to obtain zero Ca2+.

      We thank the reviewer for the comment. As you can see from our response to the public review, we performed the respective experiments, and datasets were added in the manuscript.

      (4) Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).

      Please consult our response to the (same) comment in the public review.

      (5) Line 228. The statement "Similar results were obtained in MDA-MB-453 cells" is confusing. As shown in Fig. 3, pax reduced ECAR and OCR in MMTV-PyMT WT cells. As ibtx was without effect, it is suggested that intracellular BK supports metabolism. However, the effect of pax on MDA cells was the opposite. Doesn't this divergence speak against a universal role of intracellular BKs in promoting metabolism in BCCs? A similar point may be made regarding metabolomics, which showed no effects of pax on lactate and pyruvate in MMTV-PyMT WT cells but stimulation in MDA cells. Perhaps the word "promotes" in the title of the figure should be replaced by something more neutral like "affects" or "alters", as used elsewhere.

      We thank the reviewer for pointing out the overstatement regarding intracellular BK functions and changed the title of the figure as suggested.

      With regard to the experiments mentioned, we would like to point out the following aspects:

      First, the cell lines used differ strongly in their metabolic settings under basal conditions. While both MMTV-PyMT and MDA-MB-453 cells seem to show similar basal ECAR levels (if BKCa was present), their OCR seems to differ strongly. MMTV-PyMT cells seem to show a basal OCR that is already almost at the maximum, while MDA-MB-453 cells possess a tremendous capacity in their OCR, as observed upon mitochondrial uncoupling using FCCP. Of note, both ECAR and OCR are indirect metabolic measures. On the one hand, ECAR measures extracellular acidification, which is accomplished by H+ secretion along with lactate. However, lactate secretion is not the only process leading to extracellular acidification, and ECAR may hence measure a variety of H+-releasing processes, including vesicle secretion. On the other hand, OCR is not directly linked to ATP production, as O2 is consumed by mitochondrial complex IV, whereas ATP is produced by mitochondrial complex V. This becomes even more evident when looking at OCRs after FCCP treatment: under these conditions, the H+ gradient is dissipated and ATP synthase activity is reduced, whereas OCR increases to the maximum due to increased supply of mitochondrial complex IV with H+.

      Second, please note that the LC-MS-based metabolomics data derive from a single static time point and not from an over-time “live” read-out. Moreover, the underlying dynamics of the parameters measured cannot be assessed. Hence, as an example, increasing levels of pyruvate can indicate either faster generation or slower subsequent degradation/metabolization. A clear in-depth statement about what is happening under basal and BKCa inhibitor-treated conditions is hence not possible. The only conclusion that can be drawn from these experiments is that paxilline treatment differentially affects metabolic pathways in these cells.

      Based on these limitations of both methods, we decided to perform our in-depth fluorescence microscopy-based analysis, which provided strong evidence for an effect of intracellular BKCa channels on mitochondrial ATP production. Despite opposing effects of BKCa inhibition on OCR in MMTV-PyMT WT and MDA-MB-453 cells, mitochondrial ATP production was reduced if BKCa-DECRFP was expressed or intracellular BKCa was functional.

      In line with these findings, mitoBKCa was recently described as an uncoupling protein, which could furthermore explain the differential effects of intracellular BKCa inhibition on OCR. https://doi.org/10.1038/s41598-021-90465-3

      Minor

      (6) Fig. 1C. Average fluorescence intensity in 6 experiments was about 20% higher in BK-KO cells relative to WT. Such a small difference is significant but should not be evident to the eye. The pictures selected for illustration appear to show a much larger difference and therefore may not be representative. If this is the case, please omit them. The same goes for the other representative pictures.

      Author response image 7.

      Representative images at different brightnesses.

      Please note that the analysis of the images was done in an unbiased way using a Fiji macro. After analysis, we chose representative images that were closest to the average.

      Furthermore, we must kindly disagree with the reviewer, as changes of 20% in fluorescence intensity are indeed evident to the eye (consult Author response image 7). These panels show the same image at different brightness levels with intensity differences of 20%. Hence, we feel that all the images the reviewer was referring to are representative of the values given.

      (7) Line 130. The definition of "recent" is of course relative, but 10 years?

      We are very glad that you have discovered this “inconsistency”, and we have reworded the respective phrase accordingly.

      (8) Line 327. "conductivity" is the property of a medium, "conductance" is the property of a component, such as a channel.

      We thank the reviewer for the important comment. We revised the text accordingly.

      (9) Various figures. FRET sensor data are expressed as Ratio(FRET/CFP). This is unusual, typically it should be FRET ratio (YFP/CFP), FRET ratio(mTFP/Venus), etc. Please note that the FRET partners differ between sensors.

      We acknowledge the comment of the reviewer. It is correct that the fluorescent proteins vary widely between the sensors used. Please note, however, the following: the emission measured from these sensors actually represents FRET, as CFP but not YFP is directly excited. Hence, the emission is FRET, not the “intrinsic” fluorescence of the YFP. This distinction is becoming increasingly important, as there are probes that can also be “alternately” excited, i.e. CFP and YFP separately, which then yields the YFP/CFP ratio (https://doi.org/10.1021/acssensors.8b01599). In the case of CFP excitation only, we feel that the term FRET/CFP is preferable to other labels such as YFP/CFP.

      (10) BK-DEC makes BCCs cells less oxidative. However, BK-DEC was first described in cardiomyocytes, which are among the most oxidative cell types. It would be useful if authors could address this apparent contradiction in the Discussion Section.

      That is an exciting point that we addressed as follows in the revised MS:

      First, it is important to mention that cardiac myocytes do not show a metabolic Warburg setting and are – under physiologic conditions – maintained in a high O2 environment.

      Second, a recent study from our group addressed the question about the role of mitoBKCa in primary cardiac myocytes. Indeed, mitoBKCa was functionally expressed in these cells. Interestingly, under physiologic conditions, the channel did not alter (multiple) cell behaviours nor overall cardiac physiology in a mouse model. However, upon induction of ischemia/ reperfusion injury, a lack of BK increased cardiac susceptibility to cell death resulting in increased infarction size (https://doi.org/10.1161/CIRCULATIONAHA.117.028723). Hence, also in this cell model, BKCa only played a role under oxygen limited conditions/ conditions where mitochondria were not properly functioning. Thus, the results derived from cardiac myocytes support our recent findings in BCCs, as BKCa mediates BCC resistance to hypoxic stress/ makes BCCs more independent from oxidative metabolism.

      Parts of this discussion were included in the revised MS. Please consult lines 490-500 in the tracked changes version of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The study is very well designed and most of the computational analyses were done rigorously.

      We highly appreciate the positive feedback by reviewer #3.

      (2) The authors should discuss the expression of BKCa in different subsets of breast cancer. Authors may also debate on the level of steroid receptors and BKCa expressions.

      We thank reviewer #3 for the important suggestion and added the requested information in the discussion, lines 445-447 and 450-454 in the tracked changes version of the manuscript.

      (3) In the discussion section, the authors mentioned that the MCF7 cell is the best model to study this hypothesis. Does it imply that triple-negative breast cancer cell lines express lower levels of BKCa? The authors should discuss this.

      We thank the reviewer for the interesting comment; we would like to point out that the ERα-positive MCF-7 cell line was used to study experimental overexpression of BKCa at an otherwise low baseline level. This does not imply that BKCa is expressed at lower levels in TNBC cell lines; in fact, a recent study showed the opposite, i.e. overexpression of BKCa in TNBC patients (https://doi.org/10.1186/s12885-020-07071-1). Consistent with our work, the authors conclude that the channel could even be a new strategy for development of a targeted therapy in TNBC. We also added this information in the discussion, lines 450-454 in the tracked changes version of the manuscript.

      (4) The authors propose that combinatorial targeting of mitoBKCa along with known breast cancer chemotherapeutics can open a new horizon in breast cancer treatment. However, the authors did not perform any experiment to show the synergistic effect as mentioned.

      As already stated in the public reviews, we feel that the manuscript is very condensed already, and that adding the respective experiments and data will make the flow of the study even more complex. For the moment, we removed all information and statements linking mitoBKCa with anti-cancer treatment strategies from the abstract and only discuss this aspect. We hope that the reviewer agrees with us that an extensive analysis of the functional mitoBKCa status in the context of established breast cancer therapies must be addressed by (our) future studies.

      Minor Comments:

      There are several typos and grammatical errors that need further attention and rephrasing.

      We thank the reviewer for the comment and revised the text accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper represents important findings when identifying untargeted metabolomics and its differences between metabolomes of different biological samples. GromovMatcher is the fantasy name for the soft development. The main idea behind it is built on the assumption of featuring and matching complex datasets. Although the manuscript reflects a solid analysis, it remains incomplete for validation with putative non-curated datasets.

      We are grateful to the eLife editor for taking the time and effort to assess our manuscript.

      We are, however, unsure of what the editor means by “it remains incomplete for validation with putative non-curated datasets”. As noted by Reviewer 2, manually curated datasets that could be used for validation are scarce. Most publicly available datasets do not contain sufficient information to establish a ground truth matching on which GromovMatcher, M2S, or metabCombiner can be tested. Even in the case where such a ground truth matching can be established, it must be performed by hand through a manual matching process, which is extremely time-consuming and requires very specific expertise. This, in our opinion, only highlights the need for automatic alignment methods such as metabCombiner, M2S or GromovMatcher.

      We do agree that the performance of GromovMatcher (and its competitors) needs to be validated further, and we plan to continue validating GromovMatcher as additional data becomes available in EPIC and other cohorts. With that in mind, the lack of publicly available validation data is the reason why we conducted such an extensive simulation study, arguably more comprehensive than previous validations, exploring challenging settings that we believe reflect real-life scenarios (main text “Validation on ground-truth data” and Appendix 3). We would like to stress that this allows us to highlight previously ignored limitations of the previously published methods, metabCombiner and M2S.

      We wish to thank the editor and reviewers for their time and efforts in reviewing our manuscript which led to many significant additions to our paper. Namely we:

      • Performed an additional sensitivity analysis (Appendix 3) exploring how an imbalance in the number of features or samples between two studies being matched (e.g. the dataset split), affects the quality of matchings found by GromovMatcher, metabCombiner, and M2S.

      • Investigated how changing or removing the reference dataset (Appendix 5) in the EPIC study (main text “Application to EPIC data”), affects the results of GromovMatcher.

      • Improved alignment matrix visualizations in Fig. 3a for all four methods tested on the validation data, to highlight more clearly which feature matches were correctly identified or missed.

      The revised paper is uploaded as the file “main_elife_revision.pdf” where all revisions are highlighted in blue as well as a copy “main_elife_revision_nohighlights.pdf” where revisions are not highlighted.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors have implemented the Optimal Transport algorithm in GromovMatcher for comparing LC/MS features from different datasets. This paper gains significance in the proteomics field for performing meta-analysis of LC/MS data.

      Strengths:

      The main strength is that GromovMatcher achieves significant performance metrics compared to other existing methods. The authors have done extensive comparisons to claim that GromovMatcher performs well.

      Weaknesses:

      There are two weaknesses.

      (1) When the number of features is reduced the precision drops to ~0.8.

      We would like to clarify that this drop in precision occurs in the challenging setting where only a small proportion of metabolites are shared between both datasets (e.g., the overlap, or proportion of shared features, was 25% in our simulation study). When two untargeted metabolic datasets share only 25% of their features, this is a challenging setting for any automated matching method, as the vast majority (75%) of the features in both datasets must remain unmatched.

      In such settings, the reviewer correctly observes that the precision of the GromovMatcher algorithms (GM and GMT) drops to the range of 0.80 - 0.85 (Figure 3b, top left panel). A precision of 0.8 or higher is still competitive with the alternative methods metabCombiner (mC) and M2S, whose precisions drop below 0.8 (see main text Fig. 3b, top left panel).

      Precision is measured as the number of metabolite pairs correctly matched divided by the number of all matches identified by a method. In other words, even in the challenging setting where the number of shared features (true matches) between both datasets is small (e.g. a low 25% overlap), upwards of 80% of the feature matches found by GromovMatcher are correct, which is a very encouraging result.
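
      The precision definition above can be illustrated with a minimal sketch. The toy pairs, the function name `matching_scores`, and the recall and F1 formulas (the standard definitions, not quoted from the response) are assumptions made for illustration only.

```python
def matching_scores(predicted, truth):
    """Return (precision, recall, F1) for a set of predicted feature pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: correct matches divided by all matches reported by the method.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: correct matches divided by all true matches (standard definition).
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: each pair is (feature index in dataset 1, feature index in dataset 2).
truth = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)}
predicted = {(0, 0), (1, 1), (2, 2), (3, 3), (5, 6)}

precision, recall, f1 = matching_scores(predicted, truth)
print(precision, recall, f1)
```

      In this toy case four of the five reported matches are correct, so the precision is 0.8, the level discussed above.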

      (2) How applicable is the method for other non-human datasets?

      We thank the reviewer for raising this question. The crux of the matter concerning the application to animal data revolves around the hypothesis that correlations between metabolites in two different studies are preserved. Theoretically, the metabolome operates under similar principles in humans, governed by an underlying network of biochemical reactions. Consequently, in comparable human populations, the GM hypothesis is likely to hold to some extent.

      However, in practice, application to animal data is more complicated. Animal studies tend to have smaller sample sizes and often stem from intervention-driven scenarios, such as mice subjected to specific diets or chemicals. This results in deliberate alterations in metabolic structures which makes finding two comparable animal studies less likely. To investigate the reviewer’s question, we have searched through the two predominant LC-MS dataset repositories (MetaboLights and NIH Metabolomics Workbench) but did not find any pairs of comparable animal studies due to the reasons mentioned above. One potential strategy to navigate this issue could entail regressing the metabolic intensities against the variables that notably differ between the two animal populations and running GM using the residual intensities. This would be an interesting direction for future research and additional validation would be needed to test the robustness of GM in this setting.

      Reviewer #2 (Public Review):

      Summary:

      The goal of untargeted metabolomics is to identify differences between metabolomes of different biological samples. Untargeted metabolomics identifies features with specific mass-to-charge ratio (m/z) and retention time (RT). Matching those to specific metabolites based on the model compounds from databases is laborious and not always possible, which is why methods for comparing samples on the level of unmatched features are crucial.

      The main purpose of the GromovMatcher method presented here is to merge and compare untargeted metabolomes from different experiments. These larger datasets could then be used to advance biological analyses, for example, for the identification of metabolic disease markers. The main problem that complicates merging different experiments is that m/z and RT vary slightly for the same feature (metabolite).

      The main idea behind the GromovMatcher is built on the assumption that if two features match between two datasets (that feature i from dataset 1 matches feature j from dataset 2, and feature k from dataset 1 matches feature l from dataset 2), then the correlations or distances between the two features within each of the datasets (i and k, and j and l) will be similar. The authors then use the Gromov-Wasserstein method to find the best matches matrix from these data.
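
      The assumption described above can be sketched on a toy example. This is not the GromovMatcher implementation: it replaces the optimal transport solver with a brute-force search over one-to-one matchings of three features, and the distance matrices and the name `gw_discrepancy` are hypothetical. It only shows that the true matching minimises the mismatch between within-dataset distances.

```python
from itertools import permutations


def gw_discrepancy(D1, D2, match):
    """Squared mismatch between within-dataset distances under a matching.

    match[i] is the dataset-2 feature assigned to dataset-1 feature i.
    """
    n = len(match)
    return sum(
        (D1[i][k] - D2[match[i]][match[k]]) ** 2
        for i in range(n)
        for k in range(n)
    )


# Hypothetical within-dataset distances between three features of dataset 1.
D1 = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]

# Dataset 2 holds the same features in a shuffled order (the hidden truth).
true_match = (2, 0, 1)
n = len(D1)
D2 = [[0.0] * n for _ in range(n)]
for i in range(n):
    for k in range(n):
        D2[true_match[i]][true_match[k]] = D1[i][k]

# Brute force stands in for the Gromov-Wasserstein optimisation.
best = min(permutations(range(n)), key=lambda m: gw_discrepancy(D1, D2, m))
print(best, gw_discrepancy(D1, D2, best))
```

      Real datasets contain thousands of features, many of them unshared, which is why a relaxed optimal transport formulation is needed instead of enumeration.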

      The variation in m/z between the same features in different experiments is a user-defined value and it is initially set to 0.01 ppm. There is no clear limit for RT deviations, so the method estimates a non-linear deviation (drift) of RT between two studies. GromovMatcher estimates the drift between the two studies and then discards the matching pairs where the drift would deviate significantly from the estimate. It learns the drift from a weighted spline regression.
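
      The drift-based filtering step described above can be sketched as follows. As a loudly stated simplification, a weighted straight-line fit stands in for the weighted spline regression, and the weights, the tolerance `tol`, and the function names are hypothetical rather than taken from GromovMatcher.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def filter_by_drift(pairs, weights, tol):
    """Keep candidate matches whose RT drift is close to the fitted drift.

    pairs holds (RT in study 1, RT in study 2) for each candidate match.
    """
    rt1 = [p[0] for p in pairs]
    drift = [p[1] - p[0] for p in pairs]
    slope, intercept = weighted_line_fit(rt1, drift, weights)
    return [
        p for p in pairs
        if abs((p[1] - p[0]) - (slope * p[0] + intercept)) <= tol
    ]


# Four candidates follow a smooth drift; the last one is a low-weight outlier.
pairs = [(1.0, 1.1), (2.0, 2.2), (3.0, 3.3), (4.0, 4.4), (2.5, 4.0)]
weights = [1.0, 1.0, 1.0, 1.0, 0.05]
kept = filter_by_drift(pairs, weights, tol=0.2)
print(kept)
```

      Down-weighting uncertain candidates keeps a single implausible match from bending the estimated drift, so the outlier is removed while the consistent pairs survive.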

      The authors validate the performance of their GromovMatcher method by a validation experiment using a dataset of cord blood. They use 20 different splits and compare the GromovMatcher (both its GM and GMT iterations, whereby the GMT version uses the deviation from estimated RT drift to filter the matching matrix) with two other matching methods: M2S and metabCombiner.

      The second validation was done using a (scaled and centered) dataset of metabolics from cancer datasets from the EPIC cohort that was manually matched by an expert. This dataset was also used to show that using automatic methods can identify more features that are associated with a particular group of samples than what was found by manual matching. Specifically, the authors identify additional features connected to alcohol consumption.

      Strengths:

      I see the main strength of this work in its combination of all levels of information (m/z, RT, and higher-order information on correlations between features) and using each of the types of information in a way that is appropriate for the measure. The most innovative aspect is using the Gromov-Wasserstein method to match the features based on distance matrices.

      We thank the reviewer for acknowledging this strength of our proposed GromovMatcher method.

      The authors of the paper identify two main shortcomings with previously established methods that attempt to match features from different experiments: a) all other methods require fine-tuning of user-defined parameters, and, more importantly, b) do not consider correlations between features. The main strength of the GromovMatcher is that it incorporates the information on distances between the features (in addition to also using m/z and RT).

      Weaknesses:

      The first, minor, weakness I could identify is that there seem not to be plenty of manually curated datasets that could be used for validation.

      We thank the reviewer for raising this issue concerning manually curated validation data.

      Manually curated datasets available for validation purposes are indeed scarce. This stems from the laborious nature of matching features across diverse studies, hence the need for automatic matching methods. Our future strategy involves further validation of the GromovMatcher approach as more data becomes accessible in EPIC and other cohorts.

      The scarcity of real-life publicly available datasets that can be used for validation purposes is the reason why we conducted an extensive simulation study (main text “Validation on ground-truth data” and Appendix 3). It is notably thorough, arguably more comprehensive than previous validations, utilizes real-life untargeted data, and imitates situations where data originates from distinct untargeted metabolomics studies, complete with realistic noise parameters encompassing RT, m/z, and feature intensities. Our validation study comprehensively explores the performance of GromovMatcher, M2S, and metabCombiner, including in challenging realistic settings where there is a nonlinear drift in retention times, varying levels of feature overlaps between studies, normalizations of feature intensities, as well as imbalances in the number of features and samples present in the studies being matched.

      The second is also emphasized by the authors in the discussion. Namely, the method as it is set up now can be directly used only to compare two datasets.

      This is indeed a limitation that is common to all three methods considered in this paper. However, all these methods, GromovMatcher, M2S, and metabCombiner, can still be used to compare and pool multiple datasets using a multi-step procedure. Namely, this can be done by designating a 'reference' dataset and aligning all studies to it one by one. We take this exact approach in our paper when aligning the CS, HCC, and PC studies of the EPIC data in positive mode (main text “Application to EPIC data”). Namely, the HCC and PC studies are both aligned to the CS study by running GromovMatcher twice, and after obtaining these matchings, our analysis is restricted to those features in HCC and PC that are present in the CS study.
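
      The multi-step procedure described above can be sketched as follows. This is one possible reading of it, not the authors' code: each study is assumed to have already been matched to the reference (here by a dictionary from reference feature to study feature), and the feature identifiers and the function name `restrict_to_reference` are hypothetical.

```python
def restrict_to_reference(matchings):
    """Pool studies that were each aligned to the same reference study.

    matchings maps a study name to {reference feature: matched study feature}.
    Only reference features matched in every study are kept.
    """
    shared = set.intersection(*(set(m) for m in matchings.values()))
    return {
        ref: {study: m[ref] for study, m in matchings.items()}
        for ref in sorted(shared)
    }


# Hypothetical output of two pairwise alignments against the CS study.
matchings = {
    "HCC": {"cs_1": "hcc_7", "cs_2": "hcc_3", "cs_4": "hcc_9"},
    "PC": {"cs_1": "pc_2", "cs_3": "pc_5", "cs_4": "pc_1"},
}

pooled = restrict_to_reference(matchings)
print(pooled)
```

      Because every study is compared only with the reference, features shared by HCC and PC but absent from CS cannot be recovered, which is one reason the choice of reference can matter.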

      After the reviewer’s comment, we have added an additional sensitivity analysis in Appendix 5, to compare the results produced by GromovMatcher depending on the choice of the reference study. Namely, setting the reference study to either the CS study or the HCC study, GromovMatcher identified 706 and 708 common features respectively, with an overlap of 640 features. This highlights that the choice of reference does matter to some extent. In our original analysis of the EPIC data, choosing CS as the reference was motivated by the fact that CS had the largest sample size (compared to HCC and PC) and a subset of features in HCC and PC were already matched by experts to the CS study which we could use for validation (see Loftfield et al. (2021). J Natl Cancer Inst.).
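
      To put the reported counts in perspective, the agreement between the two reference choices can be summarised with simple set arithmetic. The sketch below uses only the three numbers given above (706, 708, and 640); the summary measures themselves are an illustrative choice, not part of the authors' analysis.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Summarise the agreement between two feature sets from their sizes."""
    union = n_a + n_b - n_shared
    return {
        "jaccard": n_shared / union,      # shared features / all features found
        "share_of_a": n_shared / n_a,     # fraction of set A also found in B
        "share_of_b": n_shared / n_b,     # fraction of set B also found in A
    }


# 706 features with CS as reference, 708 with HCC as reference, 640 in common.
summary = overlap_summary(706, 708, 640)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

      Roughly nine in ten features found with one reference are also found with the other, consistent with the statement that the choice of reference matters only to some extent.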

      As mentioned in the discussion section of our manuscript, the recently proposed multimarginal Gromov-Wasserstein algorithm (Beier, F., Beinert, R., & Steidl, G. (2023). Information and Inference) could potentially allow multiple metabolomic studies to be matched using one optimization routine (e.g. without the designation of a ‘reference study’ for matching). We have not explored this possibility in depth yet as fast numerical methods for multimarginal GW are still in their infancy. Also, such multimarginal methods rely on the computation and storage of coupling or matching matrices that are tensors where the number of dimensions is equal to the number of datasets being matched. Therefore, multimarginal methods have large memory costs, which currently precludes their application for the matching of multiple metabolomics datasets.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was struggling with the representation used in Figure 3a. The gray points overlayed over the green points on a straight line are difficult to visually quantify. I found that my eyes mainly focused on the pattern of the red dots.

      Figure 3a has been modified to improve visual clarity. Namely we have consistently reordered the rows and columns of the coupling matrices such that the true positive matches (green points) are spatially separated from the false negative matches (red points). Now the fraction of true positive and false negative matches can be appreciated much more clearly by eye in Figure 3a.

      (2) I would also like to add the caveat that I cannot judge whether the authors used the other two methods that they compare with GromovMatcher (the M2S and metabCombiner) optimally. But I also do not see any evidence that they did not. Hopefully one of the other reviewers can address that.

      We thank the reviewer for highlighting the comparison of our approach GromovMatcher to the other existing methods M2S and metabCombiner (mC). Both M2S and mC depend on tens of hyperparameters, each with a discrete or continuous set of values that must be properly optimized to infer accurate matchings between dataset features. We detail in Appendix 2 how the hyperparameters of the M2S and mC methods are optimally tuned to achieve the best possible performance on the validation ground-truth data. Namely, both in the simulation study and on EPIC data, we grid-search over all important hyperparameters in the M2S and mC methods and choose those parameter combinations that result in the highest F1 score, averaged over 20 random trials. We remark that no such hyperparameter optimization was performed for our GromovMatcher method. As shown in Figures 3 and 4 of the main text, we find that GromovMatcher outperforms M2S and mC even in these cases when the hyperparameters of M2S and mC are tuned to predict optimal feature matchings.

      Given the large combinatorial space of hyperparameter choices, we believe we have thoroughly tested the important hyperparameter combinations that users of M2S and mC would be likely to explore in their own research.
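
      The tuning procedure described above (a grid search that keeps the parameter combination with the highest F1 score averaged over 20 trials) can be sketched generically. The hyperparameter names `rt_tol` and `mz_tol`, the grid values, and the toy scoring function are hypothetical placeholders, not the actual M2S or metabCombiner settings.

```python
from itertools import product
from statistics import mean


def grid_search(grid, run_trial, n_trials=20):
    """Return the parameter combination with the highest mean score."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Average the score over repeated trials (e.g. random dataset splits).
        score = mean(run_trial(params, seed) for seed in range(n_trials))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_f1(params, seed):
    """Stand-in for 'run the matching method and compute F1 on ground truth'."""
    penalty = abs(params["rt_tol"] - 0.5) + 10.0 * abs(params["mz_tol"] - 0.01)
    noise = ((seed % 5) - 2) * 0.001  # small trial-to-trial variation
    return 1.0 - penalty + noise


grid = {"rt_tol": [0.1, 0.5, 1.0], "mz_tol": [0.005, 0.01, 0.02]}
best_params, best_score = grid_search(grid, toy_f1)
print(best_params)
```

      The number of combinations grows multiplicatively with each added hyperparameter, which is why only the most influential ones can be searched exhaustively.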

      (3) Validation

      (3a) The first validation is done on a split cord blood dataset. I could not clearly see from the paper how sensitive the result is to the dataset split.

      We are grateful for the reviewer’s question and have included new experiments in Appendix 3 which show how the results of GromovMatcher, M2S, and MetabCombiner are affected by the dataset split. In our original manuscript, our validation ground-truth experiment began with an untargeted metabolomic dataset consisting of n = 499 samples and p = 4,712 metabolic features which is split equally into two datasets consisting of an equal number of samples n1 = n2 and an equal number of metabolic features p1 = p2. The features of these equal-sized datasets would then be matched by our method.

      Now in Appendix 3 (Figs. 1-3) we show the sensitivity of all three alignment methods (GromovMatcher, M2S, and MetabCombiner) when we vary the fraction of samples in dataset 1 over dataset 2 given by n1/ n2, the overlap in shared features between both datasets, and the fraction of metabolic features in dataset 1 that are not present in dataset 2 which affects the feature sizes of both datasets p1/ p2. We find that all alignment methods are able to maintain a consistent precision and recall score when these three dataset split parameters are varied. GromovMatcher achieves a higher precision and recall than M2S and MetabCombiner for all choices of dataset split, agreeing with the validation experiment results from the main text (see main text Fig. 3). All three methods tested decrease in precision (without dropping in recall) when dataset 1 and dataset 2 contain an equal number of unshared features (e.g. when p1 = p2). Therefore, these sensitivity experiments in Appendix 3 show that our results in the main text are performed in the most challenging setting for the dataset split.

      (3b) The second validation was done using a (scaled and centered) dataset of metabolics from cancer datasets from the EPIC cohort that was manually matched by an expert. Here the authors observe that metabCombiner has good precision, but lags in recall. And M2S has a very similar performance to GromovMatcher. The authors explain this by the fact that the drift in RT between the two experiments is mostly linear and thus does not affect the M2S performance. Can the authors find a different validation dataset where the drift in RT is not linear? If yes, it would be interesting to add it to the paper.

      We thank the reviewer for raising this question. As mentioned above, curated validation datasets such as the EPIC study analyzed in our paper are very rare and we do not currently have a validation study with a nonlinear retention time drift.

      Nevertheless, we performed an additional analysis of simulated data (reported in Appendix 2 – “M2S hyperparameter experiments” and Appendix 2 – Table 1) that demonstrates the decrease in M2S performance when the simulated drift is nonlinear. As presented in Appendix 2 – Table 1, in a low overlap setting with a linear drift, which corresponds to the EPIC data, the precision and recall of M2S were 0.831 and 0.934 respectively, compared with 0.769 and 0.905 in the main analysis where the drift was nonlinear.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study reports a novel mechanism linking DHODH inhibition-mediated pyrimidine nucleotide depletion to antigen presentation. Alternative means of inducing antigen presentation provide therapeutic opportunities to augment immune checkpoint blockade for cancer treatment. While the solid mechanistic data in vitro are compelling, in vivo assessments of the functional relevance of this mechanism are still incomplete.

      Public Reviews:

      We thank all Reviewers for their insightful comments and excellent suggestions.

      Reviewer #1 (Public Review):

      The manuscript by Mullen et al. investigated the gene expression changes in cancer cells treated with the DHODH inhibitor brequinar (BQ), to explore the therapeutic vulnerabilities induced by DHODH inhibition. The study found that BQ treatment causes upregulation of antigen presentation pathway (APP) genes and cell surface MHC class I expression, mechanistically which is mediated by the CDK9/PTEFb pathway triggered by pyrimidine nucleotide depletion.

      No comment from authors

      The combination of BQ and immune checkpoint therapy demonstrated a synergistic (or additive) anti-cancer effect against xenografted melanoma, suggesting the potential use of BQ and immune checkpoint blockade as a combination therapy in clinical therapeutics.

      No comment from authors

      The interesting findings in the present study include demonstrating a novel cellular response in cancer cells induced by DHODH inhibition. However, a limitation of the study is that it did not directly examine whether the increased antigen presentation induced by DHODH inhibition actually contributed to potentiating the efficacy of immune checkpoint blockade (ICB).

      No comment from authors for preceding text, comment addresses the following text

      Moreover, the mechanism of the increased antigen presentation pathway by pyrimidine depletion mediated by CDK9/PTEFb was not validated by genetic KD or KO targeting by CDK9/PTEFb pathways.

      We appreciate this comment, and we would like to explain why we did not pursue these approaches. According to DepMap, CRISPR/Cas9-mediated knockout of CDK9 in cancer cell lines is almost universally deleterious, scoring as “essential” in 99.8% (1093/1095) of all cell lines tested (see Author response image 1 below). This makes sense, as P-TEFb is required for productive RNA polymerase II elongation of most mammalian genes. As such, it was not feasible to generate cell lines with stable genetic knockout of CDK9 to test our hypothesis.

      While knockdown of CDK9 by RNA interference could support our results, DepMap data seems to indicate that RNAi-mediated knockdown of CDK9 is generally ineffective in silencing its activity, as this perturbation scored as “essential” in only 6.2% (44/710) of tested cell lines. This suggests that incomplete depletion of CDK9 will likely not be sufficient to block APP induction downstream of nucleotide depletion. Furthermore, RNAi-mediated depletion of CDK9 may trigger transcriptional changes in the cell by virtue of its many documented protein-protein interactions, and it would be difficult to establish a consistent “time zero” at which point CDK9 protein depletion is substantial but secondary effects of this have not yet occurred to a significant degree. These factors constitute major limitations of experiments using RNAi-mediated knockdown of CDK9.

      Author response image 1.

      Essentiality score from CRISPR and RNAi perturbation of CDK9 in cancer cell lines https://depmap.org/portal/gene/CDK9?tab=overview&dependency=RNAi_merged

      At any rate, we provide evidence that three different inhibitors of CDK9 (flavopiridol, dinaciclib, and AT7519) all inhibit our effect of interest (Fig 4B). The same results were observed using a previously validated CDK9-directed proteolysis targeting chimera (PROTAC2), and this was reversed by addition of excess pomalidomide (Fig 4C), which correlated with the presence/absence of CDK9 on western blot under the exact same conditions (Fig 4D).

      It is formally possible that all CDK9 inhibitors we tested are blocking BQ-mediated APP induction by some shared off-target mechanism (or perhaps by two or more different off-target mechanisms) AND this CDK9-independent target also happens to be degraded by PROTAC2. However, this would be an extraordinarily non-parsimonious explanation for our results, and so we contend that we have provided compelling evidence for the requirement of CDK9 for BQ-mediated APP induction.

      Finally, high concentrations of BQ have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, and the authors should discuss whether the dose used in the in vivo study reached the ferroptotic sensitizing dose or not.

      We are intrigued by the results shown to us by Reviewer #1 in the linked preprint (Mishima et al 2022, https://doi.org/10.21203/rs.3.rs-2190326/v1). We have also observed in our unpublished data that very high concentrations of BQ (>150µM) cause loss of cell viability that is not rescued by uridine supplementation and that occurs even in DHODH knockout cells. This effect of high-dose BQ must be DHODH-independent. We also agree that Mishima et al provide compelling evidence that the ferroptosis-sensitizing effect of high-dose BQ treatment is due (at least in large part) to inhibition of FSP1.

      Although we showed that DHODH is strongly inhibited in tumor cells in vivo (Fig 5C), we did not directly measure the concentration of BQ in the tumor or plasma. Sykes et al (PMID: 27641501) found that the maximum plasma concentration (Cmax) for [BQ]free following a single IP administration in C57BL/6J mice (15mg/kg) is approximately 3µM, while the Cmax for [BQ]total was around 215µM. Because polar drug molecules bound to serum proteins (predominantly albumin) are not available to bind other targets, [BQ]free is the relevant parameter.

      Given a Cmax for [BQ]free of 3µM and half-life of 12.0 hours, we estimate that the steady-state [BQ]free with daily IP injections at this dose is around 4µM. Since we used an administration schedule of 10mg/kg every 24 hours, we estimate that the steady-state plasma [BQ]free in our system was 2.67µM (assuming initial Cmax of 2µM and half-life of 12.0 hours).
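
      The arithmetic behind these two estimates can be reproduced with a short calculation. The following is a minimal sketch, not a pharmacokinetic model fitted to data: it assumes first-order elimination, linear superposition of successive doses, and that the quoted values refer to the peak concentration at steady state. The function names are ours and purely illustrative.

```python
def accumulation_factor(half_life_h: float, dosing_interval_h: float) -> float:
    """Steady-state accumulation ratio for repeated dosing.

    Assumes first-order (exponential) elimination and linear superposition.
    """
    # Fraction of the previous dose still present when the next dose is given.
    fraction_remaining = 0.5 ** (dosing_interval_h / half_life_h)
    # Sum of the geometric series 1 + r + r^2 + ... = 1 / (1 - r).
    return 1.0 / (1.0 - fraction_remaining)


def steady_state_peak_um(single_dose_cmax_um: float,
                         half_life_h: float,
                         dosing_interval_h: float) -> float:
    """Peak free-drug concentration (in µM) once steady state is reached."""
    return single_dose_cmax_um * accumulation_factor(half_life_h, dosing_interval_h)


# Inputs quoted in the text: half-life 12.0 h, one IP injection every 24 h.
peak_15mg_kg = steady_state_peak_um(3.0, 12.0, 24.0)  # single-dose Cmax of 3 µM
peak_10mg_kg = steady_state_peak_um(2.0, 12.0, 24.0)  # assumed single-dose Cmax of 2 µM

print(f"15 mg/kg daily: {peak_15mg_kg:.2f} µM")
print(f"10 mg/kg daily: {peak_10mg_kg:.2f} µM")
```

      With a 24-hour interval and a 12-hour half-life, one quarter of each dose remains at the next injection, so the accumulation factor is 1/(1 - 0.25) = 4/3, which gives approximately 4µM and 2.67µM for the two doses.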

      To derive an upper-bound estimate for the Cmax of [BQ]free over the 12-day treatment period (Fig 5A-D), we will use the observed data for the 15mg/kg dose, and we will assume 1) that there is no clearance of BQ whatsoever and 2) that [BQ]free increases linearly with increasing [BQ]total. With one dose per day for 12 days, this yields a maximum free BQ concentration of 12 × 3µM = 36µM.
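
      This worst-case bound can also be written out explicitly. The sketch below simply multiplies the single-dose Cmax by the number of daily doses, under the two assumptions stated above; the 50µM comparison value is the lower limit of the ferroptosis-sensitizing range quoted in the reviewer's comment, and the names are illustrative only.

```python
def no_clearance_upper_bound_um(single_dose_cmax_um: float, n_doses: int) -> float:
    """Worst-case free-drug concentration (in µM) if no drug were ever cleared.

    Every dose is assumed to add its full single-dose Cmax to the running total.
    """
    return single_dose_cmax_um * n_doses


# Lower limit of the reported ferroptosis-sensitizing range (>50 µM).
FERROPTOSIS_LOWER_LIMIT_UM = 50.0

# Single-dose Cmax of 3 µM (15 mg/kg data), one dose per day for 12 days.
upper_bound_um = no_clearance_upper_bound_um(3.0, 12)
stays_below_limit = upper_bound_um < FERROPTOSIS_LOWER_LIMIT_UM

print(f"Upper bound: {upper_bound_um:.0f} µM; below 50 µM: {stays_below_limit}")
```

      Because the 15mg/kg figures are used while the administered dose was 10mg/kg, and because clearance is ignored entirely, this bound overestimates the true exposure.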

      Therefore, we consider it very unlikely that plasma concentrations of free BQ in our experiment exceeded the lower limit of the ferroptosis-sensitizing dose range reported by Mishima et al. However, without direct pharmacokinetic analysis, we cannot say for sure what the maximal [BQ]free was under our experimental conditions.

      Reviewer #2 (Public Review):

      In their manuscript entitled "DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation", Mullen et al. describe an interesting mechanism of inducing antigen presentation. The manuscript includes a series of experiments that demonstrate that blockade of pyrimidine synthesis with DHODH inhibitors (i.e. brequinar (BQ)) stimulates the expression of genes involved in antigen presentation. The authors provide evidence that BQ mediated induction of MHC is independent of interferon signaling. A subsequent targeted chemical screen yielded evidence that CDK9 is the critical downstream mediator that induces RNA Pol II pause release on antigen presentation genes to increase expression. Finally, the authors demonstrate that BQ elicits strong anti-tumor activity in vivo in syngeneic models, and that combination of BQ with immune checkpoint blockade (ICB) results in significant lifespan extension in the B16-F10 melanoma model. Overall, the manuscript uncovers an interesting and unexpected mechanism that influences antigen presentation and provides an avenue for pharmacological manipulation of MHC genes, which is therapeutically relevant in many cancers. However, a few key experiments are needed to ensure that the proposed mechanism is indeed functional in vivo.

      The combination of DHODH inhibition with ICB reflects more of an additive response instead of a synergistic combination. Moreover, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. To confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition, the authors should examine whether depletion of immune cells can reduce the therapeutic efficacy of BQ in vivo.

      We concur with this assessment.

      Moreover, they should examine whether BQ treatment induces antigen presentation in non-malignant cells and APCs to determine the cancer specificity.

      Although we showed that this occurs in HEK-293T cells, we appreciate that this cell line is not representative of human cells of any organ system in vivo. So, we agree it is important to determine if DHODH inhibition induces antigen presentation in human tissues and professional antigen presenting cells, and this is an excellent focus for future studies.

      However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline (i.e. even in the absence of DHODH inhibitor treatment), since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.

      Finally, although the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level, only MHC-I is validated by flow cytometry. Given the importance of MHC-II expression on epithelial cancers, including melanoma, MHC-II should be validated as well.

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Overall, the paper is clearly written and presented. With the additional experiments described above, especially in vivo, this manuscript would provide a strong contribution to the field of antigen presentation in cancer. The distinct mechanisms by which DHODH inhibition induces antigen presentation will also set the stage for future exploration into alternative methods of antigen induction.

      Reviewer #3 (Public Review):

      Mullen et al present an important study describing how DHODH inhibition enhances efficacy of immune checkpoint blockade by increasing cell surface expression of MHC I in cancer cells. DHODH inhibitors have been used in the clinic for many years to treat patients with rheumatoid arthritis and there has been a growing interest in repurposing these inhibitors as anti-cancer drugs. In this manuscript, the Singh group build on their previous work defining combinatorial strategies with DHODH inhibitors to improve efficacy. The authors identify an increase in expression of genes involved in the antigen presentation pathway and MHC I after BQ treatment and they narrow the mechanism to be strictly pyrimidine and CDK9/P-TEFb dependent. The authors rationalize that increased MHC I expression induced by DHODH inhibition might favor efficacy of dual immune checkpoint blockade. This combinatorial treatment prolonged survival in an immunocompetent B16F10 melanoma model.

      [No comment from authors]

      Previous studies have shown that DHODH inhibitors can increase expression of innate immunity-related genes but the role of DHODH and pyrimidine nucleotides in antigen presentation has not been previously reported. A strength of the manuscript is the use of multiple controls across a panel of cell lines to exclude off-target effects and to confirm that effects are exclusively dependent on pyrimidine depletion. Overall, the authors do a thorough characterization of the mechanism that mediates MHC I upregulation using multiple strategies. Furthermore, the in vivo studies provide solid evidence for combining DHODH inhibitors with immune checkpoint blockade.

      [No comment from authors]

      However, despite the use of multiple cell lines, most experiments are only performed in one cell line, and it is hard to understand why particular gene sets, cell lines or time points are selected for each experiment. It would be beneficial to standardize experimental conditions and confirm the most relevant findings in multiple cell lines.

      We appreciate this comment, and we understand how the use of various cell lines may seem puzzling. We would like to explain how our cell line panel evolved over the course of the study. Our first indication that BQ caused APP upregulation came from transcriptomics experiments (Figs 1A-D, S1A) performed as part of a previous study investigating BQ resistance (Mullen et al, 2023 Cancer Letters). In that study, we used CFPAC-1 as a model for BQ sensitivity and S2-013 as a model for BQ resistance. We did RNA sequencing +/- BQ in these cell lines to look for gene expression patterns that might underlie resistance/sensitivity to BQ. When analyzing this data, we serendipitously discovered the APP/MHC phenomenon, which gave rise to the present study.

      Our next step was to extend these findings to cancer cell lines of other histologies, and we prioritized cell lines derived from common cancer types for which immunotherapy (specifically ICB) is clinically approved. This is why A549 (lung adenocarcinoma), HCT116 (colorectal adenocarcinoma), A375 (cutaneous melanoma), and MDA-MB-231 (triple-negative breast cancer) cell lines were introduced.

      Because PDAC is considered to have an especially “immune-cold” tumor microenvironment, we reasoned that even dramatically increasing cancer cell antigen presentation may be insufficient to elicit an effective anti-tumor immune response in vivo. So we shifted our focus towards melanoma, because a subset of melanoma patients is very responsive to ICB and loss of antigen presentation (by direct silencing or homozygous loss-of-function mutations in MHC-I components such as B2M, or by functional loss of IFN-JAK1/2-STAT signaling) has been shown to mediate ICB resistance in human melanoma patients. This is why we extended our findings to B16F10 murine melanoma cells, intending to use them for in vivo studies with syngeneic immunocompetent recipient mice.

      The PDAC cell line MiaPaCa2 was introduced because a collaborator at our institution (Amar Natarajan) happened to have IKK2 knockout MiaPaCa2 cells, which allowed us to genetically validate our inhibitor results showing that IKK1 and IKK2 (crucial effectors for NF-kB signaling) are dispensable for our effect of interest.

      Ultimately, realizing that our results spanned various human and murine cell lines, we chose to use HEK-293T cells to validate the general applicability of our findings to proliferating cells in 2D culture, since HEK-293T cells (compared to our cancer cell lines) have relatively few genetic idiosyncrasies and express MHC-I at baseline.

      The differential in vivo survival depending on dosing schedule is interesting. However, this section could be strengthened with a more thorough evaluation of the tumors at endpoint.

      Overall, this is an interesting manuscript proposing a mechanistic link between pyrimidine depletion and MHC I expression and a novel therapeutic strategy combining DHODH inhibitors with dual checkpoint blockade. These results might be relevant for the clinical development of DHODH inhibitors in the treatment of solid tumors, a setting where these inhibitors have not shown optimal efficacy yet.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The main issue is that it did not directly examine whether the increased antigen presentation by DHODH inhibition contributed to the potentiation of the efficacy of immune-check blockade (ICB). The additional effect of BQ in the xenograft tumor study was not examined to determine if it was due to increased antigen presentation toward the cancer cells or due to merely cell cycle arrest effect by pyrimidine depletion in the tumor cells. The different administration timing of ICB with BQ treatment (Fig 5E) would not be sufficient to answer this issue.

      We agree with this assessment, and we believe the experiment proposed by Reviewer #2 below (comparing the efficacy of BQ in Rag-null versus immunocompetent recipients) would address this question directly. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Additionally, in the in vivo study, the increase in surface MHC-I at the protein level following BQ treatment was not examined in the tumor samples, and it was not confirmed whether increased antigen presentation by BQ treatment actually promoted an anti-cancer immune response in immune cells. To support the story presented in the study, these data would be necessary.

      We attempted to show this by immunohistochemistry, but unfortunately the anti-H2-Db antibody that we obtained for this purpose did not have satisfactory performance to assess this in our tissue samples harvested at necropsy.

      (3) The mechanism by which pyrimidine depletion increases the antigen presentation pathway via CDK9/P-TEFb was not validated by genetic KD or KO targeting the CDK9/P-TEFb pathway. In general, results obtained only with inhibitor assays are limited by possible off-target effects.

      Please see our above reply to Reviewer #1 comment making this same point, where we spell out our rationale for not pursuing these experiments.

      (4) High concentrations of BQ (>50µM) have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, an iron-mediated lipid peroxidation-dependent cell death, independent of DHODH inhibition (https://www.researchsquare.com/article/rs-2190326/v1). The authors should discuss whether the dose used in the in vivo study reached the ferroptosis-sensitizing dose or not.

      Please see our above reply to Reviewer #1 comment making this same point, where we explain why we are very confident that the BQ dose administered in our animal experiments was far below the minimum reported BQ dose required to sensitize cancer cells to ferroptosis in vitro.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) According to the proposed model, BQ mediated induction of antigen presentation is a contributing factor to the efficacy of this therapeutic strategy. If this is true, then depletion of immune cells should reduce the therapeutic efficacy of BQ in vivo. The authors should perform the B16-F10 transplant experiments in either Rag null mice (if available) or with CD8/CD4 depletion. The expectation would be that T cell depletion (or MHC loss with genetic manipulation) should reduce the efficacy of BQ treatment. Absent this critical experiment, it is difficult to confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition.

      We agree with this assessment and the proposed experiment comparing the response in Rag-null versus immunocompetent recipients. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Does BQ treatment induce antigen presentation in non-malignant cells? APCs? If the induction of antigen presentation is not cancer specific and related to a pyrimidine depletion stress response, then there is a possibility that healthy tissues will also exhibit a similar phenotype, raising concerns about the specificity of a de novo immune response. The authors should examine antigen presentation genes in healthy tissues treated with BQ.

      We agree it is important to examine if our findings regarding nucleotide depletion and antigen presentation are true of APCs and other non-transformed cells, but we are not so concerned about the possibility of raising an immune response against non-malignant host tissues, as explained above. We have reproduced the relevant section below:

      “However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline, since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.”

      (3) In the title, the authors claim that DHODH inhibition enhances the efficacy of ICB. However, the experiment shown in Figure 5D does not demonstrate this. The Kaplan-Meier curves reflect more of an additive response versus a synergistic combination. Furthermore, the concurrent treatment of BQ and ICB seems to inhibit the efficacy of ICB due to BQ toxicity in immune cells. This result seems to contradict the title.

      We do not agree with this assessment. Given that the effect of dual ICB alone was very marginal, while the effect of BQ monotherapy was quite marked, we cannot conclude from Fig 5 that BQ treatment inhibited ICB efficacy due to immune suppression.

      (4) Related to Point 3, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. One explanation for the results is that BQ treatment reduces tumor burden, and then a subsequent course of ICB also reduces tumor burden but not that the two therapies are functioning in synergy. To address this, the authors should measure the duration of BQ mediated induction of antigen presentation after stopping treatment.

      We agree that the alternative explanation proposed by Reviewer #2 is possible and we appreciate the suggestion to test the stability of APP induction after stopping BQ treatment.

      (5) In Figure 1, the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level. However, they only validate MHC-I by flow cytometry. A simple experiment to evaluate the effect of BQ treatment on MHC-II surface expression would provide important additional mechanistic insight into the immunomodulatory effects of DHODH inhibition, especially given recent literature reinforcing the importance of MHC-II expression on epithelial cancers, including melanoma (Oliveira et al. Nature 2022).

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Minor Points

      (1) The authors show ChIP-seq tracks from Tan et al. for HLA-B. However, given the pervasive effect of Ter treatment across many HLA genes, the authors should either show tracks at additional loci, or provide a heatmap of read density across more loci. This would substantiate the mechanistic claim that RNA Pol II occupancy and activity across antigen presentation genes is the major driver of response to DHODH inhibition as opposed to mRNA stabilization/increased translation.

      We appreciate this suggestion. We have changed Fig 4 by replacing the HLA-B track (old Fig 4E) with a representation of fold change (Ter/DMSO) in Pol II occupancy versus fold change (Ter/DMSO) in mRNA abundance for 23 relevant genes (new Fig 4G); both of these datasets were obtained from the Tan et al manuscript. This new figure panel (Fig 4G) also shows linear regression analysis demonstrating that Pol II occupancy and mRNA expression are significantly correlated for APP genes. While we recognize that this data in itself is not formal proof of our hypothesis, it does strongly support the notion that increased transcription is responsible for the increased mRNA abundance of APP genes that we have observed.

      (2) A compelling way to demonstrate a change in antigen presentation is through mass spectrometry based immunopeptidomics. Performing immunopeptidomic analysis of BQ treated cell lines would provide substantial mechanistic insight into the outcome of BQ treatment. While this approach may be outside the scope of the current work, the authors should speculate on how this treatment may specifically alter the antigenic landscape where future directions would include empirical immunopeptidomics measurements.

      We fully agree with this comment. While the abundance of cancer cell surface MHC-I is an important factor for anticancer immunity, another crucial factor is the identity of peptides that are presented. Treatments that cause presentation of more immunogenic peptides can enhance T-cell recognition even in the absence of a relative change in cell surface MHC-I abundance.

      While we did not perform the immunopeptidomics experiments described, we can offer some speculation regarding this comment. As shown in Fig 1D-E, transcriptomics experiments suggest that immunoproteasome subunits (PSMB8, PSMB9, PSMB10) are upregulated upon DHODH inhibition. If this change in mRNA levels translates into greater immunoproteasome activity (which was not tested in our study), this would be expected to alter the repertoire of peptides available for presentation and could thereby change the immunopeptidome.

      However, this hypothesis requires direct testing, and we hope future studies will delineate the effects of DHODH inhibition and other cancer therapies on the immunopeptidome, as this area of research will have important clinical implications.

      (3) While the signaling through CDK9 seems convincing, it still does not provide a mechanistic link between depleted pyrimidines and CDK9 activity. The authors should speculate on the mechanism that signals to CDK9.

      We agree with the assessment. A mechanistic link between depleted pyrimidines and CDK9 activity will be a subject of future studies.

      (4) Related to minor point 2, the authors should consider a genetic approach to confirm the importance of CDK9. While the pharmacological approach, including multiple mechanistically distinct CDK9 inhibitors provides strong evidence, an additional experiment with genetic depletion of CDK9 (CRISPR KO, shRNA, etc) would provide compelling mechanistic confirmation.

      Reviewer #1 raised this very same point, and we agree. Please see our reply to Reviewer #1, which details why we did not pursue this approach and argues that the evidence we present is compelling even in absence of genetic manipulation.

      Additionally, please see the new Fig 4E and 4F. Figure 4E is a repeat of Fig 4B using HCT116 cells, and it shows that, in this cell line, CDK9 inhibitors (flavopiridol, dinaciclib, and AT7519) block BQ-mediated APP induction, while PROTAC2 does not. Figure 4F shows that (for reasons we cannot fully explain) PROTAC2 does not lead to CDK9 degradation in HCT116 cells. These data strongly implicate CDK9, because they exclude a CDK9-degradation-independent effect of PROTAC2.

      (5) Figure 2B needs a legend.

      Thank you for pointing this out. We have added a legend to Fig 2B.

      (6) The authors should comment in the discussion on how this strategy may be particularly useful in patients harboring genetic or epigenetic loss of interferon signaling, a known mechanism of ICB resistance. Perhaps DHODH inhibition could rescue MHC expression in cells that are deficient in interferon sensing.

      Thank you for this suggestion! We have amended the Discussion section to mention this important point. Please see paragraph 2 of the revised Discussion section where we have added the following text:

      “Because BQ-mediated APP induction does not require interferon signaling, this strategy may have particular relevance for clinical scenarios in which tumor antigen presentation is dampened by the loss or silencing of cancer cell interferon signaling, which has been demonstrated to confer both intrinsic and acquired ICB resistance in human melanoma patients.”

      Reviewer #3 (Recommendations For The Authors):

      The authors present convincing evidence of the mechanism by which pyrimidine nucleotides regulate MHC I levels and about the potential of combining DHODH inhibitors with dual immune checkpoint blockade (ICB). This is an interesting paper given the clinical relevance of DHODH inhibitors. The studies raise some questions, and some points might need clarifying as below:

      • In Figure 2C, why do the authors focus on these two genes in the uridine rescue? These are important genes mediating antigen presentation, but it might be more interesting to see how H2-Db and H2-Kb expression correlate with the protein data shown in Fig 2D. Fig. 2C-2D is a relevant control, so it would be important to validate in a different cancer cell line (e.g. one of the PDAC cell lines used for the RNAseq).

      We appreciate this comment. Although Fig 3C shows that BQ-induced expression of H2-Db, H2-Kb, and B2m is reversed by uridine (in B16F10 cells), we recognize that this was not the best placement for these data, which are easily overlooked there because uridine reversal is not the main point of Fig 3C. We have left Fig 3C as is, because we think that the uridine reversal demonstrated in that panel serves as a good internal positive control for reversal of BQ-mediated APP induction in that experiment.

      We have repeated the experiments shown in the original Fig 2C and substituted the original Fig 2C with a new Fig 2C and Fig S2B, which show both Tap1 and Nlrc5 as well as H2-Db, H2-Kb, and B2m after treatment with either BQ (new Fig 2C) or teriflunomide (new Fig S2B). The original Fig S2B is now Fig S2C, and it shows that uridine has no effect on the expression of any of the genes assayed in the new Fig 2C or S2B.

      The reversibility of cell surface MHC-I induction was also validated in HCT116 cells (Fig 3F). We included the uridine reversal in Fig 3F to avoid duplicating the control and BQ FACS data in multiple panels.

      We have also added the qPCR data for HCT116 cells showing this same phenotype (at the mRNA level), which is the new Fig S2D.

      We decided to prioritize HCT116 cells for our mechanistic studies (Figures S2D, S4A, and 4E-F) because previous reports indicate that this cell line is diploid and therefore less genetically deranged than our other cancer cell lines.

      • Figure 2F shows an elegant experiment to discard off-target effects related to cell death and to confirm that the increased MHC I expression is uniquely dependent on pyrimidines. DHODH has recently been involved in ferroptosis, a highly immunogenic type of cell death. What are the authors’ thoughts on BQ-induced ferroptosis as a possible contributor to the effects of ICB? Does BQ + ferroptosis inhibitor (ferrostatin) affect cell surface MHC I and/or expression of antigen processing genes?

      The potential role of DHODH in ferroptosis protection (Mao et al 2021) has important implications, so we are glad that multiple reviewers raised questions concerning ferroptosis. We did not directly test the effect of ferroptosis inducing agents (with or without BQ) on MHC-I/APP expression, but that is certainly a worthwhile line of investigation.

      The DHODH/ferroptosis issue is complicated by a study pointed out by Reviewer #1 that challenges the role of DHODH inhibition in BQ-mediated ferroptosis sensitization (Mishima et al, 2022). This study argues that high-dose BQ treatment causes FSP1 inhibition, and this underlies the effect of BQ on the cellular response to ferroptosis-inducing agents.

      Regardless of whether BQ-induced ferroptosis-sensitization is dependent on DHODH, FSP1, or some other factor, the Mao and Mishima studies agree that a relatively high dose of BQ is required to observe these effects (100-200µM for most cell lines and >50µM even in the most ferroptosis-sensitive cell lines). As we explained above, we consider it very unlikely that the in vivo BQ exposure in our experiments (Fig 5) was high enough to cause significant ferroptosis, especially in the absence of any dedicated ferroptosis-inducing agent (which is typically required to cause ferroptosis even in the presence of high-dose BQ).

      • The authors nail down the mechanism to CDK9 (Fig 4). However, all these experiments are performed in 293T cells. I would like to see a repeat of Fig. 4B in a cancer cell line (either PDAC or B16). Also, does BQ have any effect on CDK9 expression/protein levels?

      We have added two figure panels that address this comment (new Fig 4E and 4F). Figure 4E (which is a repeat of Fig 4B with HCT116 cells) shows that CDK9 inhibitors (flavopiridol, AT7519, and dinaciclib) reverse BQ-mediated APP induction in HCT116 cells (this agrees with Fig S4A showing that flavopiridol reverses MHC induction by various nucleotide synthesis inhibitors in this cell line), but PROTAC2 does not. Figure 4F shows that PROTAC2 (for reasons we cannot explain) does not cause CDK9 degradation in HCT116 cells. This adds further support to our thesis that CDK9 is a critical mediator of BQ-mediated APP induction (because how else can this pattern of results be explained?). The text of the Results section has been amended to reflect this.

      We chose to use HCT116 cells for this repeat experiment 1) to align with Fig S4A and 2) because, as previously mentioned, we consider HCT116 to be a good cell line for mechanistic studies because of its relative lack of idiosyncratic genetic features (compared to CFPAC-1, for example, which was derived from a patient with cystic fibrosis).

      • What are the differences in tumor size for the experiment shown in Figure 5E? What about tumor cell death in the ICB vs. BQ+ICB groups?

      Because this was a survival assay, direct comparison of tumor volumes between groups was not possible at later time points: mice that die or have to be euthanized are removed from their experimental group, which lowers the average group tumor burden at subsequent time points. Although tumor volume was the most common euthanasia criterion reached, a subset of mice were either found dead or had to be euthanized for other reasons attributed to their tumor burden (moribund state, inability to ambulate or stand, persistent bleeding from tumor ulceration, severe loss of body mass, etc.). This confounds any comparison of endpoint measurements (such as immunohistochemical quantification of tumor cell death markers, T-cell markers, etc.).
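      The attrition effect described above can be illustrated with a small worked example. This is only a sketch: the tumor volumes and the 2000 mm^3 endpoint threshold are hypothetical values chosen for illustration, not data or criteria from the study.

```python
from statistics import mean

# Hypothetical tumor volumes (mm^3) for one treatment group; not study data.
EUTHANASIA_VOLUME = 2000  # assumed endpoint threshold, for illustration only

day10 = [900, 1200, 1500, 1900]
day14 = [1100, 1600, 2100, 2600]  # every tumor grew; two mice cross the threshold


def survivors_mean(volumes, threshold=EUTHANASIA_VOLUME):
    """Mean volume of the mice still on study (below the endpoint threshold)."""
    remaining = [v for v in volumes if v < threshold]
    return mean(remaining) if remaining else None


print("day 10 mean, all mice:      ", mean(day10))
print("day 14 mean, all mice:      ", mean(day14))
print("day 14 mean, survivors only:", survivors_mean(day14))
```

      In this toy group every tumor grows between day 10 and day 14, yet the mean among surviving mice falls below the day 10 mean, because the mice with the largest tumors have left the group.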

      • The different response in the concurrent vs delayed treatment is very interesting. The authors suggest two possible mechanisms to explain this: "1) Concurrent BQ dampens the initial anticancer immune response generated by dual ICB, or b) cancer cell MHC-I and related genes are not maximally upregulated at the time of ICB administration with concurrent treatment". However, and despite the caveat of comparing the in vitro to the in vivo setting, Fig 2D shows upregulation of MHC I already at 24h of treatment in B16 cells. Have the authors checked T cell infiltration in the concurrent and delayed treatment setting?

      For the same reasons described in response to the preceding comment, tumors harvested upon mouse death/euthanasia from our survival experiment were not suitable for cross-cohort comparison of tumor endpoint measurements. An additional experiment in which mice are necropsied at a prespecified time point (before any mice have died or reached euthanasia criteria, as in the experiment for Fig 5A-D) would be required to answer this question.

      • Page 5, line 181 -do the authors mean "nucleotide salvage inhibitors" instead of "synthesis"?

      We believe the reviewer is referring to the following sentence:

      “The other drugs screened included nucleotide synthesis inhibitors (5-fluorouracil, methotrexate, gemcitabine, and hydroxyurea), DNA damage inducers (oxaliplatin, irinotecan, and cytarabine), a microtubule targeting drug (paclitaxel), a DNA methylation inhibitor (azacytidine), and other small molecule inhibitors (Fig 2F).”

      In this context, we believe our use of “synthesis” instead of “salvage” is correct, because methotrexate and 5-FU inhibit thymidylate synthase (which mediates de novo dTTP synthesis), while gemcitabine and hydroxyurea inhibit ribonucleotide reductase (which mediates de novo synthesis of all dNTPs).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study, utilizing CITE-Seq to explore CML, is considered a useful contribution to our understanding of treatment response. However, the reviewers express concern about the incomplete evidence due to the small sample size and recommend addressing these limitations. Strengthening the study with additional patient samples and validation measures would enhance its significance.

      We thank the editors for the assessment of our manuscript. In view of the comments of the three reviewers, we have increased the number of CML patient samples analyzed to confirm all the major findings included in the manuscript. In total, more than 80 patient samples across different approaches have now been analyzed and incorporated in the revised manuscript.

      To the best of our knowledge, this is the first single cell multiomics report in CML and differs substantially from the recent single cell omics-based reports where single modalities were measured one at a time (Krishnan et al., 2023; Patel et al., 2022). Thus, the sc-multiomic investigation of LSCs and HSCs from the same patient addresses a major gap in the field towards managing efficacy and toxicity of TKI treatment by enumerating CD26+CD35- LSCs and CD26-CD35+ HSCs burden and their ratio at diagnosis vs. 3 months of therapy. The findings suggest design of a simpler and cheaper FACS assay to simultaneously stratify CML patients for TKI efficacy as well as hematologic toxicity.

      Reviewer 1:

      Summary:

      This manuscript by Warfvinge et al. reports the results of CITE-seq to generate single-cell multi-omics maps from BM CD34+ and CD34+CD38- cells from nine CML patients at diagnosis. Patients were retrospectively stratified by molecular response after 12 months of TKI therapy using European Leukemia Net (ELN) recommendations. They demonstrate heterogeneity of stem and progenitor cell composition at diagnosis, and show that compared to optimal responders, patients with treatment failure after 12 months of therapy demonstrate increased frequency of molecularly defined primitive cells at diagnosis. These results were validated by deconvolution of an independent previously published dataset of bulk transcriptomes from 59 CML patients. They further applied a BCR-ABL-associated gene signature to classify primitive Lin-CD34+CD38- stem cells as BCR:ABL+ and BCR:ABL-. They identified variability in the ratio of leukemic to non-leukemic primitive cells between patients, showed differences in the expression of cell surface markers, and determined that a combination of CD26 and CD35 cell surface markers could be used to prospectively isolate the two populations. The relative proportion of CD26-CD35+ (BCR:ABL-) primitive stem cells was higher in optimal responders compared to treatment failures, both at diagnosis and following 3 months of TKI therapy.

      Strengths:

      The studies are carefully conducted and the results are very clearly presented. The data generated will be a valuable resource for further studies. The strengths of this study are the application of single-cell multi-omics using CITE-Seq to study individual variations in stem and progenitor clusters at diagnosis that are associated with good versus poor outcomes in response to TKI treatment. These results were confirmed by deconvolution of a historical bulk RNAseq data set. Moreover, they are also consistent with a recent report from Krishnan et al. and are a useful confirmation of those results. The major new contribution of this study is the use of gene expression profiles to distinguish BCR-ABL+ and BCR-ABL- populations within CML primitive stem cell clusters and then applying antibody-derived tag (ADT) data to define molecularly identified BCR:ABL+ and BCR-ABL- primitive cells by expression of surface markers. This approach allowed them to show an association between the ratio of BCR-ABL+ vs BCR-ABL- primitive cells and TKI response and study dynamic changes in these populations following short-term TKI treatment.

      Weaknesses:

      One of the limitations of the study is the small number of samples employed, which is insufficient to make associations with outcomes with confidence. Although the authors discuss the potential heterogeneity of primitive stem, they do not directly address the heterogeneity of hematopoietic potential or response to TKI treatment in the results presented. Another limitation is that the BCR-ABL + versus BCR-ABL- status of cells was not confirmed by direct sequencing for BCR-ABL. The BCR-ABL status of cells sorted based on CD26 and CD35 was evaluated in only two samples. We also note that the surface markers identified were previously reported by the same authors using different single-cell approaches, which limits the novelty of the findings. It will be important to determine whether the GEP and surface markers identified here are able to distinguish BCR-ABL+ and BCR-ABL- primitive stem cells later in the course of TKI treatment. Finally, although the authors do describe differential gene expression between CML and normal, BCR:ABL+ and BCR:ABL-, primitive stem cells they have not as yet taken the opportunity to use these findings to address questions regarding biological mechanisms related to CML LSC that impact on TKI response and outcomes.

      Reviewer #1 (Recommendations For The Authors):

      Minor comment: Fig 4 legend -E and F should be C and D.

      We thank the reviewer for positive assessment of our work. Here, we highlight the updates in the revised manuscript considering the feedback received.

      Minor comment: Fig 4 legend -E and F should be C and D.

      We have edited the revised manuscript accordingly.

      One of the limitations of the study is the small number of samples employed, which is insufficient to make associations with outcomes with confidence.

      Although we performed CITE-seq for 9 CML patient samples at diagnosis, we extended our investigations to include additional samples (e.g., large-scale deconvolution analysis of samples, Fig 3 C-E, qPCR for BCR::ABL1 status, Fig. 6A, and the ratio between CD35+ and CD26+ populations at diagnosis and during TKI therapy, Fig. 6C-D) as described in the manuscript.

      In comparison to scRNA-seq, multiomic CITE-seq involves preparation and sequencing of separate libraries corresponding to RNA and ADTs, which makes it even more resource demanding and limits our capacity to process an extensive number of patient samples. To confirm our findings in a larger cohort, we therefore adopted a computational deconvolution approach, CIBERSORT, to analyze a larger number of independent samples (n=59). This reflects a growing, sustainable trend to study larger numbers of patients in the face of still prohibitively expensive but potentially insightful sc-omics approaches (for example, please see Zeng et al, A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia, Nature Medicine, 2022).
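      To give a sense of what deconvolution estimates, the sketch below recovers the fraction of one cell type from a bulk profile that is a mixture of two signatures. It is a deliberately simplified stand-in: CIBERSORT fits many cell types at once using nu-support vector regression, whereas this example uses a closed-form least-squares fit for two components. The signature values, the gene count, and the function name `estimate_fraction` are all hypothetical.

```python
def estimate_fraction(bulk, sig_a, sig_b):
    """Least-squares fraction of cell type A in a two-component mixture.

    Models bulk ~= f * sig_a + (1 - f) * sig_b and returns f clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two signatures are identical")
    num = sum((x - b) * d for x, b, d in zip(bulk, sig_b, diff))
    return min(1.0, max(0.0, num / denom))


# Hypothetical expression signatures over four marker genes (arbitrary units).
primitive = [10.0, 2.0, 1.0, 8.0]
lineage_primed = [1.0, 9.0, 7.0, 2.0]

# Simulated bulk sample containing 30% primitive cells.
bulk = [0.3 * p + 0.7 * l for p, l in zip(primitive, lineage_primed)]

print(round(estimate_fraction(bulk, primitive, lineage_primed), 3))
```

      With noise-free input the estimate matches the simulated 30% fraction; real bulk transcriptomes are noisy and contain many cell types, which is why dedicated tools are used.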

      However, in view of the comment, we have now substantially increased the number of patients analyzed in the revised manuscript. This includes more patient samples for investigating the ratio between CD35-marked and CD26-marked populations at diagnosis and after 3 months of TKI therapy (from n=8 to n=12, now with 6 optimal responders and 5 treatment failures at diagnosis and after TKI therapy) and more samples for qPCR of BCR::ABL1 expression status at diagnosis (from n=3 to n=9); we also followed up BCR::ABL1 expression in three additional samples after TKI therapy. Moreover, we examined the CD26-marked and CD35-marked populations for expression of GAS2, one of our top candidate LSC signature genes, in three additional samples at diagnosis and at the 3-month follow-up. Thus, >80 patient samples across different approaches have been analyzed to strengthen all major conclusions of the study.

      We emphasize that we were cautious in generalizing the observation obtained from any one approach and sought to confirm any major finding using at least one complementary method. As an example, although CITE-seq (n=9) showed altered frequency of all cell clusters between optimal and poor responders (Fig. 3B), we refrained from generalizing because our independent large-scale computational deconvolution analysis (n=59) only substantiated the altered proportion of primitive and myeloid cell clusters (Fig. 3E).

      Although the authors discuss the potential heterogeneity of primitive stem, they do not directly address the heterogeneity of hematopoietic potential or response to TKI treatment in the results presented.

      Thanks for noting the discussion on heterogeneity of the primitive stem cells. As described in the original manuscript, Figure 6D-E showed a relationship between heterogeneity and TKI therapy response: the CD35+/CD26+ ratio within the HSC fraction was associated with therapy response. We have now increased the number of patient samples analyzed and present the updated results in the revised manuscript (now Figure 6C-D). These observations set the stage for assessing whether long-term therapy outcome can also be influenced by heterogeneity at diagnosis.

      We have shown the hematopoietic potential of HSCs marked by CD35 expression in an independent parallel study and therefore only mentioned it concisely in the current manuscript. A combination of scRNA-seq, scATAC-seq and cell surface proteomics showed CD35+ cells at the apex of healthy human hematopoiesis, containing an HSC-specific epigenetic signature and molecular program, as well as possessing self-renewal capacity and multilineage reconstitution in vivo and in vitro. The preprint is available as Sommarin et al. ‘Single-cell multiomics reveals distinct cell states at the top of the human hematopoietic hierarchy’, Biorxiv; https://www.biorxiv.org/content/10.1101/2021.04.01.437998v2.full

      We also note that the surface markers identified were previously reported by the same authors using different single-cell approaches, which limits the novelty of the findings.

      Our current manuscript is indeed a continuation of, and builds on, our previous paper (Warfvinge R et al. Blood, 2017). In contrast to our previous report, which was limited to examination of only 96 genes per cell, CITE-seq allowed us to examine the molecular program of cells using unbiased global gene expression profiling. Finally, although CD26 appears once again as a reliable marker of BCR::ABL1+ primitive cells, CD35 emerges as a novel and previously undescribed marker of BCR::ABL1- residual stem cells. A combination of CD35 and CD26 allowed us to efficiently distinguish between the two populations housed within the Lin-34+38/low stem cell immunophenotype.

      Another limitation is that the BCR-ABL + versus BCR-ABL- status of cells was not confirmed by direct sequencing for BCR-ABL. The BCR-ABL status of cells sorted based on CD26 and CD35 was evaluated in only two samples

      Single cell detection of fusion transcripts is challenging, with low detection sensitivity in single cell RNA-seq, as has been noted previously (Krishnan et al. Blood, 2023, Giustacchini et al. Nature Medicine, 2017, Rodriguez-Meira et al. Molecular Cell, 2019). However, this is likely to change with the inclusion of target-specific probes in scRNA-seq library preparation protocols. Nonetheless, in view of the comment, we have included more patient samples (from the previous n=3 to the current n=10, including TKI-treated samples) for direct assessment of BCR-ABL1 status by qPCR analysis; the updated results are included in the revised manuscript (Figure 6A).

      It will be important to determine whether the GEP and surface markers identified here are able to distinguish BCR-ABL+ and BCR-ABL- primitive stem cells later in the course of TKI treatment.

      We performed qPCR to check BCR::ABL1 status and the level of GAS2, one of the top genes expressed in CML cells, within CD26+ and CD35+ cells at diagnosis and following 3 months of TKI therapy. The results showed that while CD26+ cells are BCR::ABL1+, the CD35+ cells are BCR::ABL1- at both time points. Moreover, expression of the LSC-specific gene GAS2 was specific to BCR::ABL1+ CD26+ cells both at diagnosis and following 3 months of TKI therapy. The new results are presented in figure 6B in the revised manuscript.

      Finally, although the authors do describe differential gene expression between CML and normal, BCR:ABL+ and BCR:ABL-, primitive stem cells they have not as yet taken the opportunity to use these findings to address questions regarding biological mechanisms related to CML LSC that impact on TKI response and outcomes.

      We agree with the reviewer that our major focus here was to characterize the cellular heterogeneity coupled to treatment outcome, and therefore we did not delve deep into the molecular mechanisms underlying TKI response. However, in response to this comment, as mentioned above, we noted that one of the top genes in BCR::ABL1 cells (Fig. 4 C; right; in red), GAS2 (Growth Arrest Specific 2), was expressed both at diagnosis and during TKI therapy within CD26+ cells relative to CD35+ cells (updated figure 6B). Interestingly, GAS2 was also detected in CML LSCs in a recent scRNA-seq study (Krishnan et al. Blood, 2023), suggesting GAS2 upregulation could be a consistent molecular feature of CML cells. GAS2 has previously been noted as deregulated in CML (Janssen JJ et al. Leukemia, 2005, Radich J et al, PNAS, 2006) and has been linked to control of cell cycle, apoptosis, and response to Imatinib (Zhou et al. PLoS One, 2014). Future investigations are warranted to assess whether GAS2 could play a role in the outcome of long-term TKI therapy.

      Reviewer 2:

      Summary:

      The authors use single-cell "multi-omics" to study clonal heterogeneity in chronic myeloid leukemia (CML) and its impact on treatment response and resistance. Their main results suggest 1) Cell compartments and gene expression signatures both shared in CML cells (versus normal), yet 2) some heterogeneity of multiomic mapping correlated with ELN treatment response; 3) further definition of a unique combination of CD26 and CD35 surface markers associated with gene expression defined BCR::ABL1+ LSCs and BCR::ABL1- HSCs. The manuscript is well-written, and the method and figures are clear and informative. The results fit the expanding view of cancer and its therapy as a complex Darwinian exercise of clonal heterogeneity and the selective pressures of treatments.

      Strengths:

      Cutting-edge technology by one of the expert groups of single-cell 'omics.

      Weaknesses:

      Very small sample sizes, without a validation set. The obvious main problem with the study is that an enormous amount of results and conjecture arise from a very small data set: only nine cases for the treatment response section (three in each of the ELN categories), only two normal marrows, and only two patient cases for the division kinetic studies. Thus, it is very difficult to know the "noise" in the system - the stability of clusters and gene expression and the normal variation one might expect, versus patterns that may be reproducibly study artifact, effects of gene expression from freezing-thawing, time on the bench, antibody labeling, etc. This is not so much a criticism as a statement of reality: these elegant experiments are difficult, time-consuming, and very expensive. Thus in the Discussion, it would be helpful for the authors to just frankly lay out these limitations for the reader to consider. Also in the Discussion, it would be interesting for the authors to consider what's next: what type of validation would be needed to make these studies translatable to the clinic? Is there a clever way to use these data to design a faster/cheaper assay?

      We thank the reviewer for appraisal of our manuscript. We take the opportunity to point out the updates in the revised manuscript in view of the comments.

      Very small sample sizes, without a validation set. The obvious main problem with the study is that an enormous amount of results and conjecture arise from a very small data set: only nine cases for the treatment response section (three in each of the ELN categories), only two normal marrows, and only two patient cases for the division kinetic studies.

      As the reviewer has noted, single cell omics experiments remain resource demanding, which places a limitation on the number of patients analyzed. As described above in response to the comments from reviewer 1, multiomic CITE-seq allows extraction of two modalities in comparison to a typical scRNA-seq; however, this also makes it even more limited in the number of samples that can be processed in a sustainable way. This was one of the motivations to analyze a larger number of independent samples (n=59) while benefiting from the insights gained from CITE-seq (n=9). Furthermore, by analyzing CD34+ cells from bone marrow and peripheral blood of CML patients, including both responders and non-responders after one year of Imatinib therapy, we were able to significantly diversify the patient pool, a diversity that was lacking in our CITE-seq patient pool. As mentioned above, this reflects a growing trend to analyze larger numbers of patients while anchoring the analysis on prohibitively expensive but potentially insightful sc-omics approaches (for example, please see Zeng et al, A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia, Nature Medicine, 2022).

      As emphasized above, we frequently sought to confirm the findings from one approach using a complementary method and independent samples. For example, although CITE-seq (n=9) showed altered frequency of all cell clusters between optimal and poor responders (Fig. 3B), we refrained from generalizing because an independent large-scale computational deconvolution analysis (n=59) only substantiated the altered proportion of primitive and myeloid clusters.

      In view of the comment, we have now increased the number of patients analyzed during the revision process. These include increased numbers to investigate the ratio between CD35+ and CD26+ populations at diagnosis, as well as 3 months of TKI therapy, qPCR for BCR::ABL1, and patients examined for GAS2, one of the top genes expressed in CML cells (see response to reviewer 1 for details). Altogether, >80 patient samples across different approaches were analyzed to strengthen the conclusions.

      During the revision, we analyzed cells from 8 CML patients for cell cycle using gene activity scores. These results, together with the previously reported cell division kinetics data, are now described in supplementary figures 9C-F.

      It is very difficult to know the "noise" in the system - the stability of clusters and gene expression and the normal variation one might expect, versus patterns that may be reproducibly study artifact, effects of gene expression from freezing-thawing, time on the bench, antibody labeling, etc. This is not so much a criticism as a statement of reality: these elegant experiments are difficult, time-consuming, and very expensive. Thus in the Discussion, it would be helpful for the authors to just frankly lay out these limitations for the reader to consider.

      We agree with the reviewer that sc-omics approaches can be noisy despite continuing efforts to denoise single cell datasets through both experimental and bioinformatic innovations. Therefore, we have updated the discussion as recommended by the reviewer (paragraph 5 in the discussion).

      We also note that CITE-seq, in contrast to scRNA-seq alone provides dual features: surface marker/protein as well as RNA for annotating the same cluster. In our manuscript, for example, cell clusters in UMAP for normal BM; Fig 1B were described using both surface markers (Fig. 1C) and RNA (Fig. 1D) making the cluster identity robust. To further elaborate this approach, a new supplementary figure 1C shows annotations of clusters using both RNA and surface markers.

      To potentially address the issue of stability of clusters and gene expression, we compared the marker genes for major clusters from nBM from this study (supplementary table 4, Warfvinge et al.) with those described recently in a scRNA-seq study (Krishnan et al., Blood, 2023; supplementary table 8) using Cell Radar, a tool that identifies and visualizes which hematopoietic cell types are enriched within a given gene set (description: https://github.com/KarlssonG/cellradar; direct link: https://karlssong.github.io/cellradar/). To compare, we used our in-house gene list for the major clusters, and mapped the same number of top marker genes, ranked by log2FC, from the corresponding cluster in Krishnan et al. as inputs to Cell Radar. The Cell Radar plot outputs are shown below.

      Author response image 1.

      This approach showed broad similarities across clusters from this study with their counterparts from the other study suggesting the cluster identities reported here are likely to be robust. Please note these figures are for reviewer response only and not included in the final manuscript.

      Also in the Discussion, it would be interesting for the authors to consider what's next: what type of validation would be needed to make these studies translatable to the clinic? Is there a clever way to use these data to design a faster/cheaper assay?

      Our findings on CD26+ and CD35+ surface markers to enrich BCR::ABL1+ and BCR::ABL1- cells suggest that a simpler, faster and cheaper FACS panel can possibly quantify leukemic and non-leukemic stem cells in CML patients. We anticipate that future investigations and clinical studies might examine whether CD26-CD35+ cells could be plausible candidates for restoring normal hematopoiesis once the TKI therapy diminishes the leukemic load, and whether patients with low counts of CD35+ cells at diagnosis have a relatively higher chance of developing hematologic toxicity such as cytopenia during therapy.

      We briefly mentioned this possibility in the discussion; however, we have now moved it to another paragraph to highlight the same. Please see paragraph 5 in the revised manuscript.

      Reviewer 3:

      Summary:

      In this study, Warfvinge and colleagues use CITE-seq to interrogate how CML stem cells change between diagnosis and after one year of TKI therapy. This provides important insight into why some CML patients are "optimal responders" to TKI therapy while others experience treatment failure. CITE-seq in CML patients revealed several important findings. First, substantial cellular heterogeneity was observed at diagnosis, suggesting that this is a hallmark of CML. Further, patients who experienced treatment failure demonstrated increased numbers of primitive cells at diagnosis compared to optimal responders. This finding was validated in a bulk gene expression dataset from 59 CML patients, in which it was shown that the proportion of primitive cells versus lineage-primed cells correlates to treatment outcome. Even more importantly, because CITE-seq quantifies cell surface protein in addition to gene expression data, the authors were able to identify that BCR/ABL+ and BCR/ABL- CML stem cells express distinct cell surface markers (CD26+/CD35- and CD26-/CD35+, respectively). In optimal responders, BCR/ABL- CD26-/CD35+ CML stem cells were predominant, while the opposite was true in patients with treatment failure. Together, these findings represent a critical step forward for the CML field and may allow more informed development of CML therapies, as well as the ability to predict patient outcomes prior to treatment.

      Strengths:

      This is an important, beautifully written, well-referenced study that represents a fundamental advance in the CML field. The data are clean and compelling, demonstrating convincingly that optimal responders and patients with treatment failure display significant differences in the proportion of primitive cells at diagnosis, and the ratio of BCR-ABL+ versus negative LSCs. The finding that BCR/ABL+ versus negative LSCs display distinct surface markers is also key and will allow for a more detailed interrogation of these cell populations at a molecular level.

      Weaknesses:

      CITE-seq was performed in only 9 CML patient samples and 2 healthy donors. Additional samples would greatly strengthen the very interesting and notable findings.

      Reviewer #3 (Recommendations For The Authors):

      My only recommendation is to bolster findings with additional CML and healthy donor samples.

      CITE-seq was performed in only 9 CML patient samples and 2 healthy donors. Additional samples would greatly strengthen the very interesting and notable findings.

      We thank the reviewer for the positive assessment of our manuscript. As mentioned in response to comments from reviewers 1 and 2, CITE-seq remains a resource-consuming single cell method, which limits the number of patients that can be analyzed. However, during the revision process, we have increased the number of patient samples analyzed in other assays; this includes more samples to investigate the ratio between CD35+ and CD26+ populations at diagnosis and after 3 months of TKI therapy, qPCR for BCR::ABL1, and patients examined for GAS2, one of the top genes expressed in CML cells. Thus, >80 patient samples across different assays have been analyzed to strengthen the conclusions. (Please see the response to reviewer 1 for more details.)

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank the reviewers for their thoughtful analysis and questions.

      A brief overview of the changes to the manuscript is provided here, with individual responses to the reviewer comments following.

      The methods section has been expanded to better explain the techniques used in our analyses. The CTCF binding data section has likewise been expanded to include more detail on the dataset and our analysis of its contents. All other requested clarifications have been added to the relevant areas of the results.

      Beyond specific requests from the reviewers, we made the following changes.

      We felt that a particular terminology choice on our part resulted in some confusion: the use of “SNPs” to refer to genetic variants within our Diversity Outbred samples. While we used SNPs that lay closest to the center of our haplotype predictions as our representative loci for each linkage disequilibrium block, this was done for computational purposes only. We did not focus most of our analyses on the haplotypes themselves, because of the uncertainty of which variants within an LD block actually participated in the genetic-epigenetic interactions we imputed.

      Thus, we edited the text to remove mention of “SNPs” unless our analysis did directly and deliberately profile SNPs themselves. In all other cases, we now refer to “haplotypes”, “genetic variants”, or “variants”. This should help increase clarity in the manuscript as a whole.

      A small error was discovered within the labelling and processing of regression model outputs in chromosome 14. A consistency check was run on all chromosomes, finding that only Chr 14 was affected. Chr 14 was rerun in its entirety to verify its results, with the previous results now archived within our databases uploaded on Synapse (see Methods for a link). All relevant calculations and figures were regenerated, resulting in an average shift of 1% or less across the manuscript. All analyses remain highly statistically significant.

      Responses to comments from Reviewer #1

      Methods

      • Sequencing depth was retrieved from the original publication on the primary multiomics dataset. (Line 105-106)

      • A line was added regarding initial mouse genome alignment for the original publication: we explain the GigaMUGA genotyping array, used for the DO mESC samples. For our ChIP-seq data, we reword to specify: we used liftovers from imputed strain-specific genomes to B6 mm10. (Lines 108-110; 116-120; 168-170)

      • Aneuploidy removal is expanded upon in a similar fashion: the original QC identified chromosome-level gene expression differences to remove aneuploid samples. (Line 111)

      • Mention of the pre-publication use of an alternative null model has been removed, given its lack of relevance to the rest of the text. While it was interesting to compare to the standard null model, it amounts to a side note that distracts from the focus of the paper. (Line 137-139).

      • Descriptive subheadings have been added.

      Results

      • Line 179 (now Line 191) now points to Methods.

      • Line 189-200 (now Line 188-204): language altered to better explain our intent: We wished to perform an intrachromosomal scan across the whole genome for non-additive genetic-epigenetic interactions. However, there were computational limits to how many possible combinations of gene, haplotype, and ATAC-seq peak we could feasibly test. We thus generated a random subset of possible combinations. This was also performed to identify target regions for focused analyses.

      • Line 195 (now Line 206, expanded on in Line 210): Clarification added on the significance of our result: if non-additive genetic-epigenetic interactions were not a significant explanatory factor for gene expression, we would expect to see no enrichment of low p-value results. Instead, we see 0.07% of our models coming in at adj. p < 1 × 10⁻⁷.

      • Line 199 (now Line 216): The requested calculations were run and are now included in Table S3. We found that, within 4 Mb of a given gene, fewer than 10% of variants and ATAC peaks clustered closer to each other than they did to the gene they affected.

      Please note that this figure has a level of uncertainty due to linkage disequilibrium. Thus, rather than precisely answering the question “[are there haplotype-ATAC pairs] that are in the same locality but further away from the gene?”, we asked "is the ATAC peak closer than the gene to the point where we have the highest confidence of correctly calling the interacting genotype?". The relevant code has been deposited in our Synapse repository (see Methods for link).
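      The distance comparison described above can be sketched as follows. This is only an illustration of the question being asked, not the code deposited on Synapse: the function names and the example coordinates are hypothetical, and the 4 Mb window around each gene is assumed to have been applied beforehand.

```python
# Illustrative sketch: for each (variant, ATAC peak, gene) triplet, ask whether
# the peak lies closer to the representative variant position than the gene does.
# All positions are hypothetical base-pair coordinates on one chromosome.

def peak_closer_than_gene(variant_pos, peak_pos, gene_pos):
    """True if the ATAC peak is strictly closer to the variant than the gene is."""
    return abs(peak_pos - variant_pos) < abs(gene_pos - variant_pos)


def fraction_peak_closer(triplets):
    """Fraction of (variant, peak, gene) triplets in which the peak is closer."""
    if not triplets:
        raise ValueError("no triplets supplied")
    hits = sum(peak_closer_than_gene(v, p, g) for v, p, g in triplets)
    return hits / len(triplets)


example_triplets = [
    (1_000_000, 1_050_000, 1_500_000),  # peak 50 kb away, gene 500 kb away
    (2_000_000, 3_000_000, 2_200_000),  # peak 1 Mb away, gene 200 kb away
    (5_000_000, 5_400_000, 5_100_000),  # peak 400 kb away, gene 100 kb away
    (7_000_000, 7_010_000, 6_000_000),  # peak 10 kb away, gene 1 Mb away
]

print(fraction_peak_closer(example_triplets))
```

      Because the variant position is only the representative locus of a linkage disequilibrium block, any fraction computed this way inherits the positional uncertainty noted above.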

      • Line 205 (now restructured in Line 221-228): The text has been edited to specify our intent. We are referring to a set of TAD-focused regression models we generated (see Methods) that comprehensively included all possible interactions between genes, and all haplotypes and ATAC peaks within +/- 1 TAD of the gene.

      • (Line 227): We specified that the previously-published TAD boundary dataset we used was retrieved from the Bing Ren lab’s Hi-C projects, which imputed locations of TAD boundaries in B6 mESCs.

      • We have relabeled Figure 1 and tweaked the surrounding text to clear up some confusing aspects. The Euler plots in Figure 1D-E reflect the fact that each ATAC-seq peak and haplotype can be in multiple relationships with local genes and regulatory factors. Some of these relationships will be simple correlation between their presence and gene expression, while others may co-regulate alongside independent regulatory factors, or engage in non-additive regulatory interactions.

      Because these non-additive regulatory interactions have not been comprehensively studied, we wished to determine whether there were any regulatory factors within our data that would not be detected as significant via more conventional methods, such as correlation analysis, mediation analysis, or regression analysis without an interaction term. Our Euler plots show that there are large subsets of both ATAC-seq peaks and haplotypes that are exclusively found in non-additive interactions. This is our justification for focusing on non-additive interactions for the rest of the paper.
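      For readers less familiar with the distinction, the difference between a regression without and with an interaction term can be written generically as below. This is a textbook form given under assumptions, not the exact model specification from our Methods: the symbols y (expression of a gene), G (haplotype or genotype), A (ATAC-seq peak accessibility), and the coefficient names are chosen here for illustration only.

```latex
\begin{align}
\text{additive:}     \quad & y = \beta_0 + \beta_G\, G + \beta_A\, A + \varepsilon \\
\text{non-additive:} \quad & y = \beta_0 + \beta_G\, G + \beta_A\, A + \beta_{GA}\,(G \times A) + \varepsilon
\end{align}
```

      In this form, a variant or peak can have small main effects and still matter through the interaction coefficient, which is why such factors may be missed by methods that omit the interaction term.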

      • Line 256 (now Line 252-255): We further clarified the above in this section: correlation and mediation analyses were previously completed by the team which initially analyzed the DO mESC dataset (Skelly et al. 2020, Cell Stem Cell). They performed a correlation analysis between open chromatin and gene expression (Skelly et al. Fig. 2A), and identified expression quantitative trait loci (eQTL) (Skelly et al. Fig. 2E). We felt that more direct comparisons to the Skelly et al. data would distract readers from our focus on genetic-epigenetic interactions. Thus, we limited our discussion of non-interacting regulatory relationships to Figures 1-2, and a brief mention in Figure 5.

      • Line 290 (now Line 337): We pulled promoter locations from the FANTOM5 database of mouse promoters, and included analysis in both the text and Figure S4A-B.

      • (Line 475-476): we clarified “DO founder SNPs” to “SNPs from the non-reference DO founder strains”.

      • Line 472 (restructured in Lines 531-564): We have expanded this section to include answers to the reviewer’s questions regarding ChIP-seq peak counts and overlap with the TAD map we used for our other analyses, along with further detail on the strain-specific CTCF binding we identified in our ChIP-seq analysis.

      Responses to comments from Reviewer #2:

      (1) Typo corrected.

      (2) Lines 194-195 (now Line 206, expanded on in Line 210): We have expanded upon the intent and expectations of our analysis. In summary: if non-additive genetic-epigenetic interactions were not a significant explanatory factor for gene expression, we would expect to see no enrichment of low p-value results. Thus, we would expect 0.00001% of results to reach adj. p < 1 × 10⁻⁷. Instead, we see 0.07% of our models coming in at adj. p < 1 × 10⁻⁷, roughly four orders of magnitude greater than expected.
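      The enrichment arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes p-values are uniformly distributed under the null, so that the expected fraction of results below a threshold equals the threshold itself; multiple-testing adjustment would only make the null expectation smaller. The variable names are illustrative.

```python
import math

# Assumption: under the null, p-values are uniform on [0, 1], so the expected
# fraction of models with p below a threshold equals the threshold itself.
threshold = 1e-7
expected_fraction = threshold          # 1e-7 as a fraction, i.e. 0.00001%
observed_fraction = 0.07 / 100         # 0.07% of models, as reported above

fold_enrichment = observed_fraction / expected_fraction
orders_of_magnitude = math.log10(fold_enrichment)

print(f"expected under the null: {expected_fraction * 100:.5f}%")
print(f"observed:                {observed_fraction * 100:.2f}%")
print(f"fold enrichment:         {fold_enrichment:,.0f}")
print(f"orders of magnitude:     {orders_of_magnitude:.2f}")
```

      Under this assumption the observed fraction is about 7,000 times the null expectation, or a little under four orders of magnitude.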

      (3) Lines 226-230 (Expanded on in Lines 252-276): We have relabeled Figure 1 and tweaked the surrounding text to clear up some confusing aspects. The percentages in the text are derived from the data summarized in the Euler plots in Figure 1D-E. These plots reflect the fact that each ATAC-seq peak and haplotype can be in multiple relationships with local genes and regulatory factors. Some of these relationships will be simple correlation between their presence and gene expression, while others may co-regulate alongside independent regulatory factors, or engage in non-additive regulatory interactions.

      (4) Line 261-263 (now lines 299-300): A companion to Figure 2B has been added (Fig. S3), which provides interaction counts for each ATAC-seq peak that contributed to Figure 2B. A horizontal line is included to highlight the locations of the highly-interacting ATAC peaks.

      (5) Analysis regarding Figure 3B had been removed from its original context. It has now been restored to the manuscript (Line 368-371).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendation for the authors)

      I only have one comment for improvement of this study and it has to do with the comparison of simulators that they conducted. There are many other simulators around now, including scDesign3, spaSim, SPIDER, SRTSIM, etc. Are any of those methods worth including in the comparison?

      Indeed, many of the mentioned simulators did not exist when we initially developed synthspot, and upon closer examination, they are not directly comparable to our tool.

      • scDesign3: The runtime of scDesign3 is quite long as a result of its generative model. The example provided in its tutorial only simulates 183 genes and takes over seven minutes when using four cores on a system with Intel Xeon E5-2640 CPUs running at 2.5GHz. In a small downsampling analysis, we simulated 10, 50, 100, and 150 genes with scDesign3 and observed runtimes of 30, 130, 245, and 360 seconds, respectively. This seems to indicate a linear relationship between the number of genes and the runtime, therefore rendering it unsuitable for simulating whole-transcriptome datasets for deconvolution.

      • spaSim: spaSim focuses on modelling cell locations in different tissue structures but does not provide gene expression data. It is designed for testing cell colocalization capabilities rather than simulating gene expression.

      • SPIDER: Although SPIDER appears to have some overlap with our work, it seems to be in the early stages of development. The GitHub repository contains only two scripts without any documentation, and the preprint does not provide instructions on how to use the tool.

      • SRTSim: SRTSim explicitly states in its publication that it is not suitable for evaluating cell type deconvolution, as its focus is on simulating gene expression data without modelling cell type composition.

      • scMultiSim: scMultiSim, like scDesign3, is limited in its capability to model the entire transcriptome.
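      The scDesign3 runtimes quoted in the first bullet can be fitted with an ordinary least-squares line to show what the linear trend would imply at whole-transcriptome scale. This is a rough sketch: the four data points are the ones reported above, while the target of 20,000 genes is a hypothetical figure chosen for illustration, and extrapolating this far beyond the measured range assumes the trend continues to hold on the same hardware.

```python
# Least-squares fit of scDesign3 runtime (seconds) against number of genes,
# using the four downsampling measurements reported above.
genes = [10, 50, 100, 150]
runtimes_s = [30, 130, 245, 360]

n = len(genes)
mean_x = sum(genes) / n
mean_y = sum(runtimes_s) / n

s_xx = sum((x - mean_x) ** 2 for x in genes)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genes, runtimes_s))

slope = s_xy / s_xx                  # seconds per additional gene
intercept = mean_y - slope * mean_x  # fixed overhead in seconds

# Hypothetical whole-transcriptome size, used only for illustration.
target_genes = 20_000
predicted_hours = (slope * target_genes + intercept) / 3600

print(f"slope: {slope:.2f} s/gene, intercept: {intercept:.1f} s")
print(f"extrapolated runtime for {target_genes} genes: {predicted_hours:.1f} h")
```

      The fit gives a little over two seconds per gene, so a single simulated dataset at this scale would take on the order of half a day, before multiplying by the number of datasets and abundance patterns in the benchmark.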

      Nonetheless, the inherent modularity of our Nextflow framework makes it possible for users to simply run the deconvolution methods on data that has been simulated by other simulators if need be.

      Additionally, we have added the following rationale for why we developed synthspot in “Synthspot allows simulation of artificial tissue patterns”:

      “On the other hand, general-purpose simulators are typically more focused on other inference tasks, such as spatial clustering and cell-cell communication, and are unsuitable for deconvolution. For instance, generative models and kinetic models like those of scDesign3 and scMultiSim are computationally intensive and unable to model entire transcriptomes. SRTSim focuses on modeling gene expression trends and does not explicitly model tissue composition, while spaSim only models tissue composition without gene expression.”

      The other aspect of the simulation comparison that I'm missing is some kind of spatial metric. There are metrics about feature correlation, sample-sample correlation, library size, etc. But, what about spatial correlation (e.g., Moran's I or similar). Perhaps comparing the distribution of Moran's I across genes in a simulated and real dataset would be a good first start.

      We would like to clarify that synthspot does not simulate the spatial locations of spots, but rather synthetic regions in which spots from the same region share similar compositions. Hence, incorporating a spatial metric in the comparison is not feasible. However, as RCTD is the only method that explicitly uses spot locations in its model (Supplementary Table 2, "Location information"), we believe that generating synthetic datasets with actual coordinates would not significantly impact the conclusions of the study.
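      For reference, the metric suggested by the reviewer can be sketched as below. This is a generic, minimal implementation of global Moran's I for a single gene, not part of synthspot or of our benchmark; the function name, the toy expression values, and the chain-shaped neighbour matrix are all illustrative assumptions. It makes explicit that the statistic needs a spot-by-spot neighbour weight matrix, which can only be built when spot coordinates exist.

```python
def morans_i(values, weights):
    """Global Moran's I for one gene.

    values  : expression of the gene in each of n spots
    weights : n x n neighbour weight matrix (weights[i][j] > 0 if spots i and j
              are neighbours, 0 otherwise; the diagonal should be 0)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    w_sum = sum(sum(row) for row in weights)
    if denom == 0 or w_sum == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / denom)


# Four spots arranged in a line; each spot neighbours the adjacent one(s).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1, 1, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5]  # neighbouring values always differ

print(morans_i(clustered, chain))    # positive spatial autocorrelation
print(morans_i(alternating, chain))  # negative spatial autocorrelation
```

      Comparing the distribution of this statistic across genes between real and simulated data would therefore require a simulator that places spots at explicit coordinates.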

      Reviewer #2 (Public Review)

      On the other hand, the authors state that in silver standard datasets one half of the scRNA-seq data was used for simulation and the other half was used as a reference for the algorithms, but the method of splitting the data, i.e., at random or proportionally by cell type, was not specified.

      The data was split proportionally by cell type. To clarify this, we have included an additional sentence in the main text under the first paragraph of “Cell2location and RCTD perform well in synthetic data”, as well as in Figure S2.
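      A split that is proportional by cell type can be sketched as follows. This is an illustration of the idea rather than our actual implementation: the function name, the fixed seed, the toy labels, and the choice to send the extra cell to the reference half when a cell type has an odd count are all assumptions.

```python
import random
from collections import defaultdict


def split_by_cell_type(cell_types, seed=0):
    """Split cell indices into two halves, stratified by cell type.

    Returns (simulation_indices, reference_indices). Within each cell type the
    cells are shuffled and divided in two, so both halves keep approximately
    the same cell type proportions as the full dataset.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, cell_type in enumerate(cell_types):
        by_type[cell_type].append(idx)

    simulation, reference = [], []
    for indices in by_type.values():
        rng.shuffle(indices)
        half = len(indices) // 2
        simulation.extend(indices[:half])
        reference.extend(indices[half:])  # odd counts: extra cell goes here
    return sorted(simulation), sorted(reference)


# Toy annotation vector with three hypothetical cell types.
labels = ["T cell"] * 6 + ["B cell"] * 4 + ["Hepatocyte"] * 10
sim_idx, ref_idx = split_by_cell_type(labels)

print(len(sim_idx), len(ref_idx))
```

      A purely random split would, by contrast, risk leaving a rare cell type almost entirely in one half, so that it is either absent from the synthetic spots or missing from the reference.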

      Reviewer #2 (Recommendation for the authors)

      Figure legends in Figures 3, 4 and across most Supplementary material are almost illegible. Please consider increasing font size for better readability.

      Thank you for bringing this to our attention. The font size has been increased for all main and supplementary figures. Additionally, the supplementary figures have also been exported in higher resolution.

      Supplementary Notes Figure 2c reads "... total count per sampled multiplied by..."

      This has been adapted, as well as the captions of Supplementary Notes Figure 3c and 4c which had the same typo.

      Reviewer #3 (Public Review)

      The simulation setup has a significant weakness in the selection of reference single-cell RNAseq datasets used for generating synthetic spots. It is unclear why a mix of mouse and human scRNA-seq datasets were chosen, as this does not reflect a realistic biological scenario. This could call into question the findings of the "detecting rare cell types remains challenging even for top-performing methods" section of the paper, as the true "rare cell types" would not be as distinct as human skin cells in a mouse brain setting as simulated here.

      We appreciate the reviewer’s concern and would like to clarify that within one simulated dataset, we never mix mouse and human scRNA-seq data together. The synthetic spots generated for the silver standards are always sampled from a single scRNA-seq or snRNA-seq dataset. Specifically, for each of the seven public scRNA-seq datasets, we generate synthetic datasets with one of nine abundance patterns, resulting in a total of 63 synthetic datasets. These abundance patterns only affect the sampling priors that are used—the spots are still created with combinations of cells sampled from the same dataset.

      Furthermore, it is unclear why the authors developed Synthspot when other similar frameworks, such as SRTsim, exist. Have the authors explored other simulation frameworks?

      While there are other simulation frameworks available now, synthspot was designed to specifically address the requirements of our study, offering unique capabilities that make it suitable for deconvolution evaluation. Moreover, many of the simulators did not exist when we initially developed our tool. We have added the following rationale for why we developed synthspot in “Synthspot allows simulation of artificial tissue patterns”:

      “On the other hand, general-purpose simulators are typically more focused on other inference tasks, such as spatial clustering and cell-cell communication, and are unsuitable for deconvolution. For instance, generative models and kinetic models like those of scDesign3 and scMultiSim are computationally intensive and unable to model entire transcriptomes. SRTSim focuses on modeling gene expression trends and does not explicitly model tissue composition, while spaSim only models tissue composition without gene expression.”

      In our response to Reviewer 1 copied below, we also outline specific reasons why other simulators were not suitable for our benchmark:

      • scDesign3: The runtime of scDesign3 is quite long as a result of its generative model. The example provided in its tutorial only simulates 183 genes and takes over seven minutes when using four cores on a system with Intel Xeon E5-2640 CPUs running at 2.5GHz. In a small downsampling analysis, we simulated 10, 50, 100, and 150 genes with scDesign3 and observed runtimes of 30, 130, 245, and 360 seconds, respectively. This seems to indicate a linear relationship between the number of genes and the runtime, therefore rendering it unsuitable for simulating whole-transcriptome datasets for deconvolution.

      • spaSim: spaSim focuses on modelling cell locations in different tissue structures but does not provide gene expression data. It is designed for testing cell colocalization capabilities rather than simulating gene expression.

      • SPIDER: Although SPIDER appears to have some overlap with our work, it seems to be in the early stages of development. The GitHub repository contains only two scripts without any documentation, and the preprint does not provide instructions on how to use the tool.

      • SRTSim: SRTSim explicitly states in its publication that it is not suitable for evaluating cell type deconvolution, as its focus is on simulating gene expression data without modelling cell type composition.

      • scMultiSim: scMultiSim, like scDesign3, is limited in its capability to model the entire transcriptome.

      Finally, we would have appreciated the inclusion of tissue samples with more complex structures, such as those from tumors, where there may be more intricate mixing between cell types and spot types.

      We thank the reviewer for this suggestion and have incorporated a melanoma dataset from Karras et al. (2022). This study profiled melanoma tumors using both scRNA-seq and spatial technologies. The scRNA-seq data consists of eight immune cell types and seven melanoma cell states. We have included this study as an additional silver standard and case study, the latter of which is presented in a separate section following the liver analysis (and a corresponding section in Methods).

      We found that method performances on synthetic datasets generated from this melanoma dataset follow previous trends (Figure S3-S5). However, the inclusion of the case study led to the following changes in the overall rankings: cell2location and RCTD are now tied for first place (previously RCTD ranked first), and Seurat and SPOTlight have swapped places. Despite these changes, the core messages and conclusions of our paper remain unchanged. All relevant figures (Figures 1a, 2, 3a, 4a, 6b, 7a, S3-S6, S9) have been updated to incorporate these new analyses and results.

      Reviewer #3 (Recommendation for the authors)

      To maintain consistency in the results, it is recommended to exclude the human scRNAseq set when generating synthetic spots. Furthermore, addressing the other significant weaknesses mentioned earlier would be beneficial.

      Please refer to our response to the public review where we address the same remark.

      It is essential to differentiate this work from previous benchmarking and simulation frameworks.

      In addition to the rationale on why we developed our own framework (see response to the public review), we have included the following text in the discussion that highlights our versatile approach when using a real spatial dataset for evaluation:

      “In the case studies, we demonstrated two approaches for evaluating deconvolution methods in datasets without an absolute ground truth. These approaches include using proportions derived from another sequencing or spatial technology as a proxy, and leveraging spot annotations, e.g., zonation or blood vessel annotations, that typically have already been generated for a separate analysis.”

      Furthermore, we conducted an extra analysis in the liver case study, generating synthetic datasets with one experimental protocol and using the remaining two as separate references (Figure S13). This further illustrates the usefulness of our simulation framework, which we mentioned by appending this sentence in the discussion:

      “As in our silver standards, users can select the abundance pattern most resembling the real tissue to generate the synthetic spatial dataset, as we have also demonstrated in the liver case study.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      (1) Since you only included patients with early-onset preeclampsia in the study, I suggest revising the title to "Identification of novel syncytiotrophoblast membrane extracellular vesicle derived protein biomarkers in early-onset preeclampsia...."

      We have revised the title to specify early-onset preeclampsia.

      (2) Under methods, you state that placenta was obtained from women undergoing elective cesarean section. Was this because all the study patients were delivered before the onset of labor? Or were laboring patients specifically excluded from the study?

      Indeed, labor influences the extracellular vesicles (EVs) generated. To ensure consistency in our samples and avoid this variable, we chose placentas obtained from elective cesarean sections (CS) for our study.

      (3) In Table 1 on page 10, the 8th row (Birth weight grams) needs to be reformatted. The mean birthweights for normal pregnancy and preeclampsia should be the same.

      We have reformatted the table and now use ranges instead of brackets.

      (4) In the legend for Table 1, the sentence beginning on page 10, line 227, and continuing onto page 11, line 228, does not make sense. Part of the sentence was omitted inadvertently.

      We have modified this sentence to:

      Detergent treatment, which could break down EVs, with NP-40 confirmed that the majority (99%) of our samples were largely vesicular since only 0.1 ± 0.12% of BODIPY FL N-(2-aminoethyl)-maleimide and PLAP double-positive events were detected (a reduction of 99%) (Figure 1E and 1H).

      (5) As you acknowledge, the sample size (12 patients) was small. This is understandable because early-onset preeclampsia occurs in <1% of parturients. You could collaborate with other centers in future studies to increase the sample size.

      Thank you very much for your comment. We are willing to cooperate on future research and will try to expand our sample size in subsequent studies.

      Reviewer #2 (Recommendations For The Authors):

      (1) This is one of the many "catalogue" papers where placental exosome proteins in preeclampsia are profiled. Thus, the manuscript lacks novelty. The only novelty factor is the authors have isolated exosomes by a different method and even separated the small and large exosomes. However, there is no mention of how these exosomes differ from each other in terms of their functionality. Thus it is hard to judge the biological significance of this work.

      We appreciate your insights regarding the novelty of our study. While numerous papers have profiled placental exosome proteins in preeclampsia, our methodology for enriching sSTB-EVs (exosomes) offers a distinct perspective. We believe that the separation of sSTB-EVs (exosomes) and medium/large STB-EVs (microvesicles) introduces a differentiation that extends beyond mere profiling, with implications for their functionality. Previous studies have shown that placental EVs of different sizes have distinct characteristics (Zabel RR, et al. Enrichment and characterization of extracellular vesicles from ex vivo one-sided human placenta perfusion. Am J Reprod Immunol. 2021 Aug;86(2)). Furthermore, the way cells internalize and respond to EVs may depend on the size of the EV (Zhuang X et al. Treatment of brain inflammatory diseases by delivering exosome encapsulated anti-inflammatory drugs from the nasal region to the brain. Mol Ther. 2011 Oct;19(10)). Therefore, it would be important for future studies to distinguish between EVs of different sizes.

      (2) The authors must demonstrate that these two types of EVs are also produced in vivo by detecting them in the serum of women.

      Thank you for the comment. Many previous studies have shown the two types of placental EVs in women's blood. Nakahara et al.'s (PMCID: PMC7755551) extensive review compiles studies that have specifically isolated various subtypes of placenta-derived EVs from maternal circulation. We have also readdressed it in the introduction.

      (3) The authors must compare the proteomes of serum-derived placental exosomes and the proteome of the STBs isolated from the perfusion experiments to judge how overlapping the outcomes are from those produced naturally and those produced under ex vivo conditions.

      We appreciate the reviewer's suggestion to compare the proteomes of serum-derived placental sSTB-EVs (exosomes) with those from STBs isolated through perfusion experiments. Indeed, such a comparison would provide valuable insights into the similarities and differences between naturally produced and ex vivo-generated sSTB-EVs (exosomes). However, isolating placental EVs from maternal circulation for comprehensive proteomic profiling presents challenges. It requires a significant amount of serum or plasma sample that will be sufficient to enable the isolation of placenta-specific EVs amongst numerous EVs in the circulation. In addition, it will require multiple intricate steps such as ultracentrifugation followed by immunoprecipitation. Each of these steps can potentially lead to the loss of EVs. Additionally, given the high concentration of lipoproteins in plasma relative to EVs, there's a significant risk of obtaining low-purity isolates from the outset. These challenges might compromise the comparability of results between placenta-specific EVs from maternal circulation and those from ex vivo perfusion. Nevertheless, we acknowledge the value of such an endeavor and will consider incorporating this aspect in future studies as the EV and proteomic methodology and technology improve and become more sensitive.

      (4) I have a major issue with the chosen study subjects. While the study title and the manuscript mention preeclampsia, as per the inclusion criteria mentioned in lines 88-90, the patients will be HELLP syndrome. Please clarify what was used and modify the manuscript accordingly.

      Thank you very much for finding this error. Our patients had none of the features that would qualify them for HELLP syndrome. We have edited to:

      PE was defined as new (after 20 weeks) systolic blood pressure of 140 mmHg or diastolic pressure of 90 mmHg, proteinuria (protein/creatinine ratio of 30 mg/mmol or more). None of our patients had maternal acute kidney injury, liver dysfunction, neurological features, hemolysis, or thrombocytopenia.

      (5) It is hard to reconcile how only 15 proteins were identified in the placental extract while 300+ in EVs. There is a methodological issue in the mass spec or extraction. With such widely different denominators in the total proteins identified, it is hard to compare the outcomes in terms of the three sample types.

      We acknowledge the reviewer's concerns regarding the disparity in protein counts between the placental extract and the EVs. Ultimately, more is not necessarily better. Several factors might contribute to this discrepancy. Firstly, it is plausible that certain proteins exhibit selective affinity to varying sizes of EVs, leading to a more diverse range of proteins than the placental extract. We were also stringent in our analysis to enable us to select proteins whose biological differences are more likely to be reproducible with a different validatory method like a western blot. Additionally, although the placental extract might contain a higher total protein concentration, it doesn't necessarily translate to a richer diversity of disease-specific proteins. Considering these nuances when comparing protein outcomes across sample types is helpful.

      (6) I am unable to understand the terms least differentially expressed and most differentially expressed. Do the authors mean upregulated and downregulated? Please clarify and use the terms appropriately by providing fold change values.

      We appreciate the reviewer's request for clarification. We intended to provide a relative measure of expression for the terms 'least differentially expressed' and 'most differentially expressed'. The terms are roughly equivalent to down- and upregulated. Regarding EVs, we avoid using the terms 'upregulated' and 'downregulated' as EVs act as transporters and do not possess regulatory functions per se. However, for the placenta, we recognize the relevance of these terms.

      (7) The data presented is very superficial and lacks methodological details. The authors should provide the total number of targets achieved after mass spec, the cutoffs used, the FDRs, and other details.

      We apologize for the omission. We have added these details to the method section.

      (8) It is not clear how these differentially abundant proteins were identified. What was the cutoff used? Were they identified in all the replicates?

      We apologize for the omission. We have added these details to the method section.

      (9) How many samples were subjected to the discovery cohort, and how many were in the validation cohort? Were they the same or different? If the samples were different, how many PE samples had differentially abundant proteins by both methods?

      The study utilized 12 samples for initial discovery and another 12 for western blot validation. The validation samples specifically targeted proteins of interest, rather than undergoing another comprehensive mass spectrometry analysis.

      (10) It is striking that the authors report the expression of prostatic acid phosphatase in the placenta. In my understanding of placental biology, this gene or protein is not known to be expressed by the placenta. Please perform immunofluorescence to demonstrate that this protein is indeed produced in the STBs

      Research has revealed that even though it's called prostate-specific antigen, it's created in tissues other than the prostate, such as the placenta. Here are a couple of references to support this claim: PMID: 10634405, PMID: 7533063, PMID: 8939403, and PMID: 8945610. Hence it is likely not beneficial to demonstrate what many researchers have already demonstrated.

      (11) Please validate the differential abundance of these proteins in the exosomes isolated from the plasma of women with and without preeclampsia. A serial measurement will be of high value to determine how early as compared to hypertension, these biomarkers can predict preeclampsia.

      We are validating each EV-carried marker individually in the circulation (plasma or serum), localizing them in the placenta, and performing downstream functional analysis. This article is already lengthy and would likely be too cumbersome to include the details of all individual proteins in this manuscript. However, we have already published papers on Siglec 6 (PMID: 32998819) and Neprilysin (PMID: 30929513), and others will be published soon. We agree that there will be a lot of value to serial measurement, not just in terms of how early as compared to hypertension, these biomarkers can predict preeclampsia but also as potentially a more sensitive or specific test. This would be the subject of subsequent papers.

      (12) The authors are recommended to carry out immunofluorescence to localize the differentially abundant proteins in the placental sections and show that they are specific to STBs.

      We have already provided a similar response earlier (see response to point 11). In addition, while it is preferable, the biomarkers don't necessarily need to be specific to STB. Not all biomarkers are mechanistic agents/targets, and not all mechanistic agents are biomarkers. However, mechanistic agents should preferably be placental-specific. For example, the total sFLT1, the most studied biomarker, is not exclusively synthesized in the placenta, even though the placental-specific isoform represents a small fraction of the total sFLT-1. For example, in the non-placental world, alkaline phosphatase (ALP) is not exclusively produced by the liver but is a ‘biomarker’ of cholestatic disease.

      (13) Table 1 should give the range, and the SD could be given as ± instead of in brackets.

      Thank you for your suggestion. We have edited it accordingly.

      (14) It is necessary to provide the gestational age of the onset of hypertension to get a judgment of how long these women were preeclamptic, culminating in HELLP.

      We want to emphasize that none of our patients experienced HELLP syndrome. In the results section, we have included the gestational age at the time of diagnosis in the table for preeclampsia. It's crucial to understand that the gestational age at diagnosis is distinct from the gestational age when hypertension initially appeared. Detecting the exact gestational age of hypertension onset would be challenging, and it would likely require a prospective or randomized clinical trial with continuous monitoring, possibly on a daily basis. However, our study is retrospective. Thus, we can only comment on the gestational age at diagnosis.

      (15) For newborns the term Sex is used and not gender

      Thank you for your suggestion. We have edited it accordingly.

      (16) Figure 2 is stretched and hard to read

      Thank you for your suggestion. We have edited it accordingly by creating two separate images to promote readability.

      (17) Line 278 change the sentence "there fifteen (15) proteins in the placenta" to "there were fifteen (15) proteins in the placenta"

      Thank you for your suggestion. We have edited it accordingly.

      (18) Line 288 you mean least and not lease

      Thank you for your suggestion. We have edited it accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our knowledge of how parasites evade the host complement immune system. The new cryo-EM structure of the trypanosome receptor ISG65 bound to complement component C3b is highly compelling and well-supported by biochemical experiments. This work will be of broad interest to parasitologists, immunologists, and structural biologists.

      We thank the reviewers and editorial team for this assessment of our work.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to use structural biology (cryo-EM), surface plasmon resonance, and complement convertase assays to understand the mechanism(s) by which ISG65 dampens the cytotoxicity/cellular clearance to/of trypanosomes opsonised with C3b by the innate immune system.

      The cryo-EM structure adds significantly to the authors' previous crystallographic data because the latter was limited to the C3d sub-domain of C3b. Further, the in vitro convertase assay adds an additional functional dimension to this study.

      The authors have achieved their aims and the results support their conclusions.

      The role of complement in immunity to T. brucei (or lack thereof) has been a significant question in molecular parasitology for over 30 years. The identification of ISG65 as the C3 receptor and now this study providing mechanistic insights represents a major advance in the field.

      Reviewer #2 (Public Review):

      This is an excellent paper that uses structural work to determine the precise role of one of the few invariant proteins on the surface of the African trypanosome. This protein, ISG65, was recently determined to be a complement receptor and specifically a receptor of C3, whose binding to ISG65 led to resistance to complement-mediated lysis. But the molecular mechanism that underlies resistance was unknown.

      Here, through cryoEM studies, the authors reveal the interaction interface (two actually) between ISG65 and C3, and based on this, make inferences regarding downstream events in the complement cascade. Specifically, they suggest that ISG65 preferably binds the converted C3b (rather than the soluble C3). Moreover, while conversion to a C3bB complex is not blocked, the ability to bind complement receptors 1 and 3 is likely blocked.

      Of course, all this is work on proteins in isolation and the remaining question is - can this in fact happen on the membrane? The VSG-coated membrane is supposed to be incredibly dense (packed at the limits of physical density) and so it is unclear whether the interactions that are implied by the structural work can actually happen on the membrane of a live trypanosome. This is not necessarily a dig but it should be addressed in the manuscript perhaps as a caveat.

      We thank the reviewer for their positive response to our work. We fully agree with the reviewer about the caveats which come from this work being done in a biochemical context. We have addressed this in lines 223-24 and 327-333.

      Reviewer #3 (Public Review):

      The authors investigate the mechanisms by which ISG65 and C3 recognize and interact with each other. The major strength is the identification of eco-site by determining the cryoEM structure of the complex, which suggests new intervention strategies. This is a solid body of work that has an important impact on parasitology, immunology, and structural biology.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A paper by Sulzen et al was published online on 27th April in Nature Communications that has a similarity (the cryo-EM structure) to this paper. This does not detract from the value of this paper. The authors should, however, include a "compare and contrast" section in this paper to explain similarities and differences in the conclusions. For example, while this paper demonstrates that ISG65 does not prevent C3 convertase activity, the Sulzen paper suggests it does prevent C5 convertase activity. The compatibility of these conclusions should be discussed.

      Two studies of ISG65 were published shortly after submission of this manuscript (Sulzen et al and Lorenzen et al) and we have added a brief comparison of the conclusions of these papers here. These mentions include lines 151, 155-6, 201-2, 274-278, 292-93 and 321-323. For a more in-depth comparison we have published an opinion piece in Trends in Parasitology, which discusses all three of these papers and which we also now reference here.

      Could the authors comment as to whether they think the association of C3b with the unstructured region of ISG65 comes about via S-S shuffling? I.e., is C3b first thioester linked to VSG and then this rearranges to ISG65 through C3b-ISG65 proximity?

      We thank the reviewer for the interesting suggestion. However, we are not aware of evidence showing that C3b, which has been conjugated to a target protein through its covalent ester bond, then becomes transferred to a second target protein. As ISG65 can bind to C3 as well as C3b, we think that the conjugate could form when ISG65-bound C3 converts to C3b, becomes reactive and, through proximity, is most likely to conjugate to ISG65. Whether this occurs to a substantial degree in trypanosomes, or whether it is more likely that ISG65 interacts with C3b which is already VSG-conjugated, requires further experiments. We have edited lines 217-222 to make this point more clearly.

      Reviewer #3 (Recommendations For The Authors):

      The authors previously reported that ISG65 C-terminus is so flexible and is not resolved in their 2022 ISG65-C3d (TED of C3b) crystal structure, which is the same case here in the cryo-EM structure of ISG65-C3b. Thus, I am wondering how C3b might find the flexible C-terminus and form a covalent bond.

      We think that the answer to the reviewer’s question relates to local concentration. When two reactive compounds are not attached together, then they diffuse freely in three-dimensions and their likelihood of colliding and reacting is subject to the randomness of Brownian motion. However, if they bind together through an interaction distinct from the reactive residues, then this increases their relative local concentration and the likelihood of collision and reaction taking place. In the case of ISG65, this is coupled with the ability of ISG65 to bind to C3 before it converts to C3b and becomes reactive. The interaction of ISG65 with C3/C3b will therefore bring together the reactive residues and increase the probability that they will collide and form a conjugate. Our control with BSA, which does not bind to C3/C3b, and does not form these conjugates supports this conclusion. We have edited lines 217-222 to clarify.
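
      The local-concentration argument above can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not taken from the manuscript: it assumes, purely for illustration, that binding confines one reactive group to a sphere of a given radius around its partner and that it is uniformly distributed within that sphere. The 10 nm radius and the function name are hypothetical choices.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def effective_molarity(radius_nm: float) -> float:
    """Concentration (mol/L) of one molecule confined to a sphere.

    Crude model (an assumption, not the authors' analysis): the tethered
    reactive group is uniformly distributed in a sphere of the given radius.
    """
    radius_dm = radius_nm * 1e-8          # 1 nm = 1e-8 dm, and 1 dm^3 = 1 L
    volume_l = (4.0 / 3.0) * math.pi * radius_dm ** 3
    return 1.0 / (AVOGADRO * volume_l)

# Hypothetical confinement radius of 10 nm
local = effective_molarity(10.0)
print(f"effective local concentration: {local * 1e3:.2f} mM")
```

      Under these assumptions, a single bound partner is equivalent to a concentration of a few hundred micromolar, which is how a non-covalent interaction can favour conjugation to ISG65 over an unbound control protein such as BSA.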

      I also find it puzzling that deleting L2 or L3 in ISG65, which they found forming additional contacts with CUB domain of C3b (12 times binding tighter), does not affect the ISG65-C3b conjugate formation in the in vitro C3 convertase formation assay.

      When we consider the affinities that the L2 and L3 loop deletion variants have for C3b, and the concentration of ISG65 in the C3 convertase assay, we would predict that the conjugates still form with the L2 and L3 variants. This binding would therefore increase the relative local concentration of the reactive residues and ensure preferential conjugate formation, as we observe.

      (1) Page 2 bottom line, "In particular, loop 2 forms a direct contact with the CUB domain of ISG65, centered around an electrostatic", ISG65 should be C3b.

      We thank the reviewer for spotting this. It has been corrected.

      (2) Page 4, "We found that ISG65 does not complete with either factor B or Factor D and does not block the binding of factor Bb (Figure 3b). This suggests that the C3 convertase can form in the presence of ISG65", "complete" should be "compete".

      It has been corrected.

      (3) Page 4, "revealed that in the presence of ISG65 a high molecular weight band appeared, which we identified through mass spectrometry to be a conjugate of ISG65 with C3b". There is no mass spectrometry data in the manuscript to support this.

      We agree with the reviewer that this data should be included in the paper and have now added it as Supplementary Table 3.

      (4) Page 5, "By inhibiting binding of CR2 to C3d, ISG65 will reduce the likelihood that B-cell receptor binding to trypanosome antigens will result in B-cell activation and antibody production." - this sentence is a bit confusing.

      We have clarified this point in lines 243-245.

      (5) Related to Figure 2a. "This structure reveals the two distinct interfaces formed between ISG65 and C3b (Figure 2a)." It would be clearer to label where interface 1 and interface 2 are in Figure 2a.

      We have now labelled interfaces 1 and 2 above the insets in Figure 2a.

      (6) Related to Figure 2C. I suggest mutagenesis to validate ISG65 L2/L3 - C3b CUB domain interaction, i.e. mutate ISG65 (N188, R187, Y190) and perform SPR with C3b.

      We agree with the reviewer that this experiment was a valuable validation of our structural data. To achieve this aim, we changed our SPR assay, coupling C3 variants to the chip surface in an orientation which would match their conjugation to a pathogen and allowing us to reliably compare the affinities of ISG65 variants. We then assessed the binding of ISG65, ISG65∆L2, and the ISG65L2N188A,H189A,Y190A proposed by the reviewer. As predicted from the structure, both loop 2 deletion and mutation reduced the affinity for C3b but did not affect the affinity for C3d, suggesting that the difference in affinity of ISG65 for C3b and C3d is due to the observed interface 2. This new data is described in lines 150-168 and is presented in Figure 2c.

      (7) Related to Figure 3a. Is the C3b only structure in the presence of ISG65 the real C3b only? Discussion can be added.

      Our cryoEM analysis of the ISG65-C3b mixture yielded three dimensional classes which contained clear density for ISG65 and those in which there was no density for ISG65. While the reviewer is technically correct, and we cannot be 100% sure that there is not an entirely disordered ISG65 attached to these ‘unbound’ C3b, we think that this is extremely unlikely. In either case, these ‘unbound’ C3b are indistinguishable from other structures of C3b and the argument in the paper stands. We have added a clause in lines 178-179 to make this point.

      (8) Related to Figure 3e. There is no label for WT and deletion mutants. Also, L1 and L3 deletion does not seem to show on the gel.

      We have added these labels.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the Reviewing Editor and two additional reviewers for the insightful input they gave us on the first version of our manuscript on allosteric activity regulation of the anaerobic ribonucleotide reductase from Prevotella copri. We have revised the manuscript in the light of the reviewers' comments. In particular, we have added additional experiments using hydrogen-deuterium exchange mass spectrometry (HDX-MS) to probe the accessibility and mobility of different parts of the protein structure in the apo-state and in the presence of dATP/CTP and ATP/CTP. The results strongly confirm the binding of nucleotides to the activity and specificity sites, as seen biochemically and structurally. In the question of mobility of the glycyl radical domain the HDX-MS experiments suggest an increased mobility in the presence of dATP, though the results are not as clear-cut as for the nucleotide binding. The HDX-MS analyses are complicated by the fact that they reflect all species in solution, which are evidently multiple for all states of PcNrdD. Finally, we have rephrased key parts of the results and discussion, and modified the title, to avoid any implication that we believe the glycyl radical domain becomes extensively disordered, rather that it becomes more mobile to the extent that it cannot be seen in the cryo-EM structures.

      eLife assessment

      This study advances our understanding of the allosteric regulation of anaerobic ribonucleotide reductases (RNRs) by nucleotides, providing valuable new structural insight into class III RNRs containing ATP cones. The cryo-EM structural characterization of the system is solid, but other aspects of the manuscript, which are incomplete, could be improved by including additional functional characterization and more evidence for the proposed mechanism of inhibition by dATP. The work will be of interest to biochemists and structural biologists working on ribonucleotide reductases and other allosterically regulated enzymes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study is to understand the allosteric mechanism of overall activity regulation in an anaerobic ribonucleotide reductase (RNR) that contains an ATP-cone domain. Through cryo-EM structural analysis of various nucleotide-bound states of the RNR, the mechanism of dATP inhibition is found to involve order-disorder transitions in the active site. These effects appear to prevent substrate binding and a radical transfer needed to initiate the reaction.

      Strengths of the manuscript include the comprehensive nature of the work - including numerous structures of different forms of the RNR and detailed characterization of enzyme activity to establish the parameters of dATP inhibition. The manuscript could be improved, however, by performing additional experiments to establish that the mechanism of inhibition can be observed in other contexts and it is not an artifact of the structural approach. Additionally, some of the presentations of biochemical data could be improved to comply with standard best practices.

      The work is impactful because it reports initial observations about a potentially new mode of allosteric inhibition in this enzyme class. It also sets the stage for future work to understand the molecular basis for this phenomenon in more detail.

      We thank the editor and reviewers for their positive evaluation of the potential impact of our work. We completely agree that hypotheses based on structural data require orthogonal experimental verification. However, the number and consistency of the cryo-EM structures speak in favour of the data being representative of conditions in solution. We feel that in particular cryo-EM data should be relatively free of artefacts, e.g. biased or incorrect relative domain orientations, compared to crystallography, where crystal packing effects can affect these parameters. As we write in response to Reviewer #2, it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and only partly ordered in the dATP-bound tetramers. Further verification experiments will be performed in future but are outside the scope of the present article.

      We will improve the presentation of the biochemical data in a revised version.

      General comments:

      (1) It would be ideal to perform an additional experiment of some type to confirm the order-disorder phenomena observed in the cryo-EM structures to rule out the possibility that it is an artifact of the structure determination approach. Circular dichroism might be a possibility?

      Circular dichroism reports only on the approximate relative proportions of helix, sheet and loop structure in a protein, thus we believe that it would not be a sensitive enough tool to distinguish between ordered and disordered states. We are considering what alternative methods might be appropriate.

      (2) Does the disordering phenomenon of one subunit in the ATP-bound structures have any significance - could it be related to half-of-sites activity? Does this RNR exhibit half-of-sites activity?

      Half-of-sites activity has not been biochemically proven in any ribonucleotide reductase in spite of the fact that it was first suggested in 1987 (PMID: 3298261). However, strong structural indication was recently published in the form of the holo-complex of the class Ia ribonucleotide reductase from Escherichia coli, which is highly asymmetrical and in which productive contacts forming an intact proton-coupled electron transfer pathway are only formed between one of two pairs of monomers (PMID: 32217749). We have not been able to prove half-of-sites activity for PcNrdD due to low overall radical content, but the structural results are indeed consistent with such an activity.

      (3) Does the disordering of the GRD with dATP bound have any long-term impact on the stability of the Gly radical? I realize that the authors tested the ability to form the Gly radical in the presence of dATP in Fig. 4 of the manuscript. But it looks like they only analyzed the samples after 20 min of incubation. Were longer time points analyzed?

      Radical content was measured after 5 min and 20 min incubation; 5 min incubations (not included in the manuscript) consistently gave higher radical content compared to 20 min incubation. Longer time points were not analysed, as we assumed that the radical content would be even lower after 20 min.

      (4) Did the authors establish whether the effect of dATP inhibition on substrate binding is reversible? If dATP is removed, can substrates rebind?

      This is an interesting question. We measured KDs for dATP in the micromolar range and are hence confident that dATP binding is reversible. Our measurements do not, however, directly prove that inhibition of the enzyme is reversible. Nevertheless, it is worth noting that the protein as purified was precipitated and analysed by UV-visible spectroscopy. The as-purified PcNrdD contained 30% nucleotide contamination. The as-purified sample was then analysed by HPLC and we identified a major peak, corresponding to dATP/dADP. Therefore, purification conditions had to be optimised to remove the nucleotides. This is evidence that PcNrdD that has “seen” dATP can subsequently bind substrates in the presence of ATP. We will describe the purification more clearly in a revision.

      (5) In some figures (Fig. 6e, for example), the cryo-EM density map for the nucleotide component of the model is not continuous over the entire molecule. Can the authors comment on the significance of this phenomenon? Were the ligands validated in any way to ensure that the assignments were made correctly?

      Indeed we sometimes saw discontinuous density for the nucleotides, both in the active site and in the specificity site. However, the break was almost always near the C5’ carbon atom, which is common to all nucleotides. While we cannot readily explain this phenomenon, the nucleotides refined well with full occupancy, giving B-factors similar to those of the surrounding protein atoms. The identity of the nucleotide could always be inferred from a) the size of the base (purine or pyrimidine); b) the known nucleotide combinations added to the protein before grid preparation; c) prior knowledge on the combinations of effector and substrate that have been found valid for all RNRs since the first studies of allosteric specificity regulation.

      Reviewer #2 (Public Review):

      This manuscript describes the functional and structural characterization of an anaerobic (Class III) ribonucleotide reductase (RNR) with an ATP cone domain from Prevotella copri (PcNrdD). Most significantly, the cryo-EM structural characterization revealed the presence of a flap domain that connects the ATP cone domain and the active site and provides structural insights about how nucleotides and deoxynucleotides bind to this enzyme. The authors also demonstrated the catalytic functions and the oligomeric states. However, many of the biochemical characterizations are incomplete, and it is difficult to make mechanistic conclusions from the reported structures. The reported nucleotide-binding constants may not be accurate because of the design of the assays, which complicates the interpretation of the effects of ATP and dATP on PcNrdD oligomeric states. Importantly, statistical information was missing in most of the biochemical data. Also, while the authors concluded that the dATP binding makes the GRD flexible based on the absence of cryo-EM density for GRD in the dATP-bound PcNrdD, no other supports were provided. There was also a concern about the relevance of the proposed GRD flexibility and the stability of Gly radical. Overall, the manuscript provides structural insights about Class III RNR with ATP cone domain and how it binds ATP and dATP allosteric effectors. However, ambiguity remains about the molecular mechanism by which the dATP binding to the ATP cone domain inhibits the Class III RNR activity.

      Strengths:

      (1) The manuscript reports the first near-atomic-resolution structures of Class III RNR with ATP domain in complex with ATP and dATP. These structures revealed the NxN flap domain proposed to form an interaction network between the substrate, the linker to the ATP cone domain, the GRD, and loop 2 important for substrate specificity. The structures also provided insights into how ATP and dATP bind to the ATP cone domain of Class III RNR. Also, the structures suggested that the ATP cone domain is directly involved in the tetramer formation by forming an interaction with the core domain in the presence of dATP. These observations serve as an important basis for future study on the mechanism of allosteric regulation of Class III RNR.

      (2) The authors used a wide range of methodologies including activity assays, nucleotide binding assays, oligomeric state determination, and cryo-EM structural characterization, which were impressive and necessary to understand the complex allosteric regulation of RNR.

      (3) The activity assays demonstrated the catalytic function of PcNrdD and its ability to be activated by ATP and low-concentration dATP and inhibited by high-concentration dATP.

      (4) ITC and MST were used to show the ability of PcNrdD to bind NTP and dATP.

      (5) GEMMA was used successfully to determine the oligomeric state of PcNrdD, which suggested that PcNrdD exists in dimeric and tetrameric forms, whose ratio is affected by ATP and/or dATP.

      Weaknesses:

      (1) Activity assays.

      The activity assays were performed under conditions that may not represent the nucleotide reduction activity. The authors initiated the Gly radical formation and nucleotide reduction simultaneously. The authors also showed that the amount of Gly radical formation was different in the presence of ATP vs dATP. Therefore, it is possible that the observed Vmax is affected by the amount of Gly radical. In fact, some of the data fit poorly into the kinetic model. Also, the number of biological and technical replicates was not described, and no statistical information was provided for the curve fitting.

      The highest turnover activity of PcNrdD measured in the presence of ATP was 1.3 s⁻¹ (470 nmol/min/mg), a kcat comparable to recently reported values for anaerobic and aerobic RNRs from Neisseria bacilliformis, Leeuwenhoekiella blandensis, Facklamia ignava, Thermus virus P74-23, and Aquifex aeolicus (PMID: 25157154, PMID: 29388911, PMID: 30166338, PMID: 34314684, PMID: 34941255). The general trend illustrated in Figure 1 is that ATP has an activating effect on enzyme activity, whereas high concentrations of dATP have an inactivating effect on activity, which cannot be explained by suboptimal assay conditions since our EPR results consistently show that more radical is formed in incubations with dATP compared to incubations with ATP. Curve fitting methods used are listed in Materials and Methods (as specified in the Figure 1 legend), and standard errors for all specified curve fitting results (from triplicate experiments) are shown in Figure 1.
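
      For readers who want to relate the two activity units quoted above, the following minimal sketch shows the standard conversion between specific activity and turnover number. The molar mass of the catalytic unit is not stated in this response, so the code back-calculates the value implied by the two quoted figures; that number is an inference from the arithmetic, not a reported measurement, and the function names are illustrative.

```python
def kcat_per_s(specific_activity_nmol_min_mg: float, molar_mass_g_mol: float) -> float:
    """Convert specific activity (nmol/min/mg) to turnover number (1/s)."""
    # nmol -> mol (1e-9), min -> s (/60), mg -> g (/1e-3)
    mol_per_s_per_g = specific_activity_nmol_min_mg * 1e-9 / 60.0 / 1e-3
    return mol_per_s_per_g * molar_mass_g_mol

def implied_molar_mass(kcat: float, specific_activity_nmol_min_mg: float) -> float:
    """Molar mass (g/mol) at which the two quoted numbers agree."""
    mol_per_s_per_g = specific_activity_nmol_min_mg * 1e-9 / 60.0 / 1e-3
    return kcat / mol_per_s_per_g

# Values quoted in the response: 1.3 1/s and 470 nmol/min/mg
mass = implied_molar_mass(1.3, 470.0)
print(f"implied molar mass of the catalytic unit: {mass / 1000:.0f} kDa")
print(f"round trip kcat: {kcat_per_s(470.0, mass):.2f} 1/s")
```

      With the quoted figures, the implied mass is roughly 166 kDa, which would be consistent with activity being expressed per dimer if the monomer is in the region of 80 kDa; this reading is an assumption.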

      (2) Binding assays.

      The interpretation of the binding assays is complicated by the fact that dATP binds both a- and s-sites and ATP binds a- and active sites. dATP may also bind the active site as the product. It is unknown if ATP binds s-site in PcNrdD. Despite this complexity, the binding assays were performed under the condition that all the binding sites were available.

      Therefore, it is not clear which event these assays are reporting.

      Both ITC and MST experiments involving ATP and dATP binding to the a-site were performed in the presence of at least 1 mM GTP substrate (5 mM in MST) to fill the active site, and 1 mM dTTP effector to fill the s-site (specified in the legend to Figure 2). These conditions enable binding of ATP or dATP only to the a-site in the ATP-cone.

      (3) Oligomeric states.

      Due to the ambiguity in the kinetic parameters and the binding constants determined above, the effects of ATP and dATP on the oligomeric states are difficult to interpret. The concentrations of ATP used in these experiments (50 and 100 uM) were significantly lower than KL determined by the activity assays (780 uM), while it is close to the Kd values determined by ITC or MST (~25 uM). Since it is unclear what binding events ITC and MST are reporting, the data in Figure 3 does not provide support for the claimed effects of ATP binding. For the effects of dATP, the authors did not observe a significant difference in oligomeric states between 50 or 100 uM dATP alone vs 50 uM dATP and 100 uM CTP. The former condition has dATP ~ 2x higher than the Kd and KL (Figure 1b) and therefore could be considered as "inhibited". On the other hand, NrdD should be fully active under the latter condition. Therefore, these observations show no correlation between the oligomeric state and the catalytic activity.

      The results in Figure 3 show that in the presence of 100 µM ATP plus 100 µM CTP the oligomeric equilibrium is 64% dimers plus 36% tetramers, and in the presence of 50-100 µM dATP the oligomeric equilibrium is 32% dimers and 68% tetramers. We agree that there is no clear and strong correlation between oligomeric state and inhibition, and we will try to make this clearer in a revised version. Meanwhile, in order to add some clarity to our observations, SEC experiments at higher nucleotide concentrations will be done to strengthen our observations.
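
      The concentration argument in this exchange can be made concrete with a simple occupancy estimate. The sketch below assumes a single independent binding site and ligand in excess over protein (so free ligand is approximately total ligand); neither assumption is established for PcNrdD, and treating the kinetic constant KL as if it were a dissociation constant is a simplification made only for comparison.

```python
def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Fractional site occupancy for simple 1:1 binding: [L] / (Kd + [L])."""
    return ligand_um / (kd_um + ligand_um)

# Constants quoted by the reviewer: Kd ~25 uM (ITC/MST), KL = 780 uM (activity)
for constant_um in (25.0, 780.0):
    for atp_um in (50.0, 100.0):
        occupancy = fraction_bound(atp_um, constant_um)
        print(f"ATP {atp_um:5.0f} uM, constant {constant_um:5.0f} uM: "
              f"{occupancy * 100:4.1f}% occupied")
```

      Under these assumptions, 50 to 100 µM ATP would fill roughly two thirds to four fifths of sites if the relevant constant is about 25 µM, but only about 6 to 11% if it is 780 µM, which is why the choice of constant matters for interpreting the oligomerization data.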

      (4) Effects of dATP binding on GRD structure

      One of the key conclusions of this manuscript is that dATP binding induces the dissociation of GRD from the active site. However, the structures did not provide an explanation for how the dATP binding affects the conformation of GRD or whether the dissociation of GRD is a direct consequence of dATP binding or it is due to the absence of nucleotide substrate. Also, Gly radical is unlikely to be stable when it is not protected from the bulk solvent. Therefore, it is unlikely that the GRD dissociates from the active site unless the inhibition by dATP is irreversible. Further evidence is needed to support the proposed mechanism of inhibition by dATP.

      We admit that it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker can only be partly modelled in the dATP-bound tetramers. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes disorder of the GRD, given that all are part of a connected system (described as “nexus” in the manuscript). The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap.

      In any case a major conclusion of the work is that dATP does not inhibit the anaerobic RNR by prevention of glycyl radical formation but by prevention of its subsequent transfer. We agree that further evidence is required to support the proposed mechanism, but given the extent of the data already presented in the manuscript, we feel that such studies should be the subject of a future publication.

      (5) Functional support for the observed structures.

      Evidence for connecting structural observations and mechanistic conclusions is largely missing. For example, the authors proposed that the interactions between the ATP cone domain and the core domain are responsible for tetramer formation. However, no biochemical evidence was provided to support this proposal. Similarly, the functional significance of the interaction through the NxN flap domain was not proved by mutagenesis experiments.

      We did actually make mutants to verify the observed interactions, but several of them did not behave well in our hands, e.g. with regard to protein stability. Since we have no evidence that oligomerisation is coupled to inhibition, and since we did not observe any conservation between protein sequences in the interaction area, we chose not to pursue this point further. The main merit of the tetramer structures is that they allowed a high-resolution view of dATP binding to the ATP-cone and a comparison to previously-observed ATP-cones. Nevertheless, mutation experiments, also including the NxN flap, could be the subject of future work.

      Reviewer #3 (Public Review):

      The manuscript by Bimai et al describes a structural and functional characterization of an anaerobic ribonucleotide reductase (RNR) enzyme from the human microbe, P. copri. More specifically, the authors aimed to characterize the mechanism by which (d)ATP modulates nucleotide reduction in this anaerobic RNR, using a combination of enzyme kinetics, binding thermodynamics, and cryo-EM structural determination. One of the principal findings of this paper is the ordering of a NxN 'flap' in the presence of ATP that promotes RNR catalysis and the disordering of both this flap and the glycyl radical domain (GRD) when the inhibitory effector, dATP, binds. The latter is correlated with a loss of substrate binding, which is the likely mechanism for dATP inhibition. It is important to note that the GRD is remote (>30 Ang) from the binding site of the dATP molecule, suggesting long-range communication of the structural (dis)ordering. The authors also present evidence for a shift in oligomerization in the presence of dATP. The work does provide evidence for new insights/views into the subtle differences of nucleotide modulation (allostery) of RNR through long-range interactions.

      The strengths of the work are the impressive, in-depth structural analysis of the various regulated forms of PcRNR by (d)ATP using cryo-EM. The authors present seven different models in total, with striking differences in oligomerization and (dis)ordering of select structural features, including the GRD that is integral to catalysis. The authors present several, complementary biochemical experiments (ITC, MST, EPR, kinetics) aimed at resolving the binding and regulatory mechanism of the enzyme by various nucleotides. The authors present a good breadth of the literature in which the focus of allosteric regulation of RNRs has been on the aerobic orthologues.

      Given the resolution of some of the structures in the remote regions that appear to be of importance, the rigor of the work could have been improved by complementing these experimental studies with molecular dynamics (MD) simulations to reveal the dynamics of the GRD and loops/flaps at the active site.

      We have discussed with expert colleagues the possibility of carrying out MD simulations on the different states in order to study the differential effects of ATP and dATP binding on the dynamics of the GRD. However, they felt that the chance of obtaining meaningful results was low, particularly since some structural elements are missing from the models for both forms, in particular the linker between the ATP-cone and the core.

      The biochemical data supporting the loss of substrate binding with dATP association is compelling, but the binding studies of the (d)ATP regulatory molecules are not; the authors noted less-than-unity binding stoichiometries for the effectors.

      Most of the methods used measure only binding strength, not the number of binding sites (N), whereas ITC also measures number of sites. N is dependent on the integrity of the protein, i.e. the number of protein molecules in a preparation that are involved in binding, and quite often gives lower values than the theoretical number of binding sites.
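
      The dependence of the fitted N on protein integrity can be illustrated with a minimal sketch. It assumes, purely for illustration, that only a fraction of the protein molecules in a preparation is competent to bind and that the concentration entered into the fit is the total protein concentration; the function name and the example fractions are hypothetical and are not taken from the manuscript.

```python
def apparent_stoichiometry(sites_per_monomer: int, active_fraction: float) -> float:
    """Apparent ITC stoichiometry N when only part of the protein can bind.

    Assumption (illustrative only): the fit uses the total protein
    concentration, so N scales linearly with the binding-competent fraction.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie between 0 and 1")
    return sites_per_monomer * active_fraction


# Hypothetical preparations: the fractions below are made-up examples.
n_one_site = apparent_stoichiometry(sites_per_monomer=1, active_fraction=0.7)
n_two_sites = apparent_stoichiometry(sites_per_monomer=2, active_fraction=0.7)

print(f"one site, 70% active protein:  N = {n_one_site:.2f}")
print(f"two sites, 70% active protein: N = {n_two_sites:.2f}")
```

      Under this simplification, a fitted N below the theoretical number of sites does not by itself indicate fewer binding sites; it may equally reflect a partly inactive preparation.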

      Also, the work would benefit from additional support for oligomerization changes using an additional biochemical/biophysical approach.

      SEC (chromatography), GEMMA (mass spectrometry) and cryo-EM were used to study oligomerization. Since each method has restrictions on nucleotide concentrations as well as protein concentrations that can be used, the results are not directly comparable, but all three methods indicate nucleotide dependent oligomerization changes. The SEC results will be included in a revised version.

      Overall, the authors have mostly achieved the aims of the manuscript. With focused modifications, including additional control experiments, the manuscript should be a welcome addition to the RNR field.

      Recommendations for the authors: Reviewer #1 (Recommendations For The Authors):

      (1) The last sentence of the abstract is not complete. The structures implicate a complex network of interactions in ... ? What do they implicate?

      A couple of words seem to have been missed from the abstract. We have rewritten the end of the abstract to emphasise better that the dynamical transitions involve a linked network of interactions and not just the GRD.

      (2) A reference is needed in the second sentence of the introduction.

      We have added a reference as requested.

      (3) Page 2, paragraph 2. The authors state "two beta subunits (NrdB) harboring a stable radical." This is not accurate. First of all, each beta subunit harbors its own cysteine oxidant. And in several subclasses, that oxidant is not a stable radical but an oxidized metal cluster. Please revise to improve accuracy and also provide appropriate references.

      We have revised the description and added a recent reference.

      (4) Page 4, Fig. 1, panels C and D. The fit of the curve to the data is pretty poor. Is there an explanation? Could the data be improved in some way? In general, it is also best practice nowadays to show the individual data points in addition to the error bars in plots like the ones shown in Figure 1. Please modify the plots to include the individual data points in this figure - and probably also the subsequent figures showing binding data.

      We have modified relevant panels in Figures 1, 2 and 5 as requested.

      (5) Page 12, first paragraph. The authors state that one of the monomers in the ATP-CTP structure is well ordered and the other is less ordered. It would be ideal to show in a figure the basis for this conclusion using the cryo-EM maps. The "less ordered" monomer appears to be fully modeled.

      Since the 2-fold axis of the dimer is vertical, the GRD of the left-hand monomer is hidden from view at the back of the molecule in Figure 6. For this monomer there was a small amount of density that allowed modelling of part of the glycyl radical loop (though not the tip containing the radical Gly itself) and the NxN flap, albeit with significantly higher mobility. We have illustrated this through an additional supplement for Figure 6 (figure supplement 2) in which the B-factors of the residues are shown both as a ribbon with radius proportional to the B-factor and through colouring. We hope that the four views in Figure 6 (figure supplement 2) together illustrate the relative mobility of different parts of the dimer.

      It would also be ideal to show the basis for the conclusion that the entire GRD is disordered in the dATP-bound dimer structure.

      Thank you for this suggestion. We have added a fifth supplement to Figure 8 in which we show the cryo-EM reconstruction for the dATP-bound dimer in two orientations, with the ATP-CTP-bound structure superimposed, which clearly shows that the entire GRD, the ATP-cones, linker and NxN flap are all disordered in both monomers.

      Reviewer #2 (Recommendations For The Authors):

      (1) Units to describe enzyme activity.

      • The unit for the specific activity in the main text (nmol/min•mg) is unusual. It is most likely a typo of nmol/min/mg or nmol/(min•mg).

      We have changed the unit to nmol/min/mg in the text.

      • The unit for the Vmax is unusual and should not be confused with the specific activity. By definition, Vmax is the velocity of a reaction at a defined enzyme concentration/amount. For example, if an assay of 10 mg enzyme yielded 470 nmol of product in 1 min, Vmax is 470 nmol/min, whereas the specific activity is 47 nmol/min/mg.

      The velocity as calculated above is ca. 1.3 s⁻¹. We have added kcat values to accompany the specific activities given.
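
      The distinction drawn above between Vmax, specific activity and kcat can be sketched as a short calculation. This is a minimal illustration and not the calculation used in the manuscript: the function names are ours, the first example uses the reviewer's numbers (10 mg enzyme, 470 nmol product in 1 min), and the molar mass used for the kcat conversion is a hypothetical placeholder, since the molar mass of PcNrdD is not stated in this response.

```python
def specific_activity(rate_nmol_per_min: float, enzyme_mg: float) -> float:
    """Specific activity in nmol/min/mg: the observed rate per mg of enzyme."""
    return rate_nmol_per_min / enzyme_mg


def kcat_per_second(spec_act_nmol_min_mg: float, molar_mass_g_per_mol: float) -> float:
    """Turnover number in s^-1 from specific activity and enzyme molar mass.

    1 mg of enzyme corresponds to 1e6 / M nmol of enzyme (M in g/mol), so
    kcat [min^-1] = specific activity * M / 1e6; dividing by 60 gives s^-1.
    """
    return spec_act_nmol_min_mg * molar_mass_g_per_mol / 1e6 / 60.0


# Reviewer's example: 10 mg of enzyme yields 470 nmol of product in 1 min.
vmax_nmol_per_min = 470.0
sa = specific_activity(vmax_nmol_per_min, enzyme_mg=10.0)

# HYPOTHETICAL molar mass, chosen only to show the unit conversion.
HYPOTHETICAL_MOLAR_MASS = 100_000.0  # g/mol
kcat = kcat_per_second(sa, HYPOTHETICAL_MOLAR_MASS)

print(f"Vmax = {vmax_nmol_per_min:.0f} nmol/min")
print(f"specific activity = {sa:.0f} nmol/min/mg")
print(f"kcat (hypothetical molar mass) = {kcat:.3f} s^-1")
```

      Unlike Vmax, both the specific activity and kcat are independent of the amount of enzyme in the assay, which is why they are the more useful quantities to report.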

      (2) Steady-state kinetic analysis.

      • The steady-state kinetic analysis in Figure 1 needs to be repeated. While the nonlinear curve fitting for Figure 1a is reasonable, those in Figures 1b, 1c, and 1d were outside the error range. Consequently, the reported kinetic parameters are unlikely accurate. The authors should repeat the assays with different enzyme preparation to account for all the errors. If the fit curve is still outside the error range, the kinetic model is likely incorrect, and the authors need to investigate different kinetic models.

      The replotted Figure 1 now includes two different experiments for 1b (four replicates in total).

      • The authors should report the number of replicates and the statistical data for the curve fitting.

      The figure legend has been updated with statistical data for all curve fits, and the number of replicates has been added.

      • The authors should report Vmax, Ki, and KL for Figure 1d.

      Results in Figures 1c and 1d are less straightforward than those in Figures 1a and 1b where the s-site is filled with dTTP, favouring binding of GTP to the active site. The curve fit in Figure 1c is disturbed at high concentrations of ATP, which plausibly competes with the CTP substrate and results in inhibition by formed dATP. The curve fit in Figure 1d is less certain since reduction of substrate is low due to intrinsic CTP reduction in absence of effector and partially overlapping activation and inhibition effects of dATP.

      • The authors should consider presenting the data in a log scale because of the complex nature of the activation/inhibition at the lower concentrations of dATP.

      Log scale plots are included as insets in Figures 1b and 1d.

      • The basal level of CTP reduction in the absence of an effector nucleotide should be reported with an error.

      The error value has been added in the figure legend for the basal level of CTP reduction in the absence of effector.

      (3) Equations for the kinetic analysis.

      • The equations should be numbered and referred to in the Figure 1 legend.

      All equations are specified and numbered in Materials and Methods. The equation used for each curve fit in the panels in Figure 1 is specified in the figure legend.

      • KL must be defined in the main text. I suppose this is Kd for ATP or dATP. The equation for KL determination is missing brackets for dNTP.

      KL (the concentration of an allosteric effector that gives half maximal enzyme activity) is defined in Materials and Methods where the equation is described. KL is not the same as KD (the dissociation constant for a ligand and its receptor). Brackets have been added to equation 1.

      • I believe dNTP in the first equation is incorrect because ATP was the ligand for Figures 1A and 1C.

      [dNTP] in the first equation has been changed to [NTP/dNTP] to indicate that both ribonucleotides and deoxyribonucleotides can bind.

      • The second equation can be expressed as dATP as I believe this is the only ligand that inhibits the enzyme.

      We prefer to keep the more general [dNTP] in the equation.

      • The equation used for the fitting in Figure 1d must be defined more clearly than "a combination of the two equations".

      The equation used for the curve fit in Figure 1d has been specified as equation 3 in Materials and Methods.

      (4) Design of the activity assays

      It is not clear if the activity assays report the rate of glycyl radical formation or nucleotide reduction. The authors mixed NrdD and NrdG and initiated the reaction by adding formate (essential for nucleotide reduction) and dithionite (Gly radical formation). The Gly radical formation is slow (in min time scale). The authors reported that ATP/dATP affected the rate of Gly radical formation and in the presence of ATP, Gly radical formation was incomplete even after 20 min. Therefore, it is possible that within the timescale of the activity assays (5 min), the reactions could be partially limited by the Gly radical formation, which may be the reason for the poor curve fitting.

      Activity assays were performed with 5 min pre-incubation without dithionite and formate (no glycyl radical formation) and 10 min incubation after addition of dithionite and formate (glycyl radical formation plus substrate reduction). During earlier tests, NrdD and NrdG were first preincubated in the presence of dithionite (glycyl radical formation) and after addition of formate the substrate reduction was monitored during 20 min. These experiments resulted in lower enzyme activity, whereas higher activity was achieved only upon formate addition to the preincubation reaction. We suppose that the presence of dithionite, which is a strong reducing agent, affected NrdD stability and the reaction was stabilised by the presence of formate at an earlier stage of the reaction. For the EPR conditions used in the paper, 5 min incubation gave higher radical content compared to 20 min, and the reported activity assay gave the highest activity after 10 min incubation, with a kcat of 1.3 s⁻¹.

      (5) Methods section for the activity assays.

      • The concentration of dTTP, ATP, and dATP used in the assays must be described.

      We thank the reviewer for pointing out this omission and we have now specified the concentrations used.

      • Although the authors mentioned that they changed the concentration of dTTP, such data were not presented. Is this correct? Did the authors fix the dTTP concentration for the GTP reduction?

      We apologise for the ambiguity and have specified that the dTTP concentration was fixed at 1 mM in the GTP experiments and that only the ATP or dATP concentrations were varied.

      (6) Discrepancy between Ki/KL and Kd.

      • There is a significant ambiguity remaining about the binding event that the ITC and MST results are reporting. Although dATP binds to both a- and s-sites and ATP binds to both active site and a-site, only a single binding event was observed in both cases. To distinguish the dATP binding to a- and s-sites and the active site, the authors should perform binding assays using mutant enzymes with only one of the binding sites available for dATP/ATP binding.

      MST and ITC were performed in presence of substrate (1 mM GTP) and s-site effector (1 mM dTTP in ITC experiments, and 5 mM dTTP in MST experiments), thus dATP is blocked from binding to the s-site and ATP from binding to the active site.

      • There are significant differences between Kd determined by MST or ITC and Ki/KL determined by the activity assays. Kd measurements were performed in the absence of the substrate nucleotides, while the assays required substrates. There may be complications from the presence of NrdG and the Gly radical formation. The authors must clearly describe all these complications and the discrepancy between Kd and Ki/KL.

      MST, ITC and enzyme assays were all performed in the presence of substrate, and enzyme assays also contained NrdG, which was not present in the MST and ITC analyses. While KD is a thermodynamic constant representing the affinity of ligand to its binding site - in our case an effector nucleotide to the ATP-cone, KL is a kinetic constant (the allosteric effector concentration that gives half maximal activity) representing the relationship between the effector concentration and the reaction speed and is affected by the enzyme turnover number (kcat). The relationship between KD, KL and Ki is further complicated by conformational and possibly oligomeric state changes of NrdD upon binding of allosteric effectors, which occurs on a slower time scale than the rapid exchange of nucleotides in allosteric sites.

      • The results of ATP/dATP copurification experiments shown in Figure 2 - figure supplement 1 show the preference of dATP binding over ATP. However, the results do not necessarily support the competition between ATP and dATP for binding to the ATP cone domain. It is still possible that dATP binding to the s-site diminishes the binding of ATP to the a-site.

      Our aim was to exclude the possibility that ATP and dATP can bind to the ATP-cone at the same time and not to study competition between the two. Nevertheless, to eliminate the possibility that dATP binding to the s-site could affect nucleotide binding to the a-site, in two out of three conditions described in the supplementary figure, the experiments were performed in the presence of dTTP to prevent binding of dATP to the s-site.

      (7) Oligomeric states.

      • The authors must present the GEMMA results without ATP or dATP. Otherwise, the effects of ATP and dATP on the oligomeric state are not clear.

      We cannot report GEMMA results without ATP or dATP because apo-PcNrdD was unstable in the GEMMA buffer and clogged the capillaries. Instead, SEC analysis was performed on apo-PcNrdD in a more suitable buffer and showed a homogeneous peak corresponding to a dimer (included as Figure 3 - figure supplement 1).

      • Figure 3 does not support the induction of a2 upon ATP binding. The concentrations of ATP used in these experiments (50 and 100 µM) were significantly lower than KL determined by the activity assays (780 µM), while they are close to the Kd values determined by ITC or MST (~25 µM). Since it is unclear what binding events ITC and MST are reporting, the data in Figure 3 do not provide support for the claimed effects of ATP binding.

      MST and ITC were performed in the presence of substrate (1 mM GTP) and s-site effector (1 mM dTTP in ITC experiments, and 5 mM dTTP in MST experiments), and they thus measure binding of ATP or dATP to the ATP cone. SEC analysis with 2 µM apo-PcNrdD and higher nucleotide concentrations (1 mM) was performed, confirming the presence of both dimers and tetramers in solution at different ratios depending on the addition of ATP or dATP. The SEC analysis, included as Figure 3 - figure supplement 1, confirms the existence of an equilibrium in solution.
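
      The comparison of the ATP concentrations with KD and KL in the exchange above can be put into numbers with a minimal sketch. It assumes a single independent binding site with free ligand approximately equal to total ligand, and, for the KL comparison only, a simple hyperbolic dependence of activity on effector concentration; neither assumption is taken from the manuscript, whose own equations are given in its Materials and Methods. The concentrations (50 and 100 µM ATP, KD of about 25 µM, KL of 780 µM) are those quoted by the reviewer.

```python
def hyperbolic_fraction(ligand_um: float, half_point_um: float) -> float:
    """Fraction of the maximum for a simple hyperbola, L / (K + L).

    With K = KD this is the fractional occupancy of a single site;
    with K = KL it is the fraction of maximal activity, assuming
    (for illustration only) a hyperbolic activation curve.
    """
    return ligand_um / (half_point_um + ligand_um)


KD_UM = 25.0   # approximate KD for ATP quoted from ITC/MST
KL_UM = 780.0  # KL for ATP quoted from the activity assays

for atp_um in (50.0, 100.0):
    occupancy = hyperbolic_fraction(atp_um, KD_UM)
    activation = hyperbolic_fraction(atp_um, KL_UM)
    print(f"{atp_um:5.0f} µM ATP: occupancy {occupancy:.2f}, "
          f"fraction of maximal activity {activation:.2f}")
```

      Under these simplifications the ATP-cone would be largely occupied at both ATP concentrations even though the activity would be far from maximal, which illustrates why KD and KL need not coincide.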

      • The effects of dATP must be presented more clearly. The authors did not observe a significant difference in oligomeric states between 50 or 100 µM dATP vs 50 µM dATP and 100 µM CTP. The former condition has dATP ~ 2x higher than the Kd and KL (Figure 1b) and therefore could be considered as "inhibited". On the other hand, NrdD should be fully active under the latter condition. The absence of difference in the oligomeric states between these two different conditions suggested to me that the oligomeric state does not regulate the NrdD activity. The authors seemed to indicate the same conclusion, but did not describe it clearly.

      We agree that the oligomeric state most likely does not regulate the NrdD activity and hope to have explained this better in the revised version.

      • Figure 3 legend mentioned a and b, but the figure was not labeled.

      We have corrected this.

      • The authors should triplicate the analysis and report the errors.

      Five scans were added for each trace to increase the signal-to-noise level (included in figure legend).

      (8) EPR characterization of Gly radical

      • The amount of Gly radical must be quantified by EPR. The authors must report how much NrdD has Gly radical.

      The concentration of NrdD (1 µM) in the activity assays is too low to be quantified by EPR. In the EPR experiment the glycyl radical content is given in the figure legend.

      • The authors claim that the Gly radical environment was similar based on the doublet feature. However, the doublet feature comes from the hyperfine splitting with the α proton, whose orientation relative to the radical p-orbital would not be affected by the conformation or the environment. Thus, this conclusion is incorrect and must be removed.

      We thank the reviewer for the clarifying comment and have removed our suggestion in the text.

      (9) Gly711 should be shown in Fig. 6e to help readers understand the last paragraph on page 12.

      The figure reference has been changed to Fig. 7, where this is shown more clearly. In Fig. 6e, inclusion of Gly711 would obscure other important information.

      (10) GRD structure with dATP

      The disorder of GRD in the presence of dATP does not agree with the formation of Gly radical under the same conditions. Gly radical is unlikely stable if it is extensively exposed to solvent. Most likely, the observed cryo-EM structures represent the conformation irrelevant to Gly radical formation.

      We agree that the glycyl radical is unlikely to be stable if exposed to solvent. We believe that the GRD is not completely disordered but most likely made more mobile through rigid body movements of the domain to an extent that makes it invisible in the cryo-EM maps. It is most likely still in the vicinity of the active site, shielding the glycyl radical. Our new HDX-MS results show a small but tangible increase in mobility of the GRD in the presence of dATP compared to ATP. Of course the differences in dynamics remain to be confirmed. It is worth noting that the group of Catherine Drennan at MIT published a conference abstract more than a year ago that suggested a similar pattern of ordered/dynamic GRDs, based on crystal structures, though the details have not yet been published (https://doi.org/10.1096/fasebj.2022.36.S1.R3407).

      We also agree that the cryo-EM structures do not show the GRD conformation relevant to Gly radical formation, as this has been shown spectroscopically for the GRE pyruvate formate lyase to require large conformational changes in the GRD and also the presence of the activase. However, revealing this conformation would be a completely different project. We postulate that inactivation proceeds by prevention of radical transfer to the substrate, not by prevention of its formation.

      We have altered the wording in several places in the revised manuscript, including the title, to avoid using the term “disorder”, as this may imply (partial) unfolding, and we certainly do not wish to imply that.

      (11) The difference between dATP and ATP binding

      From the presented structures, it was not clear how the absence of 2'-OH affects the oligomeric state and the structure of the GRD. The low resolution of the ATP-bound structure precluded the comparison between the ATP and dATP-bound structures.

      We agree that a detailed analysis of the differences between ATP- and dATP-bound structures requires higher resolution structures, particularly of the ATP-bound form. This will be the subject of future studies.

      (12) Conclusion about the disordered GRD.

      • The authors should describe the reason why the dATP binding affected the structure of GRD. The authors did not discuss why dATP binding affected the folding or mobility of GRD. Since this is the key conclusion of this manuscript and the authors are making this conclusion based on the absence of the ordered GRD structure (hence the negative results), the authors should carefully describe why the dATP binding does not allow the binding/folding of GRD in the position observed in the ATP-bound structure.

      As mentioned in our response to point 4 in this reviewer’s Public Review, it is difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker cannot be completely modelled even in the dATP-bound tetramers. Our first hypothesis was that the ATP-cone might work by a steric occlusion mechanism, but the reality appears more complex. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes higher mobility of the GRD, given that all are part of a connected system. The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap. We hope that future structural studies of NrdDs from other organisms may shed further light on this mechanism.

      • The authors should test if the dATP inhibition is reversible for PcNrdD. If dATP binding induces dissociation of GRD from the active site and makes GRD flexible, Gly radical would most likely be quenched by formate or other components in the assay solution. If dATP inhibition is reversible, it is hard to believe that Gly radical dissociates completely from the active site.

      As-purified PcNrdD contains dATP and can after removal of bound nucleotides bind substrate in presence of ATP. The as-purified PcNrdD protein contained 30% nucleotide contamination. After precipitation, HPLC analysis identified a major peak corresponding to dATP/dADP. Purification conditions were optimised to remove the nucleotides and we have added this information to the purification description.

      (13) Functional support for the observed structures.

      Similar to X-ray crystallography, cryo-EM is a highly selective method that requires the selection of particles that can be analyzed with sufficient resolution. This means that the analysis could be biased towards the protein conformations stable on the cryo-EM grid. Consequently, testing the structural observations by functional characterization of mutant enzymes is critical. However, the authors did not perform such functional characterizations and made conclusions purely based on the structural observations.

      We acknowledge this limitation. We constructed several mutations located at the tetrameric interface between the ATP-cone and the core protein based on the cryo-EM structure of dATP loaded NrdD. Unfortunately, these mutant proteins were unstable and led to protein cleavage.

      (14) Other minor points:

      • In the introduction, the authors stated "The presence and function of the ATP-cone domain distinguish anaerobic RNRs from the other members of the large glycyl radical enzyme (GRE) family that are otherwise structurally and mechanistically related (Backman et al., 2017)." This statement is misleading because GREs are functionally diverse.

      We have removed the words “and mechanistically” to reduce ambiguity.

      • p. 12, e.g. should be removed.

      We are not sure what is meant here. Does the reviewer mean p. 21 “The interactions are mostly hydrophobic but are reinforced by several H-bonds, e.g. between Gln3D-Gln458A, Ser53D–Gln458A, Arg11D-Asp468A, the main chain amide of Ile12D and Tyr557A.”?

      Reviewer #3 (Recommendations For The Authors):

      Overall, the work presents an impressive and in-depth structural view of the conformational changes stemming from the interactions of (d)ATP allosteric effector molecules that are interrelated to RNR function. The manuscript is written clearly and provides a solid overview of RNR chemistry. The cryo-EM data show striking differences between ATP and dATP bound forms, though in select regions, the resolution is not good enough for strong interpretations of the finer details.

      (1) In cryo-EM structures, dATP appears to shift the oligomerization equilibrium from nearly all dimeric forms (absence of dATP) to a mixture of both dimeric and tetrameric species (presence of dATP). The examination of the oligomeric composition in solution using the GEMMA - a mass spectral technique - showed somewhat similar trends, though given the magnitude of the differences, it was less compelling. Have the authors considered a complementary solution technique, such as analytical SEC or dynamic light scattering that could provide further support for the change in oligomerization as observed in the cryo-EM?

      SEC analysis with 2 µM apo-PcNrdD and higher nucleotide concentrations (1 mM) was performed, confirming the presence of both dimer and tetramer in solution at different ratios depending on the addition of ATP or dATP. The SEC analysis, included as Figure 3 - figure supplement 1, confirms the existence of an equilibrium in solution.

      (2) The protein as isolated from the final SEC shows a predominant peak corresponding to aggregate protein. It would be helpful if the authors ran an analytical SEC on the protein sample that is more refined to see how much soluble dimer/tetramer vs. aggregate protein there is. This could impact the kinetic and thermodynamic analysis of effector interactions. Further, the second major peak is labeled as 'monomer'. Is the protein isolated as a monomer and then forms dimer upon effector binding? It is unclear. The authors should consider presenting the SEC standards for the given column and buffer condition so that a reasonable estimate of the oligomerization status of the isolated protein can be assigned.

      Can the reviewer possibly have believed that Figure 1 - supplementary figure 2a shows PcNrdD rather than PcNrdG? The figure supplement corresponds to the as-isolated SEC analysis of the activase (PcNrdG), which shows the presence of two main peaks of aggregates and monomer. The monomeric peak was reinjected and showed no presence of further aggregation states. Currently it is not known which oligomeric state the activase harbours upon binding to PcNrdD and glycyl radical formation. None of the other SEC figures in the MS has any predominant peak corresponding to aggregated protein.

      (3) More details are needed for the ITC section. The ITC methods are not clear. What is the exact composition of the ligand solution being titrated into the protein solution? It is unclear how the less-than-unity binding stoichiometry was determined and what it means. Is the n value for the monomer, dimer, or tetramer forms? It is concerning that n < 1 is observed for dATP binding in the ITC whereas there are 3 dATP bound/subunit in the cryo-EM. For completeness, titration of a buffer into protein solution (no ligand) should be conducted and presented to demonstrate that the heats produced in Figure 2 correspond to the ligand only (and not a buffer mismatch).

      ITC experiments were performed in the presence of 1 mM GTP (c-site) and 1 mM dTTP (s-site). Unlike other parameters in ITC analyses, the N value is usually the least accurate of all fitted parameters and strongly depends on the concentration of the active protein in the sample. N values described in the current study are in the same range as values reported for ATP-cones in other RNRs and NrdR (Rozman Grinberg et al. 2018a, 2018b, 2022; McKethan and Spiro 2013). The results most likely reflect two high-affinity binding sites for dATP and one high-affinity binding site for ATP. Different nucleotide concentrations were used in the cryo-EM and ITC experiments.

      (4) It is intriguing that the binding of dATP doesn't quell the glycyl radical. In fact, it appears that, as the authors suggest, the amount of glycyl radical might be increased in these samples. However, the cryo-EM data indicates that the GRD is disordered. It is unclear how these would be correlated, as one would not expect a disordered structural element to maintain such a potent oxidant.

      As already written above, we do not wish to imply that the GRD is completely or even highly disordered, just that its dynamics increase in the presence of dATP. Otherwise we completely agree that a very exposed Gly radical is incompatible with its stability. It could be that the amount of disorder is exaggerated somewhat by the vitrification process in cryo-EM. We have tried to reword some of the text to emphasise higher mobility rather than disorder.

      It has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker cannot be completely modelled even in the dATP-bound tetramers. We initially thought that a steric occlusion mechanism might be at play, but the reality appears more complex. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes higher mobility of the GRD, given that all are part of a connected system. The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap. We hope that future structural studies of NrdDs from other organisms may shed further light on this mechanism.

      (5) It is a bit difficult to keep track of the myriad of structural information and differences amongst the various nucleotide-dependent conditions. It would be useful for the authors to add a summary figure that depicts the various oligomers, orientations, and (dis)ordered structural elements with cartoon representations.

      Thank you for this suggestion. It has been added as Figure 11.

      (6) The mechanism by which (d)ATP binding changes the (dis)ordering of select loops based on the current cryo-EM data is unclear (even the authors agree). The addition of molecular dynamics (MD) simulations on two different structures to reveal the network or structural communication would be a great addition to the work and validate the structural data.

      We have discussed this with a colleague who is an expert in MD. Their advice was that such simulations would be very difficult given that some amino-acids are missing in both of the relevant starting structures (ATP-CTP and dATP-CTP dimer) and could give very variable results. Thus we chose to do complementary experiments with hydrogen-deuterium exchange mass spectrometry (HDX-MS) instead. The results are included in the revised manuscript.

      Minor points

      (1) There are some conflicting reports as to whether P. copri is considered a human 'pathogen'. According to Yeoh, et al Scientific Reports 2022, P. copri is one of the predominant microbes in the human gut and is linked to a positive impact on metabolism. Perhaps the addition of a citation that provides support for it as a pathogen would clarify the statement on p. 3.

      We have added a recent reference (Nii T, Maeda Y, Motooka D, et al. (2023) Genomic repertoires linked with pathogenic potency of arthritogenic Prevotella copri isolated from the gut of patients with rheumatoid arthritis. Ann Rheum Dis 82: 621-629. doi: 10.1136/annrheumdis-2022-222881).

      (2) In Figure 3, the number of dimers/tetramers for dATP (100 uM) does not add up to 100. What is the other 2%?

      Thank you for pointing this out - it has been corrected.

      (3) The data in Figures 5C and D do show slight changes that could be fit and interpreted as a 'weak' interaction. Thus, the statement on p 9 "where dATP-loaded PcNrdD could bind neither GTP nor CTP" should be changed to indicate that the interactions are weak (or that the nucleotides weakly associate).

      The text and the figure have been changed according to the reviewer’s suggestion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Firstly, the authors place a great deal of emphasis on the impact of the Hif1-a inhibitor PX-478. The literature surrounding this inhibitor and its mode of action indicates that it is not a direct inhibitor of activity but that its greatest impact is on the production of Hif1-a. The authors do include another inhibitor as a control, Echinomycin, but it does not appear to be as biologically active and the panel of experiments conducted with this is extremely limited. I would be more comfortable with a full Seahorse experimental panel for Echinomycin, similar to SFig 2.G as performed with PX-478.

      We thank the reviewer for their comment highlighting the different mechanisms of action of the HIF-1α inhibitors used in this article. While echinomycin inhibits the binding of HIF-1α to the hypoxia response element (HRE), thereby blocking the DNA-binding capability of HIF-1α, PX-478 inhibits HIF-1α deubiquitination, decreases HIF-1α mRNA expression, and reduces HIF-1α translation. We have included a paragraph explaining this phenomenon in the new version of the manuscript (page 9). In addition, we extended the panel of experiments performed with echinomycin, which confirmed a marked inhibition of the glycolytic pathway when DCs were stimulated with irradiated Mtb in the presence of echinomycin as assessed by SCENITH (new Figure S3H).

      Similarly, it would be of value to have Seahorse profiling that directly excludes FAO from the metabolic profile through the use of Etomoxir as an inhibitor of fatty acid oxidation, which one would assume would have no impact on the metabolic response.

      To estimate the contribution of FAO towards fueling protein synthesis in DCs stimulated with iMtb, the FAO inhibitor etomoxir was incorporated into the SCENITH method as previously described (Adamik et al., 2022). Overall, FAO dependence was found to be less than 10% in DCs, regardless of their activation state. While mitochondrial dependence is reduced after iMtb stimulation, there is no difference in FAO dependence, suggesting that OXPHOS is primarily driven by glucose in iMtb-stimulated cells. This is consistent with the HIF-1α-induced increase of glucose metabolism-related genes. We have adjusted the results section to include this new result (new Figure S1).

      Aside from these minor points, I believe this to be a rigorous study.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 1 and Fig. 2, the authors conclude that Mtb rewires the metabolism of Mo-DCs and induces both glycolysis and OXPHOS. The data shows that infection with iMtb or Mtb increases glucose uptake and lactate release, suggesting an increase in glycolysis. However, an increase in lactate is not a measure of glycolysis. Lactate is a byproduct of glycolysis; the end product of glycolysis is pyruvate.

      We are grateful for the reviewer's comment, as it gives us the opportunity to explain the conceptual framework on which we based our study. Traditionally, pyruvate has been considered to be the end product of glycolysis when oxygen is present and lactate the end product under hypoxic conditions. Numerous studies have shown that lactate is produced even under aerobic conditions (Brooks, 2018). Therefore, we frame this work in accordance with this view that states that glycolysis begins with glucose as its substrate and terminates with the production of lactate as its main end product (Rogatzki, Ferguson, Goodwin, & Gladden, 2015; Schurr, 2023; Schurr & Schurr, 2017).

      Secondly, since the authors have access to the Agilent Extracellular Flux Analyzer, they should have performed detailed ECAR/OCR measurements to conclusively demonstrate that both glycolysis and OXPHOS are increased in Mo. This is especially important for OXPHOS because the only readout shown for OXPHOS is an increase in mitochondrial mass (figure 1 G, H), which is not acceptable. Overall, the data does not indicate that Mtb triggers OXPHOS in the dendritic cells. It only indicates dead iMtb increases the mass of mitochondria in DCs.

      The reviewer’s advice is well appreciated. However, we would like to clarify what may be a misunderstanding; that is, the assays alluded to by the reviewer were not performed on monocytes but on DCs. As advised by the reviewer, we now include the OCR measurements by Seahorse and describe the figures according to their order of appearance in the new version of the manuscript.

      What happens to the mitochondrial mass when infected with live Mtb?

      In response to the reviewer’s question, we determined the mitochondrial mass in infected DCs with live Mtb. In contrast to DCs treated with irradiated Mtb, those infected with live bacteria showed a clear reduction of their mitochondrial mass (modified Figure 1G). This result indicates that, although both Mtb-infected and irradiated Mtb-exposed DCs show a clear increase in their glycolytic activity, divergent responses are observed in terms of mitochondrial mass.

      It will be best if the authors indicate in the figure headings that dead Mtb was used.

      We agree with the reviewer. For figures 1-3, we applied the term “Mtb” in the figure headings since both irradiated and viable bacteria were used for the corresponding experiments. In figures 4-5, the term “iMtb” (alluding to irradiated Mtb) was used in the figure headings as suggested by the reviewer. For the remaining figures, the term “iMtb” was indicated in their legends when dead bacteria were used to stimulate DCs.

      E.g., Figure 1F; what does live Mtb do to GLUT1 levels etc etc?

      In response to the reviewer’s question, we included new data about Glut1 expression in DCs infected with live Mtb in the latest version of the manuscript. In line with the increase in glucose uptake shown in figure 1B, we observed an increase in the percentage of Glut1 positive DCs upon Mtb infection (new Figure 1F, lower panels). The increase in Glut1 expression strengthens the notion that DCs activate their glycolytic activity in response to the infection, as demonstrated by the elevated release of lactate, glucose consumption, HIF-1α expression, LDHA expression (Figure 1) and glycolytic activity (Figure 2, SCENITH results with viable Mtb). Therefore, these data strongly support the induction of glycolysis by Mtb (either viable or irradiated) in DCs.

      Also, we found that they were still able to activate CD4+ T cells from PPD+ donors in response to iMtb. This activation of CD4 T cells with iMtb in the presence of a HIF-1alpha inhibitor is expected, as iMtb is dead and not virulent. What happens when the cells are infected with live virulent Mtb?

      We would like to clarify the main purpose of the DC-T cell co-culture assays in the presence of the HIF-1α inhibitors. To characterize the impact of HIF-1α on DC functionality, we assessed the capacity of DCs to activate autologous CD4+ T cells when stimulated with iMtb in the presence of HIF-1α inhibitors. To this end, we used iMtb merely as a source of antigens to load DCs and evaluate the effect of HIF-1α inhibition on the activation of antigen-specific T cells. The use of viable Mtb may introduce confounding factors, such as pathogen-triggered inhibitory mechanisms (e.g., EsxH secretion by Mtb (Portal-Celhay et al., 2016)), which would prevent us from reaching conclusions about the role of HIF-1α. Thus, we consider that the use of live bacteria for this experiment is out of the scope of this manuscript.

      The authors demonstrated that CD16+ monocytes from TB patients have higher glycolytic capacity than healthy controls Fig 7. The authors should differentiate TB patient monocytes into DCs and measure their bioenergetics to test if infection alters their glycolysis and OXPHOS.

      In agreement with the reviewer, the determination of metabolic pathways in DCs differentiated from monocytes of TB patients is a key aspect of this work. Accordingly, the bioenergetic determinations of DCs generated from monocytes from TB patients versus healthy subjects are now illustrated in Figures 6F (lactate release) and 6G (SCENITH profile).

      In the discussion, the authors state that "pathologically active glycolysis in monocytes from TB patients leads to poor glycolytic induction and migratory capacities of monocyte-derived DCs." However, the data from Fig. 1 and 2 show that treatment with iMtb or Mtb induces glycolysis in MoDCs. How do the authors explain these contrasting results?

      We thank the reviewer for pointing out this issue. Figures 1 and 2 show DCs differentiated from monocytes of healthy donors (HS). In this case, DCs from HS respond to Mtb by inducing a glycolytic and migratory profile. Yet, in the case of monocytes isolated from TB patients, these cells exhibit an early glycolytic profile from the beginning of differentiation, ultimately yielding DCs with low glycolytic capacity and low migratory activity in response to Mtb. We included this explanation in the discussion (page 18) to better clarify this issue.

      Also, the term "pathological" active glycolysis (Introduction and Discussion) is an inappropriate term.

      As requested by the reviewer, we excluded the term “pathological” to describe the phenomenon reported in this study.

      Lastly, it should be shown whether the DCs generated from CD16+ monocyte from TB patients generate tolerogenic and/or aberrant DCs, which have lower glycolytic and migration capacity compared to the CD16- monocyte population. In Figure 7B, the authors should discuss why the CD16+ monocyte population has lower glycolytic capacity compared to CD16- monocytes in healthy donors. Furthermore, in contrast to the TB patients, do DCs generated from CD16+ monocyte in healthy donors have increased glycolytic and migration capacity compared to CD16- monocyte (because these monocytes showed lower glycolytic capacity)? Furthermore, if there is no difference in glycolytic capacity among the three monocyte populations in TB patients, on what basis was it concluded that DCs generated only from the CD16+ monocyte population may be the cause of lower migration capacity? The authors state in Figure 7F that the DMOG pretreatment matches the situation where the Mo-DCs from TB patients showed reduced migration. Did the authors check the Hif-1alpha levels in monocytes obtained from TB patients?

      We appreciate this in-depth analysis by the reviewer because it allows us to clarify some interpretations of the SCENITH results in Figure 7B. It is important to keep in mind that with the SCENITH technique we can only infer the relative contributions of the metabolic pathways, without alluding to the absolute magnitudes of such contributions. In this regard, it is key to note that the amount of lactate released during the first hours of the TB monocyte culture is much higher than that released by monocytes from healthy subjects (HS, Figure 7A), even when most monocytes, which are CD14+ CD16-, have comparable glycolytic capacities between HS and TB. Another example to illustrate how to interpret SCENITH results can be found in Figure 2, where a lower mitochondrial dependence is observed in iMtb-stimulated DCs (Figure 2A), while the absolute ATP production associated with OXPHOS is indeed higher as measured by Seahorse (Figure 2D). Therefore, the glycolytic capacity is not a direct readout of the magnitude of glycolysis, but of its contribution to total metabolism. The low levels of lactate released from HS monocytes likely reflect their low activation state and low metabolic activity compared to TB monocytes. In this regard, we have previously demonstrated that monocytes from pulmonary TB patients display an activated phenotype (Balboa et al., 2011). The fact that there is no difference between the glycolytic capacities of TB and HS CD16- monocytes indicates that their proportional contributions to protein synthesis are comparable (again, without inferring their absolute values, which may be very different).
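
      The distinction between relative contribution and absolute magnitude can be illustrated with a minimal sketch. It assumes the percentage formulas commonly given for SCENITH (puromycin signal under control, 2-deoxy-glucose, oligomycin, and both inhibitors); these formulas are not spelled out in this response, and the function name and all numeric values below are hypothetical.

```python
def scenith_profile(co, dg, oligo, dgo):
    """Return SCENITH-style percentages from puromycin signal intensities.

    Assumed inputs (hypothetical values, not data from the study):
      co    - control, no inhibitor
      dg    - 2-deoxy-glucose (glycolysis blocked)
      oligo - oligomycin (mitochondrial ATP synthesis blocked)
      dgo   - both inhibitors together
    """
    span = co - dgo  # signal that depends on either pathway
    if span <= 0:
        raise ValueError("control signal must exceed the fully inhibited signal")
    glucose_dependence = 100.0 * (co - dg) / span
    mito_dependence = 100.0 * (co - oligo) / span
    return {
        "glucose_dependence": glucose_dependence,
        "mitochondrial_dependence": mito_dependence,
        "glycolytic_capacity": 100.0 - mito_dependence,
        "faao_capacity": 100.0 - glucose_dependence,
    }


# Two hypothetical populations: the second has four times the absolute signal
# in every condition, yet the percentage profile is identical.
resting = scenith_profile(co=1000, dg=400, oligo=500, dgo=200)
activated = scenith_profile(co=4000, dg=1600, oligo=2000, dgo=800)
print(resting)
print(activated)
```

      Because every quantity is normalized to the control-minus-fully-inhibited span, two populations with very different absolute activity can return the same glycolytic capacity, which is why an independent absolute readout such as lactate release or Seahorse is needed alongside it.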

      Beyond the previous clarification, the reviewer's proposal to isolate subsets of monocytes is a very interesting idea. However, the experimental approach is very difficult based on the amount of blood we can obtain from patients. The cohort of patients included in this work comprises very severe patients and we are given up to 15-20 ml of peripheral blood from each. This volume of blood yields up to 10 million PBMC with approximately 1 million monocytes. If we separate the monocyte subsets, the recovered cells per condition will be insufficient to perform the intended assays.

      Nevertheless, we incorporate new evidence that TB disease is associated with an increased activation and glycolytic profile of circulating CD16+ monocytes.

      i) First, we show that the baseline glycolytic capacity of CD16+ monocytes correlates with time since the onset of TB-related symptoms (new Figure 7C).

      ii) Second, we performed high-throughput Gene Set Enrichment Analysis (GSEA) on transcriptomic data (GEO accession number: GSE185372) of CD14+CD16-, CD14+CD16+ and CD14dimCD16+ monocytes isolated from individuals with active TB, latent TB (IGRA+), as well as from TB negative healthy controls (IGRA-). The enrichment analysis indicated that glycolysis, unlike oxidative phosphorylation, tends to increase in active TB in both CD14+CD16+ and CD14dimCD16+ monocytes (new Figure 7D).

      iii) Third, we measured the expression of HIF-1α in monocyte subsets by FACS and found that this transcription factor is expressed at higher levels in CD16+ monocyte subsets from TB patients compared to their counterparts from healthy donors (new Figure 8A). We consider that this result justifies the assays shown in Figure 8B-C, in which we prematurely activated HIF-1α in healthy donor monocytes during early differentiation to DCs and measured its impact on the migration of the generated DCs.

      In the Discussion, the authors mention that circulating monocytes from TB patients differentiate into DCs with low immunogenic potential. However, the authors have not shown any immunological defect in any of their data with monocytes from TB patients. In the proxy model mentioned in Figure 7, they have in fact shown that these preconditioned DCs have higher CD86 expression. Can the authors explain/show data to justify the statement in the first paragraph of the Discussion?

      We agree with the reviewer on this observation. Our findings are limited to the generation of DCs with low migratory potential (low chemotactic activity towards CCL21 of DCs differentiated from TB patient monocytes shown in figure 6H and of DCs generated from pre-conditioned monocytes shown in figure 8C). We have modified that part of the discussion to better clarify this point, replacing immunogenic with migratory.

      The authors should note that oxamate is a competitive inhibitor of the enzyme lactate dehydrogenase and not glycolysis. Also, LDHA catalyzes the conversion from pyruvate to lactate and not the other way around (Results, page 6).

      This comment relates to the first one by the reviewer, in which the dogma of glycolysis was discussed. According to the new conception of glycolysis, it begins with glucose as its substrate and terminates with the production of lactate as its main end product.

      The following statements by the authors on page 6 are incorrect: "Because irradiated and viable Mtb induced comparable activation of glycolysis, we subsequently performed all our assays with irradiated Mtb only in the rest of the study due to biosafety reasons." and: "To our knowledge, this is the first study addressing the metabolic status and migratory activity of Mo-DCs from TB patients."

      We deleted the first sentence and reworded the second sentence as "To our knowledge, this is the first study to address how the metabolic status of monocytes from TB patients influences the migratory activity of further differentiated DCs".

      The Discussion reads as if live Mtb was used in the experiments, which is not the case. This should be corrected.

      We replaced “Mtb” with “iMtb” in the discussion wherever irradiated bacteria were used. In some cases, we used the phrase “Mtb stimulation” instead of “Mtb infection”.

      Minor Comments:

      (1) In Figure 1F legend "Quantification of Glut1+ cells plotted to the right". The underlined part should be "plotted below".

      It was corrected.

      (2) In Figure 1H. Please describe the quantitation method and describe how many cells or the number/size of fields were used to quantitate mitochondria.

      For mitochondrial morphometric analysis, TEM images were quantified with the ImageJ “analyze particles” plugin in thresholded images, with size (μm2) settings from 0.001 to infinite. For quantification, 8–10 cells of random fields (1000x magnification) per condition were analyzed. We included this information in the methods section of the new version of the manuscript.

      (3) Please mention the number of independent experimental repeats for each experimental data set and figure.

      In each figure, the number of independent experiments is indicated by individual dots.

      (4) In Figure 2A legend, "PER; left panel" should be PER; lower panel and "OCR; right panel" should be OCR; upper panel.

      It was corrected.

      References for reviewers

      Adamik, J., Munson, P. V., Hartmann, F. J., Combes, A. J., Pierre, P., Krummel, M. F., … Butterfield, L. H. (2022). Distinct metabolic states guide maturation of inflammatory and tolerogenic dendritic cells. Nature Communications, 13(1), 1–19. https://doi.org/10.1038/s41467-022-32849-1

      Balboa, L., Romero, M. M., Basile, J. I., Sabio y Garcia, C. A., Schierloh, P., Yokobori, N., … Aleman, M. (2011). Paradoxical role of CD16+CCR2+CCR5+ monocytes in tuberculosis: efficient APC in pleural effusion but also mark disease severity in blood. Journal of Leukocyte Biology. https://doi.org/10.1189/jlb.1010577

      Brooks, G. A. (2018). The Science and Translation of Lactate Shuttle Theory. Cell Metabolism. https://doi.org/10.1016/j.cmet.2018.03.008

      Portal-Celhay, C., Tufariello, J. M., Srivastava, S., Zahra, A., Klevorn, T., Grace, P. S., … Philips, J. A. (2016). Mycobacterium tuberculosis EsxH inhibits ESCRT-dependent CD4+ T-cell activation. Nature Microbiology, 2, 16232. https://doi.org/10.1038/NMICROBIOL.2016.232

      Rogatzki, M. J., Ferguson, B. S., Goodwin, M. L., & Gladden, L. B. (2015). Lactate is always the end product of glycolysis. Frontiers in Neuroscience, 9(FEB), 125097. https://doi.org/10.3389/FNINS.2015.00022

      Schurr, A. (2023). From rags to riches: Lactate ascension as a pivotal metabolite in neuroenergetics. Frontiers in Neuroscience, 17, 1145358. https://doi.org/10.3389/FNINS.2023.1145358

      Schurr, A., & Schurr, A. (2017). Lactate, Not Pyruvate, Is the End Product of Glucose Metabolism via Glycolysis. Carbohydrate. https://doi.org/10.5772/66699

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you for your continued review and for providing insightful suggestions. Below, I share some unpublished new findings related to the MYRF ChIP, comment on the potential interplay between myrf-1 and myrf-2, and describe the modifications we've implemented to address the reviewers' comments.

      (1) MYRF-1 ChIP

      Our collaboration with the modERN (Model Organism Encyclopedia of Regulatory Networks) project has recently yielded MYRF ChIP data. The results demonstrate clear and consistent MYRF binding across samples, notably on the lin-4 promoter. Given the significant detail and extensive description required to adequately present these findings, we have decided it is impractical to include them in the current paper. These results will be more suitably published in a separate ongoing study focused on MYRF's regulatory targets during larval development.

      (2) Inter-regulation between myrf-1 and myrf-2

      We acknowledge the interpretation that myrf-2 may act as a genetic antagonist to myrf-1, as suggested by the delayed arrest in myrf-1; myrf-2 double mutants and a trend towards increased lin-4 expression in myrf-2 mutants. Additionally, our unpublished data suggest an elevated myrf-2 expression peak in myrf-1 null mutants during the L1-L2 transition, indicating a potential mutual repressive interaction between myrf-1 and myrf-2.

      On the other hand, myrf-1 and myrf-2 exhibit functional redundancy in DD synaptic rewiring and lin-4 expression. A gain of function in myrf-2 promotes early DD synaptic rewiring. Furthermore, three independent co-immunoprecipitation analyses targeting myrf-1::gfp, myrf-2::gfp, and pan-1::gfp confirm a tight association between myrf-1 and myrf-2 in vivo. These findings challenge the notion of myrf-2 primarily antagonizing myrf-1, or vice versa.

      We propose a model where myrf-1 and myrf-2 collaborate and are functionally redundant, with compensatory elevated expression when one paralog is absent. For instance, the loss of myrf-1 triggers upregulation of myrf-2, which, though insufficient on its own, accelerates the transcriptional program and exacerbates system deterioration, leading to accelerated death. How exactly this takes place is currently unclear. We note that the MYRF ChIP data show MYRF binding on both the myrf-1 and myrf-2 genes.

      Given the complexity of these interactions, we have chosen not to delve deeply into this discussion in the paper without more direct evidence, which would require detailed analysis.

      (3) Revisions Addressing Reviewer Suggestions

      (a) We have revised our interpretation of the mScarlet signal changes in myrf-1(ybq6) and myrf-2(ybq42) mutants to reflect a more nuanced understanding of their potential genetic relationship, as highlighted in the main text.

      “The mScarlet signals exhibit a marked reduction in the putative null mutant myrf-1(ybq6) (Figure 1D, E). Intriguingly, in the putative null myrf-2(ybq42) mutants, there is a noticeable trend towards increased mScarlet signals, although this increase does not reach statistical significance (Figure 2C, D).”

      (b) In response to feedback on Figure 2 and the characterization of lin-4(umn84) mutants, we've included a new series of images showing lin-4(umn84)/+ and lin-4(umn84) signals through larval stages, presented as Figure 2 Figure Supplement 2. This addition clarifies the functional status of lin-4 nulls in our study.

      “Our observations revealed that mScarlet signals were not detected in early L1 larvae (Figure 2C-F; Figure 2 Figure Supplement 2).”

      (c) To improve the clarity of Fig 6, we've added indicator arrows in the red, green, and merge channels, enhancing the visualization of the signals.

      We appreciate the opportunity to clarify these points and hope that our revisions and additional data address the concerns raised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the reviewers' and editors' comments and suggestions on our manuscript "Transposable elements regulate thymus development and function." We performed additional analyses to validate our results and rephrased some manuscript sections according to the comments. We believe these changes significantly increase the solidity of our conclusions. Our point-by-point answer to the reviewers' and editors' comments is detailed below. New data and analyses are shown in Figure 1d, Figure 2g and h, Figure 5e and f, Figure 1 – figure supplement 1, Figure 2 – figure supplement 2, Figure 3 – figure supplement 1 and 2, Figure 4 – figure supplement 2, Figure 5 – figure supplement 1, as well as the corresponding text sections.

      Reviewer #1:

      (1) The authors sometimes made overstatements largely due to the lack or shortage of experimental evidence.

      For example in figure 4, the authors concluded that thymic pDCs produced higher copies of TE-derived RNAs to support the constitutive expression of type-I interferons in thymic pDCs, unlike peripheral pDCs. However, the data was showing only the correlation between the distinct TE expression pattern in pDCs and the abundance of dsRNAs. We are compelled to say that the evidence is totally too weak to mention the function of TEs in the production of interferon. Even if pDCs express a distinct type and amount of TE-derived transcripts, it may be a negligible amount compared to the total cellular RNAs. How many TE-derived RNAs potentially form the dsRNAs? Are they over-expressed in pDCs?

      The data interpretation requires more caution to connect the distinct results of transcriptome data to the biological significance.

      We contend that our manuscript combines the attributes of a research article (novel concepts) and a resource article (datasets of TEs implicated in various aspects of thymus function). The critical strength of our work is that it opens entirely novel research perspectives. We are unaware of previous studies on the role of TEs in the human thymus. The drawback is that, as with all novel multi-omic systems biology studies, our work provides a roadmap for a multitude of future mechanistic studies that could not be realized at this stage. Indeed, we performed wet lab experiments to validate some but not all conclusions: i) presentation of TE-derived MAPs by TECs and ii) formation of dsRNAs in thymic pDCs. In response to Reviewer #1, we performed supplementary analyses to increase the robustness of our conclusions. Also, we indicated when conclusions relied strictly on correlative evidence and clarified the hypotheses drawn from our observations.

      Regarding the Reviewer's questions about TE-derived dsRNAs, LINE, LTR, and SINE elements all have the potential to generate dsRNAs, given their highly repetitive nature and bi-directional transcription (1). As ~32% of TE subfamilies are overexpressed in pDCs, we hypothesized that these TE sequences might form dsRNA structures in these cells. To address the Reviewer's concerns regarding the amount of TE-derived RNAs among total cellular RNAs, we also computed the percentage of reads assigned to TEs in the different subsets of thymic APCs (see Reviewer 1 comment #4).

      (2) Lack of generality of specific examples. This manuscript discusses the whole genomic picture of TE expression. In addition, one good way is to focus on the specific example to clearly discuss the biological significance of the acquisition of TEs for the thymic APC functions and the thymic selection.

      In figure 2, the authors focused on ETS-1 and its potential target genes ZNF26 and MTMR3, however, the significance of these genes in NK cell function or development is unclear. The authors should examine and discuss whether the distinct features of TEs can be found among the genomic loci that link to the fundamental function of the thymus, e.g., antigen processing/presentation.

      We thank the Reviewer for this highly relevant comment. We investigated the genomic loci associated with NK cell biology to determine if ETS1 peaks would overlap with TE sequences in protein-coding genes' promoter region. Figure 2h illustrates two examples of ETS1 significant peaks overlapping TE sequences upstream of PRF1 and KLRD1. PRF1 is a protein implicated in NK cell cytotoxicity, whereas KLRD1 (CD94) dimerizes with NKG2 and regulates NK cell activation via interaction with the nonclassical MHC-I molecule HLA-E (2, 3). Thus, we modified the section of the manuscript addressing these results to include these new analyses:

      "Finally, we analyzed publicly available ChIP-seq data of ETS1, an important TF for NK cell development (4), to confirm its ability to bind TE sequences. Indeed, 19% of ETS1 peaks overlap with TE sequences (Figure 2g). Notably, ETS1 peaks overlapped with TE sequences (Figure 2h, in red) in the promoter regions of PRF1 and KLRD1, two genes important for NK cells' effector functions (2, 3)."

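      The overlap statistic quoted above (the share of ETS1 peaks that intersect a TE annotation) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the half-open coordinate convention, and the toy intervals are all assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


def fraction_overlapping(peaks, elements):
    """Fraction of peaks that overlap at least one element.

    Both inputs are iterables of (chrom, start, end) with half-open
    coordinates [start, end), an assumption made for this sketch.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # Running maximum of end coordinates over the start-sorted intervals.
        max_end, current = [], 0
        for _, e in intervals:
            current = max(current, e)
            max_end.append(current)
        index[chrom] = (starts, max_end)

    total = hits = 0
    for chrom, start, end in peaks:
        total += 1
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        # Number of elements that begin before the peak ends.
        i = bisect_right(starts, end - 1)
        # At least one of them must also end after the peak begins.
        if i > 0 and max_end[i - 1] > start:
            hits += 1
    return hits / total if total else 0.0


# Toy coordinates, purely illustrative.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80),
             ("chr2", 300, 400), ("chr3", 10, 20)]
toy_tes = [("chr1", 150, 400), ("chr1", 600, 700), ("chr2", 0, 60),
           ("chr2", 400, 500)]
overlap_fraction = fraction_overlapping(toy_peaks, toy_tes)
print(overlap_fraction)
```

      In practice a dedicated interval tool would usually be preferred for genome-scale inputs; the sketch only makes explicit what "a peak overlaps a TE" means, including that intervals which merely touch at a boundary are not counted under the half-open convention.
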
      (3) Since the deep analysis of the dataset yielded many intriguing suggestions, why not add a discussion of the biological reasons and significance? For example, in Figure 1, why is TE expression negatively correlated with proliferation? cTEC-TE is mostly postnatal, while mTEC-TE is more embryonic. What does this mean?

      We thank the Reviewer for this comment. To our knowledge, the relationship between cell division and transcriptional activity of TEs has not been extensively studied in the literature. However, a recent study has shown that L1 expression is induced in senescent cells. We therefore added the following sentences to our Discussion:

      "The negative correlation between TE expression and cell cycle scores in the thymus is coherent with recent data showing that transcriptional activity of L1s is increased in senescent cells (5). A potential rationale for this could be to prevent deleterious transposition events during DNA replication and cell division."

      We also added several discussion points regarding the regulation of TEs by KZFPs to answer concerns raised by Reviewer 2 (see Reviewer 2 comment #1).

      (4) To consolidate the experimental evidence about pDCs and TE-derived dsRNAs, one option is to show the amount of TE-derived RNA copies among total RNAs. The immunohistochemistry analysis in figure 4 requires additional data to demonstrate that overlapped staining was not caused by technical biases (e.g. uneven fixation may cause the non-specifically stained regions/cells). To show this, authors should have confirmed not only the positive stainings but also the negative staining (e.g. CD3, etc.). Another possible staining control was showing that non-pDC (CD303- cell fractions in this case) cells were less stained by the ds-RNA probe.

      We thank the Reviewer for this suggestion. We computed the proportion of reads in each cell assigned to two groups of sequences known to generate dsRNAs: TEs and mitochondrial genes (1). These analyses showed that the proportion of reads assigned to TEs is higher in pDCs than other thymic APCs by several orders of magnitude (~20% of all reads). In contrast, reads derived from mitochondrial genes had a lower abundance in pDCs. We included these results in Figure 4 – figure supplement 2 and included the following text in the Results section entitled "TE expression in human pDCs is associated with dsRNA structures":

      "To evaluate if these dsRNAs arise from TE sequences, we analyzed in thymic APC subsets the proportion of the transcriptome assigned to two groups of genomic sequences known as important sources of dsRNAs, TEs and mitochondrial genes (1). Strikingly, whereas the percentage of reads from mitochondrial genes was typically lower in pDCs than in other thymic APCs, the proportion of the transcriptome originating from TEs was higher in pDCs (~22%) by several orders of magnitude (Figure 4 – figure supplement 2)."

      As a negative control for the immunofluorescence experiments, we used CD123- cells. Indeed, flow cytometry analysis of the magnetically enriched CD303+ fraction was around 90% pure, as revealed by double staining with CD123 and CD304 (two additional markers of pDCs): CD123- cells were also CD304-/lo, showing that these cells are non-pDCs. Thus, we decided to compare the dsRNA signal between CD123+ cells (pDCs) and CD123- cells (non-pDCs). The difference between CD123+ and CD123- cells was striking (Figure 4d).

      Author response image 1.

      Reviewer #1 (Recommendations For The Authors):

      It was sometimes difficult for me to recognize the dot plots representing low expression against the white background. e.g., figure 1 supplement 1.

      We thank the Reviewer for their comment, and we modified Figure 1 – figure supplement 1 as well as Figure 3 – figure supplement 2 to improve the contrast between dots and background.

      Reviewer #2:

      Reviewer #2 (Recommendations For The Authors):

      (1) In the abstract, results and discussion, the following conclusions are drawn that are not supported by the data: a) TEs interact with multiple transcription factors in thymic cells, b) TE expression leads to dsRNA formation, activation of RIG-I/MDA5 and secretion of IFN-alpha, c) TEs are regulated by cell proliferation and expression of KZFPs in the thymus. All these statements derive from correlations. Only one TF has ChIP-seq data associated with it, dsRNA formation and/or IFN-alpha secretion could be independent of TE expression, and whilst KZFPs most likely regulate TEs in the thymus, the data do not demonstrate it. The authors also seem to suggest that AIRE, FEZF2 and CHD4 regulate TEs directly, but binding is not shown. The manuscript needs a thorough revision to be absolutely clear about the correlative nature of the described associations.

      We agree with Reviewer #2 that some of the conclusions in our initial manuscript were not fully supported by experimental data. In the revised manuscript, we clearly indicated when conclusions relied strictly on correlative evidence and clarified the hypotheses drawn from our observations. Regarding the regulation of TE expression by AIRE, FEZF2, and CHD4, we reanalyzed publicly available ChIP-seq data of AIRE and FEZF2 in murine mTECs. For AIRE, we confirmed that ~30% of AIRE's statistically significant peaks overlap with TE sequences (see Reviewer 2, comment #6 for more details on read alignment and peak calling), confirming its ability to bind to TE sequences directly. We added these results to the main figures (Figure 5f) and modified the section "AIRE, CHD4, and FEZF2 regulate distinct sets of TE sequences in murine mTECs" as follows:

      "[…]. As a proof of concept, we validated that 31.42% of AIRE peaks overlap with TE sequences by reanalyzing ChIP-seq data, confirming AIRE's potential to bind TE sequences (Figure 5f)."

      A reanalysis of FEZF2's ChIP-seq data yielded no significant peaks while using stringent criteria. For this reason, we decided to exclude these data and only use AIRE as a proof of concept.
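
      The peak-overlap percentage reported above for AIRE can be illustrated with a minimal sketch. It assumes BED-style half-open intervals given as (chromosome, start, end) tuples; the function names and the toy coordinates are hypothetical, and the actual analysis was performed with bedtools intersect rather than with this code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def fraction_overlapping(peaks: List[Interval], tes: List[Interval]) -> float:
    """Fraction of peaks that overlap at least one TE interval."""
    if not peaks:
        return 0.0
    # Group TE intervals by chromosome so that each peak is only
    # compared with TEs located on the same chromosome.
    tes_by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in tes:
        tes_by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in peaks:
        candidates = tes_by_chrom.get(chrom, [])
        if any(overlaps(start, end, t_start, t_end) for t_start, t_end in candidates):
            hits += 1
    return hits / len(peaks)


# Toy data (hypothetical coordinates): only the first peak overlaps a TE;
# the chr2 peak ending at 80 merely touches the TE starting at 80.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80), ("chr2", 300, 400)]
tes = [("chr1", 150, 300), ("chr2", 80, 120)]
overlap_fraction = fraction_overlapping(peaks, tes)
```

      The quadratic comparison is chosen for readability; for genome-scale peak sets, a sorted sweep or an interval tree would be a more realistic choice.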

      Regarding KZFPs, we agree with Reviewer #2 that their impact on TE expression is probably significantly underestimated in our data. A potential reason for this is that KZFP expression is typically low; thus, transcriptomic signals from KZFPs could have been missed by the low depth of scRNA-seq. We mentioned this point in the Discussion:

      "On the other hand, the contribution of KZFPs to TE regulation in the thymus is likely underestimated due to their typically low expression (6) and scRNA-seq's limit of detection."

      (2) On the technical side, there are many dangers about analyzing RNA-seq data at the subfamily level and without stringent quality control checks. Outputs may be greatly confounded by pervasive transcription (see PMID 31425522), DNA contamination, and overlap of TEs with highly expressed genes. Whether TE transcripts are independent units or part of a gene also has important implications for the conclusions drawn. I would say that for most purposes of this work, an analysis restricted to independent TE transcripts, with appropriate controls for DNA contamination, would provide great reassurances that the results from subfamily-level analyses are sound. Showing examples from the genome browser throughout would also help.

      We agree with the Reviewer that contamination could have interfered with TE quantification. We used FastQ Screen (7) to evaluate the contamination of our human scRNA-seq data. As illustrated in the Figure below, most reads aligned with the human genome, and there were no reads uniquely assigned to another species analyzed, confirming the high purity of our dataset.

      Author response image 2.

      As stated by the Reviewer, pervasive expression is another factor that can lead to overestimation of TE expression. To evaluate if pervasive expression impacted the results of our differential expression analysis of TEs between APC subsets, we visualized read alignment to TE sequences using a genome browser. We selected two samples containing the highest numbers of mTEC(II) and pDCs (T07_TH_EPCAM and FCAImmP7277556, respectively) and used STAR to align reads to the human genome (GRCh38). We then visualized read alignment to randomly selected loci of two subfamilies identified as overexpressed by mTEC(II) or pDCs (HERVE-int and Harlequin-int, respectively). The examples below show that the signal detected is specific to the TE sequences located in introns. Even though this visualization cannot guarantee that pervasive expression did not affect TE quantification in any way, it increases the confidence that the signal detected by our analyses genuinely originates from TE expression.

      Author response image 3.

      Author response image 4.

      Author response image 5.

      Author response image 6.

      Author response image 7.

      (3) Related to the above, it would be useful to describe in the main text (and methods) how multi-mapping reads are being handled. It wasn't clear to me how kallisto handles this, and it has implications for the results. In the analysis suggested above, only uniquely mapped reads would have to be used, despite its limitations.

      We agree with the Reviewer that this information regarding assignment of multimapping reads is important. Kallisto uses an expectation-maximization (EM) algorithm to deal with multimapping reads, a strategy used by several algorithms developed to study TE expression (8). Briefly, the EM algorithm reassigns multimapping reads based on the number of uniquely mapped reads assigned to each sequence. Thus, we added the following details to the methods section:

      "Preprocessing of the scRNA-seq data was performed with the kallisto (9) and bustools workflow. Kallisto uses an expectation-maximization algorithm to reassign multimapping reads based on the frequency of unique mappers at each sequence."
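
      The reassignment principle described above can be illustrated with a deliberately simplified sketch. It assumes that each read is represented by the set of sequences it is compatible with, and it ignores sequence lengths and the equivalence-class machinery of the real tool; the function `em_abundances` and the toy reads are hypothetical and do not reproduce kallisto's implementation.

```python
from typing import Dict, FrozenSet, List


def em_abundances(
    reads: List[FrozenSet[str]], targets: List[str], n_iter: int = 100
) -> Dict[str, float]:
    """Estimate relative abundances from reads that may map to several targets."""
    # Start from a uniform guess over all targets.
    theta = {t: 1.0 / len(targets) for t in targets}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in targets}
        # E-step: split each read across its compatible targets
        # in proportion to the current abundance estimates.
        for compatible in reads:
            denom = sum(theta[t] for t in compatible)
            if denom == 0.0:
                continue
            for t in compatible:
                expected[t] += theta[t] / denom
        # M-step: renormalise the expected counts into abundances.
        total = sum(expected.values())
        if total == 0.0:
            break
        theta = {t: c / total for t, c in expected.items()}
    return theta


# Toy data: 3 reads unique to A, 1 read unique to B, 4 reads mapping to both.
reads = (
    [frozenset({"A"})] * 3
    + [frozenset({"B"})]
    + [frozenset({"A", "B"})] * 4
)
theta = em_abundances(reads, ["A", "B"])
```

      In this toy case the multimapping reads end up split 3:1 in favour of A, mirroring the ratio of uniquely mapped reads.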

      (4) Whilst I liked the basic idea, I am not convinced that correlating TE and TF expression is a good strategy for identifying TE-TF associations at enhancers. Enhancers express very low levels of short transcripts, which I doubt would be detected in low-depth scRNA-seq data. The transcripts the authors are using to make such associations may therefore have nothing to do with the enhancer roles of TEs. I would limit these analyses to cell types for which there is histone modification data and correlate TF expression with that instead.

      We agree with the Reviewer that it would have been interesting to correlate the expression of TFs with signals of histone marks at TE sequences. However, we could not perform this analysis because we did not have matched data of histone marks throughout thymic development. Therefore, we adopted an alternative, well-suited strategy.

      Our strategy to identify TE enhancer candidates is depicted in Figure 2a: i) correlation between the expression of the TF and the TE subfamily, ii) presence of the TF binding motif in the sequence of the TE enhancer candidate, and iii) colocalization of the TE enhancer candidate with significant peaks of H3K27ac and H3K4me3 in the same cell type from the ENCODE Consortium ChIP-seq data. We limited our analyses to the eight cell types present both in our dataset and the ENCODE Consortium: B cells, CD4 Single Positive T cells (CD4 SP), CD8 Single Positive T cells (CD8 SP), dendritic cells (DC), monocytes and macrophages (Mono/Macro), NK cells, Th17, and Treg.

      (5) Figure 2G: binding of ETS1 is unconvincing. Were there statistically meaningful peaks called in these regions? It would be good to also show a metaplot/heatmap of ETS1 profile over all elements of relevant subfamilies. Showing histone marks on the genome browser snapshots would also be useful. Is there any transcriptional evidence that the specific Alus shown act as alternative promoters?

      We agree with the Reviewer that the examples provided were not particularly convincing. Thus, we reanalyzed the data to determine if statistically significant ETS1 peaks (see the answer to Reviewer 2's comment #6 for details on the methods) located near gene transcription start sites overlapped with TEs. We thereby provided examples of significant ETS1 peaks overlapping TE sequences in the promoter region of two prototypical NK cell protein-coding genes (Figure 2h).

      (6) Why was -k 10 used with bowtie2? This will map the same read to multiple locations in the genome, increasing read density at more repetitive (younger) TEs. The authors should use either default settings, being clear about the outcome (random assignment of multimapping reads to one location), or use only uniquely aligned reads.

      We thank the Reviewer for their comment and agree that using the -k 10 parameter with bowtie2 was not optimal for TE analysis. To improve the strength of our analyses, we reanalyzed all ChIP-seq data of our manuscript (Figure 2g and h, Figure 5e and f) using the following strategy: alignment with bowtie2 using default parameters except --very-sensitive, removal of multimapping reads with samtools view -q 10, removal of duplicate reads with samtools markdup -r, peak calling with macs2 using the -m 5 50 parameter, and removal of peaks overlapping ENCODE's blacklist regions with bedtools intersect.

      These new analyses strengthen our evidence that TEs interact with multiple genes that regulate thymic development and function. We updated the results sections concerning ChIP-seq data analyses and the Methods section to include this information:

      "ChIP-seq reads were aligned to the reference Homo sapiens genome (GRCh38) using bowtie2 (version 2.3.5) (10) with the --very-sensitive parameter. Multimapping reads were removed using the samtools view function with the -q 10 parameter, and duplicate reads were removed using the samtools markdup function with the -r parameter (11). Peak calling was performed with macs2 with the -m 5 50 parameter (12). Peaks overlapping with the ENCODE blacklist regions (13) were removed with bedtools intersect (14) with default parameters. Overlap of ETS1 peaks with TE sequences was determined using bedtools intersect with default parameters. BigWig files were generated using the bamCoverage function of deeptools2 (15), and genomic tracks were visualized in the UCSC Genome Browser (16)."

      (7) Figure 1d needs a y axis scale. Could the authors also provide details of how the random distribution of TE expression was generated?

      We agree with the Reviewer that Figure 1d was incomplete and made the appropriate modifications. Regarding the random distribution, we reproduced our dataset containing the expression of 809 TE subfamilies in 18 cell populations. For each combination of TE subfamily and cell type, we randomly assigned an "expression pattern" as identified by the hierarchical clustering of Figure 1b. Then, we computed the maximal occurrence of an expression pattern across cell types for each TE subfamily to generate the distribution curve in Figure 1d. We added the following details to the Methods section to clarify how the random distribution was generated:

      "As a control, a random distribution of the expression of 809 TE subfamilies in 18 cell populations was generated. A cluster (cluster 1, 2, or 3) was randomly attributed for each combination of TE subfamily and cell type, and the maximal occurrence of a given cluster across cell types was then computed for each TE subfamily. Finally, the distributions of LINE, LTR, and SINE elements were compared to the random distribution with Kolmogorov-Smirnov tests."
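
      The control described above can be sketched as follows. The sketch assumes uniform sampling of the three clusters and a fixed seed for reproducibility, neither of which is specified in the text; the function names are hypothetical. Only the two-sample Kolmogorov-Smirnov statistic is computed here, not its p-value.

```python
import random
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence


def random_max_occurrence(
    n_subfamilies: int = 809,
    n_cell_types: int = 18,
    clusters: Sequence[int] = (1, 2, 3),
    seed: int = 0,
) -> List[int]:
    """For each TE subfamily, draw a random cluster per cell type and
    return the count of the most frequent cluster."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_subfamilies):
        assigned = [rng.choice(clusters) for _ in range(n_cell_types)]
        result.append(max(Counter(assigned).values()))
    return result


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


null_distribution = random_max_occurrence()
```

      An observed distribution for LINE, LTR, or SINE elements would then be passed to `ks_statistic` together with `null_distribution`.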

      (8) The motif analysis requires a minimum of 1 locus from each TE subfamily containing it in order to be reported, but this seems like a really low threshold that will output a lot of noise. What is the rationale here?

      We agree with the Reviewer that this threshold might appear low. Nonetheless, these analyses ultimately aimed to identify TE promoter and enhancer candidates. Hence, we did not want to put an arbitrary threshold at a higher value (e.g., a certain number or percentage of all loci of a given TE subfamily), as this might create a bias based on the total number of loci of a given TE subfamily. Moreover, our rationale was that a TE locus might act as a promoter/enhancer even if it is the only locus of its subfamily containing a TF binding site.

      Even though this strategy might have created some noise in the analyses of interactions between TFs and TEs of Figure 2 (panels a-e), we are confident that our bootstrap strategy efficiently removed low-quality identifications based on low correlation values or expression of TF and TE in low percentages of cells. Additionally, the subsequent analyses on TE promoter and enhancer candidates were performed exclusively for the TE loci containing TF binding sites to avoid adding noise to these analyses.

      (9) Figure 4e: is this a log2 enrichment? If not, the enrichments for some of the gene sets are not so high.

      The enrichment values represented in Figure 4e are not log-transformed. It is essential to highlight that gene set enrichment values were computed for each possible pair of thymic APCs (e.g., pDC vs. cDC1, pDC vs. mTEC(II), etc.), and the values represented in Figure 4e are an average of each comparison pictured at the bottom of the UpSet plot.

      However, we agree with Reviewer 2 that the average enrichment value is not extremely high. We thus made the following modifications to the Results section ("TE expression in human pDCs is associated with dsRNA structures") to better represent it:

      "Notably, thymic pDCs harbored moderate yet significant enrichment of gene signatures of RIG-I and MDA5-mediated IFN α/β signaling compared to all other thymic APCs (Figure 4e and Supplementary file 1 – Table 8)."

      (10) Please be clear on results subtitles when these refer to mouse.

      We apologize for the confusion and modified the subtitles to clarify if the results refer to mouse or human data.

      (11) Figure 1 - figure supplement 2: "assignation" should be 'assignment'.

      We thank the Reviewer for their keen eye and changed the title of Figure 1 – figure supplement 2.

      (1) Sadeq S, Al-Hashimi S, Cusack CM, Werner A. Endogenous Double-Stranded RNA. Noncoding RNA. 2021;7(1).

      (2) Kim N, Kim M, Yun S, Doh J, Greenberg PD, Kim TD, et al. MicroRNA-150 regulates the cytotoxicity of natural killers by targeting perforin-1. J Allergy Clin Immunol. 2014;134(1):195-203.

      (3) Gunturi A, Berg RE, Forman J. The role of CD94/NKG2 in innate and adaptive immunity. Immunol Res. 2004;30(1):29-34.

      (4) Taveirne S, Wahlen S, Van Loocke W, Kiekens L, Persyn E, Van Ammel E, et al. The transcription factor ETS1 is an important regulator of human NK cell development and terminal differentiation. Blood. 2020;136(3):288-98.

      (5) De Cecco M, Ito T, Petrashen AP, Elias AE, Skvir NJ, Criscione SW, et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature. 2019;566(7742):73-8.

      (6) Huntley S, Baggott DM, Hamilton AT, Tran-Gyamfi M, Yang S, Kim J, et al. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res. 2006;16(5):669-77.

      (7) Wingett SW, Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control. F1000Res. 2018;7:1338.

      (8) Lanciano S, Cristofari G. Measuring and interpreting transposable element expression. Nat Rev Genet. 2020;21(12):721-36.

      (9) Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525-7.

      (10) Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-9.

      (11) Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2).

      (12) Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.

      (13) Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019;9(1):9354.

      (14) Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841-2.

      (15) Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44(W1):W160-5.

      (16) Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996-1006.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The very detailed insights gained by the authors into allosteric regulation require very specialized techniques in this study. This poses a challenge to communicate the methods, the results, and the meaning of the results to a broader audience. In some places, the authors overcome this challenge better than in others.

      Following this reviewer’s suggestions, we have extensively revised the text, making the text more understandable to a broader audience.

      The manuscript does not show up on BioRxiv.

      The manuscript is now deposited in bioRxiv (doi: 10.1101/2023.09.12.557419).

      Fig3: GS-ES2 transition: the changes appear minimal in the illustration.

      As suggested by this reviewer, we have re-examined the GS-ES2 transition and clearly defined the structural characteristics of the conformationally excited state 2 (ES2). As shown in the revised Fig. 3 of the main text, the ground state (GS) features a π-π packing between the aromatic rings of F100 and Y156, as well as a cation-π stacking between R308 and F102. In the ES2 state, these interactions are disrupted, while a new π-π packing interaction is formed between F100 and F102. We added new comments in the main text clarifying the structural interactions that characterize each state.

      GS-ES1 transition: how is the K72-E91 salt bridge disrupted? How do you define the formation/disruption of a salt bridge? The current figure does not make this very clear and the K72-E91 salt bridge appears to be intact in ES1. Maybe the authors could replace the dotted K72-E91 line with a dotted line and distance?

      As stated above, we revised Fig. 3 highlighting the differences between the two states. The K72 and E91 salt bridge is formed when the distance between Nε of K72 and Oε of E91 is shorter than 4.0 Å (the typical cutoff for a salt bridge). In the ES1 state, the outward movement of the αC helix increases the distance over 4.5 Å, disrupting the salt bridge.
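
      The distance criterion described above can be written as a short sketch. It assumes Cartesian coordinates in Å for the lysine side-chain nitrogen and for the glutamate carboxylate oxygens, and it takes the shortest nitrogen-oxygen distance, which is an assumption about how the two oxygens are handled; the function names and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

SALT_BRIDGE_CUTOFF = 4.0  # Å, the cutoff quoted in the response


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two points in Å."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def salt_bridge_formed(
    lys_nitrogen: Point,
    glu_oxygens: Sequence[Point],
    cutoff: float = SALT_BRIDGE_CUTOFF,
) -> bool:
    """True if the nitrogen lies within the cutoff of any carboxylate oxygen."""
    return min(distance(lys_nitrogen, o) for o in glu_oxygens) < cutoff


# Toy geometries (hypothetical): a GS-like frame and an ES1-like frame.
gs_like = salt_bridge_formed((0.0, 0.0, 0.0), [(3.0, 0.0, 0.0), (4.8, 0.0, 0.0)])
es1_like = salt_bridge_formed((0.0, 0.0, 0.0), [(4.6, 0.0, 0.0), (5.5, 0.0, 0.0)])
```

      Applied frame by frame to a trajectory, such a test gives the occupancy of the salt bridge in each state.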

      L251: Could the authors remind the reader why they are only comparing V104 and I150? Could they give a little context as to why they consider the agreement to be good? It appears that they would be statistically different, so a little context for what comprises a good agreement in the literature may be helpful.

      Our mutagenesis studies show that V104 and I150 are key residues for allosteric communication and, if mutated, result in well-folded but inactive kinases (Sci Adv. doi: 10.1126/sciadv.1600663). Importantly, V104 and I150 show two distinct populations in the CEST experiments that can be directly related to the GS and ES states. Regarding the fitting of these residues, we obtained a good agreement with the direction of the chemical shifts, which supports the hypothesized GS -> ES structural transition. The lack of a quantitative agreement between the chemical shifts of the experimental and simulated excited state is not surprising for two reasons: a) all state-of-the-art simulations fall short in sampling slow conformational interconversions, and b) the uncertainty of the SHIFTX algorithm for the prediction of 13C chemical shifts of methyl groups is quite large. Finally, we would like to point out that most NMR relaxation-dispersion experiments (CEST and CPMG) are performed for the backbone 15N, 13Cα, and 1H resonances, which have been used to calculate the structures of the intermediate states (Neudecker, P. et al., Science, 2012, 336, doi: 10.1126/science.1214203) and yield reasonable agreement with the prediction for metastable states derived from Markov Models (Olsson, S., J. Am. Chem. Soc., 2017, 139, doi: 10.1021/jacs.6b09460). To the best of our knowledge, there is no literature reporting on calculations of the 13C CEST profiles for methyl groups from MD simulations, and remarkably, we found a reasonably good agreement between experimental and predicted chemical shifts (see Fig. 5C).

      Just to clarify: the calculated CS values are informed by experimental CS values that were used in the calculation?

      We used the backbone chemical shifts as the restraints only in the metadynamics simulations. We used the chemical shifts of the methyl groups and their corresponding excited states to verify the ES2 state.

      Figure 8: in its current form this potentially exciting result is lost on the average reader.

      We modified Fig. 8 of the main text, making the intra- and inter-residue correlations visible to the reader.

      Reviewer #2:

      While the alphaC-beta4 loop is a conserved feature of protein kinases, the residues within this loop vary across various kinase families and groups, enabling group and family-specific control of activity through cis and trans acting elements. F102 in PKA interacts with co-conserved residues in the C-tail, which has been proposed to function as a cis regulatory element. The authors should elaborate on the conformational changes in the C-tail, particularly in the arginine that packs against F102, in the results and discussion. This would further extend the impact and scope of the manuscript, which is currently confined to PKA.

      As suggested by this reviewer, we re-analyzed the time-dependent interactions between F102 and R308 at the C-tail. As this reviewer suspected, these interactions differentiate the ES2 from the GS state. In the GS state, there is a stable cation-π interaction between F102 and R308, which becomes transient in the ES2 state (Fig. 3). For the F100A mutant, the interactions between F102 and R308 have lower occurrence relative to the WT enzyme, i.e., a weaker interaction between the αC-β4 loop and the C-tail (see new Figure 6 - figure supplement 1). The latter supports our conclusion that the structural coupling between the C-tail and the two lobes of the enzyme decreases for the F100A mutant. We added more comments in the main text.

      FAIR standards of making the data accessible and reproducible are not directly addressed.

      We have deposited all our NMR data on the Data Repository Site at the University of Minnesota, DRUM (https://hdl.handle.net/11299/261043).

      The MD data and conformational states would be a valuable resource for the community and should be shared via some open-source repositories.

      Due to the large size of the simulations (>500 GB), we could not deposit them in the Data Repository Site at the University of Minnesota (DRUM). We are actively working with the personnel at DRUM to upload all the trajectories in an alternate site. However, these data will be available to the public immediately upon request.

      The authors state that ES1 and ES2 states are novel and not observed in previous crystal structures. The authors should quantify this through comparisons with PKA inactive states and with other AGC kinases.

      We apologize for the confusion. We now clarify that ES1 is a well-known inactivation pathway. As suggested by this reviewer, we now report a few examples of active and inactive conformations of PKA-C and other kinases (see new Figure 3 – figure supplement 2). Briefly, ES1 corresponds to the typical αC-out conformation found for PKA-C bound to inhibitors or in the R194A mutant. A similar conformation is present for Src, Abl, and CDK2. The αC-out conformation features a disrupted β3K-αCE salt bridge, which is key for active kinases. In contrast, the GS-ES2 transition is not present in the inactive conformations deposited in the PDB.

      Based on the results, can the authors speculate on the impact of oncogenic mutations in the alphaCbeta4 loop mutations in PKA?

      We now include additional comments and another citation that further supports our findings. In short, the activation of a kinase is generated by mutation insertions that stabilize the αC-β4 loop as pointed out by Kannan and Zhang (see references 28, 30, and 68). In contrast, mutations that destabilize this allosteric site (e.g., F100A) are inactivating, disrupting the structural couplings of the two lobes (our work).

      Reviewer #3:

      The manuscript is somewhat difficult to read even for kinase experts, and even harder for the layman. The difficulty partially arises from mixing technical description of the simulations with structural interpretation of the results, which is more intuitive, and partially arises from the assumption that readers are familiar with kinase architecture and its key elements (the aC helix, the APE motif, etc).

      We revised the text and modified Fig. 1 in the main text to make the paper more accessible to the general audience.

      The authors haven't done a good job describing the ES2 state intuitively. From my examination of the figures, it appears that in the ES2 state, the kinase domain is more elongated and the N and the C lobes are relatively less engaged than in the ground state. This may or may not be exactly, but a more intuitive description of the ES2 state is needed.

      As suggested by this reviewer, we include a better description of the ES2 state of the kinase and the structural details of the inactivation pathway. Also, we checked the radius of gyration of the two lobes for GS and ES2. ES2 is slightly more elongated, with an Rg of 20.3 ± 0.1 Å compared to 20.0 ± 0.2 Å for the GS state. This marginal difference is consistent with our characterization of the local packing around the αC-β4 loop, in which the lack of stable interactions with αE and the C-tail in the ES2 state makes the overall structure less compact.
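
      For reference, the radius of gyration mentioned above can be computed as in the following sketch. It assumes a mass-weighted definition and atom coordinates in Å; whether the reported values were mass-weighted is not stated in the response, and the function name and toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration, in the same unit as the coordinates."""
    total_mass = sum(masses)
    # Centre of mass, computed one axis at a time.
    com = tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    )
    # Mass-weighted mean squared distance from the centre of mass.
    weighted_sq = sum(
        m * sum((c[axis] - com[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy system: two atoms of equal mass placed 2 Å apart.
rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [12.0, 12.0])
```

      Averaging this quantity over the frames assigned to each state yields a mean and a spread that can be compared between GS and ES2.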

      The authors need to introduce and give a brief description of technical terms such as CV (collective variable), PC (principal component) etc.

      We now define both collective variables and principal components and include those definitions in the Methods section. Briefly, to characterize the complex conformational transitions of PKA-C, we utilize collective variables (Figure 2 – figure supplement 1). We chose these variables based on structural motifs described in the literature to define local and global structural transitions (Camilloni, C., Vendruscolo, M., Biochemistry, 2015, 54, 7470; Kukic, P. et al., Structure, 2015, 23, 745). On the other hand, we utilized principal component analysis to compare the conformational changes of the kinase in the same two-dimensional space, revealing the two lowest frequencies that define the global motions of the enzyme (Figures 7C, D, and E).

      The following paper should be discussed as it discussed similar ATP/substrate binding of Src kinase based on an extensive network that largely overlaps with the discussed PKA network. Foda, et al. "A dynamically coupled allosteric network underlies binding cooperativity in Src kinase." Nature communications 6.1 (2015): 5939.

      We apologize for missing this citation. Indeed, it makes our finding more general as allosteric cooperativity is key in other kinases such as Src and ERK2. We included this in the Discussion section.

      The CHESCA analysis appears to be an add-on that doesn't add much value. It is difficult to direct. I'd suggest considering removing it to the SI.

      We understand this concern. We rewrote part of the paper to link the NMR analysis of the correlated chemical shifts described by the CHESCA matrices to the MD calculations.

    1. Author Response

      Reviewer #1 (Public Review):

      Theoretical principles of viscous fluid mechanics are used here to assess likely mechanisms of transport in the ER. A set of candidate mechanisms is evaluated, making good use of imaging to represent ER network geometries. Evidence is provided that the contraction of peripheral sheets provides a much more credible mechanism than the contraction of individual tubules, junctions, or perinuclear sheets.

      The work has been conducted carefully and comprehensively, making good use of underlying physical principles. There is a good discussion of the role of slip; sensible approximations (low volume fraction, small particle size, slender geometries, pragmatic treatment of boundary conditions) allow tractable and transparent calculations; clear physical arguments provide useful bounds; stochastic and deterministic features of the problem are well integrated.

      We thank the reviewer for their positive assessment of our work.

      There are just a couple of areas where more discussion might be warranted, in my view.

      (1) The energetic cost of tubule contraction is estimated, but I did not see an equivalent estimate for the contraction of peripheral sheets. It might be helpful to estimate the energetic cost of viscous dissipation in generated flows at higher frequencies.

      This is a good point. We will also include an energetic cost estimate for the contractions of peripheral sheets in the revised manuscript.

      The mechanism of peripheral sheet contraction is unclear: do ATP-driven mechanisms somehow interact with thermal fluctuations of membranes?

      The new energetic estimates in the revision might help constrain possible hypotheses for the mechanism(s) driving peripheral sheet contraction, and suggest if a dedicated ATP-driven mechanism is required.

      (2) Mutations are mentioned in the abstract but not (as far as I could see) later in the manuscript. It would be helpful if any consequences for pathologies could be developed in the text.

      We are grateful for this suggestion. The need to rationalise pathology associated with the subtle effects of mutations in ER morphogens is indeed pointed out as one factor motivating the study of the interplay between ER structure and performance. In the revised manuscript, we plan to include a brief discussion potentially linking the malfunction of ER morphogens to luminal transport, integrating additional recently published data.

      Reviewer #2 (Public Review):

      Summary:

      This study explores theoretically the consequences of structural fluctuations of the endoplasmic reticulum (ER) morphology called contractions on molecular transport. Most of the manuscript consists of the construction of an interesting theoretical flow field (physical model) under various hypothetical assumptions. The computational modeling is followed by some simulations.

      Strengths:

      The authors are focusing their attention on testing the hypothesis that a local flow in the tubule could be driven by tubular pinching. We recall that trafficking in the ER is considered to be mostly driven by diffusion at least at a spatial scale that is large enough to account for averaging of any random flow occurring from multiple directions [note that this is not the case for plants].

      We thank the reviewer. We have indeed explored here the possibilities of active transport, focusing especially on transport over the length scale of single tubules, as a result of structural fluctuations, and found tubular pinching to be ineffective compared to, e.g., fluctuations of peripheral sheets. In the revised version we plan to add text mentioning what is known about the ER in plants.

      Weaknesses:

      The manuscript extensively details the construction of the theoretical model, occupying a significant portion of the manuscript. While this section contains interesting computations, its relevance and utility could be better emphasized, perhaps warranting a reorganization of the manuscript to foreground this critical aspect.

      Overall, the manuscript appears highly technical with limited conclusive insights, particularly lacking predictions confirmed by experimental validation. There is an absence of substantial conclusions regarding molecular trafficking within the ER.

      We sought to balance the theoretical/computational details of our model with the biophysical conclusions drawn from its predictions. Given the model's complexity and novelty, it was essential to elucidate the theoretical underpinnings comprehensively, in order to allow others to implement it in the future with additional, or different, parameters. To maintain clarity and focus in the main text, we have judiciously relegated extensive technical details to the methods section or supplementary materials, and divided the text into stand-alone section headings allowing the reader to skip through to conclusions.

      The primary focus of our manuscript is to introduce and explore, via our theoretical model, the interplay between ER structure dynamics and molecular transport. Our approach, while in silico, generates concrete predictions about the physical processes underpinning luminal motion within the ER. For instance, our findings challenge the previously postulated role of small tubular contractions in driving luminal flow, instead highlighting the potential significance of local flat ER areas—empirically documented entities—for facilitating such motion.

      Furthermore, by deducing what type of transport may or may not occur within the range of possible ER structural fluctuations, our model offers detailed predictions designed to bridge the gap between theoretical insight and experimental verification. These predictions detail the spatial and temporal parameters essential for effective transport, delineating plausible values for these parameters. We hope that the model’s predictions will invite experimentalists to devise innovative methodologies to test them. We plan to introduce text edits to the revised version to clarify these.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations

      Recommendation #1: Address potential confounds in the experimental design:

      (1a) Confounding factors between baseline to early learning. While the visual display of the curved line remains constant, there are at least three changes between these two phases: 1) the presence of reward feedback (the focus of the paper); 2) a perturbation introduced to draw a hidden, mirror-symmetric curved line; 3) instructions provided to use reward feedback to trace the line on the screen (intentionally deceitful). As such, it remains unclear which of these factors are driving the changes in both behavior and BOLD signals between the two phases. The absence of a veridical feedback phase in which participants received reward feedback associated with the shown trajectory seems like a major limitation.

      (1b) Confounding Factors Between Early and Late Learning. While the authors have focused on interpreting changes from early to late due to the explore-exploit trade-off, there are three additional factors possibly at play: 1) increasing fatigue, 2) withdrawal of attention, specifically related to individuals who have either successfully learned the perturbation within the first few trials or those who have simply given up, or 3) increasing awareness of the perturbation (not clear if subjective reports about perturbation awareness were measured.). I understand that fMRI research is resource-intensive; however, it is not clear how to rule out these alternatives with their existing data without additional control groups. [Another reviewer added the following: Why did the authors not acquire data during a control condition? How can we be confident that the neural dynamics observed are not due to the simple passage of time? Or if these effects are due to the task, what drives them? The reward component, the movement execution, increased automaticity?]

      We have opted to address both of these points above within a single reply, as together they suggest potential confounding factors across the three phases of the task. We would agree that, if the results of our pairwise comparisons (e.g., Early > Baseline or Late > Early) were considered in isolation from one another, then these critiques of the study would be problematic. However, when considering the pattern of effects across the three task phases, we believe most of these critiques can be dismissed. Below, we first describe our results in this context, and then discuss how they address the reviewers’ various critiques.

      Recall that from Baseline to Early learning, we observe an expansion of several cortical areas (e.g., core regions in the DMN) along the manifold (red areas in Fig. 4A, see manifold shifts in Fig. 4C) that subsequently exhibit contraction during Early to Late learning (blue areas in Fig. 4B, see manifold shifts in Fig. 4D). We show this overlap in brain areas in Author response image 1 below, panel A. Notably, several of these brain areas appear to contract back to their original, Baseline locations along the manifold during Late learning (compare Fig. 4C and D). This is evidenced by the fact that many of these same regions (e.g., DMN regions, in Author response image 1 panel A below) fail to show a significant difference between the Baseline and Late learning epochs (see Author response image 1 panel B below, which is taken from supplementary Fig 6). That is, the regions that show significant expansion and subsequent contraction (in Author response image 1 panel A below) tend not to overlap with the regions that significantly changed over the time course of the task (in Author response image 1 panel B below).

      Author response image 1.

      Note that this basic observation above is not only true of our regional manifold eccentricity data, but also in the underlying functional connectivity data associated with individual brain regions. To make this second point clearer, we have modified and annotated our Fig. 5 and included it below. Note the reversal in seed-based functional connectivity from Baseline to Early learning (leftmost brain plots) compared to Early to Late learning (rightmost brain plots). That is, it is generally the case that for each seed-region (A-C) the areas that increase in seed-connectivity with the seed region (in red; leftmost plot) are also the areas that decrease in seed-connectivity with the seed region (in blue; rightmost plot), and vice versa. [Also note that these connectivity reversals are conveyed through the eccentricity data: the horizontal red line in the rightmost plots denotes the mean eccentricity of these brain regions during the Baseline phase, helping to highlight the fact that the eccentricity of the Late learning phase reverses back towards this Baseline level].

      Author response image 2.

      Critically, these reversals in brain connectivity noted above directly counter several of the critiques noted by the reviewers. For instance, this reversal pattern of effects argues against the idea that our results during Early Learning can be simply explained due to the (i) presence of reward feedback, (ii) presence of the perturbation or (iii) instructions to use reward feedback to trace the path on the screen. Indeed, all of these factors are also present during Late learning, and yet many of the patterns of brain activity during this time period revert back to the Baseline patterns of connectivity, where these factors are absent. Similarly, this reversal pattern strongly refutes the idea that the effects are simply due to the passage of time, increasing fatigue, or general awareness of the perturbation. Indeed, if any of these factors alone could explain the data, then we would have expected a gradual increase (or decrease) in eccentricity and connectivity from Baseline to Early to Late learning, which we do not observe. We believe these are all important points when interpreting the data, but which we failed to mention in our original manuscript when discussing our findings.

      We have now rectified this in the revised paper, where we now write in our Discussion:

      “Finally, it is important to note that the reversal pattern of effects noted above suggests that our findings during learning cannot be simply attributed to the introduction of reward feedback and/or the perturbation during Early learning, as both of these task-related features are also present during Late learning. In addition, these results cannot be simply explained due to the passage of time or increasing subject fatigue, as this would predict a consistent directional change in eccentricity across the Baseline, Early and Late learning epochs.”
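
      The logic of this argument can be illustrated with a minimal sketch. It assumes one summary value per region and epoch (for example, eccentricity), and it labels a region as showing a reversal when the Baseline-to-Early and Early-to-Late changes have opposite signs, and as monotonic when they share a sign. The function name, the tolerance parameter and the example values are hypothetical and are not taken from the study.

```python
import numpy as np

def classify_epoch_pattern(baseline, early, late, tol=0.0):
    """Label each region's change across epochs as 'reversal', 'monotonic' or 'flat'.

    Inputs are arrays with one value per region (hypothetical summary values,
    e.g. eccentricity). `tol` is an assumed threshold on the product of the
    two changes below which a region is treated as unchanged.
    """
    baseline, early, late = (np.asarray(x, dtype=float) for x in (baseline, early, late))
    d1 = early - baseline  # Baseline -> Early change
    d2 = late - early      # Early -> Late change
    product = d1 * d2
    labels = np.full(d1.shape, "flat", dtype=object)
    labels[product < -tol] = "reversal"   # changes have opposite signs
    labels[product > tol] = "monotonic"   # changes have the same sign
    return labels

# Made-up values for three regions, for illustration only.
baseline = [1.0, 1.0, 1.0]
early = [1.5, 1.2, 1.0]
late = [1.1, 1.6, 1.0]
print(classify_epoch_pattern(baseline, early, late))
```

      A drift driven purely by time or fatigue would fall into the monotonic category, whereas the expansion followed by contraction described above would fall into the reversal category.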

      However, having said the above, we acknowledge that one potential factor that our findings cannot exclude is that they are (at least partially) attributable to changes in subjects’ state of attention throughout the task. Indeed, one can certainly argue that Baseline trials in our study don’t require a great deal of attention (after all, subjects are simply tracing a curved path presented on the screen). Likewise, for subjects that have learned the hidden shape, the Late learning trials are also likely to require limited attentional resources (indeed, many subjects at this point are simply producing the same shape trial after trial). Consequently, the large shift in brain connectivity that we observe from Baseline to Early Learning, and the subsequent reversion back to Baseline-levels of connectivity during Late learning, could actually reflect a heightened allocation of attention as subjects are attempting to learn the (hidden) rewarded shape. However, we do not believe that this would reflect a ‘confound’ of our study per se — indeed, any subject who has participated in a motor learning study would agree that the early learning phase of a task is far more cognitively demanding than Baseline trials and Late learning trials. As such, it is difficult to disentangle this ‘attention’ factor from the learning process itself (and in fact, it is likely central to it).

      Of course, one could have designed a ‘control’ task in which subjects must direct their attention to something other than the learning task itself (e.g., in a divided attention paradigm; Taylor & Thoroughman, 2007, 2008) and/or perform a secondary task concurrently (Codol et al., 2018; Holland et al., 2018), but we know that this type of manipulation impairs the learning process itself. Thus, in such a case, it wouldn’t be obvious to the experimenter what they are actually measuring in brain activity during such a task. And, to extend this argument even further, it is true that any sort of brain-based modulation can be argued to reflect some ‘attentional’ process, rather than modulations related to the specific task-based process under consideration (in our case, motor learning). In this regard, we are sympathetic to the views of Richard Andersen and colleagues who have eloquently stated that “The study of how attention interacts with other neural processing systems is a most important endeavor. However, we think that over-generalizing attention to encompass a large variety of different neural processes weakens the concept and undercuts the ability to develop a robust understanding of other cognitive functions.” (Andersen & Cui, 2007, Neuron). In short, it appears that different fields/researchers have differing views on the usefulness of attention as an explanatory construct (see also articles from Hommel et al., 2019, “No one knows what attention is”, and Wu, 2023, “We know what attention is!”), and we personally don’t have a dog in this fight. We only highlight these issues to draw attention (no pun intended) to the fact that it is not trivial to separate these different neural processes during a motor learning study.

      Nevertheless, we do believe these are important points worth flagging for the reader in our paper, as they might have similar questions. To this end, we have now included in our Discussion section the following text:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      Finally, we should note that, at the end of testing, we did not assess participants' awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path). In hindsight, this would have been a good idea and provided some value to the current project. Nevertheless, based on several of the learning profiles observed (e.g., subjects who exhibited very rapid learning during the Early Learning phase, more on this below), it seems clear that many individuals became aware of a shape approximating the rewarded path. Note that we have included new figures (see our responses below) that give a better example of what fast versus slower learning looks like. In addition, we now note in our Methods that we did not probe participants about their subjective awareness re: the perturbation:

      “Note that, at the end of testing, we did not assess participants’ awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path of the visible path).”

      Recommendation #2: Provide more behavioral quantification.

      (2a) The authors chose to only plot the average learning score in Figure 1D, without an indication of movement variability. I think this is quite important, to give the reader an impression of how variable the movements were at baseline, during early learning, and over the course of learning. There is evidence that baseline variability influences the 'detectability' of imposed rotations (in the case of adaptation learning), which could be relevant here. Shading the plots by movement variability would also be important to see if there was some refinement of the movement after participants performed at the ceiling (which seems to be the case ~ after trial 150). This is especially worrying given that in Fig 6A there is a clear indication that there is a large difference between subjects' solutions on the task. One subject exhibits almost a one-shot learning curve (reaching a score of 75 after one or two trials), whereas others don't seem to really learn until the near end. What does this between-subject variability mean for the authors' hypothesized neural processes?

      In line with these recommendations, we have now provided much better behavioral quantification of subject-level performance in both the main manuscript and supplementary material. For instance, in a new supplemental Figure 1 (shown below), we now include mean subject (+/- SE) reaction times (RTs), movement times (MTs) and movement path variability (how we computed these measures is now defined in our Methods section).

      As can be seen in the figure, all three of these variables tended to decrease over the course of the study, though we note there was a noticeable uptick in both RTs and MTs from the Baseline to Early learning phase, once subjects started receiving trial-by-trial reward feedback based on their movements. With respect to path variability, it is not obvious that there was a significant refinement of the paths created during late learning (panel D below), though there was certainly a general trend for path variability to decrease over learning.

      Author response image 3.

      Behavioral measures of learning across the task. (A-D) shows average participant reward scores (A), reaction times (B), movement times (C) and path variability (D) over the course of the task. In each plot, the black line denotes the mean across participants and the gray banding denotes +/- 1 SEM. The three equal-length task epochs for subsequent neural analyses are indicated by the gray shaded boxes.
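
      As a minimal sketch of how the mean line and the +/- 1 SEM band in such plots could be obtained, assume each measure is stored as a participants-by-trials array. The function name and the array layout are assumptions for illustration; the authors' exact computation is described in their Methods and is not reproduced here.

```python
import numpy as np

def mean_and_sem(values):
    """Return the per-trial mean and standard error of the mean (SEM).

    `values` is assumed to have shape (n_participants, n_trials), e.g. one
    reaction time per participant per trial.
    """
    values = np.asarray(values, dtype=float)
    n_participants = values.shape[0]
    mean = values.mean(axis=0)
    # Sample standard deviation (ddof=1) divided by sqrt(n) gives the SEM.
    sem = values.std(axis=0, ddof=1) / np.sqrt(n_participants)
    return mean, sem

# Made-up reaction times (seconds) for two participants and two trials.
mean, sem = mean_and_sem([[1.0, 2.0], [3.0, 4.0]])
print(mean, sem)
```

      The band drawn around the mean line would then span `mean - sem` to `mean + sem` at each trial.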

      In addition to these above results, we have also created a new Figure 6 in the main manuscript, which now solely focuses on individual differences in subject learning (see below). Hopefully, this figure clarifies key features of the task and its reward structure, and also depicts (in movement trajectory space) what fast versus slow learning looks like in the task. Specifically, we believe that this figure now clearly delineates for the reader the mapping between movement trajectory and the reward score feedback presented to participants, which appeared to be a source of confusion based on the reviewers’ comments below. As can be clearly observed in this figure, trajectories that approximated the ‘visible path’ (black line) resulted in fairly mediocre scores (see score color legend at right), whereas trajectories that approximated the ‘reward path’ (dashed black line, see trials 191-200 of the fast learner) resulted in fairly high scores. This figure also more clearly delineates how fPCA loadings derived from our functional data analysis were used to derive subject-level learning scores (panel C).

      Author response image 4.

      Individual differences in subject learning performance. (A) Examples of a good learner (bordered in green) and poor learner (bordered in red). (B) Individual subject learning curves for the task. Solid black line denotes the mean across all subjects whereas light gray lines denote individual participants. The green and red traces denote the learning curves for the example good and poor learners denoted in A. (C) Derivation of subject learning scores. We performed functional principal component analysis (fPCA) on subjects’ learning curves in order to identify the dominant patterns of variability during learning. The top component, which encodes overall learning, explained the majority of the observed variance (~75%). The green and red bands denote the effect of positive and negative component scores, respectively, relative to mean performance. Thus, subjects who learned more quickly than average have a higher loading (in green) on this ‘Learning score’ component than subjects who learned more slowly (in red) than average. The plot at right denotes the loading for each participant (open circles) onto this Learning score component.
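
      The derivation of a single learning score per subject can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: it applies ordinary PCA (via singular value decomposition) to discretized learning curves, whereas functional PCA typically first represents each curve with smooth basis functions. The function name and the sign convention (positive scores for above-average curves) are assumptions.

```python
import numpy as np

def learning_scores(curves):
    """Project each subject's learning curve onto the top principal component.

    `curves` is assumed to have shape (n_subjects, n_trials). Returns the
    per-subject scores, the component itself, and the fraction of variance
    the component explains.
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)  # deviation from the mean curve
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    component = vt[0]
    # Assumed convention: orient the component so that higher scores
    # correspond to curves lying above the mean curve.
    if component.sum() < 0:
        component = -component
    scores = centered @ component
    explained = singular_values[0] ** 2 / np.sum(singular_values ** 2)
    return scores, component, explained

# Made-up curves for three subjects: slow, average and fast learners.
scores, component, explained = learning_scores(
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
)
print(scores, explained)
```

      Under this convention, a subject whose curve lies above the mean curve receives a positive score, matching the description of faster-than-average learners loading more highly on the component.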

      The reviewers note that there are large individual differences in learning performance across the task. This was clearly our hope when designing the reward structure of this task, as it would allow us to further investigate the neural correlates of these individual differences (indeed, during pilot testing, we sought out a reward structure to the task that would allow for these intersubject differences). The subjects who learn early during the task end up having higher fPCA scores than the subjects who learn more gradually (or learn the task late). From our perspective, these differences are a feature, and not a bug, and they do not negate any of our original interpretations. That is, subjects who learn earlier on average tend to contract their DAN-A network during the early learning phase whereas subjects who learn more slowly on average (or learn late) instead tend to contract their DAN-A network during late learning (Fig. 7).

      (2b) In the methods, the authors stated that they scaled the score such that even a perfectly traced visible path would always result in an imperfect score of 40 points. What happens if a subject scores perfectly on the first try (which seemed to have happened for the green highlighted subject in Fig 6A), but is then permanently confronted with a score of 40 or below? Wouldn't this result in an error-clamp-like (error-based motor adaptation) design for this subject and all other high performers, which would vastly differ from the task demands for the other subjects? How did the authors factor in the wide between-subject variability?

      We think the reviewers may have misinterpreted the reward structure of the task, and we apologize for not being clearer in our descriptions. The reward score that subjects received after each trial was based on how well they traced the mirror-image of the visible path. However, all the participant can see on the screen is the visible path. We hope that our inclusion of the new Figure 6 (shown above) makes the reward structure of the task, and its relationship to movement trajectories, much clearer. We should also note that, even for the highest performing subject (denoted in Fig. 6), it still required approximately 20 trials for them to reach asymptotic performance.

      (2c) The study would benefit from a more detailed description of participants' behavioral performance during the task. Specifically, it is crucial to understand how participants' motor skills evolve over time. Information on changes in movement speed, accuracy, and other relevant behavioral metrics would enhance the understanding of the relationship between behavior and brain activity during the learning process. Additionally, please clarify whether the display on the screen was presented continuously throughout the entire trial or only during active movement periods. Differences in display duration could potentially impact the observed differences in brain activity during learning.

      We hope that our inclusion of the new Supplementary Figure 1 (shown above) addresses the reviewers’ recommendation. Generally, we find that RTs, MTs and path variability all decrease over the course of the task. We think this relates to the early learning phase being more attentionally demanding, and requiring more conscious effort, than the later learning phases.

      Also, yes, the visible path was displayed on the screen continuously throughout the trial, and only disappeared at the 4.5 second mark of each trial (when the screen was blanked and the data was saved off for 1.5 seconds prior to commencement of the next trial; 6 seconds total per trial). Thus, there were no differences in display duration across trials and phases of the task. We have now clarified this in the Methods section, where we now write the following:

      “When the cursor reached the target distance, the target changed color from red to green to indicate that the trial was completed. Importantly, other than this color change in the distance marker, the visible curved path remained constant and participants never received any feedback about the position of their cursor.”

      (2d) It is unclear from plots 6A, 6B, and 1D how the scale of the behavioral data matches with the scaling of the scores. Are these the 'real' scores, meaning 100 on the y-axis would be equivalent to 40 in the task? Why then do all subjects reach an asymptote at 75? Or is 75 equivalent to 40 and the axis labels are wrong?

      As indicated above, we clearly did a poor job of describing the reward structure of our task in our original paper, and we now hope that our inclusion of Figure 6 makes things clear. A ‘40’ score on the y-axis would indicate that a subject has perfectly traced the visible path whereas a perfect ‘100’ score would indicate that a subject has perfectly traced the (hidden) mirror image path.

      The fact that several of the subjects reach asymptote around 75 is likely a byproduct of two factors. Firstly, the subjects performed their movements in the absence of any visual error feedback (they could not see the position of a cursor that represented their hand position), which had the effect of increasing motor variability in their actions from trial to trial. Secondly, there appears to be an underestimation among subjects regarding the curvature of the concealed, mirror-image path (i.e., that the rewarded path actually had an equal but opposite curvature to that of the visible path). This is particularly evident in the case of the top-performing subject (illustrated in Figure 6A) who, even during late learning, failed to produce a completely arched movement.
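
      To make the two anchor points of the score scale concrete, here is a purely hypothetical scoring rule. The actual scoring function is not given in this response; the sketch simply assumes a linear mapping from trajectory error to score, chosen so that tracing the hidden mirror-image path yields 100 and tracing the visible path yields 40. It also assumes that paths are sampled as lateral (x) positions at matched forward positions and that the mirror image is a reflection about x = 0. The function name is illustrative only.

```python
import numpy as np

def hypothetical_score(traced_x, visible_x):
    """Map a traced path to a 0-100 score under an ASSUMED linear rule.

    `traced_x` and `visible_x` hold lateral positions sampled at the same
    forward positions. This is not the study's scoring function; it only
    reproduces the two anchor points stated in the text (100 for the reward
    path, 40 for the visible path).
    """
    traced_x = np.asarray(traced_x, dtype=float)
    visible_x = np.asarray(visible_x, dtype=float)
    reward_x = -visible_x  # assumed mirror image about the midline x = 0
    error = np.mean(np.abs(traced_x - reward_x))
    # Error incurred by tracing the visible path exactly; requires a curved
    # (non-straight) visible path so that this is non-zero.
    error_visible = np.mean(np.abs(visible_x - reward_x))
    score = 100.0 - 60.0 * error / error_visible
    return float(np.clip(score, 0.0, 100.0))

visible = np.array([0.0, 1.0, 2.0, 1.0, 0.0])  # made-up curved path
print(hypothetical_score(visible, visible))    # tracing the visible path
print(hypothetical_score(-visible, visible))   # tracing the mirror image
```

      Under such a rule, trial-to-trial motor variability and an underestimate of the required curvature would both keep the error above zero, which is consistent with the explanation above for asymptotes below 100.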

      (2e) Labeling of Contrasts: There is a consistent issue with the labeling of contrasts in the presented figures, causing confusion. While the text refers to the difference as "baseline to early learning," the label used in figures, such as Figure 4, reads "baseline > early." It is essential to clarify whether the presented contrast is indeed "baseline > early" or "early > baseline" to avoid any misinterpretation.

      We thank the reviewers for catching this error. Indeed, the intended label was Early > Baseline, and this has now been corrected throughout.

      Recommendation #3. Clarify which motor learning mechanism(s) are at play.

      (3a) Participants were performing at a relatively low level, achieving around 50-60 points by the end of learning. This outcome may not be that surprising, given that reward-based learning might have a substantial explicit component and may also heavily depend on reasoning processes, beyond reinforcement learning or contextual recall (Holland et al., 2018; Tsay et al., 2023). Even within our own data, where explicit processes are isolated, average performance is low and many individuals fail to learn (Brudner et al., 2016; Tsay et al., 2022). Given this, many participants in the current study may have simply given up. A potential indicator of giving up could be a subset of participants moving straight ahead in a rote manner (a heuristic to gain moderate points). Consequently, alterations in brain networks may not reflect exploration and exploitation strategies but instead indicate levels of engagement and disengagement. Could the authors plot the average trajectory and the average curvature changes throughout learning? Are individuals indeed defaulting to moving straight ahead in learning, corresponding to an average of 50-60 points? If so, the interpretation of brain activity may need to be tempered.

      We can do one better, and actually give you a sense of the learning trajectories for every subject over time. In the figure below, which we now include as Supplementary Figure 2 in our revision, we have plotted, for each subject, a subset of their movement trajectories across learning trials (every 10 trials). As can be seen in the diversity of these trajectories, the average trajectory and average curvature would do a fairly poor job of describing the pattern of learning-related changes across subjects. Moreover, it is not obvious from looking at these plots the extent to which poor learning subjects (i.e., subjects who never converge on the reward path) actually ‘give up’ in the task — rather, many of these subjects still show some modulation (albeit minor) of their movement trajectories in the later trials (see the purple and pink traces). As an aside, we are also not entirely convinced that straight ahead movements, which we don’t find many of in our dataset, can be taken as direct evidence that the subject has given up.

      Author response image 5

      Variability in learning across subjects. Plots show representative trajectory data from each subject (n=36) over the course of the 200 learning trials. Coloured traces show individual trials over time (each trace is separated by ten trials, e.g., trial 1, 10, 20, 30, etc.) to give a sense of the trajectory changes throughout the task (20 trials in total are shown for each subject).

      We should also note that we are not entirely opposed to the idea of describing aspects of our findings in terms of subject engagement versus disengagement over time, as such processes are related at some level to exploration (i.e., cognitive engagement in finding the best solution) and exploitation (i.e., cognitively disengaging and automating one’s behavior). As noted in our reply to Recommendation #1 above, we now give some consideration of these explanations in our Discussion section, where we now write:

      “It is also possible that these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      (3b) The authors are mixing two commonly used paradigms, reward-based learning, and motor adaptation, but provide no discussion of the different learning processes at play here. Which processes were they attempting to probe? Making this explicit would help the reader understand which brain regions should be implicated based on previous literature. As it stands, the task is hard to interpret. Relatedly, there is a wealth of literature on explicit vs implicit learning mechanisms in adaptation tasks now. Given that the authors are specifically looking at brain structures in the cerebral cortex that are commonly associated with explicit and strategic learning rather than implicit adaptation, how do the authors relate their findings to this literature? Are the learning processes probed in the task more explicit, more implicit, or is there a change in strategy usage over time? Did the authors acquire data on strategies used by the participants to solve the task? How does the baseline variability come into play here?

      As noted in our paper, our task was directly inspired by the reward-based motor learning tasks developed by Dam et al., 2013 (Plos One) and Wu et al., 2014 (Nature Neuroscience). What drew us to these tasks is that they allowed us to study the neural bases of reward-based learning mechanisms in the absence of subjects also being able to exploit error-based mechanisms to achieve learning. Indeed, when first describing the task in the Results section of our paper we wrote the following:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014).”

      If the reviewers are referring to ‘motor adaptation’ in the context in which that terminology is commonly used — i.e., the use of sensory prediction errors to support error-based learning — then we would argue that motor adaptation is not a feature of the current study. It is true that in our study subjects learn to ‘adapt’ their movements across trials, but this shaping of the movement trajectories must be supported through reinforcement learning mechanisms (and, of course, supplemented by the use of cognitive strategies as discussed in the nice review by Tsay et al., 2023). We apologize for not being clearer in our paper about this key distinction and we have now included new text in the introduction to our Results to directly address this:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014). That is, subjects could not use error-based learning mechanisms to achieve learning in our study, as this form of learning requires sensory errors that convey both the change in direction and magnitude needed to correct the movement.”

      With this issue aside, we are well aware of the established framework for thinking about sensorimotor adaptation as being composed of a combination of explicit and implicit components (indeed, this has been a central feature of several of our other recent neuroimaging studies that have explored visuomotor rotation learning, e.g., Gale et al., 2022 PNAS, Areshenkoff et al., 2022 eLife, Standage et al., 2023 Cerebral Cortex). However, there has been comparatively little work done on these parallel components within the domain of reinforcement learning tasks (though see Codol et al., 2018; Holland et al., 2018; van Mastrigt et al., 2023; see also the Tsay et al., 2023 review), and as far as we can tell, nothing has been done to date in the reward-based motor learning area using fMRI. By design, we avoided using descriptors of ‘explicit’ or ‘implicit’ in our study because our experimental paradigm did not allow a separate measurement of those two components of learning during the task. Nevertheless, it seems clear to us from examining the subjects’ learning curves (see supplementary figure 2 above) that individuals who learn very quickly are using strategic processes (such as action exploration to identify the best path) to enhance their learning. As we noted in an above response, we did not query subjects after the fact about their strategy use, which admittedly was a missed opportunity on our part.

      Author response image 6.

      With respect to the comment on baseline variability and its relationship to performance, this is an interesting idea and one that was explored in the Wu et al., 2014 Nature Neuroscience paper. Prompted by the reviewers, we have now explored this idea in the current data set by testing for a relationship between movement path variability during baseline trials (all 70 baseline trials, see Supplementary Figure 1D above for reference) and subjects’ fPCA score on our learning task. However, when we performed this analysis, we did not observe a significant positive relationship between baseline variability and subject performance. Rather, we actually found a trend towards a negative relationship (though this was non-significant; r=-0.2916, p=0.0844). Admittedly, we are not sure what conclusions can be drawn from this analysis, and in any case, we believe it to be tangential to our main results. We provide the results (at right) for the reviewers if they are interested. This may be an interesting avenue for exploration in future work.
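
      The correlation test described above can be sketched as follows. This is a minimal illustration, assuming a plain Pearson correlation between one baseline-variability value and one fPCA score per subject; the function name and the numbers are hypothetical and are not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-subject values, for illustration only.
baseline_variability = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4]
learning_score = [0.3, 0.9, -0.4, 0.6, 0.1, -0.2]
r = pearson_r(baseline_variability, learning_score)
```

      A negative r in such an analysis would correspond to the direction of the non-significant trend reported above; a p-value would additionally require a t-distribution or permutation test, which is omitted here.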

      Recommendation #4: Provide stronger justification for brain imaging methods.

      (4a) Observing how brain activity varies across these different networks is remarkable, especially how sensorimotor regions separate and then contract with other, more cognitive areas. However, does the signal-to-noise ratio in each area/network influence manifold eccentricity and limit the possible changes in eccentricity during learning? Specifically, if a region has a low signal-to-noise ratio, it might exhibit minimal changes during learning (a phenomenon perhaps relevant to null manifold changes in the striatum due to low signal-to-noise); conversely, regions with higher signal-to-noise (e.g., motor cortex in this sensorimotor task) might exhibit changes more easily detected. As such, it is unclear how to interpret manifold changes without considering an area/network's signal-to-noise ratio.

      We appreciate where these concerns are coming from. First, we should note that the timeseries data used in our analysis were z-transformed (mean zero, 1 std) to allow normalization of the signal both over time and across regions (and thus mitigate the possibility that the changes observed could simply reflect mean overall signal changes across different regions). Nevertheless, differences in signal intensity across brain regions — particularly between cortex and striatum — are well-known, though it is not obvious how these differences may manifest in terms of a task-based modulation of MR signals.

      To examine this issue in the current data set, we extracted, for each subject and time epoch (Baseline, Early and Late learning), the raw scanner data (in MR arbitrary units, a.u.) for the cortical and striatal regions and computed the (1) mean signal intensity, (2) standard deviation of the signal (Std) and (3) temporal signal-to-noise ratio (tSNR; calculated as mean/Std). Note that in the fMRI connectivity literature tSNR is often the preferred SNR measure as it normalizes the mean signal based on the signal’s variability over time, thus providing a general measure of overall ‘signal quality’. The results of this analysis, averaged across subjects and regions, are shown below.
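
      The tSNR measure defined above (mean/Std over time) can be sketched as follows. This is a minimal illustration; the use of the population standard deviation and the example time series are assumptions, not details taken from the analysis itself.

```python
import statistics

def tsnr(signal):
    """Temporal signal-to-noise ratio: mean over time divided by std over time."""
    sd = statistics.pstdev(signal)  # population std; an assumption for this sketch
    if sd == 0:
        raise ValueError("signal has zero variance; tSNR is undefined")
    return statistics.fmean(signal) / sd

# Hypothetical raw time series in arbitrary scanner units (a.u.).
cortex_roi = [620.0, 630.0, 625.0, 618.0, 632.0]
striatum_roi = [449.0, 452.0, 450.0, 447.0, 452.0]

cortex_tsnr = tsnr(cortex_roi)
striatum_tsnr = tsnr(striatum_roi)
```

      In this toy example the region with the lower mean signal still has the higher tSNR, because its variability over time is proportionally smaller.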

      Author response image 7.

      Note that, as expected, the overall signal intensity (left plot) of cortex is higher than in the striatum, reflecting the closer proximity of cortex to the receiver coils in the MR head coil. In fact, the signal intensity in cortex is approximately 38% higher than that in the striatum (i.e., (~625 - 450)/450). However, the signal variation in cortex is also greater than in the striatum (middle plot), but in this case approximately 100% greater (i.e., (~5 - 2.5)/2.5). The result of this is that the tSNR (mean/std) for our data set and the ROI parcellations we used is actually greater in the striatum than in cortex (right plot). Thus, all else being equal, there seems to have been sufficient tSNR in the striatum for us to have detected motor-learning related effects. As such, we suspect the null effects for the striatum in our study actually stem from two sources.
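
      As a worked check of this reasoning, the approximate values quoted above (mean of about 625 and Std of about 5 in cortex; mean of about 450 and Std of about 2.5 in striatum) can be substituted into tSNR = mean/Std. These are rounded readings used for illustration only, so the resulting numbers are rough:

```latex
\[
\mathrm{tSNR}_{\text{cortex}} \approx \frac{625}{5} = 125,
\qquad
\mathrm{tSNR}_{\text{striatum}} \approx \frac{450}{2.5} = 180 .
\]
```

      Under these rounded values the striatal tSNR exceeds the cortical tSNR, because the doubling of the standard deviation in cortex outweighs its higher mean signal.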

      The first likely source is the relatively lower number of striatal regions (12) as compared to cortical regions (998) used in our analysis, coupled with our use of PCA on these data (which, by design, identifies the largest sources of variation in connectivity). In future studies, this imbalance could be rectified by using finer parcellations of the striatum (even down to the voxel level) while keeping the same parcellation of cortex (i.e., equate the number of ‘regions’ in each of striatum and cortex). The second likely source is our use of a striatal atlas (the Harvard-Oxford atlas) that divides brain regions based on their neuroanatomy rather than their function. In future work, we plan on addressing this latter concern by using finer, more functionally relevant parcellations of striatum (such as in Tian et al., 2020, Nature Neuroscience). Note that we sought to capture these interrelated possible explanations in our Discussion section, where we wrote the following:

      “While we identified several changes in the cortical manifold that are associated with reward-based motor learning, it is noteworthy that we did not observe any significant changes in manifold eccentricity within the striatum. While clearly the evidence indicates that this region plays a key role in reward-guided behavior (Averbeck and O’Doherty, 2022; O’Doherty et al., 2017), there are several possible reasons why our manifold approach did not identify this collection of brain areas. First, the relatively small size of the striatum may mean that our analysis approach was too coarse to identify changes in the connectivity of this region. Though we used a 3T scanner and employed a widely-used parcellation scheme that divided the striatum into its constituent anatomical regions (e.g., hippocampus, caudate, etc.), both of these approaches may have obscured important differences in connectivity that exist within each of these regions. For example, areas such as the hippocampus and caudate are not homogeneous areas but themselves exhibit gradients of connectivity (e.g., head versus tail) that can only be revealed at the voxel level (Tian et al., 2020; Vos de Wael et al., 2021). Second, while our dimension reduction approach, by design, aims to identify gradients of functional connectivity that account for the largest amounts of variance, the limited number of striatal regions (as compared to cortex) necessitates that their contribution to the total whole-brain variance is relatively small. Consistent with this perspective, we found that the low-dimensional manifold architecture in cortex did not strongly depend on whether or not striatal regions were included in the analysis (see Supplementary Fig. 6). As such, selective changes in the patterns of functional connectivity at the level of the striatum may be obscured using our cortex x striatum dimension reduction approach.
      Future work can help address some of these limitations by using both finer parcellations of striatal cortex (perhaps even down to the voxel level) (Tian et al., 2020) and by focusing specifically on changes in the interactions between the striatum and cortex during learning. The latter can be accomplished by selectively performing dimension reduction on the slice of the functional connectivity matrix that corresponds to functional coupling between striatum and cortex.”

      (4b) Could the authors clarify how activity in the dorsal attention network (DAN) changes throughout learning, and how these changes also relate to individual differences in learning performance? Specifically, on average, the DAN seems to expand early and contract late, relative to the baseline. This is interpreted to signify that the DAN exhibits lesser connectivity followed by greater connectivity with other brain regions. However, in terms of how these changes relate to behavior, participants who go against the average trend (DAN exhibits more contraction early in learning, and expansion from early to late) seem to exhibit better learning performance. This finding is quite puzzling. Does this mean that the average trend of expansion and contraction is not facilitative, but rather detrimental, to learning? [Another reviewer added: The authors do not state any explicit hypotheses, but only establish that DMN coordinates activity among several regions. What predictions can we derive from this? What are the authors looking for in the data? The work seems more descriptive than hypothesis-driven. This is fine but should be clarified in the introduction.]

      These are good questions, and we are glad the reviewers appreciated the subtlety here. The reviewers are indeed correct that the relationship of the DAN-A network to behavioral performance appears to go against the grain of the group-level results that we found for the entire DAN network (which we note is composed of both the DAN-A and DAN-B networks). That is, subjects who exhibited greater contraction from Baseline to Early learning and likewise, greater expansion from Early to Late learning, tended to perform better in the task (according to our fPCA scores). However, on this point it is worth noting that it was mainly the DAN-B network which exhibited group-level expansion from Baseline to Early Learning whereas the DAN-A network exhibited negligible expansion. This can be seen in Author response image 8 below, which shows the pattern of expansion and contraction (as in Fig. 4), but instead broken down into the 17-network parcellation. The red asterisk denotes the expansion from Baseline to Early learning for the DAN-B network, which is much greater than that observed for the DAN-A network (which is basically around the zero difference line).

      Author response image 8.

      Thus, it appears that the DAN-A and DAN-B networks are modulated to a different extent during the task, which likely contributes to the perceived discrepancy between the group-level effects (reported using the 7-network parcellation) and the individual differences effects (reported using the finer 17-network parcellation). Based on the reviewers’ comments, this seems like an important distinction to clarify in the manuscript, and we have now described this nuance in our Results section where we now write:

      “...Using this permutation testing approach, we found that it was only the change in eccentricity of the DAN-A network that correlated with Learning score (see Fig. 7C), such that the more the DAN-A network decreased in eccentricity from Baseline to Early learning (i.e., contracted along the manifold), the better subjects performed at the task (see Fig. 7C, scatterplot at right). Consistent with the notion that changes in the eccentricity of the DAN-A network are linked to learning performance, we also found the inverse pattern of effects during Late learning, whereby the more that this same network increased in eccentricity from Early to Late learning (i.e., expanded along the manifold), the better subjects performed at the task (Fig. 7D). We should note that this pattern of performance effects for the DAN-A — i.e., greater contraction during Early learning and greater expansion during Late learning being associated with better learning — appears at odds with the group-level effects described in Fig. 4A and B, where we generally find the opposite pattern for the entire DAN network (composed of the DAN-A and DAN-B subnetworks). However, this potential discrepancy can be explained when examining the changes in eccentricity using the 17-network parcellation (see Supplementary Figure 8). At this higher resolution level we find that these group-level effects for the entire DAN network are being largely driven by eccentricity changes in the DAN-B network (areas in anterior superior parietal cortex and premotor cortex), and not by mean changes in the DAN-A network. By contrast, our present results suggest that it is the contraction and expansion of areas of the DAN-A network (and not DAN-B network) that are selectively associated with differences in subject learning performance.”

      Finally, re: the reviewers’ comments that we do not state any explicit hypotheses etc., we acknowledge that, beyond our general hypothesis stated at the outset about the DMN being involved in reward-based motor learning, our study is quite descriptive and exploratory in nature. So little work has been done in this research area (i.e., using manifold learning approaches to study motor learning with fMRI) that it would be disingenuous to have any stronger hypotheses than those stated in our Introduction. Thus, to make the exploratory nature of our study clear to the reader, we have added the following text (in red) to our Introduction:

      “Here we applied this manifold approach to explore how brain activity across widely distributed cortical and striatal systems is coordinated during reward-based motor learning. We were particularly interested in characterizing how connectivity between regions within the DMN and the rest of the brain changes as participants shift from learning the relationship between motor commands and reward feedback, during early learning, to subsequently using this information, during late learning. We were also interested in exploring whether learning-dependent changes in manifold structure relate to variation in subject motor performance.”

      We hope these changes now make the intention of our study clear.

      (4c) The paper examines a type of motor adaptation task with a reward-based learning component. This, to me, strongly implicates the cerebellum, given that it has a long-established crucial role in adaptation and has recently been implicated in reward-based learning (see work by Wagner & Galea). Why is there no mention of the cerebellum and why it was left out of this study? Especially given that the authors state in the abstract they examine cortical and subcortical structures. It's evident from the methods that the authors did not acquire data from the cerebellum or had too small a FOV to fully cover it (34 slices at 4 mm thickness = 136 mm, which is likely a bit short to fully cover the cerebellum in many participants). What was the rationale behind this methodological choice? It would be good to clarify this for the reader. Related to this, the authors need to rephrase their statements on 'whole-brain' connectivity matrices or analyses - it is not whole-brain when it excludes the cerebellum.

      As we noted above, we do not believe this task to be a motor adaptation task, in the sense that subjects are not able to use sensory prediction errors (and thus error-based learning mechanisms) to improve their performance. Rather, by denying subjects this sensory error feedback they are only able to use reinforcement learning processes, along with cognitive strategies (nicely covered in Tsay et al., 2023), to improve performance. Nevertheless, we recognize that the cerebellum has been increasingly implicated in facets of reward-based learning, particularly within the rodent domain (e.g., Wagner et al., 2017; Heffley et al., 2018; Kostadinov et al., 2019, etc.). In our study, we did indeed collect data from the cerebellum but did not include it in our original analyses, as we wanted (1) the current paper to build on prior work in the human and macaque reward-learning domain (which focuses solely on striatum and cortex, and which rarely discusses cerebellum, see Averbeck & O’Doherty, 2022 & Klein-Flugge et al., 2022 for recent reviews), and (2) to allow this to be a more targeted focus of future work (specifically we plan on focusing on striatal-cerebellar interactions during learning, which are hypothesized based on the neuroanatomical tract tracing work of Bostan and Strick, etc.). We hope the reviewers respect our decisions in this regard.

      Nevertheless, we acknowledge that based on our statements about ‘whole-brain’ connectivity and vagueness about what we mean by ‘subcortex,’ that this may be confusing for the reader. We have now removed and/or corrected such references throughout the paper (however, note that in some cases it is difficult to avoid reference to “whole-brain” — e.g., “whole-brain correlation map” or “whole-brain false discovery rate correction”, which is standard terminology in the field).

      In addition, we are now explicit in our Methods section that the cerebellum was not included in our analyses.

      “Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ~30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC-PC), providing whole-brain coverage of the cerebrum and cerebellum. Note that for the current study, we did not examine changes in cerebellar activity during learning.”

      (4d) The authors centered the matrices before further analyses to remove variance associated with the subject. Why not run a PCA on the connectivity matrices and remove the PC that is associated with subject variance? What is the advantage of first centering the connectivity matrices? Is this standard practice in the field?

      Centering in some form has become reasonably common in the functional connectivity literature, as there is considerable evidence that task-related (or cognitive) changes in whole-brain connectivity are dwarfed by static, subject-level differences (e.g., Gratton et al., 2018, Neuron). If covariance matrices were ordinary scalar values, then isolating task-related changes could be accomplished simply by subtracting a baseline scan or mean score; but because the space of covariance matrices is non-Euclidean, the actual computations involved in this subtraction are more complex (see our Methods). However, fundamentally (and conceptually) our procedure is simply ordinary mean-centering, but adapted to this non-Euclidean space. Despite the added complexity, there is considerable evidence that such computations — adapted directly to the geometry of the space of covariance matrices — outperform simpler methods, which treat covariance matrices as arrays of real numbers (e.g., naive subtraction, see Dodero et al. & Ng et al., references below). Moreover, our previous work has found that this procedure works quite well to isolate changes associated with different task conditions (Areshenkoff et al., 2021, Neuroimage; Areshenkoff et al., 2022, eLife).

      Although PCA can be adapted to work well with covariance matrix valued data, it would at best be a less direct solution than simply subtracting subjects' mean connectivity. This is because the top components from applying PCA would be dominated by both subject-specific effects (not of interest here), and by the large-scale connectivity structure typically observed in component based analyses of whole-brain connectivity (i.e. the principal gradient), whereas changes associated with task-condition (the thing of interest here) would be buried among the less reliable components. By contrast, our procedure directly isolates these task changes.
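
      To make the analogy with ordinary mean-centering concrete, one common formulation of centering on the manifold of symmetric positive-definite matrices is sketched below. The symbols are introduced here for illustration only: C_{s,e} denotes the covariance matrix of subject s in epoch e, M_s that subject's Riemannian (geometric) mean, and d the affine-invariant distance. This is a generic sketch and may differ in detail from the procedure described in the Methods.

```latex
\[
x_{s,e} - \bar{x}_s
\quad\longrightarrow\quad
\tilde{C}_{s,e} = M_s^{-1/2}\, C_{s,e}\, M_s^{-1/2},
\]
\[
M_s = \arg\min_{M \succ 0} \sum_{e} d\!\left(M, C_{s,e}\right)^{2},
\qquad
d(A, B) = \left\lVert \log\!\left(A^{-1/2} B A^{-1/2}\right) \right\rVert_F .
\]
```

      In this formulation the subject mean M_s is mapped to the identity matrix, which plays the role that zero plays in scalar mean-centering, so the centered matrices express only deviations from each subject's own average connectivity.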

      References cited above:

      Dodero, L., Minh, H. Q., San Biagio, M., Murino, V., & Sona, D. (2015, April). Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (pp. 42-45). IEEE.

      Ng, B., Dressler, M., Varoquaux, G., Poline, J. B., Greicius, M., & Thirion, B. (2014). Transport on Riemannian manifold for functional connectivity-based classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part II 17 (pp. 405-412). Springer International Publishing.

      (4e) Seems like a missed opportunity that the authors just use a single, PCA-derived measure to quantify learning, where multiple measures could have been of interest, especially given that the introduction established some interesting learning-related concepts related to exploration and exploitation, which could be conceptualized as movement variability and movement accuracy. It is unclear why the authors designed a task that was this novel and interesting, drawing on several psychological concepts, but then chose to ignore these concepts in the analysis.

      We were disappointed to hear that the reviewers did not appreciate our functional PCA-derived measure to quantify subject learning. This is a novel data-driven analysis approach that we have previously used with success in recent work (e.g., Areshenkoff et al., 2022, eLife) and, from our perspective, we thought it was quite elegant that we were able to describe the entire trajectory of learning across all participants along a single axis that explained the majority (~75%) of the variance in the patterns of behavioral learning data. Moreover, the creation of a single behavioral measure per participant (what we call a ‘Learning score’, see Fig. 6C) helped simplify our brain-behavior correlation analyses considerably, as it provided a single measure that accounts for the natural auto-correlation in subjects’ learning curves (i.e., that subjects who learn quickly also tend to be better overall learners by the end of the learning phase). It also avoids the difficulty (and sometimes arbitrariness) of having to select specific trial bins for behavioral analysis (e.g., choosing the first 5, 10, 20 or 25 trials as a measure of ‘early learning’, and so on). Of course, one of the major alternatives to our approach would have involved fitting an exponential to each subject’s learning curve and taking measures like learning rate etc., but in our experience we have found that these types of models don’t always fit well, or derive robust/reliable parameters at the individual subject level. To strengthen the motivation for our approach, we have now included the following text in our Results:

      “To quantify this variation in subject performance in a manner that accounted for the auto-correlation in learning performance over time (i.e., subjects who learned more quickly tend to exhibit better performance by the end of learning), we opted for a pure data-driven approach and performed functional principal component analysis (fPCA; (Shang, 2014)) on subjects’ learning curves. This approach allowed us to isolate the dominant patterns of variability in subjects’ learning curves over time (see Methods for further details; see also Areshenkoff et al., 2022).”

      In any case, the reviewers may be pleased to hear that in current work in the lab we are using more model-based approaches to attempt to derive sets of parameters (per participant) that relate to some of the variables of interest described by the reviewers, but that we relate to much more dynamical (shorter-term) changes in brain activity.
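
      The idea of summarizing each learning curve by a single score can be sketched as follows. This is a simplified stand-in, assuming ordinary PCA on discretized curves computed by power iteration; it is not the smoothed fPCA of Shang (2014), and the function name and example curves are hypothetical.

```python
import math

def first_component(curves, n_iter=200):
    """Leading principal component of a set of equal-length curves.

    Returns (component, scores, explained_ratio). Uses power iteration on the
    sample covariance of the mean-centered curves. Assumes the curves are not
    all identical (otherwise the covariance is zero and normalization fails).
    """
    n, t = len(curves), len(curves[0])
    mean_curve = [sum(c[j] for c in curves) / n for j in range(t)]
    centered = [[c[j] - mean_curve[j] for j in range(t)] for c in curves]
    # Time-by-time sample covariance matrix.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(t)]
           for i in range(t)]
    vec = [1.0 / math.sqrt(t)] * t  # uniform start vector
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(t)) for i in range(t)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the variance captured by the component.
    eigval = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(t)) for i in range(t))
    total_var = sum(cov[i][i] for i in range(t))
    scores = [sum(row[j] * vec[j] for j in range(t)) for row in centered]
    return vec, scores, eigval / total_var

# Hypothetical learning curves (score per trial bin) for three subjects.
curves = [
    [40.0, 70.0, 85.0, 90.0, 92.0],  # fast learner
    [35.0, 50.0, 65.0, 75.0, 80.0],  # intermediate learner
    [30.0, 35.0, 40.0, 50.0, 55.0],  # slow learner
]
component, scores, explained = first_component(curves)
```

      Each subject's score is the projection of their mean-centered curve onto the dominant pattern of variability, so a single number per subject captures both how quickly and how well they learned without having to choose trial bins.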

      (4f) Overall Changes in Activity: The manuscript should delve into the potential influence of overall changes in brain activity on the results. The choice of using Euclidean distance as a metric for quantifying changes in connectivity is sensitive to scaling in overall activity. Therefore, it is crucial to discuss whether activity in task-relevant areas increases from baseline to early learning and decreases from early to late learning, or if other patterns emerge. A comprehensive analysis of overall activity changes will provide a more complete understanding of the findings.

      These are good questions and we are happy to explore this in the data. However, as mentioned in our response to query 4a above, it is important to note that the timeseries data for each brain region was z-scored prior to analysis, with the aim of removing any mean changes in activity levels (note that this is a standard preprocessing step when performing functional connectivity analysis, given that mean signal changes are not the focus of interest in functional connectivity analyses).

      To further emphasize these points, we have taken our z-scored timeseries data and calculated the mean signal for each region within each task epoch (Baseline, Early and Late learning, see panel A in figure below). The point of showing these data (where each z-score map looks near identical across the top, middle and bottom plots) is to demonstrate just how minuscule the mean signal changes are in the z-scored timeseries data. This point can also be observed when plotting the mean z-score signal across regions for each epoch (see panel B in figure below). Here we find that Baseline and Early learning have a near identical mean activation level across regions (albeit with slightly different variability across subjects), whereas there is a slight increase during late learning — though it should be noted that our y-axis, which measures in the thousandths, really magnifies this effect.
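
      The z-scoring and per-epoch averaging described above can be sketched as follows. This is a minimal illustration, assuming the whole time series of a region is standardized once and then averaged within epochs; the signal values and epoch boundaries are hypothetical.

```python
import statistics

def zscore(series):
    """Standardize a time series to mean 0 and standard deviation 1."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        raise ValueError("cannot z-score a constant series")
    return [(x - mean) / sd for x in series]

def epoch_means(z, epochs):
    """Mean of the z-scored signal inside each named (start, stop) slice."""
    return {name: statistics.fmean(z[start:stop])
            for name, (start, stop) in epochs.items()}

# Hypothetical raw signal (a.u.) and epoch boundaries, for illustration only.
raw = [500.0, 503.0, 498.0, 501.0, 499.0, 502.0, 500.0, 497.0, 504.0]
epochs = {"baseline": (0, 3), "early": (3, 6), "late": (6, 9)}

z = zscore(raw)
means = epoch_means(z, epochs)
```

      In this toy setup the epochs are equal in length and tile the whole series, so the epoch means must sum to zero: any rise in one epoch is offset by a fall elsewhere, which limits how large mean differences between epochs can be after z-scoring.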

      To more directly address the reviewers’ comments, using the z-score signal per region we have also performed the same statistical pairwise comparisons (Early > Baseline and Late>Early) as we performed in the main manuscript Fig. 4 (see panel C in Author response image 9 below). In this plot, areas in red denote an increase in activity from Baseline to Early learning (top plot) and from Early to Late learning (bottom plot), whereas areas in blue denote a decrease for those same comparisons. The important thing to emphasize here is that the spatial maps resulting from this analysis are generally quite different from the maps of eccentricity that we report in Fig. 4 in our paper. For instance, in the figure below, we see significant changes in the activity of visual cortex between epochs but this is not found in our eccentricity results (compare with Fig. 4). Likewise, in our eccentricity results (Fig. 4), we find significant changes in the manifold positioning of areas in medial prefrontal cortex (MPFC), but this is not observed in the activation levels of these regions (panel C below). Again, we are hesitant to make too much of these results, as the activation differences denoted as significant in the figure below are likely to be an effect on the order of thousandths of a z-score (e.g., 0.002 > 0.001), but this hopefully assuages reviewers’ concerns that our manifold results are solely attributable to changes in overall activity levels.

      We are hesitant to include the results below in our paper as we feel that they don’t add much to the interpretation (as the purpose of z-scoring was to remove large activation differences). However, if the reviewers strongly believe otherwise, we would consider including them in the supplement.

      Author response image 9.

      Examination of overall changes in activity across regions. (A) Mean z-score maps across subjects for the Baseline (top), Early Learning (middle) and Late learning (bottom) epochs. (B) Mean z-score across brain regions for each epoch. Error bars represent +/- 1 SEM. (C) Pairwise contrasts of the z-score signal between task epochs. Positive (red) and negative (blue) values show significant increases and decreases in z-score signal, respectively, following FDR correction for region-wise paired t-tests (at q<0.05).
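
      The FDR correction mentioned in this caption can be sketched as follows. The response does not state which FDR procedure was used, so this minimal example assumes the Benjamini-Hochberg step-up procedure applied to one p-value per region; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test survives FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

# Hypothetical region-wise p-values, for illustration only.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
survives = benjamini_hochberg(p_vals, q=0.05)
```

      Because the procedure is step-up, a p-value that exceeds its own threshold can still be declared significant if a larger p-value further down the ranking meets its threshold.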

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to address a critical challenge in the field of bioinformatics: the accurate and efficient identification of protein binding sites from sequences. Their work seeks to overcome the limitations of current methods, which largely depend on multiple sequence alignments or experimental protein structures, by introducing GPSite, a multi-task network designed to predict binding residues of various molecules on proteins using ESMFold.

      Strengths:

      • Benchmarking. The authors provide a comprehensive benchmark against multiple methods, showcasing the performances of a large number of methods in various scenarios.

      • Accessibility and Ease of Use. GPSite is highlighted as a freely accessible tool with user-friendly features on its website, enhancing its potential for widespread adoption in the research community.

      RE: We thank the reviewer for acknowledging the contributions and strengths of our work!

      Weaknesses:

      • Lack of Novelty. The method primarily combines existing approaches and lacks significant technical innovation. This raises concerns about the original contribution of the work in terms of methodological development. Moreover, the paper reproduces results and analyses already presented in previous literature, without providing novel analysis or interpretation. This further diminishes the contribution of this paper to advancing knowledge in the field.

      RE: The novelty of this work is primarily manifested in four key aspects. Firstly, although we have employed several existing tools such as ProtTrans and ESMFold to extract sequence features and predict protein conformations, these techniques had hardly been explored in the field of binding site prediction. We have successfully demonstrated the feasibility of substituting multiple sequence alignments with language model embeddings and training with predicted structures, providing a new solution to overcome the limitations of current methods for genome-wide applications. Secondly, though a few methods tend to capture geometric information based on protein surfaces or atom graphs, surface calculation and property mapping are usually time-consuming, while message passing on full atom graphs is memory-consuming and thus challenging to process long sequences. Besides, these methods are sensitive towards details and errors in the predicted structures. To facilitate large-scale annotations, we have innovatively applied geometric deep learning to protein residue graphs for comprehensively capturing backbone and sidechain geometric contexts in an efficient and effective manner (Figure 1). Thirdly, we have not only exploited multi-task learning to integrate diverse ligands and enhance performance, but also shown its capability to easily extend to the binding site prediction of other unseen ligands (Figure 4 D-E). Last but not least, as a “Tools and Resources” article, we have provided a fast, accurate and user-friendly webserver, as well as constructed a large annotation database for the sequences in Swiss-Prot. Leveraging this database, we have conducted extensive analyses on the associations between binding sites and molecular functions, biological processes, and disease-causing mutations (Figure 5), indicating the potential of our tool to unveil unexplored biology underlying genomic data.

      We have now revised the descriptions in the “The geometry-aware protein binding site predictor (GPSite)” section to highlight the novelty of our work in a clearer manner:

      “In conclusion, GPSite is distinguished from the previous approaches in four key aspects. First, profiting from the effectiveness and low computational cost of ProtTrans and ESMFold, GPSite is liberated from the reliance on MSA and native structures, thus enabling genome-wide binding site prediction. Second, unlike methods that only explore the Cα models of proteins 25,40, GPSite exploits a comprehensive geometric featurizer to fully refine knowledge in the backbone and sidechain atoms. Third, the employed message propagation on residue graphs is global structure-aware and time-efficient compared to the methods based on surface point clouds 21,22, and memory-efficient unlike methods based on full atom graphs 23,24. Residue-based message passing is also less sensitive towards errors in the predicted structures. Last but not least, instead of predicting binding sites for a single molecule type or learning binding patterns separately for different molecules, GPSite applies multi-task learning to better model the latent relationships among different binding partners.”

      • Benchmark Discrepancies. The variation in benchmark results, especially between initial comparisons and those with PeSTo. GPSite achieves a PR AUC of 0.484 on the global benchmark but a PR AUC of 0.61 on the benchmark against PeSTo. For consistency, PeSTo should be included in the benchmark against all other methods. It suggests potential issues with the benchmark set or the stability of the method. This inconsistency needs to be addressed to validate the reliability of the results.

      RE: We thank the reviewer for the constructive comments. Since our performance comparison experiments involved numerous competing methods whose training sets are disparate, it was difficult to compare or rank all these methods fairly using a single test set. Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we meticulously re-split our entire protein-protein binding site dataset to generate a new test set that avoids any overlap with the training sets of both GPSite and PeSTo and performed a separate evaluation, where GPSite achieves a higher AUPR than PeSTo (0.610 against 0.433). Such separate evaluations are quite common in this field. For instance, in the study of PeSTo (Nat Commun 2023), the comparisons of PeSTo with MaSIF-site, SPPIDER, and PSIVER were conducted using one test set, while the comparison with ScanNet was performed on a separate test set.

      Based on the reviewer’s suggestion, we have now replaced this experiment with a direct comparison with PeSTo using the datasets from PeSTo, in order to enhance the completeness and convincingness of our results. The corresponding descriptions are now added in Appendix 1-note 2, and the results are added in Appendix 2-table 4. For convenience, we also attach the note and table here:

      “Since 340 out of 375 proteins in our protein-protein binding site test set share > 30% identity with the training sequences of PeSTo, we performed a separate comparison between GPSite and PeSTo using the training and test datasets from PeSTo. By re-training with simply the same hyperparameters, GPSite achieves better performance than PeSTo (AUPR of 0.824 against 0.797) as shown in Appendix 2-table 4. Furthermore, when using ESMFold-predicted structures as input, the performance of PeSTo decreases substantially (AUPR of 0.691), and the superiority of our method will be further reflected. As in 24, the performance of ScanNet is also included (AUPR of 0.720), which is also largely outperformed by GPSite.”

      Author response table 1.

      Performance comparison of GPSite with ScanNet and PeSTo on the protein-protein binding site test set from PeSTo 24

      Note: The performance of ScanNet and PeSTo are directly obtained from 24. PeSTo* denotes evaluation using the ESMFold-predicted structures as input. The metrics provided are the median AUPR, median AUC and median MCC. The best/second-best results are indicated by bold/underlined fonts.

      • Interface Definition Ambiguity. There is a lack of clarity in defining the interface for the binding site predictions. Different methods are trained using varying criteria (surfaces in MaSIF-site, distance thresholds in ScanNet). The authors do not adequately address how GPSite's definition aligns with or differs from these standards and how this issue was addressed. It could indicate that the comparison of those methods is unreliable and unfair.

      RE: We thank the reviewer for the comments. The precise definition of ligand-binding sites is elucidated in the “Benchmark datasets” section. Specifically, the datasets of DNA, RNA, peptide, ATP, HEM and metal ions used to train GPSite were collected from the widely acknowledged BioLiP database [PMID: 23087378]. In BioLiP, a residue is defined as a binding residue if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms. Meanwhile, most comparative methods regarding these ligands were also trained on data from BioLiP, thereby ensuring fair comparisons.
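
      As an illustration only, the distance criterion described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used by BioLiP or GPSite: the function name `is_binding_residue`, the atom representation, and the van der Waals radii in `VDW_RADII` are hypothetical choices made for the example.

```python
import math

# Illustrative van der Waals radii in angstroms (assumed values for this
# sketch; they are not taken from BioLiP).
VDW_RADII = {"C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8, "P": 1.8}


def is_binding_residue(residue_atoms, ligand_atoms, tolerance=0.5):
    """Apply the distance criterion to one residue and one ligand.

    Each atom is a tuple (element, (x, y, z)). The closest residue-ligand
    atom pair is located first; the residue counts as binding if that
    distance is below `tolerance` plus the sum of the two atoms' radii.
    """
    # Enumerate all residue-ligand atom pairs and keep the nearest one.
    distance, elem_res, elem_lig = min(
        (math.dist(xyz_res, xyz_lig), e_res, e_lig)
        for e_res, xyz_res in residue_atoms
        for e_lig, xyz_lig in ligand_atoms
    )
    cutoff = tolerance + VDW_RADII[elem_res] + VDW_RADII[elem_lig]
    return distance < cutoff
```

      With these assumed radii, a nitrogen and an oxygen atom give a cutoff of 0.5 + 1.55 + 1.52 = 3.57 Å, so a pair 3.5 Å apart is labeled as binding while a pair 4.0 Å apart is not.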

      However, since BioLiP does not include data on protein-protein binding sites, studies for protein-protein binding site prediction may adopt slightly distinct label definitions, as the reviewer suggested. Here, we employed the protein-protein binding site data from our previous study [PMID: 34498061], where a protein-binding residue was defined as a surface residue (relative solvent accessibility > 5%) that lost more than 1 Å² absolute solvent accessibility after protein-protein complex formation. This definition was initially introduced in PSIVER [PMID: 20529890] and widely applied in various studies (e.g., PMID: 31593229, PMID: 32840562). SPPIDER [PMID: 17152079] and MaSIF-site [PMID: 31819266] have also adopted similar surface-based definitions as PSIVER. On the other hand, ScanNet [PMID: 35637310] employed an atom distance threshold of 4 Å to define contacts while PeSTo [PMID: 37072397] used a threshold of 5 Å. However, it is noteworthy that current methods in this field including ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023) directly compared methods using different label definitions without any alignment in their benchmark studies, likely due to the subtle distinctions among these definitions. For instance, the study of PeSTo directly performed comparisons with ScanNet, MaSIF-site, SPPIDER, and PSIVER. Therefore, we followed these previous works, directly comparing GPSite with other protein-protein binding site predictors.
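
      For illustration, the surface-based label definition described above can be written as a short rule. This is a hedged sketch rather than the labeling code of any cited study: the function name, the argument names, and the assumption that relative solvent accessibility is computed from the unbound chain against a per-residue-type reference value `max_asa` are ours.

```python
def is_protein_binding_residue(asa_unbound, asa_bound, max_asa,
                               rsa_cutoff=0.05, loss_cutoff=1.0):
    """Label one residue with the surface-based definition.

    asa_unbound: absolute solvent accessibility (square angstroms) in the
                 isolated chain.
    asa_bound:   absolute solvent accessibility in the complex.
    max_asa:     reference maximum accessibility for this residue type
                 (assumed to be supplied by the caller).
    """
    # Surface residue: relative solvent accessibility above 5%.
    is_surface = (asa_unbound / max_asa) > rsa_cutoff
    # Binding: more than 1 square angstrom buried upon complex formation.
    buried_area = asa_unbound - asa_bound
    return is_surface and buried_area > loss_cutoff
```

      A distance-threshold definition such as the 4 Å or 5 Å cutoffs mentioned above would instead test inter-chain atom distances, which is why the resulting labels can differ slightly at the rim of an interface.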

      In the revised “Benchmark datasets” section, we have now provided more details for the binding site definitions in different datasets to avoid any potential ambiguity:

      “The benchmark datasets for evaluating binding site predictions of DNA, RNA, peptide, ATP, and HEM are constructed from BioLiP”; “A binding residue is defined if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms”; “Besides, the benchmark dataset of protein-protein binding sites is directly from 26, which contains non-redundant transient heterodimeric protein complexes dated up to May 2021. Surface regions that become solvent inaccessible on complex formation are defined as the ground truth protein-binding sites. The benchmark datasets of metal ion (Zn²⁺, Ca²⁺, Mg²⁺ and Mn²⁺) binding sites are directly from 18, which contain non-redundant proteins dated up to December 2021 from BioLiP.”

      While GPSite demonstrates the potential to surpass state-of-the-art methods in protein binding site prediction, the evidence supporting these claims seems incomplete. The lack of methodological novelty and the unresolved questions in benchmark consistency and interface definition somewhat undermine the confidence in the results. Therefore, it's not entirely clear if the authors have fully achieved their aims as outlined.

      The work is useful for the field, especially in disease mechanism elucidation and novel drug design. The availability of genome-scale binding residue annotations GPSite offers is a significant advancement. However, the utility of this tool could be hampered by the aforementioned weaknesses unless they are adequately addressed.

      RE: We thank the reviewer for acknowledging the advancement and value of our work, as well as pointing out areas where improvements can be made. As discussed above, we have now carried out the corresponding revisions in the revised manuscript to enhance the completeness and clearness of our work.

      Reviewer #2 (Public Review):

      Summary:

      This work provides a new framework, "GPsite" to predict DNA, RNA, peptide, protein, ATP, HEM, and metal ions binding sites on proteins. This framework comes with a webserver and a database of annotations. The core of the model is a Geometric featurizer neural network that predicts the binding sites of a protein. One major contribution of the authors is the fact that they feed this neural network with predicted structure from ESMFold for training and prediction (instead of native structure in similar works) and a high-quality protein Language Model representation. The other major contribution is that it provides the public with a new light framework to predict protein-ligand interactions for a broad range of ligands.

      The authors have demonstrated the interest of their framework with mostly two techniques: ablation and benchmark.

      Strengths:

      • The performance of this framework as well as the provided dataset and web server make it useful to conduct studies.

      • The ablations of some core elements of the method, such as the protein Language Model part, or the input structure are very insightful and can help convince the reader that every part of the framework is necessary. This could also guide further developments in the field. As such, the presentation of this part of the work can hold a more critical place in this work.

      RE: We thank the reviewer for recognizing the contributions of our work and for noting that our experiments are thorough.

      Weaknesses:

      • Overall, we can acknowledge the important effort of the authors to compare their work to other similar frameworks. Yet, the lack of homogeneity of training methods and data from one work to the other makes the comparison slightly unconvincing, as the authors pointed out. Overall, the paper puts significant effort into convincing the reader that the method is beating the state of the art. Maybe, there are other aspects that could be more interesting to insist on (usability, interest in protein engineering, and theoretical works).

      RE: We sincerely appreciate the reviewer for the constructive and insightful comments. As to the concern of training data heterogeneity raised by the reviewer, it is noteworthy that current studies in this field, such as ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023), directly compare methods trained on different datasets in their benchmark experiments. Therefore, we have adhered to the paradigm in these previous works. According to the detailed recommendations by the reviewer, we have now improved our manuscript by incorporating additional ablation studies regarding the effects of training procedure and language model representations, as well as case studies regarding the predicted structure’s quality and GPSite-based function annotations. We have also refined the Discussion section to focus more on the achievements of this work. A comprehensive point-by-point response to the reviewer’s recommendations is provided below.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Overall I think the work is slightly underserved by its presentation. Some improvements could be made to the paper to better highlight the significance of your contribution.

      RE: We thank the reviewer for recognizing the significance of our work!

      • Line 188: "As expected, the performance of these methods mostly decreases substantially utilizing predicted structures for testing because they were trained with high-quality native structures."

      This is a major ablation that was not performed in this case. You used the predicted structure to train, while the other did not. One better way to assess the interest of this approach would be to compare the performance of a network trained with only native structure to compare the leap in performance with and without this predicted structure as you did after to assess the interest of some other aspect of your method such as single to multitask.

      RE: We thank the reviewer for the valuable recommendation. We have now assessed the benefit of training with predicted instead of native structures, which brings an average AUPR increase of 4.2% as detailed in Appendix 1-note 5 and Appendix 2-table 9. For convenience, we also attach the note and table here:

      “We examined the performance under different training and evaluation settings as shown in Appendix 2-table 9. As expected, the model yields exceptional performance (average AUPR of 0.656) when trained and evaluated using native structures. However, if this model is fed with predicted structures of the test proteins, the performance substantially declines to an average AUPR of 0.573. This trend aligns with the observations for other structure-based methods as illustrated in Figure 2. More importantly, in the practical scenario where only predicted structures are available for the target proteins, training the model with predicted structures (i.e., GPSite) results in superior performance than training the model with native structures (average AUPR of 0.594 against 0.573), probably owing to the consistency between the training and testing data. For completeness, the results in Appendix 3-figure 2 are also included where GPSite is tested with native structures (average AUPR of 0.637).”

      Author response table 2.

      Performance comparison on the ten binding site test sets under different training and evaluation settings

      Note: The numbers in this table are AUPR values. “Pep” and “Pro” denote peptide and protein, respectively. “Avg” means the average AUPR values among the ten test sets. “native” and “predicted” denote applying native and predicted structures as input, respectively.

      • Line 263: "ProtTrans consistently obtains competitive or superior performance compared to the MSA profiles, particularly for the target proteins with few homologous sequences (Neff < 2)."

      This seems a bit far-fetched. The figure clearly shows that the performance is far superior for Neff < 2, but the performance seems rather similar for higher Neff. Could the authors evaluate numerically the significance of the improvement? MSA profiles outperform GPSite on 4 intervals and I don't know the distribution of the data.

      RE: We thank the reviewer for the valuable suggestion. We have now revised this sentence to avoid any potential ambiguity:

      “As evidenced in Figure 4B and Appendix 2-table 8, ProtTrans consistently obtains competitive or superior performance compared to the MSA profile. Notably, for the target proteins with few homologous sequences (Neff < 2), ProtTrans surpasses MSA profile significantly with an improvement of 3.9% on AUC (P-value = 4.3×10⁻⁸).”

      The detailed significance tests and data distribution are now added in Appendix 2-table 8 and attached below as Author response-table 3 for convenience:

      Author response table 3.

      Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the ten ligands

      Note: Significance tests are performed following the procedure in 12,25. If P-value < 0.05, the difference between the performance is considered statistically significant.

      • Line 285: "We first visualized the distributions of residues in this dataset using t-SNE, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite. "

      Wouldn't embedding from single-task be more relevant to show the interest of multi-task training here? Is the difference that big when comparing embeddings from single-task training to embeddings from multi-task training? Otherwise, I think the evidence from Figure 4e is sufficient, the interest of multitasking could be well-shown by single-task vs. multi-task AUPR and a few examples or predictions that are improved.

      RE: We thank the reviewer for the comment. In the second paragraph of the “The effects of protein features and model designs” section, we have compared the performance of multi-task and single-task learning. However, the visualization results in Figure 4D are related to the third paragraph, where we conducted a downstream exploration of the possibility to extend GPSite to other unseen ligands. This is based on the hypothesis that the shared network in GPSite may have captured certain common ligand-binding mechanisms during the preceding multi-task training process. We visualized the distributions of residues in an unseen carbohydrate-binding site dataset using t-SNE, where the residues are encoded by raw feature vectors (ProtTrans and DSSP), or latent embedding vectors from the shared network trained before. Although the shared network has not been specifically trained on the carbohydrate dataset, the latent representations from GPSite effectively improve the discriminability between the binding and non-binding residues as shown in Figure 4D. This finding indicates that the shared network trained on the initial set of ten molecule types has captured common binding mechanisms and may be applied to other unseen ligands.

      We have now added more descriptions in this paragraph to avoid potential ambiguity:

      “Residues that are conserved during evolution, exposed to solvent, or inside a pocket-shaped domain are inclined to participate in ligand binding. During the preceding multi-task training process, the shared network in GPSite should have learned to capture such common binding mechanisms. Here we show how GPSite can be easily extended to the binding site prediction for other unseen ligands by adopting the pre-trained shared network as a feature extractor. We considered a carbohydrate-binding site dataset from 54 which contains 100 proteins for training and 49 for testing. We first visualized the distributions of residues in this dataset using t-SNE 55, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite trained on the ten molecule types previously.”

      • Line 291: "Employing these informative hidden embeddings as input features to train a simple MLP exhibits remarkable performance with an AUC of 0.881 (Figure 4E), higher than that of training a single-task version of GPSite from scratch (AUC of 0.853) or other state-of-the-art methods such as MTDsite and SPRINT-CBH."

      Is it necessary to introduce other methods here? The single-task vs multi-task seems enough for what you want to show?

      RE: We thank the reviewer for the comment. As discussed above, here we aim to show the potential of GPSite for the binding site prediction of an unseen ligand (i.e., carbohydrate) by adopting the pre-trained shared network as a feature extractor. Thus, we think it’s reasonable to also include the performance of other state-of-the-art methods on this carbohydrate benchmark dataset as baselines.

      • Line 321: "Specifically, a protein-level binding score can be generated for each ligand by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering that the binding interfaces of metal ions are usually smaller."

      Since binding sites are usually not localized on one single amino-acid, we can expect that most of the top k residues are localized around the same area of the protein both spatially and along the sequence. Is it something you observe and could consider in your method?

      RE: We thank the reviewer for the comment. We employed a straightforward method (top-k average) to convert GPSite’s residue-level annotations into protein-level annotations, where k was set empirically based on the distributions of the numbers of binding residues per sequence observed in the training set. We have not put much effort into optimizing this strategy since it mainly serves as a proof-of-concept experiment (Figure 5 A-C) to show the potential of GPSite in discriminating ligand-binding proteins. We have now revised this sentence to better explain how we selected k:

      “Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering the distributions of the numbers of binding residues per sequence observed in the training set.”
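
      The top-k averaging described above can be sketched in a few lines. This is an illustrative sketch, not the GPSite implementation: the function names, the ligand labels in `METAL_IONS`, and the choice to average all residues when a protein has fewer than k residues are assumptions made for the example.

```python
# Hypothetical ligand labels for the four metal ions; the real identifiers
# used by GPSite may differ.
METAL_IONS = {"ZN", "CA", "MG", "MN"}


def default_k(ligand):
    """k = 5 for metal ions and k = 10 for all other ligands."""
    return 5 if ligand in METAL_IONS else 10


def protein_level_score(residue_scores, k):
    """Average the k highest residue-level binding scores of one protein.

    If the protein has fewer than k residues, all residues are averaged
    (an assumption of this sketch).
    """
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    top_scores = sorted(residue_scores, reverse=True)[:k]
    return sum(top_scores) / len(top_scores)
```

      For example, `protein_level_score(scores, default_k("ZN"))` would average the five highest residue scores for zinc binding. Because only the ranking of scores is used, the result does not depend on whether the top residues form one patch or several.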

      As for the question raised by the reviewer, we can indeed expect that most of the top k predicted binding residues tend to cluster into several but not necessarily one area. For instance, certain macromolecules like DNA may interact with several protein surface patches due to their elongated structures (e.g., Author response-figure 1A). Another case may be a protein binding to multiple molecules of the same ligand type (e.g., Author response-figure 1B).

      Author response image 1.

      The structures of 4XQK (A) and 4KYW (B) in PDB.

      • Line 327: The accuracy of the GPSite protein-level binding scores is further validated by the ROC curves in Figure 5B, where GPSite achieves satisfactory AUC values for all ligands except protein (AUC of 0.608).

      Here may be a good place to compare yourself with others, do other frameworks experience the same problem? If so, AUC and AUPR are not relevant here, can you expose some recall scores for example?

      RE: We thank the reviewer for the valuable recommendation. We have conducted comprehensive method comparisons in the preceding “GPSite outperforms state-of-the-art methods” section, where GPSite surpasses all existing frameworks across various ligands. Here, the genome-wide analyses of Swiss-Prot in Figure 5 serve as a downstream demonstration of GPSite’s capacity for large-scale annotations. We didn’t compare with other methods since most of them are time-consuming or memory-consuming, and thus unable to process sequences of substantial quantity or length. For example, it takes about 8 min for the MSA-based method GraphBind to annotate a protein with 500 residues, while it just takes about 20 s for GPSite (see Appendix 3-figure 1 for detailed runtime comparison). It is also challenging for the atom-graph-based method PeSTo to process structures larger than 100 kDa (~1000 residues) on a 32 GB GPU as the authors suggested, while GPSite can easily process structures containing up to 2500 residues on a 16 GB GPU.

      Regarding the recall score mentioned by the reviewer, GPSite achieves a recall of 0.95 (threshold = 0.5) for identifying protein-binding proteins. This indicates that GPSite can accurately identify positive samples, but it also tends to misclassify negative samples as positive. In our original manuscript, we claimed that “This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete”. To better support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note here:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”
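
      To make the relationship between the high recall and the modest AUC concrete, the sketch below computes recall and the false positive rate at a fixed threshold. It is an illustration under assumptions, not the evaluation code of this study: the function name is hypothetical, a score equal to the threshold is treated as positive, and the labels stand for the (possibly incomplete) database annotations.

```python
def recall_and_false_positive_rate(scores, labels, threshold=0.5):
    """Return (recall, false positive rate) at a fixed score threshold.

    scores: protein-level binding scores.
    labels: 1 if the protein is annotated as binding, 0 otherwise.
    """
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), fp / (fp + tn)
```

      A high recall together with a high false positive rate is the pattern described above. If some proteins labeled 0 are in fact unannotated binders, as in the two case studies, part of that false positive rate reflects missing annotations rather than prediction errors.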

      • Line 381: 'Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. Given that the ESM Metagenomic Atlas 34 provides 772 million predicted protein structures along with pre-computed language model embeddings, self-supervised learning can be employed to train a GPSite model for predicting masked sequence and structure attributes, or maximizing the similarity between the learned representations of substructures from identical proteins while minimizing the similarity between those from different proteins using a contrastive loss function training from scratch. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization (EM) framework 58 can be adopted to handle the hierarchical graph structure inherent in proteins, which contains the top view of the residue graph and the bottom view of the atom graph inside a residue. Such an EM procedure enables training two separate graph neural networks for the two views while simultaneously allowing interaction and mutual enhancement between the two modules. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.'

      I think this does not belong here. It feels like half of your discussion is not talking about the achievements of this paper but future very specific directions. Focus on the take-home arguments (performances of the model, ability to predict a large range of tasks, interest in key components of your model, easy use) of the paper and possible future direction but without being so specific.

      RE: We thank the reviewer for the valuable suggestion. We have now simplified the discussions on the future directions notably:

      “Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. GPSite may be improved by pre-training on the abundant predicted structures in ESM Metagenomic Atlas, and then fine-tuning on binding site datasets. Besides, the hidden embeddings from ESMFold may also serve as informative protein representations. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization framework can be adopted to handle the hierarchical atom-to-residue graph structure inherent in proteins. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.”

      • Overall there is also a lack of displayed structure. You should try to select a few examples of binding sites that were identified correctly by your method and not by others, if possible get some insights on why. Also, some negative examples could be interesting so as to have a better idea of the interest.

      RE: We thank the reviewer for the valuable recommendation. We have performed a case study for the structure of the glucocorticoid receptor in Figure 3 D-H to illustrate a potential reason for the robustness of GPSite. Moreover, we have now added a case study in Appendix 1-note 3 and Appendix 3-figure 5 to explain why GPSite sometimes is not as accurate as the state-of-the-art structure-based method. For convenience, we also attach the note and figure here:

      “Here we present an example of an RNA-binding protein, i.e., the ribosome biogenesis protein ERB1 (PDB: 7R6Q, chain m), to illustrate the impact of predicted structure’s quality. As shown in Appendix 3-figure 5, ERB1 is an integral component of a large multimer structure comprising protein and RNA chains (i.e., the state E2 nucleolar 60S ribosome biogenesis intermediate). Likely due to the neglect of interactions from other protein chains, ESMFold fails to predict the correct conformation of the ERB1 chain (TM-score = 0.24). Using this incorrect predicted structure, GPSite achieves an AUPR of 0.580, lower than GraphBind input with the native structure (AUPR = 0.636). However, the performance of GraphBind substantially declines to an AUPR of 0.468 when employing the predicted structure as input. Moreover, if GPSite adopts the native structure for prediction, a notable performance boost can be obtained (AUPR = 0.681).”

      Author response image 2.

      The prediction results of GPSite and GraphBind for the ribosome biogenesis protein ERB1. (A) The state E2 nucleolar 60S ribosome biogenesis intermediate (PDB: 7R6Q). The ribosome biogenesis protein ERB1 (chain m) is highlighted in blue, while other protein chains are colored in gray. The RNA chains are shown in orange. (B) The RNA-binding sites on ERB1 (colored in red). (C) The ESMFold-predicted structure of ERB1 (TM-score = 0.24). The RNA-binding sites are also mapped onto this predicted structure (colored in red). (D-G) The prediction results of GPSite and GraphBind for the predicted and native ERB1 structures. The confidence of the predictions is represented with a gradient of color from blue for non-binding to red for binding.

      Minor comments:

      • Line 169: "Note that since our test sets may partly overlap with the training sets of these methods, the results reported here should be the upper limits for the existing methods."

      Yes, but they were potentially not trained on the most recent structures in that case. These methods could also see improved performance with an updated training set.

      RE: We thank the reviewer for the comment. We have now deleted this sentence.

      • Line176: "Since 358 of the 375 proteins in our protein-binding site test set share > 30% identity with the training sequences of PeSTo, we re-split our protein-binding dataset to generate a test set of 65 proteins sharing < 30% identity with the training set of PeSTo for a fair evaluation."

      Too specific to be here in my opinion.

      RE: We thank the reviewer for the comment. We have now moved these details to Appendix 1-note 2. The description in the main text here is now more concise:

      “Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we conducted separate training and comparison using the datasets of PeSTo, where GPSite still demonstrates a remarkable improvement over PeSTo (Appendix 1-note 2).”

      • Figure 2. The authors should try to either increase Fig A's size or increase the font size. This could probably be done by compressing the size of Figure C into a single figure.

      RE: We thank the reviewer for the suggestion. We have now increased the font size in Figure 2A. In addition, the figures in the final version of the manuscript should be clearer, since we will be able to upload SVG files.

      • Have you tried using embeddings from more structure-aware pLM such as ESM Fold embeddings (fine-tuned) or ProstTrans (that may be more recent than this study)?

      RE: We thank the reviewer for the insightful comment. We have not yet explored the embeddings from structure-aware pLM, but we acknowledge its potential as a promising avenue for future investigation. We have now added this point in our Discussion section:

      “Besides, the hidden embeddings from ESMFold may also serve as informative protein representations.”

      Reviewer #3 (Public Review):

      Summary

      The authors of this work aim to address the challenge of accurately and efficiently identifying protein binding sites from sequences. They recognize the limitations of current methods, including reliance on multiple sequence alignments or experimental protein structures and the under-explored geometry of the structure, which limit performance and genome-scale applications. The authors have developed a multi-task network called GPSite that predicts binding residues for a range of biologically relevant molecules, including DNA, RNA, peptides, proteins, ATP, HEM, and metal ions, using a combination of sequence embeddings from protein language models and ESMFold-predicted structures. Their approach attempts to extract residual and relational geometric contexts in an end-to-end manner, surpassing current sequence-based and structure-based methods.

      Strengths

      • The GPSite model's ability to predict binding sites for a wide variety of molecules, including DNA, RNA, peptides, and various metal ions.

      • Based on the presented results, GPSite outperforms state-of-the-art methods in several benchmark datasets.

      • GPSite adopts predicted structures instead of native structures as input, enabling the model to be applied to a wider range of scenarios where native structures are rare.

      • The authors emphasize the low computational cost of GPSite, which enables rapid genome-scale binding residue annotations, indicating the model's potential for large-scale applications.

      RE: We thank the reviewer for recognizing the significance and value of our work!

      Weaknesses

      • One major advantage of GPSite, as claimed by the authors, is its efficiency. Although the manuscript mentioned that the inference takes about 5 hours for all datasets, it remains unclear how much improvement GPSite can offer compared with existing methods. A more detailed benchmark comparison of running time against other methods is recommended (including the running time of different components, since some methods like GPSite use predicted structures while some use native structures).

      RE: We thank the reviewer for the valuable suggestion. Empirically, it takes about 5-20 min for existing MSA-based methods to make predictions for a protein with 500 residues, while it only takes about 1 min for GPSite (including structure prediction). However, it is worth noting that some predictors in our benchmark study are solely available as webservers, and it is challenging to compare the runtime between a standalone program and a webserver due to the disparity in hardware configurations. Therefore, we have now included comprehensive runtime comparisons between the GPSite webserver and other top-performing servers in Appendix 3-figure 1 to illustrate the practicality and efficiency of our method. For convenience, we also attach the figure here as Author response image 3. The corresponding description is now added in the “GPSite outperforms state-of-the-art methods” section:

      “Moreover, GPSite is computationally efficient, achieving comparable or faster prediction speed compared to other top-performing methods (Appendix 3-figure 1).”

      Author response image 3.

      Runtime comparison of the GPSite webserver with other top-performing servers. Five protein chains (i.e., 8HN4_B, 8USJ_A, 8C1U_A, 8K3V_A and 8EXO_A) comprising 100, 300, 500, 700, and 900 residues, respectively, were selected for testing, and the average runtime is reported for each method. Note that a significant portion of GPSite’s runtime (75 s, indicated in orange) is allocated to structure prediction using ESMFold.

      • Since the model uses predicted protein structure, the authors have conducted some studies on the effect of the predicted structure's quality. However, only the 0.7 threshold was used. A more comprehensive analysis with several different thresholds is recommended.

      RE: We thank the reviewer for the comment. We assessed the effect of the predicted structure's quality by evaluating GPSite’s performance on high-quality (TM-score > 0.7) and low-quality (TM-score ≤ 0.7) predicted structures. We did not employ multiple thresholds (e.g., 0.3, 0.5, and 0.7), as the majority of proteins in the test sets were accurately predicted by ESMFold. Specifically, as shown in Figure 3B, Appendix 3-figure 3 and Appendix 2-table 5, the numbers of proteins with TM-score ≤ 0.7 are small in most datasets (e.g., 42 for DNA and 17 for ATP). Consequently, there is insufficient data available for analysis with lower thresholds, except for the RNA test set. Notably, Figure 3C presents a detailed inspection of the 104 proteins with TM-score < 0.5 in the RNA test set. Within this subset, GPSite consistently outperforms the state-of-the-art structure-based method GraphBind with predicted structures as input, regardless of the prediction quality of ESMFold. Only in cases where structures are predicted with extremely low quality (TM-score < 0.3) does GPSite fall behind GraphBind input with native structures. This result further demonstrates the robustness of GPSite. We have now added clearer explanations in the “GPSite is robust for low-quality predicted structures” section:

      “Figure 3B and Appendix 3-figure 3 show the distributions of TM-scores between native and predicted structures calculated by US-align in the ten benchmark datasets, where most proteins are accurately predicted with TM-score > 0.7 (see also Appendix 2-table 5)”; “Given the infrequency of low-quality predicted structures except for the RNA test set, we took a closer inspection of the 104 proteins with predicted structures of TM-score < 0.5 in the RNA test set.”
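      The binning described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name `stratify_by_tm_score`, the `(tm_score, metric)` record layout, and the use of a per-protein metric are assumptions made for the example. It shows why extra thresholds such as 0.3 and 0.5 are only informative when enough proteins fall into each bin.

```python
from bisect import bisect_left
from statistics import mean


def stratify_by_tm_score(records, thresholds=(0.3, 0.5, 0.7)):
    """Group (tm_score, metric) pairs into TM-score bins.

    Bins are (-inf, t1], (t1, t2], ..., (tn, +inf), so a TM-score exactly
    equal to a threshold falls into the lower bin (i.e. "TM-score <= 0.7").
    Returns a list of (label, count, mean_metric_or_None), one per bin.
    """
    thresholds = sorted(thresholds)
    bins = [[] for _ in range(len(thresholds) + 1)]
    for tm_score, metric in records:
        # bisect_left counts the thresholds strictly below tm_score.
        bins[bisect_left(thresholds, tm_score)].append(metric)

    labels = [f"<= {thresholds[0]}"]
    labels += [f"({lo}, {hi}]" for lo, hi in zip(thresholds, thresholds[1:])]
    labels.append(f"> {thresholds[-1]}")

    # An empty bin reports None rather than a misleading mean.
    return [
        (label, len(values), mean(values) if values else None)
        for label, values in zip(labels, bins)
    ]
```

      A bin with a count near zero cannot support a meaningful comparison, which is the situation the response describes for most test sets below the 0.7 threshold.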

      • To demonstrate the robustness of GPSite, the authors performed a case study on human GR containing two zinc fingers, where the predicted structure is not perfect. The analysis could benefit from a more detailed explanation of why the model can still infer the binding site correctly even though the input structural information is slightly off.

      RE: We thank the reviewer for the comment. We have actually explained the potential reason for the robustness of GPSite in the second paragraph of the “GPSite is robust for low-quality predicted structures” section. In summary, although the whole structure of this protein is not perfectly predicted, the local structures of the binding domains of peptide, DNA and Zn2+ are actually predicted accurately as evidenced by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can still make reliable predictions. We have now revised this paragraph to explain these more clearly:

      “Figure 3D shows the structure of the human glucocorticoid receptor (GR), a transcription factor that binds DNA and assembles a coactivator peptide to regulate gene transcription (PDB: 7PRW, chain A). The DNA-binding domain of GR also consists of two C4-type zinc fingers to bind Zn2+ ions. Although the structure of this protein is not perfectly predicted (TM-score = 0.72), the local structures of the binding domains of peptide and DNA are actually predicted accurately as viewed by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can correctly predict all Zn2+ binding sites and precisely identify the binding sites of DNA and peptide with AUPR values of 0.949 and 0.924, respectively (Figure 3F, G and H).”

      • To analyze the relatively low AUC value for protein-protein interactions, the authors claimed that it is "due to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete", which is unjustified. It is highly recommended to support this claim by showing at least one example where GPSite's prediction is a valid binding site that is not present in the current Swiss-Prot database or via other approaches.

      RE: We thank the reviewer for the valuable recommendation. To support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note below:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”
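      The check "annotated with protein binding or any of its child terms" can be sketched as a reachability test over is_a edges. This is an illustrative sketch only: the helper names `is_term_or_descendant` and `has_function` are hypothetical, and the `PARENTS` map encodes just the two child-term relationships stated in the note above as direct edges, whereas the real Gene Ontology may place intermediate terms between them and would normally be loaded from an ontology file.

```python
def is_term_or_descendant(term, ancestor, parents):
    """Return True if `term` equals `ancestor` or reaches it via is_a edges."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Terms missing from the map are treated as having no parents.
        stack.extend(parents.get(current, ()))
    return False


# Simplified child -> parents map, taken from the note above (assumption:
# direct edges; intermediate GO terms are not modelled here).
PARENTS = {
    "GO:0046982": ["GO:0005515"],  # protein heterodimerization activity
    "GO:0070678": ["GO:0005515"],  # preprotein binding
}


def has_function(annotations, target, parents=PARENTS):
    """True if any annotated GO term is `target` or one of its descendants."""
    return any(is_term_or_descendant(t, target, parents) for t in annotations)
```

      Under this test, a protein annotated only with GO:0046982 counts as carrying the “protein binding” function, which is how a newer Swiss-Prot release can confirm a prediction that an older release scored as a false positive.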

      • The authors reported that many GPSite-predicted binding sites are associated with known biological functions. Notably, for RNA-binding sites, there is a significantly higher proportion of translation-related binding sites. The analysis could benefit from further investigation into this observation, such as analyzing the percentage of such interactions in the training set. In addition, if there is sufficient data, it would also be interesting to see the cross-interaction-type performance of the proposed model, e.g., train the model on a dataset excluding specific binding sites and test its performance on that class of interactions.

      RE: We thank the reviewer for the suggestion. We would like to clarify that the analysis in Figure 5C was conducted at “protein-level” instead of “residue-level”. As described in the second paragraph of the “Large-scale binding site annotation for Swiss-Prot” section, a protein-level ligand-binding score was assigned to a protein by averaging the top k residue-level predicted binding scores. This protein-level score indicates the overall binding propensity of the protein to a specific ligand. We gathered the top 20,000 proteins with the highest protein-level binding scores for each ligand and found that their biological process annotations from Swiss-Prot were consistent with existing knowledge. We have now revised the corresponding sentence to explain these more clearly:

      “Exploiting the residue-level binding site annotations, we could readily extend GPSite to discriminate between binding and non-binding proteins of various ligands. Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues.”
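      The top-k averaging described above can be sketched in a few lines. This is a minimal sketch rather than the GPSite implementation: the function name `protein_level_score` and the default `k=5` are assumptions, since the value of k is not stated here.

```python
import heapq


def protein_level_score(residue_scores, k=5):
    """Average the k highest residue-level binding scores of one protein.

    If the protein has fewer than k residues, all of them are averaged.
    The default k is a placeholder, not a value taken from the manuscript.
    """
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    top = heapq.nlargest(min(k, len(residue_scores)), residue_scores)
    return sum(top) / len(top)
```

      Averaging only the top k residues, rather than all residues, keeps a long protein with a small but confident binding patch from being diluted by its many non-binding residues.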

      As for the cross-interaction-type performance raised by the reviewer, we have now conducted cross-type evaluations to investigate the specificity of the ligand-specific MLPs and the inherent similarities among different ligands in Appendix 1-note 6 and Appendix 2-table 10. For convenience, we also attach the note and table here:

      “We conducted cross-type evaluations by applying different ligand-specific MLPs in GPSite for the test sets of different ligands. As shown in Appendix 2-table 10, for each ligand-binding site test set, the corresponding ligand-specific network consistently achieves the best performance. This indicates that the ligand-specific MLPs have specifically learned the binding patterns of particular molecules. We also noticed that the cross-type performance is reasonable for the ligands sharing similar properties. For instance, the DNA-specific MLP exhibits a reasonable AUPR when predicting RNA-binding sites, and vice versa. Similar trends are also observed between peptide and protein, as well as among metal ions as expected. Interestingly, the cross-type performance between ATP and HEM is also acceptable, potentially attributed to their comparable molecular weights (507.2 and 616.5, respectively).”

      Author response table 4.

      Cross-type performance by applying different ligand-specific MLPs in GPSite for the test sets of different ligands

      Note: “Pep” and “Pro” denote peptide and protein, respectively. The numbers in this table are AUPR values. The best/second-best result in each test set is indicated by bold/underlined font.
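      The cross-type evaluation summarized in this table can be sketched as applying every ligand-specific head to every test set and checking that the matching head scores best. This is an illustrative sketch under stated assumptions: `average_precision` is one common step-wise estimator of AUPR (the authors' exact AUPR implementation is not given here), and `cross_type_matrix`, `diagonal_is_best`, and the callable "heads" are hypothetical stand-ins for the ligand-specific MLPs.

```python
def average_precision(labels, scores):
    """Step-wise average precision: mean precision at each positive's rank.

    Ties in `scores` are broken by input order (Python's sort is stable).
    """
    n_pos = sum(1 for label in labels if label)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / n_pos


def cross_type_matrix(heads, test_sets, metric=average_precision):
    """Score every head on every test set.

    heads:     {ligand_name: callable mapping one residue's features to a score}
    test_sets: {ligand_name: (list_of_features, list_of_binary_labels)}
    Returns {test_ligand: {head_ligand: metric_value}}.
    """
    return {
        test_ligand: {
            head_ligand: metric(labels, [head(x) for x in features])
            for head_ligand, head in heads.items()
        }
        for test_ligand, (features, labels) in test_sets.items()
    }


def diagonal_is_best(matrix):
    """True if, for each test set, the matching ligand-specific head wins."""
    return all(max(row, key=row.get) == ligand for ligand, row in matrix.items())
```

      Off-diagonal values that remain high, as reported for DNA versus RNA, would then indicate ligands whose binding patterns the heads partly share.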

    1. Author Response

      eLife assessment

      The authors report that optogenetic inhibition of hippocampal axon terminals in retrosplenial cortex impairs the performance of a delayed non-match to place task. The significance of findings elucidating the role of hippocampal projections to the retrosplenial cortex in memory and decision-making behaviors is important. However, the strength of evidence for the paper's claims is currently incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a study on the role of the retrosplenial cortex (RSC) and the hippocampus in working memory. Working memory is a critical cognitive function that allows temporary retention of information for task execution. The RSC, which is functionally and anatomically connected to both primary sensory (especially visual) and higher cognitive areas, plays a key role in integrating spatial-temporal context and in goal-directed behaviors. However, the specific contributions of the RSC and the hippocampus in working memory-guided behaviors are not fully understood due to a lack of studies that experimentally disrupt the connection between these two regions during such behaviors.

      In this study, researchers employed eArch3.0 to silence hippocampal axon terminals in the RSC, aiming to explore the roles of these brain regions in working memory. Experiments were conducted where animals with silenced hippocampal axon terminals in the RSC performed a delayed non-match to place (DNMP) task. The results indicated that this manipulation impaired memory retrieval, leading to decreased performance and quicker decision-making in the animals. Notably, the authors observed that the effects of this impairment persisted beyond the light-activation period of the opsin, affecting up to three subsequent trials. They suggest that disrupting the hippocampal-RSC connection has a significant and lasting impact on working memory performance.

      Strengths:

      They conducted a study exploring the impact of direct hippocampal inputs into the RSC, a region involved in encoding spatial-temporal context and transferring contextual information, on spatial working memory tasks. Utilizing eArch3.0 expressed in hippocampal neurons via the viral vector AAV5-hSyn1-eArch3.0, they aimed to bilaterally silence hippocampal terminals located at the RSC in rats pre-trained in a DNMP task. They discovered that silencing hippocampal terminals in the RSC significantly decreased working memory performance in eArch+ animals, especially during task interleaving sessions (TI) that alternated between trials with and without light delivery. This effect persisted even in non-illuminated trials, indicating a lasting impact beyond the periods of direct manipulation. Additionally, they observed a decreased likelihood of correct responses following TI trials and an increased error rate in eArch+ animals, even after incorrect responses, suggesting an impairment in error-corrective behavior. This contrasted with baseline sessions where no light was delivered, and both eArch+ and control animals showed low error rates.

      Weaknesses:

      While I agree with the authors that the role of hippocampal inputs to the RSC in spatial working memory is understudied and merits further investigation, I find that the optogenetic experiment, a core part of this manuscript that includes viral injections, could be improved. The effects were rather subtle, rendering some of the results barely significant and possibly too weak to support major conclusions.

      We thank Reviewer#1 for carefully and critically reading our manuscript, and for the valuable comments provided. The judged “subtlety” of the effects stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition. We disagree with this perspective and find it rather reductive for several reasons.

      Once seen in the context of the animal’s ecology, subtle impairments can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity, for such events as remembering the habitual location of a predator, or food source.

      Also, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as that of “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of the perturbed factor’s role. If a caricatural analogy is allowed, it would be as if we were to study the role of an animal’s legs by chopping them both off and observing the resulting behaviour.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, as such allowing for behaviour to proceed, and for our observation of graded, decreased (not absent), proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant, and biologically realistic.

      Additionally, no mechanistic investigation was conducted beyond referencing previous reports to interpret the core behavioral phenotypes.

      We fully agree with this being a weakness, as we wish we could have done more mechanistic studies to find out exactly what Arch activation is doing to HIPP-RSC transmission, which neurons are being affected, and perhaps in the future dissect its circuit determinants. We have all these goals very present and hope we can address them soon.

      Reviewer #2 (Public Review):

      The authors examine the impact of optogenetic inhibition of hippocampal axon terminals in the retrosplenial cortex (RSP) during the performance of a working memory T-maze task. Performance on a delayed non-match-to-place task was impaired by such inhibition. The authors also report that inhibition is associated with faster decision-making and that the effects of inhibition can be observed over several subsequent trials. The work seems reasonably well done and the role of hippocampal projections to retrosplenial cortex in memory and decision-making is very relevant to multiple fields. However, the work should be expanded in several ways before one can make firm conclusions on the role of this projection in memory and behavior.

      We thank Reviewer#2 for carefully and critically reading our manuscript, and for the valuable comments provided.

      (1) The work is very singular in its message and the experimentation. Further, the impact of the inhibition on behaviour is very moderate. In this sense, the results do not support the conclusion that the hippocampal projection to retrosplenial cortex is key to working memory in a navigational setting.

      As we have mentioned in response to Reviewer#1, the judged “very moderate” effect stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition, precluding its consideration as “key” for behaviour. We disagree with this perspective and find it rather reductive for several reasons. Once seen in the context of the animal’s ecology, quantitatively lower impairments in working memory are no less key for this cognitive capacity, and can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity, for such events as remembering the habitual location of a predator, or food source. Furthermore, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of its role.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, thus allowing behaviour to proceed and allowing our observation of graded, decreased (not absent) proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant and biologically realistic.

      (2) There are no experiments examining other types of behavior or working memory. Given that the animals used in the studies could be put through a large number of different tasks, this is surprising. There is no control navigational task. There is no working memory test that is non-spatial. Such results should be presented in order to put the main finding in context.

      It is hard to gainsay this point. The more thorough and complete a behavioural characterization is, the more informative the study is, from every angle you look at it. While we agree that other forms of WM would be quite interesting in this context, we also cannot ignore the fact that DNMP is widely tested as a WM task, one that is biologically plausible, sensitive to perturbations of neural circuitry known to be at play therein, and fully accepted in the field. Faced with the impossibility of running further studies, for lack of additional funding and human resources, we chose to run this task.

      A control navigational task would, in our understanding, be used to assess whether silencing HIPP projections to RSC would affect (spatial?) navigation, rather than WM, thus explaining the observed impairment. To this we have the following to say: Spatial Navigation is a very basic cognitive function, one that relies on body orientation relative to spatial context, on keeping an updated representation of such spatial context, (“alas”, as memory), and on guiding behaviour according to acquired knowledge about spatial context. Some of these functions are integral to spatial working memory, as such, they might indeed be affected.

      Dissecting the determinants of spatial WM is indeed an ongoing effort, one that was not the intention of the current study, but also one that we have very present, in hope we can address in the future.

      A non-spatial WM task would indeed vastly solidify our claims beyond spatial WM, onto WM. We have, for this reason, changed the title of the manuscript which now reads “spatial working memory”.

      (3) The actual impact of the inhibition on activity in RSP is not provided. While this may not be strictly necessary, it is relevant that the hippocampal projection to RSP includes, and is perhaps dominated by inhibitory inputs. I wonder why the authors chose to manipulate hippocampal inputs to RSP when the subiculum stands as a much stronger source of afferents to RSP and has been shown to exhibit spatial and directional tuning of activity. The points here are that we cannot be sure what the manipulation is really accomplishing in terms of inhibiting RSP activity (perhaps this explains the moderate impact on behavior) and that the effect of inhibiting hippocampal inputs is not an effective means by which to study how RSP is responsive to inputs that reflect environmental locations.

      We fully agree that neural recordings addressing the effect of silencing on RSC neural activity are relevant. We do wish we could have provided more mechanistic studies to find out exactly what Arch activation is doing to HIPP-RSC transmission and which neurons are being affected, thus dissecting its circuit determinants. We have all these goals very present and hope we can address them soon. Subiculum, which we mention in the Introduction, is indeed a key player in this complex circuitry, one whose hypothetical influence is the subject of experimental studies which will certainly reveal many other key elements.

      (4) The impact of inhibition on trials subsequent to the trial during which optical stimulation was actually supplied seems trivial. The authors themselves point to evidence that activation of the hyperpolarizing proton pump is rather long-lasting in its action. Further, each sample-test trial pairing is independent of the prior or subsequent trials. This finding is presented as a major finding of the work, but would normally be relegated to supplemental data as an expected outcome given the dynamics of the pump when activated.

      We disagree that this finding is “trivial”, and object to the considerations of “normalcy”, which we are left wondering about.

      Lacking neurophysiological experiments (for the reasons stated above) to address this interesting finding, we chose to interpret it in light of the few published observations, this being the logical course of action in scientific reporting given the present circumstances.

      Evidence for such a prolonged effect in the context of behaviour is scarce (to our knowledge only the one we cite in the manuscript). As such, it is highly relevant to report it, and give it the relevance we do in our manuscript, rather than “relegating it to supplementary data”, as the reviewer considers being “normal”.

      In the DNMP task the consecutive sample-test pairs are explicitly not independent, as they are part of the same behavioural session. This is illustrated by the simple phenomenon of learning, namely the intra-session learning curves, and the well-known behavioral trial-history effects. The brain does not simply erase such information during the ITI.
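      One way to quantify such trial-history effects is to tabulate accuracy on non-illuminated trials by their distance from the most recent illuminated trial. The sketch below is a hypothetical illustration and not the analysis used in the manuscript: the function name `accuracy_by_lag`, the per-trial boolean inputs, and the choice to ignore session boundaries are all assumptions, and `max_lag=3` simply mirrors the "up to three subsequent trials" mentioned in the review.

```python
def accuracy_by_lag(light_on, correct, max_lag=3):
    """Accuracy on non-illuminated trials, grouped by trials since light.

    light_on: sequence of booleans, True where light was delivered.
    correct:  sequence of booleans, True where the response was correct.
    Returns {lag: (n_trials, accuracy_or_None)} for lag = 1..max_lag, where
    lag is the number of trials since the most recent illuminated trial.
    Trials before the first illuminated trial are ignored.
    """
    if len(light_on) != len(correct):
        raise ValueError("light_on and correct must have the same length")

    counts = {lag: [0, 0] for lag in range(1, max_lag + 1)}  # [n, n_correct]
    last_light = None
    for i, (lit, ok) in enumerate(zip(light_on, correct)):
        if lit:
            last_light = i
            continue
        if last_light is None:
            continue
        lag = i - last_light
        if lag <= max_lag:
            counts[lag][0] += 1
            counts[lag][1] += 1 if ok else 0

    return {
        lag: (n, n_correct / n if n else None)
        for lag, (n, n_correct) in counts.items()
    }
```

      If consecutive trials were truly independent, accuracy would be flat across lags; a dip at short lags that recovers with distance would be consistent with a carry-over effect.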

      (5) In the middle of the first paragraph of the discussion, the authors make reference to work showing RSP responses to "contextual information in egocentric and allocentric reference frames". The citations here are clearly deficient. How is the Nitzan 2020 paper at all relevant here?

      Nitzan 2020 reports the propagation of information from HIPP to CTX via SUB and RSC, thus providing a conduit for mnemonic information between the two structures, alternative to the one we target, thus providing thorough information concerning the HIPP-RSC circuitry at play during behaviour.

      Alexander and Nitz 2015 precisely cite the encoding, and conjunction, of two types of contextual information, internal (ego-) and external (allocentric).

      The subsequent reference is indeed superfluous here.

      We thank Reviewer#2 for calling our attention to the fact that references for this information are inadequate and lacking. We have now cited (Gill et al., 2011; Miller et al., 2019; Vedder et al., 2017) and refer readers to the review (Alexander et al., 2023) for the purpose of illustrating the encoding of information in the two reference frames. In addition, we have substantially edited the Introduction and Discussion sections, and suppressed unnecessary passages.

      (6) The manuscript is deficient in referencing and discussing data from the Smith laboratory that is similar. The discussion reads mainly like a repeat of the results section.

      Please see above. We thank Reviewer#2 for this comment; we have now re-written the Discussion such that it is less of a summary of the Results and more focused on their implications and future directions.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Hats off to the authors for taking time to decipher the seemingly subtle but important differences between the Gnai2/3 double mutant and Ptx mutant phenotypes. These results further illustrate the dynamic requirement of Gnai/o in hair bundle establishment. I have some minor suggestions for the authors to consider and it is up to the authors to decide whether to incorporate them:

      We decided to make the current (revised) version the version of record, and we explain why below. Please include these comments in the review+rebuttal material.

      (1) The abstract could be modified to reflect the revised interpretations of the results.

      Response: the abstract is high-level and the changes in interpretation in the revised manuscript do not modify the message there. Briefly, the abstract only states that Gnai2; Gnai3 double mutants recapitulate two defects previously only observed with pertussis toxin. There is no claim about the timing or dose of GNAI proteins involved.

      (2) The three rows of OHCs are like a different beast from each other. Mireille Montcouquiol's lab has demonstrated that there is a differential requirement for Gnai3 in hair bundle orientation among the three rows of OHCs. The results described in this manuscript support this notion as well.

      To clarify, Gnai3 inactivation does not affect OHC orientation. Only pertussis toxin, and in this work Gnai2; Gnai3 double mutants, do. The Montcouquiol lab showed different degrees of OHC1, OHC2 and OHC3 misorientation upon use of pertussis toxin in vitro using cochlear explants (Ezan et al 2013). We showed the same thing in vivo using transgenic models (Tarchini et al 2013; Tarchini et al 2016). The different OHC responses by row and corresponding citations are mentioned in several locations in the manuscript, including first on line 112 in the Introduction and in Fig. 1C in a graphical summary.

      (3) I wonder if "compensate" or "redundancy" may be a better term to use than "rescue" in the Discussion and figure.

      “Rescue” is used in the Discussion on lines 603 and 604. We think that “rescue” is appropriate to refer to the ability of GNAI2 to compensate for the loss of GNAI1 and GNAI3 in a mutant context. We would argue that these different wordings are largely interchangeable and do not change the message.


      Author Response

      The following is the authors’ response to the original reviews.

      We really appreciate the time the reviewers spent reading and commenting on the original manuscript. Although they were positive already, we decided to spend some time to address the main comments with new experiments as thoroughly as possible in a new manuscript version. We also heavily edited some sections accordingly: 1) We delayed pertussis toxin activation in hair cells with Atoh1-Cre to show that the resulting misorientation phenotype is delayed compared to FoxG1-Cre results, as also seen in Gnai2; Gnai3 double mutants. It follows that Gnai2; Gnai3 and pertussis mutants do share a similar misorientation profile, and that GNAI proteins are required to normally reverse OHC1-2 (from medial to lateral), but also to maintain the lateral orientation, at least transiently. 2) We experimentally verified that one of our GNAI antibodies can indeed detect GNAI1, and consequently that absence of signal in Gnai2; Gnai3 double mutants is evidence that GNAI1 is not involved in apical hair cell polarization. We believe these changes strengthen the manuscript and its conclusions.

      Reviewer #1 (Public Review):

      A subclass of inhibitory heterotrimeric guanine nucleotide-binding protein subunits, GNAI, has been implicated in sensory hair cell formation, namely the establishment of hair bundle (stereocilia) orientation and staircase formation. However, the former role of hair bundle orientation has only been demonstrated in mutants expressing pertussis toxin, which blocks all GNAI subunits, but not in mutants with a single knockout of any of the Gnai genes, suggesting that there is a redundancy among various GNAI proteins in this role. Using various conditional mutants, the authors concluded that GNAI3 is the primary GNAI protein required for hair bundle morphogenesis, whereas hair bundle orientation requires both GNAI2 and GNAI3.

      Strength

      Various compound mutants were generated to decipher the contribution of individual GNAI1, GNAI2, GNAI3 and GNAO in the establishment of hair bundle orientation and morphogenesis. The study is thorough with detailed quantification of hair bundle orientation and morphogenesis, as well as auditory functions.

      Weakness

      While the hair bundle orientation phenotype in the Foxg1-cre; Gnai2-/-; Gnai3 lox/lox (double mutants) appears more severe than that observed in Ptx cKO mutants, it may be an oversimplification to attribute the differences to more GNAI function in the Ptx cKO mutants. The phenotypes between the double mutants and Ptx cKO mutants appear qualitatively different. For example, assuming the milder phenotypes in the Ptx cKO are due to incomplete loss of GNAI function, one would expect that the Ptx phenotype would be reproducible by some combination of compound mutants among various Gnai genes. Such information was not provided. Furthermore, of all the double mutant specimens analyzed for hair bundle orientation (Fig. 8), the hair bundle/kinocilium position started out normally in the lateral quadrant at E17.5 but failed to be maintained by P0. This does not appear to be the case for Ptx cKO, in which all affected hair cells showed inverted orientation by E17.5. It is not clear whether this is the end-stage of bundle orientation in Ptx cKO, and whether the kinocilium position started out normal, similar to the double mutants, before the age of analysis at E17.5. Understanding these differences may reveal specific requirements of individual GNAI subunits or whether other factors are being affected in the Ptx mutants.

      This criticism was very useful and prompted new experiments as well as a change in data presentation and a fundamental rewrite regarding hair cell orientation. These changes are detailed below. Of note, however, please let us clarify that the original manuscript did show that the ptxA orientation phenotype is reproduced to some extent in Gnai2; Gnai3 double mutants (previously Fig. 8 and corresponding text line 505). We showed that OHC1-2 are also inverted in the double mutant, although at a later differentiation stage. We recognize that similarities in hair cell misorientation between ptxA and Gnai2; Gnai3 DKO were not explained and discussed well enough. This part of the manuscript has been re-worked extensively, and we hope that along with new results, comparisons between mutant models are easier to follow and understand. We notably fully adopted the idea that there are qualitative differences between ptxA and Gnai2; Gnai3 mutants, and not only a difference in the remaining “dose” of GNAI activity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comments related to clarification of the weakness:

      (1) In general, hair bundle orientation in the double mutants is established in the lateral quadrant of the cochlea before being inverted (Fig. 8). These results are intriguing because the lateral orientation is the correct position for these hair bundles normally and Gnai proteins are thought to be required to get the kinocilium to the lateral position. This process appears to proceed normally in the double mutants but the kinocilium reverted to the medial default position over time, which suggests that Gnai2 and Gnai3 are only required for the maintenance and not the establishment of the kinocilium in the lateral position. Is this phenotype qualitatively similar in the Ptx cKO?

      We addressed these issues with two types of modifications to the data:

      (1) We modified the eccentricity threshold used at E17.5 in Fig. 8 (orientation) to be more stringent, using 0.4 (instead of 0.25 previously) in both controls and mutants. This means that we now only graph the orientation of cells where eccentricity is more marked. The rationale is that at early stages, it is challenging to distinguish immature vs defective near-symmetrical cells. We kept a threshold of 0.25 at P0 when the hair cell apical surface is larger and better differentiated (Fig. 8C-D). Importantly, the dataset remains rigorously identical. This change usefully highlights that a large proportion of OHC1 is in fact inverted (oriented medially) at E17.5 in Gnai2; Gnai3 double mutants at the cochlear mid, as also seen in the ptxA model at the same stage and position (see new Fig. 8A). At the E17.5 base (Fig. 8B), a slightly more mature position, the outcome is unchanged (the majority of OHC1 are inverted using either a 0.25 or 0.4 threshold in double mutants and in ptxA).

      Interestingly however, the orientation trend is unchanged for OHC2: OHC2 remain oriented largely laterally (i.e. normally) at the E17.5 mid and base in Gnai2; Gnai3 double mutants even with a raised eccentricity threshold, whereas by contrast OHC2 in ptxA are inverted at these stages and positions. In the double mutant, OHC2 only become inverted at the P0 base (Fig. 8D). This suggests that there are similarities (OHC1) but also differences (OHC2) between the two mouse models, and that double mutants show a delay in adopting an inverted orientation compared to ptxA. Of note, OHC2 have been shown to differentiate later than OHC1 (for example, Anniko 1983 PMID:6869851).

      (2) To directly test the idea that the misorientation phenotype (inverted OHC1-2) is comparable between the two models but delayed in Gnai2; Gnai3 mutants, we performed a new experiment and added new results in the manuscript. We delayed ptxA action by using Atoh1-Cre (postmitotic hair cells) instead of FoxG1-Cre (otic progenitors). Remarkably, this produced a pattern of OHC1-2 misorientation more similar to Gnai2; Gnai3 mutants: at the E17.5 base and P0 apex, OHC2 were still largely oriented laterally (normally) in Atoh1-Cre; ptxA as in Gnai2; Gnai3 mutants whereas at the P0 base a large proportion of OHC2 were inverted (Fig. 8 Supp 1B). OHC1 were inverted at all stages and positions in the Atoh1-Cre as in the FoxG1-Cre; ptxA model. For Atoh1-Cre; ptxA, we only illustrated OHC1 and OHC2 and did not add E17.5 mid or P0 mid results because other cell types and stages/positions did not provide additional insight. In addition, we are well aware that the full FoxG1-Cre; ptxA and Gnai2; Gnai3 results for 4 cell types (IHC, OHC1-3) and 5 stages/positions are already a lot of data for cell orientation.

      These results suggest that:

      (a) The normal reversal of OHC1-2 to adopt a lateral orientation needs to be maintained, at least transiently, and that maintenance also relies on GNAI/O (Results starting line 529. Discussion line 621).

      (b) ptxA is more severe than Gnai2; Gnai3 when it comes to OHC1-2 orientation (Figure 9, role b). Conversely, Gnai2; Gnai3 is obviously more severe when it comes to symmetry-breaking (Fig. 9, role a) and hair bundle morphogenesis (Fig. 9, c). It follows that the two early GNAI/O activities are qualitatively different and not just based on dose. This is essentially what this Reviewer correctly pointed out, and we have fully edited both Results and Discussion accordingly. We now speculate that the difference may lie in the identity of the necessary GNAI/O protein for each role. Any GNAI/O proteins acting as a switch downstream of the GPR156 receptor may relay orientation information (Fig. 9, role b), making ptxA a particularly effective disruption strategy since it downregulates all GNAI/O proteins. In contrast, symmetry-breaking may rely more specifically on GNAI2 and GNAI3, and ptxA is not expected to achieve a loss-of-function of GNAI2 and GNAI3 as extensive as a double targeted genetic inactivation of the corresponding genes. Please see new Results starting line 526 and Discussion starting line 603. We consequently abandoned the notion that increased doses of GNAI/O are required for each role, and we also clarify that symmetry-breaking (a) and orientation (b) occur at the same time (Fig. 9).
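
      The stage-specific eccentricity cutoff described in modification (1) above can be illustrated with a minimal sketch. It assumes each cell already has a measured eccentricity and orientation angle; the dictionary keys, the function name, and the inclusive comparison are hypothetical choices made for illustration and are not taken from the manuscript. Only the cutoff values (0.4 at E17.5, 0.25 at P0) come from the response.

```python
# Cutoffs quoted in the response; the names used here are hypothetical.
ECCENTRICITY_THRESHOLD = {"E17.5": 0.4, "P0": 0.25}


def cells_to_graph(cells):
    """Keep only cells whose eccentricity reaches the cutoff for their stage.

    `cells` is a list of dicts with hypothetical keys "stage",
    "eccentricity" and "angle" (orientation in degrees). Treating the
    cutoff as inclusive is an assumption.
    """
    return [
        cell
        for cell in cells
        if cell["eccentricity"] >= ECCENTRICITY_THRESHOLD[cell["stage"]]
    ]


# Made-up example cells, not data from the study.
cells = [
    {"stage": "E17.5", "eccentricity": 0.30, "angle": 95.0},
    {"stage": "E17.5", "eccentricity": 0.45, "angle": 270.0},
    {"stage": "P0", "eccentricity": 0.30, "angle": 88.0},
]
kept = cells_to_graph(cells)
```

      In this sketch the same underlying list of cells is kept for every analysis and only the filter changes, which mirrors the point that the dataset itself is unchanged by the stricter threshold.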

      (2) P0 may not be late enough a stage to assess phenotype maturity in the double mutants. For example, it is not clear from the basal P0 results whether the IHC will acquire an inverted phenotype or just misorientation in the lateral side.

      For context, the OHC1-2 misorientation pattern in the ptxA model at P0 does represent the end stage, as the same pattern is observed in adults (illustrated in Fig. 2A). In addition, OHC1-2 that express ptxA are inverted as soon as they break planar symmetry, and this was established at E16.5 in a previous publication where ptxA and Gpr156 misorientation patterns were compared and shown to be identical (Kindt et al., 2021 Supp. fig. 5C-D). However, we clearly failed to mention these important results in the original manuscript. We now cite Figure 2 for adult defects (line 522), and provide a citation for OHC1-2 inversion being observed from the earliest stage of hair cell differentiation (Kindt et al., 2021) (line 519).

      The vast majority of Gnai2; Gnai3 double mutants die before weaning but the single specimen we managed to collect at P21 also showed inverted OHC1-2 (representative example in Fig. 2A). Again, we previously failed to point out this important result. We now do so on lines 214 and 555. This is further evidence that OHC1-2 misorientation is in fact similar in the ptxA and Gnai2; Gnai3 models (but milder and delayed in the latter).

      When it comes to IHCs and OHC3s however, the situation is less clear. These cell types are mildly misoriented in ptxA and Gpr156 mutants, but IHCs in particular appear severely misoriented in Gnai2; Gnai3 mutants based on the position of the basal body (Fig. 8). However, very dysmorphic hair bundles can pull on the basal body via the kinocilium and affect its position, which obscures hair cell orientation inferred from the basal body and subsequent interpretations. We do not dwell on IHC and OHC3 and their orientation in Gnai2; Gnai3 mutants in the revision since we do not observe similar orientation defects in a different mouse model and lack sufficient adult data.

      Suggestions to improve upon the manuscript for readers:

      (1) Line 294, indicate on the figure the staining in bare zone and tips of stereocilia on row 1.

      Pertains to Figure 4. In A, we now point out the bare zone and stereocilia tips with arrow and arrowheads, respectively (as in other figures).

      (2) Fig. 8 schematic diagram, the labels of the line and 90° side by side are misleading.

      We added black ticks for 0, 90, 180, 270 degree references. In contrast, the hair cell angle represented was switched to magenta.

      (3) Fig. 7 legend, redundancy towards the end of the paragraph.

      Thank you for catching this issue. A large portion of the legend was indeed accidentally repeated and is now deleted.

      (4) Line 490-493, Another plausible explanation is that other factors besides Gnai2 and Gnai3 are involved in breaking symmetry during bundle establishment.

      We now acknowledge that other proteins besides GNAI/O may be involved (Discussion line 614). That said, the notion that we do not achieve sufficient and/or early enough GNAI loss is supported for example by the Beer-Hammer 2018 study where no defects in symmetry-breaking or orientation were reported in their Gnai2 flox/flox; Gnai3 flox/flox model (Discussion new Line 637).

      (5) Line 518, the base were largely inverted (Figure 8B). Should Fig 8A be cited instead of 8B?

      Fig. 8B has graphs for the E17.5 cochlear base where OHC1-2 are inverted in both ptxA and Gnai2;3 DKO models. Fig. 8A has graphs of the E17.5 cochlear mid (less differentiated hair cells) where an inversion was not obvious previously, but is now clear although only partial in Gnai2; Gnai3 DKO (see above; raised eccentricity threshold). In the context of the previous text, this citation was thus correct. However, this section has been heavily modified to better compare Gnai2; Gnai3 DKO and ptxA and is hopefully less confusing in the revised version.

      Reviewer #2 (Public Review):

      Jarysta and colleagues set out to define how similar GNAI/O family members contribute to the shape and orientation of stereocilia bundles on auditory hair cells. Previous work demonstrated that loss of particular GNAI proteins, or inhibition of GNAIs by pertussis toxin, caused several defects in hair bundle morphogenesis, but open questions remained which the authors sought to address. Some of these questions include whether all phenotypes resulting from expression of pertussis toxin stemmed from GNAI inhibition; which GNAI family members are most critical for directing bundle development; whether GNAI proteins are needed for basal body movements that contribute to bundle patterning. These questions are important for understanding how tissue is patterned in response to planar cell polarity cues.

      To address questions related to the GNAI family in auditory hair cell development, the authors assembled an impressive and nearly comprehensive collection of mouse models. This approach allowed for each Gnai and Gnao gene to be knocked out individually or in combination with each other. Notably, a new floxed allele was generated for Gnai3 because loss of this gene in combination with Gnai2 deletion was known to be embryonic lethal. Besides these lines, a new knockin mouse was made to conditionally express untagged pertussis toxin following cre induction from a strong promoter. The breadth and complexity involved in generating and collecting these strains makes this study unique, and likely the authoritative last word on which GNAI proteins are needed for which aspect of auditory hair bundle development.

      Appropriate methods were employed by the authors to characterize auditory hair bundle morphology in each mouse line. Conclusions were carefully drawn from the data and largely based on excellent quantitative analysis. The main conclusions are that GNAI3 has the largest effect on hair bundle development. GNAI2 can compensate for GNAI3 loss in early development but incompletely in late development. The Gnai2 Gnai3 double mutant recapitulates nearly all the phenotypic effects associated with pertussis toxin expression and also reveals a role for GNAIs in early movement of the basal body. Although these results are not entirely unexpected based on earlier reports, the current results both uncover new functions and put putative functions on more solid ground.

      Based on this study, loss of GNAI1 and GNAO show a slight shortening of the tallest row of stereocilia but no other significant changes to bundle shape. Antibody staining shows no change in GNAI localization in the Gnai1 knockout, suggesting that little to no protein is found in hair cells. One caveat to this interpretation is that the antibody, while proposed to cross-react with GNAI1, is not clearly shown to immunolabel GNAI1. More than anything, this reservation mostly serves to illustrate how challenging it is to nail down every last detail. In turn, the comprehensive nature of the current study seems all the more impressive.

      (1) The original manuscript quantified stereocilia properties in Gnai1 and Gnai2 single mutants, and in Gnai1; Gnai2 double mutants using non-parametric t-tests (Mann-Whitney) for comparisons. This approach indeed suggested subtle reduction in row 1 height in IHCs in all 3 mutants. We did not quantify stereocilia features in Gnao1 mutants but could not observe defects (new Fig. 2 Supp. 1E-F). In fact, we could not observe defects in Gnai1 and Gnai2 single mutants, and in Gnai1; Gnai2 double mutants either. For this reason we have been ambivalent about reporting defects for Gnai1 and Gnai2 single and Gnai1; Gnai2 double mutants.

      In the revision, we applied a nested (hierarchical) t-test to avoid pseudo-replication (Eisner 2021; PMID: 33464305; https://pubmed.ncbi.nlm.nih.gov/33464305/). In our data, the nested t-tests structure measurements by animal instead of having all stereocilia or other cell measurements treated as independent values. This more stringent approach no longer finds row 1 height reduction significant in single Gnai1 or Gnai2 mutants, or in Gnai1; Gnai2 double mutants. We modified the text accordingly in Results and Discussion. Nested t-tests were applied uniformly across the manuscript and, besides IHC measurements in Fig. 2, now also apply to bare zone surface area in Fig. 6 and eccentricity in Fig. 7. For these experiments in contrast, previous conclusions are not changed. We think that this more careful statistical treatment is a closer representation of the data in terms of the conclusions we can safely make.

      (2) The reviewer's criticism about antibody specificity is accurate and fair, and is fully addressed in the revised manuscript. First, we provide a phylogeny cartoon as Figure 1A to compare the GNAI/O proteins and highlight how closely related they are in sequence. To validate the assumption that our approach would detect GNAI1 if it were present in hair cells, we took a new dual experimental approach in the revision. First, we electroporated Gnai1, Gnai2 and Gnai3 expression constructs in the E13.5 inner ear and tested whether the two GNAI antibodies used in the study can detect ectopic GNAI1 in Kölliker's organ. This revealed that “ptGNAI2” detects GNAI1 very well (in addition to GNAI2), but that “scbtGNAI3” does not detect GNAI1 efficiently (although it does detect GNAI3 very well). To verify in vivo that “ptGNAI2” can detect endogenous GNAI1, we immunolabeled the gallbladder epithelium in Gnai1 mutants and littermate controls using the “ptGNAI2” antibody. Based on IMPC consortium data* about the Gnai1 LacZ mouse strain, Gnai1 is specifically expressed in the adult gallbladder. We could verify that signals detected in the Gnai1 mutants were visually reduced in comparison to littermate controls. We have now added this validation step in Results line 309 and the data in Fig. 4 Supp. 1A-B.

      *https://www.mousephenotype.org/data/genes/MGI:95771
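
      The pseudo-replication concern raised in point (1) above can be illustrated with a minimal sketch. It does not reproduce the nested t-test used in the manuscript; instead it assumes a simpler and commonly used stand-in, in which measurements are first averaged per animal and Welch's t statistic is then computed on the animal means. All values, animal IDs, and function names are hypothetical, and no p-value is computed.

```python
from math import sqrt
from statistics import mean, variance


def animal_means(measurements):
    """Collapse per-cell values to one mean per animal.

    `measurements` maps a (hypothetical) animal ID to its list of values.
    """
    return [mean(values) for values in measurements.values()]


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up row 1 heights (arbitrary units), three animals per genotype.
control = {"c1": [4.1, 4.3, 4.2], "c2": [4.6, 4.5, 4.7], "c3": [4.0, 3.9, 4.1]}
mutant = {"m1": [3.9, 4.0, 4.1], "m2": [4.4, 4.3, 4.5], "m3": [3.8, 3.7, 3.9]}

# Pooled analysis: every cell is treated as an independent replicate.
pooled_control = [v for values in control.values() for v in values]
pooled_mutant = [v for values in mutant.values() for v in values]
t_pooled, df_pooled = welch_t(pooled_control, pooled_mutant)

# Animal-level analysis: the animal is the unit of replication.
t_animal, df_animal = welch_t(animal_means(control), animal_means(mutant))
```

      In this made-up example the group difference is the same in both analyses, but the animal-level comparison has fewer degrees of freedom and a smaller t statistic, which is the sense in which structuring measurements by animal is the more stringent approach.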

      Reviewer #2 (Recommendations For The Authors):

      Minor comments that may marginally improve clarity.

      Abstract line 24: delete "nor polarized" because polarization cannot be assessed since the protein is undetectable.

      This is a fair point, now deleted.

      Consider revising: Lines 80-82; 188-202 (the order in which the mutants were presented was hard to follow for me); 239-240.

      Lines 80-82: Used to read as "Ptx recapitulates severe stereocilia stunting and immature-looking hair bundles observed when GPSM2 or both GNAI2 and GNAI3 are inactivated."

      Line 88: Was now changed to "Ptx provokes immature-looking hair bundles with severely stunted stereocilia, mimicking defects in Gpsm2 mutants and Gnai2; Gnai3 double mutants".

      Lines 188-202: This was the first paragraph describing adult stereocilia defects in the different Gnai/o mouse strains. We completely rewrote the entire section to reflect the order in which the strains appear in Figure 2, hopefully making the text easier to follow because it better matches panels in Fig. 2. We also made several other modifications to streamline comparisons and better introduce the orientation defects that are later detailed at neonate stages.

      Lines 239-240: Used to read "GNAI2 makes a clear contribution since stereocilia defects increase in severity when GNAI loss extends from GNAI3 to both GNAI2 and GNAI3".

      Line 247: Was now changed to "GNAI2 makes a clear contribution since Gnai3neo stereocilia defects dramatically increase in severity when GNAI2 is absent as well in Gnai2; Gnai3 double mutants."

      Line 164: hardwired is unclear. Conserved?

      We modified this sentence as follows: Line 171: "We reasoned that apical HC development is probably highly constrained and less likely to be influenced by genetic heterogeneity compared to susceptibility to disease, for example."

      Line 299: It is not clear why GNAI1 is a better target than GNAI3. This phrase is repeated in line 303, I suspect inadvertently. Is there evidence that this antibody detects GNAI1, perhaps in another tissue? Line 308: GNAI1 may also not be detected by this antibody.

      Please see point 2 above. We removed these hypothetical statements entirely and we instead now experimentally show that one of the two commercial antibodies used can readily detect GNAI1 (yet does not detect signal in hair cells when GNAI2 and GNAI3 are absent in Fig. 4F).

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Major Weaknesses:

      The assertion that MOCAT can be rapidly applied in hospital pathology departments seems overstated due to the limited availability of light-sheet microscopes outside research labs. In the first rebuttal letter, authors explain the limitations of other microscopes more readily available in hospitals. This explanation relies on your own investigations and practical experience on the matter, so including them in some part of the manuscript would be beneficial.

      We appreciate the reviewer's comments and have added a discussion on the limitations of microscopes that are more readily available in hospitals in our text:

      Revised manuscript, line 305-316:

      “3.3 Microscopy options for imaging centimeter-sized specimens

      Optical sectioning techniques are crucial for obtaining high-quality volumetric images. Techniques such as confocal microscopy, multi-photon microscopy, and light-sheet microscopy filter out-of-focus signals, resulting in sharp images of individual planes. In our study, we used light-sheet microscopy and multi-point confocal (i.e., spinning disc) for imaging centimeter-sized specimens because of their scanning speeds. While two-photon and confocal microscopy offer high-resolution imaging of smaller volumes, they are not ideal for scanning entire tissues because of their prolonged scanning times.

      Non-optical sectioning wide-field fluorescence microscopes, like the Olympus BX series or ZEISS Axio imager series, can also be used to scan samples up to about 3.5mm thick with long working distance objective lenses. In these cases, deconvolution algorithms are required to eliminate out-of-focus signals. However, it should be noted that the epifluorescence system might reduce fluorescent intensity in deeper regions within the samples.”

      Refractive index matching is a critical point in the protocol, the one providing final transparency. Authors utilized the commercial solutions NFC1 and NFC2 (Nebulum, Taiwan) with a known refractive index, but for which the composition is non-disclosable. My knowledge on the organic chemistry around refractive index matching is limited, but if users don't really know what is going on in this final step, the whole protocol would rely on a single world-wide provider and troubleshooting would be fishing. I suggest that you try to validate the approach with solutions of known composition, or at least provide the solutions sold by other providers.

      We appreciate the reviewer's suggestions. Based on our experience, the CUBIC-R solution developed by Ueda's team also serves as an effective RI-matching solution in the MOCAT pipeline. Its only drawback is the potential reddening of the specimen, likely due to the light-responsive component, antipyrine. We have now added this information to the Methods section:

      Revised manuscript, line 492-496:

      “Refractive index (RI) matching. Before imaging, the specimens were RI-matched by being immersed in NFC1 (RI = 1.47) and NFC2 (RI = 1.52) solutions (Nebulum, Taipei, Taiwan). Each immersion lasted for one day at room temperature. Alternatively, RI-matching can also be accomplished by immersing specimens in a 1:1 dilution of CUBIC-R[28] for one day, followed by pure CUBIC-R for an additional day.”

      Reviewer #2 (Recommendations For The Authors):

      A comment on the name of the protocol, MOCAT. I am sorry to bring this now, and not before. But, I strongly recommend another name for the procedure. My concern is that the present name "MOCAT" refers to the problem, and NOT to the actual solution provided by you. See, the problem to solve is: to perform Multiplex labeling Of Centimeter-sized Archived Tissue (MOCAT), but it says nothing about HOW you did it: heat-induced antigen retrieval and Tween20-delipidation for centimeter-scale FFPE specimens. In summary, I strongly recommend that the acronym of the procedure refers more to the "solution" than to the "problem", and for me this is important because otherwise the acronym is not fair with present and future techniques pretending to provide a novel solution to the same problem. Another way to put it is that researchers can own their proposed solutions, but they do not own the problem to be solved.

      We appreciate the reviewer's suggestions. In response to their concerns, we have renamed the procedure presented in this study as Heat-Induced FFPE-based Tissue Clearing, with the acronym HIF-Clear. This change reflects the critical step in our procedure. Corresponding updates have also been made in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This manuscript aims to understand the biological mechanisms underlying neuropsychiatric symptoms in Parkinson's disease by characterizing subtypes of neurons in the dorsal raphe nucleus and defining their susceptibility to the degeneration of dopaminergic and adrenergic systems in the brain. This study was well-designed, the results were presented beautifully, and the manuscript was well-written. Here are some comments that may help to improve the overall quality of this work.

      We thank the reviewer for the kind comments.

      Major concerns:

      The current study utilized an intrastriatal 6-OHDA injection, which raises the possibility that the observed electrophysiological and morphological changes of DRN5-HT and DRNDA neurons (Figs 3-6) may be due to the direct effects of 6-OHDA to DRN5-HT and DRNDA neurons projecting to the dorsal striatum (at least for DRN5-HT neurons). This possibility requires further clarification and discussion.

      6-OHDA is a catecholamine neurotoxin with low selectivity for serotonin neurons. However, changes in the levels of serotonin have been observed with high doses of 6-OHDA. In our study, we used lower concentrations of 6-OHDA, which did not affect the levels of serotonin (Suppl. Fig 4D), or the number of DRN5-HT neurons (Suppl. Fig. 5B). Concerning the possible effect of 6-OHDA on DRNDA neurons, we did not observe any modification in the number of these cells in response to the administration of 6-OHDA (Suppl. Fig. 5C; lines 170-175).

      How does the loss of nigrostriatal dopamine neurons affect the electrophysiology and morphology of DRNDA neurons (Figs. 5-6)? What are the potential circuit mechanisms?

      The dopaminergic system in the midbrain and the DRN constitute two highly interconnected nuclei and hence there are multiple possible circuit mechanisms that could explain how loss of nigrostriatal dopaminergic neurons affects DRNDA neurons: First, DRNDA neurons are directly innervated by dopaminergic neurons in the SNc and VTA and hence loss of SNc inputs might evoke acute as well as homeostatic changes in DRNDA (Lin et al., 2020; Pinto et al., 2019). Second, midbrain dopaminergic neurons are in turn innervated by the DRN (Watabe-Uchida et al., 2012) and loss of postsynaptic dopaminergic neurons might affect all neuron types in the DRN that target the midbrain. Finally, GABAergic populations in the midbrain have been shown to target DRN5-HT neurons and might potentially also target other local cell types such as DRNDA (Li et al., 2019). Another possible pathway is the bidirectional connection between the striatum and the DRN (Pollak-Dorocic et al, 2014). DA depletion in the striatum may affect the GABAergic projection to the DRN and in turn modify the properties of postsynaptic DRN neurons.

      The potential circuit mechanisms are now included in the introduction (lines 58-59).

      Did these intrastriatal 6-OHDA mice exhibit nonmotor deficits (e.g., anxiety) that may be related to the observed changes in the DRN? Such behavioral data would enhance the overall conclusions of this work.

      The PD model utilized in this study displays non-motor deficits, including depression- and anxiety-like behavior (Masini et al. 2021, Ztaou et al., 2018). This is now highlighted in the manuscript (lines 167-169).

      Minor issues:

      The panels of Fig. 2 should be re-labelled to match the descriptions in the main text (L. 142-158).

      Fig.2 now matches the descriptions in the main text.

      Fig 4D was missing from the figure, which does not match the descriptions in the main text (L. 193-204).

      Fig. 4D includes the parameters describing the dendritic branching and starts with the last graph on the right in the second row of the panel.

      Line 409: Extra "as" after "average"

      Corrected in revised manuscript.

      Fig 3G: Missing asterisks.

      Corrected in revised manuscript (Fig. 3G)

      Details of how action parameters were quantified should be stated and specified in the methods.

      We have now added a section called ‘Quantification of electrophysiological parameters’ in the methods where we explain how the electrophysiological properties are defined and quantified (lines 407-439).

      "Parkinson's disease" in the title should be revised to "parkinsonism"

      Corrected in revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) Throughout the paper, there are numerous inaccuracies and inconsistencies in the figures, which impede the clear understanding of this paper. For example, there are discrepancies between the labeling of the main figures (sub-panels) and the corresponding manuscript (Figure 2, Figure 4).

      Corrected in the revised manuscript.

      The statistical presentations are inaccurate in several figures (Figure 3E, 3G), making it difficult to distinguish which data is statistically meaningful. Furthermore, the number of cells presented in each figure is ambiguous in the figure legend. It would be better to avoid expressions such as 'n = 28 - 43 cells per group', as in line 456 (Figure 1I). Please provide the exact number of cells for each graph.

      We agree with the reviewer, and we have now added the precise n numbers for each panel in the corresponding legends in Fig 1, Fig 3, and Fig 5. Please note that some analysis was restricted to recordings where neurons fired close to their average spontaneous firing frequency (e.g. 1Hz for DRN5-HT) to allow for a fair comparison of the data across groups and that therefore the n numbers vary in different panels.

      In some figures, the value of n in the graph seems different from the value of n in the figure legends (Figure 2G-I, Figure 4, Figure 6). Collectively, these inaccurate figures and the manuscript weaken the general credibility of the data presented.

      We apologize for the misunderstanding, but in the type of chosen graph, equal values are overlapped. The numbers described in the figure legend are correct.

      (2) Some of the authors' claims in this paper are not supported by quantitative analysis, but only by sample recording traces or simple descriptions. For example, in line 97, the authors mentioned, "no differences when comparing TH-positive to TH-negative neurons".

      But there are no data actually analyzing these two groups in Supplementary Figure 2A.

      In addition, in line 103, there is a claim that "DRN DA neurons showed that they share several properties characteristics of other DA populations located in the SNc and the ventral tegmental area". However, this claim is backed up only by a few sample traces in Figure 1E.

      The statement (lines 110-111), "a relative constant action potential (AP) amplitude", is also not supported by appropriate quantitative analysis but only by sample recording traces.

      In our study, we found a small subset of DAT-tdTomato-positive neurons that did not stain positive for TH after the slice recordings. In 5 of 6 of these neurons (recorded in sham), the electrophysiological properties did not differ from other TH-positive neurons. This is visualized in Suppl. Fig 2A. The absence of any statistical difference was also confirmed by a Mann-Whitney U test comparing the TH-negative to the TH-positive DRNDA neurons (no significant differences in all 6 of 6 properties shown in Suppl. Fig 2A). Additionally, all these cells were DAT-positive, further supporting their classification as dopaminergic neurons. Therefore, we suspect that the lack of TH staining is likely caused by the tissue processing itself. Please note that all our immunohistochemistry was run on slices after several hours of patch-clamping procedures. Finally, including or excluding this small subset of neurons in the present study does not change any of the results presented, and the data were therefore pooled. We have now clarified this in more detail in the results section and in Suppl. Fig 2A (lines 100-103).

      We have moved the comparison of hallmark properties found in DRNDA neurons as well as in dopaminergic neurons in the midbrain from the results section to the discussion (lines 281-283).

      The claim that DRN5HT neurons have a comparatively constant action potential amplitude compared to DRNDA neurons is supported by quantitative analysis shown in Fig 1I (left panel, “AP drop rate”), while the representative example traces are shown in Fig 1G.

      (3) In the legend of Figure 2, the mouse used in this experiment is mentioned with two different names (wild-type mice in line 463 and sham-lesion mice in line 465). Is this a mistake? Or did the authors intentionally use the brain samples from sham-lesion mice for Figure 2?

      Figure 2 shows data in control conditions (Sham-lesion in our case), both from wild-type and Dat-Tomato. The text has been changed to avoid misunderstandings.

      (4) While the primary claim of this paper is the differential alterations of DRN 5-HT and DA neurons in a mouse PD model, the observed changes in the DRN neurons of the 'DA only lesion model' are comparatively minor to the 'DA and NA lesions model'. Therefore, it looks like NA depletion has a more critical role in the DRN neurons of 6OHDA-lesion mice than DA depletion. To understand the results of this paper better, it would be great if the authors can provide additional data from the 'NA only lesion model'.

      We agree with the reviewer, and we have now added a new set of experiments in which we selectively lesioned noradrenergic cells by injecting 6-OHDA unilaterally into the LC. The new data are presented in supplementary figure 6 in the revised manuscript. We find that selective lesioning of the NA system affects DRNDA and DRN5-HT neurons mildly, suggesting that the concomitant lesion of the DA and NA systems is particularly impactful (possibly because of interactions between these two systems).

      (5) In Figure 3B and Figure 5B, only the 6-OHDA+DMI group shows significant differences from the sham group. This finding might be attributed to the effect of DMI itself, not to the nigrostriatal DA degeneration without NA degeneration. Thus, adding the 'DMI-only group' in all experiments will strengthen the conclusion of this paper.

      The effect of one acute administration of desipramine was temporally limited to the stereotactic intervention (lines 373-375), which was performed several weeks before the electrophysiological and morphological analyses. Given that the half-life of desipramine is approximately 24 h (Nagy and Johansson, 1975), we believe that its impact was limited to the neuroprotection of NA neurons from 6-OHDA toxicity.
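      The half-life argument above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes simple first-order (exponential) elimination and a three-week interval between injection and recording, which is taken from the protocol described elsewhere in this response rather than from this paragraph. The function name `remaining_fraction` is hypothetical.

```python
def remaining_fraction(elapsed_hours: float, half_life_hours: float = 24.0) -> float:
    """Fraction of a drug still present after elapsed_hours.

    Assumes first-order elimination, i.e. the amount halves every half-life.
    The 24 h default is the approximate desipramine half-life quoted in the text.
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    return 0.5 ** (elapsed_hours / half_life_hours)


# Assumed interval: three weeks between the single injection and the recordings.
three_weeks_hours = 21 * 24
fraction_left = remaining_fraction(three_weeks_hours)
print(f"Fraction remaining after 3 weeks: {fraction_left:.2e}")
```

      Under these assumptions, 21 half-lives have elapsed, so less than one millionth of the initial dose would remain at the time of the recordings.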

      (6) DRN 5-HT neurons are known to exhibit cellular heterogeneity, and in particular their electrophysiological properties are quite heterogeneous (Bernat Kocsis. 2006; J.V. Schweimer. et al. 2011). Furthermore, 5-HT neurons in the distinct subregions of the DRN display different membrane properties (LaTasha K. Crawford, 2010). Therefore, not all DRN 5-HT neurons can be regarded as electrophysiologically identical. Given that the molecular identity of all recorded cells was confirmed with neurobiotin in this paper, it would be better to show that recorded cells are not biased toward certain subregions of DRN.

      In addition, providing more comprehensive descriptions of the electrophysiological features used in PCA analysis would be beneficial in understanding the electrophysiological profiling of DRN neurons explained in this paper.

      Although several studies have revealed electrophysiological and molecular heterogeneity within the DRN5-HT population, we did not observe any significant differences within the DRN5-HT neurons recorded in this study. We compared the properties of DRN5-HT neurons recorded more anteriorly to those recorded in the posterior DRN, as well as neurons found in more ventral locations to those in more dorsal locations (data not shown). We would like to point out that the largest differences within serotonergic neuron populations described by previous studies were often found when comparing those located in the median raphe nucleus (MRN) to those found in the DRN. Calizo et al. (2011) showed, for example, significant differences in the input resistance and AHP amplitude between MRN5-HT and DRN5-HT neurons. These two properties, as well as the AP amplitude, AP threshold, AP duration, and tau, did not, however, differ between DRN subregions in their study or in ours. We extended our Suppl. Fig 1 and mapped the location of DRN5-HT and DRNDA neurons recorded in sham (Suppl. Fig 1D).

      Overall, we’ve sampled neurons along the anterior-posterior and dorsal-ventral axes of the DRN, while on the medial-lateral axis, recorded DRN neurons were located medially.

      We agree with the reviewer that a comprehensive description of the electrophysiological features was missing in the manuscript, and we have therefore added a new section in the materials and methods where we explain in detail how each parameter was measured and analyzed (‘Quantification of electrophysiological parameters’, lines 407-439). This section also provides detailed information about the five properties underlying the PCA shown in figure 1 (i.e. delay to the first action potential, action potential drop rate, action potential rise time, duration of the afterhyperpolarization, and capacitance).

      (7) Some sample images presented in this paper contain information that can conflict with the previous research. In Figures 4B and 6B, TH expression was significantly increased in the DMI pretreatment group compared to the control group. However, several studies have shown that the administration of DMI decreases TH expression levels (Komori et al.1992; Nestler et al.1990). Therefore, it would be great if the authors further explained how the pretreatment of DMI with 6-OHDA affects TH level within the DRN.

      Figures 4B and 6B do not show any quantification of TH expression. The difference observed in the representative pictures is incidental and due to the variable expression of TH across the slice. Moreover, as mentioned in the response to point 5, mice were subjected to a single injection of DMI immediately preceding the stereotactic intervention (lines 373-375). In contrast, the decrease in TH expression reported by Komori et al. 1992 and Nestler et al. 1990 was observed in response to chronic (two weeks) administration of DMI.

      (8) This paper lacks direct evidence to demonstrate whether DMI pretreatment could effectively protect against NA depletion. Therefore, in addition to TH expression levels, it is important to provide data to confirm the intact NA levels (or NA axons) after DMI treatment.

      NA levels in the striatum were measured by enzyme-linked immunosorbent assay (ELISA) and are reported in Suppl. Fig. 4 in the revised manuscript.

      (9) It would be great if the authors specifically explained why 6-OHDA was injected into the striatum (neither MFB nor SNc) to make a mouse model of PD.

      Mice were injected in the dorsal striatum to produce a partial bilateral lesion of the dopamine and noradrenaline systems. This model reproduces the initial stages of PD and also recapitulates several non-motor symptoms of PD, including affective disorders, which may be related to changes in serotonergic and dopaminergic transmission in the dorsal raphe. In contrast, injections in the MFB and SNc quickly produce a severe motor phenotype closer to a late stage of the disease and cannot be done bilaterally.

      The striatal model has been successfully used in other publications (Kravitz et al., 2010, Masini et al., 2021, Ztaou et al., 2018, Chen et al., 2014, Branchi et al., 2008, Marques et al. 2019, Tadaiesky et al., 2008, Matheus et al., 2016, Silva et al., 2016).

      (10) Supplementary Figures 2 and 3 were erroneously cut on the right side. These figure images should be replaced with the correct ones.

      We thank the reviewer for noticing and we have now replaced the figures with the correct ones.

      (11) There should be more explanations about tdTomato-positive but non-TH neurons in Supplementary Figure 2. It is strange to regard TH-negative neurons as DA neurons although these neurons have DA neuron-like electrophysiological properties. If these tdTomato-positive but non-TH neurons cannot release DA, can we say these are DA neurons?

      In our study, we found a small subset of DAT-tdTomato-positive neurons that did not stain positive for TH afterwards. In 5 of 6 of these neurons (recorded in sham), the electrophysiological properties did not differ from other TH-positive neurons. This is visualized in Suppl. Fig 2A. The absence of any statistical difference was also confirmed by a Mann-Whitney U test comparing the TH-negative to the TH-positive DRNDA neurons (no significant differences in all 6 of 6 properties shown in Suppl. Fig 2A). Additionally, all these cells were DAT-positive, further supporting their classification as dopaminergic neurons. Therefore, we suspect that the lack of TH staining is likely caused by the tissue processing itself. Please note that all our immunohistochemistry was run on slices after several hours of patch-clamping procedures. Finally, including or excluding this small subset of neurons in the present study does not change any of the results presented, and the data were therefore pooled. We have now clarified this in more detail in the results section and in Suppl. Fig 2A (lines 100-103).

      Reviewer #3 (Recommendations For The Authors):

      The authors report using a parametric statistical test, the t-test. The t-test makes the assumption that the data are normally distributed. Most biological data is not distributed normally, and with smaller datasets, it is difficult to say whether the underlying distribution would be normally distributed. I would recommend using the non-parametric versions of the same test (eg Mann-Whitney U test), which is likely to give a similar result while being more conservative given the potential for non-normal distribution.

      All electrophysiological data were first tested for normality before running the corresponding statistical test (either a t-test for normally distributed data or a Mann-Whitney U test for non-normally distributed data). The morphological data are now analyzed with the Mann-Whitney U test (lines 484-494).
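      For readers less familiar with the non-parametric branch of this procedure, the following is a minimal sketch of a two-sided Mann-Whitney U test. It is not the authors' analysis code: the response does not state which software or which normality test was used, so the normality step is omitted here. The p-value uses the normal approximation with a tie correction and no continuity correction, which is only rough for small samples, and the example values are invented for illustration.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the observations, remembering which group each one came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0

    # Assign average ranks to runs of tied values.
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical
        return u, 1.0

    z = (u - mean_u) / math.sqrt(var_u)  # z <= 0 because u is the smaller U
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p


# Invented example values, for illustration only.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, round(p_value, 3))
```

      In practice, an established statistics package with exact small-sample p-values would be preferable to this hand-rolled approximation.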

      The authors state that mice were treated with 6-OHDA at 3 months, then brain slices were prepared 3 weeks later, making them about 4 months old. I could not find the age of sham/control mice and 6-OHDA/desipramine mice in the methods section. Were sham/controls and 6-OHDA slices prepared in an interleaved fashion?

      Sham and 6-OHDA+DMI mice underwent surgery at 3 months and the brain slices were prepared 3 weeks later, as for the 6-OHDA mice. We have now clarified this in the methods (line 381).

      While desipramine is relatively selective as a norepinephrine reuptake inhibitor, it also can prevent serotonin reuptake. Could this mechanism also protect DRN neurons from the effects of 6-OHDA?

      Although desipramine has some affinity for the serotonin transporter, this affinity is 100-fold lower than its affinity for the noradrenaline transporter (Richelson and Pfenning, 1984; Gillman, 2007). Moreover, in our study, the 6-OHDA injection in the dorsal striatum did not cause any direct damage to DRN5-HT neurons, as shown by the 5-HT measurement and DRN5-HT counting (Suppl. Fig. 4D, Suppl. Fig. 5A,B). We can therefore exclude that the effects observed in the DMI+6-OHDA group are related to a protection of the serotonergic system exerted by a single injection of desipramine.

      On line 168, the authors use the abbreviation NA for noradrenergic. Was this abbreviation previously defined in the manuscript?

      Yes, the abbreviation is defined in the introduction (line 73).

      On line 45, the authors cite that the DRN-5HT subpopulation accounts for 30-50% of the DRN neurons. It would be helpful to know approximately what percentage of the DRN neurons belong to the DRNDA subpopulation as well.

      To the best of our knowledge, there is unfortunately no detailed analysis of the prevalence of DRNDA neurons in mice available. Previous studies in rats have estimated that this population comprises around 1000 neurons (Descarries et al., 1986). According to Calizo et al. (2011), the number of any non-serotonergic neuron population (releasing dopamine or other neurotransmitters) in the DRN is one third to one tenth less than the number of DRN5-HT neurons. But please note that this study was also performed in rats (line 55).

      While I appreciate that the authors did not over-interpret their findings, it would be useful to comment (in the Discussion) on how their findings could/should be used in interpreting other studies using 6-OHDA, as well as the relationship of their findings to loss of 5-HT and/or DRN neurons in Parkinson's Disease itself.

      In the manuscript, we refer to the utility of the 6-OHDA model for the study of a wide range of non-motor symptoms. We have now described, in this model, how the loss of midbrain dopaminergic and noradrenergic neurons affects the electrophysiological and morphological properties of DRN5-HT and DRNDA neurons. This information will allow for a more precise assessment of the mechanisms involved in the affective and cognitive aspects of PD symptomatology (lines 354-356).

    1. Author Response

      We are writing this response letter with regard to the insightful feedback you provided on our manuscript titled "A metabolic modeling-based framework for predicting trophic dependencies in native rhizobiomes of crop plants", submitted for consideration in eLife.

      We sincerely appreciate the thorough and constructive reviews, which recognize the intentions behind our work. We intend to fully address all points raised by the reviewers in our revised manuscript. Specifically, we plan to incorporate targeted revisions to address concerns raised during the review process, with a focus on benchmarking and validating our framework to enhance its reliability and accuracy.

      We believe that the current revision would improve the consistency and quality of the framework, making it a suitable tool for the characterization of microbial trophic interactions in diverse biological landscapes.

      Thank you once again for both your time and dedication in reviewing our manuscript, as well as the constructive review.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      (1) Substantial revision of the claims and interpretation of the results is needed, especially in the setting of additional data showing enhanced erythrophagocytosis with decreased RBC lifespan.

      Thank you for your valuable feedback and suggestion for a substantial revision of the claims and interpretation of our results. We acknowledge the importance of considering additional data that shows enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have revised our manuscript and incorporated additional experimental data to support and clarify our findings.

      (1) In our original manuscript, we reported a decrease in the number of splenic red pulp macrophages (RPMs) and phagocytic erythrocytes after hypobaric hypoxia (HH) exposure. This conclusion was primarily based on our observations of reduced phagocytosis in the spleen.

      (2) Additional experimental data on RBC labeling and erythrophagocytosis:

      • Experiment 1 (RBC labeling and HH exposure)

      We conducted an experiment where RBCs from mice were labeled with PKH67 and injected back into the mice. These mice were then exposed to normobaric normoxia (NN) or HH for 7 or 14 days. The subsequent assessment of RPMs in the spleen using flow cytometry and immunofluorescence detection revealed a significant decrease in both the population of splenic RPMs (F4/80hiCD11blo, new Figure 5A and C) and PKH67-positive macrophages after HH exposure (as depicted in new Figure 5A and C-E). This finding supports our original claim of reduced phagocytosis under HH conditions.

      Author response image 1.

      • Experiment 2 (erythrophagocytosis enhancement)

      To examine the effects of enhanced erythrophagocytosis, we injected Tuftsin after administering PKH67-labelled RBCs. Our observations showed a significant decrease in PKH67 fluorescence in the spleen, particularly after Tuftsin injection compared to the NN group. This result suggests a reduction in RBC lifespan when erythrophagocytosis is enhanced (illustrated in new Figure 7, A-B).

      Author response image 2.

      (3) Revised conclusions:

      • The additional data from these experiments support our original findings by providing a more comprehensive view of the impact of HH exposure on splenic erythrophagocytosis.

      • The decrease in phagocytic RPMs and phagocytic erythrocytes after HH exposure, along with the observed decrease in RBC lifespan following enhanced erythrophagocytosis, collectively suggest a more complex interplay between hypoxia, erythrophagocytosis, and RBC lifespan than initially interpreted.

      We think that these revisions and additional experimental data provide a more robust and detailed understanding of the effects of HH on splenic erythrophagocytosis and RBC lifespan. We hope that these changes adequately address the concerns raised and strengthen the conclusions drawn in our manuscript.

      (2) F4/80 high; CD11b low are true RPMs which the cells which the authors are presenting, i.e. splenic monocytes / pre-RPMs. To discuss RPM function requires the presentation of these cells specifically rather than general cells in the proper area of the spleen.

      Thank you for your feedback requesting a substantial revision of our claims and interpretation, particularly considering additional data showing enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have thoroughly revised our manuscript and included new experimental data that further elucidate the effects of HH on RPMs and erythrophagocytosis.

      (1) Re-evaluation of RPMs population after HH exposure:

      • Flow cytometry analysis (new Figure 3G, Figure 5A and B): We revisited the analysis of RPMs (F4/80hiCD11blo) in the spleen after 7 and 14 days of HH exposure. Our revised flow cytometry data consistently showed a significant decrease in the RPMs population post-HH exposure, reinforcing our initial findings.

      Author response image 3.

      Author response image 4.

      • In situ expression of RPMs (Figure S1, A-D):

      We further confirmed the decreased population of RPMs through in situ co-staining with F4/80 and CD11b, and F4/80 and CD68, in spleen tissues. These results clearly demonstrated a significant reduction in F4/80hiCD11blo (Figure S1, A and B) and F4/80hiCD68hi (Figure S1, C and D) cells following HH exposure.

      Author response image 5.

      (2) Single-cell sequencing analysis of splenic RPMs:

      • We conducted a single-cell sequencing analysis of spleen samples post 7 days of HH exposure (Figure S2, A-C). This analysis revealed a notable shift in the distribution of RPMs, predominantly associated with Cluster 0 under NN conditions, to a reduced presence in this cluster after HH exposure.

      • Pseudo-time series analysis indicated a transition pattern change in spleen RPMs, with a shift from Cluster 2 and Cluster 1 towards Cluster 0 under NN conditions, and a reverse transition following HH exposure (Figure S2, B and D). This finding implies a decrease in resident RPMs in the spleen under HH conditions.

      (3) Consolidated findings and revised interpretation:

      • The comprehensive analysis of flow cytometry, in situ staining, and single-cell sequencing data consistently indicates a significant reduction in the number of RPMs following HH exposure.

      • These findings, taken together, strongly support the revised conclusion that HH exposure leads to a decrease in RPMs in the spleen, which in turn may affect erythrophagocytosis and RBC lifespan.

      Author response image 6.

      In conclusion, our revised manuscript now includes additional experimental data and analyses, strengthening our claims and providing a more nuanced interpretation of the impact of HH on spleen RPMs and related erythrophagocytosis processes. We believe these revisions and additional data address your concerns and enhance the scientific validity of our study.

      (3) RBC retention in the spleen should be measured anyway quantitatively, eg, with proper flow cytometry, to determine whether it is increased or decreased.

      Thank you for your query regarding the quantitative measurement of RBC retention in the spleen, particularly in relation to HH exposure. We have utilized a combination of techniques, including flow cytometry and histological staining, to investigate this aspect comprehensively. Below is a summary of our findings and methodology.

      (1) Flow cytometry analysis of labeled RBCs:

      • Our study employed both NHS-biotin (new Figure 4, A-D) and PKH67 labeling (new Figure 4, E-H) to track RBCs in mice exposed to HH. Flow cytometry results from these experiments (new Figure 4, A-H) showed a decrease in the proportion of labeled RBCs over time, both in the blood and spleen. Notably, the reduction in fluorescently labeled RBCs was significantly greater after NN exposure than after HH exposure, in both blood and spleen. The observed decrease in labeled RBCs was initially counterintuitive, as we expected an increase in RBC retention due to reduced erythrophagocytosis. However, this decrease can be attributed to the significantly increased production of RBCs following HH exposure, which dilutes the proportion of labeled cells.

      • Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure. These findings suggest that HH exposure leads to a decrease in the clearance rate of RBCs.

      Author response image 7.

      (2) Detection of erythrophagocytosis in spleen:

      To assess erythrophagocytosis directly, we labeled RBCs with PKH67 and analyzed their uptake by splenic macrophages (F4/80hi) after HH exposure. Our findings (new Figure 5, D-E) indicated a decrease in PKH67-positive macrophages in the spleen, suggesting reduced erythrophagocytosis.

      Author response image 8.

      (3) Flow cytometry analysis of RBC retention:

      Our flow cytometry analysis revealed a decrease in PKH67-positive RBCs in both blood and spleen (Figure S4). We postulated that this was due to increased RBC production after HH exposure. However, this method might not accurately reflect RBC retention, as it measures the proportion of PKH67-labeled RBCs relative to the total number of RBCs, which increased after HH exposure.
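      The dilution effect described above can be illustrated with a small worked example. The counts below are hypothetical and chosen only to show the arithmetic: the number of labeled cells is held constant (no clearance at all), yet the measured labeled fraction still falls because newly produced, unlabeled cells enlarge the denominator.

```python
def labeled_fraction(labeled_cells: float, total_cells: float) -> float:
    """Proportion of labeled RBCs among all RBCs, as a flow cytometry gate would report it."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= labeled_cells <= total_cells:
        raise ValueError("labeled_cells must be between 0 and total_cells")
    return labeled_cells / total_cells


# Hypothetical counts (arbitrary units), not data from the study.
labeled = 100            # labeled RBCs, unchanged: no clearance in this scenario
total_before = 1000      # total RBCs before exposure
newly_produced = 250     # unlabeled RBCs added by increased erythropoiesis

before = labeled_fraction(labeled, total_before)
after = labeled_fraction(labeled, total_before + newly_produced)
print(f"before: {before:.1%}, after: {after:.1%}")
```

      In this scenario the labeled fraction drops from 10% to 8% even though no labeled cell was removed, which is why a falling proportion alone cannot distinguish clearance from dilution.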

      Author response image 9.

      (4) Histological and immunostaining analysis:

      Histological examination using HE staining and band3 immunostaining in situ (new Figure 6, A-D, and G-H) revealed a significant increase in RBC numbers in the spleen after HH exposure. This was further confirmed by detecting retained RBCs in splenic single cells using Wright-Giemsa composite stain (new Figure 6, E and F) and retained PKH67-labelled RBCs in spleen (new Figure 6, I and J).

      Author response image 10.

      (5) Interpreting the data:

      The comprehensive analysis suggests a complex interplay between increased RBC production and decreased erythrophagocytosis in the spleen following HH exposure. While flow cytometry indicated a decrease in the proportion of labeled RBCs, histological and immunostaining analyses demonstrated an actual increase in RBC retention in the spleen. These findings collectively suggest that while overall RBC production is upregulated following HH exposure, the spleen's capacity for erythrophagocytosis is concurrently diminished, leading to increased RBC retention.

      (6) Conclusion:

      Taken together, our results indicate a significant increase in RBC retention in the spleen post-HH exposure, likely due to reduced residual RPMs and erythrophagocytosis. This conclusion is supported by a combination of flow cytometry, histological staining, and immunostaining techniques, providing a comprehensive view of RBC dynamics under HH conditions. We think these findings offer a clear quantitative measure of RBC retention in the spleen, addressing the concerns raised in your question.

      (4) Numerous other methodological problems as listed below.

      We appreciate your question, which highlights the importance of using multiple analytical approaches to understand complex physiological processes. Please find below our point-by-point response to the methodological comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) Decreased BM and spleen monocytes due to increased liver monocyte migration is unclear. There is no evidence that this happens or why it would be a reasonable hypothesis, even in splenectomized mice.

      Thank you for highlighting the need for further clarification and justification of our hypothesized decrease in BM and spleen monocytes due to increased monocyte migration to the liver, particularly in the context of splenectomized mice. Indeed, our study has not explicitly verified an augmentation in mononuclear cell migration to the liver in splenectomized mice.

      Nonetheless, our investigations have revealed a notable increase in monocyte migration to the liver after HH exposure. Noteworthy is our discovery of a significant upregulation in colony stimulating factor-1 (CSF-1) expression in the liver, observed after both 7 and 14 days of HH exposure (data not included). This observation was substantiated through flow cytometry analysis (as depicted in Figure S4), which affirmed an enhanced migration of monocytes to the liver. Specifically, we noted a considerable increase in the population of transient macrophages, monocytes, and Kupffer cells in the liver following HH exposure.

      Author response image 11.

      Considering these findings, we hypothesize that hypoxic conditions may activate a compensatory mechanism that directs monocytes towards the liver, potentially linked to the liver’s integral role in the systemic immune response. In accordance with these insights, we intend to revise our manuscript to reflect the speculative nature of this hypothesis more accurately, and to delineate the strategies we propose for its further empirical investigation. This amendment ensures that our hypothesis is presented with full consideration of its speculative basis, supported by a coherent framework for future validation.

      (2) While F4/80+CD11b+ population is decreased, this is mainly driven by CD11b and F4/80+ alone population is significantly increased. This is counter to the hypothesis.

      Thank you for addressing the apparent discrepancy in our findings concerning the F4/80+CD11b+ population and the increase in the F4/80+ alone population, which seems to contradict our initial hypothesis. Your observation is indeed crucial for the integrity of our study, and we appreciate the opportunity to clarify this matter.

      (1) Clarification of flow cytometry results:

      • In response to the concerns raised, we revisited our flow cytometry experiments with a focus on more clearly distinguishing the cell populations. Our initial graph had some ambiguities in cell grouping, which might have led to misinterpretations.

      • The revised flow cytometry analysis, specifically aimed at identifying red pulp macrophages (RPMs) characterized as F4/80hiCD11blo in the spleen, demonstrated a significant decrease in the F4/80 population. This finding is now in alignment with our immunofluorescence results.

      Author response image 12.

      Author response image 13.

      (2) Revised data and interpretation:

      • The results presented in new Figure 3G and Figure 5 (A and B) consistently indicate a notable reduction in the RPM population following HH exposure. This supports our revised understanding that HH exposure leads to a decrease in the specific macrophage subset (F4/80hiCD11blo) in the spleen.

      We’ve updated our manuscript to reflect these new findings and interpretations. The revised manuscript details the revised flow cytometry analysis and discusses the potential mechanisms behind the observed changes in macrophage populations.

      (3) HO-1 expression cannot be used as a surrogate to quantify number of macrophages as the expression per cell can decrease and give the same results. In addition, the localization of effect to the red pulp is not equivalent to an assertion that the conclusion applies to macrophages given the heterogeneity of this part of the organ and the spleen in general.

      Thank you for your insightful comments regarding the use of HO-1 expression as a surrogate marker for quantifying macrophage numbers, and for pointing out the complexity of attributing changes in HO-1 expression specifically to macrophages in the splenic red pulp. Your observations are indeed valid and warrant a detailed response.

      (1) Role of HO-1 in macrophage activity:

      • In our study, HO-1 expression was not utilized as a direct marker for quantifying macrophages. Instead, it was considered an indicator of macrophage activity, particularly in relation to erythrophagocytosis. HO-1, being upregulated in response to erythrophagocytosis, serves as an indirect marker of this process within splenic macrophages.

      • The rationale behind this approach was that increased HO-1 expression, induced by erythrophagocytosis in the spleen’s red pulp, could suggest an augmentation in the activity of splenic macrophages involved in this process.

      (2) Limitations of using HO-1 as an indicator:

      • We acknowledge your point that HO-1 expression per cell might decrease, potentially leading to misleading interpretations if used as a direct quantifier of macrophage numbers. The variability in HO-1 expression per cell indeed presents a limitation in using it as a sole indicator of macrophage quantity.

      • Furthermore, your observation about the heterogeneity of the spleen, particularly the red pulp, is crucial. The red pulp is a complex environment with various cell types, and asserting that changes in HO-1 expression are exclusive to macrophages could oversimplify this complexity.

      (3) Addressing the concerns:

      • To address these concerns, we propose to supplement our HO-1 expression data with additional specific markers for macrophages. This would help in correlating HO-1 expression more accurately with macrophage numbers and activity.

      • We also plan to conduct further studies to delineate the specific cell types in the red pulp contributing to HO-1 expression. This could involve techniques such as immunofluorescence or immunohistochemistry, which would allow us to localize HO-1 expression to specific cell populations within the splenic red pulp.

      We’ve revised our manuscript to clarify the role of HO-1 expression as an indirect marker of erythrophagocytosis and to acknowledge its limitations as a surrogate for quantifying macrophage numbers.

      (4) line 63-65 is inaccurate as red cell homeostasis reaches a new steady state in chronic hypoxia.

      Thank you for pointing out the inaccuracy in lines 63-65 of our manuscript regarding red cell homeostasis in chronic hypoxia. Your feedback is invaluable in ensuring the accuracy and scientific integrity of our work. We’ve revised lines 63-65 to reflect that red cell homeostasis reaches a new steady state in chronic hypoxia.

      (5) Eryptosis is not defined in the manuscript.

      Thank you for highlighting the omission of a definition for eryptosis in our manuscript. We acknowledge the significance of precisely defining such key terminologies, particularly when they play a crucial role in the context of our research findings. Eryptosis, a term referenced in our study, is a specialized form of programmed cell death unique to erythrocytes. Similar to apoptosis in other cell types, eryptosis is characterized by distinct physiological changes including cell shrinkage, membrane blebbing, and the externalization of phosphatidylserine on the erythrocyte surface. These features are indicative of the RBC lifecycle and its regulated destruction process.

      However, it is pertinent to note that our current study does not extensively delve into the mechanisms or implications of eryptosis. Our primary focus has been to elucidate the effects of HH exposure on the processes of splenic erythrophagocytosis and the resultant impact on the lifespan of RBCs. Given this focus, and to maintain the coherence and relevance of our manuscript, we have decided to exclude specific discussions of eryptosis from our revised manuscript. This decision aligns with our aim to provide a clear and concentrated exploration of the influence of HH exposure on RBC dynamics and splenic function.

      We appreciate your input, which has significantly contributed to enhancing the clarity and accuracy of our manuscript. The revision ensures that our research is presented with a focused scope, aligning closely with our experimental investigations and findings.

      (6) Physiologically, there is no evidence that there is any "free iron" in cells, making line 89 point inaccurate.

      Thank you for highlighting the concern regarding the reference to "free iron" in cells in line 89 of our manuscript. The term "free iron" in our manuscript was intended to refer to divalent iron (Fe2+), rather than unbound iron ions freely circulating within cells. We acknowledge that the term "free iron" might lead to misconceptions, as it implies the presence of unchelated iron, which is not physiologically common due to the potential for oxidative damage. To rectify this and provide clarity, we’ve revised line 89 of our manuscript to reflect our meaning more accurately. Instead of "free iron," we use "divalent iron (Fe2+)" to avoid any misunderstanding regarding the state of iron in cells. We also ensure that any implications drawn from the presence of Fe2+ in cells are consistent with current scientific literature and understanding.

      (7) Fig 1f no stats

      We appreciate your critical review and suggestions, which help in improving the accuracy and clarity of our research. We’ve revised the statistical chart for the new Figure 1F.

      (8) Splenectomy experiments demonstrate that erythrophagocytosis is almost completely replaced by functional macrophages in other tissues (likely Kupffer cells in the liver). there is only a minor defect and no data on whether it is in fact the liver or other organs that provide this replacement function and makes the assertions in lines 345-349 significantly overstated.

      Thank you for your critical assessment of our interpretation of the splenectomy experiments, especially concerning the role of erythrophagocytosis by macrophages in other tissues, such as Kupffer cells in the liver. We appreciate your observation that our assertions may be overstated and acknowledge the need for more specific data to identify which organs compensate for the loss of splenic erythrophagocytosis.

      (1) Splenectomy experiment findings:

      • Our findings in Figure 2D do indicate that in the splenectomized group under NN conditions, erythrophagocytosis is substantially compensated for by functional macrophages in other tissues. This is an important observation that highlights the body's ability to adapt to the loss of splenic function.

      • However, under HH conditions, our data suggest that the spleen plays an important role in managing erythrocyte turnover, as indicated by the significant impact of splenectomy on erythrophagocytosis and subsequent erythrocyte dynamics.

      (2) Addressing the lack of specific organ identification:

      • We acknowledge that our study does not definitively identify which organs, such as the liver or others, take over the erythrophagocytosis function post-splenectomy. This is an important aspect that needs further investigation.

      • To address this, we plan to perform additional experiments that could pinpoint the specific tissues compensating for the loss of splenic erythrophagocytosis. This could involve tracking labeled erythrocytes or using specific markers to identify macrophages actively engaged in erythrophagocytosis in various organs.

      (3) Revising manuscript statements:

      Considering your feedback, we’ve revised the statements in lines 345-349 (lines 378-383 in revised manuscript) to enhance the scientific rigor and clarity of our research presentation.

      (9) M1 vs M2 macrophage experiments are irrelevant to the main thrust of the manuscript, there are no references to support the use of only CD16 and CD86 for these purposes, and no stats are provided. It is also unclear why bone marrow monocyte data is presented and how it is relevant to the rest of the manuscript.

      Thank you for your critical evaluation of the relevance and presentation of the M1 vs. M2 macrophage experiments in our manuscript. We appreciate your insights, especially regarding the use of specific markers and the lack of statistical analysis, as well as the relevance of bone marrow monocyte data to our study's main focus.

      (1) Removal of M1 and M2 macrophage data:

      Based on your feedback and our reassessment, we agree that the results pertaining to M1 and M2 macrophages did not align well with the main objectives of our manuscript. Consequently, we have decided to remove the related content on M1 and M2 macrophages from the revised manuscript. This decision was made to ensure that our manuscript remains focused and coherent, highlighting our primary findings without the distraction of unrelated or insufficiently supported data.

      The use of only CD16 and CD86 markers for M1 and M2 macrophage characterization, without appropriate statistical analysis, was indeed a methodological limitation. We recognize that a more comprehensive set of markers and rigorous statistical analysis would be necessary for a meaningful interpretation of M1/M2 macrophage polarization. Furthermore, the relevance of these experiments to the central theme of our manuscript was not adequately established. Our study primarily focuses on erythrophagocytosis and red pulp macrophage dynamics under hypobaric hypoxia, and the M1/M2 polarization aspect did not contribute significantly to this narrative.

      (2) Clarification on bone marrow monocyte data:

      Regarding the inclusion of bone marrow monocyte data, we acknowledge that its relevance to the main thrust of the manuscript was not clearly articulated. In the revised manuscript, we provide a clearer rationale for its inclusion and how it relates to our primary objectives.

      (3) Commitment to clarity and relevance:

      We are committed to ensuring that every component of our manuscript contributes meaningfully to our overall objectives and research questions. Your feedback has been instrumental in guiding us to streamline our focus and present our findings more effectively.

      We appreciate your valuable feedback, which has led to a more focused and relevant presentation of our research. These changes enhance the clarity and impact of our manuscript, ensuring that it accurately reflects our key research findings.

      (10) Biotinolated RBC clearance is enhanced, demonstrating that RBC erythrophagocytosis is in fact ENHANCED, not diminished, calling into question the founding hypothesis that the manuscript proposes.

      Thank you for your critical evaluation of our data on biotinylated RBC clearance, which suggests enhanced erythrophagocytosis under HH conditions. This observation indeed challenges our founding hypothesis that erythrophagocytosis is diminished in this setting. Below is a summary of our findings and methodology.

      (1) Interpretation of RBC labeling results:

      Both the earlier results with NHS-biotin-labeled RBCs (new Figure 4, A-D) and the current results with PKH67-labeled RBCs (new Figure 4, E-H) showed that the number of labeled RBCs decreased with increasing time after injection. RBC production in the bone marrow and spleen increased significantly following HH exposure, so the proportion of labeled RBCs detected by flow cytometry in both the blood and spleen of HH mice was consistently lower than in the NN group. However, the magnitude of the decrease in fluorescently labeled RBCs in blood and spleen was significantly smaller under HH exposure than under NN exposure. Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure.
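
      The comparison can be made explicit with a short calculation. The Python sketch below is only an illustration built from the percentages quoted above; the dictionary layout and the helper name `hh_to_nn_ratio` are assumptions introduced here, not part of the original analysis. It expresses each HH decline as a fraction of the matching NN decline.

```python
# Declines in labeled RBCs quoted in the response (percent),
# keyed by (tissue, label). The layout is illustrative only.
declines = {
    ("blood", "biotin"): {"NN": 12.06, "HH": 7.82},
    ("blood", "PKH67"): {"NN": 9.70, "HH": 4.09},
    ("spleen", "biotin"): {"NN": 3.13, "HH": 0.46},
    ("spleen", "PKH67"): {"NN": 1.16, "HH": 0.92},
}


def hh_to_nn_ratio(entry):
    """Return the HH decline as a fraction of the NN decline."""
    return entry["HH"] / entry["NN"]


# One ratio per (tissue, label) pair; a value below 1 means the
# decline was smaller under HH than under NN.
ratios = {key: hh_to_nn_ratio(value) for key, value in declines.items()}

for (tissue, label), ratio in ratios.items():
    print(f"{tissue:6s} {label:6s} HH/NN = {ratio:.2f}")
```

      Every ratio is below 1, which is the pattern described in the text: in both tissues and with both labels, the decline in labeled RBCs was smaller under HH than under NN.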

      Author response image 14.

      (2) Increased RBCs production under HH conditions:

      It's important to note that RBC production, including from bone marrow and spleen, was significantly increased following HH exposure. This increase in RBC production could contribute to the decreased proportion of labeled RBCs observed in flow cytometry analyses, as there are more unlabeled RBCs diluting the proportion of labeled cells in the blood and spleen.

      (3) Analysis of erythrophagocytosis in RPMs:

      Our analysis of PKH67-labeled RBC content within RPMs following HH exposure showed a significant reduction in the number of PKH67-positive RPMs in the spleen (new Figure 5). This finding suggests a decrease in erythrophagocytosis by RPMs under HH conditions.

      Author response image 15.

      (4) Reconciling the findings:

      The apparent contradiction between enhanced RBC clearance (suggested by the reduced proportion of labeled RBCs) and reduced erythrophagocytosis in RPMs (indicated by fewer PKH67-positive RPMs) may be explained by the increased overall production of RBCs under HH. This increased production could mask the actual erythrophagocytosis activity in terms of the proportion of labeled cells. Therefore, while the proportion of labeled RBCs decreases more significantly under HH conditions, this does not necessarily indicate an enhanced erythrophagocytosis rate, but rather an increased dilution effect due to higher RBC turnover.
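
      The dilution argument can be illustrated with a toy calculation. The Python sketch below is a hypothetical one-compartment model, not a fit to our data: the function name `simulate`, the daily clearance fractions, and the production values are all arbitrary assumptions chosen only to show that a lower labeled proportion can coexist with slower clearance when production of unlabeled cells is higher.

```python
def simulate(days, clearance, production, labeled0=10.0, unlabeled0=90.0):
    """Toy model of labeled-RBC dilution (arbitrary units, hypothetical).

    Each day a fraction `clearance` of all cells is removed and
    `production` new unlabeled cells are added. Returns the number of
    labeled cells remaining and their fraction of the total.
    """
    labeled, unlabeled = labeled0, unlabeled0
    for _ in range(days):
        labeled *= 1.0 - clearance
        unlabeled = unlabeled * (1.0 - clearance) + production
    return labeled, labeled / (labeled + unlabeled)


# Hypothetical parameter sets: the "HH-like" case has slower clearance
# but higher production of new, unlabeled cells.
nn_left, nn_fraction = simulate(days=14, clearance=0.02, production=2.0)
hh_left, hh_fraction = simulate(days=14, clearance=0.01, production=6.0)

print(f"NN-like: labeled cells left = {nn_left:.2f}, fraction = {nn_fraction:.3f}")
print(f"HH-like: labeled cells left = {hh_left:.2f}, fraction = {hh_fraction:.3f}")
```

      With these assumed values, more labeled cells survive in the HH-like case, yet they make up a smaller fraction of the total because of the larger influx of unlabeled cells. The labeled proportion alone therefore cannot distinguish faster clearance from greater dilution.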

      (5) Revised interpretation and manuscript changes:

      Given these factors, we have updated our manuscript to reflect this detailed interpretation and to clarify the implications of the increased RBC production under HH conditions for our observations of labeled RBC clearance and erythrophagocytosis. We appreciate your insightful feedback, which has prompted a careful re-examination of our data and interpretations. We hope that these revisions provide a more accurate and comprehensive understanding of the effects of HH on erythrophagocytosis and RBC dynamics.

      (11) Legend in Fig 4c-4d looks incorrect and Fig 4e-4f is very non-specific since Wright stain does not provide evidence of what type of cells these are and making for a significant overstatement in the contribution of this data to "confirming" increased erythrophagocytosis in the spleen under HH exposure (line 395-396).

      Thank you for your insightful observations regarding the data presentation and figure legends in our manuscript, particularly in relation to Figure 4 (renamed as Figure 6 in the revised manuscript) and the use of Wright-Giemsa composite staining. We appreciate your constructive feedback and acknowledge the importance of presenting our data with utmost clarity and precision.

      (1) Amendments to Figure legends:

      We recognize the necessity of rectifying inaccuracies in the legends of the previously labeled Figure 4C and D. We have corrected the legends so that they accurately describe the data presented. Additionally, we acknowledge the error concerning the description of Wright staining. The method employed in our study is Wright-Giemsa composite staining, which, unlike Wright staining, which stains only cytoplasm (RBCs), stains both nuclei and cytoplasm.

      (2) Addressing the specificity of Wright-Giemsa Composite staining:

      Our approach involved quantifying RBC retention using Wright-Giemsa composite staining on single splenic cells post-perfusion at 7 and 14 days post HH exposure. We understand and appreciate your concerns regarding the nonspecific nature of Wright staining. Although Wright stain is a general hematologic stain and not explicitly specific for certain cell types, its application in our study aimed to provide preliminary insights. The spleen cells, devoid of nuclei and thus likely to be RBCs, were stained and observed post-perfusion, indicating RBC retention within the spleen.

      (3) Incorporating additional methods for RBC identification:

      To enhance the specificity of our findings, we integrated supplementary methods for RBC identification in the revised manuscript. We employed band3 immunostaining (in the new Figure 6, C-D and G-H) and PKH67 labeling (Figure 6, I-J) for a more targeted identification of RBCs. Band3, serving as a reliable marker for RBCs, augments the specificity of our immunostaining approach. Likewise, PKH67 labeling affords a direct and definitive means to assess RBC retention in the spleen following HH exposure.

      Author response image 16 (same as Author response image 10).

      (4) Revised interpretation and manuscript modifications:

      Based on these enhanced methodologies, we have refined our interpretation of the data and accordingly updated the manuscript. The revised narrative underscores that our conclusions regarding reduced erythrophagocytosis and RBC retention under HH conditions are corroborated by not only Wright-Giemsa composite staining but also by band3 immunostaining and PKH67 labeling, each contributing distinctively to our comprehensive understanding.

      We are committed to ensuring that our manuscript precisely reflects the contribution of each method to our findings and conclusions. Your thorough review has been invaluable in identifying and rectifying areas for improvement in our research report and interpretation.

      (12) Ferroptosis data in Fig 5 is not specific to macrophages and Fer-1 data confirms the expected effect of Fer-1 but there is no data that supports that Fer-1 reverses the destruction of these cells or restores their function in hypoxia. Finally, these experiments were performed in peritoneal macrophages which are functionally distinct from splenic RPM.

      Thank you for your critique of our presentation and interpretation of the ferroptosis data in Figure 5 (renamed as Figure 9 in the revised manuscript), as well as your observations regarding the specificity of the experiments to macrophages and the effects of Fer-1. We value your input and acknowledge the need to clarify these aspects in our manuscript.

      (1) Clarification on cell type used in experiments:

      • We appreciate your attention to the details of our experimental setup. The experiments presented in Figure 9 were indeed conducted on splenic macrophages, not peritoneal macrophages, as incorrectly mentioned in the original figure legend. This was an error in our manuscript, and we have revised the figure legend accordingly to accurately reflect the cell type used.

      (2) Specificity of ferroptosis data:

      • We recognize that the data presented in Figure 9 need to be more explicitly linked to the specific macrophage population being studied. In the revised manuscript, we ensure that the discussion around ferroptosis data is clearly situated within the framework of splenic macrophages.

      • We also provide additional methodological details in the 'Methods' section to reinforce the specificity of our experiments to splenic macrophages.

      (3) Effects of Fer-1 on macrophage function and survival:

      • Regarding the effect of Fer-1, we agree that while our data confirms the expected effect of Fer-1 in inhibiting ferroptosis, we have not provided direct evidence that Fer-1 reverses the destruction of macrophages or restores their function in hypoxia.

      • To address this, we propose additional experiments to specifically investigate the impact of Fer-1 on the survival and functional restoration of splenic macrophages under hypoxic conditions. This would involve assessing not only the inhibition of ferroptosis but also the recovery of macrophage functionality post-treatment.

      (4) Revised interpretation and manuscript changes:

      • We’ve revised the relevant sections of our manuscript to reflect these clarifications and proposed additional studies. This includes modifying the discussion of the ferroptosis data to more accurately represent the cell types involved and the limitations of our current findings regarding the effects of Fer-1.

      • The revised manuscript presents a more detailed interpretation of the ferroptosis data, clearly describing what our current experiments demonstrate and what remains to be investigated.

      We are grateful for your insightful feedback, which has highlighted important areas for improvement in our research presentation. We think that these revisions will enhance the clarity and scientific accuracy of our manuscript, ensuring that our findings and conclusions are well-supported and precisely communicated.

      Reviewer #2 (Recommendations For The Authors):

      The following questions and remarks should be considered by the authors:

      (1) The methods should clearly state whether the HH was discontinued during the 7 or 14 day exposure for cleaning, fresh water etc. Moreover, how was CO2 controlled? The procedure for splenectomy needs to be described in the methods.

      Thank you for your inquiry regarding the specifics of our experimental methods, particularly the management of HH exposure and the procedure for splenectomy. We appreciate your attention to detail and the importance of these aspects for the reproducibility and clarity of our research.

      (1) HH exposure conditions:

      In our experiments, mice were continuously exposed to HH for the entire duration of 7 or 14 days, without interruption for activities such as cleaning or providing fresh water. This uninterrupted exposure was crucial for maintaining consistent hypobaric conditions throughout the experiment. The hypobaric chamber was configured to ensure a ventilation rate of 25 air exchanges per minute. This high ventilation rate was effective in regulating the concentration of CO2 inside the chamber, thereby maintaining a stable environment for the mice.

      (2) The splenectomy was performed as follows:

      After anesthesia, the mice were placed in a supine position, and their limbs were fixed. The skin over the abdominal operative area was prepared, disinfected, and covered with a sterile towel. A median incision was made in the upper abdomen, followed by laparotomy to locate the spleen. The spleen was then carefully pulled out through the incision. The course of the arteries and veins in the splenic pedicle was examined, and two vascular forceps were used to clamp all the tissue in the main trunk of the blood vessels below the splenic hilum. The splenic pedicle was cut between the forceps to remove the spleen. The end of the proximal hepatic artery was clamped with a vascular clamp, and double or through ligation was performed to secure the site. The abdominal cavity was then cleaned to ensure there was no bleeding at the ligation site, and the incision was closed. Post-operatively, the animals were housed individually. Generally, they were able to feed themselves after recovering from anesthesia and did not require special care.

      We hope this detailed description addresses your queries and provides a clear understanding of the experimental conditions and procedures used in our study. These methodological details are crucial for ensuring the accuracy and reproducibility of our research findings.

      (2) The lack of changes in MCH needs explanation? During stress erythropoiesis some limit in iron availability should cause MCH decrease particularly if the authors claim that macrophages for rapid iron recycling are decreased. Fig 1A is dispensable. Fig 1G NN control 14 days does not make sense since it is higher than 7 days of HH.

      Thank you for your inquiry regarding the lack of changes in Mean Corpuscular Hemoglobin (MCH) in our study, particularly in the context of stress erythropoiesis and decreased macrophage-mediated iron recycling. We appreciate the opportunity to provide further clarification on this aspect.

      (1) Explanation for stable MCH levels:

      • Our research identified a decrease in erythrophagocytosis and iron recycling in the spleen following HH exposure. Despite this, the MCH levels remained stable. This observation can be explained by considering the compensatory roles of other organs, particularly the liver and duodenum, in maintaining iron homeostasis.

      • Specifically, our investigations revealed an enhanced capacity of the liver to engulf RBCs and process iron under HH conditions. This increased hepatic erythrophagocytosis likely compensates for the reduced splenic activity, thereby stabilizing MCH levels.

      (2) Role of hepcidin and DMT1 expression:

      Additionally, hypoxia is known to influence iron metabolism through the downregulation of Hepcidin and upregulation of Divalent Metal Transporter 1 (DMT1) expression. These alterations lead to enhanced intestinal iron absorption and increased blood iron levels, further contributing to the maintenance of MCH levels despite reduced splenic iron recycling.

      (3) Revised Figure 1 and data presentation:

      To address the confusion regarding the data presented in Figure 1G, we have made revisions in our manuscript. The original Figure 1G, which did not align with the expected trends, has been removed. In its place, we have included a statistical chart of Figure 1F in the new version of Figure 1G. This revision will provide a clearer and more accurate representation of our findings.

      (4) Manuscript updates and future research:

      • We have updated our manuscript to incorporate these explanations, ensuring that the rationale behind the stable MCH levels is clearly articulated. This includes a discussion on the role of the liver and duodenum in iron metabolism under hypoxic conditions.

      • Future research could explore in greater detail the mechanisms by which different organs contribute to iron homeostasis under stress conditions like HH, particularly focusing on the dynamic interplay between hepatic and splenic functions.

      We thank you for your insightful question, which has prompted a thorough re-examination of our findings and interpretations. We believe that these clarifications will enhance the overall understanding of our study and its implications in the context of iron metabolism and erythropoiesis under hypoxic conditions.

      (3) Fig 2 the difference between sham and splenectomy is really marginal and not convincing. Is there also a difference at 7 days? Why does the spleen size decrease between 7 and 14 days?

      Thank you for your observations regarding the marginal differences observed between sham and splenectomy groups in Figure 2, as well as your inquiries about spleen size dynamics over time. We appreciate this opportunity to clarify these aspects of our study.

      (1) Splenectomy vs. Sham group differences:

      • In our experiments, the difference between the sham and splenectomy groups under HH conditions, though subtle, was consistent with our hypothesis regarding the spleen's role in erythrophagocytosis and stress erythropoiesis. Under NN conditions, no significant difference was observed between these groups, which aligns with the expectation that the spleen's contribution is more pronounced under hypoxic stress.

      (2) Spleen size dynamics and peak stress erythropoiesis:

      • The observed splenic enlargement prior to 7 days can be attributed to a combination of factors, including the retention of RBCs and extramedullary hematopoiesis, which is known to be a response to hypoxic stress.

      • Prior research has shown that splenic stress erythropoiesis triggered by hypoxic conditions typically peaks within 3 to 7 days. This is consistent with our Toluidine Blue (TO) staining results, which indicated that the response peaks at 7 days (as depicted in Figure 1, F-G). The peak is typically followed by a decline in extramedullary hematopoiesis, which could explain the decrease in spleen size between 7 and 14 days.

      • This pattern of splenic response under prolonged hypoxic stress is corroborated by studies such as those conducted by Wang et al. (2021), Harada et al. (2015), and Cenariu et al. (2021). Together, these studies indicate that the spleen changes markedly in response to sustained hypoxia. The spleen first enlarges, attributable to escalated erythropoiesis and erythrophagocytosis. Subsequently, as these processes approach normalization, spleen size regresses.

      We’ve revised our manuscript to include a more detailed explanation of these splenic dynamics under HH conditions, referencing the relevant literature to provide a comprehensive context for our findings. We will also consider performing additional analysis or providing further data on spleen size changes at 7 days to support our observations and ensure a thorough understanding of the splenic response to hypoxic stress over time.

      (4) Fig 3 B the clusters should be explained in detail. If the decrease in macrophages in Fig 3K/L is responsible for the effect, why does splenectomy not have a much stronger effect? How do the authors know which cells died in the calcein stained population in Fig 3D?

      Thank you for your insightful questions regarding the details of our data presentation in Figure 3, particularly about the identification of cell clusters and the implications of macrophage reduction. We appreciate the opportunity to address these aspects and clarify our findings.

      (1) Explanation of cell clusters in Figure 3B:

      • In the revised manuscript, we have included detailed notes for each cell population represented in Figure 3B (Figure 3D in revised manuscript). These notes provide a clearer understanding of the cell types present in each cluster, enhancing the interpretability of our single-cell sequencing data.

      • This detailed annotation will help readers to better understand the composition of the splenic cell populations under study and how they are affected by hypoxic conditions.

      (2) Impact of splenectomy vs. macrophage reduction:

      • The interplay between the reduction in macrophage populations, as evidenced by our single-cell sequencing data, and the ramifications of splenectomy presents a multifaceted scenario. Notably, the observed decline in macrophage numbers following HH exposure does not straightforwardly equate to a comparable alteration in overall splenic function, as might be anticipated with splenectomy.

      • In the context of splenectomy under HH conditions, a significant escalation in the RBC count was observed, surpassing that in non-splenectomized mice exposed to HH. This finding underscores the spleen's critical role in modulating RBC dynamics under HH. It also indirectly suggests that the diminished phagocytic capacity of the spleen following HH exposure contributes to an augmented RBC count, albeit to a lesser extent than in the splenectomy group. This difference is attributed to the fact that, while the number of RPMs in the spleen post-HH is reduced, they are still present, unlike in the case of splenectomy, where they are entirely absent.

      • Splenectomy entails the complete removal of the spleen, thus eliminating a broad spectrum of functions beyond erythrophagocytosis and iron recycling mediated by macrophages. The nuanced changes observed in our study may be reflective of the spleen's diverse functionalities and the organism's adaptive compensatory mechanisms in response to the loss of this organ.

      (3) Calcein stained population in Figure 3D:

      • Regarding the identification of cell death in the calcein-stained population in Figure 3D (Figure 3A in revised manuscript), we acknowledge that the specific cell types undergoing death could not be distinctly determined from this analysis alone.

      • The calcein staining method allows for the visualization of live (calcein-positive) and dead (calcein-negative) cells, but it does not provide specific information about the cell types. The decrease in macrophage population was inferred from the single-cell sequencing data, which offered a more precise identification of cell types.

      (4) Revised manuscript and data presentation:

      • Considering your feedback, we have revised our manuscript to provide a more comprehensive explanation of the data presented in Figure 3, including the nature of the cell clusters and the interpretation of the calcein staining results.

      • We have also updated the manuscript to reflect the removal of Figure 3K/L results and to provide a more focused discussion on the relevant findings.

      We are grateful for your detailed review, which has helped us to refine our data presentation and interpretation. These clarifications and revisions will enhance the clarity and scientific rigor of our manuscript, ensuring that our conclusions are well-supported and accurately conveyed.

      (5) Is the reduced phagocytic capacity in Fig 4B significant? Erythrophagocytosis is compromised due to the considerable spontaneous loss of labelled erythrocytes; could other assays help (potentially a modified Chromium release assay)? Is it necessary to stimulate phagocytosis to see a significant effect?

      Thank you for your inquiry regarding the significance of the reduced phagocytic capacity observed in Figure 4B, and the potential for employing alternative assays to elucidate erythrophagocytosis dynamics under HH conditions.

      (1) Significance of reduced phagocytic capacity:

      The observed reduction in the amplitude of fluorescently labeled RBCs in both the blood and spleen under HH conditions suggests a decrease in erythrophagocytosis. This is indicative of a diminished phagocytic capacity, particularly when contrasted with NN conditions.

      (2) Investigation of erythrophagocytosis dynamics:

      To delve deeper into erythrophagocytosis under HH, we employed Tuftsin to enhance this process. Following the injection of PKH67-labeled RBCs and subsequent HH exposure, we noted a significant decrease in PKH67 fluorescence in the spleen, particularly marked after the administration of Tuftsin. This finding implies that stimulated erythrophagocytosis can influence RBC lifespan.

      (3) Erythrophagocytosis under normal and hypoxic conditions:

      Under normal conditions, the reduction in phagocytic activity is less apparent without stimulation. However, under HH conditions, our findings demonstrate a clear weakening of the phagocytic effect. While we established that promoting phagocytosis under NN conditions affects RBC lifespan, the impact of enhanced phagocytosis under HH on RBC numbers was not explicitly investigated.

      (4) Potential for alternative assays:

      Considering the considerable spontaneous loss of labeled erythrocytes, alternative assays such as a modified Chromium release assay could provide further insights. Such assays might offer a more nuanced understanding of erythrophagocytosis efficiency and the stability of labeled RBCs under different conditions.

      (5) Future research directions:

      The implications of these results suggest that future studies should focus on comparing the effects of stimulated phagocytosis under both NN and HH conditions. This would offer a clearer picture of the impact of hypoxia on the phagocytic capacity of macrophages and the subsequent effects on RBC turnover.

      In summary, our findings indicate a diminished erythrophagocytic capacity, with enhanced phagocytosis affecting RBC lifespan. Further investigation, potentially using alternative assays, would be beneficial to comprehensively understand the dynamics of erythrophagocytosis in different physiological states.

      (6) Can the observed ferroptosis be influenced by bi- and not trivalent iron chelators?

      Thank you for your question regarding the potential influence of bi- and trivalent iron chelators on ferroptosis under hypoxic conditions. We appreciate the opportunity to discuss the implications of our findings in this context.

      (1) Analysis of iron chelators on ferroptosis:

      In our study, we did not specifically analyze the effects of bi- and trivalent iron chelators on ferroptosis under hypoxia. However, our observations with Deferoxamine (DFO), a well-known iron chelator, provide some insights into how iron chelation may influence ferroptosis in splenic macrophages under hypoxic conditions.

      (2) Effect of DFO on oxidative stress markers:

      Our findings showed that under 1% O2, there was an increase in Malondialdehyde (MDA) content, a marker of lipid peroxidation, and a decrease in Glutathione (GSH) content, indicative of oxidative stress. These changes are consistent with the induction of ferroptosis, which is characterized by increased lipid peroxidation and depletion of antioxidants. Treatment with Ferrostatin-1 (Fer-1) and DFO effectively reversed these alterations. This suggests that DFO, like Fer-1, can mitigate ferroptosis in splenic macrophages under hypoxia, primarily by impacting MDA and GSH levels.

      Author response image 17.

      (3) Potential role of iron chelators in ferroptosis:

      The effectiveness of DFO in reducing markers of ferroptosis indicates that iron availability plays a crucial role in the ferroptotic process under hypoxic conditions. It is plausible that both bi- and trivalent iron chelators could influence ferroptosis, given their ability to modulate iron availability within cells. Since ferroptosis is an iron-dependent form of cell death, chelating iron, irrespective of its valence state, could potentially disrupt the process by limiting the iron necessary for the generation of reactive oxygen species and lipid peroxidation.

      (4) Additional research and manuscript updates:

      Our study highlights the need for further research to explore the differential effects of various iron chelators on ferroptosis, particularly under hypoxic conditions. Such studies could provide a more comprehensive understanding of the role of iron in ferroptosis and the potential therapeutic applications of iron chelators. We have updated our manuscript to include these findings and discuss the potential implications of iron chelation in the context of ferroptosis under hypoxic conditions. This will provide a broader perspective on our research and its significance in understanding the mechanisms of ferroptosis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study provides insights into the IDA peptide with dual functions in development and immunity. The approach used is solid and helps to define the role of IDA in a two-step process, cell separation followed by activation of innate defenses. The main limitation of the study is the lack of direct evidence linking signaling by IDA and its HAE receptors to immunity. As such the work remains descriptive but it will nevertheless be of interest to a wide range of plant cell biologists.

      We thank the reviewers for thoroughly reading our manuscript. We have used their comments and suggestions to improve the manuscript. Below is a response to the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper titled 'A dual function of the IDA peptide in regulating cell separation and modulating plant immunity at the molecular level' by Olsson Lalun et al., 2023 aims to understand how IDA-HAE/HSL2 signalling, a pathway that has previously been implicated in development, modulates immunity. This is a timely question to address as conflicting reports exist within the field. IDL6/7 have previously been shown to negatively regulate immune signalling, disease resistance and stress responses in leaf tissue; however, IDA has been shown to positively regulate immunity through the shedding of infected tissues. Moreover, the related receptor NUT/HSL3 has recently been shown to positively regulate immune signalling and disease resistance. This work has the potential to bring clarity to this field; however, the manuscript requires some additional work to address these questions. This is especially the case as it contradicts some previous work with IDL peptides, which are perceived by the same receptor complexes.

      Can IDA induce pathogen resistance? Does the infiltration of IDA into leaf tissue enhance or reduce pathogen growth? Previously it has been shown that IDL6 makes plants more susceptible. Is this also true for IDA? Currently, cytoplasmic calcium influx and apoplastic ROS are overinterpreted as immune responses; these can also be induced by many developmental cues, e.g. CLE40-induced calcium transients. Whilst gene expression is more specific, it is also true that treatment with synthetic peptides, which are recognised by LRR-RKs, can induce immune gene expression, especially in the short term, even when that is not their in vivo function, e.g. doi.org/10.15252/embj.2019103894.

      We thank the reviewer for the concerns raised and agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity, and we plan for such experiments in future work. We have, however, modified the discussion to include the possible role of IDA-induced Ca2+ and ROS during development. We have recently published a preprint (accepted for publication in JXB) (Galindo-Trigo et al., 2023, https://doi.org/10.1101/2023.09.12.557497) strengthening the link between IDA and defense by identifying WRKY transcription factors that regulate IDA expression through a Y1H assay.

      This paper shows that receptors other than hae/hsl2 are genetically required to induce defense gene expression, it would have been interesting to see what phenotype would be associated with higher order mutants of closely related haesa/haesa-like receptors. Indeed recently HSL1 has been shown to function as a receptor for IDA/IDL peptides. Could the triple mutant suppress all response? Could the different receptors have distinct outputs? For example for FRK1 gene expression the hae hsl2 mutant has an enhanced response. Could defence gene expression be primarily mediated by HSL1 with subfunctionalisation within this clade?

      We agree that it would be interesting to also include HSL1 in our studies. However, the focus of this study has been on HAE and HSL2, and we wanted to explore their role in IDA-induced defense responses. Including HSL1 in these studies would require generating multiple transgenic lines and repeating most of the experiments; we will consider these experiments in a follow-up study together with pathogen assays (which would also address the main concern raised in the comment above). We have, however, modified the text to include the known function of HSL1 and discuss the possibility of subfunctionalisation of this receptor clade.

      One striking finding of the study is the strong additive interaction between IDA and flg22 treatment on gene expression. Do the authors also see this for co-treatment of different peptides with flg22, or is this unique function of IDA? Is this receptor dependent (HAE/HSL1/HSL2)?

      This is a good question. Since our study focuses on the IDA signaling pathway, we chose to test whether the additive effect observed between flg22 and mIDA was also observed when mIDA was combined with another peptide involved in defense. The endogenous peptide PIP1 has previously been shown to amplify flg22 signaling (Hou et al 2014, doi:10.1371/journal.ppat.1004331). In that study it is shown that co-treatment with flg22 and PIP1 gives increased resistance to Pseudomonas PstDC3000 compared to when plants are treated with each peptide separately. In the same study, the authors also show reduced flg22-induced transcriptional activity of two defense-related genes, WRKY33 and PR, in the receptor like kinase7 (rlk7) mutant (RLK7 being the receptor perceiving PIP1). To investigate whether PIP1 would give the same additive effect with mIDA as that observed between flg22 and mIDA, we co-treated seedlings with PIP1 and mIDA. We observed no enhanced transcriptional activity of FRK1, MYB51 and PEP3 in tissue from plants treated with both PIP1 and mIDA peptides compared to single exposure. These results are presented in supplementary figure 11. In conclusion, we do not think mIDA acts as a general amplifier of all immune elicitors in plants.

      It is interesting how tissue-specific calcium responses are in response to IDA and flg22, suggesting the cellular distribution of their cognate receptors. However, one striking observation, also made by the authors, is that the promoter expression seems to be broader than the calcium response, indicating that additional factors are required for the observed calcium response. Could diffusion of the peptide be a contributing factor, or are only some cells competent to induce a calcium response?

      It is interesting that the authors look for floral abscission phenotypes in cngc and rbohd/f mutants to conclude for genetic requirement of these in floral abscission. Do the authors have a hypothesis for why they failed to see a phenotype for the rbohd/f mutant as was published previously? Do you think there might be additional players redundantly mediating these processes?

      It is possible that diffusion of the peptide plays a role in the observed response. In a biological context, we would assume that the local production of the peptides plays an important role in the cellular responses. In our experimental setup, we add the peptide externally, and we can therefore assume that the overlying cells come into contact with the peptide before cells in the inner tissues, which could affect the response recorded. However, our results show that there is a difference between flg22- and mIDA-induced responses even when the peptides are applied in the same manner, indicating that the difference in the response is not primarily due to the diffusion rate of the peptides but is likely due to different factors being present in different cells. To acquire a better picture of the distribution of receptor expression in the root tissue and to investigate in which cells the receptors have an overlapping expression pattern, we have included results in figure 6 showing plant lines co-expressing transcriptional reporters of FLS2 and HAE or HSL2.

      Can you observe callose deposition in the cotyledons of the 35S::HAE line? Are the receptors expressed in native cotyledons? This is the only phenotype tested in the cotyledons.

      We thank the reviewer for this valuable comment. We have now conducted a callose deposition assay on the 35S:HAE line, and indeed we observe callose deposition when cotyledons from a 35S:HAE line are treated with mIDA. We have included these results in figure 4 and have adjusted the text regarding the callose assay accordingly. In addition, we have analyzed the promoter activity of pHAE in cotyledons and observe weak promoter activity. These results are included as supplementary figure 1d.

      Are flg22-induced calcium responses affected in hae hsl2?

      The experiment suggested by the reviewer is an important control to ensure that the hae hsl2-Aeq line can respond to a Ca2+-inducing peptide signaling through a different receptor than HAE or HSL2. One would expect to see a Ca2+ response to the flg22 peptide in this line. We performed this experiment and, surprisingly, could not detect a flg22-induced Ca2+ signal in the hae hsl2 mutant. As it is unlikely that the Ca2+ response triggered by flg22 is dependent on HAE and HSL2, we have to assume that the lack of response is due to a malfunction of the Aeq sensor in this line. As a control to measure the amount of Aeq present in the cells, we treat the Aeq seedlings with 2 M CaCl2 and measure the luminescence continuously for 180 seconds (Ranf et al., 2012, DOI: 10.1093/mp/ssr064). The CaCl2 treatment disrupts the cells and releases the Aeq sensor into the solution, where it reacts with Ca2+ and releases the total possible response in the sample (Lmax) in the form of a luminescent peak. When treating the hae hsl2-Aeq line with CaCl2 we observe a luminescent peak, indicating the presence of the sensor; however, the response is reduced compared to WT seedlings expressing Aeq. Given the sensitivity of FLS2 to flg22, one would still expect to see a Ca2+ peak in the hae hsl2-Aeq line even if the amount of sensor is reduced. Given that this is not the case, we have to assume that the localization or conformation of the sensor is somehow affected in this line, or that there is another biological cause that we cannot explain at the moment.

      We have therefore opted to omit the results using the hae hsl2-Aeq lines from the manuscript and are in the process of mutating HAE and HSL2 by CRISPR-Cas9 in the Aeq background to verify that the mIDA-triggered Ca2+ response is dependent on HAE and HSL2.

      Reviewer #2 (Public Review):

      Lalun and co-authors investigate the signalling outputs triggered by the perception of IDA, a plant peptide regulating organ abscission. The authors observed that IDA perception leads to a transient influx of Ca2+, to the production of reactive oxygen species in the apoplast, and to an increased accumulation of transcripts which are also responsive to an immunogenic epitope of bacterial flagellin, flg22. The authors show that IDA is transcriptionally upregulated in response to several biotic and abiotic stimuli. Finally, based on the similarities in the molecular responses triggered by IDA and elicitors (such as flg22), the authors propose that IDA has a dual function in modulating abscission and immunity. The manuscript is rather descriptive and provides little information regarding IDA signalling per se. A potential functional link between IDA signalling and immune signalling remains speculative.

      We thank the reviewer for the concerns raised and agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      Reviewer #3 (Public Review):

      Previous work has shown the essential role of the IDA peptide and HAESA receptor families in driving various cell separation processes, such as abscission of flowers as a natural developmental process, abscission of leaves as a defense mechanism when plants are under pathogenic attack, lateral root emergence, and root tip cell sloughing. In this work, Olsson et al. show for the first time the possible role of the IDA peptide in triggering plant innate immunity after the cell separation process has occurred. Such an event has previously been proposed to take place in order to seal the tissue left open after cell separation and avoid creating an entry point for opportunistic pathogens.

      The elegant experiments in this work demonstrate that the IDA peptide triggers defense-associated marker genes together with immune-specific responses, including release of ROS and intracellular Ca2+. Thus, the work highlights an intriguing direct link between endogenous cell wall remodeling and plant immunity. Moreover, the upregulation of IDA in response to abiotic and especially biotic stimuli provides a valuable indication of the potential involvement of HAE/IDA signalling in processes other than plant development.

      We are pleased that the reviewer finds our findings linking IDA to defense interesting and would like to thank the reviewer for this positive feedback.

      Strengths:

      The various methods and different approaches chosen by the authors consolidate the additional new role for a hormone-peptide such as IDA. The involvement of IDA in triggering the complex process of immunity represents a further step in understanding what happens after cell separation occurs. The Ca2+ and ROS imaging and measurements, together with the use of the hae hsl2 and hae hsl2 p35S::HAE-YFP genotypes, provide a robust quantification of the activation of defense responses. While Ca2+ and ROS can be detected after applying the IDA treatment after the occurrence of cell separation, it is adequately shown that the enzymes responsible for ROS production, RBOHD and RBOHF, are not implicated in floral abscission.

      Furthermore, IDA production is triggered by biotic and abiotic factors such as flg22, a bacterial elicitor, fungi, mannitol or salt, while the mature IDA is activating the production of FRK1, MYB51 and PEP3, genes known for being part of plant defense process.

      Thank you.

      Weaknesses:

      Even though a clear involvement of IDA in activating the immune system after cell separation is shown, the use of the p35S:HAE-YFP line represents a weak point in the scientific demonstration. In this line, the HAE receptor is driven by a constitutive promoter, which loads the plant with HAE protein without discriminating between tissues. Since it is known that the IDA family consists of several members distributed in various tissues, it is very difficult to fully differentiate the effects of ubiquitously present HAE.

      We agree with this statement. Nevertheless, it is important to note that the responses we have observed are not detectable in WT plants that do not (over)express the HAE receptors, suggesting that the ROS and callose deposition are induced by the addition of mIDA peptide and not by the potential presence of the endogenous IDL peptides.

      The co-localization of HAE/HSL2 and FLS2 receptors is a valuable point to address since in the present work, the marker lines presented do not get activated in the same cell types of the root tissues which renders the idea of nanodomains co-localization (as hypothetically written in the discussion) rather unlikely.

      Thank you for raising an important aspect of our study. It is true that not all cells in the root which have promoter activity for FLS2 also exhibit promoter activity for either HAE or HSL2. However, we have observed that certain cells in the roots show promoter activity for both receptors. In the revised version of the manuscript, we have included plants expressing transcriptional reporters for both FLS2 and HAE or HSL2 using different fluorescent proteins. We have investigated overlapping promoter activity at sites of lateral roots, in the tip of the primary root, and in the abscission zone. Our results show overlapping expression of the transcriptional reporters in certain cells, indicating that FLS2 and HAE or HSL2 are likely to be found in some of the same cells during plant development. We also observe cells where only one or none of the promoters are active.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Supplementary Figure 3: re-labelling of y axis; 200 than 200,00 for clarity.

      This has been addressed.

      Supplementary Figure 2: It would be good to include the age of the seedlings used to study calcium influx in the legend.

      This has been addressed.

      Supplementary Figure 1: rephrase 'IDA induces ROS production in Arabidopsis'.

      This has been addressed.

      The use of chelating agents to establish the need for calcium from the extracellular space is a clear experiment supporting the calcium response phenotype specific to IDA treatment in seedlings. The peptide lacking the last asparagine (N) may fail to elicit a calcium response simply because it is shorter or has different chemical properties. Therefore, a scrambled sequence would have been a better control.

      We thank the reviewer for the suggestion of using a scrambled peptide as a negative control; however, based on previous work, we find it unlikely that mIDA∆N69 could induce any activity. Results from the crystal structure of mIDA bound to the HAE receptor and from ligand-receptor interaction studies (10.7554/eLife.15075) show that the last asparagine in the mIDA peptide is essential for detectable binding to the HAE receptor and that a peptide lacking this amino acid does not have any activity. We will, however, also include a scrambled version of the peptide as an additional control in future experiments.

      Reviewer #2 (Recommendations For The Authors):

      Please find below specific comments:

      (1) Most of the molecular outputs triggered by IDA can be considered common molecular marks of plant peptide signalling; they do not represent strong evidence of a potential function of IDA in modulating immunity. For instance, perception of CIF peptides, which control the establishment of the Casparian strips, regulates the production of reactive oxygen species and the transcription of genes associated with immune responses (Fujita et al., The EMBO Journal 2020). It should also be considered that FRK1, whose function remains unknown, may be involved in both immunity and abscission and that the upregulation of FRK1 upon IDA treatment is not indicative of active modulation of immune signalling by IDA.

      This is a fair point raised by the reviewer, and we now address in the manuscript that ROS and Ca2+ are hallmarks of both plant development and defense. The function of FRK1 is not known; however, it is unlikely that the upregulation of FRK1 in response to mIDA plays a role in the developmental progression of abscission, as FRK1 is not temporally regulated during the abscission process, making it an unlikely candidate in the regulation of cell separation (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908). We do, however, agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      (2) It remains unknown whether IDA modulates immunity. For instance, does IDA perception promote resistance to bacteria (bacterial proliferation, disease symptoms)? Is IDA genetically required for plant disease resistance? Is the IDA signalling pathway genetically required for transcriptional changes induced by flg22, such as the increase in FRK1 transcripts? In addition, the authors propose that the function of IDA in modulating immune signalling prevents bacterial infection in tissue exposed to stress(es). Does loss of function of IDA or of its corresponding receptors lead to changes in the ability of bacteria to colonise plant roots upon stress(es)?

      Please see the comment above regarding pathogen assays.

      (3) Several aspects of the work appear to correspond to preliminary investigation. For instance, the authors analyse loss of function mutant for genes encoding for Ca2+ permeable channels (CNGCs) which are transcriptionally active during the onset of abscission (Sup. Figure 5). None of the single mutants present an abscission defect. These observations provide no information regarding the identity of the channel(s) involved in IDA-induced calcium influx.

      We agree with the reviewer that we have not been able to identify the channels responsible for the IDA-induced calcium influx. Given the redundancy for many of the members of this multigenic family a future approach to identify proteins responsible for the IDA triggered calcium response could be to create multiple KO mutants by CRISPR Cas9.

      (4) Using H2DCF-DA, the authors observed a decrease in ROS accumulation in the abscission zone of rbohd/rbohf double KO line (Sup Figure 5c) but describe in the text that ROS production in this zone does not depend on RBOHD and RBOHF (L220). Please clarify.

      This has now been clarified in the text.

      (5) The authors describe that the rbohd/rbohf double KO presents a lower petal break-strength, which they describe as an indication of premature cell wall loosening, and that petals of rbohd/rbohf abscised one position earlier than in WT. Yet, the authors postulate that IDA-induced ROS production does not regulate abscission but may regulate additional responses. Instead, the data seem to indicate that ROS production by RBOHD and RBOHF regulates the timing of abscission. In addition, it would have been interesting to test whether the IDA signalling pathway regulates ROS production in the abscission zone.

      The rbohd rbohf double mutant shows several phenotypes associated with developmental stress; the mild phenotype observed with regard to premature abscission (by one position) could be caused by the general phenotype of the double mutant rather than being related to ROS production. Indeed, it has been suggested that the lignified brace in the AZ, dependent on ROS production by the aforementioned RBOHs, is necessary for the correct concentration of cell wall modifying enzymes (Lee et al., 2018, https://doi.org/10.1016/j.cell.2018.03.060). The precocious abscission in this double mutant clearly shows this not to be the case. We have tried to do a ROS burst assay on AZ tissue/flowers with the mIDA peptide but have not been successful with this approach. A ROS sensor expressed in AZ tissue would be a valuable tool to address whether IDA signalling regulates ROS production in AZs.

      (6) In Sup. Figure 5a, it would be of interest to have a direct comparison of the transcript accumulation of the presented CNGCs and RBOHDs with other members of these multigenic families.

      The CNGC and RBOH genes whose expression profiles are shown in the figure are the family members expressed during the developmental progression of floral abscission in stamen AZs. Since there is no difference in the temporal expression of the other family members (and most are either not expressed or very weakly expressed in this tissue), it is not possible to do this comparison (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908).

      (7) L251-253, since IDAdeltaN69 cannot be perceived by its receptors, the absence of induction of pIDA::GUS by IDAdeltaN69 compared to flg22 cannot be seen as a sign of specificity in peptide-induced increase in IDA promoter activity.

      We have rephrased this in the text.

      (8) Please provide quantitative and statistical analysis of the calcium measurement presented in sup figure 3.

      This has been addressed.

      (9) L339-341; This sentence is unclear to me, please rephrase.

      We have rephrased this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In order to assess the role of CNGCs in abscission process, it would be more interesting to see the effect on the Ca2+ pattern and ROS signaling after application of mIDA on cngc and rbohf rbohd mutants.

      We agree with this statement; studies of mIDA-induced ROS and Ca2+ in these mutants will provide valuable information on the regulation of the response. We are in the process of making the lines needed to perform these experiments. However, since this requires crossing genetically encoded sensors into each mutant and generating higher-order mutants, it is a long process.

      (2) With regard to the ROS production (Sup Fig. 1), the application of mIDA can trigger ROS in p35S::HAE:YFP lines, but not in the wild-type plant, which is according to the text "most likely due to the absence of HAE expression" in leaves. The experiment on callose deposition is performed in wild-type cotyledons where no callose deposition could be observed after mIDA treatment (Fig. 4a,b). The conclusion from text is that IDA "is not involved in promoting deposition of callose as a long-term defence response". It appears more likely that neither ROS nor callose can be observed in wild-type plants due to the lack of HAE expression. Therefore, the callose experiment should include the p35S::HAE:YFP lines. The experiment as it is does not allow to draw any conclusion on HAE/IDA involvement in callose formation.

      We fully agree with this comment; thank you for pointing this out. We have now performed the callose experiment with the 35S:HAE lines. Please see our answer to reviewer #1.

      (3) Between Sup Fig. 3 and Sup Fig. 5, two different systems were used to assess the floral stage. Harmonizing the floral stages would make it easier to relate the levels of HAE/HSL2 expression to the onset of cell-wall degradation.

      We now use the same system to assess floral stages throughout the whole manuscript.

      (4) For the Fig. 1 and 2, it will be helpful to mention the genotype used for imaging/quantification of Ca2+.

      This has been addressed.

      (5) Some of the abbreviations are not introduced as full-text at their first time use in the text, such as: mIDA (Line 68), Ef-Tu (line 85), NADPH (line 77).

      The abbreviations have now been introduced.

      (6) In the legend of Fig. 5 (lines 897 and 898)- in the figure description, the box plots are identified as light gray and dark gray, while in the panel a of the figure the box plots are colored in red and blue.

      Thank you for pointing this out, this has now been corrected.

      (7) In Figures 1 and 2, the authors write that the number of replicates is 10 (n=10), but the data represent a single analysis. Please provide the quantitative ROI analysis, demonstrating that the observed example is representative. This is particularly important since the authors claim very specific changes in the pattern of Ca signaling between mIDA and FLG22 treatments (Line 148).

      (8) Figure 4: please use alternative scaling on the Y axis instead of breaks.

      This has now been fixed.

      (9) Figure 5: it is not clear what n=4 refers to when the authors state three independent replicates. In figure 6 they state 4 technical reps and 3 biological reps. Please ensure this is similar across all descriptions.

      We have now ensured the correct information in all descriptions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on Legionella pneumophila effector proteins that target host vesicle trafficking GTPases during infection and more specifically modulate ubiquitination of the host GTPase Rab10. The evidence supporting the claims of the authors is solid, although it remains unclear how modification of the GTPase Rab10 with ubiquitin supports Legionella virulence and the impact of ubiquitination during LCV formation. The work will be of interest to colleagues studying animal pathogens as well as cell biologists in general.

      We greatly appreciate the positive and valuable feedback from the editors and the reviewers. Following their suggestions, we added substantial new experimental data and discussed the implications of our findings for Legionella virulence with respect to the biology of its replication niche. Please find our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Kubori and colleagues characterized the manipulation of the host cell GTPase Rab10 by several Legionella effector proteins, specifically members of the SidE and SidC family. They show that Rab10 undergoes both conventional ubiquitination and noncanonical phosphoribose-ubiquitination, and that this posttranslational modification contributes to the retention of Rab10 around Legionella vacuoles.

      Strengths

      Legionella is an emerging pathogen of increasing importance, and dissecting its virulence mechanisms allows us to better prevent and treat infections with this organism. How Legionella and related pathogens exploit the function of host cell vesicle transport GTPases of the Rab family is a topic of great interest to the microbial pathogenesis field. This manuscript investigates the molecular processes underlying Rab10 GTPase manipulation by several Legionella effector proteins, most notably members of the SidE and SidC families. The finding that MavC conjugates ubiquitin to SdcB to regulate its function is novel, and sheds further light into the complex network of ubiquitin-related effectors from Lp. The manuscript is well written, and the experiments were performed carefully and examined meticulously.

      Weaknesses

      Unfortunately, in its current form this manuscript offers only little additional insight into the role of effector-mediated ubiquitination during Lp infection beyond what has already been published. The enzymatic activities of the SidC and SidE family members were already known prior to this study, as was the importance of Rab10 for optimal Lp virulence. Likewise, it had previously been shown that SidE and SidC family members ubiquitinate various host Rab GTPases, like Rab33 and Rab1. The main contribution of this study is to show that Rab10 is also a substrate of the SidE and SidC family of effectors. What remains unclear is if Rab10 is indeed the main biological target of SdcB (not just 'a' target), and how exactly Rab10 modification with ubiquitin benefits Lp infection.

      Reviewer #1 (Recommendations for The Authors):

      Major points of concern

      (1) The authors show that SdcB increases Rab10 levels on LCVs at later times of infection and conclude that this is its main biological role. An alternative explanation may be that Rab10 is not 'the main' target of SdcB but merely 'a' target, which may explain why the effect of SdcB on Rab10 accumulation on LCV is only detectable after several hours of infection. An unbiased omics-based approach to identify the actual host target(s) of SdcB may be needed to confirm that Rab10 modification by SdcB is biologically relevant.

      We totally agree with your comment that SdcB should have multiple targets considering the abundance of ubiquitin observed on the LCVs when SdcB was expressed (Figure 3). However, the effect of SdcB on Rab10 accumulation at the later time point (7 h) (current Figure 4e) is well supported by the new data showing that SdcB-mediated ubiquitin conjugation to Rab10 was readily detected at this time point (new Figure 4c). We have attempted a comprehensive search for interaction partners of the ANK domain of SdcB. We plan to include this analysis in our ongoing study and therefore decided not to add the data to this manuscript.

      (2) The authors show that Rab10 within cell lysate is ubiquitinated and conclude that ubiquitination of Rab10 is directly responsible for its retention on the LCV. What is the underlying molecular mechanism for this retention? Are GAP proteins prevented from binding and deactivating Rab10? This may be worth testing.

      It is an attractive hypothesis that a Rab10GAP is involved in the regulation of Rab10 localization on the LCV. However, as far as we know, GAP proteins for Rab10 have not yet been identified. This will be an important issue to address once a Rab10GAP is found.

      (3) Related to this, an alternative explanation would be that Rab10 retention is an indirect effect where inactivators of Rab10, such as host cell GAP proteins, are the main target of SidE/C family members and sent for degradation (see point #1). Can the authors show that Rab10 on the LCV is indeed ubiquitinated?

      The possible involvement of a putative Rab10GAP is currently untestable as it is not known. To address whether Rab10 located on the LCV is ubiquitinated or not, we conducted the critical experiments using active Rab10 (QL) and inactive Rab10 (TN) (new Figure 4a, new Figure 4-figure supplement 1). As revealed for Rab1 (Murata et al., Nature Cell Biol. 2006; Ingmundson et al., Nature 2007), Rab10 is expected to be recruited to the LCV as a GDP-bound inactive form and converted to a GTP-bound active form on the LCV. The new results clearly demonstrated that GTP-locked Rab10QL is preferentially ubiquitinated upon infection, strongly supporting the model in which Rab10 is ubiquitinated “on the LCV” by the SidE and SidC family ligases.

      (4) Also, on what residue(s) is Rab10 ubiquitinated? Jeng et al. (Cell Host Microbe, 2019, 26(4): 551-563) suggested that K102, K136, and K154 of Rab10 are modified during Lp infection. How does substituting those residues affect the residency of Rab10 on LCVs? Addressing these questions may ultimately help to uncover if the growth defect of a sidE gene cluster deletion strain is due to its inability to ubiquitinate and retain Rab10 on the LCV.

      Thank you for the suggestion. We mutated the three Lys residues of Rab10 and analyzed the resulting derivative for ubiquitination (new Figure 1-figure supplement 1). Substitution of the Lys residues with Ala did not abrogate ubiquitination upon Lp infection. This result indicates that ubiquitination sites are present at other residue(s), including the PR-ubiquitination site(s), raising the possibility that disruption of sidE genes would be detrimental for intracellular growth of L. pneumophila because of failure of Rab10 retention.

      (5) The authors proposed that "the SidE family primarily contributes towards ubiquitination of Rab10". In this case, what is the significance of SdcB-mediated ubiquitination of Rab10 during Lp infection?

      We found that the major contribution of SdcB is retention of Rab10 until the late stage of infection. This claim was supported by our new data (new Figure 4c) as mentioned above (response to comment #1).

      (6) The contribution of SdcB to ubiquitination of Rab10 relative to SidC and SdcA is unclear. SidC is shown to be unaffected by MavC. In this case, SidC can ubiquitinate Rab10 regardless of the regulatory mechanism of SdcB by MavC. This is not further being examined or discussed in the manuscript.

      The effect of intrinsic MavC is apparent at the later stage (9 h) of infection (Figure 7c) when SdcB gains its activity (see above). We therefore do not think that any effect of MavC on the SidC/SdcA activities, which are effective in the early stage, would affect Rab10 localization. However, without specific experiments addressing this issue, possible MavC effects on SidC/SdcA are beyond the scope of this manuscript.

      (7) When is Rab10 required during Lp infection? The authors showed that Rab10 levels at LCV are rather stable from 1hr to 7hr post infection. If MavC regulates the activity of SdcB, when does this occur?

      While the Rab10 levels on the LCV (~40%) are stable during 1-7 h post infection (Figure 2b), they decreased to ~20% at 9 h after infection (Figure 7c) (the description was added in lines 304-306). Rab10 seems to be required for optimal LCV biogenesis over the early to late stages, but may not be required at the maturation stage (9 h). We validated the effect of MavC on the Rab10 localization at this time point (Figure 7c). These observations allowed us to build the scheme described in Figure 7d. We revised the illustration in new Figure 7d according to the helpful suggestions from both reviewers.

      (8) Previous analyses by MS showed that ubiquitination of Rab10 in Lp-infected cells decreases over time (from 1 hpi to 8 hpi - Cell Host Microbe, 2019, 26(4): 551-563). How does this align with the findings made here that Rab10 levels on the LCV and likely its ubiquitination levels increase over time?

      We carefully compared the Rab10 ubiquitination at 1 h and 7 h after infection (new Figure 1-figure supplement 1b). This analysis showed that the level of its ubiquitination decreased over time, in agreement with the previous report. Nevertheless, Rab10 was still significantly ubiquitinated at 7 h, which we believe causes the sustained retention of Rab10 on the LCV at this time point. We added the observation in lines 146-148.

      (9) Polyubiquitination of Rab10 was not detected in cells ectopically producing SdcB and SdeA lacking its DUB domain (Figure 7 - figure supplement 2). Does SdcB actually ubiquitinate Rab10 (see also point #5)? Along the same line, it is curious to find that the ubiquitination pattern of Rab10 is not different for LpΔsidC/ΔsdcA compared to LpΔsidC/ΔsdcA/ΔsdcB (Figure 1C). The actual contribution of SdcB to ubiquitinating Rab10 compared to SidC/SdcA thus needs to be clarified.

      Thank you for the important point. We currently hypothesize that SidC/SdcA/SdcB-mediated ubiquitin conjugation can occur only in the presence of PR-ubiquitin on Rab10 (either directly on the PR-ubiquitin or on other residue(s) of Rab10). Failure to detect the polyubiquitination in the transfection condition (Figure 7-figure supplement 2) suggests that this specific ubiquitin conjugation occurs only under restricted conditions, i.e., only “on the LCV”. We added this description in the discussion section (lines 334-335). The lack of difference between the ΔsidCΔsdcA and ΔsidCΔsdcAΔsdcB strains (Figure 1C, 1 h infection) can be explained by the result that SdcB gains activity at the later stages (see above).

      Minor comments

      In Figure 4b and 7b, the authors show a quantification of "Rab10-positive LCVs/SdcB-positive LCVs". Why this distinction? It begs the question what the percentile of Rab10-positive/SdcB-negative LCVs might be?

      We chose this quantification because we wanted only to see the effect of SdcB on the Rab10 localization. To distinguish between SdcB-positive and -negative LCVs, we would need to rely on the blue color signals of DAPI to visualize internal bacteria, which we thought to be technically difficult in this specific analysis.

      The band of FLAG-tagged SdcB was not detected by immunoblot using anti-FLAG antibody (Figure 5). The authors hypothesized that "disappearance of the SdcB band can be caused by auto-ubiquitination, as SdcB has an ability to catalyze auto-ubiquitination with a diverse repertoire of E2 enzymes". This can be easily confirmed by using MG-132 to inhibit proteasomal degradation of polyubiquitinated substrates.

      We conducted the experiment using MG-132 as suggested and found that proteasomal degradation is not the cause of the disappearance of the band (new Figure 5-figure supplement 2, added description in lines 228-233). SdcB is actually not degraded. Instead, its polyubiquitination causes its apparent loss by distributing the SdcB bands in the gel.

      In Figure 5F, the authors mentioned that "HA-UbAA did not conjugate to SdcB", whereas "shifted band detected by FLAG probing plausibly represents conjugation of cellular intrinsic Ub". The same argument was made in Figure 6B. These claims should be confirmed by immunoblot using anti-Ub antibody.

      Thank you. We added the data using anti-Ub antibody (P4D1) (Figure 6f, new third panel).

      Figure 7A: In cell producing MavC, SdcB is clearly present on LCV. However, in Figure 5A, SdcB was not detected by immunoblot in cells ectopically expressing MavC-C74A. What is the interpretation for these results?

      SdcB was not degraded in the cells; rather, its apparent molecular weight shifted owing to polyubiquitination (see above). The detection of SdcB in the IF images (Figure 7a) supported this claim.

      Reviewer #2 (Public Review):

      This manuscript explores the interplay between Legionella Dot/Icm effectors that modulate ubiquitination of the host GTPase Rab10. Rab10 undergoes phosphoribosyl-ubiquitination (PR-Ub) by the SidE family of effectors which is required for its recruitment to the Legionella containing vacuole (LCV). Through a series of elegant experiments using effector gene knockouts, co-transfection studies and careful biochemistry, Kubori et al further demonstrate that:

      (1) The SidC family member SdcB contributes to the polyubiquitination (poly-Ub) of Rab10 and its retention at the LCV membrane.

      (2) The transglutaminase effector, MavC acts as an inhibitor of SdcB by crosslinking ubiquitin at Gln41 to lysine residues in SdcB.

      Some further comments and questions are provided below.

      (1) From the data in Figure 1, it appears that the PR-Ub of Rab10 precedes and in fact is a prerequisite for poly-Ub of Rab10. The authors imply this, but there is no explicit statement. Isn't this the case?

      Yes, we think that it is the case. We revised the description in the text accordingly (lines 326-327).

      (2) The complex interplay of Legionella effectors and their meta-effectors targeting a single host protein (as shown previously for Rab1) suggests the timing and duration of Rab10 activity on the LCV is tightly regulated. How does the association of Rab10 with the LCV early during infection and then its loss from the LCV at later time points impact LCV biogenesis or stability? This could be clearer in the manuscript and the summary figure does not illustrate this aspect.

      Thank you for pointing out this important issue. Association of Rab10 with the LCV is thought to be beneficial for L. pneumophila as it is an identified factor that supports bacterial growth in cells (Jeng et al., 2019). We speculate that its loss from the LCV at the later stage of infection would also be beneficial, since the LCV may need to move on to the maturation stage in which a different membrane-fusion process may proceed. As this is too speculative, we made only a minor modification to the discussion section (lines 356-358). We also modified the summary figure (revised Figure 7d) to illustrate the time course.

      (3) How do the activities of the SidE and SidC effectors influence the amount of active Rab10 on the LCV (not just its localisation and ubiquitination)?

      We agree that it is an important point. We tested the active Rab10 (QL) and inactive Rab10 (TN) for their ubiquitination and LCV-localization profiles (new Figure 4a,b, new Figure 4-figure supplements 1 and 2). These analyses led us to the unexpected finding that the active form of Rab10 is the preferential target of the effector-mediated manipulation. See also our response to Reviewer 1’s comment #3. Thank you very much for your insightful suggestion.

      (4) What is the fate of PR-Ub and then poly-Ub Rab10? How does poly-Ub of Rab10 result in its persistence at the LCV membrane rather than its degradation by the proteasome?

      We have not revealed the molecular mechanism in this study. We believe that it is an important question to be solved in the future. We added a sentence to the discussion section (lines 376-378).

      (5) Mutation of Lys518, the amino acid in SdcB identified by mass spec as modified by MavC, did not abrogate SdcB Ub-crosslinking, which leaves open the question of how MavC does inhibit SdcB. Is there any evidence of MavC mediated modification to the active site of SdcB?

      The active site of SdcB (C57) is required for the modification (Figure 5b), but it is not likely to be the target residue, as the MavC transglutaminase activity restricts the target residues to Lys. It would be expected that multiple Lys residues on SdcB can be modified by MavC to disturb the catalytic activity.

      (6) I found it difficult to understand the role of the ubiquitin glycine residues and the transglutaminase activity of MavC on the inhibition of SdcB function. Is structural modelling using AlphaFold for example helpful to explain this?

      We conducted the AlphaFold analysis of SdcB-Ub. Unfortunately, when the Gly residues of Ub were placed in the catalytic pocket of SdcB, Q41 of Ub did not fit the expected position on SdcB (K518). Probably, formation of the ternary complex (MavC-Ub-SdcB) would change their overall conformation. A crystal structure analysis or more detailed molecular modeling would be required to resolve the issue.

      (7) Are the Lys mutants of SdcB still active in poly-Ub of Rab10?

      We performed the experiment and found that the K518R K891R mutant of SdcB retains E3 ligase activity at a level similar to that of the wild type upon infection (new Figure 6-figure supplement 2) (lines 283-284). The level was actually slightly higher than that of the wild type. This result may suggest that blocking the modification sites can rescue SdcB from MavC-mediated downregulation.

      Reviewer #2 (Recommendations For The Authors):

      see above

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study applies voltage clamp fluorometry to provide new information about the function of serotonin-gated ion channels 5-HT3AR. The authors convincingly investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists, helping to annotate existing cryo-EM structures. This work confirms that the activation of 5-HT3 receptors is similar to other members of this well-studied receptor superfamily. The work will be of interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Public Reviews:

      All reviewers agreed that these results are solid and interesting. However, reviewers also raised several concerns about the interpretation of the data and some other aspects related to data analysis and discussion that should be addressed by the authors. Essential revisions should include:

      (1) Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      (2) Add quantification of VCF data (e.g. sensor current kinetics, as suggested by reviewer #2) or better clarify/discuss the VCF quantitative aspects that are taken into account to reach some conclusions (reviewer #3).

      (3) Review and add relevant foundational work relevant to this study that is not adequately cited.

      (4) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews below.

      We have revised the text to address all four points. See the answers to referees’ recommendations.

      Reviewer #1 (Public Review):

      Summary:

      This study brings new information about the function of serotonin-gated ion channels 5-HT3AR, by describing the conformational changes that occur during ligand binding. These results can be potentially extrapolated to other members of the Cys-loop ligand-gated ion channels. By combining fluorescence microscopy with electrophysiological recordings, the authors investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists. The results are convincing and correlate well with the observations from cryo-EM structures. The work will be of important significance and broad interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Strengths:

      The authors present an elegant and well-designed study to investigate the conformational changes of 5-HT3AR where they combine electrophysiological and fluorometry recordings. They determined four positions suitable to act as sensors for the conformational changes of the receptor: two inside and two outside the agonist binding site. They make a strong point showing how antagonists produce conformational changes inside the orthosteric site similarly to agonists, but these changes fail to spread to the lower part of the ECD, in agreement with previous studies and Cryo-EM structures. They also show how some loss-of-function mutant receptors elicit conformational changes (changes in fluorescence) after partial agonist binding but fail to produce measurable ionic currents, pointing to intermediate states that are stabilized in these conditions. The four fluorescence sensors developed in this study may be good tools for further studies on characterizing drugs targeting the 5-HT3R.

      Weaknesses:

      Although the major conclusions of the manuscript seem well justified, some of the comparison with the structural data may be vague. The claim that monitoring these silent conformational changes can offer insights into the allosteric mechanisms contributing to signal transduction is not unique to this study and has been previously demonstrated by using similar techniques with other ion channels.

      The referee emphasizes that “some of the comparison with the structural data may be vague”. To better illustrate the structural reorganizations seen in the cryo-EM structures and used for VCF data interpretation, we added a new supplementary figure 3. It shows a superimposition of apo, setron-bound and 5-HT-bound structures, with reorganization of loop C and the Cys-loop consistent with the VCF data.

      Reviewer #2 (Public Review):

      Summary:

      This study focuses on the 5-HT3 serotonin receptor, a pentameric ligand-gated ion channel important in chemical neurotransmission. There are many cryo-EM structures of this receptor with diverse ligands bound, however assignment of functional states to the structures remains incomplete. The team applies voltage-clamp fluorometry to measure, at once, both changes in ion channel activity, and changes in fluorescence. Four cysteine mutants were selected for fluorophore labeling, two near the neurotransmitter site, one in the ECD vestibule, and one at the ECD-TMD junction. Agonists, partial agonists, and antagonists were all found to yield similar changes in fluorescence, a proxy for conformational change, near the neurotransmitter site. The strength of the agonist correlated to a degree with propagation of this fluorescence change beyond the local site of neurotransmitter binding. Antagonists failed to elicit a change in fluorescence at the vestibule and ECD-TMD junction sites. The VCF results further turned up evidence supporting intermediate (likely pre-active) states.

      Strengths:

      The experiments appear rigorous, the problem the team tackles is timely and important, the writing and the figures are for the most part very clear. We sorely need approaches orthogonal to structural biology to annotate conformational states and observe conformational transitions in real membranes- this approach, and this study, get right to the heart of what is missing.

      Weaknesses:

      The weaknesses in the study itself are overall minor, I only suggest improvements geared toward clarity. What we are still missing is application of an approach like this to annotate the conformation of the part of the receptor buried in the membrane; there is important debate about which structure represents which state, and that is not addressed in the current study.

      Reviewer #3 (Public Review):

      Summary:

      The authors have examined the 5-HT3 receptor using voltage clamp fluorometry, which enables them to detect structural changes at the same time as the state of receptor activation. These are ensemble measurements, but they enable a picture of the action of different agonists and antagonists to be built up.

      Strengths:

      The combination of rigorously tested fluorescence reporters with oocyte electrophysiology is a solid development for this receptor class.

      Weaknesses:

      The interpretation of the data is solid but relevant foundational work is ignored. Although the data represent a new way of examining the 5-HT3 receptor, nothing that is found is original in the context of the superfamily. Quantitative information is discussed but not presented.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions that may help to improve the manuscript:

      • Page 6, point 2), typo: "L131W is positioned more profound in each ECD, its side chain (...)"

      “profound” has been corrected to “profoundly”

      • Fig 1C: Why not compare 5-HT responses for the four sensors studied? If the reason is the low currents elicited by 5-HT on I160C/Y207W sensor, could you comment on this effect that is not observed for the other full agonist tested (mCPBG)?

      The point of this figure (Fig 1G) is to show desensitizing currents, in order to follow the evolution of the fluorescence signal during desensitization. For the I160C/Y207W sensor, where 5-HT becomes a partial agonist, we therefore judged it more appropriate to use mCPBG, which acts as a more potent agonist, to elicit currents with a clear desensitization component. We have added a sentence to the figure legend to explain this choice more clearly.

      • Page 9, paragraph 2: "However, concentration-response curves on V106C/L131W show a small yet visible decorrelation of fluorescence and current (...)" Statistical analysis on EC50c and EC50f will help to see this decorrelation.

      Statistical analysis (unpaired t test) has been added to figure 3 panel A.

      • Page 10, paragraph 1: the authors describe how "different antagonists promote different degrees of local conformational changes". Does it have any relation to the efficacy or potency of these antagonists? Is there any interpretation for this result?

      Since setrons are competitive antagonists, the concept of efficacy of these molecules is unclear. Concerning potency, no correlation between affinity and fluorescence variation is observed. For instance, ondansetron and alosetron bind with similar nanomolar affinity to the 5-HT3R (Thompson & Lummis Curr Pharm Des. 2006;12(28):3615-30) but elicit different fluorescence variations on both S204C and I160C/Y207W sensors.

      • Fig. 1 panel A, graph to far right: axis label is cut ("current (uA)/..."). Colors of graph A - right are not clearly distinguishable e.g. cyan from green.

      The fluorescent green color that describes the mutant has been changed into limon color which is more clearly distinguishable from cyan.

      • Why is R219C/F142W not selected in the study? Are the signals comparable to the chosen R219C/F142W?

      We have chosen not to select R219C/F142W because the current elicited by this construct was lower than the current elicited by the construct R219C/Y140W. Moreover, the residue F142 belongs to the FPF motif from Cys-loop that is essential for gating (Polovinkin et al, 2018, Nature).

      • Fig. 1 legend typo: "mutated in tryptophan”

      “in” has been changed to “into”.

      • Fig. 2: yellow color (graphs in panel B) is very hard to read.

      Yellow color has been darkened to yellow/brown to allow easy reading.

      • Fig. 4 is too descriptive and undermines the information of the study. It could be improved e.g. by representing specific structures or partial structures involved. As an additional minor comment, some colors in the figure are hard to differentiate, e.g. magenta and purple.

      We have added relevant specific structures involved, namely loop C, the Cys-loop and pre-M1 loop to clarify. The intensity of magenta and purple has been increased to help differentiate the two sensor positions.

      • Fig S1C: it is confusing to see the same color pattern for the single mutants without the W. I would recommend to label each trace to make it clearer.

      Labelling of the traces corresponding to the single mutants has been added.

      • Fig S2: Indicating the statistical significance in the graph for the mutants with different desensitization properties compared to the WT receptor will help its interpretation.

      The statistical significance of the difference in the desensitization properties has been added to Figure S2.

      Reviewer #2 (Recommendations For The Authors):

      Overall comments for the authors:

      Selection of cysteine mutants and engineered Trp sites is clear and logical. VCF approach with controls for comparing the functionality of WT vs. mutants, and labeled with unlabeled receptor, is well explained and satisfying. The finding that desensitization involves little change in ECD conformation makes sense. It is somewhat surprising, at least superficially, to find that competitive antagonists promote changes in fluorescence in the same 'direction' and amplitude as strong agonists, however, this is indeed consistent with the structural biology, and with findings from other groups testing different labeling sites. Importantly, the team finds that antagonist-binding changes in deltaF do not spread beyond the region near the neurotransmitter site. The finding that most labeling sites in the ECD, in particular those not in/near the neurotransmitter site, fail to report measurable fluorescence changes, is noteworthy. It contrasts with findings in GlyR, as noted by the authors, and supports a mechanism where most of each subunit's ECD behaves as a rigid body.

      Specific questions/comments:

      I am confused about the sensor current kinetics. Results section 2) states that all sensors share the same current desensitization kinetics, while Results section 5) states that the ECD-TMD site and the vestibule site sensors exhibit faster desensitization. SF1C, right-most panel of R219C suggests the mutation and/or labeling here dramatically changes apparent activation and deactivation rates measured by TEVC. Both activation and deactivation upon washout appear faster in this one example. Data for desensitization are not shown here but are shown in aggregate in earlier panels. It is a bit surprising that activation and deactivation would both change but no effect on desensitization. Indeed, it looks like, in Fig. 1G, that desensitization rate is not consistent across all constructs. Can you please confirm/clarify?

      TEVC and VCF recordings in this study show significant variability in both the apparent desensitization and deactivation kinetics. For desensitization, this is illustrated by the TEVC experiments in Figure S2, where the remaining currents after 45 seconds of 5-HT perfusion and the rate constants of desensitization are measured on different oocytes from different batches. Therefore, the differences in desensitization kinetics shown in Fig. 1G are not significant; the aim of the figure is solely to illustrate that no variation of fluorescence is observed during the desensitization phase. A sentence has been added to the legend of Fig. 1G to clarify this point. We also revised the first paragraph of Results section 5, clearly stating that the slight tendency toward faster desensitization of the V106C/L131W and R219C/Y140W sensors is not significant.

      An alternative to the conclusion-like title of Results section 2) is that the ECD (and its labels) does not undergo notable conformational changes between activated and desensitized states.

      This is a good point and we have added a sentence at the end of results section 2 to present this idea.

      I find the discussion paragraph on partial agonist mechanisms, starting with "However," to be particularly important but at times hard to follow. Please try to revise for clarity. I am particularly excited to understand how we can understand/improve assignments of cryo-EM structures using the VCF (or other) approaches. As examples of where I struggled, near the top of p. 11, related to the partial agonist discussion, there is an assumption about the pore being either activated, or resting. Is it not also possible that partial agonists could stabilize a desensitized state of the pore? Strictly speaking, the labeling sites and current measurements do not distinguish between pre-active resting and desensitized channel conformations/states. However, the cryo-EM structures can likely help fill in the missing information there- with all the normal caveats. Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      We have revised the section and hope it is clearer now. Notably, we state more explicitly the argument for annotating partial agonist-bound closed structures as pre-active, based mainly on kinetic considerations from the VCF experiments. We also mention and cite a paper by the Chakrapani group published on 4 January 2024 (Felt et al, Nature Communications), in which they present structures of the m5HT3AR bound to partial agonists, with a set of conformations fully consistent with our VCF data.

      This statement likely needs references: "...indirect experiments of substituted cysteine accessibility method (SCAM) and VCF experiments suggested that desensitization involves weak reorganizations of the upper part of the channel that holds the activation gate, arguing for the former hypothesis."

      Reference Polovinkin et al, Nature, 2018, has been added.

      I respectfully suggest toning down this language a little bit: "VCF allowed to characterize at an unprecedented resolution the mechanisms of action of allosteric effectors and allosteric mutations, to identify new intermediate conformations and to propose a structure-based functional annotation of known high-resolution structures." This VCF stands strongly without unclear claims about unprecedented resolution. What impresses me most are the findings distinguishing how agonists/partial agonists/antagonists share a conserved action in one area and not in another, the observations consistent with intermediate states, and the efforts to integrate these simultaneous current and conformation measurements with the intimidating array of EM structures.

      We thank the referee for his positive comments. We have removed “unprecedented resolution” and revised the sentences.

      It is beyond the scope of the current study, but I am curious what the authors think the hurdles will be to tracking conformation of the pore domain- an area where non-cryo-EM based conformational measurements are sorely needed to help annotate the EM structures.

      We fully agree with the referee that structures of the TMD are very divergent between the various conditions depending on the membrane surrogate. We are at the moment working on this region by VCF, incorporating the fluorescent unnatural amino acid ANAP.

      Minor:

      (1) P. 5, m5-HT3R: Please clarify that this refers to the mouse receptor, if that is correct.

      OK, “mouse” has been added.

      (2) Fig. 1D, I suggest moving the 180-degree arrow to the right so it is below but between the two exterior and vestibular views.

      Ok, it has been done.

      (3) Please add a standard 2D chemical structure of MTS-TAMRA, and TAMRA attached to a cysteine, to Fig 1.

      A standard chemical structure has been added for the two isomers of MTS-TAMRA.

      (4) Please label subpanels in Fig. 1G with the identity of the label site.

      The subpanels have been labelled.

      Reviewer #3 (Recommendations For The Authors):

      This is solid work but I mainly have suggestions about placing it in context.

      (1) Abstract "Data show that strong agonists promote a concerted motion of all sensors during activation, "

      The concept of sensors here is the fluorescent labels? I did not find this meaningful until I read the significance statement.

      We have specified “fluorescently-labelled” before sensors in the abstract.

      (2) p4 "each subunit in the 5-HT3A pentamer...." this description would be identical for any pentameric LGIC so the authors should beware of a misleading specificity. This goes for other phrases in this paragraph. However, the summary of the 5HT specific results is very good.

      About the description of the structure, we added “The 5-HT3AR displays a typical pLGIC structure, where….”.

      (3) This paper is very nicely put together and generally explains itself well. The work is rigorous and comprehensive. But the meaning of quenching (by local Trp) seems straightforward, but it is not made explicit in the paper. Why doesn't simple labelling (single Cys) at this site work? And can we have a more direct demonstration of the advantage of including the Trp (not in the supplementary figure?) All this information is condensed into the first part of figure 1 (the graph in Figure 1A). Figure 1 could be split and the principle of the introduced quenching could be more clearly shown

      We have detailed the principle of the TrIQ approach in a few more sentences. In addition, to be more explicit, the significant differences in fluorescence between sensors with and without tryptophan have been added to the screening panel of Figure 1, and a sentence has been added to the legend of this figure.

      (4) p10 "VCF measurements are also remarkably coherent with the atomic structures showing an open pore (so called F, State 2 and 5-HT asymmetric states), "

      This statement is intriguing. What do these names or concepts represent? Are they all the same thing? Where do the names come from? What is meant here? Three different concepts, all consistent? Or three names for the same concept?

      We have tried to clarify the statement by making reference to the PDB of the structures.

      (5) "Fluorescence and VCF studies identified similar intermediate conformations for nAChRs, ⍺1-GlyRs and the bacterial homolog GLIC(21,32-35). "

      Whilst this is true, the motivation for such ideas came from earlier work identifying intermediates from electrophysiology alone (such as the flip state (Burzomato et al 2004), the priming state (Mukhtasimova et al 2009) and the conformational wave in ACh channels (Grosman et al 2000)). It would be appropriate to mention some of this earlier work.

      We have incorporated and described these references in the discussion. Of note, we fully quoted these references in our previous papers on the subject (Menny 2017, Lefebvre 2021, Shi 2023), but the referee is right in asking to quote them again.

      (6) "A key finding of the study is the identification of pre-active intermediates that are favored upon binding of partial agonists and/or in the presence of loss-of-function mutations. "

      Even more fundamental, the idea of a two-state equilibrium for neurotransmitter receptors was discarded in 1957 according to the action of partial agonists.

      DEL CASTILLO J, KATZ B (1957) Interaction at end-plate receptors between different choline derivatives. Proc R Soc Lond B Biol Sci

      So to discover this "intermediate" - that is, bound but minimal activity - in the present context seems a bit much. It is a big positive of this paper that the results are congruent with our expectations, but I cannot see value in posing the results as an extension of the 2-state equilibrium (for which there are anyway other objections).

      As for intermediates being favoured by loss of function mutations, this concept is already well established in glycine receptors (Plested et al 2007, Lape et al 2012) and doubtless in other cases too.

      I do get the point that the authors want to establish a basis in 5-HT3 receptors, but these previous works suggest the results are somewhat expected. This should be commented on.

      We also agree. We have replaced “key finding” with “key observation”, cited most of the references proposed, and explicitly concluded that “The present work thus extends this idea to the 5HT3AR, together with providing structural blueprints for cryo-EM structure annotation”.

      (7) "In addition, VCF data allow a quantitative estimate of the complex allosteric action of partial agonists, that do not exclusively stabilize the active state and document the detailed phenotypes of various allosteric mutations."

      Where is this provided? If the authors are not motivated to do this, I have some doubts that others will step in. If it is not worth doing, it's probably not worth mentioning either.

      The language has been toned down to: "In addition, VCF data give insights in the action of partial agonists, that do not exclusively stabilize the active state and document the phenotypes of various allosteric mutations."

      (8) Figure 1G please mark which construct is which.

      This has been added to Figure 1G.

    1. Author Response

      Provisional response

      We would like to thank the reviewers for taking the time to review our manuscript, for providing useful suggestions for improvement, and for highlighting the significance of our approach.

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate that it is possible to carry out eQTL experiments for the model eukaryote S. cerevisiae, in "one pot" preparations, by using single-cell sequencing technologies to simultaneously genotype and measure expression. This is a very appealing approach for investigators studying genetic variation in single-celled and other microbial systems, and will likely inspire similar approaches in non-microbial systems where comparable cell mixtures of genetically heterogeneous individuals could be achieved.

      Strengths:

      While eQTL experiments have been done for nearly two decades (the corresponding author's lab are pioneers in this field), this single-cell approach creates the possibility for new insights about cell biology that would be extremely challenging to infer using bulk sequencing approaches. The major motivating application shown here is to discover cell occupancy QTL, i.e. loci where genetic variation contributes to differences in the relative occupancy of different cell cycle stages. The authors dissect and validate one such cell cycle occupancy QTL, involving the gene GPA1, a G-protein subunit that plays a role in regulating the mating response MAPK pathway. They show that variation at GPA1 is associated with proportional differences in the fraction of cells in the G1 stage of the cell cycle. Furthermore, they show that this bias is associated with differences in mating efficiency.

      We thank the reviewer for recognizing the strengths of our overall approach and our dissection of the functional consequences of the W82R variant of GPA1.

      Weaknesses:

      While the experimental validation of the role of GPA1 variation is well done, the novel cell cycle occupancy QTL aspect of the study is somewhat underexploited. The cell occupancy QTLs that are mentioned all involve loci that the authors have identified in prior studies that involved the same yeast crosses used here. It would be interesting to know what new insights, besides the "usual suspects", the analysis reveals. For example, in Cross B there is another large effect cell occupancy QTL on Chr XI that affects the G1/S stage. What candidate genes and alleles are at this locus?

      We thank the reviewer for this suggestion. We plan to expand the section on cell cycle occupancy QTL in our revision.

      And since cell cycle stages are not biologically independent (a delay in G1, could have a knock-on effect on the frequency of cells with that genotype in G1/S), it would seem important to consider the set of QTLs in concert.

      We thank the reviewer for this suggested clarification. In our revision, we will clarify that the cell cycle occupancy phenotype represents the proportion of cells assigned to a given stage. As the reviewer correctly notes, a change in the proportion of cells in one stage may alter the proportion of cells in other stages, and this could result in cell cycle occupancy QTL for multiple stages. We will make efforts to consider the cell cycle occupancy QTLs in concert in the revised manuscript.
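
      The dependence between stages described in the reply above can be illustrated with a minimal sketch. It assumes that occupancy is simply the fraction of cells assigned to each stage; the stage labels and cell counts below are hypothetical and are not taken from the manuscript.

```python
from collections import Counter


def stage_occupancy(stage_labels):
    """Return the fraction of cells assigned to each cell cycle stage."""
    counts = Counter(stage_labels)
    total = sum(counts.values())
    return {stage: n / total for stage, n in counts.items()}


# Hypothetical stage assignments for cells carrying two alleles at one locus.
# Allele B has extra G1 cells; its S and G2/M counts are unchanged.
cells_a = ["G1"] * 40 + ["S"] * 20 + ["G2/M"] * 40
cells_b = ["G1"] * 80 + ["S"] * 20 + ["G2/M"] * 40

occ_a = stage_occupancy(cells_a)
occ_b = stage_occupancy(cells_b)

for stage in ("G1", "S", "G2/M"):
    print(stage, round(occ_a[stage], 3), round(occ_b[stage], 3))
```

      Because the fractions must sum to one, the S and G2/M proportions fall for allele B even though their counts are identical, which is why a single variant could appear as an occupancy QTL for several stages.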

      Reviewer #2 (Public Review):

      Boocock and colleagues present an approach whereby eQTL analysis can be carried out by scRNA-Seq alone, in a one-pot-shot experiment, due to genotypes being able to be inferred from SNPs identified in RNA-Seq reads. This approach obviates the need to isolate individual spores, genotype them separately by low-coverage sequencing, and then perform RNA-Seq on each spore separately. This is a substantial advance and opens up the possibility to straightforwardly identify eQTLs over many conditions in a cost-efficient manner. Overall, I found the paper to be well-written and well-motivated, and have no issues with either the methodological/analytical approach (though eQTL analysis is not my expertise), or with the manuscript's conclusions.

      We thank the reviewer for recognizing the significant contributions our work makes to the field.

      393 segregant experiment:

      For the experiment with the 393 previously genotyped segregants, did the authors examine whether averaging the expression by genotype for single cells gave expression profiles similar to the bulk RNA-Seq data generated from those genotypes? Also, is it possible (and maybe not, due to the asynchronous nature of the cell culture) to use the expression data to aid in genotyping for those cells whose genotypes are ambiguous? I presume it might be if one has a sufficient number of cells for each genotype, though, for the subsequent one-pot experiments, this is a moot point.

      We thank the reviewer for this comment. While we could expand the analysis along these lines, this is not relevant for the subsequent one-pot eQTL experiments, as the reviewer notes, and is therefore beyond the scope of the manuscript. We will make the data available so that anyone interested can try these analyses.

      Figure 1B:

      Is UMAP necessary to observe an ellipse/circle - I wouldn't be surprised if a simple PCA would have sufficed, and given the current discussion about whether UMAP is ever appropriate for interpreting scRNA-Seq (or ancestry) data, it seems the PCA would be a preferable approach. I would expect that the periodic elements are contained in 2 of the first 3 principal components. Also, it would be nice if there were a supplementary figure similar to Figure 4 of Macosko et al (PMID 26000488) to indeed show the cell cycle dependent expression.

      We thank the reviewer for this comment. We too have been following the debate on the utility of UMAP for scRNA-seq, and in our revision we will provide an alternative visualization of the cell cycle. We will also generate a supplementary figure similar to Figure 4 of Macosko et al. to visualize cell-cycle-dependent gene expression.

      Aging, growth rate, and bet-hedging:

      The mention of bet-hedging reminded me of Levy et al (PMID 22589700), where they saw that Tsl1 expression changed as cells aged and that this impacted a cell's ability to survive heat stress. This bet-hedging strategy meant that the older, slower-growing cells were more likely to survive, so I wondered a couple of things. It is possible from single-cell data to identify either an aging, or a growth rate signature? A number of papers from David Botstein's group culminated in a paper that showed that they could use a gene expression signature to predict instantaneous growth rate (PMID 19119411) and I wondered if a) this is possible from single-cell data, and b) whether in the slower growing cells, they see markers of aging, whether these two signatures might impact the ability to detect eQTLs, and if they are detected, whether they could in some way be accounted for to improve detection.

      We thank the reviewer for this comment and suggested analyses. We are not sure whether one can see gene expression signatures of aging in yeast scRNA-seq data. We believe that such analyses are beyond the scope of this work, but we will make the data available so that anyone interested can try them.

      AIL vs. F2 segregants:

      I'm curious if the authors have given thought to the trade-offs of developing advanced intercross lines for scRNA-Seq eQTL analysis. My impression is that AIL provides better mapping resolution, but at the expense of having to generate the lines. It might be useful to see some discussion on that.

      We thank the reviewer for their comment. We will include some discussion of the trade-offs of different experimental designs in our revision.

      10x vs SPLit-Seq

      10x is a well established, but fairly expensive approach for scRNA-Seq - I wondered how the cost of the 10x approach compares to the previously used approach of genotyping segregants and performing bulk RNA-Seq, and how those costs would change if one used SPLiT-Seq (see PMID 38282330).

      We will provide some ballpark estimates of the costs, and we will discuss the trade-offs of different scRNA-seq technologies in our revision.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their insightful comments and recommendations. We have extensively revised the manuscript in response to the valuable feedback. We believe the result is a more rigorous and thoughtful analysis of the data. Furthermore, our interpretation and discussion of the findings is more focused and highlights the importance of the circuit and its role in the response to stress. Thank you for helping to improve the presented science.

      Key changes made in response to the reviewers’ comments include:

      • Revision of statistical analyses for nearly all figures, with the addition of a new table of summary statistics to include F and/or t values alongside p-values.

      • Addition of statistical analyses for all fiber photometry data.

      • Examination of data for possible sex dependent effects.

      • Clarification of breeding strategies and genotype differences, with added details to methods to improve clarity.

      • Addressing concerns about the specificity of virus injections and the spread, with additional details added to methods.

      • Modification of terminology related to goal-directed behavior based on reviewer feedback, including removal of the term from the manuscript.

      • Clarification and additional data on the use of photostimulation and its effects, including efforts to inactivate neurons for further insight, despite technical challenges.

      • Correction of grammatical errors throughout the manuscript.

      Reviewer 1:

      Despite the manuscript being generally well-written and easy to follow, there are several grammatical errors throughout that need to be addressed.

      Thank you for highlighting this issue. Grammatical errors have been fixed in the revised version of the manuscript.

      Only p values are given in the text to support statistical differences. This is not sufficient. F and/or t values should be given as well.

      In response to this critique and similar comments from Reviewer 2, we re-evaluated our approach to statistical analyses and extensively revised the analyses for nearly all figures. We also added a new table of summary statistics (Supplemental Table 1) containing the type of analysis, statistic, comparison, multiple comparisons, and p value(s). For Figures 4C-E, 5C, 6C-E, 7H-I, and 8H, we analyzed the data using two-way repeated measures (RM) ANOVA, which examined the main effect of time (either number of sessions or stimulation period) within the same animal, the main effect of genotype (Cre+ vs Cre-), and their interaction. For Supplemental Figure 7A, we also conducted a two-way RM ANOVA in Cre+ mice, with time as one factor and activity state (number of port activations in the active vs inactive nose port) as the other. For Figures 5D-E, we conducted a two-way mixed model ANOVA that accounted and corrected for missing data. For figures that compared only two groups of data (Figures 5F-L, 6F, 8C-D, 8I, and Supp 6F-G), we used a two-tailed t-test. If our question and/or hypothesis required multiple comparisons between or within treatments, we used Bonferroni’s multiple comparisons test for post hoc analysis (the groups compared are noted in Supplemental Table 1). For figures that did or did not show a change in calcium activity (Figures 3G, 3I-K, 7B, 7D-E, 8E-F), we compared waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). The time windows used for comparison are noted in Supplemental Table 1, along with whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds.

      None of the comparisons that were significant in the prior analyses lost significance under the revised analyses. Of those previously found to be not significantly different, only one changed: in Figure 6E there is now a significant baseline difference between Cre+ and Cre- mice, with Cre- mice taking longer to first engage the port than Cre+ mice (p=0.045). Although the more rigorous approach to the statistical analyses did not change our interpretations, we feel it enhanced the paper, and we thank the reviewer for pushing this improvement.
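
      As a rough illustration of the Bonferroni adjustment mentioned above, the following sketch multiplies each raw p value by the number of comparisons. It is a simplified stand-in, not the authors' analysis: post hoc tests in statistics packages typically also use the pooled ANOVA error term, which is not reproduced here. The function name and the example p values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust raw p values and flag which remain significant."""
    m = len(p_values)
    # Multiply each p value by the number of comparisons, capped at 1.0.
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical raw p values from three post hoc comparisons.
raw_p = [0.004, 0.020, 0.300]
adjusted_p, significant = bonferroni(raw_p)

for p, p_adj, sig in zip(raw_p, adjusted_p, significant):
    print(f"raw={p:.3f} adjusted={p_adj:.3f} significant={sig}")
```

      With three comparisons, a raw p value of 0.020 no longer passes the 0.05 threshold after adjustment, which shows how the correction guards against false positives when several groups are compared.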

      Moreover, the fibre photometry data does not appear to have any statistical analyses reported - only confidence intervals represented in the figures without any mention of whether the null hypothesis that the elevations in activity observed are different from the baseline.

      This is particularly important where there is ambiguity, such as in Figure 3K, where the spontaneous activity of the animal appears to correlate with a spike in activity but the text mentions that there is no such difference. Without statistics, this is difficult to judge.

      Thank you for highlighting this critical point and providing an opportunity to strengthen our manuscript. We added statistical analyses of all fiber photometry data using a recently described approach based on waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). In the statistical summary (Supplemental Table 1) we note the time window that we used for comparison in each analysis and whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds. Thank you for highlighting this and helping make the manuscript stronger.

      With respect to Figure 3K, we are not certain we understood which spike in activity the reviewer referred to. Figures 3J and 3K include both velocity data (gold) and the Ca2+-dependent signal (blue). We used episodes of velocity that were comparable to the avoidance response during the ambush test and found no significant differences in the Ca2+ signal when gating around changes in velocity in the absence of a stressor (Supplemental Table 1). This is in contrast to the significant change in Ca2+ signal following a mock predator ambush (Figure 3J). We interpret these data together to indicate that locomotion does not correlate with an increase in calcium activity in SuMVGLUT2+::POA neurons, but that coping with a stressor does. This conclusion is further examined in Supplemental Figure 5, which includes a cross-correlation analysis to test for a temporally offset relationship between velocity and Ca2+ signal in SuMVGLUT2+::POA neurons.
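
      The idea of testing for a temporally offset relationship between two traces can be sketched as a lagged Pearson correlation. This is a minimal illustration under stated assumptions, not the authors' pipeline: both traces are assumed to be sampled at the same rate and to have equal length, and the function names and toy traces are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def lagged_correlation(velocity, calcium, max_lag):
    """Return {lag: r}; a positive lag pairs velocity[t] with calcium[t + lag]."""
    n = len(velocity)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            v, c = velocity[: n - lag], calcium[lag:]
        else:
            v, c = velocity[-lag:], calcium[: n + lag]
        result[lag] = pearson(v, c)
    return result


# Toy traces: the calcium trace is the velocity trace delayed by two samples.
velocity = [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]
calcium = [0, 0, 0, 0, 1, 0, 0, 0, 2, 0]

correlations = lagged_correlation(velocity, calcium, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

      In this toy case the correlation peaks at a lag of two samples. A flat profile across lags would instead suggest no consistent temporal relationship between the two signals.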

      The use of photostimulation only is unfortunate, it would have been really nice to see some inactivation of these neurons as well. This is because of the well-documented issues with being able to determine whether photostimulation is occurring in a physiological manner, and therefore makes certain data difficult to interpret. For instance, with regards to the 'active coping' behaviours - is this really the correct characterisation of what's going on? I wonder if the mice simply had developed immobile responding as a coping strategy but when they experience stimulation of these neurons that they find aversive, immobility is not sufficient to deal with the summative effects of the aversion from the swimming task as well as from the neuronal activation? An inactivation study would be more convincing.

      We agree with the reviewer that experiments demonstrating the necessity of SuMVGLUT2+::POA neurons would have added to the story here. We carried out multiple experiments aimed at addressing questions about the necessity of SuMVGLUT2+::POA neurons in stress coping behaviors, specifically the forced swim assay. Efforts included chemogenetic, optogenetic, and tetanus toxin-based methods. We observed no effects on locomotor activity or stress coping. These experiments are both technically difficult and challenging to interpret. Interpretation of negative results, as we obtained, is particularly difficult because of potential technical confounds. Selective targeting of SuMVGLUT2+::POA neurons for inhibition requires three viral injections and two recombination steps, increasing variability and reducing the number of neurons impacted. Alternatively, photoinhibition targeting SuMVGLUT2+::POA cells can be done using Retro-AAV injected into POA and a fiber implant over SuM. We tried both approaches. The data obtained were difficult to interpret because questions arose about adequate coverage of the SuMVGLUT2+::POA population by virally expressed constructs and/or light spread. The challenge of adequate coverage to effectively prevent output from the targeted population is further confounded by challenges inherent in neural inhibition, specifically determining whether the inhibition created at the cellular level is adequate to block output in the context of excitatory inputs, or whether neurons must first be engaged in a particular manner for inhibition to be effective. Baseline neural activity, release probability, and post-synaptic effects could all be relevant, which photo-inhibition will potentially not resolve. So, while the trend is to always show “necessary and sufficient” effects, we have tried nearly everything, and we simply cannot conclude much from our mixed results. There are also well-established problems with existing photo-inhibition methods, which are often ignored even as people use and tout these methods. We have a lot of expertise in photo-inhibition optogenetics, and indeed have used it with some success and developed new methods, yet in this particular case we are unable to draw conclusions related to inhibition. Others have experienced similar challenges in locus coeruleus neurons, which have very low basal activity: inhibition with chemogenetics is very hard, as it is with optogenetic pump-based approaches, because the neurons fire robust rebound APs. We have spent almost 2.5 years trying to get this to work in this circuit because reviewers have been insistent on this result for the paper to be conclusive. Unfortunately, it simply isn’t possible in our view until we know more about the cell types involved. This is all in spite of experience using the approach in many other publications.

      We also employed less selective approaches, such as injecting AAV-DIO-tetanus toxin light chain (Tettox) constructs directly into the SuM of VGLUT2-Cre mice, but found off-target effects impacting animal wellbeing and impeding behavioral testing due to viral spread to surrounding areas.

      While we are disappointed to be unable to directly address questions about the necessity of SuMVGLUT2+::POA neurons in active coping with experimental data, we were unable to obtain results allowing for clear interpretation across numerous other domains the reviewers requested. We also feel strongly that until we have a clear picture of the molecular cell type architecture in the SuM, and Cre-drivers to target subsets of neurons, this question will be difficult to resolve for any group. We are working now on RNAseq and related spatial transcriptomics efforts in the SuM and examining additional behavioral paradigms to resolve these issues, so stay tuned for future publications.

      Accordingly, we avoid making statements relating to necessity in the manuscript, in spite of having several lines of physiological data with strong, robust correlations to behavior related to the SuMVGLUT2+::POA circuit.

      Nose poke is only nominally instrumental as it cannot be shown to have a unique relationship with the outcome that is independent of the stimuli-outcome relationships (in the same way that a lever press can, for example). Moreover, there is nothing here to show that the behaviours are goal-directed.

      Thank you for highlighting this point. We have removed the goal-directed terminology from the manuscript. Since the mice perform highly selective (active vs inactive) port activation robustly across multiple days of training, the behavior likely transitions to habitual behavior. We only tested the valuation of stimulus termination on the final day of training, with a time-limited progressive ratio test. With respect to lever press versus active port activation, we are unclear how using a lever in this context would offer a different interpretation. Lever pressing may be more sensitive to changes in valuation when compared to nose poke port activation (Atalayer and Rowland 2008); however, in this study the focus of the operant behavior is separating innate behaviors from learned action–outcome instrumental behaviors for threat response (LeDoux and Daw 2018). The robust, highly selective activation of the active port illustrated in Figure 6 fits as an action–outcome instrumental behavior wherein mice learn to engage the active but not the inactive port to terminate photostimulation. The first activation of the port occurs through exploration of the arena but, as demonstrated by the number of active port activations and the decline in time to the first active port engagement, mice expressing ChR2eYFP learn to engage the port to terminate the stimulation. To aid in illustrating this point we have added Supplemental Figure 7 showing active and inactive port activations for both Cre+ and Cre- mice. This adds clarity to the high rate of selective port activation driven by stimulation of SuMVGLUT2+::POA neurons compared to controls. Removing the goal-directed terminology and providing additional data narrows and supports one of the key points of the operant experiment.

      With regards to Figure 1: This is a nice figure, but I wonder if some quantification of the pathways and their density might be helpful, perhaps by measuring the intensity of fluorescence in image J (as these are processes, not cell bodies that can be counted)? Mind you, they all look pretty dense so perhaps this is not necessary! However, because the authors are looking at projections in so-called 'stress-engaged regions', the amygdala seems conspicuous by its absence. Did the authors look in the amygdala and find no projections? If so it seems that this would be worth noting.

      This is an interesting but technically very challenging question. We consulted with several leaders in the field who routinely use complementary viral tracing methods. We were unable to devise a method to provide a satisfactorily meaningful quantitative (as opposed to qualitative) approach to compare SuMVGLUT2+::POA to SuMVGLUT2+ projections. Several limitations hinder a meaningful quantitative approach. One limitation was the need for different viral strategies to label the two populations. Labeling SuMVGLUT2+::POA neurons requires using VGLUT2-Flp mice with two injections into the POA and one into SuM. Two recombinase steps were required, reducing efficiency of overlap. This combination of viral injections, particularly the injections of RetroAAVs in the POA, can induce significant quantitative variability due to tropism, efficacy, and variability of retro-viral methods, and viral infection generally. These issues are often totally ignored in similar studies across the “neural circuit” landscape, but that doesn’t make them less relevant here.

      Although people in the field do this and show quantification, we believe that it can be a quite misleading read-out of functionally relevant circuitry, given that neurotransmitter release is ultimately amplified by receptors post-synaptically, and many examples of robust behavioral effects have been observed with low fiber labeling in complementary tracing methods (McCall, Siuda et al. 2017). In contrast, the broader SuMVGLUT2+ population was labeled using a single injection into the SuM. This means there is likely more efficient expression of the fluorophore. Additionally, in areas that contain both terminals and passing fibers, interpreting the fluorescent signal is challenging. Together, these factors limit a meaningful quantitative comparison and make interpretation difficult. In this context, we focused on a conservative qualitative presentation to demonstrate two central points: 1) SuMVGLUT2+::POA neurons are a subset of SuMVGLUT2+ neurons that project to specific areas, excluding dentate gyrus, and 2) they arborize extensively to multiple areas which have been linked to threat responses. We agree that there is much to be learned about how different populations in SuM connect to targets in different regions of the brain, and we will continue to examine this question with different techniques. A meaningful quantitative study comparing projections is technically complex and, we feel, beyond our ability for this study.

      Also, for the reasons above we do not believe that quantification provides exceptional clarity with respect to the putative function of the circuit, glutamate released, or other cotransmitters given known amplification at the post-synaptic side of the circuit.

      With regard to the amygdala, other studies on SuM projections have found efferent projections to amygdala (Ottersen, 1980; Vertes, 1992). In our study we were unable to definitively determine projections from SuMVGLUT2+::POA neurons to amygdala, which if present are not particularly dense. For this reason we were conservative and do not comment on this particular structure.

      I would suggest removing the term goal-directed from the manuscript and just focusing on the active vs. passive distinction.

      We removed the use of goal-directed. Thank you for helping us clarify our terminology.

      The effect observed in Figure 7I is interesting, and I'm wondering if a rebound effect is the most likely explanation for this. Did the authors inhibit the VGAT neurons in this region at any other times and observe a similar rebound? If such a rebound was not observed it would suggest that it is something specific about this task that is producing the behaviour. I would like it if the authors could comment on this.

      We agree that results showing the change in coping strategy (passive to active) in forced swim after but not during stimulation of SuMVGAT+ neurons is quite interesting (Figure 7I). This experiment activated SuMVGAT+ neurons during a section of the forced swim assay and mice showed a robust shift to mobility after the stimulation of SuMVGAT+ neurons stopped. We did not carry out inhibition of SuMVGAT+ neurons in this manuscript. As the reviewer suggested, strong inhibition of local SuM neurons, including SUMVGLUT2+::POA neurons, could lead to rebound activity that may shift coping behaviors in confusing ways. We agree this is an interesting idea but do not have data to support the hypothesis further at this time.

      Reviewer 2

      (1) These are very difficult, small brain regions to hit, and it is commendable to take on the circuit under investigation here. However, there is no evidence throughout the manuscript that the authors are reliably hitting the targets and the spread is comparable across experiments, groups, etc., decreasing the significance of the current findings. There are no hit/virus spread maps presented for any data, and the representative images are cropped to avoid showing the brain regions lateral and dorsal to the target regions. In images where you can see the adjacent regions, there appears expression of cell bodies (such as Supp 6B), suggesting a lack of SuM specificity to the injections.

      We agree with the reviewer that the areas studied are small and technically challenging to hit. This was one of the driving motivations for using multiple tools in tandem to restrict the area targeted for stimulation. Approaches included using retrograde AAVs to express ChR2eYFP in SuMVGLUT2+::POA neurons, thereby restricting expression to VGLUT2+ neurons that project to the POA. Targeting was further limited by placement of the optic fiber over cell bodies in SuM. Thus, only neurons that are VGLUT2+, project to the POA, and were close enough to the fiber were activated by photostimulation. Regrettably, we were not able to compile images from mice where the fiber was misplaced, leading to loss of behavioral effects. We would have liked to provide that here to address this comment. Unfortunately, generating heat maps for injections is not possible for anatomic studies that use unlabeled recombinase as part of an intersectional approach. Also, the injection site of a retroAAV can be difficult to locate accurately because neurons remote from the injection site and their processes are labeled.

      Experiments described in Supplemental Figure 6B on VGAT neurons in SuM were designed and interpreted to support the point that SuMVGLUT2+::POA neurons are a distinct population that does not overlap with GABAergic neurons. For this point it is important that we targeted SuM, but highly confined targeting is not needed to support the central interpretation of the data. We do see labeling in SuM in VGAT-Cre mice, but photostimulation of SuMVGAT+ neurons does not generate the behavioral changes seen with activation of SuMVGLUT2+::POA neurons. As the reviewer points out, SuM is a small target and viral injection is likely to spread beyond the anatomic boundaries to other VGAT+ neurons in the region, which are not the focus here. The activation would be restricted by the spread of light from the fiber over SuM (estimated to be about a 200 µm sphere in all directions). We did not further examine projections or localization of VGAT+ neurons in this study but focused on the differential behavioral effects of SuMVGLUT2+::POA neurons.

      (2) In addition, the whole brain tracing is very valuable, but there is very little quantification of the tracing. As the tracing is the first several figures and supp figure and the basis for the interpretation of the behavior results, it is important to understand things including how robust the POA projection is compared to the collateral regions, etc. Just a rep image for each of the first two figures is insufficient, especially given the above issue raised. The combination of validation of the restricted expression of viruses, rep images, and quantified tracing would add rigor that made the behavioral effects have more significance.

      For example, in Fig 2, how can one be sure that the nature of the difference between the nonspecific anterograde glutamate neuron tracing and the Sum-POA glutamate neuron tracing is real when there is no quantification or validation of the hits and expression, nor any quantification showing the effects replicate across mice? It could be due to many factors, such as the spread up the tract of the injection in the nonspecific experiment resulting in the labeling of additional regions, etc.

      Relatedly, in Supp 4, why isn’t C normalized to DAPI, which they show, or area? Similar for G what is the mcherry coverage/expression, and why isn’t Fos normalized to that?

      Thank you for highlighting the importance and value of the anatomy. Two points based on the anatomic studies are central to our interpretation of the experimental data. First, SuMVGLUT2+::POA neurons are a distinct population within the SuM. We show this by demonstrating they are not GABAergic and that they do not project to dentate gyrus. Projections from SuM to dentate gyrus have been described in multiple studies (Boulland et al., 2009; Haglund et al., 1987; Hashimotodani et al., 2018; Vertes, 1992) and we demonstrate them here for SuMVGLUT2+ cells. Using an intersectional approach in VGLUT2-Flp mice we show SuMVGLUT2+::POA neurons do not project to dentate gyrus. We show cell bodies of SuMVGLUT2+::POA neurons located in SuM across multiple figures including clear brain images. Thus, SuMVGLUT2+::POA neurons are SuM neurons that are not GABAergic and that send projections to a distinct subset of targets, most notably excluding dentate gyrus. Second, SuMVGLUT2+::POA neurons arborize, sending projections to multiple regions. We show this using a combinatorial genetic and viral approach to restrict expression of eYFP to only neurons that are in SuM (based on viral injection), project to the POA (based on retrograde AAV injection in POA), and are VGLUT2+ (VGLUT2-Flp mice). Thus, any eYFP labeled projection comes from SuMVGLUT2+::POA neurons. We further confirmed projections using retroAAV injection into areas identified using anterograde approaches (Supplemental Figure 2). As discussed above in replies to Reviewer 1, we feel limitations are present that preclude meaningful quantitative analysis. We thus opted for a conservative interpretation as outlined.

      Prior studies have shown efferent projections from SuM to many areas, and projections to dentate gyrus have received substantial attention (Boulland et al., 2009; Haglund, Swanson, and Kohler, 1984; Hashimotodani et al., 2018; Soussi et al., 2010; Vertes, 1992; Pan and McNaughton, 2004). We saw many of the same projections from SuMVGLUT2+ neurons. We found no projections from SuMVGLUT2+::POA neurons to dentate gyrus (Figure 2). Our description of SuM projection to dentate gyrus is not new, but finding a population of neurons in SuM that does not project to dentate gyrus but does project to other regions in hippocampus is new. This finding cannot be explained by spread of the virus in the tract or non-selective labeling.

      (3) The authors state that they use male and female mice, but they do not describe the n’s for each experiment or address sex as a biological variable in the design here. As there are baseline sex differences in locomotion, stress responses, etc., these could easily factor into behavioral effects observed here.

      Sex-specific effects are possible; however, the studies presented here were not designed or powered to directly examine them. One aspect of the experimental design that helps mitigate strong sex-dependent effects is that the paradigms we used often examined baseline (pre-stimulation) behavior, how behavior changed during stimulation, and how behavior returned (or not) to baseline after stimulation. Thus, we test changes in individual behaviors. Although we had limited statistical power, we conducted analyses to examine the effects of sex as a variable in the experiments and found no differences between males and females.

      (4) In a similar vein as the above, the authors appear to use mice of different genotypes (however the exact genotypes and breeding strategy are not described) for their circuit manipulation studies without first validating that baseline behavioral expression, habituation, stress responses are not different. Therefore, it is unclear how to interpret the behavioral effects of circuit manipulation. For example in 7H, what would the VGLUT2-Cre mouse with control virus look like over time? Time is a confound for these behaviors, as mice often habituate to the task, and this varies from genotype to genotype. In Fig 8H, it looks like there may be some baseline differences between genotypes- what is normal food consumption like in these mice compared to each other? Do Cre+ mice just locomote and/or eat less? This issue exists across the figures and is related to issues of statistics, potential genotype differences, and other experimental design issues as described, as well as the question about the possibility of a general locomotor difference (vs only stress-induced). In addition, the authors use a control virus for the control groups in VGAT-Cre manipulation studies but do not explain the reasoning for the difference in approach.

      Thank you for highlighting the need for greater clarity about the breeding strategies used and for these related questions. We address the breeding strategy first and then move to the additional concerns raised. We have added details to the methods section to address this point. For VGLUT2-Cre mice we used littermate controls from a Cre/WT x WT/WT cross. The VGLUT2-Cre line (RRID:IMSR_JAX:028863) (Vong L, et al. 2011) used here has been used in many other reports. We are not aware of any reports indicating a phenotype associated with the addition of the IRES-Cre to the Slc17a6 locus, and there is no expected impact on expression of VGLUT2. Also, we see in many of the experiments here that the baseline behaviors (Figures 4, 5, and 7) are not different between the Cre+ and Cre- mice. For VGAT-Cre mice we used a different breeding strategy that allowed us to achieve greater control of the composition of litters and more efficient cohorts. A Cre/Cre x WT/WT cross yielded all Cre/WT litters. The AAV injected, ChR2eYFP or eYFP, allowed us to balance the cohort.

      Regarding Figure 7H, which shows time immobile on the second day of a swim test, data from the Cre- mice demonstrate the natural course of progression during the second day of the test. The control mice in the VGAT-Cre cohort (Figure 7I) show a similar trend. The change in behavior during the stimulation period in the Cre+ mice is caused by the activation of SuMVGLUT2+::POA neurons. The behavioral shift largely, but not completely, returns to baseline when the photostimulation stops. We have no reason to believe a VGLUT2-Cre+ mouse injected with control AAV to express eYFP would be different from a WT littermate injected with AAV expressing ChR2eYFP in a Cre-dependent manner.

      Turning to concerns related to 8H, which shows data from fasted mice quantifying time spent interacting with a food pellet immediately after presentation of a chow pellet, we found no significant difference between the control and Cre+ mice. We are unaware of any evidence indicating that the two groups should have a different baseline, since the Cre insertion is not expected to alter gene expression and we are unaware of reports of a phenotype relating to feeding and the presence of the transgene in this mouse line. Even if there were a small baseline shift, this would not explain the large abrupt shift induced by the photostimulation. As noted above, we saw shifts in behavior abruptly induced by the initiation of photostimulation when compared to baseline in multiple experiments. This shift would not be explained by a hypothetical difference in the baseline behaviors of littermates.

      (5) The statistics used throughout are inappropriate. The authors use serial Mann-Whitney U tests without a description of data distributions within and across groups. Further, they do not use any overall F tests even though most of the data are presented with more than two bars on the same graph. Stats should be employed according to how the data are presented together on a graph. For example, stats for pre-stim, stim, and post-stim behavior X between Cre+ and Cre- groups should employ something like a two-way repeated measures ANOVA, with post-hoc comparisons following up on those effects and interactions. There are many instances in which one group changes over time or there could be overall main effects of genotype. Not only is serially using Mann-Whitney tests within the same panel misleading and statistically inaccurate, but it cherry-picks the comparisons to be made to avoid more complex results. It is difficult to comprehend the effects of the manipulations presented without more careful consideration of the appropriate options for statistical analysis.

      We thank the reviewer for pointing this out and suggesting alternative analyses; we agree with the assessment on this topic. Therefore, we have extensively revised the statistical approach to our data using the suggested approach. Reviewer 1 also made a similar comment, and we would like to point to our reply to Reviewer 1’s second point in regard to what we changed and added in the new statistical analyses. Further, we have added to the paper a full table detailing the statistical values for each figure.

      Conceptual:

      (6) What does the signal look like at the terminals in the POA? Any suggestion from the data that the projection to the POA is important?

      This is an interesting question that we will pursue in future investigations into the roles of the POA. We used the projection to the POA from SuM to identify a subpopulation in SuM and we were surprised to find the extensive arborization of these neurons to many areas associated with threat responses. We focused on the cell bodies as “hubs” with many “spokes”. Extensive studies are needed to understand the roles of individual projections and their targets. There is also the hypothetical technical challenge of manipulating one projection without activating retrograde propagation of action potentials to the soma. At the current time we have no specific insights into the roles of the isolated projection to POA. Interpretation of experiments activating only one “spoke” of the hub would be challenging. Simple terminal stimulation experiments are challenged by the need to separate POA projections from activation of passing fibers targeting more anterior structures of the accumbens and septum.

      (7) Is this distinguishing active coping behavior without a locomotor phenotype? For example, Fig. 5I and other figure panels show a distance effect of stimulation (but see issues raised about the genotype of comparison groups). In addition, locomotor behavior is not included for many behaviors, so it is hard to completely buy the interpretation presented.

      We agree with the reviewer and thank them for highlighting this fundamental challenge in studies examining active coping behaviors in rodents, which require movement. Additionally, actively responding to threatening stressors would include increased locomotor activity. Separation of movement alone from active coping can be challenging. Because of these concerns we undertook experiments using diverse behavioral paradigms to examine the elicited behaviors and the recruitment of SuMVGLUT2+::POA neurons to stressors. We conducted experiments to directly examine behaviors evoked by photoactivation of SuMVGLUT2+::POA neurons. In these experiments we observed a diversity of behaviors including increased locomotion and jumping but also treading/digging (Figure 4). These are behaviors elicited in mice by threatening and noxious stimuli. An increase of running or only jumping could signify a specific locomotor effect, but this is not what was observed. Based on these behaviors, we expected to find evidence of increased movement in open field (Figure 5G-I) and light dark choice (Figure 5J-L) assays. For many of the assays, reporting distance traveled is not practical. An important set of experiments that argues against a generic increase in locomotion is the operant behavior experiments, which require the animal to engage in a learned behavior while receiving photostimulation of SuMVGLUT2+::POA neurons (Figure 6). This is particularly true for testing using a progressive ratio, when the time of ongoing photostimulation is longer, yet animals actively and selectively engage the active port (Figure 6G-H). Further, we saw a shift in behavioral strategy induced by photoactivation in the forced swim test (Figure 7H). Thus, activation of SuMVGLUT2+::POA neurons elicited a range of behaviors that included swimming, jumping, treading, and learned response, not just increased movement.
Together these data strongly argue that SuMVGLUT2+::POA neurons do not only promote increased locomotor behavior. We interpret these data together with the data from fiber photometry studies to show SuMVGLUT2+::POA neurons are recruited during acute stressors, contribute to aversive affective component of stress, and promote active behaviors without constraining the behavioral pattern.

      Regarding genotype, we address this in comments above as well but believe that clarifying the use of litter mates, the extensive use of the VGLUT2-Cre line by multiple groups, and experimental design allowing for comparison to baseline, stimulation evoked, and post stimulation behaviors within and across genotypes mitigate possible concerns relating to the genotype.

      (8) What is the role of GABA neurons in the SuM and how does this relate to their function and interaction with glutamate neurons? In Supp 8, GABA neuron activation also modulates locomotion and in Fig 7 there is an effect on immobility, so this seems pretty important for the overall interpretation and should probably be mentioned in the abstract.

      Thank you for noting these interesting findings. We added text to the abstract to highlight these findings. Possible roles of GABAergic neurons in SuM extend beyond the scope of the current study, particularly since SuM neurons have been shown to release both GABA and glutamate (Li Y, Bao H, Luo Y, et al. 2020; Root DH, Zhang S, Barker DJ et al. 2018). GABAergic neurons regulate dentate gyrus (Ajibola MI, Wu JW, Abdulmajeed WI, Lien CC 2021), REM sleep (Billwiller F, Renouard L, Clement O, Fort P, Luppi PH 2017), and novelty processing (Chen S, He L, Huang AJY, Boehringer R et al. 2020). The populations of exclusively GABAergic vs dual neurotransmitter neurons in SuM require further dissection to be understood. How they may relate to SuMVGLUT2+::POA neurons requires further investigation.

      Questions about figure presentation:

      (9) In Fig 3, why are heat maps shown as a single animal for the first couple and a group average for the others?

      Thank you for highlighting this point for further clarification. We modified the labels in the figure to help make clear which panels are from one animal across multiple trials and which are from multiple animals. In the ambush assay each animal had only one trial, to avoid habituation to the mock predator. Accordingly, we do not have multiple trials for each animal in this test. In contrast, the dunk assay (10 trials/animal) and the shock assay (5 trials/animal) had multiple trials for each animal. When there are multiple trials per animal, we present data from a representative animal as well as the aggregate data.

      Why is the temporal resolution for J and K different even though the time scale shown is the same?

      Thank you for noticing this error carried forward from a prior draft of the figure so we could correct it. We replaced the image in 3J with a more correctly scaled heatmap.

      What is the evidence that these signal changes are not due to movement per se?

      Thank you for the question. There are two points of evidence. First, all the 465 nm excitation (Ca2+ dependent) data were collected in interleaved fashion with 415 nm (isosbestic) excitation data. The isosbestic signal is derived from GCaMP emission but is independent of Ca2+ binding (Martianova E, Aronson S, Proulx CD. 2019). This approach, time-division multiplexing, can correct the calcium-dependent signal for changes that are most often due to mechanical factors. The second piece of evidence is experimental. Using multiple cohorts of mice, we examined whether the change in Ca2+ signal was correlated with movement. We used the threshold of velocity of movement seen following the ambush. We found no correlation between high velocity movements and Ca2+ signal (Figure 3K), including in cross-correlational analysis (Supplemental Figure 5). Based on these points together, we conclude that the change in the Ca2+ signal in SuMVGLUT2+::POA neurons is not due to movement-induced mechanical changes, and we find no correlation to movement unless a stressor is present, i.e. mock predator ambush or forced swim. Further, the stressors evoke very different locomotor responses: fleeing, jumping, or swimming.
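
An isosbestic correction of this kind is often implemented by fitting the 415 nm trace to the 465 nm trace and expressing the residual as dF/F. The following is a minimal sketch under that assumption; it is not taken from the manuscript, the function names (`fit_line`, `isosbestic_dff`) are hypothetical, and the traces are synthetic numbers chosen only to show a shared artifact being removed while a Ca2+-dependent transient is kept.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def isosbestic_dff(signal_465, control_415):
    """dF/F of the 465 nm trace relative to the scaled 415 nm control."""
    slope, intercept = fit_line(control_415, signal_465)
    fitted = [slope * c + intercept for c in control_415]
    return [(s - f) / f for s, f in zip(signal_465, fitted)]


# Synthetic example (assumed values, not study data):
# sample 3 is an artifact present in both channels,
# sample 5 is a transient present only in the 465 nm channel.
control_415 = [1.00, 1.00, 1.00, 0.80, 1.00, 1.00, 1.00, 1.00]
signal_465 = [2.00, 2.00, 2.00, 1.60, 2.00, 2.60, 2.00, 2.00]

dff = isosbestic_dff(signal_465, control_415)
```

In this toy case the shared dip at sample 3 is absorbed by the fitted control, whereas the 465-only transient at sample 5 survives as a positive dF/F value.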

      (10) In Fig 4, the authors carefully code various behaviors in mice. While they pick a few and show them as bars, they do not show the distribution of behaviors in Cre- vs Cre+ mice before manipulation (to show they have similar behaviors) or how these behaviors shift categories in each group with stimulation. Which behaviors in each group are shifting to others across the stim and post-stim periods compared to pre-stim?

      This is an important point. We selected behaviors to highlight in Figure 4C-E because these behaviors are exhibited in response to stress (De Boer & Koolhaas, 2003; van Erp et al., 1994). For the highlighted behaviors (jumping, treading/digging, and grooming), we show baseline (pre-photostimulation), stimulation, and post-stimulation values for Cre+ and Cre- mice, with the values for each animal plotted. We show all nine behaviors as a heat map in Figure 4B. The panels show changes that may occur as a function of time and show changes induced by photostimulation.

      The heatmaps demonstrate that photostimulation of SUMVGLUT2+::POA neurons causes a suppression of walking, grooming, and immobile behaviors with an increase in jumping, digging/treading, and rapid locomotion. After stimulation stops, there is an increase in grooming and time immobile. The control mice show a range of behaviors with no shifts noted with the onset or termination of photostimulation.

      Of note, issues of statistics, genotype, and SABV are important here. For example, the hint that treading/digging may have a slightly different pre-stim basal expression, it seems important to first evaluate strain and sex differences before interpreting these data.

      We examined the effects of sex as a biological variable in the experiments reported in the manuscript and found no differences between males and females in any of the experiments where we had enough animals of each sex (minimum of 5 mice) for meaningful comparisons. We did this by comparing means and SEM of males and females within each group (e.g. Cre+ males vs Cre+ females, Cre- males vs Cre- females) and then conducting a t-test to see if there was a difference. For figures that show time as a variable (e.g. Figure 6C-E), we compared males and females with time and sex as factors (including multiple comparisons if needed). We found no significant main effects or interactions between males and females. Because of this, and to maximize statistical power, we decided to keep males and females together in all the analyses presented in the manuscript. It is worth noting also that the core of the experimental design employed is a change in behavior caused by photostimulation. The mice are also the same strain, with the only difference being the modification to add an IRES and sequence for Cre behind the coding sequence of the Slc17a6 (VGLUT2) gene.
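
A within-group sex comparison of this kind can be sketched as follows. This is an illustration only: the response does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, `welch_t` and `MIN_PER_SEX` are hypothetical names, and the values are placeholders rather than study data.

```python
from math import sqrt
from statistics import mean, variance

MIN_PER_SEX = 5  # minimum animals per sex, as stated in the response


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = variance(group_a) / n_a  # squared standard error of mean A
    se2_b = variance(group_b) / n_b  # squared standard error of mean B
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Placeholder measurements for one genotype (not study data).
cre_pos_males = [12.0, 15.0, 11.0, 14.0, 13.0]
cre_pos_females = [13.0, 12.0, 16.0, 11.0, 14.0]

result = None
if min(len(cre_pos_males), len(cre_pos_females)) >= MIN_PER_SEX:
    result = welch_t(cre_pos_males, cre_pos_females)
```

The statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value; that step is omitted here to keep the sketch dependency-free.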

      (11) Why do the authors use 10 Hz stimulation primarily? Is this a physiologically relevant stim frequency? They show that they get effects with 1 Hz, which can be quite different in terms of plasticity compared to 10 Hz.

      Thank you for raising this important question. Because tests like open field and forced swim are subject to habituation and cannot be run multiple times per animal, a single test frequency was needed for consistency across experiments. The frequency of 10 Hz was selected because it falls within the range of reported firing rates for SuM neurons (Farrel et al., 2021; Pedersen et al., 2017) and based on the robust but submaximal effects seen in the real-time place preference assays. Identifying the native firing rates during the stress response would be ideal, but gathering these data for the identified population remains a daunting task.

      (12) In Fig 5A-F, it is unclear whether locomotion differences are playing a role. Entrances (which are low for both groups) are shown but distance traveled or velocity are not.

      In B, there is no color in the lower left panel. Where are these mice spending their time? How is the entirety of the upper left panel brighter than the lower left? If the heat map is based on time distribution during the session, there should be more color in between blue and red in the lower left when you start to lose the red hot spots in the upper left, for example. That is, the mice have to be somewhere in the apparatus. If the heat map is based on distance, it would seem the Cre- mice move less during the stim.

      We appreciate the opportunity to address this question and the attention to detail the reviewer applied to our paper. In the real-time place preference (RTPP) test, stimulation was provided only while the animal was on the stimulation side. Mice quickly leave the stimulation side of the arena, as seen in the supplemental video, particularly at the higher frequencies. Thus, the time during which stimulation is applied is quite low. In trials using higher-frequency stimulation, the mice often retreat to a corner after entering the stimulation side. Changing locomotor activity alone could drive changes in the number of entrances, but we did not find this. In regard to the heat map, the color scale is set dynamically for each of the paired examples that are pulled from a single trial. To maximize the visibility of differences within the paired examples, the color scale does not transfer between the trials. As a result, in the example for 10 Hz the mouse spent a large amount of time in the area corresponding to the lower right corner of the image, and the maximum value of the color scale is assigned to that region. As seen in the supplemental video, mice often retreated to the corner of the non-stimulation side after entering the stimulation side. The control animal did not spend a concentrated amount of time in any one region, thus there is a lack of warmer colors. In contrast, in the baseline condition both Cre+ and Cre- mice spent time in areas distributed on both sides of the arena, as expected. As a result, the maximum value in the heat map is lower and more areas are coded in warmer colors, allowing for easier visual comparison between the pair. Using the scale for the 10 Hz pair across all pairs leads to mostly dark images. We considered ways to optimize visualization across and within pairs and focused on the within-pair comparison for visualization.
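
      The effect of setting the color scale per pair, rather than sharing one scale across all pairs, can be sketched as follows. This is an illustrative sketch under stated assumptions: the 2 x 2 occupancy grids, the values, and the function names are hypothetical and are not the software or data used for Figure 5B.

```python
def grid_max(grid):
    """Largest occupancy value in a 2-D grid (a list of rows)."""
    return max(max(row) for row in grid)


def scale_grid(grid, scale_max):
    """Map occupancy values onto a 0-1 color scale whose top is scale_max."""
    return [[cell / scale_max for cell in row] for row in grid]


def scale_within_pair(pair):
    """Scale both grids of a Cre+/Cre- pair to the pair's own maximum."""
    pair_max = max(grid_max(grid) for grid in pair.values())
    return {name: scale_grid(grid, pair_max) for name, grid in pair.items()}


# Hypothetical seconds spent in each bin of a 2 x 2 arena (not study data).
baseline_pair = {"cre_pos": [[30, 25], [20, 25]], "cre_neg": [[25, 25], [25, 25]]}
stim_10hz_pair = {"cre_pos": [[0, 5], [5, 290]], "cre_neg": [[70, 80], [75, 75]]}

# Option 1: each pair gets its own scale (the within-pair comparison).
within = scale_within_pair(baseline_pair)

# Option 2: reuse the maximum of the 10 Hz pair for the baseline pair.
shared_max = max(grid_max(grid) for grid in stim_10hz_pair.values())
shared = {name: scale_grid(grid, shared_max) for name, grid in baseline_pair.items()}

print(grid_max(within["cre_pos"]))  # reaches the top of the color scale
print(grid_max(shared["cre_pos"]))  # compressed into the dark end of the scale
```

      With a shared scale, a single concentrated hot spot in one example sets the maximum, so the evenly distributed baseline occupancy occupies only a small fraction of the color range.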

      (13) By starting with 1 Hz, are the experimenters inducing LTD in the circuit? What would happen if you stop stimming after the first epoch? Would the behavioral effect continue? What does the heat map for the 1 Hz stim look like?

      Relatedly, it is a lot of consistent stimulation over time and you likely would get glutamate depletion without a break in the stim for that long.

      Thank you for the opportunity to add clarity around this point regarding the trials in RTPP testing. Importantly, the trials were not carried out in order of increasing frequency of stimulation, as plotted. Rather, the order of trials was, to the extent possible with the number of mice, counterbalanced across the five conditions. Thus, possible contribution of effects of one trial on the next were minimized by altering the order of the trials.

      We have added a heat map for the 1 Hz condition to Figure 5B.

      For experiments on RTPP, the average stimulation time at 10 Hz was less than 10 seconds per event. As a result, the data are unlikely to be affected by possible depletion of synaptic glutamate. For experiments using sustained stimulation (open field or light dark choice assays), where 10 Hz stimulation was applied for the entire trial, we have no clear data to address whether this might be a factor.

      (14) In Fig 6, the authors show that the Cre- mice just don't do the task, so it is unclear what the utility of the rest of the figure is (such as the PR part). Relatedly, the pause is dependent on the activation, so isn't C just the same as D? In G and H, why is a subset of Cre+ mice shown?

      Why not all mice, including Cre- mice?

      Thank you for the opportunity to improve the clarity of this section. A central aspect of the experiments in Figure 6 is the aversiveness of SuMVGLUT2+::POA neuron photostimulation, as shown in Figure 5B-F. The aversion to photostimulation drives task performance in the negative reinforcer paradigm. The mice perform a task (active port activation) to terminate the negative reinforcer (photostimulation of SuMVGLUT2+::POA neurons). Accordingly, control mice are not expected to perform the task because SuMVGLUT2+::POA neurons are not activated and thus the mice are not motivated to perform the task.

      A central point we aim to convey in this figure is that while SuMVGLUT2+::POA neurons are being stimulated, mice perform the operant task. They selectively activated the active port (Supplemental Figure 7). As expected, control mice activate the active port at a low level in the process of exploring the arena. This diminishes on subsequent trials as mice habituate to the arena (Figure 6D). The data in Figures 6C and D are related but can be divergent. Each pause in stimulation requires a port activation in the FR1 test, but the number of port activations can exceed the number of pauses, which are 10 seconds long, if the animal continues to activate the port. Comparing the data in Figures 6C and D reveals that mice generally activated the port two to three times for each pause earned, with a trend towards greater efficiency on day 4, with more rewards and fewer activations.
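
      Why port activations (Figure 6D) can exceed pauses earned (Figure 6C) under FR1 can be sketched as follows. This is a simplified illustration that assumes activations made during an ongoing 10-second pause are counted but do not earn or extend a pause; the timestamps and the function name are hypothetical.

```python
PAUSE_DURATION_S = 10.0


def count_pauses(activation_times_s, pause_duration_s=PAUSE_DURATION_S):
    """Count pauses earned under FR1.

    Assumption: an activation earns a pause only if no pause is running.
    """
    pauses = 0
    pause_end = float("-inf")
    for time_s in sorted(activation_times_s):
        if time_s >= pause_end:
            pauses += 1
            pause_end = time_s + pause_duration_s
    return pauses


# Hypothetical active-port timestamps (seconds) from one session.
activations = [5.0, 6.5, 9.0, 20.0, 21.0, 40.0]

pauses = count_pauses(activations)
activations_per_pause = len(activations) / pauses
print(pauses, activations_per_pause)
```

      In this toy session the animal makes six activations but earns three pauses, a ratio in the same range as the two to three activations per pause described above.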

      The purpose of the progressive ratio test is to examine whether photostimulation of SuMVGLUT2+::POA neurons continues to drive behavior as the effort required to terminate the negative stimulus increases. As seen in Figures 6G and H, the stimulation of SuMVGLUT2+::POA neurons remains highly motivating. In the 20-minute trial we did not find a break point even as the number of port activations required to pause the stimulation exceeded 50. We do not show the Cre- mice in Figures 6G and H because they did not perform the task, as seen in Figure 6F. For technical reasons in early trials, we have fully time-stamped data for rewards and port activations from only a subset of the Cre+ mice. Of note, this subset contains both the highest and lowest performing mice from the entire data set.

      Taken together, we interpret the results of the operant behavioral testing as demonstrating that SuMVGLUT2+::POA neuron activation is aversive, can drive performance of an operant task (as opposed to fixed escape behaviors), and is highly motivating.

      (15) In Fig 7, what does the GCaMP signal look like if aligned to the onset of immobility? It looks like since the hindpaw swimming is short and seems to precede immobility, and the increase in the signal is ramping up at the onset of hindpaw swimming, it may be that the calcium signal is aligned with the onset of immobility.

      What does it look like for swimming onset?

      In I, what is the temporal resolution for the decrease in immobility? Does it start prior to the termination of the stim, or does it require some elapsed time after the termination, etc?

      Thank you for the opportunity to address these points and improve the clarity of our interpretation of the data. Regarding aligning the Ca2+ signal from fiber photometry recordings to swimming onset and offset, it is important to note that the swimming bouts are not the same length. As a result, in the time prior to alignment to the offset of behaviors, animals will have been swimming for different lengths of time. In Figure 7C, we use the behavioral heat map to convey the behavioral average. Below we show the Ca2+-dependent signal aligned at the offset of hindpaw swim for an individual mouse (A) and for the total cohort (B). This alignment shows that the Ca2+-dependent signal declines corresponding to the termination of hindpaw swimming. Because these bouts last less than the total window shown, the data are largely included in Figures 7C and D, which are aligned to onset. Due to the nuance of the difference in the alignment and the partial redundancy, we elected to include the requested alignment to swimming offset in the reply rather than in the primary figure.
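
      The difference between onset and offset alignment when bouts vary in length can be illustrated with a toy example. This is a sketch under stated assumptions: the trace, sampling rate, bout times, and function names are hypothetical and do not represent the photometry pipeline used in the study.

```python
def windows_around(signal, sample_rate_hz, event_times_s, pre_s, post_s):
    """Cut a fixed window around each event; skip windows that run off the trace."""
    pre_n = round(pre_s * sample_rate_hz)
    post_n = round(post_s * sample_rate_hz)
    windows = []
    for event_time in event_times_s:
        center = round(event_time * sample_rate_hz)
        start, stop = center - pre_n, center + post_n
        if start >= 0 and stop <= len(signal):
            windows.append(signal[start:stop])
    return windows


def mean_trace(windows):
    """Average the windows sample by sample."""
    return [sum(samples) / len(samples) for samples in zip(*windows)]


# Hypothetical 1 Hz trace: the signal is 1.0 during a bout and 0.0 otherwise.
bouts = [(2, 5), (10, 12)]  # (onset_s, offset_s); the bouts differ in length
signal = [1.0 if any(on <= t < off for on, off in bouts) else 0.0 for t in range(20)]

onsets = [on for on, _ in bouts]
offsets = [off for _, off in bouts]

onset_aligned = mean_trace(windows_around(signal, 1, onsets, pre_s=2, post_s=4))
offset_aligned = mean_trace(windows_around(signal, 1, offsets, pre_s=2, post_s=2))
print(onset_aligned)
print(offset_aligned)
```

      In the onset-aligned average the decline is smeared across samples because the bouts end at different latencies, whereas the offset-aligned average shows a sharp step at the alignment point.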

      Author response image 1.

      Turning to the question regarding swimming onset, the animals started swimming immediately when placed in the water and maintained swimming and climbing behaviors until shifting behaviors as illustrated in Figure 7A and B. During this time the Ca2+-dependent signal was elevated but there is only one trial per animal. This question can perhaps be better addressed in the dunk assay presented in Figure 3C, F and G and Supplemental Figure 4 H and I. Here swimming started with each dunk and the Ca2+ signal increased.

      Regarding the question about Figure 7I, we scored entire periods (2 min) in aggregate. We noted in videos of the behavior test that there was an abrupt decrease in immobility tightly corresponding to the end of stimulation. In a few animals this shift occurred approximately 15-20 s before the end of stimulation. This may relate to the depletion of neurotransmitter, as suggested by the reviewer.

      Reviewer 3

      Major points

      (1) Results in Figure 1 suggested that SuM-Vglu2::POA projected not only to POA but also to diverse brain regions. We can think of two models which account for this. One is that a homogeneous population of neurons in SuM-Vglu2::POA has collaterals and innervates all the efferent targets shown in Figure 1. Another is that distinct subpopulations of neurons project to subsets of the efferent targets shown in Figure 1 as well as to POA. It is suggested to address this by combining approaches taken in experiments for Figure 1 and Supplemental Figure 2.

      Thank you for raising this interesting point. We have attempted combining retroAAV injections into multiple areas that receive projections from SuMVGLUT2+::POA neurons. However, we have found the results unsatisfactory for separating the two models proposed. Using eYFP and tdTomato expression, we saw some overlapping expression in SuM. We are not able to conclude whether this indicates separate populations or partial labeling of a homogeneous population. A third option seems possible as well. There could be a mix of neurons projecting to different combinations of downstream targets. This seems particularly difficult to address using fluorophores. We are preparing to apply additional methodologies to this question, but it extends beyond the scope of this manuscript.

      (2) Since the authors drew a hypothetical model in which the diverse brain regions mediate the effect of SuM-Vglu2::POA activation in behavioral alterations at least in part, examination of the concurrent activation of those brain regions upon photoactivation of SuM-Vglu2::POA is suggested. This would help the readers to understand which neural circuits act upon the induction of active coping behavior under stress.

      Thank you for raising this important point. We agree that activating glutamatergic neurons should lead to activation of postsynaptic neurons in the target regions. Delineating this in vivo is less straightforward. Doing so requires much greater knowledge of the postsynaptic partners of SuMVGLUT2+::POA neurons. There are a number of issues that would need to be accounted for. Undertaking two-color photostimulation plus fiber photometry is possible but not a technical triviality. Further, it is possible that we would measure Ca2+ signals in neurons that have no relevant input or that local circuits in a region may shape the signal. We would also lack the temporal resolution to distinguish monosynaptic from polysynaptic connections. Thus, we would struggle to know if the change in signal was due to the excitatory input from SuM or from a second region. At present, we remain unclear on how to pursue this question experimentally in a manner that is likely to generate clearly interpretable results.

      (3) In Figure 4, "active coping behaviors" must be called "behaviors relevant to the active behaviors" or "active coping-like behaviors", since those behaviors were in the absence of stressors to cope with.

      Thank you for the suggestion on how to clarify our terminology. We have adopted the active coping-like term.

      (4) For the Dunk test, it is suggested to describe the results and methods more in detail, since the readers would be new to it. In particular, the mice could change their behavior between dunks under this test, although they still showed immobility across trials as in Supplemental Figure 4I. Since neural activity during the test was summarized across trials as in Figure 3, it is critical to examine whether the behavior changes according to time.

      Thank you for identifying this opportunity to improve our manuscript. We have expanded and added a detailed description of the dunk test in the methods section.

      As for Supplemental Figure 4I, we apologize for the confusion because the purpose of this figure is to show that mice remained mobile for the entire 30-second dunk trial. This did not appreciably change over the 10 trials. We have revised this figure to plot both immobile and mobile time to achieve greater clarity on this point.

      Minor points

      Typos

      In Figure 1, please add a serotype of AAVs to make it compatible with other figures and their legends.

      In the main text and Figure 2K, the authors used MHb/LHb and mHb/lHb in a mixed fashion. Please make them unified.

      In the figure legend of Figure 6, change "SuMVGLUT2+::POA neurons drive" to "SuMVGLUT2+::POA neurons " in the title.

      In line 86, please change "Retro-AAV2-Nuc-flox(mCherry)-eGFP" to "AAV5-Nuc-flox(mCherry)eGFP".

      In line 80, please change "Positive controls" to "As positive controls, ".

      Thank you for taking the time and making the effort to identify and call these out. We have corrected them.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to both Reviewers, the Reviewing Editor and the Senior Editor for carefully reviewing our manuscript and for providing useful comments and suggestions that further improved the quality of our work. We appreciate that our work is perceived to substantially advance the understanding of osteoblast migration and that the experiments are found to be rigorous and to provide conclusive evidence. We also look forward to reaching a broad audience in the field. Below we provide a point-by-point response to each suggestion made by the reviewers and explain how we included their recommendations in the revised manuscript.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to show that Tgif1 expression is regulated by ERK1/2 and PTH in a time-dependent manner, and to define its role in suppressing Pak3 to facilitate osteoblast adhesion. The authors further tried to show that Tgif1-Pak3 signaling plays a significant role in osteoblast migration to the site of bone repair and bone remodeling.

      Strengths:

      • In a previous study, it was demonstrated that Tgif1 is a target gene of PTH, and the absence of Tgif1 failed to increase bone mass by PTH treatment (Saito et al., Nat Commun., 2019). In this study, the authors found that Tgif1-Pak3 signaling prompts osteoblast migration through osteoblast adhesion to prompt bone regeneration. This novel finding provides a better understanding of how Tgif1 expression in osteoblasts regulates adherence, spreading, and migration during bone healing and bone remodeling.

      • The authors demonstrated that ERK1/2 and PTH regulate Tgif1 expression in a time-dependent manner and its role in suppressing Pak3 through various experimental approaches such as luciferase assay, ChIP assay, and gene silencing. These results contribute to the overall strength of the article.

      We thank the reviewer for acknowledging the novelty of our findings as well as the strength of the manuscript.

      Weaknesses:

      • The authors need to further justify why they focused on Pak3 in the introduction by mentioning its known function for cell adhesion.

      We thank the reviewer for this suggestion. We mention in the introduction that we further investigated Pak3 due to its implication in cell adhesion (page 6, lines 7-8).

      • Some results indicated statistically significant but small changes. The authors need to explain in the discussion part why they believe this is the major mechanism or why there may be some other possible mechanisms.

      We agree with this comment. We are confident that our work identified an important mechanism by which Tgif1 regulates cellular features of osteoblasts. However, it is certainly possible that other mechanisms may exist as well. We discuss this point in the revised manuscript (page 18, lines 16-17).

      • The study does not include enough in vivo data to claim that this mechanism is crucial for bone healing and bone remodeling in vivo.

      Re: We agree with this point and have modified the abstract accordingly by replacing “crucial” with “implicated in” as well as the text by changing “crucial” to “important” (page 2, line 9). Furthermore, we discuss this limitation in the revised manuscript (page 18, lines 9-14).

      Reviewer #2 (Public Review):

      Summary:

      Bolamperti S. et al. 2023 investigate whether the expression of TG-interacting factor (Tgif1) is essential for osteoblastic cellular activity regarding morphology, adherence, migration/recruitment, and repair. Towards this end, germ-line Tgif1 deletion (Tgif1-/-) mice or male mice lacking expression of Tgif1 in mature osteoblastic and osteocytic cells (Dmp1-Cre+; Tgif1fl/fl) and corresponding controls were studied in physiological, bone anabolic, and bone fracture-repair conditions. Both Tgif1-/- and Dmp1-Cre+; Tgif1fl/fl exhibited decreased osteoblasts on cancellous bone surfaces and adherent to collagen I-coated plates. Tgif1-/- mice exhibit impaired healing in the tibial midshaft fracture model, as indicated by decreased bone volume (BV/Cal.V), osteoid (OS/BS), and low osteoblasts (number and surface). Likewise, both Tgif1-/- and Dmp1-Cre+; Tgif1fl/fl show impaired PTH 1-34 (100 µg/kg, 5x/wk for 3 wks) osteoblast activation in vivo, as detected by increases in quiescent bone surfaces. Mechanistic in vitro studies then utilized primary osteoblasts isolated from Tgif1-/- mice and siRNA Tgif1 knockdown OCY454 cells to further investigate and identify the downstream Tgif1 target driving these osteoblastic impairments. In vitro, Tgif1-/- osteoblastic and Tgif1 knockdown OCY454 cells exhibit decreased migration, abnormal morphology, and decreased focal adhesions/cells. Unexpectedly though, localization assays revealed Tgif1 to primarily concentrate in the nucleus and not to co-localize with focal adhesions (paxillin, talin). Also, the expression of major focal adhesion components (paxillin, talin, FAK, Src, etc.) or the Cdc42 family was not altered by loss of Tgif1 expression. In contrast, PAK3 expression is markedly upregulated by loss of Tgif1. In silico analysis followed by mechanistic molecular assays involving ChIP, siRNA (Tgif1, PAK3), and transfection (rat PAK3 promoter) techniques show that Tgif1 physically binds to a specific site in the PAK3 promoter region. Further, the knockdown of PAK3 rescues the Tgif1-deficient abnormal morphology in OCY454 cells. This is the first study to identify the novel transcriptional repression of PAK3 by Tgif1 as well as the specific Tgif1 binding site within the PAK3 promoter.

      Strengths:

      This work has a plethora of strengths. The co-authors achieved their aim of eliciting the role of Tgif1 expression in osteoblastic cellular functions (morphology, spreading/attachment, migration).

      Further, this work is the first to depict the novel mechanism of Tgif1 transcriptional repression of PAK3 by a thorough usage of mechanistic molecular assays (in silico analysis, ChIP, siRNA, transfection etc.). The conclusions are well supported and justified by these findings, as the appropriate controls, sample sizes (statistical power), statistics, and assays were fully utilized. The claims and conclusions are justified by the data.

      Re: We are grateful to this reviewer for recognizing the novelty, strengths, and rigor of our study and for acknowledging that the data convincingly support the conclusions drawn.

      Weaknesses:

      The discussion section could be expanded with a few sentences regarding limitations to the current study and potential future directions.

      Re: In the revised manuscript, we discuss limitations of the work and describe possible future directions (page 18, lines 9-14).

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The cell spreading and migration assay is quite artificial. Trypsinized osteoblasts and quiescent osteoblasts are totally different. The authors need to cite papers from other groups to justify whether the cell spreading and migration assay is appropriate to achieve the goals of this study.

      Re: The reviewer is right that in vitro assays are often artificial and do not necessarily fully reflect in vivo situations. We have taken this aspect into account and discuss it in the revised manuscript (page 18, lines 9-10). In addition, we have included references from other groups who have used similar assays to study cell spreading and migration (Dejaeger M et al., 2017 and Dang et al., 2018).

      (2) Page 13 Line 15: The statement "Osteoblasts are greatly impaired in the ability to migrate into the repair zone" is an overstatement. The experiments in Figure 5 do not necessarily reflect osteoblast migration activities. The authors need to rephrase the sentence or need to show observation of earlier time points (e.g., 1 week after fracture) in their bone healing experiments. The number of osteoblasts/surface in Tgif1+/+ and Tgif1-/- mice at different time points during bone healing should be a good indicator for the migration of osteoblasts to the repair site.

      Re: We understand the critique that a time course or lineage tracing experiments would provide better evidence for the statement of osteoblast migration into the repair zone. To avoid overinterpretations we have removed the sentence from the revised manuscript.

      (3) Page 14, Line 24: Regarding the sentence "The observation that Tgif1 is crucial for osteoblast adherence, spreading, and migration", the authors need to clearly mention this statement is based on the in vitro experiments. The animal studies are not enough to claim that the mechanism is crucial for adherence, spreading, and migration.

      Re: We thank the reviewer for pointing out this limitation. We have clarified that the finding that Tgif1 is crucial for osteoblast adherence, spreading and migration was made in vitro (page 14, line 22).

      (4) The authors need to demonstrate the suppression of Pak3 expression in PTH-treated mice in vivo, in addition to the in vitro culture system (Fig. 7C and 7D).

      Re: We agree with the reviewer that this experiment would be very insightful. However, this is beyond the scope of the current work. Nevertheless, to take this valid point into consideration, we mention it in the discussion as potential future direction (page 18, lines 11-14).

      (5) The authors need to demonstrate that the pharmacologic suppression of Pak3 in Tgif1-/- mice reduces the % of quiescent surface/BS in vivo.

      Re: This point is also well taken, and we agree that a suppression of Pak3 in Tgif1-deficient mice would be very informative to support our in vitro findings. However, this may also be part of future investigations. This is emphasized in the discussion of the revised manuscript (page 18, lines 11-14).

      Figures (Minor)

      Fig. 1:

      Fig. 1A

      Arrows need to indicate a more precise position.

      Re: The position of the arrows has been optimized.

      Fig. 1DE

      What are blue/red bars (genotypes)?

      Re: The colors indicate the genotypes. A legend has been added to the revised figure.

      Fig. 1K

      Quantification data is needed.

      Re: Thank you for this suggestion. We added a quantification of the data (Fig. 1L, M; page 8, lines 3-4; page 21, lines 5-6)

      Fig. 2A

      Show the representative high-magnification image of round (non-spread) cells.

      Re: Representative high-magnification images (insets) are provided in the revised figure 2A.

      Fig. 5

      Red arrows need to indicate a more precise position.

      Re: The arrows have been repositioned.

      Fig. 6A, C

      Red arrows need to indicate a more precise position.

      Re: The arrows have been repositioned.

      Reviewer #2 (Recommendations For The Authors):

      (1) The microscopy images and analyses are excellent.

      Re: We thank the reviewer for acknowledging the quality of our microscopy studies.

      (2) Since the Tgif1-/- mouse has low osteoclast numbers, is it possible that this is a contributing factor to the delays/impairment in bone healing, given that resorption also has a role in fracture repair? Since the focus of these studies is on osteoblastic cells, this point is a little out of scope. However, would the authors consider exploring this further in the discussion section?

      Re: This point is well taken by the reviewer, and we agree that osteoclasts could certainly play a role in the impaired fracture healing. To acknowledge this aspect, we followed the recommendation and discuss this aspect in the revised manuscript (page 16, lines 22-24).

      Revisions

      Would the authors consider slightly re-wording the title? Tgif1 suppresses PAK3 expression; however, Tgif1-deficiency leads to the unregulated elevation of PAK3 expression.

      Re: Thank you for pointing this out. We agree with the reviewer and adapted the title accordingly.

      Suggestions

      (1) Is it possible that apoptosis and/or anoikis is being induced by Tgif1 deficiency in osteoblastic cells?

      Re: We do not have data in this direction, and although Tgif1-deficient osteoblasts are overall viable and expand well, we cannot fully exclude this possibility.

      (2) For the fracture study, any differences in overall callus size? Would it be possible to perform micro-CT imaging with some of these samples?

      Re: There is no difference in non-mineralized callus size between Tgif1+/+ and Tgif1-/- mice. However, there is less mineralized bone per callus area in Tgif1-/- mice, confirming an impaired osteoblast phenotype. As suggested by the reviewer, we added representative micro-CT images and the respective information to the revised manuscript (Fig 5F; pages 19-20).

      (3) Fracture repair experiment-is PAK3 expression downregulated with fracture injury; and/or, is PAK3 upregulated by loss of Tgif1 expression?

      Re: Unfortunately, we do not have data to answer this very interesting question and it would need to be addressed in future studies. This is mentioned in the revised discussion (page 18, lines 12-14).

      (4) Fig 7F. within PTH treated cells, is the light blue SCR sphericity statistically different than the light green siTgif1 + siPAK3 ? While the statement of the "lack of both, Tgif1 and PAK3 prevented PTH-induced decrease in cell sphericity" is supported by the lack of differences between dark green vs. light green; is it also possible that this is due to the siPAK3 returning sphericity to control (scr) levels? (i.e. hitting a floor limit of detection).

      Re: We thank the reviewer for this thoughtful question. There is no statistically significant difference between light blue and light green. Silencing PAK3 restores the impaired capacity to spread that occurs in the absence of Tgif1 to the level of scr controls (significant difference between dark and light red vs. dark and light green and no difference between either dark or light blue vs. dark or light green). However, unlike in the (scr) controls, in the absence of both Tgif1 and PAK3, the cells do not respond to PTH (statistically significant difference between dark and light blue, no difference between dark and light green). Based on the data, cells can reach sphericity of less than 0.2 and thus it is unlikely that sphericity is “hitting the floor level of detection” in these groups.

    1. Author Response

      We would like to thank the reviewers for their positive comments and valuable suggestions for improvements to the manuscript. We intend to revisit the discussion to clarify our interpretation of how azithromycin resistance mutations impact the transmission potential of P. falciparum and expand on the differences between mouse and human malaria. Additionally, we intend to adjust the title to better align with the revised interpretation of the main findings. These changes will be reflected in the revised manuscript to be submitted as the eLife Version of Record.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The current study aims to quantify associations between the regular use of proton-pump inhibitors (PPI) - defined as using PPI most days of the week during the last 4 weeks at one cross-section in time - with several respiratory outcomes up to several years later in time. There are 6 respiratory outcomes included: risk of influenza, pneumonia, COVID-19, other respiratory tract infections, as well as COVID-19 severity and mortality.

      Strengths:

      Several sensitivity analyses were performed, including i) estimation of the e-value to assess how strong unmeasured confounders should be to explain observed effects, ii) comparison with another drug with a similar indication to potentially reduce (but not eliminate) confounding by indication.

      Thank you for pointing out the strengths of our article. We also sincerely thank the reviewer for raising several concerns and providing significant suggestions to improve our manuscript. We will revise our manuscript according to our provisional responses.

      Weaknesses:

      (1) The main exposure of interest seems to be only measured at one time point (at study enrollment) while patients are considered many years at risk afterwards without knowing their exposure status at the time of experiencing the outcome. As indicated by the authors, PPI are sometimes used for only short amounts of time. It seems biologically implausible that an infection was caused by using PPI for a few weeks many years ago.

      We agree with the reviewer, and this is one of the limitations of the UK Biobank data. We might identify potential long-term PPI users by defining the users that have certain indications, since they tend to regularly take PPI for a long period rather than only short amounts of time. We will evaluate the effect modification for the subgroup of potential long-term PPI users.

      (2) Previous studies have shown that by focusing on prevalent users of drugs, one often induces several biases such as collider stratification bias, selection bias through depletion of susceptibles, etc.

      Due to the limitations of the data from the UK Biobank, including the lack of information on the initiation of medications and close follow-up, we can only use a prevalent-user design to evaluate the associations between PPI use and respiratory outcomes. We will discuss this further in the limitations section.

      (3) It seems Kaplan Meier curves are not adjusted for confounding through e.g. inverse probability weighting. As such the KM curves are currently not informative (or the authors need to make clearer that curves are actually adjusted for measured confounding).

      We will provide Kaplan Meier curves adjusted for confounding by inverse probability weighting according to the reviewer’s suggestion.
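
      The adjustment requested in point (3) can be illustrated with a weighted Kaplan-Meier estimator. The following is a minimal sketch, not the authors' analysis: it assumes that inverse probability weights have already been estimated from some propensity model, and the function name and inputs are hypothetical.

```python
from collections import defaultdict

def weighted_kaplan_meier(times, events, weights):
    """Return [(time, survival)] for a weighted Kaplan-Meier estimator.

    times   : follow-up time per participant
    events  : 1 if the outcome occurred, 0 if censored
    weights : precomputed inverse probability weights (assumed inputs)
    """
    event_w = defaultdict(float)  # weighted number of events at each time
    exit_w = defaultdict(float)   # weighted number leaving the risk set at each time
    for t, e, w in zip(times, events, weights):
        exit_w[t] += w
        if e:
            event_w[t] += w

    at_risk = sum(weights)        # weighted size of the current risk set
    survival = 1.0
    curve = []
    for t in sorted(exit_w):
        if event_w[t] > 0:
            survival *= 1.0 - event_w[t] / at_risk
            curve.append((t, survival))
        at_risk -= exit_w[t]      # events and censored subjects both leave
    return curve
```

      With all weights equal to 1 this reduces to the ordinary Kaplan-Meier estimator, which is a convenient sanity check before applying estimated weights.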

      (4) Throughout the manuscript the authors seem to misuse the term multivariate (using one model with e.g. correlated error terms to assess multiple outcomes at once) when they seem to mean multivariable.

      We will correct the misused terms throughout the manuscript according to the reviewer’s suggestions.

      (5) Given multiple outcomes are assessed there is a clear argument for accounting for multiple testing, which following the logic of the authors used in terms of claiming there is no association when results are not significant may change their conclusions. More high-level, the authors should avoid the pitfall of stating there is evidence of absence if there is only an absence of evidence in a better way (no statistically significant association doesn't mean no relationship exists).

      We will revise our interpretation of the results, especially for those without statistically significant associations based on the reviewer’s advice.
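
      One common way to account for multiple testing across several outcomes, as raised in point (5), is the Holm step-down procedure. The sketch below is illustrative only: the six p-values are hypothetical placeholders, not results from the study, and the response does not state which correction the authors will use.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical p-values for six outcomes (placeholders, not study results).
example_p = [0.001, 0.04, 0.03, 0.2, 0.01, 0.5]
example_adjusted = holm_adjust(example_p)
```

      An outcome that is significant at the 0.05 level before adjustment may no longer be so afterwards, which is why the interpretation of non-significant results matters here.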

      (6) While the authors claim that the quantitative bias analysis does show results are robust to unmeasured confounding, I would disagree with this. The e-values are around 2 and it is clearly not implausible that there are one or more unmeasured risk factors that together or alone would have such an effect size. Furthermore, if one would use the same (significance) criteria as used by the authors for determining whether an association exists, the required effect size for an unmeasured confounder to render effects 'statistically non-significant' would be even smaller.

      We agree with the reviewer that there might still exist one or more unmeasured risk factors that have effect sizes larger than 2. Therefore, we could not state that the results are robust to unmeasured confounding based on the current analysis, and this would be a limitation of our study. We will add the above information to the discussion section.
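
      For context on point (6), the E-value for a point estimate on the risk-ratio scale is RR + sqrt(RR × (RR − 1)). The sketch below assumes the estimate can be treated as a risk ratio (for hazard ratios this holds approximately when the outcome is rare); the function name is hypothetical and the example input is not taken from the study.

```python
import math

def e_value(rr):
    """E-value for a point estimate on the risk-ratio scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1.0:
        rr = 1.0 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1.0))

# A risk ratio of 4/3 gives an E-value of 2, the magnitude discussed above.
example = e_value(4.0 / 3.0)
```

      An E-value near 2 therefore corresponds to a fairly modest observed association, which is consistent with the reviewer's concern that an unmeasured confounder of that strength is plausible.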

      (7) Some patients are excluded due to the absence of follow-up, but it is unclear how that is determined. Is there potentially some selection bias underlying this where those who are less healthy stop participating in the UK biobank?

      We will provide the details for the determination of absence of follow-up in the UK Biobank and illustrate whether it potentially induced selection bias.

      (8) Given that the exposure is based on self-report how certain can we be that patients e.g. do know that their branded over-the-counter drugs are PPI (e.g. guardium tablets)? Some discussion around this potential issue is lacking.

      In the data collection of the UK Biobank, the participants can enter the generic or trade name of the treatment on the touchscreen to match the medications they used. We will discuss this important issue in the discussion section.

      (9) Details about the deprivation index are needed in the main text as this is a UK-specific variable that will be unfamiliar to most readers.

      We will provide details about the deprivation index in the manuscript.

      (10) It is unclear how variables were coded/incorporated from the main text. More details are required, e.g. was age included as a continuous variable and if so was non-linearity considered and how?

      Age was included as a continuous variable. We will provide information on whether non-linearity was considered in our manuscript.

      (11) The authors state that Schoenfeld residuals were tested, but don't report the test statistics. Could they please provide these, e.g. it would already be informative if they report that all p-values are above a certain value.

      We will provide the test statistics for the Schoenfeld residuals.

      (12) The authors would ideally extend their discussion around unmeasured confounding, e.g. using the DAGs provided in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7832226/, in particular (but not limited to) around severity and not just presence/absence of comorbidities.

      We will use the DAGs provided by the article (PMC7832226) to extend our discussion around unmeasured confounding, especially the severity of comorbidities.

      (13) The UK biobank is known to be highly selected for a range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. The potential problems this might create in terms of collider stratification bias - as highlighted here for example: https://www.nature.com/articles/s41467-020-19478-2 - should be discussed in greater detail and also appreciated more when providing conclusions.

      We agree with the reviewer that the highly selective nature of the UK Biobank might create collider stratification bias for the evaluation of COVID-19-related outcomes. We will further discuss this in detail and be cautious when generating conclusions.  

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al investigate in an observational population-based cohort study whether the use of proton pump inhibitors (PPIs) is associated with an increased risk of several respiratory infections among which are influenza, pneumonia, and COVID-19. They conclude that compared to non-users, people regularly taking PPIs have increased susceptibility to influenza, pneumonia, as well as COVID-19 severity and mortality. By performing several different statistical analyses, they try to reduce bias as much as possible, to end up with robust estimates of the association.

      Strengths:

      The study comprehensively adjusts for a variety of critical covariates and by using different statistical analyses, including propensity-score-matched analyses and quantitative bias analysis, the estimates of the associations can be considered robust.

      We thank the reviewer for highlighting the strengths of our article. We will further revise our manuscript according to the reviewer’s suggestions.

      Weaknesses:

      As it is an observational cohort study there still might be bias. Information on the dose or duration of acid suppressant use was not available, but might be of influence on the results. The outcome of interest was obtained from primary care data, suggesting that only infections as diagnosed by a physician are taken into account. Due to the self-limiting nature of the outcome, differences in health-seeking behavior might affect the results.

      We will try to adjust or provide discussions about the above factors, including the dose/duration of PPI use, outcome assessment, and health-seeking behavior.

    1. Author Response

      The following is the authors’ response to the original reviews.

      General remarks for the Editor and the Reviewers

      We would like to thank the Editor and the Reviewers for their feedback. Below we address their comments and present our point-by-point responses as well as the related changes in the manuscript.

      In addition to these changes, in a few cases we have found it necessary to move some texts and provide some additional explanations within the manuscript. We emphasize that these amendments have been made for only technical reasons, and do not alter the results and conclusions of the paper, but may help to render the text more coherent and understandable to readers with little knowledge of the subject.

      These minor corrections are:

      • We extended the Introduction section by a sentence (lines 40-42) that is intended to fit the proposed template-directed, non-enzymatic replication mechanism into a more general prebiotic evolutionary context, thus emphasizing its biological relevance. This sentence includes an additional reference (Rosenberger et al., 2021).

      • Two very methodologically oriented and repeated descriptions of random sequence generation have been moved to the Methods section (lines 178-185) from the Results section (lines 336-339 and lines 351-354).

      • We complemented the Data availability statement with licensing information (lines 684-685).

      • Further minor changes (also indicated by red texts) have been implemented to remedy logical and grammatical glitches.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Szathmary and colleagues explore the parabolic growth regime of replicator evolution. Parabolic growth occurs when nucleic acid strand separation is the rate-limiting step of the replication process, which would have been the case for non-enzymatic replication of short oligonucleotides that could precede the emergence of ribozyme polymerases and helicases. The key result is that parabolic replication is conducive to the maintenance of genetic diversity, that is, the coexistence of numerous master sequences (the Gause principle does not apply). Another important finding is that there is no error threshold for parabolic replication except for the extreme case of zero fidelity.

      Strengths:

      I find both the analytic and the numerical results to be quite convincing and well-described. The results of this work are potentially important because they reveal aspects of a realistic evolutionary scenario for the origin of replicators.

      Weaknesses:

      There are no obvious technical weaknesses. It can be argued that the results represent an incremental advance because many aspects of parabolic replication have been explored previously (the relevant publications are properly cited). Obviously, the work is purely theoretical; an experimental study of parabolic replication is due. In the opinion of this reviewer, though, these are understandable limitations that do not actually detract from the value of this work.

      We are grateful that this Reviewer appreciates our work. We completely agree that the ultimate validation must come from experiments. It is important to stress that in this field theory often preceded experimental work by decades, and the former often guided the latter. We hope that for the topic of the present paper experiments will follow considerably faster.

      Reviewer #2 (Public Review):

      Summary:

      A dominant hypothesis concerning the origin of life is that, before the appearance of the first enzymes, RNA replicated non-enzymatically by templating. However, this replication was probably not very efficient, due to the propensity of single strands to bind to each other, thus inhibiting template replication. This phenomenon, known as product inhibition, has been shown to lead to parabolic growth instead of exponential growth. Previous works have shown that this situation limits competition between alternative replicators and therefore promotes RNA population diversity. The present work examines this scenario in a model of RNA replication, taking into account finite population size, mutations, and differences in GC content. The main results are (1) confirmation that parabolic growth promotes diversity, but that when the population size is small enough, sequences least efficient at replicating may nevertheless go extinct; (2) the observation that fitness is not only controlled by the replicability of sequences, but also by their GC content; (3) the observation that parabolic growth attenuates the impact of mutations and, in particular, that the error threshold to which exponentially growing sequences are subject can be exceeded, enabling sequence identity to be maintained at higher mutation rates.

      Strengths:

      The analyses are sound and the observations are intriguing. Although it has been noted previously that parabolic growth promotes coexistence, its role in mitigating the error threshold catastrophe, which is often presented as a major obstacle to our understanding of the origin of life, had not been examined before.

      Weaknesses:

      Although all the conclusions are interesting, most are not very surprising for people familiar with the literature. As the authors point out, parabolic growth is well known to promote diversity (Szathmary-Gladkih 1989) and it has also been noted previously that a form of Darwinian selection can be found at small population sizes (Davis 2000).

      Given that under parabolic growth, no sequence is ever excluded for infinite populations, it is also not surprising to find that mutations have a less dramatic exclusionary impact.

      In the two articles cited (Szathmary-Gladkih 1989 and Davis 2000) the subexponentiality of the system was implemented in a mechanistic way, by introducing the exponent 0 < 𝑝 < 1. Although the behaviour of these models is more or less consistent with experimental findings (von Kiedrowski, 1986; Zielinski and Orgel, 1987), the divergence of per capita growth rates (𝑥̇/𝑥) at very low concentrations–which guarantees the ability to maintain unlimited diversity in the case of infinite population sizes–makes this formal approach partly unrealistic.
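
      The divergence mentioned above can be made concrete with the simplest subexponential law, dx/dt = k·x^p, whose per capita rate is k·x^(p−1). The sketch below is only an illustration: the rate constant k and the function names are assumptions introduced here, and p = 0.5 is used for the closed-form solution.

```python
import math

def per_capita_rate(x, k=1.0, p=0.5):
    """Per capita growth rate (dx/dt)/x for dx/dt = k * x**p."""
    return k * x ** (p - 1.0)

def parabolic_solution(t, x0, k=1.0):
    """Closed-form solution of dx/dt = k * sqrt(x) with x(0) = x0."""
    return (math.sqrt(x0) + 0.5 * k * t) ** 2

# The per capita rate grows without bound as the concentration falls,
# whereas for exponential growth (p = 1) it stays constant.
rates = {x: per_capita_rate(x) for x in (1.0, 1e-2, 1e-4, 1e-6)}
```

      Because rare types always enjoy a per capita advantage under this law, no type is excluded in an infinite population, which is the partly unrealistic feature the individual-based model is meant to avoid.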

      To avoid the possible artefacts of this mechanistic approach, and as there are no previous studies analysing the diversity maintaining ability of finite populations of parabolic replicators in an individual-based model context, we implemented a simplified template replication mechanism leading to parabolic growth and analysed the dynamics in an individual-based stochastic model context. The key point of our investigation is that considerable diversity can be maintained in the system even when the population size is quite small.

      Regarding the Reviewer’s comment on selection: Darwinian selection can only occur in a simple subexponential dynamics if the ratio of replicabilities diverges, cf. Eq. (8) and the preceding paragraph in Davis, 2000.

      Our results also show (Figs. 4B and 4C) that high mutation rates and the error threshold problem can still be considered as a major limiting factor for parabolically replicating systems in terms of their diversity-maintaining ability. In the light of the above, potential mechanisms to relax the error threshold in such systems, one of which is demonstrated in the present study, seem to be important steps to account for the sequence diversification and increase in molecular complexity during the early evolution of RNA replicators.

      A general weakness is the presentation of models and parameters, whose choices often appear arbitrary. Modeling choices that would deserve to be further discussed include the association of the monomers with the strands and the ensuing polymerization, which are combined into a single association/polymerization reaction (see also below), or the choice to restrict to oligomers of length L = 10. Other models, similar to the one employed here, have been proposed that do not make these assumptions, e.g. Rosenberger et al. Self-Assembly of Informational Polymers by Templated Ligation, PRX 2021. To understand how such assumptions affect the results, it would be helpful to present the model from the perspective of existing models.

      The assumption of one-step polymerization reactions that we used here is a common technique for modelling template replication of sequence-represented replicators [see, e.g., Fontana and Schuster, 1998 (10.1126/science.280.5368.1451), Könnyű et al., 2008 (10.1186/1471-2148-8-267), Vig-Milkovics et al., 2019 (10.1016/j.jtbi.2018.11.020) or Szilágyi et al., 2020 (10.1371/journal.pgen.1009155)]. This is because assuming base-to-base polymerisation of the copy would lead to a very large number of different types of intermediates, which a Gillespie-type stochastic simulation algorithm could not handle in reasonable computation times, even if the sequences were relatively short. For comparison, in our model, where polymerization is one-step, the characteristic time of a simulation for 𝐿 = 10, 𝑁 = 10^5 and 𝛿 = 0.01 was 552 hours.

      Note that in Rosenberger et al. (PRX 2021), in contrast to a pioneering work [Fernando et al., 2007 (10.1007/s00239-006-0218-4)], sequences of replicators are not represented, which makes this approach completely inapplicable to our case, in which sequence defines the fitness. In sum, we suggest that this valid criticism points to possible future work.
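
      To show what a Gillespie-type simulation with one-step polymerization looks like in its simplest form, the following sketch tracks only the number of single strands and duplexes of one replicator type. It is a deliberately reduced illustration and not the authors' model: sequences, mutations and GC-dependent rates are omitted, and the three rate constants and the function name are assumptions introduced here.

```python
import random

def gillespie_template_replication(s0, d0, k_rep, k_dis, k_ass, t_end, seed=0):
    """Minimal Gillespie simulation of one replicator type.

    Reactions (all rate constants are hypothetical):
      replication : S -> D    one-step polymerization on a template
      dissociation: D -> 2 S  duplex separates into two strands
      association : 2 S -> D  two single strands hybridize
    Returns (single strands, duplexes, number of replication events).
    """
    rng = random.Random(seed)
    s, d, t, n_rep = s0, d0, 0.0, 0
    while True:
        propensities = [
            k_rep * s,
            k_dis * d,
            k_ass * s * max(s - 1, 0) / 2.0,
        ]
        total = sum(propensities)
        if total <= 0.0:
            break
        t += rng.expovariate(total)      # waiting time to the next reaction
        if t > t_end:
            break
        reaction = rng.choices((0, 1, 2), weights=propensities)[0]
        if reaction == 0:
            s, d, n_rep = s - 1, d + 1, n_rep + 1
        elif reaction == 1:
            s, d = s + 2, d - 1
        else:
            s, d = s - 2, d + 1
    return s, d, n_rep
```

      Even in this reduced form every reaction event is simulated individually, which gives a sense of why tracking all base-by-base intermediates for many sequence types would be computationally prohibitive.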

      The values of the (many) parameters, often very specific, also very often lack justifications. For example, why is the "predefined error factor" ε = 0.2 and not lower or higher? How would that affect the results?

      A general remark. For the more important parameters, several values were used to test the behaviour of the model (see Table 1), but due to the considerable number of parameters, it is impossible to examine all possible combinations. 𝑐+ = 1 fixes the timescale, 𝐿 is set to 10 to obtain reasonable running times (see above).

      𝜀 characterizes how replicability decreases as the number of mutations increases. In the manuscript we used the following default vector: 𝜀 = (0.05, 0.2, 1), in which the third element corresponds to the mutation-free sequence, so it must be 1. The first element determines the baseline replicability (see Methods), which we preferred not to change because it would fundamentally alter the ratio of replication propensities to association and dissociation propensities (as a substantial proportion of the complementary sequences of the master sequences are of baseline replicability) and thus would alter the reaction kinetics to an extent that it is not comparable with the original results. Therefore, only the second element can be adjusted. Accordingly, we have analysed the behaviour of the model in the cases of a steeper and a more gradual loss of replicability using the following two vectors, respectively: 𝜀′ = (0.05, 0.05, 1) and 𝜀″ = (0.05, 0.5, 1). The choice of 𝜀′ is chemically more plausible, since for very short oligomers the loss of chemical activity and replicability as a function of the number of mutations can be very sharp. We performed a series of simulations with all possible combinations of 𝛿 = 0.001, 0.005, 0.1 and 𝑁 = 10^3, 10^4, 10^5 for 𝜀′ and 𝜀″ in the constant population and chemostat model context (36 different runs). For other parameters, we took the default values, see Table 1. These values also correspond to the parameters we used in Figures 2 and 6. The results show that the steeper loss of replicability (𝜀′) slightly increases the diversity-maintaining ability of the system, whereas the more gradual loss of replicability (𝜀″) moderately decreases the diversity-maintaining ability of the system, and that these shifts are more pronounced in the constant population size model (Author response image 1) than in the chemostat model (Author response image 2). Altogether, these results confirm that the qualitative outcome of the model is robust in a wide range of loss of replicability (𝜀 vector) values.
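
      The count of 36 runs follows from crossing three replicability distances, three population sizes, two alternative 𝜀 vectors and two model variants. A small sketch of that parameter grid is given below; the dictionary keys and labels are hypothetical names chosen here, and the values are those listed in the response.

```python
from itertools import product

deltas = [0.001, 0.005, 0.1]                # replicability distances as listed
population_sizes = [10**3, 10**4, 10**5]    # population sizes N
epsilon_vectors = {
    "steeper": (0.05, 0.05, 1.0),           # steeper loss of replicability
    "more_gradual": (0.05, 0.5, 1.0),       # more gradual loss of replicability
}
models = ["constant_population", "chemostat"]

# One dictionary per simulation run, covering every combination once.
runs = [
    {"delta": delta, "N": n, "epsilon": label, "model": model}
    for delta, n, label, model in product(
        deltas, population_sizes, epsilon_vectors, models
    )
]
```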

      Author response image 1.

      Replicator coexistence in the constant population model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 2A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Author response image 2.

      Replicator coexistence in the chemostat model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 6A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Similarly, in equation (11), where does the factor 0.8 come from?

      This factor scales the decay rate of duplex sequences as a function of the binding energy (𝐸b). The value of 0.8 is an arbitrary choice; it should lie in the interval (0,1) and is only relevant in the chemostat model. It is expected to have a similar effect on the dynamics as the duplex decay factor parameter 𝑓, which we have investigated in a wide range of different values (cf. Table 1, Fig. 6), although 𝑓 is independent of the binding energy (𝐸b): increasing/decreasing the 0.8 factor is expected to decrease/increase the average total population size. We have investigated the diversity-maintaining ability of the system at smaller (0.6) and larger (0.9) parameter values at different population sizes (𝑁 ≈ 10^3, 10^4 and 10^5) and at different replicability distances (δ = 0.001, 0.005 and 0.01) as shown in Fig. 6. We have found that the number of coexisting master types changes very little in response to changes in this factor. Only two shifts could be detected (underlined): factor 0.9 combined with 𝑁 ≈ 10^4 and 𝛿 = 0.001 caused the number of surviving master types to decrease by one, while factor 0.9 combined with 𝑁 ≈ 10^3 and 𝛿 = 0.01 caused the number of surviving master types to increase by one (Author response table 1). Factor 0.6 produced the same number of surviving types as the default (Author response table 1). In summary, the model shows marked robustness to changes in the values of this parameter.

      Author response table 1.

      Number of coexisting master types in the chemostat model with different binding energy dependent duplex decay rates. Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different factor values: 0.6, 0.8 (the original) and 0.9 for comparability.

      Why is the kinetic constant for duplex decay reaction 1.15 ⋅ 10^-8?

      Note that this value is the minimum of the duplex decay rate; Table 1 correctly shows the interval of this kinetic constant as: [1.15 ⋅ 10^-8, 6.4 ⋅ 10^-5]. Both values are derived from the basic parameters of the system and can be computed according to Eq. (11). The minimum: as the parameter set corresponding to this value is: . The maximum: with .

      Are those values related to experiments, or are they chosen because specific behaviors can happen only then?

      See above.

      The choice of the model and parameters potentially impact the two main results, the attenuation of the error threshold and the role of GC content:

      Regarding the error threshold, it is also noted (lines 379-385) that it disappears when back mutations are taken into account. This suggests that overcoming the error threshold might not be as difficult as suggested, and can be achieved in several ways, which calls into question the importance of the particular role of parabolic growth. Besides, when the concentration of replicators is low, product inhibition may be negligible, such that a "parabolic replicator" is effectively growing exponentially and an error catastrophe may occur. Do the authors think that this consideration could affect their conclusion? Can simulations be performed?

      The assumption of back mutation only provides a theoretical solution to the error threshold problem: back mutation guarantees a positive (non-zero) concentration of a master type, but, since the probability of back mutation is generally very low, this equilibrium concentration may be extremely low, or negligible for typical system sizes. Consequently, back mutation alone does not solve the problem of the error catastrophe: in our system back mutation is present (the probability that a sequence with 𝑘 errors mutates back to a master sequence is 𝜇^𝑘 (1−𝜇)^(𝐿−𝑘)), and the diversity-maintaining ability is limited. The effect of back mutation decreases exponentially with increasing sequence length.
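
      The back-mutation probability quoted above, 𝜇^𝑘 (1−𝜇)^(𝐿−𝑘), can be evaluated directly to see how quickly it shrinks with the number of errors. This is a minimal sketch; the function name and the example value of 𝜇 are assumptions chosen for illustration, while 𝐿 = 10 is the oligomer length used in the model.

```python
def back_mutation_probability(mu, k, length):
    """Probability that a sequence with k errors mutates back to the master.

    Uses the expression from the response: mu**k * (1 - mu)**(length - k),
    i.e. the k erroneous positions revert and the other positions are copied
    without error.
    """
    if not 0 <= k <= length:
        raise ValueError("k must lie between 0 and length")
    return mu ** k * (1.0 - mu) ** (length - k)

# Hypothetical per-base rate; L = 10 as in the model.
example_mu = 0.01
table = {k: back_mutation_probability(example_mu, k, 10) for k in range(4)}
```

      Each additional error multiplies the probability by 𝜇/(1−𝜇), so for small 𝜇 the contribution of multiply mutated sequences to restoring the master type becomes negligible.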

      Regarding the role of the GC content, GC-rich oligomers are found to perform the worst but no rationale is provided.

      For GC-rich oligonucleotides the dissociation probability of a template-copy complex is relatively low (cf. Eqs. (9, 10)), thus they have a relatively low number of offspring, cf. lines 557-561: “a relatively high dissociation probability and the consequential higher propensity of being in a simple stranded form provides an advantage for sequences with relatively low GC content in terms of their replication affinity, that is, the expected number of offspring in case of such variants will be relatively high.”. Note that the simulation results shown in Fig. 3A, demonstrate the realization of this effect with prepared sequences (along a GC content gradient).

      One may assume that it happens because GC-rich sequences take comparatively longer to release the product. However, it is also conceivable that higher GC content may help in the polymerization of the monomers, as the monomers remain attached to the template for longer (as described in Eq. (9)). This is an instance where the choice to combine the association and polymerization reactions into a single step, independent of GC content, may be critical.

      It would be important to show that the result arises from the actual physics and not from this modeling choice.

      Some more specific points that would deserve to be addressed:

      • Line 53: it is said that p "reflects how easily the template-reaction product complex dissociates". This statement is not correct. A reaction order p<1 reflects product inhibition, the propensity of templates to bind to each other, not slow product release. Product release can be limiting, yet a reaction order of 1 can be achieved if substrate concentrations are sufficiently high relative to oligomer concentrations (von Kiedrowski et al., 1991).

      We think the key reference is Von Kiedrowski (1993) in this case. Other things being equal, his Table 1 on p. 134 shows that a sufficient increase in 𝐾4, i.e., the stability of the duplex (template and copy) (association rate divided by dissociation rate) throws the system into the parabolic regime. This is what we had in mind. In order to clarify this, we modified the quoted sentence thus: “In this kinetics, the growth order is equal or close to 0.5 (i.e., the dynamics is sub-exponential) because increased stability of the template-copy complex (rate of association divided by dissociation) promotes parabolic growth (von Kiedrowski et al., 1991; von Kiedrowski & Szathmáry, 2001).”

      • Population size is a key parameter, and a comparison is made between small (10^3) and large (10^5) populations, but without explaining what determines the scale (small/large relative to what?).

      The “small” value (10^3) corresponds to the smallest meaningful population size; significantly smaller population sizes (e.g. 10^2) cannot maintain the 10 master types (or any subset of them) and are chemically unrealistic. The “large” value (10^5) is the largest population size for which simulation times are still acceptable; in the case of 10^6 the runtimes are in the order of months.

      • In the same vein, we might expect size not to be the only important parameter, but also concentration.

      At constant volume, population size and concentration are strictly coupled.

      • Lines 543-546: if understanding correctly, the quantitative result is that the error threshold rises from 0.1 in the exponential case to 0.196 in the parabolic. Are the authors suggesting that a factor of 2 is a significant difference?

      In this paragraph we compared the empirical error threshold of our system (which is close to 𝑝mut = 0.15) with the error threshold of the well-known single peak fitness landscape (which can be approximated by ) as a reference case. To make the message even clearer we have extended the last sentence (lines 596-597) as follows: “but note that applying this approach to our system is a serious oversimplification”. The 0.196 is simply the probability of error-free replication of a sequence when , but we have removed this sentence (“corresponding to the replication accuracy of a master sequence”) from the manuscript as it seems to be confusing.
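
      The value 0.196 can be checked under the assumption that each of the 𝐿 = 10 bases mutates independently with per-base probability 𝑝mut = 0.15, so that the probability of an error-free copy is (1 − 𝑝mut)^𝐿. Reading the quoted figure this way is an inference from the parameter values given in the response, and the function name is hypothetical.

```python
def error_free_probability(p_mut, length):
    """Probability that all `length` bases are copied without mutation,
    assuming independent per-base mutation with probability p_mut."""
    return (1.0 - p_mut) ** length

# L = 10 and p_mut = 0.15 are the values discussed in the response.
value = error_free_probability(0.15, 10)
```

      Under these assumptions the result is approximately 0.197, in line with the 0.196 mentioned in the reply.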

      • Figure 3C: this figure shows no statistically significant effect?

      Thank you for pointing this out. We statistically tested the hypothesis that the GC content differs between the surviving and the extinct master subsets. This analysis revealed that the differences between these two groups are statistically significant, which we have now included in the manuscript at lines 380-390: “A direct investigation of whether the sequence composition of the master types is associated with their survival outcome was conducted using the data from the constant population model simulation results (Figure 2). In these data, the average GC content was measured to be lower in the surviving master subpopulations than in the extinct subpopulations (Figure 3C). To determine whether this difference was statistically significant, nonparametric, two-sample Wilcoxon rank-sum tests (Hollander & Wolfe, 1999) were performed on the GC content of the extinct-surviving master subsets. The GC content was significantly different between these two groups in all nine investigated parameter combinations of population size (N) and replicability distance (δ) at p<0.05 level, indicating a selective advantage for a lower GC content in the constant population model context. The exact p values obtained from this analysis are shown in Figure 3C.”
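
      The kind of comparison described here can be sketched as follows: compute the GC content of each master sequence, then compare the surviving and extinct groups with a two-sample Wilcoxon rank-sum test. The version below uses the large-sample normal approximation without a tie correction, which is a simplification and not necessarily the procedure the authors used; the function names are hypothetical.

```python
import math

def gc_content(sequence):
    """Fraction of G and C bases in an upper-case nucleotide sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p) where z is the standardized rank sum of sample x.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0   # tied values share the mean rank
        for m in range(i, j + 1):
            ranks[m] = average_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    return z, p
```

      For small groups an exact test is preferable to the normal approximation, so this sketch is best read as an outline of the logic rather than a replacement for a statistics library.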

      • line 542: "phase transition-like species extension (Figure 4B)": such a clear threshold is not apparent.

      Thank you for pointing out the incorrect phrasing. As there is no clear threshold in the number of coexisting types as a function of the mutation rate, we removed the “phase transition-like” expression: “However, when finite population sizes and stochastic effects are taken into account, at the largest investigated per-base mutation rate (𝑝mut = 0.15), the summed relative steady-state master frequencies approach zero (Figure 4C) with accelerating species extinction (Figure 4B), indicating that this value is close to the system's empirical error threshold.” (lines 589-594).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On the whole, the work is well done and presented, there are no major recommendations. It seems a good idea to cite and briefly discuss this recent paper: https://pubmed.ncbi.nlm.nih.gov/36996101/ which develops a symbiotic scenario of the coevolution of primordial replicators and reproducers that appears to be fully compatible with the results of the current work.

      Thank you for bringing this article to our attention. We have inserted the following sentence at lines 621-624: “The demonstrated diversity-maintaining mechanism of finite parabolic populations can be used as a plug-in model to investigate the coevolution of naked and encapsulated molecular replicators (e.g., Babajanyan et al., 2023).”

      The manuscript is well written, but there are some minor glitches that merit attention. For example:

      l. 5 "carriers presents a problem, because product formation and mutual hybridization" - "mutual" is superfluous here, delete

      l. 13 "amplification. In addition, sequence effects (GC content) and the strength of resource" - hardly "effects" - should be 'features' or 'properties'

      l. 41 "If enzyme-free replication of oligomer modules with a high degree of sequence" - "modules" here is only confusing - simply, "oligomers"

      l. 44 "under ecological competition conditions with which distinct replicator types with different" - delete "with" etc, there are many such minor glitches that are best corrected.

      Thank you for pointing these out; we have corrected them. Other drafting errors, glitches, and superfluous sentences have also been corrected.

      Reviewer #2 (Recommendations For The Authors):

      None

      Editor (Recommendations For The Authors):

      In the manuscript, it appears that coexistence is assessed at a given point in time, while figures seem to show that it remains time-dependent. It would be great if the authors could clarify this and/or discuss this.

      We appreciate you bringing this to our attention, as we had indeed failed to elaborate on this important point. The steady-state character of the coexistence is assessed in our model in the following way: the relative frequency of each master sequence is tested for the condition of ≥ 100- (cut-off relative frequency for survival) at every 2,000th replication step in the interval between 10,000 replication steps before termination and actual termination (10= replication steps). If the above condition is true more than once, we consider the master type in question to have survived (we have included this explanation in the Methods section: lines 258-268). Although this relatively narrow time interval can still be regarded as a snapshot of the state of the system, in our numerical experience the resulting measure is a reliable quantitative indicator of the apparent stability of species coexistence in the parabolic dynamics.
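
      The survival criterion described above can be written as a short check. This is a sketch under stated assumptions: the cut-off frequency is passed in as a parameter because its value is not legible in the response, both ends of the final interval are treated as checkpoints, and `frequency_at` is a hypothetical callback returning the relative frequency of a master type at a given replication step.

```python
def master_survived(frequency_at, final_step, cutoff, window=10_000, stride=2_000):
    """Return True if a master type counts as survived.

    The relative frequency is inspected at every `stride`-th replication step
    in the last `window` steps before termination; the type is considered
    survived if it reaches the cut-off at more than one checkpoint.
    """
    checkpoints = range(final_step - window, final_step + 1, stride)
    hits = sum(1 for step in checkpoints if frequency_at(step) >= cutoff)
    return hits > 1
```

      Requiring more than one hit guards against counting a type that is only transiently present at the very end of a run.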