  1. Aug 2023
    1. Author Response:

      We thank the reviewers for their constructive comments. Below we include a point-by-point response.

      Reviewer #1 (Public Review):

      [...] Elaborate on the Methodology: Provide an in-depth explanation of the two active learning batch selection methods, including algorithmic details, implementation considerations, and any specific assumptions made. This will enable readers to better comprehend and evaluate the proposed techniques.

      We thank the reviewer for this suggestion. Following this comment, we will extend the text in Methods (in Section: Batch selection via determinant maximization and Section: Approximation of the posterior distribution) and in Supporting Methods (Section: Toy example). We will also include the pseudocode for the Batch optimization method.
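
      As a reader-facing illustration of the idea behind determinant-based batch selection, the following minimal Python sketch greedily picks the candidates whose joint predictive covariance has the largest log-determinant, which favors samples that are both uncertain and mutually uncorrelated. It is not the ALIEN implementation, nor the authors' pseudocode: the function name `greedy_logdet_batch`, the greedy strategy, and the toy covariance matrix are assumptions made for illustration only.

```python
import numpy as np


def greedy_logdet_batch(cov, batch_size):
    """Greedily select indices whose covariance submatrix has a large log-determinant.

    cov is assumed to be a symmetric positive semi-definite matrix of predictive
    covariances between candidate samples (a hypothetical input for this sketch).
    """
    n = cov.shape[0]
    selected = []
    for _ in range(batch_size):
        best_idx, best_val = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            # slogdet is numerically safer than det for near-singular submatrices.
            sign, logdet = np.linalg.slogdet(cov[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best_idx, best_val = i, logdet
        if best_idx is None:
            break  # no candidate keeps the submatrix positive definite
        selected.append(best_idx)
    return selected


# Toy covariance (NOT data from the study): candidates 0 and 1 are highly
# correlated, candidate 2 is independent but has lower variance.
toy_cov = np.array([
    [1.00, 0.95, 0.00],
    [0.95, 1.00, 0.00],
    [0.00, 0.00, 0.50],
])
batch = greedy_logdet_batch(toy_cov, batch_size=2)
print(batch)
```

      In this toy case the second pick is the independent candidate rather than the near-duplicate of the first, which is the behavior that distinguishes joint (determinant-based) selection from ranking candidates by individual variance.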

      Clarify Evaluation Metrics: Clearly specify the evaluation metrics employed in the study to measure the performance of the active learning methods. Additionally, conduct statistical tests to establish the significance of the improvements observed over existing batch selection methods.

      Following this comment we will add to Table 1 details about the way we computed the cutoff times for the different methods. We will also provide more details on the statistics we performed to determine the significance of these differences.
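
      The response does not state which statistical test was used. One assumption-light option for paired comparisons (for example, the same dataset or random seed evaluated with two batch selection methods) is an exact sign-flip permutation test on the per-pair differences. The sketch below is offered only as an illustration of that option; the function name `paired_sign_flip_pvalue` and the difference values are hypothetical and are not results from the study.

```python
from itertools import product


def paired_sign_flip_pvalue(diffs):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that the two methods are exchangeable, each
    difference is equally likely to be positive or negative.
    """
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    count = 0
    total = 0
    # Enumerate all 2**n sign assignments (feasible only for small n).
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            count += 1
        total += 1
    return count / total


# Hypothetical per-dataset differences in a performance metric (method A - method B).
example_diffs = [0.04, 0.02, 0.05, 0.01, 0.03]
p_value = paired_sign_flip_pvalue(example_diffs)
print(p_value)
```

      With only five pairs the smallest attainable two-sided p-value is 2/32, so a comparison of this kind needs enough datasets or repeated runs to have any power.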

      Enhance Reproducibility: To facilitate the reproducibility of the study, consider sharing the code, data, and resources necessary for readers to replicate the experiments. This will allow researchers in the field to validate and build upon your work more effectively.

      This is something we already included with the original submission. The code is publicly available. In fact, we provide a Python library, ALIEN (Active Learning in data Exploration), which is published on the Sanofi GitHub (https://github.com/Sanofi-Public/Alien). We also provide details on the public data used and expect to provide the internal data as well. We included a small paragraph on code and data availability.

      Reviewer #2 (Public Review):

      [...] I would expect to see a comparison regarding other regression metrics and considering the applicability domain of models which are two essential topics for the drug design modelers community.

      We want to thank the reviewer for these comments. We will provide a detailed response to their specific comments when we resubmit.

    2. eLife assessment

      This valuable study reports a new method based on batch active learning to optimize the biological and pharmaceutical properties of small molecules of pharmaceutical interest. The new method seems compelling, but the theoretical analysis is incomplete and the reproducibility and impact of the article would benefit from disclosing the code and datasets used in the study. With these aspects strengthened, this paper would be of interest to computational and medicinal chemists and scientists working in the drug discovery field.

    3. Reviewer #1 (Public Review):

      The authors present a study focused on addressing the key challenge in drug discovery, which is the optimization of absorption and affinity properties of small molecules through in silico methods. They propose active learning as a strategy for optimizing these properties and describe the development of two novel active learning batch selection methods. The methods are tested on various public datasets with different optimization goals and sizes, and new affinity datasets are curated to provide up-to-date experimental information. The authors claim that their active learning methods outperform existing batch selection methods, potentially reducing the number of experiments required to achieve the same model performance. They also emphasize the general applicability of their methods, including compatibility with popular packages like DeepChem.

      Strengths:

      Relevance and Importance: The study addresses a significant challenge in the field of drug discovery, highlighting the importance of optimizing the absorption and affinity properties of small molecules through in silico methods. This topic is of great interest to researchers and pharmaceutical industries.

      Novelty: The development of two novel active learning batch selection methods is a commendable contribution. The study also adds value by curating new affinity datasets that provide chronological information on state-of-the-art experimental strategies.

      Comprehensive Evaluation: Testing the proposed methods on multiple public datasets with varying optimization goals and sizes enhances the credibility and generalizability of the findings. The focus on comparing the performance of the new methods against existing batch selection methods further strengthens the evaluation.

      Weaknesses:

      Lack of Technical Details: The feedback lacks specific technical details regarding the developed active learning batch selection methods. Information such as the underlying algorithms, implementation specifics, and key design choices should be provided to enable readers to understand and evaluate the methods thoroughly.

      Evaluation Metrics: The feedback does not mention the specific evaluation metrics used to assess the performance of the proposed methods. The authors should clarify the criteria employed to compare their methods against existing batch selection methods and demonstrate the statistical significance of the observed improvements.

      Reproducibility: While the authors claim that their methods can be used with any package, including DeepChem, no mention is made of providing the necessary code or resources to reproduce the experiments. Including code repositories or detailed instructions would enhance the reproducibility and practical utility of the study.

      Suggestions for Improvement:

      Elaborate on the Methodology: Provide an in-depth explanation of the two active learning batch selection methods, including algorithmic details, implementation considerations, and any specific assumptions made. This will enable readers to better comprehend and evaluate the proposed techniques.

      Clarify Evaluation Metrics: Clearly specify the evaluation metrics employed in the study to measure the performance of the active learning methods. Additionally, conduct statistical tests to establish the significance of the improvements observed over existing batch selection methods.

      Enhance Reproducibility: To facilitate the reproducibility of the study, consider sharing the code, data, and resources necessary for readers to replicate the experiments. This will allow researchers in the field to validate and build upon your work more effectively.

      Conclusion:

      The authors' study on active learning methods for optimizing drug discovery presents an important and relevant contribution to the field. The proposed batch selection methods and curated affinity datasets hold promise for improving the efficiency of drug discovery processes. However, to strengthen the study, it is crucial to provide more technical details, clarify evaluation metrics, and enhance reproducibility by sharing code and resources. Addressing these limitations will further enhance the value and impact of the research.

    4. Reviewer #2 (Public Review):

      The authors presented a well-written manuscript describing the comparison of active-learning methods with state-of-the-art methods for several datasets of pharmaceutical interest. This is a very important topic since active learning is similar to a cyclic drug design campaign, in which compounds are tested, new ones are designed, and these are then used for further tests and a new design cycle, and so on. The experimental design is comprehensive and adequate for the proposed comparisons. However, I would expect to see a comparison regarding other regression metrics and considering the applicability domain of models, which are two essential topics for the drug design modelers community.

    1. Reviewer #2 (Public Review):

      Summary:

      The authors sought to identify the impact skin viscoelasticity has on neural signalling of contact forces that are representative of those experienced during normal tactile behaviour. The evidence presented in the analyses indicates there is a clear effect of viscoelasticity on the imposed skin movements from a force-controlled stimulus. Both skin mechanics and evoked afferent firing were affected based on prior stimulation, which has not previously been thoroughly explored. This study outlines that viscoelastic effects have an important impact on encoding in the tactile system, which should be considered in the design and interpretation of future studies. Viscoelasticity was shown to affect the mechanical skin deflections and stresses/strains imposed by previous and current interaction force, and also the resultant neuronal signalling. The result of this was an impaired coding of contact forces based on previous stimulation. The authors may be able to strengthen their findings, by using the existing data to further explore the link between skin mechanics and neural signalling, giving a clearer picture than demonstrating shared variability. This is not a critical addition, but I believe would strengthen the work and make it more generally applicable.

      Strengths:

      - Elegant design of the study. Direct measurements have been made from the tactile sensory neurons to give detailed information on touch encoding. Experiments have been well designed and the forces/displacements have been thoroughly controlled and measured to give accurate measurements of global skin mechanics during a set of controlled mechanical stimuli.

      - Analytical techniques used. Analysis of fundamental information coding and information representation in the sensory afferents reveals dynamic coding properties to develop putative models of the neural representation of force. This advanced analysis method has been applied to a large dataset to study neural encoding of force, the temporal dynamics of this, and the variability in this.

      Weaknesses:

      - Lack of exploration of the variation in neural responses. Although there is a viscoelastic effect that produces variability in the stimulus effects based on prior stimulation, it is a shame that the variability in neural firing and force-induced skin displacements have been presented, and are similarly variable, but there has been no investigation of a link between the two. I believe with these data the authors can go beyond demonstrating shared variability. The force per se is clearly not faithfully represented in the neural signal, being masked by stimulation history, and it is of interest if the underlying resultant contact mechanics are.

      Validity of conclusions:

      The authors have succeeded in demonstrating skin viscoelasticity has an impact on skin contact mechanics with a given force and that this impacts the resultant neural coding of force. Their study has been well-designed and the results support their conclusions. The importance and scope of the work is adequately outlined for readers to interpret the results and significance.

      Impact:

      This study will have important implications for future studies performing tactile stimulation and evaluating tactile feedback during motor control tasks. In detailed studies of tactile function, it illustrates the necessity to measure skin contact dynamics to properly understand the effects of a force stimulus on the skin and mechanoreceptors.

    2. Reviewer #1 (Public Review):

      The authors investigate how the viscoelasticity of the fingertip skin can affect the firing of mechanoreceptive afferents and they find a clear effect of recent physical skin state (memory), which is different between afferents. The manuscript is extremely well-written and well-presented. It uses a large dataset of low threshold mechanoreceptive afferents in the fingertip, where it is particularly noteworthy that the SA-2s have been thoroughly analyzed and play an important role here. They point out in the introduction the importance of the non-linear dynamics of the event when an external stimulus contacts the skin, to the point at which this information is picked up by receptors. Although clearly correlated, these are different processes, and it has been very well-explained throughout. I have some comments and ideas that the authors could think about that could further improve their already very interesting paper. Overall, the authors have more than achieved their aims, where their results very much support the conclusions and provoke many further questions. This impact of the previous dynamics of the skin affecting the current state can be explored further in so many ways and may help us to better understand skin aging and the effects of anatomical changes of the skin.

      At the beginning of the Results, it states that FA-2s were not considered, as the stimuli did not contain mechanical events with frequency components high enough to reliably excite them. Was this really the case? Did the authors test any of the FA-2s from the larger dataset? If FA-2s were not at all activated, this is also relevant information for the brain to signal that it is not a relevant Pacinian stimulus (as they respond to everything). Further, afferent receptive fields that were more distant to the stimulus were included, which likely fired very little, like the FA-2s, so why not consider them even if their contribution was low?

      One question that I wondered throughout was whether you have looked at further past history in stimulation, i.e. not just the preceding stimulus, but 2 or 3 stimuli back? It would be interesting to know if there is any ongoing change that can be related back further. I do not think you would see anything as such here, but it would be interesting to test and/or explore in future work (e.g. especially with sticky, forceful, or sharp indentation touch). However, even here, it could be that certain directions gave more effects.

      Did the authors analyze or take into account the difference between receptive field locations? For example, did afferents more on the sides have lower responses and a lesser effect of history?

      Was there anything different in the firing patterns between the spontaneous and non-spontaneously active SA-2s? For example, did the non-spontaneous show more dynamic responses?

      Were the spontaneously active SA-2 afferents firing all the time or did they have periods of rest - and did this relate to recent stimulation? Were the spontaneously active SA-2s located in a certain part of the finger (e.g. nail) or were they randomly spread throughout the fingertip? Any distribution differences could indicate a more complicated role in skin sensing.

      Did the authors look to see if the spontaneous firing in SA-2s between trials could predict the extent to which the type 1 afferents encode the subsequent stimulus? Basically, does the SA-2 state relate to how the type 1 units fire?

      In the discussion, it is stated that "the viscoelastic memory of the preceding loading would have modulated the pattern of strain changes in the fingertip differently depending on where their receptor organs are situated in the fingertip". Can the authors expand on this or make any predictions about the size of the memory effect and the distance from the point of stimulation?

      In the discussion, it would be good if the authors could briefly comment more on the diversity of the mechanoreceptive afferent firing and why this may be useful to the system.

      Also, the authors could briefly discuss why this memory (or recency) effect occurs - is it useful, does it serve a purpose, or is it just a by-product of our skin structure? There are examples of memory in the other senses where comparisons could be drawn. Is it like stimulus adaptation effects in the other senses (e.g. aftereffects of visual motion)?

      One point that would be nice to add to the discussion is the implications of the work for skin sensing. What would you predict for the time constant of relaxation of fingertip skin, how long could these skin memory effects last? Two main points to address here may be how the hydration of the skin and anatomical skin changes related to aging affect the results. If the skin is less viscoelastic, what would be the implications for the firing of mechanoreceptors?

      How long does it take for the effect to end? Again, this will likely depend on the skin's viscoelasticity. However, could the authors use it in a psychophysical paradigm to predict whether participants would be more or less sensitive to future stimuli? In this way, it would be possible to test whether the direction modifies touch perception.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We don't see the case for 1,5-IP8 as settled in plants, and none of the papers mentioned above draws this strong conclusion. This may be due to several limitations in the available data. The mentioned studies do not allow the effects of 1-IP7 and 1,5-IP8 to be differentiated and, where binding or competition experiments have been performed, e.g. on the transcription factors, the differences in the Kd values for IP7 and IP8 were minor. Furthermore, 1,5-IP8 levels and the Pi starvation response do not always correlate. ITPK1 mutants, for example, show Pi overaccumulation and low 5-IP7, but normal 1,5-IP8 (Riemer et al., 2021). Finally, plants are complex organisms with multiple tissue types that serve for accumulating, exporting, transporting or finally consuming Pi. Therefore, correlating inositol pyrophosphate levels from whole-plant extracts with a Pi starvation response is problematic, unless both datasets can be obtained from the same cell types or at least the same tissues.

      The comment of the reviewer made us recognize that the complex situation in plants deserves a more detailed coverage and we have therefore adjusted the introduction accordingly.

      Results: "We determined the corresponding lysines in Pho81 (Fig. S3), created a point mutation in the genomic PHO81 locus that substitutes one of them, K154, by alanine, and investigated the impact on the PHO pathway."

      In my opinion, it would be important to test here in a quantitative in vitro binding assay if (i) the SPX domain of Pho81 can bind PP-InsPs including 1,5-InsP8, (ii) if the dissociation constant is in agreement with the cellular levels of 1,5-InsP8 in yeast (compare Fig. 2) and (iii) if the K154A mutation blocks or reduces the binding of 1,5-InsP8. Without such experimentation, I find the statement "this result underlines the efficiency of the K154A substitution in preventing PP-IP binding to the Pho81 SPX domain." to be overly speculative, as no binding experiment has been conducted.

      We agree with the comment of the reviewer concerning the overstatement in the phrase. It has been deleted.

      As mentioned already in our previous work (Wild et al., 2016), Pho81SPX counts among the SPX domains that we could not express recombinantly. Likewise, full-length Pho81, which would be the relevant object for correlating in vitro binding studies with the cellular concentrations, has not been accessible. Expression in yeast did not provide sufficient material for ITC or other quantitative techniques. Therefore, we refrained from pursuing binding studies. Nevertheless, given the high conservation of the positively charged patch on SPX domains and the fact that, in every case where it has been tested so far, SPX domains showed inositol polyphosphate binding activity, we find it a conservative assumption that the Pho81SPX binds them as well. This is supported by the effects of the binding site mutant, which mimics the effect of ablating IP8 synthesis.

      Results: "Inositol pyrophosphate binding to the SPX domain labilizes the Pho81-Pho80 interaction." Again, in the absence of any protein-protein interaction assay I find this statement not to be supported by the experiments outlined in the manuscript. The best way to address this point would be to perform either co-IP or in vitro pull-down experiments between Pho81-SPX and Pho81-85, in the presence and absence of 1,5-InsP8 and/or using the Pho81 point-mutants described in the text.

      Since Pho81 could not be produced recombinantly, neither by us nor by others who worked on this protein previously, quantitative in vitro binding assays are not accessible for now. A simple IP suffers from the problem that Pho81 interacts with Pho85-Pho80 not only through the SPX domain but also through the minimum domain. The latter interaction may be constitutive. Since the main point of the manuscript is not to dissect the exact mechanisms of Pho85-Pho80 regulation, but only to address why the postulated inactivation of this kinase by a 1-IP7/minimum domain complex makes no sense, we prefer not to show a profound (and more complex) analysis of how the different Pho81 domains contribute to binding.

      To test the potential of the SPX domain for binding Pho85/Pho80 in vivo, we have created a GFP-fusion of the SPX domain of Pho81. This fusion protein localizes mainly to the cytosol when cells are on high-Pi. Upon Pi starvation, it concentrates in the nucleus. This concentration is not observed in pho80 mutant background (New Fig. S7).

      In line with this, I would suggest moving the molecular modelling/docking studies from the discussion into the results section and using these models to design some interface mutations that could be tested in coIP and/or pull-down assays. Alternatively, the authors may choose to omit the discussion section starting with "Even though the minimum domain is unlikely to function as a receptor for PP-IPs this does not ..." and ending with "In sum, multiple lines of evidence support the view that the SPX domain exerts dominant, 1,5-IP8 mediated control over Pho81 activity in response to Pi availability."

      We have now moved the modelling data to the Results section. The structure prediction of the interface is experimentally validated. Data on the effect of interface substitutions are already published, although these substitutions had not been recognized as affecting a common interface at the time. Substituting the interface residues either on the side of Pho80 or of Pho81 constitutively activates Pho85-Pho80 kinase and destabilizes its interaction with Pho81. This was shown by Co-IP experiments from cell extracts by Huang et al. We mention the respective substitutions in the manuscript and cite the paper in which their effect on PHO pathway activation had been described.

      Reviewer #2 (Recommendations For The Authors):

      Some points need additional attention by the authors:

      • In general, it would be helpful to introduce abbreviations more thoroughly (certain enzyme names, PA, MD, ...)

      We paid more attention to this.

      • Also in general, the authors may want to think about the nomenclature of inositol pyrophosphates. Given the expansion of PP-IPs that are being detected in different organisms these days it may be a good time to convert to a more precise nomenclature, i.e. 5PP-IP5 instead of 5-IP7; and 1,5(PP)2-IP4, instead of 1,5-IP8. The latter could just be stated once, and then be abbreviated as IP8.

      To our understanding the field has not yet come up with a unified nomenclature. Therefore, we prefer to stick with the more practical nomenclature that we have chosen, which also corresponds to what is commonly used in presentations and discussions among colleagues. We have now introduced a sentence making the link to the nomenclature that the reviewer has proposed.

      • p. 1, Abstract: "negative bioenergetic impacts" - the phrasing seems really vague

      Agreed, but we find it difficult to be more explicit and precise in the abstract while remaining concise and not distracting from the main message. This aspect is better explained in the introduction.

      • p. 3, Significance statement: "... unified model across all eukaryotic kingdoms" While the intended meaning of this wording is better explained in the text later, the phrasing here suggests a more all-encompassing study at hand, instead of a conclusion that fits more closely with established reports from other organisms. Please rephrase.

      We have adapted the phrase to avoid this impression.

      • p. 4: "IPTKs" - are the ITPKs meant here?

      Yes, that was a typo.

      • p. 7, the introduction ends abruptly and could use a concluding sentence.

      Done

      • p.7, "enzymes diphosphorylation either the..."; I understand what the authors are trying to say with diphosphorylating, but the enzymes are phosphorylating a phosphorylated substrate.

      Yes. We changed the phrase to "....adding phosphate groups at the 1- or 5-positions....".

      • p. 7, subtitle "...concentrations and kinetics of..."; kinetics of what? Synthesis/turnover?

      We corrected this subtitle

      • p. 8, with regards to the recovery experiment: Was this recovery determined elsewhere (please cite)? Otherwise it would be beneficial to include an extra figure to illustrate these recoveries in the supplementary information. And do the authors suspect some hydrolysis of IP8 given the lower recovery?

      We have now added the experiment testing recovery of IPPs as the new Fig. S1.

      • p. 9: It is appreciated that the authors point out the concentration of IP6 in S. cerevisiae. I found that concentration rather low, and the authors could highlight this a bit more, given their ability to carry out absolute quantification.

      This was a leftover from a previous version of the paper. Since the paper does not treat IP6 or lower inositol polyphosphates, we have deleted this phrase.

      • p. 9, Fig 2: The exponential decay of 5-IP7 is very nicely shown in Figure 2c. But one of the most important discussion points is IP8 being the key controller of the PHO pathway - it would therefore be beneficial for the argument to also show the same kind of graph for IP8 and if possible, fit a function to the data points to better quantify and compare the decay processes (e.g. via "half-life time" of PP-IPs during starvation, in addition to the suggested "critical concentration" which was only discussed for 5-IP7 thus far).

      Kinetic resolution is an issue here. The approach shown in Figs. 2 and 5 is not apt to determine a critical concentration of IP8 because the decline upon transfer to starvation conditions is too fast and difficult to relate to the equally rapid induction of the PHO pathway. We shall address this point in a more appropriate setup in a future study.
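
      The fit the reviewer suggests can be sketched on synthetic numbers. The snippet below assumes simple first-order decay, c(t) = c0 * exp(-k t), and estimates k by linear regression on log-transformed concentrations, from which the half-life follows as ln(2)/k. The time points, concentrations, and the function name `fit_half_life` are hypothetical and are not taken from Fig. 2.

```python
import numpy as np


def fit_half_life(times, conc):
    """Fit c(t) = c0 * exp(-k t) by least squares on log(c).

    Returns (c0, k, half_life). Assumes strictly positive concentrations
    and a single first-order decay process.
    """
    times = np.asarray(times, dtype=float)
    conc = np.asarray(conc, dtype=float)
    # log c = log c0 - k t, so a degree-1 polynomial fit gives slope = -k.
    slope, intercept = np.polyfit(times, np.log(conc), 1)
    k = -slope
    return np.exp(intercept), k, np.log(2) / k


# Synthetic, noise-free example (NOT measured data): true half-life is 10 min.
t_min = [0, 5, 10, 20, 30]
c_um = [0.8 * 0.5 ** (t / 10) for t in t_min]
c0, k, t_half = fit_half_life(t_min, c_um)
print(round(c0, 3), round(t_half, 3))
```

      With noisy measurements, or values near the detection limit, a weighted or nonlinear fit would be more appropriate. As the response points out, the estimate is only meaningful if the sampling interval is short relative to the decay itself.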

      • p.9, Fig 2a: Where does the 5-IP7 come from in the kcs1Δ strain? In the text the authors state that 5-IP7 in kcs1Δ was not detected, but the figure suggests otherwise. Please explain.

      Currently, we do not know where these residual signals stem from. One possibility is that they represent other isomers that exist in minor concentrations and that are not resolved from 5-IP7 in CE. We added a sentence to the figure legend to indicate this.

      • p. 10: "IP8 was undetectable in kcs1Δ and decreased by 75% in vip1Δ. kcs1Δ mutants also showed a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that a significant source of 1-IP7 is synthesis of 1,5-IP8 through successive action of Kcs1 and Vip1, followed by dephosphorylation to 1-IP7." - Please specify this statement. Do the authors mean that 1,5-IP8 is only produced transiently below the detection capabilities of the method but that there still is a (reduced) flux from 5-IP7 to 1,5-IP8 to 1-IP7? Otherwise it would seem paradoxical to have a dependency on a non-existing metabolite in that cell line.

      This was not clearly expressed. The revised version now says: " ... a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that, in the wildtype, most 1-IP7 stems from the conversion of 5-IP7 to 1,5-IP8, followed by dephosphorylation of 1,5-IP8 to 1-IP7.". We hope that this clarifies the matter.

      • p. 10: "pulse-labeling approaches are not available for PP-IPs." While this statement is correct, a recent paper co-authored by Qui and Jessen showed nice pulse-labeling data for the lower IPs and could be cited here (PMID: 36589890).

      Yes, indeed, we should have been more precise here. What we wanted to express was that rapid pulse-labeling methods for following phosphate group turnover were lacking, with a temporal resolution of minutes rather than hours. Existing pulse labeling approaches, including the study mentioned by the reviewer, do not provide that. We have changed the phrase accordingly.

      • p. 10: continuation of caption of Fig 2: "were extracted [and] analyzed"

      Corrected. Thank you.

      • p. 12: How is 1-IP7 made in the vip1 kcs1 double mutant?

      As explained above, we suspect that these may be side products of IPMKs, which accumulate in the absence of vip1 phosphatase.

      • p. 13, caption to Figure 3: "XXX cells were analyzed" please replace the place holder XXX.

      Done. Thank you.

      • p. 13, Fig 3B, C, D and p. 50, Fig. S4: On screen the contrast between the different shades of grey of the bars are just visible enough, but not on paper, I suggest using a higher contrast/ different colouring scheme.

      We enhanced the contrast.

      • p. 24, 25, Fig 7.: I could not really appreciate the AlphaFold part, and found it unnecessary. No docking or molecular dynamics simulations were carried out here, and it was not clear to me what information should be gleaned from this part.

      Following this comment, we have modified the respective part of the text. This part refers to a publication from the O'Shea lab (Nat. Chem Biol. 4,25) proposing the model that 1-IP7 and the Pho81 minimum domain bind competitively to the active site of Pho85 to inhibit its kinase activity. Modeling of complexes between Pho81, Pho80 and Pho85, which we present in the manuscript, rather suggests binding of the minimum domain to a groove in Pho80. This is important because it provides a viable alternative model for the action of the minimum domain. It suggests the minimum domain as a constitutive linker that attaches Pho80 to Pho85. Importantly, this model accounts perfectly for the results of previous random mutagenesis studies on Pho80 and on the minimum domain, which had independently identified both the Pho80 groove and the minimum domain residues that bind it in the prediction as critical residues for inhibition of Pho85, and for integrity of the Pho85/Pho80/Pho81 complex. We find this alternative explanation for Pho85-Pho80 regulation by Pho81, which we can derive by combining the predictions with already published experimental data, an important element to re-evaluate the relevance of 1-IP7 in PHO pathway regulation and resolve one of the existing discrepancies.

      • p. 28: No experiments were carried out with plants or mammals. The relevance for plants or mammalian systems therefore seems to be overstated at this point in time.

      We are not quite sure how to interpret this remark. We do not claim that our data support a role for IP8 in mammals and plants. But we refer to and cite studies providing the strongest evidence in favor of it in these systems. The relevance of our current study lies in refuting seemingly strong evidence from yeast, which had been diametrically opposed to the data obtained in plants and mammals. The revision of the situation in yeast now paves the way to drawing a coherent concept for fungi, plants and mammals. We feel that this is important and should be underlined.

      • p. 31: "300 mL of 3% ammonium" - 300 µL?

      Yes. Thank you.

      • p. 45, CE-ESI-MS parameters: "1IP8"

      Corrected.

      • p. 47: Figure S1: Please include more experimental details in the caption and/or methods section. Was a similar analysis software used as e.g. Figure S2 (NIS Elements Software)? Please also include all the analysis software in the Methods section under "fluorescence microscopy". Unless these additional experimental details already clarify the following point: Can the authors briefly comment on why the morphological determination in S1 requires trypan blue staining while in later experiments the yeast cells are readily recognized by the software in "simple" brightfield images?

      Trypan blue staining is not strictly required for this. It is just a simple method to fluorescently stain the cell wall. There are many other ways of delineating the cells. It could also have been done in a brightfield image.

      We updated the figure legend to better describe how these measurements were done and deposited the script and training file on figshare.

      • p. 48: "can be downloaded from **" please insert the link once the script is available online.

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      Reviewer #3 (Recommendations For The Authors):

      1) Italicize the scientific names of the organisms; this was inconsistent throughout the manuscript. Also, gene names should be italicized; this was also inconsistent (e.g., p.12 "... did not induce the PHO84 and PHO5 [sic] promoters...").

      Done

      2) Summary of the Figure 2A data in the text (p.9) probably has swapped the determined concentrations for 1-IP7 and IP8 (0.3 µM or 0.5 µM) as compared with the data figure.

      Yes, indeed. We have corrected this.

      3) Figure 2A: which of the mutant PP-IP levels are significantly different from the WT control?

      We have now added asterisks to indicate the significance for every mutant.

      4) In the discussion on the data (Fig. 2A), I was tripped up by the verb tense in this phrase "5-IP7 has not been detected in the kcs1Δ mutant and 1-IP7 has been strongly reduced..."; I think you want to use the past tense "was" in both cases [as is used in the next sentence]. It made me wonder if there was a difference in the detection of 5-IP7 and IP8 in the kcs1Δ mutant, you could detect 5-IP7 but not IP8; if so, where did the 5-IP7 come from?

      We have corrected the tense. Thank you for highlighting this. As for the residual inositol pyrophosphate signal in kcs1Δ, we do not know its origin. One possibility, which we now mention in the text, is that it stems from IPMK side activity. It should be underlined that all signals disappear upon Pi starvation.

      5) Figure 2C, include the data points that the lines are built from (suggestion).

      We refrained from that for the line graphs. For reasons of consistency, we should do this for every line graph. If we did that, Fig. 4B would become quite hard to read.

      6) Figure 3B-D, please check that the stipples or hatches are in the figure - the printed copy lacked them although I could see them in the electronic version; this was also true for Figures 5 and 6 (I do not know if it is a printer issue, but other hatches were visible: e.g., not seen in S4 but seen in S5).

      They are visible in our copies, also after printing. They may have been lost during file conversion at the journal.

      7) The text description of the Pho4-yEGFP, Pho5-yEGFP and Pho84-yEGFP says that the kcs1Δ mutant "showed Pho4-yEGFP constitutively in the nucleus already ... and PHO5 and PHO84 were activated". However, the data is more complex than that: whereas the localization of Pho4-yEGFP is constitutively nuclear, there is a higher basal (repressed) expression of both Pho5 and Pho84 as well as increased expression of both proteins under -Pi conditions. What accounts for the increased expression when Pho4 is already nuclear? This is also seen in the vip1Δ kcs1Δ mutant.

      We agree with the reviewer, but we cannot explain this effect with certainty. One possibility could be a wider dysregulation of Pi metabolism in kcs1 mutants. To name a few possibilities: Wildtype cells have polyphosphate reserves that are gradually mobilized during the first hours of P-starvation. kcs1 mutants don't have those and might fall into a "deeper" state of starvation faster. It should be kept in mind that the starvation response is also regulated at the level of chromatin structure, and by antisense transcripts. The influence of kcs1 on these processes is unclear.

      8) Figure 9 legend: please add a definition of the MP region (in red) and include it more explicitly in the described model.

      We now mention the relevant region also in the legend and have labeled the relevant regions in the images (Huang et al., 2001).

      9) Figure S2 legend: information is missing (downloading link).

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      10) Figure S4 and S5, missing statistics.

      They have been added to the new Fig. S6, which interprets differences between strains and conditions. Fig. S4 (now S3) shows timecourses of IPPs down to zero. Adding statistics for all pairwise differences between the timepoints would be excessive.

    2. eLife assessment

      This fundamental study describes the mechanisms for regulation of the phosphate starvation response in baker's yeast, clarifies the interpretations of prior data, and suggests a unifying mechanism across eukaryotes. The study provides compelling data, based on biochemical analyses, protein localization by fluorescence, and genetic approaches that 1,5-InsP8 is the phosphate nutrient messenger in yeast.

    3. Reviewer #1 (Public Review):

      Recent studies in plants and human cell lines argued for a role of 1,5-InsP8 as the central nutrient messenger in eukaryotic cells, but previous studies concluded that this function is performed by 1-InsP7 in baker's yeast. Chabert et al. now performed an elegant set of capillary electrophoresis coupled to mass spectrometry time course experiments to define the cellular concentrations of different inositol pyrophosphates (PP-InsPs) in wild-type yeast cells under normal and phosphate (Pi) starvation growth conditions. These experiments, in my opinion, form the center of the present study and clearly highlight that the levels of all major PP-InsPs drop under Pi starvation, with the 1,5-InsP8 isomer showing the most rapid changes.

      The analysis of known mutants in the PP-InsP biosynthetic pathways furthermore demonstrates that loss-of-function of the PPIP5K enzymes Kcs1 and Vip1 results in a loss of 1,5-InsP8 and a hyperaccumulation of 5-InsP7, respectively. In line with this, loss-of-function of the known PP-InsP phosphatases Ddp1 and Siw14 results in hyperaccumulation of either 1- or 5-InsP7, as anticipated from their in vitro substrate specificities. These experiments are of high technical quality and add to our understanding of the kinetics of PP-InsP metabolism/catabolism in yeast.

      Next, the authors use changes in subcellular localisation of the central transcription factor Pho4 to assay at which time point after onset of Pi starvation the PHO pathway becomes activated. The early onset of the response, the behavior of the kcs1Δ mutant and that of the kcs1Δ/vip1Δ double mutant all strongly argue for 1,5-InsP8 as the central nutrient messenger. I find this part of the manuscript well argued, nicely correlating PP-InsP levels, dynamics and the different mutant phenotypes.

      The third part of the manuscript is a structure-function study of the CDK inhibitor Pho81, basically using a reverse genetics approach. This analysis demonstrates at the genetic level that the Pho81 SPX domain is required for activation of the PHO pathway. Next, the authors design point mutations that should block either interaction of Pho81-SPX with 1,5-InsP8 or interaction of Pho81 with the Pho80/Pho85 complex. In my opinion, these data can only provide limited insight into the molecular mechanism, as no complementary in vitro binding assays / in vivo co-IP experiments with the wild-type and mutant forms of Pho81 are presented. This seems to be due to technical limitations in recombinantly expressing and purifying the respective Pho81 protein for in vitro PP-InsP binding and protein - protein interaction assays.

      Taken together, the work by Chabert et al. reinvestigates and clarifies the activation of the yeast PHO pathway by PP-InsP nutrient messengers and their cellular SPX receptors. From this work, a more unified eukaryotic mechanism emerges, in which 1,5-InsP8 represents the central signaling molecule in different species, with conserved SPX receptors sensing this signaling molecule.

    4. Reviewer #2 (Public Review):

      The manuscript by Chabert et al. describes a thorough and careful characterization of inositol pyrophosphate isomers and the PHO pathway in different genetic backgrounds in S. cerevisiae. The paper ultimately arrives at a proposed model in which the inositol pyrophosphate 1,5-IP8 signals phosphate abundance to SPX-domain containing proteins. To arrive at their conclusion, the authors rely heavily on CE-MS analysis of inositol pyrophosphates in different yeast strains, and monitoring inositol pyrophosphate depletion over time in response to phosphate starvation. This analysis is complemented by different reporter systems of PHO pathway activation, such as Pho4 translocation and Pho81 expression.

      The experiments are well-designed and the results interpreted with care. With their findings, the authors demonstrate convincingly that a previous study by O'Shea and co-workers (references 15 and 16) had been misleading. Lee et al. claimed that the PHO pathway in S. cerevisiae is triggered by an increase in 1-IP7. This claim has been debated heavily in the community, and several groups were not able to reproduce this putative increase of inositol pyrophosphates (references 6, 11, 18). The confusion regarding these discrepancies has been resolved by the current study and is of significant importance to the community.

    5. Reviewer #3 (Public Review):

      Summary. This study sought to clarify the connection between inositol pyrophosphates (IPPs) and their regulation of phosphate homeostasis in the yeast Saccharomyces cerevisiae to answer the question of whether any of the IPPs (1-IP7, 5-IP7, and IP8) or only particular IPPs are involved in regulation. IPPs bind to SPX domains in proteins to affect their activity, and there are several key proteins in the PHO pathway that have an SPX domain, including Pho81. The authors use the latest methodology, capillary electrophoresis and mass spectrometry (CE-MS), to examine the cytosolic concentrations of PP-IPs in wild-type and strains carrying mutations in the enzymes that metabolize these compounds in rich medium and during a phosphate starvation time-course for the wild-type.

      Major strengths and weaknesses. The authors have strong premises for performing these experiments: clarifying the regulatory molecule(s) in yeast and providing a unifying mechanism across eukaryotes. They use the latest methodologies and a variety of approaches including genetics, biochemistry, cell biology and protein structure to examine phosphate regulation. Their experiments are rigorous and well controlled, and the story is clearly told. The consideration of physiological levels of IPPs throughout the study was critical to interpretation of the data and a strength of the manuscript. The investigation of the structure of Pho81, its regulation by IPPs, and its interactions with Pho80 provide a vivid model for regulation.

      Appraisal. The authors achieved their goal of determining the mechanistic details for phosphate regulation, revising the prior model with new insights. Additionally, they provided strong support for the idea that IP8 regulates phosphate metabolism across eukaryotes - including animals and plants in addition to fungi.

      Impact. This study is likely to have broad impact because it addresses prior findings that are inconsistent with current understanding, and they provide good reasoning as to how older methods were inadequate.

    1. eLife assessment

      This is an important study that investigates the role of commensal microbes and molecules in the antigen presentation pathway in the development and phenotype of an unusual population of T lymphocytes. The authors provide convincing evidence to identify a population of unconventional T cells that exist in the small intestine epithelium, which appear to depend on commensal microbes, and show that a single commensal microbe (that encodes an antigen capable of weakly stimulating these cells) is sufficient to maintain the T cell population.

    2. Reviewer #1 (Public Review):

      Guan et al. explored the mechanisms responsible for the development, maintenance, and functional properties of a specific subset of unconventional T cells expressing a Va3.2 T cell receptor that recognizes a peptide, QFL, presented by the class Ib protein Qa-1. Prior studies from this group showed that cells from mice deficient in the ER protease ERAAP elicit responses in wild-type animals enriched for Qa-1-restricted CD8 T cells. They further showed that a significant proportion of these responses were directed against the QFL peptide derived from a conserved protein with incompletely understood functions. Many of these so-called QFL T cells expressed Va3.2-Ja21, were present in the spleen of wild-type mice, and exhibited a memory-like phenotype. Due to their relatively low frequency and weak staining with Qa-1 tetramers, analyzing QFL T cells has been challenging. Therefore, the authors generated dextramers, which permitted them to more rigorously identify these cells. They confirmed some of their previous findings and further showed that Va3.2+ and Va3.2- QFL T cells were present in the intestinal epithelium, where they also express CD8alpha homodimers, a characteristic of most small intestinal intraepithelial lymphocytes (siIELs), and most similar to the so-called natural siIELs that acquire their innate functions in the thymus. The authors show that TAP but not Qa-1 or ERAAP expression are required for the development of these cells, and both Qa-1 and ERAAP are required for the natural siIEL phenotype. Some of these findings were confirmed using a new TCR transgenic mouse expressing the QFL TCR. They further show that retention but not homing of QFL T cells to the intestinal epithelium involves commensal microorganisms, and using in silico approaches, they identify a commensal that contains a peptide similar to QFL that can activate QFL T cells. Finally, they show that this organism, P. pentosaceus, can promote gut retention of QFL T cells when it is introduced into germ-free mice. From these findings, the authors conclude that the microbiota influences the maintenance of Qa-1-restricted T cells.

      Comments:

      1. Overall, the authors employ a number of new reagents and elegant approaches to explore the development, maintenance, and functional properties of QFL T cells.

      2. Generally, conclusions made are well supported by the data presented.

      3. One limitation of the work is that the immunological functions of QFL T cells remain unclear.

      4. The work covers a lot of ground (intestinal IELs, unconventional T cells, innate/virtual memory T cells, Qa-1/HLA-E, etc) that may not be familiar to many readers.

      5. A few questions remain:

      a) Regarding the results for TAP knockout animals, since Qa-1 does not appear to be required for QFL T cell development, the absence of these cells in TAP KO mice cannot be easily explained.

      b) The Va3.2 T cells display similarities with previously identified innate/virtual memory T cells, some of which require IL-4 production by CD1d-restricted NKT cells for their intrathymic development, which is not fully discussed.

      c) Qa-1/peptide complexes may also be recognized by CD94/NKG2 (both inhibitory and activating) receptors on NK cells and subsets of CD8 T cells, which may complicate data interpretation, but is not noted in the text.

      d) Are these conclusions relevant to the human homolog of Qa-1, HLA-E?

    3. Reviewer #2 (Public Review):

      Summary:

      CD8+ QFL T cells recognize a peptide, FYAEATPML (FL9), presented on Erap1-deficient cells. QFL T cells are present at a high frequency in the spleen of naïve mice. They express an antigen-experienced phenotype, and about 80% express an invariant TCRα chain Vα3.2Jα21.

      Here, Guan and colleagues report that QFL T cells are present not only in the spleen but also in the intestinal epithelium, where they display several phenotypic and functional peculiarities. The establishment of spleen and gut Vα3.2+ QFL T cells is TAP-dependent, and their phenotype is regulated by the presence/absence of Qa-1b and Erap1. Maintenance of gut Vα3.2+ QFL T cells depends on the gut microbiota and is associated with colonization by Pediococcus pentosaceus.

      Strengths:

      This article contains in-depth studies of a peculiar and interesting subset of unconventional CD8 T cells, based partly on generating two novel TCR-transgenic models.

      The authors discovered a clear relation between the gut microbiome and the maintenance of gut QFL T cells. One notable observation is that monocolonization of the gut with Pediococcus pentosaceus is sufficient to sustain gut QFL T cells.

      Weaknesses:

      In the absence of immunopeptidomic analyses, the presence or absence of the FL9 peptide on various cell types is inferred based on indirect evidence.

      Analyses of the homology between the FL9 and bacterial peptides were limited to two amino acid residues (P4 and P6).

      The potential function of QFL T cells remains elusive.

    4. Reviewer #3 (Public Review):

      The authors investigate the role of commensal microbes and molecules in the antigen presentation pathway in the development and phenotype of CD8 T cells specific for the Qa-1b-restricted peptide FL9 (QFL). The studies track both endogenous QFL-specific T cells and utilize a recently generated TCR transgenic model. The authors confirm that QFL-specific T cells in the spleen and small intestine intraepithelial lymphocyte (IEL) pool show an antigen-experienced phenotype as well as unique phenotypic and innate-like functional traits, especially among CD8+ T cells expressing Va3.2+ TCRs. They find that deficiency in the TAP transporter leads to almost complete loss of QFL-specific T cells but that loss of either Qa1 or the ERAAP aminopeptidase does not impact QFL+ T cell numbers but does cause them to maintain a more conventional, naïve-like phenotype. In germ-free (GF) mice, the QFL-specific T cells are present at similar numbers and with a similar phenotype to SPF animals, but in older animals (>18w) there is a notable loss of IEL QFL-specific cells. This drop can be avoided by neonatal colonization of GF mice with the commensal microbe Pediococcus pentosaceus but not a different commensal, Lactobacillus johnsonii, and the authors show that P. pentosaceus encodes a peptide that weakly stimulates QFL-specific T cells, while the homologous peptide from L. johnsonii does not stimulate such cells.

      This study provides new insights into the way in which the differentiation, phenotype, and function of CD8+ T cells specific for Qa-1b/FL9 is regulated by peptide processing and Qa1 expression, and by interactions with the microbiota. The approaches are well designed, the data compelling, and the interpretation, for the most part, appropriate. There are a few relatively minor concerns.

      1) For most of the report, the authors use a set of phenotypic traits to highlight the unique features of QFL-specific CD8+ T cells - specifically, CD44high, CD8aa+ve, CD8ab-ve. In Supp. Fig. 4, however, completely distinct phenotypic characteristics are presented, indicating that IEL QFL-specific T cells are CD5low, Thy-1low. No explanation is provided in the text about whether this is a previously reported phenotype, whether any elements of this phenotype are shared with splenic QFL T cells, what significance the authors ascribe to this phenotype (and to the fact that Qa1-deficiency leads to a more conventional Thy-1+ve, CD5+ve phenotype), and whether this altered phenotype is also seen in ERAAP-deficient mice. At least some explanation for this abrupt shift in focus and integration with prior published work is needed. On a related note, CD5 expression is measured in splenic QFL-specific CD8+ T cells from GF vs SPF mice (Supp. Fig. 9), to indicate that there is no phenotypic impact in the GF mice - but from Supp. Fig. 4, it would seem more appropriate to report CD5 expression in QFL-specific cells from the IEL, not the spleen.

      2) The authors suggest the finding that QFL-specific cells from ERAAP-deficient mice have a more "conventional" phenotype indicates some form of negative selection of high-affinity clones (this result being somewhat unexpected since ERAAP loss was previously shown to increase the presentation of Qa-1b loaded with FL9, confirmed in this report). It is not clear how this argument aligns with the data presented, however, since the authors convincingly show no significant reduction in the number of QFL-specific cells in ERAAP-knockout mice (Fig. 3a), and their own data (e.g. Fig. 2a) do not suggest that CD44 expression correlates with QFL-multimer staining (as a surrogate for TCR affinity/avidity). Is there some experimental basis for suggesting that ERAAP-deficient mice lack a subset of high-affinity QFL-specific cells?

      3) The rationale for designing FL9 mutants, and for using these data to screen the proteomes of various commensal bacteria needs further explanation. The authors propose P4 and P6 of FL9 are likely to be "critical" but do not explain whether they predict these to be TCR or Qa-1b contact sites. Published data (e.g., PMID: 10974028) suggest that multiple residues contribute to Qa-1b binding, so while the authors find that P4A completely lost the ability to stimulate a QFL-specific hybridoma, it is unclear whether this is due to the loss of a TCR- or a Qa-1-contact site (or, possibly, both). This could easily be tested - e.g., by determining whether P4A can act as a competitive inhibitor for FL9-induced stimulation of BEko8Z (and, ideally, other Qa-1b-restricted cells, specific for distinct peptides). Without such information, it is unclear exactly what is being selected in the authors' screening strategy of commensal bacterial proteomes. This, of course, does not lessen the importance of finding the peptide from P. pentosaceus that can (albeit weakly) stimulate QFL-specific cells, and the finding that association with this microbe can sustain IEL QFL cells.

    1. eLife assessment

      This important study combines the analysis of a human mutation with the generation of a mouse model lacking IQCH to examine the functional consequences for spermatogenesis. The mouse model is compelling but some of the analysis is indirect and incomplete and would benefit from more rigorous direct approaches. With the experimental evidence that supports direct interaction between IQCH and potential RNA binding proteins strengthened, this paper would be of interest to cell biologists and male reproductive biologists working on the sperm flagellar cytoskeleton and mitochondrial structure.

    2. Reviewer #1 (Public Review):

      By identifying a loss of function mutant of IQCH in an infertile patient, Ruan et al. show that IQCH is essential for spermiogenesis by generating a knockout mouse model of IQCH. Similar to infertile patients with a mutant of IQCH, IQCH knockout mice are characterized by a cracked flagellar axoneme and abnormal mitochondrial structure. Mechanistically, IQCH regulates the expression of RNA-binding proteins (especially HNRPAB), which are indispensable for spermatogenesis.

      Although this manuscript contains a potentially interesting piece of work that delineates a mechanism of IQCH that associates with spermatogenesis, this reviewer feels that a number of issues require clarification and re-evaluation for a better understanding of the role of IQCH in spermatogenesis. With the shortage of logic and supporting data, causal relationships are still not clear among IQCH, CaM, and HNRPAB. The most serious point in this manuscript could be that the authors try to generalize their interpretations with a model that is too simplified from limited pieces of their data. The way the data and the logic are presented needs to be largely revised, and several interpretations should be supported by direct evidence.

    3. Reviewer #2 (Public Review):

      The manuscript "IQCH regulates spermatogenesis by interacting with CaM to promote RNA-binding proteins' expression" by Ruan et al. identified a homozygous variant affecting the splicing of IQCH in two infertile men from a Chinese family. The authors also generated an IQCH knockout mouse model to confirm the abnormal sperm phenotypes associated with IQCH deficiency. Further molecular biological assays supported the important role and mechanism of IQCH in spermatogenesis. This manuscript is informative for clinical and basic research on male infertility.

    4. Reviewer #3 (Public Review):

      In this study, Ruan et al. investigate the role of the IQCH gene in spermatogenesis, focusing on its interaction with calmodulin and its regulation of RNA-binding proteins. The authors examined sperm from a male infertility patient with an inherited IQCH mutation as well as IQCH CRISPR knockout mice. The authors found that both human and mouse sperm exhibited structural and morphogenetic defects in multiple structures, leading to reduced fertility in IQCH-knockout male mice. Molecular analyses such as mass spectrometry and immunoprecipitation indicated that RNA-binding proteins are likely targets of IQCH, with the authors focusing on the RNA-binding protein HNRPAB as a critical regulator of testicular mRNAs. The authors used in vitro cell culture models to demonstrate an interaction between IQCH and calmodulin, in addition to showing that this interaction via the IQ motif of IQCH is required for IQCH's function in promoting HNRPAB expression. In sum, the authors concluded that IQCH promotes male fertility by binding to calmodulin and controlling HNRPAB expression to regulate the expression of essential mRNAs for spermatogenesis. These findings provide new insight into molecular mechanisms underlying spermatogenesis and how important factors for sperm morphogenesis and function are regulated.

      The strengths of the study include the use of mouse and human samples, which demonstrate a likely relevance of the mouse model to humans; the use of multiple biochemical techniques to address the molecular mechanisms involved; the development of a new CRISPR mouse model; ample controls; and clearly displayed results. There are some minor weaknesses in that more background details could be provided to the reader regarding the proteins involved; some assays could benefit from more rigorous quantification; some of the mouse testis images and analyses could be improved; and sample sizes, especially for the male mouse breeding tests, could be increased. Overall, the claims made by the authors in this manuscript are well-supported by the data provided and there are only minor technical issues that could increase the robustness and rigor of the study.

      1. More background details are needed regarding the proteins involved, in particular IQ proteins and calmodulin. The authors state that IQ proteins are not well-represented in the literature, but do not state how many IQ proteins are encoded in the genome. They also do not provide specifics regarding which calmodulins are involved, since there are at least 5 family members in mice and humans. This information could help provide more granular details about the mechanism to the reader and help place the findings in context.

      2. The mouse fertility tests could be improved with more depth and rigor. There was no data regarding copulatory plug rate; data was unclear regarding how many WT females were used for the male breeding tests and how many litters were generated; the general methodology used for the breeding tests in the Methods section was not very explicitly or clearly described; the sample size of n=3 for the male breeding tests is rather small for that type of assay; and, given that IQCH appears to be expressed in testicular interstitial cells (Fig. S10) and somewhat in other organs (Fig. S2), another important parameter of male fertility that should be addressed is reproductive hormone levels (e.g., LH, FSH, and testosterone).

      3. The Western blots in Figure 6 should be rigorously quantified from multiple independent experiments so that there is stronger evidence supporting claims based on those assays.

      4. Some of the mouse testis images could be improved. For example, the PNA and PLCz images in Figure S7 are difficult to interpret in that the tubules do not appear to be stage-matched, and since the authors claimed that testicular histology is unaffected in knockout testes, it should be feasible to stage-match control and knockout samples. Also, the anti-IQCH and CaM immunofluorescence in Figure S10 would benefit from some cell-type-specific co-stains to more rigorously define their expression patterns, and they should also be stage-matched.

    1. eLife assessment

      This important study advances the understanding of physiological mechanisms in deep-sea Planctomycetes bacteria, revealing unique characteristics such as being the only known Phycisphaerae using a budding mode of division, extensive involvement in nitrate assimilation, and the release of phage particles without cell death. The study uses convincing evidence, based on experiments using growth assays, phylogenetics, transcriptomics, and gene expression data. The work will be of interest to bacteriologists and microbiologists in general.

    2. Reviewer #1 (Public Review):

      The authors of the manuscript cultivated a Planctomycetes strain affiliated with Phycisphaerae. The strain was one of the few Planctomycetes from deep-sea environments and demonstrated several unique characteristics, such as being the only known Phycisphaerae using a budding mode of division, extensive involvement in nitrate assimilation, and being able to release phage particles without cell death. The manuscript is generally well-written. However, a few issues need to be more clearly addressed, especially regarding the identification and characterization of the phage.

    3. Reviewer #2 (Public Review):

      Summary:

      Planctomycetes encompass a group of bacteria with unique biological traits, the compartmentalized cells make them appear to be organisms in between prokaryotes and eukaryotes. However, only a few of the Planctomycetes bacteria are cultured thus far, and this hampers insight into the biological traits of these evolutionarily important organisms.

      This work reports the methodological details of how to isolate deep-sea bacteria that could be recalcitrant to laboratory cultivation, and further reveals the distinct characteristics of a new species of deep-sea Planctomycetes bacterium, such as chronic phage release without breaking the host, which promotes nitrogen utilization by the host and related bacteria. Therefore, the findings of this work are of importance in extending our knowledge of bacteria.

      Strengths:

      Through the combination of microscopic, physiological, genomic, and molecular biological approaches, this work reports the isolation and comprehensive investigation of the first anaerobic representative of the deep-sea Planctomycetes bacteria, in particular its budding division and its release of phage without lysis of the cells. Most of the results and conclusions are supported by the experimental evidence.

      Weaknesses:

      1. While EMP glycolysis is predicted to be involved in energy conservation, no experimental evidence indicated any sugar utilization by the bacterium.
      2. "Anaerobic representative" is indicated in the title; on the contrary, TCA in energy metabolism is predicted for the bacterium.
      3. The possible mechanisms of the chronic phage release without breaking the host are not discussed.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      It is very important to find practical and efficient means in order to increase agricultural productivity. Drawing on data from variable field environments, this study provides a useful theoretical framework to identify new factors that could increase agricultural production. There is solid evidence to support the authors' claims, though following the fate of candidate species after introduction into rice fields would have strengthened the study. Plant biologists and ecologists working in nature and fields will find the work interesting.

      Thank you so much for your careful evaluation of our manuscript. We are very pleased to hear that you found our framework useful. We have revised our manuscript according to the "Recommendations for the Authors" to improve our manuscript.

      Public Review

      Reviewer #1 (Public Review):

      This manuscript describes the identification of organisms influential for rice growth and an attempt at validation. The analysis of eDNA in rice pots and a mimic field identifies rice growth-promoting organisms. This approach is novel for the plant ecology field. However, the current results do not fully support the eDNA analysis-based detection of influencing organisms.

      Thank you so much for evaluating our manuscript. We have carefully read and responded to your comments. We hope our responses resolve your concerns on our study.

      The strength of this manuscript is its attempt to apply eDNA analysis-based plant growth differentiation. The weakness is that the data and experimental set-up are too preliminary to support any conclusion. The authors' experimental trials are ideal. However, the data analysis did not reach a sufficient level. For example, eDNA analysis at different time points across rice growth stages yielded two influential organisms for rice growth. The authors then cultivated the two species and applied them to rice seedlings. Without an understanding of fitness and robustness, how can we know the effect of the two species on rice growth?

      We agree with your comments that we did not have the fitness data of the two species and/or rice seedlings. Thus, it is still difficult to obtain deep understanding of the mechanisms of our findings that the species introduced in the system would influence rice growth. Nonetheless, our study demonstrated the effectiveness of our research framework as we found evidence that the species that were discovered by the eDNA monitoring and time series analysis indeed cause changes in the system. We believe that the first step is to show that the framework is workable and that detailed understanding of the mechanisms or genetic pathway was not a focus of our study. To avoid misunderstanding, we have added several explanations regarding this point in L426–431 and L447. For example, in L426, we have added the following statement: "... the detailed dynamics of the two introduced species was unclear (i.e., the fate of the introduced species). This is particularly important for understanding how the introduced organisms affected rice performance...".

      The authors did not check the fate of two species after introducing into rice. If this is true, it is difficult to link between the rice gene expression after treatments and the effectiveness of two species. I think the validation experiment in 2019 needs to be re-conducted.

      We did not check the fate of the two species (except measuring the eDNA concentrations of the species), and it is true that we cannot show evidence of "how" these two species influence the rice gene expression. Understanding molecular mechanisms of the phenomenon that we found is important (especially from the viewpoint of molecular biology), but our primary objective was to demonstrate that our "eDNA x time series analysis" framework is feasible for detecting previously overlooked but influential organisms. To this end, we believe that we achieved our objective and repeating the validation experiment should be for a different purpose (i.e., for understanding molecular mechanisms). We have clarified these points in L426–431 and L447 as explained above.

      Reviewer #2 (Public Review):

      The manuscript "Detecting and validating influential organisms for rice growth: An ecological network approach" explores the influence of often-neglected biotic and abiotic entities on rice growth. The study has a straightforward experimental design and well-thought-out hypotheses for exploration. Monitoring data are collected to infer relationships between species and the environment empirically, and are analyzed with an up-to-date statistical method. This allowed the manuscript to hypothesize and test the effects of the most influential entities in a controlled experiment.

      Thank you so much for your careful evaluations. We are pleased to see that you evaluated our manuscript positively. We have further revised our manuscript according to your comments and hope the revision has resolved your concerns.

      The manuscript is interesting and sets up a nice framework for future studies. In general, the manuscript can be improved significantly if the steps of the workflow are smoothly connected and it is communicated how they follow one another, beyond the sequence and dates provided. It is valuable philosophical thinking, and the research community can benefit from this framework.

      Thank you for your suggestions. In order to improve the logic flow and readability of our manuscript, we have revised the descriptions of workflow and clarified how the experimental and statistical steps were connected to each other. To do so, we have added brief explanations about what/how we did at the first sentence of Results subsections (some of these explanations were only in Materials and Methods in the original manuscript). Also, we have moved all of the Supplementary Materials and Methods to the main text. We have thoroughly revised the manuscript, and we hope that all the parts of our manuscript have been connected more smoothly than in the original manuscript.

      I understand the length and format of the manuscript make it difficult to add more details, but I am sure it can refer to/clear some concepts/methods that might be new for the audience. How/why variables are selected as important parts of the system, a tiny bit of information about the nonlinear time series analysis in the early manuscript, and the biological reasoning behind these statistically driven decisions are some examples.

      We have explained how/why variables are selected (in L125), added more information about the nonlinear time series analysis (in L129 and L175), and added the biological reasoning behind the statistical decisions (L195).

      Reviewer #3 (Public Review):

      Most farming is done by subtracting or adding what people want based in nature. However, in nature, crops interact with various objects, and mostly we are unaware of their effects. In order to increase agricultural productivity, finding useful objects is very important. However, in an uncontrolled environment, it coexists with so many biological objects that it is very inefficient to verify them all experimentally. It is therefore necessary to develop an effective screening method to identify external environmental factors that can increase crop productivity. This study identified factors presumed to be important to crop growth based on metabarcoding analysis, field sampling, and non-linear analysis/information theory, and conducted a mesocosm experiment to verify them experimentally. In conclusion, the object proposed by the author did not increase rice yield, but rather rice growth rate.

      Thank you so much for your evaluation of our manuscript. We have revised our manuscript based on your comments, and hope it has been improved compared with the original version.

      Strength

      In actual field data, since many variables are involved in a specific phenomenon, it is necessary to effectively eliminate false positives. Based on the metabarcoding technique, various variables that may affect rice growth were quantitatively measured, although not perfectly, and the causal relationship between these variables and rice growth was analyzed by using information transfer analysis. Using this method, two new players capable of manipulating rice growth were verified, despite their unknown functions until now. I found this process to be very logical, and I think it will be valuable in subsequent ecological studies.

      We are very pleased to see that you found our framework is very logical and potentially beneficial for future ecological studies.

      Weaknesses

      CK treatment's effectiveness remains questionable. Rice's growth was clearly altered by CK treatment. The validation of the CK treatment itself is not clear compared to the GN treatment, and the transcriptome data analysis results do not show that DEG is not present. The possibility of a side effect caused by a variable that the author cannot control remains a possibility in this case. Even though this part is mentioned in Discussion, it is necessary to discuss various possibilities in more detail.

      We agree that the effectiveness of the CK treatment was questionable. We have added some more discussion about this point in L376: "The unclear effects of the CK treatment relative to those of the GN treatment could be due to the relatively unstable removal method (i.e., C. kiiensis larvae were manually removed by a hand net) or incomplete removal of the larvae (some larvae might have remained after the removal treatment)."

      Reviewer #1 (Recommendations For The Authors):

      Comment #1-1 This manuscript describes the identification of organisms influential for rice growth and an attempt at validation. The analysis of eDNA in rice pots and a mimic field identifies rice growth-promoting organisms. This approach is novel for the plant ecology field. However, the current results do not fully support the eDNA analysis-based detection of influencing organisms.

      Thank you for your careful evaluations of our manuscript. We are pleased to see you found that our approach is novel. We have revised our manuscript in accordance with your comments, and we hope that the revision and responses resolved your concerns.

      Comment #1-2 1. Experimental setting: The authors set up a small-scale pot system in 2017 and then expanded it into a manipulative experiment. I do not understand how two influencing organism sequences were identified from a single treatment depending on different time points. How can they be convinced that the two organisms, rather than other biological and environmental factors, affect rice growth?

      In 2017, we performed intensive monitoring of the experimental rice plots and obtained large time series data (122-day consecutive monitoring x 5 plots = 610 data points). The time series data were analyzed using the information-theoretic causal analysis. The analysis is critically different from correlational analyses and is designed to identify causal relationships among variables. Although we understand that field manipulation experiments are a common and straightforward approach to identify causal relationships among organisms, we chose the "field-monitoring + time-series-based causal analysis" approach. This is because, as explained in the main text, there are numerous factors that could influence rice performance, and it is practically impossible to perform manipulative experiments for all the potential factors that could influence rice growth. On the other hand, our "field-monitoring + time-series-based causal analysis" approach has the potential to identify multiple factors under field conditions, even with a single experimental treatment.

      Nonetheless, we must admit that our time-series-based approach still has a chance to misidentify causal factors. Our framework relies on statistics, so the chance of false-positive detection of causality cannot be zero. This was exactly the reason why we performed the "validation" experiment in 2019. To complement the statistical results of the 2017 experiments, we performed another experiment in 2019.

      Comment #1-3 2. eDNA technology: The eDNA analysis based on four universal primers 16s rRNA, 18s rRNA, ITS, and COI regions must not be enough to identify specific species. The resolution of species classification may not meet to confirm exact species. Thus, the accuracy of two species that they selected for further experiment is difficult to be confirmed. Authors also referred to "putative Globisporangium".

      Your point is correct. The DNA barcoding regions we selected are short and it is often difficult to identify species. However, this limitation could not have been overcome even if we had chosen a different genetic marker. The long-read sequencing technology could partially solve the issue, but the number of sequence reads generated by the long-read technique is less than that by the short-read sequencing technology, and comprehensive detection of all species in an ecological community was still challenging. Our approach struck a balance among the identification resolution, comprehensiveness of the analysis, and sequencing costs. In addition, even though we could not identify most ASVs at the species level, some ASVs could be identified at the species level (52 ASVs among the 718 ASVs which had causal influences on rice growth), and we selected the two species (G. nunn and C. kiiensis) from the 52 species.

      Further, the taxonomic assignment algorithm we used here (i.e., Claident; Tanabe & Toju 2012 PLoS ONE 10.1371/journal.pone.0076910) adopts conservative criteria for species identification and has a low false-positive probability.

      More importantly, this is also the reason why we performed the "validation" experiment in 2019. The species identified in the 2017 experiment are still "potential" organisms that influence rice growth (i.e., the hypothesis-generating phase), and we tested the hypothesis in 2019.

      Nonetheless, we must admit that clear description of potential limitations is important. Thus, we have discussed this in L418: "As for the second issue, short-read sequencing has dominated current eDNA studies, but it is often not sufficient for lower-level taxonomic identification. Using long-read sequencing techniques (e.g., Oxford Nanopore MinION) for eDNA studies is a promising approach to overcome the second issue".

      Comment #1-4 3. Biological relevance 1: Authors identify two organisms as influencing organism for rice growth. As conducting the first experiment in 2017, the 2019 experiment was different from natural condition. The two experiments in 2017 and 2019 were conducted under different conditions. How do they compare the experiments? At least, the eDNA analyses in 2017 and 2019 should be very similar. I cannot find such data.

      The experimental conditions were different between 2017 and 2019 because they were conducted in different years. Theoretically, it is ideal if the experimental conditions in 2019 are covered by the range of experimental conditions in 2017 (e.g., rice variety, air temperature, rainfall, and solar radiation). If this condition were satisfied, the attractor (i.e., rice growth trajectory delineated in the state space) in 2019 would be within that in 2017, and our model prediction in 2017 could be used to predict dynamics in 2019 accurately. To fulfill the conditions, we made as much effort as possible: we used the same rice variety and soils in 2019 as those used in 2017, and started our experiment at the same timing in 2019 as that in 2017.

      Although natural ecological dynamics cannot be precisely controlled, our monitoring revealed that the ecological dynamics in 2019 was qualitatively similar to that in 2017. To demonstrate that the experimental conditions and eDNA community data were similar between the two experiments, we have presented the climate and eDNA data in an inset figure in Figure 3a, Figure 1–figure supplement 2, Figure 3–figure supplement 2. We must admit that these dynamics are not identical, but we hope that this resolves your concern.

      Comment #1-5 4. Lack of detailed description: In the Materials and Methods, many parts lack a detailed description. For instance, the authors must describe the cultivation of the two species, the application concentrations, and the application methods.

      We have moved Supplementary Materials and Methods to the main text and added more detailed descriptions in Materials and Methods. Also, to improve the logical flow and readability of our manuscript, we have added brief explanations about what/how we did at the first sentence of Results subsections (some of these explanations were only in Materials and Methods in the original manuscript). We have added the reference for how to cultivate G. nunn in L608 (Kobayashi et al., 2010; Tojo et al., 1993) (C. kiiensis was not cultivated but removed from the system as in Materials and Methods), and application concentrations. Application methods were described in Materials and Methods, the section Field manipulation experiments in 2019 in L596.

      Comment #1-6 5. Validation: Application of one species clearly promoted rice growth. The authors must include an appropriate control treatment. If they picked a species of the same genus for which eDNA analysis identified no specific effect on rice growth, no effect on growth should be observed. Generally, application of a large population of a non-harmful organism confers plant growth promotion, so this is not a surprising result. The authors need to prove the effectiveness of the eDNA analysis. In addition, the field experiments require at least two years of consistent data for publication because environmental factors are so dynamic.

      Thank you for pointing this out. We agree with your comment that species that were predicted to have no effect should not promote rice growth in a validation experiment. It was also one of our initial experimental plans to include such species in our manipulation experiment in 2019, but we could not include them because of the limitation of time, labor, and money. More extensive validation of the statistical results of the 2017 data, including multi-year experiments, would further validate the effectiveness of our approach, which should be done in future studies. To clarify this point, we have added statements in the paragraph starting at L396.

      Comment #1-7 In conclusion, I suggest that the authors need a larger-scale data analysis and validation with a more accurate and meaningful protocol.

      As we explained in the revised manuscript and the Response to Comments #1-2 to #1-7, our study demonstrated a novel research framework to detect previously overlooked influential organisms under field conditions. We agree that larger data analysis would be ideal to further validate our approach, but whether and how to collect larger data is constrained by time, money, and labor. We believe that our study was designed carefully and could provide meaningful avenues for developing novel, ecological-network-based, and environmentally friendly agricultural solutions.

      Reviewer #2 (Recommendations For The Authors):

      Comment #2-1 Lines 97-110: This is so cool. Modeling with empirical data is very powerful. But a rice field is an open system consisting of metacommunity dynamics. Maybe a tiny bit of biological and biogeochemical background here would be good.

      Thank you for your comments. We have added a few examples of how and in which systems these methods were used to evaluate community dynamics and detect biological interactions in L109-L118.

      Comment #2-2 Lines 111-126: I like the summary of the study here. I think the influential species concept can be a little more elevated. Paine's famous keystone species work has been cited but a couple more pieces of literature can help to enhance the ecological importance of this work.

      We have explained the work by Paine (1966) a bit more and added one more paper that showed the effect of multiple predator species on the system dynamics at L88. We have also added a relevant sentence at L137 to emphasize the ecological/agricultural significance of our work.

      Comment #2-3 Experimental design/Figure 1:

      Is there any rationale behind choosing red individuals to measure the growth?

      Is there any competition between the individuals in the pots?

      Figure 1e: It is nice to show the ASVs in time. I wonder how the plot would look like when normalized by biomass/DNA content/coverage/rarefaction because of the seasonality.

      As for the first question, we chose the four individuals to minimize the edge effects (i.e., effects of microclimates and neighboring rice would be different between the four rice individuals and those planted in the edge regions). We have mentioned this in the legend of Figure 1.

      As for the second question, there might be competition among the individuals in the pot. However, we did not measure the effect of competition (e.g., by comparing the growth with/without other rice individuals).

      As for the third question, we published detailed dynamics of ecological community in the Supplementary Figures in Ushio (2022) Proceedings B https://doi.org/10.6084/m9.figshare.c.5842766.v1. In addition, we have uploaded a video showing the temporal dynamics of some top (= most abundant) ASVs in https://doi.org/10.6084/m9.figshare.23514150.v2.

      We have mentioned the supporting information in L153.

      Comment #2-4 Line 146-147: Does this damage influence the inferences? Maybe it is better to justify.

      While we occasionally observed physical damages, it is unlikely that they affected our causal inference because the changes in the rice heights due to the damages were smaller and less frequent than those due to growth. We have noted this at L151.

      Comment #2-5 Line 161-162: Maybe refer readers to the methods section where you explain UIC analysis. It'd be easier to interpret the figures.

      Mentioned.

      Comment #2-6 Line 175-176: I believe very brief information in the intro about the organisms might help explain the hypothesis and interpret the results better.

      We have included brief information of the two species at L197.

      Comment #2-7 Figure 2: Species interaction strength: Are these proxies to the Jacobians? Is there a threshold for the influence we can consider strong/weak? For example, influential species compared to diagonal elements of the Jacobians (intraspecies interactions) could be shown as a mean vertical line in Figure 2b.

      "Influences to rice growth" in Figure 2b is transfer entropy (TE) from a target ASV to rice growth. They are not proxies of the Jacobians, but they might positively correlate with the absolute value of the Jacobians. We have clarified this point in the legend (L953). More direct estimations of the Jacobian can be done using the MDR S-map method (Chang et al. 2021 DOI:10.1111/ele.13897), but we did not perform the MDR S-map in the present manuscript (see Ushio et al. 2023 https://doi.org/10.7554/eLife.85795 for the application of the MDR S-map). As for TE, there is no clear threshold to distinguish strong/weak interactions.
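
      For readers unfamiliar with transfer entropy, the quantity can be illustrated with a minimal plug-in estimator for discrete sequences at a lag of one step. This is only a sketch under simplifying assumptions and is not the UIC analysis used in the manuscript; the function name `transfer_entropy` and the toy series `x` and `y` are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy from source to target.

    Assumes discrete-valued sequences of equal length and a lag of one step.
    """
    # Each triple is (target at t+1, target at t, source at t).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_next_now = Counter((t_next, t_now) for t_next, t_now, _ in triples)
    c_now_src = Counter((t_now, s_now) for _, t_now, s_now in triples)
    c_now = Counter(t_now for _, t_now, _ in triples)

    te = 0.0
    for (t_next, t_now, s_now), count in c_full.items():
        p_joint = count / n
        # p(target_next | target_now, source_now)
        p_with_source = count / c_now_src[(t_now, s_now)]
        # p(target_next | target_now)
        p_without_source = c_next_now[(t_next, t_now)] / c_now[t_now]
        te += p_joint * math.log2(p_with_source / p_without_source)
    return te


# Toy data (hypothetical): y copies x with a one-step delay, so x should
# carry information about the future of y, but not the other way around.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]

te_x_to_y = transfer_entropy(x, y)  # expected to be large (near 1 bit)
te_y_to_x = transfer_entropy(y, x)  # expected to be near 0
```

      In this toy case the asymmetry between the two directions is what carries the interpretation; as noted above, there is no clear threshold that separates strong from weak influence.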

      Comment #2-8 Figure 2: Looking at panels c and d, it looks like there is a negative frequency selection between two influential species. Is it a reasonable observation?

      This is an interesting point. In this manuscript, we have not carefully examined the interspecific relationship between these two particular species. However, the interspecific interactions were examined in detail and reported in Ushio (2022) Proceedings of the Royal Society B (DOI:10.1098/rspb.2021.2690). We re-checked the result in Ushio (2022); although there is a negative correlation between them, we did not find any (statistical) causal relationship between them.

      Comment #2-9 Line 209: What is t-SNE analysis? Because of the manuscript's format, maybe methods should be shortly referred to in the relevant section or explained in brackets.

      We have spelled out t-SNE.

      Comment #2-10 Line 212-214: Maybe briefly explain what the hypotheses are for the alternative analysis, and what is the contribution of the results to the study.

      We have added a brief explanation at L241: "Alternative statistical modeling that included the treatments (the control versus GN or CK treatments) and manipulation timing (i.e., before or after the manipulation), which simultaneously took the temporal changes of all the treatments into account, also showed qualitatively similar results (Supplementary file 4), further supporting the results."

      Comment #2-11 Figure 3b/c: Maybe species names as panel titles could be helpful. d: Treatment names with initials in the legend could be also helpful to read the plots.

      We have added species name as panel titles of Figure 3b,c. Treatment names were included in the legend of Figure 3.

      Comment #2-12 Line 233: Maybe mention why the manuscript uses the word "clear".

      We have mentioned this in L185.

      Comment #2-13 Line 234-236: I think that these alternative tests should be explained somewhere.

      We have revised the sentence so that it includes some explanations (L241). Also, we have referred to Materials and Methods.

      Comment #2-14 Figure 4: The title says ecological community compositions, and panels show the growth rates and cumulative growth.

      Thank you for pointing this out. This was a typo and we have corrected it.

      Comment #2-15 Lines 246-269: Can these expression patterns be transient and relevant to the time point that the sample is taken?

      Yes, these expression patterns were transient. We collected rice leaf samples for RNA-seq 1 day before the first manipulation and 1, 14, and 38 days after the third manipulation (see Supplementary file 3 for the sampling design). When we merged the pot locations, we observed no difference in the gene expression for samples 1 day before the first manipulation and 14 and 38 days after the third manipulation (except for two genes in samples 38 days after the manipulation), and thus, we consider the DEGs that appeared only in the short period after the manipulation. We have mentioned this in L278 and L383: "We found almost no DEGs for leaf samples taken one day before and 14 and 38 days after the third manipulation (the leaf sampling event 1, 3, and 4), suggesting that the influences of the treatments on the gene expression patterns were transient." (L278) and "These changes were observed relatively quickly and transient." (L383)

      Comment #2-16 I wonder if a conceptual framework figure would help to generalize the workflow that can be used for other studies.

      Thank you for your suggestion. Although we agree with your comment that such a figure would be helpful to generalize the workflow, we believe that our framework is clear and decided not to include it in the present manuscript. We might consider including such a figure (like Figure 1a in Ushio 2022) if we have an opportunity to write a review paper regarding this topic.

      Comment #2-17 Lines 329-335: I feel this information is unclear in the early manuscript. Maybe it's necessary to clearly communicate in the beginning.

      We have explained that we could not find any relevant information at least at the time we detected the ASVs in L189.

      Comment #2-18 Lines 336-337: Can these species be identified in the previous data set from the ASV sequences?

      Yes, these species were identified in the DNA data set obtained in 2017.

      Comment #2-19 Lines 387-397: Are there any measurements such as total biomass, and statistical methods to help with the eDNA bias and data compositionality?

      We have confirmed that our quantitative eDNA metabarcoding generates comparable results with the fluorescence-based method and quantitative PCR (e.g., see Supplementary Figures in Ushio 2022) (mentioned in L310 in the revised manuscript). However, at least in this study, we could not perform a direct comparison of the eDNA data with species abundance and/or biomass. This is partly because the number of our target species was too large (> 1,000 species). The accurate estimation of species abundance and/or biomass is one of our next goals.

      Comment #2-20 Line 472: Maybe mention transfer entropy somewhere in the early manuscript.

      We have mentioned this in L175.

      Comment #2-21 Lines 494-503: Maybe a summary of this reasoning should be mentioned somewhere in the early manuscript too.

      We have described a brief summary of the reasoning in L195.

      Comment #2-22 Lines 29-33 If this sentence is simplified it might be easier to follow.

      The sentence has been divided into two sentences in L28. Also, each sentence has been simplified.

      Comment #2-23 Line 38 Maybe "macrobes" can be explicitly mentioned. Fungi, protozoa, etc.

      Mentioned.

      Comment #2-24 Line 139: I am not sure if the date should be in the title.

      Similar monitoring was done in 2017 and 2019. Thus, we think the date is necessary in the section title.

      Comment #2-25 Figure 1: There are 4 red individuals in the design but 5 measurements in the plots.

      Heights and SPAD of the four individuals were measured for each plot and the averaged values were used as representative values for each plot. Therefore, 20 measurements (= 4 rice individuals x 5 plots) were done every day, but each plot has one rice height for each day. We have clarified this in the legend of Figure 1: "the average values of the four individuals were regarded as representative values for each plot."
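
      The per-plot averaging described here can be sketched as follows. The height values and the names `heights_cm` and `plot_means` are hypothetical and serve only to show how 20 daily measurements reduce to five plot-level values.

```python
from statistics import mean

# Hypothetical heights (cm) of the four monitored individuals in each of
# the five plots on a single day; these are not data from the study.
heights_cm = {
    "plot1": [30.1, 29.8, 31.0, 30.5],
    "plot2": [28.9, 29.5, 30.2, 29.0],
    "plot3": [31.2, 30.8, 30.0, 31.6],
    "plot4": [29.7, 30.3, 29.9, 30.1],
    "plot5": [30.6, 31.1, 30.4, 29.9],
}

# Total number of individual measurements taken that day (4 x 5 = 20).
n_measurements = sum(len(values) for values in heights_cm.values())

# One representative value per plot: the mean of its four individuals.
plot_means = {plot: mean(values) for plot, values in heights_cm.items()}
```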

      Comment #2-26 Figure 1b: Maybe use the same axis length for the temperature as the other plots?

      Corrected.

      Comment #2-27 Lines 259-261: Are there the names of the genes in databases?

      Yes, these are gene names used in the rice databases (e.g., The Rice Annotation Project Database; https://rapdb.dna.affrc.go.jp/index.html).

      Reviewer #3 (Recommendations For The Authors):

      Comment #3-1 Additionally, RGR is not statistically significant, and statistical significance is observed only in cumulative growth, because the data presentation does not reflect plant characteristics. RGR changes according to the developmental stage of the plant. Therefore, if RGR data are shown separately according to the rice growing season, the RGR pattern and the cumulative growth pattern will appear similar.

      RGRs were calculated daily (i.e., cm/day) and they changed depending on the developmental stage of the rice (Figure 1 and Figure 4–figure supplement 1). Therefore, we might find similar RGR patterns if we focus on a specific period of the growing season. However, unfortunately, we performed the intensive (i.e., daily) monitoring in 2019 only during the field manipulation period (mid-June to mid-July 2019), and we cannot investigate the changes in cumulative growth throughout the growing season (this depends on how many days we add up RGR to calculate the cumulative growth, though). We agree that, if we had investigated the detailed pattern of RGR throughout the growing season in 2019, we could have found similar patterns between RGR and cumulative growth rate at a certain period in the growing season. In Figure 4, the cumulative growths were calculated based on the RGRs before the third manipulation or during 10 days after the third manipulation. We clarified this in the legend of Figure 4.
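
      The relationship between a daily growth rate and cumulative growth over a window can be sketched as below. This assumes, for illustration only, that the daily rate is the day-to-day difference in height (cm/day) and that cumulative growth is the sum of daily rates over the chosen days; the function names and height values are hypothetical.

```python
def daily_growth_rate(heights):
    """Day-to-day height differences (cm/day) from a daily height series."""
    return [later - earlier for earlier, later in zip(heights, heights[1:])]


def cumulative_growth(rates, start, n_days):
    """Sum of daily rates over n_days, beginning at index start."""
    return sum(rates[start:start + n_days])


# Hypothetical daily plot-level heights (cm); not data from the study.
heights = [40.0, 41.5, 43.0, 43.5, 45.5, 47.0]
rates = daily_growth_rate(heights)

# Cumulative growth over a three-day window starting at the second interval.
window_growth = cumulative_growth(rates, start=1, n_days=3)
```

      Because the daily rates telescope, the cumulative growth over a window equals the height difference between the window's end and start, which is why its value depends on how many days are added up.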

    2. Reviewer #1 (Public Review):

      This manuscript describes the identification of organisms that influence rice growth and an attempt at validation. The analysis of eDNA from rice pots and a mimic field identifies rice growth-promoting organisms. This approach is novel for the plant ecology field. However, the current results do not fully support eDNA analysis-based detection of influential organisms.

      The strength of this manuscript is its attempt to apply eDNA analysis-based plant growth differentiation. The weakness is that the data and experimental set-up are too preliminary to support any conclusion. The authors' experimental trials are ideal. However, the data analysis did not reach the required level. For example, eDNA analysis at different time points across rice growth stages identified two organisms influential for rice growth. The authors then cultivated the two species and applied them to rice seedlings. Without an understanding of their fitness and robustness, how can we know the effect of the two species on rice growth?

      The authors did not check the fate of two species after introducing into rice. If this is true, it is difficult to link between the rice gene expression after treatments and the effectiveness of two species. I think the validation experiment in 2019 needs to be re-conducted.

      As the authors answered, no strong rationale for selecting the two species was found. However, I insist that the method has enough novelty to present to general audiences.

    3. Reviewer #2 (Public Review):

      Most farming is done by subtracting or adding what people want based in nature. However, in nature, crops interact with various objects, and mostly we are unaware of their effects. In order to increase agricultural productivity, finding useful objects is very important. However, in an uncontrolled environment, it coexists with so many biological objects that it is very inefficient to verify them all experimentally. It is therefore necessary to develop an effective screening method to identify external environmental factors that can increase crop productivity. This study identified factors presumed to be important to crop growth based on metabarcoding analysis, field sampling, and non-linear analysis/information theory, and conducted a mesocosm experiment to verify them experimentally. In conclusion, the object proposed by the author did not increase rice yield, but rather rice growth rate.

      The authors responded to my general concerns and all of my specific comments. The manuscript has significantly improved. The flow of aims and approaches is more understandable. The extra supplementary material, especially the visual material, is useful.

      I agree with the other reviewers that the study needs more data and evidence. However, this study aims to introduce ecological concepts and advanced statistical methods to the field. Also, most time series analyses require absolute abundance data, but the manuscript provides solutions for the sequencing data.

    1. eLife assessment

      This is a valuable paper demonstrating the validity of a novel task that could advance the field of reinforcement learning to better incorporate threat processing in approach-avoidance-conflict. A compelling methodology includes the use of online samples and computational modelling, psychometrics, discovery/replication and pre-registration. This work provides a foundation for future work, which is required to address potential confounds and establish this task as relevant to psychopathology and treatment.

    2. Reviewer #1 (Public Review):

      This paper describes the development and initial validation of an approach-avoidance task and its relationship to anxiety. The task is a two-armed bandit where one choice is 'safer' - has no probability of punishment, delivered as an aversive sound, but also lower probability of reward - and the other choice involves a reward-punishment conflict. The authors fit a computational model of reinforcement learning to this task and found that self-reported state anxiety during the task was related to a greater likelihood of choosing the safe stimulus when the other (conflict) stimulus had a higher likelihood of punishment. Computationally, this was represented by a smaller value for the ratio of reward to punishment sensitivity in people with higher task-induced anxiety. They replicated this finding, but not another finding that this behavior was related to a measure of psychopathology (experiential avoidance), in a second sample. They also tested test-retest reliability in a sub-sample tested twice, one week apart and found that some aspects of task behavior had acceptable levels of reliability. The introduction makes a strong appeal to back-translation and computational validity. The task design is clever and most methods are solid - it is encouraging to see attempts to validate tasks as they are developed. The lack of replicated effects with psychopathology may mean that this task is better suited to assess state anxiety, or to serve as a foundation for additional task development.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors develop a computational approach-avoidance-conflict (AAC) task, designed to overcome the limitations of existing offer based AAC tasks. The task incorporated likelihoods of receiving rewards/ punishments that would be learned by the participants to ensure computational validity and estimated model parameters related to reward/punishment and task induced anxiety. Two independent samples of online participants were tested. In both samples participants who experienced greater task induced anxiety avoided choices associated with greater probability of punishment. Computational modelling revealed that this effect was explained by greater individual sensitivities to punishment relative to rewards.

      Strengths:

      Large internet-based samples, with discovery sample (n = 369), pre-registered replication sample (n = 629) and test-retest sub group (n = 57). Extensive compliance measures (e.g. audio checks) seek to improve adherence.

      There is a great need for RL tasks that model threatening outcomes rather than simply loss of reward. The main model parameters show strong effects and the additional indices with task based anxiety are a useful extension. Associations were broadly replicated across samples. Fair to excellent reliability of model parameters is encouraging and badly needed for behavioral tasks of threat sensitivity.

      The task seems to have lower approach bias than some other AAC tasks in the literature.

      Weaknesses:

      The negative reliability of punishment learning rate is concerning as this is an important outcome.

      The Kendall's tau values underlying task induced anxiety and safety reference/ various indices are very weak (all < 0.1), as are the mediation effects (all beta < 0.01). The interaction with P(punishment|conflict) does explain some of this.

      The inclusion of only one level of reward (and punishment) limits the ecological validity of the sensitivity indices.

      Appraisal and impact:

      Overall this is a very strong paper, describing a novel task that could help move the field of RL forward to take account of threat processing more fully. The large sample size with discovery, replication and test-retest gives confidence in the findings. The task has good ecological validity and associations with task-based anxiety and clinical self-report demonstrate clinical relevance. Test-retest of the punishment learning parameter is the only real concern. Overall this task provides an exciting new probe of reward/threat that could be used in mechanistic disease models.

      Additional context:

      The sex differences between the samples are interesting as effects of sex are commonly found in AAC tasks. It would be interesting to look at the main model comparison with sex included as a covariate.

    4. Reviewer #3 (Public Review):

      This study investigated cognitive mechanisms underlying approach-avoidance behavior using a novel reinforcement learning task and computational modelling. Participants could select a risky "conflict" option (latent, fluctuating probabilities of monetary reward and/or unpleasant sound [punishment]) or a safe option (separate, generally lower probability of reward). Overall, participant choices were skewed towards more rewarded options, but were also repelled by increasing probability of punishment. Individual patterns of behavior were well-captured by a reinforcement learning model that included parameters for reward and punishment sensitivity, and learning rates for reward and punishment. This is a nice replication of existing findings suggesting reward and punishment have opposing effects on behavior through dissociated sensitivity to reward versus punishment.

      Interestingly, avoidance of the conflict option was predicted by self-reported task-induced anxiety. Importantly, when a subset of participants were retested over 1 week later, most behavioral tendencies and model parameters were recapitulated, suggesting the task may capture stable traits relevant to approach-avoidance decision-making.

      The revised paper commendably adds important additional information and analyses to support these claims. The initial concern that not accounting for participant control over punisher intensity confounded interpretation of effects has been largely addressed in follow-up analyses and discussion.

      This study complements and sits within a broad translational literature investigating interactions between reward/punishers and psychological processes in approach-avoidance decisions.

    5. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper describes the development and initial validation of an approach-avoidance task and its relationship to anxiety. The task is a two-armed bandit where one choice is 'safer' - has no probability of punishment, delivered as an aversive sound, but also lower probability of reward - and the other choice involves a reward-punishment conflict. The authors fit a computational model of reinforcement learning to this task and found that self-reported state anxiety during the task was related to a greater likelihood of choosing the safe stimulus when the other (conflict) stimulus had a higher likelihood of punishment. Computationally, this was represented by a smaller value for the ratio of reward to punishment sensitivity in people with higher task-induced anxiety. They replicated this finding, but not another finding that this behavior was related to a measure of psychopathology (experiential avoidance), in a second sample. They also tested test-retest reliability in a sub-sample tested twice, one week apart and found that some aspects of task behavior had acceptable levels of reliability. The introduction makes a strong appeal to back-translation and computational validity, but many aspects of the rationale for this task need to be strengthened or better explained. The task design is clever and most methods are solid - it is encouraging to see attempts to validate tasks as they are developed. There are a few methodological questions and interpretation issues, but they do not affect the overall findings. The lack of replicated effects with psychopathology may mean that this task is better suited to assess state anxiety, or to serve as a foundation for additional task development.

      We thank the reviewer for their kind comments and constructive feedback. We agree that the approach taken in this paper appears better suited to state anxiety, and further work is needed to assess/improve its clinical relevance.

      Reviewer #1 (Recommendations For The Authors):

      1) For the introduction, the authors communicate well the appeal of tasks with translational potential, and setting up this translation through computational validity is a strong approach. However, I had some concerns about how the task was motivated in the introduction:

      a) The authors state that current approach-avoidance tasks used in humans do not resemble those used in the non-human literature, but do not provide details on what exactly is missing from these tasks that makes translation difficult.

      Our intention for the section that the reviewer refers to was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we note that the phrasing was perhaps unfair to recent tasks that were explicitly designed to be translatable across species. Therefore, we have amended the text to the following:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases, for example by using joysticks to approach/move towards positive stimuli and avoid/move away from negative stimuli, which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).

      b) Although back-translation to 'match' human paradigms to non-animal paradigms is useful for research, this isn't the end goal of task development. What really matters is how well these tasks, whether in humans or not, capture psychopathology-relevant behavior. Many animal paradigms were developed and brought into extensive use because they showed sensitivity to pharmacological compounds (e.g., benzodiazepines). The introduction accepts the validity of these paradigms at face value, and doesn't address whether developing human tests of psychopathology based on sensitivity to existing medication classes is the best way to generate new insights about psychopathology.

      We agree that whilst paradigms with translational and computational validity have merits of their own for neuroscientific theory, clinical validity (i.e. how well the paradigm reflects a phenomenon relevant to psychopathology) is key in the context of clinical applications. While our findings of associations between task performance and self-reported (state) anxiety suggest that our approach is a step in the right direction, the lack of associations with clinical measures was disappointing. Although future work is needed to more directly test the sensitivity of the current approach to psychopathology, this may mean that it, and its non-human counterparts, do not measure behaviours relevant to pathological anxiety. Since our primary focus in this paper was on translational and computational validity, we have opted to discuss the reviewer’s suggestion in the ‘Discussion’ section, as follows:

      Further, it is worth noting that many animal paradigms were developed and widely adopted due to their sensitivity to anxiolytic medication (Cryan & Holmes, 2005). Given the lack of associations with clinical measures in our results, it is possible that current translational models of anxiety may not fully capture behaviours that are directly relevant to pathological anxiety. To develop translational paradigms of clinical utility, future research should place a stronger emphasis on assessing their clinical validity in humans.

      c) The authors may want to bring in the literature on the description-experience gap (e.g., PMID: 19836292) when discussing existing decision tasks and their computational dissimilarity to non-human operant conditioning tasks.

      We thank the reviewer for this useful addition to the introduction. We have now added the following to the 'Introduction’ section:

      Moreover, evidence from economic decision-making suggests that explicit offers of probabilistic outcomes can impact decision-making differently compared to when probabilistic contingencies need to be learned from experience (referred to as the ‘description-experience gap’; Hertwig & Erev, 2009); this finding raises potential concerns regarding the use of offer-based tasks in humans as approximations of non-human tasks that do not involve explicit offers.

      d) How does one evaluate how computationally similar human vs. non-human tasks are? What are the criteria for making this judgement? Specific to the current tasks, many animal learning tasks are not learning tasks in the same sense that human learning tasks are, in terms of the number of trials used and if the animals are choosing from a learned set of contingencies versus learning the contingencies during the testing.

      The computational similarity of human and non-human strategies in a given translational task can be tested empirically. This can be done by fitting models to the data and assessing whether similar models explain choices, even if parameter distributions might vary across species due to, for example, physiological differences. Indeed, non-human animals require much more training to perform even uni-dimensional reinforcement learning, but once they are trained, it should be possible to model their responses. In fact, it should even be possible to take training data into account in some cases. For example, the training phase of the Vogel/Geller-Seifter preclinical tests requires an animal to learn to emit a certain action (e.g. lever press) simply to obtain some reward. In the next phase, an aversive outcome is introduced as an additional outcome, but one could model both the training and test phase together – the winning model in our studies would be a suitable candidate to model behaviour here. As we also discuss predictive validity in the ‘Discussion’ section, we opted to add the following text there too:

      … computational validity would also need to be assessed directly in non-human animals by fitting models to their behavioural data. This should be possible even in the face of different procedures across species such as number of trials or outcomes used (shock or aversive sound). We are encouraged by our finding that the winning computational model in our study relies on a relatively simple classical reinforcement learning strategy. There exist many studies showing that non-human animals rely on similar strategies during reward and punishment learning (Mobbs et al., 2020; Schultz, 2013); albeit to our knowledge this has never been modelled in non-human animals where rewards and punishment can occur simultaneously.
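
      To make the idea of a "relatively simple classical reinforcement learning strategy" concrete, the following is a minimal sketch of an agent with separate learning rates and sensitivities for reward and punishment. It is an illustration only and should not be read as the authors' winning model: the function names, the delta-rule update, the logistic choice rule, the starting estimates of 0.5 and the fixed outcome probabilities are all assumptions made here (in the task itself the probabilities drift over trials).

```python
import math
import random


def choice_prob_conflict(q_conflict, q_safe):
    """Logistic (two-option softmax) probability of choosing the conflict option."""
    return 1.0 / (1.0 + math.exp(-(q_conflict - q_safe)))


def simulate_agent(p_reward_conflict, p_punish_conflict, p_reward_safe,
                   alpha_r, alpha_p, beta_r, beta_p, n_trials=200, seed=0):
    """Return the proportion of conflict choices made by a hypothetical agent.

    alpha_r / alpha_p: learning rates for reward / punishment (assumed names).
    beta_r / beta_p: sensitivities to reward / punishment (assumed names).
    Outcome probabilities are held fixed here purely for simplicity.
    """
    rng = random.Random(seed)
    reward_est = {"conflict": 0.5, "safe": 0.5}  # learned reward probabilities
    punish_est = 0.5  # learned punishment probability (conflict option only)
    n_conflict = 0

    for _ in range(n_trials):
        # Punishment enters the value of the conflict option with a negative sign.
        q_conflict = beta_r * reward_est["conflict"] - beta_p * punish_est
        q_safe = beta_r * reward_est["safe"]

        if rng.random() < choice_prob_conflict(q_conflict, q_safe):
            n_conflict += 1
            reward = 1.0 if rng.random() < p_reward_conflict else 0.0
            punish = 1.0 if rng.random() < p_punish_conflict else 0.0
            # Delta-rule updates, one per outcome type.
            reward_est["conflict"] += alpha_r * (reward - reward_est["conflict"])
            punish_est += alpha_p * (punish - punish_est)
        else:
            reward = 1.0 if rng.random() < p_reward_safe else 0.0
            reward_est["safe"] += alpha_r * (reward - reward_est["safe"])

    return n_conflict / n_trials


if __name__ == "__main__":
    for beta_p in (0.0, 10.0):
        prop = simulate_agent(0.7, 0.8, 0.3, alpha_r=0.2, alpha_p=0.2,
                              beta_r=3.0, beta_p=beta_p, n_trials=500, seed=1)
        print(f"beta_p={beta_p}: proportion of conflict choices = {prop:.2f}")
```

      Under these assumptions, raising the punishment sensitivity relative to the reward sensitivity lowers the proportion of conflict choices, which is the qualitative pattern the responses describe for participants with higher task-induced anxiety. The same structure could in principle be fitted to non-human choice data, although that step is not shown here.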

      2) What do the authors make of the non-linear relationship between probability of punishment and probability of choosing the conflict stimulus (Fig 2d), especially in the high task-induced anxiety participants? Did this effect show up in the replication sample as well?

      Figures 2c-e were created by binning the continuous predictors of outcome probabilities into discrete bins of equal interval. Since punishment probability varied according to Gaussian random walks, it was also distributed with more of its mass in the central region (~ 0.4), and so values at the extreme bins were estimated on fewer data and with greater variance. The non-linear relationships are likely thus an artefact of our task design and plotting procedure. The pattern was also evident in the replication sample, see Author response image 1:

      Author response image 1.

      However, since these effects were estimated as linear effects in the logistic regression models, and to avoid overfitting/interpretations of noise arising from our task design, we now plot logistic curves fitted to the raw data instead.
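
      The binning artefact described above can be illustrated with a small simulation. This is a hedged sketch rather than a reconstruction of the task: the starting value of 0.4, the step standard deviation of 0.02, the clipping to [0, 1], the number of walks and the five equal-width bins are all assumed values chosen for illustration, since the actual random-walk parameters are not given in this response.

```python
import random


def simulate_walk(n_trials, start=0.4, step_sd=0.02, rng=None):
    """Gaussian random walk for a probability, clipped to [0, 1] (assumed scheme)."""
    rng = rng or random.Random()
    p = start
    walk = []
    for _ in range(n_trials):
        walk.append(p)
        p = min(1.0, max(0.0, p + rng.gauss(0.0, step_sd)))
    return walk


def bin_counts(values, n_bins=5):
    """Count values falling in n_bins equal-width bins spanning [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # put v == 1.0 in the last bin
        counts[idx] += 1
    return counts


# Pool trial-wise punishment probabilities over many simulated participants.
rng = random.Random(1)
values = [p for _ in range(200) for p in simulate_walk(200, rng=rng)]
counts = bin_counts(values)

for i, c in enumerate(counts):
    print(f"bin {i} [{i / 5:.1f}, {(i + 1) / 5:.1f}): {c} trials")
```

      With walks that start near the centre and take small steps, the outer bins receive far fewer trials than the central ones, so any choice proportion computed within an outer bin rests on less data and is noisier. Fitting a single curve to the raw, unbinned data sidesteps this.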

      3) How correlated were learning rate and sensitivity parameters? The EM algorithm used here can sometimes result in high correlations among these sets of parameters.

      As the reviewer suspects, the parameters were strongly correlated, especially the punishment-specific parameters. The Pearson’s r estimates for the untransformed parameter values were as follows:

      Reward parameters: discovery sample r = -0.39; replication sample r = -0.78

      Punishment parameters: discovery sample r = -0.91; replication sample r = -0.85

      We have included the correlation matrices of the estimated parameters as Supplementary Figure 2 in the ‘Computational modelling’ section of the Supplement.

      We have now also re-fitted the winning model using variational Bayesian inference (VBI) via Stan, and found that the cross-parameter correlations were much lower than when the data were fitted using EM. We also ran a sensitivity analysis assessing whether using VBI changed the main findings of our studies. This showed that the correlation between task-induced anxiety and the reward-punishment sensitivity index was robust to fitting method, as was the mediating effect of reward-punishment sensitivity index on anxiety’s effect on choice. This indicates that overall our key findings are robust to different methods of parameter-fitting.

      We now direct readers to these analyses from the new ‘Sensitivity analyses’ section in the manuscript, as follows:

      As our procedure for estimating model parameters (the expectation-maximisation algorithm, see ‘Methods’) produced high inter-parameter correlations in our data (Supplementary Figure 2), we also re-estimated the parameters using Stan’s variational Bayesian inference algorithm (Stan Development Team, 2023) – this resulted in lower inter-parameter correlations, but our primary computational finding, that the effect of anxiety on choice is mediated by relative sensitivity to reward/punishment was consistent across algorithms (see Supplement section 9.8 for details).

      We have included the relevant analyses comparing EM and VBI in the Supplement, as follows:

      [9.8 Sensitivity analysis: estimating parameters via expectation maximisation and variational Bayesian inference algorithms]

      Given that the expectation maximisation (EM) algorithm produced high inter-parameter correlations, we ran a sensitivity analysis by assessing the robustness of our computational findings to an alternative method of parameter estimation – (mean-field) variational Bayesian inference (VBI) via Stan (Stan Development Team, 2023). Since, unlike EM, the results of VBI are very sensitive to initial values, we fitted the data 10 times with different initial values.

      Inter-parameter correlations

      The VBI produced lower inter-parameter correlations than the EM algorithm (Supplementary Figure 8).

      Sensitivity analysis

      Since multicollinearity in the VBI-estimated parameters was lower than for EM, indicating less trade-off in the estimation, we re-tested our computational findings from the manuscript as part of a sensitivity analysis. We first assessed whether we observed the same correlations between task-induced anxiety and punishment learning, and reward-punishment sensitivity index (Supplementary Figure 9a). Punishment learning rate was not significantly associated with task-induced anxiety in any of the 10 VBI iterations in the discovery sample, although it was in 9/10 in the replication sample. On the other hand, the reward-punishment sensitivity index was significantly associated with task-induced anxiety in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. This suggests that the correlation of anxiety and sensitivity index is robust to these two fitting approaches.

      We also re-estimated the mediation models, where in the EM-estimated parameters, we found that the reward-punishment sensitivity index mediated the relationship between task-induced anxiety and task choice proportions (Supplementary Figure 9b). Again, we found that the reward-punishment sensitivity index was a significant mediator in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. Punishment learning rate was also a significant mediator in 9/10 iterations in the replication sample, although it was not in the discovery sample for all iterations, and this was not observed for the EM-estimated parameters.

      Overall, we found that our key results, that anxiety is associated with greater sensitivity to punishment over reward, and this mediates the relationship between anxiety and approach-avoidance behaviour, were robust across both fitting methods.

      As an aside, we were unable to run the model fitting using Markov chain Monte Carlo sampling approaches due to the computational power and time required for a sample of this size (Pike & Robinson, 2022, JAMA Psychiatry).

      4) What is the split-half reliability of the task parameters?

      We thank the reviewer for this query. We have now included a brief section on the (good-to-excellent) split-half reliability of the task in the manuscript:

      We assessed the split-half reliability of the task by correlating the overall proportion of conflict option choices and model parameters from the winning model across the first and second half of trials. For overall choice proportion, reliability was simply calculated via Pearson’s correlations. For the model parameters, we calculated model-derived estimates of Pearson’s r values from the parameter covariance matrix when first- and second-half parameters were estimated within a single model, following a previous approach recently shown to accurately estimate parameter reliability (Waltmann et al., 2022). We interpreted indices of reliability based on conventional values of < 0.40 as poor, 0.4 - 0.6 as fair, 0.6 - 0.75 as good, and > 0.75 as excellent reliability (Fleiss, 1986). Overall choice proportion showed good reliability (discovery sample r = 0.63; replication sample r = 0.63; Supplementary Figure 5). The model parameters showed good-to-excellent reliability (model-derived r values ranging from 0.61 to 0.85 [0.76 to 0.92 after Spearman-Brown correction]; Supplementary Figure 5).
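
      The Spearman-Brown step mentioned above is simple arithmetic and can be checked directly. The sketch below applies the standard prophecy formula, r_full = k·r / (1 + (k − 1)·r) with k = 2 for split halves, to the two end points of the reported range, and maps values onto the Fleiss (1986) labels quoted in the text. The function names are hypothetical, and assigning the boundary values 0.40 and 0.60 to the higher category is an assumption, since the quoted ranges overlap at their edges.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Predicted reliability when the test is lengthened by length_factor."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def reliability_label(r):
    """Label a reliability value using the conventional cut-offs quoted above."""
    if r < 0.40:
        return "poor"
    if r < 0.60:
        return "fair"
    if r <= 0.75:
        return "good"
    return "excellent"


if __name__ == "__main__":
    for r in (0.61, 0.85):
        corrected = spearman_brown(r)
        print(f"r = {r:.2f} ({reliability_label(r)}) -> "
              f"corrected = {corrected:.2f} ({reliability_label(corrected)})")
```

      Applied to 0.61 and 0.85, the formula gives 0.76 and 0.92 after rounding, matching the corrected range reported in the manuscript text.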

      5) The authors do a good job of avoiding causal language when setting up the cross-sectional mediation analysis, but depart from this in the discussion (line 335). Without longitudinal data, they cannot claim that "mediation analyses revealed a mechanism of how anxiety induces avoidance".

      Thank you for spotting this, we have now amended the text to:

      … mediation analyses suggested a potential mechanism of how anxiety may induce avoidance.

      Reviewer #2 (Public Review):

      Summary:

      The authors develop a computational approach-avoidance-conflict (AAC) task, designed to overcome limitations of existing offer based AAC tasks. The task incorporated likelihoods of receiving rewards/ punishments that would be learned by the participants to ensure computational validity and estimated model parameters related to reward/punishment and task induced anxiety. Two independent samples of online participants were tested. In both samples participants who experienced greater task induced anxiety avoided choices associated with greater probability of punishment. Computational modelling revealed that this effect was explained by greater individual sensitivities to punishment relative to rewards.

      Strengths:

      Large internet-based samples, with discovery sample (n = 369), pre-registered replication sample (n = 629) and test-retest sub group (n = 57). Extensive compliance measures (e.g. audio checks) seek to improve adherence.

      There is a great need for RL tasks that model threatening outcomes rather than simply loss of reward. The main model parameters show strong effects and the additional indices with task based anxiety are a useful extension. Associations were broadly replicated across samples. Fair to excellent reliability of model parameters is encouraging and badly needed for behavioral tasks of threat sensitivity.

      We thank the reviewer for their comments and constructive feedback.

      The task seems to have lower approach bias than some other AAC tasks in the literature. Although this was inferred by looking at Fig 2 (it doesn't seem to drop below 46%) and Fig 3d seems to show quite a strong approach bias when using a reward/punishment sensitivity index. It would be good to confirm some overall stats on % of trials approached/avoided overall.

      The range of choice proportions is indeed an interesting statistic that we have now included in the manuscript:

      Across individuals, there was considerable variability in overall choice proportions (discovery sample: mean = 0.52, SD = 0.14, min/max = [0.03, 0.96]; replication sample: mean = 0.52, SD = 0.14, min/max = [0.01, 0.99]).

      Weaknesses:

      The negative reliability of punishment learning rate is concerning as this is an important outcome.

      We agree that this is a concerning finding. As reviewer 3 notes, this may have been due to participants having control over the volume used to play the aversive sounds in the task (see below for our response to this point). Future work with better controlled experimental settings will be needed to determine the reliability of this parameter more accurately.

      This may also have been due to the asymmetric nature of the task, as only one option could produce the punishment. This means that there were fewer trials on which to estimate learning about the occurrence of a punishment. Future work using continuous outcomes, as the reviewer suggests below, whilst keeping the asymmetric relationship between the options, could help in this regard.

      We have included the following comment on this issue in the manuscript:

      Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed punishment sensitivity). Further, the asymmetric nature of the task may have impacted our ability to estimate the punishment learning rate, as there were fewer occurrences of the punishment compared to the reward.

      The Kendall's tau values underlying task induced anxiety and safety reference/ various indices are very weak (all < 0.1), as are the mediation effects (all beta < 0.01). This should be highlighted as a limitation, although the interaction with P(punishment|conflict) does explain some of this.

      We now include references to the effect sizes to emphasise this limitation. We also note, as the reviewer suggests, that this may be due to crudeness of overall choice proportion as a measure of approach/avoidance, as it is contaminated with variables such as P(punishment|conflict).

      One potentially important limitation of our findings is the small effect size observed in the correlation between task-induced anxiety and avoidance (Kendall's tau values < 0.1, mediation betas < 0.01). This may be attributed to the simplicity of using overall choice proportion as a measure of approach/avoidance, as the effect of anxiety on choice was also influenced by punishment probability.
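
      For readers less familiar with the statistic, Kendall's tau is a rank correlation built from counts of concordant and discordant pairs, which is why values below 0.1 indicate that the ordering of participants on one measure says little about their ordering on the other. The following is a minimal sketch of the tie-corrected variant (tau-b); it is not the authors' analysis code, the function name is hypothetical, and whether the manuscript used tau-b or another variant is not stated here.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pair counting)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


if __name__ == "__main__":
    anxiety = [1, 2, 3, 4, 5, 6]            # made-up ratings, for illustration only
    avoidance = [0.4, 0.5, 0.3, 0.6, 0.5, 0.7]  # made-up choice proportions
    print(round(kendall_tau_b(anxiety, avoidance), 3))
```

      The pairwise loop is quadratic in the number of participants, which is acceptable for illustration; a statistics library would normally be used for samples of the size reported here.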

      The inclusion of only one level of reward (and punishment) limits the ecological validity of the sensitivity indices.

      We agree that using multi-level outcomes will be an important question for future work and now explicitly note this in the manuscript, as below:

      Using multi-level or continuous outcomes would also improve the ecological validity of the present approach and interpretation of the sensitivity parameters.

      Appraisal and impact:

      Overall this is a very strong paper, describing a novel task that could help move the field of RL forward to take account of threat processing more fully. The large sample size with discovery, replication and test-retest gives confidence in the findings. The task has good ecological validity and associations with task-based anxiety and clinical self-report demonstrate clinical relevance. The authors could give further context but test-retest of the punishment learning parameter is the only real concern. Overall this task provides an exciting new probe of reward/threat that could be used in mechanistic disease models.

      We thank the reviewer again for helping us to improve our analyses and manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Additional context:

      In the introduction "cognitive tasks that bear little semblance to those used in the non-human literature" seems a little unfair. One study that is already cited (Ironside et al, 2020) used a task that was adapted from non-human primates for use in humans. It has almost identical visual stimuli (different levels of simultaneous reward and aversive outcome/punishment) and response selection processes (joystick) between species and some overlapping brain regions were activated across species for conflict and aversiveness. The later point that non-human animals must be trained on the association between action and outcome is well taken from the point of view of computational validity but perhaps not sufficient to justify the previous statement.

      Our intention for this section was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we agree that this phrasing is unfair to recent studies such as those by Ironside and colleagues. Therefore, we have amended the text to the following:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases to approach/move towards positive stimuli and avoid/move away from negative stimuli which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).

      It would be good to speculate on why task induced anxiety made participants slower to update their estimates of punishment probability.

      Although a meta-analysis of reinforcement learning studies using reward and punishment outcomes suggests a positive association between punishment learning rate and anxiety symptoms (and depressed mood), we paradoxically found the opposite effect. However, previous work has suggested that distinct forms of anxiety associate differently with punishment learning rate (Wise & Dolan, 2020, Nat. Commun.), where somatic anxiety was negatively correlated with punishment learning rate whereas cognitive anxiety showed the opposite effect. We have now added the following to the manuscript, and noted that future work is needed to understand the potentially complex relationship between anxiety and learning from punishments:

      Notably, although a recent computational meta-analysis of reinforcement learning studies showed that symptoms of anxiety and depression are associated with elevated punishment learning rates (Pike & Robinson, 2022), we did not observe this pattern in our data. Indeed, we even found the contrary effect in relation to task-induced anxiety, specifically that anxiety was associated with lower rates of learning from punishment. However, other work has suggested that the direction of this effect can depend on the form of anxiety, where cognitive anxiety may be associated with elevated learning rates, but somatic anxiety may show the opposite pattern (Wise & Dolan, 2020) and this may explain the discrepancy in findings. Additionally, parameter values are highly dependent on task design (Eckstein et al., 2022), and study designs to date may be more optimised in detecting differences in learning rate (Pike & Robinson, 2022) – future work is needed to better understand the potentially complex association between anxiety and punishment learning rate. Lastly, as punishment learning rate was severely unreliable in the test-retest analyses, and the associations between punishment learning rate and state anxiety were not robust to an alternative method of parameter estimation (variational Bayesian inference), the negative correlation observed in our study should be treated with caution.

      Were those with more task-based anxiety more inflexible in general?

      The lack of associations across reward learning rate and task-induced anxiety suggest that this was not a general inflexibility effect. To test the reviewer’s hypothesis more directly, we conducted a sensitivity analysis by examining the model with a general learning rate – this did not support a general inflexibility effect. Please see the new section in the Supplement below:

      [9.10 Sensitivity analysis: anxiety and inflexibility]

      As anxious participants were slower to update their estimates of punishment probability, we determined whether this was due to greater general inflexibility by examining the model including two sensitivity parameters, but one general learning rate (i.e. not split by outcome). The correlation between this general learning rate and task-induced anxiety was not significant in either sample (discovery: tau = -0.02, p = 0.504; replication: tau = -0.01, p = 0.625), suggesting that the effect is specific to punishment.

      Was the 16% versus 20% of the two samples with clinically relevant anxiety symptoms significantly different? What about other demographics in the two samples?

      The proportions were not significantly different (χ2 = 2.33, p = 0.127). The discovery sample included more females and was older on average compared to the replication sample – information which we now report in the manuscript:

      The discovery sample consisted of a significantly greater proportion of female participants than the replication sample (59% vs 52%, χ2 = 4.64, p = 0.031). The average age was significantly different across samples (discovery sample mean = 37.7, SD = 10.3, replication sample mean = 34.3, SD = 10.4; t785.5 = 5.06, p < 0.001). The differences in self-reported psychiatric symptoms across samples did not reach significance (p > 0.086).
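      The proportion comparisons reported above are chi-square statistics for two groups. As a minimal sketch of how such a comparison can be computed, the function below runs a Pearson chi-square test on a 2×2 table without continuity correction. Whether the authors applied a continuity correction is not stated, so that choice is an assumption, and the counts in the usage line are hypothetical and are not the study's data. The function name `chi_square_2x2` is illustrative only.

```python
import math


def chi_square_2x2(hits_a, n_a, hits_b, n_b):
    """Pearson chi-square test (1 df, no continuity correction) for two proportions.

    hits_a / n_a and hits_b / n_b are the counts behind the two proportions.
    Returns (chi2, p_value).
    """
    a, b = hits_a, n_a - hits_a          # group A: with / without the attribute
    c, d = hits_b, n_b - hits_b          # group B: with / without the attribute
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 20 of 100 in one sample versus 30 of 100 in the other.
chi2, p_value = chi_square_2x2(20, 100, 30, 100)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

      Using the closed form for one degree of freedom keeps the sketch free of third-party dependencies; a statistics library would normally be used in practice.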

      It would be interesting to know how many participants failed the audio attention checks.

      We have now included information about what proportion of participants failed each of the task exclusion criteria in the manuscript:

      Firstly, we excluded participants who missed a response to more than one auditory attention check (see above; 8% in both discovery and replication samples) – as these occurred infrequently and the stimuli used for the checks were played at relatively low volume, we allowed for incorrect responses so long as a response was made. Secondly, we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4%/6% in discovery and replication samples, respectively). Lastly, we excluded those who did not respond on 20 or more trials (1%/2% in discovery and replication samples, respectively). Overall, we excluded 51 out of 423 (12%) in the discovery sample, and 98 out of 725 (14%) in the replication sample.
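      The three exclusion rules described above can be sketched as a simple filter. This is an illustration under stated assumptions rather than the authors' code: the function names, the representation of a missed trial as `None`, and the choice to let a missed trial break a run of identical key presses are all hypothetical. The thresholds (more than one missed attention check, 20 or more consecutive identical keys, 20 or more missed trials) are taken from the text.

```python
from typing import Optional, Sequence


def longest_same_key_run(keys: Sequence[Optional[str]]) -> int:
    """Length of the longest run of identical consecutive key presses.

    A missed trial (None) is assumed to break the run.
    """
    longest = current = 0
    previous = None
    for key in keys:
        if key is None:
            current = 0
        elif key == previous:
            current += 1
        else:
            current = 1
        previous = key
        longest = max(longest, current)
    return longest


def should_exclude(keys: Sequence[Optional[str]], missed_attention_checks: int) -> bool:
    """Apply the three exclusion rules to one participant's session."""
    n_missed_trials = sum(key is None for key in keys)
    return (
        missed_attention_checks > 1          # missed more than one attention check
        or longest_same_key_run(keys) >= 20  # same response key on 20+ consecutive trials
        or n_missed_trials >= 20             # no response on 20+ trials
    )


# Hypothetical session: alternating keys, so no rule is triggered.
alternating = ["left", "right"] * 50
print(should_exclude(alternating, missed_attention_checks=0))
```

      Note that the run is counted over response keys, not over chosen options, so a participant who keeps choosing the same option while it switches sides on the screen is not flagged.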

      There doesn't appear to be a model with only learning from punishment (i.e. no reward learning) included in the model comparison. It would be interesting to see how it compared.

      We have fitted the suggested model and found that it is the least parsimonious of the models. Since participants were monetarily incentivised based on the rewards only, this was to be expected. We have now added this ‘punishment learning only’ model and its variant including a lapse term into the model comparison. The two lowest bars on the y-axis in Author response image 2 represent these models.

      Author response image 2.

      Were sex effects examined, as these have been commonly found in AAC tasks? How about other covariates such as age?

      We have now tested the effects of sex and age on behaviour and on parameter values. There were indeed some significant effects, albeit with some inconsistencies across the two samples, which for completeness we have included in the manuscript, as follows:

      While sex was significantly associated with choice in the discovery sample (β = 0.16 ± 0.07, p = 0.028) with males being more likely to choose the conflict option, this pattern was not evident in the replication sample (β = 0.08 ± 0.06, p = 0.173), and age was not associated with choice in either sample (p > 0.2).

      Comparing parameters across sexes via Welch’s t-tests revealed significant differences in reward sensitivity (t289 = -2.87, p = 0.004, d = 0.34; lower in females) and consequently reward-punishment sensitivity index (t336 = -2.03, p = 0.043, d = 0.22; lower in females i.e. more avoidance-driven). In the replication sample, we observed the same effect on reward-punishment sensitivity index (t626 = -2.79, p = 0.005, d = 0.22; lower in females). However, the sex difference in reward sensitivity did not replicate (p = 0.441), although we did observe a significant sex difference in punishment sensitivity in the replication sample (t626 = 2.26, p = 0.024, d = 0.18).

      Minor: Still a few placeholders (Supplementary Table X/ Table X) in the methods

      We thank the reviewer for spotting these errors. We have now corrected these references.

      Reviewer #3 (Public Review):

      This study investigated cognitive mechanisms underlying approach-avoidance behavior using a novel reinforcement learning task and computational modelling. Participants could select a risky "conflict" option (latent, fluctuating probabilities of monetary reward and/or unpleasant sound [punishment]) or a safe option (separate, generally lower probability of reward). Overall, participant choices were skewed towards more rewarded options, but were also repelled by increasing probability of punishment. Individual patterns of behavior were well-captured by a reinforcement learning model that included parameters for reward and punishment sensitivity, and learning rates for reward and punishment. This is a nice replication of existing findings suggesting reward and punishment have opposing effects on behavior through dissociated sensitivity to reward versus punishment.

      Interestingly, avoidance of the conflict option was predicted by self-reported task-induced anxiety. This effect of anxiety was mediated by the difference in modelled sensitivity to reward versus punishment (relative sensitivity). Importantly, when a subset of participants were retested over 1 week later, most behavioral tendencies and model parameters were recapitulated, suggesting the task may capture stable traits relevant to approach-avoidance decision-making.

      We thank the reviewer for their useful analysis of our study. Indeed, it was reassuring to see that performance indices were reliable across time.

      However, interpretation of these findings is severely undermined by the fact that the aversiveness of the auditory punisher was largely determined by participants, with the far-reaching impacts of this not being accounted for in any of the analyses. The manipulation check to confirm participants did not mute their sound is highly commendable, but the thresholding of punisher volume to "loud but comfortable" at the outset of the task leaves substantial scope for variability in the punisher delivered to participants. Indeed, participants' ratings of the unpleasantness of the punishment were moderate and highly variable (M = 31.7 out of 50, SD = 12.8 [distribution unreported]). Despite having this rating, it is not incorporated into analyses. It is possible that the key finding of relationships between task-induced anxiety, reward-punishment sensitivity and avoidance are driven by differences in the punisher experienced; a louder punisher is more unpleasant, driving greater task-induced anxiety, model-derived punishment sensitivity, and avoidance (and vice versa). This issue can also explain the counterintuitive findings from re-tested participants; lower/negatively correlated task-induced anxiety and punishment-related cognitive parameters may have been due to participants adjusting their sound settings to make the task less aversive (retest punisher rating not reported). It can therefore be argued that the task may not actually capture meaningful cognitive/motivational traits and their effects on decision-making, but instead spurious differences in punisher intensity.

      We thank the reviewer for raising this important potential limitation of our study. We agree that how participants self-adjusted their sound volume may have important consequences for our interpretations of the data. Unfortunately, despite the scalability of online data collection, this highlights one of its major weaknesses in the lack of controllability over experimental parameters. The previous paper from which we obtained our aversive sounds (Seow & Hauser, 2021, Behav Res, doi.org/10.3758/s13428-021-01643-0) contains useful analyses with regards to this discussion. When comparing the unpleasantness of the sounds played at 50% vs 100% volume, the authors indeed found that the lower volumes led to lower unpleasantness ratings. However, the magnitude of this effect did not appear to be substantial (Fig. 4 from the paper), and even at 50% volume, the scream sounds we used were rated in the top quartile for unpleasantness, on average. This implies that the sounds have sufficient inherent unpleasantness, even when played at half intensity. We find this reassuring, in the sense that any self-imposed volume effects may not be large. Of note, our instructions to participants to adjust the volume to a ‘loud but comfortable’ level were based on the same phrasing used in this study.

      To the reviewer’s point on how this might affect the reliability of the task, we have included the following in the ‘Discussion’ section:

      Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed other measures).

      Please see below for analyses accounting for punishment unpleasantness ratings.

      This undercuts the proposed significance of this task as a translational tool for understanding anxiety and avoidance. More information about ratings of punisher unpleasantness and its relationship to task behavior, anxiety and cognitive parameters would be valuable for interpreting findings. It would also be of interest whether the same results were observed if the aversiveness of the punisher was titrated prior to the task.

      As suggested, we have now included sensitivity analyses using the unpleasantness ratings that show their effect is minimal on our primary inference. We report relevant results below in the ‘Recommendations For The Authors’ section. At the same time, we think it is important to acknowledge that unpleasantness is a combination of both the inherent unpleasantness of the sound and the volume it is presented at, where only the latter is controlled by the participant. Therefore, these analyses are not a perfect indicator of the effect of participant control. For convenience, we reproduce the key findings from this sensitivity analysis here:

      Approach-avoidance hierarchical logistic regression model

      We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.

      Mediation model

      When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).

      More generally, whether or not to titrate the punishments (and indeed the rewards) is an interesting experimental decision, which we think should be guided by the research question. In our case, we were interested in individual differences in reward/punishment learning and sensitivity and their relation to anxiety, so variation in how the aversive sounds affected approach-avoidance decisions was an important aspect of our design. In studies where the aim is to understand more general processes of how humans act under approach-avoidance conflict, it may be better to tightly control the salience of reinforcers.

      Ultimately, the best test of the causal role of anxiety on avoidance, and against the hypothesis that our results were driven by spurious volume control effects, would be to run within-subjects anxiety interventions, where these volume effects are naturally accounted for. This will be an important direction for future studies using similar measures. We have added a paragraph in the ‘Discussion’ section on this point:

      Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.

      Although the procedure and findings reported here remain valuable to the field, claims of novelty including its translational potential are perhaps overstated. This study complements and sits within a much broader literature that investigates roles for aversion and cognitive traits in approach-avoidance decisions. This includes numerous studies that apply reinforcement learning models to behavior in two-choice tasks with latent probabilities of reward and punishment (e.g., see doi: 10.1001/jamapsychiatry.2022.0051), as well as other translationally-relevant paradigms (e.g., doi: 10.3389/fpsyg.2014.00203, 10.7554/eLife.69594, etc).

      We agree with the reviewer that our approach builds on previous work in reinforcement learning, approach-avoidance conflict and translational measures of anxiety. Whilst there are by now many studies using two-choice learning tasks with latent reward and punishment probabilities, our main aim, which is what we refer to as ‘novel’, was to bring these fields together so as to model anxiety-related behaviour.

      We note that we do not make strong statements about whether these effects speak to traits per se, and as Reviewer 1 notes, the evidence from our study suggests that the present measure may be better suited to assessing state anxiety. While computational model parameters can be, and certainly often are, interpreted as constituting stable individual traits, a simpler interpretation of our findings may be that state anxiety is associated with a momentary preference for punishment avoidance over reward pursuit. This can still be informative for the study of anxiety, especially given the notion of a continuous relationship between adaptive/state anxiety and maladaptive/persistent anxiety.

      Having said that, we agree with the underlying premise of the reviewer’s point that how the measure relates to trait-level avoidance/inhibition measures will be an interesting question for future work. We appreciate the importance of using tasks such as ours and those highlighted by the reviewer as trait-level measures, especially in computational psychiatry. We have now included a discussion on the potential roles of cognitive/motivational traits, in line with the reviewer’s recommendation – briefly, we have included the suggested references by the reviewer, discussed the measure’s potential relevance to cognitive/motivational traits, and direct interested readers to the broader literature. Please see below for details.

      Reviewer #3 (Recommendations For The Authors):

      As stated in the public review, punisher unpleasantness and its relationship to key findings (including for retest) should be reported and discussed.

      We signpost readers to our new analyses, incorporating unpleasantness ratings into the statistical models, from the main manuscript as follows:

      Since participants self-determined the volume of the punishments in the task, and therefore (at least in part) their aversiveness, we conducted sensitivity analyses by accounting for self-reported unpleasantness ratings of the punishment (see the Supplement). Our finding that anxiety impacts approach-avoidance behaviour was robust to this sensitivity analysis (p < 0.001); however, the mediating effect of the reward-punishment sensitivity index was not (p > 0.1; see Supplement section 9.9 for details).

      We reproduce the relevant section from the Supplement below. Overall, we found that the effect of anxiety on choices (via its interaction with punishment probability) remained significant after accounting for unpleasantness; however, the mediating effect of reward-punishment sensitivity was no longer significant when unpleasantness ratings were included in the model. As noted above, unpleasantness ratings are not a perfect measure of self-imposed sound volume, and indeed punishment sensitivity is essentially a computationally-derived measure of unpleasantness, which makes it difficult to interpret the mediation model which contains both of these measures. However, since we found that anxiety affected choice over and above any effects of self-imposed sound volume (using unpleasantness ratings as a proxy measure), we argue that the task still holds value as a model of anxiety-related avoidance.

      [Supplement Section 9.9: Sensitivity analyses of punishment unpleasantness]

      Distribution of unpleasantness

      The punishments were rated as unpleasant by the participants, on average (discovery sample: mean rating = 31.1 [scored between 0 and 50], SD = 13.1; replication sample: mean rating = 32.1, SD = 12.7; Supplementary Figure 10).

      Approach-avoidance hierarchical logistic regression model

      We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness ratings survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.

      Mediation model

      When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).

      Test-retest reliability of unpleasantness

      The test-retest reliability of unpleasantness ratings was excellent (ICC(3,1) = 0.75), although participants gave significantly lower ratings in the second session (t56 = 2.7, p = 0.008, d = 0.37; mean difference of 3.12, SD = 8.63).
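      The combination reported above, a high ICC(3,1) alongside significantly lower ratings in the second session, is possible because ICC(3,1) measures consistency rather than absolute agreement: a uniform shift between sessions does not lower it. The sketch below computes ICC(3,1) from the standard two-way ANOVA mean squares, (MS_rows − MS_error) / (MS_rows + (k − 1) · MS_error). The function name and the ratings in the usage example are hypothetical and are not the study's data.

```python
from typing import Sequence


def icc_3_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores[i][j] is the rating of participant i in session j.
    """
    n = len(scores)        # participants (rows)
    k = len(scores[0])     # sessions (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical ratings: every participant rates 3 points lower in session 2.
shifted = [[30, 27], [40, 37], [20, 17], [35, 32]]
print(round(icc_3_1(shifted), 3))
```

      Because the session (column) effect is removed before the error term is computed, the constant 3-point drop in this example leaves the rank ordering and spacing of participants intact and the coefficient unaffected.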

      Reliability of other measures with/out unpleasantness

      To assess the effect of accounting for unpleasantness ratings on reliability estimates of task performance, we extracted variance components from linear mixed models, following a standard approach (Nakagawa et al., 2017) – note that this was not the method used to estimate reliability values in the main analyses, but we used this specific approach to compare the reliability values with and without the covariate of unpleasantness ratings. The results indicated that unpleasantness ratings did not have a material effect on reliability (Supplementary Figure 14).

      We discuss the findings of these sensitivity analyses in the ‘Discussion’ section, as follows:

      Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.

      Introduction and discussion should spend more time relating the task and current findings to existing procedures and findings examining individual differences in avoidance and cognitive/motivational correlates.

      We thank the reviewer for the opportunity to expand on the literature. Whilst there are numerous behavioural paradigms in both the human and non-human literature that involve learning about rewards and punishments, our starting point for the introduction was the state-of-the-art in translational approach-avoidance conflict models of anxiety. Therefore, for the sake of brevity and logical flow of our introduction, we have opted to bring in the discussion on other procedures primarily in the ‘Discussion’ section of the manuscript.

      We have now included the reviewer’s suggested citations from their ‘Public Review’ as follows:

      Since we developed our task with the primary focus on translational validity, its design diverges from other reinforcement learning tasks that involve reward and punishment outcomes (Pike & Robinson, 2022). One important difference is that we used distinct reinforcers as our reward and punishment outcomes, compared to many studies which use monetary outcomes for both (e.g. earning and losing £1 constitute the reward and punishment, respectively; Aylward et al., 2019; Jean-Richard-Dit-Bressel et al., 2021; Pizzagalli et al., 2005; Sharp et al., 2022). Other tasks have been used that induce a conflict between value and motor biases, relying on prepotent biases to approach/move towards rewards and withdraw from punishments, which makes it difficult to approach punishments and withdraw from rewards (Guitart-Masip et al., 2012; Mkrtchian et al., 2017). However, since translational operant conflict tasks typically induce a conflict between different types of outcome (e.g. food and shocks/sugar and quinine pellets; Oberrauch et al., 2019; van den Bos et al., 2014), we felt it was important to implement this feature. One study used monetary rewards and shock-based punishments, but also included four options for participants to choose from on each trial, with rewards and punishments associated with all four options (Seymour et al., 2012). This effectively requires participants to maintain eight probability estimates (i.e. reward and punishment at each of the four options) to solve the task, which may be too difficult for non-human animals to learn efficiently.

      We have also included a discussion on the measure’s potential relevance to cognitive/motivational traits as follows:

      Finally, whilst there is a broad literature on the roles of behavioural inhibition and avoidance tendency traits on decision-making and behaviour (Carver & White, 1994; Corr, 2004; Gray, 1982), we did not replicate the correlation of experiential avoidance and avoidance responses or the reward-punishment sensitivity index. Since there were also no significant correlations across task performance indices and clinical symptom measures, our findings suggest that the measure may be more sensitive to behaviours relating to state anxiety, rather than more stable traits. Nevertheless, how performance in the present task relates to other traits such as behavioural approach/inhibition tendencies (Carver & White, 1994), as has been found in previous studies on reward/punishment learning (Sharp et al., 2022; Wise & Dolan, 2020) and approach-avoidance conflict (Aupperle et al., 2011), will be an important question for future work.

      We also now direct readers to a recent, comprehensive review on applying computational methods to approach-avoidance behaviours in the ‘Introduction’ section:

      A fundamental premise of this approach is that the brain acts as an information-processing organ that performs computations responsible for observable behaviours, including approach and avoidance (for a recent review on the application of computational methods to approach-avoidance conflict, see Letkiewicz et al., 2023).

      I am curious why participants were excluded if they made the same response on 20+ consecutive trials. How does this represent a cut-off between valid versus invalid behavioral profiles?

      We apologise for the lack of clarity on this point in our original submission – this exclusion criterion was specifically if participants used the same response key (e.g. the left arrow button) on 20 or more consecutive trials, indicating inattention. Since the left-right positions of the stimuli were randomised across trials, this did not exclude participants who repeatedly chose the same option frequently. However, as we show in the Supplement, this, along with the other exclusion criteria, did not affect our main findings.

      We have now clarified this as follows:

      … we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4%/6% in discovery and replication samples, respectively) – note that as the options randomly switched sides on the screen across trials, this did not exclude participants who frequently and consecutively chose a certain option.

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer # 1 (Public Review)

      Specific comments

      1) For all cell-based assays using shRNA to knock down CRB3, it would be desirable to perform rescue experiments to ensure that the observed phenotype of CRB3 depleted cells is specific and not due to off-target effects of the shRNA.

      Thank you for your comments. Based on your suggestions, we performed the rescue experiments to observe any alterations in the primary cilia of CRB3-depleted MCF10A cells with overexpressed CRB3. The revised parts can be found in lines 186-188 and the new Supplementary Figure 3A-C has been added.

      2) Figure 3G: it is very difficult to see that the red stained structures are primary cilia.

      Yes, in Figure 3G the staining of primary cilia in the mammary ductal lumen is less clear than in individual cells and in the renal tubule. We used the established markers acetylated tubulin and γ-tubulin to stain the primary cilia, which were clearly labeled in individual cells. However, the labeled primary cilia in the renal tubule were longer and showed a more pronounced structure than those in the mammary ductal lumen. In the mammary ductal lumen of the 10 mice we analyzed, the primary cilia were shorter and showed a less pronounced staining structure than the others shown in Figure 3G. This difference may be due to the distinct characteristics of primary cilia in different tissues.

      3) Figure 5A: it is unfortunate the authors chose not to show the original dataset (Excel file) used for generating this figure; this makes it difficult to interpret the data. It is general policy of the journal to make source data accessible to the scientific community.

      In accordance with the journal policy, we have provided the original dataset (Excel file) for Figure 5A, as detailed in “Figure 5–Source Data 1”.

      4) The authors have a tendency to overinterpret their data, and not all claims put forth by the authors are fully supported by the data provided.

      We have carefully read through the whole text and have revised the overinterpreted passages. The revisions can be found in lines 48-50, lines 93-95, and lines 260-261.

      Reviewer # 2 (Public Review)

      Thank you for recognizing and supporting our research for this manuscript.  

      Reviewer # 1 (Recommendations For The Authors)

      1) Abstract line 48-51: data overinterpretation. The authors cannot claim this based on the data they are presenting. Please modify the statement/temper the claims.

      Thanks for your comments. We have revised this sentence in the abstract; see lines 48-50 for details.

      2) There are several grammatical errors throughout the manuscript. In particular, the following sentences/statements are either wrong, confusing or non-sensical: lines 55-56; lines 87-90; lines 93-95; lines 385-387; lines 409-410.

      Thanks for your positive comments. We have modified lines 55-56 to become new lines 54-55. The sentences in lines 87-90 and lines 93-95 were difficult to understand and logically problematic, so we have carefully revised this paragraph (new lines 85-90). Lines 385-387 have been deleted as they were nonsensical. Lines 409-410 contained misrepresentations; we have revised them in new lines 408-409.

      3) Lines 257-259: this is data over-interpretation. It is not correct to state CRB3 is highly dynamic without having done any live cell imaging.

      Thank you for your comments. We have revised this sentence, see revised lines 260-261 for details.

      4) Figure 8E: if cells do not make cilia when CRB3 is lost (Figure 3), how is it possible to analyze SMO localization to cilia in these cells?

      Thank you for your comments. We used immunofluorescence, with acetylated tubulin and SMO co-staining, to analyze the localization of SMO to cilia. The immunofluorescent staining of primary cilia and the statistical analysis in Figure 3 showed that the proportion of cells with primary cilia was significantly lower in the CRB3 knockdown group, but ciliated cells were still present. We used laser confocal micrographs to identify cells with primary cilia by acetylated tubulin staining, then examined co-localization in the SMO channel, and finally determined the proportion of SMO-positive cilia. Several publications (J Cell Biol. 2020;219(6):e201904107; Science. 2008;320(5884):1777-81; Proc Natl Acad Sci U S A. 2012;109(34):13644-9.) have demonstrated that knocking down genes can affect primary cilium formation, and this method has also been used to examine the localization of SMO-related signaling pathway molecules on primary cilia.

      5) Lines 366-366: based on the relative low magnification of the images in Figure 8H it is difficult to assess the subcellular localization of GLI1 and whether there is a difference between wild type and the Crb3 mutant cells. For example, it is not clear if GLI1 is localizing to the centrosome-cilium axis. Please modify the text accordingly.

      Thank you for your good suggestions. As you mentioned, IHC cannot resolve the subcellular localization of GLI1 on the centrosome-cilium axis. However, GLI1 is a transcriptional effector at the terminal end of the Hh signaling pathway, and we did not make it clear that what we observed in the IHC results was the nuclear localization of GLI1. We have therefore revised the description accordingly, as described in line 368 and lines 520-521.

      6) Figure 7D, E: the zoomed-in images look pixelated.

      Thank you for your positive comments. We have replaced these images in the new Figure 7D and E.

      7) Figure 8B: Acetylacte-tub is misspelled.

      Thank you for your comments. We have revised and standardized the acetylated tubulin stain to "Ace-tubulin" in all immunofluorescent images throughout the manuscript.

      Reviewer # 2 (Recommendations For The Authors)

      1) 1) CRB3 is present in mammals as 2 isoforms, A and B, originating from an alternative splicing. In this study, the authors never mention this fact and when using approaches to KO or KD CRB3A/B they are likely to deplete both isoforms which have been shown to have different C-terminal domains and functions (Fan et al., 2007). This is also important for the CRB3 antibodies used in the study since according to the material and methods section they are either against the extracellular domain common to both isoforms or the intracellular domain which is only similar in the domain close to transmembrane between the 2 isoforms. Since the antibodies used in each figure are not detailed it is impossible to know if the authors are detecting CRB3A or B or both. Please provide the information and correct for the actual isoform detected in the data and conclusions.

      From the revised version we know now that CRB3B is used for exogenous expression. It has been shown that each isoform has a different role and localization in cells so why focus only on CRB3B for this study?

      Thank you for your positive comments. First, previous literature has reported that CRB3b localizes to the primary cilium of MDCK cells. We have corrected the Introduction to specify CRB3b (line 81). Second, as described in the Methods section, the CDS of CRB3b was PCR-amplified from RNA extracted from MCF10A cells. We also designed primers specific to CRB3a but were unable to amplify it, indicating that CRB3b is expressed at significantly higher levels than CRB3a in epithelial cells. Finally, from the supplier listed on the GeneCards website for CRB3 cloning products, the only CRB3 sequence available in the CRB3 cDNA ORF Clone in Cloning Vector, Human (Cat: HG14324-G) from Sino Biological is CRB3b.

      2) 3) The authors use GFP-CRB3A/B, it is not stated which isoform, over-expression to localize CRB3A/B in MCF10A cells (figure 4A). The levels of expression appear to be very high in the GFP panel and it is likely that the secretory pathway of the cells is clogged with GFP-CRB3A/B in transit from the ER to the plasma membrane. Thus, the colocalization with pericentrin might be due to the accumulation of ER and Golgi around the centrosome. This colocalization should be done with the endogenous CRB3A/B and with a better resolution.

      The authors do not answer about the potential mislocalization of overexpressed exogenous protein.

      We acknowledge the reviewer's perspective. Strong overexpression of exogenous protein could potentially obstruct the protein secretion pathway, resulting in accumulation of the exogenous protein at the ER and Golgi. Such accumulation could create the false impression of co-localization between CRB3b and the centrosome. To address this, we have revised the description of the exogenous expression results (lines 215-217 and lines 426-433) and used staining of endogenous CRB3 and γ-tubulin in Fig. 4C to confirm the co-localization of CRB3 and the centrosome.

      3) 4) The staining for CRB3A/B in Figure 4C (red) is striking with a very strong accumulation in an undefined intracellular structure and the authors do not provide any explanation for such a difference with the GFP-CRB3A/B just above.

      The authors explain that two different photonic techniques are used (classical versus confocal) but in a cell biology manuscript confocal microscopy is now the standard technique.

      Thank you for your comments. We have included a discussion on the partial concordance between CRB3's endogenous staining and exogenous expression results in the "Discussion" section, specifically in lines 420-435.

      4) 7) In addition, the authors claim (Line 251/252) that Rab11 is necessary for the transport of CRB3A/B but they should KD Rab11 to show this.

      The author's answer is that blocking endocytosis with dynasore is as good as knocking down Rab11 to show its interaction and role in CRB3A/B transport which is not the case.

      Thank you for your comments. As requested by the reviewers, we have conducted experiments to knock down Rab11 and examine CRB3 intracellular trafficking, as shown in the new Supplementary Figure 5B and in the added lines 258-260. These results provide additional support for our conclusions.

      5) 8) The domain of CRB3A/B that is necessary for the interaction with Rab11 is the N-terminal part of the extracellular domain. This domain is thus inside the transport vesicles and not accessible from the cytoplasm. Given that Rab11 is a cytoplasmic protein, how the 2 proteins could interact across the membrane? The authors do not even discuss this essential point for their hypothesis. Comment on the revised version: the authors still do not understand the basic of cell biology since they claim that the extracellular domain of CRB3 can be in contact with Rab11 after endocytosis. Even after endocytosis the extracellular domain of CRB3A/B is inside the lumen of the endosome and not in contact with the cytosol where Rab11 is located. Lines 420-421 of the revised manuscript still claim this interaction between the two proteins without providing the link between the cytosol where Rab11 is and the endosome lumen where the extracellular domain of CRB3A/B is. Please correct.

      Thank you for your positive comments. After carefully reviewing the relevant literature, we strongly agree with the reviewer's point of view. We have toned down our claim and removed the description of the binding of Rab11 endosomes to specific structural domains of intracellular CRB3, which we were unable to confirm (see lines 443-444 and lines 465-466).

    2. eLife assessment

      This is a useful study for scientists interested in cell polarity, epithelial morphogenesis, cancer, and primary cilia. The authors investigate the role of CRB3 in regulating these processes by using a combination of a mammary epithelial cell-specific conditional Crb3 knockout mouse model, and cellular, molecular and biochemical approaches. The results, which are solid, supporting and extending previous findings, suggest that CRB3 affects ciliogenesis by a mechanism involving Rab11 and gamma-TuRC.

    3. Reviewer #1 (Public Review):

      In this study the authors first perform global knockout of the gene coding for the polarity protein Crumbs 3 (CRB3) in the mouse and show that this leads to perinatal lethality and anophthalmia. Next, they create a conditional knockout mouse specifically lacking CRB3 in mammary gland epithelial cells and show that this leads to ductal epithelial hyperplasia, impaired branching morphogenesis and tumorigenesis. To study the mechanism by which CRB3 affects mammary epithelial development and morphogenesis the authors turn to MCF10A cells and find that CRB3 shRNA-mediated knockdown in these cells impairs their ability to form properly polarized acini in 3D cultures. Furthermore, they find that MCF10A cells lacking CRB3 display reduced primary ciliation frequency compared to control cells, which is supported by rescue experiments and is in agreement with previous studies implicating CRB3 in primary cilia biogenesis. Using a combination of biochemical, molecular and imaging approaches the authors then provide evidence indicating that CRB3 promotes ciliogenesis by mediating Rab11-dependent recruitment of gamma-tubulin ring complex (gamma-TuRC) component GCP6 to the centrosome/ciliary base, and they also show that CRB3 itself is localized to the base of primary cilia. Finally, to assess the functional consequences of CRB3 loss on ciliary signaling function, the authors analyze the effect of CRB3 loss on Hedgehog and Wnt signaling using cell-based assays or a mouse model.

      Overall, the described findings are interesting and in agreement with previous studies showing an involvement of CRB3 in epithelial cell biology, tumorigenesis and ciliogenesis. The results showing a role for CRB3 in mammary epithelial development and morphogenesis in vivo seem convincing. Although the authors provide evidence that CRB3 promotes ciliogenesis via (indirect) physical association with Rab11 and gamma-TuRC, the precise mechanism by which CRB3 promotes ciliogenesis remains to be clarified.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report a new bioinformatics pipeline ("SPICE") to predict pairwise cooperative binding-sites based on input ChIP-seq data for transcription factor (TF)-of-interest, analyzed against DNA-binding sites (DNA motifs) in a database (HOCOMOCO). The pipeline also predicts the optimal distance between the paired binding sites. The pipeline correctly predicted known/reported transcription factor cooperations, and also predicted new cooperations, not yet reported in literature. The authors choose to follow up on the predicted interaction between Ikaros and Jun. Using ChIP-seq in mouse B cells, they show extensive overlap in binding regions between Ikaros and Jun in LPS+IL21 stimulated cells. In a human B-lineage cell line (MINO) they show that anti-Ikaros Ab can co-immunoprecipitate Jun protein, and that the MINO cell extracts contain protein(s) that can bind to the CNS9 probe (conserved region upstream of IL10 gene), and that binding is lost upon mutation of two basepairs in the AP1 binding motif, and reduced upon mutation of two basepairs in the non-canonical Ikaros binding motif. Part of this protein complex is super-shifted with an anti-Jun antibody, and more DNA is shifted with addition of an anti-Ikaros antibody.

      The authors perform EMSA showing that recombinant Jun can bind to the tested DNA-region (IL10 CNS9) and that addition of recombinant Ikaros (or anti-Ikaros antibody in Fig 3E) can enhance binding (increase amount of DNA shifted). The authors lastly show that the IL10 CNS9 DNA region can enhance transcription in B- and T-cells with a luciferase reporter assay, and that 2 bp mutation of the Ikaros or Jun DNA motifs greatly reduce or abolish this activity.

      This is interesting work, with two main contributions: The SPICE pipeline (if made available to the scientific community), and the report of interaction between Ikaros and Jun. However, the distinction between DNA motifs, and the proteins actually binding and having a biological function, should be made clear consistently throughout the manuscript. The same DNA motifs can be bound by multiple factors, for instance within transcription factor families with high homology in the DNA-binding regions of the proteins.

      The reviewer has correctly assessed the content of our manuscript.

      Some specific points:

      SPICE: It is unclear if this is uploaded somewhere to be available to the scientific community.

      Thanks for this comment. We will upload the SPICE pipeline and its associated scripts (R and shell) via GitHub.

      It was unclear if Ikaros-Jun interaction was initially found from primary Jun ChIP-seq (and secondary Ikaros motif from HOCOMOCO) or from primary Ikaros ChIP-seq (and secondary Jun motif from HOCOMOCO). And - what were the two DNA motifs (primary and secondary, and their distance) from the SPICE analysis?

      The IKZF1-JUN interaction was found from primary JUN ChIP-seq data and searching for secondary IKZF1 motifs identified in the HOCOMOCO database. We will provide the primary and secondary motifs in our revised manuscript.
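      To make the idea of a primary motif, a secondary motif, and a preferred distance concrete, here is a toy sketch of a spacing count. It is not SPICE or SpaMo code: the function names, the `max_gap` parameter, the exact-string matching, and the short example sequences are assumptions for illustration only. A real analysis would scan ChIP-seq peak regions with position weight matrices on both strands and test each spacing for enrichment against a background.

```python
from collections import Counter


def motif_starts(seq, motif):
    """Return every start index where `motif` occurs exactly in `seq`."""
    starts = []
    i = seq.find(motif)
    while i != -1:
        starts.append(i)
        i = seq.find(motif, i + 1)  # step by one so overlapping hits are kept
    return starts


def spacing_histogram(seqs, primary, secondary, max_gap=20):
    """Count gaps (bp) from the end of each primary site to the start of each
    downstream secondary site, keeping only gaps in the range 0..max_gap."""
    counts = Counter()
    for seq in seqs:
        secondary_starts = motif_starts(seq, secondary)
        for p in motif_starts(seq, primary):
            p_end = p + len(primary)
            for s in secondary_starts:
                gap = s - p_end
                if 0 <= gap <= max_gap:
                    counts[gap] += 1
    return counts


def preferred_gap(counts):
    """Return the most frequent gap, or None when no motif pairs were found."""
    return counts.most_common(1)[0][0] if counts else None


# Toy sequences (invented): an AP-1-like site followed by an Ikaros-like core.
toy_seqs = [
    "AATGACTCAACGGGAATT",
    "TGACTCAGTGGGAA",
    "CCTGACTCATTTTTGGGAAC",
]
hist = spacing_histogram(toy_seqs, primary="TGACTCA", secondary="GGGAA")
print(dict(hist), preferred_gap(hist))
```

      The sketch only considers a secondary site downstream of the primary site on the same strand; handling both orientations and both strands would multiply the number of histograms but not change the principle.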

      The authors' considerations and statements are mostly careful. One additional comment is that binding does not equal function (Fig 2D), and that opening of chromatin (by any other factor(s)) can give DNA-binding factors (like Ikaros and Jun) the opportunity to bind, without functional consequence for the biological process studied.

      We appreciate that the reviewer believes our considerations and statements are careful. We agree that opening of chromatin can give factors the opportunity to bind, and we now make this point in the manuscript.

      Figure 2E: Ikaros is reported to be expressed at baseline in murine B cells, yet the Ikaros ChIP-seq in unstimulated cells had what looks to be no significant or low peaks. LPS stimulation induced strong Ikaros ChIP-seq signal. A western blot showing the Ikaros protein levels in the 3 conditions could help understand if the binding pattern is due to protein expression level induction. Similar for Jun (western in the 3 conditions), which seemed to mainly bind in the LPS+IL21 condition. Furthermore, as also suggested below, tracks showing Ikaros and Jun binding from all conditions (unstimulated, LPS only and LPS+IL21 stimulated cells), at select genomic loci, would be helpful in illustrating this difference in signal between the different cell conditions. This is relevant in regards to the point of cooperativity of binding.

      The main point of the paper was to show functional cooperation and proximity of binding. However, the experiments with purified JUN and Ikaros proteins suggest cooperative binding. Exhaustive evaluation of the JUN-Ikaros association is left for future studies.

      ChIP-seq in mouse B cells showed that Ikaros bound strongly in LPS stimulated cells, in the (relative) absence of Jun binding (Fig. 2C). However, in EMSA (Fig 3C), there is no binding when the AP1 site is mutated, and the authors describe this as Ikaros binding site. What does the Ikaros binding look like at this genomic location in LPS (only) stimulated cells? The authors could show the same figure as in Fig 2F but show Ikaros and Jun ChIP-seq tracks at IL10 CNS9 locus from all conditions to compare binding in unstimulated, LPS and LPS+IL21 cells.

      As requested, we now show Ikaros and Jun ChIP-seq tracks from unstimulated, LPS-treated, and LPS + IL21-treated cells. Both Ikaros and cJUN were bound to the Il10 upstream CNS9 region with LPS treatment of cells (see Author response image 1, highlighted in red box), but binding was weaker than that observed with LPS + IL21.

      Author response image 1.

      Also: How does this reconcile with the luciferase assay in Fig 4E, where LPS (only) stimulation is used, which in Fig 2E only/mainly induced Ikaros, and not Jun, ChIP-seq signal (while EMSA indicates that Ikaros cannot bind the site alone, but can enhance Jun-dependent binding)?

      As shown above, in the LPS (only) condition, both IKZF1 (Ikaros) and cJUN bind to the Il10 CNS9 locus. Thus, this is not in conflict with our luciferase assay data in Fig. 4E, which showed that Ikaros is dependent on AP-1 binding. Moreover, the AP-1 site in Fig. 4D and 4E can be bound by other AP-1 factors as well, such as JUND, JUNB, BATF, etc. These factors can potentially compete with cJUN for binding, and their roles remain to be explored. These points can be made in the manuscript.

      Comment on statements in results section: The luciferase assays in B and T cells do not demonstrate the role of the proteins Ikaros or Jun directly (page 10, lines 208 and surrounding text). The assay measures an effect of the DNA sequences (implying binding of some transcription factor(s)), but does not identify which protein factors bind there.

      We agree with the reviewer. It is reasonable and even likely that different family members may be partially redundant. This point is now made in our revised manuscript.

      Lastly, the authors only discuss Ikaros (using the term IKZF1 which is the gene symbol for the Ikaros protein). There are other Ikaros family members that have high homology and that are reported to bind similar DNA sequences (for instance Aiolos and Helios), which are expressed in B-cells and T-cells. A discussion of this is of relevance, as these are different proteins, although belonging to the same family (the Ikaros-family) of transcription factors. For instance, western for Aiolos and Helios will likely detect Aiolos in the B cells used, and Helios in the T cells used.

      We agree with the reviewer. As requested, we now discuss the possibility that Aiolos or Helios may also contribute.

      Reviewer #2 (Public Review):

      The study was performed with an old tool, SpaMo (from 12 years ago), and source data from ENCODE (2010-2012); even the version of the peak caller MACS is old (~2013). The de novo motif search tool is old too (the newer STREME is not mentioned). None of the composite element search tools published in the last 12 years are cited, and there are some issues in data analysis and presentation. Almost all references are from about 8-10 years ago (the most recent is from 2019).

      The title is misleading

      Instead of “A new pipeline SPICE identifies novel JUN-IKZF1 composite elements”

      It should be written as “Application of SpaMo tool identifies novel JUN-IKZF1 composite elements”

      It reflects the pipeline better but honestly shows that the novelty is missed.

      Regarding the above two points, we respectfully disagree with the reviewer. Although SpaMo was used, the pipeline we developed is new and our findings are distinctive. The pipeline can systematically screen for and predict novel protein-protein binding complexes; our discovery of the IKZF1-JUN composite element is new, and the biological findings and validation are distinctive. This point is now made in the revised manuscript. As requested, we have added some additional references.

      The study was performed on ENCODE data that are too old. The authors mention 343 ENCODE ChIP-Seq libraries but did not even give the name of the target TF for each library (Figure 1E, Figure S2, Table 2).

      Although we used ENCODE data, in part because these were the data available when we initially developed the algorithm, those data are valid and using them allowed us to demonstrate the functionality of SPICE, which is versatile and can be used on datasets of one’s choice as well. As requested, in the revised manuscript we have added the names of the TFs in Figs, Fig. S2, and Table 1.

      Reviewer #3 (Public Review):

      The authors of this study have designed a novel screening pipeline to detect DNA motif spacing preferences between TF partners using publicly available data. They were able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE) and predict many composite elements that are expected to be very useful to the community of researchers interested in dissecting the regulatory logic of mammalian enhancers and promoters. The authors then focus on a novel, SPICE predicted interaction between JUN and IKZF1, and show that under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. Next, to know whether the two TFs physically interact, a co-immunoprecipitation experiment was performed. While JUN immunoprecipitated with an anti-IKZF1 antibody, curiously IKZF1 did not immunoprecipitate with an anti-JUN antibody. Finally, EMSA and luciferase experiments were performed to show that the two TFs bind cooperatively at an IL20 upstream probe.

      The reviewer has described the basic results of the study.

      Major strengths:

      1) SPICE was able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE).

      2) Under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. This is very good supporting evidence for the efficacy of SPICE in detecting TF partners.

      We are glad that the reviewer believes that SPICE is effective in detecting TF partners.

      Major weaknesses:

      1) The authors fail to convincingly show that IKZF1 and Jun physically interact. A quantitative measurement of their interaction strength would have been ideal.

      We agree that it is not conclusive that the factors interact directly as opposed to binding to nearby sites on DNA, which is what SPICE was intended to detect. We never intended to claim that we established a definite physical interaction. The coIP worked in one direction, but not reliably in the other, even though we have tried a total of four different antibodies. We now mention in the revised manuscript that we have tried the additional anti-JUN antibodies, cJun (60A8, CST) and JunD (D17G2, CST).

      2) The super-shift experiment to show that the proteins bound to their EMSA probe were indeed IKZF1 and JUN is not very convincing and would benefit from efforts to quantify the shift (Figure 3E). Nuclear extracts from cells with single or double CRISPR knockouts of the two TFs would have been ideal.

      We agree that using single or double knockouts would be helpful, but other Ikaros family or Jun family members could be involved, so such studies might not be definitive. That is why we used purified proteins to show apparent cooperative binding (Figure 4C).

      3) There is a second band beneath the more prominent band in the EMSA experiment with recombinant IKZF1 and JUN (Figure 4C). This second band is most probably bound by IKZF1 because it becomes weaker when the IKZF1 site is mutated and is completely absent when only JUN is added. This is completely ignored by the authors. Therefore, experiments with EMSA fail to convincingly show that IKZF1 and Jun bind cooperatively. They could just as well bind independently to the two sites.

      The second band has a faster mobility and might relate to IKZF1, although this is difficult to know. We comment on this band in the revised manuscript. As noted above, the purified protein experiments do suggest cooperativity. However, our overall intent was to identify factors binding in proximity, which SPICE has successfully done, even if the binding was “independent”.

    2. eLife assessment

      This valuable study presents a screening pipeline (SPICE) for detecting DNA motif spacing preferences between TF partners. SPICE predicts previously known composite elements, but experiments to elucidate the nature of the predicted novel interaction between JUN and IKZF1 are incomplete. These experiments would benefit from more rigorous approaches using other databases to explore additional relevant data. The work will be of broad interest to those involved in dissecting the regulatory logic of mammalian enhancers and promoters.

    3. Reviewer #2 (Public Review):

      The study was performed with an old tool, SpaMo (from 12 years ago), and source data from ENCODE (2010-2012); even the version of the peak caller MACS is old (~2013). The de novo motif search tool is old too (the newer STREME is not mentioned). None of the composite element search tools published in the last 12 years are cited, and there are some issues in data analysis and presentation. Almost all references are from about 8-10 years ago (the most recent is from 2019).

      The title is misleading

      Instead of “A new pipeline SPICE identifies novel JUN-IKZF1 composite elements”

      It should be written as “Application of SpaMo tool identifies novel JUN-IKZF1 composite elements”

      It reflects the pipeline better but honestly shows that the novelty is missed.

      The study was performed on ENCODE data that are too old. The authors mention 343 ENCODE ChIP-Seq libraries but did not even give the name of the target TF for each library (Figure 1E, Figure S2, Table 2).

    4. Reviewer #3 (Public Review):

      The authors of this study have designed a novel screening pipeline to detect DNA motif spacing preferences between TF partners using publicly available data. They were able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE) and predict many composite elements that are expected to be very useful to the community of researchers interested in dissecting the regulatory logic of mammalian enhancers and promoters. The authors then focus on a novel, SPICE predicted interaction between JUN and IKZF1, and show that under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. Next, to know whether the two TFs physically interact, a co-immunoprecipitation experiment was performed. While JUN immunoprecipitated with an anti-IKZF1 antibody, curiously IKZF1 did not immunoprecipitate with an anti-JUN antibody. Finally, EMSA and luciferase experiments were performed to show that the two TFs bind cooperatively at an IL20 upstream probe.

      Major strengths:

      1. SPICE was able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE).

      2. Under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. This is very good supporting evidence for the efficacy of SPICE in detecting TF partners.

      Major weaknesses:

      1. The authors fail to convincingly show that IKZF1 and Jun physically interact. A quantitative measurement of their interaction strength would have been ideal.

      2. The super-shift experiment to show that the proteins bound to their EMSA probe were indeed IKZF1 and JUN is not very convincing and would benefit from efforts to quantify the shift (Figure 3E). Nuclear extracts from cells with single or double CRISPR knockouts of the two TFs would have been ideal.

      3. There is a second band beneath the more prominent band in the EMSA experiment with recombinant IKZF1 and JUN (Figure 4C). This second band is most probably bound by IKZF1 because it becomes weaker when the IKZF1 site is mutated and is completely absent when only JUN is added. This is completely ignored by the authors. Therefore, experiments with EMSA fail to convincingly show that IKZF1 and Jun bind cooperatively. They could just as well bind independently to the two sites.

    5. Reviewer #1 (Public Review):

      The authors report a new bioinformatics pipeline ("SPICE") to predict pairwise cooperative binding-sites based on input ChIP-seq data for transcription factor (TF)-of-interest, analyzed against DNA-binding sites (DNA motifs) in a database (HOCOMOCO). The pipeline also predicts the optimal distance between the paired binding sites. The pipeline correctly predicted known/reported transcription factor cooperations, and also predicted new cooperations, not yet reported in literature. The authors choose to follow up on the predicted interaction between Ikaros and Jun. Using ChIP-seq in mouse B cells, they show extensive overlap in binding regions between Ikaros and Jun in LPS+IL21 stimulated cells. In a human B-lineage cell line (MINO) they show that anti-Ikaros Ab can co-immunoprecipitate Jun protein, and that the MINO cell extracts contain protein(s) that can bind to the CNS9 probe (conserved region upstream of IL10 gene), and that binding is lost upon mutation of two basepairs in the AP1 binding motif, and reduced upon mutation of two basepairs in the non-canonical Ikaros binding motif. Part of this protein complex is super-shifted with an anti-Jun antibody, and more DNA is shifted with addition of an anti-Ikaros antibody.

      The authors perform EMSA showing that recombinant Jun can bind to the tested DNA-region (IL10 CNS9) and that addition of recombinant Ikaros (or anti-Ikaros antibody in Fig 3E) can enhance binding (increase amount of DNA shifted). The authors lastly show that the IL10 CNS9 DNA region can enhance transcription in B- and T-cells with a luciferase reporter assay, and that 2 bp mutation of the Ikaros or Jun DNA motifs greatly reduce or abolish this activity.

      This is interesting work, with two main contributions: The SPICE pipeline (if made available to the scientific community), and the report of interaction between Ikaros and Jun. However, the distinction between DNA motifs, and the proteins actually binding and having a biological function, should be made clear consistently throughout the manuscript. The same DNA motifs can be bound by multiple factors, for instance within transcription factor families with high homology in the DNA-binding regions of the proteins.

      Some specific points:

      SPICE: It is unclear if this is uploaded somewhere to be available to the scientific community.

      It was unclear whether the Ikaros-Jun interaction was initially found from primary Jun ChIP-seq (with the secondary Ikaros motif from HOCOMOCO) or from primary Ikaros ChIP-seq (with the secondary Jun motif from HOCOMOCO). Also, what were the two DNA motifs (primary and secondary) and their distance in the SPICE analysis?

      The authors' considerations and statements are mostly careful. One additional comment is that binding does not equal function (Fig 2D): opening of chromatin (by any other factor(s)) can give DNA-binding factors (like Ikaros and Jun) the opportunity to bind without functional consequence for the biological process studied.

      Figure 2E: Ikaros is reported to be expressed at baseline in murine B cells, yet the Ikaros ChIP-seq in unstimulated cells had what looks to be no significant or low peaks. LPS stimulation induced strong Ikaros ChIP-seq signal. A western blot showing the Ikaros protein levels in the 3 conditions could help understand if the binding pattern is due to protein expression level induction. Similar for Jun (western in the 3 conditions), which seemed to mainly bind in the LPS+IL21 condition. Furthermore, as also suggested below, tracks showing Ikaros and Jun binding from all conditions (unstimulated, LPS only and LPS+IL21 stimulated cells), at select genomic loci, would be helpful in illustrating this difference in signal between the different cell conditions. This is relevant in regards to the point of cooperativity of binding.

      ChIP-seq in mouse B cells showed that Ikaros bound strongly in LPS stimulated cells, in the (relative) absence of Jun binding (Fig. 2C). However, in EMSA (Fig 3C), there is no binding when the AP1 site is mutated, and the authors describe this as Ikaros binding site. What does the Ikaros binding look like at this genomic location in LPS (only) stimulated cells? The authors could show the same figure as in Fig 2F but show Ikaros and Jun ChIP-seq tracks at IL10 CNS9 locus from all conditions to compare binding in unstimulated, LPS and LPS+IL21 cells.

      Also, how does this reconcile with the luciferase assay in Fig 4E, where LPS (only) stimulation is used, which in Fig 2E only/mainly induced Ikaros, and not Jun, ChIP-seq signal (while EMSA indicates that Ikaros cannot bind the site alone but can enhance Jun-dependent binding)?

      Comment on statements in results section: The luciferase assays in B and T cells do not demonstrate the role of the proteins Ikaros or Jun directly (page 10, lines 208 and surrounding text). The assay measures an effect of the DNA sequences (implying binding of some transcription factor(s)), but does not identify which protein factors bind there.

      Lastly, the authors only discuss Ikaros (using the term IKZF1 which is the gene symbol for the Ikaros protein). There are other Ikaros family members that have high homology and that are reported to bind similar DNA sequences (for instance Aiolos and Helios), which are expressed in B-cells and T-cells. A discussion of this is of relevance, as these are different proteins, although belonging to the same family (the Ikaros-family) of transcription factors. For instance, western for Aiolos and Helios will likely detect Aiolos in the B cells used, and Helios in the T cells used.

    1. eLife assessment

      This valuable study reports multi-scale molecular dynamics simulations to investigate a class of highly potent antibodies that simultaneously engage with the HIV-1 Envelope trimer and the viral membrane. The work provides insights into how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization. However, the evidence for rules for lipid recognition in antibodies is still incomplete. In addition, the text would benefit from clearer subsections that delineate discrete mechanistic discoveries.

    2. Reviewer #1 (Public Review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with a lipid composition similar to the native virion. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. Additional contacts and conformational restraints imposed by ectodomain regions of the envelope glycoprotein, however, remain unaddressed; the size of such simulations likely runs into technical limitations, including sampling and compute time.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to the membrane. Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive. However, given the large amount of data presented within the manuscript, the text would benefit from clearer subsections that delineate discrete mechanistic discoveries, particularly for experimentalists interested in antibody discovery and design. One area the paper does not address is the polyreactivity associated with membrane-binding antibodies; MD simulations and/or pulling velocity experiments with model membranes of different compositions, with and without model antigens, would be needed. Finally, given the challenges in initializing these simulations and their limitations, the text regarding their generalized use for discovery, rather than mechanism, could be toned down.

      Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

    3. Reviewer #2 (Public Review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane-proximal external region (MPER) of the HIV-1 envelope trimer. In several cases the simulations recapitulated the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, and they also revealed some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The conclusions from the paper are mostly well supported by the simulations; however, they remain very descriptive, and the key findings should be better described and validated. In particular:

      It has been shown that the HIV membrane is rich in cholesterol [1], which accounts for almost 50% of the lipids by molar ratio. The authors use a very different composition and should therefore provide a reference. It has been shown for 4E10 that a change in lipid composition affects the dynamics of binding. The robustness of the results to changes in lipid composition should also be reported.

      The real advantage of the multiscale approach (coarse-grained (CG) simulation followed by a back-mapped all-atom simulation) remains unclear. In most cases, the binding mode in the CG simulations seems to be an artifact.

      The results reported in this study should be better compared to available experimental data. For example, how does the approach angle compare to cryo-EM structures of the bnAbs engaging with the MPER region, e.g. [2-3]? How do the results from this study compare to previous molecular dynamics studies, e.g. [4-5]?

      References
      1. Brügger, Britta, et al. "The HIV lipidome: a raft with an unusual composition." Proceedings of the National Academy of Sciences 103.8 (2006): 2641-2646.
      2. Rantalainen, Kimmo, et al. "HIV-1 envelope and MPER antibody structures in lipid assemblies." Cell Reports 31.4 (2020).
      3. Yang, Shuang, et al. "Dynamic HIV-1 spike motion creates vulnerability for its membrane-bound tripod to antibody attack." Nature Communications 13.1 (2022): 6393.
      4. Carravilla, Pablo, et al. "The bilayer collective properties govern the interaction of an HIV-1 antibody with the viral membrane." Biophysical Journal 118.1 (2020): 44-56.
      5. Pinto, Dora, et al. "Structural basis for broad HIV-1 neutralization by the MPER-specific human broadly neutralizing antibody LN01." Cell Host & Microbe 26.5 (2019): 623-637.

    1. eLife assessment

      This important study examines the effects of herbivory-induced maize volatiles on neighboring plants and their responses over time. Measurements of volatile compound classes and gene expression in receiver plants exposed to these volatiles led to the conclusion that the delayed emission of certain terpenes in receiver plants after the onset of light may be a result of stress memory, highlighting the role of priming and induction in plant defenses triggered by herbivore-induced plant volatiles (HIPVs). Most experimental data are compelling but additional experiments and accurate quantifications of the compounds would be required to confirm some of the main claims.

    2. Reviewer #1 (Public Review):

      The authors of the manuscript "High-resolution kinetics of herbivore-induced plant volatile transfer reveal tightly clocked responses in neighboring plants" assessed the effects of herbivory-induced maize volatiles on receiver plants over a period of time in order to assess the dynamics of the responses of receiver plants. Different volatile compound classes were measured over a period of time using PTR-ToF-MS and GC-MS, under both natural light:dark conditions, and continuous light. They also measured gene expression of related genes as well as defense-related phytohormones. The effects of a secondary exposure to GLVs on primed receiver plants were also measured.

      The paper addresses some interesting points; however, some questions arise regarding some of the methods employed. Firstly, I am wondering why VOCs (as measured by GC-MS) were not quantified. While I understand that quantification is time-consuming and requires more work, it allows for comparisons to be made between lines of the same species, as well as with other literature on the subject. As VOC dispensers were also used in this study, I find it even more baffling that the authors didn't measure the concentration of the emissions from the plants they used to make sure it matched that of the dispensers. The references cited to justify the concentration used to prepare the dispenser (saying it was within the range of GLVs emitted by their plants) were for either a different variety of maize (Delprim versus B73) or Arabidopsis. Simply relying on the area under the curve and presenting results using arbitrary units is not enough for analyses like these.

      With regard to the correlation analyses shown in Figure 6, the results presented in many of the correlation plots are not actually informative. By blindly reporting the correlation coefficient, important trends are being ignored, as there are clearly either bimodal relationships (e.g. upper left panel, HAC/TMTT, HAC/MNT) or even stranger relationships (e.g. upper left panel, IND/SQT, IND/MNT) that are not well explained by a correlation plot. It is not appropriate to discuss the correlation coefficients presented here and to draw such strong conclusions on emission kinetics. The comparison between plants under continuous light and normal light:dark conditions is interesting, but I think there are better ways to examine these relationships; for example, multivariate analysis might reveal some patterns.
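
      The concern about bimodal relationships can be illustrated with a minimal sketch. The data below are synthetic and are not taken from the manuscript; the helper `pearson` and the variable names are illustrative assumptions. Two groups of points each show a perfectly negative relationship, yet the pooled correlation coefficient is strongly positive, so a single coefficient would hide the structure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic (hypothetical) emission values for two compounds, split into a
# low-emission and a high-emission group, i.e. a bimodal relationship.
low_x, low_y = [1, 2, 3, 4], [4, 3, 2, 1]
high_x, high_y = [11, 12, 13, 14], [14, 13, 12, 11]

pooled_r = pearson(low_x + high_x, low_y + high_y)  # all points together
low_r = pearson(low_x, low_y)                       # low group only
high_r = pearson(high_x, high_y)                    # high group only

print(f"pooled r = {pooled_r:.2f}, low r = {low_r:.2f}, high r = {high_r:.2f}")
```

      In this toy case the pooled coefficient is driven entirely by the gap between the two groups, which is why inspecting the scatter plots (or analysing the groups separately) is more informative than the coefficient alone.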

      In Figure 2, the elevated concentrations of beta-caryophyllene found in the control plants at 8h and 16.75h measurement timepoints are curious. Is this something that is commonly seen in B73? While there can be discrepancies between emissions and compounds actually present within leaf tissue, it is a little bit odd that such high levels of b-caryophyllene were found at these timepoints, however, this is not reflected in the PTR-ToF-MS measurements of sesquiterpenes. It would be beneficial to include an overview of the mechanism of production and storage of sesquiterpenes in maize leaves, which would clarify why high amounts were found only in the GC-MS analysis and not the PTR-ToF-MS analysis, which is a more sensitive analytical tool. It is possible that the amounts of b-caryophyllene present in the leaf are actually extremely low, however as the values are not given as a concentration but rather arbitrary units, it is not possible to tell. I would include a line explaining what is seen with b-caryophyllene. Additionally, it seems like the amounts of TMTT within the leaf are extraordinarily high (judging only by the au values given for scale), far higher than one would expect from maize.

    3. Reviewer #2 (Public Review):

      The exact dynamics of responses to volatiles from herbivore-attacked neighbouring plants have been little studied so far. Also, we still lack evidence of whether herbivore-induced plant volatiles (HIPVs) induce or prime plant defences of neighbours. The authors investigated the volatile emission patterns of receiver plants that respond to the volatile emission of neighbouring sender plants which are fed upon by herbivorous caterpillars. They applied a very elegant approach (more rigorous than the current state-of-the-art) to monitor temporal response patterns of neighbouring plants to HIPVs by measuring volatile emissions of senders and receivers, senders only and receivers only. Different terpenoids were produced within 2 h of such exposure in receiver plants, but not during the dark phase. Once the light turned on again, large amounts of terpenoids were released from the receiver plants. This may indicate a delayed terpene burst, but terpenoids may also be induced by the sudden change in light. A potential caveat exists with respect to the exact timing and the day-night cycle. The timing may be critical, i.e. at which time-point after onset of light herbivores were placed on the plants and how long the terpene emission lasted before the light was turned off. If the rhythm or a potential internal clock matters, then this information should also be highly relevant. Moreover, light on/off is a rather arbitrary treatment that is practical for experiments in the laboratory but which is not a very realistic setting. Particularly with regard to terpene emission, the sudden turning on of light instead of a smooth and continuous change to lighter conditions may trigger emission responses that are not found in nature. As one contrasting control, the authors also studied the time-delay in volatile emission when plants were just kept under continuous light (just for the experiment or continuously?). 
Here they also found a delayed terpenoid production, but this seemed to be lower compared to the plants exposed to the day-night-cycle. Another helpful control would be to start the herbivory treatment in the evening hours and leave the light on. If then again plants only release volatiles after a 17 h delay, the response is indeed independent of the diurnal clock of the plant.

      Interestingly, internal terpene pools of one of the leaves tested here remained more comparable between night and day, indicating that their pools stay higher in plants exposed to HIPVs. In contrast, terpene synthases were only induced during the light-phase, not in the dark-phase. Moreover, jasmonates were only significantly induced 22 h after the onset of the volatile exposure and thus parallel with the burst of terpene release.

      An additional experiment exposing plants to the green leaf volatile (GLV) (Z)-3-hexenyl acetate revealed that plants can be primed by this GLV, leading to a stronger terpene burst. The results are discussed with nice logic and with consideration of potential ecological consequences. Some data are not discussed, e.g. the jasmonate and gene induction patterns.

      Overall, this study provides intriguing insights into the potential interplay between priming and induction, which may co-occur, enhancing (indirect and direct) plant defence. Follow-up studies are suggested that may provide additional evidence.

    1. eLife assessment

      This useful study has the potential to reveal insights into how calcineurin influences C. elegans lifespan through its role in controlling the defecation motor program. Currently, the evidence in support of the conclusions is still incomplete, largely due to concerns about partial gene inactivation by RNAi. The inclusion of experiments using a tax-6 null allele would mitigate these concerns.

    2. Reviewer #1 (Public Review):

      In this paper, the authors show that disruption of calcineurin, which is encoded by tax-6 in C. elegans, results in increased susceptibility to P. aeruginosa, but extends lifespan. In exploring the mechanisms involved, the authors show that disruption of tax-6 decreases the rate of defecation leading to intestinal accumulation of bacteria and distension of the intestinal lumen. The authors further show that the lifespan extension is dependent on hlh-30, which may be involved in breaking down lipids following deficits in defecation, and nhr-8, whose levels are increased by deficits in defecation. The authors propose a model in which disruption of the defecation motor program is responsible for the effect of calcineurin on pathogen susceptibility and lifespan, but do not exclude the possibility that calcineurin affects these phenotypes independently of defecation.

    3. Reviewer #2 (Public Review):

      The manuscript titled "Calcineurin Inhibition Enhances Caenorhabditis elegans Lifespan by Defecation Defects-Mediated Calorie Restriction and Nuclear Hormone Signaling" by Priyanka Das, Alejandro Aballay, and Jogender Singh reveals that inhibiting calcineurin, a conserved protein phosphatase, in C. elegans affects the defecation motor program (DMP), leading to intestinal bloating and increased susceptibility to bacterial infection. This intestinal bloating mimics calorie restriction, ultimately resulting in an enhanced lifespan. The research identifies the involvement of HLH-30 and NHR-8 proteins in this lifespan enhancement, providing new insights into the role of calcineurin in C. elegans DMP and mechanisms for longevity.

      The authors present novel findings on the role of calcineurin in regulating the defecation motor program in C. elegans and how its inhibition can lead to lifespan enhancement. The evidence provided is solid with multiple experiments supporting the main claims.

      Strengths:
      The manuscript's strength lies in the authors' use of genetic and biochemical techniques to investigate the role of calcineurin in regulating the DMP, innate immunity, and lifespan in C. elegans. Moreover, the authors' findings provide a new mechanism for calcineurin inhibition-mediated longevity extension, which could have significant implications for understanding the molecular basis of aging and developing interventions to promote healthy aging.

      1. The study uncovers a new role for calcineurin in the regulation of C. elegans DMP and a potential novel pathway for enhancing lifespan via calorie restriction involving calcineurin, HLH-30, and NHR-8 in C. elegans.
      2. Multiple signaling pathways involved in lifespan enhancement were investigated with fairly strong experimental evidence supporting their claims.

      Weaknesses:
      The manuscript's weaknesses include the lack of mechanistic details regarding how calcineurin inhibition leads to defects in the DMP and induces calorie restriction-like effects on lifespan.

      The exact site of calcineurin action, i.e., whether in the intestine or enteric muscles (Lee et al., 2005), and the possible molecular mechanisms linking calcineurin inhibition, DMP defects, and lifespan were not adequately explored. Although characterization of the full mechanism is probably beyond the scope of this paper, given the relative simplicity and advantages of using C. elegans as a model organism for this study, some degree of rigor is expected with additional straightforward control experiments as listed below:

      The authors state that tax-6 knockdown animals had drastically reduced expulsion events (Figure 2G), leading to irregular DMP (Lines 144-145), but did not describe the nature of DMP irregularity. For example, did the reduced expulsion events still occur with regular intervals but longer cycle lengths? Or was the rhythmicity completely abolished? The former would suggest the intestine clock is still intact, and the latter would indicate that calcineurin is required for the clock to function. Therefore, ethograms of DMP in both wild-type and tax-6 mutant animals are warranted to be included in the manuscript. Along the same line, besides the cycle length, the three separable motor steps (aBoc, pBoc, EMC) are easily measurable, with each step indicating where the program goes wrong, hence the site of action, which is precisely the beauty of studying C. elegans DMP. Unfortunately, the authors did not use this opportunity to characterize the exact behavior phenotypes of the tax-6 mutant to guide future investigations. Furthermore, it is interesting that about 64% of tax-6 (p675) animals had normal DMP. The authors attributed this to p675 being a weak allele. It would be informative to further examine tax-6 RNAi as in other experiments or to make a tax-6 null mutant with CRISPR. In addition, in one of the cited papers (Lee et al., 2005), the exact calcineurin loss-of-function strain tax-6(p675) was shown to have normal defecation, including normal EMC, while the gain-of-function mutant of calcineurin tax-6(jh107) had abnormal EMC steps. It wasn't clear from Lee et al., 2005, if the reported "normal defecation" was only referring to the expulsion step or also included the cycle length. Nevertheless, this potential contradiction and calcineurin gain-of-function mutant is highly relevant to the current study and should be further explored as a follow-up to previously reported results. 
For some of the key experiments, such as tax-6's effects on susceptibility to PA14, DMP, intestinal bloating, and lifespan, additional controls, as the norm of C. elegans studies, including second allele and rescue experiments, would strengthen the authors' claims and conclusions.
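
      The distinction raised above, between expulsions that remain regular with a longer cycle and a complete loss of rhythmicity, could be quantified along the following lines. This is a minimal sketch using made-up expulsion timestamps (in seconds), not data from the manuscript; the function name `cycle_stats` is an illustrative assumption. It compares the mean cycle length with the coefficient of variation (CV) of the intervals between successive expulsions.

```python
from statistics import mean, stdev


def cycle_stats(event_times):
    """Return (mean interval, coefficient of variation) for event timestamps."""
    intervals = [b - a for a, b in zip(event_times, event_times[1:])]
    avg = mean(intervals)
    return avg, stdev(intervals) / avg


# Hypothetical expulsion times in seconds for two animals.
regular_slow = [0, 90, 181, 270, 361]  # long but evenly spaced cycles
arrhythmic = [0, 40, 200, 230, 361]    # same average, irregular spacing

slow_mean, slow_cv = cycle_stats(regular_slow)
arr_mean, arr_cv = cycle_stats(arrhythmic)

print(f"regular:    mean = {slow_mean:.2f} s, CV = {slow_cv:.2f}")
print(f"arrhythmic: mean = {arr_mean:.2f} s, CV = {arr_cv:.2f}")
```

      Both hypothetical animals have the same mean cycle length, so only the CV (or an ethogram) separates a slowed but intact clock from an arrhythmic one.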

      The second weakness of this manuscript is the data presentation for all survival rate curves. The authors stated that three independent experiments or biological replicates were performed for each group but only showed one "representative" curve for each plot. Without seeing all individual datasets or the averaged data with error bars, there is no way to evaluate the variability and consistency of the survival rate reported in this study.
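
      As one way of presenting averaged data with error bars, the fraction of animals alive at each time point could be summarized across the biological replicates. The sketch below is only an illustration under stated assumptions: the three replicate curves and the time points are invented, and the mean and sample standard deviation are one of several reasonable choices of summary.

```python
from statistics import mean, stdev

# Hypothetical fraction of animals alive at each time point (days),
# one list per independent biological replicate.
days = [0, 5, 10, 15, 20]
replicates = [
    [1.00, 0.90, 0.60, 0.30, 0.00],
    [1.00, 0.80, 0.50, 0.20, 0.00],
    [1.00, 1.00, 0.70, 0.40, 0.00],
]

# zip(*replicates) groups the values of all replicates per time point.
summary = [
    (day, mean(values), stdev(values))
    for day, values in zip(days, zip(*replicates))
]

for day, avg, sd in summary:
    print(f"day {day:2d}: mean = {avg:.2f}, SD = {sd:.2f}")
```

      Plotting these means with their standard deviations, or overlaying all individual replicate curves, would let readers judge variability directly.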

      Overall, the authors' claims and conclusions are justified by their data, but further experiments are needed to confirm their findings and establish the detailed mechanisms underlying the observed effects of calcineurin inhibition on the DMP, calorie restriction, and lifespan in C. elegans.

    1. eLife assessment

      This paper reports a valuable set of new results. The main claim is that the projection from adult-born granule cells in the dentate gyrus to the hippocampal subfield CA2 is important for the retrieval of social memories formed during development. However, the reviewers agreed that evidence for this major claim is currently incomplete.

    2. Reviewer #1 (Public Review):

      Summary:
      This manuscript provides some valuable findings concerning the hippocampal circuitry and the potential role of adult-born granule cells in the retrieval of long-term social memory. The behavior experiments and the strategy employed to understand how adult-born granule cells contribute to long-term social discrimination memory are interesting.

      I have a few concerns, however, with the strength of the evidence presented for some of the experiments. The data presented and the methods described are incomplete in describing the connection between cell types in CA2 and the projections from abGCs. Likewise, I worry about the interpretation of the data in Figures 1 and 2 given the employed methodology. I think that the interpretation should be broadened. This second concern does not impact the interest and significance of the findings.

      Strengths:
      The behavior experiments are beautifully designed and executed. The experimental strategy is interesting.

      Weaknesses:
      The interpretation of the results may not be justified given the methods and details provided.

    3. Reviewer #2 (Public Review):

      Summary:
      Laham et al. investigate how the projection from adult-born granule cells into CA2 affects the retrieval of social memories at various developmental points. They use chemogenetic manipulations and electrophysiological recordings to test how this projection affects hippocampal network properties during behavior. I find the study to be very interesting, and the results are important for our understanding of how social memories of different natures (remote or immediate) are encoded and supported by the hippocampal circuitry. I have added some points below that I think could help clarify the conclusions:
      - My major concern with the manuscript is that the transitions between the different experiments in each results section are not very smooth. Perhaps the authors could explain, in a summary sentence at the end of each results section, why the next set of experiments is the most logical step.
      - In line 113, the authors say that "the DG is known to influence hippocampal theta-gamma coupling and SWRs". Another recent study, Fernandez-Ruiz et al. 2021, examined how various gamma frequencies in the dentate gyrus modulate hippocampal dynamics.
      - Having no single cells in the electrophysiological recordings makes it difficult to interpret the ephys part. Perhaps a discussion of this would help interpret the results. If more SWRs are produced from the CA2 region (perhaps aided by projections from abGCs), more CA2 cells that respond to social stimuli (Oliva et al. 2020) would reactivate the memories, therefore making them consolidate faster/stronger. On the other hand, the projections from abGCs that the authors see also target a great many PV+ interneurons, which have been shown to pace the SWR frequency (Stark et al 2014, Gan et al 2017), which further suggests that this projection could be involved in SWR modulation.
      - The authors should cite and discuss Shuo et al., 2022 (A hypothalamic novelty signal modulates hippocampal memory).
      - I think the authors forgot to refer to Fig 3a-f, maybe around lines 163-168.
      - Are the SWRs counted only during interaction time or throughout the whole behavior session for each condition?
      - Figure 3t shows a shift in the preferred gamma phase within theta cycles as a result of silencing abGC projections to CA2 with CNO, especially during the Mother CNO condition. I think this result is worth mentioning in the text.
      - The legend for Figure 3u mentions "scale bars = 200um"; what does this refer to?
      - What exactly is calculated as the SWR average integral? Is it a cumulative rate? Please clarify.
      - Alexander et al 2017, "CA2 neuronal activity controls hippocampal oscillations and social behavior", examined some of the CA2 effects in the hippocampal network after CNO silencing, and the authors should cite it.

      Strengths:
      Behavioral experiments after manipulation of abGC projections to CA2 are compelling, as they show clearly distinct behavioral readouts.

      Weaknesses:
      Electrophysiological experiments are difficult to interpret without additional quantifications (single-cell responses during interactions, etc.).

    4. Reviewer #3 (Public Review):

      Laham et al. present a manuscript investigating the function of adult-born granule cells (abGCs) projecting to the CA2 region of the hippocampus during social memory. It should be noted that no function for the general DG to CA2 projection has been proposed yet. The authors use targeted ablation, chemogenetic silencing, and in vivo ephys to demonstrate that the abGCs to CA2 projection is necessary for the retrieval of remote social memories such as the memory of one's mother. They also use in vivo ephys to show that abGCs are necessary for differential CA2 network activity, including theta-gamma coupling and sharp wave-ripples, in response to novel versus familiar social stimuli.

      The question investigated is important since the function of DG to CA2 projection remained elusive a decade after its discovery. Overall, the results are interesting but focused on the social memory of the mother, and their description in the manuscript and figures is too cursory. For example, raw interaction times must be shown before their difference. The assumption that mice exhibit social preference between familiar or novel individuals such as mother and non-mother based on social memory formation, consolidation, and retrieval should be better explained throughout the manuscript. Thus, when describing the results, the authors should comment on changes in preference and how this can be interpreted as a change in social memory retrieval. Several critical experimental details such as the total time of presentation to the mother and non-mother stimulus mice are also lacking in the manuscript. The in vivo e-phys results are interesting as well but even more succinct with no proposed mechanism as to how abGCs could regulate SWR and PAC in CA2.

      The manuscript is well-written with the appropriate references. The choice of the behavioral test is somewhat debatable, however. It is surprising that the authors chose to use a direct presentation test (presentation of the mother and non-mother in alternation) instead of the classical 3-chamber test which is particularly appropriate to investigate social preference. Since the authors focused exclusively on this preference, the 3-chamber test would have been more adequate in my opinion. It would greatly strengthen the results if the authors could repeat a key experiment from their investigation using such a test. In addition, the authors only impaired the mother's memory. An additional experiment showing that disruption of the abGCs to CA2 circuit impairs social memory retrieval would allow us to generalize the findings to social memories in general. As the manuscript stands, the authors can only conclude the importance of this circuit for the memory of the mother. Developmental memory implies the memory of familiar kin as well.

      The in vivo ephys section (Figure 3) is interesting but even more minimalistic and it is unclear how abGCs projection to CA2 can contribute to SWR and theta-gamma PAC. In Figure 1, the authors suggest that abGCs project preferentially to PV+ neurons in CA2. At a minimum, the authors should discuss how the abGCs to PV+ neurons to CA2 pyramidal neurons circuit can facilitate SWR and theta-gamma PAC.

      Finally, proposing a function for 4-6-week-old abGCs projecting to CA2 begs two questions: What are abGCs doing once they mature further, and more generally, what is the function of the DG to CA2 projection? It would be interesting for the authors to comment on these questions in the discussion.

    1. eLife assessment

      This valuable paper examines gene expression differences between male and female individuals over the course of flower development in the dioecious angiosperm Trichosanthes pilosa. The authors show that male-biased genes evolve faster than female-biased and unbiased genes, a pattern frequently observed in animals but reported here for the first time in plants. In spite of the limited sample size, the reviewers found the evidence to be mostly solid and the methods appropriate for a non-model organism. The resources produced will be used by researchers working in the Cucurbitaceae, and the results obtained advance our understanding of the mechanisms of plant sexual reproduction and its evolutionary implications: as such they will broadly appeal to evolutionary biologists and plant biologists.

    2. Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of non-synonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      There are, however, parts of the manuscript that are not clearly described or could be otherwise improved.

      - The number of de novo-assembled unigenes seems large, and I would like to know how it compares to the number of genes in other Cucurbitaceae species. The proportion of alternatively assembled isoforms or assembly artifacts may still be high in the final assembly and inflate the number of identified sex-biased genes.

      - It is interesting that the majority of sex-biased genes are present in the floral buds but not in the mature flowers. I think this pattern could be explored in more detail, by investigating the expression of male and female sex-biased genes throughout the flower development in the opposite sex. It is also not clear how the expression of the sex-biased genes found in the buds changes when buds and mature flowers are compared within each sex.

      - The statistical analysis of evolutionary rates between male-biased, female-biased, and unbiased genes is performed on samples with very different numbers of observations, therefore, a permutation test seems more appropriate here.

      - The impact of pleiotropy on the evolutionary rates of male-biased genes is speculative since only two tissue samples (buds and mature flowers) are used. More tissue types need to be included to draw any meaningful conclusions here.
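
      The permutation test suggested above for comparing evolutionary rates between gene classes of very different sizes can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `permutation_test`, the choice of the median as the test statistic, and the dN/dS values are all assumptions made for the example.

```python
import random
from statistics import median


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group medians."""
    rng = random.Random(seed)
    observed = median(group_a) - median(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random while keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = median(pooled[:n_a]) - median(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical dN/dS values, for illustration only (not data from the study).
male_biased = [0.42, 0.38, 0.51, 0.47, 0.44]
unbiased = [0.21, 0.25, 0.19, 0.23, 0.22, 0.20,
            0.24, 0.18, 0.26, 0.22, 0.21, 0.23]

observed_diff, p = permutation_test(male_biased, unbiased)
print(f"difference in medians = {observed_diff:.3f}, p = {p:.4f}")
```

      Because the labels are shuffled with the group sizes held fixed, the null distribution reflects the unequal numbers of observations directly, which is the property the reviewer is asking for.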

    3. Reviewer #2 (Public Review):

      Summary:<br /> This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:<br /> The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:<br /> The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

    4. Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      The main limitation of the study is the very low number of samples analyzed, with only three replicate individuals per sex (i.e. the whole study is built on six individuals only). This provides low power to detect differential expression. Along the same line, only three species were used to evaluate the rates of non-synonymous to synonymous substitutions, which also represents a very limited dataset, in particular when trying to fit parameter-rich models such as those implemented here.

      A third limitation relates to the absence of a reference genome for the species, making the use of a de novo transcriptome assembly necessary, which is likely to lead to a large number of incorrectly assembled transcripts. Of course, the production of a reference transcriptome in this non-model species is already a useful resource, but this point should at least be acknowledged somewhere in the manuscript.

      Each of these shortcomings is relatively important, and together they strongly limit the scope of the conclusions that can be made, and they should at least be acknowledged more prominently. The study is valuable in spite of these limitations and the topic remains grossly understudied, so I think the study will be of interest to researchers in the field, and hopefully inspire further, more comprehensive analyses.

    1. eLife assessment

      The increased risk of fracture without decreased bone density in type 2 diabetes (T2D), the so-called "diabetic bone paradox", is mainly attributed to the limitation of assessing risk of fracture based on bone density alone in current practice. We have now learnt that poor bone quality and an increased risk of falling due to concomitant co-morbidities partially explain it. This study presents useful findings that clinical risk factors (though not genetic factors) related to T2D are associated with risk of fracture in T2D patients. The new approach of using Mendelian randomization to explain the relationship between two complex conditions is solid, and the discovery of 10 loci shared between T2D and fracture risk is intriguing. However, including clinically more relevant risk factors for fracture risk in T2D patients in their observational analysis would have strengthened the study.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The manuscript of Zhao et al. aimed at investigating the relationships between type 2 diabetes, bone mineral density (BMD) and fracture risk using a Mendelian Randomization (MR) approach.<br /> The authors found that genetically predicted T2D was associated with higher BMD and lower risk of fracture, and suggested a mediated effect of RSPO3 level. Moreover, when stratified by the risk factors secondary to T2D, they observed that the effect of T2D on the risk of fracture decreased when the number of risk factors secondary to T2D decreased.

      Strengths:<br /> - Important question<br /> - Manuscript is overall clear and well-written<br /> - MR analyses have been conducted properly, which include the usage of various MR methods and sensitivity analyses, and likely meet the criteria of the MR-strobe checklist to report MR results.

      Weaknesses:<br /> - Previous MR studies on that topic have not been discussed<br /> - Multivariable MR could have been used to better assess the mediating effect of BMI or RSPO3 on the relationships between T2D and fracture risk.

    3. Reviewer #2 (Public Review):

      The authors employed the Mendelian Randomization method to analyze the association between type 2 diabetes (T2D) and fracture using the UK Biobank data. They found that "genetically predicted T2D was associated with higher BMD and lower risk of fracture". Additionally, they identified 10 loci that were associated with both T2D and fracture risk, with the SNP rs4580892 showing the highest signal. While the negative relationship between T2D and fracture has been previously observed, the discovery of these 10 loci adds an intriguing dimension to the findings, although the clinical implications remain uncertain.

    1. eLife assessment

      This study reports the fundamental finding of differential expression of key genes in full-term placenta between Tibetans and Han Chinese at high elevations, which are more pronounced in the placentas of male fetuses than in female ones. If confirmed, these results will help us understand how human populations adapt to high elevation by mitigating the negative effects of low oxygen on fetal growth. However, although the differential gene expression reported is solid, the downstream analyses offer only incomplete support for its connection to hypoxia-specific responses, adaptive genetic variation, and pregnancy outcomes.

    2. Reviewer #1 (Public Review):

      In this manuscript, the authors investigate differences between Tibetans and Han Chinese at altitude in terms of placental transcriptomes during full-term pregnancy. Most importantly, they found that the inter-population differentiation is mostly male-specific and the observed direction of transcriptional differentiation seems to be adaptive at high altitude. In general, it is of great importance and provides new insights into the functional basis of Tibetan high-altitude adaptations, which so far have been mostly studied via population genetic measures only. More specifically, I firmly believe that we need more phenotype data (including molecular phenotypes such as gene expression data) to fully understand Tibetan adaptations to high altitude, and this manuscript is a rare example of such a study. I have a few suggestions and/or questions with which I hope to improve the manuscript further, especially in terms of 1) testing if the observed DEG patterns are truly adaptive, and 2) how and whether the findings in this study can be linked to EPAS1 and EGLN1, the signature adaptation genes in Tibetans.

      Major Comments:

      1. The DEG analysis is the most central result in this manuscript, but the discrepancy between sex-combined and sex-specific DEGs is quite mind-boggling. For those that were differentially expressed in the sex-specific sets but not in the sex-combined one, the authors suggest an opposite direction of DE as an explanation (page 11, Figure S5). But Figure S5A does not show such a trend, showing that down-regulated genes in males are mostly not at all differentially expressed in females. Figure S6B does show such a trend, but it doesn't seem to be a dominant explanation. I would like to recommend the authors test alternative ways of analysis to boost statistical power for DEG detection other than simply splitting data into males and females and performing analysis in each subset. For example, the authors may consider utilizing gene-by-environment interaction analysis schemes, with biological sex as an environmental factor.

      2. Please clarify how the authors handled multiple testing correction of p-values.

      3. The "natural selection acts on the placental DEGs ..." section potentially misleads readers into assuming that the manuscript reports evidence for positive selection on the observed DEG pattern between Tibetans and Han, which it does not.<br /> a) Currently the section simply describes an overlap between DEGs and a set of 192 genes likely under positive selection in Tibetans (TSNGs). The overlap is quite small, leading to only 13 genes in total (Figure 6). The authors are currently not providing any statistical measure of whether this overlap is significantly enriched or at the level expected for random sampling.<br /> b) The authors are describing sets of DEGs that seem to affect important phenotypic changes in a consistent and adaptive direction. A relevant form of natural selection for this situation may be polygenic adaptation while the authors only consider strong positive selection at a single variant/gene level.<br /> c) The manuscript is currently providing no eQTL information that can explain the differential expression of key genes. The authors can actually do this based on the genotype and expression data of the individuals in this study. Combining eQTL info, they can set up a test for polygenic adaptation (e.g. Berg and Coop; https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004412). This will provide a powerful and direct test for the adaptiveness of the observed DEG pattern.

      4. The manuscript is currently only minimally discussing how findings are linked to EPAS1 and EGLN1 genes, which show the hallmark signature of positive selection in Tibetans. In fact, the authors' group previously reported male-specific association between EPAS1 SNPs and blood hemoglobin level. Many readers will be intrigued to see a discussion about this point.
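
      One way to quantify the overlap raised in comment 3a is a one-sided hypergeometric test. The sketch below is only an illustration of that idea. The values 192 (TSNGs) and 13 (overlapping genes) come from the comment above, while the background size and the number of DEGs are placeholder assumptions, since neither is stated here; the function name `hypergeom_upper_tail` is likewise hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which belong to the set of interest."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


n_tsng = 192           # genes likely under positive selection (from the review)
n_overlap = 13         # DEGs that are also TSNGs (from the review)
n_background = 15_000  # ASSUMPTION: number of expressed genes tested
n_deg = 500            # ASSUMPTION: total number of DEGs

p_enrichment = hypergeom_upper_tail(n_overlap, n_background, n_tsng, n_deg)
print(f"P(overlap >= {n_overlap}) = {p_enrichment:.4g}")
```

      The result depends strongly on the background set, so the real number of expressed genes and the real DEG count would need to replace the placeholders before any conclusion is drawn.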

    3. Reviewer #2 (Public Review):

      In this manuscript, the authors use newly-generated, large-scale transcriptomic data along with histological data to attempt to dissect the mechanisms by which individuals with Tibetan ancestry are able to mitigate the negative effects of high elevation on birth weight. They present detailed analyses of the transcriptomic data and find significant sex differences in the placenta transcriptome.

      I have significant concerns about the conclusions that are presented. The analyses also lack the information necessary to evaluate their reliability.

      The experimental design does not include a low elevation comparison and thus cannot be used to answer questions about how ancestry influences hypoxia responses and thus birthweight at high elevations. Importantly, because the placenta tissues (and trophoblasts specifically) are quickly evolving, there are a priori good reasons to expect to find population differences irrespective of adaptive evolution that might contribute to fetal growth protection. There are also significant details missing in the analyses that are necessary to substantiate and replicate the analyses presented.

      Although the datasets are ultimately valuable as reference sets, the absence of low elevation comparisons for Tibetans and Han Chinese individuals undermines the ability of the authors to assess whether differences observed between populations are linked to hypoxia responses or variation in the outcomes of interest (i.e., hypoxia-dependent fetal growth restriction).

      The authors attempt to tackle this phenotypic association by looking for correlations between gene networks (WGCNA) and individual genes with birthweight and other measurements collected at birth. I have some reservations about this approach with only two groups (i.e., missing the lowland comparison), but it is further problematic that the authors do not present data demonstrating that there are differences in birthweight or any other traits between the populations in the samples they collected.

      Throughout, I thus find conclusions about the adaptive value and hypoxia-responses made by the authors to be unsubstantiated and/or the data to be inadequate. There are also a gratuitous number of speculative statements about mechanisms by which differential gene expression leads to the protection of birthweight that are not evaluated and thus cannot be substantiated by the data presented.

      As currently presented and discussed, these results thus can only be used to evaluate population differences and tissue-specific variation therein.

      There is also some important methodological information missing that makes it difficult or impossible to assess the quality of the underlying data and/or reproduce the analyses, further limiting the potential impact of these data:<br /> 1. Transcriptome data processing and analyses: RNA quality information is not mentioned (i.e., RIN). What # of reads are mapped to annotated regions? How many genes were expressed in each tissue (important for contextualizing the # of DE genes reported - are these a significant proportion of expressed genes or just a small subset?).<br /> 2. The methods suggest that DE analyses were run using data that were normalized prior to reading them into DESeq2. DESeq2 has an internal normalization process and should not be used on data that were already normalized. Please clarify how and when normalization was performed.<br /> 3. For enrichment analyses, the background gene set (all expressed genes? all genes in the genome? or only genes expressed in the tissue of interest?) has deterministic effects on the outcomes. The background sets are not specified for any analyses.<br /> 4. In the WGCNA analysis, P-values for correlations of modules with phenotype data (birthweight etc.) should be corrected for multiple testing (i.e., running the module correlation for each outcome variable) and p.adjust used to evaluate associations to limit false positives given the large number of correlations being run.<br /> 6. The plots for umbilical histological data (Fig 5 C) contain more than 5 points, but the use of replicate sections is not specified. If replicate sections were used, the authors should control for non-independence of replicate sections in their analyses (i.e., random effects model).
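
      The multiple-testing correction requested in point 4 can be illustrated with the Benjamini-Hochberg procedure, which is the calculation R's `p.adjust` performs with `method = "BH"`. This is a minimal sketch under assumptions: the review does not say which correction method is intended, the function name `benjamini_hochberg` is hypothetical, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical module-trait correlation p-values, for illustration only.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
for raw, adj in zip(raw_p, adj_p):
    print(f"raw = {raw:.3f}  adjusted = {adj:.5f}")
```

      In this invented example the third and fourth p-values are below 0.05 before correction but not after it, which is the kind of false positive the correction is meant to guard against.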

      On more minor notes:<br /> There is significant and relevant published data on sex differences and hypoxia in rodents (see Cuffe et al 2014, "Mid- to late-term hypoxia in the mouse alters placental morphology, glucocorticoid regulatory pathways, and nutrient transporters in a sex-specific manner" and review by Siragher and Sferuzzi-Perro 2021, "Placental hypoxia: What have we learnt from small animal models?"), and historical work reporting sex differences in placental traits associated with high elevation adaptation in Andeans (series of publications by Moira Jackson in the late 1980s, reviewed in Wilsterman and Cheviron 2021, "Fetal growth, high altitude, and evolutionary adaptation: A new perspective").

    4. Reviewer #3 (Public Review):

      More than 80 million people live at high altitude. This impacts health outcomes, including those related to pregnancy. Longer-lived populations at high altitudes, such as the Tibetan and Andean populations show partial protection against the negative health effects of high altitude. The paper by Yue sought to determine the mechanisms by which the placenta of Tibetans may have adapted to minimise the negative effect of high altitude on fetal growth outcomes. It compared placentas from pregnancies from Tibetans to those from the Han Chinese. It employed RNAseq profiling of different regions of the placenta and fetal membranes, with some follow-up of histological changes in umbilical cord structure and placental structure. The study also explored the contribution of fetal sex in these phenotypic outcomes.

      A key strength of the study is the large sample sizes for the RNAseq analysis, the analysis of different parts of the placenta and fetal membranes, and the assessment of fetal sex differences.

      A main weakness is that this study, and its conclusions, largely rely on transcriptomic changes informed by RNAseq. Changes in genes and pathways identified through bioinformatic analysis were not verified by alternate methods, such as by western blotting, which would add weight to the strength of the data and its interpretations. There is also a lack of description of patient characteristics, so the reader is unable to make their own judgments on how placental changes may link to pregnancy outcomes. Another weakness is that the histological analyses were performed on n=5 per group and were rudimentary in nature.

    1. eLife assessment

      This work presents valuable information about the specificity and promiscuity of toxic effector and immunity protein pairs. The evidence supporting the claims of the authors is currently incomplete, as there is concern about the methodology used to analyze protein interactions, which did not take potential differences in expression levels, protein folding, and/or transient interaction into account. Other methods to measure the strength of interactions and structural predictions would improve the study. The work will be of interest to microbiologists and biochemists working with toxin-antitoxin and effector-immunity proteins.

    1. Reviewer #2 (Public Review):

      In this manuscript, Xie and colleagues investigate the contribution of osteocytes to bone metastasis of non-small cell lung carcinoma (NSCLC) using a combination of clinical samples and in vitro and in vivo data. They find that metastatic NSCLC cells exhibit lower levels of the proliferation marker Ki-67 when located in areas adjacent to the bone surface in both NSCLC patients and an intraosseous animal model of NSCLC. Using in vitro approaches, they show that osteocyte-like cells inhibit the proliferation of NSCLC cells through the secretion of small extracellular vesicles (sEVs). They identify miR-99b-3p as a component of sEVs and demonstrate that miR-99b-3p inhibits the proliferation of NSCLC cells by targeting the transcription factor MDM2. Interestingly, the data also show that mechanical stimulation of osteocytes enhances the inhibitory effect of osteocytes on NSCLC cell proliferation by increasing sEV release. By performing different in vivo studies, the authors show that tibial loading and moderate exercise (treadmill running), before and after tumor cell inoculation, suppress tumor progression in bone and protect bone mass. Intriguingly, the moderate exercise regime shows additive/synergistic effects with the co-administration of anti-resorptive therapy. These data add to the growing evidence pointing towards osteocytes as important cells of the tumor microenvironment capable of influencing the progression of tumors in bone.

      The conclusions of the paper, however, are not well supported by the data, and some critical aspects of image analysis and data analysis need to be clarified and extended.

      1) The histological images are analyzed in a qualitative manner, with no description of the methodology used. In bone metastases, cancer cells are frequently mixed with bone marrow cells. The lack of cell markers to identify NSCLC cells versus bone marrow cells makes the interpretation of the imaging data difficult. The authors rely on Ki-67 as a marker of proliferation. Yet, it is intriguing that some osteocytes, non-proliferating cells by definition, are often positive for this marker, which calls the specificity of the staining into question. To make these results more solid, the authors should have provided the proper immunostaining controls to check for specificity and used additional markers of proliferation.

      2) Adding control groups to fully assess the impact of the in vivo interventions (tibial loading, moderate exercise, anti-resorptive therapy) on bone mass would be needed. The authors should have used naive mice or analyzed the bones from the non-injected contralateral legs. Further, validating the in vivo work with other osteocyte-like cells or primary osteocytes would have strengthened the results.

      3) The data on miRNA99b-3p on NSCLC in Supplementary Figure 3 are not convincing. The positive cells are difficult to see and most of the osteocytes lack nuclei. Better data, in humans and the mouse model, would have helped to confirm that osteocytes produce miRNA99b-3p.

      4) The conclusions of the paper are not fully supported by the data provided. Osteocytes, as well as other bone cells, can respond to mechanical stimulation and thus could virtually be responsible for the protective effects of mechanical loading or moderate exercise. In vivo experiments demonstrating a direct role of osteocyte-produced miRNA99b-3p are needed to support the notion that osteocytes maintain tumor dormancy in NSCLC bone metastasis. Further, the authors rely solely on Ki-67 as a marker of dormancy. Completing this analysis with an assessment of a dormant gene expression signature or in vivo studies assessing tumor dormancy directly would be needed to confirm this notion.

    2. eLife assessment

      This study describes a potential role for mechanical stimulation on non-small cell lung cancer (NSCLC) development. The findings are important and their observations are interesting, as models to study exercise in a mouse cancer setting would have practical implications beyond lung cancer, and biological roles of osteocytes in bone metastatic cancer is an area of great interest for potential therapy development. However, the methods and data interpretation are incomplete and the claims of osteocytes inducing tumor dormancy are overstated. The mechanism by which osteocytes affect tumor cells is not clear and the authors' theory on this remains unproven since much of the data are correlative rather than causative, and adequate controls, data quantification, and confirmation with secondary cell lines are often lacking.

    3. Reviewer #1 (Public Review):

      Xie and colleagues propose here to investigate the mechanism by which exercise inhibits bone metastasis progression. The authors describe that osteocytes, sensing mechanical stimulation generated by exercise, inhibit NSCLC cell proliferation and sustain the dormancy thereof by releasing sEVs with tumor suppressor microRNAs. Furthermore, mechanical loading of the tibia inhibited the bone metastasis progression of NSCLC. Interestingly, exercise preconditioning effectively suppressed bone metastasis progression.

    1. eLife assessment

      This valuable study is focused on the requirement for the photoreceptor-specific tetraspanins, ROM1 and PRPH2, in the formation of the light-sensitive membrane discs. The evidence supporting the claim that deficiency in one of the proteins can be compensated by the other is convincing, with both established and modern techniques yielding results that will be of interest to those studying photoreceptor development and membrane curvature.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The precise mechanism of how tetraspanin proteins engage in the generation of discs is still an open question in the field of photoreceptor biology. This question is of significance as the lack of photoreceptor discs or defects in disc morphogenesis due to mutations in tetraspanin proteins is a known cause of vision loss in humans. The authors of this study combine TEM and mouse models to tease out the role of tetraspanin proteins, peripherin, and Rom1 in the genesis of the photoreceptor discs. They show that the absence of Rom1 leads to an increase in peripherin and changes in disc morphology. Further rise in peripherin alleviates some of the defects observed in Rom1 knockout animals leading to the conclusion that peripherin can substitute for the absence of Rom1.

      Strengths:<br /> A mouse model of Rom1 generated by the McInnes group in 2000 predicted a role for Rom1 in rim closure. They also showed enlarged discs in the absence of Rom1. This study confirmed this finding and showed the compensatory changes in peripherin, maintaining the total levels of tetraspanin proteins. Lack of Rom1 leads to an excess of open discs, demonstrated by darkly stained tannic acid-accessible areas in TEM. Interestingly, increased peripherin expression can rescue some morphological defects, including maintaining normal disc diameters and incisures. Overall, these observations lead the authors to propose a model in which ROM1 can be replaced by peripherin.

      Weaknesses:<br /> The compensatory increase in peripherin and morphological rescue in the absence of ROM1 is expected, given the previous work from the authors showing i) that the absence of peripherin increases ROM1 and ii) "Eliminating Rom1 also increased levels of Prph2/RRCT: mean Prph2/RRCT levels in P30 Prph2+/R retinas were 34% of WT, while levels in Prph2+/R/Rom1−/− retinas were 59% of WT" from Conley, 2019. The current study provides a comprehensive quantitative analysis. However, the mechanism behind this compensation is unclear and warrants discussion.

      Photoreceptor morphology appears better when peripherin is overexpressed. Is there a rescue of rod function (assessed by ERG or equivalent measures) in peripherin OE/Rom1-/- mice? Given the extensive work in this area and the implications the authors allude to at the end, it is important to investigate this aspect.

    3. Reviewer #2 (Public Review):

      In this study, Lewis et al seek to further define the role of ROM1. ROM1 is a tetraspanin protein that oligomerizes with another tetraspanin, PRPH2, to shape the rims of the membrane discs that comprise the light-sensitive outer segment of vertebrate photoreceptors. ROM1 knockout mice and several PRPH2 mutant mice are reexamined. The conclusion reached is that ROM1 is redundant to PRPH2 in regulating the size of newly forming discs, although excess PRPH2 is required to compensate for the loss of ROM1.

      This replicates earlier findings while adding rigor using a mass spectrometry-based approach to quantitate the ratio of ROM1 and PRPH2 to rhodopsin (the protein packed in the body of the disc membranes) and careful analysis of tannic acid labeled newly forming discs using transmission electron microscopy.

      In ROM1 knockout mice PRPH2 expression was found to be increased so that the level of PRPH2 in those mice matches the combined amount of PRPH2 and ROM1 in wildtype mice. Despite this, there are defects in disc formation that are resolved when the ROM1 knockout is crossed to a PRPH2 overexpressing line. A weakness of the study is that the molar ratios between ROM1, PRPH2 and rhodopsin were not measured in the PRPH2 overexpressing mice. This would have allowed the authors to be more precise in their conclusion that a 'sufficient' excess of PRPH2 can compensate for defects in ROM1.

    4. Reviewer #3 (Public Review):

      In this manuscript, Lewis et al. investigate the role of tetraspanins in the formation of discs, the key structure of vertebrate photoreceptors essential for light reception. Two tetraspanin proteins play a role in this process: PRPH2 and ROM1. The critical contribution of PRPH2 has been well established; loss of its function is not tolerated and results in gross anatomical pathology and degeneration in both mice and humans. However, the role of ROM1 is much less understood and has been considered somewhat redundant. This paper provides a definitive answer to the long-standing uncertainty regarding the contribution of ROM1, firmly establishing its role in outer segment morphogenesis. First, using an ingenious quantitative proteomic technique, the authors show a compensatory increase in PRPH2 in the ROM1 knockout, explaining the redundancy of its function. Second, they uncover that despite this compensation, ROM1 is still needed, and its loss delays disc enclosure and results in the failure to form incisures. Third, the authors use a transgenic mouse model to show that deficits seen in the ROM1 KO can be completely compensated by the overexpression of PRPH2. Finally, they analyze yet another mouse model based on double manipulation, with both ROM1 loss and expression of a PRPH2 mutant unable to form dimerizing disulfide bonds, further arguing that PRPH2-ROM1 interactions are not required for disc enclosure. To top it off, the authors complement their in vivo studies with a series of biochemical assays done upon reconstitution of tetraspanins in transfected cultured cells as well as fractionations of native retinas. This report is timely, addresses significant questions in the cell biology of photoreceptors, and pushes the field forward in a classical area of photoreceptor biology and in the mechanics of membrane structure as well. The manuscript is executed at the top level of technical standard, is exceptionally well written, and does not leave much more to desire.
It also pushes the standards of the field: one such domain is the quantitative approach to the analysis of EM images, which is notoriously open to alternative interpretations, yet this study does an exceptional job of unbiasing this approach.

      According to my expertise in photoreceptor biology, there is nothing wrong with this manuscript either technically or conceptually and I have no concerns to express.

    1. eLife assessment

      This is a valuable paper demonstrating validity of a novel task that could advance the field of reinforcement learning to better incorporate threat processing in approach-avoidance-conflict. A compelling methodology includes the use of online samples and computational modelling, psychometrics, discovery/replication and pre-registration. This work provides a foundation for future work, which is required to address potential confounds and establish this task as relevant to psychopathology and treatment.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors explored the benefits of intermittent fasting on cardiac physiology through a multi-omics approach and compared different fasting regimens (IF12, IF16, and EOD) over a duration of 6 months. Combining RNA-sequencing, proteomics, and phosphoproteomics analyses, the authors made the interesting observation that different fasting times lead to different changes that could be important for cardiac physiology. Moreover, the changes observed at the transcriptional level differ from those at the protein level, suggesting a post-transcriptional regulation mechanism. Using western blots, the authors confirmed that key signaling pathways, including the AMPK and IRS pathways, are significantly altered upon intermittent fasting for 16 hours. Lastly, as a proof of concept for better cardiac function, the animals were challenged with dobutamine and echocardiography was performed to show that mice subjected to intermittent fasting have better cardiac systolic function.

      The impact of intermittent fasting on cardiovascular health has been well characterized in several studies. This report appears to be the first to use a multi-omics approach; it provides an interesting dataset at the transcriptome, proteome, and phosphoproteome levels and would serve as a valuable data resource for the field. I have the following concerns:

      Major concerns:

      1) The rationale for choosing the intermittent fasting pattern and timing. While 16:8 intermittent fasting is relatively standard, what is the rationale for testing IF 12 hours, given that a 4-hour fasting difference might not cause dramatic changes in the transcriptome and proteome? Also, what is the rationale for performing a 6-month study? The dobutamine stress test is not a terminal procedure; have the authors examined cardiac function prior to 6 months to see whether there is a difference?

      We sincerely thank the reviewer for providing insightful comments and feedback on our study. The aim of our research is to gain a comprehensive understanding of molecular reprogramming in the heart during intermittent fasting using multi-omics techniques. We acknowledge the reviewer's concern regarding the selection of three different time points for intermittent fasting. Our rationale for choosing these time points was to align with the practices commonly used by researchers in the field. By doing so, we intended to explore and compare the effects of different intermittent fasting regimens on the heart. Through our study, we found that a longer fasting period resulted in the most significant changes in proteome abundance. Although we agree that a 4-hour fasting difference may not significantly alter the transcriptome and proteome in terms of expression, remarkable changes in post-translational modifications such as phosphorylation can occur during shorter time periods, and this is evident from the analyses of the modulated phosphoproteins. Hence, we also included the 12-hour time point in our analysis. In fact, we would like to emphasize that all three fasting regimens had notable effects on pathways regulating cellular carbohydrate, lipid, and protein metabolism, cell-cell interactions, and myocardial cell contractility. Regarding the duration of our study, we opted for a 6-month duration of intermittent fasting to investigate the impact of chronic intermittent fasting on heart transcriptome and proteome changes. While shorter-term (2-3 months) intermittent fasting studies in animals have also shown beneficial effects, we wanted to delve deeper into the molecular alterations induced by long-term intermittent fasting. We acknowledge the reviewer's observation about the dobutamine stress test not being a terminal procedure. 
In our manuscript, we aimed to present extensive resource data offering molecular insights into intermittent fasting-induced structural and signaling changes in the heart, focusing on various intermittent fasting time intervals. Additionally, we included the effect of intermittent fasting on cardiac function, specifically examining the intermittent fasting 16 hours (IF16) group, and highlighted key pathway modulations at this time point as supporting evidence. We appreciate the reviewer’s concern about examining cardiac function prior to 6 months. Although we did not perform this analysis in the current study, we fully agree that such a comparison is required for a better understanding of the temporal effects of molecular pathways in relation to heart function during the course of intermittent fasting.

      2) Lack of validation study. One interesting observation from this study is the changes of transcriptome does not reflect all the changes at protein level and there is a differential gene expression pattern in IF12, IF16 and EOD. If this is the case, the authors should select a few important targets and provide both mRNA and protein level analysis, as a proof of concept for the bioinformatics analysis accuracy.

      We appreciate the reviewer's attention to the comparison of proteome and transcriptome data across different intermittent fasting regimens, as well as their interest in understanding any specific deviations in dietary regimens or sets of proteins. Indeed, it is well-established that post-transcriptional regulation can lead to discrepancies between mRNA and protein levels, primarily due to translational control or protein degradation mechanisms. Posttranscriptional buffering of proteins, particularly enzymes and kinases, is a plausible explanation, given their regulation through post-translational modifications, such as phosphorylations or allosteric regulations. Despite observing a modest correlation between the proteome and transcriptome data, which is generally common, we did identify certain enzymes, such as HMGC2, PDK4ACOT, CLPX, and RNASE4, with a high level of concordance between protein and mRNA abundances. These instances of agreement between the two data types suggest a coordinated regulation of these enzymes at the transcriptional and translational levels during intermittent fasting. To facilitate a clearer understanding of the correlation between proteome and transcriptome data, we have included correlation levels next to the scatter plots in our manuscript. These annotations aim to provide additional insights and aid readers in assessing the relationship between the two datasets.

      3) Poor western blot image quality. The quality of the western blot has several issues: a. the change of pAMPK/AMPK appears to be a decrease of total AMPK instead of change at p-AMPK level. Same with GSK3a/b. There appears to be an increase of total GSK3a/b. The AKT should also be blotted and quantified at phosphorylation level. The western blot should be clearly labeled, for the ones with double bands, including GSK3a/b, the author should clearly label which is GSK3a and which is GSK3b. For the IRS with non-specific band, the author should point out IRS-1 band itself.

      We appreciate the reviewer's careful evaluation of our study and acknowledge the concerns raised regarding the quality of the western blot images. Despite revising these experiments multiple times, we acknowledge that the immunoblot images may not meet the highest quality standards. We have included the original immunoblots in the supplementary section to ensure transparency and provide additional data for reference.

      Reviewer #2 (Public Review):

      This study provides an unbiased characterization of the cardiac proteome in the setting of intermittent fasting. The findings constitute a resource of quantitative proteomic data that sheds light on changes in cardiac function due to diet and that may be used in the future by other investigators. There are a number of key missing details that limit interpretation or present opportunities to strengthen the study.

      1) For example, the authors find that apolipoproteins are altered with fasting but it is not clear whether this is a contribution of myocardial tissue changes or systemic effects spilling into blood in cardiac tissues.

      We appreciate the reviewer's consideration of the potential effect of blood spilling into cardiac tissue on our study results. While we agree that such an effect is possible, we would like to emphasize that the observed overall changes in the proteome profile, particularly in pathways regulating metabolism and other cardiac remodeling-associated processes, suggest that the alterations we observed are more likely attributable to changes within the myocardial tissues themselves. We would like to highlight that blood microparticles and extracellular proteins were not enriched in our proteome data, and hence the impact of blood spilling is not a concern. In fact, the biological processes we observed were mainly associated with ECM receptor interaction, focal adhesion, and signaling pathways, which are not typical of a secreted or extracellular proteome resulting from blood leakage.

      2) Some statements in the text like "Approximately one-third of the differentially expressed proteins in IF groups compared to AL were enzymes with catalytic activity involved in energy homeostasis pathways" do not appear to be supported by data.

      The enzymes among all the differentially expressed proteins in the intermittent fasting (IF) groups compared to the ad libitum (AL) control group are indicated in Supplementary Table S2. This constitutes one-third of the total number of differentially expressed proteins and several of these are involved in metabolic and energy homeostasis pathways.

      3) It is not clear how the list of Kinases were generated for Figure 1B.

      For the kinases indicated in Figure 1B, all the kinases among the proteins that were differentially expressed in the different dietary regimens compared to the control ad libitum (AL) group were first identified (listed in Supplementary Table S2). This was followed by enrichment analysis (FDR ≤ 0.05) of the identified kinases across KEGG pathways obtained from the DAVID bioinformatics resources.
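
      To make the FDR ≤ 0.05 criterion above concrete, the following is a minimal sketch of a generic over-representation test followed by Benjamini-Hochberg correction. It is not the authors' pipeline and it does not reproduce DAVID's own statistics, which may differ; the function names and the example p-values are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N proteins are drawn from a background of M,
    of which n belong to the pathway (one-sided over-representation)."""
    total = comb(M, N)
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-pathway p-values (illustrative numbers, not study data).
pathway_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(pathway_p)
significant = [q <= 0.05 for q in adjusted_p]
```

      In this sketch a pathway would be reported only if its adjusted p-value is at or below 0.05, which is one common reading of an FDR threshold.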

      4) Changes in chromatin or gene expression are not measured so the conclusion that EOD led to 'epigenetic changes' relative to IF16 is not well supported.

      We appreciate the reviewer's feedback. Our statement in the manuscript referred specifically to the changes observed in Figure 2, where we presented increased proteomic abundance in pathways related to chromatin remodeling, chromatin organization, gene expression regulation, and histone modification in the EOD (Every Other Day Fasting) group compared to the IF 16 (Intermittent Fasting for 16 hours) group, based on functional process and pathway enrichment analysis. Our comprehensive bioinformatics analysis, depicted in Figure 2, provides intriguing insights into these pathways. We acknowledge that further validation and in-depth studies through additional experiments and functional assays are essential to strengthen the conclusions drawn from such observations, but these are beyond the scope of the current study. We thank the reviewer for these valuable suggestions, which are very useful for our ongoing studies, where we aim to obtain a more robust and thorough understanding of the impact of intermittent fasting on chromatin-related processes.

      5) There are also a number of areas where the text is vague. For example, it is not clear what is meant by 'trend shift' when discussing EOD results, and Figure 3 generally could use additional information to help the reader better understand the figures.

      We would like to clarify that the term 'trend shift' refers to the change in the direction of protein and transcript level alterations. Based on the 2-D enrichment analyses that revealed correlated and non-correlated functional processes at the proteome and transcriptome levels, it was evident that during the early intermittent fasting 12 hours (IF12) regimen, the abundance changes of the proteins and transcripts involved in these processes were altered in the same direction (Supplementary Fig. 4b). Nevertheless, with increased fasting hours, mainly in the Every Other Day Fasting (EOD) group, we observed that the levels of proteins and transcripts involved in several of the functional processes appeared to be non-correlated as compared to the IF12 group (Fig. 2d). In Figure 3, we summarize the overall altered protein networks associated with the different intermittent fasting regimens, highlighting densely connected clusters of proteins along with their associated biological processes and pathways. Additionally, we unravel the impact of intermittent fasting on transcriptional rewiring and highlight regimen-specific alterations of specific transcriptional factors, several of which were found to have metabolic implications.

      6) An interesting finding is that the IF16 groups showed cardiac hypertrophy (SFig 11b). This is potentially a novel finding and the text should elaborate more on this phenomenon.

      We sincerely thank the reviewer for bringing attention to this intriguing aspect of our study. The data you have highlighted warrants further investigation, and we are committed to delving deeper into this area in our future research.

    2. Reviewer #1 (Public Review):

      In this manuscript, the authors explored the benefits of intermittent fasting on cardiac physiology through a multi-omics approach and compared different fasting regimens (IF12, IF16, and EOD) over a duration of 6 months. Combining RNA-sequencing, proteomics, and phosphoproteomics analyses, the authors made the interesting observation that different fasting times lead to different changes that could be important for cardiac physiology. Moreover, the changes observed at the transcriptional level differ from those at the protein level, suggesting a post-transcriptional regulation mechanism. Using western blots, the authors confirmed that key signaling pathways, including the AMPK and IRS pathways, are significantly altered upon intermittent fasting for 16 hours. Lastly, as a proof of concept for better cardiac function, the animals were challenged with dobutamine and echocardiography was performed to show that mice subjected to intermittent fasting have better cardiac systolic function.

      The impact of intermittent fasting on cardiovascular health has been well characterized in several studies. This report appears to be the first to use a multi-omics approach; it provides an interesting dataset at the transcriptome, proteome, and phosphoproteome levels and would serve as a valuable data resource for the field. I have the following concerns:

      Major concerns:

      1) The rationale for choosing the intermittent fasting pattern and timing<br /> While 16:8 intermittent fasting is relatively standard, what is the rationale for testing IF 12 hours, given that a 4-hour fasting difference might not cause dramatic changes in the transcriptome and proteome? Also, what is the rationale for performing a 6-month study? The dobutamine stress test is not a terminal procedure; have the authors examined cardiac function prior to 6 months to see whether there is a difference?

      2) Lack of validation study<br /> One interesting observation from this study is the changes of transcriptome does not reflect all the changes at protein level and there is a differential gene expression pattern in IF12, IF16 and EOD. If this is the case, the authors should select a few important targets and provide both mRNA and protein level analysis, as a proof of concept for the bioinformatics analysis accuracy.

      3) Poor western blot image quality<br /> The quality of the western blot has several issues: a. the change of pAMPK/AMPK appears to be a decrease of total AMPK instead of change at p-AMPK level. Same with GSK3a/b. There appears to be an increase of total GSK3a/b. The AKT should also be blotted and quantified at phosphorylation level. The western blot should be clearly labeled, for the ones with double bands, including GSK3a/b, the author should clearly label which is GSK3a and which is GSK3b. For the IRS with non-specific band, the author should point out IRS-1 band itself.

    3. eLife assessment

      This study provides a useful catalog of the cardiac proteome and transcriptome in response to intermittent fasting. Although mechanistic integration is limited, the technical aspects have been executed in a solid way, and sufficient evidence is provided to support the main conclusions. Future work can build on this study to expand our understanding of the relationship between dietary perturbations and cardiac function.

    1. eLife assessment

      This valuable study uses adult and neonatal murine models, together with genetic approaches, to propose that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling in Th2 cells. While vitamin D has been previously thought to modulate both effector and regulatory T-cell populations via the control of IL-2 signalling, this study provides solid new data of interest to immunologists as well as asthma researchers.

    2. Reviewer #1 (Public Review):

      The association of vitamin D supplementation with reduced asthma risk is well studied, although its mechanistic basis remains unanswered. In the presented study, Kilic and co-authors aim to dissect the pathway of vitamin D-mediated amelioration of allergic airway inflammation. They use initial leads from bioinformatic approaches, which they then associate with results from a clinical trial (VDAART) and validate using experimental approaches in murine models. The authors identify a role of VDR in inducing the expression of the key regulator Ikzf3, which possibly suppresses the IL-2/STAT5 axis, consequently blunting the Th2 response and mitigating allergic airway inflammation.

      Strengths:<br /> The major strength of the paper lies in its interdisciplinary approach, right from hypothesis generation, and linkage with clinical data, as well as in the use of extensive ex vivo experiments and in vivo approaches using knock-out mice.

      The study presents some interesting findings including an inducible baseline absence/minimal expression of VDR in lymphocytes, which could have physiological implications and needs to be explored in future studies.

      Weaknesses:<br /> The core message of the study relies on the role of vitamin D and its receptor in suppressing the Th2 response. However, there is scope for further dissection of relevant pathophysiological parameters in the in vivo experiments, which would enable stronger translation to allergic airway diseases like asthma.

      To a large extent, the authors have been successful in validating their results, although a few inferences could be reinforced with additional techniques, or emphasised in the discussion section (possibly utilising the ideas and speculative section offered by the journal).

      The study inferences also need to be read in the context of the different sub-phenotypes and endotypes of asthma, where the Th2 response may not be predominant. Moreover, the authors have referenced vitamin D doses for the murine models from the VDAART trials and performed the experiments in the second generation of animals. While this is appreciated, the risk of hypervitaminosis D cannot be ignored, in view of its lipid solubility. Comparing and justifying the doses used in the murine experiments against previous literature, and including a more prominent discussion of the side effects and toxicity of vitamin D, are important aspects to consider.

      In no way do the above considerations undermine the importance of this elegant study, which justifies trials of vitamin D supplementation and its effects on asthma. The work possesses tremendous potential.

    3. Reviewer #2 (Public Review):

      Summary:<br /> This study seeks to advance our knowledge of how vitamin D may be protective in allergic airway disease in both adult and neonatal mouse models. The rationale and starting point are important human clinical, genetic/bioinformatic data, with a proposed role for vitamin D regulation of 2 human chromosomal loci (Chr17q12-21.1 and Chr17q21.2) linked to the risk of immune-mediated/inflammatory disease. The authors have made significant contributions to this work specifically in airway disease/asthma. They link these data to propose a role for vitamin D in regulating IL-2 in Th2 cells implicating genes associated with these loci in this process.

      Strengths:<br /> Here the authors draw together evidence from multiple lines of investigation to propose that amongst murine CD4+ T cell populations, Th2 cells express high levels of VDR, and that vitamin D regulates many of the genes on the chromosomal loci identified to be of interest in these cells. The bottom line is the proposal that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling in Th2 cells. This is a novel concept, and whilst the availability of IL-2 and the control of IL-2 signalling are generally thought to play a role in the capacity of vitamin D to modulate both effector and especially regulatory T cell populations, this study provides new data.

      Weaknesses:<br /> Overall, this is a highly complicated paper with numerous strands of investigation, methodologies etc. It is not easy to follow the logic between each series of experiments, and the fine detail of many of the experimental systems used (too numerous to list) is frequently missing, which will likely frustrate immunologists interested in this. There is already extensive scientific literature on many aspects of the work presented, much of which is not acknowledged. For example, reports on the effects of vitamin D on Th2 cells are highly contradictory, especially in vitro, even though most studies agree that in vivo effects are largely protective. Similarly, other reports on adult and neonatal models of vitamin D and modulation of allergic airway disease are not referenced. In summary, the data presentation is unwieldy, with numerous supplementary additions, which makes the data difficult to evaluate and causes the central message to be lost. Whilst there are novel data of interest to the vitamin D and wider community, this manuscript would benefit from editing to make it much more readily accessible to the reader.

      Wider impact: Strategies to target the IL-2 pathway have long been considered and there is a wealth of knowledge here in autoimmune disease, transplantation, GvHD etc - with some great messages pertinent to the current study. This includes the use of IL-2, including low dose IL-2 to boost Treg but not effector T cell populations, to engineered molecules to target IL-2/IL-2R.

    1. Reviewer #2 (Public Review):

      Summary:

      This work presents a previously undescribed neuroanatomical and neurophysiological analog between mammals and songbirds. Juvenile zebra finches learn to sing by memorizing an adult song and then, through practice, converging to a close copy of the stored template. Previous work identified pathways emanating from the avian auditory cortical regions (AIV) and basal ganglia that, through the ventral pallidum (VP) and the subthalamic nucleus, innervate the finches' ventral tegmental area (VTA). As in mammals, the dopaminergic projections of the VTA onto the avian striatopallidal nucleus, Area X, deliver a prediction error signal. This signal encodes a surprisingly better or worse performance of the ongoing song and therefore allows the birds to improve.

      In mammals, lateral habenula (LHb) neurons contribute to learning by signaling disappointing trial outcomes or aversive stimuli. Using viral tract tracing, Roeser et al. identify projections from the zebra finch VP and AIV to the LHb as well as from the LHb to the VTA. The authors use functional mapping to show that the VP activates the LHb and that the LHb suppresses the Area X-projecting VTA neurons. Then, the authors show that lesioning the LHb in juvenile finches does not prevent them from copying their tutor's song but still leads to worse performance than controls due to the production of highly abnormal vocalizations, peppered in both lone and female-directed songs. In contrast, lesioning the LHb in adult finches has no effect on the song. Together, these findings suggest that the LHb may be part of a song evaluation system and may participate in learning by signaling vocalizations that deviate from the desired tutor template.

      The LHb is an evolutionarily conserved structure that connects the forebrain and midbrain with the epithalamus in vertebrates. By identifying the LHb as a component in song learning, the authors lay the grounds for a trove of new research into the various emotional, biophysical, memory, and sensory processes that contribute to learning within and through the LHb. Most conclusions of this paper are well supported by data, but some conceptual and analytic aspects require framing with respect to methodological limitations.

      Strengths:<br /> The use of both anatomical tracing and functional circuit mapping is a uniquely powerful approach to addressing the main line of inquiry in this work. Specifically, collision testing and antidromic identification allow identifying LHb-->VTA and VTA-->X projecting neurons and therefore testing the response of these specific learning-related projections to stimulation in VP and LHb (respectively).

      The evaluation of abnormal vocalizations using a variational autoencoder (VAE) is a particularly strong approach that is immune to observer biases. By training this artificial neural network model with sham or pre-lesion animals, the authors clearly distinguish abnormal syllables because of their significantly poorer reconstruction through the VAE. This approach allowed the authors to provide strong quantitative support to the effect of LHb lesion in juvenile finches on their adult song.
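
      As a rough illustration of the reconstruction-error criterion described above, the sketch below flags syllables whose reconstruction error exceeds a cutoff derived from control (sham or pre-lesion) data. It is not the authors' analysis: the error values, the 95th-percentile cutoff, and the function names are assumptions made for illustration, and a real analysis would obtain the errors from a trained VAE.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def flag_abnormal(control_errors, test_errors, q=95.0):
    """Mark test syllables whose error exceeds the control q-th percentile."""
    threshold = percentile(control_errors, q)
    return [err > threshold for err in test_errors]


# Hypothetical reconstruction errors (arbitrary units, not study data).
control = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11, 0.10, 0.14]
lesion = [0.11, 0.35, 0.12, 0.50]
flags = flag_abnormal(control, lesion)
```

      Because the cutoff is set from control data alone, the labelling of a syllable as abnormal does not depend on an observer's judgment of the lesioned birds' songs.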

      Weaknesses:<br /> The lesions in juveniles, as the authors discuss, were histologically examined at the end of the song development, months after their creation. The authors mention not being able to rule out damage to the medial part of the Hb. But the effect of the lesions could perhaps be mediated by damage to other brain regions, such as DLM, or passing fibers (when using electrolytic lesions).

      Additionally, the effect on learning could also be mediated indirectly. In mammals, the outputs of the LHb target dopaminergic regions, serotonergic regions, and a cholinergic region. In birds, the LHb may also have a diverse impact on neuromodulators and therefore an impact on behavior states and on sleep. Disrupted behavior states may lead to poorer or less frequent practice and indirectly to abnormal results that do not stem from erroneous performance evaluation.

    2. eLife assessment

      The authors provide the first investigation of the role of the lateral habenula in vocal learning in the songbird. This study provides important insights into the conserved connectivity of the lateral habenula with dopaminergic reinforcement circuits and presents a potential role of this circuit in zebra finch song learning. The results stem from a careful anatomical and functional mapping and from a rigorous behavior analysis that, together, implicate a previously undescribed analog between mammals and songbirds. Although many aspects of the manuscript - like the analysis of song behavior - are exceptional, the evidence linking behavior to selective lesions of the lateral habenula is, at this point, incomplete, leaving the interpretation of key results difficult.

    3. Reviewer #1 (Public Review):

      Reinforcement mechanisms play a central role in learning structured behaviors, and recent studies in the songbird have shown that reinforcement learning is also integral to the imitation of the internally motivated singing behavior of songbirds. In this study, Roeser, Teoh et al. investigate the role of the lateral habenula in this process. The lateral habenula is thought to signal unexpected aversive outcomes, like reward omission, and inhibit dopaminergic neurons in the ventral tegmental area (VTA) via direct synaptic projections. Thus, the lateral habenula could logically play a key role in the trial-and-error learning of song by signaling worse performance outcomes (as evaluated by comparing to a memory of the tutor song) as birds practice copying their father's song.

      The authors show that both the anatomical and functional connectivity of the lateral habenula in songbirds resembles what has been described in other vertebrates, including in afferent inputs from the ventral pallidum and efferent projections to the VTA that suppresses activity of putative dopaminergic neurons. Additionally, they show the lateral habenula circuits appear to be integrated with circuits known to be important for learning song, including receiving input from an auditory region, AIV, thought to be important in relaying song evaluation signals and providing inputs to VTA that overlap with neurons projecting to areas of the striatum essential for vocal learning (VTA-Area X neurons). They conclude that lesions of the lateral habenula early in song development do not disrupt a bird's ability to accurately imitate the song of their tutor but result in either the retention or development of unusual vocalizations that have qualities observed in the songs of zebra finches that have been experimentally raised without having access to a song tutor. The analysis of the adult song behavior is particularly compelling and provides novel approaches for identifying outlier vocalizations. Lastly, the authors show that birds will include these isolate-like syllables during courtship behaviors and that lesions of the lateral habenula do lead to disruptions in adult birds.

      The conclusions stemming from the analysis of habenula connectivity require stronger support, and incomplete evidence is provided to link lesions of the lateral habenula to the observed disruptions in song learning.

      This study has several strengths. First, the goal of understanding the role of the lateral habenula in natural learning of a complex behavior, like birdsong, is a valuable research avenue that can ultimately better link how natural learning of intrinsically rewarded behaviors may (or may not) harness similar learning mechanisms that have been well delineated in laboratory trained and externally reinforced behaviors. Second, the computational approaches brought to bear on the analysis of song, including variational autoencoders to help define the range of control song syllables from abnormal song syllables and anomaly scores, help provide a good framework for examining and conveying disruptions in behavior that might be associated with lesions of the lateral habenula. Lastly, the manuscript is well-written and clearly presented, and the authors do acknowledge some of the weaknesses mentioned below.

      The major weakness of the article is that the authors do not verify the completeness (i.e., how much of the lateral habenula is lesioned in individual animals) or the extent (i.e., whether neuronal regions adjacent to the lateral habenula are also lesioned) of their lesions. It is argued that this is not possible because of the timeframe (long survival times) of the experiments. However, there are standard ways of addressing this technical hurdle. One simple approach would be to first examine the correlation in the number of retrogradely labeled neurons in LHb, VP, and Area X following injections of tracer into VTA. For convergent anatomical pathways, there is typically a strong positive correlation across input circuits. Therefore, given the number of retrogradely labeled neurons in VP and Area X following VTA injections, one can make reasonable predictions for how many retrogradely labeled neurons would be expected in LHb. Using tracer injections at the end of the experiments and quantification of the retrograde labeling would allow the authors to reasonably estimate the completeness of their lesions.
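The estimate proposed above can be sketched as follows. This is only an illustration of the reasoning, under stated assumptions: a single predictor (VP cell counts) and a simple least-squares line fitted on unlesioned control birds. The function names and all cell counts are hypothetical and do not come from the manuscript.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lesion_completeness(observed_lhb, vp_count, slope, intercept):
    """Fraction of expected LHb labeling that is missing, clipped to [0, 1]."""
    expected_lhb = slope * vp_count + intercept
    if expected_lhb <= 0:
        raise ValueError("expected LHb count must be positive")
    fraction_spared = observed_lhb / expected_lhb
    return min(1.0, max(0.0, 1.0 - fraction_spared))

# Hypothetical retrograde cell counts from control birds after VTA tracer injection.
control_vp = [100, 200, 300]
control_lhb = [50, 100, 150]
slope, intercept = fit_line(control_vp, control_lhb)

# Hypothetical lesioned bird: 200 labeled VP cells, 20 labeled LHb cells.
completeness = lesion_completeness(20, 200, slope, intercept)
```

In practice one would likely use both VP and Area X counts as predictors and report the uncertainty of the prediction; the single-predictor version is kept here only to make the logic explicit.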

      This unfortunate problem with the design of the experiments significantly weakens any interpretations for the role of the lateral habenula in song learning. This is particularly important because the lateral habenula is a small area that has several adjacent brain structures that could also play significant roles in song development, most of which have not been well studied in this context. These include the medial habenula, the thalamic nuclei DMP and UVA, and forebrain axons from RA, as well as axons flowing into, out of, and interconnecting the structures previously mentioned. Additional tracer injections with different color tracers could be used to provide reasonable assurance that these other adjacent circuits are still intact at the end of each lesion experiment.

      There are two weaknesses with the assessment of the functional connectivity of the lateral habenula. First, the anatomical tracing experiments are not particularly compelling. Very little data is shown and there is no quantification of any of the results. In the inset for retrograde labeling of VP-LHb and VP-VTA neurons, it is unclear that neurons of either population are shown in that image. Likewise, terminals from LHb in VTA are very sparse and it is not clear how well they overlap with VTA-X neurons which are intermingled with dopaminergic neurons projecting to other areas of the brain. The images shown seem out of focus and blurry. Although the electrophysiological experiments provide better assurance of these pathways, the sample sizes in these neurophysiology experiments seem preliminary. Stronger evidence in both regards would provide better assurance of LHb circuitry.

      The interpretations and theoretical implications of these results are unclear. This is in part because it is not possible to fully tie behavioral outcomes specifically to lesions of the lateral habenula, but also because, albeit interesting, the behavioral results are somewhat confusing. The developmental lesions did not impact the ability of zebra finches to learn how to copy the song of their tutor over development, indicating, in a strict sense, this circuit is not needed for vocal imitation of a social model. However, birds clearly exhibit unusual song syllables that they throw into their song bouts, even when singing in courtship displays. What this may reflect is not addressed in this study. It could be that lesions disrupt a bird's ability to prune away poor syllables over development, and/or that lesions result in birds being unable to suppress unwanted vocal behaviors during performances. Analysis of song over development could provide insights into these possibilities and help provide a better understanding of what the lateral habenula contributes to the song-learning process.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript focused on roles of a key fatty-acid synthesis enzyme, acetyl-coA-carboxylase 1 (ACC1), in the metabolism, gene regulation and homeostasis of invariant natural killer T (iNKT) cells and impact on these T cells' roles during asthma pathogenesis. The authors presented data showing that the acetyl-coA-carboxylase 1 enzyme regulates the expression of PPARg then the function of NKT cells including the secretion of Th2-type cytokines to impact on asthma pathogenesis. The results are clearcut and data were logically presented.

      Major concerns:

      1) This study heavily relied on the CD4-CreACC1fl/fl mice. While using of a-GalCer stimulation and Ja18KO mice mitigated the concern, it is still a major concern that at least some of the phenotype were due to the effect on conventional CD4 T cells. For example, the deletion of ACC1 gene seems also decreased the numbers of conventional CD4 T cells (Fig. 2D, Fig. S1D). Previously there were reports showing ACC1 gene in conventional CD4 T cells also plays a role in lung inflammation (Nakajima et al., J. Exp. Med. 218, 2021). If the authors believe the phenotype observed was mainly due to iNKT cells, rather than conventional CD4 T cells, a compare/contrast of the two studies should be discussed to explain or reconcile the results.

      As the reviewer pointed out, although we have experimentally demonstrated the critical role of ACC1 in iNKT cells in the regulation of allergic asthma, use of Cd4-CreAcc1fl/fl mice inevitably brings the role of conventional CD4+ T-cells in question.

      The study conducted by Nakajima et al, which reported that the absence of ACC1 in CD4+ T-cells resulted in reduced numbers and functional impairment of memory CD4+ T-cells, leading to less airway inflammation further suggests possibility of involvement of conventional CD4+ T-cells in regulation of allergic asthma. The direct compare/contrast of two studies seems difficult since Nakajima et al have focused on the role of ACC1 in memory CD4+ T cells while we have focused on iNKT cells.

      However, based on our experimental results, we believe that iNKT cells contribute more to the regulation of allergic asthma for the following reasons: (i) while the number of iNKT cells was significantly reduced in Cd4-CreAcc1fl/fl mice, the number of conventional CD4+ T cells was only slightly reduced, (ii) Cd4-CreAcc1fl/fl mice showed dramatically decreased AHR in the α-GalCer-induced, iNKT cell-dependent allergic asthma model, and (iii) Jα18 KO mice that lack iNKT cells almost completely restore their AHR when adoptively transferred with WT iNKT cells but not ACC1-deficient iNKT cells. These results indicate that ACC1-mediated regulation of AHR is significantly dependent on iNKT cells, which might contribute to AHR in the study conducted by Nakajima et al. as well. From these, we believe that while ACC1 is a critical regulator of both conventional CD4+ T cells and iNKT cells in regulation of allergic asthma, iNKT cells may contribute more to regulation of allergic asthma compared to CD4+ T cells. We have summarized the above-mentioned contents in LINES 421-441 with the reference you have mentioned:

      "It should be noted that Cd4-CreAcc1fl/fl mice lack ACC1 expression in both conventional CD4+ T cells and iNKT cells. While the use of iNKT cell-specific Cre system would demonstrate critical role of ACC1 in iNKT cells regarding allergic asthma, there is no iNKT cell-specific Cre system available yet. In addition, the study conducted by Nakajima et al, which reported that the absence of ACC1 in CD4+ T cells resulted in reduced numbers and functional impairment of memory CD4+ T cells, leading to less airway inflammation further suggests possibility of involvement of conventional CD4+ T cells in regulation of allergic asthma. However, based on our experimental results, we believe that iNKT cells more contribute to the regulation of allergic asthma for the following reasons - (i) while the number of iNKT cells were significantly reduced in Cd4-CreAcc1fl/fl mice, the number of conventional CD4+ T cells were only slightly reduced, (ii) Cd4-CreAcc1fl/fl mice were dramatically decreased in their AHR in α-GalCer induced allergic asthma model, and (iii) Jα18 KO mice that lack iNKT cells almost completely restore their AHR when adoptively transferred with WT iNKT cells but not ACC1-deficient iNKT cells. These results indicate that ACC1-mediated regulation of AHR is significantly dependent on iNKT cells, which might contribute to AHR in the study conducted by Nakajima et al. as well. From these, we believe that while ACC1 is a critical regulator of both conventional CD4+ T cells and iNKT cells in regulation of allergic asthma, iNKT cells may contribute more to regulation of allergic asthma compared to CD4+ T cells."

      2) The overall significance of the manuscript is related to the potential clinical suppression of ACC1 in human asthma patients. However, the authors only showed the elevated ACC1 genes in these patients, not even in vitro data demonstrating that suppression of ACC1 genes in the iNKT cells from patients could have potential therapeutic effect or suppression of the relevant cytokines.

      We appreciate the reviewer’s critical comment here. Due to the paucity of iNKT cells in human PBMCs, it is extremely difficult to experimentally manipulate the expression level of ACC1 in human iNKT cells. Alternatively, to address the reviewer’s comment, we compared the cytokine expression of ACC1high iNKT cells from human allergic asthma patients to ACC1low iNKT cells from healthy individuals or non-allergic asthma patients. Our results show that iNKT cells from allergic asthma patients express higher levels of IL4 and IL13 than those from healthy individuals or non-allergic asthma patients, suggesting that the level of ACC1 is most likely involved in the functionality of human iNKT cells as well. The results are newly shown in supplementary Fig. 5C with explanation in LINES 376-378 and 382-384:

      LINES 376-378: Lastly, the expression levels of IL4 and IL13 were significantly higher in iNKT cells from the allergic asthma patients compared to those from healthy controls and nonallergic asthma patients (Fig. S5C).

      LINES 382-384: Thus, iNKT cells from allergic asthma patients express higher ACC1, FASN and PPARG levels and lower levels of a glycolysis which is accompanied with higher levels of IL4 and IL13 than iNKT cells from healthy controls and nonallergic asthma patients.

      3) The authors report that a-GalCer administration can induce the AHR, however, in the cited paper (Hachem et al., Eur J. Immunol. 35, 2793, 2005), iNKT cell activation seems to have the opposite effect to inhibit AHR. Did the authors mean to cite different papers?

      We apologize for the confusion. We have replaced the inaccurate reference with the reference below in LINES 863-865:

      1. Glycolipid activation of invariant T cell receptor+ iNKT cells is sufficient to induce airway hyperreactivity independent of conventional CD4+ T cells, Proc Natl Acad Sci USA, 103, pp. 2782-2787 (2006).

      Reviewer #2 (Public Review):

      In this study the authors sought to investigate how the metabolic state of iNKT cells impacts their potential pathological role in allergic asthma. The authors used two mouse models, OVA and HDM-induced asthma, and assessed genes in glycolysis, TCA, β-oxidation and FAS. They found that acetyl-coA-carboxylase 1 (ACC1) was highly expressed by lung iNKT cells and that ACC1 deficient mice failed to develop OVA-induced and HDM-induced asthma. Importantly, when they performed bone marrow chimera studies, when mice that lacked iNKT cells were given ACC1 deficient iNKT cells, the mice did not develop asthma, in contrast to mice given wildtype NKT cells. In addition, these observed effects were specific to NKT cells, not classic CD4 T cells. Mechanistically, iNKT cells that lack ACC1 had decreased expression of fatty acid-binding proteins (FABPs) and peroxisome proliferator-activated receptor (PPAR)γ, but increased glycolytic capacity and increased cell death. Moreover, the authors were able to reverse the phenotype with the addition of a PPARg agonist. When the authors examined iNKT cells in patient samples, they observed higher levels of ACC1 and PPARG, compared to healthy donors and non-allergic-asthma patients.

      We are very grateful for your kind appreciation of our work.

      Reviewer #1 (Recommendations For The Authors):

      1) Related to major concern I, an iNKT cell-specific knockout of ACC1 in iNKT cells is highly desirable and should be used to directly address the question.

      As the reviewer suggested, iNKT cell-specific deletion of ACC1 would provide invaluable information to our study. Unfortunately, a Cre-loxP system that specifically targets iNKT cells has not been developed. Thus, we opted to use the CD4-Cre system, which is the gold standard Cre system for the study of iNKT cells. In addition, to highlight the role of ACC1 in iNKT cells in relation to regulation of allergic asthma, we performed iNKT cell-dependent experiment models and conducted adoptive transfer of iNKT cells into iNKT cell-deficient mice (Jα18 KO). These have been discussed in the Discussion section in LINES 421-441:

      "It should be noted that Cd4-CreAcc1fl/fl mice lack ACC1 expression in both conventional CD4+ T cells and iNKT cells. While the use of iNKT cell-specific Cre system would demonstrate critical role of ACC1 in iNKT cells regarding allergic asthma, there is no iNKT cell-specific Cre system available yet. In addition, the study conducted by Nakajima et al, which reported that the absence of ACC1 in CD4+ T cells resulted in reduced numbers and functional impairment of memory CD4+ T cells, leading to less airway inflammation further suggests possibility of involvement of conventional CD4+ T cells in regulation of allergic asthma. However, based on our experimental results, we believe that iNKT cells more contribute to the regulation of allergic asthma for the following reasons - (i) while the number of iNKT cells were significantly reduced in Cd4-CreAcc1fl/fl mice, the number of conventional CD4+ T cells were only slightly reduced, (ii) Cd4-CreAcc1fl/fl mice were dramatically decreased in their AHR in α-GalCer induced allergic asthma model, and (iii) Jα18 KO mice that lack iNKT cells almost completely restore their AHR when adoptively transferred with WT iNKT cells but not ACC1-deficient iNKT cells. These results indicate that ACC1-mediated regulation of AHR is significantly dependent on iNKT cells, which might contribute to AHR in the study conducted by Nakajima et al. as well. From these, we believe that while ACC1 is a critical regulator of both conventional CD4+ T cells and iNKT cells in regulation of allergic asthma, iNKT cells may contribute more to regulation of allergic asthma compared to CD4+ T cells."

      2) For Fig. 5A, RT-PCR verification of PPARg gene expression level change is needed.

      As suggested, we have verified the level of Pparg expression of ACC1-deficient iNKT cells through real time PCR and have added the results to Figure 5A.

      3) Verifying at least the cytokine secretion can be regulated by manipulating ACC1 expression in human asthma patient samples will make the paper much stronger.

      We appreciate the reviewer’s critical comment here. Due to the paucity of iNKT cells in human PBMCs, it is extremely difficult to experimentally manipulate the expression level of ACC1 in human iNKT cells. Alternatively, to address the reviewer’s comment, we compared the cytokine expression of ACC1high iNKT cells from human allergic asthma patients to ACC1low iNKT cells from healthy individuals or non-allergic asthma patients. Our results show that iNKT cells from allergic asthma patients express higher levels of IL4 and IL13 than those from healthy individuals or non-allergic asthma patients, suggesting that the level of ACC1 is most likely involved in the functionality of human iNKT cells as well. The results are newly shown in supplementary Fig. 5C with explanation in LINES 376-378 and 382-384:

      LINES 376-378: Lastly, the expression levels of IL4 and IL13 were significantly higher in iNKT cells from the allergic asthma patients compared to those from healthy controls and nonallergic asthma patients (Fig. S5C).

      Minor points:

      1) What are the cells being stained in Fig. S2C? Are they iNKT cells? If yes, why there is a tetramer-negative population?

      The density plot on the left panel of Fig. S2C represents magnetically enriched thymic iNKT cells. Due to their scarcity, thymic iNKT cells were enriched using CD1d tetramer via magnetic activated cell sorting (MACS)-based enrichment technique. After enrichment, we re-stained enriched cells with CD1d tetramers and gated out CD3 and CD1d tetramer double positive cells via flow cytometry to specifically identify iNKT cells. Due to the imperfect purity of magnetic cell separation technique, a small proportion of CD1d tetramer-negative population is seen in the left panel of Fig. S2C.

      A brief mention of this methodology has been added to the “Preparation and activation of murine T and iNKT cells” section under Materials and Methods in LINES 560-566:

      "Alternatively, thymic and liver mononuclear cells were labeled with APC-conjugated ɑ-GalCer/CD1d tetramers, bound to anti-APC magnetic beads, and enriched on a MACS separator (Miltenyi Biotec, Auburn, CA, USA; purity 89%). To analyze the development of thymic iNKTs cells, we re-stained enriched cells with CD1d tetramer and gated out CD3 and CD1d tetramer double positive cells via flow cytometry to identify thymic iNKT cells, which were used for further analysis."

      2) Where are the adoptive transferred iNKT cells purified/sorted from? Are they from lungs of Acc1fl/fl or CD4-cre/Acc1fl/fl mice, asthma-induced already? As there are very few iNKT cells in healthy and untreated mice. There is little described or explained in Methods and Materials.

      The adoptively transferred iNKT cells were purified and pooled from the lungs of at least 10 mice per group. Briefly, mouse lungs were finely chopped into small pieces using razor blades and enzymatically digested using type IV collagenase. iNKT cells from the lungs were sorted via FACS using CD1d tetramers. Approximately 6.0 × 10^5 iNKT cells were obtained from the lungs of at least 10 mice. A brief mention of this methodology was added to the “Adoptive transfer of iNKT cells in allergic asthma models” section in Materials and Methods in LINES 568-574: iNKT cells were obtained from the lungs of at least 10 Acc1fl/fl or Cd4-CreAcc1fl/fl mice. Mouse lungs were finely chopped into small pieces using razor blades and were enzymatically digested using type IV collagenase. iNKT cells from the lungs were sorted via FACS using CD1d tetramers. Approximately 6.0 × 10^5 iNKT cells were obtained from at least 10 mice and were adoptively transferred into each recipient mouse via the intratracheal route.

      3) The use of 2-NBDG was not explained in multiple locations, particularly in Fig.5H. How is its fluorescence used to track iNKT cells? No description in Materials and methods.

      2-NBDG, a fluorescently tagged glucose analog, is an indicator used to measure glucose uptake in cells. The fluorescence intensity in 2-NBDG-treated cells represents the degree of glucose uptake in cells, which can be measured using flow cytometry. Thus, in the experiments where we treated cells with 2-NBDG, we described the results as "glucose uptake". A brief explanation of this methodology was added to the main text in LINES 253-254. In addition, we have provided the detailed use of 2-NBDG in ‘Measurement of glucose uptake capacity’ under the section of Materials and methods in LINES 599-607: Measurement of glucose uptake capacity using 2-NBDG assay. After treatment with 2-NBDG, the fluorescence intensity of cells was measured using flow cytometry and represented the degree of glucose uptake in cells.

      4) Fig. 3A legends: it should be "Ja18 KO"?

      We appreciate your pointing out our mistake here. We have corrected this in the legend of Figure 3A.

      5) There are two different mechanisms for explaining the less severe asthma/AHR phenotype in ACC1-KO iNKT cells. One is lower number of iNKT cells due to cell death, the other decreased cytokine secretions. It is not clear to the reviewer, what are the relationship between two mechanisms. Are they both contributing to the asthma phenotype or cooperative?

      As you mentioned, ACC1-deficient iNKT cells showed increase in intrinsic pathway of apoptosis as well as decrease in their cytokine secretion simultaneously. Thus, we believe that increase in cell death and decrease in cytokine expression of ACC1-deficient iNKT cells cooperatively contributed to the asthma phenotype. The above-mentioned point was discussed in LINES 453-458: Furthermore, the apoptotic tendency of the ACC1-deficient iNKT cells was accompanied by their functional impairment. The ACC1-deficient iNKT cells exhibited impaired viability and functionality. Treatment of glycolysis inhibitor in ACC1-deficient iNKT cells not only restored cellular survival but also their functionalities. From these results, we speculate that ACC1-mediated regulation of both cellular homeostasis and cytokine production cooperatively contributed to the asthma phenotype.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a very strong study with few concerns.

      1) Are there tissue specific differences in the iNKT cell populations? The authors examined lung iNKT cells in the Figs 1-3, and used liver NKT cells for the mechanistic studies in Fig 4-5. The studies shown in Fig S2 suggest that ACC1 deficient iNKT cells have developmental defects and impaired homeostatic proliferative capacity. Does ACC1 impact lung and liver iNKT cells similarly and is the lack of allergic asthma in ACC1 deficient iNKT cells due to defective iNKT cell trafficking to the lungs or a failure to survive after transfer (Fig 3)?

      In the absence of ACC1, the number of iNKT cells from both lungs and livers decreased and showed consistent features (e.g., metabolic parameters), suggesting that there was no tissue-specific role of ACC1 in iNKT cells.

      In the adoptive transfer experiments, we transferred equal number of WT and ACC1-deficient iNKT cells directly into mouse lungs via intratracheal route. Thus, decreased numbers of adoptively transferred ACC1-deficient iNKT cells is more likely from their intrinsically impaired homeostatic proliferative capacity, not due to defective trafficking to the lungs.

      2) Similarly, are chemokine receptor expression patterns similar between WT and ACC1 deficient iNKTs (Fig 4)?

      We compared chemokine receptor expression of WT and ACC1-deficient iNKT cells using our RNA-seq and verified their expression levels via real time q-PCR. The expression levels of these chemokine receptors were comparable between the two groups of iNKT cells. The results are newly shown in supplementary Fig. 4I with explanation in LINES 351-357:

      Meanwhile, chemokine receptor signaling is also implicated in regulating homeostasis of iNKT cell in the periphery. In particular, Meyer et al. suggested that iNKT cells require CCR4 to localize to the airways and to induce AHR. Thus, we examined the expression of several chemokine receptors, including CCR4. We found that WT and ACC1-deficient iNKT cells did not differ in their chemokine receptor expressions, suggesting that the chemokine signaling may not be critical for ACC1-mediated regulation in AHR.

      3) The authors data suggest that Tregs are not playing a major role in the regulation of asthma induction in their ACC1 deficient mice, based on FoxP3 expression. Did the authors perform suppressor assays to show that the Tregs function similarly in WT and ACC1 deficient mice?

      We appreciate the reviewer’s reasonable comment. However, we did not experimentally compare the suppressive capacity of WT and ACC1-deficient Tregs under the asthmatic conditions, due to minimal differences in their Foxp3 expression (Foxp3 expression is a critical determinant of suppressive function of Tregs; Immunity. 2019 Feb 19;50(2):302-316.; Nat Immunol 2003; 4: 330–336; Cell Mol Immunol. 2015 Sep;12(5):558-65.). Thus, we speculate that the suppressive capacity of WT and ACC1-deficient Tregs might be similar. Nevertheless, since the suppressive capacity of Tregs can also be regulated by other soluble factors and surface molecules, we cannot completely rule out the possibility that ACC1-deficient Tregs might differ in their suppressive capacity from WT Tregs in asthma. In short, while there are clear limitations to our interpretation here, we believe it is unlikely that Tregs from WT and ACC1 deficient mice differ in their suppressive capacity during asthma. We have included above-mentioned points in the section of Discussion in LINES 415-419: In this regard, Tregs may also play a major role in asthma. However, the expression level of Foxp3 was comparable between WT and ACC1-deficient Tregs. The level of Foxp3 to some extent, serves as a critical determinant of suppressive function of Tregs. Thus, we speculate that they might not critically contribute to the development of asthma, although we cannot completely rule out the contribution of Tregs to our studies.

    2. eLife assessment

      The study highlights an important role of a key fatty-acid synthesis enzyme, acetyl-coA-carboxylase 1 (ACC1), in the development and homeostasis of invariant natural killer T (iNKT) cells, as well as its significance in asthma etiology. The work defines novel mechanisms driving metabolic regulation of iNKT cells and its role in allergic asthma. The data reported in the manuscript are convincing, and the work adds to our understanding of the metabolic regulation of iNKT cells.

    3. Reviewer #1 (Public Review):

      The manuscript focused on roles of a key fatty-acid synthesis enzyme, acetyl-coA-carboxylase 1 (ACC1), in the metabolism, gene regulation and homeostasis of invariant natural killer T (iNKT) cells and impact on these T cells' roles during asthma pathogenesis. The authors presented data showing that the acetyl-coA-carboxylase 1 enzyme regulates the expression of PPARg then the function of NKT cells including the secretion of Th2-type cytokines to impact on asthma pathogenesis. The results are clearcut and data were logically presented.

    4. Reviewer #2 (Public Review):

      In this study the authors sought to investigate how the metabolic state of iNKT cells impacts their potential pathological role in allergic asthma. The authors used two mouse models, OVA and HDM-induced asthma, and assessed genes in glycolysis, TCA, β-oxidation and FAS. They found that acetyl-coA-carboxylase 1 (ACC1) was highly expressed by lung iNKT cells and that ACC1 deficient mice failed to develop OVA-induced and HDM-induced asthma. Importantly, when they performed bone marrow chimera studies, when mice that lacked iNKT cells were given ACC1 deficient iNKT cells, the mice did not develop asthma, in contrast to mice given wildtype NKT cells. In addition, these observed effects were specific to NKT cells, not classic CD4 T cells. Mechanistically, iNKT cells that lack ACC1 had decreased expression of fatty acid-binding proteins (FABPs) and peroxisome proliferator-activated receptor (PPAR)γ, but increased glycolytic capacity and increased cell death. Moreover, the authors were able to reverse the phenotype with the addition of a PPARg agonist. When the authors examined iNKT cells in patient samples, they observed higher levels of ACC1 and PPARG, compared to healthy donors and non-allergic-asthma patients.

    1. Author Response

      We would like to thank the reviewers for their positive and constructive comments on the manuscript.

      We are planning the following revisions to both DGRPool and the corresponding manuscript to address the reviewers’ comments:

      1) We agree with reviewer #1 that normalizing the data could potentially improve the GWAS results. Thus, we plan to explore the implementation of this option and assess its impact on the overall results. We will also investigate replacing the ANOVA test with a Kruskal-Wallis test. Instead of upfront data normalization, we will consider using the PLINK --pheno-quantile-normalize option. Both options will be compared on a set of phenotypes where we can analyze the output (i.e., for phenotypes where we expect to find specific variants), to determine whether these strategies enhance the detection power.

      2) We also agree with both reviewers that gene expression information is of interest. However, we recognize that incorporating such information would entail substantial work (as elaborated in our response to comments below). We feel that this extensive work is beyond the current scope of this paper, which primarily focuses on phenotypes and genotype-phenotype associations. Nonetheless, we are committed to enhancing user experience by including more gene-level outlinks to Flybase. Additionally, we will link variants and gene results to Flybase's online genome browser, JBrowse. By following the reviewers' suggestions, we aim to guide DGRPool users to potentially informative genes.

      3) In agreement with reviewer #2, we acknowledge that additional tools could enhance DGRPool's functionality and facilitate meta-analyses for users. Therefore, we are in the process of developing a gene-centric tool that will allow users to query the database based on gene names. Moreover, we intend to integrate ortholog databases into the GWAS results. This feature will enable users to extend Drosophila gene associations to other species if necessary.

      4) Finally, we also concur with both reviewers about making minor edits to the manuscript to address their feedback.

      Reviewer #1 (Public Review):

      This is a technically sound paper focused on a useful resource around the DGRP phenotypes, which the authors have curated, pooled, and made available through a user-friendly website. It is intended to become a crowd-sourced resource in the future.

      The authors should make sure they coordinate as well as possible with the NC datasets and community and broader fly community. It looks reasonable to me but I am not from that community.

      We thank the reviewer for the positive comments. We are relatively well-connected to the D. melanogaster community and aim to leverage this connection to render the resource as valuable as possible. DGRPool in fact already reflects the input of many potential users and was also inspired by key tools on the DGRP2 website. Furthermore, it also rationalizes why we are often bridging our results with other resources, such as linking out to Flybase, which is the main resource for the Drosophila community at large.

      I have only one major concern, which in a more traditional review setting I would flag to the editor and insist the authors address on resubmission. I also have some scene setting and coordination suggestions and some minor textual / analysis considerations.

      The major concern is that the authors do not comment on the distribution of the phenotypes; it is assumed it is a continuous metric and well-behaved - broad gaussian. This is likely to be more true of means and medians per line than individual measurements, but not guaranteed, and there could easily be categorical data in the future. The application of ANOVA tests (of the "covariates") is for example fragile for this.

      The simplest recommendation is in the interface to ensure there is an inverse normalisation (rank and then project on a gaussian) function, and also to comment on this for the existing phenotypes in the analysis (presumably the authors are happy). An alternative is to offer a kruskal test (almost the same thing) on covariates, but note PLINK will also work most robustly on a normalised dataset.

      We thank the reviewer for raising this interesting point. Indeed, we did not comment on the distribution of individual phenotypes due to the underlying variability from one phenotype to another, as suggested by the reviewer. Some distributions appear normal, while others are clearly not normally distributed. This information is 'visible' to users by clicking on any phenotype; DGRPool automatically displays its global distribution if the values are continuous/quantitative. We acknowledge the reviewer's concerns regarding the use of ANOVA tests. However, we consider it acceptable to perform linear regression (including ANOVA tests) on non-normally distributed data, as only the prediction errors need to follow a normal distribution.

      Furthermore, the ANOVA test is solely conducted to assess whether any of the potential covariates (such as well-established inversions and symbiont infection status) are associated with the phenotype of interest. PLINK2 automatically corrects for the effects of these covariates during GWAS by considering them as part of the regression model.

      Nevertheless, we concur with the reviewer that normalizing the data could potentially enhance GWAS results. Consequently, we commit to exploring the impact of data normalization on the overall outcomes. Additionally, we will consider replacing the ANOVA test with a Kruskal-Wallis test, and using the PLINK --pheno-quantile-normalize option. We intend to compare both approaches using a set of phenotypes where we can compare the output (i.e., where specific variants are expected to be identified). This comparison will help us determine if either method enhances the detection power.
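
      For readers unfamiliar with the inverse normalisation discussed above (rank the values, then project the ranks on a gaussian), a minimal Python sketch follows. The function name, the offset parameter `c`, and the example values are illustrative assumptions; this is not DGRPool's code, and PLINK's own quantile normalization may differ in detail (for example in how ties are handled).

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.5):
    """Rank-based inverse normal transform; tied values share their average rank.

    c is the rank offset (hypothetical default 0.5; 3/8 gives the Blom variant).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Map each rank to a quantile strictly inside (0, 1), then to a z-score.
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Made-up, strongly skewed per-line phenotype means.
skewed = [10, 2, 300, 2.5, 40]
print(inverse_normal_transform(skewed))
```

      The transform keeps only the ordering of the lines, so an extreme value such as 300 no longer dominates a downstream linear model; the cost is that the original measurement scale is lost.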

      Minor points:

      On the introduction, I think the authors would find the extensive set of human GWAS/PheWAS resources useful; widespread examples include the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal. The GWAS Catalog also has summary statistics submission guidelines, and I think where possible meta-data harmonisation should be similar (not a big thing). Of course, DGRP has a very different structure (line and individuals) and of course, raw data can be freely shown, so this is not a one-to-one mapping.

      Thank you for the suggestion. We will cite these resources in the Introduction and check the GWAS catalog submission guidelines to compare to the ones we are proposing in this paper.

      For some authors coming from a human genetics background, they will be interpreting correlations of phenotypes more in the genetic variant space (eg LD score regression), rather than a more straightforward correlation between DGRP lines of different individuals. I would encourage explaining this difference somewhere.

      We appreciate this potential issue and we will make this distinction clearer in the manuscript to avoid any confusion.

      This leads to an interesting point that the inbred nature of the DGRP allows for both traditional genetic approaches and leveraging the inbred replication; there is something about looking at phenotype correlations through both these lenses, but this is for another paper, which I suspect this harmonised pool of data can help with.

      We agree with the reviewer and hope that more meta-analyses will be made possible by leveraging the harmonized data that are made available through DGRPool.

      I was surprised the authors did not crunch the number of transcript/gene expression phenotypes and have them in. Is this because this was better done in other datasets? Or too big and annoying on normalisation? I'd explain the rationale to leave these out.

      This is a very good point raised by the reviewer, and this is in fact something that we initially wanted to do. However, to render the analysis fair and robust, it would require processing all datasets in the same way. This implies cataloging all existing datasets and processing them through the same pipeline. Then, it also requires adding a “cell type” or “tissue” layer, because gene expression data from whole flies is obviously not directly comparable to gene expression data from specific tissues or even specific conditions. This would be key information as phenotypes are often tissue-dependent. So, as implied by the reviewer, we deemed this too big of a challenge beyond the scope of the current paper. Nevertheless, we plan to continue investigating this avenue, especially given the strong transcriptomics background of our lab, in a potential follow-up paper.

      I think 25% FDR is dangerously close to "random chance of being wrong". I'd just redo this section at a more stringent FDR, even if it makes the results less 'exciting'. This is not the point of the paper anyway.

      We agree with the reviewer that this threshold implies a higher risk of false positive results. However, this is not an uncommonly used threshold (Li et al., PLoS biology, 2008; Bevers et al., Nature Metabolism, 2019; Hwangbo et al, Elife, 2023), and one that seems robust enough in our analysis since similar phenotypes are significant in different studies. Nevertheless, we will revisit these results and explore how a more stringent threshold may impact the results.
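
      To make the effect of the threshold concrete, the sketch below computes Benjamini-Hochberg adjusted p-values and counts how many tests pass at 25% versus 5% FDR. The source does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function name and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four hypothetical phenotype comparisons.
pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
for threshold in (0.25, 0.05):
    passing = sum(q <= threshold for q in adj)
    print(f"FDR <= {threshold}: {passing} of {len(adj)} pass")
```

      At a given FDR threshold, roughly that fraction of the reported hits is expected to be false, which is why the set accepted at 25% should be read as exploratory.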

      I didn't buy the extreme line piece as being informative. Something has to be on the top and bottom of the ranks; the phenotypes are an opportunity for collection and probably have known (as you show) and cryptic correlations. I think you don't need this section at all for the paper and worry it gives an idea of "super normals" or "true wild types" which ... I just don't think is helpful.

      This section of the paper was intended to investigate anecdotal evidence suggesting that certain DGRP lines consistently rank at the top or bottom when examining fitness-related traits. If accurate, this observation could imply that inbreeding might have made these lines generally weaker, potentially introducing bias into studies aimed at uncovering the genetic basis of complex traits. However, as per the analyses presented, we did not discover support for this phenomenon. Nevertheless, we consider this message important to convey. In response to the reviewer's feedback, we intend to provide a clearer explanation of the reasoning behind this section of the paper and its main conclusion.

      I'd say "well-established inversion genotypes and symbiont levels" rather than generic covariates. Covariates could mean anything. You have specific "covariates" which might actually be the causal thing.

      Thank you. We will update the manuscript accordingly.

      I wouldn't use the adjective tedious about curation. It's a bit of a value judgement and probably places the role of curation in the wrong way. Time-consuming due to lack of standards and best practice?

      Thank you. We will update the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, Gardeux et al provide a web-based tool for curated association mapping results from DGRP studies. The tool lets users view association results for phenotypes and compare mean phenotype ~ phenotype correlations between studies. In the manuscript, the authors provide several example utilities associated with this new resource, including pan-study summary statistics for sex, traits, and loci. They highlight cross-trait correlations by comparing studies focused on longevity with phenotypes such as oxphos and activity.

      Strengths:

      -Considerable efforts were dedicated toward curating the many DGRP studies provided.

      -Available tools to query large DGRP studies are sparse and so new tools present appeal

      Weaknesses:

      The creation of a tool to query these studies for a more detailed understanding of physiologic outcomes seems underdeveloped. These could be improved by enabling usages such as more comprehensive queries of meta-analyses, molecular information to investigate given genes or pathways, and links to other information such as mouse, rat, or human associations.

      We appreciate the reviewer's kind comments.

      Regarding the tools, we concur with the reviewer that incorporating additional tools could enhance DGRPool and facilitate users in conducting meta-analyses. Therefore, we intend to introduce a gene-centric tool that enables users to query the database based on gene names. Additionally, we will establish links to ortholog databases within the GWAS results, thereby allowing users to extend fly gene associations to other species, if required.

      Furthermore, we have plans to link out to a 'genome browser-like' view (Flybase’s JBrowse tool) of the GWAS results centered around the affected variants/genes. We are considering integrating this feature into the new gene-centric tool as well.

      Another potential downstream analysis we are considering is gene-set enrichment. This analysis would involve assessing the enrichment of genes in Gene Ontology or other pathway databases directly from the GWAS results page.

    2. eLife assessment

      This valuable paper describes a web-based tool for curated association mapping results from the Drosophila Genetic Reference Panel. With this tool, one can visualize and view association results for various phenotypes, and the authors provide examples for the use of the resource, including study summary statistics. The evidence for the tool working as advertised is solid, but further improvements to the tool would increase its value for the community.

    3. Reviewer #1 (Public Review):

      This is a technically sound paper focused on a useful resource around the DGRP phenotypes, which the authors have curated, pooled, and made available through a user-friendly website. It is intended to become a crowd-sourced resource in the future.

      The authors should make sure they coordinate as well as possible with the NC datasets and community and broader fly community. It looks reasonable to me but I am not from that community.

      I have only one major concern, which in a more traditional review setting I would flag to the editor and insist the authors address on resubmission. I also have some scene setting and coordination suggestions and some minor textual / analysis considerations.

      The major concern is that the authors do not comment on the distribution of the phenotypes; it is assumed it is a continuous metric and well-behaved - broad gaussian. This is likely to be more true of means and medians per line than individual measurements, but not guaranteed, and there could easily be categorical data in the future. The application of ANOVA tests (of the "covariates") is for example fragile for this.

      The simplest recommendation is in the interface to ensure there is an inverse normalisation (rank and then project on a gaussian) function, and also to comment on this for the existing phenotypes in the analysis (presumably the authors are happy). An alternative is to offer a kruskal test (almost the same thing) on covariates, but note PLINK will also work most robustly on a normalised dataset.

      Minor points:

      On the introduction, I think the authors would find the extensive set of human GWAS/PheWAS resources useful; widespread examples include the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal. The GWAS Catalog also has summary statistics submission guidelines, and I think where possible meta-data harmonisation should be similar (not a big thing). Of course, DGRP has a very different structure (line and individuals) and of course, raw data can be freely shown, so this is not a one-to-one mapping.

      For some authors coming from a human genetics background, they will be interpreting correlations of phenotypes more in the genetic variant space (eg LD score regression), rather than a more straightforward correlation between DGRP lines of different individuals. I would encourage explaining this difference somewhere.

      This leads to an interesting point that the inbred nature of the DGRP allows for both traditional genetic approaches and leveraging the inbred replication; there is something about looking at phenotype correlations through both these lenses, but this is for another paper, which I suspect this harmonised pool of data can help with.

      I was surprised the authors did not crunch the number of transcript/gene expression phenotypes and have them in. Is this because this was better done in other datasets? Or too big and annoying on normalisation? I'd explain the rationale to leave these out.

      I think 25% FDR is dangerously close to "random chance of being wrong". I'd just redo this section at a more stringent FDR, even if it makes the results less 'exciting'. This is not the point of the paper anyway.

      I didn't buy the extreme line piece as being informative. Something has to be on the top and bottom of the ranks; the phenotypes are an opportunity for collection and probably have known (as you show) and cryptic correlations. I think you don't need this section at all for the paper and worry it gives an idea of "super normals" or "true wild types" which ... I just don't think is helpful.

      I'd say "well-established inversion genotypes and symbiont levels" rather than generic covariates. Covariates could mean anything. You have specific "covariates" which might actually be the causal thing.

      I wouldn't use the adjective tedious about curation. It's a bit of a value judgement and probably places the role of curation in the wrong way. Time-consuming due to lack of standards and best practice?

    4. Reviewer #2 (Public Review):

      Summary:

      In the present study, Gardeux et al provide a web-based tool for curated association mapping results from DGRP studies. The tool lets users view association results for phenotypes and compare mean phenotype ~ phenotype correlations between studies. In the manuscript, the authors provide several example utilities associated with this new resource, including pan-study summary statistics for sex, traits, and loci. They highlight cross-trait correlations by comparing studies focused on longevity with phenotypes such as oxphos and activity.

      Strengths:

      -Considerable efforts were dedicated toward curating the many DGRP studies provided.

      -Available tools to query large DGRP studies are sparse and so new tools present appeal

      Weaknesses:

      The creation of a tool to query these studies for a more detailed understanding of physiologic outcomes seems underdeveloped. These could be improved by enabling usages such as more comprehensive queries of meta-analyses, molecular information to investigate given genes or pathways, and links to other information such as mouse, rat, or human associations.

    1. Reviewer #1 (Public Review):

      Summary: A well-executed series of experiments that will likely be of immense interest to (a) vector-borne disease researchers and (b) gram-negative sepsis/bacteremia researchers. The study uses comparative transcriptomics to begin probing what makes Peromyscus leucopus a unique host for numerous pathogens. Most issues with the paper are trivial, relating to descriptions of statistical cutoffs. While the paper does not provide mechanistic insight into how P. leucopus restrains its immune response to LPS or other microbial invaders, it is likely that this paper will be frequently consulted by researchers trying to understand that phenomenon.

      Strengths:

      o Use of outbred M. musculus is a commendable choice for the studies here.

      o Excellent decision by the authors to use their published dataset (with appropriate statistical normalization) to improve their statistical power to examine sex-biased gene expression. Is it possible to go one step further and briefly incorporate their prior BALB/c data to see how the BALB/c compare to the outbred mice? This could perhaps be just a PCA plot to see if they cluster with the outbred mice and/or Peromyscus, or are separate.

      o The correlations and ratios used to try to understand immune cell dynamics are clever and likely reflect interesting biology, but caution should be used when interpreting these indirect measures. As there are no tools for cell separation in P. leucopus, the authors should continue to include these data to stimulate ideas in the field, but readers should understand the "conclusions" are hypotheses due to the nature of the bulk RNAseq.

      Weaknesses:

      o Supplemental Table 1 only lists genes that passed the authors' statistical thresholds. The full list of genes detected in their analysis should be included with read counts, statistics, etc. as supplemental information.

      o While P. leucopus is a critical reservoir for B. burgdorferi, caution should be taken in directly connecting the data presented here and the Lyme disease spirochete. While it's possible that P. leucopus have a universal mechanism for limiting inflammation in response to PAMPs, B. burgdorferi lack LPS and so it is also possible the mechanisms that enable LPS tolerance and B. burgdorferi tolerance may be highly divergent.

      o Statistical significance is binary and p-values should not be used as the primary comparator of groups (e.g. once a p-value crosses the designated threshold for significance, the magnitude of that p-value no longer provides biological information). For instance, in comparing GO-terms, the reason for using high p-value cutoffs ("None of these were up-regulated gene GO terms with p values < 1011 for M. musculus.") to compare species is unclear. If the authors wish to compare effect sizes, comparing enrichment between terms that pass a cutoff would likely be the better choice. Similarly, comparing DEG expression by p-value cutoff and effect size is more meaningful than analyses based exclusively on p-value: "Of the top 100 DEGs for each species by ascending FDR p value." Description in later figures (e.g. Figure 4) is favored.

      o The ability to use CD45 to normalize data is unclear. Authors should elaborate on the use of the method and provide some data on how the data change when they are normalized. For instance, do correlations between untreated Mus and Peromyscus gene expression improve? The authors seem to imply this should be a standard for interspecies comparison and so it would be helpful to either provide data to support that or, if applicable, use of the technique in literature should be referenced.

      o Regarding the ISG data: is a possible conclusion not that Peromyscus don't upregulate the antiviral response because it's already so high in untreated rodents? It seems untreated Peromyscus have ISG expression roughly equivalent to the LPS mice for some of the genes. This could be compared more clearly if genes were displayed as bar plots/box and whisker plots rather than in scatter plots. It is unclear why the linear regression is the key point here rather than normalized differences in expression.

      o Some sections of the discussion are undersupported:

        - The claim that low inflammation contributes to increased lifespan is stated both in the introduction and discussion. Is there justification to support this? Do aged pathogen-free mice show more inflammation than aged Peromyscus?

        - The claim that reduced Peromyscus responsiveness could lead to increased susceptibility to infection is prominently proposed but not supported by any of the literature cited.

        - References to B. burgdorferi, which do not have LPS, in the discussion need to ensure that the reader understands this and the potential that responses could be very different.

    2. eLife assessment

      This is an important study that tries to shed light on why the deer mouse is host to many diverse pathogens. The results are convincing and rely on state-of-the-art transcriptomic analysis. The findings will be of interest to biologists, ecologists and infectious disease researchers.

    3. Reviewer #2 (Public Review):

      Milovic, Duong, and Barbour investigate the inflammatory response of three species of small mammals (P. leucopus, M. musculus, and R. norvegicus) to endotoxin lipopolysaccharide (LPS) injection via genome-wide transcriptomics from blood samples. Understanding the inflammation response of P. leucopus is of importance as they are a reservoir for several pathogens. The study is a thorough, controlled, well researched analysis that will be valuable for designing and interpreting future studies. The authors discuss the limitations of the data and the potential directions. Clearly P. leucopus respond differently to the LPS exposure which is very interesting and opens the door for numerous other comparative studies.

      The conclusions of the manuscript are thoughtful and mostly supported by the data, but there are a couple of points for clarification.

      1) How were the number of animals for each experiment selected? Was a power analysis conducted?

      2) The authors conducted a cursory evaluation of sex differences of P. leucopus and reported no difference in response except for Il6 and Il10 expression being higher in the males than the females in the exposed group. The data was not presented in the manuscript. Nor was sex considered for the other two species. A further discussion of the role that sex could play and future studies would be appreciated.

      3) The ratio of Nos2 and Arg1 copies for LPS treated and control P. leucopus and M. musculus in Table 3 shows that in P. leucopus there is not a significant difference but in M. musculus there is an increase in Nos2 copies with LPS treatment. The authors then used a targeted RNA-seq analysis to show that in P. leucopus the number of Arg1 reads after LPS treatment is significantly higher than the controls. These results are oversimplified in the text as an inverse relationship for Nos2/Arg1 in the two species.

    1. Author Response

      We would like to thank reviewers and editors for their thoughtful and constructive review of our manuscript. Below we have provided responses to specific points in the reviewers’ comments and eLife assessment, highlighting areas of the manuscript that will be edited for clarity and where efforts will be made to provide data to address reviewer concerns upon a future resubmission.

      eLife assessment:

      The authors report that Dbp5 functions in parallel with Los1 in tRNA export, in a manner dependent on Gle1 and requiring the ATPase cycle of Dbp5, but independent of Mex67, Dbp5's partner in mRNA export. The evidence for this conclusion is still incomplete, as is the biochemical evidence that Dbp5 interacts directly with tRNA in vitro with Gle1 and co-factor InsP6 triggering Dbp5 ATPase activity in the Dbp5-tRNA complex. The evidence that Dbp5 interacts with tRNA in cells independently of Los1, Msn5 and Mex67 is, however, solid.

      We intend to edit the text to make clear our conclusions and accommodate clarifications on a few details of this assessment.

      (1) We would clarify that our data supports a model in which Dbp5 recruitment to tRNA is independent of Mex67 as an adapter in cells; however, this does not mean that Mex67 and Dbp5 do not still co-function in tRNA export. For example, it is possible Dbp5 and Mex67 could still co-function in the same pathway, but instead of Dbp5 working downstream of Mex67, Dbp5 may in fact work upstream as an adapter for Mex67. Edits to the text will be made to ensure this distinction is clear and highlight the possibility for future investigation to elucidate this relationship.

      (2) We would like to highlight that based on structural and biochemical data detailing synergistic activation of the Dbp5 ATPase cycle by Gle1/InsP6 and single stranded RNA, it is difficult to imagine a scenario where the apparent synergistic activation of the Dbp5 ATPase cycle by tRNA and Gle1/InsP6 (Figure 5) is achieved independent of direct RNA binding. For this reason, we still support the claim that the observed synergistic activation, in combination with other in vivo and in vitro data provided in the manuscript, supports a model where Dbp5 directly binds tRNA. However, we intend to edit the text to highlight this nuance and potential alternative conclusions based on reviewer feedback.

      Reviewer #1 (Public Review):

      “At least one result suggests that the idea of these pathways in parallel may be too simplistic as deletion of the LOS1 gene, which is not essential decreases the interaction of tRNA export substrate with Dbp5 (Figure 2A). If the two pathways were working in parallel, one might have expected removing one pathway to lead to an increase in the use of the other pathway and hence the interaction with a receptor in that pathway…. The obvious missing experiment here with respect to genetics is the test of whether deletion of the MSN5 gene in the cells, which combines deletion of LOS1 and the dbp5_R423A allele, shown in Figure 1D would be lethal…. The authors provide evidence of a model where the helicase Dbp5 plays a role in tRNA export from the nucleus. Further evidence is required to determine whether Dbp5 could function in the same pathway as the previously defined tRNA export receptors, Los1 and Msn5. There are genetic tests that could be performed to explore this question. Some of the biochemistry presented would show when Los1 is absent that the interaction of Dbp5 with tRNA decreases, which could support a model where Dbp5 plays a role in coordination with Los1”

      We agree that this is an important point that should be made clear and discussed in the text. We also agree that further experiments would be needed to confirm that Dbp5 functions broadly in tRNA export in parallel to both Msn5 and Los1. We will aim to address these points in resubmission and discuss possible alternative conclusions of the presented results.

      Reviewer #1 (Public Review):

      “While some of the binding assays show rather modest band shifts (Figure 4B for example), the data in Figure 4A showing that there is no binding detected unless a non-hydrolyzable ATP analogue is employed, argues for specificity in nucleic acid binding. The question that does arise is whether the binding is specific for tRNA.”

      The specificity of the in vitro interactions of Dbp5 is an important point of discussion. We will work to expand the topic of specificity of the in vitro experiments during resubmission.

      Reviewer #1 (Public Review):

      “With the exception of the binding studies, which also employ a mixture of yeast tRNAs, this study relies primarily on a single tRNA species to come to the conclusions drawn. Many other studies have used multiple tRNAs to explore whether pathways characterized are generalizable to other tRNAs.“

      It was previously shown that Dbp5 functions to support the export of multiple tRNA species (https://doi.org/10.7554/eLife.48410). As such, we agree that additional tRNAs should be tested to explore whether phenotypes reported here are also generalizable to other tRNAs. We will add data targeting additional tRNAs during resubmission.

      Reviewer #2 (Public Review):

      “there are some pieces of data that are misinterpreted. (Figure 1A and B look the same; in Fig 1E, the DAPI staining is abnormal; in Fig 4 the bands can't be seen.)”

      Figure 1A and B represent separate experiments, showing that deletion of Los1 does not alter Dbp5 localization and conversely loss of Dbp5 does not alter Los1 localization. As such, localization patterns under loss-of-function conditions look the same as wild-type localization for each protein respectively, as noted. We believe that we have come to the same conclusion as the reviewer on Figure 1A and B (and this data is not misinterpreted), but also understand this panel will need to be adjusted for clarity and readability. We will make efforts to edit this figure and accompanying text to make the data and conclusions clearer, including addressing the EMSAs in Figure 4 and associated text for clarity.

    2. eLife assessment

      The work is a useful contribution to understanding the mechanism of nuclear export of tRNA in budding yeast. The authors report that Dbp5 functions in parallel with Los1 in tRNA export, in a manner dependent on Gle1 and requiring the ATPase cycle of Dbp5, but independent of Mex67, Dbp5's partner in mRNA export. The evidence for this conclusion is still incomplete, as is the biochemical evidence that Dbp5 interacts directly with tRNA in vitro with Gle1 and co-factor InsP6 triggering Dbp5 ATPase activity in the Dbp5-tRNA complex. The evidence that Dbp5 interacts with tRNA in cells independently of Los1, Msn5 and Mex67 is, however, solid.

    3. Reviewer #1 (Public Review):

      Summary:

      This study focuses on defining the cellular pathways critical for tRNA export from the nucleus. While a number of these pathways have been identified, the observation that the primary transport receptors identified thus far (Los1 and Msn5) are not essential, and that cells are viable even when both genes are deleted, supports the idea that there are as yet unidentified mediators of tRNA export from the nucleus. This study implicates the helicase Dbp5 in one of these parallel pathways, arguing that Dbp5 works in a pathway that is independent of Los1 and/or Msn5. The authors present genetic data to support this conclusion. At least one result suggests that the idea of these pathways acting in parallel may be too simplistic, as deletion of the LOS1 gene, which is not essential, decreases the interaction of the tRNA export substrate with Dbp5 (Figure 2A). If the two pathways were working in parallel, one might have expected removing one pathway to lead to an increase in the use of the other pathway and hence in the interaction with a receptor in that pathway. The authors provide solid evidence that Dbp5 interacts with tRNA directly and that the addition of the factor Gle1 together with the previously identified co-factor InsP6 can trigger helicase activity and release of tRNA. The combination of in vivo studies and biochemistry provides evidence to consider how Dbp5 contributes to the export of tRNA and more broadly adds to the conversation about how coding and non-coding RNA export from the nucleus might be coordinated to control cell physiology.

      Strengths and weaknesses:

      A major strength of this manuscript is the multi-pronged approach to explore a potential role for the helicase Dbp5 in one of the multiple export pathways for tRNA from the nucleus.

      The obvious missing genetic experiment is a test of whether deleting the MSN5 gene in the cells that combine deletion of LOS1 with the dbp5_R423A allele (shown in Figure 1D) would be lethal. This key experiment would lend substance to the argument that Dbp5 functions in a tRNA export pathway that is parallel to the Los1 and Msn5 pathways.

      While some of the binding assays show rather modest band shifts (Figure 4B, for example), the data in Figure 4A, showing that no binding is detected unless a non-hydrolyzable ATP analogue is employed, argue for specificity in nucleic acid binding. The question that does arise is whether the binding is specific for tRNA.

      With the exception of the binding studies, which also employ a mixture of yeast tRNAs, this study relies primarily on a single tRNA species to come to the conclusions drawn. Many other studies have used multiple tRNAs to explore whether pathways characterized are generalizable to other tRNAs.

      The authors provide evidence for a model in which the helicase Dbp5 plays a role in tRNA export from the nucleus. Further evidence is required to determine whether Dbp5 could function in the same pathway as the previously defined tRNA export receptors, Los1 and Msn5. There are genetic tests that could be performed to explore this question. Some of the biochemistry presented shows that the interaction of Dbp5 with tRNA decreases when Los1 is absent, which could support a model in which Dbp5 acts in coordination with Los1.

      This work allows insight into key questions which still remain about the multiple pathways that are required for tRNA trafficking as well as how transport pathways for coding and non-coding RNAs might be coordinated. These questions are important as many of these pathways may be regulated in response to cellular conditions or during development and defining the fundamental pathways will be critical to understanding these dynamic processes.

    4. Reviewer #2 (Public Review):

      This submission is about the role of Dbp5/Gle1 in tRNA export. The manuscript provides data showing that Dbp5/Gle1 are involved in tRNA export from the nucleus, which is an essential process critical to translation. The authors provide data that largely support the conclusions; however, there are some pieces of data that are misinterpreted. (Figure 1A and B look the same; in Fig 1E, the DAPI staining is abnormal; in Fig 4 the bands can't be seen.)

      Additionally, the methods used are fairly standard so the article does not contain any new technical achievements.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the positive feedback of the reviewers and have modified the manuscript to address their comments, including changes to the text, figures, and methods. We believe that these revisions have strengthened and improved the manuscript. Reviewers’ comments in blue and detailed responses in black are below.

      Reviewer #1 Weaknesses:

      • Is "function" of the ISNs to balance "nutrient need" or osmolarity? Balancing hemolymph osmolarity for physiological homeostasis is conceptually different from balancing thirst and hunger.

      We have added the following text to the introduction to address this: “Thus, the ISNs sense both AKH and hemolymph osmolality, arguing that they balance internal osmolality fluctuations and nutrient need (Jourjine, Mullaney et al., 2016).” (ln 80-82).

      • The final schematic nicely sums up how the different peptidergic pathways might work together, but it is unclear which connections are empirically-validated or speculative. It would be informative to show which parts of the model are speculative versus validated. For example, does FAFB volume synapse = functional connectivity and not just anatomical proximity? A bulk of the current manuscript relies on "synapses of relatively high confidence" (according to Materials and methods: line 522). I recommend distinguishing empirically tested & predicted connections in the final schematic, and maybe reword/clarify throughout the manuscript as "predicted synaptic partners"

      We modified the schematic to distinguish EM-based connections from functionally validated connections. We also clarified the EM-predicted synaptic partners, using “predicted synaptic partners” throughout the manuscript.

      Reviewer #2 Areas for further development:

      • Does BIT inhibit all of the IPCs or some of them? I think it is critical to indicate the ROIs used for each neuron in the methods. Which part of the neuron is used for imaging experiments? Dendrites, cell bodies, or synaptic terminals?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

      • The discussion section is not giving big picture explanation of how these neurons work together to regulate sugar and water ingestion. Silencing and activation experiments are good, but without showing the innate activity of these neural groups during ingestion, it is not clear what their functions are in terms of regulating fly behavior.

      We agree that how these peptidergic neurons coordinately regulate feeding is unclear. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is challenging. Acute imaging during feeding would only in part address this challenge, as cumulative changes in nutrient need signals may impart circuit changes that are not apparent by monitoring the acute activity of peptidergic neurons. We modified a paragraph in the discussion to address this (ln 434-443).

      “Overall, our work sheds light on neural circuit mechanisms that translate internal nutrient abundance cues into the coordinated regulation of sugar and water ingestion. We show that the hunger and thirst signals detected by the ISNs influence a network of peptidergic neurons that act in concert to prioritize ingestion of specific nutrients based on internal needs. We hypothesize that multiple internal state signals are integrated in higher brain regions such that combinations of peptides and their actions signify specific needs to drive ingestion of appropriate nutrients. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is a future challenge to further illuminate homeostatic feeding regulation.”

      Reviewer #1 (Recommendations For The Authors):

      • For the final schematic figure, it may be informative to include nanchung and AKHR in the schematic.

      We now include this (Fig 6).

      • For the ingestion duration with optogenetic activation, I don't think the right way to represent the data is by normalizing them to the no LED control. I think it should show raw ingestion time. I understand that the normalized data make the figure "cleaner" (no need to show +/- LED separately) but I think visualization of the raw data is important.

      We now include this in a new Supplemental Figure (Fig S6).

      • Methods for ingestion with optogenetic activation should be detailed in the Methods section.

      We expanded upon this in the ‘Temporal consumption assay (TCA)’ methods section. (ln 461-466).

      Reviewer #2 (Recommendations For The Authors):

      1) I think the authors are not following the recommendations of the Flywire community which recommends that people who contributed to the tracing of neurons are offered authorship in the published papers. I see the authors are thanking other lab members who have done tracing for the neurons described in this study, but I would like them to clarify whether they are following the guidelines provided by Flywire.

      We followed the Flywire guidelines and contacted all Flywire users contributing more than 10% to neuron edits for permission to publish with acknowledgements (see Flywire guidelines https://docs.google.com/document/d/1bUkOB5JnT3u__JDvAoVDHJ3zr5NXQtV_63yx2w6Tcc/edit).

      2) The method section for voltage imaging is missing.

      We now include a section on voltage imaging (ln 496-498).

      3) ROIs for imaging are not indicated in the methods or in the figures. It is hard to judge what is the origin of neural activity plotted in the figures; are they imaging cell bodies, dendrites, or axons?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

    2. eLife assessment

      This important study identifies and characterizes a broad peptidergic network that coordinates nutrient-specific consumption needs for food or water. Using state-of-the-art methodology the authors combine a well-balanced set of exploratory anatomical analyses with rigorous functional experimental approaches to examine how ingestion is regulated based on internal needs. These significant and convincing new findings are of broad interest to the neuroscience field.

    3. Reviewer #1 (Public Review):

      This work by Gonzalez-Segarra et al. greatly extends previous research from the same group that identified ISNs as a key player in balancing nutrition and water ingestion. Using well-balanced sets of exploratory anatomical analyses and rigorous functional experiments, the authors identify and compile various peptidergic circuits that modulate nutrient and/or water ingestion. The findings are convincing and the experiments rigorous.

      Strengths:

      - The authors complement anatomically-reconstructed and functionally-validated neuronal connectivity with extensive and intensive morphological and synaptic reconstruction.

      - Identifying the neurons and genes involved in specific components of feeding control is undoubtedly challenging, because numerous neurons and circuits redundantly and reciprocally regulate the same components of feeding behavior. This work dissociates how multiple, parallel and interconnected, peptidergic circuits (dilp3, CCHa2, CCAP) modulate sucrose and water ingestion, in tandem and in parallel.

      - The authors address some of the incongruencies/discrepancies in current literature (IPCs) and try to provide explanations, rather than ignoring inconsistent findings.

      Weaknesses:

      - The authors have addressed several weaknesses of the paper in the revised text.

    4. Reviewer #2 (Public Review):

      In this manuscript, González-Segarra et al. investigated how ISNs regulate sugar and water ingestion in Drosophila. In their previous paper, the authors showed that inhibiting neurotransmission in ISNs has opposite effects on sugar and water ingestion. In this manuscript, the authors first identified the effector molecules released by ISNs. Their RNAi screen found that, surprisingly, ISNs use ilp3 as a neuromodulator. Next, using light and electron microscopy, they investigated the downstream neural circuits ISNs connect with to regulate water or sugar ingestion. These analyses identified a new group of neurons named Bilateral T-shaped neurons (BiT) as the main output of ISNs, and several other peptidergic neurons as downstream effectors of ISNs. While BiT activity regulated both sugar and water ingestion, BiT downstream neurons, such as CCHa2R, only impacted water ingestion. These results suggested that ISNs might interact with distinct neural circuits to control sugar or water ingestion. The authors also investigated other ISN downstream neurons, such as ilp2 and CCAP, and revealed that their activity also contributes to ingestive behaviors in flies.

      Major strengths:

      1. This manuscript presents a comprehensive investigation of the downstream neurons connected to ISNs.

      2. The authors have identified and characterized a diverse set of peptidergic neurons that regulate ingestive behaviors in the fly brain.

      Weaknesses:

      1. Only one RNAi hairpin is used to knock down Ilp3 in ISNs. Without a second hairpin or mutant data, there is a concern about off-target effects. Do ilp3 mutants also have similar defects in sugar/water ingestion compared to ISN ilp3 knockdown?

      2. Throughout the paper, the authors use either voltage or calcium sensors without explaining why they chose one method over the other to determine the functional connectivity between neurons.

      3. How these diverse sets of peptidergic neurons interact to regulate ingestive behaviors is unclear and requires further investigation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Thank you very much for your advice and comments. We have taken your suggestions into consideration and modified the manuscript accordingly. We will add more data and analysis on this topic in the article to make the exposition fuller.

      1) There are different cell types in liver tissue; in which is BATF protein expressed most?

      Based on the analysis of single-cell public data (GEO accession: GSE129516), BATF is expressed in every cell cluster in the liver, with the highest expression in T cells and the least in cholangiocytes (Author response image 1).

      Author response image 1.

      2) The statistical data should be provided to support the liver specific over-expression of BATF.

      The results of WB in Figure 2 (C & E) have been quantified and the relevant content has been corrected.

      3) For in vivo study, food intake is key data to exclude the change of energy intake.

      Plots of food intake have been added to Figure S2A.

      4) For Fig. 6: since PD1 is also highly expressed in heart and spleen, how can the effect of the PD1 antibody on these tissues be excluded?

      According to the images of the heart (Author response image 2, left) and spleen (Author response image 2, right) taken during mouse dissection, the morphology and size of the two organs were similar in the HFD-CN and HFD-PD1 groups. Moreover, the relevant literature indicates that PD-1 blockade had little impact on the number and function of transferred T cells within the spleen (Peng et al.), and that anti-PD-1 had no effect on mouse splenic cell proliferation (Shindo et al.). Du et al. showed in their study that single use of PD-1 antibody (10 mg/kg, once every three days, for 4 weeks) did not affect the mouse heart (Du et al.). Both our results and the related literature indicate that PD-1 antibody should not have adverse effects on the heart and spleen.

      Author response image 2.

      Reviewer #2 (Public Review):

      Thank you very much for your advice and comments. We have seriously considered your suggestions and will focus on them in our future research.

      Weakness

      1) BATF protein is also abundantly expressed in control hepatocytes, but the knockdown of BATF had no effect on lipid accumulation. Besides, the expression of BATF was elevated by high-fat diets. So it will be interesting to investigate its role in the liver by using hepatic conditional knockout mice.

      We appreciate the reviewers' suggestion to investigate other functions of BATF in the liver besides its protective role in a high-fat environment. However, we did not use BATF knockout mice in this study because our data indicated that BATF knockdown had no effect on lipid accumulation. We will pursue further research and validation in future studies.

      2) The data for the direct regulation of PD1 and IL-27 by BATF are not sufficient; it would be better to carry out a ChIP experiment to confirm it.

      Thank you for your valuable comments. The article by Kevin Man et al. found that upregulation of the transcription factor BATF regulates PD1 expression and repairs impaired cellular metabolism (Man et al.). In our manuscript, the dual luciferase reporter assay of BATF and PD1 likewise confirmed that BATF can regulate the expression of PD1 (Fig 5G). Together, these results confirm that BATF has a regulatory effect on PD1. We do not yet have conclusive evidence for a direct interaction between BATF and IL-27, but there are some relevant studies that support their connection. For instance, BATF and IRF1 were found to be transcription factors induced early by IL-27 treatment, and essential for Tr1 cell differentiation and function, both in vitro and in vivo (Karwacz et al.). Moreover, Zhang et al. identified BATF as one of the transcription factors regulating IL-27 expression by transcription factor prediction and RNA sequencing analysis (Zhang et al.). These results lay the foundation for elucidating the regulation of PD1 and IL-27 by BATF.

      Reviewer #2 (Recommendations For The Authors):

      1. In Figure 3D, which subunit of AMPK was tested, alpha, beta or gamma?

      Thank you for your valuable comments. We detected the expression level of AMPKα1. We have modified the relevant labels in the figure and manuscript.

      References:

      Du, Shisuo, et al. "PD-1 Modulates Radiation-Induced Cardiac Toxicity through Cytotoxic T Lymphocytes." 13.4 (2018): 510-20. Print.

      Karwacz, Katarzyna, et al. "Critical Role of IRF1 and BATF in Forming Chromatin Landscape During Type 1 Regulatory Cell Differentiation." 18.4 (2017): 412-21. Print.

      Man, Kevin, et al. "Transcription Factor IRF4 Promotes CD8+ T Cell Exhaustion and Limits the Development of Memory-Like T Cells During Chronic Infection." 47.6 (2017): 1129-41. e5. Print.

      Peng, Weiyi, et al. "PD-1 Blockade Enhances T-Cell Migration to Tumors by Elevating IFN-γ Inducible Chemokines." 72.20 (2012): 5209-18. Print.

      Shindo, Yuichiro, et al. "Interleukin 7 and Anti-Programmed Cell Death 1 Antibody Have Differing Effects to Reverse Sepsis-Induced Immunosuppression." 43.4 (2015): 334. Print.

      Zhang, Huiyuan, et al. "An IL-27-Driven Transcriptional Network Identifies Regulators of IL-10 Expression across T Helper Cell Subsets." 33.8 (2020): 108433. Print.

    2. eLife assessment

      This valuable study reports on the role of the transcription factor BATF and its target PD1 in lipid metabolism, including in a model of nonalcoholic fatty liver disease (NAFLD). Overall, the evidence supporting the conclusions is convincing. The work will be of interest to medical biologists working on NAFLD.

    3. Reviewer #1 (Public Review):

      The authors investigated the function of BATF in hepatic lipid metabolism. They found that BATF alleviated high-fat diet (HFD)-induced hepatic steatosis. In addition, BATF could inhibit the expression of programmed cell death protein 1 (PD1) induced by HFD. Using overexpression and transcriptional activity analysis, this study confirmed that BATF regulates fat accumulation by inhibiting PD1 expression and promoting energy metabolism. The authors then found that PD1 antibodies alleviated hepatic lipid deposition. These data identify the regulatory role of BATF in hepatic lipid metabolism and indicate that PD1 is a target for alleviation of NAFLD. The conclusions of this manuscript are supported by the data.

    4. Reviewer #2 (Public Review):

      In this manuscript, the authors first investigated the role of the transcription factor BATF in hepatic lipid metabolism both in vivo and in vitro. Using an AAV to overexpress BATF in the liver, they found that mice overexpressing BATF resisted high-fat-diet-induced obesity and showed attenuated hepatic steatosis. Mechanistically, PD1 mediated the effect of BATF on lipid accumulation in hepatocytes, and IL-27 mediated its effect on adiposity reduction in vivo.

      Strengths:

      1) This work found that the transcription factor BATF reduced hepatic lipid accumulation, offering a potential target to treat NAFLD.

      2) PD1 antibodies are commonly used to treat cancer; here the authors demonstrate a new function for them in metabolic disease. PD1 antibody could help mice to combat obesity and hepatic steatosis induced by high-fat diets.

      3) Overexpression of BATF in the liver not only decreased lipid accumulation in the liver but also reduced fat mass. IL-27 secretion in the liver was enhanced to affect the adipose tissue. The crosstalk between liver and adipose tissue was also validated in this paper.

    1. eLife assessment

      This manuscript uses genetic mouse modeling to delve deeper into a rare human disease of aging. The targeted approaches employed lend greater pathophysiologic insight and make this paper valuable to the field at large. Additionally, the approaches used are rigorous and solid in supporting the conclusions. Some minor weaknesses were noted along with suggestions to add greater clarity.

    2. Reviewer #1 (Public Review):

      The article by Reversade and colleagues reports new mutations in PYCR1 in a progeroid disease associated with premature skin aging. Using human cell culture and a newly generated mouse model of PYCR1 deficiency, they identify a role for this factor in maintaining dermal homeostasis and ECM production. I have some minor concerns about the role of PYCR1 in fibroblast survival vs function and the quantification of western blots.

    3. Reviewer #2 (Public Review):

      Summary:

      Sotiropoulou et al. present an interesting study of an incredibly rare premature aging disease (De Barsy syndrome), examining both the underlying mechanisms at play behind the condition as well as how that biology may have a larger role in understanding features of normal aging, and in particular, human skin aging. The authors link one of the underlying genetic defects in De Barsy syndrome (PYCR1 mutations) to its phenotypic manifestations and then extrapolate those findings to present more preliminary data to suggest that a loss of PYCR1 may be a biomarker of normal human skin aging.

      Strengths:

      - The study is important as De Barsy syndrome is challenging to study given its rarity, thus making it an understudied condition. Here the authors combine both human patient samples and murine models to offer a nice contribution to further understanding the pathophysiology of this disease.

      - The authors are able to link some of the observed features in De Barsy syndrome preliminarily to more common aging models and processes (senescence, human skin dermal aging). They nicely show that the loss of Pycr1 in mice can provoke thinning of the dermis of mice while not affecting the epidermis. Furthermore, they present compelling data to suggest that Pycr2 may be compensating in mice (while not in humans) and this may contribute to the differences in lifespan observed between the mice and humans.

      - Should these results be further verified, this could suggest that further study of Pycr1 and Pycr2 biology may offer new insights into aging and senescence in other tissues.

      Weaknesses:

      - Some of the data appear preliminary and seem to need further analysis, as described below in my suggestions for the authors:

      1) While the authors report that there is no difference in the lifespan of the Pycr1-KO mice, can they report whether there was overall weight loss or any size differences between the mice? This is helpful particularly when comparing the dermal thickness as well as considering how the global loss of Pycr1 may affect overall systemic health.

      2) In Figure S2E, the comparison "pairs" seems somewhat arbitrarily chosen and it seems from the quantifications of these pairs that depending upon which young sample you compare to which old sample, you may end up with differing results. I think the more appropriate way to make this quantitative comparison would be to average the young samples and average the old samples and then compare them and perform statistics. This seems critical to really assess whether PYCR1 loss would be a consistent marker of human skin dermal aging. Additionally, it would be helpful to also look at Pycr2 expression in the normal young versus old dermis to see if the reported difference in Pycr1 is really something specific for Pycr1 and not something more general.

      3) Are the labels mixed up in Figures 1J and 1K or am I reading it incorrectly? From what I can see the graph is showing that the dermal thickness and collagen intensity are higher in the Pycr1-/- mice. Similarly, the authors state that there is "significantly less collagen fiber staining", although in Figure S1G neither the quantification of collagen I nor that of collagen III is shown to be significant. These discrepancies need to be discussed or corrected.

      4) Can the authors speculate further on why Pycr2 is also diminished in human patients (while it clearly remains present in the mice)?

      5) Can the authors comment on whether other canonical senescence features are seen in De Barsy syndrome (p16 positivity, senescence associated secretory phenotype, etc.)? Along these lines, there is an abundance of publicly available RNA-seq datasets from various forms of senescent cells. It would be interesting to examine these and see whether there is any loss of expression of PYCR1 or PYCR2 in these data, or is the loss of PYCR1 only seen at the protein level?
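
      The pooled comparison suggested in point 2 above (average the young samples, average the old samples, then test the difference) could be sketched as follows. This is only an illustrative sketch: the choice of Welch's unequal-variance t-test, the function name `welch_t`, and the input format (one quantified intensity per donor) are assumptions, not the analysis used in the manuscript, and no real data are shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(young, old):
    """Compare two groups of per-donor intensities with Welch's t-test.

    `young` and `old` are sequences of quantified values (hypothetical
    input format, at least two values per group). Returns the two group
    means, the t statistic, and the Welch-Satterthwaite degrees of freedom.
    """
    n1, n2 = len(young), len(old)
    m1, m2 = mean(young), mean(old)
    # Sample variances (n - 1 denominator) scaled by group size.
    s1 = variance(young) / n1
    s2 = variance(old) / n2
    t = (m1 - m2) / sqrt(s1 + s2)
    df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return m1, m2, t, df
```

      The t statistic and degrees of freedom can then be converted to a p-value with any standard statistics package; with very few donors per group, a non-parametric test might be the more cautious choice.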

    1. eLife assessment

      This important paper provides insights into the role of inflammasomes in the control of Salmonella replication within human macrophages. Solid evidence is provided that, in the absence of inflammasome signaling, Salmonella replicates in the macrophage cytosol. This paper will be of broad interest to cell biologists, immunologists and microbiologists.

    2. Reviewer #1 (Public Review):

      Summary:

      In this excellent manuscript by Egan et al., the authors very carefully dissect the roles of inflammasome components in restricting Salmonella Typhimurium (STm) replication in human macrophages. They show that caspase-1 is essential to mediating inflammasome responses and that caspase-4 contributes to bacterial restriction at later time points. The authors show very clear roles for the host proteins that mediate terminal lysis, gasdermin D and ninjurin-1. The unique finding in this study is that in the absence of inflammasome responses, Salmonella hypereplicates within the cytosol of macrophages. These findings suggest that caspase-1 and possibly caspase-4 play roles in restricting the replication of Salmonella in the cytosol as well as in the Salmonella containing vacuole.

      Strengths:

      1) The genetic and biochemical approaches have shown for the first time in human macrophages that the caspase-1-GSDMD-NINJ1 axis is very important for restricting intracellular STm replication. In addition, they demonstrate a later role for Casp4 in control of intracellular bacterial replication.

      2) In addition, they show that in macrophages deficient in the caspase-1-GSDMD-NINJ1 axis, STm are found replicating in the cytosol, which is a novel finding. The electron microscopy is convincing that STm are in the cytosol.

      3) The authors go on to use a chloroquine resistance assay to show that inflammasome signaling also restricts STm within SCVs in human macrophages.

      4) Finally, they show that the Type 3 Secretion System encoded on Salmonella Pathogenicity Island 1 contributes to STm's cytosolic access in human macrophages.

      Weaknesses:

      1) Their results with human macrophages suggest that there are differences between murine and human macrophages in inflammasome-mediated restriction of STm growth. For example, Thurston et al. showed that in murine macrophages inflammasome activation controls the replication of mutant STm that aberrantly invades the cytosol, but only slightly limits replication of WT STm. In contrast, here the authors found that primed human macrophages rely on caspase-1, gasdermin D and ninjurin-1 to restrict WT STm. I wonder if the priming of the human macrophages in this study could account for the differences between these studies. Along those lines, do the authors see the same results presented in this study in the absence of priming the macrophages with Pam3CSK4? I think that determining whether the control of intracellular STm replication is dependent on priming is very important. Another difference from the Thurston et al. paper is the way that the STm inoculum was prepared: stationary phase bacteria that were opsonized. Could this also account for differences between the two studies, rather than differences between murine and human macrophages in inflammasome-dependent control of STm?

      2) The authors show that the pore-forming proteins GSDMD and Ninj1 contribute to control of STm replication in human macrophages. Is it possible that leakage of gentamicin from the media contributes to this control?

      3) One major question that remains to be answered is whether casp-1 plays a direct role in the intracellular localization of STm. If the authors quantify the percentage of vacuolar vs. cytosolic bacteria at early time points in WT and casp-1 KO macrophages, would that be the same in the presence and absence of casp-1? If so, then this would suggest that there is a basal level of bacterial-dependent lysis of the SCV, and that in WT macrophages the presence of cytosolic PAMPs triggers cell death so that bacteria can't replicate in the cytosol. However, in the inflammasome KO macrophages, the host cell remains alive and bacteria can replicate in the cytosol.

    3. Reviewer #2 (Public Review):

      Summary:

      This work addresses the question of how human macrophages restrict intracellular replication of Salmonella.

      Strengths:

      Through a series of genetic knockouts and using specific inhibitors, Egan et al. demonstrated that the inflammasome components caspase-1, caspase-4, gasdermin D (GSDMD), and the final lytic death effector ninjurin-1 (NINJ1) are required for control of Salmonella replication in human macrophages. Interestingly, caspase-1 proved crucial in restricting Salmonella early during infection, whereas caspase-4 was essential in the later stages of infection. Furthermore, using a chloroquine resistance assay and state-of-the-art microscopy, the authors found that NAIP receptor and caspase-1 mostly regulate replication of cytosolic bacteria, with smaller, yet significant, impact on the vacuolar bacteria.

      The finding that inflammasomes are critical in the restriction of replication of intracellular Salmonella in human macrophages contrasts with the published minimal role of inflammasomes in restriction of replication of intracellular Salmonella in murine macrophages. These findings demonstrate yet another example of interspecies and intercellular differences in regulation of bacterial infections by the immune system.

      Weaknesses: none.

    4. Reviewer #3 (Public Review):

      The manuscript by Egan and coworkers investigates how Caspase-1 and Caspase-4 mediated cell death affects replication of Salmonella in human THP-1 macrophages in vitro.

      Overall evaluation:

      Strengths of the study include the use of human cells, which exhibit notable differences (e.g., Caspase-11 vs Caspase-4/5) compared to commonly used murine models. Furthermore, the study combines inhibitors with host and bacterial genetics to elucidate mechanistic links.

      The main weaknesses of the study are the inherent limitations of tissue culture models. For example, to study interaction of Salmonella with host cells in vitro, it is necessary to kill extracellular bacteria using gentamicin. However, since Salmonella-induced macrophage cell death damages the cytosolic membrane, gentamicin can reach intracellular bacteria and contribute to changes in CFU observed in tissue culture models (major point 1). This can result in tissue culture "artefacts" (i.e., observations/conclusions that cannot be recapitulated in vivo). For example, intracellular replication of Salmonella in murine macrophages requires T3SS-2 in vitro, but T3SS-2 is dispensable for replication in macrophages of the spleen in vivo (Grant et al., 2012).

      Major comments:

      In Figure 1: are increased CFU in WT vs CASP1-deficient THP-1 cells due to Caspase 1 restricting intracellular replication or due to Caspase-1 causing pore formation to allow gentamicin to enter the cytosol thereby restricting bacterial replication? The same question arises about Caspase-4 in Figure 2, where differences in CFU are observed only at 24h when differences in cell death also become apparent. The idea that gentamicin entering the cytosol through pores is responsible for controlling intracellular Salmonella replication is also consistent with the finding that GSDMD-mediated pore formation is required for restricting intracellular Salmonella replication (Figure 3). Similarly, the finding that inflammasome responses primarily control Salmonella replication in the cytosol could be explained by an intact SCV membrane protecting Salmonella from gentamicin (Figure 5).

    1. eLife assessment

      The reviewers have found the work to be valuable to the field of immunotherapy in the treatment of cancer. The data supporting the role of PDLIM2 as a tumor suppressor, and more immediately, as a strategy to improve the efficacy of immunotherapy treatment, was viewed as compelling. However, the results are lacking a completed mechanism, which would substantially expand the impact of the work.

    2. Reviewer #1 (Public Review):

      The manuscript by Sun and colleagues followed their previous findings on the tumor-suppressive role of PDLIM2 in lung cancer. They further investigated various mechanisms, including epigenetic modification, copy number variation, and LOH, that led to the decreased expression of PDLIM2 in human lung cancer. Next, they used a nanoparticle-based approach to specifically restore the expression in mouse lung tumors. They showed that over-expression of PDLIM2 in lung cancer repressed its progression in vivo. Also, this treatment could synergize with chemotherapy and checkpoint inhibitor anti-PD-1. Overall, the results were quite promising and convincing, using a treatment combination that would appear to have the potential for clinical implementation.

    3. Reviewer #2 (Public Review):

      Summary: The authors have previously demonstrated that the E3 ligase PDLIM2 inhibits NF-kB and STAT3 and is epigenetically repressed in human lung cancers (Sun et al. Nat. Comm. 2019 10: 5324); therefore, PDLIM2 is a tumor suppressor in lung cancer. In this manuscript, they follow up on their previous findings and show that expression of PDLIM2 is downregulated in human lung cancers by both genetic deletion and promoter methylation. They further describe a novel approach to restore the expression of PDLIM2 in mouse lung tumors by systemically administering PDLIM2 plasmids encapsulated in nanoparticles (termed "nanoPDLIM2"). The nanoPDLIM2 approach was shown to exhibit efficacy with low toxicity in a urethane-induced mouse lung cancer model. The authors further demonstrated the synergy of nanoPDLIM2 with chemotherapy and PD-1 blockade immunotherapy. The combination therapy of nanoPDLIM2, chemotherapy, and immunotherapy proved most effective with complete tumor remission in 60% of mice. Mechanistically, nanoPDLIM2 upregulated MHC-I expression, enhanced CD4/CD8 T cell activation and tumor infiltration, and suppressed MDR1 induction and nuclear expression of STAT3, RelA and prosurvival genes in tumors. Overall, this study is important because it reinforces the critical roles of PDLIM2 in suppressing lung cancer, and also identifies a potential approach to restoring PDLIM2 expression in lung tumors. The experiments were well executed; the data are convincing and support the conclusions made by the authors.

    4. Reviewer #3 (Public Review):

      Strengths:

      NanoPDLIM2, a nanotechnology that efficiently delivers lentivirus, overcomes resistance to chemotherapy and anti-PD-1 immunotherapy. This is a new strategy for enhancing the efficiency of immune checkpoint inhibitors. This finding is important from a clinical translation perspective, but I have several minor concerns.

      Weaknesses:

      1. Please describe the mechanism of increased MHC class I and PD-L1 by PDLIM2.

      2. Please describe the mechanism of decreased MDR1, nuclear RelA and STAT3 by PDLIM2.

      3. Please determine whether PDLIM2 expression directly impacts immune cells (function and number)?

      4. What is the efficiency of PDLIM2 delivery? Does delivery efficiency determine anti-tumor effect?

      5. Authors used a non-immunogenic tumor model. Can you demonstrate the combination effect with PDLIM2 in immunogenic lung cancer models to determine whether the combination of PDLIM2 with anti-PD-1 Ab confers a synergistic effect without chemotherapy?

      6. On page 11, % change can make one over-interpret data.

      7. In Figure 5, what is the difference between 5A and 5D?

      8. It is unclear whether PDLIM2 confers an additive or a synergistic effect with anti-PD-1/chemo.

      9. Have the authors tested any toxicity in normal lungs?

    1. eLife assessment

      This study presents a valuable finding that YAP/TAZ promotes the formation of P-bodies for tumor progression via inhibiting PNRC1 which is a critical suppressor of P-body formation. The evidence supporting the claims of the authors is solid, although the inclusion of the mechanistic link between P-body formation and oncogenesis would have strengthened the study. The work will be of interest to cancer biologists or scientists working in the field of Hippo signaling.

    2. Reviewer #1 (Public Review):

      In this manuscript, the authors demonstrated that YAP/TAZ promotes P-body formation in a series of cancer cell lines. YAP/TAZ modulates the transcription of multiple P-body-related genes, especially repressing the transcription of the tumor suppressor proline-rich nuclear receptor coactivator 1 (PNRC1) through cooperation with the NuRD complex. PNRC1 functions as a critical repressor in YAP-induced biogenesis of P-bodies and tumorigenesis in colorectal cancer (CRC). Reexpression of PNRC1 or disruption of P-bodies attenuated the protumorigenic effects of YAP. Overall, these findings are interesting and the study was well conducted.

      Major concerns:

      1. RNAseq data indicated that Yap has the capacity to suppress the expression of numerous genes. In addition to PNRC1, could there be additional Yap targeting factors involved in Yap-mediated formation of P-bodies?

      2. It is still not clear how PNRC1 regulates P-bodies. Knockdown of PNRC1 prevented the reduction of P-bodies caused by Yap knockdown. How do the genes related to P-bodies that are positively regulated by Yap, such as SAMD4A, AJUBA, and WTIP, change in this scenario? Given that the expression of Yap can differ considerably among various cell types, is it possible for P-bodies to be present in tumor cells lacking Yap expression?

      3. The authors demonstrated that CHD4 can bind to Yap target genes, such as CTGF, AJUBA, SAMD4A (Figure 4 - Figure Supplement 1D). Does the NuRD complex repress the expression of these genes? Could the NuRD complex prevent the formation of P-bodies?

      4. The finding that YAP/TAZ promotes the formation of P-bodies contradicts the conclusion of a previous study (PMID: 34516278). Please address these inconsistent findings.

    3. Reviewer #2 (Public Review):

      In a study by Shen et al., the authors investigated YAP/TAZ target genes that play a role in the formation of processing bodies (P-bodies). P-bodies are membraneless cytoplasmic granules that contain translationally repressed mRNAs and components of mRNA turnover. GO enrichment analysis of the RNA-Seq data of colorectal cancer cells (HCT116) after YAP/TAZ knockdown showed that the downregulated genes were enriched in P-body resident proteins. Overexpression, knockdown, and ChIP-qPCR analyses showed that SAMD4A, PNRC1, AJUBA, and WTIP are YAP-TEAD target genes that also play a role in P-body biogenesis. Using P-body markers such as DDX6 and DCP1A, the authors showed that the knockdown of YAP in the HCT116 cell line causes a reduction in the number of P-bodies. Similarly, overexpression of constitutively active YAP (YAP 5SA) increased the P-body number. The YAP-TEAD target genes SAMD4A and AJUBA positively regulate P-body formation, because lowering their expression levels using siRNA reduces the number of P-bodies. The other YAP target gene, PNRC1, is a negative regulator of P-body biogenesis and consistently YAP suppresses its expression through the recruitment of the NuRD complex. YAP target genes that modulate P-body formation play prominent roles in oncogenesis. PNRC1 suppression is key to YAP-mediated proliferation, colony formation, and tumorigenesis in HCT116 xenografts. Similarly, SAMD4 and AJUBA knockdown abrogated cell viability. In summary, this study demonstrated that SAMD4, AJUBA, WTIP, and PNRC1 are bona fide YAP-TEAD target genes that play a role in P-body formation, which is also linked to the oncogenesis of colon cancer cells.

      Major Strengths:

      The majority of the experiments were appropriately planned so that the generated data could support the conclusions drawn by the authors. The phenotype observed with YAP/TAZ knockdown correlated inversely with YAP5SA overexpression, which is complementary. Where possible, the authors also used point mutations that selectively disrupt protein-protein interactions, such as YAP S94A and PNRC1 W300A. The CRC cell line HCT116 was used throughout the study; additionally, data from other cancer cell lines were used to support the generality of the findings.

      Weaknesses:

      The authors did not elucidate the mechanistic link between P-body formation and oncogenesis; therefore, it is unclear why an increase in the number of P-bodies is pro-tumorigenic. AJUBA and SAMD4 may have housekeeping functions and reduce the proliferation of YAP-independent cell lines. Figure 6 - Figure Supplement 4 shows a reduction in cell viability and migration in control HCT116 cell lines upon AJUBA/SAMD4 knockdown. Therefore, it is unclear whether their tumor suppressive role is YAP-dependent. The authors extrapolated and suggested that their findings could be exploited therapeutically, without providing much detail. How do they plan to stimulate the expression of PNRC1? It is not necessary for every scientific finding to lead to a therapeutic benefit; therefore, they can tone down such statements if therapeutic exploitation is not realistic. The authors elucidated a mechanism for PNRC1 repression and one wonders why no attempts were made to understand the mechanism of activation of SAMD4, AJUBA, and WTIP expression.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers and the editor for their insightful comments and suggestions. We are particularly glad to read that our software package constitutes a set of “well-written analysis routines” which have “the potential to become very valuable and foundational tools for the analysis of neurophysiological data”. We have updated the manuscript to address their remarks where appropriate.

      Additionally, we would like to stress that tools of this kind are in continual development. As such, the manuscript offered a snapshot of the package at one point during this process, which in this case was several months ago at initial submission. Since then, several improvements have been implemented. The manuscript has been further updated to reflect these more recent changes.

      From the Reviewing Editor:

      The reviewers identified a number of fundamental weaknesses in the paper.

      1) For a paper demonstrating a toolbox, it seems that some example analyses showing the value of the approach (and potentially the advantage in simplification, etc over previous or other approaches) are really important to demonstrate.

      As noted by the first reviewer, the online repository (i.e. GitHub page) conveys a better sense of the toolbox’s contribution to the field than the present manuscript. This is a fair remark but at the same time, it is unclear how to illustrate this in a journal article without dedicating a great deal of page space to presenting raw code, while online tools offer an easier and clearer way to do this. As a work-around, our strategy was to illustrate some examples of data analysis in Figures 4&5 by comparing each illustrated processing step to the corresponding command line used by the Pynapple package. Each step requires a single line of code, meaning that one only needs to write three lines of code to decode a feature from population activity using a Bayesian decoder (Fig. 4a), compute the cross-correlogram of two neurons during specific stimulus presentation (Fig. 4b) or compute the average firing rate of two neurons around a specific time of the experimental task (Fig. 4c). We believe that these visual aids make it unnecessary to add code in the main text of this manuscript. However, to aid reader understanding, we now provide clear references to online Jupyter notebooks which show how each figure was generated in figure legends as well as in the “Code Availability” section.

      https://github.com/pynapple-org/pynapple-paper-2023

      Furthermore, we have opted-in for the “Executable Research Articles” feature at eLife, which will make it possible to include live scripts and figures in the manuscript once it is accepted for publication. We do not know at this stage what it entails exactly, but we hope that Figures 4&5 will become live with this feature. The readers will have the possibility to see and edit the code directly within the online version of the manuscript.
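
      To give a sense of what one of these single-line steps wraps, here is a minimal standard-library sketch of the computation behind the "average firing rate around a specific time" example. It is an illustration only and not Pynapple's implementation or API: the function name `perievent_rate`, its parameters, and the toy data are assumptions made for this example.

```python
from bisect import bisect_left


def perievent_rate(spike_times, event_times, window=(-1.0, 2.0), bin_size=1.0):
    """Mean firing rate (Hz) per time bin around events, for one neuron.

    Hypothetical helper for illustration; not part of Pynapple.
    spike_times must be sorted in ascending order (seconds).
    Bins are half-open intervals [lo, hi).
    """
    start, stop = window
    n_bins = int(round((stop - start) / bin_size))
    counts = [0] * n_bins
    for event in event_times:
        for b in range(n_bins):
            lo = event + start + b * bin_size
            hi = lo + bin_size
            # Number of spikes with lo <= t < hi, found by binary search.
            counts[b] += bisect_left(spike_times, hi) - bisect_left(spike_times, lo)
    # Normalise by the number of events and the bin width to obtain Hz.
    return [c / (len(event_times) * bin_size) for c in counts]


# Toy data: five spikes and two task events (all times in seconds).
spikes = [9.5, 10.5, 11.5, 29.5, 30.5]
events = [10.0, 30.0]
rates = perievent_rate(spikes, events)
```

      Writing this by hand requires decisions about bin edges, sorting and normalisation; a shared, tested routine makes those decisions once, which is the point the response makes about single-line calls.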

      2) The manuscript's claims about not having dependencies seem confusing.

      We agree that this claim was somewhat unfounded. There are virtually no Python packages that do not have dependencies. Our intention was to say that the package had no dependencies outside the most common ones, which are Numpy, Scipy, and Pandas. Too many packages in the field tend to have long lists of dependencies, making long-term back-compatibility quite challenging. By keeping dependencies minimal, we hope to maximise the package’s long-term back-compatibility. We have rephrased this statement in the manuscript in the following sections:

      Figure 1, legend.

      “These methods depend only on a few, commonly used, external packages.”

      Section Foundational data processing: “they are for the most part built-in and only depend on a few widely-used external packages. This ensures that the package can be used in a near stand-alone fashion, without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      3) Given its significant relevance, it seems important to cite the FMATool and describe connections between it (or analyses based on it) and the presented work.

      Indeed, although we had already cited other toolboxes (including a review covering the topic comprehensively), we should have included this one in the original manuscript. Unfortunately, to the best of our knowledge, this toolbox is not citable (there is no companion paper). We have added a reference to it in plain text.

      4) Integration between Pynapple and the rest of a full experimental data pipeline should be discussed with regard to reproducibility.

      This is an interesting point, and the third paragraph of the discussion somewhat broached this issue. Pynapple was not originally designed to pre-process data. However, it can, in theory, load any type of data stream after the necessary pre-processing steps. Overall, modularity is a key aspect of the Pynapple framework, and this is also the case for the integration with data pre-processing pipelines, for example spike sorting in electrophysiology and detection of regions of interest in calcium imaging. We do not think there should be a single integrated solution to the problem; instead, our aim is to make it possible for any piece of code to be used on data irrespective of their origin. This is why we focused on making data loading straightforward and easy to adapt to any particular situation. To expand on this point and make it clear that Pynapple is not meant to pre-process data but can, in theory, load any type of data stream after the necessary pre-processing steps, we have added the following sentences to the aforementioned paragraph:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data have already been pre-processed (for example, spike sorting and detection of ROIs).”

      5) Relatedly, a description of how data are stored after processing (i.e., how precisely are processed data stored in NWB format).

      We agree that this is a critical issue. NWB is not necessarily the best option as it is not possible to overwrite in an NWB file. This would require the creation of a new NWB file each time, which is computationally expensive and time-consuming. It also further increases the odds of write errors. Theoretically, users who need to store intermediate results in a flexible way could use any method they prefer, writing their own data files and wrappers to reload these data into Pynapple objects. Indeed, it is not easy to properly store data in an object-specific manner. This is a long-standing issue and one we are currently working to resolve.

      To do so, we are developing I/O methods for each Pynapple core object. We aim to provide an output format that is simple to read and backward compatible in future Pynapple releases. This feature will be available in the coming weeks. To note, while NWB may not be the central data format of Pynapple in future releases, it has become a central node in the neuroscience software ecosystem. Therefore, we aim to make reading and writing this format easier for users by developing a set of simple standalone functions.
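
      As an illustration of the "own data files and wrappers" route mentioned above, the following is a minimal sketch of a save/reload round trip for an intermediate time series. It is neither Pynapple's planned output format nor NWB: the JSON layout, the field names (`type`, `time_units`, `t`, `d`) and both function names are assumptions made for this example.

```python
import json
import os
import tempfile


def save_timeseries(path, times, values, time_units="s"):
    """Write timestamps and values to a small, human-readable JSON file."""
    payload = {
        "type": "timeseries",      # hypothetical tag used to validate on reload
        "time_units": time_units,
        "t": list(times),
        "d": list(values),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_timeseries(path):
    """Read a file written by save_timeseries and return (times, values, units)."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    if payload.get("type") != "timeseries":
        raise ValueError("file does not contain a saved time series")
    return payload["t"], payload["d"], payload["time_units"]


# Round trip in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "speed.json")
    save_timeseries(path, [0.0, 0.5, 1.0], [3.0, 4.5, 2.0])
    restored = load_timeseries(path)
```

      Unlike the NWB case described above, a file of this kind can simply be overwritten when an intermediate result changes; the trade-off is that it carries none of the metadata that a standard format provides.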

      Reviewer #1 (Public Review):

      A typical path from preprocessed data to findings in systems neuroscience often includes a set of analyses that often share common components. For example, an investigator might want to generate plots that relate one time series (e.g., a set of spike times) to another (measurements of a behavioral parameter such as pupil diameter or running speed). In most cases, each individual scientist writes their own code to carry out these analyses, and thus the same basic analysis is coded repeatedly. This is problematic for several reasons, including the waste of time, the potential for errors, and the greater difficulty inherent in sharing highly customized code.

      This paper presents Pynapple, a python package that aims to address those problems.

      Strengths:

      The authors have identified a key need in the community - well-written analysis routines that carry out a core set of functions and can import data from multiple formats. In addition, they recognized that there are some common elements of many analyses, particularly those involving timeseries, and their object-oriented architecture takes advantage of those commonalities to simplify the overall analysis process.

      The package is separated into a core set of applications and another with more advanced applications, with the goal of both providing a streamlined base for analyses and allowing for implementations/inclusion of more experimental approaches.

      Weaknesses:

      There are two main weaknesses of the paper in its present form.

      First, the claims relating to the value of the library in everyday use are not demonstrated clearly. There are no comparisons of, for example, the number of lines of code required to carry out a specific analysis with and without Pynapple or Pynacollada. Similarly, the paper does not give the reader a good sense of how analyses are carried out and how the object-oriented architecture provides a simplified user interaction experience. This contrasts with their GitHub page and associated notebooks which do a better job of showing the package in action.

      As noted in the response to the Reviewing Editor and the response to the reviewer’s recommendations to the authors below, we have now included links to Jupyter notebooks that highlight how panels of Figures 4 and 5 were generated (https://github.com/pynapple-org/pynapple-paper-2023). However, we believe that including more code in the manuscript than what is currently shown (i.e. abbreviated calls to methods on top of panels in Figs 4&5) would decrease the readability of the manuscript.

      Second, the paper makes several claims about the values of object-oriented programming and the overall design strategy that are not entirely accurate. For example, object-oriented programming does not inherently reduce coding errors, although it can be part of good software engineering. Similarly, there is a claim that the design strategy "ensures stability" when it would be much more accurate to say that these strategies make it easier to maintain the stability of the code. And the authors state that the package has no dependencies, which is not true in the codebase. These and other claims are made without a clear definition of the properties that good scientific analysis software should have (e.g., stability, extensibility, testing infrastructure, etc.).

      Following the reviewer’s comment, we have rephrased and clarified these claims. We provide a detailed response to these remarks in the recommendations to authors below.

      There is also a minor issue - these packages address an important need for high-level analysis tools but do not provide associated tools for preprocessing (e.g., spike sorting) or for creating reproducible pipelines for these analyses. This is entirely reasonable, in that no one package can be expected to do everything, but a bit deeper account of the process that takes raw data and produces scientific results would be helpful. In addition, some discussion of how this package could be combined with other tools (e.g., DataJoint, Code Ocean) would help provide context for where Pynapple and Pynacollada could fit into a robust and reliable data analysis ecosystem.

      We agree that better explaining how Pynapple is integrated within data preprocessing pipelines is essential. We have clarified this aspect in the manuscript and provide more details below.

      Reviewer #1 (Recommendations For The Authors):

      Page 1

      • Title

      The authors should note that the application name, "Pynapple", could be confused with something from Apple. Users may search for "Pyapple" as many Python applications contain "py" like "Numpy". "Pyapple" indeed is a Python Apple that works with Apple products. They could consider "NeuroFrame", "NeuroSeries" or "NeuroPandas" to help users realize this is not an Apple product.

      We thank the referee for this interesting comment. However, we are not willing to make such a change at this point. The community of users has been growing in the last year and it seems too late to change the name. To note, this is the first time such a comment has been made to us, and users and collaborators do not seem to confuse the package with any Apple product.

      • Abstract

      The authors mentioned that the Pynapple is "fully open source". It may be better to simply say it is "open source".

      We agree, corrected.

      Assuming the authors keep the name, it would be helpful if the full meaning of Pynapple - Python Neural Analysis Package was presented as early as possible.

      Corrected in the abstract.

      • Highlight

      An application being lightweight and standalone does not imply nor ensure backward compatibility. In general, it would be useful if the authors identified a set of desirable code characteristics, defined them clearly in the introduction, and then described their software in terms of those characteristics.

      Thank you for your comment. We agree that being lightweight and standalone does not necessarily imply backward compatibility. Our intention was to emphasize that Pynapple is designed to be as simple and flexible as possible, with a focus on providing a consistent interface for users across different versions. However, we understand that this may not be enough to ensure long-term stability, which is why we are committed to regular updates and maintenance to ensure that the code remains functional as the underlying code base (Python versions, etc.) changes.

      Regarding your suggestion to identify a set of desirable code characteristics, we believe this is an excellent idea. In the introduction, we briefly touch upon some of the core principles that guided our development of Pynapple: a lightweight, stable, and simple package. However, we acknowledge that providing a more detailed discussion of these characteristics and how they relate to the design of our software would be useful for readers. We have added this paragraph in the discussion:

      “Pynapple was developed to be lightweight, stable, and simple. As simplicity does not necessarily imply backward compatibility (i.e. long-term stability of the code), Pynapple main objects and their properties will remain the same for the foreseeable future, even if the code in the backend may eventually change (e.g. not relying on Pandas in future version). The small number of external dependencies also decrease the need to adapt the code to new versions of external packages. This approach favors long-term backward compatibility.”

      Page 2

      • The authors wrote -

      "Despite this rapid progress, data analysis often relies on custom-made, lab-specific code, which is susceptible to error and can be difficult to compare across research groups."

      It would be helpful to add that custom-made, lab-specific code can lead to a violation of FAIR principles (https://en.wikipedia.org/wiki/FAIR_datadata). More generally, any package can have errors, so it would be helpful to explain any testing regimens or other approaches the authors have taken to ensure that their code is error-free.

      We understand the importance of the FAIR principles for data sharing. However, Pynapple was not designed to handle data through their pre-processing. The only aspect that is somehow covered by the FAIR principles is the interoperability, but again, it is a requirement for the data to interoperate with different storage and analysis pipelines, not of the analysis framework itself. Unlike custom-made code, Pynapple will make interoperability easier, as, in theory, once the required data loaders are available, any analysis could be run on any dataset. We have added the following sentence to the discussion:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data has already been pre-processed (for example, spike sorting and ROI detection). According to the FAIR principles, pre-processed data should interoperate across different analysis pipelines. Pynapple makes this interoperability possible as, once the data are loaded in the Pynapple framework, the same code can be used to analyze different datasets”

      • The authors wrote -

      "While several toolboxes are available to perform neuronal data analysis ti–11,2ti (see ref. 29 for review), most of these programs focus on producing high-level analysis from specified types of data and do not offer the versatility required for rapidly-changing analytical methods and experimental methods."

      Here it would be helpful if the authors could give a more specific example or explain why this is problematic enough to be a concern. Users may not see a problem with high-level analysis or using specific data types.

      Again, we apologize for not fully elaborating upon our goals here. Our intention was to point out that toolboxes often focus on one particular case of high-level analysis. In many cases, such packages lack low-level analysis features or the flexibility to derive new analysis pipelines quickly and effortlessly. Users can decide to use low-level packages such as Pandas, but in that case, the learning curve can be steep for users with little, if any, computational background. The simplicity of Pynapple, and the set of examples and notebooks, make it possible for individuals who are just starting to code to quickly analyze their data.

      As we do not want to be too specific at this point of the manuscript (second paragraph of the intro) and as we have clarified many of the aspects of the toolbox in the new revised version, we have only added the following sentence to the paragraph:

      “Users can decide to use low-level data manipulation packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background.”

      • The authors wrote -

      "To meet these needs, a general toolbox for data analysis must be designed with a few principles in mind"

      Toolboxes based on many different principles can solve problems. It is likely more accurate to say that the authors designed their toolbox with a particular set of principles in mind. A clear description of those principles (as mentioned in the comment above) would help the reader understand why the specific choices made are beneficial.

      We agree that these are not “universal” principles but rather the principles we had in mind when we designed the package. We have clarified these principles and made clear that they are personal points of view.

      We have rephrased the following paragraph:

      “To meet these needs, we designed Pynapple, a general toolbox for data analysis in systems Neuroscience with a few principles in mind.“

      • The authors wrote -

      "The first property of such a toolbox is that it should be object-oriented, organizing software around data."

      What facts make this true? For example, React is a web development library. A common approach to using this library is to use Hooks (essentially a collection of functions). This is becoming more popular than the previous approach of using Components (a collection of classes). This is an example of how object-oriented programming is not always the best solution. In some cases, for example, object-oriented coding can cause problems (e.g. it can be hard to find the place where a given function is defined and to figure out which version is being used given complex inheritance structures).

      In general, key selling points of object-oriented programming are extension, inheritance, and encapsulation. If the authors want to retain this text (which would be entirely reasonable), it would be helpful if they explained clearly how an object-oriented approach enables these functions and why they are critical for this application in particular.

      The referee makes a particularly important point. We are aware of the limits of OOP, especially when objects become over-complex and inheritance becomes unclear.

      We have clarified our goal here. We believe that in our case, OOP is powerful and, overall, is less error-prone than a collection of functions. The reasons are the following:

      An object-oriented approach facilitates better interactions between objects. By encapsulating data and behavior within objects, object-oriented programming promotes clear and well-defined interfaces between objects. This results in more structured and manageable code, as objects communicate with each other through these well-defined interfaces. Such improved interactions lead to increased code reliability.

      Inheritance, a key concept in object-oriented programming, allows properties to be passed from one object to another. One important example of how inheritance is crucial in the Pynapple framework is the time support of Pynapple objects, which determines the valid epoch on which the object is defined. This property needs to be carried over during different manipulations of the object. Without OOP, this property could easily be forgotten, resulting in erroneous conclusions for many types of analysis. The simplest case is the average rate of a TS object: the rate must be computed on the time support (a property of TS objects), not from the beginning to the end of the recording (or of a specific epoch, independent of the TS). Finally, it is easier to access and manipulate the meta information of a Pynapple object than it would be without using objects.
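
      The time-support point can be illustrated with a minimal sketch in plain NumPy. This is not Pynapple's implementation: the function name `rate_on_support`, the spike times, and the epochs are all hypothetical and chosen only to show how much the two definitions of average rate can differ.

```python
import numpy as np

def rate_on_support(spike_times, support):
    """Mean rate (Hz) restricted to a time support.

    spike_times: 1-D array of event times in seconds.
    support: array of shape (n_epochs, 2), non-overlapping [start, end] rows.
    (Hypothetical helper, not part of any library.)
    """
    spike_times = np.asarray(spike_times, dtype=float)
    support = np.asarray(support, dtype=float)
    # Keep only the events that fall inside one of the epochs.
    inside = np.zeros(spike_times.shape, dtype=bool)
    for start, end in support:
        inside |= (spike_times >= start) & (spike_times <= end)
    # Divide by the summed epoch durations, not by the recording span.
    total_duration = np.sum(support[:, 1] - support[:, 0])
    return inside.sum() / total_duration

# Hypothetical data: the unit is only defined during two 10 s epochs.
spikes = np.array([1.0, 2.0, 3.0, 4.0, 101.0, 102.0, 103.0, 104.0])
epochs = np.array([[0.0, 10.0], [100.0, 110.0]])

rate_support = rate_on_support(spikes, epochs)       # 8 events over 20 s
rate_naive = len(spikes) / (spikes[-1] - spikes[0])  # 8 events over 103 s
```

      In this toy case the naive first-to-last estimate is roughly five times lower than the rate computed on the time support, which is the kind of silent error that carrying the support along with the object is meant to prevent.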

      • The authors wrote -

      "drastically diminishing the odds of a coding error"

      This seems a bit strong here. Perhaps "reducing the odds" would be more accurate.

      We agree. Now changed.

      Page 3

      • The authors wrote -

      ". Another property of an efficient toolbox is that as much data as possible should be captured by only a small number of objects This ensures that the same code can be used for various datasets and eliminates the need of adapting the structure"

      It may be better to write something like - "Objects have a collection of preset variables/values that are well suited for general use and are very flexible." Capturing "as much data as possible" may be confusing, because it's not the amount that this helps with but rather the variety.

      We thank the referee for this remark. We have rephrased this sentence as follows:

      “Another property of an efficient toolbox is that a small number of objects could virtually represents all possible data streams in neuroscience, instead of objects made for specific physiological processes (e.g. spike trains).”

      • The authors wrote -

      "The properties listed above ensure the long-term stability of a toolbox, a crucial aspect for maintaining the code repository. Toolboxes built around these principles will be maximally flexible and will have the most general application"

      There are two issues with this statement. First, ensuring long-term stability is only possible with a long-term commitment of time and resources to ensure that the code remains functional as the underlying code base (python versions, etc.) changes. If that is something you are committing to, it would be great to make that clear. If not, these statements need to be less firm.

      Second, it is not clear how these properties were arrived at in the first place. There are things like the FAIR Principles which could provide an organizing framework, ideally when combined with good software engineering practices, and if some more systematic discussion of these properties and their justification could be added, it would help the field think about this issue more clearly.

      The referee makes a valid point that ensuring long-term stability requires a long-term commitment of time and resources to maintain the code as the underlying technology evolves. While we cannot make guarantees about the future of Pynapple, we believe that one of the best ways to ensure long-term stability is by fostering a strong community of users and contributors who can provide ongoing support and development. By promoting open-source collaboration and encouraging community involvement, we hope to create a sustainable ecosystem around Pynapple that can adapt to changes in technology and scientific practices over time. Ultimately, the longevity of any scientific tool depends on its adoption and use by the research community, and we hope that Pynapple can provide value to neuroscience researchers and continue to evolve and improve as the field progresses.

      It is noteworthy that the first author, and main developer of the package, has now been hired as a data scientist at the Center for Computational Neuroscience, Flatiron Institute, to explicitly continue the development of the tool and build a community of users and contributors.

      • The authors wrote -

      "each with a limited number of methods..."

      This may give the impression that the functionality is limited, so rephrasing may be helpful.

      Indeed! We have now rephrased this sentence:

      “The core of Pynapple is five versatile timeseries objects, whose methods make it possible to intuitively manipulate and analyze the data.”

      • The authors wrote that object-oriented coding

      "limits the chances of coding error"

      This is not always the case, but if it is the case here, it would be helpful if the authors explain exactly how it helps to use object-oriented approaches for this package.

      We agree with the referee that it is not always the case. As we explained above, we believe it is less error-prone than a collection of functions. Quite often, it also makes it easier to debug. We have replaced this sentence with the following one:

      “Because objects are designed to be self-contained and interact with each other through well-defined methods, users are less likely to make errors when using them. This is because objects can enforce their own internal consistency, reducing the chances of data inconsistencies or unexpected behavior. Overall, OOP is a powerful tool for managing complexity and reducing errors in scientific programming.”

      • Fig 1

      In object-oriented programming, a class is a blueprint for the classes that inherit it. Instantiating that class creates an object. An object contains any or all of these - data, methods, and events. The figure could be improved if it maintained these organizational principles as figure properties.

      We agree with the referee’s remark regarding the logic of objects instantiation but how this could be incorporated in Fig. 1 without making it too complex is unclear. Here, objects are instantiated from the first to the second column. We have not provided details about the parent objects, as we believe these details are not important for reader comprehension. In its present form, the objects are inherited from Pandas objects, but it is possible that a future version is based on something else. For the users, this will be transparent as the toolbox is designed in such a way that only the methods that are specific to Pynapple are needed to do most computation, while only expert programmers may be interested in using Pandas functionalities.

      • The authors wrote that Pynapple does -

      "not depend on any external package"

      As mentioned above, this is not true. It depends on Numpy and likely other packages, and this should be explained. It is perfectly reasonable to say that it depends on only a few other packages.

      As said above, we have now clarified this claim.

      Page 5.

      • The authors wrote -

      "represent arrays of Ts and Tsd"

      For a knowledgeable reader's reference, it would be helpful to refer to these either as Numpy arrays (at least at first when they are defined) or as lists if they are native python objects.

      Indeed, using the word “arrays” here could be confusing because of Numpy arrays. We have changed this term with “groups”.

      • The authors wrote -

      "Pynapple is built with objects from the Pandas library ... Pynapple objects inherit the computational stability and flexibility"

      Here a definition of stability would be useful. Is it the case that by stability you mean "does not change often"? Or is some other meaning of stability implied?

      Yes, this is exactly what we meant when referring to the stability of Pandas. We have added the following precision:

      “As such, Pynapple objects inherit the long-term consistency of the code and the computational flexibility from this widely used package.”

      Page 6

      • Fig 2

      In Fig 2 A and B, the illustrations are good. It would also be very helpful to use toy code examples to illustrate how Pynapple would be used to carry out a sample analysis problem, so that potential users can see what would need to be done.

      We appreciate the kind words. Regarding the toy code, this is what we tried to do in Fig. 4. Instead of including the code directly in the paper, which does not seem a modern way of doing this, we now refer to the online notebooks that reproduce all panels of Figure 4.

      • The authors wrote -

      "While these objects and methods are relatively few"

      In object-oriented programming, objects contain methods. If a method is not in an object, it is not technically a method but a function. It would be helpful if the authors made sure their terminology is accurate, perhaps by saying something like "While there are relatively few objects, and while each object has relatively few methods ... "

      We agree with the referee, we have changed the sentence accordingly.

      • The authors wrote -

      "if not implemented correctly, they can be both computationally intensive and highly susceptible to user error"

      Here the authors are using "correctly" to refer to two things - "accuracy" - getting the right answer, and "efficiency" - getting to that answer with relatively less computation. It would be clearer if they split out those two concepts in the phrasing.

      Indeed, we used the term to cover both aspects of the problem, leading to the two possible issues cited in the second part of the sentence. We have changed the sentence following the referee’s advice:

      “While there are relatively few objects, and while each object has relatively few methods, they are the foundation of almost any analysis in systems neuroscience. However, if not implemented efficiently, they can be computationally intensive and if not implemented accurately, they are highly susceptible to user error.”

      • In the next sentence the authors wrote -

      "Pynapple addresses this concern."

      This statement would benefit from just additional text explaining how the concern is addressed.

      We thank the referee for the suggestion. We have changed the sentence to this one: “The implementation of core features in Pynapple addresses the concerns of efficiency and accuracy”

      Page 9

      • The authors wrote -

      "This is implemented via a set of specialized object subclasses of the BaseLoader class. To avoid code redundancy, these I/O classes inherit the properties of the BaseLoader class."

      From a programming perspective, the point of a base class is to avoid redundancy, so it might be better to just mention that this avoids the need to redefine I/O operations in each class.

      We have rephrased the sentence as follows:

      “This is implemented via a set of specialized object subclasses of the BaseLoader class, avoiding the need to redefine I/O operations in each subclass"

      • The authors wrote -

      "classes are unique and independent from each other, ensuring stability"

      How do classes being unique and independent ensure stability? Perhaps here again the misunderstanding is due to the lack of a definition of stability.

      We thank the referee for the remark. We first changed “stability” for “long-term backward compatibility”. We further added the following sentence to clarify this claim. “For instance, if the spike sorting tool Phy changes its output in the future, this would not affect the “Neurosuite” IO class as they are independent of each other. This allows each tool to be updated or modified independently, without requiring changes to the other tool or the overall data format.”
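
      The pattern discussed in the two points above (shared I/O logic defined once in a base class, with format-specific loaders that do not depend on each other) can be sketched as follows. This is only an illustration under stated assumptions: the class below borrows the name BaseLoader from the text, but its methods, the two subclasses `FormatALoader` and `FormatBLoader`, and the placeholder data are hypothetical and do not reproduce Pynapple's actual classes.

```python
from abc import ABC, abstractmethod

class BaseLoader(ABC):
    """Hypothetical base class: the shared I/O logic is written once here."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # Common entry point; subclasses only supply format-specific parsing.
        raw = self._read_raw()
        return {"path": self.path, "spikes": sorted(raw)}

    @abstractmethod
    def _read_raw(self):
        """Return the event times found in one specific file format."""

class FormatALoader(BaseLoader):
    def _read_raw(self):
        # Placeholder for parsing a hypothetical format A.
        return [0.3, 0.1, 0.2]

class FormatBLoader(BaseLoader):
    def _read_raw(self):
        # Placeholder for parsing a hypothetical format B. A change to
        # format A never touches this class.
        return [2.0, 1.0]

session_a = FormatALoader("session_a").load()
session_b = FormatBLoader("session_b").load()
```

      Because each subclass overrides only its own parsing step, a change in one upstream tool's output would require editing one subclass and nothing else.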

      • The authors wrote -

      "Using preexisting code to load data in a specific manner instead of rewriting already existing functions avoids preprocessing errors"

      Here it might be helpful to use the lingo of Object-oriented programming. (e.g. inheritance and polymorphism). Defining these terms for a neuroscience audience would be useful as well.

      We do not think it is necessary to use too many technical terms in this manuscript. However, this sentence was indeed confusing. We have now simplified it:

      “[…], users can develop their own custom I/O using available template classes. Pynapple already includes several of such templates and we expect this collection to grow in the future.”

      Page 10

      • The authors wrote -

      "These analyses are powerful because they are able to describe the relationships between time series objects while requiring the fewest number of parameters to be set by the user."

      It is not clear that this makes for a powerful analysis as opposed to an easy-to-use analysis.

      We have changed “powerful” with “easy to use".

      Page 12

      "they are built-in and thus do not have any external dependencies"

      If the authors want to retain this, it would be helpful to explain (perhaps in the introduction) why having fewer external dependencies is useful. And is it true that these functions use only base python classes?

      We have rephrased this sentence as follows:

      “they are for the most part built-in and only depend on a few common external packages, ensuring that they can be used stand-alone without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      Other comments:

      • It would be helpful, as mentioned in the public review, to frame this work in the broader context of what is needed to go from data to scientific results so that people understand what this package does and does not provide.

      We have added the following sentence to the discussion to make sure readers understand:

      “The path from data collection to reliable results involves a number of critical steps: exploratory data analysis, development of an analysis pipeline that can involve custom-made developed processing steps, and ideally the use of that pipeline and others to replicate the results. Pynapple provides a platform for these steps.”

      • It would also be helpful to describe the Pynapple software ecosystem as something that readers could contribute to. Note here that GNU may not be a good license. Technically, GNU requires any changes users make to Pynapple for their internal needs to be offered back to the Pynapple team. Some labs may find that burdensome or unacceptable. A workaround would be to have GNU and MIT licenses.

      The main restriction of the GPL license is that if the code is changed by others and released, a similar license should be used, so that it cannot become proprietary. We therefore stick to this choice of license.

      We would be more than happy to receive contributions from the community. To note, several users outside the lab have already contributed. We have added the following sentence in the introduction:

      “As all users are also invited to contribute to the Pynapple ecosystem, this framework also provides a foundation upon which novel analyses can be shared and collectively built by the neuroscience community.”

      • This software shares some similarities with the nelpy package, and some mention of that package would be appropriate.

      While we acknowledge the reviewer's observation that Nelpy is a similar package to Pynapple, there are several important differences between the two.

      First, Nelpy includes predefined objects such as SpikeTrain, BinnedSpikeTrain, and AnalogSignal, whereas Pynapple would use only Ts and Tsd for those. This design choice was made to provide greater flexibility and allow users to define their own data structures as needed.

      Second, Nelpy is primarily focused on electrophysiology data, whereas Pynapple is designed to handle a wider range of data types, including calcium imaging and behavioral data. This reflects our belief that the NWB format should be able to accommodate diverse experimental paradigms and modalities.

      Finally, while Nelpy offers visualization and high-level analysis tools tailored to electrophysiology, Pynapple takes a more general-purpose approach. We believe that users should be free to choose their own visualization and analysis tools based on their specific needs and preferences.

      The package has now been cited.

      Reviewer #2 (Public Review):

      Pynapple and Pynacollada have the potential to become very valuable and foundational tools for the analysis of neurophysiological data. NWB still has a steep learning curve and Pynapple offers a user-friendly toolset that can also serve as a wrapper for NWB.

      The scope of the manuscript is not clear to me, and the authors could help clarify if Pynacollada and other toolsets in the making become a future aspect of this paper (and Pynapple), or are the authors planning on building these as separate publications.

      The author writes that Pynapple can be used without the I/O layer, but the author should clarify how or if Pynapple may work outside NWB.

      Absolutely. Pynapple can be used for generic data analysis, with no requirement of specific inputs nor NWB data. For example, the lab is currently using it for a computational project in which the data are loaded from simple files (and not from full I/O functions as provided in the toolbox) for further analysis and figure generation.

      This was already noted in the manuscript, last paragraph of the section “Importing data from common and custom pipelines”

      “Third, users can still use Pynapple without using the I/O layer of Pynapple.”.

      We have added the following sentence in the discussion

      “To note, Pynapple can be used without the I/O layer and independent of NWB for generic, on-the-fly analysis of data.”

      This brings us to an important fundamental question. What are the advantages of the current approach, where data is imported into the Ts objects, compared to doing the data import into NWB files directly, and then making Pynapple secondary objects loaded from the NWB file? Does NWB natively have the ability to store the 5 object types or are they initialized on every load call?

      NWB and Pynapple are complementary but not interdependent. NWB is meant to ensure long-term storage of data and as such contains as much information as possible to describe the experiment. Pynapple does not use NWB to directly store the objects; however, it can read from NWB to organize the data in Pynapple objects. Since the original version of this manuscript was submitted, new methods address this. Specifically, in the current beta version, each object now has a “save” method. We are also developing functions to load these objects. This does not depend on NWB but on npz, a Numpy-specific file format. However, we believe it is premature to include these recent developments in the manuscript and prefer not to discuss this for now.
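
      For readers unfamiliar with npz, the following is a minimal sketch of a round trip through that format using only NumPy's `np.savez` and `np.load`. It does not show Pynapple's "save" method: the key names (`t`, `d`, `start`, `end`), the file name, and the data are assumptions made for illustration.

```python
import os
import tempfile
import numpy as np

# Hypothetical time series: timestamps, values, and a one-epoch time support.
t = np.array([0.0, 0.5, 1.0])
d = np.array([10.0, 11.0, 12.0])
support = np.array([[0.0, 1.0]])

# Write all arrays into a single .npz archive (key names are illustrative).
path = os.path.join(tempfile.mkdtemp(), "tsd_example.npz")
np.savez(path, t=t, d=d, start=support[:, 0], end=support[:, 1])

# Read the arrays back and rebuild the (n_epochs, 2) time support.
with np.load(path) as archive:
    t2 = archive["t"]
    d2 = archive["d"]
    support2 = np.column_stack([archive["start"], archive["end"]])
```

      Storing the time support next to the timestamps and values means a reloaded object can be rebuilt with the same valid epochs it had when it was saved.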

      Many of these functions and objects have a long history in MATLAB - which documents their usefulness, and I believe it would be fitting to put further stress on this aspect - what aspects already existed in MATLAB and what is completely novel. A widely used MATLAB toolset, the FMA toolbox (the Freely moving animal toolbox) has not been cited, which I believe is a mistake.

      We agree that the FMA toolbox should have been cited. This has now been corrected.

      Pynapple was first developed in Matlab (it was then called TSToolbox). The first advantage is of course that Python is more accessible than Matlab. It has also been adopted by a large community of developers in data analysis and signal processing, which has become without a doubt much larger than the Matlab community, making it possible to find solutions online for virtually any problem one can have. Furthermore, in our experience, trainees are now unwilling to get training in Matlab.

      Yet, Python has drawbacks, which we are fully aware of. Matlab can be very computationally efficient, and old code can usually run without any change, even many years later.

      A limitation in using NWB files is its standardization with limited built-in options for derived data and additional metadata. How are derived data stored in the NWB files?

      NWB has predetermined a certain number of data containers, which are most common in systems neuroscience. It is theoretically possible to store any kind of data and associated metadata in NWB but this is difficult for a non-expert user. In addition, NWB does not allow data replacement, making it necessary to rewrite a whole new NWB file each time derived data are changed and stored. Therefore, we are currently addressing this issue as described above. Derived data and metadata will soon be easy to store and read.

      How is Pynapple handling an existing NWB dataset, where spikes, behavioral traces, and other data types have already been imported?

      This is an interesting point. In theory, Pynapple should be able to open an NWB file automatically, without the user providing much information. In practice, it is challenging to open an NWB file without knowing exactly what to look for and how the data were preprocessed. This would require adapting an I/O function for a specific NWB file. Unfortunately, we do not believe there is a universal solution to this problem. There are solutions being developed by others, for example NWB Widgets. We will keep an eye on this and see whether this could be adapted to create a universal NWB loader for Pynapple.

      Reviewer #2 (Recommendations For The Authors):

      Other tools and solutions are being developed by the NWB community. How will you make sure that these tools can take advantage of Pynapple and vice versa?

      We recognize the importance of collaboration within the NWB community and are committed to making sure that our tools can integrate seamlessly with other tools and solutions developed by the community.

      Regarding Pynapple specifically, we are designing it to be modular and flexible, with clear APIs and documentation, so that other tools can easily interface with it. One important thing is that we want to make sure Pynapple is not too dependent on another package or file format such as NWB. Ideally, Pynapple should be designed so that it is independent of the underlying data storage pipeline.

      Most of the tools that have been developed in the NWB community so far were designed for data visualisation and data conversion, something that Pynapple does not currently address. Multiple packages for behavioral analysis and exploration of electro/optophysiological datasets are compatible with the NWB format but do not provide additional solutions per se. They are complementary to Pynapple.

    2. eLife assessment

      This paper introduces the python software package Pynapple and a separate package of more advanced routines (Pynacollada) to the Neuroscience/Neural Engineering community. Pynapple provides a set of data objects and methods that have the potential to simplify data analysis for neural and behavioral data types. This represents a valuable contribution to the field. With more examples and as a live coding notebook, the evidence was judged to be compelling.

    3. Reviewer #1 (Public Review):

      A typical path from preprocessed data to findings in systems neuroscience often includes a set of analyses that share common components. For example, an investigator might want to generate plots relating one time series (e.g., a set of spike times) to another (measurements of a behavioral parameter such as pupil diameter or running speed). In most cases, each individual scientist writes their own code to carry out these analyses, and thus the same basic analysis is coded repeatedly. This is problematic for several reasons, including the inefficiency of different people writing the same code over and over again.

      This paper presents Pynapple, a python package that aims to address those problems.

      Strengths:

      The authors have identified a key need in the community - well written analysis routines that carry out a core set of functions and can import data from multiple formats. In addition, they recognized that there are some common elements of many analyses, particularly those involving timeseries, and their object-oriented architecture takes advantage of those commonalities to simplify the overall analysis process.

      The package is separated into a core set of applications and another with more advanced applications, with the goal of both providing a streamlined base for analyses and allowing for implementations/inclusion of more experimental approaches.

      Weaknesses:

      The revised version of the paper does a very good job of addressing previous concerns. It would be slightly more accurate in the Highlights section to say "A lightweight and standalone package facilitating long-term backward compatibility" but this is a very minor issue.

    4. Reviewer #2 (Public Review):

      The manuscript by G. Viejo et al. describes a new open-source toolbox called Pynapple, for data analysis of electrophysiological recordings, calcium imaging, and behavioral data. It is an object-oriented python package, consisting of 5 main object types: timestamps (Ts), timestamped data (Tsd), TsGroup, TsdFrame, and IntervalSet. Each object has a set of built-in core methods and import tools for common data acquisition systems and pipelines.

      Pynapple is a low-level package that uses NWB as a file format, and further allows for other more advanced toolsets to build upon it. One of these is called Pynacollada which is a toolset for data analysis of electrophysiological, imaging, and behavioral data.

      Pynapple and Pynacollada have the potential to become very valuable and foundational tools for the analysis of neurophysiological data. NWB still has a steep learning curve and Pynapple offers a user-friendly toolset that can also serve as a wrapper for NWB.

    1. eLife assessment

      This structural and biochemical study of the mouse homolog of acidic mammalian chitinase (AMCase) enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments. The methods and analysis of data are solid, providing several lines of evidence to support a development of mechanistic hypotheses. While the findings and interpretation will be valuable to those studying AMCase in mice, the broader significance, including extension of the results to other species including human, remain unclear.

    2. Reviewer #1 (Public Review):

      General comments: This paper investigates the pH-specific enzymatic activity of mouse acidic mammalian chitinase (AMCase) and aims to elucidate its function's underlying mechanisms. The authors employ a comprehensive approach, including hydrolysis assays, X-ray crystallography, theoretical calculations of pKa values, and molecular dynamics simulations to observe the behavior of mouse AMCase and explore the structural features influencing its pH-dependent activity.

      The study's key findings include determining kinetic parameters (Kcat and Km) under a broad range of pH conditions, spanning from strong acid to neutral. The results reveal pH-dependent changes in enzymatic activity, suggesting that mouse AMCase employs different mechanisms for protonation of the catalytic glutamic acid residue and the neighboring two aspartic acids at the catalytic motif under distinct pH conditions.

      The novelty of this research lies in the observation of structural rearrangements and the identification of pH-dependent mechanisms in mouse AMCase, offering a unique perspective on its enzymatic activity compared to other enzymes. By investigating the distinct protonation mechanisms and their relationship to pH, the authors reveal the adaptive nature of mouse AMCase, highlighting its ability to adjust its catalytic behavior in response to varying pH conditions. These insights contribute to our understanding of the pH-specific enzymatic activity of mouse AMCase and provide valuable information about its adaptation to different physiological conditions.

      Overall, the study enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments.

    3. Reviewer #2 (Public Review):

      Summary: In this study of the mouse homolog of acidic mammalian chitinase, the overall goal is to provide a mechanistic explanation for the unusual observation of two pH optima for the enzyme. The study includes biochemical assays to establish kinetic parameters at different solution pH, structural studies of enzyme/substrate complexes, and theoretical analysis of amino acid side chain pKas and molecular dynamics.

      Strengths: The biochemical assays are rigorous and nicely complemented by the structural and computational analysis. The mechanistic proposal that results from the study is well rationalized by the observations in the study.

      Weaknesses: The overall significance of the work could be made more clear. Additional details could be provided about the limitations of prior biochemical studies of mAMC that warranted the kinetic analysis. The mouse enzyme seems unique in terms of its behavior at high and low pH, so it remains unclear how the work will enhance broader understanding of this enzyme class. It was also not clear whether the findings can be used for therapeutic purposes, as detailed in the abstract, if the human enzyme works differently.

    1. eLife assessment

      This important study advances existing approaches for demographic inference by incorporating rapidly mutating markers such as switches in methylation state. The authors provide a solid comparison of their approach to existing methods, although the work would benefit from some additional consideration of the challenges in the empirical use of methylation data. The work will be of broad interest to population geneticists, both in terms of the novel approach and the statistical inference proposed.

    2. Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows simultaneous analysis of multiple types of polymorphism data. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

    3. Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP-based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why this work is not conceptually sound. Especially in the future, as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

    4. Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems sound and reasonable. This is an exciting new avenue for thinking about inference from genomic data. I have a few concerns about the presentation and then also questions about the use of empirical methylation data sets.

      I think a more detailed description of demographic accuracy is warranted. For example, in L245 MSMC2 identifies the bottleneck (albeit smoothed) and only slightly overestimates recent size. In the same analysis the authors' approach with unknown mu infers a nonexistent population increase by an order of magnitude that is not mentioned.

      Similarly, it seems problematic that (L556) the approach requiring estimation of site and region parameters (as would presumably be needed in most empirical systems like endangered nonmodel species mentioned in the introduction) does no better than using only SNPs. Overall, I think a more objective and perhaps quantitative comparison of approaches is warranted.

      The authors simulate methylated markers at 2% (and in some places up to 20%). In many plant genomes a large proportion of cytosines are methylated (e.g. 70% in maize: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496265/). I don't know what % of these may be polymorphic, but this leads to an order of magnitude more methylated cytosines than there are SNPs. Couldn't this mean that any appreciable error in estimating methylation threatens to be of a similar order of magnitude to the SNP data? I would welcome the authors' thoughts here.

      A few points of discussion about the biology of methylation might be worth including. For example, methylation can differ among cell types or cells within a tissue, yet sequencing approaches evaluate a pool of cells. This results in a reasonable fraction of sites having methylation rates not clearly 0 or 1. How does this variation affect the method? Similarly, while the authors cite literature about the stable inheritance of methylation, a sentence or so more about the time scale over which this occurs would be helpful. Finally, in some species methylated cytosines have mutation rates an order of magnitude higher than other nucleotides. The authors mention they assume independence, but how would violation of this assumption affect their inference?

    1. eLife assessment

      This important study combines in vitro and in vivo experiments to investigate the reciprocal regulation between mitochondria-associated membranes and Notch signaling in skeletal muscle atrophy, with implications beyond the single subfield of muscle atrophy. The methods, data, and analyses convincingly support most of the claims. There are minor weaknesses, with the analysis of gene and protein expression in some parts being incomplete.

    2. Reviewer #1 (Public Review):

      In this study, the authors investigated the role of MAM and the Notch signaling pathway in the onset of the atrophic phenotype in both in vivo and in vitro models. The rationale used to obtain the data is one of the main strengths of the study. Already from the reading, the reasoning scheme used by the authors in setting up the study and evaluating the data obtained is clear. Using both cellular and mouse models in vivo consolidates the data obtained. The authors also methodologically described all the choices made in the supplementary section. A weakness, on the other hand, is the failure to include averages and statistical data in the results that would give a quantifiable idea of the data obtained. To complete the picture, the authors could also investigate the possible involvement of the intrinsic apoptosis pathway as well as describe probable metabolic shifts to muscle cells in atrophic conditions. The rationale used by the authors to obtain the result is linear. The data obtained are useful for understanding the onset and characterization of the atrophic phenotype under disuse and microgravity conditions. The methods used are in line with those used in the field and can be a starting point for other studies. The cellular models are well described in the materials and methods section. The selected mouse models followed a logical rationale and were in line with the intended aim.

    3. Reviewer #2 (Public Review):

      In this study, the authors examined how the maintenance of mitochondrial-associated endoplasmic reticulum membranes (MAM) is critical for the prevention of muscle atrophy under microgravity conditions. They observed a reduction in MAM in myotubes placed in a microgravity condition; in addition, MFN2-deficient human iPS cells showed a decrease in the number of MAM, similar to that in myotubes differentiated under microgravity conditions, together with activation of the Notch signaling pathway. The authors, moreover, observed that treatment with the gamma-secretase inhibitor DAPT preserved the atrophic phenotype of differentiated myotubes in microgravity and improved the regenerative capacity of Mfn2-deficient muscle stem cells in dystrophic mice.

      The entire study was well conducted, bringing an interesting analysis in vitro and in vivo of aging conditions. In my opinion, it is necessary to improve the analysis of both genes and proteins to better support the conclusions.

      The study can contribute to a better understanding of one of the major problems of aging, such as muscle atrophy and inhibition of muscle regeneration, emphasizing the importance of the NOTCH pathway in these pathological situations. The work will be of interest to all scientists working on aging.

    1. eLife assessment

      This important study makes a bold step towards understanding what fraction of DNA that is liberated from different tissues in a healthy human is found in circulation as cell-free DNA. Unfortunately, the evidence for the conclusions is presently incomplete, but with additional controls, this could become a major achievement for reference in understanding changes in cell-free DNA in disease states.

    2. Reviewer #1 (Public Review):

      Summary:

      Sender et al describe a model to estimate what fraction of DNA becomes cell-free DNA in plasma. This is of great interest to the community, as the amount of DNA from a certain tissue (for example, a tumor) that becomes available for detection in the blood has important implications for disease detection.

      However, the authors' methods do not consider important variables related to cell-free DNA shedding and storage, and their results may thus be inaccurate. At this stage of the paper, the methods section lacks important detail. Thus, it is difficult to fully assess the manuscript and its results.

      Strengths:

      The question asked by the authors has potentially important implications for disease diagnosis. Understanding how genomic DNA degrades in the human circulation can guide towards ways to enrich for DNA of interest or may lead to unexpected methods of conserving cell-free DNA. Thus, the question "how much genomic DNA becomes cfDNA" is of great interest to the scientific and medical community. Once the weaknesses of the manuscript are addressed, I believe this manuscript has the potential to be a widely used resource.

      Weaknesses:

      There are two major weaknesses in how the analysis is presented. First, the methods lack detail. Second, the analysis does not consider key variables in their model.

      Issues pertaining to the methods section.
      The current manuscript builds a flux model, mostly taking values and results from three previous studies:

      1- The amount of cellular turnover by cell type, taken from Sender & Milo, 2021
      2- The fractions of various tissues that contribute DNA to the plasma, taken from Moss et al, 2018 and Loyfer et al, 2023

      My expertise lies in cell-free DNA, and so I will limit my comments to the manuscripts in (2).

      Paper by Loyfer et al (additional context):

      Loyfer et al is a recent landmark paper that presents a computational method for deconvoluting tissues of origin based on methylation profiles of flow-sorted cell types. Thus, the manuscript provides a well-curated methylation dataset of sorted cell-types. The majority of this manuscript describes the methylation patterns and features of the reference methylomes (bulk, sorted cell types), with a smaller portion devoted to cell-free DNA tissue of origin deconvolution.

      I believe the data the authors are retrieving from the Loyfer study are from the 23 healthy plasma cfDNA methylomes analyzed in the study, and not the re-analysis of the 52 COVID-19 samples from Cheng et al (MED 2021).

      Paper by Moss et al (additional context):

      Moss et al is another landmark paper that predates the Loyfer et al manuscript. The technology used in this study (methylation arrays) is outdated but is an incredible resource for the community. This paper evaluates cfDNA tissues of origin in health and different disease scenarios. Again, I assume the current manuscript only pulled data from healthy patients, although I cannot be sure as it is not described in the methods section.

      This manuscript:

      The current manuscript takes (I think) the total cfDNA concentration from males and females from the Moss et al manuscript (pooled cfDNA; 2 young male groups, 2 old male groups, 2 young female groups, 2 old female groups, Supplementary Dataset; "total_cfDNA_conc" tab). I believe this is the data used as total cfDNA concentration. It would be beneficial for all readers if the authors clarified this point.

      The tissue-of-origin data in the supplemental dataset ("fraction" tab) cover 8 cell types (erythrocytes, monocytes/macrophages, megakaryocytes, granulocytes, hepatocytes, endothelial cells, lymphocytes, other). The fractions in the spreadsheet do not match the Loyfer or Moss manuscripts for healthy individuals. Thus, I do not know what values the supplementary dataset represents. I also don't know which deconvolution values are used for the flux model.

      The integration of these two methods lacks detail. Are the authors here using yields (i.e., cfDNA concentrations) from Moss et al, and tissue fractions from Loyfer et al? If so, why? There are more samples in the Loyfer manuscript, so why are the samples from Moss et al. being used? The authors are also selectively ignoring cell types that are present in healthy individuals (neurons from Moss et al, 2018). Why?

      Appraisal:

      At this stage of the manuscript, I think additional evidence and analysis are required to confirm the results in the manuscript.

      Impact:

      Once the authors present additional analysis to substantiate their results, this manuscript will be highly impactful on the community. The field of liquid biopsies (non-invasive diagnostics) has the potential to revolutionize the medical field (and has already in certain areas, such as prenatal diagnostics). Yet, there is a lack of basic science questions in the field. This manuscript is an important step forward in asking more "basic science" questions that seek to answer a fundamental biological question.

    3. Reviewer #2 (Public Review):

      Summary:

      Cell-free DNA (cfDNA) consists of short DNA fragments released into the circulation when cells die. Plasma cfDNA level is thought to reflect the degree of cell death or tissue injury. Indeed, plasma cfDNA is a reliable diagnostic biomarker for multiple diseases, providing insights into disease severity and outcomes. In this manuscript, Dr. Sender and colleagues address a fundamental question: What fraction of DNA released from cell death is detectable as plasma cfDNA? The authors use public data to estimate the amount of DNA produced from dying cells. They also utilize public data to estimate plasma cfDNA levels. Their calculations showed that <10% of DNA released is detectable as plasma cfDNA, with the fraction of detectable cfDNA varying by tissue source. The study demonstrates new and fundamental principles that could improve disease diagnosis and treatment via cfDNA.
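
      The quantity at issue can be read as a ratio of two fluxes. The sketch below only illustrates that ratio and is not the authors' model: the function names, the first-order clearance assumption used to turn a plasma amount into a flux, and every input value a reader might pass in are hypothetical.

```python
import math


def clearance_flux(plasma_amount, half_life):
    """Flux leaving plasma at steady state.

    Assumption (not from the manuscript): cfDNA is cleared by a
    first-order process, so flux = amount * ln(2) / half-life.
    Units follow the inputs (e.g. grams and days give grams/day).
    """
    return plasma_amount * math.log(2) / half_life


def detectable_fraction(cfdna_flux, dna_production_flux):
    """Share of DNA released by dying cells that shows up as plasma cfDNA."""
    if dna_production_flux <= 0:
        raise ValueError("DNA production flux must be positive")
    return cfdna_flux / dna_production_flux
```

      Under these assumptions, a tissue-specific estimate would use that tissue's share of plasma cfDNA for the numerator and that tissue's cell-turnover DNA output for the denominator, which is why mismatched cohorts for the two inputs matter.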

      Strengths:

      1. The experimental approach is resource-mindful, taking advantage of publicly available data to estimate the fraction of detectable cfDNA in physiological states. The authors did not assess if the fraction of detectable cfDNA changes in disease conditions. Nonetheless, their pioneering study lays the foundation and provides the methods needed for a similar assessment in disease states.
      2. The findings of this study potentially explain discrepancies in measured versus expected tissue-specific cfDNA from some tissues. For example, the gastrointestinal tract is subject to high cell turnover and release of DNA. Yet, only a small fraction of that DNA ends up in plasma as gastrointestinal cfDNA.
      3. The study proposes potential mechanisms that could account for the low fraction of detectable cfDNA in plasma relative to DNA released. This includes intracellular or tissue machinery that could "chew up" DNA released from dying cells, allowing only a small fraction to escape into plasma as cfDNA. Could this explain why the gastrointestinal tract, with an elaborate phagosome machinery, contributes a small fraction of plasma cfDNA? Given the role of cfDNA as a damage-associated molecular pattern in some diseases, targeting such machinery may provide novel therapeutic opportunities.

      Weaknesses: In vitro and in vivo studies are needed to validate these findings and define the tissue machinery that contributes to cfDNA production. The validation studies should address the following limitations of the study design:

      1. Align the cohorts to estimate DNA production and plasma cfDNA levels. Cellular turnover rate and plasma cfDNA levels vary with age, sex, circadian clock, and other factors (Madsen AT et al, EBioMedicine, 2019). This study estimated DNA production using data abstracted from a homogeneous group of healthy control males (Sender & Milo, Nat Med 2021). On the other hand, plasma cfDNA levels were obtained from datasets of a more diverse cohort of healthy males and females with a wide range of ages (Loyfer et al. Nature, 2023 and Moss et al., Nat Commun, 2018).
      2. "cfDNA fragments are not created equal". Recent studies demonstrate that cfDNA composition varies with disease state. For example, cfDNA GC content, fraction of short fragments, and composition of some genomic elements increase in heart transplant rejection compared to the no-rejection state (Agbor-Enoh, Circulation, 2021). The genomic location and disease state may therefore be important factors to consider in these analyses.
      3. Alternative sources of DNA production should be considered. Aside from cell death, DNA can be released from cells via active secretion. This and other additional sources of DNA should be considered in future studies. The distinct characteristics of mitochondrial DNA relative to genomic DNA should also be considered.

    1. eLife assessment

      This study provides valuable findings on a causative relationship between LRRC23 mutations and male infertility due to asthenozoospermia. The evidence supporting the conclusions is solid despite a lack of analyses of the effects of the mutations detected on the flagellar structure of human sperm. This work will be of interest to biomedical researchers who work on sperm biology and non-hormonal male contraceptive development.

    2. Reviewer #1 (Public Review):

      Hwang et al. report that LRRC23 is required for RS3 head assembly and sperm motility, and that a truncating LRRC23 variant is associated with asthenozoospermia in humans. They identified an LRRC23 variant in a consanguineous Pakistani family with infertile males diagnosed with asthenozoospermia and found that this variant leads to early termination of LRRC23 translation with loss of 136 amino acids at the C-terminus. They generated Lrrc23 mutant mice that mimic the predicted outcome in human patients and found that the truncated LRRC23 specifically disorganizes RS3 and the junctional structure between RS2 and RS3 in the sperm axoneme, which causes sperm motility defects and male infertility. These data help to elucidate the pathogenicity of LRRC23 in asthenozoospermia. The conclusions of this paper are mostly well supported by data, but many aspects of data analyses and data interpretations need to be improved.

      1) The pathogenesis of truncating LRRC23 in asthenozoospermia needs to be further considered. The molecular mechanism of LRRC23 demonstrated in mice should be tested in patients with the LRRC23 variant. As it may be difficult to determine the structures of RS3 in the infertile male sperm, the LRRC23 localization should be observed in the sperm from patients with the LRRC23 variant.
      2) The absence of the RS3 head in LRRC23Δ/Δ mouse sperm is not sufficient to support the specific localization of LRRC23 in the RS3 head. Although LRRC23 might bind to the RS head protein RSPH9, the authors state that "RSPH9 is a head component of RS1 and RS2 like in C. reinhardtii (Gui et al, 2021), but not of RS3" as the protein level and the localization of RSPH9 are not altered in LRRC23Δ/Δ sperm. Thus, the specific localization of LRRC23 in the RS3 head should be further confirmed.
      3) The interaction between LRRC23 and RSPH9 needs to be defined. AlphaFold models could help determine the likelihood of a direct interaction. In addition, the structure of the 96-nm modular repeats of axonemes from the flagella of human respiratory cilia has been determined (PMID: 37258679), and the localization of LRRC23 in RS could be further predicted.
      4) The ortholog of the RSP15 may also be predicted or confirmed by using the reported structure in human respiratory cilia (PMID: 37258679). Whether the LRCC34 in RS2 is LRRC34?

    3. Reviewer #2 (Public Review):

      Summary:
      The present study explores the molecular function of LRRC23 in male fertility, specifically in the context of the regulation of spermiogenesis. The author initiates the investigation by identifying LRRC23 mutations as a potential cause of male sterility based on observations made in closely related individuals affected by asthenozoospermia (ASZ). To further investigate the function of LRRC23 in spermatogenesis, mutant mice expressing truncated LRRC23 proteins are created, aligning with the identified mutation site. Consequently, the findings confirm the deleterious effects of LRRC23 mutations on sperm motility in these mice while concurrently observing no significant abnormalities in the overall flagella structure. Furthermore, the study reveals LRRC23's interaction with the RS head protein RSPH9 and its active involvement in the assembly of the axonemal RS. Notably, LRRC23 mutations result in the loss of the RS3 head structure and disruption of the RS2-RS3 junction structure. Therefore, the author claimed that LRRC23 is an indispensable component of the RS3 head structure and suggests that mutations in LRRC23 underlie sterility in mice.

      Strengths:
      The key contribution of this article lies in confirming LRRC23's involvement in assembling the RS3 head structure in sperm flagella. This finding represents a significant advancement in understanding the complex architecture of the RS3 structural complex, building upon previous studies. Moreover, the article's topic is interesting and originates from clinical research, which holds significant implications for potential clinical applications.

      Weaknesses:
      1. While the author generated mutant mice expressing truncated LRRC23 proteins, the expression of these truncated proteins was not detected in sperm. This implies that, in terms of sperm structure, the mutant LRRC23 protein behaves similarly to the complete knockout of the LRRC23 protein, which has been previously reported and characterized (Zhang et al., 2021).

      2. This reviewer questions the proposal that LRRC23 is an integral component of RS3, as the results indicate not only the loss of the RS3 head structure but also an incomplete RS2-RS3 junction structure. In addition, the interaction of LRRC23 with RSPH9 alone does not fully explain its involvement solely in RS3 assembly. Additional evidence is required to examine the influence of LRRC23 on the RS2-RS3 junction.

      3. The article does not explore how these mutations affect the flagella structure in human sperm, which needs further study. Expanding the study to include human sperm structure would undoubtedly enhance the quality of the article.

    1. eLife assessment:

      This important study sheds light on several apparent discrepancies observed across animal studies examining neuroimaging biomarkers of functional recovery following focal ischemia. Using 2-photon imaging of calcium activity in awake mice, the authors show solid evidence that deficits in neuronal activity and functional connectivity after photothrombosis occur within a very small distance from the infarct (<750 microns), whereas these measures were relatively unaltered more distally, even in regions typically implicated in functional remapping of the forelimb representation in anaesthetized animals. These findings reveal a complex spatiotemporal relationship between perilesional neuronal network function and behavioral recovery that is more nuanced than previously reported, and motivate the need for better criteria for what is considered remapping.

    2. Reviewer #1 (Public Review):

      Summary: This impressive study by Bandet and Winship uses 2-photon imaging in awake-behaving mice to examine long-term changes in neural activity and functional connectivity after focal ischemic stroke. The authors discover that there are long-lasting perturbations in neural activity and functional connectivity, specifically within peri-infarct cortex but not more distant cortical regions. Overall I thought the study provided important new findings that were supported by compelling data.

      Strengths: This is a technically challenging study to perform and the authors show high-quality data. The manuscript was well-written, and the figures were clearly presented. I really like the analytic tools they applied, which were rigorous and provided some novel insights regarding neural activity patterns during movement or rest. The discovery of long-lasting impairments in neural activity/functional connectivity is an important one as it is important for future stroke studies to recognize what problems need to be rectified in the post-stroke brain, even many weeks after injury. They also suggest there is a much more nuanced relationship between macroscopic changes in somatosensory maps and single-cell activity. Overall, I think this is a well-executed study whose primary conclusions were justified by the data presented.

      Weaknesses: I found very little in the way of weaknesses. The statistics were notably conservative and are appropriate.

    3. Reviewer #2 (Public Review):

      This study investigates the excitability of neurons in the peri-infarct cortex during recovery from ischemic stroke. The excitability of neurons in the peri-infarct cortex during stroke recovery has produced contradictory findings: some studies suggest hyper-excitability to direct-brain stimulation, while others indicate diminished responsiveness to physical stimuli. However, most studies have used anesthetized animals, which can disrupt cortical activity and functional connectivity. The present study used two-photon Ca2+ imaging after focal photothrombotic stroke to examine neural activity patterns in awake mice. The authors found reduced neuronal spiking in the peri-infarct cortex that was strongly correlated with motor performance deficits. Additionally, the authors found disruptions in neural activation, functional connectivity, and assembly architecture in the immediate peri-infarct region but not in the distal cortex regions.

      The findings of this study are very important as they show that there is no measurable change in terms of neuronal activation and reorganization in distal regions of remapped cortical response areas after stroke.

      However, cortical response areas are calculated using a threshold of 95% peak activity within a trial. The threshold is presumably used to discriminate between the sensory-evoked response and collateral activation / less "relevant" response (noise). Since the peak intensity is lower after stroke, the "response" area is larger - lower main signal results in less noise exclusion. Predictably, areas that show a higher response before stroke than after are excluded from the response area before the stroke and included after.
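
      The thresholding argument above can be illustrated with a toy calculation. The sketch below is not the authors' analysis pipeline: it assumes a one-dimensional Gaussian response sitting on a constant additive baseline, and the function name, parameters, and values are all hypothetical. Under that assumption, lowering the evoked amplitude while the baseline stays fixed widens the region that passes a fixed fraction-of-peak threshold.

```python
import math


def response_width(amplitude, baseline, frac=0.95, sigma=1.0,
                   half_width=5.0, step=0.01):
    """Width of the region where signal >= frac * peak signal.

    Toy model (assumption): signal(x) = amplitude * Gaussian(x) + baseline,
    sampled on a regular 1-D grid from -half_width to +half_width.
    """
    n = int(round(2 * half_width / step)) + 1
    xs = [-half_width + i * step for i in range(n)]
    signal = [amplitude * math.exp(-x * x / (2 * sigma ** 2)) + baseline
              for x in xs]
    threshold = frac * max(signal)
    return sum(1 for s in signal if s >= threshold) * step


# Hypothetical "before" and "after" amplitudes with the same baseline.
before = response_width(amplitude=1.0, baseline=0.2)
after = response_width(amplitude=0.4, baseline=0.2)
print(before, after)
```

      With no baseline at all, scaling the amplitude leaves the thresholded width unchanged in this toy model, so the apparent expansion here comes entirely from the interaction between a weaker peak and a fixed background, which is the reviewers' concern.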

      We suggest a reinterpretation of the findings: much of the non-remapped areas are included when using a within-trial threshold as a criterion, and the absence of increased neuronal activation and reorganization is evidence for this claim. The take-home message of this study should be that we need a much better criterion for what we consider remapping.

    1. eLife assessment

      In this work, the authors make a valuable contribution based on convincing evidence that children 6-to-7-years-old improve over 2 years of development towards utilising more optimal value-based decision-making strategies while performing a reinforcement learning task. They found that delayed feedback learning was associated with volume in the hippocampus while immediate feedback learning was not. Striatal volume was associated with both forms of learning, in contrast to prior research findings in adults. Brain-behaviour correlations were stable across the 2-year period, despite the hippocampus increasing in volume and striatal volume remaining stable.

    2. Reviewer #1 (Public Review):

      Existing literature suggests that brain structures implicated in memory such as the hippocampus, and reward/punishment processing such as the striatal regions are also engaged in learning and value-based decision-making. However, how the contributions of these regions to learning and value-based decision-making change over time, particularly in children where these neural systems show protracted maturation was not studied systematically. This is the question the authors are aiming to address in this work in which children 6-to-7-years-old were recruited for a neuroimaging study that involves taking structural scans from this cohort to investigate how they correlate with changes in the way children approach a reinforcement learning task in which they learn to identify the better shape between 2 options through trial-and-error.

      Particular strengths of the paper are longitudinally following up a cohort of small children and engaging them in a value-based decision-making task so that the relationship between neural maturation and improvements in reinforcement learning can be studied reliably. Towards this end, the authors make use of well-established computational modelling approaches to extract key parameters such as learning rates (which designate the speed of learning from expected versus actual outcomes) or choice stochasticity (which designate the inherent variation in people's decisions and the tendency to explore between the options) from children's choices so that their structural neural correlates can be established. As a part of this endeavour, the authors rely on methodological choices which do not warrant much criticism. Their data visualization choices are particularly spot-on and highly informative about the details of the raw data.
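
      For readers unfamiliar with these parameters, the sketch below shows how a learning rate and a choice-stochasticity parameter typically enter a value-based model. It is a generic delta-rule and softmax illustration, assumed here for exposition; the review does not specify the authors' exact model, and the function and parameter names are hypothetical.

```python
import math


def update_value(value, reward, learning_rate):
    """Delta rule: move the value estimate toward the observed outcome.

    learning_rate (0..1) scales the prediction error (reward - value).
    """
    return value + learning_rate * (reward - value)


def choice_probability(value_a, value_b, inverse_temperature):
    """Two-option softmax: probability of choosing option A.

    A low inverse_temperature gives near-random (more exploratory) choices;
    a high one gives near-deterministic choice of the higher-valued option.
    """
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (value_a - value_b)))
```

      In this kind of model, fitted learning rates and inverse temperatures are the per-child quantities that would then be related to structural measures such as hippocampal or striatal volume.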

      Also considering the importance of the hippocampal system in human memory, the key contribution of the paper is that the volumetric increases in hippocampus size between the 2 assessment points correlated selectively with the delayed, but not the immediate, learning score, where delayed refers to the learning condition in which the outcome feedback is given to the children after a 5-second delay. Although the authors also present evidence that changes in striatal volume are implicated in learning performance, this relationship was more general, as associations were found for both immediate and delayed feedback conditions. Thus, the paper makes an important contribution to the fields of developmental and decision neuroscience. An important question arising from the authors' findings is whether the hippocampus maintains this selective role in value-based learning during the course of neuronal development, for example, whether a similar association would be found in children 8-to-9 years old. A better understanding of how these developmental trajectories map onto changes in learning and decision-making can inform fields outside neuroscience, for example by tailoring educational approaches to neural development pathways to boost learning efficiency in young children.

    3. Reviewer #2 (Public Review):

      Summary: This is an interesting and impressive study that provides a rare opportunity to learn about brain-behaviour links of learning systems at a relatively early stage of development.

      Strengths: The main strengths are that the authors followed a relatively large group of children over 2 years and used a reinforcement learning task aimed at assessing learning that depends on both the striatum and the hippocampus. The authors also included a thorough overview of the computational models and the choices they made. I think this paper would be of considerable interest and contributes to knowledge about how learning and memory systems change with development.

      Weaknesses: There were a few things that I thought would be helpful to clarify. First, what exactly are the anatomical regions included in the striatum here? Second, it was mentioned that for the reduced dataset, object recognition memory focused on "sure" ratings. This seems like the appropriate way to do it, but it was not clear whether this was also the case for the full analyses in the main text. Third, the children's fitted parameters were far from optimal; is it known whether adults would be closer to optimal on the task?

      The main thing I would find helpful is to better integrate the differences between the main results reported and the many additional results reported in the supplement, for example from the reduced dataset when excluding non-learners. I found it a bit challenging to keep track of all the differences with all the analyses and parameters. It might be helpful to report some results in tables side-by-side in the two different samples. And if relevant, discuss the differences or their implication in the Discussion. For example, if the patterns change when excluding the poor learners, in particular for the associations between delayed feedback and hippocampal volume, and those participants were also those less well fit by the value-based model, is that something to be concerned about and does that affect any interpretations? What was not clear to me is whether excluding the poor learners at one extreme simply weakens the general pattern, or whether there is a more qualitative difference between learners and non-learners. The discussion points to the relevance of deficits in hippocampal-dependent learning for psychopathology and understanding such a distinction may be relevant.

    1. eLife assessment

      This study presents useful findings regarding how a particular class of neurons within a brain region responds to threatening stimuli and their role in fear processing in male and female mice; these results are solid as they uncover the functional role of this brain region (BNST) in this particular type of processing and expand this knowledge by highlighting the function of a specific class of neurons (CRF), showing that their role in fear depends on the sex of the animal. However, the analysis is incomplete and would benefit from additional (for example locomotor) controls and from clarifying interpretability issues with respect to sex differences in fear expression and to the precise role of these neurons. The work will be of interest to neuroscientists studying the biological basis of fear processing.

    2. Reviewer #1 (Public Review):

      The aim of this study is to test the overarching hypothesis that plasticity in BNST CRF neurons drives distinct behavioral responses to unpredictable threat in males and females. The manuscript provides evidence for a possible sex-specific role for CRF-expressing neurons in the BNST in unpredictable aversive conditioning and subsequent hypervigilance across sexes. As the authors note, this is an important question given the high prevalence of sex differences in stress-related disorders, like PTSD, and the role of hypervigilance and avoidance behaviors in these conditions. The study includes in vivo manipulation, bulk calcium imaging, and cellular resolution calcium imaging, which yield important insights into cell-type specific activity patterns. However, it is difficult to generate an overall conclusion from this manuscript, given that many of the results are inconsistent across sexes and across tests and there is an overall lack of converging evidence. For example, partial conditioning yields increased startle in males but not females, yet CRF KO only increases startle response in males after full conditioning, not partial, and CRF neurons show similar activity patterns between partial and full conditioning across sexes. Further, while the study includes a KO of CRF, it does not directly address the stated aim of assessing whether plasticity in CRF neurons drives the subsequent behavioral effects of unpredictable threat.

      A major strength of this manuscript is the inclusion of both males and females and attention to possible behavioral and neurobiological differences between them throughout. However, to properly assess sex differences, sex should be included as a factor in the ANOVA (e.g. for freezing, startle, and feeding data in Figure 1) to assess whether there is a significant main effect of sex or an interaction with sex. If sex is not a statistically significant factor, both sexes should be combined for subsequent analyses. See Garcia-Sifuentes and Maney, eLife 2021 https://elifesciences.org/articles/70817. There are additional cases where t-tests are used to compare groups when repeated measures ANOVAs would be more appropriate and rigorous.
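
      As a rough illustration of what including sex as a factor involves, the sketch below computes the F statistics of a balanced two-way ANOVA (two main effects and their interaction) from first principles. It assumes equal group sizes and a purely between-subject design; the function name and the toy numbers in the usage note are hypothetical and are not data from the manuscript. In practice a statistics package that also reports p-values and handles unbalanced or repeated-measures designs would be the more appropriate tool.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two-way ANOVA.

    cells maps (level_a, level_b) -> list of observations, e.g.
    ("male", "partial") -> [startle scores]. Every cell must be present
    and hold the same number (>= 2) of observations.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    complete = all((a, b) in cells for a in levels_a for b in levels_b)
    if not complete or n < 2 or any(len(obs) != n for obs in cells.values()):
        raise ValueError("design must be complete and balanced with n >= 2")

    grand = mean(y for obs in cells.values() for y in obs)
    mean_a = {a: mean(y for b in levels_b for y in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(y for a in levels_a for y in cells[(a, b)]) for b in levels_b}
    mean_ab = {key: mean(obs) for key, obs in cells.items()}

    # Sums of squares for each main effect, the interaction, and the residual.
    ss_a = len(levels_b) * n * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = len(levels_a) * n * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (mean_ab[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_err = sum((y - mean_ab[key]) ** 2 for key, obs in cells.items() for y in obs)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_ab = df_a * df_b
    df_err = len(levels_a) * len(levels_b) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "factor_a": (ss_a / df_a) / ms_err,
        "factor_b": (ss_b / df_b) / ms_err,
        "interaction": (ss_ab / df_ab) / ms_err,
    }
```

      Called with, for example, `{("female", "ctrl"): [2.0, 4.0], ("female", "part"): [4.0, 6.0], ("male", "ctrl"): [1.0, 3.0], ("male", "part"): [3.0, 5.0]}`, the function returns separate F values for sex, condition, and their interaction; the interaction term is the one that speaks to whether the conditioning effect differs between sexes.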

      Additionally, it's unclear whether the two sexes are equally responsive to the shock during conditioning and if this is underlying some of the differences in behavioral and neuronal effects observed. There are some reports that suggest shock sensitivity differs across sexes in rodents, and thus, using a standard shock intensity for both males and females may be confounding effects in this study.

      The data do not rule out the possibility that BNST CRF activity is purely tracking the mobility state of the animal, given that the differences in activity also track with differences in freezing behavior. The data show an inverse relationship between activity and freezing. This may explain a paradox in the data, namely why males show a greater suppression of BNST activity after partial conditioning than full conditioning, if that activity is suspected to drive the increased anxiety-like response. Perhaps it reflects that activity is significantly suppressed at the end of the conditioning session because animals are likely to be continuously freezing after repeated shock presentations in that context. It would also explain why there is less of a suppression in activity over the course of the recall session, because there is less freezing as well during recall compared with conditioning.

      A mechanistic hypothesis linking BNST CRF neurons, the behavioral effects observed after fear conditioning, and manipulation of CRF itself is not clearly addressed here.

    3. Reviewer #2 (Public Review):

      This study examined the role of CRF neurons in the BNST in both phasic and sustained fear in males and females. The authors first established a differential fear paradigm whereby shocks were consistently paired with tones (Full) or only paired with tones 50% of the time (Part), with controls exposed only to tones with no shocks. Recall tests established that both Full and Part conditioned male and female mice froze to the tones, with no difference between the paradigms. Additional studies using the NSF and startle tests established that neither fear paradigm produced behavioral changes in the NSF test, suggesting that these fear paradigms do not result in an increase in anxiety-like behavior. Part fear conditioning, but not Full, did enhance startle responses in males but not females, suggesting that this fear paradigm did produce sustained increases in hypervigilance in males exclusively. Photometry studies found that while undifferentiated BNST neurons all responded to shock itself, only Full conditioning in males led to a progressive enhancement of the magnitude of this response. BNST neurons in males, but not females, were also responsive to tone onset in both fear paradigms, but only in Full fear did the magnitude of this response increase across training. Knockdown of CRF from the BNST had no effect on fear learning in males or females, nor any effect in males on fear recall in either paradigm, but in females enhanced both baseline and tone-induced freezing only in the Part fear group. When looking at anxiety following fear training, it was found in males that CRF knockdown modulated anxiety in Part fear trained animals and amplified startle in Fully trained males but had no effect in either test in females. Using 1P imaging, it was found that CRF neurons in the BNST generally decline in activity across both conditioning and recall trials, with some subtle sex differences emerging in the Part fear trained animals, in that in females BNST CRF neurons were inhibited after both shock and omission trials but in males this only occurred after shock and not omission trials. In recall trials, CRF BNST neuron activity remained higher in Part conditioned mice relative to Full conditioned mice.

      Overall, this is a very detailed and complex study that incorporates both differing fear training paradigms and males and females, as well as a suite of both state-of-the-art imaging techniques and gene knockdown approaches to isolate the role and contributions of CRF neurons in the BNST to these behavioral phenomena. The strengths of this study come from the thorough approach that the authors have taken, which in turn helped to elucidate nuanced and sex-specific roles of these neurons in the BNST in differing aspects of phasic and sustained fear. Moreover, the methods employed provide a strong degree of cellular resolution for CRF neurons in the BNST. In general, the conclusions appropriately follow the data, although the authors do tend to minimize some of the inconsistencies across studies (discussed in more depth below), which could be better addressed by discussing them in greater depth. As such, the primary weakness of this manuscript comes largely from the discussion and interpretation of mixed findings without a level of detail and nuance that reflects the complexity, and some inconsistency, across the studies. These points are detailed below:

      -Given the focus on CRF neurons in the BNST, it is unclear why the photometry studies were performed in undifferentiated BNST neurons as opposed to CRF neurons specifically (although this is addressed, to some degree, subsequently with the 1P studies in CRF neurons directly). This does limit the continuity of the data from the photometry studies to the subsequent knockdown and 1P imaging studies. The authors should address the rationale for this approach so it is clear why they have moved from broader to more refined approaches.

      -The CRF KD studies are interesting, but it remains speculative as to whether these effects are mediated locally in the BNST or are due to CRF signaling at downstream targets. As the literature on local pharmacological manipulation of CRF signaling within the BNST seems to be largely performed in males, the addition of pharmacological studies here would help to resolve whether these changes are indeed mediated by local impairments in CRF release within the BNST. While it is not essential to add these experiments, the manuscript would benefit from a clearer description of what pharmacological studies could be performed to resolve this issue.

      -While I can appreciate the authors' perspective, I think it is more appropriate to state that startle correlates with anxiety as opposed to outright stating that startle IS anxiety. Anxiety by definition is a behavioral cluster involving many outputs, of which avoidance behavior is key. Startle, like autonomic activation, correlates with anxiety but is not the same thing as a behavioral state of anxiety (particularly when the startle response dissociates from behavior in the NSF test, which more directly tests avoidance and apprehension). Throughout the manuscript the use of anxiety or vigilance to describe startle becomes interchangeable, but then the authors also dissociate these two, such as in the first paragraph of the discussion when stating that the Part fear paradigm produces hypervigilance in males without influencing fear or anxiety-like behaviors. The manuscript would benefit from harmonization of the language used to operationally define these behaviors and my recommendation would be to remain consistent with the description that startle represents hypervigilance and not anxiety, per se.

      -The interpretation of the anxiety data following CRF KD is somewhat confusing. First, while the authors found no effect of fear training on behavior in the NSF test in the initial studies, now they do; however, somewhat contrary to what one would expect, they found that Full fear trained males had reduced latency to feed (indicative of an anxiolytic response), which was unaltered by CRF KD, but in Part fear (which appeared to have no effect on its own in the NSF test), KD of CRF in these animals produced an anxiolytic effect. Given that the Part fear group was no different from control here, it is difficult to interpret these data, as now CRF KD does reduce latency to feed in this group, suggesting that removal of CRF now somehow conveys an anxiolytic response for Part fear animals. In the discussion the authors refer to this outcome as CRF KD "normalizing" the behavior in the NSF test of Part fear conditioned animals as now it parallels what is seen after Full fear, but given that the Part fear animals with GFP were no different than controls (and neither of these fear training paradigms produced any effect in the NSF test in the first arm of studies), it seems inappropriate to refer to this as "normalization" as it is unclear how this is now normalized. Given the complexity of these behavioral data, some greater depth in the discussion is required to put these data in context and describe the nuance of these outcomes; in particular, a discussion of possible experimental factors between the initial behavioral studies and those in the CRF KD arm that could explain the discrepancy in the NSF test would be good (such as the inclusion of surgery, or other factors that may have differed between these experiments). These behavioral outcomes are even more complex given that the opposite effect was found in startle, whereby CRF KD amplified startle in Full trained animals. As such, this portion of the discussion requires some reworking to more adequately address the complexity of these behavioral findings.

    4. Reviewer #3 (Public Review):

      Hon et al. investigated the role of BNST CRF signaling in modulating phasic and sustained fear in male and female mice. They found that partial and full fear conditioning had similar effects in both sexes during conditioning and during recall. However, males in the partially reinforced fear conditioning group showed enhanced acoustic startle, compared to the fully reinforced fear conditioning group, an effect not seen in females. Using fiber photometry to record calcium activity in all BNST neurons, the authors show that the BNST was responsive to foot shock in both sexes and both conditioning groups. Shock response increased over the session in males in the fully conditioned fear group, an effect not observed in the partially conditioned fear group. This effect was not observed in females. Additionally, tone onset resulted in increased BNST activity in both male groups, with the tone response increasing over time in the fully conditioned fear group. This effect was less pronounced in females, with partially conditioned females exhibiting a larger BNST response. During recall in males, BNST activity was suppressed below baseline during tone presentations and was significantly greater in the partially conditioned fear group. Both female groups showed an enhanced BNST response to the tone that slowly decayed over time. Next, they knocked down CRF in the BNST to examine its effect on fear conditioning, recall and anxiety-like behavior after fear. They found no effect of the knockdown in either sex or group during fear conditioning. During fear recall, BNST CRF knockdown led to an increase in freezing in only the partially conditioned females. In the anxiety-like behavior tasks, BNST CRF knockdown led to increased anxiolysis in the partially reinforced fear males, but not in females. Surprisingly, BNST CRF knockdown increased startle response in fully conditioned, but not partially conditioned, males, an effect not observed in either female group. In a final set of experiments, the authors used single-photon calcium imaging to record BNST CRF cell activity during fear conditioning and recall. Approximately 1/3 of BNST CRF cells were excited by shock in both sexes, with the rest inhibited, and no differences were observed between sexes or groups during fear conditioning. During recall, BNST CRF activity decreased in both sexes, an effect pronounced in male and female fully conditioned fear groups.

      Overall, these data provide novel, intriguing evidence in how BNST CRF neurons may encode phasic and sustained fear differentially in males and females. The experiments were rigorous.

    1. eLife assessment

      In this potentially useful study, the authors employ concepts and algorithms associated with induced subgraphs in graph theory to automate several key but non-trivial steps in the development of coarse-grained models. These developments can help to model complex biomolecular systems at the coarse-grained level, but given the limited number of examples explicitly discussed, the demonstration of the general applicability of the approach to biomolecular systems is considered incomplete.

    2. Reviewer #1 (Public Review):

      Summary: In this study, the authors provide a new computational platform called Vermouth to automate topology generation, a crucial step that any biomolecular simulation starts with. Given the wide range of chemical structures that need to be simulated, the varying quality of structural models obtained as inputs from various sources, and the diverse force fields and molecular dynamics engines employed for simulations, automation of this fundamental step is challenging, especially for complex systems and in cases where there is a need to conduct high-throughput simulations in the application of computer-aided drug design (CADD). To overcome this challenge, the authors develop a programming library composed of components that carry out various types of fundamental functionalities that are commonly encountered in topology generation. These components are intended to be general for any type of molecule and not to depend on any specific force field or MD engine. To demonstrate the applicability of this library, the authors employ those components to re-assemble a pipeline called Martinize2, used in topology generation for simulations with the widely used coarse-grained (CG) model MARTINI. This pipeline can fully recapitulate the functionality of its original version, Martinize, but exhibits greatly enhanced generality, as confirmed by the ability of the pipeline to faithfully generate topologies for two high-complexity benchmarking sets of proteins.

      Strengths: The main strength of this work is the use of concepts and algorithms associated with induced subgraphs in graph theory to automate several key but non-trivial steps of topology generation such as the identification of monomer residue units (MRU), the repair of input structures with missing atoms, the mapping of topologies between different resolutions, and the generation of parameters needed for describing interactions between MRUs.
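
      For readers unfamiliar with the term, the sketch below illustrates induced subgraph matching on a toy molecular graph: a labelled pattern (a reference fragment) matches a set of atoms only if both the bonds and the absent bonds agree. It is a brute-force backtracking search written for illustration; it is an assumption-laden toy and not Vermouth's implementation, and the function name and data layout are hypothetical.

```python
def find_induced_match(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Return a dict mapping pattern nodes to target nodes, or None.

    *_nodes: dict of node -> label (e.g. element symbol).
    *_edges: iterable of 2-tuples of nodes (undirected bonds).
    """
    p_adj = {frozenset(e) for e in pattern_edges}
    t_adj = {frozenset(e) for e in target_edges}
    order = list(pattern_nodes)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        p = order[len(mapping)]
        for t, label in target_nodes.items():
            if label != pattern_nodes[p] or t in mapping.values():
                continue
            # Induced condition: edges AND non-edges must agree with
            # every pattern node that has already been placed.
            consistent = all(
                (frozenset((p, q)) in p_adj) == (frozenset((t, mapping[q])) in t_adj)
                for q in mapping
            )
            if consistent:
                mapping[p] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack and try the next candidate
        return None

    return extend({})
```

      For instance, a three-carbon chain pattern does not match a three-membered carbon ring under this definition, because the ring contains a bond between the chain ends that the pattern lacks. A production tool would normally rely on an optimised graph library rather than a search of this kind.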

      Weaknesses: Although the Vermouth library appears promising as a general tool for topology generation, the current manuscript provides insufficient information, and there is too little documentation, for users to apply this library easily. A more detailed explanation of the various classes that are mentioned, such as Processor, Molecule, Mapping, ForceField etc., is still needed, including the inputs, outputs and associated operations of these classes. Some simple demonstrations of the application of these classes would be of great help to users. The formats of the internal databases used to describe reference structures and force fields may also need to be clarified. This is particularly important when Vermouth needs to be adapted for other AA/CG force fields and other MD engines.

      The successful automation provided by Vermouth relies on reference structures that need to be pre-determined. In the case of the study of 43 small ligands, the reference structures and corresponding mappings to MARTINI-compatible representations for all these ligands have already been defined in the M3 force field and added to the Vermouth library. However, the authors need to comment on the scenario where significantly more ligands need to be considered and other force fields, lacking reference structures and mapping schemes, need to be used as CG representations.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript by Kroon, Grunewald, Marrink and coworkers presents the development of the Vermouth library for coarse-grain assignment and parameterization and an updated version of a Python script, the Martinize2 program, to build Martini coarse-grained (CG) models, primarily for protein systems.

      Strengths:

      In contrast to many mature and widely used tools to build all-atom (AA) models, there are few well-accepted programs for CG model constructions and parameterization. The research reported in this manuscript is among the ongoing efforts to build such tools for Martini CG modeling, with a clear goal of high-throughput simulations of complex biomolecular systems and, ultimately, whole-cell simulations. Thus, this manuscript targets a practical problem in computational biophysics. The authors see such an effort to unify operations like CG mapping, parameterization, etc. as a vital step from the software engineering perspective.

      Weaknesses:

      However, the scientific novelty of the manuscript in its current form is unclear, and the work appears incremental upon existing methods and tools. The only "validation" (more like an example application) is to create Martini models with two protein structure sets (I-TASSER and AlphaFold). The success rate in building the models was only 73%, with most failures due to incomplete AA coordinates. This suggests a dependence on the input AA models, which makes the results less attractive for high-throughput applications (for example, preparation/creation of the AA models can become the bottleneck). There seems to be an improvement in considering the protonation state and chemical modification, but convincing validation is still needed. Besides, limitations in the existing Martini models remain (like the restricted dynamics due to the elastic network, the electrostatic interactions or polarizability).

    4. Reviewer #3 (Public Review):

      Summary: The manuscript by Kroon et al. describes two algorithms which, when combined, achieve high-throughput automation of "martinizing" protein structures with selected protonation states and post-translational modifications.

      Strengths: A large-scale protein simulation was attempted, showing strong evidence that the authors' algorithms work smoothly.

      The authors described the algorithms in detail and shared the open-source code under the Apache 2.0 license on GitHub. This allows both reproducibility and extended usefulness within the field. These algorithms are potentially impactful if the authors can address some of the issues listed below.

      Weaknesses: One major caveat of the manuscript is that the authors claim their algorithms aim to "process any type of molecule or polymer, be it linear, cyclic, branched, or dendrimeric, and mixtures thereof" and "enable researchers to prepare simulation input files for arbitrary (bio)polymers". However, the examples provided by the manuscript only support one type of biopolymer, i.e. proteins. Despite the authors' recommendation of using polyply along with martinize2/vermouth, no concrete evidence has been provided to support the authors' claim. Therefore, the manuscript must be modified to either remove these claims or include new evidence.

      The method descriptions of Martinize2 and the graph algorithms in the SI should be core content of the manuscript. I argue that Figure S1 and Figure S2 are more important than Figure 3 (protonation state). I recommend that the authors make a workflow chart combining Figures S1 and S2 to explain Martinize2 and the graph algorithms in the main text.

      In Figure 3 (protonation state), the figure itself and the captions are ambiguous about whether at the end the residue is simply renamed from HIS to HIP, or if hydrogen is removed from HIP to recover HIS.

      In "Incorporating a Ligand small-molecule Database", the authors are calling for a community effort to build a small-molecule database. Some guidance on when the current database/algorithm combination does or does not work will help the community in contributing.

      A speed comparison is needed to compare Martinize2 and Martinize.

    1. Reviewer #1 (Public Review):

      This manuscript addresses the important and understudied issue of circuit-level mechanisms supporting habituation, particularly in pursuit of the possible role of increases in the activity of inhibitory neurons in suppressing behavioral output during long-term habituation. The authors make use of many of the striking advantages of the larval zebrafish to perform whole brain, single neuronal calcium imaging during repeated sensory exposure, and high throughput screening of pharmacological agents in freely moving, habituating larvae. Notably, several blockers/antagonists of GABAA(C) receptors completely suppress habituation of the O-bend escape response to dark flashes, suggesting a key role for GABAergic transmission in this form of habituation. Other substances are identified that strikingly enhance habituation, including melatonin, although here the suggested mechanistic insight is less specific. To add to these findings, a number of functional clusters of neurons are identified in the larval brain that have divergent activity through habituation, with many clusters exhibiting suppression of different degrees, in line with adaptive filtration during habituation, and a single cluster that potentiates during habituation. Further assessment reveals that all of these clusters include GABAergic inhibitory neurons and excitatory neurons, so we cannot take away the simple interpretation that the potentiating cluster of neurons is inhibitory and therefore exerts an influence on the other adapting (depressing) clusters to produce habituation. Rather, a variety of interpretations remain in play.

      Overall, there is great potential in the approach that has been used here to gain insight into circuit-level mechanisms of habituation. There are many experiments performed by the authors that cannot be achieved currently in other vertebrate systems, so the manuscript serves as a potential methodological platform that can be used to support a rich array of future work. While there are several key observations that one can take away from this manuscript, a clear interpretation of the role of GABAergic inhibitory neurons in habituation has not been established. This potential feature of habituation is emphasized throughout, particularly in the introduction and discussion sections, meaning that one is obliged as a reader to interrogate whether the results as they currently stand really do demonstrate a role for GABAergic inhibition in habituation. Currently, the key piece of evidence that may support this conclusion is that picrotoxin, which acts to block some classes of GABA receptors, prevents habituation. However, there are interpretations of this finding that do not specifically require a role for modified GABAergic inhibition. For instance, by lowering GABAergic inhibition, an overall increase in neural activity will occur within the brain, in this case below a level that could cause a seizure. That increase in activity may simply prevent learning by massively increasing neural noise and therefore either preventing synaptic plasticity or, more likely, causing indiscriminate synaptic strengthening and weakening that occludes information storage. Sensory processing itself could also be disrupted, for instance by altering the selectivity of receptive fields. Alternatively, it could be that the increase in neural activity produced by the blockade of inhibition simply drives more behavioral output, meaning that more excitatory synaptic adaptation is required to suppress that output. The authors propose two specific working models of the ways in which GABAergic inhibition could be implemented in habituation. An alternative model, in which GABAergic neurons are not themselves modified but act as a key intermediary between Hebbian assemblies of excitatory neurons that are modified to support memory and output neurons, is not explored. As yet, these or other models in which inhibition is not required for habituation, have not been fully tested.

      This manuscript describes a really substantial body of work that provides evidence of functional clusters of neurons with divergent responses to repeated sensory input and an array of pharmacological agents that can influence the rate of a fundamentally important form of learning.

    2. Reviewer #2 (Public Review):

      In this study, Lamire et al. use a calcium imaging approach, behavioural tests, and pharmacological manipulations to identify the molecular mechanisms behind visual habituation. They show a valuable drug screen paradigm to assess the impact of pharmacological compounds on the behaviour of larval zebrafish.

      The pharmacological screen identifies an expected suppression of habituation by GABA receptor antagonists. More interestingly, it identifies potentially new contributions of melatonin receptor agonists, and oestrogen receptor agonists to habituation, as they seem to increase the rate of habituation.

      The volumetric calcium imaging of habituation to dark flashes is valuable, but the mix of responses to visual cues that are not relevant to the dark flash escape, such as the slow increase back to baseline luminosity, lowers the clarity of the results. The link between the calcium imaging results and free-swimming behaviour is not especially convincing; however, that is a common issue of head-restrained imaging with larval zebrafish. The identification of a cluster of neurons with potentiating responses, which could drive the habituation, is intriguing, but more characterization of these neurons would be needed to fully understand their function in habituation. The pharmacological manipulation of the habituation circuits mapped in the first part does not arrive at any satisfying conclusion, which is acknowledged by the authors.

      Overall, the authors did identify interesting new molecular pathways that may be involved in habituation to dark flashes. Their screening approach, while not novel, will be a powerful way to interrogate other behavioural profiles. The authors identified circuit loci apparently involved in habituation to dark flashes, and the potentiation and no adaptation clusters have not been previously observed and are interesting targets for future work. This work suggests that the circuits and mechanisms underlying habituation are likely more complex than anticipated. The data will be useful to guide follow-up experiments by the community on the new pathway candidates that this screen has uncovered, including behaviours beyond dark flash habituation.

    3. Reviewer #3 (Public Review):

      To analyze the circuit mechanisms leading to the habituation of the O-bend responses upon repeated dark flashes (DFs), the authors performed 2-photon Ca2+ imaging in larvae expressing nuclear-targeted GCaMP7f pan-neuronally, spanning the majority of the midbrain, hindbrain, pretectum, and thalamus. They found that while the majority of neurons across the brain depress their responsiveness during habituation, a smaller population of neurons in the dorsal regions of the brain, including the torus longitudinalis, cerebellum, and dorsal hindbrain, showed the opposite pattern, suggesting that motor-related brain regions contain non-depressed signals, and therefore likely contribute to habituation plasticity.

      Further analysis using affinity propagation clustering identified 12 clusters that differed both in their adaptation to repeated DFs, as well as the shape of their response to the DF.

      Next, by pharmacological screening of 1953 small molecule compounds with known targets in conjunction with the high-throughput assay, they found that 176 compounds significantly altered some aspects of measured behavior. Among them, they sought to identify the compounds that 1) have minimal effects on the naive response to DFs, but strong effects during the training and/or memory retention periods, 2) have minimal effects on other aspects of behaviors, 3) show similar behavioral effects to other compounds tested in the same molecular pathway, and identified the GABAA/C Receptor antagonists Bicuculline, Amoxapine, and Picrotoxinin (PTX). As partial antagonism of GABAAR and/or GABACR is sufficient to strongly suppress habituation but not generalized behavioral excitability, they concluded that GABA plays a very prominent role in habituation. They also identified multiple agonists of both Melatonin and Estrogen receptors, indicating that hormonal signalling may also play a prominent role in habituation response.

      To integrate the results of the Ca2+ imaging experiments with the pharmacological screening results, the authors compared the Ca2+ activity patterns after treatment with vehicle, PTX, or Melatonin in the tethered larvae. The behavioral effects of PTX and Melatonin were much smaller compared with the very strong behavioral effects in freely-swimming animals, but the authors assumed that the difference was significant enough to continue further experiments. Based on the hypothesis that Melatonin and GABA cooperate during habituation, they expected PTX and Melatonin to have opposite effects. This was not the case in their results: for example, the size of the 12(Pot, M) neuron population was increased by both PTX and Melatonin, suggesting that pharmacological manipulations that affect habituation behavior manifest in complex functional alterations in the circuit, making these effects difficult to capture with a simple model.

      Since the 12(Pot, M) neurons potentiate their responses and thus could act to progressively depress the responses of other neuronal classes, they examined whether these neurons overlap with GABAergic neurons. However, GABAergic neurons in the habituating circuit are not characterized by their Adaptation Profile, suggesting that global manipulations of GABAergic signalling through PTX have complex manifestations in the functional properties of neurons.

      Overall, the authors have performed an admirably large amount of work both in whole-brain neural activity imaging and pharmacological screening.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for your thorough review of the manuscript. We have taken all comments into account in the revised version of the manuscript. Please find below our detailed responses to your comments.

      eLife assessment

      This study reports useful information on the limits of the organotypic culture of neonatal mouse testes, which has been regarded as an experimental strategy that can be extended to humans in the clinical setting for the conservation and subsequent re-use of testicular tissue. The evidence that the culture of testicular fragments of 6.5-day-old mouse testes does not allow optimal differentiation of steroidogenic cells is compelling and would be useful to the scientific community in the field for further optimizations.

      Thank you for this assessment. We have carefully considered all comments and made the requested revisions to improve the manuscript.

      Reviewer #1 (Public Review):

      In this manuscript, the authors aimed to compare, from testis tissues at different ages from mice in vivo and after culture, multiple aspects of Leydig cells. These aspects included mRNA levels, proliferation, apoptosis, steroid levels, protein levels, etc. A lot of work was put into this manuscript in terms of experiments, systems, and approaches. However, as written the manuscript is incredibly difficult to follow. The Introduction and Results sections contain rather loosely organized lists of information that were altogether confusing. At the end of reading these sections, it was unclear what advance was provided by this work. The technical aspects of this work may be of interest to labs working on the specific topics of in vitro spermatogenesis for fertility preservation but fail to appeal to a broader readership. This may be best exemplified by the statements at the end of both the Abstract and Discussion which state that more work needs to be done to improve this system.

      As suggested, we have reworked the manuscript to make it clearer, more meaningful and more precise. We believe that this work may be of interest to a broader readership. Indeed, the development of a model of in vitro spermatogenesis could be of interest for labs working on the specific period of puberty initiation, on germ and somatic cell maturation and on steroidogenesis under physiological and pathological conditions, and could also be useful for testing the toxicity of cancer therapies, drugs, chemicals and environmental agents (e.g. endocrine disruptors) on the developing testis.

      There is a crucial unmet need to optimize the culture conditions for in vitro spermatogenesis. It is important to identify the deregulated molecular mechanisms leading to a decreased in vitro spermatogenic yield. Such results will be of great help to improve organotypic culture conditions. In the present study, we not only uncovered for the first time a failure in adult Leydig cell development, but also an alteration in the expression of several steroidogenic and steroid-metabolizing genes, which could explain the accumulation of progesterone and estradiol and the deficiency of androstenedione in cultured tissues. This hyperestrogenic and hypoandrogenic environment could explain, at least in part, the low efficiency of in vitro spermatogenesis. Furthermore, we show that the addition of hCG (LH homolog) is not sufficient to facilitate Leydig cell differentiation, restore steroidogenesis and improve sperm yield. These data provide valuable information for improving culture conditions. More fundamentally, this culture system could be a useful tool for identifying factors that are essential for the differentiation and functionality of adult Leydig cells during puberty initiation.

      Recommendations For The Authors:

      This reviewer appreciates that a lot of work was put into this manuscript in terms of experiments, systems, and approaches. However, the manuscript needs significant revision, and in this reviewer's opinion is not appropriate for a broader readership journal. The results seem rather incremental, and the topic is too specialized in its current format.

      The manuscript was significantly revised taking into account the reviewer’s comments. In addition, as mentioned above, the development of a model of in vitro spermatogenesis could have wider applications and be of interest to a broader audience.

      Comments for improvement, roughly in order of appearance:

      1) Abstract - would recommend condensing to hit the main points of the manuscript.

      The abstract has been condensed as suggested.

      2) Introduction, overall - this is a rather loosely organized list of information that is not synthesized or communicated in a meaningful way. It contains overstatements and lumps together findings from both mice and primates and thus several statements for the actions of these steroid hormones are inaccurate. The authors rely much too heavily upon reviews and need to replace those with a more scholarly approach of carefully reading and citing primary literature.

      The Introduction has been reorganized to make it clearer, more synthetic, more meaningful and more accurate. Only findings from rodents are presented. We carefully read the literature and replaced most reviews with primary literature.

      3) Results - this section was extremely difficult to read and comprehend, as it's essentially a laundry list of measurements of mRNAs, steroids, cholesterols, and proteins that go up or down or don't change at multiple ages, both in vitro and in vivo. The section would be improved greatly by an organization with rationale and concluding statements to prepare the reader for the factoid-style data that are presented.

      As suggested, the Results section has been improved by an organization with rationale and concluding statements to make it easier to read and comprehend.

      4) 47 - is this approach going to both "preserve and restore"? Sounds more like it will allow for the production of offspring, but the other goals are not going to happen from the approach listed in the latter part of that sentence - so not really "fertility restoration" but more of an insurance program that sperm can be produced for ART

      Freezing of prepubertal testicular tissue, which contains spermatogonia, is a fertility preservation option proposed to prepubertal boys with cancer prior to highly gonadotoxic treatments. Several fertility restoration strategies, which aim to allow the production of spermatozoa from cryopreserved spermatogonia, are being developed, including in vitro spermatogenesis. This sentence has been rewritten.

      5) 62 - specify whether this "decreased expression" is mRNA or protein, and is this because of a loss of Sertoli cells?

      “Decreased expression” was replaced by “decreased mRNA levels”. The results we obtained in the cited study (Rondanino et al., 2017) suggest that the decrease in Rhox5 mRNA levels is not the consequence of a change in the proportion of Sertoli cells but reflects an alteration in Rhox5 gene expression. In Figure 6U of the present study, we show indeed that there is no loss of Sertoli cells in organotypic cultures.

      6) 66 - what is "the first wave of mouse in vitro spermatogenesis"? Are these cultures from the first wave of mouse in vivo spermatogenesis, or is there a second wave of in vitro spermatogenesis? Please specify

      In the mouse, the first entry into meiosis occurs around 8-10 dpp and the first spermatozoa are produced at around 35 dpp: this is the first wave of spermatogenesis which takes place at the onset of puberty. By culturing 6 dpp-old testes for 30 days, our aim is to reproduce in vitro all the stages of this first wave of spermatogenesis, i.e. entry into meiosis, completion of meiosis and spermiogenesis.

      In the cited study (Pence et al., 2019), the authors cultured 5 dpp testes for 35 to 49 days and observed a decline in intratesticular testosterone levels in the cultured tissues, i.e. after the end of the first spermatogenic wave, compared to in vivo controls. Our sentence has been rewritten to make it clearer.

      7) 78 - is there a difference in T production by Fetal vs Adult LCs? It is this reviewer's understanding that the levels of T around birth in mice (and then a few months after birth in humans) are quite high, similar to adults. So, what are the authors suggesting here by providing the list of expressed genes in these two LC populations?

      As mentioned in the Introduction section, 17β-HSD3 – the enzyme responsible for the conversion of androstenedione to T – is not expressed in fetal Leydig cells but is expressed in adult Leydig cells. Therefore, unlike adult Leydig cells, fetal Leydig cells are not capable of synthesizing T.

      In the present study, we investigated steroidogenesis but also wondered which types of Leydig cells could be detected under in vitro conditions. It is therefore important to explain to the reader which steroidogenic proteins are expressed by the different Leydig cell populations.

      As described in O’Shaughnessy et al., 2002, levels of intratesticular T decline after birth, being very low between 10 and 20 dpp. Then, T levels increase. At 25 dpp, T levels are close to those observed at 1 dpp. T levels increase more than 16-fold between 25 and 30 dpp and then double between 30 dpp and adulthood. Therefore, intratesticular T levels around birth in mice are not as high as in adults, but are about 36-fold lower after birth than in adulthood. It has been shown that in the fetal testis, the conversion of androstenedione produced by fetal Leydig cells is achieved by the adjacent fetal Sertoli cells that express 17β-HSD3 (O’Shaughnessy et al., 2000; Shima et al., 2013). During postnatal development however, Sertoli cells lose the expression of 17β-HSD3 (O’Shaughnessy et al., 2000).
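      As a rough check of the arithmetic in the paragraph above, the successive fold changes can be chained by multiplication. The sketch below is only illustrative: the function name is hypothetical, and the 18-fold value is an assumption chosen to be consistent with "more than 16-fold", not a measurement taken from O'Shaughnessy et al., 2002.

```python
from math import prod

def cumulative_fold_change(step_fold_changes):
    """Multiply successive fold changes to get the overall fold change."""
    return prod(step_fold_changes)

# Lower bound: exactly 16-fold (25 -> 30 dpp), then 2-fold (30 dpp -> adult).
lower_bound = cumulative_fold_change([16, 2])

# Assumed illustrative value: an 18-fold rise ("more than 16-fold"), then doubling.
illustrative = cumulative_fold_change([18, 2])

print(lower_bound, illustrative)
```

      Because T levels at 25 dpp are close to those at 1 dpp, any rise of more than 16-fold followed by a doubling gives an adult level more than 32-fold above the level around birth, which is compatible with the roughly 36-fold difference stated above.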

      8) 79 -99 - can the authors revise this long list of information to provide a summary of what they are trying to communicate to the reader? What is the intention of this information?

      This paragraph has been modified to make it clearer and more synthetic. As different Leydig cell markers are presented in the Results section, it is important to introduce the reader to the different types of Leydig cells, the proteins expressed by these cells and the factors involved in their proliferation and differentiation.

      9) 101-2 - replace "involved in" with a more meaningful word - and it is this reviewer's understanding that T has not been shown convincingly to have much of a role in spermatogonial development, at least in mice - that statement is likely true in primates, but not mice; provide primary literature citations to be more precise, rather than a broad review that covers multiple species

      “involved in” was replaced by “is essential for many aspects of spermatogenesis, including”. Moreover, we removed “spermatogonial proliferation and differentiation” and provide primary literature citations to be more precise.

      10) 105-7 - similar concern for E as for T, above - KO mouse models for ERalpha and beta did not show defects in spermatogenesis as described - not sure what evidence the authors are specifically referring to here - cite primary literature rather than a review on Vitamin D + estrogen

      We agree that the question of whether estrogens play a direct role in spermatogenesis was unanswered by the ER null mice. However, estrogens have been shown to be important for the long-term maintenance of spermatogenesis in the ArKO mouse (Robertson et al., 1999) and for the progression of normal germ cell development in the ENERKI mouse (Sinkevicius et al., 2009). This sentence has been reworded and primary literature is cited to be more precise.

      11) 113-4 - there is no convincing evidence this reviewer is aware of that the AR is expressed in male germ cells, and therefore T actions on germ cells are indirect, through Sertoli cells and perhaps PTMs; if there is some, this sentence needs a citation showing that

      We agree that there is no evidence that AR is expressed in male germ cells and that T acts indirectly on germ cells. This sentence has been rewritten.

      12) 114-6 - this is untrue - nowhere in that paper was testosterone or androgen even mentioned!

      This reference has been removed. We apologize for this mistake.

      13) 116-7 - again, E actions through the ERs are thought to be indirect in the testis, not acting on germ cells; if this is incorrect, please add supportive citations and explain; replace "involved" with a more meaningful word; Rhox5 has a very minor role in spermatogenesis

      In contrast to androgen receptors, which are localized in somatic cells, estrogen receptors have been found in most testicular cells, including germ cells. The studies reporting the expression of estrogen receptors in germ cells are cited in the Introduction section. The word “involved” was replaced by “promotes”.

      Rhox5 (also known as Pem) does not have a very minor role in spermatogenesis. On the contrary, its expression is crucial for normal spermatogenesis and sperm maturation, as loss of Rhox5 in male mice leads to reduced fertility, increased germ cell apoptosis, decreased sperm count and decreased sperm motility (MacLean et al., 2005).

      14) 117 - Ref 29 does not support the statement about Rhox5's role in spermatogenesis

      The reference (MacLean et al., 2005), supporting the statement about Rhox5’s role in spermatogenesis, was added in the manuscript.

      15) 120 - Does FAAH have a protective role in that it is anti-apoptotic? Or just required for some other Sertoli cell function? Should re-word to be more specific.

      FAAH (fatty acid amide hydrolase), whose expression is stimulated by estrogens, has been shown to have a crucial role in promoting survival of Sertoli cells by degrading anandamide (N-arachidonoylethanolamine), an endocannabinoid which has a pro-apoptotic activity (Rossi et al., 2007).

      The sentence has been reworded to be more specific.

      16) 127 - should complete the Introduction with a sentence summarizing what was done and found, for reader clarity

      The Introduction has been completed for reader clarity.

      17) 136 - misspelled the procedure

      Orchidectomy was replaced by orchiectomy.

      18) Mice - why use half-day nomenclature for postpartum mice? This is not standard in the literature.

      Half-day nomenclature was used due to the uncertainty of the time of birth, which mostly takes place during the night. Since this is not standard in the literature, half-day nomenclature was removed in the entire manuscript.

      19) 172-3 - the half-life of RA is very short (<1 hr), and it is light-sensitive. This addition every 8 days means that retinoids are present for a very minimal window of time - are the authors sure retinoids have no requirement elsewhere during spermatogenesis? And in the literature, the measured pulse of RA in the mouse lasts >40 hours (stages VII-IX)...

      RA is mandatory for proper spermatogenesis and is needed many times during spermatogenesis (for review, see Schleif et al., 2022): RA is involved in spermatogonial differentiation, pre-meiotic activation and meiotic completion, establishment of the blood-testis barrier and spermiation. In our study, we did not add RA in the culture medium but retinol, the precursor of RA. Indeed, our previous studies have shown beneficial effects of retinol on in vitro spermatogenesis, including an increased production of spermatids with fewer nuclear alterations and less DNA damage (Arkoun et al., 2015; Dumont et al., 2016).

      The reason we added retinol (and not RA, which has a very short half-life) in this study and in our previous studies is that it can be oxidized into RA but also be stored in Sertoli cells in the form of retinyl esters for later use. As retinol is photosensitive, handling and storage were performed in tubes covered with aluminum foil, which protects from direct light exposure.

      20) 362 - Start the Results section with a broader statement(s) that prepares the readers rather than jumping into specific experiments; it would be helpful for readers to have concluding sentences included as well for readers to navigate the Results section.

      As suggested, the Results section has been improved by an organization with rationale and concluding sentences to facilitate reading.

      21) 364 - KI67 is a marker of.

      Ki67 is widely used as a cell proliferation marker.

      22) 367 - replace "involved".

      “involved” was replaced by “necessary for”.

      23) What intensity thresholds were used to define a cell as positive or negative for a given marker? And there seemed to be no mention of controls - especially no primary antibody controls. This is a significant oversight if these were not done in parallel with every single immunostaining experiment.

      We did not apply intensity thresholds. Cells presenting detectable labeling were defined as positive, while unlabeled cells were defined as negative.

      Negative controls, performed by omitting the primary antibodies, were of course done in parallel to each immunostaining and are presented in Figure 1A, Figure 2J and Figure 5C. The mention of negative controls has been added in the Materials and methods section.

      24) 388 - INSL3 - is this referring to mRNA or protein? Protein nomenclature is used...

      INSL3 is here referring to the protein, whose concentrations were measured by radioimmunoassay.

      25) 402 - typo.

      “expect” was replaced by “except”.

      26) 409 - do mRNA levels really "determine the testicular steroidogenic potential"??

      This sentence has been reworded: “determine the testicular steroidogenic potential” was replaced by “highlight a potential deregulation of their expression”.

      27) 410 - western should not be capitalized.

      Western Blot was replaced by western blot in the entire manuscript.

      28) 405-28 - this reviewer is underwhelmed by qRT-PCR results for a handful of markers - what is the purpose? The results do not prove anything about the function of the system.

      As the differentiation of Leydig cells is not fully completed in organotypic cultures, we wanted to know which actors of the steroidogenic pathway show deregulated expression in vitro in comparison to physiological conditions, and thus which steps of the steroid hormone biosynthesis pathway may be impaired. We found that the expression of several genes encoding steroidogenic enzymes was decreased in vitro, notably that of Cyp17a1, necessary for the conversion of progesterone to androstenedione. Transcript levels of Hsd17b2, encoding an enzyme that converts estradiol to estrone and testosterone to androstenedione, were also decreased at D30.

      Our data therefore show that the expression of several steroidogenic genes and steroid metabolizing genes is deregulated in organotypic cultures but we agree that these results do not prove anything about the function of the system.

      We then found an accumulation of estradiol and progesterone, a decrease in androstenedione and unchanged testosterone levels in cultured tissues. The elevation in progesterone and the reduction in androstenedione in in vitro matured tissues could arise from the reduced expression of Cyp17a1. In addition, reduced Hsd17b2 transcript levels may explain why estradiol levels remain elevated in cultures while testosterone levels are similar to controls and androstenedione levels are low.

      29) How do the authors interpret data gleaned from tissues containing a variably-sized necrotic core?

      In the present study, the central necrotic area was consistent between all samples and variables: it represents on average 16-27% of the explants.

      As in our previous publications and recent RNA-seq analyses (Rondanino et al., 2017; Oblette et al., 2019; Dumont et al., 2023), the central necrotic area was removed so that transcript and protein levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. In order to be able to compare the healthy part of the in vitro matured tissues with in vivo controls, transcript levels were normalized to housekeeping genes (Gapdh and Actb) or to the Leydig cell-specific gene Hsd3b1 while protein levels were normalized to ACTB or to 3β-HSD.
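      The response above does not state which quantification formula was used for the normalization. One common choice for qPCR data is the 2^(-ΔCt) method, sketched below under stated assumptions: 100% amplification efficiency, reference genes combined by averaging their Ct values, and entirely hypothetical Ct values. The function name is illustrative and not taken from the manuscript.

```python
from statistics import mean

def relative_expression(ct_target, ct_references):
    """Return 2^-(Ct_target - mean reference Ct).

    Averaging Ct values arithmetically corresponds to taking the
    geometric mean of the reference gene quantities.
    """
    delta_ct = ct_target - mean(ct_references)
    return 2.0 ** (-delta_ct)

# Hypothetical Ct values for one target gene, with Gapdh and Actb as references.
in_vivo = relative_expression(ct_target=24.0, ct_references=[18.0, 20.0])
cultured = relative_expression(ct_target=26.0, ct_references=[18.0, 20.0])

# Ratio of cultured to in vivo expression for these made-up values.
print(cultured / in_vivo)
```

      Normalizing to the Leydig cell-specific gene Hsd3b1 instead of Gapdh and Actb would follow the same pattern with a single reference Ct, which expresses the target relative to the Leydig cell content of the sample rather than to the whole tissue.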

      30) 520 - after reading to this point, this reviewer was left confused and wondering why any of this is important to the reader unless that reader specifically works on this topic. The way the data were presented makes it nearly impossible for the reader to keep any of the data in their mind as they read. It's a seemingly endless list of ups and downs of many things under many conditions. What is the point of all of this? How will it advance our understanding of spermatogenesis? Or improve in vitro culture? Or help prepubertal cancer patients? Presumably, that will be explained in the Discussion, but at this point, this reviewer honestly has no idea what this all means. Why is this important??

      We have modified the Results section by including rationale and concluding statements to make it easier to read and follow for all readers, not only for those working on this topic.

      As mentioned above, the identification of the molecular mechanisms that are deregulated in vitro will give us important insights for the optimization of the culture system. The development of an optimized model of in vitro spermatogenesis could lead to several applications, including improving our knowledge of the regulation of spermatogenesis during pubertal development.

      In this study, our main findings are that the differentiation of the adult Leydig cell lineage, steroid biosynthesis, metabolism and signaling are altered in organotypic cultures, leading to a hyperestrogenic and hypoandrogenic environment. In addition, we show that the presence of an LH homolog, known to be critical to adult Leydig cell differentiation and to stimulate steroidogenesis, does not rescue the expression of adult Leydig cell markers and of several steroidogenic genes, steroid metabolizing genes and steroid target genes. Other factors required for Leydig cell maturation and functionality will have to be tested in the future on cultured testicular tissues. Improvements to this in vitro maturation procedure in animal models may be useful for future cultures of human testicular biopsies, although we are aware that more work needs to be done before prepubertal cancer patients can benefit from this in vitro maturation approach.

      31) 619-20 - this sort of summarizes this reviewer's overall opinion of the manuscript. Not much seems to have been learned here that would justify publication in a broad readership journal like eLife. More work needs to be done to provide that sort of meaningful advance. The current work, with considerable re-writing to improve accuracy and clarity, is much better suited to a specialty journal where others who are working on this specific topic will appreciate its value.

      We have carefully considered the reviewer’s comments and modified the manuscript to improve accuracy and clarity. We understand the reviewer’s point of view, but we believe that this work may be of interest not only to labs working on fertility preservation and restoration, but also to those working on puberty initiation, germ and somatic cell maturation, steroidogenesis under physiological and pathological conditions, and on the effect of cancer therapies, drugs, chemicals and environmental agents (e.g. endocrine disruptors) on the developing testis.

      As mentioned above, we not only uncovered for the first time a failure in adult Leydig cell development, but also an alteration in the expression of several steroidogenic and steroid-metabolizing genes, which could explain the accumulation of progesterone and estradiol and the deficiency of androstenedione in cultured tissues. This hyperestrogenic and hypoandrogenic environment could explain, at least in part, the low efficiency of in vitro spermatogenesis. Furthermore, we show that the addition of hCG (LH homolog) is not sufficient to facilitate Leydig cell differentiation, restore steroidogenesis and improve sperm yield. These data provide valuable information for improving culture conditions. More fundamentally, this culture system could be a useful tool for identifying factors that are essential for the differentiation and functionality of adult Leydig cells during puberty initiation.

      32) Why are the figures repeated at the end of the manuscript?

      During the submission process, our bioRxiv preprint (which contains the figures) was merged with the same but higher quality figures.

      Reviewer #2 (Public Review):

      Preserving and restoring the fertility of prepubertal patients undergoing gonadotoxic treatments involves freezing testicular fragments and waking them up in a culture in the context of medically assisted procreation. This implies that spermatogenesis must be fully reproduced ex vivo. The parameters of this type of culture must be validated using non-human models. In this article, the authors make an extensive study of the quality of the organotypic culture of neonatal mouse testes, paying particular attention to the differentiation and endocrine function of Leydig cells. They show that fetal Leydig cells present at the start of culture fail to complete the differentiation process into adult Leydig cells, which has an impact on the nature of the steroids produced and even on the signaling of these hormones.

      The authors make an extensive study of the different populations of Leydig cells which are supposed to succeed each other during the first month of life of the mouse to end up with a population of adult and fully functional cells. The authors combine quantitative in situ studies with more global analyses (RT-qPCR, western blot, hormonal assays), which range from gene to hormone. This study is well written and illustrated, the description of the methods is honest, and the analyses are systematic and accompanied by multiple relevant control conditions.

      Since the aim of the study was to study Leydig cell differentiation in neonatal mouse testis cultures, the study is well conceived, the results answer the initial question and are not over-interpreted.

      My main concern is to understand why the authors undertook so much work when, as they mention for the RNA extractions and western blots, the necrotic central part had to be carefully removed. There is no information on how this parameter was considered for immunohistochemistry and steroid measurements. The authors describe the initial material as a quarter testis, but they don't mention the resulting size of the fragment. A brief review of the literature shows that, while the culture medium is often crucial for the quality of the culture (and in particular the supplementations, as discussed by the authors here), the size of the fragments is also a determining factor, especially for long cultures. The main limitation of the study is therefore that the authors cannot exclude that central necrosis can have harmful effects on the survival and/or the growth and/or the differentiation of the testis in culture. In this sense, the general interpretation that the authors make of their work is correct: the culture conditions are not optimized.

      When using the organotypic culture system at a gas-liquid interface, the central part of the testicular tissue becomes necrotic. As previously reported (Komeya et al., 2016), the central region receives insufficient nutrients and oxygen. In vitro spermatogenesis therefore only occurs in the seminiferous tubules present in the peripheral region. As in our previous publications and recent RNA-seq analyses (Rondanino et al., 2017; Oblette et al., 2019; Dumont et al., 2023), the central necrotic area was removed so that transcript and protein levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. For histological and immunohistochemical analyses, only seminiferous tubules located at the periphery of the cultured fragments (outside of the necrotic region) were analyzed. Steroid measurements were performed on the entire fragments.

      The initial material was indeed a quarter testis, which represents approximately 0.75 mm3. No growth of the fragments was observed during the organotypic culture period (Figure 8-figure supplement 1). We agree with the reviewer that the composition of the culture medium is not the only parameter to be considered for the quality of the culture and that the size of the fragments is also a determining factor. We previously determined that 0.75 mm3 was the most appropriate size for mouse in vitro spermatogenesis (Dumont et al., 2016). We do not exclude at all that central necrosis can have harmful effects on the survival and/or the growth and/or the differentiation of the testis in culture. Optimization of the culture medium and culture design (so that the tissue center receives sufficient nutrients and oxygen) will be necessary to increase the yield of in vitro spermatogenesis.

      Organotypic culture is currently trying to cross the doors of academic research laboratories to become a clinical tool, but it requires many adjustments and many quality controls. This study shows a perfect example of the pitfall often associated with this approach. The road is still long, but every piece of information is useful.

      Reviewer #3 (Public Review):

      Moutard, Laura, et al. investigated the gene expression and functional aspects of Leydig cells in a cryopreservation/long-term culture system. The authors found that critical genetic markers for Leydig cells were diminished when compared to the in-vivo testis. The testis also showed less androgen production and androgen responsiveness. Although they did not produce normal testosterone concentrations in basal media conditions, the cultured testis still remained highly responsive to gonadotrophin exposure, exhibiting a large increase in androgen production. Even after the hCG-dependent increase in testosterone, genetic markers of Leydig cells remained low, which means there is still a missing factor in the culture media that facilitates proper Leydig cell differentiation. Optimizing this testis culture protocol to help maintain proper Leydig cell differentiation could be useful for future human testis biopsy cultures, which will help preserve fertility in childhood cancer patients.

      Methods: In line 226, there is mention that the central necrotic area was carefully removed before RNA extraction. This is particularly problematic for the inference of these results, especially for the RT-qPCR data. Was the central necrotic area consistent between all samples and variables (16 and 30FT)? How big was the area? This makes the in-vivo testis not a proper control for all comparisons. Leydig cells are not evenly distributed throughout the testis. A lot of Leydig cells can be found toward the center of the gonad, so the results might be driven by the loss of this region of the testis.

      When using the organotypic culture system at a gas-liquid interphase, the central part of the testicular tissue becomes necrotic. As previously reported (Komeya et al., 2016), the central region receives insufficient nutrients and oxygen. In vitro spermatogenesis therefore only occurs in the seminiferous tubules present in the peripheral region. As in our previous publications and recent RNA-seq analyses (Rondanino et al., 2017; Oblette et al., 2019; Dumont et al., 2023), the central necrotic area was removed so that transcript levels in the healthy part of the samples (i.e. where in vitro spermatogenesis occurs) could be measured and compared with in vivo controls. In order to be able to compare the healthy part of the in vitro matured tissues with in vivo controls, transcript levels of the selected genes were normalized to housekeeping genes (Gapdh and Actb) or to the Leydig cell-specific gene Hsd3b1.

      The central necrotic area was consistent between all samples and variables: it represents on average 16-27% of the explants.

      Moreover, we would like to point out that the gonads were cut into four fragments before in vitro cultures. It is therefore the central part of the cultured explants that was removed and not the central part of the gonads. The central part of the gonads was thus included in our analyses.

      What did the morphology of the testis look like after culturing for 16 and 30 days? These images will help confirm that the culturing method is like the Nature paper Sato et al. 2011 and also give a sense of how big the necrotic region was and how it varied with culturing time.

      Images showing mouse testicular tissues cultured for 16 and 30 days are presented in Figure 8-figure supplement 1. The cultured tissues resemble those shown by Sato et al., 2011. As mentioned above, the central necrotic area represents on average 16-27% of the explants. No significant difference in the area of the necrotic region was found between the two culture time points.

      There are multiple comparisons being made. Bonferroni corrections on p-value should be done.

      Bonferroni corrections are used when multiple comparisons are conducted. As mentioned in the Materials and methods section, multiple comparisons were not made in this study. Indeed, the non-parametric Mann-Whitney test was used to compare two conditions: in vitro vs in vivo (D16 FT vs 22 dpp, D16 CSF vs 22 dpp, D30 FT vs 36 dpp, D30 CSF vs 36 dpp, D30 FT + hCG vs 36 dpp, D30 CSF + hCG vs 36 dpp), cultures of fresh vs frozen tissues (6 dpp vs 6 dpp CSF, D16 FT vs D16 CSF, D30 FT vs D30 CSF, D30 FT + hCG vs D30 CSF + hCG) and cultures with vs without hCG (D30 FT + hCG vs D30 FT, D30 CSF + hCG vs D30 CSF). These comparisons were added in the Materials and methods section.
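      For readers unfamiliar with the adjustment the reviewer refers to, the following is a minimal sketch of a Bonferroni correction applied to a family of pairwise p-values. The p-values are hypothetical and do not come from this study; the function name `bonferroni` is an illustrative choice, not part of the authors' analysis.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values.

    Each raw p-value is multiplied by the number of tests in the family
    and capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three pairwise Mann-Whitney tests
# performed on the same dependent variable (illustrative numbers only).
raw = [0.01, 0.04, 0.30]
adjusted = bonferroni(raw)
```

      Under this sketch, a comparison remains significant at the 0.05 level only if its raw p-value is below 0.05 divided by the number of comparisons in the family.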

      Results: In the discussion, it is mentioned that IGF1 may be a missing factor in the media that could help Leydig cell differentiation. Have the authors tried this experiment? Improving this existing culturing method will be highly valuable.

      The decreased Igf1 mRNA levels found in the present study are in line with the RNA-seq data of Yao et al., 2017. As mentioned in the Discussion section, the addition of IGF1 in the culture medium led to a modest increase in the percentages of round and elongated spermatids in cultured mouse testicular fragments (Yao et al., 2017). However, the effect of IGF1 supplementation on Leydig cell differentiation was not investigated. The supplementation of organotypic culture medium with IGF1 is currently being tested in our research team.

      Add p-values and SEM for qPCR data. This was done for hormones, should be the same way for other results.

      p-values and SEM are shown for both qPCR and hormone data.

      Regarding all RT-qPCR data: there is a switch between 3bHSD and Actb/Gapdh as housekeeping genes. There does not seem to be consistency, as some have 3bHSD and others do not. Why do Igf1 and Dhh not use 3bHSD for housekeeping? If this is the method to be used, then 3bHSD should be used as housekeeping for the protein data, instead of ACTB. Also, based on Figure 1B and Figure 2A (Hsd3b1) there does not seem to be a strong correlation between Leydig cell # and the gene expression of Hsd3b1. If Hsd3b1 is to be used as a housekeeper and a proxy for Leydig cell number, a correlation between these two measurements is necessary. If there is no correlation, a housekeeping gene that is stable among all samples should be used. Sorting Leydig cells and then conducting qPCR would be optimal for these experiments.

      Hsd3b1 was used as a housekeeping gene only to normalize the mRNA levels of Leydig cell-specific genes. Therefore, Igf1 and Dhh transcript levels were not normalized with Hsd3b1 since Igf1 is expressed by several cell types in the testis (Leydig cells, Sertoli cells, peritubular myoid cells) and Dhh is expressed by Sertoli cells.

      Regarding western blots, the expression of AR, CYP19 and FAAH could not be normalized with 3β-HSD since AR is expressed by Leydig cells, Sertoli cells and peritubular myoid cells, CYP19 is expressed by Leydig cells and germ cells and FAAH is expressed by Sertoli cells. For CYP17A1 however, 3β-HSD was used as housekeeping instead of ACTB (Figure 2G).

      No correlation was found between the number of Leydig cells per cm2 of testicular tissue shown in Figure 1 and Hsd3b1 mRNA levels presented in Figure 2. However, this result was expected since on the one hand the number of Leydig cells per cm2 was determined in the peripheral region of one tissue section whereas on the other hand Hsd3b1 transcript levels were measured in the entire peripheral region of the cultured fragments. The correction factor used for the analysis of genes expressed in Leydig cells present in the healthy part of the cultured tissues was therefore the Leydig cell selective marker Hsd3b1, as previously described (Cacciola et al., 2013).
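      To make the normalization strategy discussed above concrete, here is a small worked sketch. The response does not state the formula used; the sketch assumes the common 2^-ΔCt approach with roughly 100% amplification efficiency, and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Relative transcript level of a target gene versus a reference gene.

    Computed as 2 ** -(Ct_target - Ct_reference), which assumes that the
    amount of product doubles at every PCR cycle (about 100% efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical Ct values: a Leydig cell-specific target (e.g. Cyp17a1)
# normalized to the Leydig cell marker Hsd3b1 as the reference.
cyp17a1_vs_hsd3b1 = relative_expression(ct_target=26.0, ct_reference=24.0)
```

      With these illustrative numbers the target is detected two cycles later than the reference, so its level relative to the reference is one quarter. Choosing a cell type-specific reference expresses the result per unit of that cell type's marker rather than per unit of total tissue.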

      Figure 2A (CYP17a1): It is surprising that the CYP17a1 gene and protein expression is very different between D30FT and 36.5dpp, however, the immunostaining looks identical between all groups. Why is this? A lower magnification image of the testis might make it easier to see the differences in Cyp17a1 expression. Leydig cells commonly have autofluorescence and need a background quencher (TrueBlack) to visualize the true signal in Leydig cells. This might reveal the true differences in Cyp17a1.

      RT-qPCR and western blot analyses show that both Cyp17a1 mRNA levels and CYP17A1 protein levels are decreased in organotypic cultures at D30. However, we agree that such a decrease is not visible in immunostaining. No autofluorescence of Leydig cells could be observed in the negative controls (Figure 2J).

      Figure 3D: there are large differences in estradiol concentration in the testis. Could it be that the testis is becoming more female-like? Leydig and Sertoli cells with more granulosa and theca cell features? Were any female markers investigated?

      We show in the present study that the expression levels of the Sertoli cell-specific gene Dhh are not reduced in organotypic cultures. We also previously found that the expression levels of the Sertoli cell-specific gene Amh were not reduced in in vitro matured testicular tissues (Rondanino et al., 2017). Moreover, we have recently shown that Sox9, encoding a testis-specific transcription factor, is expressed in organotypic cultures (Dumont et al., 2023). Our recent transcriptomic analysis also revealed that the transcript levels of the pro-male sexual differentiation marker Sry and of the Sertoli cell-specific gene Dmrt1 remained unchanged in organotypic cultures compared to in vivo controls (Dumont et al., 2023). In addition, no increase in the mRNA levels of the female sex-determining genes Foxl2 and Rspo1 was found in vitro (Dumont et al., 2023). However, we cannot rule out that in vitro cultured testes are becoming more female-like as the expression of Hsd17b3, encoding an androgenic enzyme, is reduced (this study) while the expression of the feminizing gene Wnt4 is upregulated (Dumont et al., 2023).

      Figure 3D and Figure 5A: It is hard to imagine that intratesticular estradiol is maintained for 16-30 days without sufficient CYP19 activity or substrate (testosterone). 6.5 dpp was the last day with abundant CYP19 expression, so is most of the estrogen synthesized on this first day and it sticks around? Are there differences in estradiol metabolizing enzymes? Is there an alternative mechanism for E production?

      In the present study, abundant CYP19 expression was indeed found at 6 dpp. However, the expression of this enzyme was not measured between 6 dpp and D16. Therefore, we cannot be sure that 6 dpp is the last day with abundant CYP19 expression. We assume that the estradiol synthesized before D16 may then accumulate within the cultured tissues. In our study, we quantified the transcript levels of Sult1e1, encoding an estradiol metabolizing enzyme. SULT1E1 is thought to play a physiological role in protecting Leydig cells from estrogen-induced biochemical lesions (Tong et al., 2004). A reduction in Sult1e1 mRNA levels was found at D30 in comparison to in vivo controls, but this may occur earlier during organotypic culture. In addition, decreased transcript levels of Hsd17b2, which encodes an estrogen metabolizing enzyme that converts estradiol to estrone, were found at D30 in this study. We suggest in the Discussion section that elevated estradiol levels in cultured tissues could be a consequence of low Sult1e1 and Hsd17b2 expression. Our recent transcriptomic analyses show that the levels of Cyp1a1, Cyp1b1 and Comt, encoding other estrogen metabolizing enzymes, are unchanged in organotypic cultures (Dumont et al., 2023). To our knowledge, there is no alternative mechanism for estradiol production.

      Recommendations For The Authors:

      1) The acronyms, PLC, SLC, ILC, ALC, and FLC, become hard to follow. It is recommended to spell out the names.

      PLC was replaced by progenitor Leydig cells, SLC by stem Leydig cells, ILC by immature Leydig cells, ALC by adult Leydig cells and FLC by fetal Leydig cells in the entire manuscript.

      2) All Figures: Use letters for each bar graph. Difficult to make a connection from text to figure.

      A letter was added to each bar graph.

      3) Supplemental figure 1: Change "Changement du milieu" to English.

      These words were replaced by “Medium change”.

      4) Catalog numbers for antibodies are necessary.

      The catalog numbers of the antibodies used in this study are presented in Supplementary Table 1.

    2. eLife assessment

      This study reports useful information on the limits of the organotypic culture of neonatal mouse testes, which has been regarded as an experimental strategy that can be extended to humans in the clinical setting for the conservation and subsequent re-use of testicular tissue. The evidence that the culture of testicular fragments of 6.5-day-old mouse testes does not allow optimal differentiation of steroidogenic cells is compelling and would be useful to the scientific community in the field for further optimizations.

    3. Reviewer #1 (Public Review):

      In this manuscript, the authors aimed to compare, from testis tissues at different ages from mice in vivo and after culture, multiple aspects of Leydig cells. These aspects included mRNA levels, proliferation, apoptosis, steroid levels, protein levels, etc. A lot of work was put into this manuscript in terms of experiments, systems, and approaches. The technical aspects of this work may be of interest to labs working on the specific topics of in vitro spermatogenesis for fertility preservation.

      Second review:

      The authors should be commended for substantial improvement in their manuscript for resubmission.

    4. Reviewer #3 (Public Review):

      Moutard, Laura, et al. investigated the gene expression and functional aspects of Leydig cells in a cryopreservation/long-term culture system. The authors found that critical genetic markers for Leydig cells were diminished when compared to the in-vivo testis. The testis also showed less androgen production and androgen responsiveness. Although they did not produce normal testosterone concentrations in basal media conditions, the cultured testis still remained highly responsive to gonadotrophin exposure, exhibiting a large increase in androgen production. Even after the hCG-dependent increase in testosterone, genetic markers of Leydig cells remained low, which means there is still a missing factor in the culture media that facilitates proper Leydig cell differentiation. Optimizing this testis culture protocol to help maintain proper Leydig cell differentiation could be useful for future human testis biopsy cultures, which will help preserve fertility in childhood cancer patients.

      Overall, the authors addressed most comments and questions from the previous review. The additional data regarding the necrotic area is helpful for interpreting the quality of the cultures.

      The authors did not conduct multiple comparison tests, although multiple comparisons are conducted on a single dependent variable (Fig 2J, Fig 3F, among many others). However, adding this correction is unlikely to change the conclusions of the paper or the figures and is thus a minor technical detail in this case.

    1. Author Response

      Reviewer #1 (Public Review)

      The authors present a scRNAseq study describing the transcriptomes of the tendon enthesis during postnatal development. This is an important topic that has major implication for the care of common clinical problems such as rotator cuff repair. The results are a valuable addition to the literature, providing a descriptive data set reinforcing other, more comprehensive studies. There are weaknesses, however, in the scRNAseq analyses.

      1) The authors should provide additional rationale for the PCA analysis shown in Fig 1d. It is uncommon to use PCA for histomorphologic parameters. These results do not convincingly demonstrate that P7 is a critical developmental timepoint.

      2) According to the methods, it appears that the entire humeral head-supraspinatus tendon was used for cell isolation for scRNAseq. This results in the inclusion of cells from a variety of tissues, including bone, growth plate, enthesis and tendon. As such, only a very small percentage of cells in the analysis came from the enthesis. Inclusion of such a wide range of cells makes interpretation of enthesis cells difficult.

      3) The differentiation/pseudotime analysis described in Fig 3 is difficult to follow. This map includes cell transcriptomes from vastly different tissues. Presumably, embedded in these maps are trajectories for osteoblast differentiation, chondrocyte differentiation, tenocyte differentiation, etc. With so many layers of overlapping information, it is difficult to (algorithmically) deduce a differentiation path of a particular cell type.

      4) The authors use the term function throughout the paper (e.g., functional definition of fibrocartilage subpopulations). However, this is a descriptive scRNAseq study, and function can therefore only theoretically be inferred from the algorithms used to analyze the data. A functional role for any of the identified pathways or processes can only be defined with gain- and/or loss-of-function studies.

      5) C2 highly expressed biomineralization-related genes (Clec3a, Tnn, Acan). The three example genes are not related to biomineralization.

      6) The functional characterization of the three enthesis cell clusters is not convincing. For example, activation of metabolism-related processes can mean a lot of things (including changes in differentiation), yet the authors interpret it very specifically as role in postnatal fibrochondrocyte formation and growth.

      7) The pseudotime analysis of the enthesis cell clusters is not convincing. The three clusters are quite close and overlapping on the UMAP. Furthermore, the authors focus on Tnn as a novel and unique gene, yet the expression pattern shown in Fig 5g implies even expression of this gene across all three clusters.

      8) The TC1 markers (Ly6a, Dlk3, Clec3b) imply a non-tendon-specific cell population. Perhaps a tendon progenitor pool or an endothelial cell phenotype is more appropriate.

      9) Pseudotime analyses assume that your data set includes cells from progenitor through mature cell populations. It is unclear that the timepoints studied here included cells from early progenitor states.

      10) The CellChat analysis is difficult to follow, as the authors included 18 cell types. The number of possible interactions among so many cell types is enormous, and deducing valid connections between any two cell types in this case should be justified. Is the algorithm robust to so many possible interactions?

      Thank you very much for your comments and suggestions. Following your suggestions, we carefully revised the paper. We integrated our dataset with the open-source GSE182997 datasets and re-performed the downstream analysis. In addition, we added immunofluorescence tests to validate the results derived from the single-cell datasets. We hope that all the issues raised about the prior version have been addressed.

      Reviewer #2 (Public Review)

      To reveal cellular and molecular heterogeneity in the enthesis, the authors established a single-cell temporal atlas during development. This study provides a transcriptional resource for further investigation of fibrocartilage development.

      Thank you very much for your kind suggestions. Following your suggestions, we integrated our dataset with the open-source GSE182997 datasets and re-performed the downstream analysis. In addition, we added immunofluorescence tests to validate the results derived from the single-cell datasets. We hope that the issues raised about the prior version have been addressed.

    1. Author Response

      eLife assessment:

      This paper conducts human and rodent experiments of non-invasive diffusion MRI estimates of axon diameter with the aim to establish whether these estimates provide biologically specific markers of axonal degeneration in MS. It will be of interest to researchers developing quantitative MRI methods and scientists studying neurodegeneration. The experiments provide evidence for the sensitivity of these markers, but do not directly validate axon diameter and do not reflect common pathological mechanisms across rodents and humans.

      We thank the Editor for the appreciation of our work. Thanks to the addition of an extensive electron microscopy paradigm, we now include a direct validation of axonal damage and expand on the common pathological mechanisms across the two species. The new results are detailed in the manuscript and summarized in Fig. 3.

      Reviewer #1 (Public Review):

      1.1 My primary concern relates to how meaningful the human-rodent comparisons are, and whether these comparisons really advance our understanding of AxCaliber estimates in MS. I applaud the aim to conduct "matched" experiments in both rodent models and human disease. It is a strength that the experiments are aligned with respect to the MRI measurements (although there are some caveats to this mentioned below). But beyond that, the overlap is not what one might hope for: the pathology would seem to be very distinct in humans and rodents, and the histological validation is not specific to what the MRI measurements claim to estimate. To summarize the main findings: (i) in a rat model of general axonal degeneration, axon calibre estimates correlate with neurofilaments; (ii) in MS in humans, axon calibre estimates correlate with demyelinating lesions. This gives a picture of AxCaliber estimates correlating with neuropathology, but is this something that has not already been established in the literature? If the aim is to validate AxCaliber, then there is a logic in using a rodent model that isolates alterations to axonal radius, but what then does this add to the existing literature in that space? If the aim is to study MS (for which AxCaliber results have been previously reported in Huang et al), then why not use a rodent model of MS?

      We thank the reviewer for their very insightful comments. Indeed, multiple sclerosis (MS) is a chronic neuroinflammatory and neurodegenerative disease of unknown etiology. An enormous effort has been made to obtain animal models that simulate the pathogenesis of this disease. However, while several models exist recapitulating distinct aspects of the disease (mostly related to demyelination), MS fundamentally remains a disease that only affects humans. This does not mean that EAE or lysolecithin models do not provide information on specific aspects and are therefore valuable. In fact, we believe that trying to replicate the pathological mechanisms of this disease in an animal model goes beyond the scope of the present work. In this work, our intention is to validate a biomarker of axonal damage preclinically, and for this, we use a model of axonal degeneration. We do not claim that this model should be valid to capture the complex clinical and pathological manifestation of MS, but we do think that it is a necessary step to ensure MRI sensitivity to axonal pathology. Why necessary? Because all the available (very limited) MRI literature which provides some form of validation: i) only focuses on healthy tissue, and ii) has an n of 1. Our preclinical paradigm gives conclusive evidence that the MRI axonal diameter proxy detects axonal damage as an increase in the mean diameter. This is now detailed in the discussion.

      After this necessary preclinical validation, we then apply the same framework to a human disease like MS that, among other manifestations, is believed to also cause axonal pathology. The improvements with respect to the one published work about axonal diameter in MS are: i) the whole brain analysis, which allowed us to characterize the extent of these early alterations outside the demyelinated lesions; and ii) the larger sample size, which allowed us to uncover an association with disease duration, strengthening our hypothesis about increased axonal diameter being a marker of early disease (new Fig. 5).

      Regarding the nonspecificity of histological validation, we thank the reviewer for this insightful comment, which triggered an additional analysis that we believe has added further value to the paper. Using electron microscopy, we found that in our model of neurodegeneration, axonal damage is indeed reflected as an increase in axon diameter (new Fig. 3). These recent findings strongly support the validation of our noninvasive diffusion MRI estimates of axon diameter alterations as an early-stage hallmark of normal-appearing tissue in MS.

      Coming back to the comparison between pathology in humans and in rodents, the EM data also support our choice of preclinical model, showing axonal swelling, the same phenomenon reported and characterized in recent postmortem histological data in the normal-appearing white matter of MS patients (Luchicchi et al., Ann Neurol 2021) and in lesions (Fisher et al. Ann Neurol 2007).

      All in all, we are confident that the new data support the validity of this translational approach and shed new light on the degenerative aspect of MS.

      Changes in the manuscript

      • Discussion, pag.12: It is important to stress that the aim of this work is not to propose a new animal model of MS, a disease that only affects humans, but rather to validate axonal damage detection (independently from the pathology that has induced it) through noninvasive MRI and apply the framework to characterize axonal pathology in MS.

      1.2 I appreciate that both rodent and patient studies are time intensive, major endeavors. Nevertheless, the number of subjects is very low in both rodent (n=9) and human (MS=10, control=6) studies. At the very least, this should be more openly acknowledged. But I'm concerned that this is a major weakness of the paper. Related to this, I find it hard to tell how carefully multiple comparison correction was performed throughout. It seems reasonably clear for the TBSS analyses, but then other analyses were performed in ROIs. Are these multiple comparisons corrected as well? Similarly, in Methods, I am confused by the statement that: "post hoc t tests corrected for multiple comparisons whenever a significant effect was detected". What does this mean?

      We thank the reviewer for this comment. We agree that a small sample size was a weakness of the previous version of the paper, and therefore, in the new version, we have substantially increased the n for both animal and human experiments (from n=9 to 19 in animals, from 16 to 21 in humans). We removed the ROI analysis in the new version, and thus the confusing statement, and clarified the strategy for multiple comparisons.

      Changes in the manuscript

      • Data analysis, pag. 18: Lesion masks were excluded from the statistical analysis, and multiple comparisons across clusters were controlled for by using threshold-free cluster enhancement.

      1.3 While I do not think the text is in any sense deliberately misleading, I think the authors would do well to either tone down their claims or consider more carefully the implications of the text in many places. Some that stuck out for me are:

      Throughout, language in the paper (e.g., "Paired t tests were used to assess differences in the axonal diameter") presumes that the AxCaliber estimates specifically reflect axon diameter. I think the jury is out over whether this is true, particularly for measurements conducted with limited hardware specs. At the very least, I would encourage the author to refer to these measurements throughout as "estimates" of axon diameter.

      Thank you for this clarification. We have indeed changed the notation, and now consistently refer to the estimates of axon diameter through MRI as the “MRI axonal diameter proxy”.

      1.4 The authors suggest that their results provide "new tools for patient stratification" based on differences in lesion type, but it isn't clear what new information these markers would confer given that the lesions are differentiated based on T1w hypo/hyperintensities. In other words, these lesions are by definition already differentiable from a much simpler MRI marker.

      Thank you for this insightful comment. The reviewer is right, and following the general reviewers’ assessment we have decided to not include the lesion analysis in the new version of the manuscript.

      1.5 The authors note in the Discussion that: "sensitive to early stages of axonal degeneration, even before alterations in the myelin sheet are detected". Whether intentional or not, the implication in the context of this study is that this would hold for MS (that these markers would detect axonal degeneration preceding demyelination). While there is some discussion of alterations to axonal diameter in MS, the authors do not discuss whether these are the same mechanisms thought to occur in the IBO intervention used here.

      Thank you for this comment. Indeed, the scope of the paper is not to assess whether or not axonal swelling precedes myelin alterations, so we agree with the reviewer that this sentence might be misleading and have removed it from the text. While we do not claim that ibotenic acid injections are able to replicate the complex clinical and pathological manifestation of MS (and we have now made this clear in the revised manuscript, see comment 1), the electron microscopy paradigm indicates the presence of axonal swelling in the damaged fimbria, which is indeed the same pathological manifestation found in MS post-mortem data (see e.g. Fisher et al. Ann Neurol 2007).

      1.6 In the Discussion, the authors note the lack of evidence for a relationship with disability or disease duration, but nevertheless, go on to interpret the "trends" they do observe. I would advise strongly against this: the authors acknowledge that their numbers are low, so I would avoid the temptation to speculate here.

      The reviewer is 100% correct. We should have refrained from speculating. In the new version of the paper, however, thanks to the larger human cohort, we were able to find significant associations with disease duration in voxelwise analysis of the white matter skeleton in standard space and in the whole white matter in single subject space (new Figure 5).

      1.7 In the Discussion, the authors state that "the use of neurofilaments has also been well validated in MS". Well validated for what? MS is a complex disease with a broad range of pathology, so this statement could be read to mean "neurofilaments are known to be altered in MS". However, in the context of this paragraph, the implication would seem to be that neurofilaments are a well-established proxy for axonal diameter. Is that the implication, and if so what general evidence is there for this?

      We thank the reviewer for this insightful comment. Indeed, altered neurofilaments are not conclusive evidence of increased axonal diameter. In this context, the addition of electron microscopy data in the new manuscript version supports the claim.

      Reviewer #2 (Public Review):

      Diffusion MRI is sensitive to the brain microstructure, and it has been used to assess the integrity of white matter for nearly 3 decades. Its main limitation is its limited specificity, which makes it difficult to link changes in diffusion parameters to a given pathological substrate. Recently, methods based on diffusion MRI that enable the noninvasive estimation of axonal diameter have become available. This paper aims at validating one such method using an experimental model of neurodegeneration. The authors found a significant correlation between axonal diameter estimated by MRI and a histological marker of neurodegeneration. Although this is of great interest, as it demonstrates that this method is sensitive to neurodegeneration, a direct validation would require a measurement of axonal diameter using electron or confocal microscopy, rather than a correlation with a measure of axonal degeneration not directly related to axonal diameter. So, although these data are compelling, they do not prove that the increase in axonal diameter suggested by diffusion MRI corresponds to actual axonal swelling. The authors also apply the same method to compare the white matter of patients with multiple sclerosis (MS) and healthy controls, showing widespread increases in axonal diameter in the patients. These data are compelling, but again, not conclusive. Other factors such as gliosis could bias the MRI measurement and lead to an apparent increase in axonal diameter.

      We would like to thank the reviewer for the positive assessment of our work and for the valuable suggestion. We are confident that the new version of the manuscript, by including an extensive validation based on electron microscopy, has addressed the reviewer's criticisms.

      Reviewer #3 (Public Review):

      3.1 In this paper, Toschi et al. performed dMRI to in vivo estimate axon diameter in the brain and demonstrated that multi-compartmental modeling (AxCaliber) is sensitive to microstructural axonal damage in rats and axon caliber increase in demyelinating lesions in MS patients, suggesting that axon diameter mapping provides a potential biomarker to bridge the gap between medical imaging contrasts and biological microstructure. In particular, authors injected ibotenic acid (IBO) and saline in the left and right rat hippocampus, respectively, and compared in vivo estimated axon diameter and ex vivo neurofilament staining in left and right fimbria. The axon size estimation was larger in the fimbria of IBO injection side, where the neurofilament intensity is higher. Correlation of axon size estimation and neurofilament intensity was observed in both injection sides. Further, higher axon diameter estimation was observed in normal appearing white matter (NAWM) of MS patients, compared with the healthy subjects. The axon size estimation increased in hypointense lesions of T1 weighted contrast, but not in isointense lesions. Through the comparison of dMRI-estimated axon size and histology-based fluorescence intensity, authors indirectly validated the sensitivity of axon diameter mapping to the tissue microstructure in the rat brain, and further explored the axon size change in the brain of MS patients. However, the dMRI protocol and biophysical modeling in this study were not fully optimized to maximize the sensitivity to axon size estimation, and the dMRI-estimated axon size (4.4-5.4 micron) was much larger than values reported in previous histological studies (0.5-3 micron) [Barazany et al., Brain 2009]. Finally, although the modified AxCaliber model incorporated two fiber bundles in different directions, the fiber dispersion in each bundle was not considered (c.f. fiber dispersion ~20-30 degree in corpus callosum), potentially leading to overestimated axon diameter.

      We thank the reviewer for their appreciation of our work, which we believe is substantially improved in this revised version through the inclusion of an electron microscopy paradigm. Below, the point-by-point response to the specific points raised.

      3.2 The conclusions in this study are supported by experimental results. However, the dMRI protocol and biophysical model could be further optimized and validated: 1. To in vivo estimate the axon diameter ~1 micron using dMRI, strong diffusion weighting (b-value) should be applied to maximize the signal decay due to intra-axonal restricted diffusion and minimize the signal contribution of extra-cellular hindered diffusion. However, authors only applied maximal b-value = 4000 s/mm2, much smaller than values ~15,000-20,000 s/mm2 in previous studies [Assaf et al., MRM 2008; Huang et al., BSAF 2020, 225:1277]. The use of low diffusion weighting in this study leads to a lower bound ~4-6 micron for accurate diameter estimation, the so-called resolution limit in [Nilsson et al., NMR Biomed 2017, 30:e3711]. In other words, the estimated axon diameter is potentially overestimated and related with the imaging protocol and image quality, confounding the biological interpretation.

      We thank the reviewer for this insightful comment. Indeed, while the resolution limit is a concern, the chosen b-value has been a compromise between sensitivity to small structures and SNR, as indicated by recent animal (Crater et al., 2022) and human (Jensen et al., 2016; McKinnon et al., 2017; Moss et al., 2019) work, pointing at 3000-4000 s/mm2 as the b-value for which the intra-axonal water signal is dominant. In addition, a paper from the laboratory that first developed the AxCaliber method recently came out (Gast et al., 2023, DOI: 10.1007/s12021-023-09630-w) demonstrating that an MRI protocol with a maximum b-value between 3000 and 4000 s/mm2 (and even lower) is sufficient to capture, in vivo and in humans, various well-known aspects of axonal morphometry (e.g., the corpus callosum axon diameter variation) as well as other aspects that are less explored (e.g., axon diameter-based separation of the superior longitudinal fasciculus into segments). The same paper contains resources and further bibliography supporting the view that the contribution of intra-axonal water to restricted diffusion signals dominates other factors (see Online Resource 1, section A of the same paper). To challenge this recent evidence from a neurobiology perspective, we include in the supplementary material a subset of experiments in animals with lower maximum b-value (2500 s/mm2, Fig. S1), where we are able to detect the same effect of increased MRI axonal diameter proxy in the injected hemisphere compared to control.

      We would like to add that, while extremely valuable and informative, simulation studies such as the excellent study by Veraart et al., 2020, are inevitably valid only under certain assumptions. Among them, some critical ones are i) the need to neglect non-axonal cells such as glia, ii) assuming that the bulk diffusivity of water in cerebral tissue would be the same as that of free water, and iii) impermeable barriers. All these assumptions are expected to play a role in the estimated resolution limit, a role difficult to quantify but likely substantial.

      For this reason, we believe that our approach, which is 100% focused on neurobiology and measurements performed in real tissue, can offer a different perspective and fuel the ongoing debate on axonal diameter measurement feasibility. We acknowledge the value of the reviewer comment and discuss the issue of b-value in the discussion (see also comment 1.8).

      Changes in the manuscript

      • Discussion, p. 12:
      Despite some inevitable minor differences due to different brain sizes and magnet features, the human protocol was built to match the main characteristics of the preclinical diffusion sequence, such as the b-value and diffusion time range. The chosen b-value has been a compromise between sensitivity to small structures and the signal-to-noise ratio (SNR), as indicated by recent animal (Crater et al., 2022) and human (Gast et al., 2023; Jensen et al., 2016; McKinnon et al., 2017; Moss et al., 2019) work, pointing at 4000 s/mm2 as the b-value for which the intra-axonal water signal is dominant. However, following recent work supporting sensitivity of diffusion-weighted MRI to axonal diameter even at lower b-values (Gast et al., 2023), we tested a protocol with a lower b-value in a subset of animals, with the aim of facilitating future clinical AxCaliber studies. We found no qualitative differences in the outcome (MRI axonal diameter proxy was increased following fimbria damage). Further work and perhaps more realistic simulations, considering real cell composition and morphology, are needed to clarify this issue.

      3.3 In this study, the positive correlation of dMRI-estimated axon size and neurofilament fluorescence intensity is indeed an encouraging result, and yet this validation is indirect since it relies on the positive correlation between neurofilament intensity and axon diameter in histology.

      The reviewer correctly points out a severe limitation of the previous manuscript version, which is now addressed by including an extensive electron microscopy evaluation, recapitulated in new Fig. 3.

      3.4 Authors did not consider the fiber dispersion in the proposed dMRI model. This can lead to overestimated axon diameter, even in the highly aligned WM, such as corpus callosum with ~20-30 degree dispersion in histology [Ronen et al., BSAF 2014, 219:1773; Leergaard et al., PLoS One 2010, 5(1), e8595] and MRI [Dhital et al., NeuroImage 2019, 189, 543; Novikov et al., NeuroImage 2018, 174:518].

      The reviewer correctly points out an important characteristic of white matter microstructure, namely fiber dispersion. However, we would like to point out that the use of a second fiber population is expected to mitigate this effect by absorbing some axonal directional dispersion in areas of a single fiber. To support this, we quantified dispersion as the angle between the two main fiber orientations captured by the AxCaliber fit, as shown in Author response image 1 for two representative subjects (one control, upper line, and one MS, lower line; the “dispersion” maps are masked by a white matter probability mask, and superimposed on a T2-weighted image). Indeed, the angle between the two main fibers in the corpus callosum is around 20 degrees or lower, compatible with the bibliography cited by the reviewer, and higher in other white matter areas known to be characterized by fiber crossing and dispersion.

      Author response image 1.

      Angle in radians between the two main fiber orientations captured by the AxCaliber fit, shown for two representative subjects (one control, upper line, and one MS, lower line). The dispersion maps are masked by a white matter probability mask (P>=0.95), and superimposed on a T2-weighted image.
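
      As an illustration of the dispersion proxy described in this response, the sketch below computes the angle between two fiber orientations and converts it from radians (the unit of the map) to degrees (the unit used in the text). It is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `inter_fiber_angle` and the example vectors are hypothetical, and orientations are treated as axes, so that antiparallel vectors count as the same fiber direction.

```python
import math

def inter_fiber_angle(v1, v2):
    """Return the angle in radians between two fiber orientations.

    Hypothetical helper: orientations are treated as axes (sign-invariant),
    so the result always lies in [0, pi/2].
    """
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # abs() removes the sign ambiguity; min() guards against rounding above 1.
    cos_angle = min(1.0, abs(dot) / (norm1 * norm2))
    return math.acos(cos_angle)

# Example (made-up vectors): two orientations 20 degrees apart in the x-y plane.
primary = (1.0, 0.0, 0.0)
secondary = (math.cos(math.radians(20)), math.sin(math.radians(20)), 0.0)

angle_rad = inter_fiber_angle(primary, secondary)
angle_deg = math.degrees(angle_rad)
```

      Applied voxelwise to the two fitted orientations, a function of this kind would yield a map in radians; multiplying by 180/π gives values directly comparable with the ~20-30 degree dispersion quoted from histology.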

    2. eLife assessment

      This paper conducts human and rodent experiments of non-invasive diffusion MRI estimates of axon diameter with the aim to establish whether these estimates provide biologically specific markers of axonal degeneration in MS. It will be of interest to researchers developing quantitative MRI methods and scientists studying neurodegeneration. The experiments provide evidence for the sensitivity of these markers, but do not directly validate axon diameter and do not reflect common pathological mechanisms across rodents and humans.

    3. Reviewer #3 (Public Review):

      In this paper, Toschi et al. performed dMRI to in vivo estimate axon diameter in the brain and demonstrated that multi-compartmental modeling (AxCaliber) is sensitive to microstructural axonal damage in rats and axon caliber increase in demyelinating lesions in MS patients, suggesting that axon diameter mapping provides a potential biomarker to bridge the gap between medical imaging contrasts and biological microstructure. In particular, authors injected ibotenic acid (IBO) and saline in the left and right rat hippocampus, respectively, and compared in vivo estimated axon diameter and ex vivo neurofilament staining in left and right fimbria. The axon size estimation was larger in the fimbria of IBO injection side, where the neurofilament intensity is higher. Correlation of axon size estimation and neurofilament intensity was observed in both injection sides. Further, higher axon diameter estimation was observed in normal appearing white matter (NAWM) of MS patients, compared with the healthy subjects. The axon size estimation increased in hypointense lesions of T1 weighted contrast, but not in isointense lesions. Through the comparison of dMRI-estimated axon size and histology-based fluorescence intensity, authors indirectly validated the sensitivity of axon diameter mapping to the tissue microstructure in the rat brain, and further explored the axon size change in the brain of MS patients. However, the dMRI protocol and biophysical modeling in this study were not fully optimized to maximize the sensitivity to axon size estimation, and the dMRI-estimated axon size (4.4-5.4 micron) was much larger than values reported in previous histological studies (0.5-3 micron) [Barazany et al., Brain 2009]. Finally, although the modified AxCaliber model incorporated two fiber bundles in different directions, the fiber dispersion in each bundle was not considered (c.f. fiber dispersion ~20-30 degree in corpus callosum), potentially leading to overestimated axon diameter.

      The conclusions in this study are supported by experimental results. However, the dMRI protocol and biophysical model could be further optimized and validated:
      1. To in vivo estimate the axon diameter ~1 micron using dMRI, strong diffusion weighting (b-value) should be applied to maximize the signal decay due to intra-axonal restricted diffusion and minimize the signal contribution of extra-cellular hindered diffusion. However, authors only applied maximal b-value = 4000 s/mm2, much smaller than values ~15,000-20,000 s/mm2 in previous studies [Assaf et al., MRM 2008; Huang et al., BSAF 2020, 225:1277]. The use of low diffusion weighting in this study leads to a lower bound ~4-6 micron for accurate diameter estimation, the so-called resolution limit in [Nilsson et al., NMR Biomed 2017, 30:e3711]. In other words, the estimated axon diameter is potentially overestimated and related with the imaging protocol and image quality, confounding the biological interpretation.
      2. In this study, the positive correlation of dMRI-estimated axon size and neurofilament fluorescence intensity is indeed an encouraging result, and yet this validation is indirect since it relies on the positive correlation between neurofilament intensity and axon diameter in histology.
      3. Authors did not consider the fiber dispersion in the proposed dMRI model. This can lead to overestimated axon diameter, even in the highly aligned WM, such as corpus callosum with ~20-30 degree dispersion in histology [Ronen et al., BSAF 2014, 219:1773; Leergaard et al., PLoS One 2010, 5(1), e8595] and MRI [Dhital et al., NeuroImage 2019, 189, 543; Novikov et al., NeuroImage 2018, 174:518].

    1. Author Response

      Reviewer #1 (Public Review):

      This study examines the factors underlying the assembly of MreB, an actin family member involved in mediating longitudinal cell wall synthesis in rod-shaped bacteria. Required for maintaining rod shape and essential for growth in model bacteria, single molecule work indicates that MreB forms treadmilling polymers that guide the synthesis of new peptidoglycan along the longitudinal cell wall. MreB has proven difficult to work with and the field is littered with artifacts. In vitro analysis of MreB assembly dynamics has not fared much better as helpfully detailed in the introduction to this study. In contrast to its distant relative actin, MreB is difficult to purify and requires very specific conditions to polymerize that differ between groups of bacteria. Currently, in vitro analysis of MreB and related proteins has been mostly limited to MreBs from Gram-negative bacteria which have different properties and behaviors from related proteins in Gram-positive organisms.

      Here, Mao and colleagues use a range of techniques to purify MreB from the Gram-positive organism Geobacillus stearothermophilus, identify factors required for its assembly, and analyze the structure of MreB polymers. Notably, they identify two short hydrophobic sequences-located near one another on the 3-D structure-which are required to mediate membrane anchoring.

      With regard to assembly dynamics, the authors find that Geobacillus MreB assembly requires both interactions with membrane lipids and nucleotide binding. Nucleotide hydrolysis is required for interaction with the membrane and interaction with lipids triggers polymerization. These experiments appear to be conducted in a rigorous manner, although the salt concentration of the buffer (500mM KCl) is quite high relative to that used for in vitro analysis of MreBs from other organisms. The authors should elaborate on their decision to use such a high salt buffer, and ideally, provide insight into how it might impact their findings relative to previous work.

      Response 1.1. MreB proteins are notoriously difficult to maintain in a soluble form. Some labs deleted the N-terminal amphipathic or hydrophobic sequences to increase solubility, while other labs used full-length protein but high KCl concentration (300 mM KCl) (Harne et al, 2020; Pande et al., 2022; Popp et al, 2010; Szatmari et al, 2020). Early in the project, we tested many conditions and noticed that high KCl helped maintain slightly better solubility of full-length MreBGs, without the need to delete part of the protein. In addition, salt concentrations > 100 mM would better mimic the conditions met by the protein in vivo. While 50-100 mM KCl is traditionally used in actin polymerization assays, physiological salt concentrations are around 100-150 mM KCl in invertebrates and vertebrates (Schmidt-Nielsen, 1975), around 50-250 mM in fungal and plant cells (Rodriguez-Navarro, 2000) and 200-300 mM in the budding yeast (Arino et al, 2010). However, cytoplasmic K+ concentration varies greatly (up to 800 mM) depending on the osmolality of the medium in both E. coli (Cayley et al, 1991; Epstein & Schultz, 1965; Rhoads et al, 1976), and B. subtilis, in which the basal intracellular concentration of KCl was estimated to be ~ 350 mM (Eisenstadt, 1972; Whatmore et al, 1990). 500 mM KCl can therefore be considered as physiological as 100 mM KCl for bacterial cells. Since we observed plenty of pairs of protofilaments at 500 mM KCl and this condition helped to avoid aggregation, we kept this high concentration as a standard for most of our experiments. Nonetheless, we had also performed TEM polymerization assays at 100 mM, in line with most of the MreB and F-actin in vitro literature, and found no difference in the polymerization (or absence of polymerization) conditions. This was indicated in the initial submission (e.g. M&M section L540 and footnote of Table S2), but since two reviewers bring it up as a main point, it is evident that we failed to communicate it clearly, for which we apologize. This has been clarified in the revised version of the manuscript. We have also almost systematically added the 100 mM KCl concentration, as per reviewer #2's request and to reconcile our salt conditions with those used for some in vitro analyses of MreBs from other organisms (see also response to reviewer #2 comments 1A and 1B = Responses 2.1A, 2.1B below). We then decided to refer to the 100 mM KCl concentration as our “standard condition” in the revised version of the manuscript, but we compile and compare the results obtained at 500 mM too, as both concentrations are within the physiological range in Bacillus.

      Additionally, this study, like many others on MreB, makes much of MreB's relationship to actin. This leads to confusion and the use of unhelpful comparisons. For example, MreB filaments are not actin-like (line 58) any more than any polymer is "actin-like." As evidenced by the very beautiful images in this manuscript, MreB forms straight protofilaments that assemble into parallel arrays, not the paired-twisted polymers that are characteristic of F-actin. Generally, I would argue that work on MreB has been hindered by rather than benefitted from its relationship to actin (E.g early FP fusion data interpreted as evidence for an MreB endoskeleton supporting cell shape or depletion experiments implicating MreB in chromosome segregation) and thus such comparisons should be avoided unless absolutely necessary.

      Response 1.2. We completely agree with reviewer #1 regarding unhelpful comparisons of actin and MreB, and that work on MreB has traditionally been hindered by its relationship to eukaryotic actin. MreB is nonetheless a structural homolog of actin, with a close structural fold and common properties (polymerization into pairs of protofilaments, ATPase activity…). It still makes sense to refer to a widely studied protein with common features and common ancestry, as long as we do not confine our thinking to a single conceptual framework. This said, actin and MreB diverged very early in evolution, which may account for differences in their biochemical properties and cellular functions. Current data on MreB filaments confirm that they display F-actin-like and F-actin-unlike properties. We thank the reviewer for this insightful comment. We have revised the text to remove any inaccurate or unhelpful comparison to actin (in particular the ‘actin-like filaments’ statement, previously used once).

      Reviewer #2 (Public Review):

      The paper "Polymerization cycle of actin homolog MreB from a Gram-positive bacterium" by Mao et al. provides the second biochemical study of a gram-positive MreB, but importantly, the first study examines how gram-positive MreB filaments bind to membranes. They also show the first crystal structure of a MreB from a Gram-positive bacterium - in two nucleotide-bound forms, finally solving structures that have been missing for too long. They also elucidate what residues in Geobacillus MreB are required for membrane associations. Also, the QCM-D approach to monitoring MreB membrane associations is a direct and elegant assay.

      While the above findings are novel and important, this paper also makes a series of conclusions that run counter to multiple in vitro studies of MreBs from different organisms and other polymers with the actin fold. Overall, they propose that Geobacillus MreB contains biochemical properties that are quite different than not only the other MreBs examined so far but also eukaryotic actin and every actin homolog that has been characterized in vitro. As the conclusions proposed here would place the biochemical properties of Geobacillus MreB as the sole exception to all other actin fold polymers, further supporting experiments are needed to bolster these contrasting conclusions and their overall model.

      Response 2.0. We are grateful to reviewer #2 for highlighting the novelty and importance of our results. Most of our conclusions were in line with previous in vitro studies of MreBs (formation of pairs of straight filaments on a lipid layer, both ATP and GTP binding and hydrolysis, distortion of liposomes…), with the exception of the claimed requirement of NTP hydrolysis for membrane binding prior to polymerization, which was based on the absence of pairs of filaments in free solution or in the presence of AMP-PNP in our experimental conditions (and which we agree was not sufficient to make such a bold claim, see below). Thanks to the reviewer’s comments, we have performed many controls and additional experiments that led us to refine our results and largely reconcile them with the literature. Please see the answer to the global review comments - our conclusions have been revised on the basis of our new data.

      1. (Difference 1) - The predominant concern about the in vitro studies that makes it difficult to evaluate many of their results (much less compare them to other MreBs and actin homologs) is the use of a highly unconventional polymerization buffer containing 500(!) mM KCl. As has been demonstrated with actin and other polymers, the high KCl concentration used here (500mM) is certain to affect the polymerization equilibria, as increasing salt increases the hydrophobic effect and inhibits salt bridges, and therefore will affect the affinity between monomers and filaments. For example, past work has shown that high salt greatly changes actin polymerization, causing: a decreased critical concentration, increased bundling, and a greatly increased filament stiffness (Kang et al., 2013, 2012). Similarly, with AlfA, increased salt concentrations have been shown to increase the critical concentration, decrease the polymerization kinetics, and inhibit the bundling of AlfA filaments (Polka et al., 2009).

      A more closely related example comes from the previous observation that increasing salt concentrations increasingly slow the polymerization kinetics of B. subtilis MreB (Mayer and Amann, 2009). Lastly, these high salt concentrations might also change the interactions of MreB(Gs) with the membrane by screening charges and/or increasing the hydrophobic effect. Given that 500mM KCl was used throughout this paper, many (if not all) of the key experiments should be repeated in more standard salt concentration (~100mM), similar to those used in most previous in vitro studies of polymers.

      Response 2.1A. As per reviewer #2's request, we have repeated at 100 mM KCl most of the experiments (TEM, cryo-EM, QCM-D and ATPase assays) initially performed at 500 mM KCl only. The KCl concentration affects both membrane binding and filament stiffness, as anticipated by the reviewer, but the main conclusions are the same. The revised version of the manuscript compiles and compares the results obtained at both high and low [KCl], both concentrations being within the physiological range in Bacillus. Please see point 1 of the response to the global review comments and the first response to reviewer 1 (Response 1.1) for further elaboration.

      Please note that in Mayer & Amann, 2009 (B. subtilis MreB), light scattering in free solution was inversely proportional to the KCl concentration, with the highest light scattering signal at 0 mM KCl (!), a > 2-fold reduction below 30 mM KCl and no scatter at all at 250 mM, suggesting a “salting in” phenomenon (see also the “Other Points to address” answers 1A and 2, below) (Mayer & Amann, 2009). Since no effective polymer formation (e.g. polymers shown by EM) was demonstrated in these experiments, it cannot be excluded that KCl was simply preventing aggregation of B. subtilis MreB in solution, as we observe. For all their other light scattering experiments, the ‘standard polymerization condition’ used by Mayer & Amann was 0.2 mM ATP, 5 mM MgCl2, 1 mM EGTA and 10 mM imidazole pH 7.0, to which MreB (in 5 mM Tris pH 8.0) was added. No KCl was present in their ‘standard’ polymerization conditions.

      This would test if the many divergent properties of MreB(Gs) reported here arise from some difference in MreB(Gs) relative to other MreBs (and actin homologs), or if they arise from the 400mM difference in salt concentration between the studies. Critically, it would also allow direct comparisons to be made relative to previous studies of MreB (and other actin homologs) that used much lower salt, thereby allowing them to definitively demonstrate whether MreB(Gs) is indeed an outlier relative to other MreB and actin homologs. I would suggest using 100mM KCl, as historically, all polymerization assays of actin and numerous actin homologs have used 50-100mM KCl: 50mM KCl (for actin in F buffer) or 100mM KCl for multiple prokaryotic actin homologs and MreB (Deng et al., 2016; Ent et al., 2014; Esue et al., 2006, 2005; Garner et al., 2004; Polka et al., 2009; Rivera et al., 2011; Salje et al., 2011). Likewise, similar salt concentrations are standard for tubulin (80 mM K-Pipes) and FtsZ (100 mM KCl or 100mM KAc in HMK100 buffer).

      Response 2.1B. We appreciate the reviewer’s feedback on this point. Please note that, although actin polymerization assays are historically performed at 50-100 mM KCl and thus 100 mM KCl was used for other bacterial actin homologs (MamK, ParM and AlfA), MreB polymerization assays have previously been reported at 300 mM KCl too (Harne et al., 2020; Pande et al., 2022; Popp et al., 2010; Szatmari et al., 2020), which is closer to the physiological salt concentration in bacterial cells (see Response 1.1), but also in the absence of KCl (see above). As a matter of fact, we originally wanted to use a “standard polymerization condition” based on the literature on MreB, before realizing there was none: only half used KCl (the other half used NaCl, or no monovalent salt at all) and among these, KCl concentrations varied (out of 8 publications, 2 used 20 mM KCl, 2 used 50 mM KCl and 4 used 300 mM KCl).

      1. (Difference 2) - One of the most important differences claimed in this paper is that MreB(Gs) filaments are straight, a result that runs counter to the curved T. maritima and C. crescentus filaments detailed by the Löwe group (Ent et al., 2014; Salje et al., 2011). Importantly, this difference could also arise from the difference in salt concentrations used in each study (500mM here vs. 100mM in the Löwe studies), and thus one cannot currently draw any direct comparisons between the two studies.

      One example of how high salt could be causing differences in filament geometry: high salts are known to greatly increase the bending stiffness of actin filaments, making them more rigid (Kang et al., 2013). Likewise, increasing salt is known to change the rigidity of membranes. As the ability of filaments to A) bend the membrane or B) deform to the membrane depends on the stiffness of filaments relative to the stiffness of the membrane, the observed difference in the "straight vs. curved" conformation of MreB filaments might simply arise from different salt concentrations. Thus, in order to draw several direct comparisons between their findings and those of other MreB orthologs (as done here), the studies of MreB(Gs) conformations on lipids should be repeated at the same buffer conditions as used in the Löwe papers, then allowing them to be directly compared.

      Response 2.2. We fully agree with reviewer #2 that the salts could be affecting the assay, and we also performed cryo-EM experiments in the presence of 100 mM KCl as requested. The results unambiguously showed countless curved liposomes on the contact areas with MreB (Fig. 2F-G and Fig. 2-S5), very similar to what was reported for Thermotoga and Caulobacter MreBs by the Löwe group. Our results therefore confirm the previous findings that MreBs can bend lipids, and suggest that, indeed, high salt may increase filament stiffness as has been shown for actin filaments. We are very grateful to reviewer #2 for this suggestion and for drawing our attention to the work of Kang et al, 2013. The different bending observed when varying the salt concentration raises relevant questions regarding the in vivo behavior of MreB, since the KCl concentration was shown to vary greatly depending on the medium composition. The manuscript has been updated accordingly in the Results (from L243) and Discussion sections (L585-595).

      1. (Difference 3) - The next important difference between MreB(Gs) and other MreBs is the claim that MreB polymers do not form in the absence of membranes.

      A) This is surprising relative to other MreBs, as MreBs from 1) T. maritima (multiple studies), E. coli (Nurse and Marians, 2013), and C. crescentus (Ent et al., 2014) have been shown to form polymers in solution (without lipids) with electron microscopy, light scattering, and time-resolved multi-angle light scattering. Notably, the Esue work was able to observe the first phase of polymer formation and a subsequent phase of polymer bundling (Esue et al., 2006) of MreB in solution. 2) Similarly, (Mayer and Amann, 2009) demonstrated B. subtilis MreB forms polymers in the absence of membranes using light scattering.

      Response 2.3A. The literature does convincingly show that Thermotoga MreB forms polymers in solution, without lipids (note that for Caulobacter MreB, filaments were only reported in the presence of lipids (van den Ent et al, 2014)). Assemblies reported in solution are bundles or sheets (including at the earlier time points in the time-resolved EM experiments reported by Esue et al. 2006 mentioned by the reviewer: ‘2 minutes after adding ATP, EM revealed that MreB formed short filamentous bundles’) (Esue et al, 2006). However, as discussed above (Response 2.1A), the light scattering experiments in Mayer and Amann, 2009 do not conclusively demonstrate the presence of polymers of B. subtilis MreB in solution (Mayer & Amann, 2009). We performed many light scattering experiments of B. subtilis MreB in solution in the past (before finding out that filaments were only forming in the presence of lipids), and got similar scattering curves (see two examples of DLS experiments in Author response image 1) in conditions in which NO polymers could ever be observed by EM while plenty of aggregates were present.

      Author response image 1.

      We did not consider these results publishable in the absence of true polymers observed by TEM. As pointed out in the interesting study from Nurse et al. (on E. coli MreB) (Nurse & Marians, 2013), one cannot rely on light scattering alone because non-specific aggregates would show patterns similar to those of polymers. Over the last two decades, about 15 publications showed polymers of MreB from several Gram-negative species, while none (despite the efforts of many) showed a single convincing MreB polymer from a Gram-positive bacterium by EM. A simple hypothesis is that a critical parameter was missing, and we present convincing evidence that lipids are critical for Geobacillus MreB to form pairs of filaments in the conditions tested. However, in solution too we do occasionally see pairs of filaments (Fig 2-S2), and also sheet-like structures among aggregates when the concentration of MreB is increased (Fig. 2-S2 and Fig. 3-S2). Thus, we agree with the reviewer that it cannot be claimed that Geobacillus MreB is unable to polymerize in the absence of lipids, but rather that lipids strongly stimulate its polymerization, depending on the conditions.

      B) The results shown in figure 5A also go against this conclusion, as there is only a 2-fold increase in the phosphate release from MreB(Gs) in the presence of membranes relative to the absence of membranes. Thus, if their model is correct, and MreB(Gs) polymers form only on membranes, this would require the unpolymerized MreB monomers to hydrolyze ATP at 1/2 the rate of MreB in filaments. This high relative rate of hydrolysis of monomers compared to filaments is unprecedented. For all polymers examined so far, the rate of monomer hydrolysis is several orders of magnitude less than that of the filament. For example, actin monomers are known to hydrolyze ATP 430,000X slower than the monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      Response 2.3B. We agree with the reviewer. We have now found conditions where sheets of MreB form in solution (at high MreB concentration) in the presence of ADP and AMP-PNP. However, we have now added several controls that exclude efficient formation of polymers in solution in the presence of ATP at low concentrations of MreBGs (≤ 1.5 µM), the condition used for the malachite green assays. At these MreB concentrations, pairs of filaments are observed in the presence of lipids, but very infrequently in solution, and sheets are not observed in solution either (Fig. 2-S2A, B). Yet, albeit puzzling, in these conditions Pi release is reproducibly observed in solution, reduced only ~ 2 to 3-fold relative to Pi release in the presence of lipids (Fig. 5A and Fig. 5-S1). A reinforcing observation comes from the ATPase assays performed at 100 mM KCl (Fig. 5A). In this condition MreB binding to lipids is increased relative to 500 mM KCl (Fig. 4-S4C), and the stimulation of the ATPase activity by the presence of lipids is also stronger than at 500 mM (Fig. 5-S1A). Further work is needed to characterize in detail the ATPase activity of MreB proteins, for which data in the literature is very scarce. We can’t exclude that MreB could nucleate in solution or form very unstable filaments that cannot be seen in our EM assay but consume ATP in the process. At the moment, the significance of the Pi released in solution is unknown and will require further investigation.
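
      The comparison raised in point B can be made explicit with a little arithmetic. The sketch below assumes, purely for illustration, that all Pi released in solution comes from unpolymerized monomers and all Pi released with lipids comes from filaments. The function name is hypothetical; only the ~2 to 3-fold reduction and the 430,000-fold actin figure are taken from the text above.

```python
# Illustrative arithmetic only; not an analysis of the actual assay data.

ACTIN_FILAMENT_OVER_MONOMER = 430_000  # fold difference quoted by the reviewer for actin


def implied_monomer_over_filament(fold_reduction_in_solution):
    """Monomer/filament Pi-release ratio implied if solution signal is monomer-only.

    Assumption: the rate measured with lipids reflects filaments and the rate
    measured in solution reflects monomers, so the ratio is 1 / fold reduction.
    """
    return 1.0 / fold_reduction_in_solution


for fold in (2, 3):
    ratio = implied_monomer_over_filament(fold)
    # How many times larger this ratio is than the actin ratio (1 / 430,000).
    excess_over_actin = ratio * ACTIN_FILAMENT_OVER_MONOMER
    print(f"{fold}-fold reduction: monomer/filament = {ratio:.2f}, "
          f"about {excess_over_actin:,.0f} times the actin ratio")
```

      Under that assumption the implied monomer activity would be five orders of magnitude higher, relative to the filament, than for actin, which is why the origin of the Pi released in solution remains an open question.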

      C) Thus, there is a strong possibility that MreB(Gs) polymers are indeed forming in solution in addition to those on the membrane, and these "solution polymers" may not be captured by their electron microscopy assay. For example, high salt could be interfering with the adsorption of filaments to glow-discharged grids lacking lipids.

      Response 2.3C. We appreciate the reviewer’s insight about this critical point. Polymers presented in the original Fig. 2A were obtained at 500 mM KCl but we had tested the polymerization of MreB at 100 mM KCl as well, without noticing differences. We have nonetheless redone this quantitatively and used these data for the revised Fig. 2A, as we are now using 100 mM KCl as our standard polymerization condition throughout the revised manuscript. We also followed the other suggestion of the reviewer and tested glow-discharged grids (a more classic preparation for soluble proteins) vs non-glow-discharged EM grids, as well as a higher concentration of MreB. Grids are generally glow-discharged to make them hydrophilic in order to adsorb soluble proteins, but the properties of MreB (soluble but obviously presenting hydrophobic domains) made it difficult to predict what support putative soluble polymers would preferentially interact with. Septins for example bind much better to hydrophobic grids despite their soluble properties (I. Adriaans, personal communication). Virtually no double filaments were observed in solution at either low or high [MreB]. The fact that in some conditions (high [MreB], other nucleotides) we were able to detect sheet-like structures excluded a technical issue that would prevent the detection of existing but “invisible” polymers here. We have added these new data in Fig. 2-S2.

      As indicated above, the reviewer’s comments made us realize that we could not state or imply that MreB cannot polymerize in the absence of lipids. As a matter of fact, we always saw some random filaments in the EM fields, both in solution and in the presence of non-hydrolysable analogues, at very low frequency (Fig. 2A). And we now do see sheets at high MreB concentration (Fig. 2-S2B). We could simply be missing the optimal conditions for polymerization in solution, while our phrasing gave the impression that no polymers could ever form in the absence of ATP or lipids. Therefore, we have:

      1) analyzed all TEM data to present it as semi-quantitative TEM, using our methodology originally implemented for the analysis of the mutants

      2) reworked the text to remove any problematic statements and to indicate that MreBGs was only found to bind to a lipid monolayer as a double protofilament in the presence of ATP/GTP but that this does not exclude that filaments may also form in other conditions.

      In order to definitively prove that MreB(Gs) does not have polymers in solution, the authors should:

      i) conduct orthogonal experiments to test for polymers in solution. The simplest test of polymerization might be conducting pelleting assays of MreB(Gs) with and without lipids, sweeping through the concentration range as done in 2B and 5a.

      Response 2.3Ci. Following reviewer #2's suggestion, we conducted a series of sedimentation assays in the presence and in the absence of lipids, at low (100 mM) and high (500 mM) salt, for both the wild-type protein and the three membrane-anchoring mutants (all at 1.3 µM). Sedimentation experiments in salt conditions preventing aggregation in solution (500 mM KCl) fitted with our TEM results: MreB wild-type pelleting increased in the presence of both ATP and lipids (Fig. R1). The sedimentation was further increased at 100 mM KCl, which would fit our other results indicating an increased interaction of MreB with the membrane. However, in addition to being poorly reproducible (in our hands), the approach does not discriminate between polymers and aggregates (or monomers bound to liposomes), and since MreB has a strong tendency to aggregate, we believe that the technique is ill-suited to reliably address MreB polymerization and prefer not to include sedimentation data in our manuscript. The recent work from Pande et al. (2022) illustrates this issue well, since no sedimentation of MreB (at 2 µM) was observed in solution in conditions supporting polymerization (at 300 mM KCl): ‘the protein does not pellet on its own in the absence of liposome, irrespective of its polymerization state’, implying that sedimentation does not allow detection of MreB5 filaments in solution (Pande et al., 2022).

      ii) They also could examine if they see MreB filaments in the absence of lipids at 100mM salt (as was seen in both Löwe studies), as the high salt used here might block the charges on glow discharged grids, making it difficult for the polymer to adhere.

      See above, Response 2.3C

      iii) Likewise, the claim that MreB lacking the amino-terminus and the α2β7 hydrophobic loop "is required for polymerization" is questionable: if deleting these residues blocks membrane binding, the lack of polymers on the membrane on the grid is not unexpected, as filaments that cannot bind the membrane would not be observable. Given these mutants cannot bind the membrane, mutant polymers could still indeed exist in solution, and thus pelleting assays should be used to test if non-membrane associated filaments composed of these mutants do or do not exist.

      Response 2.3Ciii. This is a fair point, we thank the reviewer for this remark. We did not mean to state or imply that the hydrophobic loop was required for polymerization per se, but that polymerization into double filaments only efficiently occurs upon membrane binding, which is mediated by the two hydrophobic sequences. We tested all three mutants by sedimentation as suggested by reviewer #2. In the salt condition that limits aggregation (500 mM KCl) the mutants did not pellet while the wild-type protein did (in the presence of lipids) (Fig. R2 below), in agreement with our EM data. We tested the absence of lipids on the mutant bearing the 2 deletions and observed that the (partial) sedimentation observed at low KCl concentration was ATP and lipid dependent (Fig. R3).

      Given our concerns about MreB sedimentation assays (see above, Response 2.3Ci), we prefer not to include these sedimentation data in our manuscript. Instead, we tested by TEM the possible polymerization of the mutants in solution (we only tested them in the presence of lipids in the initial submission). No filaments were detected in solution for any of the mutants (Fig. 4-S3A).

      A final note, the results shown in "Figure 1 - figure supplement 2, panel C" appear to directly refute the claim that MreB(Gs) requires lipids to polymerize. As currently written, it appears they can observe MreB(Gs) filaments on EM grids without lipids. If these experiments were done in the presence of lipids, the figure legend should be updated to indicate that. If these experiments were done in the absence of lipids, the claim that membrane association is required for MreB polymerization should be revised.

      The TEM experiments shown were indeed performed in the presence of lipids. We apologize that this was not clearly stated in the legend. To prevent all confusion, we have nevertheless removed these images from this figure since the polymerization conditions and lipid requirement are not yet presented when this figure is referred to in the text. We have instead added a panel with the calibration curve for the size exclusion profiles as per request of reviewer #3. The main point of this figure is to show the tendency of MreBGs to aggregate: analytical size-exclusion chromatography shows a single peak corresponding to the monomeric MreBGs, molecular weight ~ 37 kDa, in our purification conditions, but it can readily shift to a peak corresponding to high MW aggregates, depending on the protein concentration and/or storage conditions.

      1. (Difference 4) - The next difference between this study and previous studies of MreB and actin homologs is the conclusion that MreB(Gs) must hydrolyze ATP in order to polymerize. This conclusion is surprising, given the fact that both T. maritima (Salje 2011, Bean 2008) and B. subtilis MreB (Mayer 2009) have been shown to polymerize in the presence of ATP as well as AMP-PNP.

      Likewise, MreB polymerization has been shown to lag ATP hydrolysis in not only T. maritima MreB (Esue 2005), eukaryotic actin, and all other prokaryotic actin homologs whose polymerization and phosphate release have been directly compared: MamK (Deng et al., 2016), AlfA (Polka et al., 2009), and two divergent ParM homologs (Garner et al., 2004; Rivera et al., 2011). Currently, the only piece of evidence supporting the idea that MreB(Gs) must hydrolyze ATP in order to polymerize comes from 2 observations: 1) using electron microscopy, they cannot see filaments of MreB(Gs) on membranes in the presence of AMP-PNP or ApCpp, and 2) no appreciable signal increase appears testing AMPPNP- MreB(Gs) using QCM-D. This evidence is by no means conclusive enough to support this bold claim: While their competition experiment does indicate AMPPNP binds to MreB(Gs), it is possible that MreB(Gs) cannot polymerize when bound to AMPPNP.

      For example, it has been shown that different actin homologs respond differently to different non-hydrolysable analogs: Some, like actin, can hydrolyze one ATP analog but not the other, while others are able to bind to many different ATP analogs but only polymerize with some of them.

      Response 2.4. We agree with the reviewer. It is uncertain which analogs bind, because they are quite different from ATP and some proteins simply do not accept them; they can also change conditions such that filaments stop forming, and so be (theoretically) misleading. This is why we had tested ApCpp in addition to AMP-PNP as a non-hydrolysable analog (Fig. 3A). As indicated above, our new complementary experiments (Fig. 3-S1B-D) now show that some rare (i.e. infrequent and in limited amount) dual polymers are detected in the presence of ApCpp (Fig. 3A) and, at high MreB concentration only, in the presence of AMP-PNP (Fig. 3-S1B-D), suggesting different critical concentrations in the presence of alternative nucleotides. We have dampened our conclusions in the light of our new data, and modified the discussion accordingly.

      Thus, to further verify their "hydrolysis is needed for polymerization" conclusion, they should:

      A. Test if a hydrolysis deficient MreB(Gs) mutant (such as D158A) is also unable to polymerize by EM.

      Response 2.4A. We thank the reviewer for this suggestion. As this conclusion has been reviewed on the basis of our new data (see previous response), testing putative ATPase deficient mutants is no longer required here. The study of ATPase mutants is planned for future studies (see Response 3.10 to reviewer #3).

      B. They also should conduct an orthogonal assay of MreB polymerization aside from EM (pelleting assays might be the easiest). They should test if polymers of ATP, AMP-PNP, and MreB(Gs)(D158A) form in solution (without membranes) by conducting pelleting assays. These could also be conducted with and without lipids, thereby also addressing the points noted above in point 3.

      Response 2.4B. Please see Response 2.3Ci above.

      C. Polymers may indeed form with ATP-gamma-S, and this non-hydrolysable ATP analog should be tested.

      Response 2.4C. It is quite possible that ATP-γ-S supports polymerization since it is known to be partially hydrolysable by actin, giving a mild phenotype (Mannherz et al, 1975). This molecule can even be a bona fide substrate for some ATPases (e.g. Peck & Herschlag, 2003). Thus, we decided to exclude this “non-hydrolysable” analog and tested AMP-PNP and ApCpp instead. We know that ATP-γ-S has been and still is frequently used, but we preferred to avoid it for the moment for the above-indicated reasons. We chose AMPPNP and AMPPCP instead because (1) they were shown to be completely non-hydrolysable by actin, in contrast to ATP-γ-S; (2) they are widely used (the most commonly used for structural studies; Lacabanne et al, 2020); and (3) AMPPNP was previously used in several publications on MreB (Bean & Amann, 2008; Nurse & Marians, 2013; Pande et al., 2022; Popp et al., 2010; Salje et al, 2011; van den Ent et al., 2014) and thus would allow direct comparison. AMPPCP was added to confirm the finding with AMP-PNP. There are many other analogs that we are planning to explore in future studies (see next Response, 2.4D).

      D. They could also test how the ADP-Phosphate bound MreB(Gs) polymerizes in bulk and on membranes, using beryllium phosphate to trap MreB in the ADP-Pi state. This might allow them to further refine their model.

      Response 2.4D. We plan to address the question of the transition state in depth in follow-up work, using a series of analogs and mutants presumably affected in ATPase activity, both predicted and identified in a genetic screen. As indicated above, it is uncertain which analogs bind because they are quite different from ATP, and some may bind but prevent filament formation. Thus, we anticipate that trying just one may not be sufficient: analogs can change conditions and be (theoretically) misleading, so a thorough analysis is needed to address this question. Since our model and conclusions have been revised on the basis of our new data, we believe that these experiments are beyond the scope of the current manuscript.

      E. Importantly, the Mayer study of B. subtilis MreB found the same results in regard to nucleotides, "In polymerization buffer, MreB produced phosphate in the presence of ATP and GTP, but not in ADP, AMP, GDP or AMP-PNP, or without the readdition of any nucleotide". Thus this paper should be referenced and discussed.

      Response 2.4E. We agree that Pi release was detected previously. We have added the reference (L121).

      1. (Difference 5) - The introduction states (lines 128-130) "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolysable ATP analog AMP-PNP."

      A) While this is a great way to introduce the problem, the statement is a bit vague and should be clarified, detailing the conflicting results and appropriate references. For example, what conflicting in vivo results are they referring to? Regarding "MreB polymerization in AMP-PNP", multiple groups have shown the polymerization of MreB(Tm) in the presence of AMP-PNP, but it is not clear what papers found opposing results.

      Response 2.5A. Thanks for the comment. We originally did not detail these ‘conflicting results’ in the Introduction because we were doing it later in the text, with the appropriate references, in particular in the Discussion (former L433-442). We have now removed this from the Discussion section and added a sentence in the introduction too (L123-130) quickly detailing the discrepancies and giving the references.

      • For more clarity, we have removed the “in vivo” (which referred to the distinct results reported for the presumed ATPase mutants by the Garner and Graumann groups) and focus on the in vitro discrepancies only.

      • These discrepancies are the following: while some studies did show polymerization (as assessed by EM) of MreBTm in the presence of AMPPNP, the studies from Popp et al and Esue et al on T. maritima MreB, and of Nurse et al on E. coli MreB, reported aggregation in the presence of AMP-PNP (Esue et al., 2006; Popp et al., 2010) or ADP (Nurse & Marians, 2013), or no assembly in the presence of ADP (Esue et al., 2006). As for the studies reporting polymerization in the presence of AMP-PNP by light scattering only (Bean & Amann, 2008; Gaballah et al, 2011; Mayer & Amann, 2009; Nurse & Marians, 2013), they could not differentiate between aggregates and true polymers and thus cannot be considered conclusive.

      B) The statement "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolyzable ATP analog AMP-PNP" is technically incorrect and should be rephrased or further tested.

      i. For all actin (or tubulin) family proteins, it is not that a given filament "cannot polymerize" in the presence of ADP but rather that the ADP-bound form has a higher critical concentration for polymer formation relative to the ATP-bound form. This means that the ADP polymers can indeed polymerize, but only when the total protein exceeds the ADP critical concentration. For example, many actin-family proteins do indeed polymerize in ADP: ADP actin has a 10-fold higher critical concentration than ATP actin (Pollard, 1984), and the ADP critical concentrations of AlfA and ParM are 5-fold and 50-fold higher (respectively) than those of their ATP-bound forms (Garner et al., 2004; Polka et al., 2009).

      Response 2.5Bi. Absolutely correct. We apologize for the lack of accuracy of our phrasing and have corrected it (L123).

      ii. Likewise, (Mayer and Amann, 2009) have already demonstrated that B. subtilis MreB can polymerize in the presence of ADP, with a slightly higher critical concentration relative to the ATP-bound form.

      Response 2.5Bii. In Mayer and Amann, 2009, the same light scattering signal (interpreted as polymerization) occurred regardless of the nucleotide, and also in the absence of nucleotide (their Fig. 10), and ATP-, ADP- and AMP-PNP-MreB ‘displayed nearly indistinguishable critical concentrations’. They concluded that MreB polymerization is nucleotide-independent. Please see below (responses to ’Other points to address’) our extensive answer to the recurring point of reviewer #2 regarding Mayer & Amann.

      Thus, to prove that MreB(Gs) polymers do not form in the presence of ADP would require one to test a large concentration range of ADP-bound MreB(Gs). They should test if ADP-MreB(Gs) polymerizes at the highest MreB(Gs) concentrations that can be assayed. Even if this fails, it may be that MreB(Gs)-ADP polymerizes at higher concentrations than is possible with their protein preps (13 µM). An even simpler fix would be to simply state that MreB(Gs)-ADP filaments do not form beneath a given MreB(Gs) concentration.

      We agree with the reviewer. Our wording was overstating our conclusions. Based on our new quantifications (Fig. 3-S1B, D), we have rephrased the results section and now indicate that pairs of filaments are occasionally observed in the presence of ADP in our conditions across the range of MreB concentration that could be tested, suggesting a higher critical concentration for MreB-ADP (L310-312). Only at the highest MreB concentration, sheet- and ribbon-like structures were observed in the presence of ADP (Fig. 3-S2B).
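
      The critical-concentration argument discussed in this point can be illustrated with the simplest steady-state picture, in which polymer appears only once total protein exceeds the critical concentration. The sketch below is a textbook idealization, not a fit to our data: the function name and the two critical concentrations (0.5 µM for the ATP form, 10-fold higher for the ADP form, borrowing the actin fold difference quoted above) are hypothetical values chosen for illustration.

```python
# Idealized steady-state model: free monomer is capped at the critical
# concentration (Cc); everything above it is in polymer.

def polymer_concentration(total_um, critical_um):
    """Return polymerized protein (µM) for a given total protein and Cc (µM)."""
    return max(0.0, total_um - critical_um)


CC_ATP_UM = 0.5             # hypothetical Cc of the ATP-bound form
CC_ADP_UM = 10 * CC_ATP_UM  # hypothetical: 10-fold higher, as quoted for actin

for total in (0.3, 1.0, 3.0, 13.0):
    atp = polymer_concentration(total, CC_ATP_UM)
    adp = polymer_concentration(total, CC_ADP_UM)
    print(f"total {total:5.1f} µM -> polymer with ATP {atp:5.1f} µM, with ADP {adp:5.1f} µM")
```

      With these illustrative numbers, a protein that polymerizes readily with ATP at 1 to 3 µM shows no ADP polymer at all until the total concentration passes 5 µM, which is why the absence of filaments at a given concentration cannot be read as an inability to polymerize.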

      Other Points to address:

      1) There are several points in this paper where the work by Mayer and Amann is ignored, not cited, or readily dismissed as "hampered by aggregation" without any explanation or supporting evidence of that fact.

      We have cited the Mayer study where appropriate. However, we cannot cite it as proof of polymerization under any given condition since their approach does not show that polymers were obtained in their conditions. Again, they based all their conclusions solely on light scattering experiments, which cannot differentiate between polymers and aggregates.

      A) Lines 100-101 - While the irregular 3-D formations seen formed by MreB in the Dersch 2020 paper could be interpreted as aggregates, stating that the results from specifically the Gaballah and Mayer papers (and not others) were "hampered by aggregation" is currently an arbitrary statement, with no evidence or backing provided. Overall, these lines (and others in the paper) dismiss these two works without giving any evidence to that point. Thus, they should provide evidence for why they believe all these papers are aggregation, or remove these (and other) dismissive statements.

      We apologize if our statements about these reports seemed dismissive or disrespectful; it was definitely not our intention. Light scattering shows an increase in the size of particles over time, but there is no way to tell if the scattering is due to organized (polymers) or disorganized (aggregation) assemblies. Thus, it cannot be considered conclusive evidence of polymerization without proof that true filaments are formed by the protein in the conditions tested, as confirmed by EM for example. MreB is known to aggregate easily (see our size exclusion chromatography profiles and those from Dersch et al, 2020, and note that no chromatography profiles were shown in the Mayer report) and, as indicated above, we had similar light scattering results for MreB for years, while only aggregates could be observed by TEM (see above Response 2.3A). Several observations also suggest that aggregation instead of polymerization might be at play in the Mayer study, for example ‘polymerization’ occurring in salt-less buffer but ‘inhibited’ with as low as 100 mM KCl, which should rather be “salting in” (see below). We did not intend to be dismissive, but it seemed wrong to report their conclusions as conclusive evidence. We thought that we had cited these papers where appropriate and then explained that they show no conclusive proof of polymerization and why, but it is evident that we failed at communicating it clearly. We have reworked the text to remove any problematic or arbitrary statement about our concerns regarding these reports (e.g. L93 & L126).

      One important note - There are 2 points indicating that dismissing the Mayer and Amann work as aggregation is incorrect:

      1) the Mayer work on B. subtilis MreB shows both an ATP and a slightly higher ADP critical concentration. As the emergence of a critical concentration is a steady-state phenomenon arising from the association/dissociation of monomers (and a kinetically limiting nucleation barrier), an emergent critical concentration cannot arise from protein aggregation; critical concentrations only arise from a dynamic equilibrium between monomer and polymer.

      • Critical concentrations for ATP, ADP or AMPPNP were described in Mayer & Amann (Mayer & Amann, 2009) as “nearly indistinguishable” (see Response 2.5Bii).
      • Protein aggregation depends on the solution (pH and ions), protein concentration and temperature. Above a certain concentration, proteins can become unstable, and thus a critical concentration for aggregation can emerge.

      2) Furthermore, Mayer observed that increased salt slowed and reduced B. subtilis MreB light scattering, the opposite of what one would expect if their "polymerization signal" was only protein aggregation, as higher salts should increase the rate of aggregation by increasing the hydrophobic effect.

      It is true that at high salt concentration proteins can precipitate, a phenomenon described as “salting out”. However, it is also true that salts help to solubilize proteins (“salting in”), and that proteins tend to precipitate in the absence of salt. Considering that the starting point of the Mayer and Amann experiment (Mayer & Amann, 2009) is the absence of salt (where they observed the highest scattering) and that they gradually reduce this scattering by increasing KCl (the scattering is almost abolished below 100 mM only!) it is plausible that a salting-in phenomenon might be at play, due to increased solubility of MreB by salt. In any case, this cannot be taken as a proof that polymerization rather than aggregation occurred.

      B) Lines 113-137 - The authors reference many different studies of MreB, including both MreB on membranes and MreB polymerized in solution (which formed bundles). However, they again neglect to mention or reference the findings of Mayer and Amann (Mayer and Amann, 2009), as it was dismissed as "aggregation". As B. subtilis is also a Gram-positive organism, the Mayer results should be discussed.

      We did cite the Mayer and Amann paper but, as explained above, we cannot cite this study as an example of proven polymerization. We avoided polemics in the text as much as possible and cited this paper when possible. Again, we have reworked the text to avoid any problematic or dismissive statement. Also, we forgot to mention this study at L121 as an example of reported ATPase activity, and this has now been corrected.

      2) Lines 387-391 state the rates of phosphate release relative to past MreB findings: "These rates of Pi release upon ATP hydrolysis (~ 1 Pi/MreB in 6 min at 53 °C) are comparable to those observed for MreBTm and MreB(Ec) in vitro". While the measurements of Pi release AND ATP hydrolysis have indeed been measured for actin, this statement does not apply to MreB and should be corrected: All MreB papers thus far have only measured Pi release alone, not ATP hydrolysis at the same time. Thus, it is inaccurate to state "rates of Pi release upon ATP hydrolysis" for any MreB study, as to accurately determine the rate of Pi release, one must measure: 1) the rate of polymer over time, 2) the rate of ATP hydrolysis, and 3) the rate of phosphate release. For MreB, no one has, so far, even measured the rates of ATP hydrolysis and phosphate release with the same sample.

      We completely agree with the reviewer, we apologize if our formulation was inaccurate. We have corrected the sentence (L479). Thank you for pointing out this mistake.

      3) The interpretation of the interactions between monomers in the MreB crystal should be more carefully stated to avoid confusion. While likely not their intention, the discussions of the crystal packing contacts of MreB can appear to assume that the monomer-monomer contacts they see in crystals represent the contacts within actual protofilaments. One cannot automatically assume the observations of monomer-monomer contacts within a crystal reflect those that arise in the actual filament (or protofilament).

      We agree, we thank the reviewer for his comments. We have revamped the corresponding paragraph.

      A) They state, "the apo form of MreBGs forms less stable protofilaments than its G- homologs." Given filaments of the apo form of MreB(Gs) or B. subtilis have never been observed in solution, this statement is not accurate: while the contacts in the crystal may change with and without nucleotide, if the protein does not form polymers in solution in the apo state, then there are no "real" apo protofilaments, and any statements about their stability become moot. Thus this statement should be rephrased or appropriately qualified.

      See above.

      B) Another example: in the apo MreB crystal, they see that the loop of domain IB makes a single salt bridge with IIA and none with IIB. This contrasts with every actin, MreB, and actin homolog studied so far, where domain IB interacts with IIB. This might reflect the real contacts of MreB(Gs) in solution, or it may simply be a crystal-packing artifact. Thus, the authors should be careful in their claims, making it clear to the reader that the contacts in the crystal may not necessarily be present in polymerized filaments.

      Again, we agree with the reviewer: we cannot draw general conclusions about the interactions between monomers from the apo form. We have rephrased this paragraph.

      4) lines 201-202 - "Polymers were only observed at a concentration of MreB above 0.55 μM (0.02 mg/mL)". Given this concentration dependence of filament formation, which appears the same throughout the paper, the authors could state that 0.55 μM is the critical concentration of MreB on membranes under their buffer conditions. Given the lack of critical concentration measurement in most of the MreB literature, this could be an important point to make in the field.

      Following reviewer #2’s suggestion, we have now estimated the critical concentration (Cc = 0.4485 µM) and reported it in the text (L218).

      5) Both mg/ml and uM are used in the text and figures to refer to protein concentration. They should stick to one convention, preferably uM, as is standard in the polymer field.

      Sorry for the confusion. We have homogenized MreB concentrations to µM throughout the text and figures.

      6) Lines 77-78 - (Teeffelen et al., 2011) should be referenced as well in regard to cell wall synthesis driving MreB motion.

      This has been corrected; we apologize for omitting this reference.

      7) Line 90 - "Do they exhibit turnover (treadmill) like actin filaments?". This phrase should be modified, as turnover and treadmilling are two very different things. Turnover is the lifetime of monomers in filaments, while treadmilling entails monomer addition at one end and loss at the other. While treadmilling filaments cause turnover, there are also numerous examples of non-treadmilling filaments undergoing turnover: microtubules, intermediate filaments, and ParM. Likewise, an antiparallel filament cannot directionally treadmill, as there is no difference between the two filament ends to confer directional polarity.

      This is absolutely true; we apologize for our mistake. The sentence has been corrected (L82).

      8) Throughout the paper, the term aggregation is used occasionally to describe the polymerization shown in many previous MreB studies, almost all of which very clearly showed "bundled" filaments, very distinct entities from aggregates, as a bundle of polymers cannot form without the filaments first polymerizing on their own. Evidence to this point, polymerization has been shown to precede the bundling of MreB(Tm) by (Esue et al., 2005).

      We agree with reviewer #2 about polymers preceding bundles and “sheets”. However, we respectfully disagree that we used the word aggregation “throughout the paper” to describe structures that clearly showed polymers or sheets of filaments. A search (Ctrl-F: “aggreg”) reveals only 6 matches, 3 describing our own observations (L152, 163/5, and 1023/28), one referring to (Salje et al., 2011) (L107) but citing her claim that they observed aggregation (due to the N-terminus), and the last two (L100, L440) refer (again) to the Gaballah/Mayer/Dersch publications to say that aggregation could not be excluded in these reports as discussed above (Dersch et al., 2020; Gaballah et al., 2011; Mayer & Amann, 2009).

      9) lines 106-108 mention that "The N-terminal amphipathic helix of E. coli MreB (MreBEc) was found to be necessary for membrane binding." This is not accurate, as Salje observed that one single helix could not cause MreB to bind to the membrane, but rather, multiple amphipathic helices were required for membrane association (Salje et al., 2011).

      Salje et al. showed that in vivo the deletion of the helix abolishes the association of MreB with the membrane. This publication also shows that in vitro, addition of the helix to GFP (not to MreB) prompts binding to lipid vesicles, and that binding was increased with 2 copies of the helix, but they could not test this directly in vitro with MreB (which is insoluble when expressed with its N-terminus). This prompted them to speculate that multiple MreBs could bind better to the membrane than monomers. However, this remained to be demonstrated. Additional hydrophobic regions in MreB such as the hydrophobic loop could participate in membrane anchoring but are absent in their in vitro assays with GFP.

      The Salje results imply that dimers (or further assemblies) of MreB drive membrane association, a point that should be discussed in regard to the question "What prompts the assembly of MreB on the inner leaflet of the cytoplasmic membrane?" posed on lines 86-87.

      We agree that this is an interesting point. As it is consistent with our results, we have incorporated it into our model (Fig. 6) and we address it in the Discussion (L573-575).

      10) On lines 414-415, it is stated, "The requirement of the membrane for polymerization is consistent with the observation that MreB polymeric assemblies in vivo are membrane-associated only." While I agree with this hypothesis, it must be noted that the presence or absence of MreB polymers in the cytoplasm has not been directly tested, as short filaments in the cytoplasm would diffuse very quickly, requiring very short exposures (<5ms) to resolve them relative to their rate of diffusion. Thus, cytoplasmic polymers might still exist but have not been tested.

      This is also an interesting point. Indeed if a nucleated form, or very short (unbundled) polymers exist in the cytoplasm, they have not been tested by fluorescence microscopy. However, the polymers that localize at the membrane (~ 200 nm), if soluble, would have been detected in the cytoplasm by the work of reviewer #2, us or others.

      11) lines 429-431 state, "but polymerization in the presence of ADP was in most cases concluded from light scattering experiments alone, so the possibility that aggregation rather than ordered polymerization occurred in the process cannot be excluded."

      A) If an increased light scattering signal is initiated by the addition of ADP (or any nucleotide), that signal must come from polymerization or multimerization. What the authors imply is that there must be some ADP-dependent "aggregation" of MreB, which has not been seen thus far for any polymer. Furthermore, why would the addition of ADP initiate aggregation?

      We did not mean that ADP itself would prompt aggregation, but that the protein would aggregate in the buffer regardless of the presence of ADP or other nucleotides. The Mayer & Amann study claims that MreB “polymerization” is nucleotide-independent, as they got identical curves with ATP, ADP, AMPPNP and even with no nucleotides at all (Fig. 10 in their paper, pasted here) (Mayer & Amann, 2009).

      Their experiments with KCl are also remarkable: as they lowered the salt concentration, they got faster and faster “polymerization”, with the strongest light scattering signal in the absence of any salt. At 75 mM KCl they got almost no more “polymers”, and ‘polymerization was almost entirely inhibited at 100 mM’ (Fig. 7, pasted below). Yet the intracellular level of KCl in bacteria is estimated to be ~300 mM (see Response 1.1).

      B) Likewise, the statement "Differences in the purity of the nucleotide stocks used in these studies could also explain some of the discrepancies" is unexplained and confusing. How could an impurity in a nucleotide stock affect the past MreB results, and what is the precedent for this claim?

      We meant that the presence of ATP in the ADP stocks might have affected the outcome of some assays, generating the conflicting results existing in the literature. We agree that this sentence was confusing and have removed it.

      12) lines 467-469 state, "Thus, for both MreB and actin, despite hydrolyzing ATP before and after polymerization, respectively, the ADP-Pi-MreB intermediate would be the long-lived intermediate state within the filaments."

      A) For MreB, this statement is extremely speculative and unbiased, as no one has measured 1) polymerization, 2) ATP hydrolysis, and 3) phosphate release. For example, it could be that ATP hydrolysis is slow, while phosphate release is fast, as is seen in the actin from Saccharomyces cerevisiae.

      We agree that this was too speculative. This has been removed from the (extensively) modified Discussion section. Thanks for the comment.

      B) For actin, the statement of hydrolysis of ATP of monomer occurring "before polymerization" is functionally irrelevant, as the rate of ATP hydrolysis of actin monomers is 430,000 times slower than that of actin monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      We agree that the difference in hydrolysis rate between G-actin and F-actin implies that ATP hydrolysis occurs after polymerization. We are afraid that we do not follow the reviewer’s point here; we did not say or imply that ATP hydrolysis by actin monomers was functionally relevant.

      13) Lines 442-444. "On the basis of our data and the existing literature, we propose that the requirement for ATP (or GTP) hydrolysis for polymerization may be conserved for most MreBs." Again, this statement both here (and in the prior text) is an extremely bold claim, one that runs contrary to a large amount of past work on not just MreB, but also eukaryotic actin and every actin homolog studied so far. They come to this model based on 1) one piece of suggestive data (the behavior of MreB(GS) bound to 2 non-hydrolysable ATP analogs in 500 mM KCl), and 2) the dismissal (throughout the paper) of many peer-reviewed MreB papers that run counter to their model as "aggregation" or "contaminated ATP stocks." If they want to make this bold claim that their finding invalidates the work of many labs, they must back it up with further validating experiments.

      We respectfully disagree that our model was based on “one piece of suggestive data” and backed-up by dismissing most past work in the field. We only wanted to raise awareness about the conflicting data between some reports (listed in response 2.5a), and that the claims made by some publications are to be taken with caution because they only rely on light scattering or, when TEM was performed, showed only disorganized structures.

      This said, we clearly failed in proposing our model and we are sorry to see that we really annoyed the reviewer with our suspicion that the work by Mayer & Amann reports aggregation. As indicated above, we have amended our manuscript relative to this point. We also agree that our suggestion to generalize our findings to most MreBs was unsupported and overstated, considering how confusing some results from the literature are. We have refined our model and reworked the text to take on board the reviewer’s remarks as well as the new data generated during the revision process.

      We would like to thank reviewer #2 for his in-depth review of our manuscript.  

      Reviewer #3 (Public Review):

      The major claim from the paper is the dependence of two factors that determine the polymerization of MreB from a Gram-positive, thermophilic bacteria 1) The role of nucleotide hydrolysis in driving the polymerization. 2) Lipid bilayer as a facilitator/scaffold that is required for hydrolysis-dependent polymerization. These two conclusions are contrasting with what has been known until now for the MreB proteins that have been characterized in vitro. The experiments performed in the paper do not completely justify these claims as elaborated below.

      We understand the reviewer’s concerns in view of the existing literature on actin and Gram-negative MreBs. We may just be missing the optimal conditions for polymerisation in solution, while our phrasing gave the impression that polymers could never form in the absence of ATP or lipids. Our new data actually show that MreBGs at higher concentration can assemble into bundle- and sheet-like structures in solution and in the presence of ADP/AMP-PNP. Pairs of filaments are, however, only observed in the presence of lipids for all conditions tested. As indicated in the answers to the global review comments, we have included our new data in the manuscript, revised our conclusions and claims about the lipid requirement and expanded on these points in the Discussion.

      Major comments:

      1) No observation of filaments in the absence of lipid monolayer can also be accounted due to the higher critical concentration of polymerization for MreBGS in that condition. It is seen that all the negative staining without lipid monolayer condition has been performed at a concentration of 0.05 mg/mL. It is important to check for polymerization of the MreBGS at higher concentration ranges as well, in order to conclusively state the requirement of lipids for polymerization.

      Response 3.1. 0.05 mg/ml (1.3 µM) is our standard condition, and our leeway was limited by the rapid aggregation observed at higher MreB concentrations, as indicated in the text. We have now also tested 0.25 mg/ml (6.5 µM, the maximum concentration possible before major aggregation occurs in our experimental conditions). At this higher concentration, we see some sheet-like structures in solution, confirming a requirement of a higher concentration of MreB for polymerization in these conditions (see the answers to the global review comments for more details).

      We thank the reviewer for pushing us to address this point. We have revised our conclusions accordingly.
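The mg/ml and µM values quoted in Response 3.1 are related by a simple mass-to-molar conversion, which can be sketched as below. This is only an illustration: the molecular weight of about 38.5 kDa is an assumption back-calculated from the quoted pairing of 0.05 mg/ml with 1.3 µM, not a value taken from the manuscript, and the function name is hypothetical.

```python
# Illustrative sketch: convert a protein concentration from mg/ml to µM.
# ASSUMED_MW_KDA is an assumption inferred from the quoted 0.05 mg/ml ~ 1.3 µM
# pairing; it is not a value reported by the authors.
ASSUMED_MW_KDA = 38.5


def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mol_weight_kda: float) -> float:
    """Return the molar concentration in µM for a mass concentration in mg/ml."""
    grams_per_litre = conc_mg_per_ml            # 1 mg/ml equals 1 g/L
    grams_per_mole = mol_weight_kda * 1000.0    # 1 kDa corresponds to 1000 g/mol
    molar = grams_per_litre / grams_per_mole    # mol/L
    return molar * 1e6                          # mol/L -> µM


if __name__ == "__main__":
    for conc in (0.05, 0.25):
        micromolar = mg_per_ml_to_micromolar(conc, ASSUMED_MW_KDA)
        print(f"{conc} mg/ml is roughly {micromolar:.2f} µM")
```

Under this assumed molecular weight, the two concentrations tested come out close to the 1.3 µM and 6.5 µM figures given in the response; a different molecular weight would shift both values proportionally.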

      2) The absence of filaments for the non-hydrolysable conditions in the lipid layer could also be because the filaments that might have formed are not binding to the planar lipid layer, and not necessarily because of their inability to polymerize.

      Response 3.2. This is a fair point. To test the possibility that polymers would form but would not bind to the lipid layer we have now added additional semi-quantitative EM controls (for both the non-hydrolysable ATP analogs and the three ‘membrane binding’ deletion mutants) testing polymerization in solution (without lipids) and also using plasma-treated grids. These showed that in our standard polymerization conditions, virtually no polymers form in solution (Fig. 3-S1B and Fig. 4-S4A). Albeit at very low frequency, some dual protofilaments were however detected in the presence of ADP or AMP-PNP at the high MreB concentration (Fig. 3-S1D). At this high MreB concentration, the sheet-like structures occasionally observed in solution in the presence of ATP were frequent in the presence of ADP and very frequent in the presence of AMP-PNP (Fig. 3-S2B). We have revised our conclusions on the basis of these new data: MreBGs can form polymeric assemblies in solution and in the absence of ATP hydrolysis at a higher critical concentration than in the presence of ATP and lipids.

      See the answers to the global review comments (point 2) and Response 2.3C to reviewer #2 for more details.

      3) Given the ATPase activity measurements, it is not very convincing that ATP rather than ADP will be present in the structure. The ATP should have been hydrolysed to ADP within the structure. The structure is now suggestive that MreB is not capable of hydrolysis, which is contradictory to the ATP hydrolysis data.

      Response 3.3. We thank the reviewer for her insightful remarks about the MreB-ATP crystal structure. The electron density map clearly demonstrates the presence of 3 phosphates. However, as suggested by the reviewer, the density that was attributed to a Mg2+ ion should instead be interpreted as a water molecule. The absence of Mg2+ in the crystal could thus explain why the ATP had not been hydrolyzed.

      References

      Arino J, Ramos J, Sychrova H (2010) Alkali metal cation transport and homeostasis in yeasts. Microbiology and molecular biology reviews 74: 95-120

      Bean GJ, Amann KJ (2008) Polymerization properties of the Thermotoga maritima actin MreB: roles of temperature, nucleotides, and ions. Biochemistry 47: 826-835

      Cayley S, Lewis BA, Guttman HJ, Record MT, Jr. (1991) Characterization of the cytoplasm of Escherichia coli K-12 as a function of external osmolarity. Implications for protein-DNA interactions in vivo. Journal of molecular biology 222: 281-300

      Dersch S, Reimold C, Stoll J, Breddermann H, Heimerl T, Defeu Soufo HJ, Graumann PL (2020) Polymerization of Bacillus subtilis MreB on a lipid membrane reveals lateral co-polymerization of MreB paralogs and strong effects of cations on filament formation. BMC Mol Cell Biol 21: 76

      Eisenstadt E (1972) Potassium content during growth and sporulation in Bacillus subtilis. Journal of bacteriology 112: 264-267

      Epstein W, Schultz SG (1965) Cation Transport in Escherichia coli: V. Regulation of cation content. J Gen Physiol 49: 221-234

      Esue O, Wirtz D, Tseng Y (2006) GTPase activity, structure, and mechanical properties of filaments assembled from bacterial cytoskeleton protein MreB. Journal of bacteriology 188: 968-976

      Gaballah A, Kloeckner A, Otten C, Sahl HG, Henrichfreise B (2011) Functional analysis of the cytoskeleton protein MreB from Chlamydophila pneumoniae. PloS one 6: e25129

      Harne S, Duret S, Pande V, Bapat M, Beven L, Gayathri P (2020) MreB5 Is a Determinant of Rod-to-Helical Transition in the Cell-Wall-less Bacterium Spiroplasma. Curr Biol 30: 4753-4762 e4757

      Kang H, Bradley MJ, McCullough BR, Pierre A, Grintsevich EE, Reisler E, De La Cruz EM (2012) Identification of cation-binding sites on actin that drive polymerization and modulate bending stiffness. Proceedings of the National Academy of Sciences of the United States of America 109: 16923-16927

      Lacabanne D, Wiegand T, Wili N, Kozlova MI, Cadalbert R, Klose D, Mulkidjanian AY, Meier BH, Bockmann A (2020) ATP Analogues for Structural Investigations: Case Studies of a DnaB Helicase and an ABC Transporter. Molecules 25

      Mannherz HG, Brehme H, Lamp U (1975) Depolymerisation of F-actin to G-actin and its repolymerisation in the presence of analogs of adenosine triphosphate. Eur J Biochem 60: 109-116

      Mayer JA, Amann KJ (2009) Assembly properties of the Bacillus subtilis actin, MreB. Cell motility and the cytoskeleton 66: 109-118

      Nurse P, Marians KJ (2013) Purification and characterization of Escherichia coli MreB protein. The Journal of biological chemistry 288: 3469-3475

      Pande V, Mitra N, Bagde SR, Srinivasan R, Gayathri P (2022) Filament organization of the bacterial actin MreB is dependent on the nucleotide state. The Journal of cell biology 221

      Peck ML, Herschlag D (2003) Adenosine 5 '-O-(3-thio)triphosphate (ATP-gamma S) is a substrate for the nucleotide hydrolysis and RNA unwinding activities of eukaryotic translation initiation factor eIF4A. Rna 9: 1180-1187

      Popp D, Narita A, Maeda K, Fujisawa T, Ghoshdastider U, Iwasa M, Maeda Y, Robinson RC (2010) Filament structure, organization, and dynamics in MreB sheets. The Journal of biological chemistry 285: 15858-15865

      Rhoads DB, Waters FB, Epstein W (1976) Cation transport in Escherichia coli. VIII. Potassium transport mutants. J Gen Physiol 67: 325-341

      Rodriguez-Navarro A (2000) Potassium transport in fungi and plants. Biochimica et biophysica acta 1469: 1-30

      Salje J, van den Ent F, de Boer P, Lowe J (2011) Direct membrane binding by bacterial actin MreB. Molecular cell 43: 478-487

      Schmidt-Nielsen B (1975) Comparative physiology of cellular ion and volume regulation. J Exp Zool 194: 207-219

      Szatmari D, Sarkany P, Kocsis B, Nagy T, Miseta A, Barko S, Longauer B, Robinson RC, Nyitrai M (2020) Intracellular ion concentrations and cation-dependent remodelling of bacterial MreB assemblies. Sci Rep-Uk 10

      van den Ent F, Izore T, Bharat TA, Johnson CM, Lowe J (2014) Bacterial actin MreB forms antiparallel double filaments. eLife 3: e02634

      Whatmore AM, Chudek JA, Reed RH (1990) The Effects of Osmotic Upshock on the Intracellular Solute Pools of Bacillus subtilis. Journal of general microbiology 136: 2527-2535

    1. Author Response

      Reviewer #2 (Public Review):

      The authors present findings on a designed peptide, PITCR, and its role in inhibiting TCR activation through an extensive series of experiments. These include the measurement of phosphorylation in the TCR zeta chain and a number of associated signaling proteins such as Zap70, LAT, PLCg1, and SLP76. In addition, the authors measure the impact of PITCR on the TCR intracellular calcium response and examine the peptide-induced inhibition of TCR activation by antigen-presenting cells. They also present data indicating that the fluorescently labeled PITCR co-localizes with TCR in Jurkat cells and with ligand-bound TCR in primary murine cells. Overall the experiments provide useful insights into the mechanism of T cell activation and generally support an allosteric model of activation, while not necessarily excluding alternative models.

      However, some aspects of the study do need clarification.

      1) The authors do not provide a clear structural basis for their peptide design, which makes it difficult to understand the rationale for choosing this particular peptide. The use of a structural model based on the TCR zeta domain, for example, and how it becomes modified to generate PITCR would provide some clarity on what types of putative interactions are being engineered.

      We thank the reviewer for giving us a chance to elaborate. We have expanded the results section to provide more information on the peptide design, where we now point out that the acidic residues in the TCR TM allow peptide design. We have also applied the artificial intelligence program AlphaFold-Multimer (AlFoM) to generate a structural model of the docking site of PITCR in the TCR (Figure 9), which informs on new mechanistic insights, as we describe in the updated results section and discuss below.

      2) The inhibitory effects of PITCR are not large. Measurement of dose dependence might improve confidence in the results.

      As the reviewer points out, we have performed an extensive set of experiments to assess the inhibitory effect of PITCR. We have demonstrated that PITCR inhibits TCR phosphorylation. We have also tested all proximal signaling proteins: Zap70, LAT, SLP76, and PLC gamma. Critically, in all cases a statistically significant inhibition is observed. Furthermore, inhibition was additionally seen when TCR was activated by peptide presentation in antigen-presenting cells. Interaction between PITCR and the receptor is supported by co-localization, co-IP and the new AlphaFold-Multimer prediction. We are therefore confident in the results presented and that the inhibitory effect indeed exists. As we responded to reviewer 1 above, we discuss that inconsistent results were obtained with lower PITCR concentrations, suggesting that the use of a high peptide concentration is required for robust inhibition.

      3) Use of control peptides is not uniform. Control peptides similar to PITCR in Figure 1 and Figure 2 studies, for example, could strengthen the authors' arguments.

      The original version of the manuscript contained two negative control peptides: the G41P mutant of PITCR, and pHLIP, another pH-responsive peptide which behaves as a conditional transmembrane peptide. For feasibility reasons, we did not use all the negative controls in all the different experiments, as we were satisfied when a negative control peptide acted as such in an experiment. However, because we agree that increased use of negative control peptides will strengthen the manuscript, we have expanded their use. Specifically, the updated version of the manuscript contains a new section where AlFoM is used to predict the binding pose of PITCR and the structural consequences of interaction (see Figure 9 and the four new supplementary figures). AlFoM showed that PITCR binds with a large interaction interface, and peptide binding causes a large rearrangement of the two zeta chains in TCR. Importantly, neither of the two original negative control peptides (PITCRG41P or pHLIP) impacts the zeta chains. When we used a new negative control, the conditional transmembrane peptide TYPE7 developed by us, AlFoM did not predict it to bind to TCR, as expected, strengthening our argument.

      Reviewer #3 (Public Review):

      The use of pH-responsive TM-targeting peptides, which the authors previously developed, is a novel aspect of this study. Those peptides can be quite powerful for understanding molecular mechanisms of receptor signaling, such as the allosteric activation model as tested in this study. The manuscript contains several interesting approaches and observations, but there are concerns about the experimental design and interpretation of the results. More importantly, the authors' primary conclusion that the allosteric changes in the TM bundles determine TCR activation is not fully supported by the data presented. For example:

      1) The authors provided confocal fluorescence images showing the colocalization of fluorescently labeled peptides and TCR subunits. Based on the data, they concluded that "PITCR is able to bind to TCR". This is misleading, because given the spatial resolution of the imaging technique, "colocalization" does not indicate binding or interaction between molecules. Because the peptide binding to the TM region is the pillar of the primary finding of this study, direct evidence supporting the peptide-TM binding or interaction is essential.

      We have to disagree that our statement is misleading: the section of the manuscript that the reviewer referred to said “suggesting that PITCR is able to bind to TCR before it is activated by OKT3”. Therefore, we were not drawing a conclusion, merely making a suggestion that we consider justified, particularly as it is supported by data presented later. Nevertheless, we certainly agree with the reviewer that co-localization experiments fundamentally cannot indicate binding. We have modified the results (page 11) to follow the suggestion of the reviewer and indicate that co-localization data are not proof of interaction. In addition, we provide new AlphaFold-Multimer data, which support that transmembrane binding indeed occurs.

      2) In calcium response experiments, the authors compared calcium influx (indicated by Indo-1 ratio) under different cell activation conditions (Figure 2). There are some concerns about how the authors interpreted the data: (1) The calcium plots from OKT3 activation in A-C panels are inconsistent. The plot in (A) showed a calcium peak after activation, which is not present in the plots shown in (B) and (C). There is no explanation or discussion on this inconsistency. (2) What is more concerning is that this prominent calcium peak in (A) was used to draw the conclusion that the designer peptide inhibitor effectively reduces calcium response. However, inconsistent with that conclusion, the calcium plots are indistinguishable for the three conditions: with PITCR (peptide inhibitor), with PITCRG41P (negative control that should not affect TCR activation), or no peptide. All three plots have similar magnitude and fluctuations. This does not support the authors' conclusion that the PITCR (peptide inhibitor) reduces calcium response in T cells.

      We thank the reviewer for this comment. We have updated figure 3, which now contains a different replicate of the calcium assay that we think is more straightforward to analyze and more clearly shows the calcium inhibition, as quantified in panel D of the figure.

      3) Different types of T cells were used for separate measurements: E6-1 Jurkat T cells were used for calcium influx experiments, J. OT.hCD8+ Jurkat cells were used for CD69 measurements, and primary murine CD4+ T cells were used for colocalization imaging experiments. Rationales for the choices of cells in different measurements are also unclear. This is different from the common practice where different cell types are used in repeated experiments to test the generality of a finding. Here, they were used for different experiments, and findings were lumped together as "T cells", without further evidence/discussion on how translatable the findings from different cell types are.

      As the reviewer suggests, we have updated the manuscript to include discussion on the particularities of the use of the different T cells in pages 18 and 19. We envisioned this work as a proof of principle for the design of a peptide that can eventually be modified to be used for pre-clinical applications, and this paper is a first step. With this idea in mind, we wanted to test if this peptide can work in different types of TCR since: (1) TCR populations are diverse; and (2) our design is based on the transmembrane domain of CD3zeta chain, which is largely conserved among species. Using different types of T cells met this goal since they have different types of TCR, but the transmembrane domain of CD3zeta is conserved. In our paper, we used human Jurkat-TCR, OT1-TCR coupled with hCD8, and murine CD4-TCR. In addition, we used not one but three activation markers to test the peptide’s inhibitory effect: phosphorylation, calcium influx, and CD69 activation. For the co-localization experiment, we used not only murine CD4 T cells but also Jurkat T cells, with and without OKT3 stimulation.

      We selected these T cells because, based on published reports, they were particularly suited for the breadth of different measurements that this manuscript contains. In our opinion this approach broadens the relevance of the work.

      4) The authors set out to test the model that TCR activation by pMHC occurs through allosteric changes in the TM region, but in most experiments, they activated Jurkat T cells by anti-CD3 antibody, not by antigen peptides. The anti-CD3 antibody activates TCR signaling through clustering. It is unclear whether TCR activation by anti-CD3 leads to the same allosteric changes in the TM region as activation by pMHC. As such, the main claim of the paper, namely that the designer peptide affects TCR signaling by disrupting the allosteric changes in the TM region, remains insufficiently supported by the data presented.

      Figure 8 shows that the levels of co-IP in the presence of detergent are altered by OKT3 activation of TCR. It has recently been established (PMID: 34260912) that this assay allows the investigation of allosteric changes that contribute to activation of TCR. This evidence is supportive of allosterism in TCR activation. Additionally, the TCR proximal signaling is conserved between the Jurkat T cells activated by OKT3 and TCR activated by pMHC. We can reasonably argue that the peptide acts similarly in both conditions, since the peptide also exerts an inhibitory effect in T cells activated by antigen-presenting cells (Figure 4). The newly presented AlFoM model (Figure 9) predicts that PITCR binding displaces a zeta chain in TCR. This new result provides a plausible molecular rationale for the results in Figure 8, where we observe that PITCR changes transmembrane compactness, which has been linked to allosteric activation (Lanz et al., 2021; Prakaash et al., 2021).

    1. Author Response

      eLife assessment

      This useful paper examines changes (or lack thereof) in birds' fear response to humans as a result of COVID-19 lockdowns. The evidence supporting the primary conclusion is currently inadequate, because the model used does not properly account for many potentially confounding factors that could influence the study's outcomes. If the analytic approach were improved, the findings would be of interest to urban ecologists, behavioral biologists and ecologists, and researchers interested in understanding the effects of COVID-19 lockdowns on animals.

      Many thanks for these supportive words. We did our best to improve our manuscript according to the reviewers' and editor's comments. Importantly, we regret being unclear in the Methods, as our models already controlled for most of the confounds discussed by the reviewers (see below).

      For example, given that a single observer collected the data at most sites, including site as a random intercept in the models also controls for observer effects (which is one of the reasons why site is in the model). We added details to the Methods (L352-356, see also “Statistical analyses” in the main text).

      The first reviewer asked us to use “some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here”. Our main results are now based on country-specific models and hence, the use of a single value predictor for each city is not appropriate. Please, see also below.

      The second reviewer is concerned about multicollinearity in our models because of the 0.95 correlation between Period and Stringency Index. However, these key predictor variables of interest were never used as predictors within the same model. We now clearly explain this in the Methods (L458-538, 548-550) and within the legend of Figure S2.

      The third reviewer suggested that our models would benefit from controlling for day in the species-specific breeding cycle. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained most of the variance (see Table S1-S2) – something that could have been expected. In other words, we do control for what the third reviewer asked for. Similarly, we account for habitat features that may influence escape distance by including site in the models. Site usually refers to a specific park (we assume that within-park heterogeneity is lower than between park variation) and hence partly addresses the reviewer’s concern. Again, we highlight this within the Methods (L466-476).

      Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance). Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends.

      Thank you very much for your positive feedback.

      However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      Thank you. We did our best in addressing your comments (see below and updated Methods, Results and Discussion sections).

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g. would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e. we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility including attempting to validate the effect on human mobility using other datasets (e.g. the google dataset and/or those discussed in (Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      Thank you for this comment. First, the Oxford Stringency Index seemed to us the best available index for our purposes, i.e. to estimate people's mobility during the shutdown, because restrictions surely influenced the possibility that people would be outside, and because the index is a country-specific estimate. In addition, we have now checked all indices mentioned in Noi et al. 2022 and found only the Google Mobility Reports useful, which we now use, because the dataset (a) is publicly available, (b) is also available for territories outside the US, and (c) provides data for each city included in our dataset as well as for urban parks, where most of our data were collected. Note that some platforms no longer provide their mobility data (e.g. Apple).

      However, Google Mobility provides day-to-day variation in human mobility, whereas we are interested in overall increase/decrease in human mobility. Nevertheless, we correlated the Google mobility index with the Stringency index and found that human mobility generally decreases with the strength of the anti-pandemic measures adopted in sampled countries (albeit the effect for some countries, e.g. Poland, is small; Fig. 5).
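
      As a rough illustration of the kind of check described above (relating a mobility index to the Stringency Index), the following minimal sketch computes a plain Pearson correlation. It is not the analysis reported in the manuscript: the function name and the five paired values are hypothetical and chosen only to show a negative association.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values, NOT data from the study:
# stringency index (0-100) and % change in park mobility relative to a baseline.
stringency = [20.0, 35.0, 50.0, 65.0, 80.0]
mobility = [5.0, -2.0, -10.0, -18.0, -30.0]

r = pearson_r(stringency, mobility)
print(f"r = {r:.2f}")  # negative: mobility falls as stringency rises
```

      A single pooled coefficient like this ignores country and day structure, which is why a per-country view (as in Fig. 5) is more informative.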

      Moreover, we also added an analysis using the # of humans recorded directly in the field during escape trials (e.g. Fig. 6 and S6) and found that the link between # of humans and the Stringency Index or Google Mobility was weak and noisy, with 95% CIs widely crossing zero (Fig. 6).

      Importantly, if we use Google Mobility and # of humans, respectively, as predictors of escape distance, the results are qualitatively very similar to those based on the Oxford Stringency Index (Fig. S6) or Period, with tiny effect sizes for both (95% CIs: Google Mobility -0.3 to 0.06, Table S5; # of humans -0.12 to 0.02, Table S6), supporting our previous conclusions.

      Note that Google Mobility and the number of humans have their limitations (see our comment to the editor and the Methods section in the main manuscript, e.g. L418-433). The lack of Google Mobility data for years before the COVID-19 pandemic does not allow us to fully explore whether overall human activity decreased during COVID-19 or not (our test for period prior to and during COVID-19). If the year 2022 reflects a return to “normal” (which is disputable given the COVID-19-driven rise in home office use), the years 2020 and 2021 had on average lower levels of human activity (Fig. 4). Whether such a difference is biologically meaningful to birds is unclear given the immense day-to-day change in human mobility and presence (Fig. 4). Moreover, the number of humans captures within- and between-day variation rather than long-term changes in human presence.

      We added details on the new analysis to the Methods and Results sections (e.g. Fig. 4-6; L142-165, 418-438, 495-535) and the Supplementary Information (Figs. S5-S9 and associated Tables) and discuss the issue accordingly. Moreover, to enhance clarity about country-specific effects (or their lack), we also added country-specific estimates to the Results (Fig. 1 and Fig. S6 and respective Tables). Finally, our statistical design and the random structure of the model allowed us to control for spatial and temporal variation in compliance with government restrictions.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      Thank you for these comments. We incorporated them in the main text (L293-329). Your point 1) corresponds to our point (i): “Most urban bird species in our sample may be relatively inflexible in their escape responses because the species may be already adapted to human presence” (L293-306); your point 2) to our point (ii): “Urban environment might filter for bold individuals (Carrete and Tella, 2013, 2010; Sprau and Dingemanse, 2017). Thus, the lack of consistent change in escape behaviour of urban birds during the COVID-19 shutdowns may indicate an absence (or low influx) of generally shy, less tolerant individuals and species from rural or less disturbed areas into the cities…” (L307-314); your point 3) to our point (iii): “Urban birds might have been already habituated to or tolerant of variation in human presence, irrespective of the potential changes in human activity patterns” (L315-329). To distinguish between (ii) and (iii) or the two from (i), individually-marked birds and comprehensive genetic analyses are needed, which we now note in the Discussion (L330-348). Importantly, we also discuss that the lack of response might be due to relatively small changes in human activity (L253-292), which we unfortunately could not fully quantify.

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species.

      Thank you for this point. We address this point in the Introduction and Discussion (L92-101, 307-314). Rural bird populations/individuals are on average less tolerant of humans than urban birds (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146) and at the same time, bird individuals seem consistent in their escape responses (Carrete & Tella 2010, Biol Lett 23:167–170; Carrete & Tella 2013, Sci Rep 3:1–7).

      Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignore others that do. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021).

      We added the papers (L89-91). Thank you!

      There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that did observe changes in space use or abundance - i.e. changes in space use could arise precisely because responses to humans are non-plastic but the distribution and activities of humans changed.

      Thank you. Indeed, we now address this in the Discussion (L303-306): “However, some studies reported changes in the space use by wildlife (Schrimpf et al., 2021; Warrington et al., 2022; Wilmers et al., 2021), and these could arise, as our results indicate, from fixed and non-plastic animal responses to humans who changed their activities”.

      To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      Unfortunately, we have not measured changes in avian abundance or movements. But please note that the change in human mobility in the sampled cities might not be as dramatic as initially thought, and we consider this scenario the most plausible explanation for the lack of significant differences in avian escape responses before and during the COVID-19 shutdowns (see Fig. 4). Nevertheless, we added your point to the Discussion: if our findings imply that in birds the reaction norm to human presence is fixed over the studied temporal scale, the putative changes in human presence might then imply changes in avian abundance or movement (L293 and text below it).

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g. remotely sensed variable would provide additional context here.

      Urbanity mirrors the long-term level of human presence in cities, whereas we were interested mainly in the rather short-term effects of potential changes in human presence on bird behaviour. Thus, we are not sure how adding such a variable would help elucidate the current results. Please also note that we added the country-specific analysis. Site indeed accounted for a considerable amount of the total variance in escape distance, and that is why it was included as a random intercept, which controls for the non-independence of data points from each city. This could partly help us to control for differences in habitat type (e.g. urbanization level) within cities.

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      We believe the initial Fig S4, now Figure 2, addresses this point. The between years temporal variation in FIDs exceeds the variation due to lockdowns. This is true both for measures taken in consecutive years, as well as for measures taken far apart.

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.

      We agree and have added this point (L301-303). Thank you.

      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g. nest provisioning, etc.

      Please, note that we controlled for the date in our analyses. Date was used as a proxy for the progress in the breeding season (L463-464 and Fig. 1 caption). Note that we collected data only from foraging or resting individuals, and data were neither collected near the nest sites nor from individuals showing warning behaviours, which we now note (L400-401).

      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.

      We discussed this at L307-308 and 381-383. Escape behaviors from humans are highly consistent for individuals, populations, and species (Carrete & Tella 2010, Biol Lett 23:167–170; Díaz et al. 2013, PloS One 8:e64634; Mikula et al. 2023, Nat Commun 14:2146). Whether such behavior is consistent across contexts is less clear (e.g. Diamant et al. 2023, Proc Royal Soc B, in press; but see, e.g. Radkovic et al. 2019, J Ecotourism 18:100-106; Gnanapragasam et al. 2021, Am Nat 198:653-659). Escape distance is often not measured simultaneously with, for example, human presence. In other words, whereas the general level of human presence may have no effect on escape distance, the day-to-day or hour-to-hour variations might. We need studies on fine temporal scales (day-to-day or hour-to-hour) using marked individuals to elucidate this phenomenon.

      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      We agree. We now also use Google Mobility Reports and # of humans data to elucidate this phenomenon, and have added such interpretations to L253-292 and, e.g., Fig. 4.

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

      Reviewer #2 (Public Review):

      Mikula et al. have a large experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They profited from the exceptional conditions of social distance and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on evolutionary and ecological rationale used by the authors both in their hypotheses and results interpretation. I will explain my criticisms point by point:

      We are grateful for your positive appraisal of our manuscript, as well as for your helpful critical comments. We toned down the discussion to claim, as suggested by you, that we did not find evidence for effects of covid-19-shutdowns on escape behaviour of birds in urban settings (see Results and Discussion sections). In general, we attempted to provide a more nuanced discussion and reporting of our findings. We also changed the manuscript title to “Urban birds' tolerance towards humans was largely unaffected by the COVID-19 shutdowns” and added validation using Google Mobility Reports (Fig. 5 & S6, Table S3a and S5) and the actual number of humans (Fig. 6 and S6; Table S3b-e and S6). Note however that there is only a single robust study on the topic of shutdown and animal escape distances (Diamant et al. 2023, Proc Royal Soc B, in press), i.e. the topic is largely unexplored (e.g. L99-101), whereas we discuss our finding in light of shutdown influences on other behaviours (L293-329).

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:

      We tried to clarify our reasoning and increased consistency in our claims in the Introduction. Please, note that we simplified the Introduction and now provide one main expectation: FIDs of urban birds should increase with decreased human presence. This pattern is robustly empirically documented, regardless of the mechanism involved (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146). Please, see our revised Discussion for a more comprehensive discussion of mechanisms which could explain the patterns described in our study.

      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us, when we approach them... Furthermore, they escape usually 5 to 20 m away. This is more distance that would be necessary just to be not trampled.

      We agree and have deleted mentions that humans are perceived as harmless.

      1.2) If we are harmless, why birds should spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g. raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.

      Thank you for this comment. We deleted this prediction and largely rewrote Introduction based on your comments and comments from the other reviewers.

      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112).

      We have now attempted to write this more clearly while incorporating your suggestions. In the Discussion, we now propose various hypotheses that can, but need not, be mutually exclusive. Please note that we simplified the Introduction and now provide one main hypothesis: FIDs of urban birds should increase with decreased human presence.

      In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior would not be plastic, I would not expect date or hour effects. By including them in their models, the authors are accepting implicitly some degree of plasticity.

      We regret being unclear. We do accept some degree of plasticity. Yet, our study design prohibits the assessment of the degree of individual plasticity because sampled birds were not individually marked and approached repeatedly. We tried to soften the statements in our Discussion to not fully dismiss a possibility that urban birds have some degree of plasticity in their antipredator behaviour (L293-329). Note however, that while our data collection was not designed to test how hour-to-hour changes in human numbers influence escape distance, the effect of the number of humans (i.e. hour-to-hour variation in human numbers) in our sample was tiny.

      The date and hour effect simply control for the particularities of the given day and hour (e.g. warm vs cold times or the time until sunset). In other words, the within species differences (even from the same park) may have little to do with individual plasticity, but instead may reflect between individual differences. We now add this issue to Methods (L471-476): “This approach enabled us to control for spatial and temporal heterogeneity and specificity in escape behaviour of birds (e.g. species-specific responses, changes in escape distances with the progress in the breeding season, spatial and temporal variation in compliance with government restrictions or particularities of the given day and hour)....”

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it.

      At L138-141 and 327-329 we discussed the within and between genera and cross-country variation and stochasticity in response to the shutdowns (Fig. 2). The reference to species-specific plots was perhaps a little bit misleading. We think that the essential figure, that we now reference at this point, is Figure 2 that shows the temporal trends and/or stochasticity that seem to have little in common with lockdowns. Please, also look at Figure 3 and S3-S4. These show that in all selected genera/species, the trends did not significantly deviate from central regression line which indicates no change in FID before and during the COVID-19 shutdowns.

      On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e. Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      We are grateful for this valuable comment. We believe the general approach makes sense as there is a general expectation about how birds should respond to changes in human presence. That is why we control for non-independence of data points in our sample. Thus, although lots of data come from a few European species, this is corrected for by the model. Note that given the sheer number of sampled species, some site- or species-specific trends may have occurred by chance. Importantly, we believe that Figure 2, with species-site specific temporal trends, reveals that the between-year stochasticity in escape distances seems greater than any effects of lockdowns. Nevertheless, we have further dealt with this issue in the revised manuscript by running country-specific models, which again clearly showed no significant effect of Period on escape behaviour of birds (including no effects in Poland and Finland).

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:

      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.

      3.2) Prague baseline data was for 2014 and 2018, while for the rest of the study sites were for 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.

      We are slightly confused by these comments.

      3.1) The cities are expected to be different but (i) the difference may be smaller than imagined (e.g. park structures, managed grass cover, few shrubs and deciduous-dominated tree species) and (ii) we expect the effects of lockdowns to be similar across cities. Whether we have no people in Rovaniemi parks (which despite Rovaniemi’s small size are usually extremely well-visited) or no people in Melbourne parks should not make a difference in principle. Note, however, that to avoid overconfident conclusions, we allow for different reaction norms within cities. Please also note that we are now providing country-specific results, which should identify whether shutdowns led to different reactions in the sampled countries. We found no strong effect of shutdowns in any of the sampled countries/cities.

      3.2) Because of the possible between-site differences at the starting point, we use study site as a random intercept and control for the between-site reaction norms by including the random slope of the period. In other words, such possible differences do not influence the outcomes of our models. Regardless, our a priori expectation is that the human activity levels in a given park were similar prior to COVID-19 and hence in 2014, 2018, and 2019. Again, we are now providing country-specific results, which identify whether shutdowns led to different reactions in the sampled countries, which they mostly did not.
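
      The intuition behind this point (site-specific baselines and site-specific responses absorbing differences in the starting point) can be shown with a small simulation. This is only a sketch under stated assumptions, not the mixed model used in the manuscript: it replaces model fitting with a simple within-site before/during comparison, and all names, distributions, and parameter values are hypothetical.

```python
import random


def simulate_sites(n_sites=200, period_effect=2.0, seed=1):
    """Simulate one mean escape distance per site, before and during a shutdown.

    Each site has its own baseline (analogous to a random intercept) and its own
    deviation from the shared period effect (analogous to a random slope).
    All values are hypothetical and in arbitrary distance units.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        baseline = rng.gauss(10.0, 3.0)    # site-specific starting point
        slope_dev = rng.gauss(0.0, 0.5)    # site-specific response deviation
        before = baseline + rng.gauss(0.0, 0.2)
        during = baseline + period_effect + slope_dev + rng.gauss(0.0, 0.2)
        sites.append((before, during))
    return sites


def mean_within_site_change(sites):
    """Average of (during - before) across sites; the site baselines cancel out."""
    return sum(during - before for before, during in sites) / len(sites)


sites = simulate_sites()
baselines = [before for before, _ in sites]
print("spread of site baselines:", round(max(baselines) - min(baselines), 1))
print("mean within-site change:", round(mean_within_site_change(sites), 2))
```

      Although the simulated sites start from very different baselines, the within-site change stays close to the shared period effect, which is the property the site-level random terms are meant to provide.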

      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds faced already too many months of reduced human disturbances, while European birds were sampled just at the beginning of the lockdown.

      We agree that each city or even park within the city has its specific environmental conditions (here including the time point of lockdown). That is why we control for city and park location in the random structure of the model (see the Methods section). We now add results per country that show no clear differences (e.g. Fig. 1).

      However, the aim of our study was to test for general, global effects of lockdowns, which are minimal. Note that we now specifically test for country-specific effects in separate models on each country (e.g. Fig. 1, Fig S6) but all country-specific effects are small and still centre around zero.

      3.4) Some cities were sampled by a single observer, while others by many of them. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      We agree. In Finland and Hungary, data were collected by two closely cooperating observers. In Poland, all data were collected by a single observer. In the Czech Republic and Australia, a single observer (P.M. and M.W., respectively) sampled 46 sites out of 56 and 32 sites out of 37, respectively. Each site was sampled by the same observer both before and during the shutdowns. We now clearly state this in the Methods (L352-356). In other words, our models already largely control for the possible observer confound by having site as a random intercept. Moreover, a previous study showed that FID estimates do not vary significantly between trained observers (Guay et al. 2013, Wildlife Research, 40, 289-293).

      4) Although I liked the stringency index as a variable, I am not sure if it captured effectively the actual human activity every day. Even if restrictive measures were similar between countries, their actual accomplishment greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as google mobility reports.

      Thank you for this comment. We now validate the use of the stringency index with the Google Mobility Reports, showing that human mobility generally (albeit in some countries relatively weakly) decreases with the strength of governmental antipandemic measures. Please note that our main research question is related to the general change in human outdoor activity and not to the week-to-week, day-to-day or hour-to-hour changes captured by the stringency index, Google Mobility or the number of humans during an escape trial. Nevertheless, using Google Mobility and the number of humans as predictors led to results similar to those for the stringency index and Period (Fig. 1 and S6). Please see the extended discussion on this topic in our manuscript (L270-292).

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      We have now added the information that most birds (79%) were sampled when on the ground. Importantly, previous studies have found that perch height has a minimal effect on FIDs (e.g. Bjørvik et al. 2015, J Ornithol 156:239–246; Kalb et al. 2019, Ethology 125:430–438; Ncube & Tarakini 2022, Afr J Ecol 60:533–543; Sreekar et al. 2015, Tropic Conserv Sci 8:505–512). We added this information to the Methods section (L394-395).

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      We appreciate your concern. We cannot fully exclude the possibility of sampling some individuals twice. However, we sampled during the breeding season, within which most birds are territorial and active in the areas around their nests, and hence an individual switching parks is unlikely. Also, most sampled birds in our study are passerines, which have small territories (typically a few hundred square meters). Some larger birds may have larger territories and move larger distances to forage (e.g. kestrels, which often forage outside cities), but these birds represent a minority of our records and we have not sampled outside the cities.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      There were 141 unique day-years before COVID and 161 during COVID. So, the sampling effort as calculated by days does not explain the difference in species numbers. Whether the actual effort, which was 381 vs 463 h of sampling, explains the difference is unclear, which we now note in the Methods (L476-483). If not, your proposition is possible, but we would like to avoid any speculation on this topic in the manuscript, as it is difficult to infer species diversity from FID sampling.
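
      To make the comparison concrete, the relative increases implied by the figures quoted above can be computed with the short sketch below. The helper name `pct_increase` is hypothetical; the inputs are the numbers stated in this response and in the reviewer's comment.

```python
def pct_increase(before: float, during: float) -> float:
    """Percentage increase from `before` to `during`."""
    return (during - before) / before * 100.0


# Figures quoted in the response and the reviewer's comment.
effort_days = pct_increase(141, 161)   # unique day-years of sampling
effort_hours = pct_increase(381, 463)  # hours of sampling
species = pct_increase(68, 135)        # number of species recorded

print(f"day-years: +{effort_days:.1f}%")
print(f"hours:     +{effort_hours:.1f}%")
print(f"species:   +{species:.1f}%")
```

      Effort rose by roughly 14% in day-years and 22% in hours, whereas the species count nearly doubled, which is why effort alone may not account for the difference.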

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      We are confused by this comment and think it reflects a misunderstanding. Period and stringency index are explanatory variables of interest that were never included in the same model, and hence their correlation does not contribute to within-model multicollinearity. To avoid further confusion, we note this in the legend of Fig. S2. However, we must be cautious when interpreting the results from the models on period, Google Mobility, # of humans and stringency index, as the four measures are similar.

      We discuss multicollinearity of explanatory variables within the manuscript (L458-538, 548-550) and noted that, with the exception of temperature and day within the breeding season (r = 0.48), the correlations among explanatory variables were minimal. We thus used only temperature as an explanatory variable (i.e. fixed factor; also because temperature reflects both season and variation in temperature across a season) whereas the day was included as a random intercept to control for pseudoreplication within day. Collinearity between all other predictors was low (|r| <0.36).
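
      A screening of this kind can be sketched as follows: compute pairwise Pearson correlations among the candidate predictors and flag pairs whose absolute correlation exceeds a chosen cutoff. This is a minimal illustration rather than the analysis code used for the manuscript; the function names, the toy data, and the 0.4 cutoff are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(predictors, threshold):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy data (hypothetical): temperature rises with day of season,
# flock size is unrelated to both.
toy = {
    "temperature": [5.0, 8.0, 12.0, 15.0, 19.0, 22.0],
    "day": [1, 10, 20, 30, 40, 50],
    "flock_size": [3, 1, 4, 1, 5, 2],
}
flagged = flag_collinear(toy, threshold=0.4)  # cutoff is illustrative only
for name_a, name_b, r in flagged:
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```

      In this toy example only the temperature and day pair is flagged, mirroring the situation described above where one of the two is kept as a fixed effect and the other is moved to the random structure.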

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.

      The random structure of the model controls for possible pseudoreplication in the data, that is for the cases where we have multiple data points that may not be independent and hence technically represent one. Apart from that, random structure tells us about where the variance in the data lies. This is often of interest and your previous questions about city, site or species specificities can be answered with the random part of the model. To follow up on your example, year is included in the model because data from a single year are not independent (for example because of delayed breeding season in one year vs. in another).

      We regret being unclear about the model specification and have attempted to clarify the methods (L466-476). We first specified a model with an ideal random structure that necessarily was complex (perhaps too complex). We then showed that using models with simpler random structures did not influence the outcomes. We now use a simpler model within the main text, but do keep the alternative models to show that the results are not dependent on the random structure of the model (Fig. S1 and Table S2).

      Reviewer #3 (Public Review):

      This study examined the changes in fear response, as measured by the flight initiation distances (FID), of birds living in urban areas. The authors examined the FIDs of birds during the pandemic (COVID-19 lockdown restrictions) compared to FIDs measured before the pandemic (mostly in 2018 & 2019). The main study justification was that human presence changed drastically during the pandemic lockdowns and the change in human presence might have influenced the fear response of birds as a result of changing the "landscape of fear". Human presence was quantified using a 'stringency' index (government-mandated restrictions). Urban areas were selected from within five different cities, which included four European cities (Czech Republic - Prague, Finland - Rovaniemi, Hungary - Budapest, Poland - Poznan), and one city in the global south (Australia - Melbourne). Using 6369 flight initiation distances across 147 different bird species, the authors found that FIDs were not significantly different before the pandemic versus during the pandemic, nor was the variation in FID explained by the level of 'stringency'.

      Major strengths: There are several strengths to this study that allows for understanding the variety of factors that influence a bird's response to fear (measured as flight initiation distances). This study also demonstrates that FIDs are highly variable between species and regions.

      Specifically,

      1) One of the major strengths of this paper is the focus on birds living in urban areas, a habitat type that is hypothesized to have changed drastically in the 'landscape of fear' experienced by animals during the pandemic lockdown restrictions (due to the presumed decrease in human presence and densities). Maintaining the focus on urban birds allowed for a deeper examination of the effect of human behaviour changes on bird behaviour in urban habitats, which are at the interface of human-wildlife interactions.

      2) This study accounted for several variables that are predicted to influence flight initiation distances in birds including species, genus, region (country), variability between years, pandemic year (pre- versus during), the strictness of government-mandated lockdown measures, and ecological factors such as the human observer starting distance, flock size, species-specific body size, ambient air temperature (also a proxy of the timing during the breeding season), time of day, date of data collection (timing within the regional [Europe or Australia] breeding season), and categorization of urban site type (e.g. park, cemetery, city centre).

      3) This study examined FIDs in two years previous to the pandemic (mostly 2018 and 2019, one site was 2014) which would account for some of the within- and between-year FID variation exhibited prior to the pandemic.

      4) This study uses strong statistical approaches (mixed effect models) which allows for repeat sampling, and a post hoc analysis testing for a phylogenetic signal.

      Thank you for your supportive and positive comments.

      Major weaknesses: The authors used government 'stringency' as a proxy for human presence and densities, however, this may not have been an accurate measure of actual human presence at the study sites and during measurements of FIDs. Furthermore, although the authors accounted for many factors that are predicted to influence fear response and FIDs in birds, there are several other factors that may have contributed to the high level of variation and patterns in FIDS observed during this study, thus resulting in the authors' conclusion that FIDs did not vary between pre- and during pandemic years.

      Thank you for your suggestions. We agree. To capture the general human presence in parks, we have now incorporated an analysis using Google Mobility Reports (Fig S6b), which directly measure human mobility in each of the sampled cities and specifically in urban parks, where most of our data were collected, and we also address the further concerns that you detail below. Albeit not the main interest of our study, we have now also incorporated an analysis using the actual # of humans during an escape trial (Fig. S6c).

      Moreover, we think that including further possible confounds should not influence our conclusions. In other words, including further confounds will decrease the variance that can be explained by shutdowns and thus such shutdown effects (if any) would be tiny and hence likely not biologically meaningful.

      Specifically,

      1) The authors used "government stringency" as a measure of change in human activity, which makes the assumption that the higher the level of 'stringency', the fewer humans in urban areas where birds are living. However, the association between "stringency" and actual human presence at the study sites was not measured, nor was 'stringency' compared to other measures of human presence such as human mobility.

      Thank you for this essential comment. Initially, we viewed Oxford Stringency Index as the best available index for our purposes. However, we now further acknowledge its limitations (L) and validate the Oxford Stringency Index with the Google Mobility Reports data, showing that both indices are generally negatively (albeit sometimes weakly) correlated across sampled cities (i.e. human mobility decreases with the increasing stringency index). Although other human presence indices were used in the past, e.g. Cuebiq, Descartes Labs and Maryland Uni index, Apple (see Noi et al. 2022, Int J Geograph Info Sci, 36, 585-616), we used only the Google Mobility index because (a) it is publicly available, (b) is available also for territories outside US, and (c) provides data for urban parks within each city included in our dataset. Note however that Google Mobility data are inappropriate to answer our primary question, i.e. whether changes in human presence outdoors due to the COVID-19 shutdowns had any effect on avian tolerance towards humans. First, Google Mobility was available only for 2020-22, i.e. the baseline pre-COVID-19 data for 2018-2019 were unavailable. Thus, there was no way to check whether the human activity levels really changed during the COVID-19 years. Second, Google Mobility data are calculated as a change from 2020 January–February baseline for each day of the week for each city and its location (here we used parks). In other words, the data are not comparable between days and cities, albeit we attempted to correct for this within the random structure of the mixed model. Also, the data may be influenced by extreme events within the 2020 Jan–Feb baseline period (see here). Third, the Google Mobility varies greatly between days and across season (see Fig 4 & S5 or the first figure in these responses), likely more than the possible change due to shutdowns. 
Nevertheless, we found that results based on Google Mobility are qualitatively very similar to results based on the stringency index. Moreover, we showed that the relationships between # of humans and both Google Mobility and the stringency index (Figure 6) are weak and noisy, with 95% CIs widely overlapping zero (Table S3b-e). Also, similarly to other predictors of human presence, # of humans only poorly predicted changes in avian escape distances. We added details on the new analysis to the Methods, Results and Supplement (L134-165 and associated figures and tables, L415-535).
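
      The point that mobility values are not directly comparable between days follows from how a weekday-specific baseline works. The sketch below illustrates the idea with hypothetical visit counts: each day is expressed as a percentage change from the median of the same weekday within a baseline window. The function names, the counts, and the exact window dates are assumptions for illustration, not Google's implementation.

```python
from datetime import date, timedelta
from statistics import median


def weekday_baseline(visits, start, end):
    """Median visits per weekday (0 = Monday) within [start, end]."""
    by_weekday = {}
    for day, count in visits.items():
        if start <= day <= end:
            by_weekday.setdefault(day.weekday(), []).append(count)
    return {wd: median(vals) for wd, vals in by_weekday.items()}


def percent_change(visits, baseline):
    """Percentage change of each day from its own weekday baseline."""
    return {
        day: (count - baseline[day.weekday()]) / baseline[day.weekday()] * 100.0
        for day, count in visits.items()
        if day.weekday() in baseline
    }


# Hypothetical park visits: weekends are twice as busy as weekdays.
baseline_start, baseline_end = date(2020, 1, 3), date(2020, 2, 6)
visits = {}
day = baseline_start
while day <= baseline_end:
    visits[day] = 200 if day.weekday() >= 5 else 100
    day += timedelta(days=1)

# Two later days with the same raw number of visitors.
visits[date(2020, 4, 4)] = 150  # a Saturday
visits[date(2020, 4, 6)] = 150  # a Monday

baseline = weekday_baseline(visits, baseline_start, baseline_end)
change = percent_change(visits, baseline)
print(change[date(2020, 4, 4)], change[date(2020, 4, 6)])
```

      With identical raw counts, the Saturday is reported as a decrease and the Monday as an increase, because each is scaled against a different baseline. This is the reason a day-level grouping term is needed when such values are used as a predictor.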

      2) There was considerable variation in FID measurements, which can be seen in the figures, indicating that most of the variation in FID was not accounted for in the authors' models.

      We are confused by this statement. The fact that the FIDs varied does not directly imply that our models did not account for the variation. Nevertheless, we do control for most of the discussed confounds (see further answers below). Importantly, it is unclear how including further possible confounds should influence our conclusions, unless the lockdown effects are tiny, in which case they might not be biologically meaningful.

      Factors that may have contributed to variation in FIDs that were not accounted for in this study are as follows:

      a. The authors accounted for the date of data collection using the 'day' since the start of the general region's breeding season (Europe: Day 1 = 1 April; Australia: Day 1 = 15 August). Using 'day' since the breeding season started probably was an attempt to quantify the effect of the breeding stage (e.g. territory establishment, nest young, fledgling) on FIDs. However, breeding stages vary both within- and between species, as well as between sub-regions (e.g. Finland vs. Hungary). As different species respond to predation or human presence differently depending on the stage during their breeding cycle, more specificity in the breeding cycle stage may allow for explaining the observed variation and patterns in FID.

      We agree. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained a relatively high portion of the variance in our data (see Table S1 and S2), perhaps something you expected.

      b. Variation in species-specific FIDs may also vary with habitat features within urban sites, such as the proximity of trees and other protective structures (e.g. perches and cover), the openness of the area, and the level of stressors present (e.g. noise pollution, distance to roads). Perhaps accounting for this habitat heterogeneity would account for the FID variation measured in this study.

      We agree. We don’t have such fine-scale data, but we included site identity (typically within a particular park or cemetery) which should account for the habitat heterogeneity among localities. Depending on the model, site explained relatively little variance (1-6%), indicating low heterogeneity between localities in these undescribed characteristics. Also note that park structure may be quite similar both within and between cities, i.e. managed green grass areas, with only a few shrubs and deciduous trees. Therefore, the possible minor habitat heterogeneity should not have any great impacts on our results.

      c. The authors accounted for species and genus within their models, however, FIDs may vary with other species-specific (or even specific populations of a species) characteristics such as whether the species/population is neophobic versus neophilic, precocial versus altricial, and the level of behavioural plasticity exhibited. These variables were not accounted for in the analysis.

      We agree that FIDs can be correlated with many possible factors. Here, we were interested in general patterns, while controlling for FID differences between species, as well as for possible species-specific reaction norms to lockdowns. Whether neophobic vs neophilic populations or precocial versus altricial species react differently to lockdowns might be of interest, but it is beyond the scope of this study. However, note that population and population-specific reaction norms explain little variation (Table S2a, 0-6% of variation), so such a confound should not substantially influence our conclusions. We do not have fine-scale data on the level of neophobia, but the effects of lockdowns seem similar for precocial (see Anas, Larus, Cygnus) and altricial (the remaining, mostly passerine) species in our dataset (see Fig. 3 and S3-S4). Please note that we sampled mainly adults (L386). Moreover, the effects for clades, which may differ in their cognitive skills, are also similar (e.g. Corvids vs. Anas or Cygnus; Fig. 3).

      d. Three different methods of measuring the distances between flight and the observer location were used, and FIDs were only measured once per bird, such that there were no measures of repeatability for a test subject. Thus, variation surrounding the measurement of FIDs would have contributed to the variation in FIDs seen during this study.

      While all observers were trained, the three methods may add some noise to the FID estimates. However, the FID estimates from a single method may still slightly differ between observers (so do well standardized morphology measurements; Wang, et al. 2019, PLoS Biology, 17, e3000156). Importantly, FID estimates are highly replicable among skilled observers (Guay et al. 2013, Wildlife Research 40:289-293), and we previously validated this approach and showed that distance measured by counting steps did not differ from distance measured by a rangefinder (Mikula 2014, Ardea 102:53-60), which we now explicitly state (L391-394). Importantly, we control for observer bias by specifying locality as a random intercept (see further details in our response to the Editor). Moreover, each site was sampled by the same observer both before and during the shutdowns.

      3) The sample design of this study may have influenced the FID variability associated with specific species, and specific populations of species. A different number of species were sampled across the time periods of interest; 68 species were sampled before the pandemic versus 135 species after the pandemic. However, the authors do not appear to have directly compared the FIDs for the same species before the pandemic compared to during the pandemic (e.g. the FIDs of Eurasian blackbirds before the pandemic versus during the pandemic). Furthermore, within the same country-city, it is unclear whether the species observed before the pandemic were observed at the same location (e.g. same habitat type such as the same park) during the pandemic. As a species' FID response may be influenced by population characteristics and features specific to each site (e.g. habitat openness), these factors may have influenced the variability in FID measurements in this study.

      We regret being unclear in our methods. Our full model uses all data, but alternative models (see e.g. Fig. S1) used data with ≥5 as well as ≥10 observations before and during lockdowns for a given species. Importantly, Figure 2 and 3 depict data for species sampled at specific sites. We now clarify this within the Methods (L460-483) and the Results (L125-133 and associated figures) and in the figure legends (Fig. S1).
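
      The species-level filtering described above can be sketched as follows: keep only species that reach a minimum number of observations both before and during the lockdowns. This is an illustrative sketch under assumed names; the function, the period labels `"before"` and `"during"`, and the toy records are hypothetical.

```python
from collections import Counter


def species_with_min_obs(records, min_n):
    """Species with at least `min_n` observations in BOTH periods.

    `records` is an iterable of (species, period) pairs, where period
    is "before" or "during" (labels assumed for this sketch).
    """
    counts = Counter(records)
    species = {name for name, _ in counts}
    return sorted(
        name
        for name in species
        if counts[(name, "before")] >= min_n and counts[(name, "during")] >= min_n
    )


# Toy records (hypothetical counts).
records = (
    [("Turdus merula", "before")] * 6
    + [("Turdus merula", "during")] * 12
    + [("Pica pica", "before")] * 2
    + [("Pica pica", "during")] * 9
)

kept_5 = species_with_min_obs(records, 5)
kept_10 = species_with_min_obs(records, 10)
print(kept_5, kept_10)
```

      Requiring the threshold in both periods means a species that is well sampled only during the lockdowns is excluded, so the before-versus-during comparison in the restricted models rests on species observed in both periods.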

      4) The models in this study accounted for many factors predicted to affect FIDs (see the section on major strengths), however, the number of fixed and random factors are large in number compared to the total sample size (N =6369), such that models may have been over-extended.

      The number of predictors and random effects is well within the limits for the given sample size (Korner-Nievergelt et al. 2015. Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan). Importantly, simpler models give similar results as the more complex ones (Fig. S1) and the visual (model free) representations of our raw and aggregated data confirm our model results. This, we suggest, makes our findings robust and convincing.

      Overarching main conclusion

      Overall, this study examines factors influencing FIDs in a variety of bird species and concludes that FIDs did not differ during the pandemic lockdowns compared to before the pandemic (2019 and earlier). Furthermore, FIDs were not influenced by the strictness of government-mandated restrictions. Although the authors accounted for many factors influencing the measurement of FIDs in birds, the authors did not achieve their aim of disentangling the effects of pandemic-specific ecological effects from ecological effects unrelated to the pandemic (such as habitat heterogeneity).

      We find this statement confusing. We accounted for most relevant confounding factors and found little evidence for a strong effect of the pandemic. Moreover, we have now added country-specific analyses that confirm the lack of evidence, highlight Figure 3, which shows no clear shutdown effect, and also explore how levels of human presence changed over and within the years. Adding more possible confounds (albeit note that not many are left to add) might only further reduce the variation that could be explained by the pandemic, and hence such hypothetical effects of the pandemic will be, if anything, small and thus likely not biologically meaningful.

      Their findings indicate that FIDs are highly variable both within- and between- species, but do not strongly support the conclusion that FIDs did not change in urban species during the pandemic lockdown. Therefore, this study is of limited impact on our understanding of how a drastic change in human behaviour may impact bird behaviour in urban habitats.

      It is unclear why you think our study lacks support for the conclusion that FIDs changed little during the pandemic, if all results show no such effects. However, we toned down our Discussion and also highlighted potential issues linked to our approach (e.g. that sampled individuals were not marked and hence we cannot distinguish between various mechanisms that might explain the described pattern (L293-329), or that human presence may not have changed (L253-269)). For further details see our previous response.

      Overall, the study demonstrates the challenges in using FIDs as a general fear response in birds, even during a pandemic lockdown when fewer humans are presumably present, and this study illustrates the large degree of variation in FIDs in response to a human observer.

      We appreciate and agree that our study demonstrates the challenges in quantifying human activity to understand bird escape distance and we added a paragraph on this topic to the discussion (L270-292).

      Nevertheless, we hope that our above responses clarify and address most of the issues you had with our manuscript. We tried to show that (a) most of your proposed controls are indeed included in our study design, models, and visualisations, and that (b) multiple lines of evidence (from models and visualisation of raw and aggregated data) support the conclusion of no overall effect. We further emphasize the temporal and between- and within-species variability in FIDs in the Results and now specifically indicate that lockdowns did not influence FIDs above such variability (Fig. 2-3, Fig. S3). In other words, the natural (e.g. temporal) variation in FIDs seems far greater than the potential effects of lockdowns (Fig. 2). We believe that even if lockdowns had tiny effects that could have been detected with a more stringent experimental design (e.g. individually tagged birds) or even more complex models, such effects would be far from being biologically meaningful.

    2. Reviewer #3 (Public Review):

      This study examined the changes in fear response, as measured by the flight initiation distances (FID), of birds living in urban areas. The authors examined the FIDs of birds during the pandemic (COVID-19 lockdown restrictions) compared to FIDs measured before the pandemic (mostly in 2018 & 2019). The main study justification was that human presence changed drastically during the pandemic lockdowns and the change in human presence might have influenced the fear response of birds as a result of changing the "landscape of fear". Human presence was quantified using a 'stringency' index (government-mandated restrictions). Urban areas were selected from within five different cities, which included four European cities (Czech Republic - Prague, Finland - Rovaniemi, Hungary - Budapest, Poland - Poznan), and one city in the global south (Australia - Melbourne). Using 6369 flight initiation distances across 147 different bird species, the authors found that FIDs were not significantly different before the pandemic versus during the pandemic, nor was the variation in FID explained by the level of 'stringency'.

      Major strengths: There are several strengths to this study that allows for understanding the variety of factors that influence a bird's response to fear (measured as flight initiation distances). This study also demonstrates that FIDs are highly variable between species and regions.

      Specifically,

      1) One of the major strengths of this paper is the focus on birds living in urban areas, a habitat type that is hypothesized to have changed drastically in the 'landscape of fear' experienced by animals during the pandemic lockdown restrictions (due to the presumed decrease in human presence and densities). Maintaining the focus on urban birds allowed for a deeper examination of the effect of human behaviour changes on bird behaviour in urban habitats, which are at the interface of human-wildlife interactions.

      2) This study accounted for several variables that are predicted to influence flight initiation distances in birds including species, genus, region (country), variability between years, pandemic year (pre- versus during), the strictness of government-mandated lockdown measures, and ecological factors such as the human observer starting distance, flock size, species-specific body size, ambient air temperature (also a proxy of the timing during the breeding season), time of day, date of data collection (timing within the regional [Europe or Australia] breeding season), and categorization of urban site type (e.g. park, cemetery, city centre).

      3) This study examined FIDs in two years previous to the pandemic (mostly 2018 and 2019, one site was 2014) which would account for some of the within- and between-year FID variation exhibited prior to the pandemic.

      4) This study uses strong statistical approaches (mixed effect models) which allows for repeat sampling, and a post hoc analysis testing for a phylogenetic signal.

      Major weaknesses: The authors used government 'stringency' as a proxy for human presence and densities, however, this may not have been an accurate measure of actual human presence at the study sites and during measurements of FIDs. Furthermore, although the authors accounted for many factors that are predicted to influence fear response and FIDs in birds, there are several other factors that may have contributed to the high level of variation and patterns in FIDs observed during this study, thus resulting in the authors' conclusion that FIDs did not vary between pre- and during pandemic years.
      Specifically,
      1) The authors used "government stringency" as a measure of change in human activity, which makes the assumption that the higher the level of 'stringency', the fewer humans in urban areas where birds are living. However, the association between "stringency" and actual human presence at the study sites was not measured, nor was 'stringency' compared to other measures of human presence such as human mobility.
      2) There was considerable variation in FID measurements, which can be seen in the figures, indicating that most of the variation in FID was not accounted for in the authors' models. Factors that may have contributed to variation in FIDs that were not accounted for in this study are as follows:
      a. The authors accounted for the date of data collection using the 'day' since the start of the general region's breeding season (Europe: Day 1 = 1 April; Australia: Day 1 = 15 August). Using 'day' since the breeding season started probably was an attempt to quantify the effect of the breeding stage (e.g. territory establishment, nest young, fledgling) on FIDs. However, breeding stages vary both within- and between species, as well as between sub-regions (e.g. Finland vs. Hungary). As different species respond to predation or human presence differently depending on the stage during their breeding cycle, more specificity in the breeding cycle stage may allow for explaining the observed variation and patterns in FID.
      b. Variation in species-specific FIDs may also vary with habitat features within urban sites, such as the proximity of trees and other protective structures (e.g. perches and cover), the openness of the area, and the level of stressors present (e.g. noise pollution, distance to roads). Perhaps accounting for this habitat heterogeneity would account for the FID variation measured in this study.
      c. The authors accounted for species and genus within their models, however, FIDs may vary with other species-specific (or even specific populations of a species) characteristics such as whether the species/population is neophobic versus neophilic, precocial versus altricial, and the level of behavioural plasticity exhibited. These variables were not accounted for in the analysis.
      d. Three different methods of measuring the distances between flight and the observer location were used, and FIDs were only measured once per bird, such that there were no measures of repeatability for a test subject. Thus, variation surrounding the measurement of FIDs would have contributed to the variation in FIDs seen during this study.
      3) The sample design of this study may have influenced the FID variability associated with specific species, and specific populations of species. A different number of species were sampled across the time periods of interest; 68 species were sampled before the pandemic versus 135 species after the pandemic. However, the authors do not appear to have directly compared the FIDs for the same species before the pandemic compared to during the pandemic (e.g. the FIDs of Eurasian blackbirds before the pandemic versus during the pandemic). Furthermore, within the same country-city, it is unclear whether the species observed before the pandemic were observed at the same location (e.g. same habitat type such as the same park) during the pandemic. As a species' FID response may be influenced by population characteristics and features specific to each site (e.g. habitat openness), these factors may have influenced the variability in FID measurements in this study.
      4) The models in this study accounted for many factors predicted to affect FIDs (see the section on major strengths), however, the number of fixed and random factors is large compared to the total sample size (N = 6369), such that models may have been over-extended.

      Overarching main conclusion
      Overall, this study examines factors influencing FIDs in a variety of bird species and concludes that FIDs did not differ during the pandemic lockdowns compared to before the pandemic (2019 and earlier). Furthermore, FIDs were not influenced by the strictness of government-mandated restrictions. Although the authors accounted for many factors influencing the measurement of FIDs in birds, the authors did not achieve their aim of disentangling the effects of pandemic-specific ecological effects from ecological effects unrelated to the pandemic (such as habitat heterogeneity). Their findings indicate that FIDs are highly variable both within- and between- species, but do not strongly support the conclusion that FIDs did not change in urban species during the pandemic lockdown. Therefore, this study is of limited impact on our understanding of how a drastic change in human behaviour may impact bird behaviour in urban habitats. Overall, the study demonstrates the challenges in using FIDs as a general fear response in birds, even during a pandemic lockdown when fewer humans are presumably present, and this study illustrates the large degree of variation in FIDs in response to a human observer.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors managed to show the broad botanical landscape and not only the main crops. This unique achievement is based on decades of establishing an excellent collection of a full comparative seed collection of the current flora. This allows the identification of species that usually are not identifiable. The authors were able to compare the crops that were grown there and identify the contribution of the Roman period with that of the Arab one. This excellent study is a landmark in how such studies should be done. The list of identified species will be used for many other studies on this subject.

      We are very grateful to Reviewer #1 for this generous assessment.

      Reviewer #2 (Public Review):

      Fuks et al. provide extensive paleobotanical data from several sites in the Negev desert to address hypotheses regarding the relative importance of the Roman Agricultural Diffusion (RAD) and the Islamic Green Revolution (IGR) in the dispersal of crops across Eurasia.

      While the overall claims from the authors are convincing, I found the presentation of the data somewhat difficult to follow.

      Graphical visualization of the data with respect to the proposed hypotheses would go a long way towards making the argument clearer for a non-specialist audience.

      The authors apply appropriate caveats in the discussion about their ability to assess IGR given their timeline only incorporates the first few hundred years and some IGR plants may not leave macrobotanical remains. Yet I think more could be done to explain how the data they do find provides positive evidence for RAD. Many of their findings are inferred to be RAD introductions not because of the timing in their sites, but because of previous evidence of introductions at other sites. It would thus be helpful to be more explicit about what additional evidence these findings provide beyond previously published data of introductions of many of these crops into the Levant.

      We thank Reviewer #2 for the positive assessment and helpful comments. We have moved several tables out of the main text to the supplementary tables. We also added a new schematic of the main findings regarding 1st millennium CE introductions to the southern Levant and their significance in the Negev Highlands crop assemblage (Figure 4). We have also added explanatory text to clarify the point about taphonomy vs. period of diffusion.

    2. Reviewer #1 (Public Review):

      The authors managed to show the broad botanical landscape and not only the main crops. This unique achievement is based on decades of establishing an excellent collection of a full comparative seed collection of the current flora. This allows the identification of species that usually are not identifiable. The authors were able to compare the crops that were grown there and identify the contribution of the Roman period with that of the Arab one. This excellent study is a landmark in how such studies should be done. The list of identified species will be used for many other studies on this subject.

    1. Author Response

      eLife assessment

      This paper is of interest to researchers and policy makers involved in cervical cancer prevention. The paper provides insight into how the Covid19 pandemic accelerated changes in organized cervical cancer screening. The claim that self-sampling led to a major improvement of test coverage seems somewhat exaggerated and alternative hypotheses to those provided by the authors on the population who chose self-sampling are possible. Nonetheless, this is a valuable piece of work given the scope of the intervention(s) and the precedent it sets i.e. a crisis can in fact accelerate positive changes in screening that have been academic possibilities rather than practical realities.

      Thank you for this supportive summary. We have included exact data on how much of the population test coverage was attributable to self-samples. We have furthermore decided to focus on the population test coverage that is caused by organised testing (either taken by a clinician at a time and place that the woman was invited to by the organised program or taken by the woman herself using a sampling kit mailed to her by the organised program). These 2 improved analyses are intended to facilitate interpretation of how much of the improved test coverage is attributable to the mailing of self-sampling kits.

      Reviewer #1 (Public Review):

      During the Covid19 pandemic, most cervical cancer screening programs were temporarily put on hold. The authors describe how Swedish health authorities dealt with this situation by implementing primary self-sampling and by launching a campaign with concomitant vaccination and screening. They also show that, one year after the start of the pandemic, the coverage of the screening program was at pre-pandemic levels.

      Strengths of the paper are the clear presentation of the steps taken by the Swedish health authorities and the high quality of the presented screening coverage data which could be obtained directly from the screening registry. However, the paper would benefit from more in-depth analyses because the presented data raise questions. The number of invitations was >30 percent lower in the first year of the pandemic (Figure 1), but the screening coverage was only 4-5 percent lower. In the second year of the pandemic (year 2021), coverage was back at pre-pandemic levels, but the role of primary self-sampling in restoring screening coverage is a bit unclear. It is obvious that primary self-sampling made it possible to invite women again for screening during the pandemic, but there is no data on acceptance of primary self-sampling. Besides, the increase in coverage in year 2021 was only 4% and it is not clear whether such a modest increase could also have been achieved without primary self-sampling. In addition to self-sampling, the authors describe the launch of a concomitant vaccination and screening campaign. This is an interesting initiative but the authors do not show data on the coverage of this campaign in the target age range.

      We are now explaining that population test coverage is calculated over a whole screening interval. For example, if the screening interval is 3 years, improved attendance would only fully impact the population test coverage after 3 years. Furthermore, we are now presenting the exact data on how much of the test coverage is indeed attributable to the mailing of self-sampling kits.
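
To illustrate why coverage computed over a whole screening interval responds slowly to a change in attendance, here is a minimal Python sketch. It assumes coverage is defined as the share of the target population with at least one test in the interval ending in the given year; the function name `population_test_coverage` and the toy data are hypothetical and are not taken from the paper or the screening registry.

```python
from typing import Dict, List


def population_test_coverage(
    tests_by_woman: Dict[str, List[int]], year: int, interval: int
) -> float:
    """Share of women with at least one test in the `interval` years ending in `year`."""
    window = range(year - interval + 1, year + 1)
    covered = sum(
        1
        for test_years in tests_by_woman.values()
        if any(y in window for y in test_years)
    )
    return covered / len(tests_by_woman)


# Hypothetical toy population; the test years are invented for illustration.
toy = {
    "A": [2019],
    "B": [2020],
    "C": [2021],
    "D": [],  # never tested
}

# With a 3-year interval, the 2021 figure still counts tests taken in 2019 and 2020.
coverage_2021 = population_test_coverage(toy, year=2021, interval=3)
```

Because each year's figure pools tests from the whole interval, a change in attendance in a single year shifts only part of the numerator, which is the point made in the response above.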

      Reviewer #2 (Public Review):

      The manuscript by Elfstrom et al describes the impact of implementing self-sampling as the primary screening test in Sweden to address decreases in coverage following the COVID pandemic. The authors have a very rich dataset including all records of invitations to screen and screening results in the Stockholm area. A limitation is that there is no individual record linkage to allow investigation of the profile of the individuals who chose to screen using the self-sample.

      The conclusions are generally well supported by the authors with the following exceptions:

      1) There was not enough evidence presented in the manuscript to conclude that "The most likely explanation for the large increase in population coverage seen is that the sending of self-sampling kits resulted in improved attendance in particular among previously non-attending women."

      2) The authors state there is no evidence that delays in screening have impacted cervical cancer rates however they present no data to this effect in the manuscript.

      Although all screening and invitation data are indeed collected in the national screening registry, linking these data is not allowed without permission from the Swedish National Ethical Review Board. We did apply for such permission, which was granted on 2023-02-01, and a full set of registry linkage analyses to investigate the point raised by the reviewer is now included.

      The mention in the discussion of stable cervical cancer rates referred to public data from the national Cancer Registry. The source is now referenced.

      Reviewer #3 (Public Review):

      The authors report on the nature of interventions that were applied to aid and improve engagement in cervical screening, brought about by the SARS CoV Pandemic in Sweden.

      I appreciate that the impact of these interventions, given that they are recent, will take some time to quantify but the description (and reach) of the policy changes that occurred in a short amount of time is of significant interest to the screening community. The piece on HPV Even Faster is particularly novel; I am not aware of another example of where this has been enacted within a routine programme.

      Thank you for this supportive statement.

      The authors make reference to (15) where the reader can find greater details relating to the population who received the offer of self sampling (and the nature of the device). However I was a little confused (in this stand alone piece) as to who the self sampling group constituted exactly. Did this group not include pregnant women, women invited for first screen or women on non routine recall?

      This is correct: self-sampling kits were mailed to all women due for screening in the ages 26-70. Women due for screening aged 23-25 were invited for midwife-based sampling. Pregnant women were advised to come in for midwife-based screening, to save time. Women under follow-up from previous screens are not due for screening. This is now elaborated more clearly in the paper.

      The authors state that "the most likely explanation for the large increase in population coverage seen is that the sending of self-sampling kits resulted in improved attendance in particular among previously non-attending women" - why is this written as speculation at this stage (?) is it not possible to attribute directly the contribution made by self sampling, or is this in hand?

      See response to reviewer 2 above: Although all the data is indeed collected, we are not allowed to perform registry linkages without ethical permission. This has now been obtained and the requested analyses made.

      While self sampling is certainly an option that can support uptake and enfranchisement in cervical screening - its overall performance is fundamentally contingent on the number of women who then comply with follow up should the HPV test be positive; it is not simply about who returns the sample. It would have been of interest to see the proportion of women who did comply with follow up.

      The paper is not about follow-up strategies. Follow-up strategies are different in different settings and reporting is not standardized. They have also changed during the time of the study (e.g. cytology follow-up abandoned). A more detailed analysis of this would require a whole new paper.

    1. Author Response

      Reviewer #2 (Public Review):

      1) It could benefit from fleshing out concepts instead of using parentheses, particularly in the abstract.

      We agree and have amended the abstract and methods (please refer to the responses provided to the editor’s comments 1a-1e).

      2) There is space to expand on the results presented in Table 1, including an explanation of Affected cohorts 2008 vs Affected cohorts 2008-2009. It may also be useful to explain this analysis in the methods section.

      Please refer to the response provided to the editor on the same question (comment 5).

      3) Given that Australia is a best-case scenario and other countries have not had the same success in HPV vaccination coverage, in the discussion would it be possible to give a comparison of how these three scenarios would look different in a population with school-based vaccination but lower coverage volume, such that readers could understand how much of the success / failures of each of the three catch-up scenarios? It would be particularly helpful for readers who are not familiar with the modelling tool used in this analysis.

      We have added some commentary in the discussion in response to the reviewer’s comment. In future, further similar work in countries with lower base coverage would be informative.

      “Australia is a relatively high HPV vaccination coverage setting. Outcomes may be less favourable in a lower coverage setting, as there would be less protection from herd effects; however, the impact of disruptions might also be smaller in a setting with lower coverage, since a lower coverage program would be less effective. Nevertheless, the finding that if catch-up is performed expeditiously then it mitigates much of the effect from vaccination delays is likely to hold in other settings. A previous study (Simms et al, Lancet Public Health. 2020 Apr;5(4):e223-e234) modelling the health impacts of HPV vaccination hesitancy in Japan from 2013 to 2019 and the potential effects of restoring coverage to 70% with catch-up vaccination in 2020 is informative, as it demonstrates that multi-age HPV catch-up vaccination, after catastrophic falls in coverage in Japan, would be effective in mitigating the effects.”

    1. Author Response

      Reviewer #1 (Public Review):

      Ghosh and colleagues report on their multidisciplinary effort to improve cervical cancer screening attendance in the East Boston Neighborhood Health Center (March-August 2021). Specifically, the authors 1) identified using electronic medical records overdue follow-up visits, 2) scheduled screening appointments during regular clinic hours and weekends/evenings, and 3) surveyed patients on their experience. These objectives were clearly defined (although not consistently so throughout the manuscript) and data analyses/presentation were simple and straightforward, appropriate to the study design and methodology used.

      Thank you for this comment. We have clarified the objectives in the revised manuscript.

      Overall, it is unclear to what extent the overdue appointments were backlogs created by the COVID-19 pandemic or due to pre-pandemic factors that could have been exacerbated by the pandemic. In order to contextualize the current study and its findings, an elaboration is needed on whether the pandemic created the delays in cervical cancer screening or simply compounded the problem. For example, the authors report on page 8, lines 196-197 that in 30% of encounters (not clear how many of the 118 reviewed charts were overdue appointments) the healthcare provider did note the overdue appointments.

      We have revised Figure 2 (now Figure 4) and added Figures 2 and 5 to address this comment. In 2019, prior to the COVID-19 pandemic, approximately 70% of patients were up-to-date with cervical cancer screening, corresponding to 8467 patients overdue for screening. In 2020, the up-to-date percentage dropped to 63.5% and the overdue number increased to 8812. Figure 2 is a flowchart of the project which clarifies the “30%” mentioned in the reviewer comment.

      In addition, a brief description of the cervical cancer screening program in place would be informative.

      We have added this in the “setting” section of the methods (pages 4-5, lines 107-128).

      Table 1 provides an effort versus value summary; however, these constructs are ill-defined, with a few inconsistencies with what is reported in the text.

      This table is intended to help inform clinics that are considering implementing quality improvement programs about the effort required and value obtained for different aspects of our program. These are based in part on proprietary cost analyses so certain details are not able to be included. We have amended the text/table to eliminate inconsistencies.

      Comments specific to Aim 1:

      The methodology is missing information on key elements, mainly relating to the decision-making process of establishing and defining the "validated" patient chart list (1375 overdue patients out of 6126 reviewed charts). A chart of the 1375 approached study population is also warranted (459 patients were screened, 622 could not be reached, and 203 cancelled/missed their appointments; what about the remaining 91 patients?). A description of the characteristics of the study population and a comparison of the different groups (screened, not reached, cancelled/missed appointment) along these characteristics are missing.

      We have added a flowchart with this information to the results section. See Figure 2.
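
The reviewer's tally of patients left unaccounted for can be reproduced with a short Python check. The counts are those quoted in the comment above; the dictionary keys are shorthand labels chosen here for illustration, not categories defined by the authors.

```python
# Counts quoted in the reviewer's comment on Aim 1.
approached = 1375
outcomes = {
    "screened": 459,
    "not_reached": 622,
    "cancelled_or_missed": 203,
}

accounted = sum(outcomes.values())
unaccounted = approached - accounted  # patients with no reported outcome
```

A flowchart that closes this gap would list an outcome for every one of the approached patients, so that the categories sum to the starting total.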

      Comments specific to Aim 2:

      About 63% of the 459 scheduled screenings were done during the evening/weekend clinics, which represents a substantial gain and clearly indicates a window of opportunity to increase screening rates by pinpointing the importance of offering a convenient time for women to attend screening visits. In general, and as expected, offering additional screening clinics was effective in addressing the backlog of patients, although with significant investment and resources as mentioned by the authors. How significant is significant?

      We are not able to share these data publicly. We have added the following sentence: “The cost data is proprietary/not shareable but analysis by clinical leadership indicated the program was not cost-effective/sustainable.” (page 22, lines 678-80).

      Comments specific to Aim 3:

      A more structured and detailed presentation/description of the survey instrument, its administration, response rate, and significance of results are warranted in the manuscript, albeit the joint reporting of this in the appended material.

      We have added additional detail about the survey method (page 9, lines 225-6, 228-31) and results (pages 14-5, lines 518-22, 530-3). We also inserted the survey used in the clinics (Figure 1).

      Reviewer #2 (Public Review):

      The purpose of this study is unclear from the introduction. Additionally, the methods are incomplete and did not describe how data was collected and analyzed. The results do not describe the sample. Once these are described more clearly, further comments can be made about what the authors were trying to achieve and the impact of the work on the field.

      We have clarified the study purpose in the introduction: “The purpose of the project was to examine the impact of a Quality Improvement intervention on improving cervical cancer screening, as well as to evaluate the effectiveness and sustainability of different methods for addressing overdue screening.” (page 3, lines 87-90). We have also clarified the methods and results to describe more completely the data extraction from electronic medical records and the statistical analysis using descriptive statistics.

    1. Reviewer #3 (Public Review):

      A big open question in evolutionary biology is how single cells become multicellular organisms, capable of adaptation as a collective. Many cells form groups, but adaptation at the level of the group tends to be inefficient (especially in comparison to cells). Theoretically, it has been proposed that groups formed by clonal development (cells remain attached to each other after division) can more readily lead to group-level adaptation than groups coming together through the aggregation of different cells post-division. To evaluate empirically the plausibility of this hypothesis, the authors compared adaptation in two lines of yeast that differ only in a couple of mutations determining their mechanism of group formation. Ace2 mutants develop through staying together, and Floc mutants through aggregation. They performed a form of size selection (through settling) as a way to select for multicellularity (this selection regime has been used before to obtain multicellular phenotypes). This selective regime has two components: growing (largely due to differences between cells) and settling (largely due to differences between groups). Thus, the authors assume that increases in fitness through growth are due mostly to adaptation at the single-cell level, whereas increases in fitness through settling are mostly due to adaptation at the multicellular level. They find that adaptation in clonal groups is mostly through settling and that aggregative groups adapt more through growth (despite getting bigger).

      Overall this assumption makes sense (especially in a positive way) but growth, in this case, is also selecting against groups in the snowflake case and less strongly so in the floc case in which cells aggregate and disaggregate with some probability, and therefore cells can keep growing. That is, in addition to assortment the result is somewhat expected because there is less of a trade-off between growth and settling in floc: having a higher density in floc probably leads to higher aggregation and indirectly benefits settling, whereas in the clonal case, larger groups mean that a larger proportion of cells is not growing.

      The main result of the paper holds true: clonal development favors multicellular adaptation relative to aggregative multicellularity, but the reason is not exclusively a difference in the distribution of variation, but a difference in the trade-off between single cell and multicellular traits.

      In the second part of the paper, the authors beautifully show that the mechanisms of group formation affect evolutionary processes. Clonal aggregation leads to a decrease in the effective population size (because the descendants of mutants are likely to be in the same group, and therefore be selected together). This result shows that the mode of development can affect evolution!

    1. Author Response

      Reviewer #2 (Public Review):

      This work attempts to connect the diet of a mother to the physiology and feeding behaviors of multiple generations of her offspring. Using genetic and molecular biology approaches in the fruit fly model, the authors argue that this Lamarckian inheritance is mediated by germline-inherited chromatin and is regulated by the general activity of a histone methylase. However, many of the measured effects are small and variable, the statistical tests to prove their significance are missing or poorly described, and some experiments are inadequately described and lack important controls.

      1) The authors claim that the diet of a mother can influence the physiology of her progeny for several generations. However, the observed effects of maternal diet on later generations were small and variable for most assays (see Fig1C, S1.1A, B, D). Additionally, the effect size between F0 HSD to ND was often larger than the effect size between the progeny of F0 parents and ND. To put it another way, if the authors were to compare the F1, F2, etc. to the F0 HSD flies, they would conclude that the majority of the response to diet is not maternally transmitted, and is directly controlled by the diet of the individual being measured.

      We agree with the reviewer that the effect size of acute HSD exposure (in HSD-F0 flies) was stronger than that of transgenerational inheritance (in HSD-F1/2/3/4 flies). Similar observations were also made in other studies (see Klosin et al., Science, 2017; Bozler et al., eLife, 2019). We would argue that this difference in effect size was expected and has clear biological relevance.

      For all living organisms, acute environmental changes (diet change included) have direct and profound influences on their survival and reproduction, and therefore need robust and immediate responses. In comparison, ancestral environmental changes may only provide some vague and indirect indications of the current living environment of the offspring. Such information may be beneficial for the survival and reproduction of the offspring, but the effect size is expected to be much smaller, or at least smaller than that of acute environmental changes.

      Studies of the Dutch Famine offer a good example. Human individuals who were prenatally exposed to famine were found to have a greater risk of metabolic diseases (Ravelli et al., NEJM, 1976). Nevertheless, direct high-fat diet exposure was still the much stronger risk factor for obesity and metabolic disorders (Bray et al., Am J Clin Nutr, 1998, Jéquier et al., Int J Obes Relat Metab Disord, 2002).

      We have added additional discussions in the manuscript for clarification.

      Furthermore, since our current study aimed to investigate the mechanism of behavioral transgenerational inheritance, we focused on the comparison between HSD-F1 flies (and their progeny) vs. ND-fed flies. As the ancestors of HSD-F1/2/3/4 flies were exposed to HSD, whereas HSD-F1/2/3/4 flies themselves were never exposed to HSD, any difference we observed between the two groups could be solely attributed to transgenerational inheritance of ancestral HSD exposure. That said, to better distinguish the effects of acute HSD exposure vs. transgenerational inheritance upon ancestral HSD exposure, we re-analysed and presented the comparisons among ND, HSD-F0, and HSD-F1 data in the manuscript (Figure 1. B-E, Figure 1-figure supplement 1. A-E, Figure 1-figure supplement 2. A-D, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B).

      2) The authors chose to study PER, which had the largest average effect sizes between conditions. However, PER was highly variable in the averaged data, with some individuals showing large effects and others having no effects. A better characterization of transgenerational PER may increase the robustness of this assay and confidence in its results. For example, the authors could measure PER in lineages derived from individual flies to determine when transgenerational effects on PER decline or disappear. This form of data collection could help to explain the high variation in the averaged data presented in the paper.

      We acknowledge that PER is in general quite a variable behavioural trait (as are probably most if not all behavioural measures). This is not surprising, since animal behaviours, as complex traits, can be influenced by numerous intrinsic and extrinsic factors, such as genetic background, developmental environment, diet, population density, environmental conditions, etc. Numerous PER studies have exhibited similar variability (Masek et al., PNAS, 2010, Marella et al., Neuron, 2012, Charlu et al., Nature Communications, 2013, Wang et al., Cell Metabolism, 2016, Wang et al., Cell Reports, 2020).

      Nevertheless, in our current study we were able to identify statistically significant behavioural difference between ND-fed flies and HSD-F1/2/3 flies, demonstrating that ancestral HSD exposure imposed transgenerational inheritance on sweet sensitivity. To further increase the robustness of the study as suggested by the reviewer, we have conducted additional repetitions of many PER experiments and further confirmed the phenotype with less variability and more statistical power (Figure 1. G-I, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B). The reviewer also suggested the use of isogenic flies, which might help to minimize the variations of genetic background. However, we think that demonstrating the behavioural difference in genetically diverse fly populations is a more credible way to show that such transgenerational inheritance is a reliable and generalizable phenomenon.

      3) What do the error bars represent on any figure? There are many examples where the data is highly variable and lies completely outside of the error bars. What is the statistical test for significance that is carried out in each figure? The brief comment about statistics in the methods section is inadequate. The authors should also supply the raw data used to generate the figures so that readers can perform their own statistical tests.

      Data in the manuscript were represented as means ± SEM (standard error of the mean) in all of our figures, which is a standard practice in the field (Masek et al., PNAS, 2010, Charlu et al., Nature Comm, 2013, Wang et al., Cell Metabolism, 2016). We have provided detailed explanations of the statistical tests in the manuscript. We have also prepared raw data files as suggested by the reviewer.
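
For readers less familiar with the convention, the following Python sketch shows how a mean ± SEM summary is typically computed. It assumes the usual definition, SEM = sample standard deviation divided by the square root of n; the helper name `mean_sem` and the example values are hypothetical and are not taken from the manuscript's data.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for a sample of at least two values."""
    n = len(values)
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 in the denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical measurements, invented for illustration only.
example = [1, 2, 3, 4, 5]
example_mean, example_sem = mean_sem(example)
```

Since the SEM describes uncertainty in the estimated mean and shrinks as n grows, individual data points can fall well outside mean ± SEM bars even when the summary is computed correctly.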

      The model that global H3K27me3 is regulated by ancestral diet is unconvincing without further experimental validation and explanation. Points 4-10 address specific issues.

      4) The authors performed ChIP on cycle 11 embryos. This stage is extremely short (11 min) and contains roughly 10 times less chromatin than embryos only 30 minutes older. These features make it very difficult to collect large numbers of precisely staged embryos without significant contamination. It is also debatable whether early cell cycles (including and preceding cycle 11) are slow enough to deposit and propagate histone marks in the presence of new histone incorporation. See the opposing arguments in Zenk et al 2017 and Li et al 2014. The authors could perform ChIP on older embryos to avoid this controversy.

      We thank the reviewer for the clarification. Our embryo collection protocol involved allowing flies to lay eggs freely in a cage for 30 minutes, followed by 50 minutes of incubation on a juice plate, and then completing the embryo sorting within 30 minutes. Therefore, to describe it more stringently, our embryos should be between cycles 10 and 12. We have corrected this information in the manuscript (Figure 2. A).
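
      The age window implied by this collection scheme can be worked out with simple arithmetic. The sketch below is only an illustration: it assumes that age is counted from egg laying and that an embryo may be processed at any point during the sorting window. It does not attempt to map minutes onto nuclear cycles, which would require timing data not given here.

```python
def embryo_age_window(laying_min, incubation_min, sorting_min):
    """Return (youngest, oldest) possible embryo age in minutes after egg laying."""
    # Youngest: laid at the very end of the laying window, processed at the start of sorting.
    youngest = incubation_min
    # Oldest: laid at the very start of the laying window, processed at the end of sorting.
    oldest = laying_min + incubation_min + sorting_min
    return youngest, oldest


# 30 min laying, 50 min incubation, sorting completed within 30 min.
print(embryo_age_window(30, 50, 30))
```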

      Since all the embryos were sorted using the same morphological criteria within the same time frame, their developmental stages should be comparable (i.e. all from cycles 10-12). In several references we consulted, a broader range (cycles 9-13) was used for ChIP-seq analysis (for example, see Zenk et al., Science, 2017).

      Surely any maternally inherited information will also be present in cycle 14 or 15 embryos if it is to influence the development or physiology of the brain. The observed differences in global H3K27me3 levels in F1 vs ND flies could be explained by slightly different aged embryo collections or technical variations in the ChIP protocol. The authors could strengthen their conclusion by performing more ChIP replicates. Alternatively, the authors could use orthogonal approaches like antibody staining or western blots to measure global H3K27me3 levels in precisely staged embryos.

      We chose to use cycle 10-12 embryos because we aimed to identify epigenetic modulations directly transmitted through the maternal germline. Embryos in cycles 14-15 might reveal more profound changes, but since embryos at that stage have entered the zygotic phase and started remodeling histone modifications, we think this might mask the maternally transmitted changes we sought to identify.

      In addition, we conducted two biological replicates for each group for the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021; Ing-Simmons et al., Nature Genetics, 2021). In the current study we further verified the genes identified in the ChIP-seq analysis by RNA-seq and qPCR analysis.

      We further verified the ChIP-seq results by western blot, which showed a ~2-fold increase in H3K27me3 modification in HSD-F1 early embryos vs. ND-fed embryos, in line with the ChIP-seq data (Figure 2-figure supplement 1. B). We have also provided immunofluorescence results for embryos at cycle 13 and cycle 14, which clearly showed a significant increase in H3K27me3 modifications in HSD-F1 embryos (Figure 2-figure supplement 1. C).

      5) The authors measure PRC2 subunit mRNA levels in adult fly heads to attempt to explain the observed differences in inherited H3K27me3 levels in fly embryos. The authors should examine PRC2 components in germ cells and early embryos to understand how germ cells and early embryos generate H3K27me3 patterns.

      We have now shown that Pcl and E(z) mRNA expression in HSD-F0 flies was not significantly changed vs. ND-fed flies (Figure 2-figure supplement 2. D-G). Meanwhile, the H3K27me3 demethylase UTX and the H3K27ac acetyltransferase Cbp showed a significant decrease (Figure 2-figure supplement 2. H). Therefore, HSD exposure imposed complex epigenetic modifications in HSD-F0 flies, which then led to transmission of epigenetic marks to their progeny. Given that the main scope of this study was to understand which epigenetic program mediated the behavioral transgenerational inheritance upon ancestral HSD exposure (rather than the one mediating the effects of acute HSD exposure), we focused our efforts on H3K27me3, which was significantly changed between HSD-F1 flies and ND-fed flies.

      6) The RNAi experiment targeting PRC2 components in embryos is uninterpretable without appropriate controls and an explanation of the genotypes used in the experimental paradigm. Are the authors crossing nosNGT mothers to UAS-RNAi fathers and assaying the progeny? What is the genotype of the F1 flies and how does it compare to the genotype of the ND flies? The authors should also note that the Gal4 drivers they use are not necessarily restricted to the ovary, and could directly affect other tissues controlling PER like neurons and muscle. Additionally, the authors should supply the appropriate controls to verify that their experimental paradigm has the intended effect. PRC2 proteins are presumably loaded into embryos and would be immune to zygotic-expressed RNAi. The authors could validate when PRC2 RNAi is effective by staining embryos for H3K27me3.

      We have now added schematic diagrams and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3-figure supplement 1. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls. We have also noted in the manuscript that the GAL4 drivers we used were not restricted to the ovary.

      We have now verified that PRC2 knockdown reduced H3K27me3 in the female germline by both western blot and immunofluorescence staining (Figure 3. B-C).

      7) Although the authors do not note this, nosNGT>RNAi affects the PER of ND flies (compare Gal4>RNAi to just RNAi or just Gal4 in ND columns in Fig3A-D). This could be due to RNAi expression in neurons or muscles or some other indirect effect. Regardless of the mechanism, this result makes it difficult to interpret how RNAi treatments affect the transgenerational inheritance of PER if there is an equivalently strong nontransgenerational effect.

      Although nosNGT>RNAi appeared to slightly affect PER response of ND-fed flies, there was no statistically significant difference (Figure 3-figure supplement 1. B and D, Figure 3-figure supplement 2. A-B). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3-figure supplement 1. B), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      8) The matalpha gal4 experiment is inadequately explained in the text or methods. Are the authors expressing RNAi in the ovaries of the F0 flies that are fed an HSD? Does the ovary influence their PER somehow? Similar to point 7, there appears to be a nontransgenerational component to the RNAi phenotype that clouds the interpretation of the transgenerational effect (compare F0 in S3.1A-C).

      We have now added a schematic diagram and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls.

      Similar to point 7, although Mat-tub-GAL4>RNAi might seem to affect PER responses of ND-fed flies, there was no statistically significant difference (Figure 3. D-E). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3. D), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      9) For the EED inhibitor experiments (both PER and calcium imaging), it is unclear whether the authors fed the mothers or their adult progeny the EED inhibitor. If adult progeny were fed, what tissues were affected? The authors should stain various tissues with an H3K27me3 antibody to verify the effectiveness of their inhibitor. Finally, the effect of the EED inhibitor on calcium imaging was not convincing because the variation was so large.

      We have added a new schematic diagram and provided more detailed explanations in the manuscript for pharmacological interventions (Figure 4. A-D). To verify the effect of the drug treatment, we showed that compared to the control group fed with DMSO, flies fed with the inhibitor showed a significant decrease in H3K27me3 levels, demonstrating the effectiveness of the inhibitor (Figure 4-figure supplement 1. A).

      We acknowledge the unsatisfactory quality of the calcium imaging experiments in our initial submission. We have now improved our experimental procedures to obtain better data quality, and the conclusions remain consistent (Figure 4. E).

      10) In all of the PRC2 RNAi and inhibitor experiments, are there any other phenotypes that would suggest that the treatments are working? There are many published PRC2 loss-of-function phenotypes (molecular and developmental) in different tissues. The authors could assure the reader that their treatments are working as expected by doing these controls.

      As discussed above, we have now used western blot and immunofluorescence staining to validate the efficiency of PRC2 RNAi in the female germline (Figure 3. B-C).

      11) The authors propose that a transgenerationally inherited state of the caudal gene is responsible for the transgenerationally inherited PER. However, the experiments investigating the methylation state and expression level of caudal are unconvincing. Cad mRNA abundance varied immensely in the ND RNAseq samples. When the authors compared cad levels across generations, the effect size was small. A single outlier in the ND sample in both the RNAseq and the RT-PCR experiments appears to drive up its mean and effect size. The H3K27me3 ChIP on cad is very similar in the F1 and ND samples and the acetylation peak on its promoter appears unchanged. The authors could vastly improve the caudal experiments in this paper by simply using cad antibodies to stain the relevant tissues that contribute to PER. For example, the authors could stain GR5a neurons for cad expression in different generations that inherit (or don't inherit) maternal PER to more accurately determine if cad levels are indeed transgenerationally regulated. The authors could also perform more ChIP experiments at a less variable stage to convincingly correlate epigenetic marks on cad with its expression level.

      As discussed above, we conducted two biological replicates for each condition of the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021; Ing-Simmons et al., Nature Genetics, 2021). We have also performed western blot and immunofluorescence for H3K27me3 in ND vs. HSD-F1 embryos to further validate our ChIP-seq data (Figure 2-figure supplement 1. B-C).

      As for the Cad gene, H3K27me3 signals showed a statistically significant difference between ND-fed and HSD-F1 flies (Figure 5. D). We have also conducted additional qPCR experiments to verify the expression changes of the Cad gene (Figure 5. F, right), which were in line with the ChIP-seq data and further supported its validity.

      It is worth noting that during the developmental time window of our ChIP-seq analysis, the acetylation signals in the promoter region of cad were very low (Figure 5. D), making a comparison impossible.

      Reviewer #3 (Public Review):

      Jie Yang et al. investigated the transgenerational behavioral modification of a high-sugar diet (HSD) in Drosophila and revealed the underlying molecular and neural mechanisms. It has been reported that HSD exposure decreases sweet sensitivity in gustatory sensory neurons, resulting in reduced sugar response (Proboscis extension reflex, PER) in flies. The current study reports that this effect can be transmitted across generations through the maternal germline. Furthermore, the authors show that H3K27me3 modification is enhanced in the first-generation progenies of HSD-treated flies (F1), and genetical or pharmacological disruption of PCL-PRC2 complex blocks the behavioral change and restores the sweet sensitivity in the Gr5a+ sweet sensory neurons. The authors further analyze the differentially expressed genes in the F1 flies. Among H3K27me3 hypermethylated regions, they focus on homeobox genes and find a transcription factor Caudal (Cad), which shows decreased expression in the F1 flies. Knocking down Cad in Gr5a+ neurons results in decreased PER response to sucrose.

      Transgenerational changes in physiology and metabolism have been broadly studied, while inherited changes at the behavioral level are much less investigated. This work provides convincing evidence for transgenerational modification of feeding behavior and digs out the underlying molecular and neural mechanisms. However, there still are several concerns that need to be clarified.

      1) The epigenetic regulator PRC2 has been found to play an essential role in the 7d-HSD-induced modification of the PER response. In this study, it's important to clarify, for the transgenerational change, whether epigenetic modification is required in the flies exposed to HSD (F0), the progenies (F1), or both. It would be very helpful for better interpretation if the procedures of HSD treatment in RNAi experiments and the drug treatments were stated in more detail. In addition, the F0 flies should be examined as the control.

      In this current study our main scope was to understand the transgenerational influence of HSD exposure on the progeny. To this aim, we chose to study the physiological and behavioral differences between ND-fed flies vs. HSD-F1 flies (and their progeny on ND). HSD-F1 flies (and their progeny) were not exposed to HSD in their whole life cycle and therefore the physiological and behavioral changes we observed vs. ND-fed flies could be solely attributed to epigenetic modifications transmitted via germline cells from HSD-F0 flies. Therefore ND-fed flies were used as the main control.

      As for HSD-F0 flies, the acute effects of HSD exposure could be more complex. Epigenetic factors were likely involved, as is evident in Figure 3-figure supplement 1. C, Figure 3-figure supplement 3. A-B and Figure 4. C. In addition, HSD exposure might also directly affect gene expression and multiple signaling pathways in HSD-F0 flies (see Chen et al., Science China Life Sciences, 2020). Therefore, we did not aim to investigate how HSD exposure affected HSD-F0 flies in this current study. We have added further discussion to the manuscript for clarification.

      That said, we still added more HSD-F0 flies as controls when needed (Figure 2-figure supplement 2. D-G, Figure 3-figure supplement 1. C, Figure 4. C, Figure 5. F, left).

      We have also modified the schematic diagrams and added more detailed explanations in the manuscript, in order to provide a clearer illustration of the experimental procedures (Figure 3. A, Figure 3-figure supplement 1. A, Figure 4. A, B and D). Specifically, we employed two different RNAi approaches. Firstly, we used genetic methods to obtain homozygous Mat-tub-gal4>UAS-gene X RNAi fly lines on chromosomes Ⅱ and Ⅲ for germline-specific knockdown (Figure 3, Figure 3-figure supplement 3). Secondly, we used heterozygous nosNGT-gal4>UAS-gene X RNAi flies for embryo-specific knockdown (Figure 3-figure supplement 1 and 2). Our drug experiments involved both treating the flies and measuring their PER (Figure 4. A-C) and treating the parental flies and measuring the PER of their progeny (Figure 4. D).

      2) The information on the drug treatment period is also missing for imaging experiments (Fig.4C). Moreover, the response curve is very different from those recorded in the same neurons in previous studies. What’s the reason? Please also provide a representative image showing which part of the Gr5a neurons is recorded.

      The experimental procedures for the drug treatments are now shown in Figure 4. A. We fed adult flies with specific compounds for five days after eclosion, then measured the calcium signals of Gr5a+ neurons when flies were fed with sucrose.

      As suggested by the reviewer, we have now conducted calcium imaging experiments more carefully and thoroughly. We have now added the new data into the revised manuscript and the conclusions remained consistent (Figure 4. E). We recorded the calcium signal in the axons of Gr5a+ neurons in the SEZ.

      3) It's unclear whether the decreased Cad expression upon HSD treatment specifically occurred in Gr5a+ neurons or in a lot of cells. If the change in gene expression is significant in the qPCR test, it should occur in a large number of cells, most likely including different types of gustatory sensory neurons. If lower cad expression led to lower neural response and thereby lower behavioral response, how would this specifically decrease the PER response to sucrose but not to other tastes? Is HSD-induced desensitization specific to sucrose in the offspring?

      We agree that Cad expression might decrease in a lot of cells, including Gr5a+ neurons in the proboscis. In order to investigate whether taste perception other than sweet sensing was also affected, we conducted PER experiments with fatty acids, which, like sugars, are a type of appetitive taste cue. Perception of fatty acids is mediated by ionotropic receptors such as ir25a, ir76b, and ir56b (Ahn et al., eLife, 2017; Brown et al., eLife, 2021).

      Our results indicate that PER to fatty acids in HSD-F0 and HSD-F1 flies was not significantly reduced compared to the ND-fed controls (Figure 1-figure supplement 2. E-F). This suggests that the impact of Cad on gustatory sensory neurons might be specific to the sweet sensitivity of Gr5a+ neurons.

      4) In Fig.2D, data are sorted for genomic regions showing an up-regulated modification of H3K27me3. It's unclear whether similar sorting was performed in panel C. This needs to be clarified.

      The analyses shown in Figure 2C and 2D are linked. For 2C, we identified genomic loci with enriched H3K27me3, H3K9me3, and H3K27ac peaks, and found that H3K27me3 peaks showed the most robust changes between ND-fed and HSD-F1 flies. We therefore concentrated on the loci where H3K27me3 modifications were significantly changed between the two groups, and further analyzed their differences. As shown in Figure 2D, within these loci, H3K27ac modifications, which are functionally antagonistic to H3K27me3, were significantly reduced, whereas H3K9me3 signals remained unchanged. These results confirmed that ancestral HSD exposure induced robust H3K27me3 modifications at certain genomic loci.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper proposes a novel approach, named ModCRE, which utilizes structure-based learning to predict the DNA binding preferences of transcription factors (TFs). The authors integrate both experimental knowledge of the structures of TF-DNA complexes and large amounts of high-throughput TF-DNA interaction data. Additionally, the authors have developed a server that automatically produces these characteristics for other TFs and their complexes with co-factors.

      Strengths: The paper's integration of experimental knowledge and high-throughput data to develop statistical knowledge-based potentials to score the binding capability of TFs in cis-regulatory elements is a powerful strategy. The proposed approach can be applied to more than 80% of TF sequences, making it a general method for characterizing binding preferences.

      Weaknesses: The paper is difficult to follow, as it contains many technical details and implementation details. The method applied is not always clear, and the paper focuses on implementation rather than the message. The results indicate that the nearest neighbors approach in Figure 4 outperforms the proposed method in many cases, and the proposed method seems to perform better only when similarity with the target is low. The same applies in Fig. 5 when using normalized ranked scores.

      It appears that the authors have successfully developed a structure-based learning approach for predicting DNA binding preferences of transcription factors. However, the paper's technical language and implementation focus make it challenging to follow at times.

      It seems the authors have successfully achieved most of their aims in improving predictions for TF-DNA interaction, and the results support their conclusions.

      This work has the potential to significantly impact the field of TF-DNA binding and gene regulation, particularly for those interested in predicting PWMs for TFs with limited or unreliable experimental data.

      General comment: We wish to thank the reviewer for his/her comments, which helped us make the manuscript easier to read, clarify the ideas and improve the manuscript overall. We also thank him/her for the comments on the strengths. In the current revision we have tried to correct the faults and address the weaknesses. Indeed, the results section contained many explanations of the method and its implementation rather than its use and application. Regarding Figures 4 and 5, the reviewer is also right: our approach can help to predict the binding motif of a transcription factor in difficult cases, when the PWMs of the closest homologs are unknown but the structure of its complex with DNA can be provided. Otherwise, when binding information is available for close homologs, traditional state-of-the-art approaches are better than our approach and we recommend them.
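
      For readers less familiar with PWMs, the following is a minimal, generic sketch of how a position weight matrix can score candidate binding sites along a DNA sequence. It is not the ModCRE scoring scheme: the log-odds conversion, the uniform background, the pseudocount, and the toy motif are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(pwm, background=0.25, pseudocount=0.01):
    """Convert per-position base probabilities into log2-odds scores."""
    matrix = []
    for column in pwm:
        total = sum(column[b] + pseudocount for b in BASES)
        matrix.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in BASES}
        )
    return matrix


def best_site(sequence, matrix):
    """Slide the matrix along the sequence; return (best_score, start_index)."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scores = [
        (sum(matrix[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)


# Toy 4-bp motif favouring "TGAC" (made-up probabilities, not a real TF motif).
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

score, start = best_site("AATGACGG", log_odds_matrix(toy_pwm))
print(f"best site starts at index {start} with score {score:.2f}")
```

      In a nearest-neighbour setting, the matrix would come from a close homolog with a known motif; in the structure-based setting described in the response, it would instead be derived from a model of the TF-DNA complex.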

      Reviewer #2 (Public Review):

      This work describes the development of a new structure-based learning approach to predict transcription binding specificity and its application in the modeling of regulatory complexes in cis-regulatory modules. The development of accurate computer tools to model protein-DNA complexes and to predict DNA binding specificity is a very relevant research topic with significant impact in many areas.

      This article highlights the importance of transcriptional regulatory elements in gene expression regulation and the challenges in understanding their mechanisms. Traditional definitions of activating regulatory elements, such as promoters and enhancers, are becoming unclear, suggesting an updated model based on DNA accessibility and enhancer/promoter potential. Experimental techniques can assess the sequence preferences of transcription factors (TFs) for binding sites. Recent models propose a cooperative model in which regulatory elements work together to increase the local concentrations of TFs, RNA polymerase II, and other co-factors. Co-operative binding can be mediated through protein-protein or DNA interactions. The authors developed a structure-based learning approach to predict TF binding features and model the regulatory complex(es) in cis-regulatory modules, integrating experimental knowledge of structures of TF-DNA complexes and high-throughput TF-DNA interactions. They developed a server to characterize and model the binding specificity of a TF sequence or its structure, which was applied to the examples of interferon-β enhanceosome and the complex of factors SOX11/SOX2 and OCT4 with the nucleosome. The models highlight the co-operativity of TFs and suggest a potential role for nucleosome opening.

      The results presented by the authors have a large variability in performance upon the different TF families tested. Therefore, it would be ideal if the performance/accuracy of the method is tested in some simple predictions and validated with prospective experimental data before applying it to model difficult scenarios such as those described here: SOX11/SOX2/OCT4 and nucleosome or interferon beta and enhanceosome. This will give more support to the models generated and thus the validity of the conclusions and hypothesis derived from them.

      General comment: We wish to thank the reviewer for his/her comments; we really appreciate them and the opportunity to run new tests of our approach. Some of his/her comments coincide with those of reviewer 1. When this is the case, we will refer to our previous answers and modifications in the manuscript. In this revision we have included new tests to validate the approach using available published experiments different from the ones used in the original submission. We hope the new information is sufficient to support our approach.

    2. Reviewer #1 (Public Review):

      The paper proposes a novel approach, named ModCRE, which utilizes structure-based learning to predict the DNA binding preferences of transcription factors (TFs). The authors integrate both experimental knowledge of the structures of TF-DNA complexes and large amounts of high-throughput TF-DNA interaction data. Additionally, the authors have developed a server that automatically produces these characteristics for other TFs and their complexes with co-factors.

      Strengths: The paper's integration of experimental knowledge and high-throughput data to develop statistical knowledge-based potentials to score the binding capability of TFs in cis-regulatory elements is a powerful strategy. The proposed approach can be applied to more than 80% of TF sequences, making it a general method for characterizing binding preferences.

      Weaknesses: The paper is difficult to follow, as it contains many technical details and implementation details. The method applied is not always clear, and the paper focuses on implementation rather than the message. The results indicate that the nearest neighbors approach in Figure 4 outperforms the proposed method in many cases, and the proposed method seems to perform better only when similarity with the target is low. The same applies in Fig. 5 when using normalized ranked scores.

      It appears that the authors have successfully developed a structure-based learning approach for predicting DNA binding preferences of transcription factors. However, the paper's technical language and implementation focus make it challenging to follow at times.

      It seems the authors have successfully achieved most of their aims in improving predictions for TF-DNA interaction, and the results support their conclusions.

      This work has the potential to significantly impact the field of TF-DNA binding and gene regulation, particularly for those interested in predicting PWMs for TFs with limited or unreliable experimental data.

    1. Author Response

      Reviewer #1 (Public Review):

      Davies et al. examined the role of the malaria parasite's FIKK4.1 protein kinase in trafficking and host membrane insertion of key proteins that are exported by the intracellular P. falciparum parasite. FIKK4.1 is one of 18 FIKK serine/threonine kinases exported into the host erythrocyte; these kinases phosphorylate both host proteins and exported parasite proteins. FIKK4.1 has previously been implicated in rigidification of the erythrocyte cytoskeleton. It is also known to affect trafficking and insertion of PfEMP1, the parasite's primary cytoadherence ligand, on the host cell surface. In the present studies, the authors perform sophisticated gene-editing experiments that combine conditional knockout of FIKK4.1 with tagging of two kinase targets with the TurboID proximity biotin-labeling enzyme to explore phosphorylation-dependent changes in target protein localization, structure, or protein-protein interactions. Using conditional knockout of each exported FIKK kinase, they determine that FIKK4.1 is the only kinase that regulates PfEMP1 surface exposure and that it does not appear to modulate surface translocation of RIFINs, a family of parasite antigens involved in immune evasion. The combination of gene-editing, proximity labeling and mass spectrometry, and biochemical studies in the paper is to be lauded. These findings identify key targets of exported kinases and will guide future studies of host cell remodeling.

      Key limitations of the study:

      1) TurboID tagging of FIKK4.1 followed by proximity labeling and mass spectrometry of biotinylated proteins revealed parasite-stage dependent labeling of 101 parasite proteins and 39 human proteins that come in contact with FIKK4.1. Although TurboID is a more efficient biotin ligase produced through directed evolution, nonspecific biotinylation of proteins that do not form biologically relevant interactions remains an issue. Biotin addition for 4 hours, as used here and in most studies using this ligase, allows for labeling of proteins that undergo random collisions with the TurboID-tagged protein. While there was clear enrichment of exported proteins in the FIKK4.1-tagged parasite at mature schizont stages when FIKK4.1 is in the host cytosol, only 66% of the proteins labeled were exported, consistent with labeling and recovery of irrelevant proteins. As the authors performed appropriate controls and interpreted their findings cautiously, this limitation results primarily from finite efficiency of TurboID, trace levels of endogenous biotin within cells, and other complexities associated with the technology.

      We agree with the reviewer that there are limitations to TurboID and the mere presence of a protein in a dataset does not imply functional relevance (which is also true for IP data). However, it is highly complementary to data obtained through other methods (in our case previous cytoadhesion data and phosphoproteome data) and as we show here, can give high resolution information on the local protein environment of a protein. This is illustrated by highly significant protein-specific interaction datasets for PTP4 and KAHRP obtained from biological triplicate experiments. The site-specific protocol we use later in the paper allows us to eliminate unbiotinylated proteins non-specifically binding to beads which is a major advantage, evidenced by the much higher ratio of exported proteins observed in the PTP4 and KAHRP-turboID datasets.

      2) The production of dual-edited parasites carrying conditional knockout of FIKK4.1 and TurboID tagging of either KAHRP or PTP4 permitted examination of changes in localization of exported proteins upon their phosphorylation by FIKK4.1. KAHRP and PTP4 are excellent choices for these experiments because they are established targets of the kinase and good candidates for effectors involved in PfEMP1 membrane insertion. Some 30-40 proteins exhibited significant changes in biotinylation by these TurboID-tagged proteins, suggesting altered localization or structure upon loss of FIKK4.1 kinase activity. PfEMP1 trafficking proteins (PTPs), Maurer's cleft proteins, exported heat shock proteins, and components of PSAC, a parasite-associated nutrient uptake channel, all exhibited changes. Although FIKK4.1 is not essential for in vitro parasite propagation, altered localization could result either directly from changes in phosphorylation status of the protein itself or could reflect indirect effects on the cell from loss of FIKK4.1.

      The reviewer is correct that we cannot exclude the possibility that the observed changes result not only from the loss of FIKK4.1-mediated phosphorylation sites, but also from the loss of the FIKK4.1 kinase domain affecting the localisation of other proteins. Conditional inactivation of the FIKK4.1 kinase domain while retaining the overall protein would have been a more elegant approach. However, we do not predict the kinase domain of FIKK4.1 to be a strong structural component, given that kinase domains have often evolved to have low-affinity interactions with their multiple targets and are less likely to act as scaffolding parts. As the reviewer points out, we observed no growth defect upon deletion of FIKK4.1. Therefore, we can be quite certain that the observed changes are not due to indirect effects caused by differences in growth but are a direct effect of the loss of the kinase domain and FIKK4.1’s enzymatic activity.

      3) As a consequence of these two limitations, these experiments could not conclusively implicate either KAHRP or a specific PTP in PfEMP1 surface translocation. Whether specific Maurer's cleft proteins or the nutrient channel components contribute to PfEMP1 surface translocation could also not be addressed. The authors' Discussion section is appropriately cautious in interpreting changes in biotinylation upon FIKK4.1 disruption. Although a large amount of data has been generated in this sophisticated study, the precise mechanism of PfEMP1 trafficking and membrane insertion remains elusive.

      We agree with the reviewer that we do not definitively explain the mechanism of FIKK4.1 in PfEMP1 surface translocation. But we identify several promising candidates for modulating its effect, some of which (for example PTP4) have previously been shown to be relevant for PfEMP1 surface translocation. We also identify unexpected proteins which can now be investigated further. New methods in high-resolution cryo-EM imaging may allow us to image individual protein density in knobs and visualize the observed changes in the future. Further PerTurboID experiments with individual components will likely draw an ever finer picture. Here we focus on emphasising the potential of PerTurboID for identifying connections between proteins, and for observing changes to protein characteristics which would be missed by other techniques.

      Reviewer #2 (Public Review):

      Davies et al combine TurboID with conditional mutagenesis to reveal how a perturbing event alters the accessibility of a sub-cellular proteome to proximity biotinylation. The approach builds on established techniques for antibody-mediated enrichment of biotinylated peptides (rather than purification of whole biotinylated proteins by avidin) to enable mapping of the specific lysines that are biotinylated by TurboID and how access to these sites changes between conditions. The insights gained have a range of potential implications touching on protein trafficking/localization, complex dynamics and membrane topology. The authors apply this strategy to study trafficking of the key P. falciparum adhesin PfEMP1 to the infected erythrocyte surface. This group has previously shown that the exported parasite kinase FIKK4.1 is important for this process but the specific mechanism is unknown. In the first part of the present study, the authors develop PerTurboID and analyze the altered biotinylation patterns upon FIKK4.1 deletion in parasite lines bearing TurboID tags on PTP4 or KAHRP, two proteins required for this pathway and likely direct substrates of FIKK4.1. Numerous changes in site-specific biotinylation are quantitatively assessed on hundreds of proteins and possible implications for these changes are discussed, including topology of parasite integral membrane proteins exported into the RBC compartment as well as how the conformation of the RhopH complex might be altered upon RBC membrane integration. In a final set of experiments, the authors show that among 18 exported FIKK kinases, FIKK4.1 is uniquely important to PfEMP1 surface display but not to the distinct RIFIN class of parasite proteins that are also trafficked to the RBC surface. On the whole, the data are compelling and provide an important new approach that advances the proximity labeling toolkit.

      While the resolution of PerTurboID captures the site-specific changes in biotinylation abundance and position that occur upon loss of FIKK4.1, a limitation of the study is that these observations do not necessarily clarify the model for how FIKK4.1 is controlling the PfEMP1 trafficking pathway. The authors convincingly show that FIKK4.1 uniquely supports PfEMP1 surface presentation and cytoadhesion. However, this is not connected to the PerTurboID data in a way that provides a mechanism for how this is achieved by FIKK4.1 activity and in my opinion doesn't deliver on the title claim to "reveal the impact of kinase deletion on cytoadhesion". Certainly the changes in biotinylation suggest a range of interesting possibilities related to the accessibility and topology of proteins within and beyond the PfEMP1 trafficking pathway; however, it is hard to interpret the relationship of these changes to the process in view. For instance, deletion of FIKK4.1 increases biotinylation of several Maurer's clefts proteins in both the PTP4- and KAHRP-TurboID experiments but why this is or whether it is significant for PfEMP1 transport is unclear.

      We agree with the reviewer that we do not definitively confirm the relationship between the changes observed in protein accessibility and the role of FIKK4.1 in PfEMP1 transport. We discuss a number of likely options based on what is known of the candidate genes, but validation would require extensive further work beyond the scope of this paper. We have focussed on demonstrating the value of PerTurboID as a technique for measuring molecular-level changes which would be missed by other methods, providing a list of proteins which are likely involved in modulating FIKK4.1 activity and PfEMP1 trafficking through an interconnected network. We believe the technique will be very useful for understanding gene function in other scenarios. However, we changed the title to be more specific to proteins in the cytoadhesion complex and associated proteins, and not cytoadhesion per se.

    1. Author Response

      Reviewer #1 (Public Review):

      The finding that taste memory formation follows the same or highly similar logic and mechanisms as olfactory memory is very interesting. In particular, the new approach to use an operant learning assay developed by the authors to address this outstanding question in the field is very impressive. The shown data are of high quality and very convincing.

      While the current version will be of clear interest to fly people dissecting memory formation, it might be less accessible outside this immediate field. Below I list my suggestions, questions and criticisms.

      You have developed an operant assay and stress this in the introduction. This is important because it allows you to gain much better insight into how memory is formed and how it is recalled. Nevertheless, I was somewhat disappointed that you did not exploit that aspect more in your study. First, I suggest showing, at least for the initial figures, the traces (e.g. Fig 1D) not only for the test phase but also for the training phase. As you also mention in your discussion, the extent of memory formation will depend critically on the number of pairings during training. And perhaps not only on their number but also on their evolution/change over time. Second, you only show preference indices. I suggest showing the number of actual interactions with the food source in addition. In my opinion and experience, the preference index can be misleading or at least the interpretation might be questioned if the number of actual choices is very low or very high compared to controls or other groups. Third, regarding the same point, you show traces for test phases, but you do not comment on or discuss why they might look the way they look. For instance, it appears that in some cases it takes a while to see an actual difference in the preference index while at other times it seems more instantaneous etc.

      We have now added plots showing the preference indices over time during both training and testing for all the experiments in Figures 1 and 2. We also comment in the text on our view of their interpretation. Although we recognize that interesting features of the learning process could be revealed by examining the process over time, we also caution that earlier timepoints are inherently less robust because of smaller sample size to the measurements (flies tend to not take many sips of the food over the first several minutes). Thus, emergence of a preference after a period of time may not reflect an evolution of the preference as much as a firming up of the data as more sips are recorded. As a notable example, our data in Figure 1E,G show close to a zero preference for activation of sweet sensory neurons during the first 10 minutes of training, despite the innately appetitive nature of this manipulation. This is undoubtedly because it takes some time for flies to sample both choices and build up enough interactions to show a clear preference. This is not to say that the curves are never informative, however. For example, it is reassuring to see that activation of PAM neurons does not produce a positive preference at any time during training (Figure 2F).

      We have also added the raw sip/interaction numbers for the experiments in Figure 1 in order to provide an example of how these data relate to the preference. Your concern about reliability differing depending on choice number is certainly warranted (as we also discuss above). However, the raw data does not suggest a major difference in the overall number of choices being made between groups.

      Along the same lines, I am wondering why you do not observe extinction. Frequently if the CS is re-experienced without the US over several trials, you start to see memory fade. The preference traces as well as the actual interactions might help to explain this.

      This is an interesting question, and one that we have certainly wondered about. Our assumption is that the number of exposures to the CS+ during testing is not sufficient to induce extinction. It would be interesting to run a longer testing period to see whether extinction occurs over a longer time course; however, we have not done so at this point.

      You use salt as a negative US. I suggest showing at least one experiment with bitter taste (e.g. quinine) to show how general your finding is to negative conditioning. Your optogenetic data suggests it is.

      We actually never use natural taste stimuli as the US; we only use salt as the CS+ in our appetitive learning experiments. We have revised the figures and figure legends extensively for clarity and one of the changes is to try to make it clearer what is the CS+ and CS- in each experiment.

      You analyze the role of energy state in memory formation. This is very interesting. In light of the importance of feeding state, it would be very helpful to include starvation/metabolic state information not only in the methods but also in the results section (at least briefly).

      We have now indicated in all the figure legends and in the text that flies were all food deprived for 24 hours prior to training.

      Your data convincingly shows that taste memory is formed in the mushroom body. For instance, you show that inhibition of KCs prevents the change in preference. KC inhibition was done during the entire experiment (training and test). Thus, it's important to show how KC inhibition affects (or does not) training vs. test.

      We appreciate the motivation for this suggestion and how extensively this issue has been explored in olfactory classical conditioning. We also agree that it would be interesting to perform this experiment. However, the practical logistics of doing this experiment were not possible with the constraints we were under. We unfortunately don’t currently have the means to operate the STROBE at a temperature high enough to effectively silence neurons using shibire(ts), and silencing with optogenetics is not possible with our current setup either. Thus, we will need to leave this issue unresolved for the time being.

      Along the same lines, how do you envision this memory formation to happen at the circuit level? KCs and DANs are likely activated by CS and US. It would be important to at least include a paragraph in the discussion to clarify this.

      The bulk of our characterization of this assay (including the demonstration that KCs are required) was done with 75 mM NaCl as the CS+ and optogenetic activation of PAM neurons as the US. Previous studies have shown activation of KCs by tastes (Kirkhart and Scott, 2015), so we believe that KCs are being activated by the CS+ and DANs are being activated by the US (in this case directly through optogenetics). Based on a great deal of beautiful work in olfactory classical conditioning, we believe it is likely that this coincident activation leads to plasticity at KC-MBON synapses, thereby skewing the behaviour in favor of attraction. We have now tried to clarify this mechanism in the paper.

    1. Author Response

      We express our sincere gratitude to the editors and reviewers for their invaluable input. To further improve our manuscript, we have devised a plan to perform additional histological experiments of Bdnf and TrkB expression. Specifically, we will replace the phospho-TrkB antibody with an anti-TrkB antibody to quantify Bdnf/TrkB co-expression. Moreover, we acknowledge the concern raised by the reviewers regarding the clarity of some explanations and the potential influence of alternative mechanisms influencing the defects observed in Bdnf neurons. We aim to provide a clearer explanation and discussion. We also intend to provide a more comprehensive discussion of the limitations of our LM22A-4 drug treatment experiment. By addressing these points, we wish to ensure that our research is informative to the eLife readership.

    2. Reviewer #1 (Public Review):

      Summary:
      Rai1 encodes the transcription factor retinoic acid-induced 1 (RAI1), which regulates expression of factors involved in neuronal development and synaptic transmission. Rai1 haploinsufficiency leads to the monogenic disorder Smith-Magenis syndrome (SMS), which is associated with excessive feeding, obesity and intellectual disability. Consistent with findings in human subjects, Rai1+/- mice and mice with conditional deletion of Rai1 in Sim+ neurons, which are abundant in the paraventricular nucleus (PVN), exhibit hyperphagia, obesity and increased adiposity. Furthermore, RAI1-deficient mice exhibit reduced expression of brain-derived neurotrophic factor (BDNF), a satiety factor essential for the central control of energy balance. Notably, overexpression of BDNF in PVN of RAI1-deficient mice mitigated their obesity, implicating this neurotrophin in the metabolic dysfunction these animals exhibit. In this follow up study, Javed et al. interrogated the necessity of RAI1 in BDNF+ neurons promoting metabolic health.

      Consistent with previous reports, the authors observed reduced BDNF expression in the hypothalamus of Rai1+/- mice. Moreover, proteomics analysis indicated impairment in neurotrophin signaling in the mutants. Selective deletion of Rai1 in BDNF+ neurons in the brain during development resulted in increased body weight, fat mass and reduced locomotor activity and energy expenditure without changes in food intake. There was also a robust effect on glycemic control, with mutants exhibiting glucose intolerance. Selective depletion of RAI1 in BDNF+ neurons in PVN in adult mice also resulted in increased body weight, reduced locomotor activity, and glucose intolerance without affecting food intake. Blunting RAI1 activity also leads to increases and decreases in the inhibitory tone and intrinsic excitability, respectively, of BDNF+ neurons in the PVN.

      Strengths:
      Overall, the experiments are well designed and multidisciplinary approaches are employed to demonstrate that RAI1 deficits in BDNF+ neurons diminish hypothalamic BDNF signaling and produce metabolic dysfunction. The most significant advance relative to previous reports is the finding from electrophysiological studies showing that blunting RAI1 activity leads to increases and decreases in the inhibitory tone and intrinsic excitability, respectively, of BDNF+ neurons in the PVN. A further advance is the finding that intact RAI1 function is required in BDNF+ neurons for the regulation of glucose homeostasis.

      Weaknesses:
      Some of the data need to be reconciled with previous findings by others. For example, the authors report that more than 50% of BDNF+ neurons in PVN also express pTrkB whereas about 20% of pTrkB+ cells contain BDNF, raising the possibility that autocrine mechanisms might be at play. This is in conflict with a previous study by An et al. (2015) showing that these cell populations are largely non-overlapping in the PVN.

      Another issue that deserves more in depth discussion is that diminished BDNF function appears to play a minor part driving deficits in energy balance regulation. Accordingly, both global central depletion of Rai1 in BDNF+ neurons during development and deletion of Rai1 in BDNF+ neurons in the adult PVN elicited modest effects on body weight (less than 18% increase) and did not affect food intake. This contrasts with mice with selective Bdnf deletion in the adult PVN, which are hyperphagic and dramatically obese (90% heavier than controls). Therefore, the results suggest that deficits in RAI1 in PVN or the whole brain only moderately affect BDNF actions influencing energy homeostasis and that other signaling cascades and neuronal populations play a more prominent role driving the phenotypes observed in Rai1+/- mice, which are hyperphagic and 95% heavier than controls. The results from the proteomic analysis of hypothalamic tissue of Rai1 mutant mice and controls could be useful in generating alternative hypotheses.

      Depleting RAI1 in BDNF+ neurons had a robust effect compromising glycemic control. However, as the approach does not necessarily impact BDNF exclusively, there should be a larger discussion of alternative mechanisms.

    3. Reviewer #3 (Public Review):

      Summary:
      Smith-Magenis syndrome (SMS) is associated with obesity and is caused by deletion or mutations in one copy of the Rai1 gene which encodes a transcriptional regulator. Previous studies have shown that Bdnf gene expression is reduced in the hypothalamus of Rai1 heterozygous mice. This manuscript by Javed et al. further links SMS-associated obesity with reduced Bdnf gene expression in the PVH.

      Strengths:
      The authors show that deletion of the Rai1 gene in all BDNF-expressing cells or just in the PVH BDNF neurons postnatally caused obesity. Interestingly, mutant mice displayed sexual dimorphism in the cause for the obesity phenotype. Overall, the data are well presented and convincing except the data from LM22A-4.

      Weaknesses:
      1. The most serious concern is about data from LM22A-4 administration experiments (Figure 5 and associated supplemental figures). A rigorous study has demonstrated that LM22A-4 does not activate TrkB (Boltaev et al., Science Signaling, 2017), which is consistent with unpublished results from many labs in the neurotrophin field. It is tricky to interpret body weight data from pharmacological studies because compounds always have some side effects, which can reduce body weight non-specifically.

      2. The resolution of all figures is poor, and thus I could not judge the quality of the micrographs.

      3. Citation of the literature is not precise. The study by An et al. (2015) shows that deletion of the Bdnf gene in the PVH leads to obesity due to increased food intake and reduced energy expenditure (not just hyperphagic obesity; Line 72). Furthermore, the study by Unger et al. (2017) carried out Bdnf deletion in the VMH and DMH using AAV-Cre and did not discuss SF1 neurons at all (Line 354). The two studies by Yang et al. (Mol Endocrinol, 2016) and Kamitakahara et al. (Mol Metab, 2015) did use SF1-Cre to delete the Bdnf gene and did not observe any obesity phenotype.

      4. Animal number is not described in many figure legends.

    4. eLife assessment

      This valuable study informs whether diminishing BDNF expression or alterations in the activity of BDNF-containing neurons in the paraventricular nucleus of the hypothalamus contributes to metabolic alterations in individuals with reduced RAI1 function, including those afflicted with Smith-Magenis syndrome (SMS). The evidence supporting the conclusions is compelling in that RAI1 deficits in BDNF-containing neurons partly contribute, with prominent effects on glycemic control and modest effects on feeding and body weight regulation; however, the histological analyses of BDNF and TrkB expression are inadequate. This study would be of interest to neuroscientists and medical biologists working on metabolic disorders such as obesity and diabetes, as the findings further link SMS-associated obesity with reduced Bdnf gene expression in the PVH, shed light on the role of the Rai1 gene in the PVH Bdnf neurons, and offer a basis for future therapeutic strategies for managing obesity in SMS.

    5. Reviewer #2 (Public Review):

      Understanding disease conditions often yields valuable insights into the physiological regulation of biological functions, as well as potential therapeutic approaches. In previous investigations, the authors' research group identified abnormal expression of brain-derived neurotrophic factor (BDNF) in the hypothalamus of a mouse model exhibiting Smith-Magenis syndrome (SMS), which is caused by heterozygous mutations of the Rai1 gene. Human SMS is associated with distinct facial characteristics, sleep disturbances, behavioral issues, and intellectual disabilities, often accompanied by obesity. Conditional knockout (cKO) of the Bdnf gene from the paraventricular hypothalamus (PVH) in mice led to hyperphagic obesity, while overexpression of the Bdnf gene in the PVH of Rai1 heterozygous mice restored the SMS-like obese phenotype. Based on these preceding findings, the authors of the present study discovered that homozygous Rai1 cKO restricted to Bdnf-expressing cells, or Rai1 gene knockdown solely in Bdnf-positive neurons in the PVH, induced obesity along with intricate alterations in adipose tissue composition, energy expenditure, locomotion, feeding patterns, and glucose tolerance, some of which varied between sexes. Additionally, the authors demonstrated that a brain-penetrating drug capable of activating the TrkB pathway, a downstream signaling pathway of BDNF, partially alleviated the SMS-like obesity phenotype in female mice with Rai1 heterozygous mutations. Although the specific (neural) cell type responsible for this TrkB signaling remains an open question, the present study unequivocally highlights the importance of Rai1 gene function in PVH Bdnf neurons for the obesity phenotype, providing valuable insights into potential therapeutic strategies for managing obesity associated with SMS.

      In the proteomic analysis (Fig. 1), the authors elucidated that multiple phospho-protein signaling pathways, including Akt and mTOR pathways, exhibited significant attenuation in the SMS model mice. Of significance, the manifestation of haploinsufficiency of the Rai1 gene exclusively within the BDNF+ cells demonstrated negligible impact on body weight (Fig. 2-supple 3D), despite observing a reduction in BDNF levels in the heterozygous Rai1 mutant (Fig. 1A). Conversely, the homozygous Rai1 cKO in the BDNF+ cells prominently displayed an obesity phenotype, suggesting substantial dissimilarities in the gene expression profiles between Rai1 heterozygous and homozygous conditions within the BDNF+ cell population. It would be advantageous to precisely identify the responsible differentially expressed genes, possibly including Bdnf itself, in the homozygous cKO model. The observed reduction in the excitability of PVH BDNF+ cells (Fig. 3) is presumably attributed to aberrant gene expression other than Bdnf itself, which may serve as a prospective target for gene expression analysis. Notably, the Rai1 homozygous cKO mice in BDNF+ cells exhibited some sexual dimorphisms in feeding and energy expenditures, as evidenced by Fig. 2 and related figures. Exploring the potential relevance of these sexual differences to human SMS cases and investigating the underlying cellular/molecular mechanisms in the future would provide valuable insights.

      Although the CRISPR-mediated knockdown of the Rai1 gene (Fig. 4) appears to be highly effective, given the broad transduction of AAV serotype 9, it may be helpful to exclude the possibility of other brain regions adjacent to the PVH, such as the DMH or VMH, being affected by this viral procedure. If the PVH-specificity is established, the majority of Rai1 cKO effects in Bdnf+ cells are primarily attributed to PVH-Bdnf+ cells based on the similarity of phenotypes observed. With regards to the apparent rescue of the body weight phenotype in Rai1 heterozygous mutants using a selective TrkB activator, the specific biological processes, and neurons responsible for this effect remain unclear to this reviewer. Elucidating these aspects would be significant when considering potential applications to human SMS cases.

      Overall, the present study represents a valuable addition to the authors' series of high-quality molecular genetic investigations into the in vivo functions of the Rai1 gene. This reviewer particularly commends their diligent efforts to enhance our comprehension of SMS and contribute to the future development of more effective therapies for this syndrome.

    1. Author Response

      We would like to thank the editors and reviewers for their thoughtful comments on our manuscript. Before we can provide a point-by-point response and submit a revised version of the manuscript we would like to provisionally address and alleviate some of their main concerns.

      A concern was expressed in the ‘eLife assessment’ and by two of the reviewers that a potential confound between the coding of sensory information and behavior outcome by IC neurons might have been introduced by combining data across different sound levels, which could challenge the conclusions of the study. In addressing this we have carried out the analysis (i.e. averaging the neural activity separately for different sound levels) suggested for distinguishing between the two alternative explanations offered by reviewer #1: That the difference in neural activity between hit and miss trials reflects a) behavior or b) sound level (more precisely: differences in response magnitude arising from a higher proportion of high-sound-level trials in the hit trial group than in the miss trial group). If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for different sound levels. The figure in Author response image 1 indicates that that is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      Author response image 1.
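
      The stratified comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the function name `hit_miss_by_level`, the one-value-per-trial response format, and the example numbers are all hypothetical. It averages responses separately for hit and miss trials within each sound level, so that any remaining hit/miss difference cannot be attributed to unequal sound-level distributions.

```python
import numpy as np

def hit_miss_by_level(responses, levels, is_hit):
    """Mean response for hit and miss trials, computed per sound level.

    responses : one response value per trial (hypothetical summary measure)
    levels    : sound level of each trial
    is_hit    : True for hit trials, False for miss trials
    Returns {level: (mean_hit, mean_miss)}; NaN where a group is empty.
    """
    responses = np.asarray(responses, dtype=float)
    levels = np.asarray(levels)
    is_hit = np.asarray(is_hit, dtype=bool)
    out = {}
    for level in np.unique(levels):
        at_level = levels == level
        hits = responses[at_level & is_hit]
        misses = responses[at_level & ~is_hit]
        out[level.item()] = (
            hits.mean() if hits.size else np.nan,
            misses.mean() if misses.size else np.nan,
        )
    return out

# Made-up example: two sound levels, three trials each.
example = hit_miss_by_level(
    responses=[1.0, 3.0, 0.5, 2.0, 4.0, 1.0],
    levels=[40, 40, 40, 60, 60, 60],
    is_hit=[False, True, False, True, True, False],
)
```

      If sound level alone explained the effect, the two means within each level would be expected to be similar; a hit/miss gap that persists within every level points to a behavioral contribution.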

      A related concern was expressed with regards to the decoding analysis. Namely, that differences in the distributions of sound levels in the different trial types could confound the decoding into hit and miss trials and that, consequently, the results of the decoding analysis merely reflect differences in the processing of sound level. Our analysis actually aimed to take this into account but, unfortunately, we failed to include sufficient details in the methods section of the submitted manuscript. Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d prime of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis. In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity can be observed predominantly immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in the plots above), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.
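
      The trial-selection rule described above (the lowest sound level exceeding a d prime of 1.5, plus the two levels below and above it) can be sketched as follows. This is an illustrative reading of that rule, not the code used for the manuscript: the function name `select_decoding_levels` and the example values are hypothetical, and clipping the window at the edges of the tested range is an assumption, since the text does not say how sessions with fewer than two flanking levels were handled.

```python
def select_decoding_levels(levels, dprimes, threshold=1.5, n_flank=2):
    """Pick the sound levels whose trials enter the decoding analysis.

    levels   : sound levels tested in a session
    dprimes  : behavioral d prime measured at each level
    Returns the lowest level whose d prime exceeds `threshold`, together
    with up to `n_flank` levels on either side (assumption: the window is
    clipped at the ends of the tested range). Returns an empty list if no
    level exceeds the threshold.
    """
    pairs = sorted(zip(levels, dprimes))  # ascending by sound level
    for i, (_, dprime) in enumerate(pairs):
        if dprime > threshold:
            lo = max(0, i - n_flank)
            hi = min(len(pairs), i + n_flank + 1)
            return [level for level, _ in pairs[lo:hi]]
    return []

# Made-up session: d prime first exceeds 1.5 at 45 dB SPL.
selected = select_decoding_levels(
    levels=[30, 35, 40, 45, 50, 55, 60],
    dprimes=[0.1, 0.4, 0.9, 1.6, 2.2, 2.8, 3.1],
)
```

      Restricting decoding to such a narrow window around threshold keeps the sound-level distributions of hit and miss trials close to one another while retaining enough trials of both outcomes.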

      Another concern expressed in the reviews is that, in relation to the cluster-wise analysis of neural activity, no direct comparison (beyond the pie charts of Figure 5C) was provided between data from lesioned and non-lesioned groups, leaving unclear how similar task-relevant activity is between these groups. In Author response image 2 we plot, analogous to Figure 5B, the average hit and miss trial activity for the 10 clusters separately for lesioned and non-lesioned mice, illustrating more clearly the high degree of similarity between the two groups.

      Author response image 2.

    2. Reviewer #1 (Public Review):

      The inferior colliculus (IC) is the central auditory system's major hub. It integrates ascending brainstem signals to provide acoustic information to the auditory thalamus. The superficial layers of the IC ("shell" IC regions as defined in the current manuscript) also receive a massive descending projection from the auditory cortex. This auditory cortico-collicular pathway has long fascinated the hearing field, as it may provide a route to funnel "high-level" cortical signals and impart behavioral salience upon an otherwise behaviorally agnostic midbrain circuit.

      Accordingly, IC neurons can respond differently to the same sound depending on whether animals engage in a behavioral task (Ryan and Miller 1977; Ryan et al., 1984; Slee & David, 2015; Saderi et al., 2021; De Franceschi & Barkat, 2021). Many studies also report a rich variety of non-auditory responses in the IC, far beyond the simple acoustic responses one expects to find in a "low-level" region (Sakurai, 1990; Metzger et al., 2006; Porter et al., 2007). A tacit assumption is that the behaviorally relevant activity of IC neurons is inherited from the auditory cortico-collicular pathway. However, this assumption has never been tested, owing to two main limitations of past studies:

      1) Prior studies could not confirm if data were obtained from IC neurons that receive monosynaptic input from the auditory cortex.

      2) Many studies have tested how auditory cortical inactivation impacts IC neuron activity; the consequence of cortical silencing is sometimes quite modest. However, all prior inactivation studies were conducted in anesthetized or passively listening animals. These conditions may not fully engage the auditory cortico-collicular pathway. Moreover, the extent of cortical inactivation in prior studies was sometimes ambiguous, which complicates interpreting modest or negative results.

      Here, the authors' goal is to directly test if auditory cortex is necessary for behaviorally relevant activity in IC neurons. They conclude that, surprisingly, task-relevant activity in cortico-recipient IC neurons persists in the absence of auditory cortico-collicular transmission. To this end, a major strength of the paper is that the authors combine a sound-detection behavior with clever approaches that unambiguously overcome the limitations of past studies.

      First, the authors inject a transsynaptic virus into the auditory cortex, thereby expressing a genetically encoded calcium indicator in the auditory cortex's postsynaptic targets in the IC. This powerful approach enables 2-photon Ca2+ imaging from IC neurons that unambiguously receive monosynaptic input from auditory cortex. Thus, any effect of cortical silencing should be maximally observable in this neuronal population. Second, they abrogate auditory cortico-collicular transmission using lesions of auditory cortex. This "sledgehammer" approach is arguably the most direct test of whether cortico-recipient IC neurons will continue to encode task-relevant information in absence of descending feedback. Indeed, their method circumvents the known limitations of more modern optogenetic or chemogenetic silencing, e.g. variable efficacy.

      I also see three weaknesses which limit what we can learn from the authors' hard work, at least in the current form. I want to emphasize that these issues do not reflect any fatal flaw of the approach. Rather, I believe that their datasets likely contain the treasure-trove of knowledge required to completely support their claims.

      1. The conclusion of this paper requires the following assumption to be true: That the difference in neural activity between Hit and Miss trials reflects "information beyond the physical attributes of sound." The data presentation makes this assumption hard to verify. Specifically, they average fluorescence transients of all Hit and all Miss trials in their detection task. Yet, Figure 3B shows that mice's d' depends on sound level, and since this is a detection task the smaller d' at low SPLs presumably reflects lower Hit rates (and thus higher Miss rates). As currently written, it is not clear if fluorescence traces for Hits arise from trials where the sound cue was played at a higher sound level than on Miss trials. Thus, the difference in neural activity on Hit and Miss trials could indeed reflect mice's behavior (licking or not licking), but in principle it could also be explained by higher sound-evoked spike rates on Hit compared to Miss trials, simply due to louder click sounds. Indeed, the amplitude and decay tau of their indicator GCaMP6f are non-linearly dependent on the number and rate of spikes (Chen et al., 2013), so this isn't an unreasonable concern.

      2. The authors' central claim effectively rests upon two analyses in Figures 5 and 6. The spectral clustering algorithm of Figure 5 identifies 10 separate activity patterns in IC neurons of control and lesioned mice; most of these clusters show distinct activity on averaged Hit and Miss trials. They conclude that although the proportions of neurons from control and lesioned mice in certain clusters deviate from an expected 50/50 split, neurons from lesioned mice are still represented in all clusters. A significant issue here is that in addition to averaging all Hit and Miss trials together, the data from control and lesioned mice are lumped for the clustering. There is no direct comparison of neural activity between the two groups, so the reader must rely on interpreting a row of pie charts to assess the conclusion. It's unclear how similar task-relevant activity is between control and lesioned mice; we don't even have a ballpark estimate of how auditory cortex does or does not contribute to task-relevant activity. Although ideally the authors would have approached this by repeatedly imaging the same IC neurons before and after lesioning auditory cortex, this within-subjects design may be unfeasible if lesions interfere with task retention. Nevertheless, they have recordings from hundreds to thousands of neurons across two groups, so even a small effect should be observable in a between-groups comparison.

      3. In Figure 6, the authors show that logistic regression models predict whether the trial is a Hit or Miss from their fluorescence data. Classification accuracy peaks rapidly following sound presentation, implying substantial information regarding mice's actions. The authors further show that classification accuracy is reduced, but still above chance, in mice with auditory cortical lesions. The authors conclude from this analysis that task-relevant activity persists in the absence of auditory cortex. In principle I do not disagree with their conclusion.

      The weakness here is in the details. First, the reduction in classification accuracy of lesioned mice suggests that auditory cortex does nevertheless transmit some task relevant information, however minor it may be. I feel that as written, their narrative does not adequately highlight this finding. Rather one could argue that their results suggest redundant sources of task-relevant activity converging in the IC. Secondly, the authors conclude that decoding accuracy is impaired more in partially compared to fully lesioned mice. They admit that this conclusion is at face value counterintuitive, and provide compelling mechanistic arguments in the Discussion. However, aside from shaded 95% CIs, we have no estimate of variance in decoding accuracy across sessions or subjects for either control or lesioned mice. Thus we don't know if the small sample sizes of partial (n = 3) and full lesion (n = 4) groups adequately sample from the underlying population. Their result of Figure 6B may reflect spurious sampling from tail ends of the distributions, rather than a true non-monotonic effect of lesion size on task relevant activity in IC.
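      The concern in point 1 could be checked by comparing Hit and Miss responses within each sound level rather than across all trials pooled. The following is only a minimal sketch of that idea, not the authors' analysis: the function name, the trial tuple layout, and the example values are all hypothetical, and a single number per trial stands in for whatever response measure (e.g. peak fluorescence change) would actually be used.

```python
from collections import defaultdict
from statistics import mean


def level_matched_difference(trials):
    """Compare Hit and Miss responses within each sound level.

    trials: iterable of (sound_level, outcome, response) tuples, where
    outcome is "hit" or "miss" (hypothetical layout, for illustration).
    Returns (per_level, overall): the Hit-minus-Miss difference at each
    level that has both outcomes, and the mean of those differences.
    """
    by_level = defaultdict(lambda: {"hit": [], "miss": []})
    for level, outcome, response in trials:
        if outcome in ("hit", "miss"):
            by_level[level][outcome].append(response)

    per_level = {}
    for level in sorted(by_level):
        groups = by_level[level]
        # Skip levels lacking one outcome: they cannot be level-matched.
        if groups["hit"] and groups["miss"]:
            per_level[level] = mean(groups["hit"]) - mean(groups["miss"])

    overall = mean(list(per_level.values())) if per_level else None
    return per_level, overall


# Made-up example values, purely to show the call.
example_trials = [
    (50, "hit", 1.0), (50, "hit", 1.2), (50, "miss", 0.5),
    (60, "hit", 2.0), (60, "miss", 1.0),
    (70, "hit", 3.0),  # no Miss trials at this level, so it is excluded
]
per_level, overall = level_matched_difference(example_trials)
```

      If the Hit-minus-Miss difference survives within individual sound levels, it cannot be attributed to louder stimuli on Hit trials alone.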

    1. Author Response

      Reviewer #1 (Public Review):

      Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder leading to the loss of innervation of skeletal muscles, caused by the dysfunction and eventual death of lower motor neurons. A variety of approaches have been taken to treat this disease. With the exception of three drugs that modestly slow progression, most therapeutics have failed to provide benefit. Replacing lost motor neurons in the spinal cord with healthy cells is plagued by a number of challenges, including the toxic environment, inhibitory cues that prevent axon outgrowth to the periphery, and proper targeting of the axons to the correct muscle groups. These challenges seem to be well beyond our current technological approaches. Avoiding these challenges altogether, Bryson et al. seek to transplant the replacement motor neurons into the peripheral nerves, closer to their targets. The current manuscript addresses some of the challenges that will need to be overcome, such as immune rejection of the allograft and optimizing maturation of the neuromuscular junction.

      Bryson et al. begin by examining the survival of mESC-derived motor neurons allografted into SOD1 mice. The motor neurons, made on a 129S1/SvImJ, were transplanted into the tibial nerve of SOD1 mice on a C57BL/6J background. Without immunosuppression, most cells were lost between 14 and 35 days, suggesting an immune response had eliminated them. Tacrolimus prevented cell loss, but it also inhibited innervation of the muscle. It also uncovered the tumorigenic potential of contaminating pluripotent cells. In contrast, immunosuppression using H57-597, an antibody targeting T-cell receptor beta, prevented graft rejection while permitting some innervation of muscle. Pretreatment of the cells with mitomycin-C eliminated pluripotent cells, preventing tumor formation. The authors noted that this combination only innervated ~10% of endplates, likely due to the fact that the implanted motor neurons are not active.

      The authors then began the process of optimizing the cells themselves, using measurements taken in late-stage SOD1 mice. Fast-firing and slow-firing populations of neurons were first compared. Using optical stimulation, these two cell types appeared to be similar. The authors opted to use slow-firing neurons in the subsequent experiments. Recognizing that neuromuscular junction (NMJ) innervation and maintenance are dependent on motor neuron activity, implantable optical stimulators were also evaluated. 14 days after transplanting the cells, optical stimulation training was initiated for one hour each day. This training led to a nearly 13-fold increase in force generation, although this still remained well below the force generated by electrical stimulation. The enhanced innervation also prevented the atrophy of muscle fibers caused by denervation.

      Overall, the data for the function of the implanted cells are convincing. The dCALMS technique that the authors have developed is quite interesting and will likely be applicable to analyze muscles for other therapeutics. The identification of calcineurin inhibitors as inhibitors of reinnervation will also be important for the development of other cell-based therapeutics for ALS.

      This is an excellent summary of the state of the field of ALS therapy development and provides a clear rationale for our novel therapeutic strategy, in the near-complete vacuum of conventional treatment options for patients suffering from this devastating disorder. We are delighted that the Reviewer clearly appreciated the value of our alternative therapeutic strategy and found our supporting data to be convincing, as well as drawing attention to the dCALMS technique, which we agree could be of significant value in the investigation of other therapeutic strategies aimed at restoring muscle innervation. We are extremely grateful for the Reviewer’s diligence in assessing our manuscript.

      However, there are some issues that should be addressed. These include some common misconceptions about ALS. While ALS is split into familial and sporadic forms based on the presence or absence of a family history of the disease, mutations in the known ALS-associated genes are found in both forms [1]. The authors also state that exercise programs are likely to accelerate degeneration in ALS. This is incorrect. Moderate exercise is part of the current guidelines for treating ALS, and mouse studies have demonstrated a therapeutic effect of moderate exercise [2]. Regarding the experimental design, there are some important details missing. The animals do not appear to have been operated on at the same age, and the criteria for when to perform the operation were not described. A similar problem exists for when the animals were determined to reach endpoint [3]. The authors also do not seem to address a potential pitfall of this approach: acceleration of the disease process. Indeed, some of the data comparing the ipsilateral side to the contralateral side suggest that the implantation of the cells and/or the light source increase the denervation of the muscle [4]. Finally, there is a fairly large difference between the motor output provided by optical stimulation relative to electrical stimulation. It is currently unclear what level needs to be reached to provide an effective response in the intact animal. Thus, it is difficult to determine if the level of reinnervation that this study has achieved will be sufficient to improve a patient's quality of life [5].

      The Reviewer raises some extremely important points and highlights some additional constructive issues where more clarity is required (numbered 1-5 above). We have attempted to address each of these points in order to strengthen the key message of our study and the integrity of our manuscript:

      1) The Reviewer is absolutely correct in highlighting that causative mutations in identified genes occur in both sporadic and familial forms of ALS and that this classification simply reflects whether or not there is a known family history of the disease (which can also encompass a spectrum of disorders including frontotemporal degeneration). We will revise our manuscript in order to be more accurate and provide clarity on this important point.

      2) Regarding the potential acceleration of muscle denervation, we specifically state that the use of electrical nerve stimulation (ENS) to artificially evoke muscle contraction has been shown to accelerate denervation of the diaphragm muscle in clinical trials aimed at maintaining respiratory function in ALS patients, which significantly shortened life-expectancy. It was not our intention to imply that moderate voluntary exercise, as opposed to artificial “ENS-based” muscle stimulation programmes, could accelerate muscle denervation. Indeed, the negative side-effects of ENS that we highlighted provide a clear rationale for developing a safer alternative to artificially control muscle function once innervation by endogenous motor neurons progressively deteriorates in ALS patients; specifically, our selection of optogenetic nerve stimulation (ONS), which is highly selective to the engrafted light-sensitive motor neurons, recruits motor units in the correct physiological order and avoids rapid muscle fatigue, potentially overcomes the safety concerns associated with ENS.

      Importantly, unchecked disease progression means that complete paralysis of almost all muscles will eventually occur, due to loss of upper or lower motor neurons and accompanying muscle denervation, which would eventually preclude the ability of ALS patients to undertake voluntary exercise programmes, or even activities of daily life. Our approach is aimed at overcoming this inevitable loss of voluntary muscle control and onset of complete paralysis by providing a safe and effective method of artificially maintaining control of targeted muscles that would otherwise become completely paralyzed, as well as preventing their irreversible atrophy.

      To avoid the possibility that readers may infer that we are suggesting voluntary exercise programs accelerate degeneration in ALS and to provide additional clarity, we will revise the manuscript to stress that we specifically refer to “ENS-based” exercise programmes in relation to acceleration of muscle denervation.

      3) Regarding our experimental design, the congenic B6.SOD1G93A mouse model of ALS is an extremely well-characterized model, with a highly consistent timeframe of disease phenotype manifestation and progression. In order to maximize the translational value of our study, we selected an early post-symptom onset timepoint (95 ± 4.6 days) that mirrors a time at which human ALS patients would be likely to benefit from the therapeutic strategy: in the vast majority of cases, it is not possible to treat humans until a diagnosis of ALS has been confirmed, which can often take up to 12 months from first presentation. Importantly, ALS patients in the final stages of disease progression would be unlikely to be suitable for this therapy, due to irreversible muscle atrophy, which would preclude the ability of the engrafted motor neurons to form functionally useful connections. Indeed, our strategy is to engraft the replacement motor neurons before severe muscle atrophy occurs, so that they are in place to compensate and take over the function of endogenous lower motor neurons as they progressively degenerate and paralysis ensues. In so doing, the replacement motor neurons could prevent the irreversible atrophy of targeted muscles through ONS-based exercise programmes and thereby indefinitely extend the ability of targeted muscles to perform functionally useful movements.

      Although the initial graft optimization component of this study, including the tacrolimus trial, was performed across a variety of disease stages (commencing between 57-101 days of age), once we identified the H57-597 monoclonal antibody as an effective means to promote graft survival (without interfering with target muscle innervation), all subsequent grafts were initiated at an early symptom onset timepoint: 95.7 ± 4.6 days for slow-firing motor neuron grafts and 106.8 ± 7.2 days for fast-firing motor neuron grafts. Transgenic SOD1G93A mice were specifically bred for this study and, due to the complexities of coordinating stem cell differentiation and motor neuron production, optical stimulation device production and access to surgical facilities, with timed matings set up 3-4 months in advance, we feel that this age range was acceptable and doesn’t detract from the findings of our study.

      Similarly, we made every effort to ensure that the experimental end-point was consistent, at 133 ± 8 days for all grafts involving H57-597 administration, which reflects translationally-relevant late-stage disease progression. Since the physiological experiments performed as part of this study are extremely time-consuming, it was necessary to stagger the experimental end-point over several days. Again, we feel that this range is acceptable and still reflects a consistent, translationally-relevant timepoint. Importantly, since the experimental paradigm tested in this study was aimed at individually targeted muscles, which would have been unlikely to have an effect on disease duration or survival, we did not feel that it was ethically justifiable to allow the B6.SOD1G93A mice to approach end-stage disease (which occurs at an average of 150 days of age in this model).

      In the interests of full transparency, the age at which treatment commenced and the experimental end-point for every animal used in this study is reported in Supplementary Tables 2 and 3.

      4) The Reviewer raises an extremely pertinent question, regarding whether the engrafted motor neurons themselves, or the implanted stimulation device, may accelerate the progressive loss of innervation of targeted muscles by endogenous motor neurons, in light of our data showing decreased force evoked by electrical stimulation of ipsilateral (engrafted) versus contralateral (control) muscles. It is worth noting that supramaximal electrical nerve stimulation, used to evoke maximal muscle force, should activate both endogenous and engrafted motor neurons; therefore, the combined activation of both populations would be expected to result in a summative (greater) contractile response. The fact that we see the converse is unlikely to be due to an accelerated loss of endogenous motor innervation as a result of the engrafted cells, but is much more likely to be caused by physical nerve damage during the surgical engraftment process: we used a customized Hamilton syringe with a 29G needle to manually inject the cells into the targeted nerve branches, which has an outer diameter of 330μm whilst the diameter of the tibial nerve in an adult mouse is approximately 400μm. This is likely to have led to damage of the endogenous motor (and potentially sensory) axons that may have diminished regenerative capacity due to ongoing disease mechanisms. Fortunately, there is significant scope to refine the engraftment procedure by using smaller gauge needles (potentially made of more flexible materials), bespoke injection systems that can deliver the cells at a controlled rate and micromanipulators that can avoid nerve damage caused by excessive movement of the needle within the nerve. Importantly, the significantly greater scale of human nerves, compared to murine nerves targeted in this study, would also be a significant advantage in terms of physically delivering the cells in ALS patients.

      5) The Reviewer’s final comment is entirely justified given that, even in the best cases following optical stimulation training of engrafted SOD1G93A mice, optical stimulation still evoked less contractile force than supramaximal electrical stimulation. The likely reasons for this are complex: there is almost certainly scope to further optimize the optical stimulation training paradigm, which could result in reinforcement of the de novo neuromuscular junctions formed between the engrafted motor neurons and targeted muscle fibres; it is possible that the expression level of the channelrhodopsin-2 protein at the cell surface may require optimization in order to reliably initiate action potentials in the engrafted motor neurons – development of newer channelrhodopsin variants may resolve this potential issue, whilst providing additional advantages (such as enabling transcutaneous stimulation) at the same time. Finally, the maximum contractile response of the triceps surae muscle elicited by optical stimulation that we observed was approximately 13g, which equates to approximately 50% of the body mass of an adult SOD1G93A mouse. Although this is only approximately 10% of the maximal contractile force of a wild-type triceps surae muscle, this would almost certainly provide the ability to perform functionally useful motor tasks if it could be reproduced in ALS patients, particularly if large numbers of targeted muscles could be controlled in a coordinated manner, something that we are actively working on.

      Reviewer #2 (Public Review):

      The authors provide convincing evidence that optogenetic stimulation of ChR2-expressing motor neurons implanted in muscles effectively restores innervation of severely affected skeletal muscles in the aggressive SOD1 mouse model of ALS, and conclude that this method can be applied to selectively control the function of implicated muscles. This was supported by convincing data presented in the paper.

      This is an interesting paper providing new/improved optogenetic methods to restore or improve muscle strength in ALS. In general, it is of high significance in both the techniques and concept, and the paper was well written. The evidence supporting the conclusions is convincing, with rigorous muscle tension physiological analysis, and nerve and muscle histology and image analysis. The work will be of broad interest to medical biologists on muscle disorders.

      One weak point is that proper control experiments were not clearly presented - these could be shown in the paper. For example, one control experiment with only YFP but no ChR2 expression with optogenetic stimulation should be performed, following similar procedures and analysis applied to the ChR2-transduced animals.

      We are extremely grateful for the Reviewer’s expert appraisal of our manuscript and we are delighted to hear that they found our study to be highly significant, of broad interest and that our supporting evidence for this novel therapeutic approach was convincing and rigorous.

      Regarding the inclusion of suggested control experiments, we have extensive negative results data from physiological recordings of muscles in response to optical stimulation in animals where the engrafted motor neurons were rejected (prior to our identification of a 100% effective immunosuppression regimen). This clearly revealed that, in the absence of ChR2-expressing motor neurons, optical stimulation does not elicit any response from the target muscle. However, we do not feel that inclusion of this negative data, which is entirely predictable, would have strengthened the findings of our study. Similarly, if we had engrafted motor neurons that only express YFP, we would have been unable to elicit any muscle contractile activity in response to optical stimulation. As a control, this may have some value in determining the ability of motor neurons derived from other cell lines that do not express ChR2 to survive and innervate target muscles but we don’t feel that the additional work would get us closer to achieving our ultimate goal of using motor neuron replacement in combination with optogenetic stimulation to restore/maintain muscle function in ALS patients. Moreover, the complex and iterative process of developing the cell line used in this study (reported in detail in our previous study) would make it extremely difficult to produce a suitable control stem cell line expressing only YFP. Having said that, we are actively in the process of developing new, more sophisticated human and mouse stem cell lines, using more translationally-relevant gene targeting methods to stably knock-in a variety of updated channelrhodopsin variants that may have superior properties for our approach. This will be reported in follow up study/studies as we feel that it goes well beyond the scope of the current study.

    2. Reviewer #1 (Public Review):

      Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder leading to the loss of innervation of skeletal muscles, caused by the dysfunction and eventual death of lower motor neurons. A variety of approaches have been taken to treat this disease. With the exception of three drugs that modestly slow progression, most therapeutics have failed to provide benefit. Replacing lost motor neurons in the spinal cord with healthy cells is plagued by a number of challenges, including the toxic environment, inhibitory cues that prevent axon outgrowth to the periphery, and proper targeting of the axons to the correct muscle groups. These challenges seem to be well beyond our current technological approaches. Avoiding these challenges altogether, Bryson et al. seek to transplant the replacement motor neurons into the peripheral nerves, closer to their targets. The current manuscript addresses some of the challenges that will need to be overcome, such as immune rejection of the allograft and optimizing maturation of the neuromuscular junction.

      Bryson et al. begin by examining the survival of mESC-derived motor neurons allografted into SOD1 mice. The motor neurons, made on a 129S1/SvImJ, were transplanted into the tibial nerve of SOD1 mice on a C57BL/6J background. Without immunosuppression, most cells were lost between 14 and 35 days, suggesting an immune response had eliminated them. Tacrolimus prevented cell loss, but it also inhibited innervation of the muscle. It also uncovered the tumorigenic potential of contaminating pluripotent cells. In contrast, immunosuppression using H57-597, an antibody targeting T-cell receptor beta, prevented graft rejection while permitting some innervation of muscle. Pretreatment of the cells with mitomycin-C eliminated pluripotent cells, preventing tumor formation. The authors noted that this combination only innervated ~10% of endplates, likely due to the fact that the implanted motor neurons are not active.

      The authors then began the process of optimizing the cells themselves, using measurements taken in late-stage SOD1 mice. Fast-firing and slow-firing populations of neurons were first compared. Using optical stimulation, these two cell types appeared to be similar. The authors opted to use slow-firing neurons in the subsequent experiments. Recognizing that neuromuscular junction (NMJ) innervation and maintenance are dependent on motor neuron activity, implantable optical stimulators were also evaluated. 14 days after transplanting the cells, optical stimulation training was initiated for one hour each day. This training led to a nearly 13-fold increase in force generation, although this still remained well below the force generated by electrical stimulation. The enhanced innervation also prevented the atrophy of muscle fibers caused by denervation.

      Overall, the data for the function of the implanted cells are convincing. The dCALMS technique that the authors have developed is quite interesting and will likely be applicable to analyze muscles for other therapeutics. The identification of calcineurin inhibitors as inhibitors of reinnervation will also be important for the development of other cell-based therapeutics for ALS.

      However, there are some issues that should be addressed. These include some common misconceptions about ALS. While ALS is split into familial and sporadic forms based on the presence or absence of a family history of the disease, mutations in the known ALS-associated genes are found in both forms. The authors also state that exercise programs are likely to accelerate degeneration in ALS. This is incorrect. Moderate exercise is part of the current guidelines for treating ALS, and mouse studies have demonstrated a therapeutic effect of moderate exercise. Regarding the experimental design, there are some important details missing. The animals do not appear to have been operated on at the same age, and the criteria for when to perform the operation were not described. A similar problem exists for when the animals were determined to reach endpoint. The authors also do not seem to address a potential pitfall of this approach: acceleration of the disease process. Indeed, some of the data comparing the ipsilateral side to the contralateral side suggest that the implantation of the cells and/or the light source increase the denervation of the muscle. Finally, there is a fairly large difference between the motor output provided by optical stimulation relative to electrical stimulation. It is currently unclear what level needs to be reached to provide an effective response in the intact animal. Thus, it is difficult to determine if the level of reinnervation that this study has achieved will be sufficient to improve a patient's quality of life.

    3. Reviewer #2 (Public Review):

      The authors provide convincing evidence that optogenetic stimulation of ChR2-expressing motor neurons implanted in muscles effectively restores innervation of severely affected skeletal muscles in the aggressive SOD1 mouse model of ALS, and conclude that this method can be applied to selectively control the function of implicated muscles. This was supported by convincing data presented in the paper.

      This is an interesting paper providing new/improved optogenetic methods to restore or improve muscle strength in ALS. In general, it is of high significance in both the techniques and concept, and the paper was well written. The evidence supporting the conclusions is convincing, with rigorous muscle tension physiological analysis, and nerve and muscle histology and image analysis. The work will be of broad interest to medical biologists on muscle disorders.

      One weak point is that proper control experiments were not clearly presented - these could be shown in the paper. For example, one control experiment with only YFP but no ChR2 expression with optogenetic stimulation should be performed, following similar procedures and analysis applied to the ChR2-transduced animals.

    1. Author Response

      We appreciate very much your positive assessment and the comments of the two reviewers, all of which will greatly help us to improve our manuscript. In response, therefore, to these constructive comments we will take pleasure in submitting a revised manuscript during the next step of publication.

      We take the opportunity to provide a provisional author response.

      As for Reviewer #1.

      We thank Reviewer 1 very much for her/his very positive and detailed remarks, all of which will be introduced into the revised version of our manuscript.

      We will add the information about the biological control on the development of phosphatic-shelled brachiopod columns in the introduction so that our later narrative can be more understandable. The Cambrian Explosion is the innovation of metazoan body plans and radiation of animals during a relatively short geological time. The expansion of new body plans in different groups of brachiopods in the early Cambrian was likely driven by the Cambrian Explosion. The columnar shell structures are not developed in living lingulate brachiopods, and thus it is important to get a better understanding of this extinct shell architecture from the fossil record in order to study the evolutionary trend of shell structures and compositions in brachiopods. Furthermore, the adaptive innovation of biomineralized columns in early brachiopods will be discussed in the revised manuscript.

      As for Reviewer #2.

      We thank Reviewer 2 very much for her/his very constructive and detailed remarks. All the comments have been thoroughly considered, and most of them will be introduced into the revised version of the manuscript.

      We agree that knowledge of the shell structures of early linguliform brachiopods is incomplete and that more research would be helpful. We also express the idea in the first part of our manuscript that the shell structural complexity and diversity of linguliform brachiopods (especially their fossil representatives) require further study. As the shell structure and biomineralization process are crucial to unravelling the poorly resolved phylogeny and early evolution of Brachiopoda, in this paper we undertake a primary study of exquisitely well-preserved brachiopods from the Cambrian Series 2. The morphologies, shapes and sizes of cylindrical columns are described in detail in this research, and this work will be useful for further comparative studies. We are very sorry to have missed the important reference paper on brachiopod shells by Butler et al. (2015), which will be added to the revised manuscript. The structure and language of the manuscript will be revised based on the very helpful suggestions.

      Concerning the families Eoobolidae and Lingulellotretidae, we are aware of the current problematic situation of these families, and we will add more discussion about the detailed characters of Eoobolidae in the Systematic Palaeontology part of the manuscript. However, the revision of the families Eoobolidae and Lingulellotretidae falls outside the scope of this paper. We prefer to leave it for now, as it will be part of an upcoming publication, based on more global material from China, Australia, Sweden and Estonia, that we are currently working on.

    2. Reviewer #1 (Public Review):

      This is a key paper examining the evolution of an important structure (pillars) in the shell architecture of organo-phosphatic brachiopods. The advantages of these structures are adequately discussed and the evolution of the pillars is described and illustrated. There is much that is of fundamental significance here in understanding the ecology and evolution of these groups as a whole.

      1) In several places the biological control on the development of the pillars is noted. This is explained in terms of their relationship to the growth and evolution of epithelial cells. It would be useful and make the paper more understandable if this link was mentioned early on in the paper and developed during the narrative.

      2) The Cambrian Explosion is mentioned a number of times. Are these changes driven by the Cambrian Explosion, i.e. the expansion of major new body plans, or are the changes merely coincident with the long duration of the 'Explosion'?

      3) I have no doubt the process is one of adaptive innovation but it would be useful to expand on this. Why is it adaptive?

      4) Are pillars present in living Lingula?

    3. Reviewer #2 (Public Review):

      Summary: Two early Cambrian taxa of linguliform brachiopods are assigned to the family Eoobolidae. The taxa exhibit a columnar shell structure and the phylogenetic implications of this shell structure in relation to other early Cambrian families are discussed.

      Strengths: Interesting idea regarding the evolution of shell structure.

      Weaknesses: The early record of shell structures of linguliform brachiopods is incomplete and partly contradictory. The authors maintain silence regarding contradictory information throughout the article to the extent that information is cited wrongly. The structure and language of the article need reworking in my opinion, the systematic part can be in the appendix but the main results and the results relevant for the discussion should be in the main article. A critical revision of the family Eoobolidae and Lingulellotretidae including a revision of the type species of Eoobolus and Lingulellotreta is needed.

    1. Author Response

      We thank the reviewers for thoroughly evaluating our work and for providing constructive and actionable feedback to improve the manuscript. The reviews have left us with a clear direction in which our work can improve, for which we are grateful. We will provide a detailed response to the reviews together with our revised manuscript. At this time, we accept the invitation to provide a provisional reply that addresses the major themes as summarized by the editors.

      The goal of our study was to infer an individual’s control strategy from the details of kinematics. We did this using monkey and human data collected under matching experimental conditions. We quantitatively compared these data to simulations that were generated by adapting a reasonable model of sensorimotor function that is standard in the literature. We are pleased that the reviewers and editors felt “that the overall scientific approach is of interest and has scientific merit” and “the approach has promise in aiding future studies that try to link behavior and neurophysiology (allowing homology between humans and primates).”

      We agree with the reviewers that additional work is needed to corroborate our main claim that we can unambiguously infer control strategies from behavioral data. This is a known hard problem that we are not the first to address, and we do not claim to have solved it here. We appreciate the suggestions about (1) further testing the classification procedure, (2) considering other metrics that may better distinguish between the control strategies, and (3) investigating the control strategy under perturbation scenarios. We plan to undertake additional simulations, analyses and, in the future, experiments, as suggested by the reviewers to enhance the impact of our work.

      In this initial brief response, we wish to focus on one key point noted by the editors, stemming from simulations by one of the reviewers using “a simple fixed controller.” We greatly appreciate that one reviewer went as far as to perform their own simulations. These simulations suggested that subjects do not need to switch between control strategies, but rather could achieve similar behavioral results via “a modest change in gain.” Specifically, the reviewer reports that their simple fixed controller could generate trials that sometimes looked like what we would call position control and sometimes looked like what we would characterize as velocity control. It was noted that “trial-to-trial differences were driven both by motor noise and by the modest variability in gain.”

      While we cannot comment with great certainty on the reviewer’s simulation results, since we do not know the specifics, we first wish to note that our controller and experimental subjects demonstrated this same phenomenon, in that there was overlap in the distribution of the metrics for the two strategies (specifically, in Figs. 5, 7 & 8). Hence, in our findings, even under position control some trials looked more like velocity control, and vice versa. We briefly discussed this in the paper, noting that “a large number of trials fall somewhere between the Position and Velocity Control boundaries”, and that “this could be due to a mixed control strategy” or “subjects switch strategies of their own accord”. This point would have been clearer had we included examples of these hand and cursor traces in Fig. 8. We will update Fig. 8 to more clearly illustrate this point and expand our discussion on different possible interpretations.

      Second, one may interpret the differences we attributed to changes in “control strategy” as changes simply in the gain of our “fixed” controller. Specifically, similar to the controller implemented by the reviewer, our controller is fixed in terms of the plant, the actuator and the sensory feedback loop; the only change we explored was in the relative weights or gains of position vs. velocity in the Q matrix to generate the motor command. While our intent was primarily to focus on the extremes of position control vs. velocity control, we agree that a mixed strategy of minimizing some combined error in position and velocity is likely. This is something we can readily explore with our controller model.
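      The idea of moving between the two extremes by reweighting position and velocity can be sketched as follows. This is only an illustration, assuming a two-element state (position, velocity) and a diagonal Q matrix; the function names and the blending parameter `alpha` are hypothetical and are not taken from the controller model in the paper.

```python
def state_cost(position, velocity, w_pos, w_vel):
    """Quadratic state cost x^T Q x with Q = diag(w_pos, w_vel)."""
    return w_pos * position ** 2 + w_vel * velocity ** 2


def mixed_weights(alpha):
    """Hypothetical blend: alpha = 1 penalises position error only,
    alpha = 0 penalises velocity error only."""
    return alpha, 1.0 - alpha


# The same state is scored differently depending on the weighting.
position, velocity = 2.0, 3.0
for alpha in (1.0, 0.5, 0.0):
    w_pos, w_vel = mixed_weights(alpha)
    print(alpha, state_cost(position, velocity, w_pos, w_vel))
```

      Intermediate values of `alpha` correspond to the mixed strategy mentioned above, in which a combined error in position and velocity is minimised.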

      In summary, we consider it worthwhile to investigate how one can infer the control strategy that a subject is employing to complete the task - either in our CST, or any other task that admits multiple strategies that can lead to success. We regard this as a valuable step towards addressing more realistic behaviors and their neural underpinnings in non-human primate research. The suggestions offered by the reviewers regarding additional analyses, simulations and experiments will provide more definitive answers and clarity for our approach.

      We are truly grateful for the time and effort the reviewers put into our manuscript. We are in the process of undertaking revisions to address all of their feedback and look forward to submitting an improved manuscript with a more detailed reply in the coming weeks.

    2. eLife assessment

      This study has the potential to provide valuable insights that connect the way humans and non-human primates (Rhesus monkeys) perform visuomotor control for a simplified, virtual task that involves stabilizing an unstable system (analogous to pole balancing). Thus, the paper provides the potential, in future studies, to make new discoveries in the neural mechanisms that underlie behavior. However, the evidence (including inferring control strategies on a single-trial basis) was incomplete. Overall, the question and approach are potentially valuable in informing future studies, but more evidence is needed to support the primary claims of the paper.

    3. Reviewer #1 (Public Review):

      The present study examines whether one can identify kinematic signatures of different motor strategies in both humans and non-human primates (NHP). The Critical Stability Task (CST) requires a participant to control a cursor with complex dynamics based on hand motion. The manuscript includes datasets on performance of NHPs collected from a previous study, as well as new data on humans performing the same task. Further human experiments and optimal control models highlight how different strategies lead to different patterns of hand motion. Finally, classifiers were developed to predict which strategy individuals were using on a given trial. There are several strengths to this manuscript. I think the CST task provides a useful behavioural task to explore the neural basis of voluntary control. While reaching is an important basic motor skill, there is much to learn by looking at other motor actions to address many fundamental issues on the neural basis of voluntary control. I also think the comparison between human and NHP performance is important as there is a common concern that NHPs can be overtrained in performing motor tasks, leading to differences in their performance as compared to humans. The present study highlights that there are clear similarities in motor strategies of humans and NHPs. While the results are promising, I would suggest that the actual use of these paradigms and techniques likely needs some improvement/refinement. Notably, the threshold or technique to identify which strategy an individual is using on a given trial needs to be more stringent given the substantial overlap in hand kinematics between different strategies.

      The most important goal of this study is to set up future studies to examine how changes in motor strategies impact neural processing. I have a few concerns that I think need to be considered. First, a classifier was developed to identify whether a trial reflected Position Control with success deemed to be a probability of >70% by the classifier. In contrast, a probability of <30% was considered successfully predicting Velocity Control (Uncertain bandwidth middle 40%). While this may be viewed as acceptable for purposes of quantifying behaviour, I'm not sure this is strict enough for interpreting neural data. Figure 7A displays the OFC Model results for the two strategies and demonstrates substantial overlap for RMS of Cursor Position and Velocity at the lowest range of values. In this region, individual trials for humans and NHP are commonly identified as reflecting Position Control by the classifier although this region clearly also falls within the range expected for Velocity Control, just with a lower density of trials. The problem is that neural data is messy enough, but having trials being incorrectly labelled will make it even messier when trying to quantify differences in neural processing between strategies. A further challenge is that trials cannot be averaged as the patterns of kinematics are so different from trial-to-trial. One option is to just move up the threshold from >70%/<30% to levels where you have a higher confidence that performance only reflects one of the two strategies (perhaps 95/5% level). Another approach would be to identify the 95% confidence boundary for a given strategy and only classify a trial as reflecting a given strategy when it is inside its 95% boundary, but outside the other strategy's 95% boundary (or some other level of separation). A higher threshold would hopefully also deal with the challenge of individuals switching strategies within a trial. Admittedly, this more stringent separation will likely drop the number of trials prohibitively, but there is a clear trade-off between number of trials and clean data. For the future, a tweak to the task could be to lengthen the trial as this would certainly increase separation between the two conditions.
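      The trade-off between stricter thresholds and the number of usable trials can be illustrated with a short sketch. The probabilities below are made-up values for illustration only, and `label_trial` is a hypothetical helper, not the classifier used in the manuscript.

```python
def label_trial(p_position, upper=0.70, lower=0.30):
    """Label one trial from the classifier's probability of Position Control."""
    if p_position > upper:
        return "position"
    if p_position < lower:
        return "velocity"
    return "uncertain"


def n_labelled(labels):
    """Count trials that received a definite strategy label."""
    return sum(label != "uncertain" for label in labels)


# Hypothetical classifier outputs for eight trials (illustrative only).
probs = [0.99, 0.80, 0.72, 0.55, 0.40, 0.25, 0.10, 0.03]

lenient = [label_trial(p) for p in probs]                       # >70% / <30%
strict = [label_trial(p, upper=0.95, lower=0.05) for p in probs]  # 95/5% level

print(n_labelled(lenient), "of", len(probs), "trials labelled at 70/30")
print(n_labelled(strict), "of", len(probs), "trials labelled at 95/5")
```

      With these illustrative values, tightening the thresholds keeps only the trials at the extremes, which is the loss of trials anticipated above.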

      While the paradigm creates interesting behavioural differences, it is not clear to me what one would expect to observe neurally in different brain regions beyond paralleling kinematic differences in performance. Perhaps this could be discussed. One extension of the present task would be to add some trials where visual disturbances are applied near the end of the trial. The prediction is that there would be differences in the kinematics of these motor corrections for different motor strategies. One could then explore differences in neural processing across brain regions to identify regions that simply reflect sensory feedback (no differences in the neural response after the disturbance), versus those involved in different motor strategies (differences in neural responses after the disturbance).

      It seems like a mix of lambda values is presented in Figure 5 and beyond. There needs to be some sort of analysis to verify that all strategies were equally used across lambda levels. Otherwise, apparent differences between control strategies may simply reflect changes in the difficulty of the task. It would also be useful to know whether there were any trends across time, such as strategies used for blocks of trials, or one strategy used early in learning and then changed later.

      Figure 2 highlights key features of performance as a function of task difficulty. Lines 187 to 191 highlight similarities in motor performance between humans and NHPs. However, there is a curious difference in hand/cursor Gain for Monkey J. Any insight as to the basis for this difference?

    1. Author Response

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased – but no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to make them larger (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate the overall survival effect of a change in reproductive effort to be close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival, a key theme in life-history and the biology of ageing, and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than maybe considering the original hypothesis could be false or inflated in importance. We do not consider the reviewer’s inclination to question the premise of the data in favor of a held hypothesis to be necessarily the best scientific approach here. In many places in our manuscript we question and address issues in the underlying data and interpretation (L101-105, L149-150, 182-185 and L229-233). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival and we are aware that other trade-offs could counter-balance or explain our findings, discussed on L189-191 & L246-253. Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, with trade-offs, there are endless possibilities of where a trade-off might be incurred between traits. We purposefully focus on the one well-studied and theorised trade-off. We clearly acknowledge though that when all possible trade-offs are taken into account a trade-off on the fitness level can occur and cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have done just that (L250-253).

      So whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a generalised trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations. We agree that there could be longer-term costs, and so our estimate of the survival cost for manipulated birds is likely to be an underestimate, meaning that our interpretation still holds – the cost to reproduce prevents individuals from laying beyond their optimal level. Note, however, that much theory is built on the immediate costs of reproduction and as such these costs are likely overinterpreted.

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see methods lines 360-362, where we explicitly state that this is lifetime enlargement. Of course, such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Nor can such a conclusion be drawn from the study on jackdaws by Boonekamp et al (2014), as the treatments were life-long and, therefore, cannot separate annual from accrued (multiplicative) costs that are more than the sum of annual costs incurred.

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We will add additional detail to this section in our revised manuscript.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size as the predictor, where clutch size was the absolute number of offspring in the nest (i.e., for a bird who laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). The clutch size was also standardised and, separately, expressed as a proportion of the species mean.
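      The three ways of coding clutch size described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the numbers are made up, "standardised" is assumed here to mean a z-score within the sample, and the species mean is a hypothetical supplied value; none of this is taken from the authors' code.

```python
from statistics import mean, pstdev

# Hypothetical nests: clutch size laid and the experimental change in eggs.
laid = [5, 6, 4, 5]
manipulation = [-1, 1, 0, 2]

# Absolute number of offspring in the nest after manipulation (5 - 1 = 4, etc.).
raised = [c + m for c, m in zip(laid, manipulation)]

# Assumed standardisation: z-score within this sample.
centre, spread = mean(raised), pstdev(raised)
standardised = [(c - centre) / spread for c in raised]

# Clutch size as a proportion of a (hypothetical) species mean clutch size.
species_mean = 5.0
proportion = [c / species_mean for c in raised]

print(raised)
print(standardised)
print(proportion)
```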

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa.

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets there is not a range of chicks added for which a non-linear relationship could be estimated. The question also remains of what the shape of this non-linear relationship should be, which is hard to determine a priori. We will address non-linear effects in our revised manuscript.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately made a tailored selection of studies that matched the manipulation studies (L279-282). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the observational component of our analysis and thereby fails to acknowledge the question being addressed in this study.

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists of a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows one to reach the conclusions provided by the authors (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 258–260, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds who have larger clutch sizes also live longer, and we suggest this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, but this is the first time we have been asked to put equations in a manuscript rather than explain them in terms that are accessible to a wider audience. Note however that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We do not think we need to repeat such equations and we cite the relevant data. For the simulation, we simply simulated the resulting effects and this is not something that we feel is captured more accurately in equations rather than in text and the associated graphs. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand the reviewer feels the simulations were not explained thoroughly. We will revise our text to see if we can add additional explanation where relevant in our revision.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individuals measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms, in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. We will simplify the terminology and define terms in our revised manuscript.

      This problem is emphasised by the following sentence to be found in the discussion: "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to intuitively understand what the "effect" is for the unmanipulated studies (the sensitivity of survival to clutch size at the population level), this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (effect of the manipulation? or survival rate whatever the manipulation (then how could it measure a trade-off?)? at the population level? at the individual level?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      We would like to thank the reviewer for bringing to our attention the lack of clarity on the details of our methodology. We will make this more clear in our revised manuscript.

      For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., number of individuals that survived and number of individuals that died after each breeding season), and clutch size was the absolute value of offspring in the nest (i.e., for a bird who laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.
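      As a rough illustration of how such an effect size could be obtained, the sketch below fits a logistic regression of survival on clutch size raised, using Newton-Raphson on grouped counts. The data are invented to show slopes of opposite sign and `fit_logistic` is a hypothetical helper; the authors' actual analysis is in their Dryad code and may differ.

```python
import math


def fit_logistic(x, survived, total, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, survived, total):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p           # score contribution
            w = ni * p * (1.0 - p)        # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts: birds surviving out of 50 at each clutch size raised.
clutch = [3, 4, 5, 6, 7]
n_birds = [50, 50, 50, 50, 50]
survived_manipulated = [30, 28, 25, 22, 18]   # survival falls with brood size
survived_natural = [18, 22, 25, 28, 30]       # survival rises with clutch size

_, slope_manipulated = fit_logistic(clutch, survived_manipulated, n_birds)
_, slope_natural = fit_logistic(clutch, survived_natural, n_birds)
print(slope_manipulated, slope_natural)
```

      The slope `b1` is the change in log-odds of survival per additional offspring raised, so the same quantity can be compared between manipulated and unmanipulated birds.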

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not granted by the outputs of the figures and tables. Let's use a model similar to that of (van Noordwijk and de Jong, 1986): imagine that the mechanism at the population level is

      a.c_(i,q)+b.s_(i,q)=E_q

      where c_(i,q) and s_(i,q) are respectively the clutch size and the survival rate of individual i, which is of quality q, and E_q is the level of "energy" that an individual of quality q has available during the given time-step. The constants a and b convert clutch size and survival rate into the energy cost of reproduction and the energy cost of survival; both are quite "high", so that an extra egg at the current time-step (c_(i,q) increased by 1) decreases s_(i,q) markedly, because E_q is independent of the number of eggs produced. That is, we have strong individual costs of reproduction. Imagine now that the variance of c_(i,q) (when the population is not manipulated) among individuals of the same quality group is very small (and therefore the variance of s_(i,q) is also very small) and that the expectations of both are proportional to E_q. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. Therefore, the larger the clutch size c_(i,q), the higher E_q and the higher the survival s_(i,q).

      In the manipulated populations, however, because of the large a and b, an artificial increase in clutch size, for a given E_q, will lead to a lower survival s_(i,q). The "effect size" at the population level may then vary according to a, b and the variances mentioned above. In other words, the costs of reproduction may be hidden in the data when there is variance in quality, even though they are actually strong (so strong, in fact, that they are deterministic and the probability of survival is a direct function of the number of eggs produced).
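
      The thought experiment above can be made concrete with a small simulation. This is only a sketch of the reviewer's toy model: the values of a, b, the quality levels and the scaling constant k are arbitrary assumptions, chosen so that survival stays between 0 and 1.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(1)
a, b = 0.1, 1.0               # assumed energy cost per egg and per unit of survival
k = 5.0                       # assumed eggs laid per unit of available energy
qualities = [0.8, 1.0, 1.2]   # assumed energy budgets E_q

clutch, survival, survival_plus_one = [], [], []
for E in qualities:
    for _ in range(200):
        c = k * E + random.gauss(0, 0.1)   # small within-quality variance
        clutch.append(c)
        survival.append((E - a * c) / b)                 # from a*c + b*s = E_q
        survival_plus_one.append((E - a * (c + 1)) / b)  # one egg added experimentally

natural_slope = ols_slope(clutch, survival)
manipulation_effect = sum(
    sp - s for sp, s in zip(survival_plus_one, survival)
) / len(survival)

print("slope across unmanipulated birds:", natural_slope)
print("survival change per added egg:", manipulation_effect)
```

      Under these assumptions the slope across unmanipulated birds is positive, because clutch size tracks quality, while each experimentally added egg lowers survival by a/b within every quality group.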

      We would like to thank the reviewer for these comments. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms, and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: 1) overall quality effects connecting reproduction and parental survival are present; 2) these effects are opposite in direction to the effects when reproduction is manipulated, and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is right, however, that we do suggest, and repeat suggestions by others, that quality can also mask the trade-off in some individuals or circumstances (L63-65, L85-88 & L237-240), but we do not quantify this because it depends on the unknown relationships between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting, and there are some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation. Such information is, however, not available for all studies, and although we explored analyzing this, it is currently not possible to do so with sufficient confidence. We will include this rationale in our revision.

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative (and other individual components of the trade-off) cost of reproduction and towards a populational negative relationship between survival and reproduction, we have to consider the intra-population slow fast continuum (some types of individuals survive more and reproduce less (are slower) than other (which are faster)). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It seems therefore important to me, to control for generation time and in general to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript is not just clutch size and survival rates, but clutch size per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There are unexplained differences between temperate (seasonal) and tropical reproductive strategies. Most of our data, however, come from seasonal breeders. Although there is some variation in second brooding and the like, these species often produce only one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life-history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show that quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within species. We do agree that there is a lot more work that can be done in this area. We hope we contribute to this by questioning this central trade-off. We will try to incorporate some of these suggestions in the revision where possible.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality for instance), to the general problem of the detection of effects of individual differences on survival (see, e.g., Fay et al., 2021). Without an understanding of the very particular statistical behaviour of survival, associated to an event that by definition occurs only once per life history trajectory (by contrast to many other traits, even demographic, where the corresponding event (production of eggs for reproduction, for example) can be measured several times for a given individual during its life history trajectory).

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we aim to do so in the revision.

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations', The American Naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', The American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for by weak selection on clutch size. The results of their analyses support this explanation - they found little evidence of a reproduction - survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying concepts of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results of this necessarily quite simple meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper discussion and we will add detail to our discussion in our revised manuscript.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there isn’t one. First, and importantly, we do find a trade-off but show this is only incurred when individuals lay beyond their optimal level. Secondly, we also state on lines 258-260 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. To appreciate whether this effect alone can explain why reproduction is constrained, we ran the simulations. From these simulations we find that this effect size is too small to explain the constraint so something else must be going on and we do spend a considerable amount of text discussing the possible explanations (L182-194). Note the possibly most parsimonious conclusion here is that costs of reproduction are not there so we also give that explanation some thought (L201-205 and L247-253).

      We are disappointed by the suggestion that we have dismissed complicating factors which could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. Although we do feel we have addressed this (L248-255), we will add further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory.

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L248-250). We would also like to highlight that in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.
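
      To illustrate what a lifetime-long survival cost implies, here is a back-of-envelope sketch. It is not the authors' simulation: the intercept and slopes are invented, fitness is reduced to total eggs laid, and annual survival is assumed constant over life, so that the expected number of breeding seasons is 1 / (1 - s).

```python
import math

def annual_survival(clutch, intercept, slope):
    """Annual parental survival on a logit scale (hypothetical parameters)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * clutch)))

def lifetime_output(clutch, intercept, slope):
    """Expected eggs over a lifetime if the same clutch is laid every season."""
    s = annual_survival(clutch, intercept, slope)
    expected_seasons = 1.0 / (1.0 - s)   # geometric lifespan, first season included
    return clutch * expected_seasons

# Compare no cost, a weak cost and a strong cost per additional egg
for slope in (0.0, -0.05, -0.5):
    values = [round(lifetime_output(c, 0.5, slope), 2) for c in range(3, 9)]
    print(slope, values)
```

      Comparing the three rows shows how the steepness of the survival slope decides whether laying more eggs each season still pays off over a lifetime under these assumptions.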

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks= more cost, doesn't hold for all species. The second more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L249 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept that the increase might be smaller for some species. This is, however, difficult to incorporate into a sensible model, as the impacts will vary among species and some species also exhibit catch-up growth. Without a priori knowledge of this, we therefore kept our model simple, in order to test whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort. We conclude that it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort likely does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L246-250). The point of our study is not whether trade-offs exist or not, it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species with precocial chicks, which are relatively independent and able to feed themselves straight after hatching - so again the relationship of care and survival is unlikely to be detectable by looking at the effect of clutch size but again, it doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds an underestimate. Again, we would like to draw the reviewer’s attention to the fact that we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The cost of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We would like to thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose, “why don’t all birds lay at the population optimum?”, is central to the decades-long field of life-history theory. Why is variation maintained at such a level? As the reviewer outlines, it ranges massively, with some birds laying half of what other birds lay.

    1. eLife assessment

      The authors have improved a method to differentiate human iPSC-derived microglial cells with immune responses and phagocytic abilities; and through transplantation into the adult mouse retina, the authors further demonstrated their integration and occupation of native microglial cell space, and functional response to retinal injuries. The study is important for potential microglial replacement therapy to treat retinal and CNS diseases. Overall, the data are solid, but there is a need to improve writing, figure-making, and data interpretation.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper reported a protocol of using human-induced pluripotent stem cells to generate cells expressing microglia-enriched genes and responding to LPS by drastic upregulation of proinflammatory cytokines. Upon subretinal transplantation in mice, hiPSC-derived cells integrated into the host retina and maintained retinal homeostasis, while they responded to RPE injury by migration, proliferation and phagocytosis. The findings revealed the potential of using hiPSC-derived cell transplantation for microglia replacement as a therapeutic strategy for retinal diseases.

      Strengths:

      The paper demonstrates a method of consistently generating a significant quantity of hiPSC-derived microglia-like cells for in vitro study or for in vivo transplantation. RNAseq analysis offers an opportunity for comprehensive transcriptome profiling of the derived cells. It is impressive that following transplantation, these cells integrated into the retina well, migrated to the corresponding layers, adopted microglia-like morphologies, and survived long term without generating apparent harm. The work has laid a foundation for future utilization of hiPSC-derived microglia in lab and clinical applications.

      Weaknesses:

      1. The primary weakness of the paper concerns its conclusion of having generated "homogenous mature microglia", partly based on the RNAseq analysis. However, the comparison of gene profiles was carried out only between "hiPSC-derived mature microglia" and the proliferating myeloid progenitors. While the transcriptome profiles revealed a trend of enrichment of microglia-like gene expression in "hiPSC-derived mature microglia" compared to proliferating myeloid progenitors, this is not sufficient to claim they are "mature microglia". It is important that one carries out a comparative analysis of the RNAseq data with those of primary human microglia, which may be done by leveraging the public database. To convincingly claim these cells are mature microglia, questions need to be addressed including how similar the molecular signatures of these cells are compared with the fully differentiated primary microglia cell or if they remain progenitor-like or take on mosaic properties, and how they distinguish from macrophages.

      2. While the authors attempted to demonstrate the functional property of "hiPSC-derived mature microglia" in culture, they used LPS challenge, which is an inappropriate assay. This is because human microglia respond poorly to LPS alone but need to be activated by a combination of LPS with other factors, such as IFNγ. Their data that "hiPSC-derived mature microglia" showed robust responses to LPS indeed imply that these cells do not behave like mature human microglia.

      3. The resolution of Figs. 4 - 6 is so low that even some of the text and labels are hardly readable. Based on the morphology shown in Fig. 4 and the statement in line 147, these hiPSC-derived "cells altered their morphology to a rounded shape within an hour of incubation and rapidly internalized the fluorescent-labeled particles". This is a peculiar response. Usually, microglia do not respond to fluorescent-labeled zymosan by turning into a rounded shape within an hour when they internalize it. Such behavior usually indicates weak phagocytic capacity.

      4. Data presented in Fig. 5 are not very convincing to support that transplanted cells were immunopositive for "human CD11b (Fig.5C), as well as microglia signature markers P2ry12 and TMEM119 (Fig.5D)" (line 167). The resolution and magnification of Fig. 5D is too low to tell the colocalization of tdT and human microglial marker immunolabeling. In the flat-mount images (C, I), hCD11b immunolabeling is not visible in the GCL or barely visible in the IPL. This should be discussed.

      5. Microglia respond to injury by becoming active and losing expression of resting-state microglial markers, such as P2ry12, which is used in Fig. 6 for detection of migrated microglia. To confirm that these cells indeed respond to injury like native microglia, one should check for activated microglial markers and induction of pro-inflammatory cytokines in the sodium iodate-injury model.

    3. Reviewer #2 (Public Review):

      Summary:

      Ma et al. employed a myeloid progenitor/microglia differentiation protocol to produce human-induced pluripotent stem cell (hiPSC)-derived microglia in order to examine the potential of microglial cell replacement as a treatment for retinal disorders. They characterized the iPSC-derived microglia by gene expression and in vitro assay analysis. By evaluating xenografted microglia in the partly microglia-depleted retina, the function of the microglia was further assessed.

      Strengths:

      Overall, the study and the data are convincing, and xenografted microglia were also tested in a RPE injury paradigm.

      Weaknesses:

      Gene expression analysis of mature microglia cells should be better interpreted and it would be beneficial to compare the iPSC-derived microglia gene set to a human microglial cell line (for example, HMC3) instead of myeloid progenitor cells.

      The way that the manuscript has been written, unfortunately, is not optimal. I recommend that the entire manuscript be edited and proofread in English. The text contains spelling and grammar mistakes, and the manuscript is inconsistent in several parts. The manuscript should also be revised for a scientific paper format.

    1. eLife assessment

      This important modeling work demonstrates out-of-distribution generalization using a grid cell coding scheme combined with Determinantal Point Process Attention. The simulations provide convincing evidence that the model improves generalization performance across several tasks. The generality of the approach is unclear, however, and there is limited comparison to relevant prior work.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper presents a cognitive model of out-of-distribution generalisation, where the representational basis is grid-cell codes. In particular, the authors consider the tasks of analogies, addition, and multiplication, and the out-of-distribution tests are shifting or scaling the input domain. The authors utilise grid cell codes, which are multi-scale as well as translationally invariant due to their periodicity. To allow for domain adaptation, the authors use DPP-A which is, in this context, a mechanism of adapting to input scale changes. The authors present simulation results demonstrating that this model can perform out-of-distribution generalisation to input translations and re-scaling, whereas other models fail.

      Strengths:

      This paper makes the point it sets out to - that there are some underlying representational bases, like grid cells, that when combined with a domain adaptation mechanism, like DPP-A, can facilitate out-of-distribution generalisation. I don't have any issues with the technical details.

      Weaknesses:

      The paper does leave open the bigger questions of 1) how one learns a suitable representation basis in the first place, 2) how to have a domain adaptation mechanism that works in more general settings other than adapting to scale. Overall, I'm left wondering whether this model is really quite bespoke or whether there is something really general here. My comments below are trying to understand how general this approach is.

      COMMENTS

      This work relies on being able to map inputs into an appropriate representational space. The inputs were integers so it's easy enough to map them to grid locations. But how does this transfer to making analogies in other spaces? Do the inputs need to be mapped (potentially non-linearly) into a space where everything is linear? In general, what are the properties of the embedding space that allows the grid code to be suitable? It would be helpful to know just how much leg work an embedding model would have to do.

      It's natural that grid cells are great for domain shifts of translation, rescaling, and rotation, because they themselves are multi-scaled and are invariant to translations and rotations. But grid codes aren't going to be great for other types of domain shifts. Are the authors saying that to make analogies grid cells are all you need? If not, then what else? And how does this representation get learned? Are there lots of these invariant codes hanging around? And if so, how does the appropriate one get chosen for each situation? Some discussion of these points is necessary, as otherwise the model seems somewhat narrow in scope.
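
      The periodicity argument can be illustrated with a deliberately simplified one-dimensional code. This is a toy stand-in, not the grid code used in the paper: the function names and the periods (3, 5, 7) are assumptions made for the example, and real grid codes are two-dimensional with graded firing fields.

```python
def grid_code(x, periods=(3, 5, 7)):
    """Toy 1-D 'grid code': the phase of integer x within each period."""
    return tuple(x % p for p in periods)

def phase_shift(code_a, code_b, periods=(3, 5, 7)):
    """Per-module phase difference between two codes."""
    return tuple((b - a) % p for a, b, p in zip(code_a, code_b, periods))

# The relation "+4" looks the same near the origin and far away from it
near = phase_shift(grid_code(2), grid_code(6))
far = phase_shift(grid_code(102), grid_code(106))
print(near, far)

# The code repeats after a shift by the product of the periods (3 * 5 * 7 = 105)
print(grid_code(4), grid_code(4 + 105))
```

      Because each module only stores a phase, the relation between two inputs is encoded identically wherever the pair sits, which is the property that makes translation shifts easy for such a code and says nothing about other kinds of domain shift.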

      For effective adaptation to scale, the authors needed to use DPP-A. Given that they relate their model to brains through grid codes, what processes implement DPP-A? Presumably, a computational module that serves the role of DPP-A could be meta-learned, i.e. if they change their task set-up so that the model gets to see domain shifts in its training data, an LSTM or transformer could learn to do this. The presented model comparisons feel like a bit of a straw man.

      I couldn't see it explained exactly how R works.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper presents a model of out-of-distribution (OOD) generalization that focuses on modeling an analogy task, in which translation or scaling is tested with training in one part of the space and testing in other areas of the space progressively more distant from the training location. Similar tests were performed on arithmetic including addition and multiplication, and similarly impressive results appear for addition but not multiplication. The authors show that a grid cell coding scheme helps performance on these analogy and arithmetic tasks, but the most dramatic increase in performance is provided by a complex algorithm for determinantal point process attention (DPP-A) based on maximizing the determinant of the covariance matrix of the grid embeddings.
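
      The idea of selecting by determinant maximization can be sketched with a generic greedy procedure. This is not the paper's DPP-A: the helper names are hypothetical, the Gram matrix of a few invented vectors stands in for the covariance matrix of grid embeddings, and the selection is discrete rather than a learned attention weighting.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    d = 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[piv][i]) < 1e-12:
            return 0.0                    # numerically singular
        if piv != i:
            m[i], m[piv] = m[piv], m[i]
            d = -d                        # a row swap flips the sign
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def gram(vectors):
    """Matrix of inner products between all pairs of vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def greedy_select(vectors, k):
    """Greedily add the item that maximizes the determinant of the Gram submatrix."""
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(vectors)) if i not in chosen]
        best = max(candidates,
                   key=lambda i: det(gram([vectors[j] for j in chosen + [i]])))
        chosen.append(best)
    return chosen

# Item 1 is nearly a copy of item 0, so it adds almost no volume
items = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
picked = greedy_select(items, 3)
print(picked)
```

      The determinant measures the volume spanned by the selected vectors, so redundant items are penalized and the procedure favors a diverse subset.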

      Strengths:

      The results appear quite impressive. The results for generalization appear quite dramatic when compared to other coding schemes (i.e. one-hot) or when compared to the performance when ablating the DPP-A component but retaining the same inference modules using LSTM or transformers. This appears to be an important result in terms of generalization of results in an analogy space.

      Weaknesses:

      There are a number of ways that its impact and connection to grid cells could be enhanced. From the neuroscience perspective, the major comments concern making a clearer and stronger connection to the actual literature on grid cells and grid cell modeling, and discussing the relationship of the complex DPP-A algorithm to biological circuits.

      Major comments:

      1. They should provide more citations to other groups that have explored analogy using this type of task. Currently, they only cite one paper (Webb et al., 2020) by their own group in their footnote 1 which used the same representation of behavioral tasks for generalization of analogy. It would be useful if they could cite other papers using this simplified representation of analogy and also show the best performance of other algorithms from other groups in their figures, so that there is a sense of how their results compare to the best previous algorithm by other groups in the field (or they can identify which of their comparison algorithms corresponds to the best of previously published work).

      2. While the grid code they use is very standard and based on grid cell researchers (Bicanski and Burgess, 2019), the rest of the algorithm doesn't have a clear claim on biological plausibility. It has become somewhat standard in the field to ignore the problem of how the brain could biologically implement the latest complex algorithm, but it would be useful if they at least mention the problem (or difficulty) of implementing DPP-A in a biological network. In particular, does maximizing the determinant of the covariance matrix of the grid code correspond to something that could be tested experimentally?

      3. Related to major comment 2, it would be very exciting if they could show what the grid code looks like after the attentional modulation inner product x^T w has been implemented. This could be highly useful for experimental researchers trying to connect these theoretical simulation results to data. This would be most intuitive to grid cell researchers if it is plotted in the same format as actual biological experimental data - specifically which grid cell codes get strengthened the most (beyond just the highest frequencies).

      4. To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constantinescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burgess, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015 - they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.