    1. Author response:

      [The following is the authors’ response to the current reviews.]

      In response to Reviewer #2, we agree with the reviewer that it needs to be noted that not all forms of recognition are the same and have added the following: "However, we note that not all forms of recognition are the same; researchers may prefer to have their work featured instead of personal stories or critiques of the scientific environment."


      [The following is the authors’ response to the previous reviews.]

      We thank both reviewers for their detailed comments and insightful suggestions. Below we summarize our responses to each concern in addition to the edits within the manuscript.

      We would also like to add a clarification to the eLife assessment, which states: “This important bibliometric analysis shows that authors of scientific papers whose names suggest they are female or East Asian get quoted less often in news stories about their work.” We show that individuals with names predicted to be from women or East Asian name origins are less likely to be quoted or mentioned in Nature’s scientific news stories than expected by publication demographics. In this study, we did not compare the level of coverage of a scientific article by the demographics of the authors of the article.

      Reviewer #1

      The article is not so clearly structured, which makes it hard to follow. A better framing, contextualization, and conceptualization of their analysis would help the readers to better understand the results. There are some unclear definitions and wrong wording of key concepts.

      We have adapted our wording in the text and added a more detailed discussion, which we hope makes the paper easier to comprehend. These changes are described in the context of the reviewer's suggestions and addressed in the next section.

      Language use: Male/Female refers to sex, not to gender.

      We have now updated the language throughout the text. Thank you for pointing this out.

      Regional disparities are not the same as names' origin. While the first might relate to the academic origin of authors, inferred from their institutional belonging, the latter reflects the authors' inferred identity. Ethnic identities and the construction of prejudice against specific populations need proper contextualization.

      We have added better contextualization in the manuscript and reworded the section in our results and discussion to clarify that we are analyzing disparities related to perceived ethnicity and not regions. We also added the following text to the results section “In our analysis, we use name origin as an estimate for the perceived ethnicity of a primary source by a journalist. Our prediction is not intended to assign ethnicity to an individual, but to be used broadly as a tool to quantify representational differences in a journalist's sociologically constructed perception of a primary source's ethnicity.” We also added the following text to our Discussion: “Our use of name origins is a proxy for a journalist's or referring scholarly peer’s potential perceptions of the ethnicity of a primary source as signaled by an individual's name. We do not intend to assign an identity to an individual, but to generate a broad metric to measure possible bias for particular ethnicities during journalists' primary source gathering.”

      It would be helpful to have a clear definition of what are quotes, mentions, and citations. For me, it was not so clear and made understanding the results more difficult.

      We added the following text to the results section Extracted Data Used for Analysis: “Quoted names are any names that were attached to a quote within the article. Mentioned names are any names that were stated within the article. Cited names are all author names of a scientific paper that was cited in the news article.”

      The comparison against Nature published research articles is not perfect because journalists will also cover articles not published in Nature. If for example, the gender representation in the quoted articles is not the same between Nature journals and other journals, then this source of inequality would be missing (e.g. if the journalists are biased against women, but not as much when they published in Nature, because they are also biased towards Nature articles). Also, the gender representation among Nature authors could not be the same as in general. Nevertheless, this seems to be a fair benchmark, especially if the authors did not have access to other more comprehensive databases. But a statement of limitations including these potential issues would be good to have.

      To add better context to the generalizability of our work, we added the following text to our discussion: “Furthermore, the news articles present on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      "we select the highest probability origin for each name as the resultant assignment". Threshold based approaches for race/ethnicity name-based inference have been criticized by the literature as they might reproduce biases (see Kozlowski, D., Murray, D. S., Bell, A., Hulsey, W., Larivière, V., Monroe-White, T., & Sugimoto, C. R. (2022). Avoiding bias when inferring race using name-based approaches. Plos one, 17(3), e0264270.). The authors could use the full distribution of probabilities over names instead of selecting one. The formulae proposed (3-5) could be easily adapted to this change.

      We thank the reviewer for pointing this out. We have updated our analysis to use the probabilities instead of hard assignments. Figure 3 and formulae 3-5 have been updated. While we observe a slight shift in the calculated values, the overall trends are unchanged.
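To illustrate the difference between the two approaches, here is a minimal sketch. The names, origin labels, and probabilities are hypothetical, and the code is not the manuscript's formulae 3-5; it only contrasts counting each name once under its most probable origin with summing the full probability distribution.

```python
from collections import defaultdict

# Hypothetical per-name origin probabilities (illustrative values only).
predictions = {
    "name_a": {"EastAsian": 0.55, "European": 0.45},
    "name_b": {"EastAsian": 0.40, "European": 0.60},
    "name_c": {"EastAsian": 0.45, "European": 0.55},
}


def hard_counts(preds):
    """Count each name once, under its highest-probability origin."""
    counts = defaultdict(float)
    for probs in preds.values():
        counts[max(probs, key=probs.get)] += 1.0
    return dict(counts)


def soft_counts(preds):
    """Spread each name across origins in proportion to its probabilities."""
    counts = defaultdict(float)
    for probs in preds.values():
        for origin, p in probs.items():
            counts[origin] += p
    return dict(counts)


hard = hard_counts(predictions)
soft = soft_counts(predictions)
```

With these made-up values, hard assignment counts one East Asian name and two European names, whereas the probabilistic counts are closer together because near-ties are no longer rounded to a single origin.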

      Is it possible to make an analysis that intersects both name origin and gender? I am not sure if the sample size would allow for this, but if some other dimensions were collapsed, it would be very important to show what happens at the intersection of these two dimensions of discrimination.

      We agree that identifying any differences in quotation patterns at the intersection of gender and name origin would be very useful. To address this, we added Supplemental Table 5. This table reports the number of quotes per predicted name origin and gender over all years and article types. In this table, we don’t see a significant difference in gender distribution across predicted name origins.

      Given a larger sample size, we would be able to better identify more subtle differences, but at this sample size, we cannot make more detailed inferences. Additionally, this also addresses a QC-issue, where predicted gender accuracy varies by name origin, specifically East Asian name origin. From our data, we don’t see a large difference in proportions across any name origin. We added the following text to the results section to incorporate this analysis:

      “However, it should be noted that the error rate varies by name origin with the largest decrease in performance on names with an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. In our analysis, we did not observe a large difference in names predicted to come from a man or woman between predicted East Asian and other name origins (Table 5).”

      The use of vocabulary should be more homogeneous. For example, in page 13 the authors start to use the concepts of over/under enrichment, which appeared before in a title but was not used.

      The text has been updated to replace all mentions of “over/under enrichment” with “over/under representation”.

      In the discussions section, it would be important to see as a statement of limitations the problems that automatic origin and gender inference have.

      We thank the reviewer for this suggestion. We have added the following paragraph to our discussion.

      Computational tools enabled us to automatically analyze thousands of articles to identify existing disparities by gender and name origin, but these tools are not without limitations. Our tools are unable to identify non-binary people and rely on gender predictors that are known to have region-specific biases, with the largest decrease in performance on names of an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. Furthermore, name origin is only a proxy for externally perceived racial or ethnic origins of a source or author and is not as accurate as self-identified race or ethnicity. Self-identification better captures the lived experience of an individual that computational estimates from a name can not capture. This is highlighted in our inability to distinguish between Black and White people from the US by their names. As the collection of demographic data by publication outlets grows, we believe this will enable a more fine-grained and accurate analysis of disparities in scientific journalism.

      Figures 2a and 3a show that the affiliations of authors and their countries was going to be used in this analysis. Yet, this section is not present in the article. I would encourage the authors to add this to the analysis as it would show important patterns, and to intersect the dimensions of gender, name origin and country.

      We were interested in using this analysis in our work, but unfortunately the sample size of cited works in each country was too small to make inferences. If this work was extended to larger scientific outlets to include larger corpora such as The Guardian or New York Times, we think one could be able to make more robust inferences. Since our work only focuses on Nature, we decided not to include this analysis. However, we do include a section in our discussion for future work.

      “As a proxy for measuring possible geographical bias of a journalist, we attempted to identify if there was any geographical bias of cited authors. To do this, we identified the affiliation of each cited author and identified their affiliated country. Unfortunately, we could not robustly extract a large enough number of cited authors from different countries to make any conclusive statements. Expanding our work to other science journalism outlets could help identify possible ways in which geographic region, genders, and perceived ethnicity interact and affect scientific visibility of specific groups. While we are unable to identify that journalists have a specific geographical bias, having reporters explicitly focused on specific regional sources will broaden coverage of international opinions in science.”

      It is not clear at that point what column dependence means.

      The abstract has been updated to state, “Gender disparity in Nature quotes was dependent on the article type.”

      Reviewer #2

      We thank the reviewer for their very detailed and insightful suggestions regarding our analysis and the key caveats that needed better contextualization in our analysis. We went through each major point the reviewer brought up below and included any additional text that was needed.

      In some cases, the manuscript lacks consistency in terminology, and uses word choice that is strange (e.g., "enrichment" and "depletion" when discussion representation).

      We thank the reviewer for pointing this out; we have replaced all instances of depletion/enrichment with over/under-representation.

      Caveats to Claim 1. So while Claim 1 holds, it does not hold for all comparator sets and for all years. I don't think this is critical of the paper (the authors do discuss the trend in Claim 2), but interpretation of this claim should take care of these caveats, and readers should consider the important differences in first and last authorship.

      We thank the reviewer for their detailed feedback on this section. We have added the missing contextualization of our results. In the results section, we changed the figure caption to: “Speakers predicted to be men are sometimes overrepresented in quotes, but this depends on the year and article type.” We also added the following paragraph: “When considering the relative proportion of authors and speakers predicted to be men, we only find a slight over-representation of men. This overrepresentation is dependent on the authorship position and the year. Before 2010, quotes predicted as from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted from men are only overrepresented in comparison for first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authorships may reveal more how gender bias currently exists whereas comparison with early career scientists may reveal bias in comparison to a future, more possibly diverse academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      Generalizability to other contexts of science journalism:

      We thank the reviewer for their feedback on the generalizability of our work. We have now added the following text to our discussion to provide the reader with a better context of our results: “The articles presented on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found very similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      Shallow discussion:

      The authors highlight gender parity in career features, but why exactly is there gender parity in this format?

      We thank the reviewer for encouraging us to better contextualize our findings in the broader discourse. We have now added several sections to our Discussion. To address gender parity, we have added the following text: “This finding, coupled with the near equal number of articles written by journalists predicted to be men or women, argues for more diversity in topical coverage. "Career Feature" articles highlight current topics relevant to working scientists and frequently highlight systemic issues with the scientific environment. This column allows space for marginalized people to critique the current state of affairs in science or share their personal stories. This type of content encourages the journalist to seek out a diverse set of primary sources. Including more content that is not primarily focused on recent publications, but all topics surrounding the practice of science, can serve as an additional tool to rapidly achieve gender parity in journalistic recognition.”

      Representation in quotations varies by first and last author, most certainly as a result of the academic division of labor in the life sciences. However, what does it say about the scientific quotation that it appears first authors are more often to be quoted? Does this mean that the division of labor is changing such that the first authors are the lead scientists? Or does it imply that senior authors are being skipped over, or giving away their chance to comment on a study to the first author?

      We thank the reviewer for bringing up these important questions. We have added better context to our first author analysis in our discussion. We have included the following two sections to address this. Also, we want to state that we find last authors to be slightly more quoted than first authors, as depicted in Fig. 2d, with first author quotation percentage largely appearing below the red line. We included this text in a response above and include it again here for convenience.

      “Before 2010, quotes predicted as from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted from men are only overrepresented in comparison for first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authorships may reveal more how gender bias currently exists whereas comparison with early career scientists may reveal bias in comparison to a future, more possibly diverse academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, we see that the amount of first author people with predicted East Asian name origins is increasing at a much faster rate than quotes are increasing. If this mismatched rate of representation continues, this could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity of public visibility of scientists with East Asian name origins.”

      What might be the downstream impacts on the public stemming from the under-representation of scientists with East Asian names? According to Figure 3d, not only are East Asian names under-represented in quotations, but they are becoming more under-represented over time as they appear as authors in a greater number of Nature publications; Those with European names are proportionately represented in quotations given their share of authors in Nature. Why might this be, especially seeing as Anglo names are heavily over-represented?

      To address this point, we have added the following text to our discussion: “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, the amount of first author people with predicted East Asian name origins is increasing at a much faster rate than quotes are increasing. If this mismatched rate of representation continues, this could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity of public visibility of scientists with East Asian name origins.”

      I am very confused by Figure 1B. It mixes the counts of News-related items with (non-Springer) research articles in a single stacked bar plot which makes determining the quantity of either difficult. I would advise splitting them out

      Figure 1B has been updated, and the News and Research articles have been separated.

      When querying the first 2000 or so results from the SpringerNature API, are the authors certain that they are getting a random sample of papers?

      These papers were the first 200 English language "Journal" papers returned by the Springer Nature API for each month, resulting in 2400 papers per year from 2005 through 2020. These papers are the first 200 papers published each month by a Springer Nature journal, which may not be completely random, but we believe to be a reasonably representative sample. Furthermore, the Springer Nature comparator set is being used as an additional comparator to the complete set of all Nature research papers used in our analyses.
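The sampling rule described above (keep the first 200 results per month) can be sketched as follows. This is only an illustration on synthetic records with hypothetical `year`, `month`, and `id` fields; it does not call the Springer Nature API and is not the pipeline's actual code.

```python
from collections import defaultdict


def first_n_per_month(records, n=200):
    """Keep the first n records of each (year, month), preserving input order."""
    kept = []
    seen = defaultdict(int)
    for rec in records:
        key = (rec["year"], rec["month"])
        if seen[key] < n:
            seen[key] += 1
            kept.append(rec)
    return kept


# Synthetic stand-in for one year of results: 250 records per month.
records = [
    {"year": 2005, "month": month, "id": i}
    for month in range(1, 13)
    for i in range(250)
]
sample = first_n_per_month(records)
```

For one year with at least 200 results in every month, this keeps 12 x 200 = 2400 records, matching the per-year figure stated above. Because the rule keeps results in the order they are returned, the sample is only as random as that ordering.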

      In all figures: the authors use capital letters to indicate panels in the caption, but lowercase letters in the figure itself and in the main text. This should be made consistent.

      This has been updated.

      In all figures: the authors should make the caption letter bold in the figure captions, which makes it much easier to find descriptions of specific panels

      This has been updated.

      In the section "coreNLP": the authors mention "co-reference resolution" but without really remarking why it is being used. This is an issue throughout the methods: the authors describe what method they are using but either they don't mention why they are using that method until later, or else not at all.

      We have added better reasoning behind our coreNLP selected methods: “We used the standard set of annotators: tokenize, ssplit, pos, lemma, ner, parse, coref, and additionally the quote annotator. These perform text tokenization, sentence splitting, part of speech recognition, lemmatization, named entity recognition, division of sentences into constituent phrases, co-reference resolution, and identification of quoted entities, respectively. We used the "statistical" algorithm to perform coreference resolution for speed. Each of these aspects is required to identify the names of quoted or mentioned speakers and identify any of their associated pronouns. All results were output to json format for further downstream processing.”
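For readers unfamiliar with CoreNLP, the annotator list above could be expressed as a properties mapping along the following lines. This is a sketch of a plausible configuration, not the configuration actually used; the property keys `coref.algorithm` and `outputFormat` are assumed here from CoreNLP's usual property naming.

```python
# Annotators named in the text, in pipeline order, with the task each performs.
ANNOTATORS = {
    "tokenize": "text tokenization",
    "ssplit": "sentence splitting",
    "pos": "part-of-speech tagging",
    "lemma": "lemmatization",
    "ner": "named entity recognition",
    "parse": "division of sentences into constituent phrases",
    "coref": "co-reference resolution",
    "quote": "identification of quoted entities",
}

# Assumed property names; the values follow the description in the text.
properties = {
    "annotators": ",".join(ANNOTATORS),
    "coref.algorithm": "statistical",
    "outputFormat": "json",
}
```

The order matters because later annotators depend on earlier ones; for example, co-reference resolution relies on the named entities and parses produced upstream.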

      We included a better description of scrapy: “Scrapy is a tool that applies user-defined rules to follow hyperlinks on webpages and return the information contained on each webpage. We used Scrapy to extract all web pages containing news articles and extract the text.”

      We also included our motivation for bootstrapping: “We used the bootstrap method to construct confidence intervals for each of our calculated statistics.”
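As a minimal sketch of the idea, assuming a simple percentile bootstrap (the manuscript's exact resampling scheme and statistics are not described here), a confidence interval for a proportion could be computed as follows. The data values and parameter defaults are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and recompute the statistic.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical indicator data: 1 = quote predicted to be from a man, 0 = otherwise.
quotes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
low, high = bootstrap_ci(quotes)
```

The interval reflects how much the estimated proportion varies when the observed quotes are resampled, which is why small samples produce wide intervals.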

      In the section "Name Formatting for Gender Prediction in Quotes or Mentions", genderizeR is mentioned before an introduction to the tool

      We added the following text to provide context: “Even though genderizeR, the computational method used to predict the name's gender, only uses the first name to make the gender prediction, identifying the full name gives us greater confidence that we correctly identified the first name.”

      In the section "Name Formatting for Gender Prediction of Authors", you state that you exclude papers with only one author. How many papers is this? I assume few, in Nature, but if not I can imagine gender differences based on who writes first-authored papers.

      We find that the number excluded is roughly 7% of all papers, which is consistent across Nature and Springer Nature (1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, 955/12459 for Nature research articles). We have added the following text to the manuscript for better context: “Roughly 7% of all papers were estimated to be by a single author and removed from this analysis: 1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, 955/12459 for Nature research articles.”
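The "roughly 7%" figure can be checked directly from the counts given above; the short calculation below uses only those numbers.

```python
# (single-author papers excluded, total papers) as reported in the response.
excluded = {
    "cited Springer articles": (1113, 15013),
    "random Springer articles": (2899, 42155),
    "Nature research articles": (955, 12459),
}

# Percentage excluded in each set, rounded to one decimal place.
percentages = {
    name: round(100 * removed / total, 1)
    for name, (removed, total) in excluded.items()
}
```

The three sets come out at about 7.4%, 6.9%, and 7.7% respectively.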

      In "Name Origin Analysis", for the in-text reference to Equation 3: include the prefix "Eq." or similar to mark this as referencing the equation and not something else

      This has been updated.

      The use of the word "enrichment" in reference to the representation of East Asian authors is strange and does not fit the colloquial definition of the term. I suggest just using a simpler term like "representation" instead

      Similarly, the authors use the word "depletion" to reflect the lower rate of quotes to scientists with East-Asian names, but I feel a simpler word would be more appropriate.

      We thank the reviewer for this suggestion, all instances of “enrichment/depletion” have been replaced with “over/under representation”

      The authors claim in Figure 2d that there is a steady increase in the rate of first author citations, however, this graph is not convincing. It appears to show much more noise than anything resembling a steady change.

      We have reworded our figure description to state that there is a consistent bias towards quoting last authors. Our figure description now states: “Panel d shows a consistent but slight bias towards quoting the last author of a cited article rather than the first author over time.”

      Supplemental Figures 1b and 1c do not seem to be mentioned in the main text, and I struggle to see their relevance.

      We thank the reviewer for identifying this error; these subpanels have been removed.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Point-by-point response to concerns raised by reviewer #3:

      The manuscript has improved very substantially in revision. The authors have clearly taken the comments on board in good faith. Yet, some small concerns remain around the behavioural analysis.

      In Fig. 8H and H' average sleep/day is ~100. Is this minutes of sleep? 100 min/day is far too low, is it a typo?

      The numbers for sleep bouts are also too low to me e.g. in Fig 9 number of sleep bouts avg around 4.

      In their response to reviewers the authors say these errors were fixed, yet the figures appear not to have been changed. Perhaps the old figures were left in inadvertently?

      Indeed this correction was somehow missed and we thank the reviewer for noticing this. We have now corrected Fig 8H-H’ and Fig 9D.  

      The circadian anticipatory activity analyses could also be improved. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827). This is typically computed as the ratio of activity in the 3hrs preceding light transition to activity in the 6hrs preceding light transition.

      In their response to reviewers, the authors have revised their anticipation analyses by quantifying the mean activity in the 6 hrs preceding light transition. However, in the method of Harrisingh et al., anticipation is the ratio of activity in the 3hrs preceding light transition to activity in the 6hrs preceding light transition. Simply computing the activity in the 6hrs preceding light transition does not give a measure of anticipation, determining the ratio is key.

      We acknowledge the importance of obtaining accurate results in our analysis, therefore we have re-evaluated the anticipation activity by measuring the ratio of the mean activity in the 3h preceding light transition over the activity in the 6h preceding light transition. We have reported the data as percentages in Fig 8F-G and modified the figure legends accordingly.
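As an illustration of the ratio described above, the sketch below computes an anticipation index from binned activity counts. The bin size, function name, and example data are assumptions made for this sketch, and it uses summed counts in each window rather than the exact averaging applied to Fig 8F-G.

```python
def anticipation_index(activity, transition, bins_per_hour=2):
    """Percent of activity in the 6 h before a light transition that
    falls within the final 3 h before that transition.

    activity: list of activity counts per time bin.
    transition: index of the first bin after the light transition.
    """
    window6 = 6 * bins_per_hour
    window3 = 3 * bins_per_hour
    if transition < window6 or transition > len(activity):
        raise ValueError("need a full 6 h of data before the transition")
    last6 = activity[transition - window6:transition]
    last3 = activity[transition - window3:transition]
    total = sum(last6)
    if total == 0:
        return float("nan")  # no activity at all in the 6 h window
    return 100.0 * sum(last3) / total


# Hypothetical 30-min bins: flat activity vs. activity only in the last 3 h.
flat = [1] * 24
ramp = [0] * 18 + [5] * 6
flat_index = anticipation_index(flat, transition=24)
ramp_index = anticipation_index(ramp, transition=24)
```

Under this definition, evenly spread activity gives 50%, and values above 50% indicate that activity is concentrated in the hours just before the transition, which is why the ratio rather than the raw 6 h activity captures anticipation.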

    1. Author response:

      eLife assessment 

      This important study provides evidence for a combination of the latest generation of Oxford Nanopore Technology long reads with state-of-the-art variant callers enabling bacterial variant discovery at accuracy that matches or exceeds the current "gold standard" with short reads. The evidence supporting the claims of the authors is convincing, although the inclusion of a larger number of reference genomes would further strengthen the study. The work will be of interest to anyone performing sequencing for outbreak investigations, bacterial epidemiology, or similar studies. 

      We thank the editor and reviewers for the accurate summary and positive assessment. We address the comment about increasing the number of reference genomes in the response to reviewer 2.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors assess the accuracy of short variant calling (SNPs and indels) in bacterial genomes using Oxford Nanopore reads generated on R10.4 flow cells from a very similar genome (99.5% ANI), examining the impact of variant caller choice (three traditional variant callers: bcftools, freebayes, and longshot, and three deep learning based variant callers: clair3, DeepVariant, and NanoCaller), base calling model (fast, hac and sup) and read depth (using both simplex and duplex reads). 

      Strengths: 

      Given the stated goal (analysis of variant calling for reads drawn from genomes very similar to the reference), the analysis is largely complete and results are compelling. The authors make the code and data used in their analysis available for re-use using current best practices (a computational workflow and data archived in INSDC databases or Zenodo as appropriate). 

      Weaknesses: 

      While the medaka variant caller is now deprecated for diploid calling, it is still widely used for haploid variant calling and should at least be mentioned (even if the mention is only to explain its exclusion from the analysis). 

      We agree that this would be an informative addition to the study and will add it to the benchmarking.

      Appraisal: 

      The experiments the authors engaged in are well structured and the results are convincing. I expect that these results will be incorporated into "best practice" bacterial variant calling workflows in the future. 

      Thank you for the positive appraisal.

      Reviewer #2 (Public Review): 

      Summary: 

      Hall et al. describe the superiority of ONT sequencing and deep learning-based variant callers to deliver higher SNP and Indel accuracy compared to previous gold-standard Illumina short-read sequencing. Furthermore, they provide recommendations for read sequencing depth and computational requirements when performing variant calling. 

      Strengths: 

      The study describes compelling data showing ONT superiority when using deep learning-based variant callers, such as Clair3, compared to Illumina sequencing. This challenges the paradigm that Illumina sequencing is the gold standard for variant calling in bacterial genomes. The authors provide evidence that homopolymeric regions, a systematic and problematic issue with ONT data, are no longer a concern in ONT sequencing. 

      Weaknesses: 

      (1) The inclusion of a larger number of reference genomes would have strengthened the study to accommodate larger variability (a limitation mentioned by the authors). 

      Our strategic selection of 14 genomes, spanning a variety of bacterial genera and species, diverse GC content, and both gram-negative and gram-positive species (including M. tuberculosis, which is neither), was designed to robustly address potential variability in our results. Moreover, all our genome assemblies underwent rigorous manual inspection, as the quality of the true genome sequences is the foundation this research is built upon. Given this, the fundamental conclusions regarding the accuracy of variant calls would likely remain unchanged with the addition of more genomes. However, we do acknowledge that a substantially larger sample size, which is beyond the scope of this study, would enable more fine-grained analysis of species differences in error rates.

      (2) In Figure 2, there are clearly one or two samples that perform worse than others in all combinations (are always below the box plots). No information about species-specific variant calls is provided by the authors but one would like to know if those are recurrently associated with one or two species. Species-specific recommendations could also help the scientific community to choose the best sequencing/variant calling approaches.

      Thank you for highlighting this observation. The precision, recall, and F1 scores for each sample and condition can be found in Supplementary Table S4. We will investigate the samples that consistently perform below expectation to determine if this is associated with specific species, which may necessitate tailored recommendations for those species. Additionally, we will produce a species-segregated version of Figure 2 for a clearer interpretation and will place it in the supplementary materials.

      (3) The authors support that a read depth of 10x is sufficient to achieve variant calls that match or exceed Illumina sequencing. However, the standard here should be the optimal discriminatory power for clinical and public health utility (namely outbreak analysis). In such scenarios, the highest discriminatory power is always desirable and as such an F1 score, Recall and Precision that is as close to 100% as possible should be maintained (which changes the minimum read sequencing depth to at least 25x, which is the inflection point).

      We agree that the highest discriminatory power is always desirable for clinical or public health applications. In that case, 25x is probably a better minimum recommendation. However, we are also aware that there are resource-limited settings where parity with Illumina is sufficient. In these cases, 10x depth from ONT would provide sufficient data.

      The manuscript currently emphasises the latter scenario, but we will revise the text to clearly recommend 25x depth as a conservative aim in settings where resources are not a constraint, ensuring the highest possible discriminatory power for applications like outbreak analysis.
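For reference, the three metrics discussed in this point follow the standard definitions based on true-positive, false-positive, and false-negative variant counts. The following is a minimal sketch of those definitions only; the function name and the example counts are illustrative assumptions and are not taken from the study's workflow.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from variant-call counts.

    tp: called variants that are in the truth set
    fp: called variants that are not in the truth set
    fn: truth-set variants that were not called
    """
    # Guard against empty denominators (e.g. no calls made at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled towards the lower of the two values, which is why a depth recommendation based on F1 is sensitive to whichever of precision or recall plateaus last.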

      (4) The sequencing of the samples was not performed with the same Illumina and ONT method/equipment, which could have introduced specific equipment/preparation artefacts that were not considered in the study. See for example https://academic.oup.com/nargab/article/3/1/lqab019/6193612

      To our knowledge, there is no evidence that sequencing on different ONT machines or barcoding kits leads to a difference in read characteristics or accuracy. To ensure consistency and minimise potential variability, we used the same ONT flowcells for all samples and performed basecalling on the same Nvidia A100 GPU. We will update the methods to emphasise this.

      For Illumina and ONT, the exact machines used for which samples will be added as a supplementary table. We will also add a comment about possible Illumina error rate differences in the ‘Limitations’ section of the Discussion.

      In summary, while there may be specific equipment or preparation artifacts to consider, we took steps to minimise these effects and maintain consistency across our sequencing methods.

      Reviewer #3 (Public Review): 

      Hall et al. benchmarked different variant calling methods on Nanopore reads of bacterial samples and compared the performance of Nanopore to short reads produced with Illumina sequencing. To establish a common ground for comparison, the authors first generated a variant truth set for each sample and then projected this set to the reference sequence of the sample to obtain a mutated reference. Subsequently, Hall et al. called SNPs and small indels using commonly used deep learning and conventional variant callers and compared the precision and accuracy from reads produced with simplex and duplex Nanopore sequencing to Illumina data. The authors did not investigate large structural variation, which is a major limitation of the current manuscript. It will be very interesting to see a follow-up study covering this much more challenging type of variation. 

      We fully agree that investigating structural variations (SVs) would be a very interesting and important follow-up. Identifying and generating ground truth SVs is a nontrivial task and we feel it deserves its own space and study. We hope to explore this in the future.

      In their comprehensive comparison of SNPs and small indels, the authors observed superior performance of deep learning over conventional variant callers when Nanopore reads were basecalled with the most accurate (but also computationally very expensive) model, even exceeding Illumina in some cases. Not surprisingly, Nanopore underperformed compared to Illumina when basecalled with the fastest (but computationally much less demanding) method with the lowest accuracy. The authors then investigated the surprisingly higher performance of Nanopore data in some cases and identified lower recall with Illumina short read data, particularly from repetitive regions and regions with high variant density, as the driver. Combining the most accurate Nanopore basecalling method with a deep learning variant caller resulted in low error rates in homopolymer regions, similar to Illumina data. This is remarkable, as homopolymer regions are (or, were) traditionally challenging for Nanopore sequencing. 

      Lastly, Hall et al. provided useful information on the required Nanopore read depth, which is surprisingly low, and the computational resources for variant calling with deep learning callers. With that, the authors established a new state-of-the-art for Nanopore-only variant calling on bacterial sequencing data. Most likely these findings will be transferred to other organisms as well or at least provide a proof-of-concept that can be built upon.

      As the authors mention multiple times throughout the manuscript, Nanopore can provide sequencing data in nearly real-time and in remote regions, therefore opening up a ton of new possibilities, for example for infectious disease surveillance. 

      However, the high-performing variant calling method as established in this study requires the computationally very expensive sup and/or duplex Nanopore basecalling, whereas the least computationally demanding method underperforms. Here, the manuscript would greatly benefit from extending the last section on computational requirements, as the authors determine the resources for the variant calling but do not cover the entire picture. This could even be misleading for less experienced researchers who want to perform bacterial sequencing at high performance but with low resources. The authors mention it in the discussion but do not make clear enough that the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required. 

      We have provided runtime benchmarks for basecalling in Supplementary Figure S16 and detailed these times in Supplementary Table S7. In addition, we state in the Results section (P10 L228-230) “Though we do note that if the person performing the variant calling has received the raw (pod5) ONT data, basecalling also needs to be accounted for, as depending on how much sequencing was done, this step can also be resource-intensive.”

      Even with super-accuracy basecalling considered, our analysis shows that variant calling remains the most resource-intensive step for Clair3, DeepVariant, FreeBayes, and NanoCaller. Therefore, the statement “the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required”, is incorrect. However, we will endeavour to make the basecalling component and considerations more prominent in the Results and Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) The modeling process is outlined, but an explanation of why Maxent (Phillips & Dudík, 2008) was chosen for SDMs and why the specified predictor variables were used could provide additional context. This clarity would help readers understand the rationale behind the methodology.

      In L.558-571 (Predictor variables subsection), we added an explanation of the predictor variables as follows:

      “Predictors encompass a range of environmental variables recognized to impact species distribution (Table 3): land use (Newbold et al., 2015), climate (bioclim variables (Booth et al., 2014)), vegetation (Abe, 2018), lithology (Ott, 2020) and elevational range (Udy et al., 2021). Additionally, categorical variables representing known biogeographic regions, reflecting geological history, were included. We applied Blakiston's Line (the Tsugaru Strait, dividing the northern and main islands of Japan, i.e., Hokkaido and Honshu), which reflects a significant historical migration barrier for mammals and birds (Dobson, 1994; Saitoh et al., 2015). Due to their distinct fauna (Wepfer et al., 2016; Yamasaki, 2017), we also specified oceanic islands (i.e., the Ogasawara and Daito isles), which have never been connected with the Asian continent. Continuous environmental variables were transformed into linear, quadratic and hinge feature classes to capture nonlinear associations between environments and species occurrence (Phillips et al., 2017). The regularisation multiplier was set at 2.5, falling within the established optimal range of 1.5 to 4 (Elith et al., 2010; Moreno-Amat et al., 2015).”

      In L.614-618 (Modelling subsection), we explain why we chose MaxEnt:

      “To model species distributions from presence-only data, several algorithms have been utilised, including generalised additive models, random forest, and neural networks (Norberg et al., 2019; Valavi et al., 2022). In our study, we opted for MaxEnt (Phillips and Dudík, 2008) due to its high estimation accuracy and relatively low computational burden (Valavi et al., 2022).”

      (2) While the study outlines a manual reidentification process by experts for wild individuals, it might be beneficial to elaborate on the criteria or expertise level of these experts. This transparency ensures the reliability of the reidentification process.

      In L.519-523, we added a description of the experts as follows:

      “These experts have professional backgrounds specialising in the focal taxa: a technician at a prefectural research institute (fish), highly experienced field survey conductors (plants and insects, respectively), post-doctoral researchers (amphibians and reptiles, and mammals, respectively), and a museum curator (mollusks).”

      (3) The analysis of the effects of data type (Biome+Traditional data or Traditional survey data) on BI is comprehensive. However, a brief discussion on the potential implications of these effects on the study's overall conclusions could add depth to the interpretation.

      We reinforced our discussion of the causes and consequences of improved modelling accuracy.

      In L.276-282, we discussed the causes:

      “Therefore, incorporating Biome data could significantly enhance modelling accuracy in urban and suburban landscapes, which are typically underrepresented in traditional survey data. As pseudo-absences are selected based on search effort, our models utilise numerous pseudo-absences from these areas. Consequently, this might lead to better estimation of species absence in such areas, not just presence, resulting in an overall increase in model accuracy across a wider range of species.”

      In L.370-387, we discussed how improved modelling accuracy may help build a nature-positive society, as follows:

      “By blending data from traditional surveys and communities, we improved the accuracy of species distribution estimates. This enhanced estimation lays the groundwork for more precise subsequent analyses. For instance, estimated distributions will be useful in selecting new protected areas or areas with OECMs (Other Effective area-based Conservation Measures: allowing a wider range of land use as long as biodiversity and ecosystem services are sustained/improved). Using estimated distributions of each species, hotspots of species or evolutionarily diverse taxa can be inferred. Such sites will be good candidates for protected areas (Jones et al., 2016) or OECMs (Shiono et al., 2021). Further, estimated distributions can be used as input for spatial conservation prioritisation tools (e.g. Marxan (Ball et al., 2009)).

      In our experience, stakeholders, including corporate social responsibility managers and conservation practitioners, often seek the list of species potentially inhabiting their locations. Due to the uncertainty of SDMs and their thresholding into presence/absence, on-site surveys remain essential for assessing biodiversity status. SDMs can make such surveys cost-effective by screening important locations for on-site assessment (e.g., Locate phase in TNFD framework) and narrowing down the target species for surveying. Improved estimation through SDMs can mitigate risks associated with their use in society and enable more informed decision-making for conservation efforts.”

      Following the editorial policy, we have reorganised our supplementary materials as follows:

      - Formerly Supplementary File 1 - Remains unchanged.

      - Formerly Supplementary File 2 - Transferred into the main text, in the subsection "Filtering suspicious occurrence record in Biome data" in the Methods section, and Table 2. Citations remain as Supplementary File 2.

      - Formerly Supplementary File 3 - Remains unchanged.

      - Formerly Supplementary File 4 - Transferred into "Figure 3—figure supplement 1".

      - Formerly Supplementary File 5 - Transferred into Figure 4.

      - Formerly Supplementary File 6 - Transferred into the main text, in the subsection "Predictor variables" in the Methods section and Table 3.

      - Formerly Supplementary File 7 - Transferred into the main text, in the subsection "Pseudo-absence reflecting search effort" in the Methods section and Figure 5.

      - Formerly Supplementary File 8 - Transferred into the main text, in the subsection "Model evaluation" in the Methods section and Figure 6.

      - Formerly Supplementary File 9 - Renamed as Supplementary File 4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using a cross-modal sensory selection task in head-fixed mice, the authors attempted to characterize how different rules reconfigured representations of sensory stimuli and behavioral reports in sensory (S1, S2) and premotor cortical areas (medial motor cortex or MM, and ALM). They used silicon probe recordings during behavior, a combination of single-cell and population-level analyses of neural data, and optogenetic inhibition during the task.

      Strengths:

      A major strength of the manuscript was the clarity of the writing and motivation for experiments and analyses. The behavioral paradigm is somewhat simple but well-designed and well-controlled. The neural analyses were sophisticated, clearly presented, and generally supported the authors' interpretations. The statistics are clearly reported and easy to interpret. In general, my view is that the authors achieved their aims. They found that different rules affected preparatory activity in premotor areas, but not sensory areas, consistent with dynamical systems perspectives in the field that hold that initial conditions are important for determining trial-based dynamics.

      Weaknesses:

      The manuscript was generally strong. The main weakness in my view was in interpreting the optogenetic results. While the simplicity of the task was helpful for analyzing the neural data, I think it limited the informativeness of the perturbation experiments. The behavioral read-out was low dimensional (a change in hit rate or false alarm rate), but it was unclear what perceptual or cognitive process was disrupted that led to changes in these read-outs. This is a challenge for the field, and not just this paper, but was the main weakness in my view. I have some minor technical comments in the recommendations for authors that might address other minor weaknesses.

      I think this is a well-performed, well-written, and interesting study that shows differences in rule representations in sensory and premotor areas and finds that rules reconfigure preparatory activity in the motor cortex to support flexible behavior.

      Reviewer #2 (Public Review):

      Summary:

      Chang et al. investigate neuronal activity firing patterns across various cortical regions in an interesting context-dependent tactile vs visual detection task, developed previously by the authors (Chevee et al., 2021; doi: 10.1016/j.neuron.2021.11.013). The authors report the important involvement of a medial frontal cortical region (MM, probably a similar location to wM2 as described in Esmaeili et al., 2021 & 2022; doi: 10.1016/j.neuron.2021.05.005; doi: 10.1371/journal.pbio.3001667) in mice for determining task rules.

      Strengths:

      The experiments appear to have been well carried out and the data well analysed. The manuscript clearly describes the motivation for the analyses and reaches clear and well-justified conclusions. I find the manuscript interesting and exciting!

      Weaknesses:

      I did not find any major weaknesses.

      Reviewer #3 (Public Review):

      This study examines context-dependent stimulus selection by recording neural activity from several sensory and motor cortical areas along a sensorimotor pathway, including S1, S2, MM, and ALM. Mice are trained to either withhold licking or perform directional licking in response to visual or tactile stimulus. Depending on the task rule, the mice have to respond to one stimulus modality while ignoring the other. Neural activity to the same tactile stimulus is modulated by task in all the areas recorded, with significant activity changes in a subset of neurons and population activity occupying distinct activity subspaces. Recordings further reveal a contextual signal in the pre-stimulus baseline activity that differentiates task context. This signal is correlated with subsequent task modulation of stimulus activity. Comparison across brain areas shows that this contextual signal is stronger in frontal cortical regions than in sensory regions. Analyses link this signal to behavior by showing that it tracks the behavioral performance switch during task rule transitions. Silencing activity in frontal cortical regions during the baseline period impairs behavioral performance.

      Overall, this is a superb study with solid results and thorough controls. The results are relevant for context-specific neural computation and provide a neural substrate that will surely inspire follow-up mechanistic investigations. We only have a couple of suggestions to help the authors further improve the paper.

      (1) We have a comment regarding the calculation of the choice CD in Fig S3. The text on page 7 concludes that "Choice coding dimensions change with task rule". However, the motor choice response is different across blocks, i.e. lick right vs. no lick for one task and lick left vs. no lick for the other task. Therefore, the differences in the choice CD may be simply due to the motor response being different across the tasks and not due to the task rule per se. The authors may consider adding this caveat in their interpretation. This should not affect their main conclusion.

      We thank the Reviewer for the suggestion. We have discussed this caveat and performed a new analysis to calculate the choice coding dimensions using right-lick and left-lick trials (Fig. S3h) on page 8. 

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”
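The alignment measure quoted above (the magnitude of the dot product between two coding dimensions) can be sketched as follows. This is a minimal illustration that assumes each coding dimension is a vector with one weight per neuron and is normalised to unit length before comparison; the function names and that normalisation step are assumptions made here, and the shuffled-data comparison and confidence intervals reported above are not reproduced.

```python
import math


def unit_vector(v):
    """Scale a vector to unit length (assumed normalisation of a coding dimension)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def cd_alignment(cd_a, cd_b):
    """Magnitude of the dot product between two unit-length coding dimensions.

    0 means the dimensions are orthogonal; 1 means they point along the
    same axis (the sign is ignored by taking the absolute value).
    """
    if len(cd_a) != len(cd_b):
        raise ValueError("coding dimensions must cover the same neurons")
    a, b = unit_vector(cd_a), unit_vector(cd_b)
    return abs(sum(x * y for x, y in zip(a, b)))


# Hypothetical three-neuron example, for illustration only.
print(cd_alignment([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Under this reading, a value close to the shuffled-data level indicates that the two task rules use largely unrelated choice axes in that area.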

      We have also included the caveats of using right-lick and left-lick trials to calculate choice coding dimensions on page 13.

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks to the unrewarded port / # of total licks; respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      (2) We have a couple of questions about the effect size on single neurons vs. population dynamics. From Fig 1, about 20% of neurons in frontal cortical regions show task rule modulation in their stimulus activity. This seems like a small effect in terms of population dynamics. There is somewhat of a disconnect from Figs 4 and S3 (for stimulus CD), which show remarkably low subspace overlap in population activity across tasks. Can the authors help bridge this disconnect? Is this because the neurons showing a difference in Fig 1 are disproportionally stimulus selective neurons?

      We thank the Reviewer for the insightful comment and agree that it is important to link the single-unit and population results. We have addressed these questions by (1) improving our analysis of task modulation of single neurons (tHit-tCR selectivity) and (2) examining the relationship between tHit-tCR selective neurons and tHit-tCR subspace overlaps.

      Previously, we averaged the AUC values of time bins within the stimulus window (0-150 ms, 10-ms bins). If the 95% CI on this averaged AUC value did not include 0.5, the unit was considered to show significant selectivity. This approach was highly conservative and may have underestimated the percentage of units showing significant selectivity, particularly units showing transient selectivity. In the revised manuscript, we now define a unit as showing significant tHit-tCR selectivity when three consecutive time bins (>30 ms, 10-ms bins) of AUC values were significant. Using this new criterion, the percentage of tHit-tCR selective neurons increased compared with the previous analysis. We have updated Figure 1h and the results on page 4:

      “We found that 18-33% of neurons in these cortical areas had area under the receiver-operating curve (AUC) values significantly different from 0.5, and therefore discriminated between tHit and tCR trials (Fig. 1h; S1: 28.8%, 177 neurons; S2: 17.9%, 162 neurons; MM: 32.9%, 140 neurons; ALM: 23.4%, 256 neurons; criterion to be considered significant: Bonferroni corrected 95% CI on AUC did not include 0.5 for at least 3 consecutive 10-ms time bins).”
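The revised criterion can be sketched in two parts: a per-bin AUC and a check for a run of consecutive significant bins. The code below is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, the confidence intervals are taken as given (how they were estimated and Bonferroni-corrected is not shown), and a run is assumed to count bins whose interval lies either above or below 0.5.

```python
def auc(pos, neg):
    """Area under the ROC curve for two samples of firing rates.

    Computed as the probability that a value drawn from `pos` exceeds a
    value drawn from `neg`, with ties counted as one half.
    """
    if not pos or not neg:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def is_selective(ci_per_bin, chance=0.5, min_run=3):
    """True if at least `min_run` consecutive bins have a CI excluding `chance`.

    ci_per_bin: sequence of (low, high) confidence bounds on the AUC,
    one pair per time bin, in temporal order.
    """
    run = 0
    for low, high in ci_per_bin:
        if low > chance or high < chance:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # a non-significant bin breaks the run
    return False


# Hypothetical intervals for four 10-ms bins, for illustration only.
bins = [(0.48, 0.61), (0.56, 0.70), (0.58, 0.74), (0.55, 0.69)]
print(is_selective(bins))
```

Requiring a run of bins instead of a significant window average lets a brief, transient difference qualify, which is consistent with the increase in selective units described above.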

      Next, we checked how tHit-tCR selective neurons were distributed across sessions. We found that the percentage of tHit-tCR selective neurons in each session varied (S1: 9-46%, S2: 0-36%, MM: 25-55%, ALM: 0-50%). We examined the relationship between the numbers of tHit-tCR selective neurons and tHit-tCR subspace overlaps. Sessions with more neurons showing task rule modulation tended to show lower subspace overlap, but this correlation was modest and only marginally significant (r = -0.32, p = 0.08, Pearson correlation, n = 31 sessions). While we report the percentage of neurons showing significant selectivity as a simple way to summarize single-neuron effects, this neglects the magnitude of task rule modulation of individual neurons, which may also be relevant.

      In summary, the apparent disconnect between the effect sizes of task modulation of single neurons and of population dynamics could be explained by three factors: (1) the percentages of tHit-tCR selective neurons were underestimated in our old analysis, (2) tHit-tCR selective neurons were not uniformly distributed among sessions, and (3) the percentages of tHit-tCR selective neurons were only weakly correlated with tHit-tCR subspace overlaps.
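The session-level relationship described here is a Pearson correlation between two per-session quantities. A minimal sketch of that statistic is given below; the function name and the example values are hypothetical, and the p-value calculation is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-session values, for illustration only:
# number of selective neurons and subspace overlap in each session.
n_selective = [3, 8, 12, 5, 10]
overlap = [0.42, 0.30, 0.25, 0.38, 0.33]
print(round(pearson_r(n_selective, overlap), 2))
```

A negative coefficient in this setting means that sessions with more selective neurons tend to show lower subspace overlap.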

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      For the analysis of choice coding dimensions, it seems that the authors are somewhat data limited in that they cannot compare lick-right/lick-left within a block. So instead, they compare lick/no lick trials. But given that the mice are unable to initiate trials, the interpretation of the no lick trials is a bit complicated. It is not clear that the no lick trials reflect a perceptual judgment about the stimulus (i.e., a choice), or that the mice are just zoning out and not paying attention. If it's the latter case, what the authors are calling choice coding is more of an attentional or task engagement signal, which may still be interesting, but has a somewhat different interpretation than a choice coding dimension. It might be worth clarifying this point somewhere, or if I'm totally off-base, then being more clear about why lick/no lick is more consistent with choice than task engagement.

      We thank the Reviewer for raising this point. We have added a new paragraph on page 13 to clarify why we used lick/no-lick trials to calculate choice coding dimensions, and we now discuss the caveat regarding task engagement.  

      “No-lick trials included misses, which could be caused by mice not being engaged in the task. While the majority of no-lick trials were correct rejections (respond-to-touch: 75%; respond-to-light: 76%), we treated no-licks as one of the available choices in our task and included them to calculate choice coding dimensions (Fig. S4c,d,f). To ensure stable and balanced task engagement across task rules, we removed the last 20 trials of each session and used stimulus parameters that achieved similar behavioral performance for both task rules (Fig. 1d; ~75% correct for both rules).”

      In addition, to address a point made by Reviewer 3 as well as this point, we performed a new analysis to calculate choice coding dimensions using right-lick vs left-lick trials. We report this new analysis on page 8:

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).” 

      We added discussion of the limitations of this new analysis on page 13:

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks to the unrewarded port / # of total licks; respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      The authors find that the stimulus coding direction in most areas (S1, S2, and MM) was significantly aligned between the block types. How do the authors interpret that finding? That there is no major change in stimulus coding dimension, despite the change in subspace? I think I'm missing the big picture interpretation of this result.

      That there is no significant change in stimulus coding dimensions but a change in subspace suggests that the subspace change largely reflects a change in the choice coding dimensions.

      As I mentioned in the public review, I thought there was a weakness with interpretation of the optogenetic experiments, which the authors generally interpret as reflecting rule sensitivity. However, given that they are inhibiting premotor areas including ALM, one might imagine that there might also be an effect on lick production or kinematics. To rule this out, the authors compare the change in lick rate relative to licks during the ITI. What is the ITI lick rate? I assume pretty low, once the animal is welltrained, in which case there may be a floor effect that could obscure meaningful effects on lick production. In addition, based on the reported CI on delta p(lick), it looks like MM and AM did suppress lick rate. I think in the future, a task with richer behavioral read-outs (or including other measurements of behavior like video), or perhaps something like a psychological process model with parameters that reflect different perceptual or cognitive processes could help resolve the effects of perturbations more precisely.

      Eighteen and ten percent of trials had at least one lick in the ITI in respond-to-touch and respond-to-light blocks, respectively. These relatively low rates of ITI licking could indeed make an effect of optogenetics on lick production harder to observe. We agree that future work would benefit from more complex tasks and measurements, and have added the following to make this point (page 14):

      “To more precisely dissect the effects of perturbations on different cognitive processes in rule-dependent sensory detection, more complex behavioral tasks and richer behavioral measurements are needed in the future.”

      Reviewer #2 (Recommendations For The Authors):

      I have the following minor suggestions that the authors might consider in revising this already excellent manuscript :

      (1) In addition to showing normalised z-score firing rates (e.g. Fig 1g), I think it is important to show the grand-average mean firing rates in Hz.

      We thank the Reviewer for the suggestion and have added the grand-average mean firing rates as a new supplementary figure (Fig. S2a). To provide more details about the firing rates of individual neurons, we have also added to this new figure the distribution of peak responses during the tactile stimulus period (Fig. S2b).

      (2) I think the authors could report more quantitative data in the main text. As a very basic example, I could not easily find how many neurons, sessions, and mice were used in various analyses.

      We have added relevant numbers at various points throughout the Results, including within the following examples:

      Page 3: “To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM and ALM (Fig. 1e-g, Fig. S1a-h, and Fig. S2a; S1: 6 mice, 10 sessions, 177 neurons, S2: 5 mice, 8 sessions, 162 neurons, MM: 7 mice, 9 sessions, 140 neurons, ALM: 8 mice, 13 sessions, 256 neurons).”

      Page 5: “As expected, single-unit activity before stimulus onset did not discriminate between tactile and visual trials (Fig. 2d; S1: 0%, 177 neurons; S2: 0%, 162 neurons; MM: 0%, 140 neurons; ALM: 0.8%, 256 neurons). After stimulus onset, more than 35% of neurons in the sensory cortical areas and approximately 15% of neurons in the motor cortical areas showed significant stimulus discriminability (Fig. 2e; S1: 37.3%, 177 neurons; S2: 35.2%, 162 neurons; MM: 15%, 140 neurons; ALM: 14.1%, 256 neurons).”

      Page 6: “Support vector machine (SVM) and Random Forest classifiers showed similar decoding abilities (Fig. S3a,b; medians of classification accuracy [true vs shuffled]; SVM: S1 [0.6 vs 0.53], 10 sessions, S2 [0.61 vs 0.51], 8 sessions, MM [0.71 vs 0.51], 9 sessions, ALM [0.65 vs 0.52], 13 sessions; Random Forests: S1 [0.59 vs 0.52], 10 sessions, S2 [0.6 vs 0.52], 8 sessions, MM [0.65 vs 0.49], 9 sessions, ALM [0.7 vs 0.5], 13 sessions).”

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).” 

      Page 8: “In contrast, we found that S1, S2 and MM had stimulus CDs that were significantly aligned between the two block types (Fig. S4e; magnitude of dot product between the respond-to-touch stimulus CDs and the respond-to-light stimulus CDs, mean ± 95% CI for true vs shuffled data: S1: 0.5 ± [0.34, 0.66] vs 0.21 ± [0.12, 0.34], 10 sessions; S2: 0.62 ± [0.43, 0.78] vs 0.22 ± [0.13, 0.31], 8 sessions; MM: 0.48 ± [0.38, 0.59] vs 0.24 ± [0.16, 0.33], 9 sessions; ALM: 0.33 ± [0.2, 0.47] vs 0.21 ± [0.13, 0.31], 13 sessions).”

      Page 9: “For respond-to-touch to respond-to-light block transitions, the fractions of trials classified as respond-to-touch for MM and ALM decreased progressively over the course of the transition (Fig. 5d; rank correlation of the fractions calculated for each of the separate periods spanning the transition, Kendall’s tau, mean ± 95% CI: MM: -0.39 ± [-0.67, -0.11], 9 sessions, ALM: -0.29 ± [-0.54, -0.04], 13 sessions; criterion to be considered significant: 95% CI on Kendall’s tau did not include 0).”

      Page 11: “Lick probability was unaffected during S1, S2, MM and ALM experiments for both tasks, indicating that the behavioral effects were not due to an inability to lick (Fig. 6i, j; 95% CI on Δ lick probability for cross-modal selection task: S1/S2 [-0.18, 0.24], 4 mice, 10 sessions; MM [-0.31, 0.03], 4 mice, 11 sessions; ALM [-0.24, 0.16], 4 mice, 10 sessions; Δ lick probability for simple tactile detection task: S1/S2 [-0.13, 0.31], 3 mice, 3 sessions; MM [-0.06, 0.45], 3 mice, 5 sessions; ALM [-0.18, 0.34], 3 mice, 4 sessions).”

      (3) Please include a clearer description of trial timing. Perhaps a schematic timeline of when stimuli are delivered and when licking would be rewarded. I may have missed it, but I did not find explicit mention of the timing of the reward window or if there was any delay period.

      We have added the following (page 3): 

      “For each trial, the stimulus duration was 0.15 s and an answer period extended from 0.1 to 2 s from stimulus onset.”

      (4) Please include a clear description of statistical tests in each figure legend as needed (for example please check Fig 4e legend).

      We have added details about statistical tests in the figure legends:

      Fig. 2f: “Relationship between block-type discriminability before stimulus onset and tHit-tCR discriminability after stimulus onset for units showing significant block-type discriminability prior to the stimulus. Pearson correlation: S1: r = 0.69, p = 0.056, 8 neurons; S2: r = 0.91, p = 0.093, 4 neurons; MM: r = 0.93, p < 0.001, 30 neurons; ALM: r = 0.83, p < 0.001, 26 neurons.” 

      Fig. 4e: “Subspace overlap for control tHit (gray) and tCR (purple) trials in the somatosensory and motor cortical areas. Each circle is a subspace overlap of a session. Paired t-test, tCR – control tHit: S1: -0.23, 8 sessions, p = 0.0016; S2: -0.23, 7 sessions, p = 0.0086; MM: -0.36, 5 sessions, p < 0.001; ALM: -0.35, 11 sessions, p < 0.001; significance: ** for p<0.01, *** for p<0.001.”

      Fig. 5d,e: “Fraction of trials classified as coming from a respond-to-touch block based on the pre-stimulus population state, for trials occurring in different periods (see c) relative to respond-to-touch → respond-to-light transitions. For MM (top row) and ALM (bottom row), progressively fewer trials were classified as coming from the respond-to-touch block as analysis windows shifted later relative to the rule transition. Kendall’s tau (rank correlation): MM: -0.39, 9 sessions; ALM: -0.29, 13 sessions. Left panels: individual sessions, right panels: mean ± 95% CI. Dash lines are chance levels (0.5). e, Same as d but for respond-to-light → respond-to-touch transitions. Kendall’s tau: MM: 0.37, 9 sessions; ALM: 0.27, 13 sessions.”

      Fig. 6: “Error bars show bootstrap 95% CI. Criterion to be considered significant: 95% CI did not include 0.”
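
      The significance criterion quoted in these legends (a 95% CI that excludes 0) can be sketched as follows for the Kendall's tau analysis. This is a minimal illustration, assuming that tau is computed per session across the ordered transition periods and that the CI on the across-session mean is obtained by bootstrap resampling of sessions; the response states a bootstrap CI only for Fig. 6, so the resampling scheme here is an assumption, and all numbers are made up.

```python
import math
import random


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for two equal-length sequences."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0


def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI on the mean, resampling values with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical fractions of trials classified as respond-to-touch, per period.
periods = [1, 2, 3, 4]
session_fractions = [[0.9, 0.7, 0.6, 0.2], [0.8, 0.8, 0.5, 0.4], [0.7, 0.6, 0.65, 0.3]]
taus = [kendall_tau_b(periods, fractions) for fractions in session_fractions]
lower, upper = bootstrap_ci_mean(taus)
print(taus, (lower, upper), "significant:", not (lower <= 0 <= upper))
```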

      (5) P. 3 - "To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM, and ALM using 64-channel silicon probes (Fig. 1e-g and Fig. S1a-h)." Please specify if these areas were recorded simultaneously or not.

      We have added “We recorded from one of these cortical areas per session, using 64-channel silicon probes.”  on page 3.  

      (6) Figure 4b - Please describe what gray and black lines show.

      The gray traces are the distance between tHit and tCR trajectories in individual sessions and the black traces are the averages across sessions in different cortical areas. We have added this information on page 6 and in the Figure 4b legend. 

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).”

      Fig. 4b: “Distance between tHit and tCR trajectories in S1, S2, MM and ALM. Gray traces show the time-varying tHit-tCR distance in individual sessions and black traces are session-averaged tHit-tCR distance (S1: 10 sessions; S2: 8 sessions; MM: 9 sessions; ALM: 13 sessions).”
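
      The distance measure described in this passage can be sketched as below. The example assumes that each trajectory has already been projected onto the top 3 PCs and binned, so that a trajectory is a list of 3-dimensional points, one per time bin; the PCA step itself is omitted, and the function name and toy coordinates are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(hit_trajs, cr_trajs):
    """Time course of the mean Euclidean distance between trajectory pairs.

    hit_trajs, cr_trajs: lists of trajectories; each trajectory is a list of
    points (one per time bin) in PC space. For every time bin, the distance is
    computed for all tHit/tCR pairs at that bin and then averaged.
    """
    n_bins = len(hit_trajs[0])
    distances = []
    for t in range(n_bins):
        pair_dists = [math.dist(h[t], c[t]) for h, c in product(hit_trajs, cr_trajs)]
        distances.append(sum(pair_dists) / len(pair_dists))
    return distances


# Toy example: one tHit and one tCR trajectory over two time bins.
hit = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
cr = [[(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]]
print(mean_pairwise_distance(hit, cr))
```

      Each session yields one such time course (a gray trace in the figure as described), and averaging the time courses across sessions gives the black trace.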

      (7) In addition to the analyses shown in Figure 5a, when investigating the timing of the rule switch, I think the authors should plot the left and right lick probabilities aligned to the timing of the rule switch time on a trial-by-trial basis averaged across mice.

      We thank the Reviewer for suggesting this addition. We have added a new figure panel to show the probabilities of right- and left-licks during rule transitions (Fig. 5a).

      Page 8: “The probabilities of right-licks and left-licks showed that the mice switched their motor responses during block transitions depending on task rules (Fig. 5a, mean ± 95% CI across 12 mice).” 

      (8) P. 12 - "Moreover, in a separate study using the same task (Finkel et al., unpublished), high-speed video analysis demonstrated no significant differences in whisker motion between respond-to-touch and respond-to-light blocks in most (12 of 14) behavioral sessions.". Such behavioral data is important and ideally would be included in the current analysis. Was high-speed videography carried out during electrophysiology in the current study?

      Finkel et al. has been accepted in principle for publication and will be available online shortly. Unfortunately we have not yet carried out simultaneous high-speed whisker video and electrophysiology in our cross-modal sensory selection task.

      Reviewer #3 (Recommendations For The Authors):

      (1) Minor point. For subspace overlap calculation of pre-stimulus activity in Fig 4e (light purple datapoints), please clarify whether the PCs for that condition were constructed in matched time windows. If the PCs are calculated from the stimulus period 0-150ms, the poor alignment could be due to mismatched time windows.

      We thank the Reviewer for the comment and clarify our analysis here. We previously used time-matched windows to calculate subspace overlaps. However, the pre-stimulus activity was much weaker than the activity during the stimulus period, so the subspaces of reference tHit were subject to noise and we were not able to obtain reliable PCs. This caused the subspace overlap values between the reference tHit and control tHit to be low and variable (mean ± SD, S1: 0.46 ± 0.26, n = 8 sessions, S2: 0.46 ± 0.18, n = 7 sessions, MM: 0.44 ± 0.16, n = 5 sessions, ALM: 0.38 ± 0.22, n = 11 sessions). Therefore, we used the tHit activity during the stimulus window to obtain PCs and projected pre-stimulus and stimulus activity in tCR trials onto these PCs. We have now added a more detailed description of this analysis in the Methods (page 32).

      “To calculate the separation of subspaces prior to stimulus delivery, pre-stimulus activity in tCR trials (-100 to 0 ms from stimulus onset) was projected to the PC space of the tHit reference group and the subspace overlap was calculated. In this analysis, we used tHit activity during stimulus delivery (0 to 150 ms from stimulus onset) to obtain reliable PCs.”

      We acknowledge this time alignment issue and have now removed the reported subspace overlap between tHit and tCR during the pre-stimulus period from Figure 4e (light purple). However, we think the correlation between pre- and post-stimulus-onset subspace overlaps should remain similar regardless of the time windows that we used for calculating the PCs. For the PCs calculated from the pre-stimulus period (-100 to 0 ms), the correlation coefficient was 0.55 (Pearson correlation, p < 0.01, n = 31 sessions). For the PCs calculated from the stimulus period (0-150 ms), the correlation coefficient was 0.68 (Figure 4f, Pearson correlation, p < 0.001, n = 31 sessions). Therefore, we keep Figure 4f.
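
      For readers unfamiliar with subspace overlap, the sketch below shows one common way to define it: the variance of one condition's activity captured by the reference condition's PCs, divided by the variance captured by that condition's own PCs. This definition is an assumption for illustration and is not taken from the manuscript, whose Methods give the authors' actual calculation. The PC axes are supplied as inputs (assumed orthonormal and computed elsewhere), and all names and numbers are hypothetical.

```python
def variance_along(data, axis):
    """Variance of the data after projection onto one unit-length axis.

    data: list of population vectors (one per time bin), each a list of rates.
    """
    projections = [sum(x * a for x, a in zip(row, axis)) for row in data]
    mean = sum(projections) / len(projections)
    return sum((p - mean) ** 2 for p in projections) / len(projections)


def subspace_overlap(data, reference_axes, own_axes):
    """Variance captured by reference PCs relative to the data's own PCs.

    Assumed definition: 1 means the reference subspace captures as much
    variance as the data's own top PCs; 0 means it captures none.
    """
    captured_by_reference = sum(variance_along(data, ax) for ax in reference_axes)
    captured_by_own = sum(variance_along(data, ax) for ax in own_axes)
    return captured_by_reference / captured_by_own


# Toy tCR activity from two neurons that varies only along neuron 0.
tcr_activity = [[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
own_pc = [[1.0, 0.0]]
diagonal_reference_pc = [[2 ** -0.5, 2 ** -0.5]]
print(subspace_overlap(tcr_activity, diagonal_reference_pc, own_pc))
```

      Under this definition, weak and noisy pre-stimulus activity gives unreliable reference axes, which is consistent with the authors' reason for deriving the PCs from the stimulus window instead.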

      (2) Minor point. To help the readers follow the logic of the experiments, please explain why PPC and AMM were added in the later optogenetic experiment since these are not part of the electrophysiology experiment.

      We have added the following rationale on page 9.

      “We recorded from AMM in our cross-modal sensory selection task and observed visually-evoked activity (Fig. S1i-k), suggesting that AMM may play an important role in rule-dependent visual processing. PPC contributes to multisensory processing51–53 and sensory-motor integration50,54–58.  Therefore, we wanted to test the roles of these areas in our cross-modal sensory selection task.”

      (3) Minor point. We are somewhat confused about the timing of some of the example neurons shown in figure S1. For example, many neurons show visually evoked signals only after stimulus offset, unlike tactile evoked signals (e.g. Fig S1b and f). In addition, the reaction time for visual stimulus is systematically slower than tactile stimuli for many example neurons (e.g. Fig S1b) but somehow not other neurons (e.g. Fig S1g). Are these observations correct?

      These observations are all correct. We have a manuscript from a separate study using this same behavioral task (Finkel et al., accepted in principle) that examines and compares (1) the onsets of tactile- and visually-evoked activity and (2) the reaction times to tactile and visual stimuli. The reaction times to tactile stimuli were slightly but significantly shorter than the reaction times to visual stimuli (tactile vs visual, 397 ± 145 vs 521 ± 163 ms, median ± interquartile range [IQR], Tukey HSD test, p = 0.001, n = 155 sessions). We examined how well activity of individual neurons in S1 could be used to discriminate the presence of the stimulus or the response of the mouse. For discriminability for the presence of the stimulus, S1 neurons could signal the presence of the tactile stimulus but not the visual stimulus. For discriminability for the response of the mouse, the onsets for significant discriminability occurred earlier for tactile compared with visual trials (two-sided Kolmogorov-Smirnov test, p = 1 × 10^-16, n = 865 neurons with DP onset in tactile trials, n = 719 neurons with DP onset in visual trials).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable study that describes the effects of T. pallidum on neural development by applying single-cell RNA sequencing to an iPSC-derived brain organoid model. The evidence supporting the claims of the authors is solid, although further evidence to understand the differences in infection rates would strengthen the conclusions of the study. In particular, the conclusions would be strengthened by validating infection efficiency as this can impact the interpretation of single-cell sequencing results, and how these metrics affect organoid size as well as comparison with additional infectious agents. Furthermore, additional validations of downstream effectors are not adequate and could be improved. 

      Thank you very much for your valuable comments. Since this is the first time we have used the organoid model to investigate the effects of T. pallidum on brain development, the study design is not perfect. As you have accurately noted, the paper lacks in-depth detail in places, particularly regarding verification of the T. pallidum infection rate. Your comments will be very useful to us in carrying out further research. In addition, because the downstream effector validation was inadequate, we performed an additional analysis of the single-cell sequencing data to strengthen our view in the revised manuscript (see Figure 5F in the current manuscript).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study by Xu et al showing the effects of infection with the bacterium Treponema pallidum (which causes syphilis) on neuronal development using iPSC-derived human brain organoids as a model and single-cell RNA sequencing. This work provides an important insight into the impact of the pathogen on human development, bridging the gap between the phenomena observed in studies using animal models as well as non-invasive human studies showing developmental abnormalities in fetuses infected with the pathogen in utero through maternal vertical transmission.

      Using single-cell RNAseq in combination with qPCR and immunofluorescence techniques, the authors show that T. pallidum infected organoids are smaller in size, in particular during later growth stages, contain a larger number of undifferentiated neuronal lineage cells, and exhibit decreased numbers of specific neuronal subcluster, which the authors have identified as undifferentiated hindbrain neurons.

      The study is an important first step in understanding how T. pallidum affects human neuronal development and provides important insight into the potential mechanisms that underlie the neurodevelopmental abnormalities observed in infected human fetuses. Several important weaknesses have also been noted, which need to be addressed to strengthen the study's conclusions.

      Strengths:

      (1) The study is well written, and the data quality is good for the most part.

      (2) The study provides an important first step in utilizing human brain organoids to study the impact of T. pallidum infection on neuronal development.

      (3) The study's conclusions may provide important insight to other researchers focused on studying how viral infections impact neuronal development. 

      Thank you very much for your positive feedback. Below, you will find our detailed responses to your concerns, addressed point-by-point. I once again sincerely appreciate your time and effort in reviewing our manuscript.

      Weaknesses:

      (1) It is unclear how T. pallidum infection was validated in the organoids. If not all cells are infected, this could have important implications for the study's conclusions, in particular the single-cell RNAseq experiments. Were only cells showing the presence of the pathogen selected for sequencing? A detailed description of how infection was validated and the process of selection of cells for RNAseq would strongly support the study's conclusions.

      Thank you for your valuable comment. We completely agree with your point. The infection rate of T. pallidum in brain organoids is a key factor that must be considered. We selected pluripotent stem cell-derived brain organoids to simulate foetal brain neurodevelopment and co-cultured them with T. pallidum to mimic T. pallidum invasion of brain tissue. Since brain organoids are three-dimensional structures formed by nerve cell aggregation, T. pallidum invades organoids gradually from the periphery to the center. T. pallidum acts on organoids long enough to increase the infection rate; however, the pathogen is selective in invading human cells. If we selected only cells containing T. pallidum for sequencing, the authenticity of simulating "real world" infections would be somewhat weakened. Selecting cells from intact organoids for sequencing, without eliminating cells lacking T. pallidum, better simulates the effect of T. pallidum infection on the nervous system. Of course, we should also set up a blank control group.

      (2) The authors show that T. pallidum infection results in impaired development of hindbrain neurons. How does this finding compare to what has already been shown in animal studies? Is a similar deficit in this brain region observed with this specific pathogen? It would be useful to strengthen the study's conclusions if the authors added a discussion about the observed deficits in hindbrain neuronal development, and prior literature on similar studies conducted in animal models or human patients. Does T. pallidum preferentially target these neurons, or is this a limitation of the current organoid model system?

      Thank you for your valuable comments. The finding that T. pallidum infection results in impaired development of hindbrain neurons has not been verified in animal experiments. Of course, it is better to further validate the findings in organoid studies through animal experiments. Unfortunately, due to the technical challenges, mature animal models have not been developed for the study of congenital syphilis. Although our team has been working on the development of animal models of congenital neurosyphilis, the current progress is still not satisfactory. After struggling hard in this field for many years, we decided to attempt to utilize human brain organoids instead of animal models to study the impact of T. pallidum infection on neuronal development.

      We also checked prior literature on similar studies that have referred to the content in human patients. Dan Doherty et al. reported that patients with pontocerebellar hypoplasia develop microcephaly at birth or over time after birth (PMID: 23518331). Based on your constructive suggestions, we have added some content related to hindbrain to the “Discussion” section.

      Our study found that T. pallidum could inhibit the differentiation of subNPC1B in brain organoids, thereby reducing the differentiation from subNPC1B to hindbrain neurons, and ultimately affecting the development and maturation of hindbrain neurons during pregnancy. Based on our results, T. pallidum does not preferentially target hindbrain neurons. Of course, there are limitations to the current organoid model system, see the "Limitations" section.

      PMID: 23518331 - Dan Doherty et al., Midbrain and hindbrain malformations: advances in clinical diagnosis, imaging, and genetics.

      Revision in the “Discussion” section, line 343-352:

      “The vertebrate hindbrain contains a complex network of dedicated neural circuits that play an essential role in controlling many physiological processes and behaviors, including those related to the cerebellum, pons, and medulla oblongata (Shoja et al., 2018). Patients with pontocerebellar hypoplasia represent the less severe end of the spectrum with early hyperreflexia, developmental delay, and feeding problems, eventually developing spasticity and involuntary movements in childhood, while some patients represent the severe end of the spectrum characterised by polyhydramnios, severe hyperreflexia, contracture, and early death from central respiratory failure. Patients with pontocerebellar hypoplasia develop microcephaly at birth or over time after birth (Doherty et al., 2013).”

      (3) The authors show that T. pallidum-infected organoids are smaller in size by measuring organoid diameter during later stages of organoid growth, with no change during early stages. Does that represent insufficient infection at the early stages? Is this due to increased cell death or lack of cell division in the infected organoids? Experiments using IHC to quantify levels of cleaved caspase and/or protein markers for cell proliferation would be able to address these questions. 

      Thank you for your valuable suggestion. The concentration of T. pallidum in patients with syphilis is generally very low (PMID: 21752804, 35315702, 33099614). In this study, a low concentration of T. pallidum was applied to brain organoids to simulate early foetal transmission of syphilis. Nerve cells form brain organoids mainly by establishing intercellular connections through adhesion, so treatment with a high concentration of T. pallidum can easily cause organoids to break apart and die. Furthermore, based on your suggestions, we performed additional immunostaining analyses to verify the apoptosis of brain organoids infected by T. pallidum. Cleaved caspase 3 (clCASP3) staining showed that the number of apoptotic cells increased following T. pallidum infection; however, the proportion of apoptotic cells in both groups of brain organoids was very low (Figure supplement 2) (N=12 organoids, each group from three independent bioreactors), which would not be enough to affect the results of the experiment. This suggests that T. pallidum infection mainly inhibited the neural differentiation and development of brain organoids rather than promoting organoid apoptosis.

      PMID: 21752804 - Craig Tipple et al., Getting the measure of syphilis: qPCR to better understand early infection.

      PMID: 35315702 - Cuini Wang et al., Quantified Detection of Treponema pallidum DNA by PCR Assays in Urine and Plasma of Syphilis Patients.

      PMID: 33099614 - Cuini Wang et al., A New Specimen for Syphilis Diagnosis: Evidence by High Loads of Treponema pallidum DNA in Saliva.

      Revision in the “Results” section, line 105-108:

      “… cleaved caspase 3 (clCASP3) staining showed that the number of apoptotic cells increased significantly following T. pallidum infection, but the proportion of apoptotic cells in both groups of brain organoids was very low (Figure supplement 2) (N=12 organoids, each group from three independent bioreactors) …”

      Revision in the “Materials and methods” section, line 446-447:

      “…anti-cleaved caspase 3 (rabbit, 1:100, Cell Signaling Technology, 9661S),”

      Revision in the “Supplementary File” section, line 78-81:

      Author response image 1.

      The number of clCASP3+ cells in the microscopic field of brain organoids. A nonparametric t-test was used to evaluate the statistical differences between the two groups. (**: P < 0.01).

      (4) In Figure 1D authors show differences in rosette-like structure in the infected organoids. The representative images do not appear to be different in any of the discussed components (e.g., the sox2 signal looks fairly similar between the two conditions). No quantification of these structures was presented. Authors should provide quantification or a more representative image to support their statement. 

      Thank you for your valuable suggestion. I have quantified the neural rosette structure and compared the number of intact rosette-like structures between the two groups (See Figure 1D for a description in current manuscript).

      (5) The IHC images shown in Figures 3E, G, and Figure 4E look very similar between the two conditions despite the discussed decrease in the text. A more suitable representative image should be presented, or the analysis should be amended to reflect the observed results. 

      Thank you for your valuable suggestion. I have replaced the images in Figure 3E, G, and Figure 4E of the manuscript with more representative ones.

      Reviewer #2 (Public Review):

      Summary:

      This study provides an important overview of infectious etiology for neurodevelopment delay.

      Strengths:

      Strong RNA evaluation.

      Weaknesses:

      The study lacks an overview of other infectious agents. The study should address the epigenetic contributors (PMID: 36507115) and the role of supplements in improving outcomes (PMID: 27705610). 

      Addressing the above - with references included - is recommended. 

      Thank you for your valuable comment. Our research is mainly inspired by other infectious agents, such as Zika virus; there are many descriptions of Zika virus in the “Discussion” section of the manuscript to better describe and demonstrate our point of view (See pages 12–13). I was unable to retrieve the article (PMID: 36507115), kindly help in confirming the PMID number. I will be very grateful if you can provide the full text. Secondly, I have carefully read the article (PMID: 27705610), which is a very rich and comprehensive review, and summarised and cited it in appropriate places in our manuscript.

      Revision in the “Discussion- limitation” section, line 375-379:

      “First, although several recent protocols have made use of growth factors to promote further neuronal maturation and survival (Lucke-Wold et al., 2018), the organoid culture scheme needs to be further improved owing to the lower percentage of mature neurons and the challenge of cell necrosis within the organoids at this stage in day 55 organoids.”

      Reviewer #3 (Public Review): 

      This article is the first report to study the effects of T. pallidum on the neural development of an iPSC-derived brain organoid model. The study indicates that T. pallidum inhibits the differentiation of subNPC1B neurons into hindbrain neurons, hence affecting brain organoid neurodevelopment. Additionally, the TCF3 and notch signaling pathways may be involved in the inhibition of the subNPC1B-hindbrain neuron differentiation axis. While the majority of the data in this study support the conclusions, there are still some questions that need to be addressed and data quality needs to be improved. The study provides valuable insights for future investigations into the mechanisms underlying congenital neurodevelopment disability. 

      I sincerely appreciate your comments on our paper. The comments have helped us greatly improve the quality of our paper. Thank you for your time and constructive critique.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Paired t-test analysis is not appropriate if two distinct groups are compared. 

      I sincerely apologize for our presentation. We used a nonparametric t-test to compare the two groups. I have confirmed and corrected the statistical method description of this manuscript (Revision in the “Materials and methods” section (line 553-555) and “Figures-legend” section (line 789-790, 817-818, 829-830) in current manuscript).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Can the authors explain why the mean size of organoids infected with T. pallidum is smaller?

      Thank you for your valuable comment. In our study, T. pallidum infection resulted in brain organisational changes in neural rosette-like structures resembling the proliferative regions of the human ventricular zone and caused fewer and incomplete rosette-like structures. Next, the ventricular zone is also the main area where neural progenitor cells (NPCs) reside (PMID: 33838105); our results showed that the proportion of neural progenitor cells (NPC)1 was reduced after T. pallidum infection. Rosette-like structure size changes owing to NPC depletion. Therefore, the mean size of organoids infected with T. pallidum is smaller.

      Revision in the “Results” section, line 101-104:

      “T. pallidum infection resulted in brain organisational changes in neural rosette-like structures resembling the proliferative regions of the human ventricular zone where NPC reside (Krenn et al., 2021), and caused fewer and incomplete rosette-like structures (P < 0.01) (Figure 1D)”

      (2) Why were HOXA5, HOXC5, and HOXA4 selected as the target genes for qRT-PCR validation?

      Thank you for your valuable comment. The qRT-PCR experiment was selected here to verify the analysis results of the scRNA-seq. HOX family genes are key factors controlling early hindbrain development, which are expressed in the hindbrain region during the gastrulation stage of early embryonic development and persist into the nerve cell stage, and are essential for the correct induction of hindbrain development and segmentation (PMID: 2571936, 1983472, 1673098, 15930115). Therefore, we selected the HOX family gene for verification.

      PMID: 2571936: WILKINSON D G, et al. Segmental expression of Hox-2 homoeobox-containing genes in the developing mouse hindbrain.

      PMID: 1983472: FROHMAN M A, et al. Isolation of the mouse Hox-2.9 gene; analysis of embryonic expression suggests that positional information along the anterior-posterior axis is specified by mesoderm.

      PMID: 1673098: MURPHY P, et al. Expression of the mouse labial-like homeobox-containing genes, Hox 2.9 and Hox 1.6, during segmentation of the hindbrain.

      PMID: 15930115: MCNULTY C L, et al. Knockdown of the complete Hox paralogous group 1 leads to dramatic hindbrain and neural crest defects.

      (3) Why was qRT-PCR not employed in other experimental validations, but solely to validate early neural-specific transcription factor changes?

      Thank you for your valuable comment. The qRT-PCR experiment was used to validate the changes in early neural-specific transcription factors, which supports the reliability of the scRNA-seq data. The validated scRNA-seq data were then used to analyse other neural-specific gene differences, for example in the violin plots and heatmaps showing differentially expressed genes (Figure 4D and Figure 5B, C). We also validated the findings with other experiments, such as immunohistochemistry and flow cytometric screening.

      (4) The authors found that T. pallidum might reduce the differentiation from subNPC1B to hindbrain neurons by inhibiting subNPC1B differentiation in brain organoids. Why were the subNPC1B-specific markers declining?

      Thank you for your valuable comment. scRNA-seq was performed on whole brain organoids. Cell types in the organoids were clustered according to the specific marker genes of different cells. A decrease in the expression of the marker genes of a given cell group indicates that the proportion of that cell group in the whole organoid is reduced. In organoids analysed after T. pallidum infection, uniform manifold approximation and projection (UMAP) and clustering of the NPC1 population demonstrated that T. pallidum reduced the size of the subNPC1B population. This explains the decrease in the subNPC1B-specific markers.

      (5) In comparison to the other figures, Figure 5E letter size is excessively small and ambiguous.

      Thank you for your valuable comment. I have adjusted the letter size in Figure 5E.

      (6) Figure 5E shows that more than one gene, not only TCF3, is specifically enriched in subNPC1B of the T. pallidum group. It would be best to confirm the impact of the other genes.

      Thank you for raising this key issue, which we had not addressed properly in the previous version of the manuscript; we have added further analytical data. The SCENIC analysis found that the transcriptional activity of 52 genes changed significantly after T. pallidum infection. Furthermore, GO analyses demonstrated that 27 transcription factors were significantly enriched in four key pathways of neural differentiation and development. TCF3 is the sole transcription factor present in all four terms simultaneously, which suggests that TCF3 may be the key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum.

      Revision in the “Results” section, line 261-273:

      “Next, the single-cell regulatory network inference and clustering (SCENIC) analysis for the subNPC1B subcluster was performed to assess the differences in the transcriptional activity of the transcription factors between the two groups and found that the transcriptional activity of 52 genes significantly changed after T. pallidum infection (Figure 5E). Furthermore, GO analyses demonstrated that 27 transcription factors were significantly enriched in key pathways of neural differentiation and development in response to nervous system development, positive regulation of sequence-specific DNA-binding transcription factor activity, positive regulation of neuronal differentiation, and DNA templated transcription regulation. Remarkably, transcription factor 3 (TCF3) is the sole transcription factor present in all four terms simultaneously (Figure 5F), speculating that TCF3 is the key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum.”

      Revision in the “Materials and methods” section, line 540-543:

      “The Sankey diagram was created using SankeyMATIC (https://sankeymatic.com/) (Zhang et al., 2023), which was used to characterize the interactions between differential transcription factors and neural differentiation and development.”

      Revision in the “Figure and Figure Legend” section, line 832, 842-844:

      Author response image 2.

      Sankey diagram showing the correspondence between differential transcription factors and neural differentiation and development.

      (7) Are there other experiments demonstrating that TCF3 is a key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum?

      Thank you for your valuable comment. In a previous experiment, we attempted to isolate the subNPC1B subcluster by flow sorting to verify the relevant molecular mechanism. Because the subNPC1B subcluster makes up only a small proportion of the whole organoid, the sorted cells were in a poor state and did not reach the number of cells required for the experiment. However, we used the scRNA-seq data to further identify TCF3 as a key transcription factor that inhibits the subNPC1B-hindbrain neuron differentiation induced by T. pallidum. The relevant results and descriptions of the analysis are detailed in the revised manuscript; please see our response to point (6) above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors use the innovative CRISPRi method to uncover regulators of cell density and volume in neutrophils. The results show that cells require NHE activity during chemoattractant-driven cell migration. Before migration occurs, cells also undergo a rapid cell volume increase. These results indicate that water flux, driven by ion channels, appears to play a central role in neutrophil migration. The paper is very well written and clear. I suggest adding some discussion about the role of actin in the process, but this is not essential.

      Strengths

      The use of CRISPRi to uncover cell density regulators is very novel. Some of the uncovered molecules were known before, e.g. discussed in Li & Sun, Frontiers in Cell and Developmental Biology, 2021. Others are more interesting, for example PI3K-gamma. The use of caged fMLP is also nice.

      We thank the reviewer for their positive appraisal of our work and have pursued their suggestions for improving our paper in this revision.

      Weaknesses

      One area of investigation that seems to be absent is mentioned in the introduction. I.e., actin is expected to play a role in regulating cell volume increase. Did the authors perform any experiments with LatA? What was seen there? Do cells still migrate with LatA, or is a different interplay seen? The role of PI3K is interesting, and maybe somewhat related to actin. But this may be a different line of inquiry for the future.

      We agree that we could have done a better job explicitly investigating the role of actin dynamics in volume changes. Towards this end, by using Latrunculin B to depolymerize actin, we find that the volume increase in suspension is not affected (Figure 1 – supplemental figure 2A). In our FxM single cell volume measurements of adherent cells, we similarly observed unhindered swelling following latrunculin treatment. These data indicate that actin is dispensable for chemoattractant-induced cell swelling (Figure 1 – supplemental figure 2B). There was a minor apparent reduction in the final volume reached with the Latrunculin-treated cells as measured by FxM, but this likely reflects minor uptake of the excluded dye following Latrunculin treatment rather than an actual change in final volume. This conclusion is reinforced by the change in 2D footprint area being well modeled by the 2D projection of an isotropically expanding sphere (Figure 1 – supplemental figure 2C). Latrunculin treatment completely abolishes migration, as is expected for unconfined migration on fibronectin (Figure 1 – supplemental figure 2D-E). The second Reviewer also wanted us to dig deeper on the role of PI3K-gamma, so we expanded our analysis of this hit (Figure 3 – supplemental figure 1B-D; Figure 4 – supplemental figure 1D-G).

      Author response image 1.

      Chemoattractant-induced swelling, but not motility, is independent of actin polymerization. (A) Human primary neutrophils were incubated with DMSO or Latrunculin B, activated with 20 nM fMLP, and then volume responses were measured using electronic sizing via a Coulter counter. Latrunculin treatment did not alter cell swelling, indicating that actin polymerization is dispensable for the chemoattractant-induced volume increase. (B) Similar results were obtained using the FxM assay, showing that Latrunculin-treated cells are capable of swelling after stimulation. (C) The Latrunculin-treated cells also increase their footprints, albeit less so than control cells, but this is within the range of what would be expected for this degree of chemoattractant-induced volume increase (modeled by a sphere expanding an equivalent volume). (D) Single cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show 15 minutes of tracks with the tracks prior (left) and the 15 minutes post (right) uncaging the chemoattractant. The scale bar is 50 microns. The top panels show the large increase in motility displayed by control cells, while the Latrunculin-treated cells (bottom panels) fail to move. (E) Latrunculin-treated cells consistently fail to move in response to chemoattractant-stimulation. (F) Representative single cell volume traces show that Latrunculin-treated cells (black) lack short-term volume fluctuations but persistently maintain an elevated volume following chemoattractant stimulation. Control cells (blue) exhibit short-term volume fluctuations. (G) The lack of short-term volume fluctuations following latrunculin treatment is borne out across the population, with the coefficient of variation in the volume for single cells (post-swelling) being dramatically lower in Latrunculin-treated cells, suggesting that these short term volume fluctuations depend on actin-based motility.

      Author response image 2.

      Additional validation of swelling screen hits. (A) Mixed WT and CRISPR KO dHL-60 populations post-stimulation show that CA2 (black) and PI3Kγ (green) KO both fail to decrease their densities as much as the WT (cyan) population following chemoattractant stimulation. Cells with negative control guides (light gray) have normal volume responses. All tubes were fractionated and aligned on the fraction containing the median of the WT population. Negative values indicate a fraction with a higher density than WT. (B) To validate the perturbations to cell swelling observed with FxM, primary human neutrophils were stimulated in suspension, and their volumes were measured using a Coulter counter. 20 nM fMLP was added at the 0 minute mark. Shaded regions represent the 95% confidence intervals. (C) PI3Kγ inhibition blocks the chemoattractant-induced volume change in primary human neutrophils, as assayed by FxM. (D) PI3Kγ inhibition also blocked the chemoattractant-driven shape change in human primary neutrophils, as measured by the change in footprint area in FxM. (E) The coefficient of variation in volume for control (cyan) and iNHE1 (gold) inhibited human primary neutrophils undergoing chemokinesis are comparable, suggesting that the volume fluctuations are unchanged in moving cells upon NHE1 and PI3Kγ inhibition despite the different baseline volumes.

      Author response image 3.

      Additional validation of motility phenotypes. (A-D) Single cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show tracks of cells 15 minutes prior (left) versus 15 minutes post (right) uncaging the chemoattractant. The scale bar is 50 microns. Color saturation indicates time with tracks progressing from gray to full color. (A) Control cells show a large increase in movement upon uncaging, (B) NHE1 inhibited cells also initiate movement but to a lesser degree, (C) hypo-osmotic shock rescues the NHE1 motility defect. (D) PI3Kγ leads to a large fraction of cells failing to initiate movement. (E) PI3Kγ inhibition showed near complete blockage of the chemoattractant-induced motility increase in primary human neutrophils. (F) Control neutrophils (blue) show an increased angular alignment upon stimulation as their motility becomes directional. NHE1-inhibition (gold, iNHE1) has very little effect on this process, while PI3Kγ inhibition (green) leads to a reduction in this alignment at the population level. (G) For the PI3Kγ inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ inhibited cell and the bottom panel, its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

      Reviewer #2 (Public Review):

      Nagy et al. investigated the role of volume increase and swelling in neutrophils in response to the chemoattractant. The authors show that, following the chemoattractant response, cells lose volume slightly owing to the cell spreading phase and then undergo a relatively rapid increase in cell volume that is concomitant with cell migration. The authors performed an impressive genome-wide CRISPR screen and buoyant density assay to identify the regulators of neutrophil swelling. This assay showed that stimulating cells with the chemoattractant fMLP led to an increase in cell volume that was abrogated by knockout of the FPR1 receptor. The screen revealed a cascade that could potentially be involved in cell swelling, including NHE1 (sodium-proton antiporter) and PI3K. NHE1 and PI3K are required for chemoattractant-induced swelling in human primary neutrophils. The authors also suggest slightly different functions of NHE1 and PI3K activity, where PI3K is also required to maintain chemoattractant-induced cell shape changes. The authors convincingly show that chemoattractant-induced cell swelling is linked to cell migration and that NHE1 is required at the later stages of swelling, since the cells at the early time point operate in a low-volume and low-velocity regime. Interestingly, the authors also show that the lack of swelling in NHE1-inhibited cells could be rescued by mild hypo-osmotic swelling, strengthening the argument that water influx following chemoattractant stimulation is important for the potentiation of migration.

      The conclusions of this paper are mostly well supported by data and are pretty convincing, but some aspects of image acquisition and data analysis need to be clarified and extended.

      We thank the reviewer for their positive appraisal of our work and pursued their suggestions for improving our paper in this revision.

      Weaknesses

      (1) It would really help if the authors could add the missing graph for the footprint area when cells are treated with Latrunculin. Graph S1F for volume changes with Lat treatment should be compared with DMSO-treated controls.

      We agree that the Latrunculin condition merits more thorough investigation. To this end, we compared the volume response of human primary neutrophils to chemoattractant addition for Latrunculin B treated cells versus DMSO controls in suspension and show that there is no difference in swelling (Figure 1 – supplemental figure 2A). This is additionally confirmed with FxM measurements with a slight undershooting of the final volume likely due to minor uptake of the excluded dye by Latrunculin treated cells (Figure 1 – supplemental figure 2B). We have also included the requested footprint area changes in the Latrunculin treated cells as compared to controls (Figure 1 – supplemental figure 2C). The treated cell footprints increase much less than the controls, and this is likely due to a lack of active cell spreading in the Latrunculin treated cells. The increase in footprint area observed following latrunculin treatment is within the range of what would be expected for the 2D projection of an isotropically expanding sphere fitted to the Latrunculin volume data (salmon line).

      Author response image 4.

      Chemoattractant-induced swelling, but not motility, is independent of actin polymerization. (A) Human primary neutrophils were incubated with DMSO or Latrunculin B, activated with 20 nM fMLP, and then volume responses were measured using electronic sizing via a Coulter counter. Latrunculin treatment did not alter cell swelling, indicating that actin polymerization is dispensable for the chemoattractant-induced volume increase. (B) Similar results were obtained using the FxM assay, showing that Latrunculin-treated cells are capable of swelling after stimulation. (C) The Latrunculin-treated cells also increase their footprints, albeit less so than control cells, but this is within the range of what would be expected for this degree of chemoattractant-induced volume increase (modeled by a sphere expanding an equivalent volume).
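
      The sphere model referred to in panel (C) can be made concrete with a short calculation. The sketch below assumes an ideal sphere whose 2D footprint is its circular projection, so that the footprint area scales as the two-thirds power of the volume. The function name and the 15% example value are illustrative assumptions and do not come from the manuscript.

```python
def projected_area_ratio(volume_ratio: float) -> float:
    """Footprint-area fold change of a sphere whose volume changes by volume_ratio.

    For a sphere, V = (4/3)*pi*r**3 and the projected area is A = pi*r**2,
    so A/A0 = (V/V0)**(2/3). Assumes purely isotropic expansion (no spreading).
    """
    if volume_ratio <= 0:
        raise ValueError("volume_ratio must be positive")
    return volume_ratio ** (2.0 / 3.0)


# Hypothetical example: a 15% volume increase (not a value from the manuscript).
example_ratio = projected_area_ratio(1.15)
print(f"footprint fold change: {example_ratio:.3f}")
```

      Under this assumption, a footprint increase well above the two-thirds-power prediction would point to active spreading rather than swelling alone.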

      (2) The authors show inhibition of NHE1 blocked cell swelling using Coulter counter, a similar experiment should be done with PI3K inhibitions especially since they see PI3K inhibition impact chemoattractant-induced cell shape change.

      Good idea. PI3Kγ inhibition led to a substantial reduction in the chemoattractant-driven swelling in suspension, showing the critical role of PI3K in the swelling of human primary neutrophils (Figure 3 – supplemental figure 1B).

      Author response image 5.

      Additional validation of swelling screen hits. (B) To validate the perturbations to cell swelling observed with FxM, primary human neutrophils were stimulated in suspension, and their volumes were measured using a Coulter counter. 20 nM fMLP was added at the 0 minute mark. Shaded regions represent the 95% confidence intervals.

      (3) It would be more convincing visually if the authors could also include the movie of cell spreading (footprint) and then mobility with PI3K inhibition.

      Included as suggested. We agree this is a more compelling way to present the data (Figure 4 – supplemental figure 1A-D,G).

      Author response image 6.

      Additional validation of motility phenotypes. (A-D) Single cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show tracks of cells 15 minutes prior (left) versus 15 minutes post (right) uncaging the chemoattractant. The scale bar is 50 microns. Color saturation indicates time with tracks progressing from gray to full color. (A) Control cells show a large increase in movement upon uncaging. (D) PI3Kγ leads to a large fraction of cells failing to initiate movement. (E) PI3Kγ inhibition showed near complete blockage of the chemoattractant-induced motility increase in primary human neutrophils. (G) For the PI3Kγ inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ inhibited cell and the bottom panel, its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

      (4) It is not clear how cell spreading and later volume increase are linked to overall mobility of neutrophils. Are authors suggesting that cell spreading is not required for cell mobility in neutrophils?

      We did not mean to imply that cell spreading is not required for neutrophil motility. We take advantage of the fact that we can inhibit cell swelling without inhibiting spreading to investigate the specific role of swelling on migration (Figure 4). Conversely, cell spreading on a substrate is not required for chemoattractant-induced cell swelling, as chemoattractant-induced swelling occurs in latrunculin-treated cells (Figure 1 – supplemental figure 2A-C). However, these latrunculin-treated cells are not able to migrate, at least not in the context studied here (Figure 1 – supplemental figure 2D-E). Cell spreading and swelling are likely both critical contributors to neutrophil motility, but their relative importance is dependent on the migratory context. The single cell volume fluctuation analysis indicates that migration-associated spreading and shape changes have large impacts on cell volume (Figure 1F). These fluctuations are asynchronous, obscuring their observation at the population level, but the single cell traces clearly demonstrate them and their correlation with movement.

      (5) Volume fluctuations associated with motility were impacted by NHE1 inhibition at the baselines, what about PI3K inhibitions? Does that impact the actual fluctuations?

      PI3K inhibition causes a significant fraction of cells to stop migrating (Figure 4 – supplemental figure 1D), but among those that do move, they are still able to fluctuate in volume (Figure 4 – supplemental figure 1G).

      Author response image 7.

      Additional validation of motility phenotypes. (G) For the PI3Kγ inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ inhibited cell and the bottom panel, its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

      In contrast, latrunculin abolishes the volume fluctuations that normally accompany migration (Figure 1 – supplemental figure 2F-G). These data suggest that movement/spreading itself is the driver of the rapid volume fluctuations. In contrast, the sustained volume increase following chemoattractant stimulation is independent of shape change and still occurs in latrunculin-treated cells.

      Author response image 8.

      Chemoattractant-induced swelling, but not motility, is independent of actin polymerization. (F) Representative single cell volume traces show that Latrunculin-treated cells (black) lack short-term volume fluctuations but persistently maintain an elevated volume following chemoattractant stimulation. Control cells (blue) exhibit short-term volume fluctuations. (G) The lack of short-term volume fluctuations following latrunculin treatment is borne out across the population, with the coefficient of variation in the volume for single cells (post-swelling) being dramatically lower in Latrunculin-treated cells, suggesting that these short term volume fluctuations depend on actin-based motility.
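
      The coefficient of variation used in panel (G) is the standard deviation of a cell's volume trace divided by its mean. The following is a minimal sketch of that statistic on hypothetical traces; the trace values are invented for illustration, and the use of the population standard deviation is an assumption, since the response does not state which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(volumes):
    """Return std/mean of one cell's normalized volume trace.

    Uses the population standard deviation; the estimator actually used
    in the analysis is not stated in the source, so this is an assumption.
    """
    m = mean(volumes)
    if m == 0:
        raise ValueError("mean volume must be non-zero")
    return pstdev(volumes) / m


# Hypothetical post-swelling traces (volume normalized to pre-stimulation).
control_trace = [1.10, 1.20, 1.05, 1.25, 1.10, 1.20]      # fluctuating
latrunculin_trace = [1.15, 1.15, 1.16, 1.15, 1.14, 1.15]  # nearly flat

cv_control = coefficient_of_variation(control_trace)
cv_latrunculin = coefficient_of_variation(latrunculin_trace)
print(cv_control > cv_latrunculin)
```

      Because the statistic is normalized by the mean, it allows fluctuation amplitudes to be compared between conditions whose baseline volumes differ.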

      (6) It would really help if the authors compared similar analyses and drew conclusions from that, for example, it is unclear what the authors mean by they found no change in the angular persistence of WT and NHE1 inhibited cells which is in contrast to PI3K inhibition since they do not really have an analysis for angular persistence in PI3K inhibited cells. (S4A and S4B).

      Thanks for catching this oversight in these experiments that we previously performed but neglected to include in the initial submission. We now include plots for angular persistence, velocity, and footprint size for the PI3K-gamma-inhibited cells. The results show that PI3K-gamma inhibition interferes both with swelling (Figure 3 – supplemental figure 1B-D) and motility (Figure 4 – supplemental figure 1D-F), which aligns with its role upstream of the other hits identified in our screen.

      Author response image 9.

      Additional validation of motility phenotypes. (A-D) Single cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show tracks of cells 15 minutes prior (left) versus 15 minutes post (right) uncaging the chemoattractant. The scale bar is 50 microns. Color saturation indicates time with tracks progressing from gray to full color. (A) Control cells show a large increase in movement upon uncaging, (B) NHE1 inhibited cells also initiate movement but to a lesser degree, (C) hypo-osmotic shock rescues the NHE1 motility defect. (D) PI3Kγ leads to a large fraction of cells failing to initiate movement. (E) PI3Kγ inhibition showed near complete blockage of the chemoattractant-induced motility increase in primary human neutrophils. (F) Control neutrophils (blue) show an increased angular alignment upon stimulation as their motility becomes directional. NHE1-inhibition (gold, iNHE1) has very little effect on this process, while PI3Kγ inhibition (green) leads to a reduction in this alignment at the population level. (G) For the PI3Kγ inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ inhibited cell and the bottom panel, its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors discuss an effect, "diffusive lensing", by which particles would accumulate in high-viscosity regions, for instance in the intracellular medium. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention. The "lensing effect" discussed is a direct consequence of the choice of the Ito convention without spurious drift which has been discussed before and is likely to be inadequate for the intracellular medium, causing the presented results to likely have little relevance for biology.

      We thank the editors and the reviewers for their consideration of our manuscript. We argue in this rebuttal and revision that our results and conclusions are in fact likely to have relevance for biology. While we use the Itô convention for ease of modeling considering its non-anticipatory nature upon discretization (see (Volpe and Wehr 2016) for the discretization schemes), we refer to Figure S1B to emphasize that diffusive lensing occurs not only under the Itô convention but across a wide parameter space. Indeed, it is absent only in the normative isothermal convention; note that even a stochastic differential equation conforming to the isothermal convention may be reformulated into the Itô convention by adding suitable drift terms, allowing for diffusive lensing to be seen even in case of the isothermal convention. We note in particular that the choice of the convention is a highly context-dependent one (Sokolov 2010); there is not a universally correct choice, and one can obtain stochastic differential equations consistent with Itô or Stratonovich interpretations in different regimes. Lastly, space-dependent diffusivity is now an experimentally well-recognized feature of the cellular interior, as noted in our references and as discussed further later in this response. This fact points towards the potential relevance of our model for subcellular diffusion.

      In our revised preprint, we have made changes to the text and minor changes to figures to address reviewer concerns.

      Responses to the Reviewers

      We thank the reviewers for their feedback and address the issues they raised in this rebuttal and in the revised manuscript. The central point that the reviewers raise concerns the validity of the drift-less Itô interpretation in modeling potential nonequilibrium types of subcellular transport arising from space-dependent diffusivity. If the drift term were considered, the resulting stochastic differential equation (SDE) would be equivalent to one arising from the isothermal interpretation of heterogeneous diffusivity (Volpe and Wehr 2016), wherein no diffusive lensing is seen (as shown in Fig. S1B). That is, the isothermal interpretation and the drift-comprising Itô SDE produce the same uniform steady-state particle densities.
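
      The relationship between the drift term and the steady state can be written out in a few lines. The sketch below is a standard textbook calculation rather than a derivation taken from the manuscript; it assumes one spatial dimension, no external force, no-flux boundaries, and a convention parameter \alpha (a symbol introduced here for illustration) with \alpha = 0 for Itô, 1/2 for Stratonovich, and 1 for the isothermal convention.

```latex
% SDE in the alpha-convention, rewritten in its equivalent Ito form
dx_t = \alpha\, D'(x_t)\, dt + \sqrt{2 D(x_t)}\; dW_t

% Corresponding Fokker-Planck equation for the density p(x,t)
\partial_t p = -\partial_x\!\left[ \alpha\, D'(x)\, p \right]
               + \partial_x^2\!\left[ D(x)\, p \right]

% Zero-flux steady state
J = \alpha D' p - (D p)' = (\alpha - 1)\, D' p - D\, p' = 0
\quad\Longrightarrow\quad
p_{\mathrm{ss}}(x) \propto D(x)^{\alpha - 1}
```

      With \alpha = 1 the steady state is uniform, whereas \alpha = 0 gives p_ss proportional to 1/D(x), that is, accumulation where diffusivity is low and viscosity is high.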

      While we agree with the reviewers that for a given interpretation, equivalent stochastic differential equations (SDEs) arising from other interpretations may be drawn, we disagree with the generalization that all types of subcellular diffusion conform to the isothermal interpretation. That is, there is no reason why any and all instances of nonequilibrium subcellular particle diffusion must be modeled using isothermal-conforming SDEs (such as the drift-comprising Itô SDE, for instance). We refer to (Sokolov 2010) which prescribes choosing a convention in a context-dependent manner. In this regard, we disagree with the second reviewer’s characterization of making such a choice merely a “choice of writing” considering that it is entirely dependent on the choice of microscopic parameters, as detailed in the discussion section of the manuscript. The following references have also been added to the manuscript: the reference from the first reviewer (Kupferman et al. 2004) proposes a prescription for choosing an appropriate convention based upon comparing the noise correlation time and the particle relaxation time. The reference notes that the Itô convention is appropriate when the particle relaxation time is large when compared to the noise correlation time and the Stratonovich convention is appropriate in the converse scenario. In (Rupprecht et al. 2018), active noise is considered and the resulting Fokker-Planck equation conforms to the Stratonovich convention when thermal noise was negligible. The related reference, (Vishen et al. 2019) compares three timescales: those of particle relaxation, noise correlation and viscoelastic relaxation, to make the choice. Indeed, as noted in the manuscript, lensing is seen in all but one interpretation (without drift additions); only its magnitude is altered by the interpretation/choice of the drift term. The appendix has been modified to include a subsection on the interchangeability of the conventions.
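
      To illustrate how the magnitude of lensing depends on the convention, the sketch below evaluates the fraction of particles found in the low-diffusivity half of a unit interval. It assumes the standard zero-flux steady state p(x) proportional to D(x)**(alpha - 1), with alpha = 0 (Itô), 0.5 (Stratonovich), and 1 (isothermal); the linear diffusivity profile and all names are hypothetical and are not taken from the manuscript.

```python
def diffusivity(x, d_low=0.2, d_high=1.0):
    """Hypothetical linear diffusivity profile on [0, 1]; low D = high viscosity."""
    return d_low + (d_high - d_low) * x


def steady_state_fraction(alpha, x_max=0.5, n=10_000):
    """Fraction of particles in [0, x_max] when p(x) is proportional to D(x)**(alpha - 1).

    The integrals are approximated with the midpoint rule on n equal cells.
    """
    dx = 1.0 / n
    weights = [diffusivity((i + 0.5) * dx) ** (alpha - 1.0) for i in range(n)]
    k = int(round(x_max * n))  # number of cells covering [0, x_max]
    return sum(weights[:k]) / sum(weights)


conventions = {"Ito": 0.0, "Stratonovich": 0.5, "isothermal": 1.0}
fractions = {name: steady_state_fraction(a) for name, a in conventions.items()}

for name, frac in fractions.items():
    print(f"{name:>12}: {frac:.3f} of particles in the high-viscosity half")
```

      For this assumed profile the accumulation is strongest for alpha = 0, weaker for alpha = 0.5, and absent for alpha = 1, which matches the statement that the interpretation changes the magnitude of lensing rather than its presence in all but the isothermal case.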

      Separately, with regards to the discussion on anomalous diffusion, the section on mean squared displacement calculation has been amended to avoid confusing our model with canonical anomalous diffusion which considers the anomalous exponent; how the anomalous exponent varies with space-dependent diffusivity offers an interesting future area of study.

      Responses to specific reviewer comments appear below.

      Reviewer #1 (Public Review):

      The manuscript "Diffusive lensing as a mechanism of intracellular transport and compartmentalization", explores the implications of heterogeneous viscosity on the diffusive dynamics of particles. The authors analyze three different scenarios:

      (i)   diffusion under a gradient of viscosity,

      (ii)  clustering of interacting particles in a viscosity gradient, and

      (iii) diffusive dynamics of non-interacting particles with circular patches of heterogeneous viscous medium.

      The implications of a heterogeneous environment on phase separation and reaction kinetics in cells are under-explored. This makes the general theme of this manuscript very relevant and interesting. However, the analysis in the manuscript is not rigorous, and the claims in the abstract are not supported by the analysis in the main text.

      Following are my main comments on the work presented in this manuscript:

      (a) The central theme of this work is that spatially varying viscosity leads to position-dependent diffusion constant. This, for an overdamped Langevin dynamics with Gaussian white noise, leads to the well-known issue of the interpretation of the noise term.

      The authors use the Ito interpretation of the noise term because their system is non-equilibrium.

      One of the main criticisms I have is on this central point. The issue of interpretation arises only when there are ill-posed stochastic dynamics that do not have the relevant timescales required to analyze the noise term properly. Hence, if the authors want to start with an ill-posed equation it should be mentioned at the start. At least the Langevin dynamics considered should be explicitly mentioned in the main text. Since this work claims to be relevant to biological systems, it is also of significance to highlight the motivation for using the ill-posed equation rather than a well-posed equation. The authors refer to the non-equilibrium nature of the dynamics but it is not mentioned what non-equilibrium dynamics the authors have in mind. To properly analyze an overdamped Langevin dynamics a clear source of integrated timescales must be provided. As an example, one can write the dynamics as Eq. (1) $\dot x = f(x) + g(x) \eta$, which is ill-defined if the noise $\eta$ is delta correlated in time but well-defined when $\eta$ is exponentially correlated in time. One can of course look at the limit in which the exponential correlation goes to a delta correlation which leads to Eq. (1) interpreted in Stratonovich convention. The choice to use the Ito convention for Eq. (1) in this case is not justified.

      We thank the reviewer for detailing their concerns with our model’s assumptions. We have addressed them in the common rebuttal.

      (b) Generally, the manuscript talks of viscosity gradient but the equations deal with diffusion which is a combination of viscosity, temperature, particle size, and particle-medium interaction. There is no clear motivation provided for focus on viscosity (cytoplasm as such is a complex fluid) instead of just saying position-dependent diffusion constant. Maybe authors should use viscosity only when talking of a context where the existence of a viscosity gradient is established either in a real experiment or in a thought experiment.

      The manuscript has been amended to use only “diffusivity” to avoid confusion.

      (c) The section "Viscophoresis drives particle accumulation" seems to not have new results. Fig. 1 verifies the numerical code used to obtain the results in the later sections. If that is the case maybe this section can be moved to supplementary or at least it should be clearly stated that this is to establish the correctness of the simulation method. It would also be nice to comment a bit more on the choice of simulation methods with changing hopping sizes instead of, for example, numerically solving stochastic ODE.

      The main point of this section and of Fig. 1 is the diffusive lensing effect itself: the accumulation of particles in lower-diffusivity areas. To the best of our knowledge, diffusive lensing has not been reported elsewhere as a specific outcome of non-isothermal interpretations of diffusion, with potential relevance to nonequilibrium subcellular motilities. The simulation method has been fully described in the Methods section, and the code has also been shared (see Code Availability).

      A minor comment, the statement "the physically appropriate convention to use depends upon microscopic parameters and timescale hierarchies not captured in a coarse-grained model of diffusion." is not true as is noted in the references that authors mention, a correct coarse-grained model provides a suitable convention (see also Phys. Rev. E, 70(3), 036120., Phys. Rev. E, 100(6), 062602.).

      This has been addressed in the common rebuttal.

      (d) The section "Interaction-mediated clustering is affected by viscophoresis" makes an interesting statement about the positioning of clusters by a viscous gradient. As a theoretical calculation, the interplay between position-dependent diffusivity and phase separation is indeed interesting, but the problem needs more analysis than that offered in this manuscript. Just a plot showing clustering with and without a gradient of diffusion does not give enough insight into the interplay between density-dependent diffusion and position-dependent diffusion. A phase plot that somehow shows the relative contribution of the two effects would have been nice. Also, it should be emphasized in the main text that the inter-particle interaction is through a density-dependent diffusion constant and not a conservative coupling by an interaction potential.

      The density-dependence has been added from the Methods to the main text. The goal of the work is to present lensing as a natural outcome of the parameter choices we make and present its effects as they relate to clustering and commonly used biophysical methods to probe dynamics within cells. A dense sampling of the phase space and how it is altered as a function of diffusivity, and the subsequent interpretation, lie beyond the scope of the present work but offer exciting future directions of study.

      (e) The section "In silico microrheology shows that viscophoresis manifests as anomalous diffusion" the authors show that the MSD with and without spatial heterogeneity is different. This is not a surprise - as the underlying equations are different the MSD should be different.

      The goal here is to compare and contrast the ways in which homogeneous and heterogeneous diffusion manifest in simulated microrheology measurements. We hope that an altered saturation MSD, as is observed in our simulations, provokes interest in considering lensing while modeling experimental data.

      There are various analogies drawn in this section without any justification:

      (i) "the saturation MSD was higher than what was seen in the homogeneous diffusion scenario possibly due to particles robustly populating the bulk milieu followed by directed motion into the viscous zone (similar to that of a Brownian ratchet, (Peskin et al., 1993))."

      In case of i), the Brownian ratchet is invoked as a model to explain directed accumulation. We have removed this analogy to avoid confusion as it is not delved into further over the course of our work.

      (ii) "Note that lensing may cause particle displacements to deviate from a Gaussian distribution, which could explain anomalous behaviors observed both in our simulations and in experiments in cells (Parry et al., 2014)." Since the full trajectory of the particles is available, it can be analyzed to check if this is indeed the case.

      This has been addressed in the common rebuttal.

      (f) The final section "In silico FRAP in a heterogeneously viscous environment ... " studies the MSD of the particles in a medium with heterogeneous viscous patches which I find the most novel section of the work. As with the section on inter-particle interaction, this needs further analysis.

      We thank the reviewer for their appreciation. In presenting these three sections discussing the effects of diffusive lensing, we intend to broadly outline the scope of this phenomenon in influencing a range of behaviors. Exploring these directions further is a promising avenue for future study that lies beyond the scope of this manuscript.

      To summarise, as this is a theory paper, just showing MSD or in silico FRAP data is not sufficient. Unlike experiments where one is trying to understand the systems, here one has full access to the dynamics either analytically or in simulation. So just stating that the MSD in heterogeneous and homogeneous environments are not the same is not sufficient. With further analysis, this work can be of theoretical interest. Finally, just as a matter of personal taste, I am not in favor of the analogy with optical lensing. I don't see the connection.

      We value the reviewer’s interest in investigating the causes underlying the differences in the MSDs and agree that it represents a promising future area of study. The main point of this section of the manuscript was to make a connection to experimentally measurable quantities.

      Reviewer #2 (Public Review):

      Summary:

      The authors study through theory and simulations the diffusion of microscopic particles and aim to account for the effects of inhomogeneous viscosity and diffusion - in particular regarding the intracellular environment. They propose a mechanism, termed "Diffusive lensing", by which particles are attracted towards high-viscosity regions where they remain trapped. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention, without spurious drift. They acknowledge the fact that this convention does not describe equilibrium systems, and that their results would not hold at equilibrium - and discard these facts by invoking the fact that cells are out-of-equilibrium. Finally, they show some applications of their findings, in particular enhanced clustering in the high-viscosity regions. The authors conclude that as inhomogeneous diffusion is ubiquitous in life, so must their mechanism be, and hence it must be important.

      Strengths:

      The article is well-written, and clearly intelligible, its hypotheses are stated relatively clearly and the models and mathematical derivations are compatible with these hypotheses.

      We thank the reviewer for their appreciation.

      Weaknesses:

      The main problem of the paper is these hypotheses. Indeed, it all relies on the Ito interpretation of the stochastic integrals. Stochastic conventions are a notoriously tricky business, but they are both mathematically and physically well-understood and do not result in any "dilemma" [some citations in the article, such as (Lau and Lubensky) and (Volpe and Wehr), make an unambiguous resolution of these]. Conventions are not an intrinsic, fixed property of a system, but a choice of writing; however, whenever going from one to another, one must include a "spurious drift" that compensates for the effect of this change - a mathematical subtlety that is entirely omitted in the article: if the drift is zero in one convention, it will thus be non-zero in another in the presence of diffusive gradients. It is well established that for equilibrium systems obeying fluctuation-dissipation, the spurious drift vanishes in the anti-Ito stochastic convention (which is not "anticipatory", contrary to claims in the article, as the "steps" are local and infinitesimal). This ensures that the diffusion gradients do not induce currents and probability gradients, and thus that the steady-state PDF is the Gibbs measure. This equilibrium case should be seen as the default: a thermal system NOT obeying this law should warrant a strong justification (for instance in the Volpe and Wehr review this can occur through memory effects in robotic dynamics, or through strong fluctuation-dissipation breakdown). In near-equilibrium thermal systems such as the intracellular medium (where, although out-of-equilibrium, temperature remains a relevant and mostly homogeneous quantity), deviations from this behavior must be physically justified and go to zero when going towards equilibrium.

      Considering that the physical phenomena underlying diffusion span a range of timescales (particle relaxation, noise, environmental correlation, et cetera), we disagree with the assertion that all types of subcellular diffusion processes can be modeled as occurring at thermal equilibrium: for example, one can easily imagine memory effects arising in the presence of an appropriate hierarchy of timescales. We have added references that describe in more detail the way in which the comparison of timescales can dictate the applicability of different conventions. We also refer the referee to the common rebuttal section of our response in which we discuss factors that govern the choice of the interpretation. The adiabatic elimination arguments highlighted in (Kupferman et al. 2004) provide a clear description of how relevant particle and environment-related timescales can inform the choice of stochastic calculus to use.

      With regards to the use of the term “anticipatory” to refer to the isothermal interpretation, we refer to the comment in (Volpe and Wehr 2016) of the Itô interpretation “not looking into the future”. In any case, whether anticipatory or otherwise, the interpretation’s effect on our model remains unchanged, as highlighted in the section in the Appendix on the conversion between different conventions; this section has been added to minimize confusion about the effects of the choice of convention on lensing.

      Here, drifts are arbitrarily set to zero in the Ito convention (the exact opposite of the equilibrium anti-Ito), which is the equilibrium equivalent to adding a force (with drift $- grad D$) exactly compensating the spurious drift. If we were to interpret this as a breakdown of detailed balance with inhomogeneous temperature, the "hot" region would be effectively at 4x higher temperature than the cold region (i.e. 1200K) in Fig 1A.

      Our work is based on existing observations of space-dependent diffusivity in cells (Garner et al., 2023; Huang et al., 2021; Parry et al., 2014; Śmigiel et al., 2022; Xiang et al., 2020). These papers support a definitive model for the existence of space-dependent diffusivity without invoking space-dependent temperature.

      It is the effects of this arbitrary force (exactly compensating the Ito spurious drift) that are studied in the article. The fact that it results in probability gradients is trivial once formulated this way (and in no way is this new - many of the references, for instance, Volpe and Wehr, mention this).

      Addressed in the common rebuttal.

      Enhanced clustering is also a trivial effect of this probability gradient (the local concentration is increased by this force field, so phase separation can occur). As a side note the "neighbor sensing" scheme to describe interactions is very peculiar and not physically motivated - it violates stochastic thermodynamics laws too, as the detailed balance is apparently not respected.

      The neighbor-sensing scheme used here is just one possible model of an effective attractive potential between particles. Other models that lead to density-dependent attraction between particles should also provide qualitatively similar results as ours; this offers an interesting prospect for future research.

      Finally, the "anomalous diffusion" discussion is at odds with what the literature on this subject considers anomalous (the exponent does not appear anomalous).

      This has been addressed in the common rebuttal, and the relevant part of the manuscript has been modified to avoid confusion.

      The authors make no further justification of their choice of convention than the fact that cells are out-of-equilibrium, leaving the feeling that this is a detail. They make mentions of systems (eg glycogen, prebiotic environment) for which (near-)equilibrium physics should mostly prevail, and of fluctuation-dissipation ("Diffusivity varies inversely with viscosity", in the introduction). Yet the "phenomenon" they discuss is entirely reliant on an undiscussed mechanism by which these assumptions would be completely violated (the citations they make for this - Gnesotto '18 and Phillips '12 - are simply discussions of the fact that cells are out-of-equilibrium, not on any consequences on the convention).

      Finally, while inhomogeneous diffusion is ubiquitous, the strength of this effect in realistic conditions is not discussed (this would be a significant problem if the effect were real, which it isn't). Gravitational attraction is also a ubiquitous effect, but it is not important for intracellular compartmentalization.

      The manuscript text has been supplemented with additional references that detail the ways in which the comparison of timescales can dictate how one can apply different conventions. We refer the reviewer to the common rebuttal section of our response where we detail factors that dictate the choice of the convention to use. As previously noted, the adiabatic elimination arguments highlighted in (Kupferman et al., 2004) provide a prescription for how different timescales are to be considered in deciding the choice of stochastic calculus to use.

      With regards to the strength of space-dependent diffusivity in subcellular milieu, various measurements of heterogeneous diffusivity have been made both across different model systems and via different modalities, as cited in our manuscript. (Garner et al. 2023) used single-particle tracking to determine over 100-fold variability in diffusivity within individual S. pombe cells. Single-molecule measurements in (Xiang et al. 2020) and (Śmigiel et al. 2022) reveal an order-of-magnitude variation in tracer diffusion in mammalian cells and multi-fold variation in E. coli cytoplasm respectively. Fluorescence correlation spectroscopy measurements in (Huang et al. 2022) have found a two-fold increase in short-range diffusion of protein-sized tracers in X. laevis extracts. We have also added a reference to a study that uses 3D single particle tracking in the cytosol of a multinucleate fungus, A. gossypii, to identify regions of low-diffusivity near nuclei and hyphal tips (McLaughlin et al. 2020). Many of these references deploy particle tracking and investigate how mesoscale-sized particles (i.e. tracers spanning biologically relevant size scales) are directly impacted by space-dependent diffusivity. Therefore, we base our model on not only space-dependent diffusivity being a well-recognized feature of the cellular interior, but also on these observations pertaining to mesoscale-sized particles’ motion along relevant timescales.

      These measurements are also relevant to the reviewer’s question about the strength of the effect, which depends directly on the variability in diffusivity: for ten- or a hundred-fold diffusivity variations, the effect would be expected to be significant. When the Itô convention is used directly (without an added drift term), the steady-state contrast in concentration is, in fact, equal to the contrast in diffusivity.
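
      The relation between diffusivity contrast and concentration contrast can be checked with a small deterministic calculation. The sketch below is not the agent-based simulation used in the manuscript; it assumes a one-dimensional lattice with reflecting ends in which a particle hops to each neighbour at a rate set by the diffusivity of the site it leaves (a driftless, Itô-like rule). The function name `steady_state` and the four-fold contrast are hypothetical choices made for illustration.

```python
import numpy as np


def steady_state(diffusivity, spacing=1.0):
    """Steady-state occupancy of a 1D lattice walk with reflecting ends.

    Assumption: a particle at site i hops to each neighbour at rate
    D_i / spacing**2, i.e. the rate depends only on the departure site.
    """
    d = np.asarray(diffusivity, dtype=float)
    n = d.size
    rate = d / spacing**2

    # Generator W of the master equation dp/dt = W @ p.
    w = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                w[j, i] += rate[i]  # probability gained at j from i
                w[i, i] -= rate[i]  # probability lost from i

    # The rows of W are linearly dependent (columns sum to zero), so one
    # row can be swapped for the normalisation condition sum(p) = 1.
    a = w.copy()
    a[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(a, b)


# Hypothetical profile: diffusivity drops four-fold in the right half.
D = np.array([1.0] * 10 + [0.25] * 10)
p = steady_state(D)
contrast = p[-1] / p[0]  # occupancy contrast between low-D and high-D halves
```

      In this toy model the product of occupancy and diffusivity is the same at every site, so the occupancy contrast matches the diffusivity contrast. Adding a compensating drift, or choosing a different hopping rule, would change that relation.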

      To conclude, the "diffusive lensing" effect presented here is not a deep physical discovery, but a well-known effect of sticking to the wrong stochastic convention.

      As detailed in the various responses above, we respectfully disagree with the notion that there exists a singular correct stochastic convention that is applicable for all cases of subcellular heterogeneous diffusion. Further, as detailed in (Volpe and Wehr 2016) and in the Appendix, it is possible to convert between conventions: an isothermal-abiding stochastic differential equation may be suitably altered, by means of adding a drift term, to an Itô-abiding stochastic differential equation. Therefore, one can observe diffusive lensing without discarding the isothermal convention, provided the latter is modified in this way. Indeed, it is only the driftless (or canonical) isothermal convention that does not allow for diffusive lensing.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review:

      This manuscript by Yue et al. aims to understand the molecular mechanisms underlying the better reproductive outcomes of Tibetans at high altitude by characterizing the transcriptome and histology of full-term placentas of Tibetans and comparing them to those of Han Chinese at high elevations.

      The approach is innovative, and the data collected are valuable for testing hypotheses regarding the contribution of the placenta to better reproductive success of populations that adapted to hypoxia. The authors identified hundreds of differentially expressed genes (DEGs) between Tibetans and Han, including the EPAS1 gene that harbors the strongest signals of genetic adaptation. The authors also found that such differential expression is more prevalent and pronounced in the placentas of male fetuses than those of female fetuses, which is particularly interesting, as it echoes with the more severe reduction in birth weight of male neonates at high elevation observed by the same group of researchers (He et al., 2022).

      This revised manuscript addressed several concerns raised by reviewers in last round. However, we still find the evidence for natural selection on the identified DEGs--as a group--to be very weak, despite more convincing evidence on a few individual genes, such as EPAS1 and EGLN1.

      The authors first examined the overlap between DEGs and genes showing signals of positive selection in Tibetans and evaluated the significance of a larger overlap than expected with a permutation analysis. A minor issue related to this analysis is that the p-value is inflated, as the authors are counting permutation replicates with MORE genes in overlap than observed, yet the more appropriate way is counting replicates with EQUAL or MORE overlapping genes. Using the latter method of p-value calculation, the "sex-combined" and "female-only" DEGs will become non-significantly enriched in genes with evidence of selection, and the signal appears to solely come from male-specific DEGs. A thornier issue with this type of enrichment analysis is whether the condition on placental expression is sufficient, as other genomic or transcriptomic features (e.g., expression level, local sequence divergence level) may also confound the analysis.

      According to the suggested method, we counted the replicates with equal or more overlapping genes than observed (≥4 for the “combined” set; ≥9 for the “male-only” set; ≥0 for the “female-only” set). We found that the overlaps between DEGs and TSNGs were significantly enriched only in the “male-only” set (p-value < 1e-4, counting 0 times from 10,000 permutations), but not in the “female-only” set (p-value = 1, counting 10,000 times from 10,000 permutations) or the “combined” set (p-value = 0.0603, counting 603 times from 10,000 permutations) (see Table R1 below).
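
      The difference between the two counting rules can be sketched in a few lines. The function name `permutation_p_value` and the ten-replicate null distribution below are hypothetical and serve only to illustrate the "equal or more" rule; they are not the code or data used for the analysis.

```python
import numpy as np


def permutation_p_value(observed, null_overlaps, inclusive=True):
    """Fraction of permutation replicates at least as extreme as observed.

    inclusive=True counts replicates with overlap >= observed (the rule
    recommended by the reviewer); inclusive=False counts only overlap > observed.
    """
    null = np.asarray(null_overlaps)
    if inclusive:
        hits = np.sum(null >= observed)
    else:
        hits = np.sum(null > observed)
    return hits / null.size


# Hypothetical null distribution of overlap counts from 10 permutations.
toy_null = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]

p_inclusive = permutation_p_value(2, toy_null)               # counts the two ties
p_strict = permutation_p_value(2, toy_null, inclusive=False)  # ignores ties
p_zero = permutation_p_value(0, toy_null)                     # every replicate qualifies
```

      With the inclusive rule, an observed overlap of zero always yields a p-value of 1, because every replicate has at least zero overlapping genes. A common alternative adds one to both the count and the number of permutations so that the estimate is never exactly zero; that variant is not shown here.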

      We updated this information in the revised manuscript, including Results, Methods, and Figure S9.

      Author response table 1.

      Permutation analysis of the overlapped genes between DEGs and TSNGs.

      The authors next aimed to detect polygenic signals of adaptation of gene expression by applying the PolyGraph method to eQTLs of genes expressed in the placenta (Racimo et al 2018). This approach is ambitious but problematic, as the method is designed for testing evidence of selection on single polygenic traits. The expression levels of different genes should be considered as "different traits" with differential impacts on downstream phenotypic traits (such as birth weight). As a result, the eQTLs of different genes cannot be naively aggregated in the calculation of the polygenic score, unless the authors have a specific, oversimplified hypothesis that the expression increase of all genes with identified eQTL will improve pregnancy outcome and that they are equally important to downstream phenotypes. In general, PolyGraph method is inapplicable to eQTL data, especially those of different genes (but see Colbran et al 2023 Genetics for an example where the polygenic score is used for testing selection on the expression of individual genes).

      We would recommend removal of these analyses and focus on the discussion of individual genes with more compelling evidence of selection (e.g., EPAS1, EGLN1).

      According to the suggestion, we removed these analyses in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: 

      This is my first review of the article entitled "The canonical stopping network: Revisiting the role of the subcortex in response inhibition" by Isherwood and colleagues. This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration of De Hollander, Forstmann et al. (HBM 2017) of the fact that 3T fMRI (as well as many 7T imaging sequences) does not afford sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

      In the current study, the authors compiled five datasets that aimed to investigate neural activity associated with stopping an already initiated action, as operationalized in the classic stop-signal paradigm. Three of these datasets are taken from their own 7T investigations, and two are datasets from the Poldrack group, which used 3T fMRI.

      The authors make six chief points: 

      (1) There does not seem to be a measurable BOLD response in the purportedly critical subcortical areas in contrasts of successful stopping (SS) vs. going (GO), neither across datasets nor within each individual dataset. This includes the STN but also any other areas of the indirect and hyperdirect pathways.

      (2) The failed-stop (FS) vs. GO contrast is the only contrast showing substantial differences in those nodes.

      (3) The positive findings of STN (and other subcortical) activation during the SS vs. GO contrast could be due to the usage of inappropriate smoothing kernels.

      (4) The study demonstrates the utility of aggregating publicly available fMRI data from similar cognitive tasks. 

      (5) From the abstract: "The findings challenge previous functional magnetic resonance (fMRI) of the stop-signal task" 

      (6) and further: "suggest the need to ascribe a separate function to these networks." 

      I strongly and emphatically agree with points 1-5. However, I vehemently disagree with point 6, which appears to be the main thrust of the current paper, based on the discussion, abstract, and - not least - the title.

      To me, this paper essentially shows that fMRI is ill-suited to study the subcortex in the specific context of the stop-signal task. That is not just because of the issues of subcortical small-volume SNR (the main topic of this and related works by this outstanding group), but also because of its limited temporal resolution (which is unacknowledged, but especially impactful in the context of the stop-signal task). I'll expand on what I mean in the following.

      First, the authors are underrepresenting the non-fMRI evidence in favor of the involvement of the subthalamic nucleus (STN) and the basal ganglia more generally in stopping actions. 

      - There are many more intracranial local field potential recording studies that show increased STN LFP (or even single-unit) activity in the SS vs. FS and SS vs. GO contrast than listed, which come from at least seven different labs. Here's a (likely non-exhaustive) list of studies that come to mind:

      Ray et al., NeuroImage 2012
      Alegre et al., Experimental Brain Research 2013
      Benis et al., NeuroImage 2014
      Wessel et al., Movement Disorders 2016
      Benis et al., Cortex 2016
      Fischer et al., eLife 2017
      Ghahremani et al., Brain and Language 2018
      Chen et al., Neuron 2020
      Mosher et al., Neuron 2021
      Diesburg et al., eLife 2021

      - Similarly, there is much more evidence than cited that causally influencing STN via deep-brain stimulation also influences action-stopping. Again, the following list is probably incomplete: 

      Van den Wildenberg et al., JoCN 2006
      Ray et al., Neuropsychologia 2009
      Hershey et al., Brain 2010
      Swann et al., JNeuro 2011
      Mirabella et al., Cerebral Cortex 2012
      Obeso et al., Exp. Brain Res. 2013
      Georgiev et al., Exp Br Res 2016
      Lofredi et al., Brain 2021
      van den Wildenberg et al, Behav Brain Res 2021
      Wessel et al., Current Biology 2022

      - Moreover, evidence from non-human animals similarly suggests critical STN involvement in action stopping, e.g.: 

      Eagle et al., Cerebral Cortex 2008
      Schmidt et al., Nature Neuroscience 2013
      Fife et al., eLife 2017
      Anderson et al., Brain Res 2020

      Together, studies like these provide either causal evidence for STN involvement via direct electrical stimulation of the nucleus or provide direct recordings of its local field potential activity during stopping. This is not to mention the extensive evidence for the involvement of the STN - and the indirect and hyperdirect pathways in general - in motor inhibition more broadly, perhaps best illustrated by their damage leading to (hemi)ballism. 

      Hence, I cannot agree with the idea that the current set of findings "suggest the need to ascribe a separate function to these networks", as suggested in the abstract and further explicated in the discussion of the current paper. For this to be the case, we would need to disregard more than a decade's worth of direct recording studies of the STN in favor of a remote measurement of the BOLD response using (provably) suboptimal imaging parameters. There are myriad explanations for why fMRI may not be able to reveal a potential ground-truth difference in STN activity between the SS and FS/GO conditions, beginning with the simple proposition that it may not afford sufficient SNR, or that perhaps subcortical BOLD is not tightly related to the type of neurophysiological activity that distinguishes these conditions (in the purported case of the stop-signal task, specifically the beta band). But essentially, this paper shows that a specific lens into subcortical activity is likely broken, but then also suggests dismissing existing evidence from superior lenses in favor of the findings from the 'broken' lens. That doesn't make much sense to me.

      Second, there is actually another substantial reason why fMRI may indeed be unsuitable to study STN activity, specifically in the stop-signal paradigm: its limited time resolution. The sequence of subcortical processes on each specific trial type in the stop-signal task is purportedly as follows: at baseline, the basal ganglia exert inhibition on the motor system. During motor initiation, this inhibition is lifted via direct pathway innervation. This is when the three trial types start diverging. When actions then have to be rapidly cancelled (SS and FS), cortical regions signal to STN via the hyperdirect pathway that inhibition has to be rapidly reinstated (see Chen, Starr et al., Neuron 2020 for direct evidence for such a monosynaptic hyperdirect pathway, the speed of which directly predicts SSRT). Hence, inhibition is reinstated (too late in the case of FS trials, but early enough in SS trials, see recordings from the BG in Schmidt, Berke et al., Nature Neuroscience 2013; and Diesburg, Wessel et al., eLife 2021). 

      Hence, according to this prevailing model, all three trial types involve a sequence of STN activation (initial inhibition), STN deactivation (disinhibition during GO), and STN reactivation (reinstantiation of inhibition during the response via the hyperdirect pathway on SS/FS trials, reinstantiation of inhibition via the indirect pathway after the response on GO trials). What distinguishes the trial types during this period is chiefly the relative timing of the inhibitory process (earliest on SS trials, slightly later on FS trials, latest on GO trials). However, these temporal differences play out on a level of hundreds of milliseconds, and in all three cases, processing concludes well under a second overall. To fMRI, given its limited time resolution, these activations are bound to look quite similar. 
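
      The temporal-resolution argument can be illustrated with a toy calculation. The sketch below assumes a commonly used double-gamma haemodynamic response shape (shape parameters 6 and 16, undershoot ratio 1/6) and two brief neural events whose onsets differ by 300 ms; the latencies, the event duration, and the function names are hypothetical and are not taken from the paper under review.

```python
import math

import numpy as np


def canonical_hrf(t):
    """Double-gamma haemodynamic response (assumed shape parameters 6 and 16)."""
    t = np.asarray(t, dtype=float)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def bold_response(onset_s, dt=0.01, duration_s=30.0):
    """Predicted BOLD time course for a brief (100 ms) neural event at onset_s."""
    t = np.arange(0.0, duration_s, dt)
    neural = ((t >= onset_s) & (t < onset_s + 0.1)).astype(float)
    # Discrete convolution of the neural event with the response function.
    return np.convolve(neural, canonical_hrf(t))[: t.size] * dt


early = bold_response(0.2)  # hypothetical earlier inhibitory event
late = bold_response(0.5)   # hypothetical event 300 ms later
similarity = np.corrcoef(early, late)[0, 1]  # Pearson correlation of the two curves
```

      Under these assumptions the two predicted time courses are almost perfectly correlated, which is the sense in which sub-second timing differences are hard to separate in the BOLD signal. The sketch says nothing about differences in response amplitude between trial types.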

      Lastly, further building on this logic, it's not surprising that FS trials yield increased activity compared to SS and GO trials. That's because FS trials are errors, which are known to activate the STN (Cavanagh et al., JoCN 2014; Siegert et al. Cortex 2014) and afford additional inhibition of the motor system after their occurrence (Guan et al., JNeuro 2022). Again, fMRI will likely conflate this activity with the abovementioned sequence, resulting in a summation of activity and the highest level of BOLD for FS trials. 

      In sum, I believe this study has a lot of merit in demonstrating that fMRI is ill-suited to study the subcortex during the SST, but I cannot agree that it warrants any reappreciation of the subcortex's role in stopping, which are not chiefly based on fMRI evidence. 

      We would like to thank reviewer 1 for their insightful and helpful comments. We have responded point-by-point below and will give an overview of how we reframed the paper here.  

      We agree that there is good evidence from other sources for the presence of the canonical stopping network (indirect and hyperdirect) during action cancellation, and that this should be reflected more in the paper. However, we do not believe that a lack of evidence for this network during the SST makes fMRI ill-suited for studying this task, or other tasks that have neural processes occurring in quick succession. What we believe the fMRI activation patterns reflect during this task is the large amount of activation caused by failed stops. That is, the role of the STN in error processing may be more pronounced than its role in action cancellation. Due to the replicability of fMRI results, especially at higher field strengths, we believe the activation profile of failed stop trials reflects a paramount role for the STN in error processing. Therefore, while we agree we do not provide evidence against the role of the STN in action cancellation, we do provide evidence that our outlook on subcortical activation during different trial types of this task should be revisited. We have reframed the article to reflect this, and discuss points such as fMRI reliability, validity and the complex overlapping of cognitive processes in the SST in the discussion. Please see all changes to the article indicated by red text.

      A few other points: 

      - As I said before, this team's previous work has done a lot to convince me that 3T fMRI is unsuitable to study the STN. As such, it would have been nice to see a combination of the subsamples of the study that DID use imaging protocols and field strengths suitable to actually study this node. This is especially true since the second 3T sample (and arguably, the Isherwood_7T sample) does not afford a lot of trials per subject, to begin with.

      Unfortunately, this study already comprises the only open-access 7T datasets available for the SST. Therefore, unless we combined only the deHollander_7T and Miletic_7T subsamples, there is no additional analysis we can do for this right now. While looking at just the subsamples that were 7T and had >300 trials would be interesting, based on the new framing of the paper we do not believe it adds to the study, as the subsamples still lack the temporal resolution seemingly required for looking at the processes in the SST.

      - What was the GLM analysis time-locked to on SS and FS trials? The stop-signal or the GO-signal? 

      SS and FS trials were time-locked to the GO signal as this is standard practice. The main reason for this is that we use contrasts to interpret differences in activation patterns between conditions. By time-locking the FS and SS trials to the stop signal, we are contrasting events at different time points, and therefore different stages of processing, which introduces its own sources of error. We agree with the reviewer, however, that a separate analysis with time-locking on the stop-signal has its own merit, and now include results in the supplementary material where the FS and SS trials are time-locked to the stop signal as well.

      - Why was SSRT calculated using the outdated mean method? 

      We originally calculated SSRT using the mean method as this was how it was reported in the oldest of the aggregated studies. We have now re-calculated the SSRTs using the integration method with go omission replacement and thank the reviewer for pointing this out. Please see response to comment 3.

      - The authors chose 3.1 as a z-score to "ensure conservatism", but since they are essentially trying to prove the null hypothesis that there is no increased STN activity on SS trials, I would suggest erring on the side of a more lenient threshold to avoid type-2 error. 

      We have used minimum FDR-corrected thresholds for each contrast now, instead of using a blanket conservative threshold of 3.1 over all contrasts. The new thresholds for each contrast are shown in text. Please see below (page 12):

      “The thresholds for each contrast are as follows: 3.01 for FS > GO, 2.26 for FS > SS and 3.1 for SS > GO.”
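      As a rough illustration of why an FDR-corrected threshold differs per contrast, the sketch below derives the minimum surviving z value from a set of statistics. It assumes a Benjamini-Hochberg procedure applied to one-sided p-values at q = 0.05; the response does not state which FDR procedure or software was used, so the procedure, the function names, and the example values are all assumptions.

```python
import math

def z_to_p(z):
    """One-sided p-value for a standard-normal z statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def fdr_z_threshold(z_values, q=0.05):
    """Smallest z surviving Benjamini-Hochberg FDR at level q, or None."""
    ranked = sorted(z_values, reverse=True)  # largest z (smallest p) first
    m = len(ranked)
    threshold = None
    for rank, z in enumerate(ranked, start=1):
        # BH rejects every test up to the largest rank whose p-value
        # satisfies p <= q * rank / m, so keep the last z that passes.
        if z_to_p(z) <= q * rank / m:
            threshold = z
    return threshold

# Hypothetical voxelwise z values for one contrast.
example_z = [4.0, 3.5, 3.0, 1.0, 0.5, 0.0, -0.5, -1.0, 0.2, 0.1]
print(fdr_z_threshold(example_z))
```

      Because the cut-off depends on the whole distribution of statistics in a contrast, contrasts with stronger or more widespread effects end up with lower minimum thresholds than contrasts with weak effects.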

      - The authors state that "The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition". It would be helpful if the authors cited these studies and what those inconsistencies are. 

      We thank reviewer 1 for their detailed and thorough evaluation of our paper. Overall, we agree that there is substantial direct and indirect evidence for the involvement of the cortico-basal-ganglia pathways in response inhibition. We have taken the vast constructive criticism on board and agree with the reviewer that the paper should be reframed. We would like to thank the reviewer for the thoroughness of their helpful comments aiding the revising of the paper.

      (1) I would suggest reframing the study, abstract, discussion, and title to reflect the fact that the study shows that fMRI is unsuitable to study subcortical activity in the SST, rather than the fact that we need to question the subcortical model of inhibition, given the reasons in my public review.

      We agree with the reviewer that the article should be reframed and not taken as direct evidence against the large sum of literature pointing towards the involvement of the cortico-basal-ganglia pathway in response inhibition. We have significantly rewritten the article in light of this.

      (2) I suggest combining the datasets that provide the best imaging parameters and then analyzing the subcortical ROIs with a more lenient threshold and with regressors time-locked to the stop-signals (if that's not already the case). This would make the claim of a null finding much more impactful. Some sort of power analysis and/or Bayes factor analysis of evidence for the null would also be appreciated. 

      Instead of using a blanket conservative threshold of 3.1, we used only FDR-corrected thresholds. The threshold level is therefore different for each contrast and noted in the figures. We have also added supplementary figures including the group-level SPMs and ROI analyses when the FS and SS trials were time-locked to the stop signal instead of the GO signal (Supplementary Figs 4 & 5). But as mentioned above, due to the difference in time points when contrasting, we believe that time-locking to the GO signal for all trial types makes more sense for the main analysis.

      We have now also computed BFs on the first level ROI beta estimates for all contrasts using the BayesFactor package as implemented in R. We add the following section to the methods and updated the results section accordingly (page 8):

      “In addition to the frequentist analysis we also opted to compute Bayes Factors (BFs) for each contrast per ROI per hemisphere. To do this, we extracted the beta weights for each individual trial type from our first level model. We then compared the beta weights from each trial type to one another using the ‘BayesFactor’ package as implemented in R (Morey & Rouder, 2015). We compared the full model comprising trial type, dataset and subject as predictors to the null model comprising only dataset and subject as predictors. The datasets and subjects were modeled as random factors. We divided the resultant BFs from the full model by the null model to provide evidence for or against a significant difference in beta weights for each trial type. To interpret the BFs, we used a modified version of Jeffreys’ scale (Jeffreys, 1939; Lee & Wagenmakers, 2014).”

      (3) I suggest calculating SSRT using the integration method with the replacement of Go omissions, as per the most recent recommendation (Verbruggen et al., eLife 2019).

      We agree we should have used a more optimal method for SSRT estimation. We have replaced our original estimates with those from the integration method with replacement of go omissions, as suggested, and updated the results in Table 3.

      We have also replaced text in the methods sections to reflect this (page 5):

      “For each participant, the SSRT was calculated using the mean method, estimated by subtracting the mean SSD from median go RT (Aron & Poldrack, 2006; Logan & Cowan, 1984).”

      Now reads:

      “For each participant, the SSRT was calculated using the integration method with replacement of go omissions (Verbruggen et al., 2019), estimated by integrating the RT distribution and calculating the point at which the integral equals p(respond|signal). The completion time of the stop process aligns with the nth RT, where n equals the number of RTs in the RT distribution of go trials multiplied by the probability of responding to a signal.”
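      The procedure described in the revised text can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis code: the function name, argument names, and example values are hypothetical, go omissions are replaced with the slowest observed go RT, and n is rounded up to the next whole trial.

```python
import math

def ssrt_integration(go_rts, n_go_omissions, ssds, responded_on_stop):
    """Estimate SSRT (ms) with the integration method, replacing go omissions.

    go_rts            -- RTs (ms) of all go trials with a response
    n_go_omissions    -- number of go trials without a response
    ssds              -- stop-signal delay (ms) of every stop trial
    responded_on_stop -- per stop trial, True if the participant responded
    """
    if not go_rts or not ssds or len(ssds) != len(responded_on_stop):
        raise ValueError("need go RTs and matching stop-trial lists")
    # Go omissions are assigned the slowest observed go RT (assumption).
    rts = sorted(list(go_rts) + [max(go_rts)] * n_go_omissions)
    # p(respond|signal): proportion of stop trials with a response.
    p_respond = sum(responded_on_stop) / len(responded_on_stop)
    # nth RT, where n = number of go RTs * p(respond|signal), rounded up.
    n = min(max(math.ceil(p_respond * len(rts)), 1), len(rts))
    nth_rt = rts[n - 1]
    mean_ssd = sum(ssds) / len(ssds)
    return nth_rt - mean_ssd

# Hypothetical participant: 8 go responses, 2 go omissions, 4 stop trials.
example_ssrt = ssrt_integration(
    go_rts=[400, 450, 500, 550, 600, 650, 700, 750],
    n_go_omissions=2,
    ssds=[200, 250, 200, 250],
    responded_on_stop=[True, False, True, False],
)
print(example_ssrt)
```

      In this example p(respond|signal) is 0.5, so the fifth of the ten sorted RTs (600 ms) marks the finishing time of the stop process, and subtracting the mean SSD (225 ms) gives the SSRT estimate.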

      Reviewer #2:

      This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, specifically bilateral preSMA, caudate, GPE, thalamus, and VTA, and unilateral M1, GPi, putamen, SN, and STN. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed. 

      As an empirical result, I believe that the results are robust, but this work does not attempt a new theoretical synthesis of the neuro-cognitive mechanisms of stopping. Specifically, if these many areas are more active on failed stop than successful stop trials, and (at least some of) these areas are situated in pathways that are traditionally assumed to instantiate response inhibition like the hyperdirect pathway, then what function are these areas/pathways involved in? I believe that this work would make a larger impact if the author endeavored to synthesize these results into some kind of theoretical framework for how stopping is instantiated in the brain, even if that framework may be preliminary. 

      I also have one main concern about the analysis. The authors use the mean method for computing SSRT, but this has been shown to be more susceptible to distortion from RT slowing (Verbruggen, Chambers & Logan, 2013 Psych Sci), and goes against the consensus recommendation of using the integration with replacement method (Verbruggen et al., 2019). Therefore, I would strongly recommend replacing all mean SSRT estimates with estimates using the integration with replacement method. 

      I found the paper clearly written and empirically strong. As I mentioned in the public review, I believe that the main shortcoming is the lack of theoretical synthesis. I would encourage the authors to attempt to synthesize these results into some form of theoretical explanation. I would also encourage replacing the mean method with the integration with replacement method for computing SSRT. I also have the following specific comments and suggestions (in the approximate order in which they appear in the manuscript) that I hope can improve the manuscript: 

      We would like to thank reviewer 2 for their insightful and interesting comments. We have adapted our paper to reflect these comments. Please see direct responses to your comments below. We agree with the reviewer that some type of theoretical synthesis would help with the interpretability of the article. We have substantially reworked the discussion and included theoretical considerations behind the newer narrative. Please see all changes to the article indicated by red text.

      (1) The authors say "performance on successful stop trials is quantified by the stop signal reaction time". I don't think this is technically accurate. SSRT is a measure of the average latency of the stop process for all trials, not just for the trials in which subjects successfully stop. 

      Thank you for pointing out this technically incorrect statement. We have replaced the above sentence with the following (page 1):

      “Inhibition performance in the SST as a whole is quantified by the stop signal reaction time (SSRT), which estimates the speed of the latent stopping process (Verbruggen et al., 2019).”

      (2) The authors say "few studies have detected differences in the BOLD response between FS and SS trials", but then do not cite any papers that detected differences until several sentences later (de Hollander et al., 2017; Isherwood et al., 2023; Miletic et al., 2020). If these are the only ones, and they only show greater FS than SS, then I think this point could be made more clearly and directly. 

      We have moved the citations to the correct place in the text to be clearer. We have also rephrased this part of the introduction to make the points more direct (page 2).

      “In the subcortex, functional evidence is relatively inconsistent. Some studies have found an increase in BOLD response in the STN in SS > GO contrasts (Aron & Poldrack, 2006; Coxon et al., 2016; Gaillard et al., 2020; Yoon et al., 2019), but others have failed to replicate this (Bloemendaal et al., 2016; Boehler et al., 2010; Chang et al., 2020; B. Xu et al., 2015). Moreover, some studies have actually found higher STN, SN and thalamic activation in failed stop trials, not successful ones (de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020).”

      (3) Unless I overlooked it, I don't believe that the author specified the criterion that any given subject is excluded based upon. Given some studies have significant exclusions (e.g., Poldrack_3T), I think being clear about how many subjects violated each criterion would be useful. 

      This is indeed interesting and important information to include. We have added the number of participants who were excluded for each criterion. Please see added text below (page 4):

      “Based on these criteria, no subjects were excluded from the Aron_3T dataset. 24 subjects were excluded from the Poldrack_3T dataset (3 based on criterion 1, 9 on criterion 2, 11 on criterion 3, and 8 on criterion 4). Three subjects were excluded from the deHollander_7T dataset (2 based on criterion 1 and 1 on criterion 2). Five subjects were excluded from the Isherwood_7T dataset (2 based on criterion 1, 1 on criterion 2, and 2 on criterion 4). Two subjects were excluded from the Miletic_7T dataset (1 based on criterion 2 and 1 on criterion 4). Note that some participants in the Poldrack_3T study failed to meet multiple inclusion criteria.”

      (4) The Method section included very exhaustive descriptions of the neuroimaging processing pipeline, which was appreciated. However, it seems that much of what is presented is not actually used in any of the analyses. For example, it seems that "functional data preprocessing" section may be fMRIPrep boilerplate, which again is fine, but I think it would help to clarify that much of the preprocessing was not used in any part of the analysis pipeline for any results. For example, at first blush, I thought the authors were using global signal regression, but after a more careful examination, I believe that they are only computing global signals but never using them. Similarly with tCompCor seemingly being computed but not used. If possible, I would recommend that the authors share code that instantiates their behavioral and neuroimaging analysis pipeline so that any confusion about what was actually done could be programmatically verified. At a minimum, I would recommend more clearly distinguishing the pipeline steps that actually went into any presented analyses.

      We thank the reviewer for finding this inconsistency. The methods section indeed uses the fMRIPrep boilerplate text, which we included so as to be as accurate as possible when describing the preprocessing steps taken. While we believe leaving the exact boilerplate text that fMRIPrep gives us is the most accurate method to show our preprocessing, we have adapted some of the text to clarify which computations were not used in the subsequent analysis. As a side-note, for future reference, we’d like to add that the fMRIPrep authors expressly recommend that users report the boilerplate completely and unaltered, and as such, we believe this may become a recurring issue (page 7).

      “While many regressors were computed in the preprocessing of the fMRI data, not all were used in the subsequent analysis. The exact regressors used for the analysis can be found above. For example, tCompCor and global signals were calculated in our generic preprocessing pipeline but not part of the analysis. The code used for preprocessing and analysis can be found in the data and code availability statement.”

      (5) What does it mean for the Poldrack_3T to have N/A for SSD range? Please clarify. 

      Thank you for pointing out this omission. We had not yet found the possible SSD range for this study. We have replaced this value with the correct value (0 – 1000 ms).

      (6) The SSD range of 0-2000ms for deHollander_7T and Miletic_7T seems very high. Was this limit ever reached or even approached? SSD distributions could be a useful addition to the supplement. 

      Thank you for also bringing this mistake to light. We had accidentally placed the max trial duration in these fields instead of the max allowable SSD value. We have replaced it with the correct value (0 – 900 ms).

      (7) The author says "In addition, median go RTs did not correlate with mean SSRTs within datasets (Aron_3T: r = .411, p = .10, BF = 1.41; Poldrack_3T: r = .011, p = .91, BF = .23; deHollander_7T: r = -.30, p = .09, BF = 1.30; Isherwood_7T: r = .13, p = .65, BF = .57; Miletic_7T: r = .37, p = .19, BF = 1.02), indicating independence between the stop and go processes, an important assumption of the horse-race model (Logan & Cowan, 1984)." However, the independent race model assumes context independence (the finishing time of the go process is not affected by the presence of the stop process) and stochastic independence (the duration of the go and stop processes are independent on a given trial). This analysis does not seem to evaluate either of these forms of independence, as it correlates RT and SSRT across subjects, so it was unclear how this analysis evaluated either of the types of independence that are assumed by the independent race model. Please clarify or remove. 

      Thank you for this comment. We realize that this analysis indeed does not evaluate either context or stochastic independence and therefore we have removed this from the manuscript.

      (8) The RTs in Isherwood_7T are considerably slower than the other studies, even though the go stimulus+response is the same (very simple) stimulus-response mapping from arrows to button presses. Is there any difference in procedure or stimuli that might explain this difference? It is the only study with a visual stop signal, but to my knowledge, there is no work suggesting visual stop signals encourage more proactive slowing. If possible, I think a brief discussion of the unusually slow RTs in Isherwood_7T would be useful. 

      We have included the following text in the manuscript to reflect this observed difference in RT between the Isherwood_7T dataset and the other datasets (page 9).

      “Longer RTs were found in the Isherwood_7T dataset in comparison to the four other datasets. The only difference in procedure in the Isherwood_7T dataset is the use of a visual stop signal as opposed to an auditory stop signal. This RT difference is consistent with previous research, where auditory stop signals and visual go stimuli have been associated with faster RTs compared to unimodal visual presentation (Carrillo-de-la-Peña et al., 2019; Weber et al., 2024). The mean SSRTs and probability of stopping are within normal range, indicating that participants understood the task and responded in the expected manner.”

      (9) When the authors included both 3T and 7T data, I thought they were preparing to evaluate the effect of magnet strength on stop networks, but they didn't do this analysis. Is this because the authors believe there is insufficient power? It seems that this could be an interesting exploratory analysis that could improve the paper.

      We thank the reviewer for this interesting comment. As our dataset sample contains only two 3T and three 7T datasets we indeed believe there is insufficient power to warrant such an analysis. In addition, we wanted the focus of this paper to be how fMRI examines the SST in general, and not differences between acquisition methods. With a greater number of datasets with different imaging parameters (especially TE or resolution) in addition to field strength, we agree such an analysis would be interesting, although beyond the scope of this article.

      (10) The authors evaluate smoothing and it seems that the conclusion that they want to come to is that with a larger smoothing kernel, the results in the stop networks bleed into surrounding areas, producing false positive activity. However, in the absence of a ground truth of the true contributions of these areas, it seems that an alternative interpretation of the results is that the denser maps when using a larger smoothing kernel could be closer to "true" activation, with the maps using a smaller smoothing kernel missing some true activity. It seems worth entertaining these two possible interpretations for the smoothing results unless there is clear reason to conclude that the smoothed results are producing false positive activity. 

      We agree with the view of the reviewer on the interpretation of the smoothing results. We indeed cannot rule this out as a possible interpretation of the results, due to a lack of ground truth. We have added text to the article to reflect this view and discuss the types of errors we can expect for both smaller and larger smoothing kernels (page 15).

      “In the absence of a ground truth, we are not able to fully justify the use of either larger or smaller kernels to analyse such data. On the one hand, aberrantly large smoothing kernels could lead to false positives in activation profiles, due to bleeding of observed activation into surrounding tissues. On the other hand, too little smoothing could lead to false negatives, missing some true activity in surrounding regions. While we cannot concretely validate either choice, it should be noted that there is lower spatial uncertainty in the subcortex compared to the cortex, due to the lower anatomical variability. False positives from smoothing spatially unmatched signal are more likely than false negatives. It may be more prudent for studies to use a range of smoothing kernels, to assess the robustness of their fMRI activation profiles.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      General response:

      We thank all the reviewers for their detailed reviews.

      All reviewers made a number of valuable comments, in particular by highlighting several points that would benefit from additional clarifications and discussion. We really appreciate the time and effort that went into the reviews. We have updated the paper to reflect the changes we have made in response to the reviewers' comments (largely by including more discussion regarding the model limitations and the effect of various modeling choices). We have also included several new supplementary figures (S7, S8, S9, S10) that provide further details of the model behavior, and show the effect of changing some of the terms in the cost. Below, we go through the individual comments, and highlight the places in which we have made changes to address the reviewers’ comments.

      Reviewer 1:

      Thank you for your review and pointing out multiple things to be discussed and clarified! Below, we go through the various limitations you pointed out and refer to the places where we have tried to address them.

      (1) It's important to keep in mind that this work involves simplified models of the motor system, and often the terminology for 'motor cortex' and 'models of motor cortex' are used interchangeably, which may mislead some readers. Similarly, the introduction fails in many cases to state what model system is being discussed (e.g. line 14, line 29, line 31), even though these span humans, monkeys, mice, and simulations, which all differ in crucial ways that cannot always be lumped together.

      That is a good point. We have clarified this in the text (Introduction and Discussion), to highlight the fact that our model isn’t necessarily meant to just capture M1. We have also updated the introduction to make it more clear which species the experiments which motivate our investigation were performed in.

      (2) At multiple points in the manuscript thalamic inputs during movement (in mice) are used as a motivation for examining the role of preparation. However, there are other more salient motivations, such as delayed sensory feedback from the limb and vision arriving in the motor cortex, as well as ongoing control signals from other areas such as the premotor cortex.

      Yes – the motivation for thalamic inputs came from the fact that those have specifically been shown to be necessary for accurate movement generation in mice. However, it is true that the inputs in our model are meant to capture any signals external to the dynamical system modeled, and as such are likely to represent a mixture of sensory signals, and feedback from other areas. We have clarified this in the Discussion, and have added this additional motivation in the Introduction.

      (3) Describing the main task in this work as a delayed reaching task is not justified without caveats (by the authors' own admission: line 687), since each network is optimized with a fixed delay period length. Although this is mentioned to the reader, it's not clear enough that the dynamics observed during the delay period will not resemble those in the motor cortex for typical delayed reaching tasks.

      Yes, we completely agree that the terminology might be confusing. While the task we are modeling is a delayed reaching task, it does differ from the usual setting since the network has knowledge of the delay period, and that is indeed a caveat of the model. We have added a brief paragraph just after the description of the optimal control objective to highlight this limitation.

      We have also performed additional simulations using two different variants of a model-predictive control approach that allow us to relax the assumption that the go-cue time is known in advance. We show that these modifications of the optimal controller yield results that remain consistent with our main conclusions, and can in fact in some settings lead to preparatory activity plateaus during the preparation epoch as often found in monkey M1 (e.g., in Elsayed et al. 2016). We have modified the Discussion to explain these results and their limitations, which are summarized in a new Supplementary Figure (S9).

      (4) A number of simplifications in the model may have crucial consequences for interpretation.

      a) Even following the toy examples in Figure 4, all the models in Figure 5 are linear, which may limit the generalisability of the findings.

      While we agree that linear models may be too simplistic, many prior analyses of M1 data suggest that linear models are often good enough to capture key aspects of M1 dynamics; for example, the generative model underlying jPCA is linear, and Sussillo et al. (2015) showed that the internal activity of nonlinear RNN models trained to reproduce EMG data aligned best with M1 activity when heavily regularized; in this regime, the RNN dynamics were close to linear. Nevertheless, this linearity assumption is indeed convenient from a modeling viewpoint: the optimal control problem is more easily solved for linear network dynamics and the optimal trajectories are more consistent across networks. Indeed, we had originally attempted to perform the analyses of Figure 5 in the nonlinear setting, but found that while the results were overall similar to what we report in the linear regime, iLQR was occasionally trapped in local minima, resulting in more variable results, especially for inhibition-stabilized networks at the strongly connected end of the spectrum. Finally, Figure 5 is primarily meant to explore to what extent motor preparation can be predicted from basic linear control-theoretic properties of the Jacobian of the dynamics; in this regard, it made sense to work with linear RNNs (for which the Jacobian is constant).

      b) Crucially, there is no delayed sensory feedback in the model from the plant. Although this simplification is in some ways a strength, this decision allows networks to avoid having to deal with delayed feedback, which is a known component of closed-loop motor control and of motor cortex inputs and will have a large impact on the control policy.

      This comment resonates well with Reviewer 3's remark regarding the autonomous nature (or not) of M1 during movement. Rather than thinking of our RNN models as anatomically confined models of M1 alone, we think of them as models of the dynamics which M1 implements possibly as part of a broader network involving “inter-area loops and (at some latency) sensory feedback”, and whose state appears to be near-fully decodable from M1 activity alone. We have added a paragraph of Discussion on this important point.

      (5) A key feature determining the usefulness of preparation is the direction of the readout dimension. However, all readouts had a similar structure (random Gaussian initialization). Therefore, it would be useful to have more discussion regarding how the structure of the output connectivity would affect preparation, since the motor cortex certainly does not follow this output scheme.

      We agree with this limitation of our model — indeed one key message of Figure 4 is that the degree of reliance on preparatory inputs depends strongly on how the dynamics align with the readout. However, this strong dependence is somewhat specific to low-dimensional models; in higher-dimensional models (most of our paper), one expects that any random readout matrix C will pick out activity dimensions in the RNN that are sufficiently aligned with the most controllable directions of the dynamics to encourage preparation.

      We did consider optimizing C away (which required differentiating through the iLQR optimizer, which is possible but very costly), but the question inevitably arises what exactly should C be optimized for, and under what constraints (e.g. fixed norm or not). One possibility is to optimize C with respect to the same control objective that the control inputs are optimized for, and constrain its norm (otherwise, inputs to the M1 model, and its internal activity, could become arbitrarily small as C can grow to compensate). We performed this experiment (new Supplementary Figure S7) and obtained a similar preparation index; there was one notable difference, namely that the optimized readout modes led to greater observability compared to a random readout; thus, the same amount of “muscle energy” required for a given movement could now be produced by a smaller initial condition. In turn, this led to smaller control inputs, consistent with a lower control cost overall.

      Whilst we could have systematically optimized C away, we reasoned that (i) it is computationally expensive, and (ii) the way M1 affects downstream effectors is presumably “optimized” for much richer motor tasks than simple 2D reaching, such that optimizing C for a fixed set of simple reaches could lead to misleading conclusions. We therefore decided to stick with random readouts.
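      The link drawn above between greater observability and smaller initial conditions can be made explicit with a short derivation. This is a sketch under simplifying assumptions that are not stated in the response itself: linear, stable, input-free dynamics during movement and a linear readout. The symbols x_0, A, Q and E are notation introduced here for illustration; only C comes from the text.

```latex
% Assumed setting: \dot{x} = A x with A stable, readout y = C x, initial state x(0) = x_0.
\begin{align}
  E(x_0) &= \int_0^\infty \lVert y(t) \rVert^2 \, dt = x_0^\top Q \, x_0, \\
  Q &= \int_0^\infty e^{A^\top t} C^\top C \, e^{A t} \, dt,
  \qquad A^\top Q + Q A + C^\top C = 0 .
\end{align}
% Q is the observability Gramian of the pair (A, C). For a fixed output energy E,
% the smallest initial condition that achieves it lies along the top eigenvector of Q:
\begin{equation}
  \min_{x_0 \,:\, x_0^\top Q x_0 = E} \lVert x_0 \rVert^2 = \frac{E}{\lambda_{\max}(Q)} .
\end{equation}
```

      Under these assumptions, a readout that increases the leading eigenvalues of Q lets the same output energy be reached from a smaller initial state, which is in line with the smaller control inputs reported for the optimized readout.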

      Additional comments :

      (1) The choice of cost function seems very important. Is it? For example, penalising the square of u(t) may produce very different results than penalising the absolute value.

      Yes, the choice of cost function does affect the results, at least qualitatively. The absolute value of the inputs is a challenging cost to use, as iLQR relies on a local quadratic approximation of the cost function. However, we have included additional experiments in which we penalized the squared derivative of the inputs (Supplementary Figure S8; see also our response to Reviewer 3's suggestion on this topic), and we do see differences in the qualitative behavior of the model (though the main takeaway, i.e. the reliance on preparation, continues to hold). This is now referred to and discussed in the Discussion section.
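      To spell out why the absolute value is awkward for a method built on local quadratic models, consider a scalar input and compare the second-order expansions of the two costs. This is a sketch for illustration; the symbols \ell, \bar{u} and \delta are notation introduced here, not taken from the manuscript.

```latex
% Local quadratic model of an input cost \ell around a nominal input \bar{u}:
\begin{equation}
  \ell(\bar{u} + \delta) \approx \ell(\bar{u}) + \ell'(\bar{u})\,\delta
    + \tfrac{1}{2}\,\ell''(\bar{u})\,\delta^2 .
\end{equation}
% Squared cost: the expansion is exact and has constant positive curvature.
\begin{equation}
  \ell(u) = u^2: \qquad \ell'(\bar{u}) = 2\bar{u}, \qquad \ell''(\bar{u}) = 2 .
\end{equation}
% Absolute-value cost: no curvature away from zero, and no derivative at zero.
\begin{equation}
  \ell(u) = \lvert u \rvert: \qquad \ell'(\bar{u}) = \operatorname{sign}(\bar{u}), \qquad
  \ell''(\bar{u}) = 0 \quad (\bar{u} \neq 0), \qquad \text{undefined at } \bar{u} = 0 .
\end{equation}
```

      The absolute-value cost thus contributes no curvature to the local model, and its gradient jumps at zero, so a quadratic approximation represents it poorly near the kink. Penalizing the squared derivative of the inputs instead keeps the cost smooth and quadratic.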

      (2) In future work it would be useful to consider the role of spinal networks, which are known to contribute to preparation in some cases (e.g. Prut and Fetz, 1999).

      (3) The control signal magnitude is penalised, but not the output torque magnitude, which highlights the fact that control in the model is quite different from muscle control, where co-contraction would be a possibility and therefore a penalty of muscle activation would be necessary. Future work should consider the role of these differences in control policy.

      Thank you for pointing us to this reference! Regarding both of these concerns, we agree that the model could be greatly improved and made more realistic in future work (another avenue for this would be to consider a more realistic biophysical model, e.g. using the MotorNet library). We hope that the current Discussion, which highlights the various limitations of our modeling choices, makes it clear that a lot of these choices could easily be modified depending on the specific assumptions/investigation being performed.

      Reviewer 2:

      Thank you for your positive review! We very much agree with the limitations you pointed out, some of which overlapped with the comments of the other reviewers. We have done our best to address them through additional discussion and new supplementary figures. We briefly highlight below where those changes can be found.

      (1) Though the optimal control theory framework is ideal to determine inputs that minimize output error while regularizing the input norm, it however cannot easily account for some other varied types of objectives especially those that may lead to a complex optimization landscape. For instance, the reusability of parts of the circuit, sparse use of additional neurons when learning many movements, and ease of planning (especially under uncertainty about when to start the movement), may be alternative or additional reasons that could help explain the preparatory activity observed in the brain. It is interesting to note that inputs that optimize the objective chosen by the authors arguably lead to a trade-off in terms of other desirable objectives. Specifically, the inputs the authors derive are time-dependent, so a recurrent network would be needed to produce them and it may not be easy to interpolate between them to drive new movement variants. In addition, these inputs depend on the desired time of output and therefore make it difficult to plan, e.g. in circumstances when timing should be decided depending on sensory signals. Finally, these inputs are specific to the full movement chain that will unfold, so they do not permit reuse of the inputs e.g. in movement sequences of different orders.

      Yes, that is a good point! We have incorporated further Discussion related to this point. We have additionally included a new example in which we regularize the temporal complexity of the inputs (see also our response to Reviewer 3's suggestion on this topic), which leads to more slowly varying inputs, and may indeed represent a more realistic constraint and lead to simpler inputs that can more easily be interpolated between. We also agree that uncertainty about the upcoming go cue may play an important role in the strategy adopted by the animals. While we have not performed an extensive investigation of the topic, we have included a Supplementary Figure (S9) in which we used Model Predictive Control to investigate the effect of planning under uncertainty about the go cue arrival time. We hope that this will give the reader a better sense of what sort of model extensions are possible within our framework.

      (2) Relatedly, if the motor circuits were to balance different types of objectives, the activity and inputs occurring before each movement may be broken down into different categories that may each specialize into one objective. For instance, previous work (Kaufman et al. eNeuro 2016, Inagaki et al., Cell 2022, Zimnik and Churchland, Nature Neuroscience 2021) has suggested that inputs occurring before the movement could be broken down into preparatory inputs 'stricto sensu' - relating to the planned characteristics of the movement - and a trigger signal, relating to the transition from planning to execution - irrespective of whether the movement is internally timed or triggered by an external event. The current work does not address which type(s) of early input may be labeled as 'preparatory' or may be thought of as a part of 'planning' computations.

      Yes, our model does indeed treat inputs in a very general way, and does not distinguish between the different types of processes they may be composed of. This is partly because we do not explicitly model where the inputs come from, such that our inputs likely encompass multiple processes. We have added discussion related to this point.

      (3) While the authors rightly point out some similarities between the inputs that they derive and observed preparatory activity in the brain, notably during motor sequences, there are also some differences. For instance, while both the derived inputs and the data show two peaks during sequences, the data reproduced from Zimnik and Churchland show preparatory inputs that have a very asymmetric shape that really plummets before the start of the next movement, whereas the derived inputs have larger amplitude during the movement period - especially for the second movement of the sequence. In addition, the data show trigger-like signals before each of the two reaches. Finally, while the data show a very high correlation between the pattern of preparatory activity of the second reach in the double reach and compound reach conditions, the derived inputs appear to be more different between the two conditions. Note that the data would be consistent with separate planning of the two reaches even in the compound reach condition, as well as the re-use of the preparatory input between the compound and double reach conditions. Therefore, different motor sequence datasets - notably, those that would show even more coarticulation between submovements - may be more promising to find a tight match between the data and the authors' inputs. Further analyses in these datasets could help determine whether the coarticulation could be due to simple filtering by the circuits and muscles downstream of M1, planning of movements with adjusted curvature to mitigate the work performed by the muscles while permitting some amount of re-use across different sequences, or - as suggested by the authors - inputs fully tailored to one specific movement sequence that maximize accuracy and minimize the M1 input magnitude.

      Regarding the exact shape of the occupancy plots, it is important to note that some of the more qualitative aspects (e.g. the relative height of the two peaks) will change if we change the parameters of the cost function. Right now, we have chosen the parameters to ensure that both reaches would be performed at roughly the same speed (as a way to very loosely constrain the parameters based on the observed behavior). However, small changes to the hyperparameters can lead to changes in the model output (e.g. one of the two consecutive reaches being performed using greater acceleration than the other), and since our biophysical model is fairly simple, changes in the behavior are directly reflected in the network activity. Essentially, what this means is that while the double occupancy is a consistent feature of the model, the exact shape of the peaks is more sensitive to hyperparameters, and we do not wish to draw any strong conclusions from them, given the simplicity of the biophysical model. However, we do agree that our model exhibits some differences with the data. As discussed above, we have included additional discussion regarding the potential existence of separate inputs for planning vs triggering the movement in the context of single reaches.

      Overall, we are excited about the suggestions made by the Reviewer here about using our approach to analyze other motor sequence datasets, but we think that in order to do this properly, one would need to adopt a more realistic musculo-skeletal model (such as one provided by MotorNet).

      (4) Though iLQR is a powerful optimization method to find inputs optimizing the authors' cost function, it also has some limitations. First, given that it relies on a linearization of the dynamics at each timestep, it has a limited ability to leverage potential advantages of nonlinearities in the dynamics. Second, the iLQR algorithm is not a biologically plausible learning rule and therefore it might be difficult for the brain to learn to produce the inputs that it finds. It remains unclear whether using alternative algorithms with different limitations - for instance, using variants of BPTT to train a separate RNN to produce the inputs in question - could impact some of the results.

      We agree that our choice of iLQR has limitations: while it offers the advantage of convergence guarantees, it does indeed restrict the choice of cost function and dynamics that we can use. We have now included extensive discussion of how the modeling choices affect our results.

      We do not view the lack of biological plausibility of iLQR as an issue, as the results are agnostic to the algorithm used for optimization. However, we agree that any structure imposed on the inputs (e.g. by enforcing them to be the output of a self-contained dynamical system) would likely alter the results. A potentially interesting extension of our model would be to do just what the reviewer suggested, and try to learn a network that can generate the optimal inputs. However, this is outside the scope of our investigation, as it would then lead to new questions (e.g. what brain region would that other RNN represent?).

      (5) Under the objective considered by the authors, the amount of input occurring before the movement might be impacted by the presence of online sensory signals for closed-loop control. It is therefore an open question whether the objective and network characteristics suggested by the authors could also explain the presence of preparatory activity before e.g. grasping movements that are thought to be more sensory-driven (Meirhaeghe et al., Cell Reports 2023).

      It is true that we aren’t currently modeling sensory signals explicitly. However, some of the optimal inputs we infer may be capturing upstream information which could encompass some sensory information. This is currently unclear, and would likely depend on how exactly the model is specified. We have added new discussion to emphasize that our dynamics should not be understood as just representing M1, but more general circuits whose state can be decoded from M1.

      Reviewer #2 (Recommendations For The Authors):

      Additionally, thank you for pointing out various typos in the manuscript; we have fixed those!

      Reviewer 3:

      Thank you very much for your review, which makes a lot of very insightful points, and raises several interesting questions. In summary, we very much agree with the limitations you pointed out. In particular, the choice of input cost is something we had previously discussed, but we had found it challenging to decide on what a reasonable cost for “complexity” could be. Following your comment, we have however added a first attempt at penalizing “temporal complexity”, which shows promising behavior. We have only included those additional analyses as supplementary figures, and we have included new discussion, which hopefully highlights what we meant by the different model components, and how the model behavior may change as we vary some of our choices. We hope this can be informative for future models that may use a similar approach. Below, we highlight the changes that we have made to address your comments.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons.

      First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      That is a very good point, and it mirrors several concerns that we had in the past. While we did focus on the input norm for the sake of simplicity, and because it represents a very natural way to regularize our control solutions, we agree that a “complexity cost” may be better suited to models of brain circuits. We have addressed this in a supplementary investigation. We chose to focus on a cost that penalizes the temporal complexity of the inputs, as ||u(t+1) - u(t)||^2. Note that this required augmenting the state of the model, making the computations quite a bit slower; while it is doable if we only penalize the first temporal derivative, it would not scale well to higher orders.
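      For readers unfamiliar with the state-augmentation trick mentioned above, the following is a minimal sketch of one common way to do it, and not necessarily the implementation used for Supplementary Figure S8. It assumes discrete-time linear dynamics `x[t+1] = A x[t] + B u[t]`; the augmented state `z = [x, u_prev]`, the convention that the input before the first step is `u0_prev`, and all function names are assumptions made for illustration.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def step(A, B, x, u):
    """One step of the assumed dynamics x[t+1] = A x[t] + B u[t]."""
    return [ax + bu for ax, bu in zip(matvec(A, x), matvec(B, u))]

def augmented_step(A, B, z, u):
    """Step the augmented state z = [x, u_prev]; the new state remembers u."""
    n = len(A)
    return step(A, B, z[:n], u) + list(u)

def smoothness_stage_cost(z, u, n):
    """||u[t] - u[t-1]||^2 expressed using only the current (z, u) pair."""
    u_prev = z[n:]
    return sum((a - b) ** 2 for a, b in zip(u, u_prev))

def rollout_cost(A, B, x0, u0_prev, inputs):
    """Total smoothness cost and final network state for a sequence of inputs."""
    n = len(A)
    z = list(x0) + list(u0_prev)
    total = 0.0
    for u in inputs:
        total += smoothness_stage_cost(z, u, n)
        z = augmented_step(A, B, z, u)
    return total, z[:n]

# Hypothetical 2-state, 1-input system and a short input sequence.
A = [[0.9, 0.1], [0.0, 0.8]]
B = [[1.0], [0.5]]
total, x_final = rollout_cost(A, B, [0.0, 0.0], [0.0], [[1.0], [1.0], [3.0]])
print(total, x_final)
```

      Because the previous input is carried inside the state, the penalty becomes an ordinary stage cost of the current state and input, which is the form iLQR expects. The price is a state that grows by the input dimension for each derivative order penalized, consistent with the remark that the approach would not scale well to higher orders.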

      Interestingly, we did find that the activity in that setting was somewhat more realistic (see new Supplementary Figure S8), with more sustained inputs and plateauing activity. While we have kept the original model for most of the investigations, the somewhat more realistic nature of the results under that setting suggests that further exploration of penalties of that sort could represent a promising avenue to improve the model.

      We also found the idea of a cost that would ensure low-dimensionality of the inputs across conditions very interesting. However, it is challenging to investigate with iLQR as we perform the optimization separately for each condition; nevertheless, it could be investigated using a different optimizer.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      Yes, that is a very interesting analysis to perform, which we had not considered before! When investigating this, we found that the zero-delay strategy does not rely on preparation in the same way as is seen in the monkeys. This seems to be a reflection of the fact that our “Go cue” corresponds to an “internal” go cue which would likely come after the true, “external go cue” – such that we would indeed never actually be in the zero delay setting. This is not something we had addressed (or really considered) before, although we had tried to ensure we referred to “delta prep” as the duration of the preparatory period but not necessarily the delay period. We have now included more discussion on this topic, as well as a new Supplementary Figure S10.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing - preparatory plateaus, a realistic condition-invariant signal tied to movement onset - under the present modeling assumptions.

      As described above, we have now included two additional figures. In the first one (S8, already discussed above), we used a temporal smoothness prior, and we indeed get slightly more realistic activity plateaus. In a second supplementary figure (S9), we have also considered using model predictive control (MPC) to optimize the inputs under an uncertain go cue arrival time. There, we found that removing the assumption that the delay period is known came with new challenges: in particular, it requires the specification of a “mental model” of when the Go cue will arrive. While it is reasonable to expect that monkeys will have a prior over the go cue arrival time that will be shaped by the design of the experiment, some assumptions must be made about the utility functions that should be used to weigh this prior. For instance, if we imagine that monkeys carry a model of the possible arrival time of the go cue that is updated online, they could nonetheless act differently based on this information, for instance by either preparing so as to be ready for the earliest go cue possible or alternatively to be ready for the average go cue. This will likely depend on the exact task design and reward/penalty structure. Here, we added simulations with those two cases (making simplifying assumptions to make the problem tractable/solvable using model predictive control), and found that the “earliest preparation” strategy gives rise to more realistic plateauing activity, while the model where planning is done for the “most likely go time” does not. We suspect that more realistic activity patterns could be obtained by e.g. combining this framework with the temporal smoothness cost. However, the main point we wished to make with this new supplementary figure is that it is possible to model the task in a slightly more realistic way (although here it comes at the cost of additional model assumptions). We have now added more discussion related to those points. Note that we have kept our analyses on these new models to a minimum, as the main takeaway we wish to convey from them is that most components of the model could be modified/made more realistic. This would impact the qualitative behavior of the system and match to data but – in the examples we have so far considered – does not appear to modify the general strategy of networks relying on preparation.

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

      Yes, that is a good point, thank you for making it so clearly! Indeed, as in previous work, we do not think of our “M1 dynamics” as being internal to M1; they may instead include sensory feedback and inter-area loops, which we summarize in the connectivity, chosen so that the dynamics qualitatively resemble data. We have now incorporated more discussion regarding what exactly the dynamics in our model represent.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment 

      Dasgupta and colleagues make a valuable contribution to understanding how the guidance factor Sema7a promotes connections between mechanosensory hair cells and afferent neurons of the zebrafish lateral line system. The authors provide solid evidence that loss of Sema7a function results in fewer contacts between hair cells and afferents through comprehensive quantitative analysis. Additional work is needed to distinguish the effects of different isoforms of Sema7a to determine whether there are specific roles of secreted and membrane bound forms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They attempt to also outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, the effect of loss of Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes. These issues weaken the claims made by the authors including the statement that they have identified dual roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization, respectively.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of a peripheral sensory circuit in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows one to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtacrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below). In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Reviewer #3 (Public Review):

      The data reported here demonstrate that Sema7a defines the local behavior of growing axons in the developing zebrafish lateral line. The analysis is sophisticated and convincingly demonstrates effects on axon growth and synapse architecture. Collectively, the findings point to the idea that the diffusible form of sema7a may influence how axons grow within the neuromast and that the GPI-linked form of sema7a may subsequently impact how synapses form, though additional work is needed to strongly link each form to its proposed effect on circuit assembly.

      The revised manuscript is significantly improved. The authors comprehensively and appropriately addressed most of the reviewers' concerns. In particular, they added evidence that hair cells express both Sema7A isoforms, showed that membrane bound Sema7A does not have long range effects on guidance, demonstrated how axons behave close to ectopic Sema7A, and analyzed other features of the hair cells that revealed no strong phenotypes. The authors also softened the language in many, but not all places. Overall, I am satisfied with the study as a whole. 

      Reviewer #4 (Public Review):

      This study provides direct evidence showing that Sema7a plays a role in axon growth during the formation of peripheral sensory circuits in the lateral-line system of zebrafish. This is a valuable finding because the molecules for axon growth in hair-cell sensory systems are not well understood. The majority of the experimental evidence is convincing, and the analysis is rigorous. The evidence supporting Sema7a's juxtacrine vs. secreted role and involvement in synapse formation in hair cells is less conclusive. The study will be of interest to cell, molecular and developmental biologists, and sensory neuroscientists.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In their revised manuscript, Dasgupta et al. have provided further experiments to address the role of Sema7a (sec and GPI-anchored) in regulating axon guidance in the lateral line system. Specifically, the inclusion of the heat shock controls and FM labeling to show hair cell mechanotransduction were crucial to interpretation of the results. However, there are still concerns about the specificity of the results. My primary concern is if the change in axon patterning is specifically due to loss of Sema7a in the mutant hair cells. These animals are morphologically very abnormal and, in the rebuttal, the authors state that hair cell number is reduced. This is not quantified in the manuscript and should be included. 

      Thank you for this suggestion. We have included the data in the manuscript in lines 137-139, in Figure 2—figure supplement 1B, and in the source data for Figure 2 and Figure 2-figure supplements.

      If there is not a function for Sema7a in hair cells themselves, why is the number reduced? 

      The sema7a-/- homozygous mutants are not viable and die by 6 dpf. The loss of Sema7A protein produces other developmental defects, including brain edema and a curved body axis. We believe a slight but not significant decrease in hair cell number may arise from a minute developmental delay in the morphogenesis of the neuromast. We have accordingly quantified our data at three distinct developmental stages (2 dpf, 3 dpf, and 4 dpf) and have incorporated them in the revised manuscript.

      Additionally, FM data should be quantified and presented in animals without a transgene in the same excitation/emission spectra for clearer interpretation of the staining.

      We have quantified the intensities of labeling with FM 4-64 styryl dye from the control and the sema7a-/- mutant larvae and incorporated the data in lines 139-146, in Figure 2—figure supplement 1D, and in source data for Figure 2 and Figure 2-figure supplements. We kept the transgenes to concurrently show the arborization phenotype, hair cell morphology, and the FM 4-64 incorporation between the genotypes.

      Rescue analysis using the myo6d promoter would allow the authors to ensure that the axon deficits can be rescued by putting Sema7a back into the sensory hair cells. Transient transgenesis could be useful for this approach and would not require the creation of a stable line. This could be done with both forms of Sema7a, allowing a true assessment of whether or not the secreted and GPI-anchored forms have disparate functions as claimed in lines 418-424.

      Although we recognize the importance of the rescue of the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, it is not possible for us to perform that experiment at the moment, for the first author will leave the lab next week.  However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Other concerns:

      (1) The timeline of the heat shock experiment is confusing to me and, therefore, it makes me question the specificity of those results. Based on the speed of axon outgrowth and the time necessary for transcription and translation after heat shock induction of the transgene, it is unclear to me how the axon growth defects could occur in the timeline provided. Imaging two hours after the start of the heat shock is very rapid and speaks to either an indirect effect of the transgenesis on the axon growth or a leaky promoter/induction paradigm. It is possible I am just misunderstanding the setup but, from what I could gather, the imaging is being done 2 hrs after the start of the heat shock. This should be clarified.

      The axons of the zebrafish posterior lateral line migrate relatively fast. The pioneering axons migrate at around 120 μm/hour (Sato et al., 2010) and the follower axons migrate at approximately 30-80 μm/hour (Sato et al., 2010). The heat-shock promoter that we have utilized, hsp70l, is highly effective in inducing gene expression and subsequent protein formation within 30 to 60 minutes. We believe an hour of heat shock and an hour of incubation post heat shock is sufficient to induce directed axon migration to a distance that spans from 27 μm to 140 μm.

      We strongly believe that the directed arborization of the sensory axons towards the Sema7Asec source is not due to an indirect effect of transgenesis or leaky promoter induction, as in all 18 of the injected but not heat-shocked control larvae we did not observe ectopic Sema7Asec expression, and no aberrant projection was formed from the sensory arbor network. We highlight this observation in lines 297-299 and in Figure 4E.

      Sato et al., 2010: Single-cell analysis of somatotopic map formation in the zebrafish lateral line system. Developmental Dynamics 239:2058–2065, 2010.
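
      As a rough plausibility check of the timing argument above, the following minimal sketch computes how long an axon would need to cover the quoted distances at the quoted speeds. It assumes a constant migration speed and ignores the delay for transgene induction; the function and variable names are hypothetical and are not taken from the manuscript.

```python
# Illustrative arithmetic only: constant speed is an assumption, and the
# 30-60 minute induction delay mentioned in the response is not modeled.

def hours_to_cover(distance_um: float, speed_um_per_hour: float) -> float:
    """Return the hours needed to travel distance_um at a constant speed."""
    if speed_um_per_hour <= 0:
        raise ValueError("speed must be positive")
    return distance_um / speed_um_per_hour


# Speeds quoted in the response (Sato et al., 2010), in micrometres per hour.
SPEEDS_UM_PER_HOUR = {
    "pioneer": 120.0,
    "follower (slow)": 30.0,
    "follower (fast)": 80.0,
}

# Shortest and longest projection distances quoted in the response, in micrometres.
DISTANCES_UM = (27.0, 140.0)


def travel_table() -> dict:
    """Map (axon type, distance) to the required travel time in hours."""
    return {
        (name, distance): hours_to_cover(distance, speed)
        for name, speed in SPEEDS_UM_PER_HOUR.items()
        for distance in DISTANCES_UM
    }


if __name__ == "__main__":
    for (name, distance), hours in travel_table().items():
        print(f"{name:16s} {distance:6.1f} um -> {hours:.2f} h")
```

      Under these assumptions, only the combinations whose travel time fits within the post-induction window would be consistent with the stated two-hour timeline.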

      Similarly, it would help to clarify if t(0) in the figure is the onset of the heat shock or onset of imaging two hours after the heat shock is started. 

      The t=0 hour in Figure 4I denotes the onset of imaging two hours after the heat shock began. We have clarified this in the manuscript in lines 1155-1156.

      (2) In the rebuttal, the line numbers cited do not match up with the appropriate text, I believe.

      We have corrected this and updated the manuscript.

      (3) Some of the supplemental figures are not mentioned in the text, or I could not find them. For example: Figure 1 supplement 2J. 

      Thank you for pointing this out. We have corrected the manuscript, and the new information is added in line 114.

      (4) Table 1 statistics: were these adjusted for multiple comparisons using a Bonferroni correction or something similar? This is necessary for statistical significance to be meaningful.

      We did not adjust the p-values for multiple comparisons because the values correspond to only three or four statistical tests per experiment, strongly indicating the unlikelihood of erroneous significance due solely to multiple tests.
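
      For readers unfamiliar with the correction under discussion, the minimal sketch below shows how the family-wise error rate grows with the number of tests and how a Bonferroni adjustment is applied. It assumes independent tests and a nominal significance level of 0.05; the example p-values are hypothetical and are not taken from Table 1.

```python
# Minimal sketch of the multiple-comparison arithmetic.
# Assumptions: tests are independent; alpha = 0.05; p-values are made up.

def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    for n in (1, 3, 4):
        print(f"{n} test(s): FWER = {familywise_error_rate(0.05, n):.4f}")
    hypothetical_p = [0.01, 0.04, 0.5]
    print(bonferroni_adjust(hypothetical_p))
```

      Comparing each adjusted p-value with 0.05 is equivalent to comparing each raw p-value with 0.05 divided by the number of tests.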

      (5) Figure 1I and 1-S3 - The legend states a positive correlation between axonal signal and sema7A signal. Correlations are 0.5, 0.6, and 0.4 (2, 3, and 4 dpf). This is not a convincing positive correlation. At best this is no to a very weak positive correlation.

      In lines 122-126 we mention that the basal association of the sensory arbors shows a positive correlation with Sema7A accumulation. We never emphasize the strength of the correlation. However, a consistent positive correlation at three different developmental stages suggests that progressive Sema7A accumulation at the base of the hair cells may guide the sensory arbors to increasingly associate themselves with the hair cells.

      Reviewer #2 (Recommendations For The Authors):

      I am a bit disappointed that the authors elected not to experimentally address the issue raised by all reviewers: whether the secreted or membrane bound isoform is active in hair cells. They rather decided to change their interpretation in the text. It is fine, given the eLife review structure. However, that would make the manuscript much stronger. Other issues were adequately addressed through textual changes as well. 

      Although we recognize the importance of the rescue of the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, it is not possible for us to perform that experiment at the moment, for the first author will leave the lab next week.  However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Reviewer #3 (Recommendations For The Authors):

      Overall, I am satisfied with the study as a whole and just have a few minor comments that remain to be addressed. 

      (1) Although the authors say that they added appropriate no plasmid/heatshock-only and plasmid-only/no heatshock controls, these results need to be presented more clearly, as they are separated in the paper and only one was quantified (i.e. 100% of embryos showed no defect). Please just make it clear that no defects were observed in either control for either experiment (both secreted and membrane bound ectopic expression). 

      We have clearly stated this information in lines 297-299 and 343-345.

      (2) Please add a compass to Fig. 1A to indicate the orientation of the neuromast. It would also be helpful to add labels for developmental ages to all of the figures, rather than making the reader look it up in the legend. 

      We have updated Figure 1A and the corresponding figure legend in lines 882-883. We have denoted the larval age in the figure legends to keep the individual images uncluttered.

      (3) For the RT-PCR experiments in Figure 1, no negative control was included to show that supporting cell or neuronal genes are not detected in the purified hair cells and v.v. that neither isoform is detected in supporting cells or neurons. I ask only because there is a lot of immune-signal outside of the hair cells and I am curious whether that is secreted or might come from other cell types. For neurons and supporting cells, simply demonstrating absence of Sema7a overall would suffice. 

      We have utilized the transgenic line Tg(myo6b:actb1-EGFP) that expresses the fluorophore GFP specifically in the hair cells of the neuromast. Unfortunately, we do not possess a transgenic line that reliably and specifically labels the support cells in the neuromast. Hence, in our sorting experiment the GFP-negative cells that are collected from the trunk segments of the larvae contain all the non-hair cells including epidermal cells, neuronal cells, and immune cells etc. Such a mixture of varied cellular identity may not serve as a reliable negative control. 

      In Figure 7, we have plotted the normalized expression values of the sema7a gene in the neuromast. The plot clearly depicts that the source of Sema7A is the young and the mature hair cells, not the support cells. We further confirm this observation by immunostaining where the Sema7A signal is highly restricted to the hair cells and not in any other cell in the neuromast (Figure 1E). Immunostaining further demonstrates that the lateral line sensory arbors also do not produce the Sema7A protein (Figure 1H; Video 1).

      We agree with the reviewer that there are diverse immune cells, including macrophages, in and around the neuromast. These macrophages are dynamic and possess a highly ramified structure (Denans et al., 2022). In all our Sema7A immunostainings, we never observed structures that resemble macrophages. Although we cannot confirm that Sema7A is not expressed in a distant immune cell, we highly doubt that signal coming from immune cells is impacting hair cell innervation by the sensory arbors during homeostatic development.

      Denans et al., 2022: Nature Communications volume 13, Article number: 5356 (2022).

      (4) In Figure 1, Supplement 4, I do not see the immunogen labeled in blue. 

      We have corrected the figure legend. The immunogenic region of the Sema7A protein is now clearly denoted in the figure legend of Figure 1—figure supplement 4.

      (5) In Figure 2, please add a control image as requested, as that enables direct comparison. There is ample room in the figure. 

      We have updated Figure 2 and made the suggested change.

      (6) In Figure 2, Supplement 1, the FM4-64 data are not presented in a quantified fashion. Please report at least how many embryos showed reliable uptake and preferably how many hair cells per embryo showed reliable uptake. 

      We have quantified the FM 4-64 intensities in control and sema7a-/- mutant larvae. The new data is added to the manuscript in lines 142-146, 577-579, and in Figure 2—figure supplement 1D.

      (7) In Figure 3, there seems to be a typo in the figure legend: "mutants in the same larvae" does not make sense to me. 

      We have corrected the error. The modified statement is represented in lines 1067-1068.

      (8) The text should refer more explicitly to the statistical tests reported in Table 1, i.e. as the results are presented. 

      In lines 1105 and 1109, we clearly state the statistical tests that were performed.

      (9) In Figure 6, Supplement 1, please show the raw data points not just the bar graphs

      We have updated the Figure 6—figure supplement 1.

      (10) Minor point: the authors state that they addressed the distance over which secreted Sema7A may act, but this was not evident to me in the text. Please make this finding clearer.

      We have clarified this information in lines 310-311.

      (11) Finally, the discussion contains a statement that is not supported by the data: "We have discovered dual modes of Sema7A function in vivo." They have discovered evidence that there are two isoforms, that loss of both disrupts connectivity, and that overexpression of only the secreted form can elicit growth from a distance. However, there is no direct evidence that the membrane-bound form is responsible for local effects. It is formally possible still that the phenotypes are a result of dual roles for the secreted form. It is clear that another manuscript is forthcoming that will expand on the role of the transmembrane form, but for this manuscript, the authors should make firm conclusions only about the data presented herein.

      Thank you for this suggestion. We have modified the manuscript in lines 425-434.

      Reviewer #4 (Recommendations For The Authors):

      The authors have made significant changes to the manuscript based on the comments of the reviewers. It is now suitable for publication.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have no more experiments to ask for, but the following errors should be corrected prior to publication.

      (1) L. 183-198: Figure 3 panels were erroneously referred to in several places.

      This has been corrected.

      (2) L.182-183: description of active/total cell numbers in main text does not match numbers in Figure 3B

      This has been corrected.

      (3) L.185-187: Figure 3C indicates significant changes of rheobase only between the DMI+6-OHDA and 6-OHDA groups. A statistical comparison between sham and DMI+6-OHDA was not provided, which may change the interpretation of the data in Figure 3B, C: "...these findings suggest that the 6-OHDA induced lesion of midbrain dopaminergic neurons evoked the increased firing of DRN5-HT neurons" (L.185-187).

      We thank the reviewer for highlighting this point. Indeed, a Kruskal-Wallis test comparing all three groups revealed a significantly lower rheobase in DMI + 6-OHDA mice compared to Sham while the 6-OHDA injected group was not affected. Therefore, the increased firing of DRN5-HT neurons recorded in 6-OHDA injected mice pretreated with DMI also critically involves the noradrenergic system. This is now included in the revised results section of the manuscript (lines 190-197).

      (4) L. 188: The description of "While the excitability of DRN5-HT neurons was not affected in 6-OHDA mice..." does not match the clearly increased cellular excitability shown in Figure 3G-I.

      This has been corrected and we are now referring more specifically to the rheobase, which is not affected in 6-OHDA mice.

      (5) Mann-Whitney tests were inappropriately used for statistics in Figures 3-6: Multiple comparisons (>=3 groups) should be performed with one-way ANOVA or the Kruskal-Wallis test for nonparametric data.

      We thank the reviewer for the comment. We have now applied one-way ANOVA/Kruskal-Wallis tests and the text has been modified accordingly.
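
      To illustrate the omnibus test requested here, the following minimal sketch computes the Kruskal-Wallis H statistic for exactly three groups in plain Python. It is not the authors' analysis code: the group values are hypothetical, and the p-value uses the chi-square approximation with 2 degrees of freedom (for which the survival function is exp(-H/2)), which is only approximate for small samples.

```python
import math

# Minimal sketch of a three-group Kruskal-Wallis test.
# Assumptions: exactly three groups, chi-square approximation (df = 2).


def average_ranks(values):
    """Rank values from 1..N, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three independent samples."""
    groups = [list(a), list(b), list(c)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each group needs at least one observation")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Standard correction for ties.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    p = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p


if __name__ == "__main__":
    # Hypothetical values standing in for three experimental groups.
    h, p = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
    print(f"H = {h:.3f}, p = {p:.4f}")
```

      A significant omnibus result would normally be followed by post-hoc pairwise comparisons with an appropriate correction.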

      (6) It seems that the data points in some panels of Figure 4C represented a cell, but others were averaged within a mouse (Figure 4D). This needs to be clarified or corrected.

      None of the data in Figure 4 were averaged within a mouse. In the chosen graph type (aligned dot plot), equal data points overlap.

      Reviewer #2 (Recommendations For The Authors):

      The authors' revised manuscript has addressed most of my concerns. However, I'm not convinced by the authors' claim regarding Figure 5B. It would be great if the authors at least discussed in their manuscript why the DMI pretreatment group alone, not the 6-OHDA group, significantly lowers the firing rate of DRN (DA) and increases the Erest of DRN (DA), compared to the sham-lesion group. These statistically significant data are not explained at all in the revised manuscript (could this effect be explained by the neuroprotection of NA-neurons from 6-OHDA toxicity?).

      We thank the reviewer for this comment. Now that a one-way ANOVA or a Kruskal-Wallis test is used to compare the three groups (as suggested by reviewer 1), the changes previously shown in Figure 5B are no longer significant.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ are inconclusive. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors and reviewers for their valuable feedback and constructive comments. We have carefully considered each point raised by the reviewers and made the necessary revisions to the manuscript. Regarding the relationships between global and local BM processing, the accumulated evidence from previous studies has converged on the dissociation of the two BM components, e.g., while global BM processing is susceptible to learning and practice, local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and the brain activations in response to local and global BM cues are different (Chang et al., 2018; Duarte et al., 2022). Nevertheless, we concurred with reviewers that the evidence for such dissociation from the current study by itself is not strong enough. Therefore, we have toned down on this point and no longer claimed the dissociation (including the title). Based on the current results, we focused our discussion on the different aspects of BM processing in children with and without ADHD.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating the impairments of biological motion perception in individuals with ADHD in comparison with neurotypical controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated the impairments of local and global (holistic) biological motion perception, the diagnosis status, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention / impulsivity). Both local and global biological motion perception are impaired in ADHD individuals. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not in controls. A path analysis in the ADHD group suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that there exists not much work on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature, and adds potentially also new behavioral markers for this clinical group. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thanks for this positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper. Specifically, the hypothesis that the perception of human social interaction is critically based on a local mechanism for the detection of asymmetry in foot trajectories of walkers (this is what 'BL-local' really measures), or on the detection of live agents in cluttered scenes seems not very plausible.

      Thanks for these comments. We agree that the relationship between genetic factors and BM perception remains to be further examined, as we did not test the genetic influences in this study. We have deleted the relevant discussion about genetics. Based on our results, we discuss the possible mechanisms behind the relationship between local BM processing and social interaction in the revised manuscript as follows:

      “As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs. Further empirical studies are required to confirm these hypotheses.” (lines 417 - 428)

      Based on my last comments, the discussion has now been changed in a way that tries to justify the speculative claims by citing a lot of other speculative papers, which does not really address the problem. For example, the fact that chicks walk towards biological motion stimuli is interesting. To derive that this verifies a fundamental mechanism in human biological motion processing is extremely questionable, given that birds do not even have a cortex. Taking the argumentation of the authors seriously, one would have to assume that the 'Local BM' mechanism is probably located in the mesencephalon in humans, and then would have to interact in some way with social perception differences of ADHD children. To me all this seems to make very strong (over-)claims. I suggest providing a much more modest interpretation of the interesting experimental result, based on what has been really experimentally shown by the authors and closely related other data, rather than providing lots of far-reaching speculations.

      In the same direction, in my view, go claims like 'local BM is an intrinsic trait' (L. 448), which is not only imprecise (maybe better 'mechanisms of processing of local BM cues') but also rather questionable. Likely, this 'local processing of BM' is a lower-level mechanism, located probably in early and mid-levels of the visual cortex, with a possible influence of lower structures. It seems not really plausible that this is related to a classical trait variable in the sense of psychology, like personality, as seems to be suggested here. Also here I suggest a much more moderate and less speculative interpretation of the results.

      We thank the reviewer for pointing out these issues. According to these comments, we have carefully revised the discussion to avoid strong (over-) claims. We have deleted the example of chicks and substituted more empirical studies to explain our results. We agree that the Local BM mechanism is probably located in subcortical regions in humans, as reported by some MRI studies (Chang et al., 2018; Hirai and Senju, 2020; Loula et al., 2005). We have added some evidence that atypical local BM processing may decrease visual inputs related to social information as follows:

      “According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 421 - 427)

      We have also deleted the claim that 'local BM is an intrinsic trait' (originally L. 448) and the related discussion, as it was not conclusive based on the current study.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate the reviewer’s positive feedback very much.

      Weaknesses:

      The manuscript has greatly improved in clarity and methodological considerations in response to the review. There are only a few minor points which deserve the authors' attention:

      When outlining the motivation for the current study, results from studies in ADHD and ASD are used too interchangeably. The authors use a lack of evidence for contributing (psychological/developmental) factors on BM processing in ASD to motivate the present study and refer to evidence for differences between typical and non-typical BM processing using studies in both ASD and ADHD. While there are certainly overlapping features between the two conditions/neurotypes, they are not to be considered identical and may have distinct etiologies; therefore, the distinction between the two should be made clearer.

      We thank the reviewer for pointing out this issue. We have removed some unnecessary citations about ASD and referred to studies about social cognition in ADHD to elaborate the motivation of this study:

      “Further exploration of a diverse range of social cognitions (e.g., biological motion perception) can provide a fresh perspective on the impaired social function observed in ADHD. Moreover, recent studies have indicated that the social cognition in ADHD may vary depending on different factors at the cognitive, pathological, or developmental levels, such as general cognitive impairment5, symptom severity8, or age5. Nevertheless, understanding how these factors relate to social cognitive dysfunction in ADHD is still in its infancy. Bridging this gap is crucial as it can help depict the developmental trajectory of social cognition and identify effective interventions for impaired social interaction in individuals with ADHD.” (lines 53 - 62)

      In the first/main analysis, it is unclear to me why in the revised manuscript the authors changed the statistical method from ANOVA/ANCOVA to independent samples t-tests (unless the latter were only used for post-hoc comparisons, then this needs to be stated). Furthermore, although p-values look robust, for this analysis too it should be indicated whether and how multiple comparison problems were accounted for.

      Thanks for the reviewer’s comments. According to the suggestions from reviewer #3, it may be inappropriate to treat gender as a covariate in ANOVA, as this may violate the assumptions of ANCOVA. To ensure that gender does not influence the results, we first separated boys and girls on the plots using differently coloured individual data points; there are no signs of a gender effect in the TD group. Second, we used t-tests to examine the difference between the TD and ADHD groups. Finally, we conducted a subsampling analysis with balanced data, and the results remained consistent.

      In part 1 of the results, we aimed to compare the task accuracies between the TD and ADHD groups in three independent tasks, which assess the participants’ abilities to process three types of BM cues. We hypothesized that individuals with ADHD would show poorer performance than TD individuals in all three tasks. Given this, we consider that correction for multiple comparisons may not be necessary.

      Reviewer #3 (Public Review):

      Strengths:

      The authors present differences between ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate the reviewer’s positive assessment of this work.

      Weaknesses:

      The data are not strong enough to support claims about differences between global and local processing wrt social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but the crucial tests of differences between correlations do not present a clear picture. Further empirical work would be needed to test the authors' claims. Specifics:

      The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. The supplementary materials demonstrate that tests of differences between correlations present an incomplete picture. Currently they have small samples for correlations, so this is unsurprising.

      Thanks for this comment. We agree with the reviewer that the relationship of local and global processing to social communication and age needs more empirical work. Based on our results, local and global BM processing have only possibly dissociable roles. Accumulated evidence from previous studies has converged on this dissociation, e.g., while global BM processing is susceptible to learning and practice, local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and the brain activations in response to local and global BM cues are different (Chang et al., 2018; Duarte et al., 2022). We concur with the reviewers that the evidence for such a dissociation from the current study by itself is not strong enough. Therefore, we have toned down this point and no longer emphasize the dissociation. Based on the current results, we focused our discussion on the different aspects of BM processing in children with and without ADHD. Future studies with larger sample sizes are needed to confirm this dissociable relationship.

      Theoretical assumptions. The authors make some statements about local vs global biological motion processing that should still be made more tentatively. They assume that local processing is genetically specified whereas global processing is a product of experience. These data in newborn chicks are controversial and confounded - I cannot remember the specifics but I think there is an upper vs lower visual field complexity difference here.

      We appreciate the reviewer’s suggestion. We agree that the relationship between genetic factors and BM perception remains to be further examined, as we didn’t perform any genetic analysis in the current study. Some speculative papers have been removed, as has the statement about newborn chicks, given the controversial and confounded results. We have toned down our claims and provided a moderate interpretation of the results:

      “Sensitivity to local BM cues emerges early in life54,55 and involves rapid processing in the subcortical regions16,56-58. As a basic pre-attentive feature23, local BM cues can guide visual attention spontaneously59,60. In contrast, the ability to process global BM cues is related to slow cortical BM processing and is influenced by many factors such as attention25,26 and visual experience21,51. As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 413 - 427)

      “Few developmental studies have been conducted on local BM processing. The ability to process local BM cues remained stable and did not exhibit a learning trend21,25. A reasonable interpretation may be that local BM processing is a low-level mechanism, probably performed by the primary visual cortex and subcortical regions such as the superior colliculus, pulvinar, and ventral lateral nucleus14,56,61.” (lines 441- 446)

      Readability. The manuscript needs very careful proofreading and correction for grammar. There are grammatical errors throughout.

      We thank the reviewer for this feedback. We have performed thorough proofreading and corrected grammatical errors throughout the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I thank the authors for their revisions that address several of the minor points that I raised in my last review. A number of requests are still not sufficiently answered:

      L. 290 ff.: The model 'BM-local = age + gender etc ' is pretty sloppy notation. I think what is meant is that a GLM was used with the predictors gender etc., each multiplied by an appropriate beta_i value. These formulas should be corrected, or one should simply say that a GLM was run with the predictors gender

      The same criticism applies to these other models that follow.

      This was corrected.

      However, the corrected text remains sloppy: example: 'BM-local = ...' What exactly is 'BM-Local', the accuracy? etc. Here a precise notation should be given that clearly names which variables are used here as predictors and target variables.

      We appreciate the reviewer’s suggestion. We clarified which variables are used in our model and gave them precise notations:

      “Three linear models were built to investigate the contributing factors: (a) ACClocal = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention, (b) ACCglobal = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention, and (c) ACCgeneral = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention + β5 * ACClocal + β6 * ACCglobal. ACClocal, ACCglobal and ACCgeneral refer to the response accuracies of the three tasks in the ADHD group, and QbInattention is the standardised score for sustained attention function.” (lines 337 - 343)

      All these models assume linearity of the combination of the predictors. was this assumption verified?

      We referred to previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), have a linear relationship with BM processing ability.

      This answer is insufficient and not convincing. Because a variable Y depends linearly on predictor A and B in some other study, this does not imply that it is also linear in predictor C, or does not show interactions with such predictors in the present study.

      What is needed here is the testing of models with interaction terms and verifying that such models are not better predictors. If authors do not want to do this, they need at least to clearly point out that they made the strong assumption of linearity of their model, which might be wrong and thus be a substantial limitation of their analysis.

      Thanks for the suggestion. We compared each model with and without the relevant interaction terms. The results showed that the change in the coefficient of determination (R-squared, R2) between the two models was not statistically significant.
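
      A check of this kind can be sketched as a nested-model comparison: fit the additive model, fit the same model with an added interaction term, and compute the F statistic for the change in R2. The sketch below is illustrative only. The data are synthetic, the variable names (`age`, `fiq`, `acc`) and the single age-by-FIQ interaction are assumptions rather than the authors' actual analysis, and the least-squares fit uses a bare-bones normal-equations solver instead of a statistics package.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(X, y):
    """Fit y = X beta by ordinary least squares (normal equations) and return R^2."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def f_change(r2_reduced, r2_full, n, k_full, q):
    """F statistic for q added predictors; k_full counts predictors without the intercept."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - k_full - 1))

# Synthetic, purely additive data (hypothetical values, not the study's data).
rng = random.Random(0)
n = 40
age = [rng.uniform(6, 13) for _ in range(n)]
fiq = [rng.uniform(80, 130) for _ in range(n)]
acc = [0.2 + 0.03 * a + 0.002 * f + rng.gauss(0, 0.05) for a, f in zip(age, fiq)]

# Centre the predictors before forming the product to keep the fit well conditioned.
age_c = [a - sum(age) / n for a in age]
fiq_c = [f - sum(fiq) / n for f in fiq]

X_reduced = [[1.0, a, f] for a, f in zip(age_c, fiq_c)]
X_full = [[1.0, a, f, a * f] for a, f in zip(age_c, fiq_c)]

r2_reduced = r_squared(X_reduced, acc)
r2_full = r_squared(X_full, acc)
f_stat = f_change(r2_reduced, r2_full, n, k_full=3, q=1)
print(r2_reduced, r2_full, f_stat)
```

      The F statistic would then be compared against an F distribution with q and n - k_full - 1 degrees of freedom. A non-significant change supports the additive model for that interaction only, not linearity in general.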

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the ADHD group. Does the same observation also apply to the controls?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Was such a path analysis also done for the TD subjects or not? If yes, was then also predicted that the variable BM-Global largely and directly influences the variable BM-General? (The answer refers to the general discussion section, where no such analysis is presented, as far as I understand.)

      Thank you for your comment. We also conducted a path analysis in the TD group, similar to that in the ADHD group. There was no statistically significant mediator effect in the TD group. Please see Figure S3 for complete statistics.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data analyzed during the study is available at https://osf.io/37p5s/.

      (2) Lines 119-115: The differences observed in ADHD participants in the studies referenced here were relative to what group? The last sentence here also refers to two groups, and it is difficult to gather which specific groups are meant, also because the two references relate to both ADHD and ASD samples. Please clarify.

      The suggestion is well taken. We have clarified the expressions accordingly:

      “Specifically, compared with the typically developing (TD) group, children with ADHD showed reduced activity of motion-sensitive components (N200) while watching biological and scrambled motions, although no behavioural differences were observed. Another study found that children with ADHD performed worse in BM detection with moderate noise ratios than the TD group32.” (lines 100 - 105)

      (3) Line 116: I'm not sure what is meant by 'despite initial indications' - please briefly specify/summarise here why the investigation into BM processing in ADHD is warranted.

      We thank the reviewer for pointing out this issue. We rephrased this part and briefly specified “why the investigation into BM processing in ADHD is warranted”:

      “Despite initial findings about atypical BM perception in ADHD, previous studies on ADHD treated BM perception as a single entity, which may have led to misleading or inconsistent findings28. Hence, it is essential to deconstruct BM processing into multiple components and motion features.” (lines 108 -111)

      (4) Lines 290-293: Please complete the sentence.

      We thank the reviewer for pointing out this issue. The sentence has been completed:

      “For Tasks 2 and 3, where children were asked to detect the presence or discriminate the facing direction of the target walker, the TD group had higher accuracies than the ADHD group (Task 2 - TD: 0.70 ± 0.12, ADHD: 0.59 ± 0.12, t(73) = 3.677, p < 0.001, Cohen's d = 0.861; Task 3 - TD: 0.79 ± 0.12, ADHD: 0.63 ± 0.17, t(73) = 4.702, p < 0.001, Cohen's d = 1.100).” (lines 284 - 288)
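
      As a rough plausibility check on effect sizes of this kind, Cohen's d can be approximated from the reported means and standard deviations. The sketch below uses only the rounded summary values quoted above and assumes equal group sizes when pooling the SDs (the group sizes are not given in this response), so it is not expected to reproduce the reported values exactly.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD, assuming equal group sizes (an assumption here)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd

# Rounded values quoted in the response (TD first, ADHD second).
d_task2 = cohens_d(0.70, 0.12, 0.59, 0.12)
d_task3 = cohens_d(0.79, 0.12, 0.63, 0.17)
print(round(d_task2, 2), round(d_task3, 2))
```

      With these rounded inputs the approximation gives roughly 0.92 and 1.09, the same order as the reported 0.861 and 1.100. Differences of this size are plausible given the rounding of the means and SDs and the equal-group-size assumption.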

      Reviewer #3 (Recommendations For The Authors):

      (1) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors need to reword throughout to reflect that the tests of differences between these crucial correlations did not present a clear picture.

      We have reworded throughout the paper to reflect the inconclusiveness with regard to the relationship between local and global processing and social communication based on this study alone. Future studies with larger sample sizes are needed to confirm this conclusion. The mechanism for this dissociable relationship should be validated by more psychological tests in future studies.

      (2) I would again tone down the discussion of genetic specification of local processing, given it is highly controversial.

      We thank the reviewer for pointing out the issue. We agree that the point about the genetic specification of local processing remains controversial. The interpretation of results about local BM processing has been rephrased. Please refer to our response to point #2 above.

      (3) The manuscript needs very careful proofreading and grammatical correction throughout.

      Thanks for the suggestion to check the grammar. We have carefully proofread the manuscript to correct grammatical errors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Following synaptic vesicle fusion events at release sites, vesicle remnants will need to be cleared in order to allow new rounds of vesicle docking and fusion. This fundamental study of Mahapatra and Takahashi examines the role of release site clearance in synaptic transmission during repetitive activity in two types of central synapses, the giant calyx of Held and hippocampal CA1 synapses. The study uses pharmacological approaches to interfere with release site clearance by blocking membrane retrieval (endocytosis). They compare the effects on short-term plasticity with those obtained by pharmacologically inhibiting scaffold protein activity. The data presented make a compelling case for fast endocytosis as necessary for rapid site clearance and vesicle recruitment to active zones. The data reveal an unexpected, fast role for local site clearance in counteracting synaptic depression.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions in two types of central synapses, calyx of Held and hippocampal CA1 synapses. After acute block of endocytosis by pharmacology, deeper synaptic depression or less facilitation was observed in two types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of the site clearance in counteracting synaptic depression.

      Strengths:

      The study uses acute block of the molecular targets with pharmacology together with precise electrophysiology. The experimental results are clear cut and convincing. The study also examines the physiological roles of the site clearance using action potential-evoked transmission at physiological Ca and physiological temperature in mature animals. This condition has not been examined.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated and the authors have tried several reagents to verify the overall conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or the cortical actin scaffold, both in the calyx of Held synapse and in hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice comprehensive and comparative study, which reveals some interesting differences between a fast synapse (Calyx of Held) tuned to reliably transmit at several 100 Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the Calyx, but not in the hippocampal synapse. This striking difference between both preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We dissected the latrunculin effect further by referring to the related literature within the scope of this study in the revised Discussion section (last paragraph).

      Reviewer #3 (Public Review):

      The manuscript by Mahapatra and Takahashi addresses the role of presynaptic release site clearance during sustained synaptic activity. The authors characterize the effects of pharmacologically interfering with SV endocytosis (pre-incubation with Dynasore or Pitstop-2) on synaptic short-term plasticity (STP) at two different CNS synapses (calyx of Held synapses and hippocampal SC to CA1 synapses) using patch-clamp recordings in acute slices under experimental conditions designed to closely mimic a physiological situation (37°C and 1.3 mM external [Ca2+]). Endocytosis blocker-induced changes in STP and in the recovery from short-term depression (STD) are compared to those seen after pharmacologically inhibiting actin filament assembly (pre-incubation with Latrunculin-B or the selective Cdc42 GTPase inhibitor ML-141). Presynaptic capacitance (Cm) recordings in calyx terminals were used to establish the effects of the pharmacological maneuvers on SV endocytosis.

      Latrunculin-B and ML-141 affect neither SV endocytosis (assayed by Cm recordings) nor EPSC recovery following conditioning trains, but strongly enhance STD at calyx synapses. No changes in STP were observed at Latrunculin-B- or ML-141-treated SC to CA1 synapses.

      Dynasore and Pitstop-2 slow down endocytosis, limit the total amount of exocytosis in response to long stimuli, enhance STD in response to 100 Hz stimulation, but profoundly accelerate EPSC recovery following conditioning 100 Hz trains at calyx synapses. At SC to CA1 synapses, Dynasore and Pitstop-2 reduce the extent of facilitation and lower relative steady-state EPSCs, suggesting a change in the facilitation-depression balance in favor of the latter.

      The authors use state-of-the-art techniques, and their data, which are clearly presented, lead the authors to conclude that endocytosis is universally important for clearance of release sites while the importance of scaffold protein-mediated site clearance is limited to 'fast synapses'.

      Unfortunately, and perhaps not completely unexpected in view of the pharmacological tools chosen, there are several observations which remain difficult to understand:

      (1) Blocking site clearance affects release sites that have previously been used, i.e. sites at which SV fusion has occurred and which therefore need to be cleared. Calyces use at most 20% of all release sites during a single AP, likely fewer at 1.3 mM external [Ca2+]. Even if all those 20% of release sites become completely unavailable due to a block of release site clearance, the 2nd EPSC in a train should not be reduced by >20% because ~80% of the sites cannot be affected. However, ~50% EPSC reduction was observed (Fig. 2B1, lower right panel) raising the possibility that Dynasore does more than specifically interfering with SV endocytosis (and possibly Pitstop as well). Non-specific effects are also suggested by the observed two-fold increase in initial EPSC size in SC to CA1 synapses after Dynasore pre-incubation.

      This study compares different experimental conditions to draw conclusions about the physiological role of endocytosis in rapid neurotransmission at the large calyceal synapse in mice. A related study at the Drosophila neuromuscular junction (Kawasaki et al., Nat. Neuroscience 2000) reported similar findings in comparable experimental settings (physiological conditions and acute block of endocytosis).

      (2) More severe depression was observed at calyx synapses after blocking endocytosis which the authors attribute to a presynaptic mechanism affecting pool replenishment. When probing EPSC recovery after conditioning 100 Hz trains, a speed up was observed mediated by an "unknown mechanism" which is "masked in 2 mM [Ca2+]". These two observations, deeper synaptic depression during 100 Hz but faster recovery from depression following 100 Hz, are difficult to align and no attempt was made to find an explanation.

      By varying temperature (PT vs RT), calcium concentration (1.3 mM vs 2.0 mM), and stimulation frequency (10, 100, and 200 Hz; some data are not shown), the effects of endocytosis block on EPSC STD and on the kinetics of recovery from STD at the post-hearing calyx were compared in these settings: (PT, 1.3 mM [Ca2+]), (PT, 2.0 mM [Ca2+]), and (RT, 2.0 mM [Ca2+]), to dissect their respective roles.

      (3) To reconcile previous data reporting a block of Ca2+-dependent recovery (CDR) by Dynasore or Latrunculin (measured at 2 mM external [Ca2+]) with the data presented here (using 1.3 mM external [Ca2+]) reporting no effect or a speed up of recovery from depression, the authors postulate that "CDR may operate only when excessive Ca2+ enters during massive presynaptic activation" (page 10 line 244). While that is possible, such explanation ignores plenty of calyx studies demonstrating fiber stimulation-induced CDR and elucidating molecular pathways mediating fiber stimulation-induced CDR, and it also completely dismisses the strong change in recovery time course after 10 Hz conditioning (single exponential) as compared to 100 Hz conditioning (double exponential with a pronounced fast component).

      Strong presynaptic stimuli such as those illustrated in Figs. 1B,C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Documentation of the corresponding conductance traces is therefore advisable for such massive Cm jumps and merely mentioning that the first 450 ms after stimulation were skipped during analysis or referring to previous publications showing conductance traces is insufficient.
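
      The reviewer's estimate follows from simple arithmetic, sketched here under the assumptions stated above (80 aF per SV and a mean terminal capacitance of about 16 pF). The function and variable names are illustrative only.

```python
AF_PER_PF = 1_000_000  # 1 pF = 10^6 aF

def vesicles_fused(delta_cm_pf, sv_capacitance_af=80):
    """Number of SVs whose fusion would account for a capacitance jump."""
    return delta_cm_pf * AF_PER_PF / sv_capacitance_af

def relative_membrane_increase(delta_cm_pf, terminal_cm_pf=16.0):
    """Fractional increase in terminal membrane, taking capacitance as proportional to area."""
    return delta_cm_pf / terminal_cm_pf

# Capacitance jumps of 2 and 2.5 pF, as quoted by the reviewer.
for delta in (2.0, 2.5):
    print(delta, vesicles_fused(delta), relative_membrane_increase(delta))
```

      For jumps of 2 and 2.5 pF this gives 25,000 and 31,250 SVs and increases of 12.5% and about 15.6%, in line with the rounded figures quoted by the reviewer.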

      All bar graphs in Figures 1 through 6 and Figures S3 through S6 compare three or even four (Fig. 5C) conditions, i.e. one control and at least two treatment data sets. It appears as if repeated t-tests were used to run multiple two-group comparisons (i.e. using the same control data twice for two different comparisons). Either a proper multiple comparison test should be used or a Bonferroni correction or similar multiple-comparison correction needs to be applied.

      We updated the statistical analysis of all data using one-way ANOVA and t-tests with the Bonferroni-Holm method of p-level correction and rectified one analysis in Fig 1 and 3; all major conclusions are unchanged.
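
      For reference, the Holm (Bonferroni-Holm) step-down correction named here can be sketched in a few lines. This is a generic illustration of the procedure, not the authors' analysis code, and the p-values in the example are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for two treatments compared against one shared control.
raw = [0.030, 0.020]
print(holm_adjust(raw))
```

      Each adjusted value is then compared with the chosen significance level. Unlike a plain Bonferroni correction, only the smallest p-value is multiplied by the full number of comparisons.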

      Finally, the terminology of contrasting "fast-signaling" (calyx synapses) and "slow-plastic" (SC synapses) synapses seems to imply that calyx synapses lack plasticity, as does the wording "conventional bouton-type synapses involved in synaptic plasticity" (page 11, line 251). I assume, the authors primarily refer to the maximum frequencies these two synapse types typically transmit (fast-signaling vs slow-signaling)?

      The properties of these two synapses are described explicitly in the updated text, and they have been renamed fast and slow synapses.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be done sometimes here.

      In this revision, we described them more explicitly.

      The data presented in Fig. S6 are detached from the rest of the manuscript, not relevant and should be removed. page 4 line 95 "... to ensure sufficient Ca2+ currents to induce exo-endocytosis." ICa is large enough to induce exocytosis also at 1.3 mM Ca2+. Please clarify.

      We updated the relevant section.

      page 5, line 108 "... this slow endocytosis showed a strongly prolonged time course without accompanied by the change of Cm or presynaptic Ca2+ currents" Please fix.

      Fixed.

      page 5, line 121 "Thus, at calyces of Held, bath-application of Dynasore or Pitstop-2 can block both fast and slow endocytosis without perturbing presynaptic intracellular milieu." Bath-application never perturbs the intracellular milieu. Please clarify.

      Rephrased.

      page 6 line 128 "... physiological aCSF" is a misnomer (= physiological artificial CSF). Please fix.

      In the introduction section, it is clearly described.

      page 11, line 252 "... from hippocampal SC-CA1 pyramidal neurons" There are no "SC-CA1 pyramidal neurons". Please fix.

      Fixed.

      page 12, line 285 "In acute slices optimized to physiological conditions" The conditions are optimized, not the slices. Please fix.

      Fixed.

      page 14, line 323 same as above

      Fixed.

      page 14, line 330 LTP at SC-CA1 synapses is postsynaptic. Please clarify.

      Rephrased.

      page 16, line 381 "had a series resistance of 3-4 MOhm" versus

      page 17, line 408 "The patch pipettes had a series resistance of 5-15 MOhm (less than 10 MOhm in most cells)" 3-4 is perhaps pipette resistance while 5-15 is perhaps series resistance? Please clarify.

      Fixed.

      page 17, line 398 "Cm traces were averaged at every 10 ms (for 10 Hz train stimulation) or 20 ms (for 5 ms single or 1 Hz train stimulation)." Do you mean to say that Cm traces were smoothed with a moving average using a window size of 10 or 20 ms duration? Please clarify.

      Rephrased to clarify better.
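
      The two readings of "averaged at every 10 ms" that the reviewer distinguishes differ in a simple way, sketched below. Which one the authors used is not stated in this response; the sample values and the sampling rate of one sample per ms are assumptions for illustration.

```python
def block_average(trace, window):
    """Average non-overlapping blocks of `window` samples (downsamples the trace)."""
    return [sum(trace[i:i + window]) / window
            for i in range(0, len(trace) - window + 1, window)]

def moving_average(trace, window):
    """Average a sliding window of `window` samples (one value per window position)."""
    return [sum(trace[i:i + window]) / window
            for i in range(len(trace) - window + 1)]

# Hypothetical Cm samples (arbitrary units) at an assumed 1 sample per ms.
trace = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(block_average(trace, 2))
print(moving_average(trace, 2))
```

      Block averaging reduces the number of points, whereas a moving average keeps nearly the original sampling density and smooths across neighbouring points, so the two choices can affect fitted time courses differently.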

      page 18, "All values are given as mean ± SEM and significance of difference was evaluated by Student's unpaired t-test, unless otherwise noted." Please check. You cannot simply use repeated t-tests for multiple comparisons. Either a proper multiple comparison test should be used or a Bonferroni correction or similar multiple-comparison correction needs to be applied.

      All statistical analyses were updated using one-way ANOVA and t-tests with the Bonferroni-Holm method of p-level correction, and one analysis was rectified in Fig 1 and 3, with no change in major conclusions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to test the sensory recruitment theory of visual memory, which assumes that visual sensory areas are recruited for working memory, and that these sensory areas represent visual memories in a similar fashion to how perceptual inputs are represented. To test the overlap between working memory (WM) and perception, the authors use coarse stimulus (aperture) biases that are known to account for (some) orientation decoding in the visual cortex (i.e., stimulus energy is higher for parts of an image where a grating orientation is perpendicular to an aperture edge, and stimulus energy drives decoding). Specifically, the authors show gratings (with a given "carrier" orientation) behind two different apertures: one is a radial modulator (with maximal energy aligned with the carrier orientation) and the other an angular modulator (with maximal energy orthogonal to the carrier orientation). When the subject detects contrast changes in these stimuli (the perceptual task), orientation decoding only works when training and testing within each modulator, but not across modulators, showing the impact of stimulus energy on decoding performance. Instead, when subjects remember the orientation over a 12s delay, orientation decoding works irrespective of the modulator used. The authors conclude that representations during WM are therefore not "sensory-like", given that they are immune to aperture biases. This invalidates the sensory recruitment hypothesis, or at least the part assuming that when sensory areas are recruited during WM, they are recruited in a manner that resembles how these areas are used during perception.

      Strengths:

      Duan and Curtis very convincingly show that aperture effects that are present during perception, do not appear to be present during the working memory delay. Especially when the debate about "why can we decode orientations from human visual cortex" was in full swing, many may have quietly assumed this to be true (e.g., "the memory delay has no stimuli, and ergo no stimulus aperture effects"), but it is definitely not self-evident and nobody ever thought to test it directly until now. In addition to the clear absence of aperture effects during the delay, Duan and Curtis also show that when stimulus energy aligns with the carrier orientation, cross-generalization between perception and memory does work (which could explain why perception-to-memory cross-decoding also works). All in all, this is a clever manipulation, and I'm glad someone did it, and did it well.

      Weaknesses:

      There seems to be a major possible confound, spatial attention, that prohibits strong conclusions about "abstraction" into a "line-like" representation. What if subjects simply attend to the endpoints of the carrier grating, or attend to the edge of the screen where the carrier orientation "intersects" in order to do the task? This may also result in reconstructions that have higher BOLD at areas close to the stimulus/screen edges along the carrier orientation. The question then would be if this is truly an "abstracted representation", or if subjects are merely using spatial attention to do the task.

      Alternatively (and this reaches back to the "fine vs coarse" debate), another argument could be that during memory, what we are decoding is indeed fine-scale inhomogeneous sampling of orientation preferences across many voxels. This is clearly not the most convincing argument, as the spatial reconstructions (e.g., Figure 3A and C) show higher BOLD for voxels with receptive fields that are aligned to the remembered orientation (which is in itself a form of coarse-scale bias), but could still play a role.

      To conclude that the spatial reconstruction from the data indeed comes from a line-like representation, you'd need to generate modeled reconstructions of all possible stimuli and representations. Yes, Figure 4 shows that a line results in a modeled spatial map that resembles the WM data, but many other stimuli might too, and some may better match the data. For example, the alternative hypothesis (attention to grating endpoints) may very well lead to a very comparable model output to the one from a line. However, testing this would not suffice, as there may be an inherent inverse problem (with multiple stimuli that can lead to the same visual field model).

      The main conclusion, and title of the paper, that visual working memories are abstractions of percepts, is therefore not supported. Subjects could be using spatial attention, for example. Furthermore, even if it is true that gratings are abstracted into lines, this form of abstraction would not generalize to any non-spatial feature (e.g., color cannot become a line, contrast cannot become a line, etc.), which means it has limited explanatory power.

      We thank the reviewer for bringing up these excellent questions.

      First, to test the alternative hypothesis of spatial attention, we fed a dot image into the image-computable model. We placed the dot where we suspect one might place their spatial attention, namely, at the edge of the stimulus that is tangent to the orientation of the grating. We generated the model response for three orientations and their combination by rotating and averaging. From Author response image 1 below, one can see that this model does not match the line-like representation we reported. Nonetheless, we would like to avoid making the argument that attention does not play a role. We strongly suspect that if one was attending to multiple places along a path that makes up a line, it would produce the results we observed. But there begins a circularity in the logic, where one cannot distinguish between attention to a line-like representation and a line of attention being the line-like representation.

      Author response image 1.

      Reconstruction maps for the dot image at the edge of 15°, 75°, 135°, and the combined across three orientation conditions.

      Second, we remain agnostic to the question of whether fine-scale inhomogeneous sampling of orientation-selective neurons may drive some of the decoding results we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.

      Finally, we agree with the reviewer that there is much more work to be done in this area. Our working hypothesis, that WM representations are abstractions of percepts, is admittedly based on Occam's razor and an appeal to efficient coding principles. We also agree that these results may not generalize to all forms of WM (eg, color). As always, there is a tradeoff between interpretability (visual spatial formats in retinotopically organized maps) and generalizability. Frankly, we have no idea how one might be able to test these ideas when subjects might be using the most common type of memory reformatting - linguistic representations, which are incredibly efficient.

      Additional context:

      The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it), but attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5s during memory encoding, and 2.5s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. It allows the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      Again, this is an excellent question. We used a separate perceptual task instead of the stimulus epoch as control mainly for two reasons. First, we used a control task in which participants had to process the contrast, not orientation, of the grating because we were concerned that participants would reformat the grating into a line-like representation to make the judgments. To avoid this, we used a task similar to the one used when previous researchers first found the stimulus vignetting effect (Roth et al., 2018). Again, our main goal was to try to focus on the bottom-up visual features. Second, because of the sluggishness of the BOLD response, combined with our task design (ie, memory delay always followed the target stimulus), we cannot disentangle the visual and memory responses that co-exist at this epoch. Any result could be misleading.

      What's also interesting is what happens in the passive perceptual condition, and the fact that spatial reconstructions for areas beyond V1 and V2 (i.e., V3, V3AB, and IPS0-1) align with (implied) grating endpoints, even when an angular modulator is used (Figure 3C). Are these areas also "abstracting" the stimulus (in a line-like format)?

      We agree these findings are interesting and replicate what we found in our previous paper (Kwak & Curtis, Neuron, 2022). We believe that these results do imply that these areas indeed store a reformatted line-like WM representation that is not biased by vignetting. We would like to extend a note of caution, however, because the decoding results in the higher order areas (V3AB, IPS0-1, etc) are somewhat poor (especially in comparison to V1, V2, V3) (see Figure 2).

      Reviewer #2:

      Summary:

      According to the sensory recruitment model, the contents of working memory (WM) are maintained by activity in the same sensory cortical regions responsible for processing perceptual inputs. A strong version of the sensory recruitment model predicts that stimulus-specific activity patterns measured in sensory brain areas during WM storage should be identical to those measured during perceptual processing. Previous research casts doubt on this hypothesis, but little is known about how stimulus-specific activity patterns during perception and memory differ. Through clever experimental design and rigorous analyses, Duan & Curtis convincingly demonstrate that stimulus-specific representations of remembered items are highly abstracted versions of representations measured during perceptual processing and that these abstracted representations are immune to aperture biases that contribute to fMRI feature decoding. The paper provides converging evidence that neural states responsible for representing information during perception and WM are fundamentally different, and provides a potential explanation for this difference.

      Strengths:

      (1) The generation of stimuli with matching vs. orthogonal orientations and aperture biases is clever and sets up a straightforward test regarding whether and how aperture biases contribute to orientation decoding during perception and WM. The demonstration that orientation decoding during perception is driven primarily by aperture bias while during WM it is driven primarily by orientation is compelling.

      (2) The paper suggests a reason why orientation decoding during WM might be immune to aperture biases: by weighting multivoxel patterns measured during WM storage by spatial population receptive field estimates from a different task, the authors show that remembered (but not actively viewed) orientations form "line-like" patterns in retinotopic cortical space.

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      (1) The paper tests a strong version of the sensory recruitment model, where neural states representing information during WM are presumed to be identical to neural states representing the same information during perceptual processing. As the paper acknowledges, there is already ample reason to doubt this prediction (see, e.g., earlier work by Kok & de Lange, Curr Biol 2014; Bloem et al., Psych Sci, 2018; Rademaker et al., Nat Neurosci, 2019; among others). Still, the demonstration that orientation decoding during WM is immune to aperture biases known to drive orientation decoding during perception makes for a compelling demonstration.

      We agree with the reviewer, and would add that the main problem with the sensory recruitment model of WM is that it remains underspecified. The work cited above and in our paper, and the results in this report, are only the beginning of efforts to fully detail what it means to recruit sensory mechanisms for memory.

      (2) Earlier work by the same group has reported line-like representations of orientations during memory storage but not during perception (e.g., Kwak & Curtis, Neuron, 2022). It's nice to see that result replicated during explicit perceptual and WM tasks in the current study, but I question whether the findings provide fundamental new insights into the neural bases of WM. That would require a model or explanation describing how stimulus-specific activation patterns measured during perception are transformed into the "line-like" patterns seen during WM, which the authors acknowledge is an important goal for future research.

      We agree with the reviewer that perhaps some might see the current results as an incremental step given our previous paper. However, we would point out that researchers have been decoding memorized orientation from the early visual cortex for 15 years, and not one of those highly impactful studies had ever done what we did here, which was to test if decoded WM representations are the product of aperture biases. Not only do our results indicate that decoding memorized orientation is immune to these biases, but they critically suggest a reason why one can decode orientation during WM.

      Reviewer #3:

      Summary:

      In this work, Duan and Curtis addressed an important issue related to the nature of working memory representations. This work is motivated by findings illustrating that orientation decoding performance for perceptual representations can be biased by the stimulus aperture (modulator). Here, the authors examined whether the decoding performance for working memory representations is similarly influenced by these aperture biases. The results provide convincing evidence that working memory representations have a different representational structure, as the decoding performance was not influenced by the type of stimulus aperture.

      Strengths:

      The strength of this work lies in the direct comparison of decoding performance for perceptual representations with working memory representations. The authors take a well-motivated approach and illustrate that perceptual and working memory representations do not share a similar representational structure. The authors test a clear question, with a rigorous approach and provide convincing evidence. First, the presented oriented stimuli are carefully manipulated to create orthogonal biases introduced by the stimulus aperture (radial or angular modulator), regardless of the stimulus carrier orientation. Second, the authors implement advanced methods to decode the orientation information present, in visual and parietal cortical regions, when directly perceiving or holding an oriented stimulus in memory. The data illustrates that working memory decoding is not influenced by the type of aperture, while this is the case in perception. In sum, the main claims are important and shed light on the nature of working memory representations.

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      I have a few minor concerns that, although they don't affect the main conclusion of the paper, should still be addressed.

      (1) Theoretical framing in the introduction: Recent work has shown that decoding of orientation during perception does reflect orientation selectivity, and it is not only driven by the stimulus aperture (Roth, Kay & Merriam, 2022).

      Excellent point, and similar to the point made by Reviewer 1. We now adjust our text and cite the paper in the Introduction.

      Below, we paste our response to Reviewer 1:

      “Second, we remain agnostic to the question of whether fine-scale inhomogeneous sampling of orientation-selective neurons may drive some of the decoding we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.”

      (2) Figure 1C illustrates the principle of how the radial and angular modulators bias the contrast energy extracted by the V1 model, which in turn would influence orientation decoding. It would be informative if the carrier orientations used in the experiment were shown in this figure or, at a minimum, if the legend mentioned that the experiment used 3 carrier orientations (15°, 75°, 135° clockwise from vertical). Relatedly, when trying to find more information regarding the carrier orientation, the 'Stimuli' section of the Methods incorrectly mentions that 180 orientations are used as the carrier orientation.

      We apologize for not clearly indicating the stimulus features in the figure. We have now added the information about the target orientations to the Figure 1C legend. We have also corrected the mistakes in the Methods section regarding the carrier orientation and the details of the task. Briefly, participants were asked to use a continuous report over 180 orientations. We now clarify that “We generated 180 orientations for the carrier grating to cover the whole orientation space during the continuous report task.”

      (3) The description of the image computable V1 model in the Methods is incomplete, and at times inaccurate. i) The model implements 6 orientation channels, which is inaccurately referred to as a bandwidth of 60° (should be 180°/6 = 30°). ii) The steerable pyramid combines information across phase pairs to obtain a measure of contrast energy for a given stimulus. Here, it is only mentioned that the model contains different orientation and spatial scale channels. I assume there were also 2 phase pairs, and they were combined in some manner (squared and summed to create contrast energy). Currently, it is unclear what the model output represents. iii) The spatial scale channel with the maximal response differences between the 2 modulators was chosen as the final model output. What spatial frequency does this channel refer to, and how does this spatial frequency relate to the stimulus?

      (i) First, we thank the reviewer for pointing out this mistake, since the range of orientations should be 180° instead of 360°. We corrected this in the revised version.

      (ii) Second, we apologize for not being clear. In the second paragraph of the “Simulate model outputs” section, we wrote,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), which had two kinds of phases for both the carriers and the modulators. We first generated the model’s responses to each target image separately, then averaged the model responses across all phases for each orientation condition.”

      We have corrected this text by now writing,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), two phases for the carrier (0 or π), and two phases for the modulator (sine or cosine phase). We first generated the model responses to each phase condition separately, then averaged them across all phases for each orientation condition.”

      (iii) Third, and again, we apologize for the misunderstanding. Since both modulated gratings have the same spatial frequency, the channel with the largest response should correspond to the spatial frequency of the stimulus. We corrected this by now writing,

      “For the final predicted responses, we chose the subband with maximal responses (the 9th level), which corresponds to the spatial frequency of the stimulus (Roth, Heeger, and Merriam 2018).”
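
      As a rough illustration of the "squared and summed" contrast-energy computation the reviewer describes in point (ii), the following Python sketch applies a 1-D quadrature pair of Gabor filters to a grating. It is not the steerable pyramid used in the paper: the filter shape, the parameter values, and the names `gabor_pair_response`, `contrast_energy`, `grating`, and `channel_spacing_deg` are all assumptions made for illustration. The last helper only restates the arithmetic from point (i), namely that 6 channels spanning 180° are spaced 30° apart.

```python
import math


def gabor_pair_response(signal, freq, sigma):
    """Return (even, odd) responses of a quadrature Gabor pair centred on the signal."""
    centre = (len(signal) - 1) / 2.0
    even = 0.0
    odd = 0.0
    for i, value in enumerate(signal):
        x = i - centre
        envelope = math.exp(-(x * x) / (2.0 * sigma * sigma))
        even += value * envelope * math.cos(2.0 * math.pi * freq * x)  # cosine-phase filter
        odd += value * envelope * math.sin(2.0 * math.pi * freq * x)   # sine-phase filter
    return even, odd


def contrast_energy(signal, freq, sigma):
    """Square and sum the quadrature pair: approximately invariant to grating phase."""
    even, odd = gabor_pair_response(signal, freq, sigma)
    return even * even + odd * odd


def grating(n, freq, phase, contrast=1.0):
    """1-D sinusoidal grating (hypothetical stimulus, sampled at integer positions)."""
    centre = (n - 1) / 2.0
    return [contrast * math.cos(2.0 * math.pi * freq * (i - centre) + phase)
            for i in range(n)]


def channel_spacing_deg(orientation_range_deg, n_channels):
    """Spacing between evenly distributed orientation channels."""
    return orientation_range_deg / n_channels
```

      In this toy setting the energy is nearly the same for carrier phases 0 and π, which is why averaging model responses across phases, as in the corrected Methods text, should leave the orientation-dependent energy pattern intact.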

      (4) It is not clear from the Methods how the difficulty in the perceptual control task was controlled. How were the levels of task difficulty created?

      Apologies for not being clear. Task difficulty was set by the contrast difference between the two stimuli. The easiest level pairs the first and the last contrasts in the set, while the hardest level pairs two consecutive contrasts. We added these sentences:

      “The contrast for each stimulus was generated from a predefined set of 20 contrasts uniformly distributed between 0.5 and 1.0 (0.025 step size). We created 19 levels of task difficulty based on the contrast distance between the two stimuli. Thus, the difficulty ranged from choosing contrast pairs with the largest difference (0.5, easiest) to contrast pairs with the smallest difference (0.025, hardest). Task difficulty level changed based on an adaptive, 1-up-2-down staircase procedure (Levitt 1971) to maintain performance at approximately 70% correct.”
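
      The adaptive rule in the quoted Methods text can be sketched in Python as follows. This is a minimal illustration, not the authors' experiment code: the class name `OneUpTwoDownStaircase`, the starting level, and the convention that level 0 is easiest and level 18 is hardest are assumptions. A rule of this kind (harder after two consecutive correct responses, easier after any error) converges on roughly 70.7% correct (Levitt 1971), consistent with the "approximately 70%" target.

```python
class OneUpTwoDownStaircase:
    """Adaptive staircase over discrete difficulty levels (0 = easiest)."""

    def __init__(self, n_levels=19, start_level=0):
        if not 0 <= start_level < n_levels:
            raise ValueError("start_level must lie within the available levels")
        self.n_levels = n_levels
        self.level = start_level
        self._correct_streak = 0

    def update(self, correct):
        """Register one trial outcome and return the level for the next trial."""
        if correct:
            self._correct_streak += 1
            if self._correct_streak == 2:
                # Two correct responses in a row: make the task harder.
                self.level = min(self.level + 1, self.n_levels - 1)
                self._correct_streak = 0
        else:
            # Any error: make the task easier and restart the streak.
            self.level = max(self.level - 1, 0)
            self._correct_streak = 0
        return self.level
```

      How each level maps onto a specific pair of contrasts is left out here, because the exact contrast set is defined in the authors' Methods rather than in this sketch.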

      Recommendations For The Authors

      (Reviewer #1):

      (1) If the black circle (Fig 3A & C) is the stimulus size, and the stimulus (12º) is roughly half the size of the entire screen (24.8º), then how are spatial reconstructions generated for parts of the visual field that fall outside of the screen? I am asking because in Figure 3 the area over which spatial reconstructions are plotted has a diameter at least 3 times the diameter of that black circle (the stimulus). I'm guessing this is maybe possible when using a very liberal fitting approach to prf's, where the center of a prf can be outside of the screen (so you'd fit a circle to an elongated blob, assuming that blob is the edge of a circle, or something). Can you really reliably estimate that far out into visual space/ extrapolate prf's that exist in a part of the space you did not fully map (because it's outside of the screen)?

      We thank the reviewer for pointing out this confusing issue.

      First, the spatial reconstruction map has a diameter 3 times the diameter of the stimulus because we included voxels whose pRF eccentricities were within 20º in the reconstruction, the same as in Kwak & Curtis, 2022. There are several reasons for doing so. While the height of the screen is 24.8º, the width of the screen is 44º. Thus, it is possible to have voxels whose pRF eccentricities are >20º. In addition, for areas outside the height boundaries, there might not be pRF centers, but the pRF Gaussian distributions might still cover the area. Moreover, when creating the final map combined across three orientation conditions, we rotated them to be centered vertically, which then required a 20x20º square. Finally, inspecting the reconstruction maps, we noticed that the area that was twice the stimulus size (black circle) made very little contribution to the reconstructions. Therefore, the results depicted in Figure 3A&C are justified, but see the next comment and our response.

      (2) Is the quantification in 3B/C justified? The filter line uses a huge part of visual space outside of the stimulus (and even the screen). For the angular modulator in the "perception" condition, this means that there is no peak at -90/90 degree. But if you were to only use a line that is about the size of the stimulus (a reasonable assumption), it would have a peak at -90/90 degree.

      This is an excellent question. We completely agree that it is more reasonable to use filter lines that have the same size (12º) as the stimulus instead of the whole map size (40º). Based on the feedback from the Reviewer, we redid the spatial reconstruction analyses and now include the following changes to Figure 3.

      (1) We fitted the lines using pixels only within the stimulus. In Figure 3A and Figure 3C, we now replaced the reconstruction maps.

      (2) We added the color bar in Figure 3A.

      (3) We regenerated the filtered responses and calculated the fidelity results by using line filters with the stimulus size. We replaced the filtered responses and fidelity results in Figure 3B and Figure 3D. With the new analysis, as anticipated by the Reviewer, we now found peaks at -90/90 degrees for the angular modulated gratings in the perceptual control task in V1 and V2. Thank you Reviewer 1!!!!

      (4) We also made corresponding changes in the Supplementary Figure S4 and S5, as well as the statistical results in Table S4 and S5.

      (5) In the “Methods” section, we added “within the stimulus size” for both “fMRI data analysis: Spatial reconstruction” and “Quantification and statistical analysis” subsections.
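
      For readers unfamiliar with pRF-weighted spatial reconstruction, the general idea discussed in this exchange can be sketched in Python as below. This is an illustrative approximation under stated assumptions, not the authors' pipeline: each voxel is modeled as an isotropic 2-D Gaussian pRF, the map is the sum of pRFs weighted by the voxel's response amplitude, and the result is divided by its maximum absolute value so that it falls within -1 to 1. The function name `reconstruct_map`, the dictionary keys, and the normalization method are hypothetical; only the 20º eccentricity cutoff comes from the response above.

```python
import math


def reconstruct_map(voxels, grid_coords, max_ecc=20.0):
    """Sum Gaussian pRFs weighted by voxel amplitude on a square visual-field grid.

    voxels: iterable of dicts with keys "x", "y" (pRF centre, deg),
            "sigma" (pRF size, deg) and "beta" (response amplitude).
    grid_coords: coordinates (deg) used for both image axes.
    Returns image[row][col], where row indexes y and col indexes x.
    """
    n = len(grid_coords)
    image = [[0.0] * n for _ in range(n)]
    for v in voxels:
        # Skip voxels whose pRF centre lies beyond the eccentricity cutoff.
        if math.hypot(v["x"], v["y"]) > max_ecc:
            continue
        for r, gy in enumerate(grid_coords):
            for c, gx in enumerate(grid_coords):
                d2 = (gx - v["x"]) ** 2 + (gy - v["y"]) ** 2
                image[r][c] += v["beta"] * math.exp(-d2 / (2.0 * v["sigma"] ** 2))
    # Assumed normalization: scale so all values fall within [-1, 1].
    peak = max(abs(p) for row in image for p in row)
    if peak > 0:
        image = [[p / peak for p in row] for row in image]
    return image
```

      With voxels whose pRFs lie along a vertical line carrying positive amplitudes, the resulting map is brightest along that line, which is the kind of "line-like" pattern the filtered responses and fidelity measures are meant to quantify.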

      (3) Figure 4 is nice, but not exactly quantitative. It does not address that the reconstructions from the perceptual task are hugging the stimulus edges much more closely compared to the modeled map. Conversely, the yellow parts of the reconstructions from the delay fan out much further than those of the model. The model also does not seem to dissociate radial/angular stimuli, while in the perceptual data the magnitude of perceptual reconstruction is clearly much weaker for angular compared to radial modulator.

      We thank the reviewer for this question. First, we admit that Figure 4 is more qualitative than quantitative. However, we see no alternative that better depicts the similarity between the model prediction and the fMRI results for the perceptual control and WM tasks. The figure clearly shows the orthogonal aperture bias. Second, we agree that aspects of the observed fMRI results are not perfectly captured by the model. This could have many causes, including fMRI noise, individual differences, etc. Importantly, different modulators induce orthogonal aperture biases in the perceptual but not the WM task, and these mismatches therefore do not have a major impact on the conclusions.

      (4) The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it), but attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5s during memory encoding, and 2.5s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. It allows the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      We addressed the same point in the response for Reviewer 1, “additional context” section.

      Recommendations for improving the writing:

      (1) The main text had too little information about the Methods. Of course, some things need not be there, but others are crucial to understanding the basics of what is being shown. For example, the main text does not describe how many orientations are used (well... actually the caption to Figure 1 says there are 2: horizontal and vertical, which is confusing), and I had to deduce from the chance level (1/3) that there must have been 3 orientations. Also, given how important the orthogonality of the carrier and modulator are, it would be good to have this explicit (I would even want an analysis showing that indeed the two are independent). A final example is the use of beta weights, and for delay period decoding only the last 6s (of the 12s delay) are modeled and used for decoding.

      We thank the reviewer for identifying aspects of the manuscript that were confusing. We made several changes to the paper to clarify these details.

      First, we added the information about the orientations we used in the caption for Figure 1 and made it clear that Figure 1C is just an illustration using vertical/horizontal orientations. Second, the carrier and the modulator are different in many ways. For example, the carrier is a grating with orientation and contrast information, while the modulator is the aperture that bounds the grating without these features. Their phases are orthogonal, and we added this in the second paragraph of the “Stimuli” section. Last, in the main text and the captions, we now denote “late delay” when writing about our procedures.

      (2) Right under Figure 3, the text reads "angular modulated gratings produced line-like representations that were orthogonal carrier orientation reflecting the influence of stimulus vignetting", but the quantification (Figure 3D) does not support this (there is no orthogonal "bump" in the filtered responses from V1-V3, and one aligned with the carrier orientation in higher areas).

      This point was addressed in the “recommendations for the authors (Reviewer 1), point 2” above.

      Minor corrections to text and figures:

      (1) Abstract: "are WM codes" should probably be "WM codes are".

      We prefer to keep “are WM codes” as it is grammatically correct.

      (2) Introduction: Second sentence 2nd paragraph: representations can be used to decode representations? Or rather voxel patterns can be used...

      Changed to “On the one hand, WM representations can be decoded from the activity patterns as early as primary visual cortex (V1)...”

      (3) Same paragraph: might be good to add more references to support the correlation between V1 decoding and behavior. There's an Ester paper, and Iamchinina et al. 2021. These are not trial-wise, but trial-wise can also be driven by fluctuating arousal effects, so across-subject correlations help fortify this point.

      We added these two papers as references.

      (4) Last paragraph: "are WM codes" should probably be "WM codes are".

      See (1) above.

      (5) Figure 1B & 2A caption: "stimulus presenting epoch" should probably be "stimulus presentation epoch".

      Changed to “stimulus epoch”.

      (6) Figure 1C: So this is very unclear, to say stimuli are created using vertical and horizontal gratings (when none of the stimuli used in the experiment are either).

      We solved and answered this point in response to Reviewer 3, point 2.

      (7) Figure 2B caption "cross" should probably be "across".

      We believe “cross” is fine since cross here means cross-decoding.

      (8) Figure 3A and C are missing a color bar, so it's unclear how these images are generated (are they scaled, or not) and what the BOLD values are in each pixel.

      All values in the map were scaled to be within -1 to 1. We added the color bar in both Figure 3 and Figure 4.

      (9) Figure 3B and D (bottom row) are missing individual subject data.

      We use SEM to indicate the variance across subjects.

      (10) Figure D caption: "early (V1 and V2)" should probably be "early areas (V1 and V2)".

      Corrected.

      (11) Methods, stimuli says "We generated 180 orientations for the carrier grating to cover the whole orientation space." But it looks like only 3 orientations were generated, so this is confusing.

      We solved and answered this point in response to Reviewer 3, point 2.

      (12) Further down (fMRI task) "random jitters" is probably "random jitter"

      Corrected.

    1. Author response:

      Response to Reviewer #1 (Public Review):

      We thank the reviewer for their constructive criticism of our study, their proposed solutions, and for highlighting areas of the methodology and analytical pipeline where explanations were unclear or unsatisfactory. We will take the reviewer’s feedback into account to improve the clarity and readability of the revised manuscript. We acknowledge the importance of ruling out eye movements as a potential confound. We address these concerns briefly below, but a more detailed explanation (and a full breakdown of the relevant analyses, including the corrected and uncorrected results) will be provided in the revised manuscript.

      First, the source of EEG activity recorded from the frontal electrodes is often unclear. Without an external reference, it is challenging to resolve the degree to which frontal EEG activity represents neural or muscular responses (1). Thus, as a preventative measure against the potential contribution of eye movement activity, for all our EEG analyses, we only included activity from occipital, temporal, and parietal electrodes (the selected electrodes can be seen in the final inset of Figure 3).

      Second, as suggested by the reviewer, we re-ran our analyses using the activity measured from the frontal electrodes alone. If the source of the nonlinear decoding accuracy in the AV condition was muscular activity produced by eye movements, we would expect to observe better decoding accuracy from sensors closer to the source. Instead, we found that decoding accuracy from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 4).

      Third, we compared the average eye movements between the three main sensory conditions (auditory, visual, and audiovisual). In the visual condition, there was little difference in eye movements corresponding to the five stimulus locations, likely because the visual stimuli were designed to be spatially diffuse. For the auditory and audiovisual conditions, there was more distinction between eye movements corresponding to the stimulus locations. However, these appeared to be the same between auditory and audiovisual conditions. If consistent saccades to audiovisual stimuli had been responsible for the nonlinear decoding we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Instead, we found no difference in correlation between audiovisual and auditory stimuli, indicating that eye movements were equivalent in these conditions and unlikely to explain better decoding accuracy for audiovisual stimuli.
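
      The comparison described above relies on the correlation between horizontal eye position and stimulus location within each condition. A minimal Python sketch of that statistic is given below. It assumes a plain Pearson correlation over per-trial values; the function name `pearson_r` and the input format are hypothetical, and the authors' actual analysis may differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence has zero variance")
    return sxy / math.sqrt(sxx * syy)
```

      Computing this value separately for the auditory and audiovisual conditions and comparing the two is one way to check whether eye position tracks stimulus location more closely in one condition than in the other.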

      Finally, we note that the stricter eye movement criterion acknowledged in the Discussion section of the original manuscript resulted in significantly better audiovisual d' than the MLE prediction, but this difference did not survive cluster correction. This is an important distinction because, when combined with the results described above, it supports our original interpretation that the stricter criterion, combined with our conservative (mass-based) cluster correction (2), led to a type II error.

      References

      (1) Roy, R. N., Charbonnier, S., & Bonnet, S. (2014). Eye blink characterization from frontal EEG electrodes using source separation and pattern recognition algorithms. Biomedical Signal Processing and Control, 14, 256–264.

      (2) Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85–93.

      Response to Reviewer #2 (Public Review):

      We thank the reviewer for their insight and constructive feedback. As emphasized in the review, an interesting question that arises from our results is why, if the neural data exceed the optimal statistical decision (MLE d'), the behavioural data do not. We agree with the reviewer’s suggestion that more attention should be devoted to this question, and plan to provide a deeper discussion of the relationship between behavioural and neural super-additivity in the revised manuscript. We also note that while this discrepancy remains unexplained, our results are consistent with the literature. That is, both non-linear neural responses (single-cell recordings) and behavioural responses that match MLE are reliable phenomena in multisensory integration (1–4).

      One possible explanation for this puzzling discrepancy is that behavioural responses occur sometime after the initial neural response to sensory input. There are several subsequent neural processes between perception and a behavioural response (5), all of which introduce additional noise that may obscure super-additive perceptual sensitivity. In particular, the mismatch between neural and behavioural accuracy may be the result of additional neural processes that translate sensory activity into a motor response to perform the behavioural task.

      Our measure of neural super-additivity (exceeding optimally weighted linear summation) differs from how it is traditionally assessed (exceeding summation of single neuron responses) (2). Neither method has yet fully explained how this neural activity translates to behavioural responses, and we think that more work is needed to resolve the abovementioned discrepancy. Our method will facilitate this work by providing a reliable way of measuring neural super-additivity in humans using non-invasive recordings.
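
      As an illustration of the comparison against optimally weighted linear summation, the following minimal sketch computes a predicted audiovisual d' from two unimodal values. It assumes the standard MLE result for independent Gaussian noise, under which the combined sensitivity is the quadrature sum of the unimodal sensitivities. The function names and example values are hypothetical and are not taken from the analysis pipeline.

```python
import math


def mle_dprime(d_auditory: float, d_visual: float) -> float:
    """Predicted audiovisual d' under optimal (MLE) combination.

    Assumes independent Gaussian noise in the two modalities, so the
    unimodal sensitivities add in quadrature.
    """
    return math.hypot(d_auditory, d_visual)


def is_super_additive(d_audiovisual: float, d_auditory: float, d_visual: float) -> bool:
    """Return True when the observed audiovisual d' exceeds the MLE prediction."""
    return d_audiovisual > mle_dprime(d_auditory, d_visual)


if __name__ == "__main__":
    # Hypothetical unimodal sensitivities, for illustration only.
    predicted = mle_dprime(0.3, 0.4)
    print(f"MLE-predicted audiovisual d' = {predicted:.2f}")
```

      An observed audiovisual d' above this prediction would count as super-additive in the sense used here, whereas the traditional criterion compares the multisensory response with the plain sum of the unimodal responses.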

      References

      (1) Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257–262.

      (2) Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433.

      (3) Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science, 221, 389–391.

      (4) Stanford, T. R., & Stein, B. E. (2007). Superadditivity in multisensory integration: putting the computation in context. Neuroreport, 18, 787–792.

      (5) Heekeren, H., Marrett, S. & Ungerleider, L. (2008). The neural systems that mediate human perceptual decision making. Nature Reviews Neuroscience, 9, 467–479.

    1. Author response:

      Thanks for the eLife assessment

      “This study employed a comprehensive approach to examining how the MT+ region integrates into a complex cognition system in mediating human visuo-spatial intelligence. While the findings are useful, the experimental evidence is incomplete and the study design, hypothesis, analyses, writing, and presentation need to be improved.” We plan to revise the manuscript according to the comments of Public Reviews.

      We are grateful for the excellent and very helpful comments, and provide our provisional responses below.

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something??

      We thank the reviewer for pointing this out. After careful consideration, we agree. According to Melnick et al. (2013), the motion surround suppression index (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, correlate with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ has the potential to be a core of the MD system. However, because our results address only “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not sufficient to show that hMT+ is a core node of the MD system. We will therefore adjust the explanatory logic of the article, emphasizing the role of hMT+ in reducing redundancy in visuo-spatial intelligence and improving information processing efficiency, while giving less weight to the role of hMT+ in the MD system.

      (2) How was the sample size determined? Is it sufficient??

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r = 0.47). We used this r value as the correlation coefficient (ρ H1), setting the power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26, which indicates that our study has adequate power to detect an effect of this size. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a larger dataset.
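
      For readers who wish to check a calculation of this kind, the sketch below estimates the sample size needed to detect a correlation using the Fisher z approximation. This is a swapped-in textbook approximation, not the exact routine used by G*Power, so its output can differ slightly from the value reported above. The function name and the `tails` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def required_n(r: float, alpha: float = 0.05, power: float = 0.8, tails: int = 2) -> int:
    """Approximate sample size needed to detect a correlation of size r.

    Uses the Fisher z approximation:
        n = ((z_alpha + z_power) / atanh(|r|))^2 + 3
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / tails)
    z_power = standard_normal.inv_cdf(power)
    effect = math.atanh(abs(r))  # Fisher z-transform of the expected correlation
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


if __name__ == "__main__":
    # r = 0.47, alpha = 0.05, power = 0.8, as in the response above.
    for tails in (1, 2):
        print(f"tails={tails}: approximate n = {required_n(0.47, tails=tails)}")
```

      The result depends on whether a one-tailed or two-tailed test is assumed, so the number of tails should be reported alongside the required sample size.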

      (3) In Schallmo eLife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results here?

      We thank the reviewer for pointing this out. Several differences between the two studies may account for this:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used the left hMT+. The reasons for focusing on the left hMT+ are described in our response to Reviewer #1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS sequence in Schallmo et al. (2018) used a 3 cm isotropic voxel, whereas we used a 2 cm isotropic voxel. This helps us to locate hMT+ more precisely and to exclude more white matter signal.

      (4) Basically this study contains the data of SI, BDT, GABA in MT+ and V1, Glu in MT+ and V1-all 6 measurements. There should be 6x5/2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and supplementary 1-3. I understand that it is not necessary to include all figures. But I suggest reporting all values in one Table.

      We thank the reviewer for the good suggestion. We plan to add a correlation matrix that reports all values.
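
      A table of this kind can be sketched as follows. The example enumerates every unordered pair among the six measures and computes a Pearson correlation for each. The measure labels are shorthand, the data are synthetic placeholders rather than study data, and the helper names are hypothetical.

```python
import math
import random
from itertools import combinations

# Shorthand labels for the six measures named by the reviewer.
MEASURES = ["SI", "BDT", "GABA_MT", "GABA_V1", "Glu_MT", "Glu_V1"]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(data):
    """Map each unordered pair of measure names to its Pearson r."""
    return {(a, b): pearson_r(data[a], data[b]) for a, b in combinations(data, 2)}


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic placeholder values for illustration; not study data.
    data = {name: [rng.gauss(0, 1) for _ in range(30)] for name in MEASURES}
    table = pairwise_correlations(data)
    print(len(table), "pairwise correlations")
    for (a, b), r in table.items():
        print(f"{a:8s} {b:8s} r = {r:+.2f}")
```

      With six measures, `combinations` yields the 6 x 5 / 2 = 15 pairs mentioned by the reviewer, so no pair is omitted or double-counted.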

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI is the left hMT+, and we wanted the ROI to be consistent across the different models. The use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011). In addition, we will check the results of our localizer to confirm whether similar findings are consistently replicated.

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for the good advice. Including these results will help researchers compare our findings with those of Melnick et al. (2013). We plan to add a corresponding figure in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically, could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al., 2013, however the authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection is necessary and how it fits with the core/extended MD networks.

      We appreciate the reviewer’s comments. Our reasons for focusing on hMT+ are as follows. According to Melnick et al. (2013), the motion surround suppression index (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, correlate with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. (2013), the averaged MD activity region appears to overlap with hMT+. Based on these findings, we assumed that hMT+ has the potential to be a core of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree. Because our results address only visuo-spatial intelligence, they are not sufficient to show that hMT+ is a core node of the MD system. We will adjust the explanatory logic of the article, emphasizing the role of hMT+ in reducing redundancy in visuo-spatial intelligence and improving information processing efficiency, while giving less weight to the role of hMT+ in the MD system.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the enthusiasm of the review.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavior model. The Suppression Index is essential for elucidating the local inhibitory mechanisms within the behavior model. However, we acknowledge that the introduction may not have articulated our hypothesis clearly, potentially leading to misunderstandings. We plan to revise the introduction to clarify these connections and make the relevance of the Suppression Index explicit.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that our manuscript may have presented this construct inconsistently across sections, causing confusion. To address this, we plan to use a consistent description of 3D visuo-spatial intelligence in both the introduction and the discussion. However, we would like to retain 'Block Design task score' in the results section so that readers can see which subtest we used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We plan to revise Figure 1a to make it less abstract and more logical. The schematic in Figure 6 represents our theoretical framework of how hMT+ contributes to 3D visuo-spatial intelligence. We believe the elements within this framework are grounded in related theories and supported by the evidence presented in our results and discussion sections, making them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      We thank the reviewer for pointing this out. Reviewer #1 also raised this question, and we reproduce our answer here: “The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      We thank the reviewer for pointing this out. We have carefully read the corresponding references, considered the associated theories, and agree with these comments. Because our results address only “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not sufficient to show that hMT+ is a core node of the MD system. We will adjust the explanatory logic of the article, emphasizing the role of hMT+ in reducing redundancy in visuo-spatial intelligence and improving information processing efficiency, while giving less weight to the role of hMT+ in the MD system. Regarding the potential central role of hMT+ in the MD system, we agree that research on hMT+ as a multisensory integration hub mainly focuses on developmental processes. In adults, however, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports a role for hMT+ in multitasking multisensory systems (Gu et al., J. Neurosci., 26(1), 73–85, 2006; Fetsch et al., Nat. Neurosci., 15, 146–154, 2012). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are facilitated by the features of hMT+.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge that the citation of "Born & Bradley, 2005" was inappropriate, as that paper focuses solely on the structure and function of visual area MT. However, we believe that choosing hMT+ as the region for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (Annu. Rev. Neurosci., 24, 203–238, 2001) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in the extrastriate cortex (especially MT) than in the primary visual cortex (V1). This supports our choice and emphasizes the relevance of MT+ in our study. We will correct the citation in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for the suggestion. The correlation result is currently in the supplementary material. We will move it back to the main text.

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for the suggestion. We plan to plot the MRS scanning area of the V1 ROI and use a visual template to check whether it includes V2/3. If it does, we will refer to it as the early visual cortex rather than specifically V1 in our reporting.

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the author justify why V1 serves as the control region only in the MRS but not in FC (Figure 4) or the mediation analysis (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+

      We thank the reviewer for the suggestion. We plan to run the V1 FC-behavior analysis as a control. For the mediation analysis, since V1 GABA/Glu shows no correlation with BDT score, the preconditions for a mediation analysis are not met.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We plan to explain the difference between panels a and b more fully in the revised version. Panel a shows the hMT+ FC that correlates with BDT score, which clearly involves the frontal cortex. Panel b shows the hMT+ FC that correlates with SI, which involves comparatively fewer regions. The overlapping region is the one we are interested in, as it helps explain how local inhibitory mechanisms contribute to 3D visuo-spatial intelligence. In addition, we will revise Figure 4 to mark the overlapping region.

      (5) SI is not relevant to the authors' a priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem to be discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in the response to reviewer 3, section (1). SI plays a crucial role in explicating how local inhibitory mechanisms function within the context of the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing how the effects from the frontal cortex (BA46) on the Block Design Task are fully mediated by SI. This further underscores the significance of SI in our model.

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results, V1 GABA shows no significant correlation with BDT score, suggesting that efficient processing in V1 might not sufficiently account for individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by incorporating it into the introduction of our paper to better frame our argument.

      Transparency Issues:

      (1) Don't think it's acceptable to make the claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We realize that this statement could cause confusion and will delete it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We will include the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We will revise it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We will use hMT+ throughout, consistent with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We will revise it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We will revise it.

    1. Author Response:

      We appreciate the thorough comments from the reviewers. Before revising the manuscript, we would like to briefly reply to the main concerns raised:

      • Is pupil size a reliable proxy of effort? A vast amount of work demonstrates that pupil size sensitively scales with fluctuations in effort: for instance, the pupil dilates with increasing load in working memory or multiple object tracking tasks, and such pupillary effects robustly explain individual differences in cognitive ability and fluctuations in performance across trials (1–4). This extends to the planning of movements, as pupil dilations are observed prior to the execution of (eye) movements (5). As reviewed previously (6–12; each drawing on an extensive literature), any increase in effort is associated with an increase in pupil size. We inadvertently phrased this as if the link between effort and pupil size had been established via shared neural correlates. However, this is not the case, as the link between effort and pupil size was established well before the underlying neural circuitry of this relationship was investigated in detail. During the revision, we plan to rewrite this section to clarify that pupil size indexes effort and to distinguish clearly between this link and the putative neural underpinnings of such effort-linked modulations.

      • Is saccade latency an alternative explanation for the link between effort and saccade selection? Longer saccade latencies may imply more complex oculomotor programming (e.g. saccades with larger amplitudes require longer latencies for non-microsaccades (13), and latencies increase when distractors are presented (14)), and latencies are indeed known to differ across directions (15, 16). As suggested, it is possible that saccade latencies may also predict saccade preferences. However, even if this is the case, it would not constitute an alternative explanation. As saccade latency may index the complexity of oculomotor programming, it can potentially be considered an alternative outcome measure of effort, albeit restricted to the context of saccades. Therefore, if saccade latencies predict saccade preferences, this would not affect our conclusion. Rather, it would constitute converging evidence that effort drives saccade selection.

      A related question is why one would use pupil size as a measure of effort, given the methodological care that pupillometry requires. Several points make pupil size sensible and promising in comparison with saccade latencies. In contrast to saccade latencies, pupil size can capture the effort of different effector systems (e.g. head or hand movements), and potentially even the effort associated with covert shifts of attention. Moreover, pupil size is a temporally rich and continuous measure that makes it possible to isolate processes unfolding prior to (eye) movement onset (e.g. oculomotor programming). Together, this makes pupil size a powerful tool to study the costs of visual selection more broadly. In the revision, we will add analyses incorporating latencies and other saccade metrics. We will also discuss the differences between pupil size and saccade latencies in capturing saccade costs and effort.

      • Are the current results causal or correlational? Most of the currently reported results are indeed correlational in nature. In our first tasks, we correlated pupil size during saccade planning with saccade preferences in a subsequent task. Although the link across tasks was correlational, the observed relationship clearly followed our previously specified hypothesis (17). Moreover, experiments 1 and 2 of the visual search data replicated and extended this relationship. We also directly manipulated cognitive demand in the second visual search experiment. In line with the hypothesis that effort affects saccade selection, participants executed fewer saccades overall when performing a (primary) auditory dual task, and cut the costly saccades most. Although the evidence is mostly correlational, we do not know of a more fitting and parsimonious explanation for our findings than effort predicting saccade selection. We will address causality in the discussion for transparency and point more clearly to the second visual search experiment for causal evidence.

      References

      (1) Alnæs, D. et al. Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. J. Vis. 14, 1 (2014).

      (2) Koevoet, D., Strauch, C., Van der Stigchel, S., Mathôt, S. & Naber, M. Revealing visual working memory operations with pupillometry: Encoding, maintenance, and prioritization. WIREs Cogn. Sci. e1668 (2023) doi:10.1002/wcs.1668.

      (3) Robison, M. K. & Unsworth, N. Pupillometry tracks fluctuations in working memory performance. Atten. Percept. Psychophys. 81, 407–419 (2019).

      (4) Unsworth, N. & Miller, A. L. Individual Differences in the Intensity and Consistency of Attention. Curr. Dir. Psychol. Sci. 30, 391–400 (2021).

      (5) Richer, F. & Beatty, J. Pupillary Dilations in Movement Preparation and Execution. Psychophysiology 22, 204–207 (1985).

      (6) Bumke, O. Die Pupillenstörungen Bei Geistes-Und Nervenkrankheiten. (Fischer, 1911).

      (7) Kahneman, D. Attention and Effort. (Prentice-Hall, 1973).

      (8) van der Wel, P. & van Steenbergen, H. Pupil dilation as an index of effort in cognitive control tasks: A review. Psychon. Bull. Rev. 25, 2005–2015 (2018).

      (9) Loewenfeld, I. E. Mechanisms of reflex dilatation of the pupil. Doc. Ophthalmol. 12, 185–448 (1958).

      (10) Mathôt, S. Pupillometry: Psychology, Physiology, and Function. J. Cogn. 1, 16 (2018).

      (11) Sirois, S. & Brisson, J. Pupillometry. WIREs Cogn. Sci. 5, 679–692 (2014).

      (12) Strauch, C., Wang, C.-A., Einhäuser, W., Van der Stigchel, S. & Naber, M. Pupillometry as an integrated readout of distinct attentional networks. Trends Neurosci. 45, 635–647 (2022).

      (13) Kalesnykas, R. P. & Hallett, P. E. Retinal eccentricity and the latency of eye saccades. Vision Res. 34, 517–531 (1994).

      (14) Walker, R., Deubel, H., Schneider, W. X. & Findlay, J. M. Effect of Remote Distractors on Saccade Programming: Evidence for an Extended Fixation Zone. J. Neurophysiol. 78, 1108–1119 (1997).

      (15) Hanning, N. M., Himmelberg, M. M. & Carrasco, M. Presaccadic attention enhances contrast sensitivity, but not at the upper vertical meridian. iScience 25, 103851 (2022).

      (16) Hanning, N. M., Himmelberg, M. M. & Carrasco, M. Presaccadic Attention Depends on Eye Movement Direction and Is Related to V1 Cortical Magnification. J. Neurosci. 44, (2024).

      (17) Koevoet, D., Strauch, C., Naber, M. & Van der Stigchel, S. The Costs of Paying Overt and Covert Attention Assessed With Pupillometry. Psychol. Sci. 34, 887–898 (2023).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank the reviewers for the detailed assessment of our work as well as their praise and constructive feedback which helped us to significantly improve our manuscript.

      Reviewer #1 (Public Review):

      The inferior colliculus (IC) is the central auditory system's major hub. It integrates ascending brainstem signals to provide acoustic information to the auditory thalamus. The superficial layers of the IC ("shell" IC regions as defined in the current manuscript) also receive a massive descending projection from the auditory cortex. This auditory cortico-collicular pathway has long fascinated the hearing field, as it may provide a route to funnel "high-level" cortical signals and impart behavioral salience upon an otherwise behaviorally agnostic midbrain circuit.

      Accordingly, IC neurons can respond differently to the same sound depending on whether animals engage in a behavioral task (Ryan and Miller 1977; Ryan et al., 1984; Slee & David, 2015; Saderi et al., 2021; De Franceschi & Barkat, 2021). Many studies also report a rich variety of non-auditory responses in the IC, far beyond the simple acoustic responses one expects to find in a "low-level" region (Sakurai, 1990; Metzger et al., 2006; Porter et al., 2007). A tacit assumption is that the behaviorally relevant activity of IC neurons is inherited from the auditory cortico-collicular pathway. However, this assumption has never been tested, owing to two main limitations of past studies:

      (1) Prior studies could not confirm if data were obtained from IC neurons that receive monosynaptic input from the auditory cortex.

      (2) Many studies have tested how auditory cortical inactivation impacts IC neuron activity; the consequence of cortical silencing is sometimes quite modest. However, all prior inactivation studies were conducted in anesthetized or passively listening animals. These conditions may not fully engage the auditory cortico-collicular pathway. Moreover, the extent of cortical inactivation in prior studies was sometimes ambiguous, which complicates interpreting modest or negative results.

      Here, the authors' goal is to directly test if auditory cortex is necessary for behaviorally relevant activity in IC neurons. They conclude that, surprisingly, task relevant activity in cortico-recipient IC neurons persists in absence of auditory cortico-collicular transmission. To this end, a major strength of the paper is that the authors combine a sound-detection behavior with clever approaches that unambiguously overcome the limitations of past studies.

      First, the authors inject a transsynaptic virus into the auditory cortex, thereby expressing a genetically encoded calcium indicator in the auditory cortex's postsynaptic targets in the IC. This powerful approach enables 2-photon Ca2+ imaging from IC neurons that unambiguously receive monosynaptic input from auditory cortex. Thus, any effect of cortical silencing should be maximally observable in this neuronal population. Second, they abrogate auditory cortico-collicular transmission using lesions of auditory cortex. This "sledgehammer" approach is arguably the most direct test of whether cortico-recipient IC neurons will continue to encode task-relevant information in absence of descending feedback. Indeed, their method circumvents the known limitations of more modern optogenetic or chemogenetic silencing, e.g. variable efficacy.

      I also see three weaknesses which limit what we can learn from the authors' hard work, at least in the current form. I want to emphasize that these issues do not reflect any fatal flaw of the approach. Rather, I believe that their datasets likely contain the treasure-trove of knowledge required to completely support their claims.

      (1) The conclusion of this paper requires the following assumption to be true: That the difference in neural activity between Hit and Miss trials reflects "information beyond the physical attributes of sound." The data presentation complicates asserting this assumption. Specifically, they average fluorescence transients of all Hit and all Miss trials in their detection task. Yet, Figure 3B shows that mice's d' depends on sound level, and since this is a detection task the smaller d' at low SPLs presumably reflects lower Hit rates (and thus higher Miss rates). As currently written, it is not clear if fluorescence traces for Hits arise from trials where the sound cue was played at a higher sound level than on Miss trials. Thus, the difference in neural activity on Hit and Miss trials could indeed reflect mice's behavior (licking or not licking). But in principle could also be explained by higher sound-evoked spike rates on Hit compared to Miss trials, simply due to louder click sounds. Indeed, the amplitude and decay tau of their indicator GCaMP6f is non-linearly dependent on the number and rate of spikes (Chen et al., 2013), so this isn't an unreasonable concern.

      (2) The authors' central claim effectively rests upon two analyses in Figures 5 and 6. The spectral clustering algorithm of Figure 5 identifies 10 separate activity patterns in IC neurons of control and lesioned mice; most of these clusters show distinct activity on averaged Hit and Miss trials. They conclude that although the proportions of neurons from control and lesioned mice in certain clusters deviate from an expected 50/50 split, neurons from lesioned mice are still represented in all clusters. A significant issue here is that in addition to averaging all Hit and Miss trials together, the data from control and lesioned mice are lumped for the clustering. There is no direct comparison of neural activity between the two groups, so the reader must rely on interpreting a row of pie charts to assess the conclusion. It's unclear how similar task relevant activity is between control and lesioned mice; we don't even have a ballpark estimate of how auditory cortex does or does not contribute to task relevant activity. Although ideally the authors would have approached this by repeatedly imaging the same IC neurons before and after lesioning auditory cortex, this within-subjects design may be unfeasible if lesions interfere with task retention. Nevertheless, they have recordings from hundreds to thousands of neurons across two groups, so even a small effect should be observable in a between-groups comparison.

      (3) In Figure 6, the authors show that logistic regression models predict whether the trial is a Hit or Miss from their fluorescence data. Classification accuracy peaks rapidly following sound presentation, implying substantial information regarding mice's actions. The authors further show that classification accuracy is reduced, but still above chance in mice with auditory cortical lesions. The authors conclude from this analysis that task relevant activity persists in absence of auditory cortex. In principle I do not disagree with their conclusion.

      The weakness here is in the details. First, the reduction in classification accuracy of lesioned mice suggests that auditory cortex does nevertheless transmit some task relevant information, however minor it may be. I feel that as written, their narrative does not adequately highlight this finding. Rather one could argue that their results suggest redundant sources of task-relevant activity converging in the IC. Secondly, the authors conclude that decoding accuracy is impaired more in partially compared to fully lesioned mice. They admit that this conclusion is at face value counterintuitive, and provide compelling mechanistic arguments in the Discussion. However, aside from shaded 95% CIs, we have no estimate of variance in decoding accuracy across sessions or subjects for either control or lesioned mice. Thus we don't know if the small sample sizes of partial (n = 3) and full lesion (n = 4) groups adequately sample from the underlying population. Their result of Figure 6B may reflect spurious sampling from tail ends of the distributions, rather than a true non-monotonic effect of lesion size on task relevant activity in IC.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. Besides filling in key information about how our original analysis aimed at minimizing any potential impact of differences in sound level distributions - namely that trials used for decoding were limited to a subset of sound levels - and which was accidentally omitted in the original manuscript, we have now carried out several additional analyses.

      We would like to highlight one of these because it supplements both the clustering and decoding analysis that we conducted to compare hit and miss trial activity, and directly addresses what the reviewer identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions) and the request for an analysis that operates at the level of single units rather than the population level. Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #2 (Public Review):

      Summary:

      This study takes a new approach to studying the role of corticofugal projections from auditory cortex to inferior colliculus. The authors performed two-photon imaging of cortico-recipient IC neurons during a click detection task in mice with and without lesions of auditory cortex. In both groups of animals, they observed similar task performance and relatively small differences in the encoding of task-response variables in the IC population. They conclude that non-cortical inputs to the IC can provide substantial task-related modulation, at least when AC is absent.

      Strengths:

      This study provides valuable new insight into big and challenging questions around top-down modulation of activity in the IC. The approach here is novel and appears to have been executed thoughtfully. Thus, it should be of interest to the community.

      Weaknesses:

      There are, however, substantial concerns about the interpretation of the findings and limitations to the current analysis. In particular, analysis of single unit activity is absent, making the population clusters and decoding results harder to interpret. These concerns should be addressed to make sure that the results can be interpreted clearly in an active field that already contains a number of confusing and possibly contradictory findings.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. Several additional analyses have now been carried out including ones that operate at the level of single units rather than the population level, as requested by the reviewer. We would like to briefly highlight one here because it supplements both the clustering and decoding analysis that we conducted to compare hit and miss trial activity and directly addresses what the other reviewers identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions). Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to demonstrate that cortical feedback is not necessary to signal behavioral outcome to shell neurons of the inferior colliculus during a sound detection task. The demonstration is achieved by the observation of the activity of cortico-recipient neurons in animals which have received lesions of the auditory cortex. The experiment shows that neither behavior performance nor neuronal responses are significantly impacted by cortical lesions except for the case of partial lesions which seem to have a disruptive effect on behavioral outcome signaling.

      Strengths:

      The experimental procedure is based on state of the art methods. There is an in depth discussion of the different effects of auditory cortical lesions on sound detection behavior.

      Weaknesses:

      The analysis is not documented enough to be correctly evaluated. Have the authors pooled together trials with different sound levels for the key hit vs miss decoding/clustering analysis? If so, the conclusions are not well supported, as there are more misses for low sound levels, which would completely bias the outcome of the analysis. It would be possible that the classification of hits versus misses actually only reflects a decoding of sound level based on sensory responses in the colliculus, and it would not be surprising then that in the presence or absence of cortical feedback, some neurons respond more to higher sound levels (hits) and less to lower sound levels (misses). It is important that the authors clarify and in any case perform an analysis in which the classification of hits vs misses is done only for the same sound levels. The description of feedback signals could be more detailed although it is difficult to achieve good temporal resolution with the calcium imaging technique necessary for targeting cortico-recipient neurons.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. Besides filling in key information about how our original analysis aimed at minimizing any potential impact of differences in sound level distributions - namely that trials used for decoding were limited to a subset of sound levels - and which was accidentally omitted in the original manuscript, we have now carried out several additional analyses to directly address what the reviewer identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions). This includes an analysis in which we were able to demonstrate for one imaging session with a sufficiently large number of trials that limiting the trials entered into the decoding analysis to those from a single sound level did not meaningfully impact decoding accuracy. We would like to highlight another new analysis here because it supplements both the clustering and decoding analyses that we conducted to compare hit and miss trial activity and addresses the other reviewers’ request for an analysis that operates at the level of single units rather than the population level. Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #1 (Recommendations For The Authors):

      Thank you for the opportunity to read your paper. I think the conclusion is exciting. Indeed, you indicate that perhaps contrary to many of our (untested) assumptions, task-relevant activity in the IC may persist in absence of auditory cortex.

      As mentioned in my public review: Despite my interest in the work, I also think that there are several opportunities to significantly strengthen your conclusions. I feel this point is important because your work will likely guide the efforts of future students and post-docs working on this topic. The data can serve as a beacon to move the field away from the (somewhat naïve) idea that the evolved forebrain imparts behavioral relevance upon an otherwise uncivilized midbrain. This knowledge will inspire a search for alternative explanations. Indeed, although you don't highlight it in your narrative, your results dovetail nicely with several studies showing task-relevant activity in more ventral midbrain areas that project to the IC (e.g., pedunculopontine nuclei; see work from Hikosaka in monkeys, and more recently in mice from Karel Svoboda's lab).

      Thanks for the kind words.

      These studies, in particular the work by Inagaki et al. (2022) outlining how the transformation of an auditory go signal into movement could be mediated via a circuit involving the PPN/MRN (which might rely on the NLL for auditory input) and the motor thalamus, are indeed highly relevant.

      We made the following changes to the manuscript text.

      Line 472:”...or that the auditory midbrain, thalamus and cortex are bypassed entirely if simple acousticomotor transformations, such as licking a spout in response to a sound, are handled by circuits linking the auditory brainstem and motor thalamus via pedunculopontine and midbrain reticular nuclei (Inagaki et al., 2022).”

      The beauty of the eLife experiment is that you are free to incorporate or ignore these suggestions. After all, it's your paper, not mine. Nevertheless, I hope you find my comments useful.

      First, a few suggestions to address my three comments in the public review.

      Suggestion for public comment #1: An easy way to address this issue is to average the neural activity separately for each trial outcome at each sound level. That way you can measure if fluorescence amplitude (or integral) varies as a function of mice's action rather than sound level. This approach to data organization would also open the door to the additional analyses for addressing comment #2, such as directly comparing auditory and putatively non-auditory activity in neurons recorded from control and lesioned mice.

      We have carried out additional analyses for distinguishing between the two alternative explanations of the data put forward by the reviewer: That the difference in neural activity between hit and miss trials reflects a) behavior or b) sound level (more precisely: differences in response magnitude arising from a higher proportion of high-sound-level trials in the hit trial group than in the miss trial group). If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for each sound level. The new Figure 4 - figure supplement 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      Changes to manuscript.

      Line 214: “While averaging across all neurons cannot capture the diversity of responses, the averaged response profiles suggest that it is mostly trial outcome rather than the acoustic stimulus and neuronal sensitivity to sound level that shapes those responses (Figure 4 – figure supplement 1).”

      Additionally, we assessed for each neuron separately whether there was a significant difference between hit and miss trial activity and therefore whether the activity of the neuron could be considered “task-modulated”. To achieve this, we used equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions and thus rule out any potential confound between sound level distributions and trial outcome. This analysis revealed that the proportion of task-modulated neurons was very high (close to 50%) and not significantly different between lesioned and non-lesioned mice (Figure 6 - figure supplement 3).
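
      As a rough illustration of the balancing step described above, the sketch below subsamples one neuron's trials so that every sound level contributes equal numbers of hits and misses, and then compares the two groups. It is not the authors' code: the function names, the toy responses, and the choice of a permutation test on the mean difference are all assumptions made for the example, since the response does not state which statistical test was used.

```python
import random
from collections import defaultdict
from statistics import mean


def balance_hits_and_misses(trials, seed=0):
    """Subsample so each sound level contributes equal numbers of hits and misses.

    `trials` is a list of (sound_level_db, outcome, response) tuples with
    outcome "hit" or "miss". Returns two lists of (sound_level_db, response).
    """
    rng = random.Random(seed)
    by_level = defaultdict(lambda: {"hit": [], "miss": []})
    for level, outcome, response in trials:
        by_level[level][outcome].append((level, response))

    hits, misses = [], []
    for level in sorted(by_level):
        # Keep only as many trials per outcome as the rarer outcome provides.
        n = min(len(by_level[level]["hit"]), len(by_level[level]["miss"]))
        hits.extend(rng.sample(by_level[level]["hit"], n))
        misses.extend(rng.sample(by_level[level]["miss"], n))
    return hits, misses


def permutation_p_value(hit_values, miss_values, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means (assumed test)."""
    rng = random.Random(seed)
    observed = abs(mean(hit_values) - mean(miss_values))
    pooled = list(hit_values) + list(miss_values)
    n_hits = len(hit_values)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_hits]) - mean(pooled[n_hits:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up responses for a single hypothetical neuron at two sound levels.
toy_trials = [
    (59, "hit", 2.1), (59, "hit", 1.8),
    (59, "miss", 0.4), (59, "miss", 0.6), (59, "miss", 0.5), (59, "miss", 0.3),
    (65, "hit", 2.4), (65, "hit", 2.0), (65, "hit", 1.9), (65, "hit", 2.2), (65, "hit", 2.3),
    (65, "miss", 0.7), (65, "miss", 0.5), (65, "miss", 0.6),
]

hits, misses = balance_hits_and_misses(toy_trials)
p = permutation_p_value([r for _, r in hits], [r for _, r in misses])
print(len(hits), len(misses), p < 0.05)
```

      Because the hit and miss groups end up with identical sound level distributions, any remaining difference in response magnitude cannot be attributed to louder stimuli on hit trials.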

      Changes to the manuscript.

      Line 217: “Indeed, close to half (1272 / 2649) of all neurons showed a statistically significant difference in response magnitude between hit and miss trials…”

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Differences in the distributions of sound levels in the different trial types could also potentially confound the decoding into hit and miss trials. Our original analysis was actually designed to take this into account but, unfortunately, we failed to include sufficient details in the methods section.

      Changes to the manuscript.

      Line 710: “Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d’ of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis.”
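
      The selection rule quoted above can be sketched as follows. This is a minimal example under stated assumptions: d’ is computed with the standard signal-detection formula z(hit rate) − z(false-alarm rate), and the sound levels, hit rates and false-alarm rate are invented for illustration (only the criterion of 1.5 and the "two below, two above" rule come from the text).

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard sensitivity index: z(hit rate) - z(false-alarm rate).

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1
    would need a correction that is not shown here.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def select_decoding_levels(levels, d_primes, criterion=1.5, flank=2):
    """Lowest level whose d' exceeds `criterion`, plus `flank` levels on each side."""
    ordered = sorted(zip(levels, d_primes))
    for i, (_, dp) in enumerate(ordered):
        if dp > criterion:
            lo = max(0, i - flank)
            hi = min(len(ordered), i + flank + 1)
            return [level for level, _ in ordered[lo:hi]]
    return []  # no level reaches the criterion


# Hypothetical session: levels in dB SPL and made-up hit rates.
levels = [53, 56, 59, 62, 65, 68, 71, 74, 77]
hit_rates = [0.12, 0.20, 0.30, 0.45, 0.62, 0.80, 0.90, 0.95, 0.97]
false_alarm_rate = 0.10

d_primes = [d_prime(h, false_alarm_rate) for h in hit_rates]
print(select_decoding_levels(levels, d_primes))
```

      With these invented numbers, 65 dB SPL is the lowest level whose d’ exceeds 1.5, so the selection is centred on it.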

      In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity predominantly occurs immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in Figure 4 – figure supplement 1), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.
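
      To make point a) concrete, the sketch below scores every imaging frame with its own, independently fitted classifier, so that accuracy at one frame cannot influence accuracy at another. The reviewers describe the decoders as logistic regression models; to keep the example dependency-free, a leave-one-out nearest-centroid classifier is used here as a plainly named stand-in, and the data and function names are invented.

```python
def nearest_centroid_loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    `features` holds one vector per trial (one value per neuron) and
    `labels` the matching "hit"/"miss" strings. Assumes at least two
    trials per outcome so that no centroid is left empty.
    """
    classes = sorted(set(labels))
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        centroids = {}
        for cls in classes:
            rows = [f for j, (f, lab) in enumerate(zip(features, labels))
                    if lab == cls and j != i]
            centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        predicted = min(
            classes,
            key=lambda cls: sum((a - b) ** 2 for a, b in zip(x, centroids[cls])),
        )
        correct += predicted == y
    return correct / len(labels)


def decode_frame_by_frame(activity, labels):
    """activity[trial][frame][neuron] -> one accuracy per frame, fitted independently."""
    n_frames = len(activity[0])
    return [
        nearest_centroid_loo_accuracy([trial[t] for trial in activity], labels)
        for t in range(n_frames)
    ]


labels = ["hit"] * 4 + ["miss"] * 4
# Made-up data for two neurons and three frames: only the last frame
# differs between hit and miss trials.
activity = [
    [
        [0.1 * (k % 4), 0.2],
        [0.3, 0.1 * (k % 4)],
        [(2.0 if lab == "hit" else 0.0) + 0.1 * (k % 4), 1.0],
    ]
    for k, lab in enumerate(labels)
]

scores = decode_frame_by_frame(activity, labels)
print(scores[2])
```

      In this toy data set the outcome can only be read out from the final frame, and the score for that frame is unaffected by how poorly the earlier frames decode.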

      Finally, we carried out an additional decoding analysis for one imaging session in which we had a sufficient number of trials to perform the analysis not only over the five (59, 62, 65, 68, 71 dB SPL) original sound levels, but also over a reduced range of three (62, 65, 68 dB SPL) sound levels, as well as a single (65 dB SPL) sound level (Figure 6 - figure supplement 1). The mean sound level differences between the hit trial distributions and miss trial distributions for these three conditions were 3.08, 1.01 and 0 dB, respectively. This analysis suggests that decoding performance is not meaningfully impacted by changing the range of sound levels (and sound level distributions), other than that including fewer sound levels means fewer trials and thus noisier decoding.
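
      The mean sound level difference between hit and miss trial distributions mentioned above is a simple quantity; a minimal sketch is given below. The trial counts per level are invented, so the printed values will not match the 3.08, 1.01 and 0 dB reported for the actual session, but they show the same pattern of the difference shrinking as the range of levels narrows.

```python
from statistics import mean


def mean_level_difference(trials, allowed_levels):
    """Mean sound level of hit trials minus that of miss trials, within `allowed_levels`."""
    hits = [lvl for lvl, outcome in trials if outcome == "hit" and lvl in allowed_levels]
    misses = [lvl for lvl, outcome in trials if outcome == "miss" and lvl in allowed_levels]
    return mean(hits) - mean(misses)


# Hypothetical counts of (hits, misses) at each sound level in dB SPL.
counts = {59: (2, 8), 62: (4, 6), 65: (5, 5), 68: (6, 4), 71: (8, 2)}
trials = []
for level, (n_hit, n_miss) in counts.items():
    trials += [(level, "hit")] * n_hit + [(level, "miss")] * n_miss

five = mean_level_difference(trials, {59, 62, 65, 68, 71})
three = mean_level_difference(trials, {62, 65, 68})
single = mean_level_difference(trials, {65})
print(five, three, single)
```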

      Changes to manuscript.

      Line 287: ”...and was not meaningfully affected by differences in sound level distributions between hit and miss trials (Figure 6 – figure supplement 1).”

      Suggestion for public comment #2: Perhaps a solution would be to display example neuron activity in each cluster, recorded in control and lesioned mice. The reader could then visually compare example data from the two groups, and immediately grasp the conclusion that task relevant activity remains in absence of auditory cortex. Additionally, one possibility might be to calculate the difference in neural activity between Hit and Miss trials for each task-modulated neuron. Then, you could compare these values for neurons recorded in control and lesion mice. I feel like this information would greatly add to our understanding of cortico-collicular processing.

      I would also argue that it's perhaps more informative to show one (or a few) example recordings rather than averaging across all cells in a cluster. Example cells would give the reader a better handle on the quality of the imaging, and this approach is more standard in the field. Finally, it would be useful to show the y axis calibration for each example trace (e.g. Figure 5 supp 1). That is also pretty standard so we can immediately grasp the magnitude of the recorded signal.

      We agree that while the information we provided shows that neurons from lesioned and non-lesioned groups are roughly equally represented across the clusters, it does not allow the reader to appreciate how similar the activity profiles of neurons are from each of the two groups. However, picking examples can be highly subjective and thus potentially open to bias. We therefore opted instead to display, separately for lesioned and non-lesioned mice, the peristimulus time histograms of all neurons in each cluster, as well as the cluster averages of the response profiles (Figure 5 - figure supplement 3). This, we believe, convincingly illustrates the close correspondence between neural activity in lesioned and non-lesioned mice across different clusters. All our existing and new figures indicate the response magnitude either on the figures’ y-axis or via scale/color bars.

      Changes to manuscript.

      Line 254: “Furthermore, there was a close correspondence between the cluster averages of lesioned and non-lesioned mice (Figure 5 – figure supplement 3).”

      Furthermore, we’ve now included a video of the imaging data which, we believe, gives the reader a much better handle on the data quality than further example response profiles would.

      Changes to manuscript.

      Line 197: ”...using two-photon microscopy (Figure 4B, Video 1).”

      Suggestion for public comment #3: In absence of laborious and costly follow-up experiments to boost the sample size of partial and complete lesion groups, it may be more prudent to simply tone down the claims that lesion size differentially impacts decoding accuracy. The results of this analysis are not necessary for your main claims.

      Our new results on the proportions of ‘task-modulated’ neurons (Figure 6 - figure supplement 3) across different experimental groups show that there is no difference between non-lesioned and lesioned mice as a whole, but mice with partial lesions have a smaller proportion of task-modulated neurons than the other two groups. While this corroborates the results of the decoding analysis, we certainly agree that the small sample size is a caveat that needs to be acknowledged.

      Changes to manuscript.

      Line 477: “Some differences were observed for mice with only partial lesions of the auditory cortex. Those mice had a lower proportion of neurons with distinct response magnitudes in hit and miss trials than mice with (near-)complete lesions. Furthermore, trial outcomes could be read out with lower accuracy from these mice. While this finding is somewhat counterintuitive and is based on only three mice with partial lesions, it has been observed before that smaller lesions…”

      A few more suggestions unrelated to public review:

      Figure 1: This is somewhat of an oddball in this manuscript, and its inclusion is not necessary for the main point. Indeed, the major conclusion of Fig 1 is that acute silencing of auditory cortex impairs task performance, and thus optogenetic methods are not suitable to test your hypothesis. However, this conclusion is also easily supported from decades of prior work, and thus citations might suffice.

      We do not agree that these data can easily be substituted with citations of prior published work. While previous studies (Talwar et al., 2001, Li et al., 2017) have demonstrated the impact of acute pharmacological silencing on sound detection in rodents, pharmacological and optogenetic silencing are not equivalent. Furthermore, we are aware of only one published study (Kato et al., 2015) that investigated the impact of optogenetically perturbing auditory cortex on sound detection (others have investigated its impact on discrimination tasks). Kato et al. (2015) examined the effect of acute optogenetic silencing of auditory cortex on the ability of mice to detect the offsets of very long (5-9 seconds) sounds, which is not easily comparable to the click detection task employed by us. Furthermore, when presenting our work at a recent meeting and leaving out the optogenetics results due to time constraints, audience members immediately enquired whether we had tried an optogenetic manipulation instead of lesions. Therefore, we believe that these data represent a valuable piece of information that will be appreciated by many readers and have decided not to remove them from the manuscript.

      A worst-case scenario is that Figure 1 will detract from the reader's assessment of experimental rigor. The data of 1C are pooled from multiple sessions in three mice. It is not clear if the signed-rank test compares performance across n = 3 mice or n = 13 sessions. If the latter, a stats nitpicker could argue that the significance might not hold up with a nested analysis considering that some datapoints are not independent of one another. Finally, the experiment does not include a control group, gad2-cre mice injected with an EYFP virus. So as presented, the data are equally compatible with the pessimistic conclusion that shining light into the brain impairs mice's licking. My suggestion is to simply remove Figure 1 from the paper. Starting off with Figure 3 would be stronger, as the rest of the study hinges upon the knowledge that control and lesion mice's behavior is similar.

      Instead of reporting the results session-wise and doing stats on the d’ values, we now report results per mouse and perform stats on the proportions of hits and false alarms separately for each mouse. The results are statistically significant for each mouse and suggest that the differences in d’ are primarily caused by higher false alarm rates during the optogenetic perturbation than in the control condition.
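
      For readers less familiar with signal detection measures, the following is a minimal sketch of how d’ follows from hit and false alarm proportions. It assumes the standard definition, d’ = z(hit rate) - z(false alarm rate). The function name and the example rates are hypothetical and are not data from the study, and the sketch does not address how proportions of exactly 0 or 1 were handled.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard sensitivity index: z(hit rate) - z(false alarm rate)."""
    for rate in (hit_rate, false_alarm_rate):
        # The z-transform is undefined at 0 and 1; any correction for such
        # values is left out here because the source does not describe one.
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical illustration: with the hit rate held fixed, a higher
# false alarm rate alone is enough to lower d'.
control = d_prime(0.9, 0.1)
perturbed = d_prime(0.9, 0.3)
```

      In this illustration the hit rate is identical in both conditions, so the entire drop in d’ comes from the false alarm rate, which is the pattern described above.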

      Changes to manuscript.

      New Figure 1.

      We agree that including control mice not expressing ChR2 would be important for fully characterizing the optogenetic manipulation and that the lack of this control group should be acknowledged. However, in the context of this study, the outcome of performing this additional experiment would be inconsequential. We originally considered using an optogenetic approach to explore the contribution of cortical activity to IC responses, but found that this altered the animals’ sound detection behavior. Whether that change in behavior is due to activation of the opsin or simply due to light being shone on the brain has no bearing on the conclusion that this type of manipulation is unsuitable for determining whether auditory cortex is required for the choice-related activity that we recorded in the IC.

      Changes to manuscript.

      Line 106: “Although a control group in which the auditory cortex was injected with an EYFP virus lacking ChR2 would be required to confirm that the altered behavior results from an opsin-dependent perturbation of cortical activity, this result shows that this manipulation is also unsuitable… ”

      Figure 2, comment #1: The micrograph of panel B shows the densest fluorescence in the central IC. You interpret this as evidence of retrograde labeling of central IC neurons that project to the shell IC. This is a nice finding, but perhaps a more relevant micrograph would be to show the actual injection site in the shell layers. The rest of Figure 2 documents the non-auditory cortical sources of forebrain feedback. Since non-auditory cortical neurons may or may not target distinct shell IC sub-circuits, it's important to know where the retrograde virus was injected. Stylistic comment: The flow of the panels is somewhat unorthodox. Panel A and B follow horizontally, then C and D follow vertically, followed by E-H in a separate column. Consider sequencing either horizontally or vertically to maximize the reader's experience.

      Figure 2, comment # 2: It would also be useful to show more rostral sections from these mice, perhaps as a figure supplement, if you have the data. I think there is a lot of value here given a recent paper (Olthof et al., 2019 Jneuro) arguing that the IC receives corticofugal input from areas more rostral to the auditory cortex. So it would be beneficial for the field to know if these other cortical sources do or do not represent likely candidates for behavioral modulation in absence of auditory cortex.

      Figure 2, comment #3: You have a striking cluster of retrogradely labeled PPC neurons, and I'm not sure PPC has been consistently reported as targeting the IC. It would be good to confirm that this is a "true" IC projection as opposed to viral leakage into the SC. Indeed, Figure 2, supplement 2 also shows some visual cortex neurons that are retrogradely labeled. This has bearing on the interpretations, because choice-related activity is rampant in PPC, and thus could be a potential source of the task relevant activity that persists in your recordings. This could be addressed as the point above, by showing the SC sections from these same mice.

      All IC injections were made under visual guidance with the surface of the IC and adjacent brain areas fully exposed after removal of the imaging window. Targeting the IC and steering clear of surrounding structures, including the SC, was therefore relatively straightforward.

      We typically observed strong retrograde labeling in the central nucleus after viral injections into the dorsal IC and, given the moderate injection volume (~50 nL at each of up to three sites), it was also typical to see spatially fairly confined labeling at the injection sites. For the mouse shown in Figure 2, we do not have further images of the IC. This was one of the earliest mice to be included in the study and we did not have access to an automatic slide scanner at the time. We had to acquire confocal images in a ‘manual’ and very time-consuming manner and therefore did not take further IC images for this mouse. We have now included, however, a set of images spanning the whole IC and the adjacent SC sections for the mouse for which we already show sections in Figure 2 - figure supplement 2. These were added as Figure 2 - figure supplement 3A to the manuscript. These images show that the injections were located in the caudal half of the IC and that there was no spillover into the SC - close inspection of those sections did not reveal any labeled cell bodies in the SC. Furthermore, we include as Figure 2 - figure supplement 3B a dozen additional rostral cortical sections of the same mouse illustrating corticocollicular neurons in regions spanning visual, parietal, somatosensory and motor cortex. Given the inclusion of the IC micrographs in the new supplementary figure, we removed panel B from Figure 2. This should also make it easier for the reader to follow the sequencing of the remaining panels.

      Changes to manuscript.

      New Figure 2 - figure supplement 3.

      Line 159: “After the experiments, we injected a retrogradely-transported viral tracer (rAAV2-retro-tdTomato) into the right IC to determine whether any corticocollicular neurons remained after the auditory cortex lesions (Figure 2, Figure 2 – figure supplement 2, Figure 2 – figure supplement 3). The presence of retrogradely-labeled corticocollicular neurons in non-temporal cortical areas (Figure 2) was not the result of viral leakage from the dorsal IC injection sites into the superior colliculus (Figure 2 – figure supplement 3).”

      Line 495: “...projections to the IC, such as those originating from somatosensory cortical areas (Lohse et al., 2021; Lesicko et al., 2016) and parietal cortex, may have contributed to the response profiles that we observed.”

      Figure 5 (see also public review point #2): I am not convinced that this unsupervised method yields particularly meaningful clusters; a grain of salt should be provided to the reader. For example, Clusters 2, 5, 6, and 7 contain neurons that pretty clearly respond with either short latency excitation or inhibition following the click sound on Hits. I would argue that neurons with such diametrically opposite responses should not be "classified" together. You can see the same issue in some of Namboodiri/Stuber's clustering (their Figure 1). It might be useful to make it clear to the reader that these clusters can reflect idiosyncrasies of the algorithm, the behavior task structure, or both.

      We agree.

      Changes to manuscript.

      Line 666: “While clustering is a useful approach for organizing and visualizing the activity of large and heterogeneous populations of neurons, we need to be mindful that, given continuous distributions of response properties, the locations of cluster boundaries can be somewhat arbitrary and/or reflect idiosyncrasies of the chosen method and thus vary from one algorithm to another. We employed an approach very similar to that described in Namboodiri et al. (2019) because it is thought to produce stable results in high-dimensional neural data (Hirokawa et al. 2019).”

      Methods:

      How was a "false alarm" defined? Is it any lick happening during the entire catch trial, or only during the time period corresponding to the response window on stimulus trials?

      The response window was identical for catch and stimulus trials and a false alarm was defined as licking during the response window of a catch trial.

      Changes to manuscript.

      Line 598: “During catch trials, neither licking (‘false alarm’) during the 1.5-second response window …”

      L597 and so forth: What's the denominator in the conversion from the raw fluorescence traces into DF/F? Did you take the median or mode fluorescence across a chunk of time? Baseline subtract average fluorescence prior to click onset? Similarly, please provide some more clarification as to how neuropil subtraction was achieved. This information will help us understand how the classifier can decode trial outcome from data prior to sound onset.

      Signal processing did not involve the subtraction of a pre-stimulus period.

      Changes to manuscript.

      Line 629: “Neuropil extraction was performed using default suite2p parameters (https://suite2p.readthedocs.io/en/latest/settings.html), neuropil correction was done using a coefficient of 0.7, and calcium ΔF/F signals were obtained by using the median over the entire fluorescence trace as F0. To remove slow fluctuations in the signal, a baseline of each neuron’s entire trace was calculated by Gaussian filtering in addition to minimum and maximum filtering using default suite2p parameters. This baseline was then subtracted from the signal.”
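
      The neuropil correction and ΔF/F conversion described in this passage can be illustrated with a minimal sketch. It assumes that the correction takes the usual form F - 0.7 × Fneu and that F0 is the median of the corrected trace. It is not suite2p code, the function name and the example values are hypothetical, and the slow-baseline filtering step is omitted.

```python
from statistics import median

NEUROPIL_COEFFICIENT = 0.7  # coefficient stated in the quoted Methods text


def delta_f_over_f(roi_trace, neuropil_trace, coefficient=NEUROPIL_COEFFICIENT):
    """Return dF/F for one ROI, using the median of the whole trace as F0."""
    if len(roi_trace) != len(neuropil_trace):
        raise ValueError("ROI and neuropil traces must have the same length")
    # Assumed form of the neuropil correction: subtract a scaled neuropil signal.
    corrected = [f - coefficient * n for f, n in zip(roi_trace, neuropil_trace)]
    f0 = median(corrected)  # F0 taken over the entire trace, not a pre-stimulus window
    if f0 == 0:
        raise ValueError("F0 is zero; dF/F is undefined")
    return [(f - f0) / f0 for f in corrected]


# Hypothetical five-frame trace with a single transient in the middle.
example = delta_f_over_f([170, 170, 270, 170, 170], [100, 100, 100, 100, 100])
```

      Because F0 is computed over the whole trace rather than over a pre-stimulus window, pre-stimulus differences between trials are preserved in the resulting signal.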

      Was the experimenter blinded to the treatment group during the behavior experiments? If not, were there issues that precluded blinding (limited staffing owing to lab capacity restrictions during the pandemic)? This is important to clarify for the sake of rigor and reproducibility.

      Changes to manuscript.

      Line 574: “The experimenters were not blinded to the treatment group, i.e. lesioned or non-lesioned, but they were blind to the lesion size both during the behavior experiments and most of the data processing.”

      Minor:

      L127-128: "In order to test...lesioned the auditory cortex bilaterally in 7 out of 16 animals". I would clarify this by changing the word animals to "mice" and 7 out of 16 by stating n = 9 and n = 7 are control and lesion groups, respectively.

      Agreed.

      Changes to manuscript.

      Line 129: “...compared the performance of mice with bilateral lesions of the auditory cortex (n = 7) with non-lesioned controls (n = 9)”

      L225-226: You rule out self-generated sounds as a likely source of behavioral modulation by citing Nate Sawtell's paper in the DCN. However, Stephen David's lab suggested that in marmosets, post sound activity in central IC may in fact reflect self-generated sounds during licking. I suggest addressing this with a nod to SVD's work (Singla et al., 2017; but see Shaheen et al., 2021).

      Agreed.

      Changes to manuscript.

      Line 243: “(Singla et al., 2017; but see Shaheen et al., 2021)”

      Line 238 - 239: You state that proportions only deviate greater than 10% for one of the four statistically significant clusters. Something must be unclear here because I don't understand: The delta between the groups in the significant clusters of Fig 5C is (from left to right) 20%, 20%, 38%, and 12%. Please clarify.

      Our wording was meant to convey that a deviation “from a 50/50 split” of 10% means that each side deviates from 50 by 10% resulting in a 40/60 (or 60/40) split. We agree that that has the potential to confuse readers and is not as clear as it could be and have therefore dropped the ambiguous wording.

      Changes to manuscript.

      Line 253: “...the difference between the groups was greater than 20% for only one of them.”

      L445: I looked at the cited Allen experiment; I'd be cautious with the interpretation here. A monosynaptic IC->striatum projection is news to me. I think Allen Institute used an AAV1-EGFP virus for these experiments, no? As you know, AAV1 is quite transsynaptic. The labeled fibers in striatum of that experiment may reflect disynaptic labeling of MGB neurons (which do project to striatum).

      Agreed. We deleted the reference to this Allen experiment.

      L650: Please define "network activity". Is this the fluorescence value for each ROI on each frame of each trial? Averaged fluorescence of each ROI per frame? Total frame fluorescence including neuropil? Depending on who you ask, each of these measures provides some meaningful readout of network activity, so clarification would be useful.

      Changes to manuscript.

      Line 707: “Logistic regression models were trained on the network activity of each session, i.e., the ΔF/F values of all ROIs in each session, to classify hit vs miss trials. This was done on a frame-by-frame basis, meaning that each time point (frame) of each session was trained separately.”

      Figure 3 narrative or legend: Listing the F values for the anova would be useful. There is pretty clearly a main effect of training session for hits, but what about for the false alarms? That information is important to solidify the result, and would help more specialized readers interpret the d-prime plot in this figure.

      Agreed. There were significant main effects of training day for both hit rates and false alarm rates (as well as d’).

      Changes to manuscript.

      Line 165: “The ability of the mice to learn and perform the click detection task was evident in increasing hit rates and decreasing false alarm rates across training days (Figure 3A, p < 0.01, mixed-design ANOVAs).”

      In summary, thank you for undertaking this work. Your conclusions are provocative, and thus will likely influence the field's direction for years to come.

      Thank you for those kind words and valuable and constructive feedback, which has certainly improved the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) (Fig. 5) What fraction of individual neurons actually encode task-related information in each animal group? How many neurons respond to sound? The clustering and decoding analyses are interesting, but they obscure these simple questions, which get more directly at the main questions of the study. Suggested approach: For a direct comparison of AC-lesioned and -non-lesioned animals, why not simply compare the mean difference between PSTH response for each neuron individually? To test for trial outcome effects, compare Hit and Miss trials (same stimulus, different behavior) and for sound response effects, compare Hit and False alarm trials (same behavior, different response). How do you align for time in the latter case when there's no stimulus? Align to the first lick event. The authors should include this analysis or explain why their approach of jumping right to analysis of clusters is justified.

      We have now calculated the fraction of neurons that encode trial outcome by comparing hit and miss trial activity. That fraction does not differ between non-lesioned animals and lesioned animals as a whole, but is significantly smaller in mice with partial lesions. The reviewer’s suggestion of comparing hit and false alarm trial activity to assess sound responsiveness is problematic because hit trials involve reward delivery and consumption. Consequently, they are behaviorally very different from false alarm trials (not least because hit trials tend to contain much more licking). Therefore, we calculated the fraction of neurons that respond to the acoustic stimulus by comparing activity before and after stimulus onset in miss trials. We found no significant difference between the non-lesioned and lesioned mice or between subgroups.

      We have addressed these points with the following changes to the manuscript:

      Line 217: “Indeed, close to half (1272 / 2649) of all neurons showed a statistically significant difference in response magnitude between hit and miss trials, while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Line 648: “Analysis of task-modulated and sound-driven neurons. To identify individual neurons that produced significantly different response magnitudes in hit and miss trials, we calculated the mean activity for each stimulus trial by taking the mean activity over the 5 seconds following stimulus presentation and subtracting the mean activity over the 2 seconds preceding the stimulus during that same trial. A Mann-Whitney U test was then performed to assess whether a neuron showed a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) in response magnitude between hit and miss trials. The analysis was performed using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions. If, for a given sound level, there were more hit than miss trials, we randomly selected a sample of hit trials (without replacement) to match the sample size for the miss trials and vice versa. Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation.”
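
      Two steps of this procedure, balancing hit and miss trials per sound level and adjusting the per-neuron p-values, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used for the manuscript: the (sound level, response magnitude) tuple layout, the function names and the fixed random seed are hypothetical, and sound levels that occur in only one trial type are assumed to be dropped. The Mann-Whitney U test itself is not reimplemented here.

```python
import random
from collections import defaultdict


def balance_by_sound_level(hit_trials, miss_trials, seed=0):
    """Subsample so hits and misses are equally frequent at each sound level.

    Each trial is a (sound_level, response_magnitude) tuple (hypothetical layout).
    """
    rng = random.Random(seed)
    hits, misses = defaultdict(list), defaultdict(list)
    for level, value in hit_trials:
        hits[level].append(value)
    for level, value in miss_trials:
        misses[level].append(value)

    balanced_hits, balanced_misses = [], []
    # Assumption: levels present in only one trial type contribute no trials.
    for level in sorted(set(hits) & set(misses)):
        n = min(len(hits[level]), len(misses[level]))
        # random.sample draws without replacement.
        balanced_hits += [(level, v) for v in rng.sample(hits[level], n)]
        balanced_misses += [(level, v) for v in rng.sample(misses[level], n)]
    return balanced_hits, balanced_misses


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      A neuron would then count as significant if its adjusted p-value falls below 0.05, with one p-value per neuron passed to `benjamini_hochberg`.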

      Some more specific concerns about focusing only on cluster-level and population decoding analysis are included below.

      (2) (L 234) "larger field of view". Do task-related or lesion-dependent effects depend on the subregion of IC imaged? Some anatomists would argue that the IC shell is not a uniform structure, and concomitantly, task-related effects may differ between fields. Did coverage of IC subregions differ between experimental groups? Is there any difference in task related effects between subregions of IC? Or maybe all this work was carried out only in the dorsal area? The differences between lesioned and non-lesioned animals are relatively small, so this may not have a huge impact, but a more nuanced discussion that accounts for observed or potential (if not tested) differences between regions of the IC would be helpful.

      The specific subregion coverage could also impact the decoding analysis (Fig 6), and if possible it might be worth considering an interaction between field of view and lesion size on decoding.

      Each day we chose a new imaging location to avoid recording the same neurons more than once and aimed to sample widely across the optically accessible surface of the IC. We typically stopped the experiment only when there were no more new areas to record from. In terms of the depth of the imaged neurons, we were limited by the fact that corticorecipient neurons become sparser with depth and that the signal available from the GCaMP6f labeling of the Ai95 mice becomes rapidly weaker with increasing distance from the surface. This meant that we recorded no deeper than 150 µm from the surface of the IC. Consequently, while there may have been some variability in the average rostrocaudal and mediolateral positioning of imaging locations from animal to animal due to differences between mice in how much of the IC surface was visible, cranial window positioning, and in neuronal labeling etc, our dataset is anatomically uniform in that all recorded neurons receive input from the auditory cortex and are located within 150 µm of the surface of the IC. Therefore, we think it highly unlikely that small sampling differences across animals could have a meaningful impact on the results.

      Given that there is no consensus as to where the border between the dorsal and external/lateral cortices of the IC is located and that it is typically difficult to find reliable anatomical reference points (the location of the borders between the IC and surrounding structures is not always obvious during imaging, i.e. a transition from a labeled area to a dark area near the edge of the cranial window could indicate a border with another structure, but also the IC surface sloping away from the window or simply an unlabeled area within the IC), we made no attempt to assign our recordings from corticorecipient neurons to specific subdivisions of the IC.

      Changes to manuscript.

      Line 195: “We then proceeded to record the activity of corticorecipient neurons within about 150 µm of the dorsal surface of the IC using two-photon microscopy (Figure 4B, Video 1).”

      Line 375: “We imaged across the optically accessible dorsal surface of the IC down to a depth of about 150 µm below the surface. Consequently, the neurons we recorded were located predominantly in the dorsal cortex. However, identifying the borders between different subdivisions of the IC is not straightforward and we cannot rule out the possibility that some were located in the lateral cortex.”

      (3) (L 482-483) "auditory cortex is not required for the task-related activity recording in IC neurons of mice performing a sound detection task". Most places in the text are clearer, but this statement is confusing. Yes, animals with lesions can have a "normal"-looking IC, but does that mean that AC does not strongly modulate IC during this behavior in normal animals? The authors have shown convincingly that subcortical areas can both shape behavior and modulate IC normally, but AC may still be required for IC modulation in non-lesioned animals. Given the complexity of this system, the authors should make sure they summarize their results consistently and clearly throughout the manuscript.

      The reviewer raises an important point. What we have shown is that corticorecipient dorsal IC neurons in mice without auditory cortex show neural activity during a sound detection task that is largely indistinguishable from the activity of mice with an intact auditory cortex. In lesioned mice, the auditory cortex is thus not required. Whether the IC activity of the non-lesioned group can be shaped by input from the auditory cortex in a meaningful way in other contexts, such as during learning, is a question that our data cannot answer.

      Changes to manuscript.

      Line 508: "While modulation of IC activity by this descending projection has been implicated in various functions, most notably in the plasticity of auditory processing, we have shown in mice performing a sound detection task that IC neurons show task-related activity in the absence of auditory cortical input."

      LESSER CONCERNS

      (L. 106-107) "Optogenetic suppression of cortical activity is thus also unsuitable..." It appears that behavior is not completely abolished by the suppression. One could also imagine using a lower dose of muscimol for partial inactivation of AC feedback. When some behavior persists, it does seem possible to measure task-related changes in the IC. This may not be necessary for the current study, but the authors should consider how these transient methods could be applied usefully in the Discussion. What about inactivation of cortical terminals in the IC? Is that feasible?

      Our argument is not that acute manipulations are unsuitable because they completely abolish the behavior, but because they significantly alter the behavior. Although it would not be trivial to precisely measure the extent of pharmacological cortical silencing in behaving mice that have been fitted with a midbrain window, it should be possible to titrate the size of a muscimol injection to achieve partial silencing of the auditory cortex that does not fully abolish the ability to detect sounds. However, such an outcome would likely render the data uninterpretable. If no effect on IC activity was observed, it would not be possible to conclude whether this was due to the fact that the auditory cortex was only partially silenced or that projections from the auditory cortex have no influence on the recorded IC activity. Similarly, if IC activity was altered, it would not be possible to say whether this was due to altered descending modulation resulting from the (partially) silenced auditory cortex or to the change in behavior, which would likely be reflected in the choice-related activity measured in the IC.

      Silencing of corticocollicular axons in the IC is potentially a more promising approach and we did devote a considerable amount of time and effort to establishing a method that would allow us to simultaneously image IC neurons while silencing corticocollicular axons, trying both eNpHR3.0 and Jaws with different viral labeling approaches and mouse lines. However, we ultimately abandoned those attempts because we were not convinced that we had achieved sufficient silencing or that we would be able to convincingly verify this. Furthermore, axonal silencing comes with its own pitfalls and the interpretation of its consequences is not straightforward. Given that our discussion already contains a section (line 421) on axonal silencing, we do not feel there would be any benefit in adding to that.

      (Figure 1). Can the authors break down the performance for FA and HR, as they do in Fig. 3? It would be helpful to know what aspect of behavior is impaired by the transient inactivation.

      Good point. Figure 1 has been updated to show the results separately for hit rates, false alarms and d’. The new figure indicates that the change in d’ is primarily a consequence of altered false alarm rates. Please also see our response to a related comment by reviewer #1.

      Changes to manuscript.

      New Figure 1.

      (Figure 4 legend). Minor: Please clarify, what is time 0 in panel C? Time of click presentation?

      Yes, that is correct.

      Changes to manuscript.

      Line 209: “Vertical line at time 0 s indicates time of click presentation.”

      (L. 228-229). There has been a report of lick and other motor related activity in the IC - e.g., see Shaheen, Slee et al. (J Neurosci 2021), the timing of which suggests that some of it may be acoustically driven.

      Thanks for pointing this out. Shaheen et al., 2021 should certainly have been cited by us in this context as well as in other parts of the manuscript.

      Changes to manuscript.

      Line 243: “(Singla et al., 2017; but see Shaheen et al., 2021)”

      Also, have the authors considered measuring a peri-lick response? The difference between hit and miss trials could be perceptual or it could reflect differences in motor activity. This may be hard to tease apart, but, for example, one can test whether activity is stronger on trials with many licks vs. few licks?

      (L. 261) "Behavior can be decoded..." similar or alternative to the previous question of evoked activity, can you decode lick events from the population activity?

      The difference between hit and miss trial activity almost certainly partially reflects motor activity associated with licking. This was stated in the Discussion, but to make that point more explicitly, we now include a plot of average false alarm trial activity, i.e. trials without sound (catch trials) in which animals licked (but did not receive a reward).

      Given a sufficient number of catch trials, it should be possible to decode false alarm and correct rejection trials. However, our experiment was not designed with that in mind and contains a much smaller number of catch trials than stimulus trials (approximately one tenth the number of stimulus trials), so we have not attempted this.

      Changes to manuscript.

      New Figure 4 - figure supplement 1.

      (L. 315) "Pre-stimulus activity..." Given reports of changes in activity related to pupil-indexed arousal in the auditory system, do the authors by any chance have information about pupil size in these datasets?

      Given that all recordings were performed in the dark, fluctuations in pupil diameter were relatively small. Therefore, we have not made any attempt to relate pupil diameter to any of the variables assessed in this manuscript.

      (L. 412) "abolishes sound detection". While not exactly the same task, the authors might comment on Gimenez et al (J Neurophys 2015) which argued that temporary or permanent lesioning of AC did not impair tone discrimination. More generally, there seems to be some disagreement about what effects AC lesions have on auditory behavior.

      Thank you for this suggestion. Gimenez et al. (2015) investigated the ability of freely moving rats to discriminate sounds (and, in addition, how they adapt to changes in the discrimination boundary). Broadly consistent with later reports by Ceballo et al. (2019) (mild impairment) and O’Sullivan et al. (2019) (no impairment), Gimenez et al. (2015) reported that discrimination performance is mildly impaired after lesioning auditory cortex. Where the results of Gimenez et al. (2015) stand out is in the comparatively mild impairments that were seen in their task when they used muscimol injections, which contrast with the (much) larger impairments reported by others (e.g. Talwar et al., 2001; Li et al., 2017; Jaramillo and Zador, 2014).

      Changes to manuscript.

      Line 433: “However, transient pharmacological silencing of the auditory cortex in freely moving rats (Talwar et al., 2001), as well as head-fixed mice (Li et al., 2017), completely abolishes sound detection (but see Gimenez et al., 2015).”

      (L. 649) "... were generally separable" Is the claim here that the clusters are really distinct from each other? This is unexpected, and it might be helpful if the authors could show this result in a figure.

      The half-sentence that this comment refers to has been removed from the methods section. Please also see a related comment by reviewer #1 which prompted us to add the following to the methods section.

      Changes to manuscript.

      Line 666: “While clustering is a useful approach for organizing and visualizing the activity of large and heterogeneous populations of neurons we need to be mindful that, given continuous distributions of response properties, the locations of cluster boundaries can be somewhat arbitrary and/or reflect idiosyncrasies of the chosen method and thus vary from one algorithm to another. We employed an approach very similar to that described in Namboodiri et al. (2019) because it is thought to produce stable results in high-dimensional neural data (Hirokawa et al. 2019).”

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors must absolutely clarify if the hit versus misses decoding and clustering analysis is done for a single sound level or for multiple sound levels (what is the fraction of trials for each sound level?). If the authors did it for multiple sound levels they should redo all analyses sound-level by sound-level, or for a single sound level if there is one that dominates. No doubt that there is information about the trial outcome in IC, but it should not be over-estimated by a confound with stimulus information.

      This is an important point. The original clustering analysis was carried out across different sound levels. We have now carried out additional analysis to distinguish between two alternative explanations of the data, which were also raised by reviewer #1: that the difference in neural activity between hit and miss trials could reflect a) the animals’ behavior or b) relatively more hit trials at higher sound levels, which would be expected to produce stronger responses. If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for different sound levels. The new Figure 4 - figure supplement 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      We made the following changes to the manuscript.

      Line 214: “While averaging across all neurons cannot capture the diversity of responses, the averaged response profiles suggest that it is mostly trial outcome rather than the acoustic stimulus and neuronal sensitivity to sound level that shapes those responses (Figure 4 – figure supplement 1).”

      Differences in the distributions of sound levels in the different trial types could also potentially confound the decoding into hit and miss trials. Our analysis actually aimed to take this into account but, unfortunately, we failed to include sufficient details in the methods section.

      Changes to manuscript.

      Line 710: “Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d’ of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis.”
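
      The trial selection rule quoted above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name, the 3 dB level spacing in the example and, in particular, the d’ values are hypothetical and were chosen so that the selection happens to match the five levels (59 to 71 dB SPL) mentioned elsewhere in this response.

```python
def select_decoding_levels(levels, dprimes, threshold=1.5, n_side=2):
    """Pick the sound levels used for decoding.

    levels  : sound levels in dB SPL, sorted in ascending order
    dprimes : behavioral d' at each level, in the same order
    """
    # Find the lowest sound level whose d' exceeds the threshold.
    anchor = next((i for i, d in enumerate(dprimes) if d > threshold), None)
    if anchor is None:
        return []  # no level reached the threshold in this session
    # Keep the anchor level plus up to n_side levels on either side.
    start = max(0, anchor - n_side)
    stop = min(len(levels), anchor + n_side + 1)
    return levels[start:stop]


# Hypothetical session: the d' values below are invented for illustration.
levels = [53, 56, 59, 62, 65, 68, 71, 74]
dprimes = [0.2, 0.5, 0.9, 1.2, 1.8, 2.4, 2.9, 3.1]
selected = select_decoding_levels(levels, dprimes)
print(selected)  # [59, 62, 65, 68, 71]
```

      Restricting the decoder to a narrow band of levels around threshold is what keeps the hit and miss sound level distributions similar; how sessions with fewer than two levels on one side of the anchor were handled is not stated in the response, so the clipping used here is an assumption.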

      In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity predominantly occurs immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in figure 4 - figure supplement 1), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.

      Furthermore, we carried out an additional decoding analysis for one imaging session in which we had a sufficient number of trials to perform the analysis not only over the five (59, 62, 65, 68, 71 dB SPL) original sound levels, but also over a reduced range of three (62, 65, 68 dB SPL) sound levels, as well as a single (65 dB SPL) sound level (Figure 6 - figure supplement 1). The mean sound level differences between the hit trial distributions and miss trial distributions for these three conditions were 3.08, 1.01 and 0 dB, respectively. This analysis suggests that decoding performance is not meaningfully impacted by changing the range of sound levels (and sound level distributions), other than that including fewer sound levels means fewer trials and thus noisier decoding.
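
      For readers who want to see what the "mean sound level difference" measures, a minimal sketch is given below. It assumes the quantity is simply the mean level of the hit trials minus the mean level of the miss trials; the trial lists are invented and do not reproduce the 3.08 or 1.01 dB values reported above.

```python
from statistics import mean


def mean_level_difference(hit_levels, miss_levels):
    """Mean sound level (dB SPL) of hit trials minus that of miss trials."""
    return mean(hit_levels) - mean(miss_levels)


# Invented trial lists, for illustration only.
hits = [62, 65, 68, 68]
misses = [62, 62, 65, 68]
print(mean_level_difference(hits, misses))  # 1.5

# With a single sound level the difference is zero by construction.
print(mean_level_difference([65, 65, 65], [65, 65]))  # 0
```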

      Changes to manuscript.

      Line 287: “...and was not meaningfully affected by differences in sound level distributions between hit and miss trials (Figure 6 – figure supplement 1).”

      Finally, in order to supplement the decoding analysis, we determined for each individual neuron whether there was a significant difference between the average hit and average miss trial activity. Note that this was done using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions and to rule out any potential confound of sound level. This revealed that the proportion of neurons containing “information about trial outcome” was generally very high, close to 50% on average, and not significantly different between lesioned and non-lesioned mice.

      Changes to manuscript.

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Line 648: “Analysis of task-modulated and sound-driven neurons. To identify individual neurons that produced significantly different response magnitudes in hit and miss trials, we calculated the mean activity for each stimulus trial by taking the mean activity over the 5 seconds following stimulus presentation and subtracting the mean activity over the 2 seconds preceding the stimulus during that same trial. A Mann-Whitney U test was then performed to assess whether a neuron showed a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) in response magnitude between hit and miss trials. The analysis was performed using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions. If, for a given sound level, there were more hit than miss trials we randomly selected a sample of hit trials (without substitution) to match the sample size for the miss trials and vice versa. ”
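
      The balancing and multiple-comparison steps described in this methods paragraph can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data layout (one tuple per trial) and all function names are assumptions, and the Mann-Whitney U test itself is left out; it would be applied to the two balanced samples of each neuron, for example with `scipy.stats.mannwhitneyu`, before the p-values are adjusted.

```python
import random
from collections import defaultdict


def balance_trials(trials, seed=0):
    """Return equal numbers of hit and miss responses at each sound level.

    trials : list of (sound_level, outcome, response) tuples,
             where outcome is "hit" or "miss".
    """
    rng = random.Random(seed)
    by_level = defaultdict(lambda: {"hit": [], "miss": []})
    for level, outcome, response in trials:
        by_level[level][outcome].append(response)
    hits, misses = [], []
    for groups in by_level.values():
        n = min(len(groups["hit"]), len(groups["miss"]))
        # random.sample draws without replacement, so no trial is reused.
        hits += rng.sample(groups["hit"], n)
        misses += rng.sample(groups["miss"], n)
    return hits, misses


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented example: three hit trials and one miss trial at 62 dB SPL,
# one hit and two miss trials at 65 dB SPL.
trials = [(62, "hit", 0.9), (62, "hit", 1.1), (62, "hit", 0.8), (62, "miss", 0.2),
          (65, "hit", 1.3), (65, "miss", 0.1), (65, "miss", 0.3)]
hits, misses = balance_trials(trials)
print(len(hits), len(misses))  # 2 2
```

      A neuron would then count as outcome-modulated if its adjusted p-value falls below 0.05. Sound levels that have only hit trials or only miss trials contribute nothing after balancing, which is the price paid for removing sound level as a confound.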

      (2) I have the feeling that the authors do not fully exploit the functional data recorded with two-photon imaging. They identify several clusters but do not describe their functional differences. For example, cluster 3 is obviously mainly sensory driven as it is not modulated by outcome. This could be mentioned. This could also be used to rule out that trial outcome is the result of insufficient sensory inputs. Could this cluster be used to predict trial outcome at the onset response? Could it be used to predict the presence of the sound, and with which accuracy? The authors discuss the different cluster types a bit, but in a very elusive manner. I recognize that one should be careful with the use of signal analysis methods in calcium imaging, but a simple linear deconvolution of the calcium dynamics would help to illustrate the conclusions that the authors propose based on peak responses. It would also be very interesting to align the cluster responses (deconvolved) to the timing of licking and reward events to check if some clusters do not fire when mice perform licks before the sound comes. It would help clarify if the behavioral signals described here require both the presence of the sound and the behavioral action or are just the reflection of the motor command. As noted by the authors, some clusters have late peak responses (2 and 5). However, 2 and 5 are not equivalent and a deconvolution would show that much better. 2 has late onset firing. 5 has early onset but prolonged firing.

      We agree with the reviewer’s statement that “cluster 3 is obviously mainly sensory driven”. In the Discussion we refer to cluster 3 as having a “largely behaviorally invariant response profile to the auditory stimulus” (line X), which is consistent with the statement of the reviewer. With regard to the reviewer’s suggestion to describe the “functional differences” between the clusters, we would like to refer to the subsequent three sentences of the same paragraph in which we speculate on the cognitive and behavioral variables that may underlie the response profiles of different clusters. Given the limitations imposed by the task structure, we do not think it is justified to expand on this.

      We have added an additional analysis in order to explicitly address the question of which neurons are sound responsive (please also see response to point 3 below and to point 1 of reviewer #2). That trial outcome could be predicted on the basis of only the sound-responsive neurons’ activity during the initial period of the trial (“predict trial outcome at the onset response”) is unlikely given their small number (only 97 of 2649 neurons show a statistically significant sound-evoked response) and given that only a minority (42/98) of those sound-driven neurons are also modulated by trial outcome within that initial trial period (i.e. 0-1s after stimulus onset; data not shown).

      Changes to manuscript.

      Line 219: “..., while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 658: “Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation. This analysis was performed using miss trials with click intensities from 53 dB SPL to 65 dB SPL (many sessions contained very few or no miss trials at higher sound levels).”
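
      The pre- versus post-stimulus comparison described above can be illustrated with a small, self-contained Mann-Whitney U sketch. It is an approximation written for illustration only: the p-value uses the normal approximation without tie or continuity correction, which is an assumption of this sketch and not a detail given by the authors, and the activity values are invented. For real data, an established implementation such as `scipy.stats.mannwhitneyu` would be the safer choice.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U statistic for x, approximate two-sided p-value).
    No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value (ties count 0.5).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Invented per-trial mean activity for one neuron on miss trials:
# 2 s before the stimulus versus 1 s after it.
pre_activity = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
post_activity = [0.25, 0.30, 0.22, 0.28, 0.27, 0.24]
u, p = mann_whitney_u(pre_activity, post_activity)
print(u, p < 0.05)  # 0.0 True
```

      In the analysis described above, the resulting p-values would additionally be Benjamini-Hochberg adjusted across neurons before applying the 0.05 criterion.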

      While calcium traces represent an indirect measure of neural activity, deconvolution does not necessarily provide an accurate picture of the spiking underlying those traces and has the potential to introduce additional problems. For instance, deconvolution algorithms tend to perform poorly at inferring the spiking of inhibited neurons (Vanwalleghem et al., 2021). Given that suppression is such a prominent feature of IC activity and is evident both in our calcium data and in the electrophysiology data of others (Franceschi and Barkat, 2021), we decided against using deconvolved spikes in our analyses. See also the side-by-side comparison below of the hit and miss trial activity of one example neuron based on either the calcium trace (left) or deconvolved spikes (right) (extracted using the OASIS algorithm (Friedrich et al., 2017) incorporated into suite2p (Pachitariu et al., 2016)).

      Author response image 1.

      (3) Along the same line, the very small proportion of really sensory driven neurons (cluster 3) is not discussed. Is it what one would expect in typical shell or core IC neurons?

      As requested by reviewer #2 and mentioned in response to the previous point, we have now quantified the number of neurons in the dataset that produced significant responses to sound (97 / 2649). For a given imaging area, the fraction of neurons that show a statistically significant change in neural activity following presentation of a click of between 53 dB SPL and 65 dB SPL rarely exceeded ten percent. While that number is low, it is not necessarily surprising given the moderate intensity and very short duration of the stimuli. For comparison: Using the same transgenics, labeling approach and imaging setup and presenting 200-ms long pure tones at 60 dB SPL with frequencies between 2 kHz and 64 kHz, we typically find that between a quarter and a third of neurons in a given imaging area exhibit a statistically significant response (data not shown).

      Changes to manuscript.

      Line 219: “..., while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 658: “Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation. This analysis was performed using miss trials with click intensities from 53 dB SPL to 65 dB SPL (many sessions contained very few or no miss trials at higher sound levels).”

      Line 220: “While the number of sound-responsive neurons is low, it is not necessarily surprising given the moderate intensity and very short duration of the stimuli. For comparison: Using the same transgenics, labeling approach and imaging setup and presenting 200-ms long pure tones at 60 dB SPL with frequencies between 2 kHz and 64 kHz, we typically find that between a quarter and a third of neurons in a given imaging area exhibit a statistically significant response (data not shown).”

      (4) In the discussion, the interpretation of the different transient and permanent cortical inactivation experiments is very interesting and well balanced given the complexity of the issue. There is nevertheless a comment that is difficult to follow. The authors state:

      If cortical lesioning results in a greater weight being placed on the activity in spared subcortical circuits for perceptual judgements, we would expect the accuracy with which trial-by-trial outcomes could be read out from IC neurons to be greater in mice without auditory cortex. However, that was not the case.

      However, there is no indication that the activity they observe in shell IC is causal to the behavioral decision and likely it is not. There is also no indication that the behavioral signals seen by the authors reflect the weight put on the subcortical pathway for behavior. I find this argument handwavy and would remove it.

      While we are happy to amend this section, we would not wish to remove it because a) we believe that the point we are trying to make here is an important and reasonable one and b) because it is consistent with the reviewer’s comment. Hopefully, the following will make this clearer: In order for the mouse to make a perceptual judgment and act upon it - in the context of our task, hearing a sound and then licking a spout - auditory information needs to be read out and converted into a motor command. If the auditory cortex normally plays a key role in such perceptual judgments, cortical lesions would require the animal to base its decisions on the information available from the remaining auditory structures, potentially including the auditory midbrain. This might result in a greater correspondence between the mouse’s behavior and the neural activity in those structures. That we did not observe this outcome for the IC could mean that the auditory cortex did not contribute to the relevant perceptual judgments (sound detection) in the first place. Therefore, no reweighting of signals from the other structures is necessary. Alternatively, greater weight might be placed exclusively on structures other than the auditory midbrain, e.g. the thalamus. The latter would imply that the contribution of the IC remains the same. This includes the possibility that the IC shell does not play a causal role in the behavioral decision – in either control mice or mice with cortical lesions – as suggested by the reviewer.

      Changes to manuscript.

      Line 471: “This could imply that, following cortical lesions, greater weight is placed on structures other than the IC, with the thalamus being the most likely candidate, ..”

      (5) In Fig. 5 the two colors used in B and C are the same although they describe different categories.

      The dark green and ‘deep orange’ we used to distinguish between non-lesioned and lesioned in Figure 5C are slightly lighter than the colors used to distinguish between these two categories in other figures and therefore might be more easily confused with the blue and red in Figure 5B. This has been changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have made revisions accordingly. The following is a list of the changes we have made in this revised Version of Record:

      (1) We have added three more panels to Figure 1-figure supplement 1, showing that lipopolysaccharide-induced severe lung injury also generates some ectopic tuft cells expressing both Dclk1 and Gα-gustducin, a G protein α subunit expressed in taste bud cells and many tuft cells.

      (2) We have added a new supplemental figure, Figure 2-figure supplement 1, showing a reanalysis of the single-cell RNAseq dataset (GSE197163) that indicates the numbers of Trpm5-GFP+ ectopic tuft cells expressing Tas2r108, Tas2r105, Tas2r138, Tas2r137 and other Tas2rs. The original “Figure 2-figure supplement 1” in the previous version has been renamed “Figure 2-figure supplement 2”.

      (3) We have added another new supplemental figure, Figure 3-figure supplement 1, showing that the H1N1 infection-damaged lung tissue volumes in the Gng13-cKO mice are significantly greater than those in WT or Trpm5-/- mice, in agreement with the data on the injured lung surface areas from these three genotypes (Figure 3 C and D). The original “Figure 3-figure supplement 1” in the previous version has been renamed “Figure 3-figure supplement 2”.

      (4) We have added two new panels, I and J, to the new Figure 3-figure supplement 2, showing a reanalysis of the single-cell RNAseq dataset (GSE197163), which indicates that about 57% of Trpm5-GFP+ ectopic tuft cells express Gγ13, some of which express Alox5, a key enzyme in the biosynthesis of pro-resolving mediators.

      (5) We have added one reference on Sytox and another on Alox5.

      (6) We have corrected two labeling errors in Figure 3 G and M, and some other typos in the article. We have also removed the “Present address” attached to some authors, since no present address was needed.

      Attached below is our point-by-point reply to the comments and suggestions made by the reviewers. We hope that you and the reviewers will find all concerns satisfactorily addressed.

      Responses to public reviews:

      Reviewer #1:

      Li et al. report here on the expression of a G-protein subunit Gng13 in ectopic tuft cells that develop after severe pulmonary injury in mice. By deleting this gene in ectopic tuft cells as they arise, the authors observed worsened lung injury and greater inflammation after influenza infection, as well as a decrease in the overall number of ectopic tuft cells. This was in stark contrast to the deletion of Trpm5, a cation channel generally thought to be required for all functional gustatory signaling in tuft cells, where no phenotype is observed. Strengths here include a thorough assessment of lung injury via a number of different techniques. Weaknesses are notable: confusingly, these findings are at odds with reports from other groups demonstrating no obvious phenotype upon influenza infection in mice lacking the transcription factor Pou2f3, which is essential for all tuft cell specification and development. The authors speculate that heterogeneity within nascent tuft cell populations, specifically the presence of pro- and anti-inflammatory tuft cells, may explain this difference, but they do not provide any data to support this idea.

      We thank the reviewer for pointing out the strengths of this work. The phenotypes of the Gng13 conditional knockout mice upon severe pulmonary injury seem to be more severe than those of Trpm5 knockout or Pou2f3 knockout mice, which we would attribute to functionally specific tuft cell subtypes. In the intestines, tuft cells are known to promote type II innate immune responses. The ectopic pulmonary tuft cells emerge at 12 days post infection and may not be involved in the initial immune responses to the infection; instead, some of them may contribute to inflammation resolution and functional recovery. Reanalysis of the previously published single tuft cell RNAseq dataset indeed showed that Gng13 is expressed in a subset of these ectopic pulmonary tuft cells, and anti-inflammatory genes such as Alox5 are also found in some of these tuft cells (please see the newly added Figure 3 supplement 2 I and J). Together, these data suggest that while some of these tuft cells may still play a pro-inflammatory role as in the intestines, other Gγ13-expressing tuft cells contribute to inflammation resolution, and disruption of the latter’s function results in the more severe phenotypes.

      Reviewer #2:

      The study by Li et al. aimed to demonstrate the role of the Gγ13-mediated signal transduction pathway in tuft cell-driven inflammation resolution and repairing injured lung tissue. The authors showed a reduced number of tuft cells in the parenchyma of Gγ13 null lungs following viral infection. Mice with a Gγ13 null mutation showed increased lung damage and heightened macrophage infiltration when exposed to the H1N1 virus. Their further findings suggested that lung inflammation resolution, epithelial barrier, and fibrosis were worsened in Gγ13 null mutants.

      Strengths:

      The beautiful immunostaining findings do suggest that the number of tuft cells is decreased in Gγ13 null mutants.

      Weaknesses:

      The description of phenotypes, and the approaches used to measure the phenotypes are problematic. Rigorous investigation of the mouse lung phenotypes is needed to draw meaningful conclusions.

      We thank the reviewer for pointing out the major findings and strengths of our work. Regarding the approaches used to measure the phenotypes, we first performed double immunostaining and validated that the lipopolysaccharide-induced DCLK1+ cells are indeed ectopic pulmonary tuft cells with an antibody to Gα-gustducin, a commonly expressed G protein α subunit in taste buds and tuft cells. Second, in addition to the measurements of the injured lung surface areas, we determined the injured lung tissue volumes by slicing the injured lungs into a series of tissue sections, quantifying the injured areas in each section and then reconstructing the injured volumes. Third, we reanalyzed the previously published single-tuft cell RNAseq dataset and found that a subset (i.e., ~57%) of Trpm5-GFP+ tuft cells express Gng13, some of which express anti-inflammatory genes such as Alox5. These additional data further support our finding that a subset of these Gγ13-expressing ectopic tuft cells may contribute to inflammation resolution while others may play a proinflammatory role.
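
      The response does not state how the injured volumes were reconstructed from the serial sections. One common way to do this, shown here purely as an assumption, is to multiply the summed injured area by the distance between consecutive measured sections (a Cavalieri-style estimate). The function name, units and numbers below are hypothetical.

```python
def injured_volume(section_areas_mm2, section_spacing_mm):
    """Estimate injured volume (mm^3) from serial sections.

    section_areas_mm2  : injured area measured in each section (mm^2)
    section_spacing_mm : distance between consecutive measured sections (mm)
    """
    # Each measured area is taken to represent a slab one spacing thick.
    return section_spacing_mm * sum(section_areas_mm2)


# Invented measurements for one lung lobe.
areas = [1.0, 2.0, 1.5]
print(round(injured_volume(areas, 0.2), 3))  # 0.9
```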

      Reply to the recommendations of Reviewer #1:

      (1) A major issue with this study is the fact that Chat-Cre mediated knockout of Gng13 leads to reduced tuft cells and impaired recovery, yet global TRPM5 deletion (this study) and global Pou2f3 deletion (Barr et al.) exhibit no apparent phenotype. One can imagine a Trpm5-independent role of Gng13 in tuft cells, but it is much harder to reconcile with the fact that Pou2f3 KO mice, which lack tuft cells entirely, exhibit no apparent phenotype. This was examined in some detail in Barr et al., demonstrating no apparent change in weight loss, dysplastic expansion (Krt5+ cells), or goblet cell metaplasia. The most parsimonious explanation is that Gng13 deletion in another Chat+ cell type, probably neurons of some sort, is leading to this phenotype. The authors really need to investigate this in some detail as the data does not really support a role of tuft cells in the phenotype they observe. Better yet, identification of the other Chat+ cell type in which Gng13 deletion promotes impaired lung recovery would be very interesting. While neurons seem likely, perhaps there is another Chat+ cell type expressing Gng13 in the respiratory tract that could be playing a role as well. In either case, the discrepancy between Pou2f3 KO (no phenotype) and Chat-Cre / Gng13 KO (impaired recovery) is difficult to reconcile.

      We agree with the reviewer, and it took us some time to make sense of the data as well. The differences in phenotypes between Trpm5-knockout and Gng13 conditional knockout (Gng13-cKO) mice could be explained by the fact that Gγ13 is part of the Gβγ moiety of a heterotrimeric G protein (Gαβγ), which is known to act on many effector enzymes and ion channels, while Trpm5 largely regulates the influx of monovalent cations, depolarizing the plasma membrane. Thus, it is understandable that nullification of Gng13 may have a more profound effect on cell physiology and consequent phenotypes than that of Trpm5, and similar differential effects were also found in the intestines (Frontiers in Immunology, 2023, DOI 10.3389/fimmu.2023.1259521).

      Data from several research groups have indicated that there are subtypes of tuft cells, each of which displays unique gene expression patterns as well as input and output signal profiles. It is not yet well understood how each subtype may contribute to the inflammatory responses or inflammation resolution. Comparative analyses of our data from the Gng13-cKO mice versus those from Pou2f3-KO mice suggest that Gng13-expressing tuft cells may have a role in inflammation resolution while other ectopic tuft cells may contribute to the maintenance of inflammation at a certain level, impairing subsequent tissue repair and recovery. The exact molecular and cellular mechanisms remain to be revealed in our future studies.

      The central nervous system may also play a role in the impaired lung recovery. But our detailed immunochemical studies did not identify any significant number of neurons innervating the lung tissue co-expressing ChAT and Gng13, suggesting that no immediate action from these neurons may regulate the pulmonary inflammation resolution or functional recovery.

      Together, our data suggest the importance of tuft cell subtype-specific functions, which may help us further understand the role of these rare tuft cells.

      (2) Figures showing alternative injury models inducing the generation of ectopic tuft cells are not convincing and not quantified. DCLK1 can be a bit promiscuous, so verifying tuft cell expansion in these other models with other markers (especially for LPS and HDM which have not been reported elsewhere) is important.

      We agree with the reviewer that DCLK1 is not a very specific marker for tuft cells. We have also observed that chemical inductions of these ectopic tuft cells with bleomycin, HDM or LPS are not as effective as H1N1 viruses. To verify that these rare DCLK1-positive cells are indeed tuft cells, we performed double immunostaining with antibodies to DCLK1 and to Gα-gustducin, another tuft cell marker. The results showed that some of these spindle-shaped DCLK1 positive cells indeed also express Gα-gustducin (see the newly added panels in Figure 1-figure supplement 1), indicating that they are most likely the chemically induced ectopic tuft cells. We also agree with the reviewer that it would be important to further investigate the possible roles of these cells during the stages of the chemically induced injury, inflammation resolution and functional recovery.

      (3) Calcium responses in isolated post-flu tuft cells are interesting but difficult to interpret as presented. Can higher-power images be shown? Also, no statistical analysis is presented to provide any confidence in that data.

      We thank the reviewer for the suggestions. As found in taste buds, only a subset of these ectopic tuft cells expresses Tas2rs, and each of these cells may express a few of the 35 murine Tas2rs. Thus, a particular bitter-tasting compound can activate only a few tuft cells, and we had to use low magnification to include more responsive cells in a field under the imaging microscope. We agree with the reviewer that it would be an interesting idea to statistically correlate the response profile to bitter substances with the cell’s Tas2r expression pattern, which we have done with sperm cells before (Molecular Human Reproduction, 2013, doi:10.1093/molehr/gas040). However, the main focus of this work is on the effect of Gng13-cKO in a subset of these ectopic tuft cells on the recovery. We plan to investigate these interesting cells in more detail in the future.

      (4) I am unaware of Sytox being a specific dye for pyroptotic cells. Can the authors please provide a reference or otherwise justify this?

      Sytox is a dye to stain dead cells, which has been used previously in the studies on gasdermin-mediated lytic cell death (Xi et al., Up-regulation of gasdermin C in mouse small intestine is associated with lytic cell death in enterocytes in worm-induced type 2 immunity. PNAS 2021 118(30) e2026307118 https://doi.org/10.1073/pnas.2026307118). In our work we used the dye for the same assay.

      (5) The authors perform qPCR for various taste receptor genes pre- and post-flu, but do not show that these genes are specifically induced in tuft cells. Since single-cell data and bulk RNA-Seq are available from Barr et al., the authors should validate the expression of these Tas2r genes specifically in post-flu tuft cells.

      We thank the reviewer for the suggestion. We have analyzed the single-cell RNAseq dataset (GSE197163, Barr et al. 2022) and found that among 613 Trpm5-GFP+ tuft cells, Tas2r108 was expressed in the greatest number of cells, i.e., 67 cells, followed by Tas2r105, Tas2r138, Tas2r137, Tas2r118 and Tas2r102, which were detected in 11, 10, 10, 5 and 4 cells, respectively (see the newly added Figure 2-figure supplement 1). This order of expressing cell numbers is very much in agreement with that of the relative Tas2r expression levels obtained with the qPCR experiment (Figure 2A), indicating that these Tas2rs are likely expressed in the ectopic tuft cells. We will further validate the data by analyzing the bulk RNA-Seq dataset when it is accessible to us.

      (6) Some general editing of language throughout would be helpful to increase readability.

      Thank you for pointing this out. We have carefully checked the manuscript, corrected some typos and revised several sentences to improve its readability.

      (7) For the fibrosis analysis, trichrome staining is very heterogenous, which is reflected by the large error bars in Fig. 8B. A more quantitative, "whole lung" analysis such as hydroxyproline content or western blotting for Col1a1 would be ideal.

      The approach of Masson’s trichrome staining along with qRT-PCR assays on the fibrotic gene expression has been used previously to quantitatively analyze fibrosis (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). We agree with the reviewer that there are large error bars in Fig. 8B, and hydroxyproline content assay or western blotting for Col1a1 would be ideal. But our qRT-PCR was performed on the RNA samples extracted from the “whole lungs”, and its data are also able to reflect the extent of fibrosis of the lungs.

      (8) The authors claim that only a subset of tuft cells express Gng13, but this is supported only by a single IF image in Fig. 3 supplement 1G. The authors could download the single-cell dataset from Barr et al. to confirm the heterogeneity of Gng13 expression and get a better sense of the fraction of total ectopic tuft cells that express this, as it is a critical point in their model.

      We thank the reviewer for the suggestion. We have downloaded and reanalyzed the single-cell RNAseq dataset (GSE197163), and found that out of 613 Trpm5-GFP+ tuft cells, 350 (57%) expressed Gng13 (Figure 3-figure supplement 2I). This result, together with our immunohistochemical data (Figure 3-figure supplement 2G and H), indicates that Gγ13 is expressed in a subset of these ectopic tuft cells. More comprehensive studies are needed to characterize these tuft cell subtypes and elucidate subtype-selective functions.

      Reply to the recommendations of Reviewer #2:

      The study needs more rigorous examinations of the phenotypes. For example, quantification of the injury area in Fig3C is problematic. Similarly, fibrotic phenotype and quantification in Fig 8C also have problems. This study heavily used qRT-PCR analysis to quantitate the level change of bitter/other receptors in a minor population of tuft cells which are also minor in a whole lung. Given the limited number of cells, it is difficult to appreciate that qRT-PCR can pick up the difference. In addition, how would the findings in this study reconcile with the finding by Huang (PMID: 36129169) where pou2f3 null mutants (without tuft cells) were used? Huang et al. did not observe more severe phenotypes in the mice without tuft cells than controls.

      We thank the reviewer for the recommendations. Regarding Fig 3C, please see our reply below under the revisions for clarity, point (2).

      Fig 8 B and C used Masson’s trichrome staining to quantitatively analyze fibrosis, which has been used by other groups as well (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). Our qRT-PCR data on the fibrotic gene expression (Figure 8A) further support the Masson’s trichrome staining results.

      We recognize that tuft cells make up only a minor population in the lungs. We therefore performed qRT-PCR assays on RNA samples isolated mostly from the injured tissues, with the corresponding tissues from uninjured lungs as controls. To validate our qRT-PCR data, we reanalyzed the previously published single ectopic tuft cell RNAseq dataset (GSE197163) and found that Tas2r108, the most abundantly expressed receptor by qRT-PCR, was also expressed in the greatest number of tuft cells. The rank order of expression levels of the other Tas2rs is also in good agreement between the qRT-PCR and single-cell RNAseq data (Figure 2A, Figure 2-figure supplement 1), cross-validating the data obtained by these two very different approaches.

      We have carefully studied the findings of Huang et al. (PMID: 36129169). Our data suggest that there are subtypes of ectopic tuft cells, some of which contribute to the resolution of inflammation while others play a proinflammatory role. Indeed, reanalysis of the aforementioned single tuft cell RNAseq dataset found that about 57% of Trpm5-GFP+ ectopic tuft cells expressed Gng13, and some of these expressed Alox5, a key enzyme in the biosynthesis of pro-resolving mediators. Thus, because both pro- and anti-inflammatory tuft cells are ablated in Pou2f3-knockout mice, it would be hard to observe any significant phenotype. When the function of the Gγ13-expressing subset of tuft cells is disrupted, the anti-inflammatory role of these cells is eliminated, resulting in more severe phenotypes. More studies are needed to further understand the subtype-specific functions of these fascinating tuft cells.

      Do Gγ13 null mutants show similar phenotypes in bleomycin injury model?

      Bleomycin- and other chemical-induced injury models indeed produce far fewer ectopic pulmonary tuft cells. It is therefore more difficult to test the effect of the Gng13 mutation, owing to the small number of Gng13-expressing tuft cells in either WT or mutant lungs.

      What is the cell fate of lineage labeled tuft cells in the lungs of ChAT-Cre:Ai9:Gng13flox/flox mice following viral infection at different times examined? The numbers were decreased at different time points post-injury based on the data. Did these cells undergo apoptosis?

      It is an excellent idea to look into the cell fate of ChAT-Cre:Ai9:Gng13flox/flox tuft cells. We believe that these cells would have a similar fate to other ectopic tuft cells, probably undergoing apoptosis. However, our data suggest that the Gng13 mutation suppresses the increase in ectopic tuft cells, or the increase in a particular subtype of these tuft cells. Further studies are needed to elucidate the molecular mechanisms of the Gγ13-mediated signal transduction pathways regulating the proliferation of a subset of ectopic tuft cells.

      Here are the revisions for clarity and coherence to the figures:

      (1) Fig 2: For the functional assessment, using tracheal tuft cells from the same ChAT-Cre:Ai9 mice would be a suitable positive control in the calcium response traces experiment. These specific cells could also serve as a control in Fig2a.

      We agree with the reviewer that tracheal tuft cells from the same ChAT-Cre:Ai9 mice would be an ideal positive control in the calcium response experiment as well as in the qRT-PCR assay. However, we have established reliable methods for calcium imaging of primary cells expressing taste receptors and for quantifying their RNA expression levels, which have been used in our previous publications, e.g., (1) Functional characterization of bitter taste receptors expressed in mammalian testis. Molecular Human Reproduction, 2013, doi:10.1093/molehr/gas040; (2) Infection by the parasitic helminth Trichinella spiralis activates a Tas2r-mediated signaling pathway in intestinal tuft cells. PNAS 2019, www.pnas.org/cgi/doi/10.1073/pnas.1812901116. We thank the reviewer for the excellent suggestion.

      (2) Fig 3C: It is not clear whether the depicted areas really represent the injured area. To provide a more comprehensive view, the authors should also provide histological analysis and quantification of the injured lung. A 3D representation of the injury area would offer a more accurate presentation.

      We thank the reviewer for this point. The depicted areas in Fig 3C are indeed the injured surface areas of the lungs. Following the reviewer’s suggestion, we carried out histological analysis to determine the injured tissue volumes of the lungs. We fixed the lungs and sliced them into 12 μm-thick sections, which were imaged under a microscope. The injured areas in each section were identified and quantified using the ImageJ software, and the injured volume for that section was then obtained by multiplying the area by the section thickness, i.e., 12 μm. Statistical analyses indicate that the injured volume of the Gng13-cKO lungs is significantly greater than that of WT or Trpm5-KO lungs; these data have been included in Figure 3-figure supplement 1 and are in agreement with the injured surface area data (Fig 3C).
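      For illustration, the per-section calculation described above (injured area multiplied by the 12 μm section thickness) can be sketched as follows. The area values are hypothetical placeholders rather than measured data, and summing the per-section volumes to obtain a total is an assumption of this sketch; the response itself specifies only the per-section product.

```python
SECTION_THICKNESS_UM = 12.0  # section thickness stated in the response

def injured_volume_mm3(section_areas_um2, thickness_um=SECTION_THICKNESS_UM):
    """Sum area x thickness over sections and convert cubic micrometres to mm^3."""
    total_um3 = sum(area * thickness_um for area in section_areas_um2)
    return total_um3 / 1e9  # 1 mm^3 equals 1e9 cubic micrometres

# Hypothetical injured areas (in square micrometres) for three sections.
example_areas_um2 = [2.0e6, 3.5e6, 1.5e6]
volume = injured_volume_mm3(example_areas_um2)
print(f"Injured volume: {volume:.3f} mm^3")
```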

      (3) Fig 3 G/I/K/M: There seems to be an inconsistency in the time points. There's no indication for 14 dpi, yet two for 25 dpi. Additionally, a color legend for each sample would be helpful.

      We thank the reviewer for pointing this out. There were two typos, which have been corrected: the time points should be 14 dpi, 20 dpi, 25 dpi and 50 dpi. A color legend has been added as well.

      (4) Fig 4A: Using CD64 co-stained with Krt5 might better highlight the immune cells in the damaged region. Additionally, could you clarify the choice of the neutrophil marker CD64 over CD45 for staining the injured lung?

      We agree with the reviewer that Krt5 antibody staining can help define the damaged region. We sectioned the lung tissues with special attention to the damaged areas, but found that the adjacent healthy areas also contained additional immune cells. We therefore counted all CD64+ cells in both the damaged areas and the surrounding, seemingly healthy areas. We used CD64 instead of CD45 because we found that CD64 better labels the immune cells that differ between WT and Gng13-cKO mice following H1N1 infection. Furthermore, the CD64-labeled cells could be readily related to the Gsdmd/Gsdme-expressing, F4/80-labeled immune cells shown in Figure 5 and its supplemental figures.

      (5) Fig 5 and Supplemental Fig 5: It appears that the F4/80 staining exhibits notable background staining.

      Yes, there is some background staining. The antibody was the best we could find, but its quality leaves room for improvement. In addition, we think that some cellular debris may have been stained by the antibody. At a higher magnification, however, we could still identify individual cells co-expressing IL-1β.

      (6) Fig 8C: The depicted area does not seem to adequately represent the fibrosis in the injured lung.

      Masson’s trichrome staining has been previously used to quantitatively analyze fibrosis (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). Our qRT-PCR assays on the fibrotic gene expression (Figure 8A) were performed on the RNA samples extracted from the whole lungs, and the resultant data are able to reflect the extent of fibrosis of the lungs, although we also agree with the reviewer that additional data would make the conclusion more convincing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We express our sincere appreciation for your insightful comments and constructive suggestions. It is with great pleasure that we submit the revised version of our manuscript. Over the past months, we have meticulously considered all the invaluable feedback provided by the three anonymous reviewers, and endeavored to incorporate significant revisions accordingly. Furthermore, we have meticulously rephrased the results section in accordance with your guidance, aiming to bolster the rigor of our manuscript. The specific changes implemented in the revised manuscript are outlined below:

      - Revised the title of the manuscript.

      - Revised the description of early mitotic and meiotic chromosome structure in the scc3 mutant (Lines 167-274).

      - Added the BiFC results illustrating the interaction between SCC3 and other cohesin proteins in Figure S10.

      - Enhanced the detail in the description of figure legends, particularly for Figures 2 and 4.

      - Refined and rephrased the language of the manuscript.

      We hope these positive revisions have substantially strengthened the manuscript. Once again, we extend our heartfelt gratitude for your invaluable input.

      eLife assessment

      This important study elucidates the function of the cohesin subunit SCC3 in impeding DNA repair between inter-sister chromatids in rice. The observation of sterility in the SCC3 weak mutant prompted an investigation of abnormal chromosome behavior during anaphase I through karyotype analysis. While the evidence presented is largely solid, the strength of support can be substantially improved in some aspects, leaving room for further investigation. This research contributes to our understanding of meiosis in rice and attracts cell biologists, reproductive biologists, and plant geneticists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript describes the identification and characterization of rice SCC3, including the generation and characterization of plants containing apparently lethal null mutations in SCC3 as well as mutant plants containing a C-terminal frame-shift mutation. The weak scc3 mutants showed both vegetative and reproductive defects. Specifically, mitotic chromosomes appeared to partially separate during prometaphase, while meiotic chromosomes were diffuse during early meiosis and showed alterations in sister chromatid cohesion, homologous chromosome pairing, and recombination. The authors suggest that SCC3 acts as a cohesin subunit in mitosis and meiosis, but also has functions beyond cohesion.

      Reviewer #2 (Public Review):

      This manuscript shows detailed evidence of the role of cohesin regulators in rice meiosis and mitosis.

      Reviewer #3 (Public Review):

      Prior research on SCC3, a cohesin subunit protein, in yeast and Arabidopsis has underscored its vital role in cell division. This study investigated the specific functions of SCC3 in rice mitosis and meiosis. In a weakened SCC3 mutant, sister chromatid separation was observed in anaphase I, resulting in 24 univalents and subsequent sterility. The authors meticulously documented SCC3's loading and degradation dynamics on chromosomes, noting its impact on DNA replication. Despite the loss of homologous chromosome pairing and synapsis in the mutant, chromosomes retained double-strand breaks without fragmenting. Consequently, the authors inferred that in the scc3 mutant, DNA repair more frequently relies on sister chromatids as templates compared to the wild type.

      We extend our sincere gratitude to the Editors and the Reviewers for their highly constructive and insightful suggestions. We deeply appreciate receiving both positive feedback and constructive criticism on our manuscript. In light of the reviewers’ comments, we have diligently undertaken substantial revisions to improve the manuscript. The revised version comprehensively addresses all the points raised by the reviewers.

      Below, we provide a detailed point-by-point response to the reviewers’ comments:

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 170- looking at pollen formation does not specifically evaluate whether SCC3 is involved in meiosis.

      Thank you very much for this advice. We fully agree that pollen formation defects indicate only a problem in gametogenesis. We apologize for the inaccurate wording of this sentence, which has been revised in the manuscript (Lines 167-176).

      (2) Lines 203-205- this seems more like discussion and is pure speculation. Another possibility described above is that the truncated SCC3 protein is partially functional and what they see is due to this partial functionality. Have the authors considered the possibility that a partially functional version of SCC3 is produced that alters its function or the function of the cohesin complex? How much of the protein epitope remains in the truncated protein?

      We are grateful for these insightful suggestions. We agree that a partially functional SCC3 may indeed be synthesized, contributing to the survival of the mutant. Notably, the truncated protein retains approximately 60% to 70% of the epitope, which presumably maintains residual function in the weak scc3 mutant. In this manuscript, the lost C-terminal region of SCC3 (aa 910-1116) contains a special protein epitope and a certain protein secondary structure, and its loss may alter the protein’s folding and its subsequent roles within the cohesin complex.

      In this study, we encountered challenges in generating null alleles of scc3 in rice using the CRISPR-Cas9 system. It is therefore plausible that the scc3-1 and scc3-2 variants represent null alleles of SCC3 that result in embryonic lethality. Identifying weak alleles that permit survival is thus essential, and selecting weak mutants, particularly those with the most pronounced phenotype, is advantageous for further research. Our findings indicate that the weak scc3 mutant lacks only a segment of the C-terminus; the truncated protein is sufficient for plant survival, yet the loss significantly impairs meiosis. We cannot rule out the possibility that the observed defects are attributable to the unique truncated protein. We thank the reviewer once again.

      (3) Lines 212- I question whether what the authors see in Figure 2 is chromosome fragmentation. It could just as well be alterations in chromosome structure. Likewise, the authors provide little to no evidence that the mutation affects the replication process. Rather, the presence of replicated chromosomes later in mitosis and meiosis would argue that replication is not disrupted.

      We express our gratitude to the reviewer for highlighting this critical point. As you astutely observed, the preservation of normal sister chromatids during prometaphase argues against chromosome fragmentation and indicates that the replication process remains uninterrupted. In line with your insights, we carried out an extensive series of full-length fluorescence in situ hybridization (FISH) experiments to elucidate the mechanisms underlying the observed increase in the distance between sister chromatids, particularly during interphase. Most of our findings support the hypothesis that the chromosomes exhibit alterations in structure, as depicted in Figure 2A. Intriguingly, our data suggest that cohesin, upon interaction with other chromatin-bound proteins, may facilitate loop extrusion, anchoring itself in a manner that potentially alters chromosomal architecture. These alterations in chromosome structure, and the subsequent defects in genome folding and cohesion establishment, rely particularly on SCC3. In response to your valuable suggestions, we have revised the relevant sections of our manuscript. We extend our sincere thanks for your insightful comments.

      (4) Line 230- what does the sentence SCC3 may enhance the interaction with DNA mean, the interaction of the cohesin complex?

      We are sorry for the ambiguity in our initial description and wish to clarify that SCC3 indeed plays a pivotal role in augmenting the interaction between the cohesin complex and DNA. Our observations revealed an upsurge in the signal intensity of SCC3 as cells transition from interphase to prophase, as depicted in Figure 2B. This enhancement correlates with the observed defects in scc3 mutants during prophase, suggesting that SCC3’s functional significance is particularly pronounced at this stage of the cell cycle. We have revised our manuscript to reflect these insights more accurately, in accordance with your valuable suggestions. We express our sincere gratitude for your guidance.

      (5) Oddly, and unexplainably the authors present data indicating that SCC3 interacts with RAD21.1, but not SMC1, SMC3, or REC8. The fact that the authors report that SCC3 only interacts with RAD21.1 but no other cohesin proteins is quite hard to explain.

      As argued in the point above, the available data do not provide compelling evidence supporting the interaction between SCC3 and other cohesin proteins. We have repeated yeast two-hybrid (Y2H) experiments yielding consistent outcomes, which also surprised us initially. In the revised manuscript, we further added the bimolecular fluorescence complementation (BiFC) results between SCC3 and other cohesin proteins in rice protoplast (Figure S10). These supplementary data affirm that SCC3 predominantly interacts with RAD21.1, excluding interactions with other cohesin proteins. While the absence of such interactions is perplexing, our investigations have failed to detect any binding between SCC3 and other cohesin proteins.

      A weak interaction between SCC3 and REC8 has been reported in Arabidopsis (Kuttig et al. bioRxiv https://doi.org/10.1101/2022.06.20.496767). We speculate that either these proteins do not interact or the yeast two-hybrid assays may be inadequate for detecting their interaction, as several factors can impede interaction in a heterologous system. In Figure 7, we could only detect the interaction between SCC3 and RAD21.1 in both Y2H and BiFC experiments. This suggests potential alterations in protein folding or conformation, or the involvement of additional regulatory factors modulating the interaction between SCC3 and other cohesin proteins. Notably, given RAD21.1’s pivotal role as a core component of the cohesin complex, our supplementary findings demonstrate interactions between SMC1/3 and RAD21.1 (data not shown). Consequently, our current data suggest a model in which RAD21.1 and SMC1/3 form a circular structure, with SCC3 positioned on the outer periphery of the ring complex, associating specifically with RAD21.1 (Figure 8A).

      Reviewer #2:

      The authors did not consider creating heterozygous mutants for the replication fork. Moderate English language editing may be required.

      We extend our gratitude to the reviewer for their valuable suggestions. Initially, we did not explore the potential relationship between SCC3 and the replication fork. Cohesin, as we understand, becomes associated with DNA prior to DNA replication. The phenomenon of sister chromatid co-entrapment arises as replication forks traverse through cohesin rings, a process intricately linked to DNA replication dynamics. In this study, we exclusively observed aberrant chromosome structures in the scc3 mutant during interphase (Figure 2). We conjecture that these anomalies may stem from alterations in chromosome structure, such as genome folding and loop extrusion, rather than being directly attributable to the DNA replication fork. However, the precise nature of these chromosome structural aberrations during interphase in the scc3 mutant remains elusive, necessitating further comprehensive investigation in future studies. We have refined the language of our manuscript in accordance with the reviewer’s suggestions. Once again, we express our sincere appreciation for the invaluable suggestions provided.

      Reviewer #3:

      While the paper's conclusions are generally well-supported, further substantiation is needed for the claim that SCC3 inhibits template choice for sister chromatids. To bolster this conclusion, I recommend that the authors perform whole-genome sequencing on parental and F1 individuals from two rice variants, subsequently calculating the allele frequencies at heterozygous sites in the F1 individuals. If SCC3 indeed inhibits inter-sister chromatid repair in the wild type, we would anticipate a higher frequency of inter-homologous chromosome repair (i.e., gene conversion). This should be manifested as a bias away from the Mendelian inheritance ratio (50:50) in the offspring of the wild type compared to the offspring of the scc3+/- mutant.

      We sincerely appreciate this insightful suggestion and have arranged to carry out the experiment. Because preparing the plant materials and performing the sequencing analysis take a long time, the work is still ongoing; we hope it will provide important information bearing on these hypotheses. As we have not yet obtained direct evidence that SCC3 is involved in sister chromatid repair, we have changed the title to “SCC3 is an axial element essential for homologous chromosome pairing and synapsis”. Once again, we are grateful for your invaluable suggestions.
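      As a sketch of how the comparison suggested by the reviewer might be evaluated at a single heterozygous site, the following computes an exact two-sided binomial test of observed allele counts against the 50:50 Mendelian expectation. The counts are hypothetical, and the choice of an exact binomial test is an assumption of this illustration rather than a method described in the manuscript or the review.

```python
from math import comb

def two_sided_binomial_p(ref_count, alt_count):
    """Exact two-sided binomial p-value against a 50:50 expectation.

    With p = 0.5 every outcome k has probability comb(n, k) / 2**n, so the
    p-value sums all outcomes that are no more likely than the observed one.
    """
    n = ref_count + alt_count
    observed = comb(n, ref_count)
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2**n

# Hypothetical allele counts at one heterozygous site.
for ref, alt in [(50, 50), (70, 30)]:
    print(f"{ref}:{alt} -> p = {two_sided_binomial_p(ref, alt):.3g}")
```

      In a genome-wide analysis many sites would be tested, so a multiple-testing correction would also be needed; that step is omitted here.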

      A point that warrants consideration is the placement of the protein interaction experiments involving SCC3 within the paper. It is presented relatively late in the manuscript. If the authors possess information regarding the interaction between RAD21.1 and SCC3 and how it relates to the functional study of RAD21.1, it could contribute to a more comprehensive analysis. However, if this information is unrelated to the current study, it might be advisable to omit it, as it appears to diverge from the main focus of this work.

      We express our sincere gratitude for your invaluable suggestions. It has been documented in yeast that the interaction between SCC3 and SCC1 is indispensable for the efficient loading of cohesin. In our study, we endeavored to elucidate the intricate relationships among various cohesin subunits. Through our investigations, we have discerned that RAD21.1 serves as a pivotal core subunit within the cohesin complex, facilitating interactions with both SMC1/3 and SCC3 (data not shown). Additionally, our findings indicate that the interaction between RAD21.1 and SCC3 is imperative for maintaining the stability of the cohesin ring and its association with DNA (data not shown). Consequently, the interaction between these two proteins assumes paramount importance for our subsequent analyses. This study holds significant promise for future investigations.

      It's worth noting that while the title of the study claims that "SCC3 inhibits inter-sister chromatids repair during rice meiosis," the last sentence of the abstract weakens this conclusion by using the word "seems." A study's title should ideally reflect the most definitive and conclusive findings.

      We sincerely appreciate your valuable suggestions. In response, we have revised the description in our manuscript to enhance its rigor.

      In Figure 8C, it appears that cohesin is depicted between two DNA strands.

      Figure 8C illustrates the process of sister chromatid repair during meiosis in the scc3 mutant. Two gray lines and two blue lines represent the four sister chromatids of two homologous chromosomes, respectively. In the wild type, cohesin plays a crucial role in tethering together the two sister chromatids. As per your reminder, cohesin should indeed encircle the two sister chromatids, as depicted in Figure 8B. Following a thorough evaluation and to mitigate any potential confusion, we have deleted Figure 8C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate your comments and suggestions on our manuscript.

      In particular, we have measured the affinity between the middle tail domain of myosin-5a (Myo5a-MTD) and the actin-binding domain of melanophilin (Mlph-ABD) using microscale thermophoresis and obtained a Kd of ~0.56 μM, which is similar to the Kd of the globular tail domain of myosin-5a (Myo5a-GTD) for the GTD-binding motif of melanophilin (Mlph-GTBM). Moreover, we have performed Western blots of lysates from transfected cells, showing that the dominant negative construct and the negative control were expressed at similar levels without noticeable degradation.

      We appreciate the editors’ and reviewers’ comments on how melanophilin might be regulated in binding to the exon-G of myosin-5a and to actin filaments. Phosphorylation of melanophilin by protein kinase A is one possible mechanism. We will investigate this issue in a future study.

      We also took this opportunity to correct several minor errors in the manuscript. Textual alterations can be viewed in the “tracked change” version of the manuscript. Below are the comments from the editors and the two reviewers together with our point-by-point responses.

      eLife assessment

      This study represents a useful description of a third interaction site between melanophilin and myosin-5a which is important in regulating the distribution of pigment granules in melanocytes. While much of the data forms a solid case for this interaction, the inclusion of important controls for the cellular studies and measurement of interaction affinities would have been helpful.

      Public Reviews:

      Reviewer #1 (Public Review):

      Interactions known to be important for melanosome transport include exon F and the globular tail domain (GTD) of MyoVa with Mlph. Motivated by a discrepancy between in vitro and cell culture results regarding necessary interactions for MyoVa to be recruited to the melanosome, the authors used a series of pull-down and pelleting assays experiments to identify an additional interaction that occurs between exon G of MyoVa and Mlph. This interaction is independent of and synergistic with the interaction of Mlph with exon F. However, the interaction of the actin-binding domain of Mlph can occur either with exon G or with the actin filament, but not both simultaneously. These data lead to a modified recruitment model where both exon F and exon G enhance the binding of Mlph to auto-inhibited MyoVa, and then via an unidentified switch (PKA?) the actin-binding domain of Mlph dissociates from MyoVa and interacts with the actin filament to enhance MyoVa processivity.

      The only weakness noted is that the authors could have had a more complete story if they pursued whether PKA phosphorylation/dephosphorylation of Mlph is indeed the switch for the actin-binding domain of Mlph to interact with exon G versus the actin filament.

      We thank Reviewer #1 for the careful reading of the manuscript and appreciation of the study. We agree with the Reviewer that it is important to understand how the actin-binding domain of Mlph switches its interaction between the exon-G of Myo5a and the actin filament. We would like to pursue this direction in our future research.

      Reviewer #2 (Public Review):

      The authors identify a third component in the interaction between myosin Va and melanophilin- an interaction between a 32-residue sequence encoded by exon-g in myosin Va and melanophilin's actin-binding domain. This interaction has implications for how melanosome motility may be regulated.

      While this work is largely well done and certainly publishable following needed revisions (e.g. some affinity measurements, necessary controls for the dominant negative experiments), I believe that additional work would be required to make a more compelling case. First, the study provides just one more piece to a well-developed story (the role of exon-F and the GTD in myosin Va: melanophilin (Mlph) interaction), much of which was published 20 years ago by several labs. Second, the study does not demonstrate a physiological significance for their findings other than that exon-G plays an auxiliary role in the binding of myosin Va to Mlph. For example, what dictates the choice between Mlph's actin binding domain (ABD) binding to actin or to exon-G? Is it a PTM or local actin concentration? It is unlikely to be alternative splicing as exon-G is present in all spliced isoforms of myosin Va. And what changes re melanosome dynamics in cells between these two alternatives? Similarly, the paper does not provide any in vitro evidence that binding to exon-G instead of actin affects the processivity of a Rab27a/Myosin Va/Mlph transport complex. For example, if the ABD sticks to exon-G instead of actin, does that block Mlph's ability to promote processivity through its interaction with the actin filament during transport? In summary, given that the authors did not directly test their model either in vitro or in cells, I do not think this story represents a significant conceptual advance.

      We thank Reviewer #2 for the careful reading of the manuscript and the suggestions for improving it. As suggested by the reviewer, we have measured the affinity between the middle tail domain of Myo5a (Myo5a-MTD) and Mlph-ABD (Kd ~0.562 μM), which is similar to that between the globular tail domain of Myo5a (Myo5a-GTD) and the GTBM of Mlph. In addition, we have performed additional experiments showing the integrity and the expression level of the dominant negative constructs in the transfected cells.

      We believe more extensive experiments are required to address other questions raised by the reviewer. For example, what dictates the choice between Mlph's actin binding domain (ABD) binding to actin or to exon-G is an open question. As we proposed, phosphorylation by protein kinase A is only one possible mechanism. We would like to pursue them in our future research.

      Recommendations for the authors:

      The reviewing editor feels strongly that addressing some of the points raised by the reviewers would make this a more compelling manuscript. In particular, a measurement of the affinity of the relevant fragments from melanophilin and myosin-5a would indicate that the interaction might be physiologically relevant. Concerning the dominant negative experiments, the lack of effect of an expressed fragment could be that the expressed fragments were simply degraded or expressed at too low of a level to be competing. The reviewer gives guidelines on how to address this. Reviewer #2 made a point that it would be compelling if the effect of phosphorylation as suggested in the model was tested, but we all agree that this could well be the subject of a later study. In addition, the authors make a very interesting proposal for how protein kinase A could be involved in this regulation as has been suggested previously. Perhaps the use of phosphomimetic mutations could give some insight into this. Such experiments, if consistent with the proposed model, would certainly raise the impact of this study. Finally, a very clear periodicity in hydrophobic amino acids is apparent in the interacting sequences of both Myo5 (yrisLykrMidLmeqLekqdktVrkLkkqLkvFakkIgeLevgqmen) and Mlph (tdeeLseMedrVamtAseVqqAeseIsdIesrIaaLra). This strongly suggests a leucine-zipper-like coiled coil, rather than an interaction mediated solely by charge. Recent, easily accessible software tools such as AlphaFold multimer might yield important structural insight into the binding configuration and might help rationalize the effect of the mutations herein.

      We thank the editors and the reviewers for their suggestions for improving the manuscript. We have performed several essential experiments to address the concerns raised by the reviewers.

      (1) Regarding the affinity of the relevant fragments from melanophilin and myosin-5a: we have measured the affinity between Mlph-ABD and Myo5a-MTD using MST (Kd ~562 nM) (see revised Figure 3A).

      (2) Regarding the concerns about the dominant negative experiments: we have examined the molecular sizes and expression levels of the Mlph or Myo5a constructs by Western blot. First, we show that all constructs have the correct molecular size in transfected cells (see revised Figure 6C and 7D), indicating that the inability of Myo5a or Mlph truncations to generate dilute-like phenotypes was not due to intracellular degradation of the EGFP fusion proteins. Second, by correcting for the percentage of transfected cells, we show that the overall expression levels of the wild-type construct and the mutants are roughly equal. Third, we categorized the expression levels into high and low, and calculated the percentage of cells with the DN phenotype at each expression level. The results are consistent with the percentage of the DN phenotype among all cells expressing the EGFP fusion proteins.

      (3) Regarding the suggestion to investigate the effect of phosphorylation by protein kinase A on the interaction of Mlph-ABD with Myo5a and actin filaments: we understand that it is important to elucidate the mechanism by which the actin-binding domain of Mlph switches between binding the exon-G of Myo5a and binding actin filaments. However, as we proposed, phosphorylation by protein kinase A is one possible mechanism, and more extensive experiments are required to address this question. We would therefore like to pursue it in our future research.

      (4) Regarding the suggestion to predict the interaction between the exon-G of myosin-5a and Mlph-ABD using AlphaFold: we have used AlphaFold multimer to predict the Myo5a-MTD/Mlph-ABD interaction. Remarkably, AlphaFold predicted that the binding of Myo5a-MTD to Mlph-ABD is mediated by an antiparallel coiled-coil formed by Myo5a (1430-1467) and Mlph (450-481), just as predicted by the editors. This prediction is also consistent with our finding that the exon-G of Myo5a interacts with Mlph-ABD. However, the predicted model cannot explain our mutagenesis results. We will pursue this point in future research. Nevertheless, we are grateful to the editors for bringing this idea to our attention, because it will help us to design experiments to investigate the nature of the Myo5a-exon-G/Mlph-ABD interaction.
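
      The hydrophobic periodicity noted in the editor's comment can be checked directly from the two sequences as quoted there. The following is a minimal sketch, assuming that the capitalised letters in those sequences mark the hydrophobic positions the editor had in mind; the function names are illustrative and not part of any analysis in the manuscript.

```python
def marked_positions(seq: str) -> list[int]:
    """Return 0-based indices of the capitalised (marked hydrophobic) residues."""
    return [i for i, aa in enumerate(seq) if aa.isupper()]


def spacings(positions: list[int]) -> list[int]:
    """Return the gaps between consecutive marked residues."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Sequences copied exactly as quoted in the editor's comment.
MYO5 = "yrisLykrMidLmeqLekqdktVrkLkkqLkvFakkIgeLevgqmen"
MLPH = "tdeeLseMedrVamtAseVqqAeseIsdIesrIaaLra"

for name, seq in (("Myo5", MYO5), ("Mlph", MLPH)):
    gaps = spacings(marked_positions(seq))
    # Gaps of 3 and 4 are what the 'a' and 'd' positions of a heptad repeat produce.
    print(name, gaps)
```

      Spacings that alternate between 3 and 4 residues are the pattern expected for the a and d positions of a heptad repeat, which is the basis of the coiled-coil suggestion. This check says nothing about orientation or partner specificity.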

      Reviewer #1 (Recommendations For The Authors):

      Specific minor comments

      Q1: In figs 6-7 an overlay between DAPI and EGFP would be helpful for the reader to see perinuclear distribution.

      As suggested, we have added the merged images of DAPI and EGFP in the revised Figure 6 and 7.

      Q2: The delta symbol in the pdf text was corrupted.

      The corrupted delta symbol has been fixed in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Q1: Please explain in detail early in the text what exon-G is - length, position in the tail, and evidence that it is a coiled coil (CC). Of note, is it only long enough for about 4 heptad repeats? Has it been shown biochemically to form a CC? Is the CC irreversible? What would be the consequence of removing the exon-G CC on the ability of surrounding regions to bind Mlph (exon-F and the GTD)?

      We thank the reviewer for this suggestion. In the revision, we added a new paragraph (the first paragraph in the results section) and revised Figure 1A to introduce the middle tail domain and alternatively spliced exons of Myo5a.

      Exon-G is 32 amino acids in length and is located at the C-terminal region of the middle tail domain, immediately before the globular tail domain. The exon-G region is predicted to form a short coiled-coil by online tools (such as paircoil), but this prediction has not been tested biochemically. Moreover, we do not know whether the exon-G coiled-coil is reversible.

      We have not examined the effect of removing the whole exon-G on the interaction between the GTD and Mlph-GTBM. The exon-G (residues 1436-1467) and the GTD core (residues 1498-1877) are separated by a long loop of 31 residues. We therefore expect that removing the exon-G will not affect the GTD/Mlph-GTBM interaction.

      Physically, exon-F is immediately followed by exon-G, and those two regions might interfere with each other. In our preliminary study, we found that removing the whole exon-G abolished the interaction between exon-F and Mlph-EFBD. On the other hand, removing the C-terminal half (residues 1454-1467) of exon-G had little effect on the interaction between exon-F and Mlph-EFBD (see Figure 2C). In this work, we intentionally selected the latter construct for functional analysis of the exon-G/Mlph-ABD interaction, because removing the C-terminal half of exon-G abolishes the interaction with Mlph-ABD, but does not affect the exon-F/Mlph-EFBD interaction.

      Q2: Figures 1-3. While the pulldown experiments demonstrating an interaction between Mlph-ABD residues 446-571 and Myo5a-MTD are a good start, one would like to see affinity measurements to gauge the likelihood that this interaction is physiologically relevant. The same goes for the pulldown experiments demonstrating an interaction between (i) the C-terminal half of exon-G (residues 1453-1467) and the Mlph-ABD, (ii) between residues 1411-1467 (a short peptide containing exon-F and exon-G) and the Mlph-ABD, and (iii) between residues 1436-1467 (a short peptide containing exon-G) and the Mlph-ABD. This would also apply to the pulldowns in 3C-3E where versions of the proteins with charge residue changes were tested.

      We agree with the reviewer that determining the affinities between Mlph-ABD and Myo5a-MTD and their variants would help in understanding the physiological relevance of the exon-G/Mlph-ABD interaction. However, the extensive experiments suggested by the reviewer require many high-quality purified proteins, which are not trivial to obtain.

      Nevertheless, we think it is important to know the affinity between Myo5a-MTD and Mlph-ABD (both wild-type), as this parameter can be used to compare the three interactions between Myo5a and Mlph. Therefore, we have measured the affinity between Myo5a-MTD and Mlph-ABD using microscale thermophoresis (MST). The dissociation constant (Kd) of Myo5a-MTD for Mlph-ABD is 0.562 ± 0.169 µM, which is similar to that between Myo5a-GTD and Mlph-GTBM (~1 µM) (Geething & Spudich (2007) JBC 282:21518). Consistent with the GST pulldown results, MST shows that deletion of the C-terminal half of exon-G (1453-1467) greatly decreases the MST signals (see revised Figure 3A).
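
      To give a sense of what a Kd of this magnitude implies, the sketch below computes fractional occupancy for a simple 1:1 binding model. It assumes that the free ligand concentration is approximately equal to the total ligand concentration, which is a simplification and not a description of how the MST data were fitted. The ligand concentrations are hypothetical.

```python
def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Fractional occupancy for 1:1 binding, assuming free ligand ~ total ligand."""
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_um / (kd_um + ligand_um)


KD_MTD_ABD = 0.562   # µM, Myo5a-MTD/Mlph-ABD value reported in the response
KD_GTD_GTBM = 1.0    # µM, approximate Myo5a-GTD/Mlph-GTBM value cited in the response

# Hypothetical ligand concentrations (µM), chosen only for illustration.
for conc in (0.1, 0.562, 1.0, 5.0):
    print(
        f"{conc:5.3f} µM: "
        f"MTD/ABD {fraction_bound(conc, KD_MTD_ABD):.2f}, "
        f"GTD/GTBM {fraction_bound(conc, KD_GTD_GTBM):.2f}"
    )
```

      Under this model, half of the binding sites are occupied when the ligand concentration equals the Kd, so two interactions with Kd values of about 0.56 µM and about 1 µM give similar occupancy over the same concentration range.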

      Q3: While the dominant negative (DN) approach to testing functional significance is OK, rescuing dilute/myosin Va null melanocytes with full-length myosin Va containing the various deletions would have been more convincing. Also, the authors must show (i) that the DN constructs are the correct size in transfected cells (i.e. are not degraded), and (ii) that they are expressed at roughly equal levels (either by doing Westerns and correcting for the percent of transfected cells, or by measuring total cellular fluorescence in transfected cells). Without this information, it remains possible that constructs not exhibiting a DN effect are simply degraded or poorly expressed. This applies to all the DN data in Figures 6 and 7.

      We agree with the reviewer that Myo5a null melanocytes would be ideal for investigating exon-G function. Unfortunately, we do not have Myo5a null melanocytes derived from dilute mice.

      To confirm the integrity of the overexpressed proteins in the transfected cells, we performed Western blots of those proteins, including EGFP-Mlph-RBD (wild-type and two mutants) and Myo5a-Tail (wild-type and G mutant), in lysates of the transfected cells. The Western blots show that all of these proteins have the correct molecular masses, indicating that the overexpressed proteins were not degraded (see revised Figure 6C and 7C). Moreover, by correcting for the percentage of transfected cells, we show that the overall expression levels in each transfected cell of the wild-type construct and the mutants are roughly equal. This information is included in the revised manuscript (Lines 222-225 and 237-241).
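
      The correction for the percentage of transfected cells can be illustrated as follows. This is a minimal sketch, assuming that the band intensity has already been normalised to a loading control and that expression per transfected cell is estimated by dividing that intensity by the fraction of transfected cells. The function name and all numbers are hypothetical and are not data from the study.

```python
def per_cell_expression(band_intensity: float, fraction_transfected: float) -> float:
    """Estimate expression per transfected cell from a whole-lysate band intensity."""
    if not 0 < fraction_transfected <= 1:
        raise ValueError("fraction_transfected must be in (0, 1]")
    return band_intensity / fraction_transfected


# Hypothetical (band intensity, fraction of transfected cells) pairs.
samples = {
    "wild-type": (80.0, 0.40),
    "mutant": (60.0, 0.30),
}

for name, (intensity, fraction) in samples.items():
    print(f"{name}: {per_cell_expression(intensity, fraction):.1f}")
```

      In this made-up example the mutant gives a weaker band in the lysate, but after correcting for its lower transfection efficiency the estimated expression per transfected cell is about the same as for the wild-type construct.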

      Q4: The authors scored the DN phenotype as yes/no but it most likely varies depending on the degree of over-expression. Showing that the degree of melanosome centralization scales with the degree of overexpression, and that the correlation between expression level and phenotype varies depending on the construct would strengthen the results.

      We agree with the reviewer’s prediction that the degree of the DN phenotype should depend on the over-expression level. We analyzed the EGFP signals of transfected cells and found very few cells with a medium expression level. Therefore, we simply categorized the expression levels into high and low, and calculated the percentage of cells with the DN phenotype in each category, as shown in the table below. These results are consistent with the expectation that the degree of the DN phenotype depends on the over-expression level of the transfected constructs.

      Author response table 1.

      Percentage of the EGFP-expressing cells with perinuclear aggregation of melanosomes

      Q5: The conclusion from the data in Figure 8A- "the presence of both exon-F and exon-G is insufficient for binding to the Mlph occupied by Myo5a, but sufficient for binding to the unoccupied Mlph"- should be verified by also doing the experiment in myosin Va knockdown cells.

      We agree. Unfortunately, our RNAi knockdown of Myo5a in melanocytes is not ideal, and we do not have Myo5a knockout melanocytes. We will pursue this point in the future.

      Q6: Line 213 "three Mlph-binding regions, i.e., exon-F, exon-F, and GTD (Figure 7A)" has a typo.

      This typo has been corrected.

      Q7: The authors should provide high mag insets for the images in Figure 8.

      As suggested, we have revised Figure 8 by including high mag insets for the images.

    1. Author response

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths:

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration.

      We thank the reviewer for the interest in our work. We however want to clarify that the present manuscript does not report the generation of ECM with “superior quality”, but rather of modulated composition and thus function.

      Weaknesses:

      Most data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ.

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B cells. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycan assay) assays that the generated tissues consist of cartilage grafts of similar quality to the MSOD-B counterpart. Of note, the Safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We will thus provide additional stainings of generated tissues pre-lyophilization.

      The rationale behind establishing VEGF-KO cell lines remains unclear. What specific outcomes did the authors anticipate from this modification?

      VEGF is a known master regulator of angiogenesis and a key mediator of endochondral ossification. It has also been extensively used in bone tissue engineering studies as a supplemented factor – primarily in the form of VEGFα – to increase the vascularization and thus outcome of bone formation of engineered grafts (https://www.nature.com/articles/s42003-020-01606-9, https://www.sciencedirect.com/science/article/pii/S8756328216301752). In our study, it was thus identified as a natural candidate to demonstrate the possibility to generate VEGF-KO cartilage and subsequently assess the functional impact on both the angiogenic and osteogenic potential of resulting cartilage tissue.

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects. While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points.

      Using RUNX2-KO ECM, we aimed at demonstrating the impact on cartilage remodeling and bone formation. This was performed ectopically but also in the rat osteochondral defect as a regenerative set-up of higher clinical relevance. We agree with the reviewer that additional experimental groups and time-points (not only earlier but also longer ones) would offer a better mechanistic understanding of the ECM contribution to the joint repair. However, as stated in our manuscript this is a proof-of-concept study that successfully demonstrated the influence of the cartilage ECM modification on the in vivo skeletal regeneration. A follow-up study would need to be performed to complement existing evidence and strengthen the relevance of our approach for cartilage repair.

      Reviewer #2 (Public Review):

      The manuscript submitted by Sujeethkumar et al. describes an alternative approach to skeletal tissue repair using extracellular matrix (ECM) deposited by genetically modified mesenchymal stromal/stem cells. Here, they generate loss-of-function mutations in VEGF or RUNX2 in a BMP2-overexpressing MSC line and define the differences in the resulting tissue-engineered constructs following seeding onto a type I collagen matrix in vitro, and following lyophilization and subcutaneous and orthotopic implantation into mice and rats. Some strengths of this manuscript are the establishment of a platform by which modifications in cell-derived ECM can be evaluated both in vitro and in vivo, the demonstration that genetic modification of cells results in complexity of in vitro cell-derived ECM that elicits quantifiable results, and the admirable goal to improve endogenous cartilage repair. However, I recommend the authors clarify their conclusions and add more information regarding reproducibility, which was one limitation of primary-cell-derived ECMs.

      We thank the reviewer for the positive evaluation of our work.

      Overcoming the limitations of native/autologous/allogeneic ECMs such as complete decellularization and reduction of batch-to-batch variability was not specifically addressed in the data provided herein. For the maintenance of ECM organization and complexity following lyophilization, evidence of complete decellularization was not addressed, but could be easily evaluated using polarized light microscopy and quantification of human DNA for example in constructs pre and post-lyophilization.

      We will clarify the experiments and characterization performed with lyophilized tissues versus those performed with decellularized ones. We will also provide evidence of DNA removal in our decellularized ECMs.

      It would be ideal to see minimization of batch-to-batch variability using this approach, as mitigation of using a sole cell line is likely not sufficient (considering that the sole cell line-derived Matrigel does exhibit batch-to-batch and manufacturer-to-manufacturer variability). I recommend adding details regarding experimental design and outcomes not initially considered. Inter- and intra-experimental reproducibility was not adequately addressed. The size of in vitro-derived cartilage pellets was not quantified, and it is not clear that more than one independent 'differentiation' was performed from each gene-edited MSC line to generate in vitro replicates and constructs that were implanted in vivo.

      We thank the Reviewer for the comment on variability and reproducibility. Using a cell line does confer higher robustness but indeed does not guarantee unlimited consistency of batch production. We will temper our claims in the discussion and mention the need to regularly re-characterize cell line properties across passages.

      In our study, the grafts were generated from various batches and tested in more than one experimental repeat. This will be further described in the revised version of our manuscript. We will also include data on the size variability of the generated tissues.

      The use of descriptive language in describing conclusions may mislead the reader and should be modified accordingly throughout the manuscript. For example, although this reviewer agrees with the comparative statements made by the authors regarding parental and gene-edited MSC lines, non-quantifiable terms such as 'frank' 'superior' (example, line 242) are inappropriate and should rather be discussed in terms of significance. Another example is 'rich-collagenous matrix,' which was not substantiated by uniform immunostaining for type II collagen (line 189).

      I have similar recommendations regarding conclusive statements from the rat implantation model, which was appropriately used for the purpose of evaluating the response of native skeletal cells to the different cell-derived ECMs. Interpretations of these results should be described with more accuracy. For example, increased TRAP staining does not indicate reduced active bone formation (line 237). Many would not conclude that GAGs were retained in the RUNX2-KO line graft subchondral region based on the histology. Quantification of % chondral regeneration using histology is not accurate as it is greatly influenced by the location in the defect from which the section was taken. Chondral regeneration is usually semi-quantified from gross observations of the cartilage surface immediately following excision. The statements regarding integration (example line 290) are not founded by histological evidence, which should show high magnification of the periphery of the graft adjacent to the native tissue.

      We thank the Reviewer for the constructive suggestions. We will revise language accordingly throughout the manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors have started off using an immortalized human cell line and then gene-edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease chondro/osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, the matrix generated by these cells was subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2 edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells.

      Strengths:

      -The notion of inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      -If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      We thank the Reviewer for the critical evaluation of our work and for highlighting its novelty.

      Weaknesses:

      -The authors have not generated histologically identifiable cartilage or bone in their transplants of the cells with a type I scaffold.

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycan assay) assays that the generated tissues consist of cartilage tissue of similar quality to the MSOD-B tissue. However, the Safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We will thus provide additional stainings of generated tissues pre-lyophilization.

      Regarding the contested formation of bone in vivo by our ECM grafts, we have provided compelling qualitative evidence via Masson's trichrome stainings and quantification of mineralized volume by µCT. Both cortical bone and trabecular structures were identified ectopically. These are standard evaluation methods in the field; we would be happy to receive additional suggestions from the Reviewer.

      -In many cases, they did not generate histologically identifiable cartilage with their cell-free-edited scaffold. They did generate small amounts of bone but this is most likely due to BMPs that were synthesized by the cells and trapped in the matrix.

      We now appreciate that the Reviewer agrees on the successful formation of bone induced by our engineered grafts. We however still respectfully disagree with the “small amount of bone” statement since our MSOD-B and MSOD-B VEGF KO cartilage grafts led to the full generation of a mature ectopic bone organ (that is, also composed of extensive marrow). This has been assessed qualitatively and quantitatively.

      We agree with the Reviewer on the key role of BMP-2 in the remodeling process into bone and bone marrow, which we have extensively described in our previous publication (Pigeot et al., Advanced Materials 2021). We previously demonstrated that the low amount of BMP-2 embedded in the matrix (in the range of dozens of nanograms per tissue) is not sufficient per se to induce ectopic endochondral ossification. It is the combined presence of GAGs in the matrix, and thus of cartilage, that allows successful bone formation. Since we have already demonstrated in the present manuscript that the GAG content is the same in MSOD-B and MSOD-B edited ECMs, we will provide additional data demonstrating the maintenance of BMP-2 content in all generated cartilage tissues.

      -There is a great deal of missing detail in the manuscript.

      We will provide additional information on the MSOD-B line and the overall methodology in our revised version.

      -The in vivo study is underpowered, the results are not well documented pictorially, and are not convincing.

      We will provide additional information and pictures related to our in vivo studies. We believe our group size supports our conclusions confirmed by statistical assessment.

      -Given the fact that they have genetically modified cells, they could have done analyses of ECM components to determine what was different between the lines, both at the transcriptome and the protein level. Consequently, the study is purely descriptive and does not provide any mechanistic understanding of what mixture of matrix components and growth factors works best for cartilage or bone. But this presupposes that they actually induced the formation of bona fide cartilage, at least.

      We thank the Reviewer for the suggestion. However, our study did not aim to determine which ECM graft composition works best for cartilage or bone regeneration. Instead, we propose the exploitation of our cellular tools to interrogate the function of key ECM constituents and their impact on skeletal regeneration. We once more confirm that we generated lyophilized cartilage grafts, which will be more clearly supported by histological assessment before lyophilization.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chen and colleagues first compared the cartilage tissues collected from OA and HA patients using histology and immunostaining. Then, a genome-wide DNA methylation analysis was performed, which informed the changes of a novel gene, TNXB. IHC confirmed that TNXB has a lower expression level in HA cartilage than OA. Next, the authors demonstrated that TNXB levels were reduced in the HA animal model, and intraarticular injection of AAV carrying TNXB siRNA induced cartilage degradation and promoted chondrocyte apoptosis. Based on KEGG enrichment, histopathological analysis, and western blot, the authors also showed the relationship between TNXB and AKT phosphorylation. Lastly, AKT agonist, specifically SC79 in this study, was shown to partially rescue the changes of in vitro-cultured chondrocytes induced by Tnxb knock-down. Overall, this is an interesting study and provided sufficient data to support their conclusion.

      Strengths:

      (1) Both human and mouse samples were examined.

      (2) The HA model was used.

      (3) Genome-wide DNA methylation analysis was performed.

      Weaknesses:

      (1) In some experiments, the selection of the control groups was not ideal.

      Thank you for your comments. The reviewer raised a concern about using human OA cartilage, instead of healthy cartilage, as the control. This is an important detail that we did not describe in the previous version. We have added our explanation to the revised Methods.

      (2) More details on analyzing methods and information on replicates need to be included.

      We greatly appreciate your careful review and helpful suggestions. We have added detailed information to our revised draft.

      (3) Discussion can be improved by comparing findings to other relevant studies.

      We thank the reviewer very much for the opportunity to improve our manuscript. We have improved the Discussion as the reviewer suggested in Recommendation 13.

      (4) The use of transgenic mice with conditional Tnxb depletion can further define the physiological roles of Tnxb.

      Thanks for this valuable comment. We understand that conditional Tnxb-KO mice would be very helpful for studying the biological roles of Tnxb, and we will generate and use them in our future studies.

      Recommendations For the Authors:

      (1) Please add more information about HA such as incidence to highlight the importance of the study.

      We greatly appreciate your careful review and helpful suggestions. We have provided more information about the importance of HA study in revised Introduction. Please see lines 90-93 and 103-112.

      (2) Please justify the use of OA cartilage, instead of normal tissues, as the control.

      Thanks for your suggestion. We certainly would have liked to use healthy cartilage as the control, but it was extremely difficult to obtain enough control samples from healthy individuals. Despite the mechanistic and phenotypic differences between HA and OA, OA is often used as a “disease” control to reveal the characteristics of HA 1,2. Thus, we measured the differences in cartilage degeneration and DNA methylation between HA and OA patients. We have provided the statement and evidence in the revised manuscript. Please see lines 144-145.

      (3) Please provide details of how to calculate the Cartilage wear area ratio in Figure 1D, and measure the positive staining area in Figure 1F.

      We apologize for the issue you pointed out. Here, we provide detailed information on how the positively stained areas were calculated. Specifically, in Figure 1D, we obtained the cartilage area ratio by calculating the ratio of the blue cartilage staining area to the whole tissue area using ImageJ software. In Figure 1F, the area of positive staining was determined after secondary antibody treatment and color development using DAB chromogen (brown stain). We then obtained the positive staining area ratio by calculating the ratio of the positive staining area to the whole cartilage area using ImageJ software.
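
      The area ratios described above amount to counting stained pixels inside a region of interest and dividing by the size of that region. The following is a minimal sketch of that calculation on boolean masks. It is not the ImageJ workflow used in the study; the function name and the small example masks are hypothetical.

```python
Mask = list[list[bool]]


def area_ratio(stain: Mask, region: Mask) -> float:
    """Ratio of stained pixels to region pixels, counting stain only inside the region."""
    region_px = 0
    stained_px = 0
    for stain_row, region_row in zip(stain, region):
        for is_stained, in_region in zip(stain_row, region_row):
            if in_region:
                region_px += 1
                if is_stained:
                    stained_px += 1
    if region_px == 0:
        raise ValueError("region mask is empty")
    return stained_px / region_px


# Hypothetical 2 x 4 masks: the whole image is the region, three pixels are stained.
region = [[True, True, True, True],
          [True, True, True, True]]
stain = [[True, True, False, False],
         [True, False, False, False]]

print(area_ratio(stain, region))
```

      For Figure 1D the region would be the whole tissue area and the stain mask the blue cartilage staining; for Figure 1F the region would be the whole cartilage area and the stain mask the DAB-positive pixels. How the masks are thresholded is a separate choice that this sketch does not cover.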

      (4) Please label the location of hemorrhagic ferruginous deposits in Figure 1.

      Thank you for your valuable suggestion. We have used black arrows to indicate hemorrhagic ferruginous deposits in revised Figure 1A.

      (5) Please define the meaning of "n" in all figure legends, such as technical or biological replicates.

      Thanks for your suggestion. We have defined the meaning of "n" in all figure legends in revised manuscript.

      (6) In Figure 3, please increase the font size of B, D, F, H, and J. The same applies to other figures.

      Thank you for your valuable suggestion. We have increased the font size of figures in our revised manuscript.

      (7) Line 327, "(Figure 1, F and G)" should be Figure 2F, G.

      Thank you for the reminder. We have corrected it in the revision. Please see line 347.

      (8) Reduced TNXB levels in human HA cartilage are one of the major findings in this study. Currently, only semi-quantitative IHC was used to draw the conclusion. A second method, such as real-time PCR or western blot, is required.

      Thanks for your suggestion. We regret that we did not have enough samples of human HA cartilage for qPCR and WB experiments, owing to the severe erosion of the HA cartilage. We have pointed out this limitation in the revised draft. Please see lines 445-448.

      (9) Figure 3 shows that reduced Tnxb was accompanied by the increased Dnmt1. In addition, this study is about methylation. Have the authors tested the change of Dnmt1 levels when Tnxb was knocked down?

      Thanks for your suggestion. According to the reviewer's suggestion, we have tested the expression of Dnmt1 in Tnxb-KD chondrocytes, and no significant alteration was observed. Please see the following Figure.

      Author response image 1.

      Figure Legend: Representative IHC staining of Dnmt1 in articular cartilage from Tnxb-KD HA mice. Corresponding quantification of the proportion of Dnmt1 positive regions. Red arrows indicate positive cells. Scale bar: 100 μm. Data were presented as means ± SD; n = 5 in each group. ns = no significance by unpaired Student’s t test.

      (10) Also, is there a causal relationship between Tnxb levels and the distribution of methylation levels? Any related study was performed?

      Following the valuable suggestion of the reviewer, we used two well-known DNA methyltransferase inhibitors (RG108 or 5-Aza-dc) 3 to examine whether DNA methylation regulates transcriptional expression of TNXB. We found that both inhibitors significantly up-regulated Tnxb mRNA level. We have added this result to the revised Supplementary Figure 4 and draft (lines 292-296 and 369-374).

      (11) In Figure 6, what was the control of "AKT agnost" group?

      Thank you for your suggestion. We feel sorry for our negligence and we have added the vehicle group as a control for AKT agonists in Figure 6 in our revised manuscript.

      (12) Previous studies have reported the involvement of TNXB in TGF-β signaling. Have the authors examined the effect of TNXB on TGF-β signaling in chondrocytes?

      Thank you for your suggestion. Here, we examined the expression of TGF-β signaling in Tnxb-KD chondrocytes, and no significant changes were observed. We have added this result to the revised Supplementary Figure 7 and discussed it in the revised draft (lines 475-479).

      (13) Discussion can be improved. For example, have previous studies reported the association between TNXB and methylation in other cells/tissues? In addition to apoptosis, are there other potential mechanisms underlying the protective role of TNXB in chondrocytes?

      Thank you for your valuable comments. Previous studies have shown differential DNA methylation of TNXB in whole blood from rheumatoid arthritis patients and in retinal pigment epithelium from patients with age-related macular degeneration 4,5. Herein, we are the first to report the association between DNA methylation of TNXB and HA cartilage degeneration. Published studies on the physiological function of TNXB are limited, and most report its effect on extracellular matrix organization 6,7. In our work, we found that TNXB regulated the phosphorylation of AKT. Since previous reports showed that AKT controls the expression of Mmp13 8, we reasoned that TNXB might regulate chondrocyte extracellular matrix organization, in addition to its role in apoptosis. We have discussed these points in the revised manuscript (lines 462-464 and 495-501).

      (14) The manuscript writing needs to be improved. Typos and grammar issues were noted.

      Thanks. We have modified and polished our language and we hope the revised version could be acceptable for you.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript mainly studied the biological effect of tenascin XB (TNXB) on hemophilic arthropathy (HA) progression. Using bioinformatic and histopathological approaches, the authors identified the novel candidate gene TNXB for HA. Next, the authors showed that TNXB knockdown leads to chondrocyte apoptosis, matrix degeneration, and subchondral bone loss in vivo/vitro. Furthermore, AKT agonists promoted extracellular matrix synthesis and prevented apoptosis in TNXB knockdown chondrocytes.

      Strengths:

      In general, this study significantly advances our understanding of HA pathogenesis. The authors utilize comprehensive experimental strategies to demonstrate the role of TNXB in cartilage degeneration associated with HA. The results are clearly presented, and the conclusions appear appropriate.

      Weaknesses:

      Additional clarification is required regarding the gender of the F8-/- mouse in the study. Is the mouse male or female?

      We feel sorry that we did not provide enough information about the gender of the F8-/- mouse in the previous draft. Here, we used male F8-/- mice as the study subjects for our experiments. Hemophilia A is predominantly seen in males because of the X chromosome linkage 9.

      Recommendations For The Authors:

      Some issues need to be addressed in the manuscript:

      (1) During the progression of HA, in addition to cartilage degeneration, synovial hypertrophy and inflammation are also significant symptoms. How is the expression of TNXB in HA synovium?

      Thank you for your valuable comments. According to the reviewer's suggestion, we tested the expression of TNXB in the synovium and found no statistically significant difference in its expression level (Supplementary Figure 2). Please see lines 347-349.

      (2) Lines 183-188. The methods of virus infection should be more detailed. What was the concentration of the AAVs injected? And how many doses were administrated?

      Thank you for your suggestion. We have added an explanation of virus infection and injected doses in revised methods section (lines 205-206).

      (3) Line 197-198. Could the author double-check the decalcification time for human cartilage samples? Is it for 3 months? Or for 3 weeks?

      Thank you for your suggestion. We have reconfirmed that the human cartilage samples were decalcified for 3 months.

      (4) Line 343-344 "Above results suggest that TNXB might be protective against HA and its cartilage suppression is closely related to HA development." The conclusion is inappropriate, please revise it.

      Thanks for your suggestion. We have revised this conclusion into “Above results suggest that the suppression of TNXB in cartilage promotes the HA development”. Please see lines 365-366.

      (5) Line 326-327, the IHC staining for human samples is shown in Figure 2, not Figure 1. Please double check and revise it.

      Thank you for the reminder. We apologize for our negligence and have corrected it in the revision.

      (6) For Figure 1B, it shows the MRI images of knee joints. However, the method section lacks details regarding the MRI imaging scan and analysis. Could the author include this information in the method section?

      Thank you for your valuable comments. We have added the method of MRI imaging scan and analysis in revised Methods. Please see lines 154-163.

      (7) In Figure 5, The statistical result of Bcl-2 is inconsistent with its Western blot band. Please check.

      Thank you for the reminder. We have modified it in the revision.

      (8) Please read through the text carefully to check for language problems. For example, in Line 68 "Our" not "our".

      Thank you for the reminder. We have corrected it in the revision. Please see line 68.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Dr. Chen et al. investigates the genes that are differentially methylated and associated with cartilage degeneration in hemophilia patients. The study demonstrates the functional mechanisms of the TNXB gene in chondrocytes and F8-/- mice. The authors first showed significant DNA methylation differences between hemophilic arthritis (HA) and osteoarthritis through genome-wide DNA methylation analysis. Subsequently, they showed a decreased expression of the differentially methylated TNXB gene in cartilage from HA patients and mice. By knocking down TNXB in vivo and in vitro, the results indicated that TNXB regulates extracellular matrix homeostasis and apoptosis by modulating p-AKT. The findings are novel and interesting, and the study presents valuable information in blood-induced arthritis research.

      Strengths:

      The authors adopted a comprehensive approach by combining genome-wide DNA methylation analysis, in vivo and in vitro experiments using human and mouse samples to illustrate the molecular mechanisms involved in HA progression, which is crucial for developing targeted therapeutic strategies. The study identifies Tenascin XB (TNXB) as a central mediator in cartilage matrix degradation. It provides mechanistic insights into how TNXB influences cartilage matrix degradation by regulating the activation of AKT. It opens avenues for future research and potential therapeutic interventions using AKT agonists for cartilage protection in hemophilic arthropathy. The conclusions drawn from the study are clear and directly tied to the findings.

      Weaknesses:

      (1) The study utilizes a small sample size (N=5 for both osteoarthritis and hemophilic arthropathy). A larger sample size would enhance the generalizability and statistical power of the findings.

      Thank you for pointing out this deficiency. Indeed, our sample size is relatively small, although the overall sample size was sufficient for statistical analyses. We have added this limitation to the discussion in the revised manuscript. Please see lines 445-448. Considering the small sample size, we subsequently performed a functional validation study for TNXB, one of the most significant genes, and demonstrated that TNXB exerted critical impacts on chondrocyte apoptosis in HA pathogenesis in vivo and in vitro.

      (2) The use of an animal model (F8-/- mouse) to investigate the role of TNXB may not fully capture the complexity of human hemophilic arthropathy. Differences in the biology between species may affect the translatability of the findings to human patients.

      Thank you for your valuable comments. We recognize that biological differences between species can affect the clinical translation of research findings. In our work, we sequenced human cartilage samples to identify the differentially methylated gene TNXB. Meanwhile, we demonstrated that TNXB protein expression was significantly down-regulated in both human HA cartilage and F8-/- transgenic mouse cartilage. The F8-/- transgenic mouse is a well-accepted model for the study of hemophilia: it is phenotypically similar to human patients with the disease and bleeds spontaneously into the joints and soft tissues. In addition, this mouse model has been widely used in the study of hemophilia and hemophilic arthritis 9-11.

      (3) The study primarily focuses on TNXB as a central mediator, but it might overlook other potentially relevant factors contributing to cartilage degradation in hemophilic arthropathy. A more holistic exploration of genetic and molecular factors could provide a broader understanding of the condition.

      Thanks for your suggestion. Since our human sample size is relatively small, the differentially methylated genes should be interpreted cautiously. Therefore, we mainly focused on the most significant gene, TNXB, for functional study. In future studies, we will expand the sample size to explore the molecular mechanisms of HA more comprehensively.

      Recommendations For The Authors:

      The following are my suggestions:

      (1) Why do the authors choose to concentrate on the knee joint in the introduction when hemophilia, characterized by a deficiency in clotting factor F8, is recognized as a systemic disease?

      Thank you for your valuable comments. Although hemophilia is a systemic disease, approximately 80%-90% of bleeding episodes in patients with hemophilia occur within the musculoskeletal system, especially in the knee joint 12.

      (2) While Figure 1 illustrates distinct expressions of Dnmt1 and Dnmt3a, only Dnmt1 results are presented in HA mice models in Figure 3. To address this, it is suggested that the expression of Dnmt3a be explored in animal models.

      Thank you for your suggestion. According to the reviewer's suggestion, we examined the expression of Dnmt3a in mouse articular cartilage, and the expression level of Dnmt3a was significantly up-regulated in both the 4W and 8W model groups compared with the control group (Figure 3). Please see line 364.

      (3) In Figure 3, the sample size for Dnmt1 is smaller than the other indicators; therefore, supplementing the sample count is recommended.

      Thank you for the reminder. We have corrected it in the revision.

      (4) Regarding Figure 4G, a few apoptotic cells were observed in the AAV NC group. It is advised that this figure be reviewed for accuracy.

      Thanks for your suggestion. In Figure 5D, the AAV-NC group was needle-injected with AAV. Therefore, it is normal for apoptotic cells to appear in the cartilage layer.

      (5) The authors concluded that TNXB plays a role in apoptosis and AKT signaling. Providing expression data for Caspase9 would be valuable to strengthen this assertion, as PI3K/AKT signaling directly influences its activation during apoptosis.

      Thank you for your comments. We have examined the expression of Cleaved-Caspase9 protein and found that knockdown of TNXB resulted in upregulation of Cleaved-Caspase9 protein expression, which was reversed by the addition of SC79. This result has been added to the revised Figure 6 and manuscript. Please see line 414.

      (6) Quantitative analysis of the differences between the two groups in Supplemental Figures is necessary.

      Thank you for your suggestion. We have added the quantitative analysis of the differences between the two groups in Supplemental Figures.

      (7) With three major isoforms (homologs) of AKT in mammals-AKT1, 2, and 3 - why did the authors specifically focus on AKT1?

      Thank you for your comments. Based on the results of the KEGG enrichment analysis of differentially methylated genes, we investigated the role of the PI3K/AKT pathway in apoptosis of HA chondrocytes. AKT is universally acknowledged as a core factor in the PI3K/AKT pathway that plays critical roles in various cellular activities such as cell proliferation, cell differentiation, cell apoptosis, and metabolism 13,14. More notably, several studies demonstrated that, within the AKT family, Akt1 is primarily involved in the regulation of chondrocyte survival and proteoglycan synthesis 15. Therefore, we detected phosphorylation of AKT1 in HA cartilage and TNXB-KD chondrocytes, and found that TNXB regulates chondrocyte ECM and apoptosis via AKT1.

      References:

      (1) Cooke, E.J., Zhou, J.Y., Wyseure, T., Joshi, S., Bhat, V., Durden, D.L., Mosnier, L.O., and von Drygalski, A. (2018). Vascular Permeability and Remodelling Coincide with Inflammatory and Reparative Processes after Joint Bleeding in Factor VIII-Deficient Mice. Thromb Haemost 118, 1036-1047. 10.1055/s-0038-1641755.

      (2) Kleiboer, B., Layer, M.A., Cafuir, L.A., Cuker, A., Escobar, M., Eyster, M.E., Kraut, E., Leavitt, A.D., Lentz, S.R., Quon, D., et al. (2022). Postoperative bleeding complications in patients with hemophilia undergoing major orthopedic surgery: A prospective multicenter observational study. J Thromb Haemost 20, 857-865. 10.1111/jth.15654.

      (3) Weiland, T., Weiller, M., Kunstle, G., and Wendel, A. (2009). Sensitization by 5-azacytidine toward death receptor-induced hepatic apoptosis. J Pharmacol Exp Ther 328, 107-115. 10.1124/jpet.108.143560.

      (4) Anaparti, V., Agarwal, P., Smolik, I., Mookherjee, N., and El-Gabalawy, H. (2020). Whole Blood Targeted Bisulfite Sequencing and Differential Methylation in the C6ORF10 Gene of Patients with Rheumatoid Arthritis. J Rheumatol 47, 1614-1623. 10.3899/jrheum.190376.

      (5) Porter, L.F., Saptarshi, N., Fang, Y., Rathi, S., den Hollander, A.I., de Jong, E.K., Clark, S.J., Bishop, P.N., Olsen, T.W., Liloglou, T., et al. (2019). Whole-genome methylation profiling of the retinal pigment epithelium of individuals with age-related macular degeneration reveals differential methylation of the SKI, GTF2H4, and TNXB genes. Clin Epigenetics 11, 6. 10.1186/s13148-019-0608-2.

      (6) Mao, J.R., Taylor, G., Dean, W.B., Wagner, D.R., Afzal, V., Lotz, J.C., Rubin, E.M., and Bristow, J. (2002). Tenascin-X deficiency mimics Ehlers-Danlos syndrome in mice through alteration of collagen deposition. Nat Genet 30, 421-425. 10.1038/ng850.

      (7) Zhang, K., Wang, X., Zeng, L.T., Yang, X., Cheng, X.F., Tian, H.J., Chen, C., Sun, X.J., Zhao, C.Q., Ma, H., and Zhao, J. (2023). Circular RNA PDK1 targets miR-4731-5p to enhance TNXB expression in ligamentum flavum hypertrophy. FASEB J 37, e22877. 10.1096/fj.202200022RR.

      (8) Guo, H., Yin, W., Zou, Z., Zhang, C., Sun, M., Min, L., Yang, L., and Kong, L. (2021). Quercitrin alleviates cartilage extracellular matrix degradation and delays ACLT rat osteoarthritis development: An in vivo and in vitro study. J Adv Res 28, 255-267. 10.1016/j.jare.2020.06.020.

      (9) Weitzmann, M.N., Roser-Page, S., Vikulina, T., Weiss, D., Hao, L., Baldwin, W.H., Yu, K., Del Mazo Arbona, N., McGee-Lawrence, M.E., Meeks, S.L., and Kempton, C.L. (2019). Reduced bone formation in males and increased bone resorption in females drive bone loss in hemophilia A mice. Blood Adv 3, 288-300. 10.1182/bloodadvances.2018027557.

      (10) Haxaire, C., Hakobyan, N., Pannellini, T., Carballo, C., McIlwain, D., Mak, T.W., Rodeo, S., Acharya, S., Li, D., Szymonifka, J., et al. (2018). Blood-induced bone loss in murine hemophilic arthropathy is prevented by blocking the iRhom2/ADAM17/TNF-alpha pathway. Blood 132, 1064-1074. 10.1182/blood-2017-12-820571.

      (11) Vols, K.K., Kjelgaard-Hansen, M., Ley, C.D., Hansen, A.K., and Petersen, M. (2019). Bleed volume of experimental knee haemarthrosis correlates with the subsequent degree of haemophilic arthropathy. Haemophilia 25, 324-333. 10.1111/hae.13672.

      (12) Lobet, S., Peerlinck, K., Hermans, C., Van Damme, A., Staes, F., and Deschamps, K. (2020). Acquired multi-segment foot kinematics in haemophilic children, adolescents and young adults with or without haemophilic ankle arthropathy. Haemophilia 26, 701-710. 10.1111/hae.14076.

      (13) Garcia, D., and Shaw, R.J. (2017). AMPK: Mechanisms of Cellular Energy Sensing and Restoration of Metabolic Balance. Mol Cell 66, 789-800. 10.1016/j.molcel.2017.05.032.

      (14) Johnson, J., Chow, Z., Lee, E., Weiss, H.L., Evers, B.M., and Rychahou, P. (2021). Role of AMPK and Akt in triple negative breast cancer lung colonization. Neoplasia 23, 429-438. 10.1016/j.neo.2021.03.005.

      (15) Rao, Z., Wang, S., and Wang, J. (2017). Peroxiredoxin 4 inhibits IL-1beta-induced chondrocyte apoptosis via PI3K/AKT signaling. Biomed Pharmacother 90, 414-420. 10.1016/j.biopha.2017.03.075.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their thorough review of and overall positive comments on our manuscript. We have revised the manuscript to address the one remaining concern raised by one of the reviewers. This is described below.

      Fig.1B-C: To give a standard deviation from 2 data points has no statistical significance. In this case it would be better to define as range/difference of the 2 data points.

      We have modified the legend for Figure 1 to now read, “The average of two experiments is plotted with the bars representing the range of each time point.”
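
      With only two measurements per time point, the plotted quantities reduce to the mean and the two values themselves as the ends of the bar. A minimal sketch of that calculation follows; the function name `mean_and_range` and the numeric values are hypothetical and are not taken from the manuscript.

```python
def mean_and_range(first, second):
    """Return (mean, low, high) for two replicate measurements."""
    mean = (first + second) / 2
    low, high = min(first, second), max(first, second)
    return mean, low, high


# Hypothetical replicate values for three time points (not real data).
experiment_1 = [0.10, 0.42, 0.80]
experiment_2 = [0.14, 0.50, 0.76]

summary = [mean_and_range(a, b) for a, b in zip(experiment_1, experiment_2)]
for mean, low, high in summary:
    # The bar spans low..high; the plotted point sits at the mean.
    print(f"mean={mean:.2f}  bar from {low:.2f} to {high:.2f}")
```

      Reporting the range in this way avoids implying a variance estimate that two data points cannot support.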

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.

      I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.

      We believe this study is an enhancement on our previous work for two reasons, which have been alluded to in new text within the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data, using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).

      I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?

      We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p.  Hence, Supplementary Figure 1 is now included which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data with regards to miR-199a/b-5p inhibition on CAV1 (Figure 4A). 

      I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.

      We agree with all points made here and have amended these within the manuscript. Figure 1A now shows pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table which previously showed the pathways enriched at each time point is now in the supplementary materials (Supplementary Table 1). Figures 2 and 4 now use color instead of shades of grey. Figure 3C has now been moved to the supplementary materials (Supplementary Figure 2) and is referenced in the text.

      Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.

      Reviewer #2 (Public review):

      Summary:

      This study represents an ambitious endeavor to comprehensively analyze the role of miR199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.

      Strengths:

      This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.

      Weaknesses:

      While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to Cartilage Homeostasis than chondrogenesis itself. Their deficiency has been genetically proven to induce Osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.

      We agree with the reviewer's comments. miR-455-null mice develop normally but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition in the description to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate miR-199a/b-5p to be relevant in OA and have ongoing additional work for this, but this is beyond the scope of this manuscript.

      Recommendations to Authors:

      Reviewer #1 (Recommendations to authors):

      Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.

      (1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.

      We have now added statistical test information to the figure legends of figures 2 and 4.

      (2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes are intersected between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes – chi squared. Expected / Observed.

      A chi-squared test is now presented in the manuscript, which shows that the number of significant genes found in common between miR-199a-5p knockdown and miR-199b-5p knockdown was significantly greater than expected for day 0 or day 1 of the experiments.
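
      For readers unfamiliar with this kind of overlap test, the sketch below shows one common way to set it up: a 2x2 contingency table over all tested genes, an expected overlap under independence, and a Pearson chi-squared statistic with one degree of freedom (no continuity correction). This is an illustration under stated assumptions and not necessarily the authors' exact procedure; all counts and the function name `overlap_chi_squared` are hypothetical.

```python
import math


def overlap_chi_squared(total, de_a, de_b, overlap):
    """Pearson chi-squared test (1 df) for the overlap of two gene lists.

    total:   number of genes tested in both experiments
    de_a:    genes differentially expressed in experiment A
    de_b:    genes differentially expressed in experiment B
    overlap: genes differentially expressed in both
    Returns (expected_overlap, chi2, p_value).
    """
    a = overlap                          # DE in both
    b = de_a - overlap                   # DE in A only
    c = de_b - overlap                   # DE in B only
    d = total - de_a - de_b + overlap    # DE in neither
    expected = de_a * de_b / total
    chi2 = total * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Hypothetical counts chosen for illustration only.
expected, chi2, p_value = overlap_chi_squared(
    total=10000, de_a=200, de_b=300, overlap=25
)
print(f"expected overlap: {expected:.1f}, observed: 25")
print(f"chi-squared: {chi2:.2f}, p-value: {p_value:.3g}")
```

      When any expected cell count is small, an exact alternative such as Fisher's exact test or a hypergeometric test is usually preferred over the chi-squared approximation.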

      (3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?

      Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.

      (4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!

      Thank you for pointing this error out. It has been corrected.

      (5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.

      This error has been removed from Figure 5A.

      (6) On p 20, the references 22 and 27 should I think be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays which you claim are the first evidence for this target relationship.

      We agree with this change and have corrected the manuscript.

      (7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.

      Thank you for pointing this error out – this has been corrected.

      Reviewer #2 (Recommendations to authors):

      (1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.

      We agree with the reviewer's comments. For many years miR-140 and miR-455 have been studied experimentally, and their importance in OA research has become apparent. We have included additional references within the introduction to address this.

      (2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR199a/b-5p in chondrogenesis.

      We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.

      (3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.

      In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulates OA. However, development of a new mouse line to investigate this is not within the scope of this manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study the authors use an elegant set of single-molecule experiments to assess the transcriptional and post-transcriptional regulation of RecB. The question stems from a previous observation from the same lab, that RecB protein levels are low and not induced under DNA damage. The authors first show that recB transcript levels are low and have a short half-life. They further show that RecB levels are likely regulated via translational control. They provide evidence for low noise in RecB protein levels across cells and show that the translation of the mRNA increases under double-strand break conditions. Authors identify Hfq binding sites in the recbcd [recBCD] operon and show that Hfq regulates the levels of RecB protein without changing the mRNA levels. They suggest that RecB translation is directly controlled by Hfq binding to mRNA, as mutating one of the binding sites has a direct effect on RecB protein levels.

      Strengths:

      The implication of Hfq in regulation of RecB translation is important and suggests mechanisms of cellular response to DNA damage that are beyond the canonically studied mechanisms (such as transcriptional regulation by LexA). Data are clearly presented and the writing is direct and easy to follow. Overall, the study is well-designed and provides novel insights into the regulation of RecB, that is part of the complex required to process break ends.

      Weaknesses:

      Some key findings need additional support/ clarifications to strengthen the conclusions. These are suggested to the authors.

      Reviewer #2 (Public Review):

      Summary:

      The authors carry out a careful and rigorous quantitative analysis of RecB transcript and protein levels at baseline and in response to DNA damage. Using single-molecule FISH and Halo-tagging in order to achieve sensitive measurements, they provide evidence that enhanced RecB protein levels in response to DNA damage are achieved through a post-transcriptional mechanism mediated by the La-like RNA binding protein, Hhq1 [Sm-like RNA binding protein, Hfq]. In terms of biological relevance, the authors suggest that this mechanism provides a way to control the optimum level of RecB expression as both deletion and over-expression are deleterious. In addition, the proposed mechanism provides a new framework for understanding how transcriptional noise can be suppressed at the protein level.

      Strengths:

      Strengths of the manuscript include the rigorous approaches and orthogonal evidence to support the core conclusions, for example, the evidence that altering either Hhq1 [Hfq] or its recognition sequence on the RNA similarly enhance the protein to RNA ratio of RecB. The writing is clear and the experiments are well-controlled. The modeling approaches provide essential context to interpret the data, particularly given the small numbers of molecules per cell. The interpretations are careful and well supported.

      Weaknesses:

      The authors make a compelling case for the biological need to exquisitely control RecB levels, which they suggest is achieved by the pathway they have uncovered and described in this work. However, this conclusion is largely inferred as the authors only investigate the effect on cell survival in response to (high levels of) DNA damage and in response to two perturbations - genetic knock-out or over-expression, both of which are likely more dramatic than the range of expression levels observed in unstimulated and DNA damage conditions.

      In the discussion, we proposed that the post-transcriptional regulation of recB that we have uncovered could be involved in keeping RecB levels within an optimal range. We agree that testing the phenotypic impact of small changes in RecB levels would add additional strength to this suggestion. However, this is experimentally very challenging because of the low copy number of RecB molecules, which makes it difficult to slightly alter RecB levels in a controlled and homogeneous (across cells) manner. Developing the synthetic biology tools necessary for such an experiment is beyond the scope of this article. In the manuscript, we will clarify the limits of our interpretation of the role of the uncovered regulation.

      Reviewer #3 (Public Review):

      Summary:

      The work by Kalita et al. reports regulation of RecB expression by the Hfq protein in E. coli cells. RecBCD is an essential complex for DNA repair and chromosome maintenance. The expression level needs to be kept at a low level under regular growth conditions but upregulated upon DNA damage. Through quantitative imaging, the authors demonstrate that recB mRNAs and proteins are expressed at low levels under regular conditions. While the mRNA copy number demonstrates a high noise level due to stochastic gene expression, the protein level is maintained at a lower noise level compared to the expected value. Upon DNA damage, the authors claim that the recB mRNA level is not significantly affected, but RecB protein level increases due to a higher translation efficiency. [Upon DNA damage, the authors claim that the recB mRNA concentration is decreased, however RecB protein level is compensated by higher translation efficiency]. Through analyzing CLASH data on Hfq, they identified two Hfq binding sites on RecB polycistronic mRNA, one of which is localized at the ribosome binding site (RBS). Through measuring RecB mRNA and protein levels in ∆hfq cells, the authors conclude that binding of Hfq to the RBS region of recB mRNA suppresses translation of recB mRNA. This conclusion is further supported by the same measurement in the presence of Hfq sequestrator, the sRNA ChiX, and the deletion of the Hfq binding region on the mRNA.

      Strengths:

      (1) The manuscript is well-written and easy to understand.

      (2) While there are reported cases of Hfq regulating translation of bound mRNAs, its effect on reducing translation noise is relatively new.

      (3) The imaging and analysis are carefully performed with necessary controls.

      Weaknesses:

      The major weaknesses include a lack of mechanistic depth, and part of the conclusions are not fully supported by the data.

      (1) Mechanistically, it is still unclear why upon DNA damage, translation level of recB mRNA increases, which makes the story less complete. The authors mention in the Discussion that a moderate (30%) decrease in Hfq protein was observed in a previous study, which may explain the loss of translation repression on recB. However, given that this mRNA exists in very low copy number (a few per cell) and that Hfq copy number is on the order of a few hundred to a few thousand, it's unclear how a 30% decrease in the protein level should result in a significant change in its regulation of recB mRNA.

      While Hfq is a highly abundant protein, it has many mRNA and sRNA targets, some of which are also present in large amounts (DOI: 10.1046/j.1365-2958.2003.03734.x). As recently shown, the competition among the targets for Hfq proteins results in unequal outcomes across targets, where the targets with higher Hfq affinity have an advantage over the ones with less efficient binding (DOI: 10.1016/j.celrep.2020.02.016). In line with these findings, we reason that upon DNA damage, a moderate decrease in Hfq protein abundance (30%) can lead to a similar competition among Hfq targets, where high-affinity targets outcompete low-affinity ones as well as low-abundance ones (such as recB mRNAs). Therefore, we hypothesise that the sensitivity of low-abundance Hfq targets to moderate perturbations of the Hfq protein level is a potential explanation for the change in RecB translation that we have observed. We will expand this part of the discussion to explain our reasoning in a more explicit and coherent way.
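
      The competition argument can be illustrated with a toy equilibrium model. The sketch below is not taken from the manuscript: it assumes simple 1:1 binding, two target classes, and entirely hypothetical abundances and dissociation constants in arbitrary units. The function names `free_hfq` and `occupancy` are likewise illustrative. It only shows that, when an abundant high-affinity pool titrates most of the Hfq, a 30% drop in total Hfq can reduce the occupancy of a scarce low-affinity target far more than proportionally.

```python
import numpy as np

def free_hfq(h_total, targets_total, kd, tol=1e-12):
    """Free Hfq at equilibrium, found by bisection.

    Solves h + sum_i T_i * h / (Kd_i + h) = h_total, where T_i is the total
    amount of target i and Kd_i its dissociation constant (1:1 binding assumed).
    """
    targets_total = np.asarray(targets_total, dtype=float)
    kd = np.asarray(kd, dtype=float)
    lo, hi = 0.0, float(h_total)
    while hi - lo > tol * max(1.0, float(h_total)):
        mid = 0.5 * (lo + hi)
        total = mid + np.sum(targets_total * mid / (kd + mid))
        if total > h_total:
            hi = mid   # too much Hfq accounted for: the root lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def occupancy(h_total, targets_total, kd):
    """Fraction of each target class that is bound by Hfq."""
    h = free_hfq(h_total, targets_total, kd)
    return h / (np.asarray(kd, dtype=float) + h)

# Hypothetical values in arbitrary units, chosen only for illustration.
targets_total = [1000.0, 5.0]   # abundant high-affinity pool, scarce low-affinity target
kd = [1.0, 100.0]               # lower Kd means higher affinity

occ_before = occupancy(1000.0, targets_total, kd)
occ_after = occupancy(700.0, targets_total, kd)   # 30% less total Hfq
retained = occ_after / occ_before
print("fraction of occupancy retained (high-affinity, low-affinity):", retained)
```

      With these assumed parameters the low-affinity target retains a much smaller fraction of its Hfq occupancy than the high-affinity pool does. Whether recB mRNA behaves this way in cells depends on real affinities and abundances, which this sketch does not attempt to estimate.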

      (2) Based on the experiment and the model, Hfq regulates translation of recB gene through binding to the RBS of the upstream ptrA gene through translation coupling. In this case, one would expect that the behavior of ptrA gene expression and its response to Hfq regulation would be quite similar to recB. Performing the same measurement on ptrA gene expression in the presence and absence of Hfq would strengthen the conclusion and model.

      Indeed, based on our model, we expect PtrA expression to be regulated by Hfq in a similar manner to RecB. However, the product encoded by the ptrA gene, Protease III, (i) has been poorly characterised; (ii) unlike RecB, is located in the periplasm (DOI: 10.1128/jb.149.3.1027-1033.1982); and (iii) is not involved in any DNA repair pathway. Therefore, analysing PtrA expression would take us away from the key questions of our study.

      (3) The authors agree that they cannot exclude the possibility of sRNA being involved in the translation regulation. However, this can be tested by performing the imaging experiments in the presence of Hfq proximal face mutations, which largely disrupt binding of sRNAs.

      (4) The data on construct with a long region of Hfq binding site on recB mRNA deleted is less convincing. There is no control to show that removing this sequence region itself has no effect on translation, and the effect is solely due to the lack of Hfq binding. A better experiment would be using a Hfq distal face mutant that is deficient in binding to the ARN motifs.

      We thank the referee for these suggestions. We have performed the requested experiments, and the quantification of RecB abundance in the presence of Hfq proteins mutated in the proximal and distal face will be added to the revised version of the manuscript.

      (5) Ln 249-251: The authors claim that the stability of recB mRNA is not changed in ∆hfq simply based on the steady-state mRNA level. To claim so, the lifetime needs to be measured in the absence of Hfq.

      We agree that this statement is not fully supported by our data and will address this issue in the revised version.

      (6) What's the labeling efficiency of Halo-tag? If not 100% labeled, is it considered in the protein number quantification? Is the protein copy number quantification through imaging calibrated by an independent method? Does Halo tag affect the protein translation or degradation?

      Our previous study (DOI: 10.1038/s41598-019-44278-0) described a detailed characterisation of the HaloTag labelling technique for quantifying low-copy proteins in single E. coli cells.

      In that study, we used RecB-HaloTag as an example of a low-copy number protein. We showed a complete quantitative agreement of RecB detection between two fully independent methods: HaloTag-based labelling with cell fixation and RecB-sfGFP combined with a microfluidic device that lowers protein diffusion in the bacterial cytoplasm. This second method has previously been validated for protein quantification (DOI: 10.1038/ncomms11641) and provides detection of 80-90% of the labelled protein. Additionally, in our protocol, immediate chemical fixation of cells after the labelling and quick washing steps ensure that new, unlabelled RecB proteins are not produced. We, therefore, conclude that our approach to RecB detection is highly reliable and sufficient for comparing RecB production in different conditions and mutants.

      The RecB-HaloTag construct has been designed for minimal impact on RecB production and function. The HaloTag is translationally fused to RecB in a loop positioned after the serine present at position 47 where it is unlikely to interfere with (i) the formation of RecBCD complex (based on RecBCD structure, DOI: 10.1038/nature02988), (ii) the initiation of translation (as it is far away from the 5’UTR and the beginning of the open reading frame) and (iii) conventional C-terminal-associated mechanisms of protein degradation (DOI: 10.15252/msb.20199208). In our manuscript, we showed that the RecB-HaloTag degradation rate is similar to the dilution rate due to bacterial growth. This is in line with a recent study on unlabelled proteins, which shows that RecB’s lifetime is set by the cellular growth rate (https://doi.org/10.1101/2022.08.01.502339) and indicates that the HaloTag fusion is not affecting RecB stability.

      Furthermore, we have demonstrated (DOI: 10.1038/s41598-019-44278-0) that (i) bacterial growth is not affected by replacing the native RecB with RecB-HaloTag, (ii) RecB-HaloTag is fully functional upon DNA damage, and (iii) no proteolytic processing of the RecB-HaloTag is detected by Western blot.

      These results suggest that RecB expression and functionality are unlikely to be affected by the translational HaloTag insertion at Ser-47 in RecB. In the revised version of the manuscript, we will add information about the construct and discuss the reliability of the quantification.

      (7) Upper panel of Fig S8a is redundant as in Fig 5B. Seems that Fig S8d is not described in the text.

      Indeed, the data in the upper panel in Fig S8a was repeated (from Fig 5B) for visual purposes to facilitate comparison with the panel below. We will modify the figure legend to indicate this repetition clearly.

      In Fig S8d, we confirmed the functionality of the Hfq protein expressed from the pQE-Hfq plasmid in our experimental conditions, which was not described in the text. We will include this clarification in the updated manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We would like to first thank the Editor as well as the three reviewers for their enthusiasm and for conducting another careful evaluation of our manuscript. We appreciate their thoughtful and constructive comments and suggestions. Some concerns regarding experimental design, data analysis, and over-interpretation of our findings still remain unresolved after the initial revision. Here we have endeavored to address these remaining concerns through further refinement of our writing and inclusion of these concerns in the discussion section. We hope our response can better explain the rationale of our experimental design and data interpretation. In addition, we acknowledge the limitations of our present study, so that it will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review):

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      I acknowledge the authors' efforts to address the comments received. However, my concerns persist:

      Thanks very much again for the re-evaluation and comments. Please find our revision plans for each comment below.

      (1) The authors contend that shorter reaction times correlated with increased distances between individuals in social space imply that participants construct and utilize two-dimensional representations. This method is adapted from a previous study by Park et al. Yet, there is a fundamental distinction between the two studies. In the prior work, participants learned relationships between adjacent individuals, receiving feedback on their decisions, akin to learning spatial locations during navigation. This setup leads to two different predictions: If participants rely on memory to infer relationships, recalling more pairs would be necessary for distant individuals than for closer ones. Conversely, if participants can directly gauge distances using a cognitive map, they would estimate distances between far individuals as quickly as for closer ones. Consequently, as the authors suggest, reaction times ought to decrease with increasing decision value, which, in this context, corresponds to distances. However, the current study allowed participants to compare all possible pairs without restricting learning experiences, rendering the application of the same methodology for testing two-dimensional representations inappropriate. In this study, the results could be interpreted as participants not forming and utilizing two-dimensional representations.

      We apologize for not being clear enough about our task design; we have made relevant changes in the methodology section of the manuscript to make it clearer. The reviewer’s concern is that participants learned about all the pairs in the comparison task, which makes the distance effect invalid. We would like to clarify that during all the memory test tasks (the comparison task, the collect task and the recall task outside and inside the scanner), participants never received feedback on whether their responses were correct or not. Therefore, the comparison task in our study is similar to the previous study by Park et al. (2021). Participants did not have access to the correct responses for all possible comparison pairs prior to or during this task; they needed to make inferences based on memory retrieval.

      (2) The confounding of visual features with the value of social decision-making complicates the interpretation of this study's results. It remains unclear whether the observed grid-like effects are due to visual features or are genuinely indicative of value-based decision-making, as argued by the authors. Contrary to the authors' argument, this issue was not present in the previous study (Constantinescu et al.). In that study, participants associated specific stimuli with the identities of hidden items, but these stimuli were not linked to decision-making values (i.e., no image was considered superior to another). The current study's paradigm is more akin to that of Bao et al., which the authors mention in the context of RSA analysis. Indeed, Bao et al. controlled the length of the bars specifically to address the problem highlighted here. Regrettably, in the current paradigm, this conflation remains inseparable.

      We’d like to thank the reviewer for facilitating the discussion on the question of ‘social space’ vs. ‘sensory space’. The task in the scanner did not require value-based decision making. It is akin to both the Bao et al. (2019) study and the Constantinescu et al. (2016) study in the sense that all three tasks ask participants to imagine moving along a trajectory in an abstract, non-physical space, and the trajectory is grounded in sensory cues. Participants were trained to associate the sensory cues with abstract (social/nonsocial) concepts. We think that the paradigm is a relatively faithful replication of the study by Constantinescu et al. Nonetheless, we agree that a design similar to Bao et al. (2019), which controls for sensory confounds, or a value-based decision-making task in the scanner similar to that of Park et al. (2021), would be better suited to address this concern, and we have included this limitation in the discussion section.

      (3) While the authors have responded to comments in the public review, my concerns noted in the Recommendation section remain unaddressed. As indicated in my recommendations, there are aspects of the authors' methodology and results that I find difficult to comprehend. Resolving these issues is imperative to facilitate an appropriate review in subsequent stages.

      Considering that the issues raised in the previous comments remain unresolved, I have retained my earlier comments below for review.

      We apologize for not addressing the recommendations properly. Please find our detailed responses and plans for revision below.

      I have some comments. I hope that these can help.

      (1) While the explanation of Fig.4A-C is lacking in both the main text and figure legend, I am not sure if I understand this finding correctly. Did the authors find the effects of hexagonal modulation in the medial temporal gyrus and lingual gyrus correlate with the individual differences in the extent to which their reaction times were associated with the distances between faces when choosing a better collaborator? If so, I am not sure what argument the authors try to draw from these findings. Do the authors argue that these brain areas show hexagonal modulation, which was not supported in the previous analysis (Fig.3)? What is the level of correlation between these behavioral measures and the grid consistency effects in the vmPFC and EC, where the authors found actual grid-like activity? How do the authors interpret this finding? More importantly, how does this finding associate with other findings and the argument of the study?

      We apologize for not being clear enough in the manuscript and we will improve the clarity in our revision. The exploratory analysis reported in Figure 4 uses whole-brain analysis to examine: 1) whether there is any correlation between the strength of the grid-like representation of the social value map and behavioral indicators of map-like representation; and 2) whether there is any correlation between the strength of the grid-like representation of this social value map and participants’ social traits.

      To be more specific, for the behavioral indicator, we used the distance effect in the reaction time of the comparison task outside the scanner. We interpreted a stronger distance effect as a behavioral index of a better internal map-like representation. We interpreted a stronger grid consistency effect as a neural index of a better representation of the 2D social space. Therefore, we wanted to see whether a correlation exists between the behavioral and neural indices of map-like representation.

      To achieve this goal, behavioral indicators were entered as covariates in the second-level analysis of the GLM testing the grid consistency effect (GLM2). Figure 3 shows results from GLM2 without the covariates. Figure 4 shows clusters whose neural indices of map-like representation covaried with the behavioral indices and survived multiple-comparison correction. Indeed, in these regions, the grid consistency effect was not significant at the group level (so they are not shown in Figure 3). We tried to interpret this finding in our discussion (line 374-289 for temporal lobe correlation, line 395-404 for precuneus correlation).
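
      One way to picture the behavioral covariate is as a per-participant regression slope. The sketch below is an assumption about how such an index could be computed, not the authors' pipeline: it takes the distance effect to be the slope of reaction time on pairwise distance and mean-centers the slopes before they would be passed to a second-level model. The names `distance_effect` and `mean_centered`, and the synthetic reaction times, are hypothetical.

```python
import numpy as np

def distance_effect(reaction_times, distances):
    """Slope of reaction time regressed on pairwise distance for one participant.

    A more negative slope means faster responses for more distant pairs,
    i.e. a stronger distance effect.
    """
    slope, _intercept = np.polyfit(
        np.asarray(distances, dtype=float),
        np.asarray(reaction_times, dtype=float),
        deg=1,
    )
    return slope

def mean_centered(values):
    """Mean-center a covariate so it does not alter the group-mean estimate."""
    values = np.asarray(values, dtype=float)
    return values - values.mean()

# Synthetic, noise-free reaction times (seconds) for three made-up participants.
distances = np.array([1.0, 2.0, 3.0, 4.0])
rts_by_participant = [
    2.0 - 0.10 * distances,
    2.0 - 0.30 * distances,
    2.0 - 0.20 * distances,
]

slopes = np.array([distance_effect(rt, distances) for rt in rts_by_participant])
covariate = mean_centered(slopes)
print("slopes:", slopes)
print("mean-centered covariate:", covariate)
```

      Mean-centering is a common convention for second-level covariates; whether it was applied in GLM2 is not stated in the response.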

      Finally, we would like to point out that including the covariates in GLM2 did not change the results in Figure 3; the clusters in Figure 3 still survive correction. Meanwhile, these clusters in Figure 3 did not show a correlation with behavioral indicators of map-like representation.

      Author response image 1.

      (2) There are no behavioral results provided. How accurately did participants perform each of the tasks? How are the effects of grid consistency associated with the level of accuracy in the map test?

      Why did participants perform the recall task again outside the scanner?

      We will endeavor to improve the signposting of the corresponding figures in the main text. For the behavioral results, we reported the statistics in the section “Participants construct social value map after associative learning of avatars and corresponding characteristics” in the main text, and the plots are shown in Figure 1. In particular, Figure 1F shows the accuracy of the tasks in training, as well as the recall task in the scanner. For the correlation, we did not find a significant correlation between behavioural accuracy and the grid consistency effect. We will make this clearer in the Results section.

      (3) The methods did not explain how the grid orientation was estimated and what the regressors were in GLM2. I don't think equations 2 and 3 are quite right.

      For the grid orientation estimation method, we provided a detailed description in Supplementary Methods 2.2.2. We will add links to this section in the main text.

      Equations 2 and 3 describe how the parametric regressors entered into GLM2 were formed and provide the prerequisites for calculating grid orientations. Equation 2 is the result of directly applying the angle addition and subtraction theorems, so it should be correct. We will try to make the rationale clearer in the supplementary text.
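
      For readers unfamiliar with the approach, the angle-addition step can be sketched as follows. This is a minimal illustration of the standard hexadirectional analysis rather than the authors' code, and Equations 2 and 3 themselves are not reproduced in this response. It assumes the GLM regressors are cos(6θ) and sin(6θ) of the trajectory angle θ, and that the grid orientation φ is recovered from their weights as atan2(β_sin, β_cos)/6, which follows from cos(6(θ − φ)) = cos(6φ)cos(6θ) + sin(6φ)sin(6θ). The function names and the synthetic signal are hypothetical.

```python
import numpy as np

def hexadirectional_regressors(theta):
    """cos(6θ) and sin(6θ) parametric regressors for trajectory angles θ (radians)."""
    theta = np.asarray(theta, dtype=float)
    return np.cos(6 * theta), np.sin(6 * theta)

def estimate_grid_orientation(beta_cos, beta_sin):
    """Grid orientation φ in [0, π/3) from the weights of the two regressors."""
    return (np.arctan2(beta_sin, beta_cos) / 6) % (np.pi / 3)

def alignment_regressor(theta, phi):
    """cos(6(θ - φ)): +1 for trajectories aligned with the grid, -1 for misaligned ones."""
    return np.cos(6 * (np.asarray(theta, dtype=float) - phi))

# Synthetic check: simulate a six-fold modulated signal with a known orientation.
rng = np.random.default_rng(0)
true_phi = np.deg2rad(20.0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=500)
signal = 1.5 * np.cos(6 * (theta - true_phi)) + rng.normal(0.0, 0.1, size=theta.size)

# Fit the two regressors plus an intercept by ordinary least squares.
cos6, sin6 = hexadirectional_regressors(theta)
design = np.column_stack([cos6, sin6, np.ones_like(theta)])
beta_cos, beta_sin, _ = np.linalg.lstsq(design, signal, rcond=None)[0]

phi_hat = estimate_grid_orientation(beta_cos, beta_sin)
print("estimated orientation (deg):", np.rad2deg(phi_hat))
```

      In a real analysis the orientation would be estimated on one part of the data and `alignment_regressor` applied to a held-out part, so that the test of six-fold modulation is independent of the estimate.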

      (4) With the increase in navigation distances, more grid cells would activate. Therefore, in theory, the activity in the entorhinal cortex should increase with the Euclidean distances, which has not been found here. I wonder if there was enough variability in the Euclidean distances that can be captured by neural correlates. This would require including the distributions of Euclidean distances according to their trajectory angles. Regarding how Fig.1E is generated, I don't understand what this heat map indicates. Additionally, it needs to be confirmed if the grid effects remain while controlling for the Euclidean distances of navigation trajectories.

      We did not specifically control for the trajectory length; we only controlled for the distribution of trajectory directions to be uniform. We have included a figure of the distribution of Euclidean distances in Figure S9 and the distribution of trajectory directions in Figure S8.

      Author response image 2.

      As for Figure 1E, we aimed to reproduce the findings from Figure 1F in Constantinescu et al. (2016), where they showed that participants progressively refined the locations of the outcomes through training. We divided the space into 15×15 subregions, computed the amount of time spent in each subregion, and plotted the result as Figure 1E. A brighter color in Figure 1E indicates a greater amount of time spent in the corresponding subregion. Note that all these timing indices were computed as a percentage of the total time spent in the explore task in a given session. If participants were well-acquainted with the space and avatars, they would spend more time at the avatars (brighter color at avatar locations) in the review session compared to the learning session.
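
      The heat-map computation described above can be sketched as a time-weighted 2D histogram. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes positions are sampled with a known dwell time per sample and that both trait axes are scaled to the range 0 to 1. The function name `occupancy_map` and the toy samples are hypothetical.

```python
import numpy as np

def occupancy_map(x, y, dwell_time, n_bins=15, extent=(0.0, 1.0)):
    """Fraction of total exploration time spent in each of n_bins x n_bins subregions.

    Samples falling outside `extent` are ignored by the histogram, so the map
    sums to 1 only when every sample lies inside the space.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dwell_time = np.asarray(dwell_time, dtype=float)
    edges = np.linspace(extent[0], extent[1], n_bins + 1)
    time_per_bin, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=dwell_time)
    return time_per_bin / dwell_time.sum()

# Toy session: two samples near one corner, one near the opposite corner, one in the middle.
x = np.array([0.05, 0.05, 0.95, 0.50])
y = np.array([0.05, 0.05, 0.95, 0.50])
dwell_time = np.array([1.0, 1.0, 1.0, 1.0])

occ = occupancy_map(x, y, dwell_time)
print("map shape:", occ.shape)
print("fraction of time in the first corner subregion:", occ[0, 0])
```

      Computing one such map per session and comparing the values at the avatar locations would correspond to the learning-versus-review comparison described in the response.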

      As for the effect of distances on grid-like representation, we did not include the distance as a parametric modulator in the grid consistency effect GLM (GLM2) due to insufficient trials in each bin (6-8 trials). However, there is indirect evidence that could potentially rule out this confound. In the distance representation analysis, we did not find distance representation in any of the clusters that have significant grid-like representation (regions in Figure 2).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid. From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Weaknesses:

      In the revised manuscript, the authors soften their claims about finding a grid code in the entorhinal cortex and provide additional caveats about limitations in their findings. It seems that the authors and reviewers are in agreement about the following weaknesses, which were part of my original review: Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      In the authors' response to reviews, they provide additional clarification about their exploratory analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. My guess is that readers would find it useful if some of this language were included in the main text, especially with regard to an explanation regarding the rationale for these exploratory studies.

      Thank you very much again for your careful re-evaluation and suggestions. We have tried to improve our writing and incorporate the suggestions in the new revision.

      Reviewer #3 (Public Review):

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes, and is relatively well powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably Park et al., 2021, Nature Neuroscience.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      These data do not specifically add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that, when the space is not a uniform, square/circular, 2-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. In real-world scenarios of social space (as well as navigation, semantic concepts), it must be higher dimensional - or at least more than two dimensional. It is unclear if this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown/argued this to be the case. In experimental work, like in mazes from the Mosers' labs (e.g., Derdikman et al., 2009), or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, and they would not pass as grid cells in terms of the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raise the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much again for your careful re-evaluation and comments. We have tried to incorporate some of the suggested papers into our discussion. In summary, we agree that there is more to six-fold symmetric code that can be utilized to represent “conceptual space”. We think that the next step for a stronger claim would be to find the representation of more spontaneous non-spatial maps.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Although this manuscript contains a potentially interesting piece of work that delineates a mechanism of IQCH that associates with spermatogenesis, this reviewer feels that a number of issues require clarification and re-evaluation for a better understanding of the role of IQCH in spermatogenesis. With the shortage of logics and supporting data, causal relationships are still not clear among IQCH, CaM, and HNRPAB. The most serious point in this manuscript could be that the authors try to generalize their interpretations with too simplified model from limited pieces of their data. The way the data and the logic are presented needs to be largely revised, and several interpretations should be supported by direct evidence.

      Response: Thank you for the reviewer’s comment. IQCH is a calmodulin-binding protein, and the binding of IQCH and CaM was confirmed by LC-MS/MS analysis and co-IP assay using sperm lysate. We thus speculated that the interaction of IQCH and CaM might be a prerequisite for IQCH function. To test this speculation, we took HNRPAB as an example. When we knocked down IQCH in cultured cells, a decrease in the expression of HNRPAB was observed. Similarly, when we knocked down CaM in cultured cells, a decrease in the expression of HNRPAB was also detected. However, these results cannot exclude the possibility that IQCH or CaM could regulate HNRPAB expression alone. To investigate this, we overexpressed IQCH in CaM-knockdown cells; the expression of HNRPAB was not rescued, suggesting that IQCH cannot regulate HNRPAB expression when CaM is reduced. Consistently, when we overexpressed CaM in IQCH-knockdown cells, the expression of HNRPAB was not rescued, suggesting that CaM cannot regulate HNRPAB expression when IQCH is reduced. Thus, neither IQCH nor CaM can regulate HNRPAB expression alone. Moreover, we deleted the IQ motif of IQCH, which is required for binding to CaM. The co-IP results showed that the interaction of IQCH and CaM was disrupted when the IQ motif of IQCH was deleted, and the expression of HNRPAB was decreased. Therefore, we suggest that the interaction of IQCH and CaM might be required for IQCH to regulate HNRPAB. In future studies, we will further investigate the relationships among IQCH, CaM, and HNRPAB.

      Reviewer #3 (Public Review):

      (1) More background details are needed regarding the proteins involved, in particular IQ proteins and calmodulin. The authors state that IQ proteins are not well-represented in the literature, but do not state how many IQ proteins are encoded in the genome. They also do not provide specifics regarding which calmodulins are involved, since there are at least 5 family members in mice and humans. This information could help provide more granular details about the mechanism to the reader and help place the findings in context.

      Response: We thank the reviewer for the suggestion. We have provided additional background information regarding IQ-containing protein family members in humans and mice, as well as other IQ-containing proteins implicated in male fertility, in the Introduction section. Furthermore, we have supplemented the Introduction with background information concerning the association between CaM and male infertility.

      (2) The mouse fertility tests could be improved with more depth and rigor. There was no data regarding copulatory plug rate; data was unclear regarding how many WT females were used for the male breeding tests and how many litters were generated; the general methodology used for the breeding tests in the Methods section was not very explicitly or clearly described; the sample size of n=3 for the male breeding tests is rather small for that type of assay; and, given that IQCH appears to be expressed in testicular interstitial cells (Fig. S10) and somewhat in other organs (Fig. S2), another important parameter of male fertility that should be addressed is reproductive hormone levels (e.g., LH, FSH, and testosterone). While normal epididymal size in Fig. S3 suggests that hormone (testosterone) levels are normal, epididymal size and/or weight were not rigorously quantified.

      Response: We thank the reviewer for the comment. We have provided the data regarding copulatory plug rate and the average number of litters for breeding tests in revised Figure 3—figure supplement 2. The methodology used for the breeding tests has been revised to be more detailed and explicit in the revised Methods section. Moreover, we have increased the sample size for male breeding tests to n=6. We measured the serum levels of FSH, LH, and testosterone in the WT (9.3±1.9 ng/ml, 0.93±0.15 ng/ml, and 0.2±0.03 ng/ml) and Iqch KO mice (12±2 ng/ml, 1.17±0.2 ng/ml, and 0.2±0.04 ng/ml). There was no significant difference observed in the serum levels of reproductive hormones between WT and Iqch KO mice; therefore, we did not include the data in the study. Furthermore, we have added quantitative data on epididymal size in the revised Figure 3—figure supplement 2.

      (3) The Western blots in Figure 6 should be rigorously quantified from multiple independent experiments so that there is stronger evidence supporting claims based on those assays.

      Response: We appreciate the reviewer's comment. As suggested, we have added quantified data in Figure 6—figure supplement 2 from the results of Western blotting in Figure 6.

      (4) Some of the mouse testis images could be improved. For example, the PNA and PLCz images in Figure S7 are difficult to interpret in that the tubules do not appear to be stage-matched, and since the authors claimed that testicular histology is unaffected in knockout testes, it should be feasible to stage-match control and knockout samples. Also, the anti-IQCH and CaM immunofluorescence in Figure S10 would benefit from some cell-type-specific co-stains to more rigorously define their expression patterns, and they should also be stage-matched.

      Response: We thank the reviewer for the suggestions. We have included immunofluorescence images of PLCz, PNA, IQCH, and CaM staining during spermatogenesis.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) There are multiple grammatical errors and statements drawn beyond the results. The entire manuscript would benefit from professional editing.

      Response: We are sorry for the grammatical errors. We have enlisted professional editing services to refine our manuscript.

      (2) Line 40, "Firstly" is not appropriate here.

      Response: We thank the reviewer for the comment. The word "Firstly" has been removed from the revised manuscript.

      (3) Line 44, "processes".

      Response: We thank the reviewer for the suggestion. We have changed “process” to “processes” on line 45.

      (4) "spermatocytogenesis (mitosis)" is incorrect.

      Response: We thank the reviewer for the comment. We have changed “spermatocytogenesis (mitosis)” to “mitosis” on line 47.

      (5) Ca and Ca2+ are both used in line 67 - 77. Be consistent.

      Response: We appreciate the reviewer's detailed checks. We have maintained consistency by revising instances of "Ca" to "Ca2+" in the revised manuscript.

      (6) Line 238 to 240, "To elucidate the molecular mechanism by which IQCH regulates male fertility, we performed liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis using mouse sperm lysates and detected 288 interactors of IQCH (Data S1)." It is not clear how LC-MS/MS using mouse sperm lysates could detect "288 interactors of IQCH". A co-IP experiment for IQCH using sperm lysates prior to LC-MS/MS is needed to detect "interactors of IQCH". However, in the Methods section, consistent with the main text, proteomic quantification was conducted for protein extract from sperm. The figure legend for Fig. 5 did not explain this, either. Thus, it is not possible to evaluate Figure 5.

      Response: We sincerely apologize for the oversight. Following the reviewer’s suggestions, we have supplemented the method details of the LC-MS/MS experiment in the Methods section of the revised manuscript. Additionally, we conducted a co-IP experiment for IQCH using sperm lysates prior to LC-MS/MS, but we did not include the corresponding figure in the manuscript. The results are as follows:

      Author response image 1.

      The results of a co-IP experiment for IQCH using sperm lysates from WT mice.

      (7) Line 246, "... key proteins that might be activated by IQCH". What does "activated" here refer to? Should it be "upregulated"?

      Response: We are sorry for our inexact statement. "Upregulated" would better convey the intended meaning. Following the reviewer’s suggestion, we have changed "activated" to "upregulated".

      (8) Line 252 to 254, "the cross-analysis revealed that 76 proteins were shared between the IQCH-bound proteins and the IQCH-activated proteins (Fig. 5E), implicating this subset of genes as direct targets." This is a confusing statement. Is the author trying to say, IQCH-bound proteins have upregulated expression, suggesting that IQCH enhances their expression?

      Response: We appreciate the reviewer's comment regarding the clarity of the statement in Line 252 to 254 of the manuscript. We have modified this sentence into “Importantly, cross-analysis revealed that 76 proteins were shared between the IQCH-bound proteins and the downregulated proteins in Iqch KO mice (Figure 5E), suggesting that IQCH might regulate their expression by the interaction.”

      (9) Line 260 to 261, "SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB ... the loss of which showed the greatest influence on the phenotype of the Iqch KO mice." There is no evidence suggesting that the loss of SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB leads to Iqch KO phenotype.

      Response: We apologize for our inaccurate statement. According to the literature, Fus KO, Ewsr1 KO, and Hnrnpk KO male mice were infertile, showing spermatogenic arrest with an absence of spermatozoa (Kuroda et al. 2000; Tian et al. 2021; Xu et al. 2022). Syncrip is involved in the meiotic process in Drosophila by interacting with Doublefault (Sechi et al. 2019). HNRPAB might be associated with mouse spermatogenesis by binding to Protamine 2 and contributing to its translational regulation. Specifically, ANXA7 is a calcium-dependent phospholipid-binding protein that is a negative regulator of mitochondrial apoptosis (Du et al. 2015). Loss of SLC25A4 results in mitochondrial energy metabolism defects in mice (Graham et al. 1997). Moreover, RNA immunoprecipitation on formaldehyde cross-linked sperm followed by qPCR detected the interactions between HNRPAB and Catsper1, Catsper2, Catsper3, Ccdc40, Ccdc39, Ccdc65, Dnah8, Irrc6, and Dnhd1, which are essential for sperm development (Fukuda et al. 2013). Our Iqch KO mice showed abnormal sperm count, motility, morphology, and mitochondria, so we inferred that IQCH might play a role in spermatogenesis by regulating, to some extent, the expression of SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB. We have replaced the original sentence with a more appropriate statement: “We focused on SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB, which play important roles in spermatogenesis.”

      (10) Fig. 6C and 6D use different styles of error bars.

      Response: We are sorry for our oversight. In accordance with the reviewer's recommendations, we have modified the representation of error bars in the revised Fig. 6C.

      (11) Line 296 to 297, "As expected, CaM interacted with IQCH, as indicated by LC-MS/MS analysis". It is not clear how LC-MS/MS detects protein interaction.

      Response: Following the reviewer’s suggestions, we have supplemented the method details of the LC-MS/MS experiment in the Methods section of the revised manuscript. The results of proteins interacting with IQCH in sperm lysates from the LC-MS/MS analysis were submitted as Figure 5—source data 1.

      (12) It is still not clear how the interaction between IQCH, CaM, and HNRPAB is required for the expression of each other.

      Response: We thank the reviewer for the comment. IQCH is a calmodulin-binding protein, and the binding of IQCH to CaM was confirmed by LC-MS/MS analysis and a co-IP assay using sperm lysate. We therefore speculated that the interaction between IQCH and CaM might be a prerequisite for IQCH function. To test this speculation, we took HNRPAB as an example. When we knocked down IQCH in cultured cells, a decrease in the expression of HNRPAB was observed. Similarly, when we knocked down CaM in cultured cells, a decrease in the expression of HNRPAB was also detected. However, these results cannot exclude the possibility that IQCH or CaM could regulate HNRPAB expression alone. To investigate this, we overexpressed IQCH in CaM-knockdown cells; the expression of HNRPAB was not rescued, suggesting that IQCH cannot regulate HNRPAB expression when CaM is reduced. Consistently, we overexpressed CaM in IQCH-knockdown cells; the expression of HNRPAB was again not rescued, suggesting that CaM cannot regulate HNRPAB expression when IQCH is reduced. Thus, neither IQCH nor CaM can regulate HNRPAB expression alone. Moreover, we deleted the IQ motif of IQCH, which is required for binding to CaM. The co-IP results showed that the interaction between IQCH and CaM was disrupted when the IQ motif of IQCH was deleted, and the expression of HNRPAB was decreased. We therefore suggest that the interaction between IQCH and CaM might be required for IQCH to regulate HNRPAB. In future studies, we will further investigate the relationships among IQCH, CaM, and HNRPAB.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed my minor concerns. However, they neglected to address any of my more significant concerns in the public review. I assume that they simply overlooked these critiques, despite the fact that eLife explicitly states that "...as a general rule, concerns about a claim not being justified by the data should be explained in the public review." Therefore, the authors should have looked more carefully at the public reviews. As a result, my major concerns about the manuscript remain.

      Response: We apologize for overlooking the public review process. We have improved our study based on the feedback received during the public review.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our understanding of why diabetes is a risk factor for more severe Covid-19 disease. The authors offer solid evidence that cathepsin L is more active in diabetic individuals, that this higher activity is recapitulated at the cellular level in the presence of high glucose, and that high glucose leads to higher cathepsin L maturation. While not all aspects of the relationship between diabetes and cathepsin L (e.g., effects of metabolic acidosis) have been investigated, the work should be of interest to researchers in diabetes, virology, and immunology.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by He et al. investigates the increased susceptibility of diabetes patients to COVID-19. The paper raises the possibility that hyperglycemia-induced cathepsin L maturation could be one of the driving forces in this pathology, suggesting that an increased activity of CTSL leads to accelerated virus infection rates due to an elevated processing of the SARS-CoV-2 spike protein.

      In a clinical case-control study, the team found that the severity of corona infections was higher in diabetic patients, and their CTSL levels correlated well with the progression of the disease. They further showed an increase in CTSL activity in long-term as well as acute hyperglycemia. SARS-CoV-2 increasingly infected cells that were cultured in serum from diabetic patients; the same was observed using high glucose medium. No effect was observed in the medium with increased concentrations of insulin. CTSL knockout abolished the glucose-dependent increase in infection.

      Increased glucose levels did not correlate with an increase in CTSL transcription. Rather He et al. could show that high glucose levels led to CTSL translocation from the ER into the lysosome. It was the glucose-dependent processing of the protease to its active form which promoted infection.

      Strengths:

      It is a complete study starting from a clinical observation and ending on the molecular mechanism. A strength is certainly the wide selection of experiments. The clinical study to investigate the effect of glucose on CTSL concentrations in healthy individuals sets the stage for experiments in cell culture, animal models, and human tissue. The effect of CTSL knockout cell lines on glucose-induced SARS-CoV-2 infection rates is convincing. Finally, the team used a combination of Western blots and confocal microscopy to identify the underlying molecular mechanisms. The authors manage to keep the diabetic condition at the center of their study and therefore extend previous knowledge of glucose-induced CTSL activation and its consequences for COVID-19 infections. By doing so, they create a novel connection between CTSL involvement in SARS-CoV-2 infections and diabetes.

      Weaknesses:

      (1) The authors suggest that hyperglycemia as a symptom of diabetes leads to an increased infection rate in those patients. Throughout their study, the team focuses on two select symptoms of a diabetic condition, hyperglycemia and hyperinsulinemia. The team acknowledges in the discussion that there could be various other reasons. Hyperglycemia can lead to metabolic acidosis and a shift in blood pH. As CTSL activity is highly dependent on pH, it would have been crucial to include this parameter in the study.

      We sincerely appreciate your valuable comment. We agree that hyperglycemia can lead to metabolic acidosis and alter blood pH. However, the normal range for blood pH in humans is relatively narrow, typically ranging from 7.35 to 7.45. In our study, we ensured that blood pH remained within this normal range for both diabetic and healthy control samples. To address your concern, we conducted experiments to investigate CTSL activity in response to pH fluctuations within this physiological range. The updated Fig. 4a now presents these findings, demonstrating consistent CTSL activity despite pH variations. Statistical analysis was performed using one-way ANOVA with Tukey’s post hoc test to ensure robustness. We have also amended the figure legend and provided corresponding descriptions in the final edition manuscript (line 15-18, page 7).

      Author response image 1.

      (2) The study rarely differentiates between cellular and extracellular CTSL activity. A more detailed explanation for the connection between the intracellular CTSL and serum CTSL in diabetic individuals, presumably via lysosomal exocytosis, could be helpful with regard to the final model to give a more complete picture.

      Thank you for your insightful comments. Previous studies have elucidated the process by which lysosomal CTSL is transported via vesicles and subsequently secreted from the cell membrane through exocytosis (references 1-5). To provide a more comprehensive understanding, we have incorporated this information in Fig. 6h, page 32 of the final edition manuscript. This addition aims to enhance clarity regarding the connection between intracellular and serum CTSL activity in diabetic individuals, particularly through lysosomal exocytosis.

      Author response image 2.

      References:

      (1) Reddy A et al. Plasma membrane repair is mediated by Ca(2+)-regulated exocytosis of lysosomes. Cell. 2001 Jul 27;106(2):157-69. doi: 10.1016/s0092-8674(01)00421-4. PMID: 11511344.

      (2) Hasanagic M et al. Different Pathways to the Lysosome: Sorting out Alternatives. Int Rev Cell Mol Biol. 2015;320:75-101. doi: 10.1016/bs.ircmb.2015.07.008. Epub 2015 Aug 19. PMID: 26614872.

      (3) Reiser J et al. Specialized roles for cysteine cathepsins in health and disease. J Clin Invest. 2010 Oct;120(10):3421-31. doi: 10.1172/JCI42918. Epub 2010 Oct 1. PMID: 20921628; PMCID: PMC2947230.

      (4) Jaiswal JK et al. Membrane proximal lysosomes are the major vesicles responsible for calcium-dependent exocytosis in nonsecretory cells. J Cell Biol. 2002 Nov 25;159(4):625-35. doi: 10.1083/jcb.200208154. Epub 2002 Nov 18. PMID: 12438417; PMCID: PMC2173094.

      (5) Coutinho MF et al. Mannose-6-phosphate pathway: a review on its role in lysosomal function and dysfunction. Mol Genet Metab. 2012 Apr;105(4):542-50. doi: 10.1016/j.ymgme.2011.12.012. Epub 2011 Dec 23. PMID: 22266136.

      (3) In the early result section, an effect of hyperglycemia on total CTSL concentrations is described, but the data is not very convincing. Over the course of the manuscript, the hypothesis shifts increasingly towards an increase in protease trans-localization and processing to the active form rather than a change in total protease amounts. The overall importance of CTSL concentrations remains questionable.

      Thank you for your insightful feedback. We have addressed your concerns regarding the impact of hyperglycemia on CTSL concentrations. Fig. 2h-j illustrate the effect of acute hyperglycemia on both CTSL concentration and activity in 15 healthy male volunteers over a 160-minute period. During this short timeframe, CTSL concentration remained stable, consistent with the RNA results from cells exposed to varying glucose levels (Supplementary Fig. 1). However, there was a significant increase in CTSL activity, indicating that glucose elevation rapidly triggers CTSL maturation through propeptide cleavage. This activation process occurs more rapidly than CTSL protein synthesis. In summary, acute hyperglycemia specifically elevates CTSL activity, while chronic hyperglycemia may impact both CTSL activity and concentration (Fig. 2a-d). Additionally, Tournu C, et al. (1998) (reference 1) and Shi Q, et al. (2022) (reference 2) have reported that increased glucose metabolism promotes the maturation and secretion of CTSL and other proteases. These findings align with our evidence that hyperglycemia drives CTSL maturation, as discussed at lines 10-25, page 12 in the final edition manuscript.

      References:

      (1) Tournu C et al. Glucose controls cathepsin expression in Ras-transformed fibroblasts. Arch Biochem Biophys. 1998 Dec 1;360(1):15-24. doi: 10.1006/abbi.1998.0916. PMID: 9826424.

      (2) Shi Q et al. Increased glucose metabolism in TAMs fuels O-GlcNAcylation of lysosomal Cathepsin B to promote cancer metastasis and chemoresistance. Cancer Cell. 2022 Oct 10;40(10):1207-1222.e10. doi: 10.1016/j.ccell.2022.08.012. Epub 2022 Sep 8. PMID: 36084651.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors hypothesized that individuals with diabetes have elevated blood CTSL levels, which facilitates SARS-CoV-2 infection. The authors conducted in vitro experiments, revealing that elevated glucose levels promote SARS-CoV-2 infection in wild-type cells. In contrast, CTSL knockout cells show reduced susceptibility to high glucose-promoted effects. Additionally, the authors utilized lung tissue samples obtained from both diabetic and non-diabetic patients, along with db/db diabetic and control mice. Their findings indicate that diabetic conditions lead to an elevation in CTSL activity in both humans and mice.

      Strengths:

      The authors have effectively met their research objectives, and their conclusions are supported by the data presented. Their findings suggest that high glucose levels promote CTSL maturation and translocation from the endoplasmic reticulum to the lysosome, potentially contributing to diabetic comorbidities and complications.

      Weaknesses:

      (1) In Figure 1e, the authors measured plasma levels of COVID-19 related proteins, including ACE2, CTSL, and CTSB, in both diabetic and non-diabetic COVID-19 patients. Notably, only CTSL levels exhibited a significant increase in diabetic patients compared to non-diabetic patients, and these levels varied throughout the course of COVID-19. Given that the diabetes groups encompass both male and female patients, it is essential to ascertain whether the authors considered the potential impact of gender on CTSL levels. The diabetes groups comprised a higher percentage of male patients (61.3%) compared to the non-diabetes group, where males constituted only 38.7%.

      Thank you for your insightful feedback. In response to your concerns regarding the potential impact of gender on CTSL levels in diabetic and non-diabetic COVID-19 patients, we conducted analyses to address this issue. Our initial study involved 62 COVID-19 patients, 31 with diabetes and 31 without, matched for gender and age; however, we acknowledge that obtaining a balanced gender distribution in both groups was challenging because of the difficulty of collecting blood samples from COVID-19 patients. To mitigate potential gender bias resulting from small sample sizes, we conducted a supplementary clinical study involving 122 non-COVID-19 volunteers, including 61 individuals with diabetes and 61 without. The percentage of males in the diabetes group was 50.8%, while in the healthy group, males constituted 44.3% (P value = 0.468), indicating no significant gender bias. We have incorporated this information into the discussion section on lines 4-13, page 11 in the final edition manuscript, to provide clarity on this aspect of our study.
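
The gender comparison in the supplementary cohort can be checked with a short calculation. The sketch below is illustrative only: the response does not say which test produced the reported P value, so an uncorrected Pearson chi-square test on a 2x2 table is assumed, and the male counts (31 of 61 and 27 of 61) are back-calculated from the reported percentages rather than taken from the source. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Assumed counts, inferred from the reported percentages (not stated in the source):
# diabetes group: 31 males, 30 females (31/61 = 50.8%)
# healthy group:  27 males, 34 females (27/61 = 44.3%)
chi2, p_value = chi_square_2x2(31, 30, 27, 34)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the P value comes out close to the reported 0.468, which suggests, but does not confirm, that an uncorrected chi-square test was used.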

      (2) Lines 145-149: "The results showed that WT Huh7 cell cultured in high glucose medium exhibited a much higher infective rate than those in low glucose medium. However, CTSL KO Huh7 cells maintained a low infective rate of SARS-CoV-2 regardless of glucose or insulin levels (Fig. 3f-h). Therefore, hyperglycemia enhanced SARS-CoV-2 infection dependent on CTSL." However, this evidence may be insufficient to support the claim that hyperglycemia enhances SARS-CoV-2 infection dependent on CTSL. The human hepatoma cell line Huh7 might not be an ideal model to validate the authors' hypothesis regarding high blood glucose promoting SARS-CoV-2 infection through CTSL.

      Thank you for your valuable feedback. We have addressed the concerns regarding the sufficiency of evidence supporting the claim that hyperglycemia enhances SARS-CoV-2 infection dependent on CTSL. Specifically, we have revised the expression to state, “Therefore, hyperglycemia enhanced SARS-CoV-2 infection through CTSL.” as suggested, in line 9, page 7 in the final edition manuscript. Additionally, we acknowledge the potential involvement of other bioactive factors, such as 1,5-anhydro-D-glucitol (1,5-AG), in mediating SARS-CoV-2 infection in patients with diabetes, as outlined in the discussion section from line 13-21, page 13 in the final edition manuscript.

      Regarding the choice of the human hepatoma cell line Huh7 as a model for investigating hyperglycemia-induced CTSL maturation and SARS-CoV-2 infection, we recognize the importance of tissue specificity and the liver’s significance as a target organ for COVID-19. Despite potential limitations, such as generalization of liver function abnormalities and lack of tissue specificity in SARS-CoV-2 impact, Huh7 cells offer practical advantages as a mature cell model for studying SARS-CoV-2 infection, including accessibility, susceptibility to infection, and stable proliferation (reference 1-3). We have elaborated on these considerations in the discussion section at line 19-23, page 11 in the final edition manuscript, to provide context for our choice of experimental model.

      References:

      (1) Gupta A et al. Extrapulmonary manifestations of COVID-19. Nat Med. 2020 Jul;26(7):1017-1032. doi: 10.1038/s41591-020-0968-3. Epub 2020 Jul 10. PMID: 32651579.

      (2) Nie X et al. Multi-organ proteomic landscape of COVID-19 autopsies. Cell. 2021 Feb 4;184(3):775-791.e14. doi: 10.1016/j.cell.2021.01.004. Epub 2021 Jan 9. PMID: 33503446; PMCID: PMC7794601.

      (3) Ciotti M et al. The COVID-19 pandemic. Crit Rev Clin Lab Sci. 2020 Sep;57(6):365-388. doi: 10.1080/10408363.2020.1783198. Epub 2020 Jul 9. PMID: 32645276.

      (3) The Abstract and Introduction sections lack effective organization.

      Thank you for your valuable comments. We have rewritten the Abstract and Introduction sections and incorporated the updated descriptions in the final edition manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) When referring to diabetes, does this exclusively include diabetes type 2?

      Thank you for your inquiry. In our study, the term “diabetes” encompasses the condition of hyperglycemia in a broad sense, rather than specifically indicating type 1 diabetes (T1DM) or type 2 diabetes (T2DM). This broader definition aligns with the scope of our research objectives and findings, particularly observed in the cell experiments conducted. We have clarified this point in the revised discussion section, from line 6-9, page 12 in the final edition manuscript, to provide additional context for readers.

      (2) The titles of the individual paragraphs are not very strong and descriptive. More precise titles help to structure the paper better for the reader.

      Thank you for your valuable comments. We have rewritten the title of each section to make it more precise for readers and incorporated the updated descriptions in the manuscript.

      (3) Fig.3c, adding a 0 nM insulin control would be nice.

      Thank you for your suggestion. We have revised Fig. 3c according to your advice. The revised figure is located on page 29 in the final edition manuscript. The corresponding figure legend has also been revised.

      Author response image 3.

      (4) Fig.3e non-infection control would be nice.

      Thank you for your suggestion. We have incorporated your feedback by adding a non-infection control in Fig. 3e. In this revised figure, we included a measurement of SARS-CoV-2 pseudovirus infection assessed through the fluorescence captured by a reader. Cells infected by the pseudovirus exhibited activation of the firefly luciferase, resulting in the release of fluorescence. Conversely, non-infected control cells showed no fluorescence, with the reader recording a value of zero. The updated figure can now be found on page 29 in the final edition manuscript, and we have adjusted the corresponding figure legend accordingly.

      Author response image 4.

      (5) In Figure 5, the processing of CTSL in cells (b-c) strongly differs from processing in tissue (d-e) focusing on amounts of dc-mCTSL. Do you have an explanation for this? Overall, blots are hard to judge by eye and it would be nice to include blots with shorter exposure.

      Thank you for your insightful feedback. The differences observed in the processing of CTSL between cells (Fig. 5b) and tissues (Fig. 5d-e) may be attributed to the complexities inherent in tissue samples, which can impact the clarity of the images. Furthermore, in human tissue samples, it is pertinent to consider that patients in the diabetes group had their blood glucose levels controlled within or near the normal range prior to lung surgery. As a result, the evidence supporting CTSL maturation in human lung tissue blotting images may be less compelling. We have addressed this aspect in the revised results section (lines 10-13, page 9). Additionally, we will consider including blots with shorter exposure to enhance visual clarity in future studies.

      (6) Considering Fig2B and Figure S1, the evidence of an effect of hyperglycemia or high glucose medium on total CTSL protein concentration is not very strong. In my opinion, this claim in the results section for Fig2 should be revisited.

      Thank you for your valuable suggestion. We have revisited the section in question and made appropriate revisions. The original sentence has been modified to accurately reflect the findings: "We found that plasma CTSL activity was strongly positively correlated with chronic hyperglycemia indicated by HbA1c and was significantly higher in diabetic patients than in euglycemic individuals (Fig. 2a, c). Additionally, plasma CTSL concentration showed a positive trend with chronic hyperglycemia indicated by HbA1c (Fig. 2b, d)". These changes have been incorporated into the revised results section (lines 12-16, page 5).

      (7) Overall, data hinting to increased CTSL activity is stronger than protein amount. This being said, in hyperglycemia, blood pH can be affected (metabolic acidosis). As CTSL has higher activity at low pH, could the increase in activity be caused by a drop in pH? Can you include this aspect in your manuscript? For example, is there a pH difference in serum of nondiabetic vs diabetic patients?

      Thank you for your valuable input. We have already addressed the potential impact of pH changes on CTSL activity in our response to Weakness No. 1. As indicated, although hyperglycemia can lead to metabolic acidosis and changes in blood pH, the pH levels observed in our study remained within the normal range (7.35 to 7.45). Therefore, we conducted experiments to investigate CTSL activity in response to changes in pH, which showed consistent activity levels within this range. This information has been included in our revised manuscript (line 15-18, page 7).

      Reviewer #2 (Recommendations For The Authors):

      (1) The Abstract and Introduction sections lack effective organization. The manuscript's style resembles that of Cell Journal rather than aligning with the customary format of eLife.

      Thank you for your valuable comments. The Abstract and Introduction sections have been reorganized to be more precise for readers, and the changes are included in our revised manuscript. Additionally, we have meticulously updated the manuscript's style to align with the standard format of eLife, especially the key resources table in the Materials and Methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The study was designed as a 6-month follow-up, with repeated behavioral and EEG measurements through disease development, providing valuable and interesting findings on AD progression and the effect of early-life choline supplementation. Moreover, the behavioral data that suggest an adverse effect of low choline in WT mice are interesting and important beyond the context of AD.

      Thank you for identifying several strengths.

      Weaknesses:

      (1) The multiple headings and subheadings, focusing on the experimental method rather than the narrative, reduce the readability.

      We have reduced the number of headings.

      (2) Quantification of NeuN and FosB in WT littermates is needed to demonstrate rescue of neuronal death and hyperexcitability by high choline supplementation and also to gain further insights into the adverse effect of low choline on the performance of WT mice in the behavioral test.

      We agree and have added WT data for the NeuN and ΔFosB analyses. These data are included in the text and figures. For NeuN, the Figure is Figure 6. For ΔFosB it is Figure 7. In brief, the high choline diet restored NeuN and ΔFosB to the levels of WT mice.

      Below is Figure 6 and its legend to show the revised presentation of data for NeuN. Afterwards is the revised figure showing data for ΔFosB. After that are the sections of the Results that have been revised.

      Author response image 1.

      Choline supplementation improved NeuN immunoreactivity (ir) in hilar cells in Tg2576 animals. A. Representative images of NeuN-ir staining in the anterior DG of Tg2576 animals. (1) A section from a Tg2576 mouse fed the low choline diet. The area surrounded by a box is expanded below. Red arrows point to NeuN-ir hilar cells. Mol=molecular layer, GCL=granule cell layer, HIL=hilus. Calibration for the top row, 100 µm; for the bottom row, 50 µm. (2) A section from a Tg2576 mouse fed the intermediate diet. Same calibrations as for 1. (3) A section from a Tg2576 mouse fed the high choline diet. Same calibrations as for 1. B. Quantification methods. Representative images demonstrate the thresholding criteria used to quantify NeuN-ir. (1) A NeuN-stained section. The area surrounded by the white box is expanded in the inset (arrow) to show 3 hilar cells. The 2 NeuN-ir cells above threshold are marked by blue arrows. The 1 NeuN-ir cell below threshold is marked by a green arrow. (2) After converting the image to grayscale, the cells above threshold were designated as red. The inset shows that the two cells that were marked by blue arrows are red while the cell below threshold is not. (3) An example of the threshold menu from ImageJ showing the way the threshold was set. Sliders (red circles) were used to move the threshold to the left or right of the histogram of intensity values. The final position of the slider (red arrow) was positioned at the onset of the steep rise of the histogram. C. NeuN-ir in Tg2576 and WT mice. Tg2576 mice had either the low, intermediate, or high choline diet in early life. WT mice were fed the standard diet (intermediate choline). (1) Tg2576 mice treated with the high choline diet had significantly more hilar NeuN-ir cells in the anterior DG compared to Tg2576 mice that had been fed the low choline or intermediate diet. 
The values for Tg2576 mice that received the high choline diet were not significantly different from WT mice, suggesting that the high choline diet restored NeuN-ir. (2) There was no effect of diet or genotype in the posterior DG, probably because the low choline and intermediate diet did not appear to lower hilar NeuN-ir.

      Author response image 2.

      Choline supplementation reduced ∆FosB expression in dorsal GCs of Tg2576 mice. A. Representative images of ∆FosB staining in GCL of Tg2576 animals from each treatment group. (1) A section from a low choline-treated mouse shows robust ∆FosB-ir in the GCL. Calibration, 100 µm. Sections from intermediate (2) and high choline (3)-treated mice. Same calibration as 1. B. Quantification methods. Representative images demonstrating the thresholding criteria established to quantify ∆FosB. (1) A ∆FosB-stained section shows strongly stained cells (white arrows). (2) A strict thresholding criterion was used to make only the darkest stained cells red. C. Use of the strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice treated with the choline supplemented diet had significantly less ∆FosB-ir compared to the Tg2576 mice fed the low or intermediate diets. Tg2576 mice fed the high choline diet were not significantly different from WT mice, suggesting a rescue of ∆FosB-ir. (2) There were no significant differences in ∆FosB-ir in posterior sections. D. Methods are shown using a threshold that was less strict. (1) Some of the stained cells that were included are not as dark as those used for the strict threshold (white arrows). (2) All cells above the less conservative threshold are shown in red. E. Use of the less strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice that were fed the high choline diet had fewer ΔFosB-ir pixels than the mice that were fed the other diets. There were no differences from WT mice, suggesting restoration of ∆FosB-ir by choline enrichment in early life. (2) Posterior DG. There were no significant differences between Tg2576 mice fed the 3 diets or WT mice.

      Results, Section C1, starting on Line 691:

      “To ask if the improvement in NeuN after MCS in Tg2576 restored NeuN to WT levels we used WT mice. For this analysis we used a one-way ANOVA with 4 groups: Low choline Tg2576, Intermediate Tg2576, High choline Tg2576, and Intermediate WT (Figure 5C). Tukey-Kramer multiple comparisons tests were used as the post hoc tests. The WT mice were fed the intermediate diet because it is the standard mouse chow, and this group was intended to reflect normal mice. The results showed a significant group difference for anterior DG (F(3,25)=9.20; p=0.0003; Figure 5C1) but not posterior DG (F(3,28)=0.867; p=0.450; Figure 5C2). Regarding the anterior DG, there were more NeuN-ir cells in high choline-treated mice than both low choline (p=0.046) and intermediate choline-treated Tg2576 mice (p=0.003). WT mice had more NeuN-ir cells than Tg2576 mice fed the low (p=0.011) or intermediate diet (p=0.003). Tg2576 mice that were fed the high choline diet were not significantly different from WT (p=0.827).”

      Results, Section C2, starting on Line 722:

      “There was strong expression of ∆FosB in Tg2576 GCs in mice fed the low choline diet (Figure 7A1). The high choline diet and intermediate diet appeared to show less GCL ΔFosB-ir (Figure 7A2-3). A two-way ANOVA was conducted with the experimental group (Tg2576 low choline diet, Tg2576 intermediate choline diet, Tg2576 high choline diet, WT intermediate choline diet) and location (anterior or posterior) as main factors. There was a significant effect of group (F(3,32)=13.80, p<0.0001) and location (F(1,32)=8.69, p=0.006). Tukey-Kramer post-hoc tests showed that Tg2576 mice fed the low choline diet had significantly greater ΔFosB-ir than Tg2576 mice fed the high choline diet (p=0.0005) and WT mice (p=0.0007). Tg2576 mice fed the low and intermediate diets were not significantly different (p=0.275). Tg2576 mice fed the high choline diet were not significantly different from WT (p>0.999). There were no differences between groups for the posterior DG (all p>0.05).”

      “∆FosB quantification was repeated with a lower threshold to define ∆FosB-ir GCs (see Methods) and results were the same (Figure 7D). Two-way ANOVA showed a significant effect of group (F(3,32)=14.28, p<0.0001) and location (F(1,32)=7.07, p=0.0122) for anterior DG but not posterior DG (Figure 7D). For anterior sections, Tukey-Kramer post hoc tests showed that low choline mice had greater ΔFosB-ir than high choline mice (p=0.0024) and WT mice (p=0.005) but not Tg2576 mice fed the intermediate diet (p=0.275; Figure 7D1). Mice fed the high choline diet were not significantly different from WT (p=0.993; Figure 7D1). These data suggest that high choline in the diet early in life can reduce neuronal activity of GCs in offspring later in life. In addition, low choline has an opposite effect, suggesting low choline in early life has adverse effects.”
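
      As an illustration of the one-way ANOVA described above for the four NeuN groups, the following is a minimal sketch of how the F statistic and its degrees of freedom are obtained. The function name `one_way_anova_f` and the cell counts are hypothetical and are not the study's data or analysis software; the p value and the Tukey-Kramer post hoc tests are not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times the squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical hilar NeuN-ir cell counts per section (illustration only).
counts = {
    "Tg2576 low choline": [12, 15, 11, 14],
    "Tg2576 intermediate": [13, 10, 12, 11],
    "Tg2576 high choline": [20, 22, 19, 23],
    "WT intermediate": [21, 24, 20, 22],
}
f_stat, df_b, df_w = one_way_anova_f(list(counts.values()))
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

      The degrees of freedom follow the same convention as the reported values: with 4 groups the first is always 3, and the second is the total number of animals minus 4.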

      (3) Quantification of the discrimination ratio of the novel object and novel location tests can facilitate the comparison between the different genotypes and diets.

      We have added the discrimination index for novel object location to the paper. The data are in a new figure: Figure 3. In brief, the results for discrimination index are the same as the results done originally, based on the analysis of percent of time exploring the novel object.

      Below is the new Figure and legend, followed by the new text in the Results.

      Author response image 3.

      Novel object location results based on the discrimination index. A. Results are shown for the 3 months-old WT and Tg2576 mice based on the discrimination index. (1) Mice fed the low choline diet showed object location memory only in WT. (2) Mice fed the intermediate diet showed object location memory only in WT. (3) Mice fed the high choline diet showed memory both for WT and Tg2576 mice. Therefore, the high choline diet improved memory in Tg2576 mice. B. The results for the 6 months-old mice are shown. (1-2) There was no significant memory demonstrated by mice that were fed either the low or intermediate choline diet. (3) Mice fed a diet enriched in choline showed memory whether they were WT or Tg2576 mice. Therefore, choline enrichment improved memory in all mice.

      Results, Section B1, starting on line 536:

      “The discrimination indices are shown in Figure 3 and results led to the same conclusions as the analyses in Figure 2. For the 3 months-old mice (Figure 3A), the low choline group did not show the ability to perform the task for WT or Tg2576 mice. Thus, a two-way ANOVA showed no effect of genotype (F(1,74)=0.027, p=0.870) or task phase (F(1,74)=1.41, p=0.239). For the intermediate diet-treated mice, there was no effect of genotype (F(1,50)=3.52, p=0.067) but there was an effect of task phase (F(1,50)=8.33, p=0.006). WT mice showed a greater discrimination index during testing relative to training (p=0.019) but Tg2576 mice did not (p=0.664). Therefore, Tg2576 mice fed the intermediate diet were impaired. In contrast, high choline-treated mice performed well. There was a main effect of task phase (F(1,68)=39.61, p<0.001) with WT (p<0.0001) and Tg2576 mice (p=0.0002) showing preference for the moved object in the test phase. Interestingly, there was a main effect of genotype (F(1,68)=4.50, p=0.038) because the discrimination index for WT training was significantly different from Tg2576 testing (p<0.0001) and Tg2576 training was significantly different from WT testing (p=0.0003).”

      “The discrimination indices of 6 months-old mice led to the same conclusions as the results in Figure 2. There was no evidence of discrimination in low choline-treated mice by two-way ANOVA (no effect of genotype, F(1,42)=3.25, p=0.079; no effect of task phase, F(1,42)=0.278, p=0.601). The same was true of mice fed the intermediate diet (genotype, F(1,12)=1.44, p=0.253; task phase, F(1,12)=2.64, p=0.130). However, both WT and Tg2576 mice performed well after being fed the high choline diet (effect of task phase, F(1,52)=58.75, p=0.0001, but not genotype, F(1,52)=1.197, p=0.279). Tukey-Kramer post-hoc tests showed that both WT (p<0.0001) and Tg2576 mice that had received the high choline diet (p=0.0005) had elevated discrimination indices for the test session.”
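
      To make the relationship between the two behavioral measures concrete, the sketch below computes both from exploration times. It assumes the commonly used definition of the discrimination index, (novel − familiar) / (novel + familiar); the response does not state the exact formula used in the paper, and the function names and times are hypothetical.

```python
def percent_novel(t_novel, t_familiar):
    """Percent of total exploration time spent on the novel (moved) object."""
    total = t_novel + t_familiar
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return 100.0 * t_novel / total


def discrimination_index(t_novel, t_familiar):
    """Assumed definition: (novel - familiar) / (novel + familiar), from -1 to 1."""
    total = t_novel + t_familiar
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return (t_novel - t_familiar) / total


# Hypothetical exploration times in seconds (illustration only).
print(percent_novel(30, 10))         # 75.0
print(discrimination_index(30, 10))  # 0.5
```

      Under this assumed definition the index is a linear rescaling of the percent measure (index = 2 × percent/100 − 1), which is consistent with the two analyses leading to the same conclusions.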

      (4) The longitudinal analyses enable the performance of multi-level correlations between the discrimination ratio in NOR and NOL, NeuN and Fos levels, multiple EEG parameters, and premature death. Such analysis can potentially identify biomarkers associated with AD progression. These can be interesting in different choline supplementation, but also in the standard choline diet.

      We agree and added correlations to the paper in a new figure (Figure 9). Below is Figure 9 and its legend. Afterwards is the new Results section.

      Author response image 4.

      Correlations between IIS, Behavior, and hilar NeuN-ir. A. IIS frequency over 24 hrs is plotted against the preference for the novel object in the test phase of NOL. A greater preference is reflected by a greater percentage of time exploring the novel object. (1) The mice fed the high choline diet (red) showed greater preference for the novel object when IIS were low. These data suggest IIS impaired object location memory in the high choline-treated mice. The low choline-treated mice had very weak preference and very few IIS, potentially explaining the lack of correlation in these mice. (2) There were no significant correlations for IIS and NOR. However, there were only 4 mice for the high choline group, which is a limitation. B. IIS frequency over 24 hrs is plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. (1) Hilar NeuN-ir is plotted against the preference for the novel object in the test phase of NOL. There were no significant correlations. (2) Hilar NeuN-ir was greater for mice that had better performance in NOR, both for the low choline (blue) and high choline (red) groups. These data support the idea that hilar cells contribute to object recognition (Kesner et al. 2015; Botterill et al. 2021; GoodSmith et al. 2022).

      Results, Section F, starting on Line 801:

      “F. Correlations between IIS and other measurements

      As shown in Figure 9A, IIS were correlated to behavioral performance in some conditions. For these correlations, only mice that were fed the low and high choline diets were included because mice that were fed the intermediate diet did not have sufficient EEG recordings in the same mouse where behavior was studied. IIS frequency over 24 hrs was plotted against the preference for the novel object in the test phase (Figure 9A). For NOL, IIS were significantly less frequent when behavior was the best, but only for the high choline-treated mice (Pearson’s r, p=0.022). In the low choline group, behavioral performance was poor regardless of IIS frequency (Pearson’s r, p=0.933; Figure 9A1). For NOR, there were no significant correlations (low choline, p=0.202; high choline, p=0.680) but few mice were tested in the high choline-treated mice (Figure 9B2).

      We also tested whether there were correlations between dorsal hilar NeuN-ir cell numbers and IIS frequency. In Figure 9B, IIS frequency over 24 hrs was plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. For NOL, there was no significant correlation (low choline, p=0.273; high choline, p=0.159; Figure 9B1). However, for NOR, there were more NeuN-ir hilar cells when the behavioral performance was strongest (low choline, p=0.024; high choline, p=0.016; Figure 9B2). These data support prior studies showing that hilar cells, especially mossy cells (the majority of hilar neurons), contribute to object recognition (Botterill et al. 2021; GoodSmith et al. 2022).”

      We also noted that it was not possible to include all mice because some died or for other reasons, such as a loss of the headset (Results, Section A, Lines 463-464): Some mice were not possible to include in all assays either because they died before reaching 6 months or for other reasons.
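
      For readers unfamiliar with the per-mouse correlations described above, the following is a minimal sketch of Pearson's r and the t statistic (with n − 2 degrees of freedom) that is conventionally used to test it against zero. The function names and the paired values are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: r = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical pairs: IIS per 24 hrs and percent time exploring the novel object.
iis_per_day = [1, 2, 3, 4, 5]
preference = [80, 75, 70, 72, 60]
r = pearson_r(iis_per_day, preference)
print(f"r = {r:.3f}, t = {t_statistic(r, len(iis_per_day)):.3f}")
```

      With small groups, such as the 4 mice noted for the high choline NOR comparison, the t statistic has very few degrees of freedom, so only very strong correlations can reach significance.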

      Reviewer #2 (Public Review):

      Strengths:

      The strength of the group was the ability to monitor the incidence of interictal spikes (IIS) over the course of 1.2-6 months in the Tg2576 Alzheimer's disease model, combined with meaningful behavioral and histological measures. The authors were able to demonstrate MCS had protective effects in Tg2576 mice, which was particularly convincing in the hippocampal novel object location task.

      We thank the Reviewer for identifying several strengths.

      Weaknesses:

      Although choline deficiency was associated with impaired learning and elevated FosB expression, consistent with increased hyperexcitability, IIS was reduced with both low and high choline diets. Although not necessarily a weakness, it complicates the interpretation and requires further evaluation.

      We agree and we revised the paper to address the evaluations that were suggested.

      Reviewer #1 (Recommendations For The Authors):

      (1) A reference directing to genotyping of Tg2576 mice is missing.

      We apologize for the oversight and added that the mice were genotyped by the New York University Mouse Genotyping core facility.

      Methods, Section A, Lines 210-211: “Genotypes were determined by the New York University Mouse Genotyping Core facility using a protocol to detect APP695.”

      (2) Which software was used to track the mice in the behavioral tests?

      We manually reviewed videos. This has been clarified in the revised manuscript. Methods, Section B4, Lines 268-270: Videos of the training and testing sessions were analyzed manually. A subset of data was analyzed by two independent blinded investigators and they were in agreement.

      (3) Unexpectedly, a low choline diet in AD mice was associated with reduced frequency of interictal spikes yet increased mortality and spontaneous seizures. The authors attribute this to postictal suppression.

      We did not intend to suggest that postictal depression was the only cause. It was a suggestion for one of many potential explanations why seizures would influence IIS frequency. For postictal depression, we suggested that postictal depression could transiently reduce IIS. We have clarified the text so this is clear (Discussion, starting on Line 960):

      If mice were unhealthy, IIS might have been reduced due to impaired excitatory synaptic function. Another reason for reduced IIS is that the mice that had the low choline diet had seizures which interrupted REM sleep. Thus, seizures in Tg2576 mice typically started in sleep. Less REM sleep would reduce IIS because IIS occur primarily in REM. Also, seizures in the Tg2576 mice were followed by a depression of the EEG (postictal depression; Supplemental Figure 3) that would transiently reduce IIS. A different, radical explanation is that the intermediate diet promoted IIS rather than low choline reducing IIS. Instead of choline, a constituent of the intermediate diet may have promoted IIS.

      However, reduced spike frequency is already evident at 5 weeks of age, a time point with a low occurrence of premature death. A more comprehensive analysis of EEG background activity may provide additional information if the epileptic activity is indeed reduced at this age.

      We did not intend to suggest that premature death caused reduced spike frequency. We have clarified the paper accordingly. We agree that a more in-depth EEG analysis would be useful but is beyond the scope of the study.

      (4) Supplementary Fig. 3 depicts far more spikes / 24 h compared to Fig. 7B (at least 100 spikes/24h in Supplementary Fig. 3 and less than 10 spikes/24h in Fig. 7B).

      We would like to clarify that before and after a seizure the spike frequency is unusually high. Therefore, there are far more spikes than prior figures.

      We clarified this issue by adding to the Supplemental Figure more data. The additional data are from mice without a seizure, showing their spikes are low in frequency.

      All recordings lasted several days. We included the data from mice with a seizure on one of the days and mice without any seizures. For mice with a seizure, we graphed IIS frequency for the day before, the day of the seizure, and the day after. For mice without a seizure, IIS frequency is plotted for 3 consecutive days. When there was a seizure, the day before and after showed high numbers of spikes. When there was no seizure on any of the 3 days, spikes were infrequent on all days.

      The revised figure and legend are shown below. It is Supplemental Figure 4 in the revised submission.

      Author response image 5.

      IIS frequency before and after seizures. A. Representative EEG traces recorded from electrodes implanted in the skull over the left frontal cortex, right occipital cortex, left hippocampus (Hippo) and right hippocampus during a spontaneous seizure in a 5 months-old Tg2576 mouse. Arrows point to the start (green arrow) and end of the seizure (red arrow), and postictal depression (blue arrow). B. IIS frequency was quantified from continuous video-EEG for mice that had a spontaneous seizure during the recording period and mice that did not. IIS frequency is plotted for 3 consecutive days, starting with the day before the seizure (designated as day 1), and ending with the day after the seizure (day 3). A two-way RMANOVA was conducted with the day and group (mice with or without a seizure) as main factors. There was a significant effect of day (F(2,4)=46.95, p=0.002) and group (seizure vs no seizure; F(1,2)=46.01, p=0.021) and an interaction of factors (F(2,4)=46.68, p=0.002). Tukey-Kramer post-hoc tests showed that mice with a seizure had significantly greater IIS frequencies than mice without a seizure for every day (day 1, p=0.0005; day 2, p=0.0001; day 3, p=0.0014). For mice with a seizure, IIS frequency was higher on the day of the seizure than the day before (p=0.037) or after (p=0.010). For mice without a seizure, there were no significant differences in IIS frequency for day 1, 2, or 3. These data are similar to prior work showing that from one day to the next mice without seizures have similar IIS frequencies (Kam et al., 2016).

      In the text, the revised section is in the Results, Section C, starting on Line 772:

      “At 5-6 months, IIS frequencies were not significantly different in the mice fed the different diets (all p>0.05), probably because IIS frequency becomes increasingly variable with age (Kam et al. 2016). One source of variability is seizures, because there was a sharp increase in IIS during the day before and after a seizure (Supplemental Figure 4). Another reason that the diets failed to show differences was that the IIS frequency generally declined at 5-6 months. This can be appreciated in Figure 8B and Supplemental Figure 6B. These data are consistent with prior studies of Tg2576 mice where IIS increased from 1 to 3 months but then waxed and waned afterwards (Kam et al., 2016).”

      (5) The data indicating the protective effect of high choline supplementation are valuable, yet some of the claims are not completely supported by the data, mainly as the analysis of littermate WT mice is not complete.

      We added WT data to show that the high choline diet restored cell loss and ΔFosB expression to WT levels. These data strengthen the argument that the high choline diet was valuable. See the response to Reviewer #1, Public Review Point #2.

      • Line 591: "The results suggest that choline enrichment protected hilar neurons from NeuN loss in Tg2576 mice." A comparison to NeuN expression in WT mice is needed to make this statement.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      • Line 623: "These data suggest that high choline in the diet early in life can reduce hyperexcitability of GCs in offspring later in life. In addition, low choline has an opposite effect, again suggesting this maternal diet has adverse effects." Also here, FosB quantification in WT mice is needed.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      (7) Was the effect of choline associated with reduced tauopathy or Aβ levels?

      The mice have no detectable hyperphosphorylated tau. The mice do have intracellular Aβ before 6 months. This is especially the case in hilar neurons, but GCs have little (Criscuolo et al., eNeuro, 2023). However, in neurons that have reduced NeuN, we found previously that antibodies generally do not work well. We think it is because the neurons become pyknotic (Duffy et al., 2015), a condition associated with oxidative stress which causes antigens like NeuN to change conformation due to phosphorylation. Therefore, we did not conduct a comparison of hilar neurons across the different diets.

      (8) Since the mice were tested at 3 months and 6 months, it would be interesting to see the behavioral difference per mouse and the correlation with EEG recording and immunohistological analyses.

      We agree that would be valuable and this has been added to the paper. Please see response to Reviewer #1, Public Review Point #4.

      Reviewer #2 (Recommendations For The Authors):

      There were several areas that could be further improved, particularly in the areas of data analysis (particularly with images and supplemental figures), figure presentation, and mechanistic speculation.

      Major points:

      (1) It is understandable that, for the sake of labor and expense, WT mice were not implanted with EEG electrodes, particularly since previous work showed that WT mice have no IIS (Kam et al. 2016). However, from a standpoint of full factorial experimental design, there are several flaws - purists would argue are fatal flaws. First, the lack of WT groups creates underpowered and imbalanced groups, constraining statistical comparisons and likely reducing the significance of the results. Also, it is an assumption that diet does not influence IIS in WT mice. Secondly, with a within-subject experimental design (as described in Fig. 1A), 6-month-old mice are not naïve if they have previously been tested at 3 months. Such an experimental design may reduce effect size compared to non-naïve mice. These caveats should be included in the Discussion. It is likely that these caveats reduce effect size and that the actual statistical significance, were the experimental design perfect, would be higher overall.

      We agree and have added these points to the Limitations section of the Discussion. Starting on Line 1050: In addition, groups were not exactly matched. Although WT mice do not have IIS, a WT group for each of the Tg2576 groups would have been useful. Instead, we included WT mice for the behavioral tasks and some of the anatomical assays. Related to this point is that several mice died during the long-term EEG monitoring of IIS.

      (2) Since behavior, EEG, NeuN and FosB experiments seem to be done on every Tg2576 animal, it seems that there are missed opportunities to correlate behavior/EEG and histology on a per-mouse basis. For example, rather than speculate in the discussion, why not (for example) directly examine relationships between IIS/24 hours and FosB expression?

      We addressed this point above in responding to Reviewer #1, Public Review Point #4.

      (3) Methods of image quantification should be improved. Background subtraction should be considered in the analysis workflow (see Fig. 5C and Fig. 6C background). It would be helpful to have a Methods figure illustrating intermediate processing steps for both NeuN and FosB expression.

      We added more information to improve the methods of quantification. We did use a background subtraction approach where ImageJ provides a histogram of intensity values, and it determines when there is a sharp rise in staining relative to background. That point is where we set threshold. We think it is a procedure that has the least subjectivity.

      We added these methods to the Methods section and expanded the first figure about image quantification, Figure 6B. That figure and legend are shown above in response to Reviewer #1, Point #2.

      This is the revised section of the Methods, Section C3, starting on Line 345:

      “Photomicrographs were acquired using ImagePro Plus V7.0 (Media Cybernetics) and a digital camera (Model RET 2000R-F-CLR-12, Q-Imaging). NeuN and ∆FosB staining were quantified from micrographs using ImageJ (V1.44, National Institutes of Health). All images were first converted to grayscale and in each section, the hilus was traced, defined by zone 4 of Amaral (1978). A threshold was then calculated to identify the NeuN-stained cell bodies but not background. Then NeuN-stained cell bodies in the hilus were quantified manually. Note that the threshold was defined in ImageJ using the distribution of intensities in the micrograph. A threshold was then set using a slider in the histogram provided by ImageJ. The slider was pushed from the low level of staining (similar to background) to the location where staining intensity made a sharp rise, reflecting stained cells. Cells with labeling that was above threshold were counted.”
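
      The threshold in the Methods text above was set by hand with the ImageJ slider. Purely as an illustration of what "the onset of the steep rise" of a histogram means, the sketch below locates it with a simple slope heuristic. This is not the authors' procedure and not an ImageJ function; the function name, the `frac` parameter, and the histogram counts are all hypothetical.

```python
def steep_rise_onset(hist, frac=0.25):
    """Return the index of the first bin where the histogram starts a steep rise.

    A rise counts as steep when the bin-to-bin increase is at least `frac`
    times the largest increase found anywhere in the histogram.
    Returns None if the histogram never increases.
    """
    rises = [hist[i + 1] - hist[i] for i in range(len(hist) - 1)]
    if not rises:
        return None
    largest = max(rises)
    if largest <= 0:
        return None
    for i, rise in enumerate(rises):
        if rise >= frac * largest:
            return i
    return None


# Hypothetical counts per bin, ordered from background-like staining
# toward the staining level of labeled cells (illustration only).
hist = [2, 3, 2, 3, 20, 60, 80, 40]
print(steep_rise_onset(hist))  # 3
```

      A heuristic of this kind depends on the chosen `frac` and on histogram smoothing, so a subset of sections would still need to be checked by eye.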

      (4) This reviewer is surprised that the authors do not speculate more about ACh-related mechanisms. For example, choline deficiency would likely reduce ACh release, which could have the same effect on IIS as muscarinic antagonism (Kam et al. 2016), and could potentially explain the paradoxical effects of a low choline diet on reducing IIS. Some additional mechanistic speculation would be helpful in the Discussion.

      We thank the Reviewer for noting this so we could add it to the Discussion. We had not because we were concerned about space limitations.

      The Discussion has a new section starting on Line 1009:

      “Choline and cholinergic neurons

      There are many suggestions for the mechanisms that allow MCS to improve health of the offspring. One hypothesis that we are interested in is that MCS improves outcomes by reducing IIS. Reducing IIS would potentially reduce hyperactivity, which is significant because hyperactivity can increase release of Aβ. IIS would also be likely to disrupt sleep since it represents aberrant synchronous activity over widespread brain regions. The disruption to sleep could impair memory consolidation, since it is a notable function of sleep (Graves et al. 2001; Poe et al. 2010). Sleep disruption also has other negative consequences such as impairing normal clearance of Aβ (Nedergaard and Goldman 2020). In patients, IIS and similar events, IEDs, are correlated with memory impairment (Vossel et al. 2016).

      How would choline supplementation in early life reduce IIS of the offspring? It may do so by making BFCNs more resilient. That is significant because BFCN abnormalities appear to cause IIS. Thus, the cholinergic antagonist atropine reduced IIS in vivo in Tg2576 mice. Selective silencing of BFCNs reduced IIS also. Atropine also reduced elevated synaptic activity of GCs in young Tg2576 mice in vitro. These studies are consistent with the idea that early in AD there is elevated cholinergic activity (DeKosky et al. 2002; Ikonomovic et al. 2003; Kelley et al. 2014; Mufson et al. 2015; Kelley et al. 2016), while later in life there is degeneration. Indeed, the chronic overactivity could cause the degeneration.

      Why would MCS make BFCNs resilient? There are several possibilities that have been explored, based on genes upregulated by MCS. One attractive hypothesis is that neurotrophic support for BFCNs is retained after MCS but in aging and AD it declines (Gautier et al. 2023). The neurotrophins, notably nerve growth factor (NGF) and brain-derived neurotrophic factor (BDNF) support the health of BFCNs (Mufson et al. 2003; Niewiadomska et al. 2011).”

      Minor points:

      (1) The vendor is Dyets Inc., not Dyets.

      Thank you. This correction has been made.

      (2) Anesthesia chamber not specified (make, model, company).

      We have added this information to the Methods, Section D1, starting on Line 375: The animals were anesthetized by isoflurane inhalation (3% isoflurane, 2% oxygen for induction) in a rectangular transparent plexiglas chamber (18 cm long x 10 cm wide x 8 cm high) made in-house.

      (3) It is not clear whether software was used for the detection of behavior. Was position tracking software used or did blind observers individually score metrics?

      We have added the information to the paper. Please see the response to Reviewer #1, Recommendations for Authors, Point #2.

      (4) It is not clear why rat cages and not a true Open Field Maze were used for NOL and NOR.

      We used mouse cages because in our experience that is what is ideal to detect impairments in Tg2576 mice at young ages. We think it is why we have been so successful in identifying NOL impairments in young mice. Before our work, most investigators thought behavior only became impaired later. We would like to add that, in our experience, an Open Field Maze is not the most common cage that is used.

      (5) Figure 1A is not mentioned.

      It had been mentioned in the Introduction. Figure 1B-D was the first figure mentioned in the Results, which is why it might have been missed. We have now added it to the first section of the Results, Line 457, so it is easier to find.

      (6) Although Fig 7 results are somewhat complicated compared to Fig. 5 and 6 results, EEG comes chronologically earlier than NeuN and FosB expression experiments.

      We have kept the order as is because as the Reviewer said, the EEG is complex. For readability, we have kept the EEG results last.

      (7) Though the statistical analysis involved parametric and nonparametric tests, it is not clear which normality tests were used.

      We have added the names of the normality tests in the Methods, Section E, Line 443: Tests for normality (Shapiro-Wilk) and homogeneity of variance (Bartlett’s test) were used to determine if parametric statistics could be used. We also added a clarification after this sentence: When data were not normal, non-parametric statistics were used. When there was significant heteroscedasticity of variance, data were log transformed. If log transformation did not resolve the heteroscedasticity, non-parametric statistics were used. Because we added correlations and analysis of survival curves, we also added the following (starting on Line 451): For correlations, Pearson’s r was calculated. To compare survival curves, a Log rank (Mantel-Cox) test was performed.
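
      The decision rule described in this added Methods text can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' analysis code: the function names, the 0.05 threshold and the example values are assumptions, and the outcome of the normality test (for example Shapiro-Wilk) is passed in as a flag rather than computed here. Bartlett's statistic is compared against a chi-square distribution with k-1 degrees of freedom, where k is the number of groups.

```python
import math
from statistics import variance


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite power series in x/2.
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df + 1) // 2):
        tail += term
        term *= x / (2 * r + 1)
    return tail


def bartlett_p(groups):
    """P value of Bartlett's test; every group needs a non-zero variance."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1.0 + (
        sum(1.0 / (n - 1) for n in sizes) - 1.0 / (n_total - k)
    ) / (3.0 * (k - 1))
    return chi2_sf(numerator / correction, k - 1)


def choose_statistics(groups, normality_ok, alpha=0.05):
    """Return which family of tests the described rule would select."""
    if not normality_ok:
        return "non-parametric"
    if bartlett_p(groups) >= alpha:
        return "parametric"
    # Heteroscedastic: retry after a log transform (values must be positive).
    logged = [[math.log(v) for v in g] for g in groups]
    if bartlett_p(logged) >= alpha:
        return "parametric on log-transformed data"
    return "non-parametric"


# Hypothetical example: the second group has a much larger spread.
print(choose_statistics([[1, 2, 3], [10, 100, 1000]], normality_ok=True))
```

      In practice a statistics package would supply both tests; the closed-form chi-square tail is used here only to keep the sketch self-contained.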

      Figures:

      (1) In Fig. 1A, Anatomy should be placed above the line.

      We changed the figure so that the word “Anatomy” is now aligned, and the arrow that was angled is no longer needed.

      In Fig. 1C and 1D, the objects seem to be moved into the cage, not the mice. This schematic does not accurately reflect the Fig. 1C and 1D figure legend text.

      Thank you for the excellent point. The figure has been revised. We also updated it to show the objects more accurately.

      Please correct the punctuation in the Fig. 1D legend.

      Thank you for mentioning the errors. We corrected the legend.

      For ease of understanding, Fig. 1C and 1D should have training and testing labeled in the figure.

      Thank you for the suggestion. We have revised the figure as suggested.

      Author response image 6.

      (2) In Figure 2, error bars for population stats (bar graphs) are not obvious or missing. Same for Figure 3.

      We added two supplemental figures to show error bars, because adding the error bars to the existing figures made the symbols, colors, connecting lines and error bars hard to distinguish. For novel object location (Fig. 2) the error bars are shown in Supp. Fig. 2. For novel object recognition, the error bars are shown in Supplemental Fig. 3.

      (3) The authors should consider a Methods figure for quantification of NeuN and deltaFOSB (expansions of Fig. 5C and Fig. 6C).

      Please see Reviewer #1, Public Review Point #2.

      (4) In Figure 5, A should be omitted and mentioned in the Methods/figure legend. B should be enlarged. C should be inset, zoomed-in images of the hilus, with an accompanying analysis image showing a clear reduction in NeuN intensity in low choline conditions compared to intermediate and high choline conditions. In D, X axes could delineate conditions (figure legend and color unnecessary). Figure 5C should be moved to a Methods figure.

      We thank the reviewer for the excellent suggestions. We removed A as suggested. We expanded B and included insets. We used different images to show a more obvious reduction of cells for the low choline group. We expanded the Methods schematics. The revised figure is Figure 6 and shown above in response to Reviewer 1, Public Review Point #2.

      (5) In Figure 6, A should be eliminated and mentioned in the Methods/figure legend. B should be greatly expanded with higher and lower thresholds shown on subsequent panels (3x3 design).

      We removed A as suggested. We expanded B as suggested. The higher and lower thresholds are shown in C. The revised figure is Figure 7 and shown above in response to Reviewer 1, Public Review Point #2.

      (6) In Figure 7, A2 should be expanded vertically. A3 should be expanded both vertically and horizontally. B 1 and 2 should be increased, particularly B1 where it is difficult to see symbols. Perhaps colored symbols offset/staggered per group so that the spread per group is clearer.

      We added a panel (A4) to show an expansion of A2 and A3. However, we did not see that a vertical expansion would add information so we opted not to add that. We expanded B1 as suggested but opted not to expand B2 because we did not think it would enhance clarity. The revised figure is below.

      Author response image 7.

      (7) Supplemental Figure 1 could possibly be combined with Figure 1 (use rounded corner rat cage schematic for continuity).

      We opted not to combine figures because it would make one extremely large figure. As a result, the parts of the figure would be small and difficult to see.

      (8) Supplemental Figure 2 - there does not seem to be any statistical analysis associated with A mentioned in the Results text.

      We added the statistical information. It is now Supplemental Figure 4:

      Author response image 8.

      Mortality was high in mice treated with the low choline diet. A. Survival curves are shown for mice fed the low choline diet and mice fed the high choline diet. The mice fed the high choline diet had a significantly less severe survival curve. B. Left: A photo of a mouse after sudden unexplained death. The mouse was found in a posture consistent with death during a convulsive seizure. The area surrounded by the red box is expanded below to show the outstretched hindlimb (red arrow). Right: A photo of a mouse that did not die suddenly. The area surrounded by the box is expanded below to show that the hindlimb is not outstretched.

      The revised text is in the Results, Section E, starting on Line 793:

      “The reason that low choline-treated mice appeared to die in a seizure was that they were found in a specific posture in their cage which occurs when a severe seizure leads to death (Supplemental Figure 5). They were found in a prone posture with extended, rigid limbs (Supplemental Figure 5). Regardless of how the mice died, there was greater mortality in the low choline group compared to mice that had been fed the high choline diet (Log-rank (Mantel-Cox) test, Chi square 5.36, df 1, p=0.021; Supplemental Figure 5A).”
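
      For readers unfamiliar with the Log-rank (Mantel-Cox) test quoted above, the following is a minimal two-group sketch in Python using only the standard library. It is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical and no survival data from the study are reproduced. At each death time the observed deaths in one group are compared with the number expected if both groups shared the same hazard, and the summed difference is referred to a chi-square distribution with 1 degree of freedom. The last line only checks that the reported chi-square of 5.36 with df 1 corresponds to a P value close to the reported 0.021.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time of each animal
    events_* : 1 if the animal died at that time, 0 if it was censored
    Returns (chi-square statistic, P value).
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    death_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, chi2_sf_1df(chi2)


# Consistency check on the statistic reported in the text (chi-square 5.36, df 1).
p_reported = chi2_sf_1df(5.36)
```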

      Also, why isn't intermediate choline also shown?

      We do not have the data from the animals. Records of death were not kept, regrettably.

      Perhaps labeling of male/female could also be done as part of this graph.

      We agree this would be very interesting but do not have all sex information.

      B is not very convincing, though it is understandable once one reads about posture.

      We have clarified the text and figure, as well as the legend. They are above.

      Are there additional animals that were seen to be in a specific posture?

      There are many examples, and we added them to hopefully make it more convincing.

      We also added posture in WT mice when there is a death to show how different it is.

      Is there any relationship between seizures detected via EEG, as shown in Supplemental Figure 3, and death?

      Several mice died during a convulsive seizure, which is the type of seizure that is shown in the Supplemental Figure.

      (9) Supplemental Figure 3 seems to display an isolated case in which EEG-detected seizures correlate with increased IIEs. It is not clear whether there are additional documented cases of seizures that could be assembled into a meaningful population graph. If this data does not exist or is too much work to include in this manuscript, perhaps it can be saved for a future paper.

      We have added other cases and revised the graph. This is now Supplemental Figure 4 and is shown above in response to Reviewer #1, Recommendation for Authors Point #4.

      Frontal is misspelled.

      We checked and our copy is not showing a misspelling. However, we are very grateful to the Reviewer for catching many errors and reading the manuscript carefully.

      (10) Supplemental Figure 4 seems incomplete in that it does not include EEG data from months 4, 5, and 6 (see Fig. 7B).

      We have added data for these ages to the Supplemental Figure (currently Supplemental Figure 6) as part B. In part A, which had been the original figure, only 1.2, 2, and 3 months-old mice were shown because there were insufficient numbers of each sex at other ages. However, by pooling 1.2 and 2 months (Supplemental Figure 6B1), 3 and 4 months (B2) and 5 and 6 months (B3) we could do the analysis of sex. The results are the same – we detected no sex differences.

      Author response image 9.

      IIS frequency was similar for each sex. A. IIS frequency was compared for females and males at 1.2 months (1), 2 months (2), and 3 months (3). Two-way ANOVA was used to analyze the effects of sex and diet. Female and male Tg2576 mice were not significantly different. B. Mice were pooled at 1.2 and 2 months (1), 3 and 4 months (2) and 5 and 6 months (3). Two-way ANOVA analyzed the effects of sex and diet. There were significant effects of diet for (1) and (2) but not (3). There were no effects of sex at any age.

      (1) There were significant effects of diet (F(2,47)=46.21, p<0.0001) but not sex (F(1,47)=0.106, p=0.746). Female and male mice fed the low choline diet or high choline diet were significantly different from female and male mice fed the intermediate diet (all p<0.05, asterisk).

      (2) There were significant effects of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Both female and male mice of the low choline group were significantly different from male mice fed the intermediate diet (both p<0.05, asterisk) but no other pairwise comparisons were significant.

      (3) There were no significant differences (diet, F(2,23)=1.21, p=0.317; sex, F(1,23)=0.844, p=0.368).

      The data are discussed in the Results, Section G, starting on Line 843:

      In Supplemental Figure 6B we grouped mice at 1-2 months, 3-4 months and 5-6 months so that there were sufficient females and males to compare each diet. A two-way ANOVA with diet and sex as factors showed a significant effect of diet (F(2,47)=46.21; p<0.0001) at 1-2 months of age, but not sex (F(1,47)=0.11, p=0.758). Post-hoc comparisons showed that the low choline group had fewer IIS than the intermediate group, and the same was true for the high choline-treated mice. Thus, female mice fed the low choline diet differed from the females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Male mice that had received the low choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Female mice fed the high choline diet differed from females (p=0.002) and males (p<0.0001) fed the intermediate diet, and males fed the high choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet.

      For the 3-4 months-old mice there was also a significant effect of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Post-hoc tests showed that low choline females were different from males fed the intermediate diet (p=0.007), and low choline males were also significantly different from males that had received the intermediate diet (p=0.006). There were no significant effects of diet (F(2,23)=1.21, p=0.317) or sex (F(1,23)=0.84, p=0.368) at 5-6 months of age.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We are pleased that Reviewer 3 has deemed our revisions satisfactory; below, we provide responses to the remaining Recommendations for the Authors from Reviewer 2.

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections:

      • Line 91: GWT should be GNWT

      Fixed, thank you.

      • Figure 2: fix the label "Participationcoefficient rank" (no space between Participation and coefficient)

      Fixed, thank you for spotting.

      • Line 317: Figure 2 should be Figure 3

      Fixed, thank you.

      • Line 360: Figure 4D, right?

      Fixed, thank you. We also confirm that Figure 4 and its caption are correct. Under anaesthesia, many regions have more Integrated Information than during Recovery (red regions), but the only changes that are consistently observed across all three contrasts are the decreases.

      • Line 375: Should be Figure 5A

      Fixed, thank you.

      • The recovery period of the anesthesia data is not described in Methods.

      We have now added the missing information:

      “Propofol was discontinued following the deep anaesthesia scan, and participants reached level 2 of the Ramsey scale approximately 11 minutes afterwards, as indicated by clear and rapid responses to verbal commands. This corresponds to the “recovery” period 176.”

      We have also expanded our discussion on the interaction between information decomposition and measures of directionality:

      “Indeed, transfer entropy can itself be decomposed into information-dynamic atoms through Partial Information Decomposition and Integrated Information Decomposition 33,34,49,151; ΦID can further decompose the Normalised Directed Transfer Entropy measure used by Deco et al 5, as recently demonstrated 152. We look forward to a more refined conceptualization of the synergistic workspace architecture that takes into account both information types and the directionality of information flow – especially in datasets with higher temporal resolution.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Yu et al. describe the chemotactic gradient formation for CCL5 bound to - i.e. released from - glycosaminoglycans. The authors provide evidence for phase separation as the driving mechanism behind chemotactic gradient formation. A conclusion towards a general principle behind the finding cannot be drawn since the work focuses on one chemokine only, which is particularly prone to glycan-induced oligomerisation.

      Strengths:

      The principle of phase separation as a driving force behind and thus as an analytical tool for investigating protein interactions with strongly charged biomolecules was originally introduced for protein-nucleic acid interactions. Yu et al. have applied this in their work for the first time for chemokine-heparan sulfate interactions. This opens a novel way to investigate chemokine-glycosaminoglycan interactions in general.

      Response: We thank the reviewer for the encouragement.

      Weaknesses:

      As mentioned above, one of the weaknesses of the current work is the exemplification of the phase separation principle by applying it only to CCL5-heparan sulfate interactions. CCL5 is known to form higher oligomers/aggregates in the presence of glycosaminoglycans, much more than other chemokines. It would therefore have been very interesting to see, if similar results in vitro, in situ, and in vivo could have been obtained by other chemokines of the same class (e.g. CCL2) or another class (like CXCL8).

      Response: We share the reviewer’s opinion that it would be interesting to investigate more molecules/cytokines that interact with heparan sulfate in this system. We expect that researchers in the field will adapt the concept to continue the studies on additional molecules. Nevertheless, our earlier study demonstrated that bFGF was enriched at its receptor and triggered signaling transduction through phase separation with heparan sulfate (PMID: 35236856; doi: 10.1038/s41467-022-28765-z), which supports the concept that phase separation with heparan sulfate on the cell surface may be a common mechanism for heparan sulfate binding proteins. The reviewer’s comment that phase separation is related to oligomerization is addressed in Figure 1—figure supplement 2C and D, which show that the more easily aggregated mutant, A22K-CCL5, does not undergo phase separation.

      In addition, the authors have used variously labelled CCL5 (like with the organic dye Cy3 or with EGFP) for various reasons (detection and immobilisation). In the view of this reviewer, it would have been necessary to show that all the labelled chemokines yield identical/similar molecular characteristics as the unlabelled wildtype chemokine (such as heparan sulfate binding and chemotaxis). It is well known that labelling proteins either by chemical tags or by fusion to GFPs can lead to manifestly different molecular and functional characteristics.

      Response: We agree with the reviewer that labeling may alter the properties of a protein; we therefore compared the chemotactic activity of CCL5 and CCL5-EGFP (Figure 2—figure supplement 1). To further verify this, we performed an additional experiment to compare chemotactic activity between CCL5 and Cy3-CCL5 (see Author response image 1). For the convenience of readers, we have combined the original Figure 2—figure supplement 1 with the new data (Author response image 1), which replaced the original Figure 2—figure supplement 1.

      Author response image 1.

      Chemotactic function of CCL5-EGFP and CCL5-Cy3. Cy3-labeled CCL5 has activity similar to that of CCL5. 50 nM CCL5 or CCL5-Cy3 was added to the lower chamber of the Transwell. THP-1 cells were added to the upper chambers. Data are mean ± s.d. n=3. P values were determined by unpaired two-tailed t-tests. NS, Not Significant.

      Reviewer #2 (Public Review):

      Although the study by Xiaolin Yu et al is largely limited to in vitro data, the results of this study convincingly improve our current understanding of leukocyte migration.

      (1) The conclusions of the paper are mostly supported by the data although some clarification is warranted concerning the exact CCL5 forms (without or with a fluorescent label or His-tag) and amounts/concentrations that were used in the individual experiments. This is important since it is known that modification of CCL5 at the N-terminus affects the interactions of CCL5 with the GPCRs CCR1, CCR3, and CCR5 and random labeling using monosuccinimidyl esters (as done by the authors with Cy-3) is targeting lysines. Since lysines are important for the GAG-binding properties of CCL5, knowledge of the number and location of the Cy-3 labels on CCL5 is important information for the interpretation of the experimental results with the fluorescently labeled CCL5. Was the His-tag attached to the N- or C-terminus of CCL5? Indicate this for each individual experiment and consider/discuss also potential effects of the modifications on CCL5 in the results and discussion sections.

      Response: We agree with the reviewer that labeling may alter the properties of a protein; we therefore compared the chemotactic activity of CCL5 and CCL5-EGFP (Figure 2—figure supplement 1). To further verify this, we performed an additional experiment to compare chemotactic activity between CCL5 and Cy3-CCL5 (see Author response image 1). For the convenience of readers, we have combined the original Figure 2—figure supplement 1 with the new data (Author response image 1), which replaced the original Figure 2—figure supplement 1.

      The His-tag is attached to the C-terminus of CCL5, in consideration of the potential impact on the N-terminus.

      (2) In general, the authors appear to use high concentrations of CCL5 in their experiments. The reason for this is not clear. Is it because of the effects of the labels on the activity of the protein? In most biological tests (e.g. chemotaxis assays), unmodified CCL5 is active already at low nM concentrations.

      Response: We agree with the reviewer that the CCL5 concentrations used in our experiments were higher than those reported for chemotaxis assays and also higher than physiological levels in normal human plasma. In fact, we have performed experiments with lower concentrations of CCL5, in which the effect of LLPS was not seen although the chemotactic activity of the cytokine was detected. Thus, LLPS-associated chemotactic activity may represent a scenario of acute inflammation, when inflammatory cytokines can increase significantly.

      (3) For the statistical analyses of the results, the authors use t-tests. Was it confirmed that data follow a normal distribution prior to using the t-test? If not a non-parametric test should be used and it may affect the conclusions of some experiments.

      Response: We thank the reviewer for pointing out this issue. As shown in Author response table 1, the Shapiro-Wilk normality test showed that only two control groups (CCL5 and 44AANA47-CCL5+CHO K1) in Figure 3 did not conform to the normal distribution. The error was caused by using microculture to count and calculate when there were very few cells in the microculture. For these two groups, we re-counted 100 μL culture medium to calculate the number of cells. The results were consistent with the normal distribution and significantly different from the experimental group (Author response image 3). The original data for the number of cells chemoattracted by 500 nM CCL5 were revised from 0, 247, 247 to 247, 123, 370, and those for 500 nM 44AANA47 +CHO-K1 were revised from 1111, 1111, 98 to 740, 494, 617. The revised data do not affect the conclusion.
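
      Why triplicates such as 0, 247, 247 fail a Shapiro-Wilk test can be seen from a short calculation. The sketch below is a hedged illustration in standard-library Python, not the software used for the response: it assumes the closed form that Royston's algorithm uses for samples of exactly three values, and the function name is hypothetical. With n=3, W has a minimum of 0.75, which is reached whenever two of the three values are identical, and the P value at that minimum is 0.

```python
import math


def shapiro_wilk_n3(values):
    """Shapiro-Wilk W and P value for exactly three observations.

    Assumes the n = 3 special case of Royston's algorithm:
    W = 0.5 * range^2 / sum of squared deviations, and
    P = (6 / pi) * (asin(sqrt(W)) - asin(sqrt(0.75))).
    """
    if len(values) != 3:
        raise ValueError("this closed form only applies to n = 3")
    low, _, high = sorted(values)
    mean = sum(values) / 3.0
    ss = sum((v - mean) ** 2 for v in values)
    if ss == 0:
        raise ValueError("all values are identical")
    w = min(1.0, 0.5 * (high - low) ** 2 / ss)  # guard against rounding above 1
    p = (6.0 / math.pi) * (math.asin(math.sqrt(w)) - math.asin(math.sqrt(0.75)))
    return w, max(0.0, p)


# Counts quoted in the response above.
for label, counts in [
    ("CCL5, original", [0, 247, 247]),
    ("CCL5, revised", [247, 123, 370]),
    ("44AANA47 + CHO-K1, original", [1111, 1111, 98]),
    ("44AANA47 + CHO-K1, revised", [740, 494, 617]),
]:
    w, p = shapiro_wilk_n3(counts)
    print(f"{label}: W = {w:.3f}, P = {p:.3f}")
```

      Under this assumption both original triplicates sit at the minimum W because each contains a repeated value, whereas the revised, nearly evenly spaced counts give W close to 1.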

      Author response table 1.

      Table R1 Shapiro-Wilk test results of statistical data in the manuscript

      Author response image 3.

      Quantification of THP-1 cells collected from the lower chamber. Data are mean ± s.d. n=3. P values were determined by unpaired two-tailed t-tests.

      Recommendations for the authors:

      Reviewer #1:

      See the weaknesses section of the Public Review. In addition, the authors should discuss the X-ray structure of CCL5 in complex with a heparin disaccharide in comparison with their docked structure of CCL5 and a heparin tetrasaccharide.

      Response: Our study was, in fact, strongly influenced by the report (Shaw, Johnson et al., 2004) of heparin disaccharide interaction with CCL5, which is highlighted in the text (page 5, lines 100-102).

      Reviewer #2:

      (1) Clearly indicate in the results section and figure legends (also for the supplementary figures) which form and concentration of CCL5 is used.

      Response: The missing information is now indicated throughout the manuscript.

      (2) Clearly indicate which GAG was used. Was it heparin or heparan sulfate and what was the length (e.g. average molecular mass if known) or source (company?)?

      Response: Relevant information has been added to the section “Materials and Methods”.

      (3) Line 181: What do you mean exactly with "tiny amounts"?

      Response: “tiny amounts” means 400 transfected cells. This is described in the section of Materials and Methods. It is now also indicated in the text and legend to the figure.

      (4) Lines 216-217: This is a very general statement without a link to the presented data. No combination of chemokines is used, in vivo testing is limited (and I agree very difficult). You may consider deleting this sentence (certainly as an opening sentence for the Discussion).

      Response: We very much appreciate the reviewer’s thoughtful suggestion. This sentence has been deleted from the revised manuscript.

      (5) Why was 5h used for the in vitro chemotaxis assay? This is extremely long for an assay with THP-1 cells.

      Response: We apologize for the unclear description. The 5 hr includes a 1 hr pre-incubation of CCL5 with the cells to allow phase separation to form. After the cells were transferred into the upper chamber, the actual chemotaxis assay lasted 4 hr. This is clarified in the Materials and Methods section and the legend to each figure.

      (6) Define "Sec" in Sec-CCL5-EGFP and "Dil" in the legend of Figure 4.

      Response: The Sec-CCL5-EGFP should be “CCL5-EGFP”, which has now been corrected. Dil is a cell membrane red fluorescent probe, which is now defined.

      (7) Why are different cell concentrations used in the experiment described in Figure 5?

      Response: The samples were from three volunteers who exhibited substantially different concentrations of cells in the blood. The experiment was designed using the same amount of blood, so we did not normalize the number of cells used for the experiment. Regardless of the difference in cell numbers, all three samples showed the same trend.

      (8) Check the text for some typos: examples are on line 83 "ratio of CCL5"; line 142 "established cell lines"; line 196 "peripheral blood mononuclear cells"; line 224 "to mediate"; line 226 "bind"; line 247 "to form a gradient"; line 248 "of the glycocalyx"; line 343 and 346 "tetrasaccharide"; line 409-410 "wild-type"; line 543 "on the surface of CHO-K1 and CHO-677"; line 568 "white".

      Response: Thanks for the careful reading. The typographical errors have been corrected, and the manuscript was carefully read by colleagues.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Pg. 3 - lines 51-53: "Once established, the canonical RdDM pathway takes over, whereby small RNAs are generated by the plant-specific polymerase IV (Pol IV). In both cases, a second plant-specific polymerase, Pol V, is an essential downstream component." The authors' intro omits an important aspect of Pol V's function in RdDM, which is quite relevant to their study. Pol V transcribes DNA to synthesize noncoding RNA scaffolds, to which AGO4-bound 24 nt siRNAs are thought to base pair, leading to DRM2 recruitment for cytosine methylation near to these nascent Pol V transcripts (Wierzbicki et al 2008 Cell; Wierzbicki et al. 2009 Nat Genet). I recommend that the authors cite these key studies.

      These citations have now been added (see line 57).

      The authors provide compelling evidence that Pol V redistributes to ectopic heterochromatin regions in h1 mutants (e.g., Fig1a browser shot). Presumably, this would allow Pol V to transcribe these regions in h1 mutants, whereas it could not transcribe them in WT plants. Have the authors detected and/or quantified Pol V transcripts in the h1 mutant compared to WT plants at the sites of Pol V redistribution (detected via NRPE1 ChIP)?

      Robust detection of Pol V transcripts can be experimentally challenging, and instead we quantify and detect NRPE1 dependent methylation at these regions (Fig 5), which occurs downstream of Pol V transcript production. However, we note detecting Pol V transcripts as a potential future direction in the discussion (see line 263).

      Pg. 5 - lines 101-102: Figure 1e - "The preferential enrichment of NRPE1 in h1 was more pronounced at TEs that overlapped with heterochromatin associated mark, H3K9me2 (Fig. 1e)." Was a statistical test performed to determine that the overall differences are significant only at TE sites with H3K9me2? Can the sites without H3K9me2 also be differentiated statistically?

      Yes, there is a statistically significant difference between WT and h1 at both the H3K9me2 marked and unmarked TEs (Wilcoxon rank sum tests, see updated Fig 1e). The size of the effect is larger for the H3K9me2 marked TEs (median difference of 0.41 vs 0.16). Median values have now been added to the boxplots so that this is directly viewable to the reader (Fig 1e). This reflects the general increase in NRPE1 occupancy in h1 mutants through the genome, with the effect consistently stronger in heterochromatin. In our initial version of the manuscript, we summarise the effect as follows “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions” (previous version line 83, current version line 95). Although important exceptions exist (see Fig 5, NRPE1 and DNA methylation loss in h1), we now make this point even more explicit, and have updated the manuscript at several locations (abstract line 26, results line 245, discussion line 265).
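
      The kind of comparison described here, a Wilcoxon rank sum test together with a median difference as the effect size, can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' pipeline: the function name and the enrichment values are hypothetical, and the P value uses the large-sample normal approximation without tie or continuity corrections, which a statistics package would normally apply.

```python
import math
from statistics import median


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney) test, normal approximation.

    Returns (U statistic for x, approximate P value).
    """
    # U counts the pairs in which a value from x exceeds a value from y;
    # ties contribute one half. Pair counting is slow but easy to follow.
    u = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
    n, m = len(x), len(y)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical NRPE1 enrichment values at a handful of TEs (not study data).
h1_mutant = [0.9, 1.1, 1.3, 1.4, 1.6, 1.8]
wild_type = [0.5, 0.7, 0.8, 1.0, 1.2, 1.25]

u_stat, p_value = rank_sum_test(h1_mutant, wild_type)
median_difference = median(h1_mutant) - median(wild_type)
```

      Reporting the median difference alongside the P value matters because, with thousands of TEs, even a small shift can be statistically significant; the effect size shows where the shift is larger.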

      Pg. 5 - lines 108-110: The authors state, "Importantly, we found no evidence for increased NRPE1 expression at the mRNA or protein level in the h1 mutant (Suppl. Fig. 2)." But the authors did observe reduced NRPE1 transcript levels in h1 mutants, in their re-analysis of RNA-seq data and reduced NRPE1 protein signals via western blot in (Suppl. Fig. 2), which should be reported here in the results.

      As described further below, we reanalysed h1 RNA-seq from scratch, and see no evidence for significant differential gene expression of NRPE1. This table and analysis are now provided in Supplementary Table 1.

      More importantly, the above logic about NRPE1 expression in h1 mutants assumes that NRPE1 is the stoichiometrically limiting subunit for Pol V assembly and function in vivo, but this is not known to be the case:

      (1) While NRPE1's expression is somewhat reduced (and not increased) in h1 mutant plants, we cannot be certain that other genes influencing Pol V stability or recruitment are unaffected by h1 mutants. I thus recommend that the authors perform RT-qPCR directly on the WT and h1 mutant materials used in their current study, quantifying NRPE1, NRPE2, NRPE5, DRD1, DMS3, RDM1, SUVH2 and SUVH9 transcript levels.

      (2) Normalizations used to compare samples should be included with RT-qPCR and western assays. An appropriate house-keeping gene like Actin2 or Ubiquitin could be used to normalize the RT-qPCR. Protein sample loading in Suppl. Fig. 2 could be checked by Coomassie staining and/or an antibody detection of a house-keeping protein.

      We have now included a full re-analysis of h1 RNA-seq (data from Choi et al 2020) focusing on transcriptional changes of DNA methylation machinery genes in the h1 mutant. Of the 61 genes analysed, only AGO6 and AGO9 were found to be differentially expressed (2-3 fold upregulation). This analysis is now included as a table (Supplementary Table 1). The western blot has been moved to Supplementary Fig 3 to now illustrate antibody specificity and H1 loss in the h1 mutant lines, so NRPE1 itself serves as a loading control (Supplementary Fig 3a).

      Pg. 6 - lines 129-131: The authors state that "over NRPE1 defined peaks (where NRPE1 occupancy is strongest in WT) we observed no change in H1 occupancy in nrpe1 (Fig 2b). The results indicate that H1 does not invade RdDM regions in the nrpe1 mutant background." This conclusion assumes that the author's H1 ChIP is successfully detecting H1 occupancy. However, in Fig 2d there does not appear to be H1 enrichment or peaks as visualized across the 10766 ZF-DMS3 off-target loci, or even at the selected 451 ZF-DMS3 off-target hyper DMRs, where the putative signal for H1 enrichment on the metaplot center is extremely weak/non-existent.

      As a reference for H1 enrichment in chromatin (e.g., looking where H2A.W antagonizes H1 occupancy) one can compare analyses in Bourguet et al (2021) Nat Commun, involving co-authors of the current study. Bourguet et al (2021) Fig 5b show a metaplot of H1 levels centered on H2A.W peaks with H1 ChIP signal clearly tapering away from the metaplot center point peak. To my eye, the H1 ChIP metaplots for ZF-DMS3 offtarget loci in the current manuscript (Fig 2d) resemble "shuffled peaks" controls like those in Fig 5b of Bourguet et al (2021).

      Can one definitively interpret Fig 2d as showing RdDM "not reciprocally affecting H1 localization" without first showing the specificity of the ChIP-seq results in a genotype where H1 occupancy changes? Alternatively, could this dataset be displayed with Deeptools heatmaps to strengthen the evidence that the authors are detecting H1 occupancy/enrichment genome-wide, before diving into WT/nrpe1 mutant analysis at ZF-DMS3 off-target loci?

      This is an excellent suggestion from the reviewer. We have now included several analyses that assess and demonstrate the quality of our H1 ChIP-seq profiles. First, as suggested by the reviewer, we show that our H1 profiles peak over H2A.W enriched euchromatic TEs as defined by Bourguet et al, mirroring these published findings. Next, we investigated whether our H1 profiles match Teano’s recently described pattern over genes, confirming a similar pattern with 3’ enrichment of H1 over H3K27me3 unmarked genes. Furthermore, we show that the H1 peaks defined here are similarly enriched with GFP tagged H1.2 from the Teano et al. 2023 study. These analyses validate the quality of our H1 ChIP-seq datasets and bolster the conclusion that NRPE1 redistribution does not affect H1 occupancy. These new analyses are now presented in Supplementary Figure 3 (see line 153).

      Pg. 8 - lines 228-230: The authors state that, "As with NRPE1, SUVH1 increased in the h1 background significantly more in heterochromatin, with preferential enrichment over long TEs, cmt2 dependent hypo CHH DMRs, and heterochromatic TEs (Fig. 6b)."

      Contrary to the above statement, the violin plots in Fig. 6c show SUVH1 occupancy increasing at euchromatic TEs in the h1 mutant. What statistical test allowed the authors to determine that the increase in h1 occurs "significantly more in heterochromatin"? The authors should critically interpret Fig. 6c and 6d, which are not currently referenced in the results section. More support is needed for the claim that SUVH1 specifically encroaches into heterochromatin in the h1 mutant, rather than just TEs generally (euchromatic and heterochromatic alike).

      Similar to what we see for NRPE1, statistical tests that we have now performed show that SUVH1 is significantly enriched in h1 in all classes. Importantly, however, the effect size is larger in all of the heterochromatin-associated classes. We display these statistical tests and the median values on the plots so that effects are immediately viewable (see updated Fig 6).

      In addition, the authors should verify that SUVH1-3xFLAG transgenes (in the WT and h1 mutant backgrounds, respectively) and endogenous Arabidopsis genes encoding the transcriptional activator complex (SUVH1-SUVH3-DNAJ1-DNAJ2) are not overexpressed in the h1 mutant vs. WT. Higher expression of SUVH1 or limiting factors in the larger complex could explain the observation of increased SUVH1 occupancy in the h1 background.

      We do not see a difference in SUVH1/3/DNAJ1/2 complex gene expression in the h1 background (see Supplementary Table 1). However, we cannot rule out that our SUVH1-FLAG line in h1 is more highly expressed than the corresponding SUVH1-FLAG line in WT. We now note this point in line 248.

      Pg. 8 - lines 231-232: Here the authors make a sweeping conclusion about H1 demarcating, "the boundary between euchromatic and heterochromatic methylation pathways, likely through promoting nucleosome compaction and restricting heterochromatin access." I do not see how a H1 boundary between euchromatic and heterochromatic methylation pathways is revealed based on the SUVH1-3xFLAG occupancy data, which shows increased enrichment at every category interrogated in the h1 mutant (Fig 6b,c,d) and all along the baseline too in the h1 mutant browser tracks (Fig 6a). Can the authors provide more examples of this phenomenon (similar to Fig 6a) and better explain why their SUVH1-3xFLAG ChIP supports this demarcation model?

      The general conclusion from SUVH1 about H1’s agnostic role in preventing heterochromatin access is now further supported from our findings with H3K27me3 (see Figure 6e and description from line 250). However, we agree that the demarcation model as initially presented was overly simplistic. This point was also raised by reviewer 2. We have removed the line highlighted by the reviewer in the revised version of the manuscript. In the revised version we clarify that H1 impedes RdDM and associated machinery throughout the genome (consistent with H1’s established broad occupancy across the genome) but this effect is most pronounced in heterochromatin, corresponding to maximal H1 occupancy (abstract line 26, results line 245, discussion line 265). 

      Corrections:

      Pg. 8 - lines 226-227: "We therefore wondered whether complex's occupancy might also be affected by H1." The sentence contains a typo, where I assume the authors mean to refer to occupancy by the SUVH1-SUVH3-DNAJ1-DNAJ2 transcriptional activator complex. This needs to be specified more clearly.

      The paragraph has been updated (see from line 237).

      Pg. 13 - lines 393-405: There are minor errors in the capitalization of titles and author initials in the References. I recommend that the authors proofread all the references to eliminate these issues:

      Thank you, these have been corrected.

      Choi J, Lyons DB, Zilberman D. 2021. Histone H1 prevents non-cg methylation-mediated small RNA biogenesis in arabidopsis heterochromatin. Elife 10:1-24. doi:10.7554/eLife.72676 (...)

      Du J, Johnson LM, Groth M, Feng S, Hale CJ, Li S, Vashisht A a., Gallego-Bartolome J, Wohlschlegel J a., Patel DJ, Jacobsen SE. 2014. Mechanism of DNA methylation-directed histone methylation by KRYPTONITE. Mol Cell 55:495-504. doi:10.1016/j.molcel.2014.06.009 (...)

      Du J, Zhong X, Bernatavichute Y V, Stroud H, Feng S, Caro E, Vashisht A a, Terragni J, Chin HG, Tu A, Hetzel J, Wohlschlegel J a, Pradhan S, Patel DJ, Jacobsen SE. 2012. Dual binding of chromomethylase domains to H3K9me2-containing nucleosomes directs DNA methylation in plants. Cell 151:167-80. doi:10.1016/j.cell.2012.07.034

      Reviewer #2 (Recommendations For The Authors):

      As for a normal review, here are our major and minor points.

      Major:

      (1) Lines 38 to 45 of the introduction are important for the subsequent definition of heterochromatic and non-heterochromatic transposons, but the definition is ambiguous. Is heterochromatin defined by surrounding context such as pericentromeric position or is this an autonomous definition? Can a TE with the chromosomal arms be considered heterochromatic provided that it is long enough and recruits the right machinery? These cases should be more explicitly introduced. Ideally, a supplemental dataset should provide a key to the categories, genomic locations and overlapping TEs as they were used in this analysis, even if some of the categories were taken from another study.

      We have now added all the regions used for analysis in this study to Supplementary Table 3.

      (2) Line 80: This would be the first chance to cite Teano et al. and the "encroachment" of PcG complexes to TEs in H1 mutants.

      Done - “H1 also plays a key role in shaping nuclear architecture and preventing ectopic polycomb-mediated H3K27me3 deposition in telomeres (Teano et al., 2023).” See line 83

      (3) Although S2 is "only" a supplemental figure, it should still follow the rules: indicate the number of biological replicates for the RNA-seq data, and perform a statistical test. In the case of WB data, provide a loading control.

      We are now using the western blot to illustrate antibody specificity and H1 loss in the h1 mutant lines, so NRPE1 itself serves as a loading control (Supplementary Fig 3a). For NRPE1 mRNA expression, we have now replaced this with a more comprehensive transcriptome analysis of methylation machinery in h1 (see Supplementary Table 1). 

      (4) Lines 115 to 124 and corresponding data: Here, the goal is to exclude other changes to heterochromatin structure other than "increased access" in H1 mutants; however, only one feature, H3K9me2, is tested. Testing this one mark does not necessarily prove that the nature of the chromatin does not change, e.g. H2A.W could be differently redistributed, DDM1 may change, VIM protein, and others. Either more comprehensive testing for heterochromatin markers should be performed, or the conclusions moderated.

      We have moderated the text accordingly (see line 135).

      (5) Lines 166ff and Figure 1, a bit out of order also Figure 5: The general hypothesis is that NRPE1 redistributes to heterochromatic regions in h1 mutants (as do other chromatin modifiers), but the data seem to only support a higher occurrence at target sites.

      a. The way the NRPE1 data is displayed makes it seem like there is much more NRPE1 in the h1 samples, even at peaks that should not be recruiting more as they do not represent "long" TEs. It would be good to present more gbrowse shots of all peak classes.

      We now clarify that h1 does result in a general increase of NRPE1 throughout the genome, but the effect is strongest at heterochromatin. In our initial version of the manuscript, we summarised the effect as follows: “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions” (previous version line 83, current version line 95). We have modified the language at several locations throughout the manuscript to make this point more clearly (abstract line 26, results line 245, discussion line 265). We include several browser shots in Supp Fig. 8.

      b. The data are "normalized" how exactly?

      c. One argument of observing "gaining" and "losing" peaks is that there is redistribution of NRPE1 from euchromatic to heterochromatic sites. There should be an analysis and figure to corroborate the point (e.g. by comparing FRIP values). Figure 1b shows lower NRPE1 signals at the TE flanking regions. This could reflect a redistribution or a flawed normalization procedure.

      The data are normalised using a standardised pipeline by log2 fold change over input, after scaling each sample by mapped read depth using the bamCompare function in deepTools. This is now described in detail in the Materials and Methods line 365, with full code and pipelines available from GitHub (https://github.com/Zhenhuiz/H1-restricts-euchromatin-associated-methylation-pathways-from-heterochromatic-encroachment).
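
      The arithmetic behind "log2 fold change over input after depth scaling" can be illustrated with a minimal sketch. This is not the deepTools implementation: the function name, the choice to scale both samples down to the smaller library, and the pseudocount are assumptions made here for illustration, and the per-bin counts are hypothetical.

```python
import math

def log2_ratio_over_input(chip_counts, input_counts, chip_depth, input_depth,
                          pseudocount=1.0):
    """Per-bin log2(ChIP / input) after scaling both samples to a common depth.

    Assumption: each sample is scaled down to the smaller of the two mapped
    read depths, and a pseudocount avoids division by zero in empty bins.
    """
    target = min(chip_depth, input_depth)
    chip_scale = target / chip_depth
    input_scale = target / input_depth
    return [
        math.log2((c * chip_scale + pseudocount) / (i * input_scale + pseudocount))
        for c, i in zip(chip_counts, input_counts)
    ]

# Hypothetical bins: the ChIP library is twice as deep as the input library.
ratios = log2_ratio_over_input([20, 40], [10, 10], chip_depth=2000, input_depth=1000)
```

      With these made-up numbers, the first bin has no enrichment once depth is accounted for, while the second bin is enriched over input.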

      d. Figure 1d and f show similar profiles comparing "long" and "short" TEs or "CMT2 dependent hypo-CHH" and "DRM2 dependent CHH". How do these categories relate to each other, how many fragments are redundant?

      The short vs long TEs were defined in Liu et al 2018 (doi: 10.1038/s41477-017-0100-y) and the DMRs were defined in Zhang et al. 2018 (doi: 10.1073/pnas.1716300115). There is likely to be some degree of overlap between the categories, but the numbers are very different (short TEs (n=820), long TEs (n=155), drm2 DMRs (n=5534), CMT (n=21784)), indicating that the different categories are informative. We have now listed all the regions used for analysis in this study in Supplementary Table 3.

      e. The purpose of the data presented in Figure 1b is to compare changes of NRPE1 association in H3K9me3 non-overlapping and overlapping TEs between wild-type and background, yet the figure splits the categories into two subpanels and provides neither a fold-change number nor a statistical test of the comparison. As before, the figure does not really support the idea that NRPE1 somehow redistributes from its "normal" sites towards heterochromatin, as both TE classes seem to show higher NRPE1 binding in h1 mutants.

      There is a statistically significant difference between WT and h1 at both the H3K9me2 marked and unmarked TEs; however, the size of the effect is larger for the H3K9me2 marked TEs (median difference of 0.41 vs 0.16). Median values have now been added to the boxplots so that this is directly viewable to the reader (Fig 1e). Although important exceptions exist (see Fig 5 – regions that lose NRPE1 and DNA methylation), this reflects the general increase in NRPE1 occupancy in h1 mutants throughout the genome, with a consistently stronger effect in heterochromatin. As noted above, we have updated the manuscript to make this point more clearly (abstract line 26, results line 245, discussion line 265).

      f. Panel g is the only attempt to corroborate the redistribution towards heterochromatic regions, but at this scale, the apparent reduction of binding in the chromosome arms may be driven by off-peak differences and normalization problems between different ChIP samples with different signal-to-noise-ratio.

      We describe our normalisation and informatic pipeline in more detail in the Materials and Methods line 365. It is also important to note that the reduction is not only observed at the chromosomal level, but also at specific sites. We called differential peaks between WT and the h1 mutant. The "Regions that gain NRPE1 in h1" peaks are more enriched in heterochromatic regions, while the "Regions that lose NRPE1 in h1" peaks are more enriched outside heterochromatic regions.

      g. Figure 5: how many regions gain vs lose NRPE1 in h1 mutants? If the "redistribution causes loss" scenario applies, the numbers should overall be balanced but that does not seem the case. The loss case appears to be rather exceptional judging from the zigzagging meta-plot. Are these sites related to the sites taken over by PcG-mediated repression in h1 mutants?

      As described in line 222 (previous version of the manuscript line 206), there are 15,075 sites that gain and 1,859 sites that lose NRPE1 in h1. Comparing these sites to H3K27me3 in the Teano et al. study was an excellent suggestion. We compared sites that gain NRPE1 to sites that gain H3K27me3 in h1, finding a statistically significant overlap (2.4 fold enrichment over expected, hypergeometric test p-value 2.1e-71). Reciprocally, sites that lose NRPE1 were significantly enriched for overlap with H3K27me3 loss regions (1.6 fold over expected, hypergeometric test p-value 1.4e-4). This indicates that RdDM and H3K27me3 patterning are similarly modulated by H1. To directly test this, we reanalysed the H3K27me3 ChIP-seq data from Teano et al., finding coincident gain and loss of H3K27me3 at sites that gain and lose NRPE1 in h1. These results are described from line 250 and in Fig 6e, which supports a general role for H1 in preventing heterochromatin encroachment.
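
      For readers unfamiliar with this kind of overlap test, the sketch below shows how a fold enrichment over expectation and a one-sided hypergeometric p-value can be computed from four counts. It is only an illustration under stated assumptions: the function name is hypothetical, the "universe" of testable regions must be defined by the analyst, and the numbers in the example are invented rather than taken from the study.

```python
from math import comb

def hypergeom_enrichment(universe, set_a, set_b, overlap):
    """Fold enrichment and upper-tail hypergeometric p-value for an overlap.

    universe: total number of regions that could have been drawn (assumed known)
    set_a, set_b: sizes of the two region sets being compared
    overlap: number of regions shared by both sets
    """
    expected = set_a * set_b / universe
    fold = overlap / expected
    # P(X >= overlap) when set_b regions are drawn without replacement
    # from a universe containing set_a "successes".
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    p_value = tail / comb(universe, set_b)
    return fold, p_value

# Invented toy numbers: 10 regions in total, two sets of 5, all 5 shared.
fold, p = hypergeom_enrichment(universe=10, set_a=5, set_b=5, overlap=5)
```

      The p-value depends strongly on how the universe is chosen, so that choice should be reported alongside the result.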

      (6) Lines 166ff and Figure 3: The data walk towards the scenario of pathway redistribution but actually find that RdDM plays a minor role overall as a substantial increase in heterochromatin regions occurs in all contexts and is largely independent of RdDM.

      a. How exactly are DNA-methylation data converted across regions to reach a fraction score from 0 to 1? There is no explanation in the legend or the methods that would allow one to recapitulate this.

      We now explain our methods in full in the Materials and Methods and all the code for generating these has now been deposited on GitHub (https://github.com/Zhenhuiz/H1-restricts-euchromatin-associated-methylation-pathways-from-heterochromatic-encroachment). Briefly, BSMAP is used to calculate the number of reads that are methylated vs unmethylated on a per-cytosine basis across the genome. Next, the DNA methylation fraction in each region is calculated by summing the per-cytosine methylation fractions (i.e. mC/(unmC+mC)) in a given window and dividing by the total number of cytosines in that same window. The result is expressed as a fraction ranging from 0 to 1: “0” indicates that the region is not methylated, and “1” indicates that the region is fully methylated (every cytosine is 100% methylated).
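
      The calculation described above can be sketched as follows. This is a minimal illustration rather than the deposited pipeline: the function names are hypothetical, the read counts are invented, and skipping cytosines with no coverage is an assumption made here, since the response does not say how such positions are handled.

```python
def cytosine_fraction(methylated_reads, unmethylated_reads):
    """Per-cytosine methylation fraction, mC / (unmC + mC)."""
    total = methylated_reads + unmethylated_reads
    # Assumption: a cytosine without any coverage carries no information.
    return methylated_reads / total if total else None

def region_methylation(cytosines):
    """Mean of per-cytosine fractions over one window.

    cytosines: list of (methylated_reads, unmethylated_reads) tuples,
    one per cytosine in the window. Returns a value from 0 to 1,
    or None if no cytosine in the window is covered.
    """
    fractions = [
        f for f in (cytosine_fraction(m, u) for m, u in cytosines)
        if f is not None
    ]
    if not fractions:
        return None
    return sum(fractions) / len(fractions)

# Invented window: one fully methylated and one unmethylated cytosine.
window_score = region_methylation([(10, 0), (0, 10)])
```

      Note that averaging per-cytosine fractions weights every covered cytosine equally, regardless of its read depth.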

      b. Kernel plots? These are slang for experts and should be better described. In addition, nothing is really concluded from these plots in the text, although they may be quite informative.

      Kernel density plots show the proportion of TEs that gain or lose methylation in a particular mutant, rather than the overall average as depicted in the methylation metaplots above. We now describe the kernel density plots in more detail in the Figure 3 legend. 
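
      As a rough illustration of what a kernel density plot computes, the sketch below builds a Gaussian kernel density estimate over per-TE methylation changes (mutant minus WT). The function name, the bandwidth, and the example values are all hypothetical and are not taken from the manuscript; plotting libraries usually choose the bandwidth automatically.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function giving the Gaussian kernel density estimate at x."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

# Invented per-TE methylation changes: three TEs lose methylation,
# five TEs are essentially unchanged.
deltas = [-0.30, -0.25, -0.28, 0.02, 0.00, 0.01, -0.02, 0.03]
density = gaussian_kde(deltas, bandwidth=0.05)
```

      In this toy example the curve has two bumps, one near zero and one near -0.28, which is the kind of subpopulation structure that an average metaplot would hide.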

      (7) Figure 4: This could be a very interesting analysis if the reader could actually understand it.

      a. The legend is minimal. What is the meaning of hypo and hyper regions indicated to the right of Figure 4c?

      b. The color scale represents observed/expected values. What exactly does this mean? Mutant vs WT?

      c. Some comparisons in 4a are cryptic, e.g. h1 nrpe1 nrpe1 vs CHH?

      d. Figure 4d focuses on a correlation square of relevance, but why? Interestingly the square does not correspond to any "hypo" or "hyper" label?

      Thank you, we have revised Figure 4 and legend based on these suggestions to clarify all of the above.

      (8) Lines 226 and Figure 6B. De novo (or increased) targeting of SUVH1 to heterochromatic sites in h1 mutants, similar to NRPE1, is used to support the argument that more access allows other chromatin modifiers to encroach. SUVH1 strongly depends on RdDM for its in vivo binding and may be the least conclusive factor to argue for a "general" encroachment mechanism.

      We appreciate the reviewer’s point here. Something that is entirely independent of RdDM following the same pattern would be stronger evidence in favour of general encroachment. Excitingly, this is exactly what we provide evidence for when investigating the interrelationship with H3K27me3 and we appreciate the reviewer’s suggestion to check this! This data is now described in Figure 6e and line 250.

      Minor:

      (1) Line 23: "Loss of H1 resulted in heterochromatic TE enrichment by NRPE1." This does not seem right. NRPE1 enrichment at TEs?

      Modified (line 26), thank you.

      (2) Lines 73-74: The idea that DDM1 displaces H1 in heterochromatic TEs is somewhat counterintuitive to the model that heterochromatic TEs are unavailable for RdDM because of the presence of H1. Is this displacement non-permanent and directly linked to interaction with CMT2/3 Met1?

      This is a very good question and we agree with the reviewer that the effect of DDM1 may only be transient or insufficient to allow for full RdDM assembly, or indeed there may be a direct interaction between DDM1 and CMTs/MET1. During preparation of these revisions, a structure of Arabidopsis nucleosome bound DDM1 was published, which provides some insight by showing that DDM1 promotes DNA sliding. This is at least consistent with the idea of DDM1 causing transient / non-permanent displacement of H1 that would be insufficient for RdDM establishment. We incorporate discussion of these ideas at line 80.

      (3) Line 85: A bit more background on the Reader activator complex should be given. In fact, the reader may not really care that it was more recently discovered (not really recent btw) but what does it actually do?

      We have quite extensively reconfigured this paragraph to take into account our new finding with H3K27me3, such that there is less emphasis on the reader activator complex. The sentence now reads as follows:

      “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions. This effect was not limited to RdDM, similarly impacting both the methylation reader complex component, SUVH1 (Harris et al., 2018) and polycomb-mediated H3K27me3 (Teano et al., 2023).” (line 95).

      Also, when describing the experiment in the results section (line 241), we now provide more background on SUVH1’s function.

      (4) Lines 80-81: Since it is already shown that RdDM associated small RNAs are more enriched in h1 at heterochromatin, help us to know what is precisely the added value of studying the enrichment of NRPE1 at these sites.

      Good point. We have the following line: ‘...small RNAs are not a direct readout of functional RdDM activity and Pol IV dependent small RNAs are abundant in regions of the genome that do not require RdDM for methylation maintenance and that do not contain Pol V (Stroud et al., 2014).’ (line 90)

      (5) Line 99: This seems to be the only time where the connection between long TEs and heterochromatic regions is mentioned but no source is cited.

      We have added the following appropriate citations: (Bourguet et al., 2021; Zemach et al., 2013). (line 110).

      (6) Line 100: DMRs is used for the first time here without explanation and full text. The abbreviation is introduced later in the text (Line 187).

      Thank you, we now describe DMRs upon first use, line 112.

      (7) Figure 2: Panels 2 c and d should show metaplots for WT and transgenes in one panel. There is something seriously wrong with the normalization in d or the scale for left and right panel is not the same. Neither legend nor methods describe how normalization was performed.

      Thank you for pointing this out; the figure has been corrected. We have updated the Materials and Methods (line 365) and have added code and pipelines to GitHub to explain the normalisation procedure in more detail (https://github.com/Zhenhuiz/H1-restricts-euchromatin-associated-methylation-pathways-from-heterochromatic-encroachment).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their constructive comments. Here is a summary of the main changes we made from the previous manuscript version, based on the reviewers’ comments:

      (1) Introduction of a new model, based on a Markov chain, capturing within-trial evolution in search strategy.

      (2) Addition of a new figure investigating inter-animal variations in search strategy.

      (3) Measurement of model fit consistency across 10 simulation repetitions, to prevent the risk of model overfitting.

      (4) Several clarifications have been made in the main text (Results, Discussion, Methods) and figure legends.

      (5) We now provide processed data and code for analyses and models in a GitHub repository.

      (6) Simplification of the previous modeling. We realized that the two first models in the previous manuscript version were simply special cases of the third model. Therefore, we retained only the third model, which has been renamed as the ‘mixture model’.

      (7) Modification (or creation) of Figures 4-6 and Supplementary Figures 7-8 to reflect the aforementioned changes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors design an automated 24-well Barnes maze with 2 orienting cues inside the maze, then model what strategies the mice use to reach the goal location across multiple days of learning. They consider a set of models and conclude that one of these models, a combined strategy model, best explains the experimental data.

      This study is concisely written and its results are concisely presented. The best fit model is reasonably simple and fits the experimental data well (at least the summary measures of the data that were presented).

      Major points:

      (1) One combined strategy (once the goal location is learned) that might seem to be reasonable would be that the animal knows roughly where the goal is, but not exactly where, so it first uses a spatial strategy just to get to the first vestibule, then switches to a serial strategy until it reaches the correct vestibule. How well would such a strategy explain the data for the later sessions? The best combined model presented in the manuscript is one in which the animal starts with a roughly 50-50 chance of a serial (or spatial strategy) from the start vestibule (i.e. by the last session before the reversal the serial and spatial strategies are at ~50-50 in Fig. 5d). Is it the case that even after 15 days of training the animal starts with a serial strategy from its starting point approximately half of the time? The broader point is whether additional examination of the choices made by the animal, combined with consideration of a larger range of possible models, would be able to provide additional insight into the learning and strategies the animal uses.

      Our analysis focused on the evolution of navigation strategies across days and trials. The reviewer raises the interesting possibility that navigation strategy might evolve in a specific manner within each trial, especially on the later days once the environment is learned. To address this possibility, we first examined how some of the statistical distributions, previously analyzed across days, evolved within trials. Consistent with the reviewer’s intuition, the statistical distributions changed within trials, suggesting a specific strategy evolution within trials. Second, we developed a new model, where strategies are represented as nodes of a Markov chain. This model allows potential strategy changes after each vestibule visit, according to a specific set of transition probabilities. Vestibules are chosen based on the same stochastic processes as in the previous model. This new model could be fitted to the experimental distributions and captured both the within-trial evolution and the global distributions. Interestingly, the trials were mostly initiated in the random strategy (~67% chance) and to a lesser extent in the spatial strategy (~25% chance), but rarely in the serial strategy (~8% chance). This new model is presented in Figure 6.
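
      The structure of such a model, with strategies as states and a possible switch after each vestibule visit, can be sketched as below. Only the approximate initial probabilities (~67% random, ~25% spatial, ~8% serial) come from the response; the transition probabilities, names, and function signatures are hypothetical placeholders, and the strategy-specific rule for choosing the next vestibule is deliberately left out.

```python
import random

STRATEGIES = ("random", "spatial", "serial")

# Approximate initial-state probabilities quoted in the response.
INITIAL = {"random": 0.67, "spatial": 0.25, "serial": 0.08}

# Hypothetical transition probabilities (each row sums to 1); the fitted
# values are not given in the response.
TRANSITIONS = {
    "random":  {"random": 0.70, "spatial": 0.20, "serial": 0.10},
    "spatial": {"random": 0.10, "spatial": 0.60, "serial": 0.30},
    "serial":  {"random": 0.05, "spatial": 0.05, "serial": 0.90},
}

def draw(probabilities, rng):
    """Sample one strategy name according to the given probabilities."""
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

def simulate_strategy_sequence(n_visits, rng):
    """Strategy in force at each vestibule visit of one simulated trial."""
    state = draw(INITIAL, rng)
    sequence = [state]
    for _ in range(n_visits - 1):
        state = draw(TRANSITIONS[state], rng)
        sequence.append(state)
    return sequence

sequence = simulate_strategy_sequence(10, random.Random(0))
```

      In a full model, each state would additionally select a vestibule according to its own stochastic process, and the trial would end when the goal vestibule is reached rather than after a fixed number of visits.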

      (2) To clarify, in the Fig. 4 simulations, is the "last" vestibule visit of each trial, which is by definition 0, not counted in the plots of Fig. 4b? Otherwise, I would expect that vestibule 0 is overrepresented because a trial always ends with Vi = 0.

      The last vestibule visit (vestibule 0 by definition) is counted in the plots of Fig. 4b. We initially shared the same concern as the reviewer. However, upon further consideration, we arrived at the following explanation: A factor that might lead to an overrepresentation of vestibule 0 is the fact that, unlike other vestibules, it has to be contained in each trial, as trials terminated upon the selection of vestibule 0. Conversely, a factor that might contribute to an underrepresentation of vestibule 0 is that, unlike other vestibules, it cannot be counted more than once per trial. These two factors seem to counterbalance each other, resulting in no discernible overrepresentation or underrepresentation of vestibule 0 in the random process.
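
      This counterbalancing can be checked numerically with a small simulation. The sketch assumes the simplest possible random process, namely independent uniform draws over 24 vestibules with revisits allowed, which may differ from the random process used in the manuscript; the function name and seed are illustrative.

```python
import random

def pooled_visit_counts(n_trials, n_vestibules=24, seed=0):
    """Count vestibule visits pooled over trials that each end at vestibule 0.

    Assumption: every visit is an independent uniform draw, and a trial
    stops as soon as vestibule 0 (the goal) is drawn.
    """
    rng = random.Random(seed)
    counts = [0] * n_vestibules
    for _ in range(n_trials):
        while True:
            vestibule = rng.randrange(n_vestibules)
            counts[vestibule] += 1
            if vestibule == 0:
                break
    return counts

counts = pooled_visit_counts(20000)
share_of_goal = counts[0] / sum(counts)
```

      Under this assumption the pooled visits form one long sequence of independent uniform draws that is simply cut after every 0, so the expected share of vestibule 0 is 1/24, neither over- nor underrepresented.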

      Reviewer #2 (Public Review):

      This paper uses a novel maze design to explore mouse navigation behaviour in an automated analogue of the Barnes maze. Overall I find the work to be solid, with the cleverly designed maze/protocol to be its major strength - however there are some issues that I believe should be addressed and clarified.

      (1) Whilst I'm generally a fan of the experimental protocol, the design means that internal odor cues on the maze change from trial to trial, along with cues external to the maze such as the sounds and visual features of the recording room, ultimately making it hard for the mice to use a completely allocentric spatial 'place' strategy to navigate. I do not think there is a way to control for these conflicts between reference frames in the statistical modelling, but I do think these issues should be addressed in the discussion.

      It should be pointed out that all cues on the maze (visual, tactile, odorant) remained unchanged across trials, since the maze was rotated together with goal and guiding cues. Furthermore, the maze was equipped with an opaque cover to prevent mice from seeing the surrounding room (the imaging of mouse trajectories was achieved using infrared light and camera). It is, however, possible that some other cues, such as room sounds and odors, could be perceived and interfere somewhat with the sensory cues provided inside the maze. We have now mentioned this possibility in the discussion.

      (2) Somewhat related - I could not find how the internal maze cues are moved for each trial to demarcate the new goal (i.e. the luminous cues)? This should be clarified in the methods.

      The luminous cues were fixed to the floor of the arena. Consequently, they rotated along with the arena as a unified unit, as depicted in Figure 1. We have added some clarifications in the Figure 1 legend and the Methods.

      (3) It appears some data is being withheld from Figures 2&3? E.g. Days 3/4 from Fig 2b-f and Days 1-5 on for Fig 3. Similarly, Trials 2-7 are excluded from Fig 3. If this is the case, why? It should be clarified in the main text and Figure captions, preferably with equivalent plots presenting all the data in the supplement.

      The statistical distributions for all single days/trials are shown in the color-coded panels of Figures 2 and 3. In the line plots of Figures 2 and 3, we show only the overlay of 2-3 lines for the sake of clarity. The days/trials represented were chosen to capture the dynamic range of variability within the distributions. We have added this information in the figure legends.

      (4) I strongly believe the data and code should be made freely available rather than "upon reasonable request".

      Matrices of processed data and code for simulations and analyses are now available at https://github.com/sebiroyerlab/Vestibule_sequences.

      Reviewer #3 (Public Review):

      Royer et al. present a fully automated variant of the Barnes maze to reduce experimenter interference and ensure consistency across trials and subjects. They train mice in this maze over several days and analyze the progression of mouse search strategies during the course of the training. By fitting models involving stochastic processes, they demonstrate that a model combined of the random, spatial, and serial processes can best account for the observed changes in mice's search patterns. Their findings suggest that across training days the spatial strategy (using local landmarks) was progressively employed, mostly at the expense of the random strategy, while the serial strategy (consecutive nearby vestibule check) is reinforced from the early stages of training. Finally, they discuss potential mechanistic underpinnings within brain systems that could explain such behavioral adaptation and flexibility.

      Strength:

      The development of an automated Barnes maze allows for more naturalistic and uninterrupted behavior, facilitating the study of spatial learning and memory, as well as the analysis of the brain's neural networks during behavior when combined with neurophysiological techniques. The system's design has been thoughtfully considered, encompassing numerous intricate details. These details include the incorporation of flexible options for selecting start, goal, and proximal landmark positions, the inclusion of a rotating platform to prevent the accumulation of olfactory cues, and careful attention given to automation, taking into account specific considerations such as the rotation of the maze without causing wire shortage or breakage. When combined with neurophysiological manipulations or recordings, the system provides a powerful tool for studying the spatial navigation system.

      The behavioral experiment protocols, along with the analysis of animal behavior, are conducted with care, and the development of behavioral modeling to capture the animal's search strategy is thoughtfully executed. It is intriguing to observe how the integration of these innovative stochastic models can elucidate the evolution of mice's search strategy within a variant of the Barnes maze.

      Weakness:

      (1) The development of the well-thought-out automated Barnes maze may attract the interest of researchers exploring spatial learning and memory. However, this aspect of the paper lacks significance due to insufficient coverage of the materials and methods required for readers to replicate the behavioral methodology for their own research inquiries.

      Moreover, as discussed by the authors, the methodology favors specialists who utilize wired recordings or manipulations (e.g. optogenetics) in awake, behaving rodents. However, it remains unclear how the current maze design, which involves trapping mice in start and goal positions and incorporating angled vestibules resulting in the addition of numerous corners, can be effectively adapted for animals with wired implants.

      The reviewer is correct in pointing out that the current maze design is not suitable for performing experiments with wired implants, particularly due to the maze’s enclosed structure and the access to the start/goal boxes through side holes. Instead, pharmacogenetics and wireless approaches for optogenetics and electrophysiology would need to be used. We have now mentioned this limitation in the discussion.

      (2) Novelty: In its current format, the main axis of the paper falls on the analysis of animal behavior and the development of behavioral modeling. In this respect, while it is interesting to see how thoughtfully designed models can explain the evolution of mice search strategy in a maze, the conclusions offer limited novel findings that align with the existing body of research and prior predictions.

      We agree with the reviewer that our study is weakly connected to previous research on the hippocampus and spatial navigation, as it consists mainly of animal behavior analysis and modeling and addresses a relatively unexplored topic. We hope that combining our behavioral approach with optogenetics and electrophysiology will in the future yield new insights that are in line with the existing body of research.

      (3) Scalability and accessibility: While the approach may be intriguing to experts who have an interest in or are familiar with the Barnes maze, its presentation seems to primarily target this specific audience. Therefore, there is a lack of clarity and discussion regarding the scalability of behavioral modeling to experiments involving other search strategies (such as sequence or episodic learning), other animal models, or the potential for translational applications. The scalability of the method would greatly benefit a broader scientific community. In line with this view, the paper's conclusions heavily rely on the development of new models using custom-made codes. Therefore, it would be advantageous to make these codes readily available, and if possible, provide access to the processed data as well. This could enhance comprehension and enable a larger audience to benefit from the methodology.

      The current approach might indeed extend to other species in equivalent environments and might also constitute a general proof of principle regarding the characterization of animal behaviors by the mixing of stochastic processes. We have now mentioned these points in the discussion.

      As suggested by the reviewer, we have now provided model/simulation code and processed data to replicate the figures, at https://github.com/sebiroyerlab/Vestibule_sequences

      (4) Cross-validation of models: The authors have not implemented any measures to mitigate the risk of overfitting in their modeling. It would have been beneficial to include at least some form of cross-validation with stochastic models to address this concern. Additionally, the paper lacks the presence of analytics or measures that assess and compare the performance of the models.

      To avoid the risk of model overfitting, the most appropriate solution appeared to be repeating the simulations several times and examining the consistency of the obtained parameters across repetitions. For the mixture model, we now show in Supplementary figure 7 the probabilities obtained from 10 repetitions of the simulation. Similarly, for the Markov chain model, the probabilities obtained from 10 repetitions of the simulation are shown in Figure 6.

      Regarding model comparison, we have simplified our mixture model into a single model, as we realized that the 2 other models in the previous manuscript version were simply special cases of the 3rd model. Nevertheless, comparison was still needed to estimate the best value of N (the number of consecutive segments that a strategy lasts) in the mixture model. We now show the comparison of mean square errors obtained for different values of N, using t-tests across 10 repetitions of the simulations (Figure 5c).
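
      To make the comparison across values of N concrete, the following is a minimal Python sketch, not the authors' code. It assumes that each value of N yields one mean square error per repetition and that the repetitions are independent; the response does not state which t-test variant was used, so Welch's t statistic is swapped in here as an assumption. The function name `welch_t`, the dictionary `mse_by_n`, and all numbers are placeholders invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Placeholder mean square errors: 10 repetitions per candidate N (made-up values).
mse_by_n = {
    1: [0.30, 0.32, 0.29, 0.31, 0.33, 0.30, 0.28, 0.31, 0.32, 0.30],
    2: [0.20, 0.21, 0.19, 0.22, 0.20, 0.21, 0.18, 0.20, 0.22, 0.19],
    3: [0.25, 0.26, 0.24, 0.27, 0.25, 0.26, 0.23, 0.25, 0.27, 0.24],
}

# Candidate N with the lowest average error across repetitions.
best_n = min(mse_by_n, key=lambda n: mean(mse_by_n[n]))

# Compare every other N against the best one; a large positive t means higher error.
for n, errors in mse_by_n.items():
    if n != best_n:
        t = welch_t(errors, mse_by_n[best_n])
        print(f"N={n} vs N={best_n}: t = {t:.2f}")
```

      The statistic would still need to be compared against a t distribution to obtain a p-value, and a paired test may be more appropriate if repetitions share random seeds across values of N.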

      (5) Quantification of inter-animal variations in strategy development: It is important to investigate, and address the argument concerning the possibility that not all animals recruit and develop the three processes (random, spatial, and serial) in a similar manner over days of training. It would be valuable to quantify the transition in strategy across days for each individual mouse and analyze how the population average, reflecting data from individual mice, corresponds to these findings. Currently, there is a lack of such quantification and analysis in the paper.

      We have added a figure (Supplementary figure 8) showing the mixture model matching analyses for individual animals. A lot of variability is indeed observed across animals, with some animals displaying strong preferences for certain strategies compared to others. The average across the mouse population showed a trend similar to the result obtained with the pooled data.

      Recommendations for the authors:

      Summary of Reviewer Comments:

      (1) In its present form, the manuscript lacks sufficient coverage of the materials and methods necessary for readers to replicate the behavioral methodology in their own research inquiries. For instance, it would be beneficial to clarify how the cues are rotated relative to the goal.

      (2) The models may be over-fitted, leading to spurious conclusions, and cross-validation is necessary to rule out this possibility.

      (3) The specific choice of the three strategies used to fit behavior in this model should be better justified, as other strategies may account for the observed behavior.

      (4) The study would benefit from an analysis of behavior on an animal-by-animal basis, potentially revealing individual differences in strategies.

      (5) Spatial behavior is not necessarily fully allocentric in this task, as only the two cues in the arena can be used for spatial orientation, unlike odor cues on the floor and sound cues in the room. This should be discussed.

      (6) Making the data and code fully open source would greatly strengthen the impact of this study.

      In addition, each reviewer has raised both major and minor concerns which should be addressed if possible.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) Change "tainted" to "tinted" in Fig. 1a

      (2) Should note explicitly in Fig. 2d that the goal is at vestibule 0, and also in the legend

      (3) Fig. 3 legend should say "c-e)", not "c-f)"

      (4) Supplementary Fig. 8 legend repeats "d)" twice

      Reviewer #2 (Recommendations For The Authors):

      Packard & McGaugh 1996 is cited twice as refs 5 and 14

      Reviewer #3 (Recommendations For The Authors):

      - Figure 3: Please correct the labels referenced as "c-f)" in the figure's legend.

      - Rounding numbers issue on page 4: 82.62% + 17.37% equals 99.99%, not 100%.

      We fixed all minor points. We are very thankful to the reviewers for their constructive comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful to the reviewers and the editor for their detailed feedback, insightful suggestions, and thoughtful assessment of our work. Our point-by-point responses to the comments and suggestions are below.

      The revised manuscript has taken into account all the comments of the three reviewers. Modifications include corrections to errors in spelling and unit notation, additional quantification, improvements to the clarity of the language in some places, as well as additional detail in the descriptions of the methods, and revisions to the figures and figure legends.

      We have also undertaken additional analyses and added materials in response to reviewer suggestions. In brief:

      In response to a suggestion from Reviewer #1, we added Figure 6-1 to show examples of the calcium traces of individual fish and individual ROIs from the condensed data in Figure 6. We revised Figure 7 as follows:

      • We added an analysis of the duration of the response to shock to address comments from Reviewers #2 and #3.

      • In response to Reviewer #3, we added histograms showing the distribution of the amplitudes of the calcium signals in the gsc2 and rln3a neurons to show, without relying on the detection of peaks in the calcium trace, that the rln3a neurons have more oscillations in activity.

      We added Figure 8-2 in response to the suggestion from Reviewer #3 to analyze turning behavior in larvae with ablated rln3a neurons.

      To address Reviewer #2’s suggestion to show how the ablated transgenic animals compare to the non-ablated transgenic animals of the same genotype, we have added this analysis as Figure 8-3.

      A detailed point-by-point is as follows:

      The reviewers agree that the study of Spikol et al is important, with novel findings and exciting genetic tools for targeting cell types in the nucleus incertus. The conclusions are overall solid. Results could nonetheless be strengthened by performing a few additional optogenetic experiments and by consolidating the analysis of calcium imaging and behavioral recordings as summarized below.

      (1) Light pulses used for optogenetic-mediated connectivity mapping were very long (5 s), which could lead to non-specific activation of neuronal populations other than the targeted ones. To confirm their results, the authors should repeat their experiments with brief light pulses of 5-50 ms (500 ms maximum) for stimulation.

      As the activity of the gsc2 neurons is already increased 1.8-fold (± 0.28) within the first frame in which the laser is activated (duration ~200 ms), it is unlikely that the observed response is due to non-specific activation induced by the long light pulse.

      (2) In terms of analysis, the authors should improve:

      a) The detection of calcium events in the "calcium trace" showing the change in fluorescence over time by detecting the sharp increase in the signal when intracellular calcium rises;

      We have added an additional analysis to Figure 7 that does not rely on detection of calcium peaks. See response to Reviewer #3.

      b) The detection of bouts in the behavioral recordings by measuring when the tail beat starts and ends, thereby distinguishing the active swimming during bouts from the immobility observed between bouts.

      Our recordings capture the entire arena that the larva can explore in the experiment and therefore lack the spatial resolution to capture and analyze the tail beat. Rather, we measured the frequency and length of phases of movement in which the larva shows no more than 1 second of immobility. To avoid confusion with studies that measure bouts from the onset of tail movement, we removed this term from the manuscript and refer to activity as phases of movement.
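
      The definition of a phase of movement given above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' tracking code: it assumes a per-frame flag saying whether the larva moved (how that flag is derived is not described here), and the function name `movement_phases`, the 5 Hz frame rate, and the example sequence are all hypothetical.

```python
def movement_phases(moving, frame_rate_hz, max_pause_s=1.0):
    """Return (start, end) frame pairs (end exclusive) for each phase of movement.

    Moving frames are merged into one phase as long as the immobile gap
    between them lasts no more than `max_pause_s` seconds.
    """
    max_gap = int(max_pause_s * frame_rate_hz)  # longest tolerated pause, in frames
    phases = []
    start = None      # first frame of the current phase
    last_move = None  # most recent moving frame
    for i, is_moving in enumerate(moving):
        if not is_moving:
            continue
        if start is None:
            start = i
        elif i - last_move - 1 > max_gap:
            # The pause was too long: close the current phase and open a new one.
            phases.append((start, last_move + 1))
            start = i
        last_move = i
    if start is not None:
        phases.append((start, last_move + 1))
    return phases


# Hypothetical 5 Hz recording: a 2-frame pause is bridged, a 6-frame pause is not.
moving = [1, 1, 0, 0, 1] + [0] * 6 + [1, 1]
phases = movement_phases(moving, frame_rate_hz=5)
durations_s = [(end - start) / 5 for start, end in phases]
print(phases, durations_s)
```

      With this definition, phase frequency is the number of phases per unit time and phase duration is the span of each pair divided by the frame rate, which is why such phases can last far longer than a tail-beat-defined bout.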

      (3) The reviewers also ask for more precision in the characterization of the newly-generated knock-in lines and the corresponding anatomy as explained in their detailed reports.

      Please refer to the point-by-point request for additional details that have now been added to the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The conclusions of this paper are mostly well supported by data, but some technical aspects, especially about calcium imaging and data analysis, need to be clarified.

      (1) Both the endogenous gsc2 mRNA expression and Tg(gsc2:QF2) transgenic expression are observed in a neuronal population in the NI, but also in a more sparsely distributed population of neurons located more anteriorly (for example, Fig. 2B, Fig. 5A). The latter population is not mentioned in the text. It would be necessary to clarify whether or not this anterior population is also considered as the NI, and whether this population was included for the analysis of the projection patterns and ablation experiments.

      The sparsely distributed neurons had been mentioned in the Results, line 134, but we have now added more detail. In line 328, we have clarified that: “As the sparsely distributed anterior group of gsc2 neurons (Fig. 2B, C) are anatomically distinct from the main cluster and not within the nucleus incertus proper, they were excluded from subsequent analyses.”

      (2) Both Tg(gsc2:QF2) and Tg(rln3a:QF2) transgenic lines have the QF genes inserted in the coding region of the targeted genes. This probably leads to knock out of the gene in the targeted allele. Can the authors mention whether or not the endogenous expression of gsc2 and rln3a was affected in the transgenic larvae? Is it possible that the results they obtained using these transgenic lines are affected by the (heterozygous or homozygous) mutation of the targeted genes?

      Figure 8-1 includes in situ hybridization for gsc2 and rln3a in heterozygous Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 and Tg(rln3a:QF2; he1.1:YFP)c836; Tg(QUAS:GFP)c578 transgenic larvae.

      The expression of gsc2 is unaffected in Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 heterozygotes (Fig. 8-1A), whereas the expression of rln3a is reduced in Tg(rln3a:QF2; he1.1:YFP)c836; Tg(QUAS:GFP)c578 heterozygous larvae (Fig. 8-1D), as mentioned in the legend for Figure 8-1. We confirmed these findings by comparing endogenous gene expression between transgenic and non-transgenic siblings that were processed for RNA in situ hybridization in the same tube.

      The behavioral results we obtained are not due to rln3a heterozygosity because comparisons were made with sibling larvae that are also heterozygous for Tg(rln3a:QF2; he1.1:YFP)c836; Tg(QUAS:GFP)c578, as stated in the Figure 8 legend.

      (3) Optogenetic activation and simultaneous calcium imaging is elegantly designed using the combination of the orthogonal Gal4/UAS and QF2/QUAS systems (Fig. 6). However, I have some concerns about the analysis of calcium responses from a technical point of view. Their definition of ΔF/F in this manuscript is described as (F-Fmin)/(Fmax-Fmin) (see line 1406). This is confusing because it is different from the conventional definition of ΔF/F, which is (F-F0)/F0, where F0 is a baseline GCaMP fluorescence. Their way of calculating the ΔF/F is inappropriate for measuring the change in fluorescence relative to the baseline signal because it rather normalizes the amplitude of the responses across different ROIs. The same argument applies to the analyses done for Fig. 7.

      We have taken a careful look at our analyses and replotted the data using (F-F0)/F0. However, this only changes Y-axis values and does not change the shape of the calcium trace or the change in signal upon stimulation. Both metrics, (F-F0)/F0 and (F-Fmin)/(Fmax-Fmin), adjust the fluorescence values of each ROI to its own baseline.
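
      As a hedged illustration of the two normalizations discussed in this point (not the authors' analysis code), the minimal Python sketch below applies (F-F0)/F0 and (F-Fmin)/(Fmax-Fmin) to two made-up traces. The function names, the use of the first three frames as the baseline for F0, and all fluorescence values are assumptions introduced for the example.

```python
def delta_f_over_f(trace, baseline_frames):
    """Conventional dF/F: (F - F0) / F0, with F0 the mean of the baseline frames."""
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    return [(f - f0) / f0 for f in trace]


def min_max_normalize(trace):
    """Min-max scaling: (F - Fmin) / (Fmax - Fmin), mapping each trace onto [0, 1]."""
    f_min, f_max = min(trace), max(trace)
    return [(f - f_min) / (f_max - f_min) for f in trace]


# Made-up fluorescence values (arbitrary units); the first 3 frames are baseline.
dim_roi = [100.0, 100.0, 100.0, 150.0, 200.0, 150.0, 100.0]
bright_roi = [1000.0, 1000.0, 1000.0, 1050.0, 1100.0, 1050.0, 1000.0]

for name, trace in [("dim", dim_roi), ("bright", bright_roi)]:
    peak_dff = max(delta_f_over_f(trace, 3))
    peak_minmax = max(min_max_normalize(trace))
    print(name, peak_dff, peak_minmax)
```

      Both transforms are affine in F, so the shape of each trace is unchanged. In this toy case, however, only (F-F0)/F0 preserves the tenfold difference in relative response between the two hypothetical ROIs, whereas min-max scaling maps both peaks to 1.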

      (4) The %ΔF/F plots shown in Fig. 6 are highly condensed, showing the average of different ROIs (cells) within one fish and then the average of multiple fish. It would be helpful to see example calcium traces of individual ROIs and individual fish to know the variability across ROIs and fish. Also, it would be helpful to know how much laser power (561 nm laser) was used to photostimulate ReaChR.

      Laser power (5%) was added to the section titled Calcium Signaling in Methods.

      In Figure 6, shading in the %ΔF/F plots (D, D’, E, E’, F, F’, G, G’, H, H’) represents the variability across ROIs, and the dot plots (D’’, E’’, F’’, G’’, H’’) show the variability across fish (where each data point represents an individual fish). We have now also added Figure 6-1 with examples of calcium traces from individual fish and individual ROIs.

      (5) Some calcium traces presented in Fig. 6 (Fig. 6D, D', F, H, H') show discontinuous fluctuations at the onset and offset of the photostimulation period. Is this caused by some artifacts introduced by switching the settings for the photostimulation? The authors should mention if there are some alternative explanations for this discontinuity.

      As noted by the reviewer, this artifact does result from switching the settings for photostimulation, which we mention in the legend for Figure 6.

      (6) In the introduction, they mention that the griseum centrale is a presumed analogue of the NI (lines 74-75). It would be helpful for the readers to better understand the brain anatomy if the authors could discuss whether or not their findings on the gsc2 and rln3a NI neurons support this idea.

      Our findings on the gsc2 and rln3a neurons support the idea that the griseum centrale of fish is the analogue of the mammalian NI. We have now edited the text in the third paragraph of the discussion, line 1271, to make this point more clearly: “By labeling with QUAS-driven fluorescent reporters, we determined that the anatomical location, neurotransmitter phenotype, and hodological properties of gsc2 and rln3a neurons are consistent with NI identity, supporting the assertion that the griseum centrale of fish is analogous to the mammalian NI. Both groups of neurons are GABAergic, reside on the floor of the fourth ventricle and project to the interpeduncular nucleus.”

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) Throughout the figures a need for more precision and reference in the anatomical evidence:

      • Specify how many planes over which height were projected for each Z-projection in Figure 1,2,3, ....

      We added this information to the last paragraph of the section titled Confocal Imaging within the Materials and Methods.

      • Provide the rhombomere numbers, delineate the ventricles & always indicate on the panel the orientation (Rostral Caudal, Left Right or Ventral Dorsal) for Figure 1 panels D-F, Figure 2-1B-G, Figure 2-2A-C in the adult brain, Figure 3.

      We annotated Figures 2-1 and 2-2 as suggested. We also indicated the orientation (anterior to the top or anterior to the left) in all figure legends. For additional context on the position of gsc2 and rln3a neurons within the larval brain, refer to Fig. 1A-C’, Fig. 1-2A, Fig. 2, Fig. 4 and Fig. 5.

      • Add close up when necessary: Figure 2-2A-C, specify in the text & in the figure where are the axon bundles from the gsc2+ neurons in the adult brain- seems interesting and is not commented on?

      We added a note to the legend of Figure 2-2: Arrowheads in B and B’ indicate mApple labeling of gsc2 neuronal projections to the hypothalamus. We also refer to Fig 2-2B, B’ in the Results section titled Distinct Projection Patterns of gsc2 and rln3a neurons.

      • keep the same color for one transgene within one figure: example, glutamatergic neurons should always be the same color in A,B,C - it is confusing as it is.

      We have followed the reviewer’s suggestion and made the color scheme consistent in Figure 3.

      • Movies: add the labels (which transgenic lines in which color, orientation & anatomical boundaries for NI, PAG, any other critical region that receives their projections and the brain ventricle boundaries) on the anatomical movies in supplemental (ex Movie 4-1 for gsc2 neurons and 4-2 for rln3 neurons: add cerebellum, IPN, raphe, diencephalon, and rostral and caudal hypothalamus, medulla for 4-1 as well as lateral hypothalamus and optic tectum for 4-2); add the ablated region when necessary.

      We added more detail to the movie legends. Please refer to Figure 4 for additional anatomical details.

      • To highlight projections from NI neurons and distinguish them from the PAG neurons, the authors elegantly used 2-photon ablation of one versus the other cluster: this method is valid, but we need more resolution than the Z stacks added in the supplemental material, obtained by performing subtraction of before and after maps.

      We are not sure what the reviewer meant by subtraction as there are no before and after images in this experiment. Larvae underwent ablation of cell bodies and were imaged one day later in comparison to unablated larvae.

      In particular, it is not clear to me if both PAG and NI rln3a neurons project to medulla - can the authors specify this point & the comparison between intact & PAG vs NI ablation maps? The authors should resolve better the projections to all targeted regions of NI gsc2 neurons and differentiate them from other PAG gsc2 neurons, same for rln3a neurons.

      We have clarified this point on line 549.

      Make sure to mention in the result section the duration between ablation & observation that is key for the axons to degrade.

      We always assessed degeneration of neuronal processes at 1-day post-ablation.

      (2) calcium imaging experiments:

      a) with optogenetic connectivity mapping:

      the authors combine an impressive diverse set of optogenetic actuators & sensors by taking advantage of the QUAS/QF2 and UAS/GAL4 systems to test connectivity from Hb-IPN onto gsc2 and rln3 neurons.

      The experiments are convincing but the choice of the duration of the stimulation (5s) is not adequate to test for direct connectivity: the authors should make sure that response in gsc2 neurons is observed with short duration (50ms-1s max).

      As noted above:

      “As the activity of the gsc2 neurons is already increased 1.8-fold (± 0.28) within the first frame in which the laser is activated (duration ~200 ms), it is unlikely that the observed response is due to non-specific activation induced by the long light pulse.”

      note: Specify that the gsc2 neurons tested are in NI.

      We have edited the text accordingly in the Results section titled Afferent input to the NI from the dHb-IPN pathway.

      b) for the response to shock: in the example shown for rln3 neurons, the activity differs before and after the shock with long phases of inhibition that were not seen before. Is it representative? the authors should carefully stare at their data & make sure there is no difference in activity patterns after shock versus before.

      We reexamined the responses for each of the rln3a neurons individually and confirmed that, although oscillations in activity are frequent, the apparent inhibition (excursions below baseline) is an idiosyncratic feature of the particular example shown.

      (3) motor activity assay:

      a) there seems to be a misconception in the use of the word "bout" to estimate in panels H and I bout distance and duration and the analysis should be performed with the criterion used by all in the motor field:

      As we now know well from the work of many labs on larval zebrafish (Orger, Baier, Engert, Wyart, Burgess, Portugues, Bianco, Scott, ...), a bout is defined as a discrete locomotor event corresponding to a distance swum of typically 1-6 mm, bout duration is typically 200 ms, and larvae exhibit a bout every second or so during exploration (see Mirat et al Frontiers 2013; Marques et al Current Biology 2018; Rajan et al. Cell Reports 2022).

      Since the larval zebrafish has a low Reynolds number, it does not show much glide and its movement corresponds widely to the active phase of the tail beats.

      Instead of detecting the active (moving) frames as bouts, however, the authors report values that are far off and indicate a calibration error in the detection of movement: a bout cannot last for 5-10 s, nor can the fish swim more than 1 cm per bout (by the authors' definition, a bout lasts 5-10 s and corresponds to 10 cm, as 50 cm is covered in 5 bouts).

      The authors should therefore distinguish the active (moving) from inactive (immobile) phase of the behavior to define bouts & analyze the corresponding distance travelled and duration of active swimming. They would also benefit from calculating the % of time spent swimming in order to test whether the fish with ablated rln3 neurons change the fraction of the time spent swimming.

      As noted above:

      Our recordings capture the entire arena that the larva can explore in the experiment and therefore lack the spatial resolution to capture and analyze the tail beat. Rather, we measured the frequency and length of phases of movement in which the larva shows no more than 1 second of immobility. To avoid confusion with studies that measure bouts from the onset of tail movement, we removed this term from the manuscript and refer to activity as phases of movement.

      Note that a duration in seconds is not a length and that the corresponding symbol for seconds in a scientific publication is "s" and not "sec".

      We have corrected this.

      b) controls in these experiments are key as many clutches differ in their spontaneous exploration and there is a lot of variation for 2-min-long recordings (baseline is 115 s). The authors specify that the unablated controls are a mix of siblings; they should show us how the ablated transgenic animals compare to the non-ablated transgenic animals of the same clutch.

      The unablated Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 and Tg(rln3a:QF2, he1.1:YFP)c836; Tg(QUAS:GFP)c578 larvae in the control group are siblings of ablated larvae. We repeated the analyses using either the Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 or Tg(rln3a:QF2, he1.1:YFP)c836; Tg(QUAS:GFP)c578 larvae only as controls and added the results in Figure 8-3. Although the statistical power is slightly reduced due to a smaller number of samples in the control group, the conclusions are the same, as the behavior of Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 and Tg(rln3a:QF2, he1.1:YFP)c836; Tg(QUAS:GFP)c578 unablated larvae is indistinguishable.

      Minor comments:

      (1) Anatomy :

      • Add precision in the anatomy in Figure 1:

      • Improve contrast for cckb.

      The contrast is determined by the signal to background ratio from the fluorescence in situ hybridization. Increasing the brightness would increase both the signal and the background, as any modification must be applied to the whole image.

      • since the number of neurons seems low in each category, could you quantify the number of rln3+, nmbb+, gsc2+, cckb+ neurons in NI?

      Quantification of neuronal numbers has been added to the first Results section titled Identification of gsc2 neurons in the Nucleus Incertus, lines 219-224.

      note: indicate duration for the integral of the DF/F in s and not in frames.

      We have added this in the legends for Figures 6 and 7 and in Materials and Methods.

      (2) Genetic tools:

      To generate a driver line for the rln3+ neurons using the Q system, the authors used the promoter for the hatching gland in order to drive expression in a structure outside of the nervous system that turns on early and transiently during development: this is a very elegant approach that should be used by many more researchers.

      If the her1 construct was integrated together with the QF2 in the first exon of the rln3 locus as shown in Figure 2, the construct should be listed with a "," instead of a ";" behind rln3a:QF2 in the transgene name. Please edit the transgene name accordingly.

      We have edited the text accordingly.

      (3) Typos:

      GABAergic neurons is misspelled twice in Figure 3.

      Thank you for catching this. We have corrected the misspellings.

      Reviewer #3 (Recommendations For The Authors):

      • More analysis should be done to better characterize the calcium activity of gsc2 and rln3a populations. Specifically:

      Spontaneous activity is estimated by finding peaks in the time-series data, but the example in Fig7 raises concerns about this process: Two peaks for the gsc2 cell are identified while numerous other peaks of apparently similar SNR are not detected. Moreover, the inset images suggest GCaMP7a expression might be weaker in the gsc2 transgenic and as such, differences in peak count might be related to the SNR of the recordings rather than underlying activity. Overall, the process for estimating spontaneous activity should be more rigorous.

      To not solely rely on the identification of peaks in the calcium traces, we also plotted histograms of the amplitudes of the calcium signals for the rln3a and gsc2 neurons. The histograms show that the amplitudes of the rln3a calcium signals frequently occur at small and large values (suggesting large fluctuations in activity), whereas the amplitudes of the gsc2 calcium signals occur most frequently at median values. We added this analysis to a revised Figure 7.
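
      The idea of characterizing activity without peak detection can be illustrated with a small Python sketch. It is not the authors' analysis; the function name `amplitude_histogram`, the number of bins, and both example traces are assumptions made up for illustration. A strongly fluctuating trace piles its samples into the lowest and highest bins, whereas a steadier trace concentrates them in the middle bins.

```python
def amplitude_histogram(trace, n_bins=5):
    """Count how many samples of a trace fall into each of n_bins equal-width bins."""
    lo, hi = min(trace), max(trace)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    if width == 0:
        # A perfectly flat trace has no range; put every sample in the first bin.
        counts[0] = len(trace)
        return counts
    for value in trace:
        # The maximum value would index bin n_bins, so clamp it into the last bin.
        idx = min(int((value - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up normalized signals (arbitrary units), both spanning 0 to 10.
fluctuating = [0.0, 10.0, 0.0, 10.0, 1.0, 9.0, 0.0, 10.0]
steady = [4.0, 5.0, 6.0, 5.0, 5.0, 4.0, 6.0, 5.0, 0.0, 10.0]

print(amplitude_histogram(fluctuating))
print(amplitude_histogram(steady))
```

      Because every sample contributes to the histogram, this kind of summary does not depend on a peak-detection threshold, although it remains sensitive to how each trace is normalized before binning.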

      Interestingly, there are a number of large negative excursions in the calcium data for the rln3a cell - what is the authors' interpretation of these? Could it be that presynaptic inhibition via GABA-B receptors in dIPN might influence dIPN-innervating rln3a neurons?

      As noted above:

      We reexamined the responses for each of the rln3a neurons individually and confirmed that, although oscillations in activity are frequent, the apparent inhibition (excursions below baseline) is an idiosyncratic feature of the particular example shown.

      Regarding shock-evoked activity, the authors state "rln3a neurons showed ... little response to shock", yet the immediate response after shock appears very similar in gsc2 vs rln3a cells (approx 30 units on the dF/F scale). The subsequent time-course of the response is what appears to distinguish gsc2 versus rln3a; it might thus be useful to separately quantify the amplitude and decay time constant of the shock evoked response for the two populations.

      The reviewer is correct that the difference between the gsc2 and rln3a neurons in the response to shock is dependent on the duration of time post-shock that is analyzed. Thus, the more relevant feature is the length of the response rather than the size. To reflect this, we compared the average length of responses for the gsc2 and rln3a neurons. We have now added this analysis to Figure 7 and updated the text accordingly.

      • The difference in spontaneous locomotor behavior is interesting and the example tracking data suggests there might also be differences in turn angle distribution and/or turn chain length following rln3 NI ablations. I would recommend the authors consider exploring this.

      Thank you for this suggestion. We wrote additional code to quantify turning behavior and found that larvae with rln3a NI neurons ablated do indeed have a statistically significant increase in turning compared to other groups. We now show this analysis as Figure 8-2 and we added an explanation of the quantification of turning behavior to the Methods section titled Locomotor assay.

      • I didn't follow the reasoning in the discussion that activity of rln3a cells may control transitions between phases of behavioral activity and inactivity. The events (at least those that are detected) in Fig7 occur with an average interval exceeding 30 s, yet swim bouts occur at a frequency around 1 Hz. The authors should clarify their hypothesis about how these disparate timescales might be connected.

      As noted above:

      Our recordings capture the entire arena that the larva can explore in the experiment and therefore lack the spatial resolution to capture and analyze the tail beat. Rather, we measure the frequency and length of phases of movement in which the larva shows no more than 1 second of immobility. To avoid confusion with studies that measure bouts from the onset of tail movement, we removed this term from the manuscript and refer to activity as phases of movement.

      • Fig2-2: Images are ordered from (A, B, C) anterior to (A', B', C') posterior. It's not clear what this means and images appear to be in sequence A, A', B, B'.... Please clarify and consider including a cartoon of the brain in sagittal view showing location of sections indicated.

      We clarified the text in the Figure 2-2 legend and added a drawing of the brain showing the location of the sections.

      • In Fig7, why are 300 frames analyzed pre/post shock? Even for gsc2, the response appears complete in ~100 frames.

      Reviewer #2 also pointed out that the difference between the gsc2 and rln3a neurons in the response to shock is dependent on the duration of time post-shock that is analyzed. Thus, the more relevant feature is the length of the response rather than the size. To reflect this, we compared the average length of response for the gsc2 and rln3a neurons and modified the text and Figure as described above.

      • What are the large negative excursions in the calcium signal in the rln3a data (Fig7E)?

      See response to Reviewer #2, repeated below:

      We looked through the responses of each individual rln3a neuron and confirmed that, although oscillations in activity are frequent among the rln3a neurons, the apparent inhibition (excursions below baseline) is an idiosyncratic feature of the particular example shown.

      • There are several large and apparently perfectly straight lines in the fish tracking examples (Fig8) suggestive of tracking errors (ie. where the tracked centroid instantaneously jumps across the camera frame). Please investigate these and include analysis of the distribution of swim velocities to support the validity of the tracking data.

      The reason for this is indeed imperfect tracking resulting in frames in which the tracker does not detect the larva. The result is that the larva appears to move 1 cm or more in a single frame. However, analysis of the distribution of distances across all frames shows that these events (movement of 1 cm or more in a single frame) are rare (less than 0.04%), and there are no systematic differences that would explain the differences in locomotor behavior presented in Fig. 8. A summary of the data is as follows:

      Controls: 0.0249% of distances 1 cm or greater

      gsc2 neurons ablated: 0.0302% of distances 1 cm or greater

      rln3a NI neurons ablated: 0.0287% of distances 1 cm or greater

      rln3a PAG neurons ablated: 0.0241% of distances 1 cm or greater
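
      A minimal sketch of how such a percentage could be computed from centroid tracks is given below. It assumes that positions are already expressed in centimeters and that one (x, y) position is available per frame; the function name `jump_fraction` and its parameters are hypothetical and are not taken from the analysis code.

```python
import math


def jump_fraction(xs, ys, jump_cm=1.0):
    """Fraction of frame-to-frame displacements that are >= jump_cm.

    xs, ys: per-frame centroid coordinates in centimeters (assumed units).
    """
    steps = [
        math.hypot(x1 - x0, y1 - y0)
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
    ]
    if not steps:
        return 0.0
    return sum(d >= jump_cm for d in steps) / len(steps)
```

      Multiplying the returned fraction by 100 gives a percentage that can be compared across groups.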

      • Insufficient detail is provided in the methods about how swim bouts are detected (and their durations extracted) from the centroids tracking data. Please expand detail in this section.

      We added an explanation to the Methods section titled Locomotor assay.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study uses carefully designed experiments to generate a useful behavioural and neuroimaging dataset on visual cognition. The results provide solid evidence for the involvement of higher-order visual cortex in processing visual oddballs and asymmetry. However, the evidence provided for the very strong claims of homogeneity as a novel concept in vision science, separable from existing concepts such as target saliency, is inadequate.

      We appreciate the positive and balanced assessment from the reviewers. We agree that visual homogeneity is similar to existing concepts such as target saliency. We have tried our best to articulate our rationale for defining it as a novel concept. However, the debate about whether visual homogeneity is novel or related to existing concepts is completely beside the point, since that is not the key contribution of our study.

      Our key contribution is our quantitative model for how the brain could be solving generic visual tasks by operating on a feature space. In the literature there are no theories regarding the decision-making process by which the brain could be solving generic visual tasks. In fact, oddball search tasks, same-different tasks and symmetry tasks are never even mentioned in the same study because it is tacitly assumed that the underlying processes are completely different! Our work brings together these disparate tasks by proposing a specific computation that enables the brain to solve both types of tasks and providing evidence for it. This specific computation is a well-defined, falsifiable model that will need to be replicated, elaborated and refined by future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors define a new metric for visual displays, derived from psychophysical response times, called visual homogeneity (VH). They attempt to show that VH is explanatory of response times across multiple visual tasks. They use fMRI to find visual cortex regions with VH-correlated activity. On this basis, they declare a new visual region in the human brain, area VH, whose purpose is to represent VH for the purpose of visual search and symmetry tasks.

      Thank you for your concise summary. We appreciate your careful reading and thoughtful and constructive comments.

      Strengths:

      The authors present carefully designed experiments, combining multiple types of visual judgments and multiple types of visual stimuli with concurrent fMRI measurements. This is a rich dataset with many possibilities for analysis and interpretation.

      Thank you for your accurate assessment of the strengths of our study.

      Weaknesses:

      The datasets presented here should provide a rich basis for analysis. However, in this version of the manuscript, I believe that there are major problems with the logic underlying the authors' new theory of visual homogeneity (VH), with the specific methods they used to calculate VH, and with their interpretation of psychophysical results using these methods. These problems with the coherency of VH as a theoretical construct and metric value make it hard to interpret the fMRI results based on searchlight analysis of neural activity correlated with VH.

      We appreciate your concerns and have tried our best to respond fully to each of your specific points below.

      In addition, the large regions of VH correlations identified in Experiments 1 and 2 vs. Experiments 3 and 4 are barely overlapping. This undermines the claim that VH is a universal quantity, represented in a newly discovered area of the visual cortex, that underlies a wide variety of visual tasks and functions.

      We agree with you that the VH regions defined using the symmetry task and the search task do not overlap completely (as we have shown in Figure S13). However, this is to be expected for several reasons. First, the images in the symmetry task were presented at fixation, whereas the images in the visual search task were presented peripherally. Second, the lack of overlap could be due to variations across individuals. Indeed, considerable individual variability has been observed in the location of category-selective regions such as VWFA (Glezer and Riesenhuber 2013) and FFA (Weiner and Grill-Spector, 2012). We propose that testing the same participants on both search and symmetry tasks would reveal overlapping VH regions. We now acknowledge these issues in the Results (p. 26).

      Maybe I have missed something, or there is some flaw in my logic. But, absent that, I think the authors should radically reconsider their theory, analyses, and interpretations, in light of the detailed comments below, to make the best use of their extensive and valuable datasets combining behavior and fMRI. I think doing so could lead to a much more coherent and convincing paper, albeit possibly supporting less novel conclusions.

      We appreciate your concerns. We have tried our best to respond fully to each of your specific points below.

      THEORY AND ANALYSIS OF VH

      (1) VH is an unnecessary, complex proxy for response time and target-distractor similarity. VH is defined as a novel visual quality, calculable for both arrays of objects (as studied in Experiments 1-3) and individual objects (as studied in Experiment 4). It is derived from a center-to-distance calculation in a perceptual space. That space in turn is derived from the multi-dimensional scaling of response times for target-distractor pairs in an oddball detection task (Experiments 1 and 2) or in a same-different task (Experiments 3 and 4).

      The above statements are not entirely correct. Experiments 1 & 3 are oddball visual search experiments. Their purpose was to estimate the underlying perceptual space of objects.

      Proximity of objects in the space is inversely proportional to response times for arrays in which they were paired. These response times are higher for more similar objects. Hence, proximity is proportional to similarity. This is visible in Fig. 2B as the close clustering of complex, confusable animal shapes.

      VH, i.e. distance-to-center, for target-present arrays, is calculated as shown in Fig. 1C, based on a point on the line connecting the target and distractors. The authors justify this idea with previous findings that responses to multiple stimuli are an average of responses to the constituent individual stimuli. The distance of the connecting line to the center is inversely proportional to the distance between the two stimuli in the pair, as shown in Fig. 2D. As a result, VH is inversely proportional to the distance between the stimuli and thus to stimulus similarity and response times. But this just makes VH a highly derived, unnecessarily complex proxy for target-distractor similarity and response time. The original response times on which the perceptual space is based are far more simple and direct measures of similarity for predicting response times.

      We agree that VH brings no explanatory power to target-present searches, since target-present response times are a direct estimate of target-distractor similarity. However, we are additionally explaining target-absent response times. Target-absent response times are well known to vary systematically with image properties, but why they do so has not been clear in the literature.

      Our key conceptual advance lies in relating the neural response to a search array to the neural response of the constituent elements, and in proposing a decision variable using which participants can make both target-present and target-absent judgements on any search array.

      (2) The use of VH derived from Experiment 1 to predict response times in Experiment 2 is circular and does not validate the VH theory.

      The use of VH, a response time proxy, to predict response times in other, similar tasks, using the same stimuli, is circular. In effect, response times are being used to predict response times across two similar experiments using the same stimuli. Experiment 1 and the target present condition of Experiment 2 involve the same essential task of oddball detection. The results of Experiment 1 are converted into VH values as described above, and these are used to predict response times in Experiment 2 (Fig. 2F). Since VH is a derived proxy for response values in Experiment 1, this prediction is circular, and the observed correlation shows only consistency between two oddball detection tasks in two experiments using the same stimuli.

      We agree that it would be circular to use oddball search times in Experiment 1 to explain only target-present search times in Experiment 2, since they basically involve the same searches. However, we are explaining both target-present and target-absent search times in a unified framework; systematic variations in target-absent search times have been noted in the literature but never really explained. One could still simply say that target-absent search times are some function of the target-present search times, but this still doesn’t provide an explanation for how participants are making target-present and absent decisions. The existing literature contains models for how visual search might occur for a specific target and distractor but does not elucidate how participants might perform generic visual search where target and distractors are not known in advance.

      Our key conceptual advance lies in relating the neural response to a search array to the neural response of the constituent elements, and in proposing a decision variable using which participants can make both target-present and target-absent judgements on any search array.

      (3) The negative correlation of target-absent response times with VH as it is defined for target-absent arrays, based on the distance of a single stimulus from the center, is uninterpretable without understanding the effects of center-fitting. Most likely, center-fitting and the different VH metrics for target-absent trials produce an inverse correlation of VH with target-distractor similarity.

      We see no cause for concern with the center-fitting procedure, for several reasons. First, the best-fitting center remained stable despite many randomly initialized starting points. Second, the best-fitting center derived from one set of objects was able to predict the target-absent and target-present responses of another set of objects. Finally, the VH obtained for each object (i.e. distance from the best-fitting center) is strongly correlated with the average distance of that object from all other objects (Figure S1A). We have now clarified this in the Results (p. 11).

      The construction of the VH perceptual space also involves fitting a "center" point such that distances to center predict response times as closely as possible. The effect of this fitting process on distance-to-center values for individual objects or clusters of objects is unknowable from what is presented here. These effects would depend on the residual errors after fitting response times with the connecting line distances. The center point location and its effects on the distance-to-center of single objects and object clusters are not discussed or reported here.

      While it is true that the optimal center needs to be found by fitting to the data, there is no particular mystery to the algorithm: we are simply performing a standard gradient descent to maximize the fit to the data. We have described the algorithm clearly and are making our code public. We find that the algorithm yields stable optimal centers despite many randomly initialized starting points, and that the optimal center is able to predict responses to entirely novel images that were excluded during model training. We are making no assumption about the location of the center with respect to individual points. Therefore, we see no cause for concern regarding the center-finding algorithm.
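
      To make the idea of center fitting concrete, here is a minimal sketch under stated assumptions. It is not the published algorithm: the objective (correlation of distance-to-center with target-present response times minus its correlation with target-absent response times), the finite-difference gradient, and the step-halving rule are all illustrative choices, and every function and parameter name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fit_quality(center, present, rt_present, absent, rt_absent):
    """Higher is better: distance-to-center should correlate positively with
    target-present RTs and negatively with target-absent RTs."""
    d_present = [math.dist(v, center) for v in present]
    d_absent = [math.dist(v, center) for v in absent]
    return pearson(d_present, rt_present) - pearson(d_absent, rt_absent)


def fit_center(start, present, rt_present, absent, rt_absent,
               step=0.5, eps=1e-4, iters=200):
    """Gradient ascent on fit_quality with a finite-difference gradient.

    A step is accepted only if it improves the score; otherwise the step
    size is halved, so the returned score never falls below the start.
    """
    center = list(start)
    best = fit_quality(center, present, rt_present, absent, rt_absent)
    for _ in range(iters):
        grad = []
        for k in range(len(center)):
            bumped = center[:]
            bumped[k] += eps
            bumped_score = fit_quality(bumped, present, rt_present,
                                       absent, rt_absent)
            grad.append((bumped_score - best) / eps)
        trial = [c + step * g for c, g in zip(center, grad)]
        score = fit_quality(trial, present, rt_present, absent, rt_absent)
        if score > best:
            center, best = trial, score
        else:
            step /= 2  # backtrack when the step did not help
    return center, best
```

      Running `fit_center` from several random starting points and comparing the returned centers is one way to check the kind of stability described above.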

      Yet, this uninterpretable distance-to-center of single objects is chosen as the metric for VH of target-absent displays (VHabsent). This is justified by the idea that arrays of a single stimulus will produce an average response equal to one stimulus of the same kind. However, it is not logically clear why response strength to a stimulus should be a metric for homogeneity of arrays constructed from that stimulus, or even what homogeneity could mean for a single stimulus from this set. It is not clear how this VHabsent metric based on single stimuli can be equated to the connecting line VH metric for stimulus pairs, i.e. VHpresent, or how both could be plotted on a single continuum.

      Most visual tasks, such as finding an animal, are thought to involve building a decision boundary on some underlying neural representation. Even visual search has been portrayed as a signal-detection problem where a particular target is to be discriminated from a distractor. However, none of these formulations work in the case of generic visual tasks, where the target and distractor identities are unknown. We are proposing that, when we view a search array, the neural response to the search array can be deduced from the neural responses to the individual elements using well-known rules, and that decisions about an oddball target being present or absent can be made by computing the distance of this neural response from some canonical mean firing rate of a population of neurons. This distance-to-center computation is what we denote as visual homogeneity. We have revised our manuscript throughout to make this clearer and we hope that this helps you understand the logic better.
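
      The computation described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the model as implemented: the response to an array is taken to be the unweighted mean of the responses to its distinct items, each item is represented by a coordinate vector in perceptual space, and the names `array_response` and `visual_homogeneity` are hypothetical.

```python
import math


def array_response(items):
    """Model response to a search array as the mean of its items' vectors.

    items: list of equal-length coordinate vectors, one per distinct item
           type in the array (one vector for a target-absent array, two
           for a target-present array). Equal weighting is an assumption.
    """
    n = len(items)
    dims = len(items[0])
    return [sum(v[d] for v in items) / n for d in range(dims)]


def visual_homogeneity(items, center):
    """Distance of the array response from the reference center."""
    return math.dist(array_response(items), center)
```

      For example, with items at (0, 0) and (4, 0) and a center at (2, 3), the target-present array has a response at the midpoint (2, 0), which lies 3 units from the center, whereas the target-absent array made only of the first item lies farther away, at a distance of √13. A single criterion on this distance could then separate the two kinds of array.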

      It is clear, however, what should be correlated with difficulty and response time in the target-absent trials, and that is the complexity of the stimuli and the numerosity of similar distractors in the overall stimulus set. The complexity of the target, similarity with potential distractors, and the number of such similar distractors all make ruling out distractor presence more difficult. The correlation seen in Fig. 2G must reflect these kinds of effects, with higher response times for complex animal shapes with lots of similar distractors and lower response times for simpler round shapes with fewer similar distractors.

      You are absolutely correct that the stimulus complexity should matter, but there are no good measures for stimulus complexity. Moreover, considering what factors are correlated with target-absent response times is entirely different from asking what decision variable or template is being used by participants to solve the task.

      The example points in Fig. 2G seem to bear this out, with higher response times for the deer stimulus (complex, many close distractors in the Fig. 2B perceptual space) and lower response times for the coffee cup (simple, few close distractors in the perceptual space). While the meaning of the VH scale in Fig. 2G, and its relationship to the scale in Fig. 2F, are unknown, it seems like the Fig. 2G scale has an inverse relationship to stimulus complexity, in contrast to the expected positive relationship for Fig. 2F. This is presumably what creates the observed negative correlation in Fig. 2G.

      Taken together, points 1-3 suggest that VHpresent and VHabsent are complex, unnecessary, and disconnected metrics for understanding target detection response times. The standard, simple explanation should stand. Task difficulty and response time in target detection tasks, in both present and absent trials, are positively correlated with target-distractor similarity.

      Respectfully, we disagree with your assessment. Your last point is not logically consistent: response times for target-absent trials cannot be correlated with any target-distractor similarity, since there is no target in the first place in a target-absent array. We have shown that target-absent response times are, in fact, independent of experimental context, which means that they index an image property that is independent of any reference target (Results, p. 15; Section S4). This property is what we define as visual homogeneity.

      I think my interpretations apply to Experiments 3 and 4 as well, although I find the analysis in Fig. 4 especially hard to understand. The VH space in this case is based on Experiment 3 oddball detection in a stimulus set that included both symmetric and asymmetric objects. However, the response times for a very different task in Experiment 4, a symmetric/asymmetric judgment, are plotted against the axes derived from Experiment 3 (Fig. 4F and 4G). It is not clear to me why a measure based on oddball detection that requires no use of symmetry information should be predictive of within-stimulus symmetry detection response times. If it is, that requires a theoretical explanation not provided here.

      We are using an oddball detection task to estimate perceptual dissimilarity between objects, and construct the underlying perceptual representation of both symmetric and asymmetric objects. This enabled us to then ask if some distance-to-center computation can explain response times in a symmetry detection task, and obtain an answer in the affirmative. We have reworked the text to make this clear.

      (4) Contrary to the VH theory, same/different tasks are unlikely to depend on a decision boundary in the middle of a similarity or homogeneity continuum.

      We have provided empirical proof for our claims, by showing that target-present response times in a visual search task are correlated with “different” responses in the same-different task, and that target-absent response times in the visual search task are correlated with “same” responses in the same-different task (Section S3).

      The authors interpret the inverse relationship of response times with VHpresent and VHabsent, described above, as evidence for their theory. They hypothesize, in Fig. 1G, that VHpresent and VHabsent occupy a single scale, with maximum VHpresent falling at the same point as minimum VHabsent. This is not borne out by their analysis, since the VHpresent and VHabsent value scales are mainly overlapping, not only in Experiments 1 and 2 but also in Experiments 3 and 4. The authors dismiss this problem by saying that their analyses are a first pass that will require future refinement. Instead, the failure to conform to this basic part of the theory should be a red flag calling for revision of the theory.

      We respectfully disagree – by no means did we dismiss this problem! In fact, we have explicitly acknowledged this by saying that VH does not explain all the variance in the response times, but nonetheless explains substantial variance and might form the basis for an initial guess or a fast response. The remaining variance might be explained by processes that involve more direct scrutiny. Please see Results, pages 10 & 22.

      The reason for this single scale is that the authors think of target detection as a boundary decision task, along a single scale, with a decision boundary somewhere in the middle, separating present and absent. This model makes sense for decision dimensions or spaces where there are two categories (right/left motion; cats vs. dogs), separated by an inherent boundary (equal left/right motion; training-defined cat/dog boundary). In these cases, there is less information near the boundary, leading to reduced speed/accuracy and producing a pattern like that shown in Fig. 1G.

      The key conceptual advance of our study is that we show that even target present/absent, same/different or symmetry judgements can be fit into the standard decision-making framework.

      This logic does not hold for target detection tasks. There is no inherent middle point boundary between target present and target absent. Instead, in both types of trials, maximum information is present when the target and distractors are most dissimilar, and minimum information is present when the target and distractors are most similar. The point of greatest similarity occurs at the limit of any metric for similarity. Correspondingly, there is no middle point dip in information that would produce greater difficulty and higher response times. Instead, task difficulty and response times increase monotonically with the similarity between targets and distractors, for both target present and target absent decisions. Thus, in Figs. 2F and 2G, response times appear to be highest for animals, which share the largest numbers of closely similar distractors.

      Unfortunately, your logic does not boil down to any quantitative account, since you are using vague terms like “maximum information”. Further, any argument based solely on item similarity to explain visual search or symmetry responses cannot explain systematic variations observed for target-absent arrays and for symmetric objects, for the reasons below.

      If target-distractor dissimilarity were the sole driver of response times, target-absent judgements should always take the longest time since the target and distractor have zero similarity, with no variation from one image to another. This account does not explain why target-absent response times vary so systematically.

      Similarly, if symmetry judgements are solely based on comparing the dissimilarity between two halves of an object, there should be no variation in the response times of symmetric objects since the dissimilarity between their two halves is zero. However, we do see systematic variation in the response times to symmetric objects.

      DEFINITION OF AREA VH USING fMRI

      (1) The area VH boundaries from different experiments are nearly completely non-overlapping.

      In line with their theory that VH is a single continuum with a decision boundary somewhere in the middle, the authors use fMRI searchlight to find an area whose responses positively correlate with homogeneity, as calculated across all of their target present and target absent arrays. They report VH-correlated activity in regions anterior to LO. However, the VH defined by symmetry Experiments 3 and 4 (VHsymmetry) is substantially anterior to LO, while the VH defined by target detection Experiments 1 and 2 (VHdetection) is almost immediately adjacent to LO. Fig. S13 shows that VHsymmetry and VHdetection are nearly non-overlapping. This is a fundamental problem with the claim of discovering a new area that represents a new quantity that explains response times across multiple visual tasks. In addition, it is hard to understand why VHsymmetry does not show up in a straightforward subtraction between symmetric and asymmetric objects, which should show a clear difference in homogeneity.

      Actually, VHsymmetry is apparent even in a simple subtraction between symmetric and asymmetric objects (Figure S10). The VH regions identified using the visual search task and symmetry task have a partial overlap, not zero overlap as you are incorrectly claiming.

      We have noted that it is not straightforward to interpret the overlap, since there are many confounding factors. One reason could simply be that the stimuli in the symmetry task were presented at fixation, whereas the visual search arrays contained items exclusively in the periphery. Another is that the participants in the two tasks were completely different, so the lack of overlap could simply be due to inter-individual variability. Testing the same participants in two tasks using similar stimuli would be ideal but this is outside the scope of this study. We have acknowledged these issues in the Results (p. 26) and in the Supplementary Material (Section S8).

      (2) It is hard to understand how neural responses can be correlated with both VHpresent and VHabsent.

      The main paper results for VHdetection are based on both target-present and target-absent trials, considered together. It is hard to interpret the observed correlations, since the VHpresent and VHabsent metrics are calculated in such different ways and have opposite correlations with target similarity, task difficulty, and response times (see above). It may be that one or the other dominates the observed correlations. It would be clarifying to analyze correlations for target-present and target-absent trials separately, to see if they are both positive and correlated with each other.

      Thanks. The positive correlation between VH and neural response holds even when we do the analysis separately for target-present and target-absent searches (correlation between neural response in the VH region and visual homogeneity: n = 32, r = 0.66, p < 0.0005 for target-present searches; n = 32, r = 0.56, p < 0.005 for target-absent searches).

      (3) The definition of the boundaries and purpose of a new visual area in the brain requires circumspection, abundant and convergent evidence, and careful controls.

      Even if the VH metric, as defined and calculated by the authors here, is a meaningful quantity, it is a bold claim that a large cortical area just anterior to LO is devoted to calculating this metric as its major task. Vision involves much more than target detection and symmetry detection. The cortex anterior to LO is bound to perform a much wider range of visual functionalities. If the reported correlations can be clarified and supported, it would be more circumspect to treat them as one byproduct of unknown visual processing in the cortex anterior to LO, rather than treating them as the defining purpose for a large area of the visual cortex.

      We totally agree with you that reporting a new brain region would require careful interpretation and abundant and converging evidence. However, this requires many studies' worth of work, and historically category-selective regions like the FFA have achieved consensus only after they were replicated and confirmed across many studies. We believe our proposal for the computation of a quantity like visual homogeneity is conceptually novel, and our study represents a first step that provides some converging evidence (through replicable results across different experiments) for such a region. We have reworked our manuscript to make this point clearer (Discussion, p. 32).

      Reviewer #2 (Public Review):

      Summary:

      This study proposes visual homogeneity as a novel visual property that enables observers to perform several seemingly disparate visual tasks, such as finding an odd item, deciding if two items are the same, or judging if an object is symmetric. In Experiment 1, the reaction times on several objects were measured in human subjects. In Experiment 2, the visual homogeneity of each object was calculated based on the reaction time data. The visual homogeneity scores predicted reaction times. This value was also correlated with the BOLD signals in a specific region anterior to LO. Similar methods were used to analyze reaction time and fMRI data in a symmetry detection task. It is concluded that visual homogeneity is an important feature that enables observers to solve these two tasks.

      Strengths:

      (1) The writing is very clear. The presentation of the study is informative.

      (2) This study includes several behavioral and fMRI experiments. I appreciate the scientific rigor of the authors.

      We are grateful to you for your balanced assessment and constructive comments.

      Weaknesses:

      (1) My main concern with this paper is the way visual homogeneity is computed. On page 10, lines 188-192, it says: "we then asked if there is any point in this multidimensional representation such that distances from this point to the target-present and target-absent response vectors can accurately predict the target-present and target-absent response times with a positive and negative correlation respectively (see Methods)". This is also true for the symmetry detection task. If I understand correctly, the reference point in this perceptual space was found by deliberating satisfying the negative and positive correlations in response times. And then on page 10, lines 200-205, it shows that the positive and negative correlations actually exist. This logic is confusing. The positive and negative correlations emerge only because this method is optimized to do so. It seems more reasonable to identify the reference point of this perceptual space independently, without using the reaction time data. Otherwise, the inference process sounds circular. A simple way is to just use the mean point of all objects in Exp 1, without any optimization towards reaction time data.

      We disagree with you since the same logic applies to any curve-fitting procedure. When we fit data to a straight line, we are finding the slope and intercept that minimizes the error between the data and the straight line, but we would hardly consider the process circular when a good fit is achieved – in fact we take it as a confirmation that the data can be fit linearly. In the same vein, we would not have observed a good fit to the data, if there did not exist any good reference point relative to which the distances of the target-present and target-absent search arrays predicted these response times.

      In Section S1, we have already reported that the visual homogeneity estimates for each object are strongly correlated with the average distance of each object to all other objects (r = 0.84, p<0.0005, Figure S1). Second, to confirm that the results we obtained are not due to overfitting, we have already reported a cross-validation analysis, where we removed all searches involving a particular image and predicted these response times using visual homogeneity. This too revealed a significant model correlation, confirming that our results are not due to overfitting.
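
      The cross-validation logic described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the two-dimensional coordinates, the grid search over candidate reference points, the synthetic response times, and the function names (`fit_center`, `loo_predictions`) are all assumptions made for illustration. It holds out one object at a time (a simplification of removing all searches involving an image), fits the reference point on the remaining objects, and predicts the held-out response time from its distance to that point.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_center(points, rts, grid):
    """Return the grid point (and line) whose distances best predict rts."""
    best = None
    for c in grid:
        d = [math.dist(p, c) for p in points]
        slope, icpt = fit_line(d, rts)
        err = sum((slope * di + icpt - r) ** 2 for di, r in zip(d, rts))
        if best is None or err < best[0]:
            best = (err, c, slope, icpt)
    return best[1], best[2], best[3]


def loo_predictions(points, rts, grid):
    """Leave-one-out: fit on all other objects, predict the held-out one."""
    preds = []
    for i in range(len(points)):
        train_p = points[:i] + points[i + 1:]
        train_r = rts[:i] + rts[i + 1:]
        center, slope, icpt = fit_center(train_p, train_r, grid)
        preds.append(slope * math.dist(points[i], center) + icpt)
    return preds


# Synthetic data (hypothetical): response times fall with distance from a
# hidden reference point, plus a little Gaussian noise.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(30)]
hidden_center = (0.3, -0.2)
rts = [2.0 - 0.8 * math.dist(p, hidden_center) + rng.gauss(0, 0.05)
       for p in points]
grid = [(i / 10, j / 10) for i in range(-10, 11) for j in range(-10, 11)]

preds = loo_predictions(points, rts, grid)
print("held-out correlation:", round(pearson(preds, rts), 2))
```

      Because each prediction is made for an object that played no part in choosing the reference point, a high held-out correlation cannot arise from the optimization alone. If no good reference point existed, the held-out predictions would be poor.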

      (2) On page 11, lines 214-221. It says: "these findings are non-trivial for several reasons". However, the first reason is confusing. It is unclear to me why "it suggests that there are highly specific computations that can be performed on perceptual space to solve oddball tasks". In fact, these two sentences provide no specific explanation for the results.

      We have now revised the text to make it clearer (Results, p. 11).

      (3) The second reason is interesting. Reaction times in target-present trials can be easily explained by target-distractor similarity. But why does reaction time vary substantially across target-absent stimuli? One possible explanation is that the objects that are distant from the feature distribution elicit shorter reaction times. Here, all objects constitute a statistical distribution in the feature (perceptual) space. There is certainly a mean of this distribution. Some objects look like outliers and these outliers elicit shorter reaction times in the target-absent trials because outlier detection is very salient.

      One might argue that the above account is merely a rephrasing of the idea of visual homogeneity proposed in this study. If so, feature saliency is not a new account. In other words, the idea of visual homogeneity is another way of reiterating the old feature saliency theory.

      Thank you for this interesting point. We don’t necessarily see a contradiction. However, we are proposing a quantitative decision variable that the brain could be using to make target present/absent judgements.

      (4) One way to reject the feature saliency theory is to compare the reaction times of the objects that are very different from other objects (i.e., no surrounding objects in the perceptual space, e.g., the wheel in the lower right corner of Fig. 2B) with the objects that are surrounded by several similar objects (e.g., the horse in the upper part of Fig. 2B). Also, please choose the two objects with similar distance from the reference point. I predict that the latter will elicit longer reaction times because they can be easily confounded by surrounding similar objects (i.e., four-legged horses can be easily confounded by four-legged dogs). If the density of object distribution per se influences the visual homogeneity score, I would say that the "visual homogeneity" is essentially another way of describing the distributional density of the perceptual space.

      We agree with you, and we have indeed found that visual homogeneity estimates from our model are highly correlated with the average distance of an object relative to all other objects. However, we performed several additional experiments to elucidate the nature of target-absent response times. We find that they are unaffected by whether these searches are performed in the midst of similar or dissimilar objects (Section S4, Experiment S6), and even when the same searches are performed among nearby sets of objects with completely uncorrelated average distances (Section S4, Experiment S7). We have now reworked the text to make this clearer.

      (5) The searchlight analysis looks strange to me. One can easily perform a parametric modulation by setting visual homogeneity as the trial-by-trial parametric modulator and reaction times as a covariate. This parametric modulation produces a brain map with the correlation of every voxel in the brain. On page 17 lines 340-343, it is unclear to me what the "mean activation" is.

      We have done something similar. For each region, we took the mean activation at each voxel to be the average activation across a 3x3x3 voxel neighborhood in the brain, and took its correlation with visual homogeneity. We have now reworked this to make it clearer (Results, p. 16).
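
      As a rough illustration of the neighborhood-averaging step described above, the following Python sketch computes the mean activation in a 3x3x3 neighborhood around a voxel for each image and correlates it with visual homogeneity across images. The nested-list volume layout, the clipping at volume edges, and the function names are assumptions made for this example, not the authors' pipeline.

```python
import math


def neighborhood_mean(volume, x, y, z):
    """Mean of the 3x3x3 neighborhood around (x, y, z), clipped at edges."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    values = [
        volume[i][j][k]
        for i in range(max(0, x - 1), min(nx, x + 2))
        for j in range(max(0, y - 1), min(ny, y + 2))
        for k in range(max(0, z - 1), min(nz, z + 2))
    ]
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def searchlight_correlation(volumes, homogeneity, x, y, z):
    """Correlate neighborhood-mean activation (one per image) with homogeneity."""
    means = [neighborhood_mean(v, x, y, z) for v in volumes]
    return pearson(means, homogeneity)


# Hypothetical data: one 3x3x3 activation volume per image, scaled by that
# image's visual homogeneity value.
homogeneity = [0.2, 0.5, 0.9, 1.4]
volumes = [
    [[[h * (1 + 0.1 * i) for _ in range(3)] for _ in range(3)] for i in range(3)]
    for h in homogeneity
]
print(searchlight_correlation(volumes, homogeneity, 1, 1, 1))
```

      Repeating `searchlight_correlation` for every voxel would yield a whole-brain correlation map.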

      Minor points

      (1) In the intro, it says: "using simple neural rules..." actually it is very confusing what "neural rules" are here. Better to change it to "computational principles" or "neural network models"??

      We have now replaced this with “using well-known principles governing multiple object representations”.

      (2) In the intro, it says: "while machine vision algorithms are extremely successful in solving feature-based tasks like object categorization (Serre, 2019), they struggle to solve these generic tasks (Kim et al., 2018; Ricci et al. 2021)." These are not generic tasks. They are just a specific type of visual task: judging relationships between multiple objects. Moreover, a large number of studies in machine vision have shown that DNNs are capable of solving these tasks and even more difficult tasks. Two survey papers are listed here.

      Wu, Q., Teney, D., Wang, P., Shen, C., Dick, A., & Van Den Hengel, A. (2017). Visual question answering: A survey of methods and datasets. Computer Vision and Image Understanding, 163, 21-40.

      Małkiński, M., & Mańdziuk, J. (2022). Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices. arXiv preprint arXiv:2201.12382.

      Thank you for sharing these references. In fact, a recent study has shown that specific deep networks can indeed solve the same-different task (Tartaglini et al, 2023). However, our broader point remains that the same-different task and other such visual tasks are non-trivial for machine vision algorithms.

      Reviewer #1 (Recommendations For The Authors):

      Nothing to add to the public review. If my concerns turn out to be invalid, I apologize and will happily accept correction. If they are valid, I hope they will point toward a new version of this paper that optimizes the insights to be gained from this impressive dataset.

      Reviewer #2 (Recommendations For The Authors):

      My suggestions are as follows:

      (1) Analyze the fMRI data using the parametric modulation approach first at the single-subject level and then perform group analysis.

      To clarify, we have obtained image-level activations from each subject, and used them for all our analyses.

      (2) Think about a way to redefine visual homogeneity from a purely image-computable approach. In other words, visual homogeneity should be first defined as an image feature that is independent of any empirical response data. And then use the visual homogeneity scores to predict reaction times.

      While we understand what you mean, any image-computable representation such as from a deep network may carry its own biases and may not be an accurate representation of the underlying object representation. By contrast, neural dissimilarities in the visual cortex are strongly predictive of visual search oddball response times. That is why we used visual search oddball response times as a proxy for the underlying neural representation, and then asked whether some decision variable can be derived from this representation to explain both target present and absent judgements in visual search.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors provide convincing experimental evidence of extended motivational signals encoded in the mouse anterior cingulate cortex (ACC) that are implemented by the orbitofrontal cortex (OFC)-to-ACC signaling during learning. The results are valuable to the field of motivation and cognition. The experimental methods used were state-of-the-art. The manuscript would further benefit from theory-driven analyses to inform a mechanistic understanding, particularly for the single-cell calcium imaging results. These results will be of interest to those interested in cortical function, learning, and/or motivation.

      We thank the reviewers for their thoughtful reading of our paper and providing constructive feedback. We have made the relevant changes to the manuscript to improve the writing and figures. We provide responses below to each of the reviewer’s comments.

      Reviewer #1 (Public Review):

      (1) An important conclusion (Figure 4) is that when mice are trained to run through no reward (N) cues in order to reach reward (R) cues, the OFC neurons projecting to ACC each respond to different specific events in a manner that ensures that collectively they tile the extended behavioural sequence. What I was less sure of was whether the ACC neurons do the same or not. Figure 3 suggests that on average ACC neurons maintain activity across N cues in order to get to R cues but I was not sure whether this was because all individual neurons did this or whether some had activity patterns like the OFC neurons projecting to ACC.

      We agree that it remains uncertain what individual ACC neurons do during the extended behavioral sequence. We now include a few sentences in the discussion about what we hypothesize, as we did not perform the cellular resolution imaging to determine this:

      “While we did not perform single-cell imaging of ACC in our task, we hypothesize that individual ACC neurons could encode the distribution of actions/opportunities47 (i.e. stop, run, lick, suppress lick) taken during R or N cues. ACC neurons could compute the relative value of the action taken such that more ACC neurons become recruited once mice learn to run out of N cues. The sustained increase in bulk ACC activity across N cue trials (Figure 2) could come from a stable sequence of individual neurons that encode the timescale of the actions taken. In this way, OFC projections would encode current motivation across N cues before learning, which then triggers ACC to compute the value-based actions. Motivational signals in OFC would thus represent state since past rewards/goals, while in ACC these signals represent actions taken to pursue rewards/goals in the future.”

      (2) Figure 1 versus Figure 2: There does not seem to be a particular motivation for whether chemogenetic inactivation or optogenetic inhibition were used in different experiments. I think that this is not problematic but, if I am wrong and there were specific reasons for performing each experiment in a certain way, then further clarification as to why these decisions were made would be useful. If there is no particular reason, then simply explaining that this is the case might stop readers from seeking explanations.

      Thank you for this comment and we agree that clarification on this is important. We performed chemogenetic inhibition of ACC in Figure 1 to take a broad survey of behavioral effects throughout a 40-min long behavioral session, and performed optogenetic inhibition in Figure 2 because we wanted to restrict our inhibition to the few seconds of cue presentation during a behavioral session and across days. Furthermore, we wanted to combat any potential off-target effects that would come from repeated administration of CNO over the several days of training (Manvich et al 2018). We have included a couple sentences on page 4 to clarify this:

      “We proceeded to test whether these motivation related signals in ACC are required for learning. To restrict our inhibition to cue presentation portions of our task, and combat any potential off-target effects of CNO31 from repeated administration across several days of training, we used optogenetic inhibition.”

      (3) P5, paragraph 2. The authors argue that OFC and anteromedial (AM) thalamic inputs into ACC are especially important for mediating motivation through N cues in order to reach R cues. Is this based on a statistical comparison between the activity in OFC or AM inputs as opposed to the other inputs?

      We determined that OFC and AM thalamic inputs to ACC are particularly important by comparing the pre-cue activity in a reward-no reward-reward trial sequence (RNR; Figure 3B). Specifically, we performed paired t-tests comparing pre-cue activity between N and R cues, and found a statistically significant increase for R cues but only for the OFC and AM inputs, not for the BLA or LC inputs.
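
      For readers unfamiliar with the paired comparison described above, here is a minimal Python sketch of a paired t statistic. The per-animal values are synthetic and the function name `paired_t` is an assumption for illustration; it is not the authors' analysis code, and it reports only the t statistic and degrees of freedom.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for the differences b - a."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Synthetic pre-cue activity per animal (arbitrary units), illustration only.
pre_cue_n = [0.10, 0.05, 0.12, 0.08, 0.11]
pre_cue_r = [0.21, 0.14, 0.20, 0.19, 0.18]

t_stat, dof = paired_t(pre_cue_n, pre_cue_r)
print(f"t = {t_stat:.2f}, df = {dof}")
```

      Pairing within animals removes between-animal variability from the comparison. A library routine such as SciPy's `scipy.stats.ttest_rel` could be used to obtain the corresponding p-value.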

      (4) P3, paragraph 2. Some papers by Khalighinejad and colleagues (eg Neuron 2020, Current Biology, 2022) might be helpful here in as much as they assess ACC roles in determining action frequency, initiation, and speed and mediating the relationship between reward availability and action frequency and speed.

      We thank the reviewer for bringing these relevant papers to our attention. We have included these papers in our citations in this paragraph.

      (5) Paragraph 1 "This learning is of a more deliberate, informed nature than habitual learning, as they are sensitive to the current value of outcomes and can lead to a novel sequence of actions for a desired outcome1-3." Should "they" be "it"?

      This is correct, we have edited this in the manuscript.

      Reviewer #2 (Public Review):

      Impact:

      The findings will be valuable for further research on the impact of motivational states on behaviour and cognition. The authors provided a promising concept of how persistent motivational states could be maintained, as well as established a novel, reproducible task assay. While experimental methods used are currently state-of-the-art, theoretical analysis seems to be incomplete/not extensive.

      We thank the reviewer for these comments. In our paper, we performed single-cell calcium imaging of OFC projection neurons to ACC to build a mechanistic understanding for the bulk ramp-like response we identified in these neurons with photometry. We identified ensembles of neurons that tile sequences of trials that match the bulk response, in particular a subset of neurons that are active at the time a reward (R) cue is reached after 2 no-reward (N) cues. We included a paragraph in the discussion to address future theory-driven analyses to address how computation is achieved by OFC projection neurons:

      “We linked the ramp-like increase in neural activity in OFC to motivation, but several questions still remain about how motivation is computed and why it would be represented as a ramp. Motivation could be computed as a combination of several variables such as time since last reward, value of reward, and effort to reach future rewards. Future theory-driven analyses could determine how motivation is computed, and whether individual variables of time, value, and effort, are encoded as clusters of similar tuned neurons, or mixed and collectively represented at the population level. In either case, it is likely that a combined map of task space and value-information carried by OFC are being used to inform downstream regions, such as ACC, for adjusting behavior.”

      Reviewer #2 (Recommendations for the Authors):

      Overall, the layout of the figures seems a little bit chaotic and makes it hard to understand the boundaries between panels.

      We agree that the figure layout could be improved upon to aid the reader in moving from panel to panel. We have edited two of the main figures with layouts that are most irregular (Figures 2 and 4) to help with this.

      Figures/text should include the promoters used for protein expression so that readers understand which cell types would be affected.

      We have made sure to edit the figures to include the promoter of the viruses we used, and edited the text to include both the AAV serotype and promoter.

      Discuss why it is necessary for multiple prefrontal areas to be involved in maintaining motivational signals.

      We thank the reviewer for this comment. We believe that prefrontal areas would be recruited as tasks to study motivational states become more complex and require animals to keep track of task structure and perform value-guided actions. We have included a couple sentences in the final paragraph of the discussion about this:

      “Our work showed the recruitment of multiple frontal cortical areas in this process, which is to be expected as animals are required to build, maintain, and use representations of task structure and value to drive learned, motivated behaviors47. Future work can build upon the task we developed here to determine how the frontal cortex maintains motivational states across many more cue-outcome associations, and how these associations may dynamically change across time48”.

      Additionally, we included a short discussion on how motivational signals differ between OFC and ACC in our work. We suggest OFC encodes current motivation before and after learning, which then leads ACC to represent learned actions taken and thus have a longer timescale motivational response (see response to Reviewer 1).

      Minor: Page 4, Line 1: "increase" instead of "increases".

      This is correct, we have edited this in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important insights into the role of neurexins as regulators of synaptic strength and timing at the glycinergic synapse between neurons of the medial nucleus of the trapezoid body and the lateral superior olive, key components of the auditory brainstem circuit involved in computing sound source location from differences in the intensity of sounds arriving at the two ears. Through an elegant combination of genetic manipulation, fluorescence in-situ hybridization, ex vivo slice electrophysiology, pharmacology, and optogenetics, the authors provide convincing evidence to support their claims. While further work is needed to reveal the mechanistic basis by which neurexins influence glycinergic neurotransmission, this work will be of interest to both auditory and synaptic neuroscientists.

      We appreciate the recognition of the significance of our study in shedding light on the role of neurexins in regulating synaptic strength and timing at the glycinergic synapse. Indeed, further investigations are warranted to delve deeper into the specific role of each different variant of neurexins in the future. We hope that our work will spark more interest and collaboration in unraveling the complexities of molecular codes of synaptic function.

      Public Reviews:

      Reviewer #1 (Public Review):

      Jiang et al. demonstrated that ablating Neurexins results in alterations to glycinergic transmission and its calcium sensitivity, utilizing a robust experimental system. Specifically, the authors employed rAAV-Cre-EGFP injection around the MNTB in Nrxn1/2/3 triple conditional mice at P0, measuring Glycine receptor-dependent IPSCs from postsynaptic LSO neurons at P13-14. Notably, the authors presented a clear reduction of 60% and 30% in the amplitudes of opto- and electric stimulation-evoked IPSCs, respectively. Additionally, they observed changes in kinetics, alterations in PPR, and sensitivity to lower calcium and the calcium chelator, EGTA, indicating solid evidence for changes in presynaptic properties of glycinergic transmission.

      Furthermore, the authors uncovered an unexpected increase in sIPSC frequency without altering amplitude. Despite the reduction in evoked IPSC, immunostaining revealed an increase in GlyT2 and VGAT in TKO mice, supporting the notion of an increase in synapse number. However, the reviewer expresses caution regarding the authors' conclusion that "glycinergic neurotransmission likely by promoting the synapse formation/maintenance, which is distinct from the phenotypes observed in glutamatergic and GABAergic neurons (Chen et al., 2017; Luo et al., 2021)", as outlined in lines 173-175. The reviewer suggests that this statement may be overstated, pointing out the authors' own discussion in lines 254-265, which acknowledges multiple possibilities, including the potential that the increase in synapses is a consequence rather than a causal effect of Nrxn deletion.

      We appreciate the reviewer’s thoughtful evaluation of our study. We agree that our conclusion regarding the promotion of synapse formation/maintenance may have been overstated and recognize the need for a more nuanced interpretation of our findings. Accordingly, we have revised our interpretation by carefully discussing the various possibilities that may cause the observed increase in synapse number in lines 256-266.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Jiang et al., explore the role of neurexins at glycinergic MNTB-LSO synapses. The authors utilize elegant and compelling ex vivo slice electrophysiology to assess how the genetic conditional deletion of Nrxns1-3 impacts inhibitory glycinergic synaptic transmission and found that TKO of neurexins reduced electrically and optically evoked IPSC amplitudes, slowed optically evoked IPSC kinetics and reduced presynaptic release probability. The authors use classic approaches including reduced [Ca2+] in ACSF and EGTA chelation to propose that changes in these evoked properties are likely driven by the loss of calcium channel coupling. Intriguingly, while evoked transmission was impaired, the authors reported that spontaneous IPSC frequency was increased, potentially due to an increased number of synapses in LSO. Overall, this manuscript provides important insight into the role of neurexins at the glycinergic MNTB-LSO synapse and further emphasizes the need for continued study of both the non-redundant and redundant roles of neurexins.

      We thank the reviewer for the strong comments and support of our work.

      Strengths:

      This well-written manuscript seamlessly incorporates mouse genetics and elegant ex vivo electrophysiology to identify a role for neurexins in glycinergic transmission at MNTB-LSO synapses. Triple KO of all neurexins reduced the amplitude and timing of evoked glycinergic synaptic transmission. Further, spontaneous IPSC frequency was increased. The evoked synaptic phenotype is likely a result of reduced presynaptic calcium coupling while the spontaneous synaptic phenotype is likely due to increased synapse numbers. While neuroligin-4 has been identified at glycinergic synapses, this study, to the best of my knowledge, is the first to study Nrxn function at these synapses.

      We again appreciate the positive feedback on the strengths of our study. We agree that the observed reduction in evoked synaptic transmission and the increase in spontaneous IPSC frequency provide intriguing insights into the function of neurexins in regulating glycinergic synaptic activity.

      Weaknesses:

      The data are compelling and report an intriguing functional phenotype. That neurexins redundantly control calcium channel coupling has been previously reported. Mechanistic insight would significantly strengthen this study.

      We wholeheartedly agree with the reviewer that understanding how neurexins control calcium channel coupling at the presynaptic active zone is crucial for elucidating their role in synaptic transmission. While our current study has provided compelling evidence for the functional phenotypes of pan-neurexin deletion, we recognize the importance of investigating the underlying molecular mechanisms in future research. Exploring these mechanisms would undoubtedly enhance our understanding of neurexin function at various synapses and contribute to advancing the field.

      The claim that triple KO of Nrxns from MNTB increases the number of synapses in LSO is not strongly supported.

      We agree. Echoing the suggestion made by reviewer 1 (as mentioned above), we acknowledge that the claim regarding the increase in synapse numbers in the LSO following the triple knockout of neurexins from the MNTB was overstated. Consequently, we have revised our conclusions more carefully to reflect this adjustment.

      Despite the stated caveats of measuring electrically evoked currents and the more robust synaptic phenotypes observed using optically evoked transmission, the authors rely heavily on electrical stimulation for most measurements.

      We acknowledge that optogenetic stimulation offers crucial advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. Additionally, we have conducted new optogenetic experiments specifically for measuring the paired-pulse ratio in control and Nrxn123 TKO mice. These results have been included as a new supplementary figure (Figure S2).

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.
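
      The paired-pulse ratio mentioned above is conventionally the amplitude of the second evoked response divided by that of the first. The short Python sketch below shows one way to compute it from repeated trials. The amplitudes are hypothetical and the function name is an assumption. Taking the ratio of the mean amplitudes, rather than the mean of per-trial ratios, is a choice made here for illustration and is not stated in the response.

```python
def paired_pulse_ratio(first_amps, second_amps):
    """Ratio of mean second-pulse to mean first-pulse IPSC amplitude."""
    if not first_amps or len(first_amps) != len(second_amps):
        raise ValueError("need matching, non-empty amplitude lists")
    mean_first = sum(first_amps) / len(first_amps)
    mean_second = sum(second_amps) / len(second_amps)
    return mean_second / mean_first


# Hypothetical absolute IPSC amplitudes (pA) from repeated paired-pulse trials.
first = [820.0, 790.0, 845.0]
second = [610.0, 640.0, 600.0]
print(round(paired_pulse_ratio(first, second), 2))
```

      A larger ratio in one group than in another is typically read as a lower initial release probability, which is how the increased PPR in the TKO mice is interpreted in the reviews.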

      The differential expression of individual neurexins might indicate that specific neurexins may dominantly regulate synaptic transmission, however, this possibility is not discussed in detail.

      We thank the reviewer for bringing up this important point. The differential expression of individual neurexins indeed suggests that specific neurexins may play dominant roles in regulating synaptic transmission. While our study primarily focused on the collective impact of ablating all neurexins, we acknowledge the significance of exploring the specific contributions of individual neurexin isoforms in the future. Understanding the distinct roles of each neurexin isoform could provide valuable insights into the precise mechanisms underlying synaptic function and plasticity. We have added a discussion of this point to our revised manuscript (lines 223-230).

      Reviewer #3 (Public Review):

      Summary:

      The authors investigate the hypothesis that neurexins serve a crucial role as regulators of the synaptic strength and timing at the glycinergic synapse between neurons of the medial nucleus of the trapezoid body (MNTB) and the lateral superior olivary complex (LSO). It is worth mentioning that LSO neurons are an integration station of the auditory brainstem circuit displaying high reliability and temporal precision. These features are necessary for computing interaural cues to derive sound source location from comparing the intensities of sounds arriving at the two ears. In this context, the authors' findings build up according to the hypothesis first by displaying that neurexins were expressed in the MNTB at varying levels. They followed this up with the deletion of all neurexins in the MNTB through the employment of a triple knock-out (TKO). Using electrophysiological recordings in acute brainstem slices of these TKO mice, they gathered solid evidence for the role of neurexins in synaptic transmission at this glycinergic synapse primarily by ensuring tight coupling of Ca2+ channels and vesicular release sites. Additionally, the authors uncovered a connection between the deletion of neurexins and a higher number of glycinergic synapses in TKO mice, for which they provided evidence in the form of immunostainings and related it to electrophysiological data on spontaneous release. Consequently, this investigation expands our knowledge on the molecular regulation of synaptic transmission at glycinergic synapses, as well as on the auditory processing at the level of the brainstem.

      Strengths:

      The authors demonstrate substantial results in support of the hypothesis of a critical role of neurexins for regulating glycinergic transmission in the LSO using various techniques. They provide evidence for the expression of neurexins in the MNTB and consecutively successfully generate and characterize the neurexin TKO. For their study on LSO IPSCs the authors transduced MNTB neurons by co-injection of virus-carrying Cre and ChR2 and subsequently optogenetically evoke release of glycine. As a result, they observed a significant reduction in amplitude and significantly slower rise and decay times of the IPSCs of the TKO in comparison with control mice in which MNTB neurons were only transduced with ChR2. Furthermore, they observed an increased paired pulse ratio (PPR) of LSO IPSCs in the TKO mice, indicating lower release probability. Elaborating on the hypothesis that neurexins are essential for the coupling of synaptic vesicles to Ca2+ channels, the authors show lowered Ca2+ sensitivity in the TKO mice. Additionally, they reveal convincing evidence for the connection between the increased frequency of spontaneous IPSC and the higher number of glycinergic synapses of the LSO in the TKO mice, revealed by immunolabeling against the glycinergic presynaptic markers GlyT2 or VGAT.

      We thank the reviewer for the thoughtful and thorough evaluation of the significance of investigating the role of neurexins in glycinergic transmission at the MNTB-LSO synapse, particularly in the context of auditory processing and sound localization. The positive feedback is greatly appreciated.

      Weaknesses:

      The major concern is novelty as this work on the effects of pan-neurexin deletion in a glycinergic synapse is quite consistent with the authors' prior work on glutamatergic synapses (Luo et al., 2020). The authors might want to further work out novel aspects and strengthen the comparative perspective. Conceptually, the authors might want to be more clear about interpreting the results on the altered dependence of release on voltage-gated Ca2+ influx (Ca2+ sensitivity, coupling).

      Regarding the reviewer’s concern about the novelty of our work, we acknowledge that our previous work has explored the effects of pan-neurexin deletion on glutamatergic synapses (Luo et al., 2020). However, we would like to point out that a novelty of our present study indeed stems from the exploration of how different types of synapses converge to employ the same mechanism of synaptic function, particularly in the context of neurexin-mediated regulation. Whereas our previous study focused on glutamatergic synapses, the current study examines glycinergic synapses, which represent a distinct population with unique properties and functions. Despite the differences between these synapse types, our findings reveal a commonality in the underlying mechanisms of synaptic regulation mediated by neurexins. This convergence of mechanisms across different synapse types highlights the fundamental role of neurexins in synaptic function and plasticity. By elucidating how neurexins regulate synaptic transmission at both excitatory and inhibitory synapses, we provide valuable insights into the general principles governing synaptic function. In addition, this comparative perspective may shed light on the complex interplay between excitatory and inhibitory neurotransmission, which is crucial for maintaining the balance of neuronal activity and network dynamics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      During the developmental period spanning P3-P12, the MNTB-LSO synapses undergo a transition from GABAergic to glycinergic transmission. It is well-established that Neurexin plays a role in modulating GABAergic transmission. In the authors' experimental system, AAV was injected at P0, likely impacting GABAergic transmission, including potentially influencing synapse number, before subsequently affecting glycinergic transmission. A thoughtful discussion of how the experimental interventions might have influenced this developmental process and glycinergic transmission would enhance the clarity and interpretation of their findings.

      We thank the reviewer for raising the interesting topic of the transmitter switch during neurodevelopment. Strong evidence from gerbils and rats demonstrates that the MNTB-LSO synapses undergo a shift from GABAergic to glycinergic transmission during early development. However, in a more recent study by Friauf and colleagues (Fisher et al., 2019), patch-clamp recordings in acute mouse brainstem slices at P4-P11, combined with pharmacological blockade of GABAA receptors and/or glycine receptors, clearly demonstrated no GABAergic synaptic component on LSO principal neurons, suggesting that the transmitter switch may differ between species. We have added a discussion to our revision to clarify this topic.

      Reviewer #2 (Recommendations For The Authors):

      The data are compelling and report an intriguing functional phenotype. Mechanistic insight into how this phenotype manifests would significantly strengthen this study. For example, which neuroligin is found at these MNTB-LSO synapses?

      We agree that investigating the underlying molecular mechanisms, particularly the specific function of each variant of neurexins and their respective ligands on the postsynaptic neurons, is crucial. Exploring these mechanisms, which extend beyond the scope of our current study, would undoubtedly enhance our understanding of neurexin function at various synapses and foster advancements in the field.

      Does the TKO alter the ability of MNTB inputs to induce AP firing in LSO neurons?

      Activation of the MNTB inputs does not directly induce AP firing in LSO neurons, because the MNTB-LSO synapses are glycinergic and serve to inhibit neuronal activity.

      We think the reviewer meant to ask whether pan-neurexin deletion in the MNTB neurons alters their ability to influence the firing of LSO neurons. Indeed, the weakening of glycinergic transmission due to pan-neurexin ablation in MNTB neurons could potentially alter the excitation-inhibition (E/I) balance, thereby impacting the overall excitability of LSO neurons. We have conducted preliminary experiments to investigate this aspect and found that the E/I balance at LSO neurons was notably increased in TKO mice. We are currently preparing a manuscript to comprehensively address the role of neurexins at the auditory circuit and behavior levels.

      Additional calcium measurements using GECIs would provide insight into whether nanodomain calcium or total calcium is altered at these synapses.

      We appreciate the valuable suggestion provided by the reviewer. However, distinguishing between Ca2+ nanodomain and Ca2+ microdomain using Ca2+ imaging techniques requires advanced systems such as two-photon STED microscopy, which are beyond the scope of our current research.

      It is unclear why fluorescence intensity is quantified instead of the number of synaptic clusters in LSO. In addition to changes in synapse numbers, fluorescent intensity can indicate a number of other possible morphological changes.

      We appreciate the valuable suggestion from the reviewer. We have re-analyzed our imaging data to compare synaptic density. The results, as included in Fig.3f and 3h, confirm an increase in the number of glycinergic synapses after pan-neurexin deletion.

      The most robust synaptic phenotypes were produced by measuring light-evoked oIPSCs and the authors acknowledge that electrically-evoked eIPSCs might be contaminated by uninfected fibers or by other sources of glycinergic inputs. I suggest that IPSC PPRs, EGTA, and low Ca2+ experiments be performed using optogenetics.

      As discussed in our response to Public Reviews, we acknowledge that optogenetic stimulation offers crucial advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. Additionally, following the reviewer’s suggestion, we have conducted new optogenetic experiments specifically to measure the paired-pulse ratio in control and Nrxn123 TKO mice. We have included this new dataset in Supplementary Figure S2; it is consistent with our results obtained with electrical fiber stimulation.

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to major concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.

      It is sometimes confusing which type of evoked stimulation is being used (e.g. PPR, EGTA, and low Ca2+ experiments). To aid in the interpretations of these experiments, it would help to clarify.

      We appreciate the reviewer's suggestion regarding the clarity of the evoked stimulation methods used in our experiments. We have revised the manuscript to provide clearer descriptions of the specific types of evoked stimulation employed in each experiment. Thank you for guiding us towards this clarification.

      The comparisons to Chen et al 2017 and the senior author's 2020 paper seem disjointed and do not contribute to the findings, which alone, are quite interesting. Given the prevailing notion that neurexins control different synaptic properties depending on the brain region and/or synapse studied, is it surprising that the findings observed here differ from previous studies of different synapses (glutamatergic and GABAergic)?

      By comparing previous studies at different types of neurons/synapses, our findings reveal a commonality in the underlying mechanisms of synaptic regulation mediated by neurexins. This convergence of mechanisms across different synapse types highlights the fundamental role of neurexins in synaptic function and plasticity. In addition, this comparative perspective may shed light on the complex interplay between excitatory and inhibitory neurotransmission, which is crucial for maintaining the balance of neuronal activity and network dynamics.

      Despite Nrxn3 being the most abundant Nrxn mRNA in MNTB neurons, the possible contributions of this highly expressed protein are not discussed.

      We thank the reviewer for bringing up this important point. The differential expression of individual neurexins indeed suggests that specific neurexins may play dominant roles in regulating synaptic transmission. While our study primarily focused on the collective impact of ablating all neurexins, we acknowledge the significance of exploring the specific contributions of individual neurexin isoforms in the future. Understanding the distinct roles of each neurexin isoform could provide valuable insights into the precise mechanisms underlying synaptic function and plasticity. We have added a discussion of this point to our revised manuscript (lines 223-230).

      Reviewer #3 (Recommendations For The Authors):

      • There are several instances of spaces missing and typos, please carefully check the manuscript.

      We greatly appreciate the reviewer's helpful feedback on the text that could be clarified or improved. We have meticulously edited the manuscript to address these concerns.

      • While studying the properties of IPSCs, apart from optogenetic stimulation, the authors performed experiments with electrical fiber stimulation. Their findings showed a slightly significant reduction of the IPSC amplitude and no effect on the IPSC kinetics when comparing the TKO and control. One weakness is the discrepancy between the results from the optogenetic and fiber stimulation experiments, which the authors attribute to inefficient transfection in the fiber stimulation experiments. The authors state that they tried to optimize their virus injection protocols. However, they do not elaborate on how the transfection rates could be improved in the discussion section. Moreover, it would be good to further address the reasons for the difference in amplitude between the control IPSCs in the optogenetic and fiber stimulation experiments.

      Echoing the suggestion by Reviewer 2 (see above), we acknowledge that optogenetic stimulation offers certain advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. In addition, we have performed a new set of optogenetic experiments to measure the paired-pulse ratio in control and Nrxn123 TKO mice and included the results as Supplementary Figure S2.

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to major concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.

      We have added details of the virus injection strategy that optimized the transfection rates to the Methods section: “To enhance virus infection efficiency, we decreased the dosage per injection while increasing the frequency of injections. Additionally, we ensured the pipette remained immobilized for 20-30 seconds to guarantee virus absorption at injection sites. As a result of this strategy, we estimated that the vast majority of MNTB neurons were inoculated by AAVs.” See lines 288-290.

      • Abstract: "ablation of all neurexins in MNTB neurons reduced not only the amplitude but also altered the kinetics of the glycinergic synaptic transmission at LSO neurons."

      Changed as suggested.

      • Consider revising to "The synaptic dysfunctions primarily resulted from an altered dependence of release on voltage-gated Ca2+ influx."

      We appreciate the reviewer's suggestion, which helps improve the clarity of our manuscript. We have revised the phrasing as follows: "The synaptic dysfunctions primarily resulted from an impaired calcium sensitivity of release and a loosened coupling between voltage-gated calcium channels and synaptic vesicles."

      • Line 39 should be vertebrates.

      Revised as suggested.

      • Line 49 it would sound better to say "which further points to the diverse actions of neurexins in specific neurons."

      Revised as suggested.

      • Line 60 - this paragraph could include information about GABA signaling from the MNTB to the LSO, because on line 113 you mention that LSO neurons receive inhibitory GABAergic/glycinergic inputs, but you do not mention blocking GABA currents to isolate the glycinergic ones.

      We thank the reviewer for the thoughtful and detailed suggestion. We revised the text in line 60 to “In the mature mammalian auditory brainstem” and in line 113, we removed GABAergic to emphasize the nature of glycinergic synapse, particularly in the mouse brainstem where no GABAergic components are found (Fisher et al., 2019).

      • Line 72/73 it should be adeno-associated virus; line 73: "combining this with the RNAScope technique" sounds better.

      Changed as suggested.

      • Line 91 using the RNAScope technique; lines 97, 119 as a control; line 108 the functional organization.

      Changed as suggested.

      • Line 113 should be a pharmacological approach; line 122 optogenetically evoked.

      Changed as suggested.

      • Line 132, 160: the control.

      Changed as suggested.

      • Line 147 thus were infected; line 148 likely to be present but were obscured.

      Changed as suggested.

      • Line 154 which has been routinely used.

      Changed as suggested.

      • Line 155 It is not supposed to be Figure 2h but 2i; following that Figure 2i should be 2j; in my opinion, Figure 2i does not display a strong depression for the TKO mice.

      Changed as suggested.

      • Line 171 a better flow is achieved by saying: together these data show.

      Changed as suggested.

      • EC50 rather than IC50 of [Ca2+].

      Changed as suggested.

      • Line 180 it is better to say "we approached the matter by..."; line 183 while recording.

      Changed as suggested.

      • Line 203 were much stronger than the effect at control synapses; line 206 tightly clustering.

      Changed as suggested.

      • Line 212 sounds like they provide evidence for retina and spinal cord as well, should be made clear.

      Changed as suggested.

      • Line 289 previously.

      Changed as suggested.

      • Line 295 should be 30 min.

      Changed as suggested.

      • Line 336, 337 confocal microscope.

      Changed as suggested.

      • Please provide the number of data points also in figure captions or in the results section.

      Added in the captions as suggested.

      • Line 533, a better phrasing would be: the blocking effect of 0.2 mM Ca on IPSC amplitude.

      Changed as suggested.

      • Explain either in the methods or results section how the EC50 of Ca2+ was calculated.

      Added in the methods as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important evidence supporting the ability of a new type of neuroimaging system, OPM-MEG, to measure beta-band oscillations during sensorimotor tasks in children aged 2-14 years and to demonstrate the corresponding developmental changes, since neuroimaging methods with high spatiotemporal resolution that can be used on small children are quite limited. The evidence supporting the conclusion is solid but lacks clarifications about the much-discussed advantages of the OPM-MEG system (e.g., motion tolerance), control analyses (e.g., trial number), and the rationale for using sensorimotor tasks. This work will be of interest to the neuroimaging and developmental science communities.

      We thank the editors and reviewers for their time and comments on our manuscript. We have responded in detail to the comments, on a point-by-point basis, below. Included in our responses (and our revised manuscript) are additional analyses to control for trial count, clarification of the advantages of OPM-MEG, and justification of our use of sensory (as distinct from motor) stimulation. In what follows, our responses are in bold typeface; additions to our manuscript are in bold italic typeface. 

      Reviewer #1 (Public Review):

      Summary:

      Compared with conventional SQUID-MEG, OPM-MEG offers theoretical advantages of sensor configurability (that is, sizing to suit the head size) and motion tolerance (the sensors are intrinsically in the head reference frame). This study purports to be the first to experimentally demonstrate these advantages in a developmental study from age 2 to age 34. In short, while the theoretical advantages of OPM-MEG are attractive - both in terms of young child sensitivity and in terms of motion tolerance - neither was in fact demonstrated in this manuscript. We are left with a replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      Thank you for reviewing our manuscript. We agree that our results demonstrate substantial equivalence with conventional MEG. However, as mentioned by Reviewer 3, most past studies have “focused on older children and adolescents (e.g., 9-15 years old)” whereas our youngest group is 2-5 years. We believe that by obtaining data of sufficient quality in these age groups, without the need for any restriction of head movement, we have demonstrated the advantage of OPM-MEG. We have now made this clear in our discussion:

      “…our primary aim was to test the feasibility of OPM-MEG for neurodevelopmental studies. Our results demonstrate we were able to scan children down to age 2 years, measuring high-fidelity electrophysiological signals and characterising the neurodevelopmental trajectory of beta oscillations. The fact that we were able to complete this study demonstrates the advantages of OPM-MEG over conventional-MEG, the latter being challenging to deploy across such a large age range…”

      Strengths:

      A replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      As noted above the demonstration of equivalence was one of our primary aims. We have elaborated further on the advantages below.

      Weaknesses:

      The authors describe 64 tri-axial detectors, which they refer to as 192 channels. This is in keeping with some of the SQUID-MEG description, but possibly somewhat disingenuous. For the scientific literature, perhaps "64 tri-axial detectors" is a more parsimonious description.

      The number of channels in a MEG system refers to the number of independent measurements of magnetic field. This, in turn, tells us the number of degrees of freedom in the data that can be exploited by algorithms like signal space separation (SSS) or beamforming. For example, the MEGIN (cryogenic) MEG system has 306 channels: 102 magnetometers and 204 planar gradiometers. Sensors are constructed as “triple sensor elements” with one magnetometer and 2 gradiometers (in orthogonal orientations) centred on a single location. In our system, each sensor has three orthogonal metrics of magnetic field which are (by definition) independent. We have 64 such sensors, and therefore 192 independent channels – indeed when implementing algorithms like SSS we have shown we can exploit this number of degrees of freedom [1]. A description of 192 channels is therefore accurate.

      A small fraction (<20%) of trials were eliminated for analysis because of "excess interference" - this warrants further elaboration.

      We agree that this is an important point. We now state in our methods section:

      “…Automatic trial rejection was implemented, with trials containing abnormally high variance (exceeding 3 standard deviations from the mean) removed. All experimental trials were also inspected visually by an experienced MEG scientist, to exclude trials with large spikes/drifts that were missed by the automatic approach. In the adult group, there was a significant overlap between automatically and manually detected bad trials (0.7 ± 1.6 trials were only detected manually). In the children, 10.0 ± 9.4 trials were only detected manually…”
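
      As an illustration only, the automatic rejection rule quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name, the list-of-lists data layout, and the choice to compare each trial's variance against the mean and standard deviation of the variances across trials are all hypothetical.

```python
from statistics import mean, pstdev, pvariance


def reject_high_variance_trials(trials, n_sd=3.0):
    """Split trial indices into (keep, reject) using a variance threshold.

    trials: list of trials, each a list of samples (hypothetical layout).
    A trial is rejected when its variance exceeds the mean trial variance
    by more than n_sd standard deviations (assumed reading of the rule).
    """
    variances = [pvariance(trial) for trial in trials]
    threshold = mean(variances) + n_sd * pstdev(variances)
    keep = [i for i, v in enumerate(variances) if v <= threshold]
    reject = [i for i, v in enumerate(variances) if v > threshold]
    return keep, reject
```

      Note that a rule of this kind only flags high-variance outliers; spikes or drifts that do not inflate the trial variance enough would still need the visual inspection described in the quoted text.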

      We also note that the other reviewers and editor questioned whether the higher rejection rate in children had any bearing on results. This is an extremely important question. In revising the manuscript this has also been taken into account with all data reanalysed with equal trial counts in children and adults. Results are presented in Supplementary Information Section 5.

      Figure 3 shows a reduced beta ERD in the youngest children. Although the authors claim that OPM-MEG would be similarly sensitive for all ages and that SQUID-MEG would be relatively insensitive to young children, one trivial counterargument that needs to be addressed is that OPM has NOT in fact increased the sensitivity to young child ERD. This can possibly be addressed by analogous experiments using a SQUID-based system. An alternative would be to demonstrate similar sensitivity across ages using OPM to a brain measure such as evoked response amplitude. In short, how does Figure 3 demonstrate the (theoretical) sensitivity advantage of OPM-MEG in small heads?

      We completely understand the referee’s point – indeed the question of whether a neuromagnetic effect really changes with age, or only appears to change due to a drop in sensitivity (caused by reduced head size or - in conventional MEG and fMRI - increased subject movement), is one that can be raised in all neurodevelopmental studies.

      Our authors have many years’ experience conducting studies using conventional MEG (including in neurodevelopment) and agreed that the idea of scanning subjects down to age two in conventional MEG would not be practical; their heads are too small and they typically fail to tolerate an environment where they are forced to remain still for long periods. Even if we tried a comparative study using conventional MEG, the likely data exclusion rate would be so high that the study would be confounded. This is why most conventional MEG studies only scan older children and adolescents. For this reason, we cannot undertake the comparative study the reviewer suggests. There are however two reasons why we believe sensitivity is not driving the neurodevelopmental effects that we observe:

      Proximity of sensors to the head: 

      For an ideal wearable MEG system, the distance between the sensors and the scalp surface (sensor proximity) would be the same regardless of age (and size), ensuring maximum sensitivity in all subjects. To test how our system performed in this regard, we undertook analyses to compute scalp-to-sensor distances. This was done in two ways:

      (1) Real distances in our adaptable system: We took the co-registered OPM sensor locations and computed the Euclidean distance from the centre of the sensitive volume (i.e. the centre of the vapour cell) to the closest point on the scalp surface. This was measured independently for all sensors, and an average across sensors calculated. We repeated this for all participants (recall participants wore helmets of varying size and this adaptability should help minimise any relationship between sensor proximity and age).

      (2) Simulated distances for a non-adaptable system: Here, the aim was to see how proximity might have changed with age, had only a single helmet size been used. We first identified the single example subject with the largest head (scanned wearing the largest helmet) and extracted the scalp-to-sensor distances as above. For all other subjects, we used a rigid body transform to co-register their brain to that of the example subject (placing their head (virtually) inside the largest helmet). Proximity was then calculated as above and an average across sensors calculated. This was repeated for all participants.
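
      The proximity measure used in both analyses above can be sketched as below. This is a hypothetical illustration: it assumes the scalp is available as a list of sampled surface points and the sensors as a list of vapour-cell centre coordinates in the same coordinate frame, and the function name is invented for the example.

```python
import math


def mean_scalp_to_sensor_distance(sensor_positions, scalp_points):
    """Average, over sensors, of the distance to the nearest scalp point.

    sensor_positions: list of (x, y, z) sensor centres (hypothetical input).
    scalp_points: list of (x, y, z) points sampled on the scalp surface.
    """
    nearest = [
        min(math.dist(sensor, point) for point in scalp_points)
        for sensor in sensor_positions
    ]
    return sum(nearest) / len(nearest)
```

      For the simulated non-adaptable case, the same function would be applied after the scalp points have been moved into the largest helmet's frame by the rigid body transform, which is not shown here.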

      In both analyses, sensor proximity was plotted against age and significant relationships probed using Pearson correlation. 
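
      For reference, the Pearson correlation coefficient between age and mean sensor proximity can be computed as in the following minimal sketch. The function name and inputs are hypothetical, and only the coefficient is computed; a p-value would normally come from a statistics package (for example `scipy.stats.pearsonr`).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```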

      In addition, we wanted to probe the relation between sensor proximity and head circumference. Head circumference was estimated by binarising the whole-head MRI (to delineate the volume of the head) and selecting the axial slice with the largest circumference. We then plotted sensor proximity versus head circumference, for both the real (adaptive) and simulated (non-adaptive) case (expecting a negative relationship – i.e. larger heads mean closer sensor proximity). The slope of the relationship was measured and we used a permutation test to determine whether the use of adaptable helmets significantly lowered the identified slope (i.e. do adaptable helmets significantly improve sensor proximity in those with smaller head circumference).

      Results are shown in Figure R1. We found no measurable relationship between sensor proximity and age (r = -0.195; p = 0.171) in the case of the real helmets (panel A). When simulating a non-adaptable helmet, we did see a significant effect of age on scalp-to-sensor distance (r = -0.46; p = 0.001; panel B). This demonstrates the advantage of the adaptability of OPM-MEG; without the ability to flexibly locate sensors, we would have a significant confound of sensor proximity. 

      Plotting sensor proximity against head circumference, we found a significant negative relationship in both cases (r = -0.37; p = 0.007 and r = -0.78; p = 0.000001); however, the difference between slopes was significant according to a permutation test (p < 0.025), suggesting that adaptability has indeed improved sensor proximity in those with smaller head circumference. This again shows the benefits of adaptability to head size.
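
      One way such a slope comparison could be set up is sketched below. The response does not specify the permutation scheme, so everything here is an assumption: the sketch treats the real and simulated distances as paired measurements per subject, randomly swaps the two labels within each subject, and uses a two-sided comparison of the least-squares slope difference. All function and parameter names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def slope_difference_permutation_test(circumference, real_dist, simulated_dist,
                                      n_perm=5000, seed=0):
    """Return (observed slope difference, two-sided permutation p-value).

    Assumed scheme: within each subject, the 'real' and 'simulated' labels
    are swapped with probability 0.5 to build the null distribution.
    """
    rng = random.Random(seed)
    observed = (ols_slope(circumference, real_dist)
                - ols_slope(circumference, simulated_dist))
    count = 0
    for _ in range(n_perm):
        a, b = [], []
        for r, s in zip(real_dist, simulated_dist):
            if rng.random() < 0.5:
                r, s = s, r  # swap condition labels for this subject
            a.append(r)
            b.append(s)
        diff = ols_slope(circumference, a) - ols_slope(circumference, b)
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)
```

      A one-sided variant (counting only permutations in which the adaptable-helmet slope is shallower) would match the directional question more closely; the two-sided form is used here simply because it makes fewer assumptions.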

      Author response image 1.

      Scalp-to-sensor distance as a function of age (A/B) and head circumference (C/D). A and C show the case for the real helmets; B and D show the simulated non-adaptable case.

      In sum, the ideal wearable system would see sensors located on the scalp surface, to get as close as possible to the brain in all subjects. Our system of multiple helmet sizes is not perfect in this regard (there is still a significant relationship between proximity and head circumference). However, our solution has offered a significant improvement over a (simulated) non-adaptable system. Future systems should aim to improve even further on this, either by using additively manufactured bespoke helmets for every subject (this is a gold standard, but also costly for large studies), or potentially adaptable flexible helmets.

      Burst amplitudes:

      The reviewer suggested to “demonstrate similar sensitivity across ages using OPM to a brain measure”. We decided not to use the evoked response amplitude (as suggested), since this would be expected to change with age. Instead, we used the amplitude of the bursts.

      Our manuscript shows a significant correlation between beta modulation and burst probability – implying that the stimulus-related drop in beta amplitude occurs because bursts are less likely to occur. Further, we showed significant age-related changes in both beta amplitude and burst probability leading to a conclusion that the age dependence of beta modulation was caused by changes in the likelihood of bursts (i.e. bursts are less likely to ’switch off’ during sensory stimulation in children). We have now extended these analyses to test whether burst amplitude also changes significantly with age – we reasoned that if burst amplitude remained the same in children and adults, this would not only suggest that beta modulation is driven by burst probability (distinct from burst amplitude), but also show directly that the beta effects we see are not attributable to a lack of sensitivity in younger people. 

      We took the (unnormalized) beamformer projected electrophysiological time series from sensorimotor cortex and filtered it 5-48 Hz (the motivation for the large band was because bursts are known to be pan-spectral and have lower frequency content in children; this band captures most of the range of burst frequencies highlighted in our spectra). We then extracted the timings of the bursts, and for each burst took the maximum projected signal amplitude. These values were averaged across all bursts in an individual subject, and plotted for all subjects against age.
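
      The per-subject burst amplitude summary described above can be sketched as follows. This is a hypothetical illustration: it assumes the time series has already been beamformer-projected and band-pass filtered, that burst timings are supplied as (start, stop) sample indices with an exclusive stop, and that "maximum amplitude" means the largest absolute value within the burst.

```python
def mean_burst_amplitude(signal, bursts):
    """Mean, across bursts, of the peak absolute signal within each burst.

    signal: list of samples from the filtered, projected time series.
    bursts: list of (start, stop) sample indices, stop exclusive
            (hypothetical representation of burst timings).
    """
    peaks = [
        max(abs(value) for value in signal[start:stop])
        for start, stop in bursts
    ]
    return sum(peaks) / len(peaks)
```

      The resulting single value per subject is what would then be plotted against age.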

      Author response image 2.

      Beta burst amplitude as a function of age; A) shows index finger stimulation trials; B) shows little finger stimulation trials. In both cases there was no significant modulation of burst amplitude with age.

      Results (see Figure R2) showed that the amplitude of the beta burst showed no significant age-related modulation (R² = 0.01, p = 0.48 for the index finger and R² = 0.01, p = 0.57 for the little finger). This is distinct from both burst probability and task-induced beta modulation. This adds weight to the argument that the diminished beta modulation in children is not caused by a lack of sensitivity to the MEG signal and supports our conclusion that burst probability is the primary driver of the age-related changes in beta oscillations.

      Both of the above analyses have been added to our supplementary information and mentioned in the main manuscript. The first shows no confound of sensor proximity to the scalp with age in our study. The second shows that the bursts underlying the beta signal are not significantly lower amplitude in children – which we reasoned they would be if sensitivity was diminished at younger ages. We believe that the two together suggest that we have mitigated a sensitivity confound in our study.

      The data do not make a compelling case for the motion tolerance of OPM-MEG. Although an apparent advantage of a wearable system, an empirical demonstration is still lacking. How was motion tracked in these participants?

      We agree that this was a limitation of our experiment. 

      We have the equipment to track motion of the head during an experiment, using IR retroreflective markers placed on the helmet and a set of IR cameras located inside the MSR. However, the process takes a long time to set up, it lacks robustness, and would have required an additional computer (the one we typically use was already running the somatosensory stimulus and video). When the study was designed, we were concerned that the increased set up time for motion tracking would cause children to get bored, and result in increased participant drop out. For this reason we decided not to capture motion of the head during this study.

      With hindsight this was a limitation which – as the reviewer states – makes us unable to prove that motion robustness was a significant advantage for this study. That said, during scanning there was both a parent and an experimenter in the room for all of the children scanned, and anecdotally we can say that children tended to move their head during scans – usually to talk to the parent. Whilst this cannot be quantified (and is therefore unsatisfactory) we thought it worth mentioning in our discussion, which reads:

      “…One limitation of the current study is that practical limitations prevented us from quantitatively tracking the extent to which children (and adults) moved their head during a scan. Anecdotally however, experimenters present in the room during scans reported several instances where children moved, for example to speak to their parents who were also in the room. Such levels of movement could not be tolerated in conventional MEG or MRI and so this again demonstrates the advantages afforded by OPM-MEG…”

      As a note, empirical demonstrations of the motion tolerance of OPM-MEG have been published previously. Early demonstrations included Boto et al. [2], who captured beta oscillations in adults playing a ball game, and Holmes et al. [3], who measured visual responses as participants moved their head to change viewing angle. In more recent demonstrations, Seymour et al. measured the auditory evoked field in standing mobile participants [4]; Rea et al. measured beta modulation as subjects carried out a naturalistic handwriting task [5]; and Holmes et al. measured beta modulation as a subject walked around a room [6].

      Furthermore, while the introduction discusses at some length the phenomenon of PMBR, there is no demonstration of the recording of PMBR (or post-sensory beta rebound). This is a shame because there is literature suggesting an age-sensitivity to this, that the optimal sensitivity of OPM-MEG might confirm/refute. There is little evidence in Figure 3 for adult beta rebound. Is there an explanation for the lack of sensitivity to this phenomenon in children/adolescents? Could a more robust paradigm (button-press) have shed light on this?

      We understand the question. There are two limitations to the current study in respect to measuring the PMBR:

      Firstly, sensory tasks generally do not induce as strong a PMBR as motor tasks and with this in mind a stronger rebound response could have been elicited using a button press. However, it was our intention to scan children down to age 2 and we were sceptical that the youngest children would carry out a button press as instructed. For this reason we opted for entirely passive stimulation, requiring no active engagement from our participants. The advantage of this was a stimulus that all subjects could engage with. However, this was at the cost of a diminished rebound.

      The second limitation relates to trial length. Multiple studies have shown that the PMBR can last over ~10 s [7,8]. Indeed, Pfurtscheller et al. argued in 1999 that it was necessary to leave 10 s between movements to allow the PMBR to return to a true baseline [9], though this has rarely been adhered to in the literature. Here, we wanted to keep recordings short for the comfort of the younger participants, so we adopted a short trial duration. However, a consequence of this short trial length is that it becomes impossible to access the PMBR directly; one can only measure beta modulation with the task. This limitation has now been addressed explicitly in our discussion:

      “…this was the first study of its kind using OPM-MEG, and consequently aspects of the study design could have been improved. Firstly, the task was designed for children; it was kept short while maximising the number of trials (to maximise signal to noise ratio). However, the classical view of beta modulation includes a PMBR which takes ~10 s to reach baseline following task cessation [7–9]. Our short trial duration therefore doesn’t allow the rebound to return to baseline between trials, and so conflates PMBR with rest. Consequently, we cannot differentiate the neural generators of the task induced beta power decrease and the PMBR; whilst this helped ensure a short, child friendly task, future studies should aim to use longer rest windows to independently assess which of the two processes is driving age related changes…”

      Data on functional connectivity are valuable but do not rely on OPM recording. They further do not add strength to the argument that OPM MEG is more sensitive to brain activity in smaller heads - in fact, the OPM recordings seem plagued by the same insensitivity observed using conventional systems.

      Given the demonstration above that bursts are not significantly diminished in amplitude in children relative to adults; and further given the demonstrations in the literature (e.g. Seedat et al.10) that functional connectivity is driven by bursts, we would argue that the effects of connectivity changing with age are not related to sensitivity but rather genuinely reflect a lack of coordination of brain activity.

      The discussion of burst vs oscillations, while highly relevant in the field, is somewhat independent of the OPM recording approach and does not add weight to the OPM claims.

      We agree that the burst vs. oscillations discussion does not add weight to the OPM claims per se. However, our paper had two aims, the second being to “investigate how task-induced beta modulation in the sensorimotor cortices is related to the occurrence of pan-spectral bursts, and how the characteristics of those bursts change with age.” As the reviewer states, this is highly relevant to the field, and we therefore believe it adds impact, not only to the paper, but also by extension to the technology.

      In short, while the theoretical advantages of OPM-MEG are attractive - both in terms of young child sensitivity and in terms of motion tolerance, neither was in fact demonstrated in this manuscript. We are left with a replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      We thank the referee for their time and important contributions to this paper. We believe the fact that we were able to record good data in children as young as two years old was, in itself, an experimental realisation of the ‘theoretical advantages’ of OPM-MEG. Our additional analyses, inspired by the reviewers’ comments, help to clarify the advantages of OPM-MEG over conventional technology. The reviewers’ insights have without doubt improved the paper.

      Reviewer #2 (Public Review):

      Summary:

      The authors introduce a new 192-channel OPM system that can be configured using different helmets to fit individuals from 2 to 34 years old. To demonstrate the veracity of the system, they conduct a sensorimotor task aimed at mapping developmental changes in beta oscillations across this age range. Many past studies have mapped the trajectory of beta (and gamma) oscillations in the sensorimotor cortices, but these studies have focused on older children and adolescents (e.g., 9-15 years old) and used motor tasks. Thus, given the study goals, the choice of a somatosensory task was surprising and not justified. The authors recorded a final sample of 27 children (2-13 years old) and 24 adults (21-34 years) and performed a time-frequency analysis to identify oscillatory activity. This revealed strong beta oscillations (decreases from baseline) following the somatosensory stimulation, which the authors imaged to discern generators in the sensorimotor cortices. They then computed the power difference between the 0.3-0.8 s period and the 1.0-1.5 s post-stimulation period and showed that the beta response became stronger with age (more negative relative to the stimulation period). Using these same time windows, they computed the beta burst probability and showed that this probability increased as a function of age. They also showed that the spectral composition of the bursts varied with age. Finally, they conducted a whole-brain connectivity analysis. The goals of the connectivity analysis were not as clear, as prior studies of sensorimotor development have not conducted such analyses, and typically such whole-brain connectivity analyses are performed on resting-state data, whereas here the authors performed the analysis on task-based data. In sum, the authors demonstrate that they can image beta oscillations in young children using OPM and discern developmental effects.

      Thank you for this summary and for taking the time to review our manuscript.

      Strengths:

      Major strengths of the study include the novel OPM system and the unique participant population going down to 2-year-olds. The analyses are also innovative in many respects.

      Thank you – we also agree that the major strength is in the unique cohort.

      Weaknesses:

      Several weaknesses currently limit the impact of the study. 

      First, the choice of a somatosensory stimulation task over a motor task was not justified. The authors discuss the developmental motor literature throughout the introduction, but then present data from a somatosensory task, which is confusing. Of note, there is considerable literature on the development of somatosensory responses so the study could be framed with that.

      We completely understand the referee’s point, and we agree that the motivation for the somatosensory task was not made clear in our original manuscript.

      Our choice of task was motivated completely by our targeted cohort; whilst a motor task would have been our preference, it was generally felt that making two-year-olds comply with instructions to press a button would have been a significant challenge. In addition, there would likely have been differences in reaction times. By opting for a passive sensory stimulation we ensured compliance, and the same stimulus for all subjects. We have added text on this to our introduction as follows:

      “…Here, we combine OPM-MEG with a burst analysis based on a Hidden Markov Model (HMM) 10–12 to investigate beta dynamics. We scanned a cohort of children and adults across a wide age range (upwards from 2 years old). Because of this, we implemented a passive somatosensory task which can be completed by anyone, regardless of age…”

      We also state in our discussion:

      “…here we chose to use passive (sensory) stimulation. This helped ensure compliance with the task in subjects of all ages and prevented confounds of e.g. reaction time, force, speed and duration of movement which would be more likely in a motor task.7,8 However, there are many other systems to choose and whether the findings here regarding beta bursts and the changes with age also extend to other brain networks remains an open question.…”

      Regarding the neurodevelopmental literature – we are aware of the literature on somatosensory evoked responses – particularly median nerve stimulation – but we can find little on the neurodevelopmental trajectory of somatosensory induced beta oscillations (the topic of our paper). We have edited our introduction as follows:

      “…All these studies probed beta responses to movement execution; in the case of tactile stimulation (i.e. sensory stimulation without movement) both task induced beta power loss, and the post stimulus rebound have been consistently observed in adults9,13–18. Further, beta amplitude in sensory cortex has been related to attentional processes19 and is broadly thought to carry top down influence on primary areas20. However, there is less literature on how beta modulation changes with age during purely sensory tasks.…”

      We would be keen for the reviewer to point to any specific papers in the literature that we may have missed.

      Second, the primary somatosensory response actually occurs well before the time window of interest in all of the key analyses. There is an established literature showing mechanical stimulation activates the somatosensory cortex within the first 100 ms following stimulation, with the M50 being the most robust response. The authors focus on a beta decrease (desynchronization) from 0.3-0.8 s which is obviously much later, despite the primary somatosensory response being clear in some of their spectrograms (e.g., Figure 3 in older children and adults). This response appears to exhibit a robust developmental effect in these spectrograms so it is unclear why the authors did not examine it. This raises a second point; to my knowledge, the beta decrease following stimulation has not been widely studied and its function is unknown. The maps in Figure 3 suggest that the response is anterior to the somatosensory cortex and perhaps even anterior to the motor cortex. Since the goal of the study is to demonstrate the developmental trajectory of well-known neural responses using an OPM system, should the authors not focus on the best-understood responses (i.e., the primary somatosensory response that occurs from 0.0-0.3 s)?

      We understand the reviewer’s point. The original aim of our manuscript was to investigate the neurodevelopmental trajectory of beta oscillations, not the evoked response. In fact, the evoked response in this paradigm is complicated by the fact that there are three stimuli in a very short (<500 ms) time window. For this reason, we prefer the focus of our paper to remain on oscillations.

      Nevertheless, we agree that not including the evoked responses was a missed opportunity.  We have now added evoked responses to our analysis pipeline and manuscript. As surmised by the reviewer, the M50 shows neurodevelopmental changes (an increase with age). Our methods section has been updated accordingly and Figure 3 has been modified. The figure and caption are copied below for the convenience of the reviewer.

      Author response image 3.

      Beta band modulation with age: (A) Brain plots show slices through the left motor cortex, with a pseudo-T-statistical map of beta modulation (blue/green) overlaid on the standard brain. Peak MNI coordinates are indicated for each subgroup. Time frequency spectrograms show modulation of the amplitude of neural oscillations (fractional change in spectral amplitude relative to the baseline measured in the 2.5-3 s window). Vertical lines indicate the time of the first braille stimulus. In all cases results were extracted from the location of peak beta desynchronisation (in the left sensorimotor cortex). Note the clear beta amplitude reduction during stimulation. The inset line plots show the 4-40 Hz trial averaged phase-locked evoked response, with the expected prominent deflections around 20 and 50 ms. (B) Maximum difference in beta-band amplitude (0.3-0.8 s window vs 1-1.5 s window) plotted as a function of age (i.e., each data point shows a different participant; triangles represent children, circles represent adults). Note significant correlation (R² = 0.29, p = 0.00004 *). (C) Amplitude of the P50 component of the evoked response plotted against age. There was no significant correlation (R² = 0.04, p = 0.14). All data here relate to the index finger stimulation; similar results are available for the little finger stimulation in Supplementary Information Section 1.

      Regarding the developmental effects, the authors appear to compute a modulation index that contrasts the peak beta window (.3 to .8) to a later 1.0-1.5 s window where a rebound is present in older adults. This is problematic for several reasons. First, it prevents the origin of the developmental effect from being discerned, as a difference in the beta decrease following stimulation is confounded with the beta rebound that occurs later. A developmental effect in either of these responses could be driving the effect. From Figure 3, it visually appears that the much later rebound response is driving the developmental effect and not the beta decrease that is the primary focus of the study. Second, these time windows are a concern because a different time window was used to derive the peak voxel used in these analyses. From the methods, it appears the image was derived using the .3-.8 window versus a baseline of 2.5-3.0 s. How do the authors know that the peak would be the same in this other time window (0.3-0.8 vs. 1.0-1.5)? Given the confound mentioned above, I would recommend that the authors contrast each of their windows (0.3-0.8 and 1.0-1.5) with the 2.5-3.0 window to compute independent modulation indices. This would enable them to identify which of the two windows (beta decrease from 0.3-0.8 s or the increase from 1.0-1.5 s) exhibited a developmental effect. Also, for clarity, the authors should write out the equation that they used to compute the modulation index. The direction of the difference (positive vs. negative) is not always clear.

      We completely understand the referee’s point; referee 1 made a similar point. In fact, there are two limitations of our paradigm regarding the measurement of PMBR versus the task-induced beta decrease:

      Firstly, sensory tasks generally do not induce as strong a PMBR as motor tasks and with this in mind a stronger rebound response could have been elicited using a button press. However, as described above it was our intention to scan children down to age 2 and we were sceptical that the youngest children would carry out a button press as instructed.

      The second limitation relates to trial length. Multiple studies have shown that the PMBR can last over ~10 s7,8. Indeed, Pfurtscheller et al. argued in 1999 that it was necessary to leave 10 s between movements to allow the PMBR to return to a true baseline9. Here, we wanted to keep recordings relatively short for the younger participants, and so we adopted a short trial duration. However, a consequence of this short trial length is that it becomes impossible to access the PMBR directly because the PMBR of the nth trial is still ongoing when the (n+1)th trial begins. Because of this, there is no genuine rest period, and so the stimulus induced beta decrease and subsequent rebound cannot be disentangled. This limitation has now been made clear in our discussion as follows:

      “…this was the first study of its kind using OPM-MEG, and consequently aspects of the study design could have been improved. Firstly, the task was designed for children; it was kept short while maximising the number of trials (to maximise signal to noise ratio). However, the classical view of beta modulation includes a PMBR which takes ~10 s to reach baseline following task cessation7–9. Our short trial duration therefore doesn’t allow the rebound to return to baseline between trials, and so conflates PMBR with rest. Consequently, we cannot differentiate the neural generators of the task induced beta power decrease and the PMBR; whilst this helped ensure a short, child friendly task, future studies should aim to use longer rest windows to independently assess which of the two processes is driving age related changes…”

      To clarify our method of calculating the modulation index, we have added the following statement to the methods:

      “The beta modulation index was calculated using the equation [equation not preserved in this copy], where the three terms are the average Hilbert-envelope-derived amplitudes in the stimulus (0.3-0.8 s), post-stimulus (1-1.5 s) and baseline (2.5-3 s) windows, respectively.”
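
      As a rough illustration of how an index of this kind could be computed, the sketch below averages a precomputed amplitude envelope over the three windows named above and combines them. The combining formula, (post-stimulus minus stimulus) divided by baseline, is an assumption made here for illustration only, because the equation itself is not legible in this response; the function names `window_mean` and `beta_modulation_index` are likewise hypothetical and do not come from the authors' pipeline.

```python
from statistics import fmean


def window_mean(times, envelope, t_start, t_end):
    """Mean envelope amplitude over samples with t_start <= t < t_end (seconds)."""
    samples = [a for t, a in zip(times, envelope) if t_start <= t < t_end]
    if not samples:
        raise ValueError("window contains no samples")
    return fmean(samples)


def beta_modulation_index(times, envelope):
    """Combine window-averaged beta amplitudes into a single index.

    The windows are those quoted in the response. The formula is an
    ASSUMPTION: (post-stimulus - stimulus) / baseline. The authors'
    actual equation may differ.
    """
    a_stim = window_mean(times, envelope, 0.3, 0.8)   # stimulus window
    a_post = window_mean(times, envelope, 1.0, 1.5)   # post-stimulus window
    a_base = window_mean(times, envelope, 2.5, 3.0)   # baseline window
    return (a_post - a_stim) / a_base


# Toy trial-averaged envelope sampled at 100 Hz for 3 s (values are made up).
times = [i / 100 for i in range(300)]
envelope = [0.5 if 30 <= i < 80 else 1.5 if 100 <= i < 150 else 1.0
            for i in range(300)]
index = beta_modulation_index(times, envelope)
```

      Under this assumed sign convention, a larger positive value corresponds to a larger swing between the stimulus-period decrease and the post-stimulus period; writing the direction out in this way addresses the reviewer's request for clarity about positive versus negative differences.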

      Another complication of using a somatosensory task is that the literature on bursting is much more limited and it is unclear what the expectations would be. Overall, the burst probability appears to be relatively flat across the trial, except that there is a sharp decrease during the beta decrease (.3-.8 s). This matches the conventional trial-averaging analysis, which is good to see. However, how the bursting observed here relates to the motor literature and the PMBR versus beta ERD is unclear.

      Again, we agree completely; a motor task would have better framed the study in the context of existing burst literature – but as mentioned above, making 2-year-olds comply with the instructions for a motor task would have been difficult. Interestingly, in a recent paper, Rayson et al. used EEG to investigate burst activity in infants (9 and 12 months) and adults during observed movement execution, with results showing a stimulus induced decrease in beta burst rate at all ages, with the largest effects in adults21. This paper was not yet published when we submitted our article but does help us to frame our burst results since there is strong agreement between their study and ours. We now mention this study in both our introduction and discussion.

      Another weakness is that all participants completed 42 trials, but 19% of the trials were excluded in children and 9% were excluded in adults. The number of trials is proportional to the signal-to-noise ratio. Thus, the developmental differences observed in response amplitude could reflect differences in the number of trials that went into the final analyses.

      This is an important observation and we thank the reviewer for raising the issue. We have now re-analysed all of our data, removing trials in the adults such that the overall number of trials was the same as for the children. All effects with age remained significant. We chose to keep the Figures in the main manuscript with all good trials (as previously) and present the additional analyses (with matched trial numbers) in supplementary information. However, if the reviewer feels strongly, we could do it the other way around (there is very little difference between the results).
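
      A minimal sketch of the trial-matching step described above is given here. It assumes that trials are dropped by seeded random subsampling; the response does not say how the removed adult trials were chosen, so this selection rule, the function name `match_trial_count`, and the example counts are all illustrative assumptions.

```python
import random


def match_trial_count(trials, n_target, seed=0):
    """Randomly keep n_target trials, preserving their original order.

    ASSUMPTION: random subsampling with a fixed seed. The authors'
    actual selection rule is not stated in the response.
    """
    if n_target > len(trials):
        raise ValueError("cannot keep more trials than are available")
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(trials)), n_target))
    return [trials[i] for i in keep]


# Hypothetical example: an adult with 38 good trials is reduced to 34,
# a count chosen here only to mimic a typical child in the cohort.
adult_trials = list(range(38))
matched = match_trial_count(adult_trials, 34)
```

      Fixing the seed makes the subsample reproducible, so the matched-trial analysis can be rerun on exactly the same subset.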

      Reviewer #3 (Public Review):

      This study demonstrated the application of OPM-MEG in neurodevelopment studies of somatosensory beta oscillations and connections with children as young as 2 years old. It provides a new functional neuroimaging method that has a high spatial-temporal resolution as well as being wearable, which makes it a useful new tool for studies in young children. They have constructed a 192-channel wearable OPM-MEG system that includes field compensation coils which allow free head movement scanning with a relatively high ratio of usable trials. Beta band oscillations during somatosensory tasks are well localized and the modulation with age is found in the amplitude, connectivity, and panspectral burst probability. It is demonstrated that the wearable OPM-MEG could be used in children as a quite practical and easy-to-deploy neuroimaging method with performance as good as conventional MEG. With both good spatial (several millimeters) and temporal (milliseconds) resolution, it provides a novel and powerful technology for neurodevelopment research and clinical applications not limited to somatosensory areas.

      We thank the reviewer for their summary, and their time in reviewing our manuscript.

      The conclusions of this paper are mostly well supported by data acquired under the proper method. However, some aspects of data analysis need to be improved and extended.

      (1) The colour bars selected for the pseudo-T-statistic pictures of beta modulation in Figures 2 and 3, which are blue/black and red/black, are not easily distinguished from the anatomical images which are grey-scale. A colour bar without black/white would make these figures better. The peak point locations are also suggested to be marked in Figure 2 and averaged locations in Figure 3 with an error bar.

      Thank you for this comment which we certainly agree with. The colour scheme used has now been changed to avoid black. We have also added peak locations. 

      (2) The data points in plots are not consistent across figures. In Figures 3 and 5, they are classified into triangles and circles for children and adults, but all are circles in Figures 4 and 6.

      Thank you! We apologise for the confusion. Data points are now consistent across plots.

      (3) Although MEG is much less susceptible to conductivity inhomogeneity of the head than EEG, the forward modelling may still be impacted by the small head profile. Add more information about source localization accuracy and stability across ages or head size.

      This is an excellent point. We have added to our discussion relating to the accuracy of the forward model. 

      “…We failed to see a significant difference in the spatial location of the cortical representations of the index and little finger; there are three potential reasons for this. First, the system was not designed to look for such a difference – sensors were sparsely distributed to achieve whole head coverage (rather than packed over sensory cortex to achieve the best spatial resolution in one area22). Second, our “pseudo-MRI” approach to head modelling (see Methods) is less accurate than acquisition of participant-specific MRIs, and so may mask subtle spatial differences. Third, we used a relatively straightforward technique for modelling magnetic fields generated by the brain (a single shell forward model). Although MEG is much less susceptible to conductivity inhomogeneity of the head than EEG, the forward model may still be impacted by the small head profile. This may diminish spatial resolution and future studies might look to implement more complex models based on e.g. finite element modelling23. Finally, previous work24 suggested that, for a motor paradigm in adults, only the beta rebound, and not the power reduction during stimulation, mapped motortopically. This may also be the case for purely sensory stimulation. Nevertheless, it remains the case that by placing sensors closer to the scalp, OPM-MEG should offer improved spatial resolution in children and adults; this should be the topic of future work…”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Major items to further test include the differing number of trials, the windowing issue, and the focus on motor findings in the intro and discussion. First, I would recommend the authors adjust the number of trials in adults to equate them between groups; this will make their developmental effects easier to interpret.  

      Thank you for raising this important point. This has now been done and appears in our supplementary information as discussed above.

      Second, to discern which responses are exhibiting developmental effects, the authors need to contrast the 0.3-0.8 window with the later window (2.5-3.0), not the window that appears to have the PMBR-like response. This artificially accentuates the response. I also think they should image the 1.0-1.5 vs 2.5-3.0s window to determine whether the response in this time window is in the same location as the decrease and then contrast this for beta differences. 

      We completely understand this point, which relates to separating the reduction in beta amplitude during stimulation and the rebound post stimulation. However, as explained above, doing so unambiguously would require the use of much longer trials. Here we were only able to measure stimulus induced beta modulation (distinct from the separate contributions of the task induced beta power reduction and rebound). It may be that future studies, with >10 s trial length, could probe the role of the PMBR, but such studies require long paradigms which are challenging to implement with children.

      Third, changing the framing of the study to highlight the somatosensory developmental literature would also be an improvement.

      We have added to our introduction as stated in the responses above.

      Finally, the connectivity analysis on data from a somatosensory task did not make sense given the focus of the study and should be removed in my opinion. It is very difficult to interpret given past studies used resting state data and one would expect the networks to dynamically change during different parts of the current task (i.e., stimulation versus baseline).

      We appreciate the point regarding connectivity. However, it was our intention to examine the developmental trajectory of beta oscillations, and a major role of beta oscillations is in mediating connectivity. It is true that most studies are conducted in the resting state (or more recently – particularly in children – during movie watching). The fact that we had a sensory task running is a confound; nevertheless, the connectivity we derived in adults bears a marked similarity to that from previous papers (e.g. 25) and we do see significant changes with age. We therefore believe this to be an important addition to the paper and we would prefer to keep it.

      References

      (1) Holmes, N., Bowtell, R., Brookes, M. J. & Taulu, S. An Iterative Implementation of the Signal Space Separation Method for Magnetoencephalography Systems with Low Channel Counts. Sensors 23, 6537 (2023).

      (2) Boto, E. et al. Moving magnetoencephalography towards real-world applications with a wearable system. Nature (2018) doi:10.1038/nature26147.

      (3) Holmes, M. et al. A bi-planar coil system for nulling background magnetic fields in scalp mounted magnetoencephalography. NeuroImage 181, 760–774 (2018).

      (4) Seymour, R. A. et al. Using OPMs to measure neural activity in standing, mobile participants. NeuroImage 244, 118604 (2021).

      (5) Rea, M. et al. A 90-channel triaxial magnetoencephalography system using optically pumped magnetometers. Annals of the New York Academy of Sciences 1517, https://doi.org/10.1111/nyas.14890 (2022).

      (6) Holmes, N. et al. Enabling ambulatory movement in wearable magnetoencephalography with matrix coil active magnetic shielding. NeuroImage 274, 120157 (2023).

      (7) Pakenham, D. O. et al. Post-stimulus beta responses are modulated by task duration. NeuroImage 206, 116288 (2020).

      (8) Fry, A. et al. Modulation of post-movement beta rebound by contraction force and rate of force development. Human Brain Mapping 37, 2493–2511 (2016).

      (9) Pfurtscheller, G. & Lopes da Silva, F. H. Event-related EEG/MEG synchronization and desynchronization: Basic principles. Clin Neurophysiol 110, 1842–1857 (1999).

      (10) Seedat, Z. A. et al. The role of transient spectral ‘bursts’ in functional connectivity: A magnetoencephalography study. NeuroImage 209, 116537 (2020).

      (11) Baker, A. P. et al. Fast transient networks in spontaneous human brain activity. eLife 3, e01867 (2014).

      (12) Vidaurre, D. et al. Spectrally resolved fast transient brain states in electrophysiological data. NeuroImage 126, 81–95 (2016).

      (13) Gaetz, W. & Cheyne, D. Localization of sensorimotor cortical rhythms induced by tactile stimulation using spatially filtered MEG. NeuroImage 30, 899–908 (2006).

      (14) Cheyne, D. et al. Neuromagnetic imaging of cortical oscillations accompanying tactile stimulation. Cognitive Brain Research 17, 599–611 (2003).

      (15) van Ede, F., Jensen, O. & Maris, E. Tactile expectation modulates pre-stimulus β-band oscillations in human sensorimotor cortex. NeuroImage 51, 867–876 (2010).

      (16) Salenius, S., Schnitzler, A., Salmelin, R., Jousmäki, V. & Hari, R. Modulation of Human Cortical Rolandic Rhythms during Natural Sensorimotor Tasks. NeuroImage 5, 221–228 (1997).

      (17) Cheyne, D. O. MEG studies of sensorimotor rhythms: A review. Experimental Neurology 245, 27–39 (2013).

      (18) Kilavik, B. E., Zaepffel, M., Brovelli, A., MacKay, W. A. & Riehle, A. The ups and downs of beta oscillations in sensorimotor cortex. Experimental Neurology 245, 15–26 (2013).

      (19) Bauer, M., Oostenveld, R., Peeters, M. & Fries, P. Tactile Spatial Attention Enhances Gamma-Band Activity in Somatosensory Cortex and Reduces Low-Frequency Activity in Parieto-Occipital Areas. J. Neurosci. 26, 490–501 (2006).

      (20) Barone, J. & Rossiter, H. E. Understanding the Role of Sensorimotor Beta Oscillations. Frontiers in Systems Neuroscience 15, (2021).

      (21) Rayson, H. et al. Bursting with Potential: How Sensorimotor Beta Bursts Develop from Infancy to Adulthood. J Neurosci 43, 8487–8503 (2023).

      (22) Hill, R. M. et al. Optimising the Sensitivity of Optically-Pumped Magnetometer Magnetoencephalography to Gamma Band Electrophysiological Activity. Imaging Neuroscience (2024) doi:10.1162/imag_a_00112.

      (23) Stenroos, M., Hunold, A. & Haueisen, J. Comparison of three-shell and simplified volume conductor models in magnetoencephalography. NeuroImage 94, 337–348 (2014).

      (24) Barratt, E. L., Francis, S. T., Morris, P. G. & Brookes, M. J. Mapping the topological organisation of beta oscillations in motor cortex using MEG. NeuroImage 181, 831–844 (2018).

      (25) Rier, L. et al. Test-Retest Reliability of the Human Connectome: An OPM-MEG study. Imaging Neuroscience (2023) doi:10.1162/imag_a_00020.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study investigates the transcriptional changes in neurons that underlie loss of learning and memory with age in C. elegans, and how cognition is maintained in insulin/IGF-1-like signaling mutants. The presented evidence is convincing, utilizing a cutting-edge method to isolate neurons from worms for genomics that is clearly conveyed with a rigorous experimental approach. Overall, this study supports that older daf-2 worms maintain cognitive function via mechanisms that are unique from younger wild type worms, which will be of interest to neuroscientists and researchers studying ageing.

      Thank you, we appreciate the positive comments.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The authors perform RNA-seq on FACS-isolated neurons from adult worms at days 1 and 8 of adulthood to profile the gene expression changes that occur with cognitive decline. Supporting data are included indicating that by day 7 of adulthood, learning and memory are reduced, indicating that this time point or after represents cognitively aged worms. Neuronal identity genes are reduced in expression within cognitively aged worms, whereas genes involved in proteostasis, transcription/chromatin, and stress response are elevated. A number of specific examples are provided, representing markers of specific neuronal subtypes, and correlating expression changes to the erosion of particular functions (e.g. motor neurons, chemosensory neurons, aversive learning neurons, etc). 

      To investigate whether the upregulation of genes in neurons with age is compensatory or deleterious, the authors reduced the expression of a set of three significantly upregulated genes and performed behavioral assays in young adults. In each case, reduction of expression improved memory, consistent with a model in which age-associated increases impair neuronal function. This claim would be bolstered by an experiment elevating the expression of these genes in young neurons, which should reduce the learning index if the hypothesis is correct. 

      This is an interesting suggestion. Our long-term goal is to find ways to improve memory, and to better understand the “rules” that might govern changes with age. In this case, we were interested in addressing the hypothesis that genes that rise with age must be compensatory, which is a frequently stated theory that is not often tested. Here we showed that knocking down three genes that are upregulated in aged animals improved memory; our results suggest that the wild-type functions of these genes are likely deleterious for learning and memory functions, and further, that their increased expression with age is not a compensatory function. Certainly for future work, it might be interesting to better understand how and why these specific genes have a deleterious function that increases with age, and whether that function is different in younger animals where they are not highly expressed.

      The authors then characterize learning and memory in wild-type, daf-2, and daf-2/daf-16 worms with age and find that daf-2 worms have an extended ability to learn for approximately 10 days longer than wild types. This was daf-16 dependent. Memory was extended in daf-2 as well, and strikingly, daf-2;daf-16 had no short-term memory even at day 1. Transcriptomic analysis of FACS-sorted neurons was performed on the three groups at day 8. The authors focus their analysis on daf-2 vs. daf-2;daf-16 and present evidence that daf-2 neurons express a stress-resistance gene program. One question that remains unanswered is how well the N2 and daf-2;daf-16 correlate overall, and are there differences? This may be informative as wild type and daf-2;daf-16 mutants are not phenotypically identical when it comes to memory, and there may be differences that can be detected despite the overlap in the PCA. This analysis could reveal the daf-16 targets involved in memory. 

      Re. daf-2;daf-16 vs N2: This is a good suggestion. Our analysis in Fig. S5 showed that the daf-2 vs N2 comparison yields results similar to the daf-2 vs daf-16;daf-2 comparison, but some additional genes are differentially expressed. Interestingly, the daf-2 vs N2 comparison shows that the bZip transcription factors are upregulated in daf-2 compared with N2 worms (Fig. S6f). This may indicate that, in addition to the DAF-16/FOXO transcription factor, other transcription factors are controlled by the daf-2 mutation in the nervous system.

      Author response image 1.

      We also identified the differentially expressed genes in the Day 8 neuronal daf-16;daf-2 to N2 comparison, as the reviewer is asking about. The samples from different genotypes do separate from one another in the PCA plot, indicating there are differences between daf-16;daf-2 and N2 neurons. However, the difference is smaller and there are fewer genes differentially expressed between daf-16;daf-2 and N2: only 38 genes are significantly higher in daf-16;daf-2, and only 53 genes are significantly higher in N2 (log2FC > 0.5, p-adj < 0.05). The genes higher in N2 are enriched in endopeptidase inhibitors, and the genes higher in daf-16;daf-2 are not enriched in any gene ontology terms. These results indicate that there are some differences between daf-16;daf-2 and N2 neurons, which correlates with the behavioral differences we see, but the difference is small compared to daf-2 neurons. We have added these data to the paper (Fig. S4e,f); thank you for the suggestion.
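      For illustration only, the thresholding described in this response (log2FC > 0.5, p-adj < 0.05) can be sketched as a simple filter over a differential-expression table. This is not the authors' pipeline: the `DEResult` record, the gene names, and the values are hypothetical, and the sketch assumes the fold-change cutoff is applied symmetrically so that genes higher in either genotype are reported separately.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    gene: str      # hypothetical gene identifier
    log2fc: float  # log2 fold change; positive = higher in the first genotype
    padj: float    # multiple-testing-adjusted p-value


def split_by_direction(results, lfc_cutoff=0.5, padj_cutoff=0.05):
    """Return (higher_in_first, higher_in_second) gene lists.

    A gene is kept only if it passes BOTH the adjusted p-value cutoff
    and the fold-change cutoff in the relevant direction.
    """
    significant = [r for r in results if r.padj < padj_cutoff]
    up = [r.gene for r in significant if r.log2fc > lfc_cutoff]
    down = [r.gene for r in significant if r.log2fc < -lfc_cutoff]
    return up, down


# Made-up values, chosen only to exercise each branch of the filter.
toy = [
    DEResult("geneA", 1.2, 0.001),    # passes both cutoffs, higher in first
    DEResult("geneB", -0.8, 0.01),    # passes both cutoffs, higher in second
    DEResult("geneC", 0.3, 0.0001),   # significant, but fold change too small
    DEResult("geneD", 2.0, 0.2),      # large fold change, but not significant
]
up, down = split_by_direction(toy)
```

      Counting the two returned lists gives the kind of summary quoted above (number of genes significantly higher in each genotype).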

      The authors tested eight candidate genes that were more highly expressed in daf-2 neurons vs. daf-2;daf-16 and showed that reduction of 2 and 5 of these genes impaired learning and memory, respectively, in daf-2 worms. This finding implicates specific neuronal transcriptional targets of IIS in maintaining cognitive ability in daf-2 with age, which, importantly, are distinct from those in young wild type worms. 

      Reviewer #2 (Public Review): 

      Weng et al. perform a comprehensive study of gene expression changes in young and old animals, in wild-type and daf-2 insulin receptor mutants, in the whole animal, and specifically in the nervous system. Using this data, they identify gene families that are correlated with neuronal ageing, as well as a distinct set of genes that are upregulated in neurons of aged daf-2 mutants. This is particularly interesting as daf-2 mutants show both extended lifespans and healthier neurons in aged animals, reflected by better learning/memory in older animals compared with wild-type controls. Indeed, the knockdown of several of these upregulated genes resulted in poorer learning and memory. In addition, the authors showed that several genes upregulated during ageing in wild-type neurons also contribute to learning and memory; specifically knockdown of these genes in young animals resulted in improved memory. This indicates that (at least in this small number of cases), genes that show increased transcript levels with age in the nervous system somehow suppress memory, potentially by having damaging effects on neuronal health. 

      Finally, from a resource perspective, the neuronal transcriptome provided here will be very useful for C. elegans researchers as it adds to other existing datasets by providing the transcriptome of older animals (animals at day 8 of adulthood) and demonstrating the benefits of performing tissue-specific RNAseq instead of whole-animal sequencing. 

      Thank you!

      The work presented here is of high quality and the authors present convincing evidence supporting their conclusions.

      Thanks!

      I only have a few comments/suggestions: 

      (1) Do the genes identified to decrease learning/memory capacity in daf-2 animals (Figure 4d/e) also impact neuronal health? daf-2 mutant worms show delayed onset of age-related changes to neuron structure (Tank et al., 2011, J Neurosci). Does knockdown of the genes shown to affect learning also affect neuron structure during ageing, potentially one mechanism through which they modulate learning/memory? 

      Thank you for this suggestion, which would be a good future direction, particularly for genes that might have some relationship to previously identified cellular structural processes. The genes we tested here include dod-24, alh-2, mtl-1, F08H9.4, C44B7.5, hsp-12.3, hsp-12.6, and cpi-1, which are related to stress response, proteolysis inhibitor, metabolic, and innate immunity GO categories, and are thus associated with stress resistance, proteolysis, and lipid metabolism processes; none are obvious choices for morphological effects.

      However, it is worth noting that learning and memory decline much faster (Days 4-8) than morphological differences are observed (generally after Day 12-15). Moreover, those morphological differences have been studied primarily in mechanosensory neurons (touch neurons) rather than the chemosensory neurons that are involved in learning and memory, so additional genes may be required for those differences that we were not focusing on in this study.

      (2) The learning and memory assay data presented in this study uses the butanone olfactory learning paradigm, which is well established by the same group. Have the authors tried other learning assays when testing for learning/memory changes after the knockdown of candidate genes? Depending on the expression pattern of these genes, they may have more or less of an effect on olfactory learning versus for example gustatory or mechanosensory-based learning. 

      The reason that we use the butanone olfactory learning paradigm is because it is more similar to learning of information (neutral odorant association with positive cue (food)) – the kind of memory we would like to preserve in humans - rather than a stress-induced memory, such as starvation or pathogenesis-associated aversive learning paradigms, which are more like PTSD. (There is likely to be quite a bit of overlap in mechanism, however, including the role of genes such as magi-1 and casy-1, so it would not be surprising if many of these genes also were required for other learning paradigms.)

      (3) I have a comment on the 'compensatory vs dysregulatory' model as stated by the authors on page 7. I understand that this model presents the two main options, but perhaps this is slightly too simplistic: the gene expression that rises during ageing may be detrimental for memory (= dysregulatory), but at the same time may also be beneficial for other physiological roles in other tissues (=compensatory). 

      This is a good point, and we have made that clarification in the text: “There may be other scenarios in which a gene with multiple functions may be detrimental for some behaviors but beneficial for other physiological roles.”

      Reviewer #3 (Public Review): 

      Summary: 

      In this manuscript, Weng et al. detect a neuron-specific transcriptome that regulates aging. The authors first profile neuron-specific responses during aging at a time point where a loss in memory function is present. They discover signatures unique to neurons which validate their pipeline and reveal the loss of neuron identity with age. For example, old neurons reduce the expression of genes related to synaptic function and neuropeptide signaling and increase the expression of chromatin regulators, insulin peptides, and glycoproteins. The authors discover the detrimental effect of selected upregulated genes (utx-1, ins-19, and nmgp-1) by knocking them down in the whole body and detecting improvement of short memory functions. They then use their pipeline to test neuronal profiles of long-lived insulin/IGF mutants. They discover that genes related to stress response pathways are upregulated upon longevity (e.g. dod-24, F08H9.4) and that they are required for improved neuron function in long-lived individuals. 

      Strengths: 

      Overall, the manuscript is well-written, and the experiments are well-described. The authors take great care to explain their reasoning for performing experiments in a specific way and guide the reader through the interpretation of the results, which makes this manuscript an enjoyable and interesting read. Using neuron-specific transcriptomic analysis in aged animals the authors discover novel regulators of learning and memory, which underlines the importance of cell-specific deep sequencing. The time points of the transcriptomic profiling are elegantly chosen, as they coincide with the loss of memory and can be used to specifically reveal gene expression profiles related to neuron function. The authors showcase on the dod-24 example how powerful this approach is. In long-lived insulin/IGF-1 receptor mutants body-wide dod-24 expression differs from neuron-specific profiles. Importantly, the depletion of dod-24 has an opposing effect on lifespan and learning memory. The dataset will provide a useful resource for the C. elegans and aging community. 

      Thank you, we do hope people will find the data useful.

      Weaknesses: 

      While this study nicely describes the neuron-specific profiles, the authors do not test the relevance in a tissue-specific way. It remains unclear if modifying the responses only in neurons has implications for either memory or potentially for lifespan. The authors point to this in the text and refer to tissue-specific datasets. However, it is possible that the tissue-specific profile changes with age. The authors should consider mining publicly available cell-specific aging datasets and performing neuron-specific RNAi to test the functional relevance of the neuron-specific response. This would strengthen the importance of cell-specific profiling.

      Thank you for your suggestions. As we have mentioned in the text, our candidate genes are either expressed only in neurons (alh-2 and F08H9.4) or more highly expressed in daf-2 than in wild type only in the nervous system (C44B7.5 and dod-24). Thus, the effects we see from knocking down these genes in daf-2 are likely neuron-specific. Additionally, we performed our assays with the neuron-sensitive RNAi strain CQ745: daf-2(e1370) III; vIs69 [pCFJ90(Pmyo-2::mCherry + Punc-119::sid-1)] V. It has been previously shown that neuronal expression of sid-1 decreases non-neuronal RNAi, suggesting that neurons expressing transgenic sid-1(+) served as a sink for dsRNA (Calixto et al., 2010). Thus, this neuron-sensitive RNAi is likely neuron-specific, and our results are unlikely to arise from knockdown of these genes in non-neuronal tissues. However, we do acknowledge this issue.

      To identify the expression pattern of these genes in a more cell-specific way in the adults, we examined the expression of our candidate genes that affected learning and memory, namely dod-24, F08H9.4, C44B7.5, alh-2, and mtl-1, in the Calico database (Roux et al., 2023). From that database, we can see that dod-24 is mainly expressed in the PHC and PVM neurons, and F08H9.4 is largely expressed in various neurons. Both have only slight expression outside the nervous system. C44B7.5 and mtl-1 are more broadly expressed, but C44B7.5 was not found to be differentially expressed in other tissues in daf-2, and mtl-1 only had a slight effect on learning and memory. Perhaps due to their sequencing depth and detection limit, Roux et al. didn’t detect alh-2 expression anywhere in their data.

      Thus, the neuron-specific expression and daf-2 differential expression pattern of these genes indicate that the learning and memory improvement in aged daf-2 is unlikely due to neuronal non-autonomous effects.

      To better address this concern (that for the genes that we found only expressed in the neurons, the neuron-confined expression may change with age), we examined how the expression pattern of these genes changes with age. As is shown below, from the Calico database, we can see that the expression in the nervous system persists, and even slightly increases, with age; thus, age-related change in expression pattern is not a concern for our analysis.

      Author response image 2.

      Author response image 3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Most of my comments are in the public section. A few additional recommendations for the authors regarding the formatting/presentation: 

      The presentation of Figure S6e-h in the introduction is somewhat confusing and feels out of order. If presented first, it should be S1. Otherwise, discussion of this figure should go at the end of the results section or in the discussion if appropriate. 

      Thank you for pointing this out. We have moved the discussion of this figure to the Discussion section.

      I do not see Figure S5 described in the text.

      Good catch, thank you. We have added the descriptions for Figure S5 in the text.

      In general, check the figures, figure legends, and how they are referenced in the text, particularly the supplemental figures and legends.

      Minor comments:

      There is a typo in the Figure 4 legend: Neuronal IIX should be IIS. 

      Thanks for pointing this out. We have corrected it in the text.

      Reviewer #2 (Recommendations For The Authors): 

      • There are multiple instances throughout the manuscript where there are statements in brackets that provide justification or explanation for some of the approaches used. There is no reason for 'side note' brackets to be used. I suggest removing them and incorporating these statements into the narrative.

      Thank you, we have now incorporated these points into the main text.

      • Introduction: page 4 "here we RNA-sequenced FACS-isolated neurons" should be "here we performed RNA sequencing on FACS-isolated neurons...".

      Thank you, we have changed the text accordingly.

      • Figure 2A: I do not understand the legend for this panel "Tissue Query for wild-type genes expressed at higher levels in aged worms show lower nervous system and neuron prediction score." Please clarify.

      We have clarified the Figure 2A legend:

      (A)  Tissue prediction score for wild-type genes expressed at higher levels in aged worms.

      • Page 8: "We previously observed that loss of single genes that play a role in complex behaviors like learning and memory can have a large impact on function 60, unlike the additive roles of longevity-promoting genes 11." - a large impact on what function?

      Thank you for noting, we have clarified it in the text accordingly:

      “We previously observed that for genes that play a role in complex behaviors like learning and memory, the loss of single genes can have a large impact on these complex behaviors 60, unlike the additive roles of longevity-promoting genes 11.”

      • Next line "Therefore, one mechanism by which wild-type worms lose their function with age..." - again, what function?

      Thank you for noting this, we have clarified the text to say we refer to the learning and memory functions.

      • Page 9: "Thus, daf-2 mutants maintain their higher cognitive quality of life longer than wild-type worms, while daf-16;daf-2 mutants spend their whole lives without memory ability (Figure 3d), in contrast to claims that daf-2 mutants are less healthy than wild-type or daf-16 worms23." - since ref 23 did not perform any learning/memory tests, the definition of 'health' in ref 23 is different to 'cognitive health' as studied here. So the findings in this study are not 'in contrast' to ref 23 but rather add to these findings.

      Learning and memory ability is an important function for a healthy individual, thus we would assert that indeed, cognitive health is an important part of the “health” of daf-2 worms. In ref 23, Bansal et al. claim that daf-2 worms are less healthy without assessing their learning and memory ability; their lack of data is an insufficient reason for us to remove our statement, as cognitive health is part of healthspan. Here we find that the “learning span” of daf-2 lasts at least proportionally if not longer than that of wild type. We have also previously shown that daf-2 worms also have longer maximum velocity span with age (Hahm et al., 2015), in direct contrast with Bansal et al.’s claim that daf-2 worms move less well and thus are less healthy – daf-2 worms simply stop sooner when presented with food and switch to feeding, due to their higher odr-10 levels. The Bansal paper continues to be frequently cited as finding that daf-2 mutants are less healthy than wild type, a claim for which we can still find no experimental evidence to support. Therefore, it is important that we make the point that daf-2 worms have extended cognitive health, which is part of health span.

      • Page 13: I feel like the sentence "Furthermore, memory maintenance with age might require additional functions that were not previously uncovered in analyses of young animals" is both vague (what functions are referred to?) and a little bit obvious (obvious that age-related changes would not be revealed in analyses of young animals). Perhaps rephrase to make the desired point clearer? 

      We have clarified the sentence in the text:

      “Furthermore, memory maintenance with age might require additional genes that function in promoting stress resistance and neuronal resilience, which were not previously uncovered in analyses of young animals.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Protein conformational changes are often critical to protein function, but obtaining structural information about conformational ensembles is a challenge. Over a number of years, the authors of the current manuscript have developed and improved an algorithm, qFit protein, that models multiple conformations into high resolution electron density maps in an automated way. The current manuscript describes the latest improvements to the program, and analyzes the performance of qFit protein in a number of test cases, including classical statistical metrics of data fit like Rfree and the gap between Rwork and Rfree, model geometry, and global and case-by-case assessment of qFit performance at different data resolution cutoffs. The authors have also updated qFit to handle cryo-EM datasets, although the analysis of its performance is more limited due to a limited number of high-resolution test cases and less standardization of deposited/processed data.

      Strengths:

      The strengths of the manuscript are the careful and extensive analysis of qFit's performance over a variety of metrics and a diversity of test cases, as well as the careful discussion of the limitations of qFit. This manuscript also serves as a very useful guide for users in evaluating if and when qFit should be applied during structural refinement.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Wankowicz et al. describes updates to qFit, an algorithm for the characterization of conformational heterogeneity of protein molecules based on X-ray diffraction or cryo-EM data. The work provides a clear description of the algorithm used by qFit. The authors then proceed to validate the performance of qFit by comparing it to deposited X-ray entries in the PDB in the 1.2-1.5 Å resolution range as quantified by Rfree, Rwork-Rfree, detailed examination of the conformations introduced by qFit, and performance on stereochemical measures (MolProbity scores). To examine the effect of experimental resolution of X-ray diffraction data, they start from an ultra high-resolution structure (SARS-CoV2 Nsp3 macrodomain) to determine how the loss of resolution (introduced artificially) degrades the ability of qFit to correctly infer the nature and presence of alternate conformations. The authors observe a gradual loss of ability to correctly infer alternate conformations as resolution degrades past 2 Å. The authors repeat this analysis for a larger set of entries in a more automated fashion and again observe that qFit works well for structures with resolutions better than 2 Å, with a rapid loss of accuracy at lower resolution. Finally, the authors examine the performance of qFit on cryo-EM data. Despite a few prominent examples, the authors find only a handful (8) of datasets for which they can confirm a resolution better than 2.0 Å. The performance of qFit on these maps is encouraging and will be of much interest because cryo-EM maps will, presumably, continue to improve and because of the rapid increase in the availability of such data for many supramolecular biological assemblies. As the authors note, practices in cryo-EM analysis are far from uniform, hampering the development and assessment of tools like qFit.

      Strengths

      qFit improves the quality of refined structures at resolutions better than 2.0 Å, in terms of reflecting true conformational heterogeneity and geometry. The algorithm is well designed and does not introduce spurious or unnecessary conformational heterogeneity. I was able to install and run the program without a problem within a computing cluster environment. The paper is well written and the validation thorough.

      I found the section on cryo-EM particularly enlightening, both because it demonstrates the potential for discovery of conformational heterogeneity from such data by qFit, and because it clearly explains the hurdles towards this becoming common practice, including lack of uniformity in reporting resolution, and differences in map and solvent treatment.

      Weaknesses

      The authors begin the results section by claiming that they made "substantial improvement" relative to the previous iteration of qFit, "both algorithmically (e.g., scoring is improved by BIC, sampling of B factors is now included) and computationally (improving the efficiency and reliability of the code)" (bottom of page 3). However, the paper does not provide a comparison to previous iterations of the software or quantitation of the effects of these specific improvements, such as whether scoring is improved by the BIC, how the application of BIC has changed since the previous paper, whether sampling of B factors helps, and whether the code is faster. It would help the reader to understand what, if any, the significance of each of these improvements was.

      Indeed, it is difficult (embarrassingly) to benchmark against our past work due to the dependencies on different python packages and the lack of software engineering. With the infrastructure we’ve laid down with this paper, made possible by an EOSS grant from CZI, that will not be a problem going forward. Not only is the code more reliable and standardized, but we have developed several scientific test sets that can be used as a basis for broad comparisons to judge whether improvements are substantial. We’ve also changed “substantial improvement” to “several modifications” to indicate the lack of comparison to past versions.

      The exclusion of structures containing ligands and multichain protein models in the validation of qFit was puzzling since both are very common in the PDB. This may convey the impression that qFit cannot handle such use cases. (Although it seems that qFit has an algorithm dedicated to modeling ligand heterogeneity and seems to be able to handle multiple chains). The paper would be more effective if it explained how a user of the software would handle scenarios with ligands and multiple chains, and why these would be excluded from analysis here.

      qFit can indeed handle both. We left out multiple chains for simplicity in constructing a dataset enriched for small proteins while still covering diversity to speed the ability to rapidly iterate and test our approaches. Improvements to qFit ligand handling will be discussed in a forthcoming work as we face similar technical debt to what we saw in proteins and are undergoing a process of introducing “several modifications” that we hope will lead to “substantial improvement” - but at the very least will accelerate further development.

      It would be helpful to add some guidance on how/whether qFit models can be further refined afterwards in Coot, Phenix, ..., or whether these models are strictly intended as the terminal step in refinement.

      We added to the abstract:

      “Importantly, unlike ensemble models, the multiconformer models produced by qFit can be manually modified in most major model building software (e.g. Coot)  and fit can be further improved by refinement using standard pipelines (e.g. Phenix, Refmac, Buster).”

      and introduction:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot12 unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      and results:

      “This model can then be examined and edited in Coot12 or other visualization software, and further refined using software such as phenix.refine, refmac, or buster as the modeler sees fit.”

      and discussion

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”
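      To make the altloc convention mentioned in this passage concrete, the sketch below reads the alternate-location indicator from fixed-column PDB ATOM/HETATM records (column 17 in the PDB format) and lists the residues that carry more than one labeled conformer. It is an illustration and not part of qFit: the helper `atom_line`, the function `altlocs_by_residue`, and the coordinates are hypothetical.

```python
from collections import defaultdict


def atom_line(serial, name, altloc, resname, chain, resseq, x, y, z, occ, b):
    """Hypothetical helper: format one fixed-column PDB ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4}{altloc}{resname:>3} "
            f"{chain}{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def altlocs_by_residue(pdb_text):
    """Map (chain, residue number, residue name) -> sorted altloc labels."""
    found = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 27:
            continue
        altloc = line[16]                 # column 17: alternate location
        if altloc != " ":                 # blank means a single conformer
            key = (line[21], int(line[22:26]), line[17:20])
            found[key].add(altloc)
    return {key: sorted(labels) for key, labels in found.items()}


# Toy fragment: Ser10 has two side-chain conformers (A and B), Gly11 has none.
pdb_text = "\n".join([
    atom_line(1, " N  ", " ", "SER", "A", 10, 1.0, 2.0, 3.0, 1.00, 10.0),
    atom_line(2, " OG ", "A", "SER", "A", 10, 2.0, 3.0, 4.0, 0.60, 12.0),
    atom_line(3, " OG ", "B", "SER", "A", 10, 2.5, 3.5, 4.5, 0.40, 13.0),
    atom_line(4, " CA ", " ", "GLY", "A", 11, 5.0, 6.0, 7.0, 1.00, 11.0),
])
multiconformer = altlocs_by_residue(pdb_text)
```

      Any downstream program that reads and writes this column in the same way can round-trip such a model, which is the compatibility condition stated in the quoted text.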

      Appraisal & Discussion

      Overall, the authors convincingly demonstrate that qFit provides a reliable means to detect and model conformational heterogeneity within high-resolution X-ray diffraction datasets and (based on a smaller sample) in cryo-EM density maps. This represents the state of the art in the field and will be of interest to any structural biologist or biochemist seeking to attain an understanding of the structural basis of the function of their system of interest, including potential allosteric mechanisms-an area where there are still few good solutions. That is, I expect qFit to find widespread use.

      Reviewer #3 (Public Review):

      Summary:

      The authors address a very important issue of going beyond a single-copy model obtained by the two principal experimental methods of structural biology, macromolecular crystallography and cryo electron microscopy (cryo-EM). Such a multiconformer model is based on the fact that experimental data from both these methods represent a space- and time-average of a huge number of the molecules in a sample, or even in several samples, and that the respective distributions can be multimodal. Different from structure prediction methods, this approach is strongly based on high-resolution experimental information and requires validated single-copy high-quality models as input. Overall, the results support the authors' conclusions.

      In fact, the method addresses two problems which could be considered separately:

      - An automation of construction of multiple conformations when they can be identified visually;

      - A determination of multiple conformations when their visual identification is difficult or impossible.

      We often think about this problem similarly to the reviewer. However, in building qFit, we do not want to separate these problems - but rather use the first category (obvious visual identification) to build an approach that can accomplish part of the second category (difficult to visualize) without building “impossible”/nonexistent conformations - with a consistent approach/bias.

      The first one is a known problem, when missing alternative conformations may cost a few percent in R-factors. While these conformations are relatively easy to detect and build manually, the current procedure may save significant time being quite efficient, as the test results show.

      We agree with the reviewers' assessment here. The “floor” in terms of impact is automating a tedious part of high resolution model building and improving model quality.

      The second problem is important from the physical point of view and has been addressed first by Burling & Brunger (1994; https://doi.org/10.1002/ijch.199400022). The new procedure deals with a second-order variation in the R-factors, of about 1% or less, like placing riding hydrogen atoms, modeling density deformation or variation of the bulk solvent. In such situations, it is hard to justify model improvement. Keeping Rfree values or their marginal decreasing can be considered as a sign that the model is not overfitted data but hardly as a strong argument in favor of the model.

      We agree with the overall sentiment of this comment. What is a significant variation in R-free is an important question that we have looked at previously (http://dx.doi.org/10.1101/448795) and others have suggested an R-sleep for further cross validation (https://pubmed.ncbi.nlm.nih.gov/17704561/). For these reasons it is important to get at the significance of the changes to model types from large and diverse test sets, as we have here and in other works, and from careful examination of the biological significance of alternative conformations with experiments designed to test their importance in mechanism.
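      As a reminder of what is being compared in this exchange, the sketch below computes the conventional crystallographic R-factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, separately over the working reflections (R-work) and the held-out test reflections (R-free). It is a simplified illustration, not the calculation performed by any refinement program: the amplitudes are made up, and the overall scale factor and bulk-solvent terms that real programs apply to Fcalc are assumed to have been applied already.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections by the free-set flag and return (R-work, R-free)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*free))
    return r_work, r_free


# Made-up amplitudes; the last reflection is the only one in the free set.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 84.0, 63.0, 30.0]
is_free = [False, False, False, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, is_free)
gap = r_free - r_work  # a widening gap is the usual warning sign of overfitting
```

      With realistic data sets, changes of about 1% or less in these values are exactly the second-order variations whose significance is under discussion here.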

      In general, overall targets are less appropriate for this kind of problem and local characteristics may be better indicators. Improvement of the model geometry is a good choice. Indeed, yet Cruickshank (1956; https://doi.org/10.1107/S0365110X56002059) showed that averaged density images may lead to a shortening of covalent bonds when interpreting such maps by a single model. However, a total absence of geometric outliers is not necessarily required for the structures solved at a high resolution where diffraction data should have more freedom to place the atoms where the experiments "see" them.

      Again, we agree—geometric outliers should not be completely absent, but it is comforting when they and model/experiment agreement both improve.

      The key local characteristic for multiconformer models is a closeness of the model map to the experimental one. Actually, the procedure uses a kind of such measure, the Bayesian information criterion (BIC). Unfortunately, there is no information about how sharply it identifies the best model or how much it changes between the initial and final models; overall, the reader gets no feel for its values. The Q-score (page 17) can be a tool for the first problem where the multiple conformations are clearly separated and not for the second problem where the contributions from neighboring conformations are merged. In addition to BIC or to even more conventional target functions such as LS or local map correlation, the extreme and mean values of the local difference maps may help to validate the models.

      We agree with the reviewer that the problem of “best” model determination is poorly posed here. We have been thinking a lot about this in the context of Bayesian methods (see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278553/); however, a major stumbling block is in how variable representations of alternative conformations (and compositions) are handled. The answers are more (but by no means simply) straightforward for ensemble representations where the entire system is constantly represented but with multiple copies.
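      To give a feel for how a BIC-style score trades fit against model size, the sketch below uses the textbook form for Gaussian residuals, BIC = n ln(RSS/n) + k ln(n), where n is the number of observations, k the number of fitted parameters, and RSS the residual sum of squares. This is a generic illustration and an assumption about the form of the score, not a description of how qFit evaluates BIC; all numbers are invented.

```python
import math


def bic_gaussian(rss, n_obs, n_params):
    """Textbook BIC for a least-squares fit with Gaussian residuals.

    Lower is better. The first term rewards a closer fit (smaller RSS);
    the second penalizes every additional fitted parameter by ln(n_obs).
    """
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


n_voxels = 1000  # hypothetical number of density values being fitted

# Single conformer: 30 parameters, residual sum of squares 50.
bic_single = bic_gaussian(rss=50.0, n_obs=n_voxels, n_params=30)

# Two conformers, marginal gain in fit: the extra 30 parameters are not justified.
bic_two_marginal = bic_gaussian(rss=48.0, n_obs=n_voxels, n_params=60)

# Two conformers, clear gain in fit: the extra parameters pay for themselves.
bic_two_clear = bic_gaussian(rss=30.0, n_obs=n_voxels, n_params=60)

prefer_single_over_marginal = bic_single < bic_two_marginal
prefer_two_when_clear = bic_two_clear < bic_single
```

      Reporting the difference in score between competing models, rather than only the winner, would speak to the reviewer's question of how sharply the criterion separates them.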

      This method with its results is a strong argument for a need in experimental data and information they contain, differently from a pure structure prediction. At the same time, absence of strong density-based proofs may limit its impact.

      We agree - indeed we think it will be difficult to further improve structure prediction methods without much more interaction with the experimental data.

      Strengths:

      Addressing an important problem and automatization of model construction for alternative conformations using high-resolution experimental data.

      Weaknesses:

      An insufficient validation of the models when no discrete alternative conformations are visible and essentially missing local real-space validation indicators.

      While not perfect real space indicators, local real-space validation is implicit in the MIQP selection step and explicit when we do employ Q-score metrics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A point of clarification: I don't understand why waters seem to be handled differently for cryo-EM and crystallography datasets. I am interested in the statement on page 19 that the Molprobity Clashscore gets worse for cryo-EM datasets, primarily due to clashes with waters. But the qFit algorithm includes a round of refinement to optimize placement of ordered waters, and the clashscore improves for the qFit refinement in crystallography test cases. Why/how is this different for cryo-EM?

      We agree that this was not an appropriate point. We believe that the high clash score is coming from side chains being incorrectly modeled. We have updated this in the manuscript and it will be a focus of future improvements.

      Reviewer #2 (Recommendations For The Authors):

      - It would be instructive to the reader to explain how qFit handles the chromophore in the PYP (1OTA) example. To this end, it would be helpful to include deposition of the multiconformer model of PYP. This might also be a suitable occasion for discussion of potential hurdles in the deposition of multiconformer models in the PDB (if any!). Such concerns may be real concerns causing hesitation among potential users.

      Thank you for this comment. qFit does not alter the position or connectivity of any HETATM records (like the chromophore in this structure). Handling covalent modifications like this is an area of future development.

      Regarding deposition, we have noted above that the discussion now includes:

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore, generally also be deposited in the PDB using the standard deposition and validation process.”

      Finally, we have placed all PDBs in a Zenodo deposition (XXX) and have included that language in the manuscript. It is currently under a separate data availability section (page XXX). We defer to the editor as to the best header for this section.

      - It may be advisable to take the description of true/false pos/negatives out of the caption of Figure 4, and include it in a box or so, since these terms are important in the main text too, and the caption becomes very cluttered.

      We think adding the description of true/false pos/negatives to the Figure panel would make it very cluttered and wordy. We would like to retain this description within the caption. We have also briefly described each in the main text.

      - page 21, line 4: some issue with citation formatting.

      We have updated these citations.

      - page 25, second paragraph: cardinality is the number of members of a set. Perhaps "minimal occupancy" is more appropriate.

      Thank you for pointing this out. This was a mistake and should have been called the occupancy threshold.

      - page 26: it's - its

      Thank you, we have made this change. 

      - Font sizes in Supplementary Figures 5-7 are too small to be readable.

      We agree and will make this change. 

      Reviewer #3 (Recommendations For The Authors):

      General remarks

      (1) As I understand, the procedure starts from shifting residues one by one (page 4; A.1). Then, geometry reconstruction (e.g., B1) may be difficult in some cases joining back the shifted residues. It seems that such backbone perturbation can be done more efficiently by shifting groups of residues ("potential coupled motions") as mentioned at the bottom of page 9. Did I miss its description?

      We would describe the algorithm as sampling (which includes minimal shifts) in the backbone residues to ensure we can link neighboring residues. We agree that future iterations of qFit should include more effective backbone sampling by exploring motion along the Cβ-Cα, C-N, and (Cβ-Cα × C-N) bonds and exploring correlated backbone movements.

      (2) While the paper is well split in clear parts, some of them seem to be not at their right/optimal place and better can be moved to "Methods" (detailed "Overview of the qFit protein algorithm" as a whole) or to "Data" missed now (Two first paragraphs of "qFit improves overall fit...", page 8, and "Generating the qFit test set", page 22, and "Generating synthetic data ..." at page 26; description of the test data set), At my personal taste, description of tests with simulated data (page 15) would be better before that of tests with real data.

      Thank you for this comment, but we stand by our original decision to keep the general flow of the paper as it was submitted.

      (3) I wonder if the term "quadratic programming" (e.g., A3, page 5) is appropriate. It supposes optimization of a quadratic function of the independent parameters and not of "some" parameters. This is like the crystallographic LS which is not a quadratic function of atomic coordinates, and I think this is a similar case here. Whatever the answer on this remark is, an example of the function and its parameters is certainly missed.

      We think that the term quadratic programming is appropriate. We fit a function with a loss function (observed density - calculated density), while satisfying the independent parameters. We fit the coefficients minimizing a quadratic loss. We agree that the quadratic function is missing from the paper, and we have now included it in the Methods section.
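
      For readers who want a concrete picture, a quadratic program of the kind discussed here can be sketched as follows. This is an illustrative form only: the symbols (observed density, per-conformer calculated density, and weights w_i) and the exact constraints are assumptions made for this sketch, and the function actually used is the one given in the revised Methods.

```latex
\min_{w}\; \sum_{x} \Bigl( \rho^{\mathrm{obs}}(x) - \sum_{i=1}^{n} w_i\, \rho^{\mathrm{calc}}_i(x) \Bigr)^{2}
\quad \text{subject to} \quad w_i \ge 0,\quad \sum_{i=1}^{n} w_i \le 1
```

      In a formulation like this, the calculated density of each candidate conformer is held fixed, so the objective is quadratic in the weights w_i alone, even though density is not a quadratic function of atomic coordinates.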

      Technical remarks to be answered by the authors :

      (1) Page 1, Abstract, line 3. The ensemble modeling is not the only existing frontier, and saying "one of the frontiers" may be better. Also, this phrase gives a confusing impression that the authors aim to predict the ensemble models while they do it with experimental data.

      We agree with this statement and have re-worded the abstract to reflect this.

      (2) Page 2. Burling & Brunger (1994) should be cited as predecessors. On the contrary, an excellent paper by Pearce & Gros (2021) is not relevant here.

      We agree that we should mention the Burling & Brunger paper. However, we think the Pearce & Gros (2021) paper should not be removed, as it is not discussing the method of ensemble refinement.

      (3) Page 2, bottom. "Further, when compared to ..." The preference to such approach sounds too much affirmative.

      We have amended this sentence to state:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot(Emsley et al. 2010) unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      “The point we were trying to make in this sentence was that ensemble-based models are much harder to manually manipulate in Coot or other similar software compared to multiconformer models. We think that the new version of this sentence states this point more clearly.”

      (4) Page 2, last paragraph. I do not see an obvious relation of references 15-17 to the phrase they are associated with.

      We disagree with this statement, and think that these references are appropriate.

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot12 unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      (5) Page 3, paragraph 2. Cryo-EM maps should be also "high-resolution"; it does not read like this from the phrase.

      We agree that high-resolution should be added, and the sentence now states:

      “However, many factors make manually creating multiconformer models difficult and time-consuming. Interpreting weak density is complicated by noise arising from many sources, including crystal imperfections, radiation damage, and poor modeling in X-ray crystallography, and errors in particle alignment and classification, poor modeling of beam induced motion, and imperfect detector Detector Quantum Efficiency (DQE) in high-resolution cryo-EM.”

      (6) Page 3, last paragraph before "results". The words "... in both individual cases and large structural bioinformatic projects" do not have much meaning, except introducing a self-reference. Also, repeating "better than 2 A" looks not necessary.

      We agree that this was unnecessary and have simplified the last sentence to state:

      “With the improvements in model quality outlined here, qFit can now be increasingly used for finalizing high-resolution models to derive ensemble-function insights.”

      (7) Page 3. "Results". Could "experimental" be replaced by a synonym, like "trial", to avoid confusing with the meaning "using experimental data"?

      We have replaced experimental with exploratory to describe the use of qFit on CryoEM data. The statement now reads:

      “For cryo-EM modeling applications, equivalent metrics of map and model quality are still developing, rendering the use of qFit for cryo-EM more exploratory.”

      (8) Page 4, A.1. Should it be "steps +/- 0.1" and "coordinate" be "coordinate axis"? One can modify coordinates and not shift them. I do not understand how, with the given steps, the authors calculated the number of combinations ("from 9 to 81"). Could a long "Alternatively, ...absent" be reduced simply to "Otherwise"?

      We have simplified and clarified the sentence on the sampling of backbone coordinates to state:

      “If anisotropic B-factors are absent, the translation of coordinates occurs in the X, Y, and Z directions. Each translation takes place in steps of 0.1 along each coordinate axis, extending to 0.3 Å, resulting in 9 (if isotropic) or to 81 (if anisotropic) distinct backbone conformations for further analysis.”

      (9) Page 6, B.1, line 2. Word "linearly" is meaningless here.

      We have modified this to read:

      “Moving from N- to C- terminus along the protein,”

      (10) Page 9, line 2. It should be explained which data set is considered as the test set to calculate Rfree.

      We think this is clear and would be repetitive if we duplicated it.

      (11) Page 9, line 7. It should be "a valuable metric" and not "an"

      We agree and have updated the sentence to read:

      “Rfree is a valuable metric for monitoring overfitting, which is an important concern when increasing model parameters as is done in multiconformer modeling.”

      (12) Page 10, paragraph 3. "... as a string (Methods)". I did not find any other mention of this term "string", including in "Methods" where it supposed to be explained. Either this should be explained (and an example is given?), or be avoided.

      We agree that string is not necessary (discussing the programmatic datatype). We have removed this from the sentence. It now reads:

      “To quantify how often qFit models new rotameric states, we analyzed the qFit models with phenix.rotalyze, which outputs the rotamer state for each conformer (Methods).”

      (13) Page10, lines 3-4 from bottom. Are these two alternative conformations justified?

      We are unsure what this is referring to.

      (14) Page 12, Fig. 2A. In comparison with Supplement Fig 2C, the direction of axes is changed. Could they be similar in both Figures?

      We have updated Supplementary Figure 2C to have the same direction of axes as Figure 2A.

      (15) Page 15, section's title. Choose a single verb in "demonstrate indicate".

      We have amended the title of this section to be:

      “Simulated data demonstrate qFit is appropriate for high-resolution data.”

      (16) Page 15, paragraph 2. "Structure factors from 0.8 to 3.0 A resolution" does not mean what the author wanted apparently to tell: "(complete?) data sets with the high-resolution limit which varied from 0.8 to 3.0 A ...". Also, a phrase of "random noise increasing" is not illustrated by Figs.5 as it is referred to.

      We have edited this sentence to now read:

      “To create the dataset for resolution dependence, we used the ground truth 7KR0 model, including all alternative conformations, and generated artificial structure factors with a high resolution limit ranging from  0.8 to 3.0 Å resolution (in increments of 0.1 Å).”
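
      The resolution grid described in this sentence can be sketched in a few lines of Python. This is a minimal illustration of the sampling only: the function name `resolution_series` is hypothetical, and the calculation of the structure factors themselves is not shown.

```python
def resolution_series(start=0.8, stop=3.0, step=0.1):
    """Return high-resolution limits (in Angstrom) from start to stop, inclusive."""
    # Count steps with round() so floating-point error does not drop the endpoint.
    n_steps = round((stop - start) / step)
    # Round each value to one decimal place, which assumes a step of 0.1.
    return [round(start + i * step, 1) for i in range(n_steps + 1)]


series = resolution_series()
print(len(series), series[0], series[-1])
```

      With the default arguments this yields 23 resolution limits, one synthetic dataset per limit.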

      (17) Page 15, last paragraph is written in a rather formal and confusing way while a clearer description is given in the figure legend and repeated once more in Methods. I would suggest to remove this paragraph.

      We agree that this is confusing. Instead of creating a true positive/false positive/true negative/false negative matrix, we have just called things as they are: multiconformer or single conformer, and match or no match. We have edited the language in the manuscript and figure legends to reflect these changes.

      (18) Page 16. Last two paragraphs start talking about a new story and it would help to separate them somehow from the previous ones (sub-title?).

      We agree that this could use a subtitle. We have included the following subtitle above this section:

      “Simulated multiconformer data illustrate the convergence of qFit.”

      (19) Page 20. "or static" and "we determined that" seem to be not necessary.

      We have removed static and only used single conformer models. However, as one of the main conclusions of this paper is determining that qFit can pick up on alternative conformers that were modeled manually, we have decided to keep the “we determined that”.

      (20) Page 21, first paragraph. "Data" are plural; it should be "show" and "require"

      We have made these edits. The sentence now reads:

      “However, our data here shows that not only does qFit need a high-resolution map to be able to detect signal from noise, it also requires a very well-modeled structure as input.”

      (21) Page 21, References should be indicated as [41-45], [35,46-48], [55-57]. A similar remark to [58-63] at page 22.

      We have fixed the reference layout to reflect this change.

      (22) Page 21, last paragraph. "Further reduce R-factors" (moreover repeated twice) is not correct neither by "further", since here it is rather marginal, nor as a goal; the variations of R-factors are not much significant. A more general statement like "improving fit to experimental data" (keeping in mind density maps) may be safer.

      We agree with the duplicative nature of these statements. We have amended the sentence to now read:

      “Automated detection and refinement of partial-occupancy waters should help improve fit to experimental data further reduce Rfree15 and provide additional insights into hydrogen-bond patterns and the influence of solvent on alternative conformations.”

      (23) Page 22. Sub-sections of "Methods" are given in a little bit random order; "Parallelization of large maps" in the middle of the text is an example. Put them in a better order may help.

      We have moved some sections of the Methods around and made better headings by using an underscore to highlight the subsections (Generating and running the qFit test set, qFit improved features, Analysis metrics, Generating synthetic data for resolution dependence).

      (24) Page 24. Non-convex solution is a strange term. There exist non-convex problems and functions and not solutions.

      We agree and we have changed the language to reflect that we present the algorithm with non-convex problems which it cannot solve.

      (25) Page 26, "Metrics". It is worthy to describe explicitly the metrics and not (only) the references to the scripts.

      For all metrics, we provide a sentence or two on what each metric describes. As these metrics are well known in the structural biology field, we do not feel that we need to elaborate on them more.

      (26) Page 26. Multiplying B by occupancy does not have much sense. A better option would be to refer to the density value in the atomic center as occ*(4*pi/B)^1.5 which gives a relation between these two entities.

      We agree and have updated the B-factor figures and metrics to reflect this.
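
      The relation suggested by the reviewer can be sketched in a few lines of Python. This is a minimal illustration, assuming an isotropic Gaussian atom and ignoring the element-dependent scattering factor; the function name `peak_density` is hypothetical and is not part of qFit.

```python
import math


def peak_density(occupancy, b_factor):
    """Relative density at the atomic center: occ * (4*pi/B)^1.5."""
    if b_factor <= 0:
        raise ValueError("b_factor must be positive")
    return occupancy * (4.0 * math.pi / b_factor) ** 1.5


# Two atoms with the same occupancy * B product (10 in both cases).
sharp = peak_density(1.0, 10.0)
diffuse = peak_density(0.5, 20.0)
print(sharp > diffuse)
```

      Two atoms with the same occupancy × B product, such as (1.0, 10 Å²) and (0.5, 20 Å²), give different peak densities under this measure, which is why the product alone is less informative.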

      (27) Page 40, suppl. Fig. 5. Due to the color choice, it is difficult to distinguish the green and blue curves in the diagram.

      We have amended this by switching the colors of the curves.

      (28) Page 42, Suppl. Fig. 7. (A) How the width of shaded regions is defined? (B) What the blue regions stand for? Input Rfree range goes up to 0.26 and not to 0.25; there is a point at the right bound. (C) Bounds for the "orange" occupancy are inversed in the legend.

      (A) The width of the shaded region denotes the standard deviation of the values at each resolution. We have made this clearer in the caption.

      (B) The blue region denotes the confidence interval for the regression estimate. The size of the confidence interval was set to 95%. We have made this clearer in the caption.

      (C) This has now been fixed.

      The maximum R-free value is 0.2543, which we rounded down to 0.25.

      (29) Page 43. Letters E-H in the legend are erroneously substituted by B-E.

      We apologize for this mistake. It is now corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Comments to the Author):

      Summary:

      In this study, Xie and colleagues aimed to explore the function and potential mechanisms of the gut microbiota in a hamster model of severe leptospirosis. The results demonstrated that Leptospira infection was able to cause intestine damage and inflammation. Leptospira infection promoted an expansion of Proteobacteria, increased gut barrier permeability, and elevated LPS levels in the serum. Thus, they proposed an LPS-neutralization therapy which improved the survival rate of moribund hamsters combined with antibody therapy or antibiotic therapy.

      Strengths:

      The work is well-designed and the story is interesting to me. The gut microbiota is essential for immunity and systemic health. Many life-threatening pathogens, such as SARS-CoV-2 and other gut-damaged infection, have the potential to disrupt the gut microbiota in the later stages of infection, causing some harmful gut microbiota-derived substances to enter the bloodstream. It is emphasized that in addition to exogenous pathogenic pathogens, harmful substances of intestinal origin should also be considered in critically ill patients.

      Weaknesses:

      Q1: There are many serotypes of Leptospira, it is suggested to test another pathogenic serotype of Leptospira to validate the proposed therapy.

      That’s a constructive suggestion. We have tested another pathogenic serotype of Leptospira (L. interrogans serovar Autumnalis strain 56606) to verify the LPS-neutralization therapy combined with antibiotic therapy (Supplementary Fig. S9B). The results showed that the combination of the LPS-neutralization therapy with antibody therapy or antibiotic therapy also significantly improved the survival rate of hamsters infected by 56606.

      Q2: Authors should explain why the infective doses of leptospires were not consistent across the different experiments.

      Thank you for your comment. To examine the role of the gut microbiota in acute leptospirosis, an infective dose of 10⁶ leptospires was chosen, while in other sections of the study an infective dose of 10⁷ leptospires was used. In fact, we also used 10⁷ leptospires to infect hamsters; however, this dose might be excessive, as there was no significant difference in survival rate between the control group and the Abx-treated group. A previous study also highlighted that the infective dose of leptospires is important when investigating the effect of sex on leptospirosis, as male hamsters infected with L. interrogans are more susceptible to severe leptospirosis than females after exposure to lower infectious doses (10³ leptospires but not 10⁴ leptospires) (1).

      Reference

      (1) GOMES C K, GUEDES M, POTULA H H, et al. Sex Matters: Male Hamsters Are More Susceptible to Lethal Infection with Lower Doses of Pathogenic Leptospira than Female Hamsters (J). Infect Immun, 2018, 86(10).

      Q3: In the discussion section, it is better to supplement the discussion of the potential link between the natural route of infection and leptospirosis.

      Thank you for your suggestion. We have supplemented it in the discussion (lines 523-527 in the track change PDF version).

      Q4: Line 231, what is the solvent of thioglycolate?

      We have supplemented it in the manuscript (line 242-243 in the track change PDF version).

      Q5: Lines 962-964, there are some mistakes which are not matched to Figure 7.

      Thank you for pointing that out, we have corrected it in the manuscript.

      Reviewer #2 (Comments to the Author):

      Summary:

      Severe leptospirosis in humans and some mammals often ends in death. In this article, the authors explored the role of the gut microbiota in severe leptospirosis. They found that Leptospira infection promoted a dysbiotic gut microbiota with an expansion of Proteobacteria, and that LPS neutralization therapy synergized with antileptospiral therapy significantly improved the survival rates in severe leptospirosis. This study is well-organized and has potentially important clinical implications not only for severe leptospirosis but also for other gut-damaged infections.

      Weaknesses:

      Q1: In the Introduction section and Discussion section, the authors should describe and discuss more about the differences in the effect of Leptospira infection between mice and hamsters, so that the readers can follow this study better.

      Thank you for your suggestion, we have supplemented it in the manuscript (line 62-66 in the track change PDF version).

      Q2: Lines 92-95, the authors should explain why they chose two different routines of infection.

      Thank you for your comment, we have explained it in the manuscript (line 100 in the track change PDF version).

      Q3: Line 179-180, the concentration of PMB and Dox is missed, and 0.016 μg/L is just ok.

      We have corrected it in the manuscript.

      Q4: "μL" or "μl" and "mL" or "ml" should be uniform in the manuscript.

      Thank you for your suggestion, we have revised it in the manuscript.

      Q5: In the culture of primary macrophages, how many cells are inoculated in the plates should be described clearly.

      We have supplemented it in the manuscript (line 250 in the track change PDF version).

      Q6: Line 271, it is better to list primers used for leptospiral detection in the text. Because it allows readers to find the information they need more directly.

      Thank you for your suggestions, we have supplemented it in the manuscript (line 281-284 in the track change PDF version).

      Q7: Line 366-369, Lactobacillus seems to be a kind of key bacteria during Leptospira infection. A previous study (doi: 10.1371/journal.pntd.0005870) also demonstrated that pre-treatment with Lactobacillus plantarum prevented severe pathogenesis in mice. The authors should discuss the potential probiotic for leptospirosis prevention.

      We have discussed it in the manuscript (line 564-566 in the track change PDF version).

      Q8: Lines 450-451, not all concentrations of fecal filtration from two groups upregulated all gene expression mentioned in the text, the authors should correct it.

      Thank you for pointing that out, we have corrected it in the manuscript (line 461-462 in the track change PDF version).

      Reviewer #3 (Comments to the Author):

      Summary:

      This is a well-prepared manuscript that presented interesting research results. The only defect is that the authors should further revise the English language.

      Strengths:

      The omics method produced unbiased results.

      Weaknesses:

      Q1: LPS neutralization is not a new method for treating leptospiral infection.

      Thank you for your comment. Yes, LPS neutralization is not a new method for treating leptospiral infection, most of which might focus on leptospiral LPS. In addition, Leptospira seemed to be naturally resistant to polymyxin B (1). Recently, neutralizing gut-derived LPS was applied in other diseases which significantly relieved diseases (2-3). In this study, we found that Leptospira infection promoted an expansion of Proteobacteria, increased gut barrier permeability, and elevated LPS levels in the serum. Thus, we proposed an LPS-neutralization therapy which improved the survival rate of moribund hamsters combined with antibody therapy or antibiotic therapy.

      Reference

      (1) LIEGEON G, DELORY T, PICARDEAU M. Antibiotic susceptibilities of livestock isolates of leptospira (J). Int J Antimicrob Agents, 2018, 51(5):693-699.

      (2) MUNOZ L, BORRERO M J, UBEDA M, et al. Intestinal Immune Dysregulation Driven by Dysbiosis Promotes Barrier Disruption and Bacterial Translocation in Rats With Cirrhosis (J). Hepatology, 2019, 70(3):925-938.

      (3) ZHANG X, LIU H, HASHIMOTO K, et al. The gut-liver axis in sepsis: interaction mechanisms and therapeutic potential (J). Crit Care, 2022, 26(1):213.

      Q2: The authors should further revise the English language used in the text.

      Thank you for your suggestion, our manuscript has been polished by American Journal Experts (certificate number: 81C8-C5C1-9D5D-109D-3F23).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In their valuable study, Chen et al. aim to define the neuronal role of HMMR, a microtubule-associated protein typically associated with cell division. Their findings suggest that HMMR is necessary for proper neuronal morphology and the generation of polymerizing microtubules within neurites, potentially by promoting the function of TPX2. While the study is recognized as a first step in deciphering the influence of HMMR on microtubule organization in neurons, reviewers note the current work has important gaps and would benefit from further exploration of the mechanism of microtubule stability by HMMR, the link between HMMR-mediated microtubule generation and morphogenesis, and the physiological implications of disrupting HMMR during neuronal morphogenesis.

      Public Reviews:

      Reviewer #1 (Public Review):

      The microtubule cytoskeleton is essential for basic cell functions, enabling intracellular transport, and establishment of cell polarity and motility. Microtubule-associated proteins (MAPs) contribute to the regulation of microtubule dynamics and stability - mechanisms that are specifically important for the development and physiological function of neurons. Here, the authors aimed to elucidate the neuronal function of the MAP Hmmr, which they had previously identified in a quantitative study of the proteome associated with neuronal microtubules.

      The authors conduct well-controlled experiments to demonstrate the localization of endogenous as well as exogenous Hmmr on microtubules within the soma as well as all neurites of hippocampal neurons. Functional analysis using gain- and loss-of-function approaches demonstrates that Hmmr levels are crucial for neuronal morphogenesis, as the length of both dendrites and axons decreases upon loss of Hmmr and increases upon Hmmr overexpression. In addition to length alterations, the branching pattern of neurites changes with Hmmr levels. To uncover the mechanism of how Hmmr influences neuronal morphology, the authors follow the lead that Hmmr overexpression induces looped microtubules in the soma, indicative of an increase in microtubule stability. Microtubule acetylation indeed decreases and increases with Hmmr LOF and GOF, respectively. Together with a rescue of nocodazole-induced microtubule destabilization by Hmmr GOF, these results argue that Hmmr regulates microtubule stability. Highlighted by the altered movement of a plus-end-associated protein, Hmmr also has an effect on the dynamic nature of microtubules. The authors present evidence suggesting that the nucleation frequency of neuronal microtubules depends on Hmmr's ability to recruit the microtubule nucleator Tpx2. Together, these data add novel insight into MAP-mediated regulation of microtubules as a prerequisite for neuronal morphogenesis. While the data shown support the author's conclusions, the study also has several weaknesses:

      • The study appears incomplete as the initial proteomics analysis which is referenced as an entry into the study is not presented. This surely is the authors' choice, however, without presenting this data set, it would make more sense if the authors first showed the localization of Hmmr on neuronal microtubules and then started with the functional analysis.

      The reviewer suggests moving the Hmmr localization data in front of the loss- and gain-of-function data because we did not present the proteomics data. However, we still believe placing the loss- and gain-of-function data in the beginning is the better arrangement. This is because it allows the audience to see the drastic changes on neuronal morphology when HMMR is depleted or overly abundant. It also provides a better linkage between HMMR’s localization on microtubules and its effect on the stability and dynamics of microtubules.

      • Neurite branching is quantified, but the methods used are not consistent (normalized branch density vs. Sholl analysis) and there is no distinction between alterations of branching in dendrites vs. axons. This information should be added as it could prove informative with respect to the physiological function of Hmmr in neurite branching.

      Sholl analysis is considered the gold standard in neurite branching analyses. However, in the knockdown experiment (Figure 1A~1E), HMMR-depleted neurons exhibited extremely short axons (<100 μm) and dendrites (<40 μm). Using Sholl analysis to assess the branching of these Hmmr-depleted neurons became unsuitable. That is why we used normalized branch density (Figure 1E) in the knockdown experiment and Sholl analysis (Figure 1J) in the overexpression experiment.

      Regarding the branching difference between axons and dendrites, only axons exhibit branches at 4 DIV. Therefore, the branching analysis focuses on axons rather than on dendrites. We have revised the manuscript to clarify this.

      • The authors show that altered Hmmr levels affect neurite branching and identify an effect on microtubule stability and dynamics as a molecular mechanism. However, how branching correlates with or is regulated by Hmmr-mediated microtubule dynamics is neither addressed experimentally nor discussed by the authors. The physiological significance of altered neuronal morphogenesis also lacks discussion.

      • To discuss how branching correlates with or is regulated by HMMR-mediated microtubule dynamics, we have added the following paragraph into the Discussion section:

      “It has been shown that compromising microtubule nucleation in neurons by SSNA1 mutant overexpression prevents proper axon branching (Basnet et al., 2018). Additionally, dendritic branching in Drosophila sensory neurons depends on the orientation of microtubule nucleation. Nucleation that results in an anterograde microtubule growth leads to increased branching, while nucleation that results in a retrograde microtubule growth leads to decreased branching (Yalgin et al., 2015). These results demonstrate the importance of microtubule nucleation on neurite branching. It is conceivable that overexpressing a microtubule nucleation promoting protein such as HMMR results in an increase of branching complexity.”

      • To discuss the physiological significance of altered neuronal morphogenesis, we have added the following paragraph to the Discussion section:

      “Neurons are the communication units of the nervous system. The formation of their intricate shape is therefore crucial for their physiological function. Alterations in neuronal morphogenesis have a profound impact on how nerve cells communicate, leading to a variety of physiological consequences. These consequences include impaired neural circuit formation and function, compromised signal transmission between neurons, as well as altered anatomical structure of the CNS. Depending on the specific type and location of the morphogenetically altered neurons, the physiological consequences can include neurological disorders such as autism spectrum disorder (Berkel et al., 2012) and schizophrenia (Goo et al., 2023), as well as learning and memory deficits (Winkle et al., 2016). However, due to the involvement of HMMR in mitosis, most HMMR mutations are associated with familial cancers (based on ClinVar data).”

      • Multiple times, the manuscript lacks a rationale for an experimental approach, choice of cell type, time points, regions of interest, etc. Also, a meaningful description of the methods and for how data were analyzed is missing, making the paper hard to read for someone not directly from the field.

      We understand the reviewer’s comments regarding the lack of rationale for choosing the experimental approach, choice of cell type, time points, regions of interest, etc. As a result, we have added the rationales where appropriate to help readers from other fields to better understand the choice of cell type, time points, regions of interest, etc. A brief explanation is shown below:

      • Approach and timing: We employed both electroporation (immediate but milder expression) and lipofectamine transfection (delayed but stronger expression). We prioritized knocking down HMMR early in development, so electroporation was used. For overexpression experiments, we chose lipofectamine transfection, which achieves a high level of protein expression.

      • Cell selection: Hippocampal neurons were chosen in experiments that involve morphological quantification due to their homogeneous morphology. On the other hand, cortical neurons were selected in experiments that require large amounts of neurons and/or experiments where we want to demonstrate the universality of a proposed hypothesis.

      • Regions of interest (ROIs): In our previous publication (Chen et al., 2017), it was discovered that a significant reduction of EB3 emanation frequency can be detected at the tip and the base of the neurite but not in the middle of the neurite in TPX2-depleted neurons. This difference is due to the presence of GTP-bound Ran GTPase (RanGTP) at the tip and the base of the neurite. Since RanGTP has also been shown to regulate the interaction between HMMR and TPX2 in the cell-free system (Scrofani et al., 2015), it is possible that the same phenomenon can be observed in HMMR-depleted neurons. This is why we examined those 3 ROIs in Figure 4.

      Reviewer #2 (Public Review):

      The mechanism of microtubule formation, stabilization, and organization in neurites is important for neuronal function. In this manuscript, the authors examine the phenotype of neurons following alteration in the level of the protein HMMR, a microtubule-associated protein with established roles in mitosis. Neurite morphology is measured as well as microtubule stability and dynamic parameters using standard assays. A binding partner of HMMR, TPX2, is localized. The results support a role for HMMR in neurons.

      The work presented in this manuscript seeks to determine if a MAP called HMMR contributes to microtubule dynamics in neurons. Several steps, including validation of the RNAi, additional statistical analysis, use of cells at the same age in culture, and better documentation in figures, would increase the impact of the work.

      In many places, the data can be improved which might make the story more convincing. As presented, the results show that HMMR is distributed as puncta on neurons with data coming from a single HMMR antibody, and some background staining that was not discussed. In the discussion the authors state that HMMR impacts microtubule stability, which was evaluated by the presence of post-translational modification and resistance to nocodazole; the data are suggestive but not entirely convincing. The discussion also states that HMMR increases the “amount” of growing microtubules which was measured as the frequency of comet appearance. The authors did not comment on how the number of growing microtubules results in the observed morphological changes.

      We actually tested several HMMR antibodies, including E-19 (Santa Cruz, sc-16170), EPR4054 (Abcam, ab124729), and a variety of antibodies provided by Prof. Eva Turley. E-19 performed the best in immunofluorescence (IF) staining and knockdown validation. The other antibodies either failed to detect HMMR in IF staining or generated excessive background signal. We understand that the final images were produced using a single antibody. However, since we meticulously validated this antibody and the localization of overexpressed HMMR is consistent with that of endogenous HMMR, we are very confident about the data generated using this single antibody.

      We have added the following paragraph to the Discussion section to explain how the number of growing microtubules results in the observed morphological changes, such as an increase in axon branches:

      “It has been shown that compromising microtubule nucleation in neurons by SSNA1 mutant overexpression prevents proper axon branching (Basnet et al., 2018). Additionally, dendritic branching in Drosophila sensory neurons depends on the orientation of microtubule nucleation. Nucleation that results in an anterograde microtubule growth leads to increased branching, while nucleation that results in a retrograde microtubule growth leads to decreased branching (Yalgin et al., 2015). These results demonstrate the importance of microtubule nucleation on neurite branching. It is conceivable that overexpressing a microtubule nucleation promoting protein such as HMMR results in an increase of branching complexity.”

      Reviewer #1 (Recommendations for The Authors):

      (1) The manuscript jumps extensively between main figures and supplementary figures. Please check whether parts of the supplement could be moved to the main figures.

      We understand the frustration of moving back and forth between the main figures and supplementary figures. After examining the manuscript, we decided to combine Figure 2A with Figure S3.

      (2) In Figure 1, total neurite length between days 3 and 4 DIV does not appear to change - can this be true? Please check or else explain.

      We carefully re-examined our raw data and found that the total neurite length of 4 DIV hippocampal neurons expressing non-targeting shRNA (Figure 1B) and that of 3 DIV hippocampal neurons expressing AcGFP (Figure 1G) are indeed very similar. The explanation is that the 3 DIV hippocampal neurons used for Figure 1G were cultured at low density and in the presence of cortical neuron-conditioned neurobasal medium (as written in Methods, Neuron culture and transfection section). The low-density culture with minimal overlapping neurites allowed us to better quantify total neurite length, because neurons expressing AcGFP-mHMMR sprouted long and highly branched axons. However, the addition of cortical neuron-conditioned neurobasal medium promoted neurite elongation. This is the reason why the total neurite length of 4 DIV hippocampal neurons expressing non-targeting shRNA (Figure 1B) and that of 3 DIV hippocampal neurons expressing AcGFP (Figure 1G) are similar.

      (3) Groen et al. have shown that Hmmr also bundles microtubules, a mechanism that surely is important for neuronal microtubules. Please discuss.

      We thank the reviewer for pointing out that HMMR also bundles microtubules and have added this to our revised Discussion section:

      “It has been shown that the Xenopus HMMR homolog XRHAMM bundles microtubules in vitro (Groen et al., 2004). In addition, deleting proteins which promote microtubule bundling (e.g., doublecortin knockout, MAP1B/MAP2 double knockout) leads to impaired neurite outgrowth (Bielas et al., 2007; Teng et al., 2001). These observations are consistent with our data that overexpressing HMMR leads to the increased axon and dendrite outgrowth, while depleting it results in the opposite phenotype (Figure 1).”

      (4) Please explain why in Figure 4, cortical neurons were chosen for analysis and why and how the three different ROIs were picked.

      To explain why we chose cortical neurons for the analyses in Figure 4, it is important to first explain why we used hippocampal neurons for the other figures. Primary hippocampal neurons are highly homogeneous in their morphology. This uniform morphology allows more consistent morphological quantification. Figure 4, however, does not involve morphological quantification. We can conclude with more confidence that HMMR regulates microtubule dynamics if this effect can be detected in the relatively heterogeneous cortical neurons. These are the reasons why we chose to analyze cortical neurons in Figure 4.

      In our previous publication (Chen et al., 2017), it was discovered that a significant reduction of EB3 emanation frequency can be detected at the tip and the base of the neurite but not in the middle of the neurite in TPX2-depleted neurons. This difference is due to the presence of GTP-bound Ran GTPase (RanGTP) at the tip of the neurite and in the soma. Since RanGTP has also been shown to regulate the interaction between HMMR and TPX2 in the cell-free system (Scrofani et al., 2015), it is possible that the same phenomenon can be observed in HMMR-depleted neurons. This is why we examined those 3 ROIs in Figure 4.

      (5) Microtubule looping has been shown to occur in regions prior to branch formation (e.g. Dent et al. 2004). As the authors identify increased looping upon Hmmr GOF, this should be discussed.

      We thank the reviewer for pointing out that microtubule looping occurs in regions of branch formation and have added this to our revised discussion:

      “It is worth noting that the elevated level of HMMR increases the branching density of axons (Figure 1J) and promotes the formation of looped microtubules (Figure 3A). This is consistent with the observations that looped microtubules are often detected in regions of axon branch formation (Dent et al., 1999; Dent and Kalil, 2001; Purro et al., 2008).”

      Reviewer #2 (Recommendations for The Authors):

      (1) The work seeks to gain insight into microtubule behavior in neurons, an important issue.

      (2) Several steps, including validation of the RNAi, additional statistical analysis, use of cells at the same age in culture, and better documentation in figures, would increase the impact of the work.

      (3) Figure 1 documents the results of experiments in which the HMMR protein was depleted using shRNA. A western blot of cell extracts from control and depleted cells is needed to verify that the protein level is reduced; alternatively, documentation of the reduction in RNA levels in treated cells could be provided. Neurite, axon, and dendrite length and branch density are measured. The neurite length is in microns, and the axon length is normalized to 100% of the non-treated cells. Please use the same for measures for easier comparison. Looking at the images in Figure 1, the length of the dendrites does not look different in the examples shown, whereas the axon appears shorter. This impression is not supported by the quantification. Are representative images shown? Additionally, the authors should report the values for each replicate of the experiment and compare the three averages rather than comparison of lengths from all measurements. A related issue is that the dendrites do not look longer in panel F, following overexpression of HMMR. For examples of using averages of replicates see: https://pubmed.ncbi.nlm.nih.gov/32346721/

      The reviewer mentioned that Western blot of cell extracts or RNA quantification from control and depleted cells are needed to verify that the protein level is reduced.

      Unfortunately, these assays are extremely difficult to perform in primary neurons due to the low transfection efficiency. We believe that the consistent knockdown phenotype from 3 different shRNA sequences (Figure 1A-D) and the immunofluorescence staining in depleted primary neurons (Figure S2) are sufficient to confirm that HMMR level is reduced.

      We revised Figure 1C, 1D, 1H, and 1I so that axon and dendrite lengths are all in microns.

      We selected another image for the non-targeting control in Figure 1A to better demonstrate the reduction of dendrite length when HMMR is knocked down.

      We thank the reviewer for the suggestion of comparing the three average values rather than comparing all measurements. We have performed statistical analyses for all our data using the average values and revised the graphs accordingly. While the P-values changed, our conclusions remain the same.
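
      The difference between pooling every measurement and comparing replicate averages can be sketched as follows. This is an illustration only: the numbers are invented rather than taken from the study, and Welch's t statistic is an assumed choice, since the response does not state which test was applied.

```python
from statistics import mean, stdev
import math

def replicate_means(measurements_by_replicate):
    """Collapse each experimental repetition to a single average value."""
    return [mean(cells) for cells in measurements_by_replicate]

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented neurite lengths (um): three repetitions, three cells each.
control = [[100, 110, 90], [120, 115, 125], [105, 95, 100]]
knockdown = [[60, 70, 80], [90, 80, 85], [70, 75, 65]]

control_means = replicate_means(control)      # n = 3, one value per repetition
knockdown_means = replicate_means(knockdown)  # n = 3, not n = 9 cells
t, df = welch_t(control_means, knockdown_means)
print(control_means, knockdown_means, t, df)
```

      Treating each repetition as one data point keeps the sample size equal to the number of independent experiments, which is why P-values typically become larger than when all cells are pooled.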

      We thank the reviewer for pointing out this discrepancy and have selected another image of the AcGFP control for Figure 1F to better demonstrate the increase of dendrite length when HMMR is overexpressed.

      (4) Given the changes in neurite morphology, the authors examine the localization of endogenous and overexpressed. The supplemental figures (see S2 and S3) show evidence that HMMR is present in a punctate pattern by conventional immunofluorescence. This is reasonable evidence that the protein is in a linear pattern along cytoskeletal microtubules and that the signal is present in puncta. Please move this to the main text, perhaps replacing Figure 2A, which is low magnification and very hard to see the HMMR staining. Additionally, the level of overexpression of HMMR is not mentioned. Please address this; were cells with similar levels of overexpression selected? Did the result depend on the overexpression? A related issue is the DIV for the cells - some are examined earlier and some at later times; does this impact the results? Please provide information or perform experiments with consistent timing. For the immunofluorescence, were multiple antibodies tried to see if the result was the same with each? Were different fixations, in addition to methanol, utilized?

      We have replaced Figure 2A with Figure S3 based on the reviewer’s suggestion.

      In the HMMR overexpression experiments, we used the HMMR antibody and immunofluorescence staining to confirm that overexpression was achieved. However, we did not quantify to what extent HMMR was overexpressed.

      We performed all the depletion experiments on 4 DIV to maximize knockdown efficiency and performed all the overexpression experiments on 3 DIV to prevent excessive axon fasciculation. Nonetheless, we examined the effect of HMMR depletion on neuronal morphology on 3 DIV. The trend of reduced total neurite length, axon length, and dendrite length can be observed, but no statistical significance can be detected. We also examined the effect of HMMR overexpression on neuronal morphology on 4 DIV and did observe an increase of total neurite length, axon length, and dendrite length. But the overlapping and bundled axons made reliable quantification extremely difficult.

      We actually tested multiple HMMR antibodies, such as E-19 (Santa Cruz, sc-16170), EPR4054 (Abcam, ab124729), and a variety of antibodies provided by Prof. Eva Turley. E-19 performed the best in immunofluorescence (IF) staining and knockdown validation. The other antibodies either failed to detect HMMR in IF staining or generated excessive background signal. We also tested various fixation methods, including 37°C formaldehyde fixation, -20°C methanol fixation, and 37°C formaldehyde followed by -20°C methanol fixation. All fixation methods generated similar IF staining patterns using the E-19 antibody, but 3.7% formaldehyde fixation produced the highest signal.

      (5) In Figure 2 C it is hard to see DAPI fluorescence. Are the white areas in the merge with bright cell nuclei? Is Figure 2C control or overexpressing cells? If this is endogenous, is there less signal in PLA compared with S4, which was in culture longer and is overexpressed prior to using PLA for detection?

      The white areas in Figure 2C that the reviewer mentioned are not cell nuclei; they are bubbles formed within the mounting medium.

      HMMR detected in Figure 2C is endogenous. We did not quantitatively compare the PLA signals in Figure 2C and those in Figure S4. This is because the PLA signals in Figure 2C are generated using anti-HMMR (to detect endogenous HMMR) and anti-β-III-tubulin antibodies while those in Figure S4 are generated using anti-AcGFP (to detect overexpressed AcGFP-mHMMR) and anti-β-III-tubulin antibodies. Since the affinity of the two antibodies (i.e., anti-HMMR and anti-AcGFP) toward their antigens is different, comparing the PLA signals is not informative.

      (6) The images of the endogenous HMMR (Fig S3) and the PLA with tubulin and HMMR antibodies are not the same (2C). The "dots" in PLA are widely separated; gauging from the marker bar length of 50 μm, the small clusters of dots are about 10 μm apart. In Figure S3, the puncta are much more closely spaced, appearing almost in a linear fashion along the microtubules. Enlarging the PLA image shows that each dot is very small - just a few pixels - please provide additional explanation including the minimal detection limit for the method, and why the images differ. If the standard immunofluorescence signal was enhanced, for example with the use of two secondaries, what is observed? Is the distribution of HMMR similar for both dendrites and axons? Microtubule polarity differs in these locations, so greater attention to this point seems of interest. There is a significant amount of punctate HMMR in the cytoplasm (or outside the cytoplasm?) in Figure S5; this is concerning. Please outline the cell edge for ease of visualization. What is the distribution of HMMR in a cell that has been treated with cold and/or nocodazole to disassemble the microtubules? is the signal lost?

      The images of the endogenous HMMR (Figure S3) and the PLA with tubulin and HMMR antibodies (Figure 2C) differ for the following reasons:

      • PLA utilizes two primary antibodies to target two different epitopes on HMMR and β-III-tubulin. It is conceivable that not every anti-HMMR antibody has the correct orientation and/or proximity (<40 nm) toward the anti-β-III-tubulin antibody to enable DNA amplification. This results in fewer PLA puncta compared to immunofluorescence signals.

      • The creator of PLA has pointed out that in situ PLA is a method based upon equilibrium reactions and several enzymatic steps. Therefore, only a fraction of the interacting molecules are detected (Weibrecht et al., 2010).

      We have not used signal enhancing immunofluorescence staining methods [e.g., using tertiary antibodies or tyramide signal amplification (TSA)] to detect HMMR. This is mainly because HMMR signal is strong enough to be detected using standard immunofluorescence staining.

      By asking “Is the distribution of HMMR similar for both dendrites and axons?”, the reviewer raised a very important issue about the polarity difference of microtubules in axons (uniform) and dendrites (mixed). We were aware of this issue and very carefully examined the distribution and signal intensity of HMMR in axons vs dendrites. However, no differences were detected.

      The reviewer mentioned that “there is a significant amount of punctate HMMR in the cytoplasm (or outside the cytoplasm?) in Figure S5; this is concerning. Please outline the cell edge for ease of visualization.” Instead of outlining the cell edge, we have selected another image to facilitate the visualization of HMMR signals. There are indeed HMMR signals outside the cell. However, these outside signals are usually weaker and smaller in size compared to those inside the cell.

      After examining neurons expressing AcGFP-mHMMR with or without 100 nM nocodazole treatment, we did not notice any difference in the distribution of AcGFP-mHMMR. We did not examine the distribution and signal intensity of the endogenous HMMR.

      (7) To determine if HMMR alters microtubule stability, the authors examine the distribution of acetylated tubulin and resistance to nocodazole-induced microtubule disassembly. In Figure 3 please show immunofluorescence images of the acetylated tubulin staining, not just the ratio images; the color is not obviously different in the various panels shown. For statistical analysis, see the comment above for Figure 1. For the nocodazole experiment, a similar change in neurite length following drug treatment was observed (Figure 3H), for the experimental and control, even though the starting length was greater in the overexpressing cells. Please consider the possibility that in both cases the microtubules are only partially resistant to nocodazole and that HMMR is not changing the fraction of microtubules that are sensitive to the drug. The cells were treated at 3 DIV; the authors note that more stable microtubules accumulate with time; how does time in culture impact stability? Often, acute treatment with a high concentration of nocodazole is used to assay microtubule stability; here the authors used a low (nM) concentration for 2 days (chronic). Why not use a higher concentration (1-10 μM) for a shorter incubation? The data show that overexpression of HMMR results in curved, buckled microtubules; are these microtubules more acetylated and/or retained after nocodazole treatment?

      The reviewer suggested that we show immunofluorescence images of the acetylated tubulin staining, not just the ratio images. However, we still believe that showing the ratio images is the better approach, because microtubule density can differ from neuron to neuron. Showing acetylated tubulin alone may give a false impression when the overall microtubule density is higher or lower in a particular neuron. We realized that the “16 colors” pseudocolor scheme uses cyan at the lower intensity, which can sometimes be confused with the white used at the higher intensity. Therefore, we changed the pseudocolor from “16 colors” to “fire” for Figure 3B and 3E to better visualize these images so that they appear more consistent with the quantitative data.
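
      The reasoning behind ratio images can be sketched in a few lines. This is a simplified illustration rather than the image-processing procedure used for Figure 3: the pixel values are invented, and the `min_total` background cutoff is a hypothetical parameter.

```python
def ratio_image(acetylated, total, min_total=1.0):
    """Pixel-wise acetylated/total tubulin ratio.

    Pixels whose total-tubulin signal is below `min_total` are treated as
    background and returned as None instead of a meaningless ratio.
    """
    return [
        [a / t if t >= min_total else None for a, t in zip(row_a, row_t)]
        for row_a, row_t in zip(acetylated, total)
    ]

# Two invented one-row "images": the second neurite has twice the microtubule
# density, so its raw acetylated signal is twice as bright.
sparse = ratio_image([[50.0, 40.0]], [[100.0, 80.0]])
dense = ratio_image([[100.0, 80.0]], [[200.0, 160.0]])
print(sparse, dense)
```

      Both neurites yield the same ratio despite the twofold difference in raw acetylated-tubulin intensity, which is the false impression that a ratio image avoids.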

      The reviewer raised a very good question regarding the possibility that HMMR is not changing the fraction of microtubules that are sensitive to nocodazole. We repeated the experiment using a series of nocodazole concentrations. While the addition of nocodazole causes a concentration-dependent reduction of total neurite length in both AcGFP- and AcGFP-mHMMR-expressing neurons, there are subtle differences in how neurite length responds to the nocodazole concentration. 1) 10 nM nocodazole treatment causes a significant reduction of neurite length in AcGFP-expressing neurons, but not in AcGFP-mHMMR-expressing neurons. This result indicates that AcGFP-mHMMR expression increases the tolerance of neurite elongation toward 10 nM nocodazole treatment. 2) Neurite length does not differ significantly between the 50 nM and 100 nM nocodazole treatments in AcGFP-expressing neurons, suggesting that 50 nM nocodazole has reached maximal effectiveness. In AcGFP-mHMMR-expressing neurons, 100 nM nocodazole further reduces the neurite length compared to the 50 nM group. These results argue against the possibility that HMMR does not change the fraction of microtubules that are sensitive to nocodazole. We have revised Figure 3H accordingly.

      The reviewer asked why we did not use acute nocodazole treatment (μM concentration) to assess the effect of HMMR on microtubule stability. This is because we used neurite length as an indicator of microtubule stability, and the chronic treatment was chosen to produce a more detectable effect on neurite length.

      The reviewer asked whether the looped microtubules caused by HMMR overexpression are more acetylated and/or nocodazole resistant. While we do not have direct evidence to answer the reviewer’s question, we can deduce the answer from our observations. We noticed that looped microtubules are only present when HMMR is highly expressed (i.e., using lipofection to introduce HMMR-expressing plasmid) but not when HMMR is mildly expressed (i.e., using electroporation to introduce HMMR-expressing plasmid). From these observations, we can conclude that HMMR is more abundantly present on looped microtubules. Since HMMR overexpression leads to higher microtubule acetylation (Figure 3E), looped microtubules, which contain more HMMR, are most likely to be more acetylated.

      (8) An additional measure of microtubule dynamics is to measure the growth of microtubules using a live cell marker for microtubule plus ends. Such experiments were performed, using tagged EB3. The images are rather fuzzy. Parameters of microtubule dynamics were measured at three locations - is there data that the authors can cite about any differences in dynamics in control cells at these locations? They look very similar, so it is not clear why the different locations were used. It is not possible to learn much from the kymographs which look similar for all panels; I would remove these unless they can be changed or labeled to help the reader. Data is presented for three shRNA reagents. No data are presented to document the extent to which the protein is depleted with these reagents. This should be fixed. Alternatively, an RNAi pool could be utilized. Is there a control for off-target effects? For the analysis were all the comets used to generate the average values? What about a comparison of the average of each trial - not each comet?

      In our previous publication (Chen et al., 2017), it was discovered that a significant reduction of EB3 emanation frequency can be detected at the tip and the base of the neurite but not in the middle of the neurite in TPX2-depleted neurons. This difference is due to the presence of RanGTP at the tip and the base of the neurite. Since RanGTP has also been shown to regulate the interaction between HMMR and TPX2 in the cell-free system (Scrofani et al., 2015), it is possible that the same phenomenon can be observed in HMMR-depleted neurons. This is why we examined those 3 ROIs in Figure 4.

      We noticed that photobleaching causes the EB3-mCherry signal to diminish at later time points, which made it difficult to observe the differences among kymographs. In the revised Figure 4B and 4D, we removed the second half of all the kymographs to make the differences more obvious.

      The reviewer mentioned that there are no data documenting the extent to which the protein is depleted with the shRNAs. These data are shown in Figure S2, in which we quantified the HMMR protein level in the soma and along the neurite in neurons expressing different shRNA molecules.

      The reviewer asked whether there is a control for off-target effects. The answer is yes. We performed the rescue experiment to control for off-target effects, which is shown in Figure S1.

      We revised Figure 4 so that the dynamic properties of EB3 are quantified using the average of each experimental repetition.

      (9) In a final experiment, the authors examine the distribution of TPX2, a binding partner of HMMR. Include a standard immunofluorescence in addition to PLA to illustrate the distribution of TPX2. The quantification used was the inter puncta distance; please quantify the signal in control and treated cells.

      The reviewer asked us to include a standard immunofluorescence staining to illustrate the distribution of TPX2. We have done that in our previous publication (Chen et al., 2017) and TPX2 localizes primarily to the centrosome (https://www.nature.com/articles/srep42297/figures/2). In order to enhance the weak signal of TPX2 along the neurite, we actually needed to use PLA in that publication (https://www.nature.com/articles/srep42297/figures/3).

      Proximity ligation assay (PLA) generates fluorescent signals based on a local enzymatic reaction which catalyzes the amplification of a specific DNA sequence that can then be detected using a red fluorescent probe. Because this enzymatic reaction is not linear, neither the amount of amplified DNA nor the intensity of the fluorescence correlates with the strength of the interaction (Soderberg et al., 2006). As a result, quantification of PLA is typically done by counting the number of fluorescent puncta per unit area or, when PLA signals are too strong and have coalesced, by calculating the area containing fluorescent signal (not signal intensity) per unit area. That is why our quantification is based on the distance between PLA fluorescent puncta, not the fluorescent signal intensity.
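
      The intensity-independent measures described above can be sketched as follows. This is a hedged illustration, not the quantification code used in the study: the puncta positions are invented, and measuring positions along the length of a single neurite is an assumption about how inter-puncta distance could be defined.

```python
def puncta_density(n_puncta, area_um2):
    """Number of PLA puncta per square micrometre of analysed area."""
    return n_puncta / area_um2

def inter_puncta_distances(positions_um):
    """Distances between neighbouring puncta along one neurite.

    `positions_um` holds each punctum's position along the neurite's length;
    fluorescence intensity is deliberately not used.
    """
    ordered = sorted(positions_um)
    return [b - a for a, b in zip(ordered, ordered[1:])]

# Invented positions (um) of three puncta along a neurite.
gaps = inter_puncta_distances([12.0, 2.0, 7.0])
print(gaps, sum(gaps) / len(gaps))
print(puncta_density(n_puncta=3, area_um2=30.0))
```

      Both measures depend only on where puncta are, so they stay meaningful even though the amplification step makes individual puncta intensities uninformative.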

      References

      Basnet, N., H. Nedozralova, A.H. Crevenna, S. Bodakuntla, T. Schlichthaerle, M. Taschner, G. Cardone, C. Janke, R. Jungmann, M.M. Magiera, C. Biertumpfel, and N. Mizuno. 2018. Direct induction of microtubule branching by microtubule nucleation factor SSNA1. Nat. Cell Biol. 20:1172-1180.

      Berkel, S., W. Tang, M. Trevino, M. Vogt, H.A. Obenhaus, P. Gass, S.W. Scherer, R. Sprengel, G. Schratt, and G.A. Rappold. 2012. Inherited and de novo SHANK2 variants associated with autism spectrum disorder impair neuronal morphogenesis and physiology. Hum. Mol. Genet. 21:344-357.

      Bielas, S.L., F.F. Serneo, M. Chechlacz, T.J. Deerinck, G.A. Perkins, P.B. Allen, M.H. Ellisman, and J.G. Gleeson. 2007. Spinophilin facilitates dephosphorylation of doublecortin by PP1 to mediate microtubule bundling at the axonal wrist. Cell. 129:579-591.

      Chen, W.S., Y.J. Chen, Y.A. Huang, B.Y. Hsieh, H.C. Chiu, P.Y. Kao, C.Y. Chao, and E. Hwang. 2017. Ran-dependent TPX2 activation promotes acentrosomal microtubule nucleation in neurons. Sci. Rep. 7:42297.

      Dent, E.W., J.L. Callaway, G. Szebenyi, P.W. Baas, and K. Kalil. 1999. Reorganization and movement of microtubules in axonal growth cones and developing interstitial branches. J. Neurosci. 19:8894-8908.

      Dent, E.W., and K. Kalil. 2001. Axon branching requires interactions between dynamic microtubules and actin filaments. J. Neurosci. 21:9757-9769.

      Goo, B.S., D.J. Mun, S. Kim, T.T.M. Nhung, S.B. Lee, Y. Woo, S.J. Kim, B.K. Suh, S.J. Park, H.E. Lee, K. Park, H. Jang, J.C. Rah, K.J. Yoon, S.T. Baek, S.Y. Park, and S.K. Park. 2023. Schizophrenia-associated Mitotic Arrest Deficient-1 (MAD1) regulates the polarity of migrating neurons in the developing neocortex. Mol. Psychiatry. 28:856-870.

      Groen, A.C., L.A. Cameron, M. Coughlin, D.T. Miyamoto, T.J. Mitchison, and R. Ohi. 2004. XRHAMM functions in ran-dependent microtubule nucleation and pole formation during anastral spindle assembly. Curr. Biol. 14:1801-1811.

      Purro, S.A., L. Ciani, M. Hoyos-Flight, E. Stamatakou, E. Siomou, and P.C. Salinas. 2008. Wnt regulates axon behavior through changes in microtubule growth directionality: a new role for adenomatous polyposis coli. J. Neurosci. 28:8644-8654.

      Scrofani, J., T. Sardon, S. Meunier, and I. Vernos. 2015. Microtubule nucleation in mitosis by a RanGTP-dependent protein complex. Curr. Biol. 25:131-140.

      Soderberg, O., M. Gullberg, M. Jarvius, K. Ridderstrale, K.J. Leuchowius, J. Jarvius, K. Wester, P. Hydbring, F. Bahram, L.G. Larsson, and U. Landegren. 2006. Direct observation of individual endogenous protein complexes in situ by proximity ligation. Nat. Methods. 3:995-1000.

      Teng, J., Y. Takei, A. Harada, T. Nakata, J. Chen, and N. Hirokawa. 2001. Synergistic effects of MAP2 and MAP1B knockout in neuronal migration, dendritic outgrowth, and microtubule organization. J. Cell Biol. 155:65-76.

      Weibrecht, I., K.J. Leuchowius, C.M. Clausson, T. Conze, M. Jarvius, W.M. Howell, M. Kamali-Moghaddam, and O. Soderberg. 2010. Proximity ligation assays: a recent addition to the proteomics toolbox. Expert Rev Proteomics. 7:401-409.

      Winkle, C.C., R.H. Olsen, H. Kim, S.S. Moy, J. Song, and S.L. Gupton. 2016. Trim9 Deletion Alters the Morphogenesis of Developing and Adult-Born Hippocampal Neurons and Impairs Spatial Learning and Memory. J. Neurosci. 36:4940-4958.

      Yalgin, C., S. Ebrahimi, C. Delandre, L.F. Yoong, S. Akimoto, H. Tran, R. Amikura, R. Spokony, B. Torben-Nielsen, K.P. White, and A.W. Moore. 2015. Centrosomin represses dendrite branching by orienting microtubule nucleation. Nat. Neurosci. 18:1437-1445.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors set up a pipeline for automated high-throughput single-molecule fluorescence imaging (htSMT) in living cells and analysis of molecular dynamics.

      Strengths:

      htSMT reveals information on the diffusion and bound fraction of molecules, dose-response curves, relative estimates of binding rates, and temporal changes of parameters. It enables the screening of thousands of compounds in a reasonable time and proves to be more sensitive and faster than classical cell-growth assays. If the function of a compound is coupled to the mobility of the protein of interest, or affects an interaction partner, which modulates the mobility of the protein of interest, htSMT allows identifying the modulator and getting the first indication of the mechanism of action or interaction networks, which can be a starting point for more in-depth analysis.

      Weaknesses:

      While elegantly showcasing the power of high-throughput measurements, the authors disclose little information on their microscope setup and analysis procedures. Thus, reproduction by other scientists is limited. Moreover, a critical discussion about the limits of the approach in determining dynamic parameters, the mechanism of action of compounds, and network reconstruction for the protein of interest is missing. In addition, automated imaging and analysis procedures require implementing sensitive measures to assure data and analysis quality, but a description of such measures is missing.

      The reviewer rightly highlights both the power and complexity in high throughput assay systems, and as such the authors have spent significant effort in first developing quality control checks to support screening. We discuss some of these as part of the description and characterization of the platform. We added additional details into the manuscript to help clarify. The implementation of our workflow for image acquisition, processing and analysis relies heavily on the specifics of our lab hardware and software infrastructure. We have added additional details to the text, particularly in the Methods section, and believe we have added enough information that our results can be reproduced using the suite of tools that already exist for single molecule tracking.

      The reviewer also points out that all assays have limitations, and these have not been clearly identified as part of our discussion of the htSMT platform. We have also added some comments on the limitations of the current system and our approach.

      Reviewer #2 (Public Review):

      Summary:

      McSwiggen et al present a high-throughput platform for SPT that allows them to identify pharmaceutical interactions with the diffusional behavior of receptors and, in turn, to identify potent new ligands and cellular mechanisms. The manuscript is well written; it provides a solid new method and a proper experimental foundation.

      Strengths:

      The method capitalizes on and extends existing high-throughput toolboxes and is directly applied to multiple receptors and ligands. The outcomes are important and relevant for society. 10^6 cells and >400 ligands per is a significant achievement.

      The method can detect functionally relevant changes in transcription factor dynamics and accurately differentiate the ligand/target specificity directly within the cellular environment. This will be instrumental in screening libraries of compounds to identify starting points for the development of new therapeutics. Identifying hitherto unknown networks of biochemical signaling pathways will propel the field of single-particle live cell and quantitative microscopy in the area of diagnostics. The manuscript is well-written and clearly conveys its message.

      Weaknesses:

      There are a few elements that, if rectified, would improve the claims of the manuscript.

      The authors claim that they measure receptor dynamics. In essence, their readout is a variation in diffusional behavior that correlates with ligand binding. While ligand binding can result in altered dynamics and/or a shift in conformational equilibrium, SPT does not directly record protein structural dynamics, only their effect on diffusion. They should correct and elaborate on this.

      This is an excellent clarifying question, and we have tried to make it more explicit in the text. The reviewer is absolutely correct; we’re not using SPT to directly measure protein structural dynamics, but rather the interactions a given protein makes with other macromolecules within the cell. So when an SHR binds to ligand it adopts conformations that promote association with DNA and other protein-protein interactions relevant to transcription. This is distinct from assays that directly measure conformational changes of the protein.

      L 148 What do the authors mean 'No correlation between diffusion and monomeric protein size was observed, highlighting the differences between cellular protein dynamics versus purified systems'. This is not justified by data here or literature reference. How do the authors know these are individual molecules? Intensity distributions or single bleaching steps should be presented.

      The point we were trying to make is that the molecular weights of the monomeric proteins (138 kDa for Halo-AR, 102 kDa for ER-Halo, 122 kDa for Halo-GR, and 135 kDa for Halo-PR) are uncorrelated with their apparent free diffusion coefficients. Were we to make this measurement on purified protein in buffer, where diffusion is well described by the Stokes-Einstein equation, one would expect monomer size and diffusion to be related. We’ve clarified this point in the manuscript.
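
      As a rough illustration of the expectation for purified protein, the Stokes-Einstein relation D = k_B T / (6πηr) can be combined with the assumption that a compact globular protein has a hydrodynamic radius scaling as r ∝ M^(1/3). The sketch below applies that scaling to the four molecular weights quoted above. The globular-protein scaling and the function name are illustrative assumptions and are not taken from the manuscript.

```python
# Monomer molecular weights (kDa) quoted in the response above.
MONOMER_KDA = {
    "Halo-AR": 138.0,
    "ER-Halo": 102.0,
    "Halo-GR": 122.0,
    "Halo-PR": 135.0,
}


def relative_diffusion(mw_kda, ref_kda):
    """Stokes-Einstein ratio D / D_ref, assuming r scales as MW ** (1/3).

    D = kT / (6 * pi * eta * r), so at fixed temperature and viscosity
    D / D_ref = r_ref / r = (ref_kda / mw_kda) ** (1/3).
    """
    return (ref_kda / mw_kda) ** (1.0 / 3.0)


ref = MONOMER_KDA["ER-Halo"]  # lightest construct used as the reference
expected = {name: relative_diffusion(mw, ref) for name, mw in MONOMER_KDA.items()}

# Print constructs from fastest to slowest expected diffusion in buffer.
for name, ratio in sorted(expected.items(), key=lambda kv: -kv[1]):
    print(f"{name}: expected D relative to ER-Halo = {ratio:.3f}")
```

      Under these assumptions the four constructs would differ by only about 10% in buffer, with the ordering set strictly by mass. That mass-ordered pattern is the one the response says is not seen in cells.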

      Along the same lines, the data in Figs 2 and 4 show that not only the immobile fraction is increased but also that the diffusion coefficient of the fast-moving (attributed to free) is reduced. The authors mention this and show an extended Fig 5 but do not provide an explanation.

      This is an area where there is still more work to do in understanding the estrogen receptor and other SHRs. As the reviewer says, we see not only an increase in chromatin binding but also a decrease in the diffusion coefficient of the “free” population. A potential explanation is that this is a greater prevalence of freely-diffusing homodimers of the receptor, or other protein-protein interactions (14-3-3, P300, CBP, etc) that can occur after ligand binding. Nothing in our bioactive compound screen shed light on this in particular, and so we can only speculate and have refrained from drawing further conclusions in the text.

      How do potential transient ligand binding and the time-dependent heterogeneity in motion (see comment above) contribute to this? Also, in line 216 the authors write "with no evidence" of transient diffusive states. How do they define transient diffusive states? While there are toolboxes to directly extract the existence and abundance of these either by HMM analysis or temporal segmentation, the authors do not discuss or use them.

      Throughout the analysis in this work, we consider all of the tracks within a 2-second FOV as representative of a single underlying population and have not looked at changes in dynamics within a single movie. As we show in the supplemental figures we added (see Figure 3, figure supplement 1), this appears to be a reasonable assumption, at least in the cases we’ve encountered in this manuscript. For experiments involving changes in dynamics over time, we added compound simultaneously with imaging and collected many 2-second FOVs in sequence to monitor changes in ER dynamics. In this case, when we refer to “transient states,” we are pointing out that we don’t observe any new states in the State Array diagram that exist at early time points but disappear at later time points.

      The reviewer suggests track-level analysis methods like hidden Markov models or variational Bayesian approaches, which have been used previously in the single molecule community. These are very powerful techniques, provided the trajectories are long (typically 100s of frames). In the case of molecules that diffuse quickly and can diffuse out of the focal plane, we don’t have the luxury of such long trajectories. This was demonstrated previously (Hansen et al 2017, Heckert et al 2022) and so we’ve adopted the State Array approach to inferring state occupations from short trajectories. As the reviewer rightly points out, this approach potentially loses information about state transitions or changes over time, but as of now we are not aware of any robust methods that work on short trajectories.

      The authors discuss the methods for extracting kinetic information of ligand binding by diffusion. They should consider the temporal segmentation of heterogeneous diffusion. There are numerous methods published in journals or BioRxiv based on analytical or deep learning tools to perform temporal segmentation. This could elevate their analysis of Kon and Koff.

      We’re aware of a number of approaches for analyzing both high framerate SMT as well as long exposure residence time imaging. As we say above, we’re not aware of any methods that have been demonstrated to work robustly on short trajectories aside from the approaches we’ve taken. Similarly, for residence time imaging there are published approaches, but we’re not aware of any that would offer new insight into the experiments in this study. If the reviewer has specific suggestions for analytical approaches that we’re not aware of we would happily consider them.

      Reviewer #3 (Public Review):

      Summary:

      The authors aim to demonstrate the effectiveness of their developed methodology, which utilizes super-resolution microscopy and single-molecule tracking in live cells on a high-throughput scale. Their study focuses on measuring the diffusion state of a molecule target, the estrogen receptor, in both ligand-bound and unbound forms in live cells. By showcasing the ability to screen 5067 compounds and measure the diffusive state of the estrogen receptor for each compound in live cells, they illustrate the capability and power of their methodology.

      Strengths:

      Readers are well introduced to the principles in the initial stages of the manuscript with highly convincing video examples. The methods and metrics used (fbound) are robust. The authors demonstrate high reproducibility of their screening method (R2=0.92). They also showcase the great sensitivity of their method in predicting the proliferation/viability state of cells (R2=0.84). The outcome of the screen is sound, with multiple compounds clustering identified in line with known estrogen receptor biology.

      Weaknesses:

      • Potential overstatement on the relationship of low diffusion state of ER bound to compound and chromatin state without any work on chromatin level.

      We appreciate the reviewer’s caution against over-interpreting the relationship between an increase in the slowest diffusing states that we observe by SMT and bona fide engagement with chromatin. In the case of the estrogen receptor there is strong precedent in the literature showing increases in chromatin binding and chromatin accessibility (as measured by ChIP-seq and ATAC-seq) upon treatment with either estradiol or SERM/Ds. Taken together with the RNA-seq, we felt it reasonable to assume all the trajectories with a diffusion coefficient less than 0.1 µm²/s were chromatin bound.
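
      To make the cutoff concrete, the sketch below shows one way a bound fraction could be computed from a set of state occupations over a grid of diffusion coefficients. Only the 0.1 µm²/s threshold comes from the response; the function name, the grid, and the occupation values are hypothetical and chosen purely for illustration.

```python
BOUND_THRESHOLD_UM2_PER_S = 0.1  # cutoff quoted in the response above


def fraction_bound(diff_coeffs, occupations, threshold=BOUND_THRESHOLD_UM2_PER_S):
    """Return the share of total occupation assigned to states with D < threshold."""
    if len(diff_coeffs) != len(occupations):
        raise ValueError("diff_coeffs and occupations must have the same length")
    total = sum(occupations)
    if total <= 0:
        raise ValueError("total occupation must be positive")
    bound = sum(w for d, w in zip(diff_coeffs, occupations) if d < threshold)
    return bound / total


# Hypothetical diffusion-coefficient grid (um^2/s) and unnormalized occupations.
grid_um2_per_s = [0.01, 0.05, 0.5, 2.0, 8.0]
occupation = [3, 1, 2, 3, 1]

f_bound = fraction_bound(grid_um2_per_s, occupation)
print(f"fbound = {f_bound:.2f}")
```

      With this definition, fbound depends directly on where the threshold sits relative to the slowest populated states, which is why the assumption deserves the caution the reviewer raises.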

      • Could the authors clarify if the identified lead compound effects are novel at any level?

      Most of the compounds we characterize in the manuscript have not previously been tested in an SMT assay, but many are known to functionally impact the ER or other SHRs based on other biochemical and functional assays. We have not described here any completely novel ER-interacting compounds, but to our knowledge this is the first systematic investigation of a protein showing that both direct and indirect perturbation can be inferred by observing the protein’s motion. Especially for the HSP90 inhibitors, the observation that inhibiting this complex would so dramatically increase ER chromatin-binding as opposed to increasing the speed of the free population is counterintuitive and novel.

      • More video example cases on the final lead compounds identified would be a good addition to the current data package.

      Reviewer #1 (Recommendations For The Authors):

      General:

      • More information on the microscope setup and analysis procedures should be given. Since custom code is used for automated image registration, spot detection, tracking, and analysis of dynamics, this code should be made publicly available.

      Results:

      • line 97: more details about the robotic system and automatic imaging, imaging modalities, and data analysis procedures should be given directly in the text.

      Additional information added to text and methods

      • line 100: we generated three U2OS cell lines --> how?

      Additional information added to text and methods

      • line 101: ectopically expressing HaloTag fused proteins --> how much overexpression did cells show?

      The L30 promoter tends to produce fairly low expression levels. The same approach was used for all ectopic expression plasmids, and for the SHRs the expression levels were all comparable to endogenous levels. We have not checked this for H2B, CaaX and free Halo, but given that the necessary dye concentration to achieve similar spot densities is within a 10-fold range for all constructs, it’s reasonable to say that those clonal cell lines will also have modest HaloTag expression.

      • line 107: Single-molecule trajectories measured in these cell lines yielded the expected diffusion coefficients --> how was data analysis performed?

      Additional information added to text and methods

      • line 109: how was the localization error determined?

      Additional information added to text and methods

      • line 155: define occupation-weighted average diffusion coefficient.

      Additional information added to text and methods

      • line 157: with 34% bound in basal conditions and 87% bound after estradiol treatment --> contradicts figure 2b, where the bound fraction is up to 50% after estradiol treatment.

      Line 157 reports the absolute fraction bound; Figure 2b shows the change in fbound.

      • line 205: Figure 2c is missing.

      Fixed

      • line 215: within minutes --> how was this data set obtained? which time bins were taken?

      Additional information added to text and methods

      • line 216: with no evidence of transient diffusive states --> What is meant by transient diffusive state? It seems all time points have a diffusive component, which decreases over time.

      Additional information added to text and methods

      The diffusive peak decreases and the bound peak increases, but no other peaks emerge during that time (e.g. neither super fast nor super slow).

      • line 225: it seems that fbound of GDC-0810 and GDC-0927 are rather similar in FRAP experiments, please comment, how was FRAP done?

      FRAP is in the methods section. The curves and recovery times are quite distinct, is the reviewer looking at

      • line 285: reproducibly: how often was this repeated?

      Information added to the manuscript

      • line 285: it would be necessary to name all of the compounds that were tested, e.g. with an ID number in the graph and a table. This also refers to extended data 7 and 8.

      Additional supplemental file with the list of bioactive compounds tested will be included.

      • line 290/1: what is meant by vendor-provided annotation was poorly defined?

      Additional information added to text and methods. Specifically, the “other” category is the most common category, and it includes both compounds with unknown targets/functions as well as compounds where the target and pathway are reasonably well documented. Hence, we applied our own analysis to better understand the list of active compounds.

      Figures:

      • fig. 2-6: detailed statistics are missing (number of measured cells, repetitions, etc.).

      We have added clarifying information, including an “experiment design and sample size” section in the Methods.

      • fig. 3: the authors need to give a list with details about the 5067 compounds tested.

      Additional supplemental file with the list of bioactive compounds tested will be included.

      • extended data 1c: time axis does not correspond to the 1.5s of imaging in the text, results line 127.

      Axes fixed

      • extended data 3: panel c and d are mislabeled.

      Panel labels fixed

      Methods:

      • line 746: HILO microscope: the authors need to explain how they can get such large fields of view using HILO

      Additional details added to the materials and methods. The combination of the power of the lasers, the size of the incident beam out of the fiber optic coupling device and the sCMOS camera are the biggest components that enable detection over a larger field of view.

      • line 761: it is common practice to publish the analysis code. Since the authors wrote their own code, they should publish it

      Our software contains proprietary information that we cannot yet release publicly. Comparable results can be achieved with HILO data using publicly-available tools like utrack. State Arrays code is distributed and the parameters used are listed in the M&M.

      Reviewer #2 (Recommendations For The Authors):

      The writing and presentation are coherent, concise, and easy to follow.

      The authors should consider justifying the following:

      Why is 1.5s imaging time selected? Topological and ligand variations may last significantly longer than this. The authors should present at least for one condition the same effect images for longer.

      Related to the similar comment above, we added a figure examining the jump length distribution as a function of frame. Over the 6 seconds of data collection the jump length distribution is unchanged, suggesting it is reasonable to consider all the trajectories within an FOV as representative of the same underlying dynamical states.

      The authors miss the k test or T test in their graphs.

      We chose to apply the Kruskal-Wallis test in the context of the bioactive screen to assess whether a grouping of compounds based on their presumed cellular target was significantly different from the control even when individual compounds might not by themselves rise to significance. In this case many of the pathway inhibitors are subtle and not necessarily obvious in their difference. In the other cases throughout the manuscript, whether two conditions are statistically distinguishable is rarely in question and of far less importance to the conclusions in the manuscript than the magnitude of the difference. We’ve added statistical tests where appropriate.
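
      For readers unfamiliar with the test, the sketch below computes the Kruskal-Wallis H statistic (with the standard tie correction) using only the Python standard library. The group values are made up for illustration and are not screen data. The p-value shortcut shown applies only to three groups, where the chi-square reference distribution has two degrees of freedom.

```python
import math


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, gi) for gi, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0.0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical change-in-fbound values for a control and two compound groups.
control = [1.0, 2.0, 3.0]
group_a = [4.0, 5.0, 6.0]
group_b = [7.0, 8.0, 9.0]

h_stat = kruskal_wallis_h(control, group_a, group_b)
# Chi-square survival function; this closed form is valid for df = 2 only.
p_value = math.exp(-h_stat / 2.0)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

      Because the statistic is rank-based, it can flag a group of compounds whose values are consistently shifted relative to control even when no single compound stands out, which is the use case described above.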

      The overall integrated area of Fig 4a appears to reduce upon ligand addition. Data appear normalized but the authors should also add N (number of molecules) on top of the graphs.

      While the integrated area may appear to decrease, all State Array analysis is performed by first randomly sampling 10,000 trajectories from the assay well and inferring state distribution on those 10,000. This has been clarified in the figure legend and in the Methods.
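
      A minimal sketch of the subsampling step described above, assuming trajectories are held in a Python list. The function name, the seed handling, and the behavior for wells with fewer than 10,000 trajectories are assumptions, since the response does not specify them.

```python
import random


def subsample_trajectories(trajectories, n=10_000, seed=None):
    """Draw n trajectories uniformly at random without replacement.

    Assumption: wells with n or fewer trajectories are returned unchanged.
    """
    if len(trajectories) <= n:
        return list(trajectories)
    rng = random.Random(seed)
    return rng.sample(trajectories, n)


# Stand-in trajectory IDs for a well with 25,000 tracked molecules.
well = list(range(25_000))
subset = subsample_trajectories(well, seed=0)
print(len(subset))
```

      Fixing the number of trajectories per well in this way means the inferred state distribution is built from the same sample size in every condition, regardless of how many molecules were tracked.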

      Minor

      Extended Figure 3 legend c, d appear swapped and incorrectly named in the text.

      Panel labels fixed

      L 197 but this appears not to BE a general feature of SHRs (maybe missing Be).

      Error fixed

      L205 authors refer to Figure 2c, which does not exist.

      Panel reference fixed

      Reviewer #3 (Recommendations For The Authors):

      Among minor issues:

      In Figure 1B, if the authors could specify how they discriminate the specific cell lines from the mixed context, it would enhance clarity. Could they perform additional immunofluorescence to understand how the assignment is determined? Alternatively, could they also show the case with isolated cell lines in an unmixed context?

      Immunofluorescence would be a challenge given that there is not a good epitope to distinguish the three ectopically-expressed genes from each other or from endogenous proteins in the case of H2B and CaaX. We are really reliant on the single cell dynamics to determine the likely cell identity. That said, we’ve added graphs of a number of individual cell State Arrays from the same data graphed in 1A, which support the notion that it’s reasonable to assume a cell’s identity given the observed dynamics.

      In Extended Figure 2F: possibly a ChIP-seq experiment would be more directly qualified to state the effect of ER ligand on ER ability to bind chromatin.

      This is true. Presumably ER that is competent at activating transcription of ER-responsive genes is also capable of binding DNA. ChIP would be the more direct measure, but would not address whether the protein was functional. We chose to balance these two aspects of ER biology by pairing dynamics with the end-point transcription readout.

      In Figure 3: A representation with plate-by-plate orientation along the x-axis, with controls included in each plate, would be more appropriate to reflect the consistency of the controls used in the assay across different plates. Currently, all controls are pooled in one location, and we cannot appreciate how the controls vary from plate to plate.

      Figure added to the supplement

      Also in this figure, a general workflow of the screen down to segmentation/analysis would be a great add-on.

      New figure added to the supplement and reflected in the textual description of the platform

      In Extended Figures 3B and C an add-on of the positive and negative control would make the figure more convincing.

      Addressed as part of figure added to the supplement

      Is there any description of compound leads identified that is novel in nature in relation to impact on ER, and if so could it be stated more clearly in the text as novel finding?

      To our knowledge, neither the impact of HSP inhibition in increasing ER-chromatin association nor the link between ER dynamics and inhibition of post-translational modifying enzymes like the CDKs or mTOR has been described before. We added clarifying text to the manuscript.

  2. May 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Detection of early-stage colorectal cancer is of great importance. Recently, both laboratory scientists and clinicians have reported different exosomal biomarkers to identify colorectal cancer patients.

      Here, the authors exhibited a full RNA landscape for plasma exosomes of 60 individuals, including 31 colorectal cancer (CRC) patients, 19 advanced adenoma (AA) patients, and 10 noncancerous controls. RNAs with high fold change, high absolute abundance, and various module attribution were used to construct RT-qPCR-based RNA models for CRC and AA detection.

      Overall, this is a well-performed proof-of-concept study to highlight exosomal RNAs as potential biomarkers of early-stage colorectal cancer and its precancerous lesions.

      Thank you for your careful evaluation and valuable suggestions, which have provided valuable guidance for the improvement of our paper. In response to your feedback, we have implemented the following improvements.

      (1) Depicting the full RNA landscape of circulating exosomes is still quite challenging. The authors annotated 58,333 RNA species in exosomes, most of which were lncRNAs, but the authors do not explain how they characterized those RNAs.

      Author response and action taken: Thanks for your comments. In the Supplementary Methods section titled "Identification of mRNAs and lncRNAs", we have provided a comprehensive explanation of how mRNAs and lncRNAs were characterized to address the concerns you raised. Characterization of long-chain RNAs is a great challenge. For lncRNA analysis, the transcriptome was assembled using Cufflinks and Scripture based on the reads mapped to the reference genome. The assembled transcripts were annotated using the Cuffcompare program from the Cufflinks package. The unknown transcripts were used to screen for putative lncRNAs.

      (2) The authors tested their models in a medium size population of 124 individuals, which is not enough to obtain an accurate evaluation of the specificity and sensitivity of the biomarkers proposed here. External validation would be required.

      Author response and action taken: Thanks for your comments. We fully acknowledge the significance of external validations in the evaluation of diagnostic model performance. Unfortunately, as a pilot study, we currently do not have the conditions for a multicenter investigation. To mitigate result bias and overfitting effects, we implemented a rigorous variable selection strategy and enhanced model stability through 10-fold cross-validation. In the meantime, we will persist in our efforts to elevate the quality of our research and seek additional resources for external validation in future studies.
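
      The response does not describe how the folds were constructed. As a generic illustration only, the sketch below partitions a cohort of 124 individuals (the size mentioned by the reviewer) into 10 shuffled folds using the standard library. The function name and the seed are assumptions, and the authors' actual procedure (for example, stratification by diagnosis) may differ.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(124, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)}")
```

      Each individual appears in exactly one test fold, so every prediction used to score the model comes from a fit that never saw that individual. This limits overfitting within the cohort but, as the reviewer notes, does not substitute for external validation.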

      Reviewer #2:

      The authors present an important study on the potential of small extracellular vesicle (sEV)-derived RNAs as biomarkers for the early detection of colorectal cancer (CRC) and precancerous adenoma (AA). The authors provide a detailed analysis of the RNA landscape of sEVs isolated from participants, identifying differentially expressed sEV-RNAs associated with T1a stage CRC and AA compared to normal controls. The paper further categorises these sEV-RNAs into modules and constructs a 60-gene model that successfully distinguishes CRC/AA from NC samples. The authors also validate their findings using RT-qPCR and propose an optimised classifier with high specificity and sensitivity. Additionally, the authors discuss the potential of sEV-RNAs in understanding CRC carcinogenesis and suggest that a comprehensive biomarker panel combining sEV-RNAs and proteins could be promising for identifying both early and advanced CRC patients. Overall, the study provides valuable insights into the potential clinical application of sEV-RNAs in liquid biopsy for the early detection of CRC and AA.

      Major strengths:

      (1) Comprehensive sEV RNA profiling: The study provides a valuable dataset of the whole-transcriptomic profile of circulating sEVs, including miRNA, mRNA, and lncRNA. This approach adds to the understanding of sEV-RNAs' role in CRC carcinogenesis and facilitates the discovery of potential biomarkers.

      (2) Detection of early-stage CRC and AA: The developed 60-gene t-SNE model successfully differentiated T1a stage CRC/AA from normal controls with high specificity and sensitivity, indicating the potential of sEV-RNAs as diagnostic markers for early-stage colorectal lesions.

      (3) Independent validation cohort: The study combines RNA-seq, RT-qPCR, and modelling algorithms to select and validate candidate sEV-RNAs, maximising the performance of the developed RNA signature. The comparison of different algorithms and consideration of other factors enhance the robustness of the findings.

      Thank you for your careful evaluation and valuable suggestions. These comments have been highly valuable for the performance evaluation and clinical applications of our work. In response to your feedback, we have implemented the following improvements.

      (1). Lack of analysis on T1-only patients in the validation cohort: While the study identifies key sEV-RNAs associated with T1a stage CRC and AA, only half of the patients in the validation cohort are T1 (25 out of 49). It would be better to do an analysis using only the T1 patients in the validation cohort, so the conclusion is not affected by the T2-T3 patients.

      Author response and action taken: Thanks for your comments. This feedback is essential for ensuring that the results are consistent with our previous findings. In this context, we revalidated various diagnostic panels using exclusively Stage I patients (Figure 7—figure supplement 2). To minimize the potential overfitting effect due to the reduction in sample size after partitioning, we implemented a 10-fold cross-validation for each panel, and these panels exhibit promising performance in Stage I colorectal cancer (CRC) patients.

      Author response image 1.

      The ROC analysis of different sEV-RNA signatures in the prediction of Stage I CRC patients by different algorithms (a: 6-gene panel; b: 7-gene panel; c: 8-gene panel; d: 9-gene panel).

      (2). Lack of performance analysis across different demographic and tumor pathology factors listed in Supplementary Table 12. It's important to know if the sEV-RNAs identified in the study work better/worse in different age/sex/tumor size/Yamada subtypes etc.

      Author response and action taken: Thanks for your comments. This feedback will be immensely beneficial for clinical diagnosis. Similarly, cross-validation was performed in this section. We assessed the discriminative effects of CRC on NC, taking into account different age groups, genders, tumor sizes, and anatomical locations (Figure 7—figure supplement 3). Overall, these sEV RNA panels perform better in individuals under the age of 55 and in female patients. There is no significant difference in discriminative effects across different tumor sizes. Compared to rectal cancer, the discriminative effects are better in colon cancer.

      Author response image 2.

      The ROC analysis of different sEV-RNA signatures for predicting CRC patients using the Lasso regression algorithm in different clinical parameters (ab: age; cd: gender; ef: tumor size; gh: anatomical position).

    1. Author response:

      We thank the reviewers for their positive assessments and constructive feedback.

      In light of their comments, we will aim to improve the explanation of the methods and interpretation of results, as well as their relation to well-established literature in this research area.

      The major contributions of our work are threefold:

      • First, we introduce a novel way of analyzing codas that specifically targets subcoda structures by considering inter-click intervals within codas in terms of transition probabilities. By describing codas’ click patterns via Variable Length Markov Chains, we do not need to consider codas in their entirety, but we can detect coda subunits. This enables a new dimension for quantitatively comparing differences among various individuals, social units, and clans, which we term ‘vocal style’.

      • Using this approach, we reinforce findings from past research, including the idea that identity codas function as symbolic markers of vocal clan identity (Hersh et al., 2022; Sharma et al., 2024). More importantly, we offer new insights into the function of non-identity codas, which comprise the majority of coda types produced by sperm whales but have been largely uncharacterized. 

      • Our work reveals that non-identity coda vocal styles are more similar for spatially overlapped clans, and suggests that this similarity in style may be maintained by social learning across clan boundaries. This points toward a paradigm shift in our understanding of between-clan acoustic interactions.
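      The transition-probability idea in the first point can be illustrated with a minimal sketch. It is not the authors’ implementation: the function names, the bin width, the maximum context length, and the minimum-count rule for choosing a context are all assumptions made for illustration. Inter-click intervals are binned into symbols, next-symbol counts are collected for contexts of varying length, and a prediction uses the longest context that has been seen often enough.

```python
from collections import defaultdict


def discretize(icis, bin_width=0.1):
    """Map inter-click intervals (seconds) to integer bins (bin_width is an assumed value)."""
    return tuple(round(ici / bin_width) for ici in icis)


def count_contexts(sequences, max_order=2):
    """Count next-symbol occurrences for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(0, min(max_order, i) + 1):
                context = tuple(seq[i - k:i])
                counts[context][symbol] += 1
    return counts


def transition_probs(counts, history, min_count=2):
    """Return (context, probabilities) for the longest suffix of `history`
    observed at least `min_count` times; this shortening of the context is
    what makes the chain 'variable length' in this toy version."""
    history = tuple(history)
    for k in range(len(history), -1, -1):
        context = history[len(history) - k:]
        nxt = counts.get(context)
        if nxt and sum(nxt.values()) >= min_count:
            total = sum(nxt.values())
            return context, {s: c / total for s, c in nxt.items()}
    return (), {}


# Hypothetical codas, already discretized into interval symbols.
codas = [(1, 1, 2), (1, 1, 2), (1, 2, 1)]
counts = count_contexts(codas, max_order=2)
context, probs = transition_probs(counts, (1, 1))
```

      In this toy data the context (1, 1) is frequent enough to be used directly, whereas a rarely seen history such as (1, 2) falls back to a shorter context. A full Variable Length Markov Chain fit would typically prune contexts with a statistical criterion rather than a fixed count threshold.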

      From a broader perspective, our work builds on two well-established research areas: the form and function of sperm whale codas, and statistical generative models, specifically Variable Length Markov Chains on finite data spaces. Our methods, results, and interpretations are grounded in theories and concepts from these fields.

      For clarity, we will ensure that our terminology aligns with field standards and existing research. We will clearly introduce each key theory or concept at first mention and justify its relevance. In particular, we will clarify the definition and meaning of the distance between subcoda trees for a general audience. We agree with the reviewers’ comments on the broader implications and will refine our work accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Both reviewers positively received the manuscript, in general. The agreement was that the manuscript presented valuable findings, using solid techniques and approaches, that shed additional light into how the canine distemper virus hemagglutinin might engage cellular receptors and how that engagement impacts host tropism. While both reviewers appreciated the X-ray crystallographic data, they also felt that the AFM experiments could have been performed at a higher standard and that the interpretation of the results ensuing from those AFM experiments could have been explained more thoroughly and in simpler terms. An additional missed opportunity of the current manuscript is the lack of comparison of the crystal structure to that of the already published cryo-EM structure, for context.

      We thank the editor and reviewers for their constructive comments. Following your comments, we have rewritten the text related to the AFM experiments in simpler terms, as follows.

      “When CDV-H was loaded onto a mica substrate and scanned with a cantilever to acquire images of attached molecules, the CDV-H dimer was observed as two globules clustered together in most cases, but sometimes, each domain moved independently (Fig. 7B and Supplementary Movie). Time-course analysis of the dynamics of the representative CDV-H dimer showed that CDV-H could adopt both associated and dissociated forms (Fig. 7C). The distances between the domains were calculated by measuring those between the centers of mass of each domain. Finally, the distribution of distances between each head domain in the CDV-H dimers showed approximately 15 nm as a major peak (Fig. 7D). This is a reasonable length for the linker between the head domain dimers.” in Page 11, Lines 8-17.

      With regard to the structural comparison between the cryo-EM structure published in Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120 and our crystal structure, we have compared these structures for Cα on page 6 and added the following text. “A recent cryo-EM structure of the wild-type CDV-H ectodomain revealed that the head dimer is located on one side of the stalk region in solution (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120)” in Page 14, Lines 22-24.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Fukuhara, Maenaka, and colleagues report a crystal structure of the canine distemper virus (CDV) attachment hemagglutinin protein globular domain. The structure shows a dimeric organization of the viral protein and describes the detailed amino-acid side chain interactions between the two protomers. The authors also use their best judgement to comment on predicted sites for the two cellular receptors - Nectin-4 and SLAM - and thus speculate on the CDV host tropism. A complementary AFM study suggests a breathing movement at the hemagglutinin dimer interface.

      Strengths:

      The study of CDV and related Paramyxoviruses is significant for human/animal health and is very timely. The crystallographic data seem to be of good quality.

      Thank you very much for the constructive comment of the reviewer.

      Weaknesses:

      While the recent CDV hemagglutinin cryo-EM structure is mentioned, it is not compared to the present crystal structure, and thus the context of the present study is poorly justified. Additionally, the results of the AFM experiment are not unexpected. Indeed, other paramyxoviral RBP/G proteins also show movement at the protomer interface.

      We thank the reviewer for the constructive comments. When we submitted our manuscript to eLife, the cryo-EM structure (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120) had been published only a week earlier and was not yet available to us. Following the reviewer’s comment, we have added text comparing the cryo-EM structure with our crystal structure. We have also revised the text related to the AFM experiments to tone down the description of movement at the protomer interface, as follows.

      “This observation raises the possibility that each head domain of CDV-H also dissociates and moves flexibly, as shown in the structure of Nipah virus (NiV)-G protein, previously (Science (2022) 375, 1373–1378).” in Page 11, Lines 4-6.

      Reviewer #2 (Public Review):

      Summary:

      The authors solved the crystal structure of CDV H-protein head domain at 3.2 Å resolution to better understand the detailed mechanism of membrane fusion triggering. The structure clearly showed that the orientation of the H monomers in the homodimer was similar to that of measles virus H and different from other paramyxoviruses. The authors used the available co-crystal structures of the closely related measles virus H with the SLAM and Nectin4 receptors to map the receptor binding site on CDV H. The authors also confirmed which N-linked sites were glycosylated in the CDV H protein and showed that both wildtype and vaccine strains of CDV H have the same glycosylation pattern. The authors documented that the glycans cover a vast majority of the H surface while leaving the receptor binding site exposed, which may in part explain the long-term success of measles virus and CDV vaccines. Finally, the authors used HS-AFM to visualize the real-time dynamic characteristics of CDV-H under physiological conditions. This analysis indicated that homodimers may dissociate into monomers, which has implications for the model of fusion triggering.

      The structural data and analysis were thorough and well-presented. However, the HS-AFM data, while very exciting, was not presented in a manner that could be easily grasped by readers of this manuscript. I have some suggestions for improvement.

      (1) The authors claim their structure is very similar to the recently published cryo-EM structure of CDV H. Can the authors provide us with a quantitative assessment of this statement?

      We thank the reviewer for the constructive comments. When we submitted our manuscript to eLife, the cryo-EM structure (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120) had been published only a week earlier and was not yet available to us. Following the reviewer’s comment, we have added text comparing the cryo-EM structure with our crystal structure. We have also revised the text related to the AFM experiments to tone down the description of movement at the protomer interface, as follows.

      “This observation raises the possibility that each head domain of CDV-H also dissociates and moves flexibly, as shown in the structure of Nipah virus (NiV)-G protein, previously (Science (2022) 375, 1373–1378).” in Page 11, Lines 4-6.

      (2) The results for the HS-AFM are difficult to follow and it is not clear how the authors came to their conclusions. Can the authors better explain this data and justify their conclusions based on it?

      We thank the reviewer for the constructive comments. Following your comments, we have rewritten the text related to the AFM experiments in simpler terms, as follows.

      “When CDV-H was loaded onto a mica substrate and scanned with a cantilever to acquire images of attached molecules, the CDV-H dimer was observed as two globules clustered together in most cases, but sometimes, each domain moved independently (Fig. 7B and Supplementary Movie). Time-course analysis of the dynamics of the representative CDV-H dimer showed that CDV-H could adopt both associated and dissociated forms (Fig. 7C). The distances between the domains were calculated by measuring those between the centers of mass of each domain. Finally, the distribution of distances between each head domain in the CDV-H dimers showed approximately 15 nm as a major peak (Fig. 7D). This is a reasonable length for the linker between the head domain dimers.” in Page 11, Lines 8-17.
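      The distance measurement described in the quoted passage (distance between the centers of mass of the two head domains) can be sketched as follows. This is an illustration only: the pixel representation, the use of AFM height as the weight, and the function names are assumptions, not details taken from the manuscript.

```python
import math


def center_of_mass(pixels):
    """Height-weighted centroid of one domain.

    `pixels` is a list of (x_nm, y_nm, height) tuples; weighting by AFM height
    is an assumption made for this sketch.
    """
    total = sum(h for _, _, h in pixels)
    if total <= 0:
        raise ValueError("domain has no positive height signal")
    x = sum(px * h for px, _, h in pixels) / total
    y = sum(py * h for _, py, h in pixels) / total
    return x, y


def domain_distance(domain_a, domain_b):
    """Euclidean distance (nm) between the centers of mass of two domains."""
    return math.dist(center_of_mass(domain_a), center_of_mass(domain_b))


# Hypothetical pixel sets for two globules in a single frame.
head_1 = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
head_2 = [(15.0, 0.0, 2.0), (17.0, 0.0, 2.0)]
distance_nm = domain_distance(head_1, head_2)
```

      Repeating this per frame would give the time course of the inter-domain distance, and pooling the values over many dimers would give a distance distribution of the kind summarized in Fig. 7D.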

      (3) The fusion triggering model in Figure 8 is ambiguous as to when H-F interactions are occurring and when they may be disrupted. The authors should clarify this point in their model.

      We thank the reviewer for the constructive comments. Following your comments, we have revised Figure 8 and its legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) AFM experiments with SLAM or Nectin-4 immobilized on the cantilever would be much more informative.

      Thank you very much for the constructive comment of the reviewer. We will try this experiment in the next paper.

      (2) The authors should compare their crystal structure to that of the reported cryo-EM structure.

      With regard to the structural comparison between the cryo-EM structure published in Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120 and our crystal structure, we have added text describing this comparison.

      (3) Figure 1D - why does the beta2 MG negative control have such a high SPR signal?

      Thank you very much for the constructive comment of the reviewer. The immobilization levels for β2-microglobulin (beta2 MG), CDV-OP-H and CDV-5VD-H were similar: 1204.7 RU, 1235.7 RU, and 1504.5 RU, respectively. We applied relatively high concentrations (5 mM) of dNectin4 and hNectin4 onto the chip to determine low-affinity dissociation constants, which is why the signals for beta2 MG (negative control) were high. Such high signals for beta2 MG were also often observed in other SPR experiments for cell surface receptors in our previous paper (Kuroki et al., J. Immunol. 2019 Dec 15;203(12):3386-3394. doi: 10.4049/jimmunol.1900562). Therefore, we think that these SPR signals are not unusual.

      (4) Figure 1C - please indicate the Ve volume for the peak and add in Ve for standard.

      Thank you very much for the constructive comment of the reviewer. We have indicated the Ve volume for the peak and added in Ve for standard in Figure 1C.

      (5) The authors mention that one of the chains in the asymmetric unit was better resolved than the other. Please show regions of the atomic model fit regions of the electron density to convince the reader of the quality of your data.

      Thank you very much for the constructive comment of the reviewer. We have added new Supplementary figure 2 for comparison of electron density maps of chains A and B.

      (6) Table 2 indicates that the difference between Rw and Rf values is larger than 5% which indicates slight overfitting during refinement. Please provide details of your refinement strategy and attempt simulated annealing as a strategy to reduce this delta.

      Thank you very much for the constructive comment of the reviewer. We further introduced TLS and NCS parameters into the refinement. Consequently, the R/Rfree factors became 0.2645/0.3092. Simulated annealing had already been carried out. All the refinement statistics in Table 2 have been updated.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors' fusion triggering model was difficult to follow. For example, this sentence was difficult to understand: "The other possible models may include the monomer-dimer-tetramer transition facilitated by receptor binding for the fusion."

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have removed the above sentence and have added the detailed mechanism of the proposed model to the Discussion. Furthermore, we have revised Figure 8 and its legend so that readers can understand the model more clearly.

      (2) Figure 5A is not called out in the main text.

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have added the text as follows.

      “the crystal structure of MeV-H in complex with hNectin-4 showed that the H-SLAM interaction consists of three main sites (Fig. 5A) (Nat. Struct. Mol. Biol. (2013) 20, 67–72).” in Page 11, Lines 4-6.

      (3) Page 9, Line 4: interspaces? Perhaps interphases.

      Thank you very much for the constructive comment of the reviewer. We have changed the term “interspaces” to “internal spaces”.

      (4) Page 12, penultimate line: The authors mention "epitopes for anti-MeV-H Abs." Do they mean anti-CDV-H Abs?

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have changed the “anti-MeV-H Abs” to “anti-morbillivirus H neutralizing antibodies”.

      (5) The paper will benefit from an English language editor to help clarify what the authors are trying to convey.

      Thank you very much for the constructive comment of the reviewer.

      We have asked an English proofreading company to check the manuscript.

    1. Author response:

      We are grateful to the reviewers for their interest and enthusiasm about the work, and deeply appreciate their constructive comments and suggestions. Our responses are below.

      (1) Do mice with BCR-ABL/MSI2-HOXA9 leukemia have an increased pool of leukemic stem cells (LSC), or do they have an increased propensity to develop blast cells? Is it the number of LSCs that has increased, or is it the function of LSC to give rise to the disease that has increased? It is not clear if the detected differences in Lineage-negative cells (Figure S1D) were detected in vitro in retrovirally transduced cells or were detected in vivo in transplanted mice. If the differences were detected in vitro, could the author confirm the same findings in vivo? This will greatly enhance the understanding of in vivo disease pathogenesis and could directly link the aggressivity of the disease (shortened survival) with an increased stem cell-like population.

      We find that BCR-ABL/MSI2-HOXA9 leads to a marked increase in Lineage negative (Lin-) cells, which contain the LSC fraction. Specifically, the LSC-containing fraction represented 14.1% of the BCR-ABL driven disease and 56.7% of the BCR-ABL and MSI2-HOXA9 driven disease (p<.0001). This suggests that MSI2-HOXA9 triggers the expansion of the undifferentiated LSC-containing pool. In addition, the blast frequency was also increased, albeit to a lesser extent, with 63.8% blasts (SEM 1.1) for BCR-ABL and 83.3% (SEM 3.1) for BCR-ABL/MSI2-HOXA9 (p=.0001). This suggests that the resulting aggressive disease seen with MSI2-HOXA9 is a consequence of a large increase in undifferentiated LSC-containing cells, as well as the resulting increase in the blast count. The Lin- cells were analyzed from fully established leukemias in vivo (Fig. S1D).

      (2) The authors suggest that BCR-ABL/MSI2-HOXA9 leads to the development of blast crisis-CML. One of the main characteristics of blast crisis-CML is drug resistance. Is BCR-ABL/MSI2-HOXA9 leukemia resistant to classical CML treatment drugs?

      The sensitivity to imatinib is a very interesting question. In general, while differentiated cells in CML are sensitive to imatinib, the more undifferentiated cells (LSCs) are resistant (1, 2). Based on the fact that therapy resistance in blast crisis is largely driven by the undifferentiated fraction of leukemia cells, and given that BCR-ABL/MSI2-HOXA9 driven disease harbors a larger fraction of these undifferentiated cells, we would predict that BCR-ABL/MSI2-HOXA9 leukemia would also be more resistant to imatinib. However, this would need to be experimentally demonstrated and is an important question to address.

      (3) The authors have emphasized the heightened expression of Polrmt in delineating the mitochondrial phenotype of BCR-ABL/MSI2-HOXA9 leukemia cells. However, the regulatory mechanism governing the expression of Polrmt by MSI2-HOXA9 has not been clearly demonstrated by the authors. Unveiling this mechanism would constitute a novel finding and significantly elevate the quality of the research.

      Since Polrmt and mitochondrial genes are transcribed in the nucleus, we explored whether MSI2-HOXA9 may control mitochondrial gene expression by triggering expression of Polrmt and other key transcription factors. Consistent with this possibility, MSI2-HOXA9 was preferentially found in the nucleus relative to MSI2. In addition, there were 10 occurrences of the minimal MSI2 RRM1 consensus binding sequence UAGU within the Polrmt transcript. While this is consistent with the possibility that Polrmt expression can be post-transcriptionally modulated by MSI2-HOXA9, this needs to be experimentally validated using CLIP-seq analysis with wild type MSI2 as well as the MSI2-HOXA9 fusion protein in the context of blast crisis CML.
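      A motif count of the kind mentioned above (occurrences of UAGU within a transcript) can be sketched in a few lines. The function name and the example sequences are hypothetical, and this is not the authors’ analysis pipeline; it only shows one way to count occurrences, including overlapping ones, after converting a cDNA-style sequence to RNA.

```python
def count_motif(sequence, motif="UAGU"):
    """Count occurrences of `motif` in `sequence`, allowing overlaps.

    The input may be written as RNA or as cDNA; T is converted to U first.
    """
    rna = sequence.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    count = 0
    start = rna.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping hits (e.g. UAGUAGU) are counted.
        start = rna.find(motif, start + 1)
    return count


# Hypothetical short sequences, not the Polrmt transcript.
overlapping_hits = count_motif("UAGUAGU")
single_hit = count_motif("aatagtcc")
```

      Because UAGU begins and ends with U, two sites can share a base, so a non-overlapping count such as `str.count` could under-report sites in sequences like UAGUAGU.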

      (4) Did the authors observe any survival differences between BCR-ABL/NUP98-HOXA9 and BCR-ABL/MSI2-HOXA9?

      In previous work from our lab, we found that the median survival for BCR-ABL/NUP98-HOXA9 was 17 days, and for BCR-ABL/MSI2-HOXA9 was 18.5 days (p value of 0.22). This suggests that there is not a significant difference in survival times between the leukemias driven by the distinct alleles, and they may be equally aggressive.

      (1) MSI2-HOXA9 fusion is extremely rare as it has been only found in a handful of patients and it is not clear whether other MSI2 fusions function in a similar manner.

      We were very surprised and excited to see the large number of translocations in solid cancers that involve MSI2. Interestingly, MSI2 translocations occurred both at the N and the C terminus. Distinct translocations are likely to have unique roles in each disease context. For example, if MSI2’s 5′ end is part of a translocation, it may functionally contribute via its promoter to drive expression in immature cells and could thus activate oncogenic signals (e.g. controlled by the partner gene) in immature cells, which are inherently more susceptible to transformation (Eµ-myc is an example of such a translocation). If Msi2’s RRM domains are part of the fusion, they could bind and target RNAs aberrantly (such as in the wrong cell and at the wrong time) and lead to activation of downstream oncogenic mediators. To fully understand the role of each of these translocations in each specific cancer, we would need to experimentally test their impact by ectopic expression in the appropriate cell of origin and by domain mapping the basis of any impact in the relevant cancer models, as we have done for MSI2-HOXA9 in blast crisis CML in the work we report here. While this is an intensive undertaking, it is nonetheless important future work as it will undoubtedly lead to new insight about MSI2-linked translocations in diverse solid cancers such as breast cancer and lung cancer.

      (2) The mechanism needs to be strengthened since MSI2 alone or the HOXA9 mutant may not be linked to the mitochondrial mechanism. (3) It is not clear that the mitochondrial pathway is sufficient for the MSI2-HOXA9 oncogenic mechanism.

      Our observation that MSI2-HOXA9 triggered changes in mitochondrial function was of particular interest as it was (to our knowledge) uncharted in the context of Msi2 signaling in cancer, thus leading us to explore this further. However, multiple other signals are likely downstream regulators, and these may well act cooperatively with, or independently of, the heightened mitochondrial function we report here. Among these pathways, the most likely mediators included oncogenic programs related to the Wnt pathway including Wnt, Fzd 3 and Frat1, and those related to the Notch pathway including Tribbles and Hey1, as well as other stem cell genes such as Aldh1. These programs have been previously implicated in the regulation of myeloid leukemia (3-11) and could well mediate the impact of the MSI2-HOXA9 translocation. The relative contribution of mitochondrial metabolism and that of developmental and stem cell signals to the onset of MSI2-HOXA9 driven blast crisis CML is an important avenue of future work.

      References

      (1) Corbin, A. S., Agarwal, A., Loriaux, M., Cortes, J., Deininger, M. W. & Druker, B. J. 2011. Human chronic myeloid leukemia stem cells are insensitive to imatinib despite inhibition of BCR-ABL activity. J Clin Invest 121: 396-409. PMC3007128.

      (2) Graham, S. M., Jørgensen, H. G., Allan, E., Pearson, C., Alcorn, M. J., Richmond, L. & Holyoake, T. L. 2002. Primitive, quiescent, Philadelphia-positive stem cells from patients with chronic myeloid leukemia are insensitive to STI571 in vitro. Blood 99: 319-325.

      (3) Gurska, L. M., Ames, K. & Gritsman, K. 2019. Signaling Pathways in Leukemic Stem Cells. Adv Exp Med Biol 1143: 1-39. PMC7249489.

      (4) Narendra, G., Raju, B., Verma, H. & Silakari, O. 2021. Identification of potential genes associated with ALDH1A1 overexpression and cyclophosphamide resistance in chronic myelogenous leukemia using network analysis. Med Oncol 38: 123.

      (5) Ran, D., Schubert, M., Pietsch, L., Taubert, I., Wuchter, P., Eckstein, V., Bruckner, T., Zoeller, M. & Ho, A. D. 2009. Aldehyde dehydrogenase activity among primary leukemia cells is associated with stem cell features and correlates with adverse clinical outcomes. Exp Hematol 37: 1423-1434.

      (6) Reya, T., Duncan, A. W., Ailles, L., Domen, J., Scherer, D. C., Willert, K., Hintz, L., Nusse, R. & Weissman, I. L. 2003. A role for Wnt signalling in self-renewal of haematopoietic stem cells. Nature 423: 409-414.

      (7) Riether, C., Schürch, C. M., Bührer, E. D., Hinterbrandner, M., Huguenin, A. L., Hoepner, S., Zlobec, I., Pabst, T., Radpour, R. & Ochsenbein, A. F. 2017. CD70/CD27 signaling promotes blast stemness and is a viable therapeutic target in acute myeloid leukemia. J Exp Med 214: 359-380. PMC5294846.

      (8) Riether, C., Schürch, C. M., Flury, C., Hinterbrandner, M., Drück, L., Huguenin, A. L., Baerlocher, G. M., Radpour, R. & Ochsenbein, A. F. 2015. Tyrosine kinase inhibitor-induced CD70 expression mediates drug resistance in leukemia stem cells by activating Wnt signaling. Sci Transl Med 7: 298ra119.

      (9) Venton, G., Pérez-Alea, M., Baier, C., Fournet, G., Quash, G., Labiad, Y., Martin, G., Sanderson, F., Poullin, P., Suchon, P., Farnault, L., Nguyen, C., Brunet, C., Ceylan, I. & Costello, R. T. 2016. Aldehyde dehydrogenases inhibition eradicates leukemia stem cells while sparing normal progenitors. Blood Cancer J 6: e469. PMC5056970.

      (10) Yin, D. D., Fan, F. Y., Hu, X. B., Hou, L. H., Zhang, X. P., Liu, L., Liang, Y. M. & Han, H. 2009. Notch signaling inhibits the growth of the human chronic myeloid leukemia cell line K562. Leuk Res 33: 109-114.

      (11) Kang, Y. A., Pietras, E. M. & Passegué, E. 2020. Deregulated Notch and Wnt signaling activates early-stage myeloid regeneration pathways in leukemia. J Exp Med 217. PMC7062512.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Will the nanobody be available to the TB research community?

      Yes, we will make E11rv available upon request. Please see our materials availability statement.

      Reviewer #2 (Recommendations For The Authors):

      (1) It would be interesting to test the potential impact of residual ASB-14 contaminant on the biochemical behavior of ESAT-6-CFP10 heterodimer and ESAT-6 homodimer or tetramer and their hemolytic activity in comparison with the ones without ASB-14.

      We agree that this is an interesting line of questioning. Based on the study by Refai et al. that we cite in the text, ESAT-6 treated with nonionic detergents ASB-14 or LDAO, but not other common detergents, undergoes a conformational change that increases its cytotoxicity in cell assays, hemolytic activity, and ability to dimerize with CFP-10. What is not known at this point is how similar the ASB-bound conformation is to anything seen physiologically.

      (2) Building on the progress in making anti-ESAT-6 nanobodies and their anti-Mtb effects in the cells, it could have been tested in human or mouse primary macrophages infected with Mtb and a mouse model of Mtb infection for its anti-Mtb efficiency.

      We thank the reviewer for this suggestion, and we agree that these would be very informative next steps for determining the therapeutic potential of anti-ESAT-6 nanobodies.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      Line 133: "It is well established that Mm-induced hemolysis is ESX-1 dependent, but our results suggest that Mtb must lack one or more factors necessary for efficient hemolysis.". I would tone this down a bit, as it is also known that M. tuberculosis escapes much later than M. marinum from the phagosome, which could indicate different kinetics.

      We thank the reviewer for their insightful comments. We agree that the kinetics of Mtb and Mm infection are quite different and that this may impact the hemolysis assay. As described by Augenstreich et al., some hemolysis by Mtb is observed at 48 hours, though the method of normalization makes it impossible to determine the absolute amount of hemolysis that occurred in their experiment. Our findings show only that the absolute amount of Mtb hemolysis in 2 hours is negligible, setting it apart from Mm. We have edited the wording of this statement in the manuscript to avoid any confusion.

      Line 155: "Because Mtb often exists in an acidified compartment". First of all, the reference used here does not discuss anything about Mtb, secondly, papers that do measure the acidification of Mtb-loaded phagosomes indicate that this acidification is very mild (typically to pH 6.2).

      We agree that this point should be articulated more precisely. We have added additional clarification that the pH of Mtb-containing compartments in macrophages can fall in a broad range depending on the activation state of the macrophages, and that these compartments are typically only mildly acidic in non-activated macrophages. We have updated our references to better describe the current state of knowledge on this topic.

      Line 339: "Whereas most of these functions rely only on the secretion of ESAT-6 into the cytoplasm, the ability of E11rv to access Mtb suggests that this communication is likely two-way." No, not necessary, there are many processes in which ESX-1 substrates affect the macrophage. This nanobody could affect EsxA functioning only once the bacteria reach the cytoplasm. I think checking phagosomal escape in these cells is therefore crucial.

      We agree that phagosomal escape and subsequent direct secretion of ESAT-6 into the cytoplasm is a reasonable alternative hypothesis. We have added this point to our discussion, and we agree that looking directly at phagosomal escape is an important next step.

      Figure 7 is not mentioned in the text (mistake for Fig 6).

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study is highly interesting and the applied methods are target-oriented. The biophysical characterization of viable N-protein species and several representative N-protein mutants is supported by the data, including polarity, hydrophobicity, thermodynamic stability, CD spectra, particle size, and especially protein self-association. The physicochemical parameters for viable N-protein and related coronavirus are described for comparison in detail. However, the conclusion becomes less convincing that the interaction of peptides or motifs was judged by different biophysical results, with no more direct data about peptide interaction. Additionally, the manuscript could benefit from more results involving peptide interaction to support the author's opinions or make expression more accurate when concerning the interaction of motifs. Although the authors put a lot of effort into the study, there are still some questions to answer.

      We thank the Reviewer for this assessment and wholeheartedly agree that there are still many questions. The main thrust of the present work was not intended to unravel the detailed mechanistic origin of all observations, but rather to juxtapose the different observations made with different viable N-protein species across the mutant spectrum, in order to get a sense of how narrowly the biophysical phenotype is confined to ensure virus viability. Such a study has become possible for the first time with the unprecedented genomic database of SARS-CoV-2. This has led to observations of non-local effects of individual mutations that are not independent and non-additive relative to the effects of other mutations, and in that sense we have inferred ‘interactions’. These might be mediated by direct contacts or indirectly through altered chain configurations. In the revised manuscript we have clarified this point.

      Meanwhile, a number of documented direct physical intra-molecular and intra-dimer interactions provide a context to our study of mutation effects. The flexibility of the IDRs provides a rich variety of contacts that have been observed in molecular dynamics and single-molecule fluorescence studies (Rozycki & Boura, Biophys Chem. 2022 and Cubuk et al, Nat Commun 2021). We have previously carried out detailed hydrodynamic studies of self-association interfaces located in the leucine-rich region. More recently, NMR data just published by the Blackledge laboratory (Botova et al., bioRxiv 2024) extend the list of intra-molecular contacts with the observation of long-range intra-molecular interactions between the NTD and the CTD, NTD and the phosphorylated SR-rich region, and NTD and the previously studied leucine-rich region. The latter contacts require the C-terminal region of the linker to loop back onto the NTD, which may well introduce susceptibility to any of the linker mutations. However, detailed linker configurations are beyond the scope of the present work.

      With regard to the effects of the Omicron mutations in the N-arm IDR, we have shown hydrodynamic data directly demonstrating peptide self-association, and we are currently working on a more detailed functional follow-up study which we hope to communicate soon.

      Reviewer #2 (Public Review):

      Summary: This work focuses on the biochemical features of the SARS-CoV-2 Nucleocapsid (N) protein, which condenses the large viral RNA genome inside the virus and also plays other roles in the infected cell. The N protein of SARS-CoV-2 and other coronaviruses is known to contain two globular RNA-binding domains, the NTD and CTD, flanked by disordered regions. The central disordered linker is particularly well understood: it contains a long SR-rich region that is extensively phosphorylated in infected cells, followed by a leucine-rich helical segment that was shown previously by these authors to promote N protein oligomerization.

      In the current work, the authors analyze 5 million viral sequence variants to assess the conservation of specific amino acids and general sequence features in the major regions of the N protein. This analysis shows that disordered regions are particularly variable but that the general hydrophobic and charge character of these regions are conserved, particularly in the SR and leucine-rich regions of the central linker. The authors then construct a series of N proteins bearing the most prevalent mutations seen in the Delta and Omicron variants, and they subject these mutant proteins to a comprehensive array of biophysical analyses (temperature sensitivity, circular dichroism, oligomerization, RNA binding, and phase separation).

      Strengths:

      The results include a number of novel findings that are worthy of further exploration. Most notable are the analyses of the previously unstudied P31L mutation of the Omicron variant. The authors use ColabFold and sedimentation analysis to suggest that this mutation promotes the self-association of the disordered N-terminal region and stimulates the formation of N protein condensates. Although the affinity of this interaction is low, it seems likely that this mutation enhances viral fitness by promoting N-terminal interactions. The work also addresses the impact of another unstudied mutation, D63G, that is located on the surface of the globular NTD and has no significant effect on the properties analyzed here, raising interesting questions about how this mutation enhances viral fitness. Finally, the paper ends with studies showing that another common mutant, R203K/G204R, disrupts phase separation and might thereby alter N protein function in a way that enhances viral fitness.

      Thank you for highlighting the strengths of our paper.

      Weaknesses:

      In general, the results in the paper confirm previous ideas about the role of N protein regions. The key novelty of the paper lies in the identification of point mutations, notably P13L, that suggest previously unsuspected functions of the N-terminal disordered region in protein oligomerization. The paper would benefit from further exploration of these possibilities.

      We agree that the bioinformatic results confirm previous ideas about the role of the N protein regions. However, we believe our results go beyond the previous thinking in a crucial aspect, which is that we examine the full (so far known) mutant spectrum of N-protein. Properties previously inferred from the inspection of single consensus sequences can be misleading because of the quasispecies nature of RNA viruses. By considering the mutant spectrum we can obtain a sense for how significant differences in the physicochemical properties of the different regions are, and how much variation is possible without jeopardizing essential protein functions.

      With regard to the N-arm IDR mutations we believe this deserves a separate study focusing on the apparent N-arm function. Our rationale for presenting some initial N-arm results in the current paper was to highlight how the variability of N-protein species in the mutant spectrum can even include differences in the type and number of protein self-association interfaces.

      Reviewer #3 (Public Review):

      Nguyen, Zhao, et al. used bioinformatic analysis of mutational variants of SARS-CoV-2 Nucleocapsid (N) protein from the large genomic database of SARS-CoV-2 sequences to identify domains and regions of N where mutations are more highly represented and computationally determined the effects of these mutations on the physicochemical properties of the protein. They found that the intrinsically disordered regions (IDRs) of N protein are more highly mutated than structured regions and that these mutations can lead to higher variability in the physical properties of these domains. These computational predictions are compared to in vitro biophysical experiments to assess the effects of identified mutations on the thermodynamic stability, oligomeric state, particle formation, and liquid-liquid phase separation of a few exemplary mutants.

      The paper is well-written and easy to follow, and the conclusions drawn are supported by the evidence presented. The analyses and conclusions are interesting and will be of value to virologists, cell biologists, and biophysicists studying SARS-CoV-2 function and assembly. It would be nice if some further extrapolation or comments could be made regarding the effects of the observed mutations on the in vivo behavior and properties of the virus, but I appreciate that this is much higher-order than could be addressed with the approaches employed here.

      We thank the Reviewer for this positive assessment. With regard to the possible in vivo behavior of mutant species, we agree that this would require additional data beyond the scope of the present work.

      However, for the N:G215C mutant we can point to a very recent preprint by Kubinski et al. (bioRxiv 2024) that describes reverse genetics experiments where the isolated N:G215C mutation caused altered in vivo pathology, enhanced viral replication, and altered virion morphology. We have cited this work in the revised manuscript.

      As mentioned above, for the P13L mutation we hope to communicate a more detailed follow-up study that will allow us to extrapolate on its in vivo behavior.

      Recommendations For The Authors:

      Reviewer #1:

      (1) Given the structure organization of N-protein in Figure 1, the authors should explain why linker region 180-247 is different from linker (175-247) mentioned in the first result.

      We thank the reviewer for bringing up this point, which we agree deserves clarification. While often the NTD has been assigned a C-terminal limit of 180 (e.g., in the NMR structure by Dinesh et al, Plos Pathogens 2020), the last several residues in the NTD are already disordered and contain the S176/R177 pair and therefore may be ascribed to the beginning of the SR-rich portion of the linker. In order not to artificially truncate functional sequences of either NTD or linker, we have decided to allow the designations of the NTD and linker regions to overlap. We believe this is conservative in that possible NTD or linker properties extending into this transition region will be preserved. In order to explain this in the manuscript, we have modified Figure 1 and inserted a brief sentence “(Due to ambiguity in delineation between NTD and linker, designations overlapping in 175-180 were used to avoid artificial truncation and permit conservative evaluation of the properties of each domain.)”.

      (2) Please specify the "physicochemical requirements" in the fourth paragraph of the first result, and its physicochemical meaning and references.

      Thank you for pointing this out; we agree this was not well expressed. We have rephrased this (including new references) to “…we find that hydrophobicity is uniformly high and polarity correspondingly low in the folded NTD and CTD domains, which is consistent with the expectation that folded structures are stabilized by buried hydrophobic residues (Eisenberg and McLachlan, 1986; Kauzmann, 1959)”.

      (3) The authors should clarify the biological meaning of the net charge and phosphorylation charge in the first result, just like the description in the results of polarity and hydrophobicity.

      We agree this will improve readability, and have inserted an introductory sentence to the study of charges in the mutant spectrum: “Charges in proteins can control multiple properties related to electrostatic interactions, from functions of active sites to protein solubility, protein interactions, and conformational ensembles in IDRs (Garcia-Viloca et al., 2004; Gerstein and Chothia, 1996; Gitlin et al., 2006; Mao et al., 2010).”.

      (4) The authors should clarify the calculation method and meaning of the column "occurs in % of all genomes" in Table 2.

      We have inserted a footnote specifying that this is the “Percentage of all sequenced genomes carrying the specific mutation.”.

      (5) Please specify what information or conclusion we can get for the shift of the intrinsic fluorescent spectrum of N: D63G in the third result paragraph 2.

      We have rephrased the second sentence of this paragraph to “The presence of the N:D63G mutation in the NTD is highlighted in the shift of the intrinsic fluorescence quantum yield of this mutant in comparison to Nref ”. It confirms the structural prediction, which positions D63G at the protein surface near the NA binding site, and sets up the question whether this obligatory mutation of Delta-variant N-protein affects NA binding and thereby possibly assembly. Unexpectedly, we did not find any impact of the D63G mutation on NA binding, although we observed a modest impact on temperature-dependent particle formation by DLS.

      (6) The conclusion, "some epistatic interaction between mutation of the linker and N-arm" in the third result paragraph 4, is over-interpreted from the result of the CD spectra because they didn't detect peptide interaction between mutation of the linker and N-arm.

      Thank you for raising this point. We did not mean to make a strong conclusion here, and have now deleted this statement.

      (7) The parallel assay for N: G215C and Nδ in SV-AUC experiments is recommended to be conducted with other groups to avoid experimental error.

      I believe this may be a misunderstanding: Indeed we had carried out SV-AUC experiments for all the mutants, as shown in Figure 5A. However, since all but the N:G215C and Nδ formed only dimers like the reference protein, we did not comment on these in the results text. We have rectified this omission in the revision by inserting the sentence: “…The same behavior is observed for N:D63G, No, N:R203K/G204R, as well as N:P13L/Δ31-33 at low micromolar concentrations (Figure 5A). By contrast, the G215C mutation promotes the formation of higher oligomers…”

      With regard to experimental error, SV-AUC is an absolute method based on first principles and we have maintained our instruments by performing regular calibrations, using methods developed by us and colleagues at NIST, as described in the literature (Anal Biochem 2013, PLOS ONE 2018, Eur. Biophys. J. 2021). Previously we have critically examined the accuracy of s-values by SV-AUC before and after calibration in a large multi-laboratory study (PLOS ONE 2015), and found that the accuracy of s-values is ~1%. This allows detailed comparisons of results from different runs and different points in time. To alleviate any concerns we have now mentioned our calibration methods in the methods section.

      (8) The authors did not test the function of Nδ R203M mutation, so they should not mention about it like in the third result paragraph 5, which is over-interpreted from result 5A.

      We accept the criticism that we have not yet examined the R203M mutation in isolation. However, we believe some speculation is in order: Nδ consists of D63G, R203M, G215C, and D377Y, of which D63G is unlikely to impact oligomeric state based on our data of N:D63G. It is therefore reasonable to assume that R203M and/or D377Y interfere with the promotion of oligomerization that we have observed with N:G215C. In previous work, we have traced the 215C-induced oligomerization to the transient helix in the leucine-rich region of the linker 215-235 (Science Advances, 2023). Since 377Y is quite far away, the more proximal 203M appears to be the most plausible origin of the modulation of dimerization.

      In the revision we have more clearly outlined this speculation: “ Of the three additional mutations of Nδ relative to N:G215C, we speculate that D63G does not impact dimerization (as in N:D63G, Figure 5A), and that therefore either the distant D377Y and/or R203M might cause this reduction of helicity and oligomerization relative to N:G215C, noting that R203M is proximal to the L-rich region (215-235) reshaped by 215C. ”. Later we refer to this as “any potential inhibitory role suspected of the R203M mutation on self-association…”.

      (9) The description of LLPS formation lacks reference in the third result paragraph 6.

      Thank you. To improve the transition to this new paragraph in the results, we have inserted “As outlined in the introduction, …” and repeated the 8 references to the fact that N-protein undergoes LLPS. The two additional, separate references refer to just those published studies that examined the temperature-dependence of LLPS, which I believe is now clearer.

      (10) The authors did not test the interaction between the N-arm IDR mutation and linker IDR, it is not exponible that interaction promoted particle formation of No in the third result paragraph 8, which is over-interpreted from result 5B.

      We thank the Reviewer for raising this point. In fact, we did not want to imply a direct physical interaction (in terms of binding) between the N-arm IDR mutation and that in the linker. But clearly there are non-additive effects in particle formation since P13L/Δ31-33 inhibits slightly and R203K/G204R inhibits almost completely, whereas the combination of the two (constituting No) promotes particle formation. We have rephrased this to “alter the effect of”, avoiding the term “interact with” not to suggest a picture of direct binding and invoke instead the idea of epistatic interactions.

      (11) In the third result paragraph 9, why did the authors choose to examine the role of the N-arm mutations of the Omicron variants in greater detail? This reason should be added to the manuscript.

      Thank you for this suggestion. Naturally, we were curious how the defining N-arm mutations of Omicron variants could impact particle formation. Even though no obvious enhancement of self-association by either Omicron N-arm or linker mutations was observed at low micromolar concentrations in SV-AUC (Figure 5A), we knew from experience with the study of the leucine-rich transient helix in the linker IDR that even weak interfaces with mM Kd can be highly relevant in the context of multivalent assemblies (Science Advances, 2023). Therefore we followed the same roadmap and focused on IDR peptides with the goal to study them at higher concentrations that might reveal weak interactions.

      We have described this motivation as follows: “We were curious whether IDR mutations might alter particle formation through modulation of existing or introduction of new protein-protein interfaces. We focused on Omicron mutations as these are obligatory in all currently circulating strains, and specifically on N-arm mutations, which have recently been implicated in altered intramolecular interactions with NA-occupied NTD (Cubuk et al., 2023). Even though SV-AUC showed no indication of self-association of N:P13L/Δ31-33 at low micromolar concentrations, weak interactions with Kd > mM would not be detectable under these conditions yet could be highly relevant in the context of multi-valent complexes (Zhao et al., 2024). Following the roadmap used previously for the study of the weak self-association of the leucine-rich linker IDR (Zhao et al., 2023), we restricted the protein to the N-arm peptide such that it can be studied at much higher concentrations. To this end, we …”

      (12) Why were different proteins dissolved in either high-salt buffer or low-salt buffer for biophysical experiments? Did this affect the experimental results? Explanations and evidence are required.

      We appreciate this is an important point. Unfortunately, for practical reasons of available sample concentrations and quantities, it was not always possible to dialyze protein into both buffers. For example, the DSF data in Figure 4B show all proteins in low-salt buffer except N:R203K/G204R, which is in high-salt buffer. We had previously reported the absence of changes in Ti in DSF for Nref in the two buffers, which we have documented better in the revised manuscript by providing an additional Supplementary Figure S7: “As a buffer control, the difference in Ti for Nref in LS and HS buffer was measured and found to be within error of data acquisition (Supplementary Figure S7A).” This new Supplementary Figure provides an overlay of low-salt and high-salt DSF data for Nref, N:D63G, and No, which have variations in the Ti values for different buffers on the order of 0.1 °C. This is comparable to the precision of the measurement, and significantly smaller than the changes in Ti values between the different mutant protein species. Finally, we note that the one species for which we were unable to collect DSF data in low-salt buffer, N:R203K/G204R, was unremarkable relative to Nref, No, and N:P13L/Δ31-33.

      In the case of CD, the only species for which we could not collect spectra in low-salt buffer was No. Again, this spectrum was similar to the group including Nref, along with N:P13L/Δ31-33, and N:D63G. In the results we interpreted significant differences from Nref for N:G215C and N:R203K/G204R.

      Similarly, SV-AUC experiments were carried out in high-salt buffer, except Nref, Nδ , and N:G215C. In this case, we could observe a ≈ 5% difference in s-value for the same protein in different buffers, but the magnitude of this change is negligible compared to the ≈ 60-90% increase observed for altered oligomeric states. To clarify this we have inserted a sentence “Proteins for self-association studies were in buffer HS, except Nref, Nδ , and N:G215C were in LS, the latter causing a ≈5% increase in s-value (Supplementary Figure S7B).”, with the new Supplementary Figure S7B showing a comparison of sedimentation coefficient distributions of Nref and N:D63G in low- and high-salt buffers. Whether the small differences in s-values are indeed significant and reflective of salt-dependent conformational ensembles of IDRs will require a more detailed follow-up study, but is outside the scope of the present work.
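
      As a rough illustration of why a ≈5% buffer-dependent shift is small next to a change in oligomeric state, the sketch below uses the textbook scaling of the sedimentation coefficient with mass, s ∝ M^(2/3), which holds for particles of identical shape and partial-specific volume. This scaling is an assumption made here for illustration only; it is not taken from the manuscript, and proteins with large disordered regions need not follow it. The function names are hypothetical.

```python
def s_ratio_from_mass_ratio(mass_ratio: float) -> float:
    """Ratio of sedimentation coefficients for two particles of the same
    shape and partial-specific volume, using the scaling s ~ M**(2/3).

    Assumption: compact, self-similar particles (illustrative only).
    """
    if mass_ratio <= 0:
        raise ValueError("mass_ratio must be positive")
    return mass_ratio ** (2.0 / 3.0)


def percent_increase(ratio: float) -> float:
    """Convert a ratio s2/s1 into a percent increase relative to s1."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Doubling the mass, e.g. going from a dimer to a tetramer.
    doubling = percent_increase(s_ratio_from_mass_ratio(2.0))
    buffer_shift = 5.0  # percent, the buffer effect quoted in the response
    print(f"mass doubling: about {doubling:.0f}% increase in s")
    print(f"buffer effect: about {buffer_shift:.0f}% increase in s")
```

      Under this assumption a doubling of mass alone raises the s-value by roughly 59%, the same order as the ≈60-90% increases quoted above and an order of magnitude larger than the buffer effect.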

      All other experiments were carried out with uniform buffer conditions for all protein species.

      (13) DLS data of N from other research suggests oligomers beyond dimer. Please address this discrepancy.

      Unfortunately several previous studies in the literature did not recognize the importance of eliminating nucleic acid contaminations in the N-protein preparations, and/or did not succeed in completely removing nucleic acid from the protein. We and others have repeatedly commented on this issue. For example, Tarczewska et al (IJBM 188 (2021) 391-403) clearly demonstrate this in much detail in a study dedicated to this problem.

      To clarify this point we have included a sentence in the paragraph describing the protein preparation “…the ratio of absorbance at 260 nm and 280 nm of ~0.50-0.55 confirmed absence of nucleic acid. The latter is important to eliminate higher order N-protein oligomers induced by nucleic acid binding (Carlson et al., 2020; Tarczewska et al., 2021; Zhao et al., 2021)”.

      In order to strengthen the statement in the Results that the ancestral N-protein is dimeric we have added additional references from other labs that have carried out detailed biophysical analyses: “As reported previously, the ancestral N-protein at micromolar concentrations in NA-free form is a tightly linked dimer sedimenting at ≈4 S , without significant populations of higher oligomers (Forsythe et al., 2021; Ribeiro-Filho et al., 2022; Tarczewska et al., 2021; Zhao et al., 2022, 2021).”

      Reviewer #2:

      The key novel finding of the work lies in the evidence that P31L promotes N-terminal interactions. The paper would be strengthened by additional studies of the impact of P31L on the oligomerization of full-length N protein. The sedimentation analysis in Fig 6 shows that high concentrations of the N arm alone self-associate, while the analysis in Fig 5 argues that P31L does not have an effect on the oligomerization of the full-length protein. Perhaps there are specific conditions or mutation combinations that would provide evidence that P31L has an effect on protein behavior that might explain the prevalence of this mutation.

      We agree that the finding of P13L promoting N-terminal interactions is of great interest, and we thank the Reviewer for the suggestion to examine cross-correlations of N-arm mutations with other mutations as a tool to study its function and relevance.

      The observation of self-association in Figure 6 at high concentrations is not necessarily at odds with the absence of self-association at 100-fold lower concentrations. Rather, it seems to show that the interaction mediated by the N-terminal mutation P13L is weak with an effective Kd in the mM range. It will likely not be possible to reach sufficiently high protein concentrations with the full-length protein to visualize the oligomerization of N-terminal IDR. But even if it was possible to concentrate the protein enough, very likely other assembly processes would take place, including LLPS, obscuring potential P13L interfaces. Nonetheless we believe the protein-protein interface created by the N-arm IDR is highly relevant in the context of multi-valent complexes, where entropic co-localization enhances the effective N-arm IDR concentration that then can provide additional binding energy and strengthen the assembly of multi-protein complexes.
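
      The concentration argument above can be made concrete with a minimal sketch of a two-state self-association equilibrium, 2 M ⇌ D with Kd = [M]²/[D]. The stoichiometry, the Kd of 1 mM, and the two concentrations are assumptions chosen purely for illustration (the response only states an effective Kd in the mM range and a 100-fold concentration difference); the function name is hypothetical.

```python
import math


def associated_fraction(total_conc_mM: float, kd_mM: float) -> float:
    """Fraction of chains found in the associated state for 2 M <-> D.

    Mass balance:  c_total = [M] + 2 [D],  with  Kd = [M]**2 / [D].
    Substituting [D] gives 2 [M]**2 + Kd [M] - Kd c_total = 0, whose
    positive root is the free-monomer concentration.
    """
    if total_conc_mM <= 0 or kd_mM <= 0:
        raise ValueError("concentration and Kd must be positive")
    monomer = (-kd_mM + math.sqrt(kd_mM ** 2 + 8.0 * kd_mM * total_conc_mM)) / 4.0
    return 1.0 - monomer / total_conc_mM


if __name__ == "__main__":
    kd = 1.0  # mM, illustrative value within the "mM range"
    for conc in (0.01, 1.0):  # mM, a 100-fold concentration difference
        frac = associated_fraction(conc, kd)
        print(f"c = {conc:g} mM -> {frac:.1%} of chains associated")
```

      With these illustrative numbers about half of the chains are associated at 1 mM, but only about 2% at 0.01 mM, which is consistent with a weak interface being apparent only in the high-concentration peptide experiments.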

      We are currently pursuing further experiments examining the properties and relevance of the N-arm mutations and intend to publish this in a separate study, not to distract from the thrust of the current work exploring the extent of the biophysical phenotype space.

      The R203K/G204R mutations have a surprising impact on LLPS in Figure 7: it is not clear how such limited mutations would alter the many nonspecific, multivalent interactions that presumably lead to phase separation. The paper would benefit from a more extensive analysis of LLPS in this mutant and in the P31L mutant, perhaps by performing the analysis at various protein concentrations and times.

      Following this recommendation we have expanded the study of LLPS of Figure 7 by comparison of two different time points for Nref, N:R203K/G204R, and N:P13L in a new Supplementary Figure S6. We have also quantified the droplet distributions as shown in the new Supplementary Figure S5. Both clearly confirm the strong inhibitory effect of the R203K/G204R mutation on LLPS under our experimental conditions. What this shows is not that this protein could not undergo LLPS per se, but that the phase boundaries have shifted such that under the experimental conditions we applied LLPS does not occur yet. (In this context it is interesting to note that ≈50,000 genomes in the GISAID database have R203K/G204R as the sole N-protein mutation, without impact on viral viability.)

      That individual point-mutations in IDRs can have significant impact on LLPS has been observed previously for several other proteins. Examples include SPOP [Bouchard et al., Mol Cell 72 (2018) 19-36.e8], SHP2 [Zhu et al., Cell 183 (2020) 490-502.e18], FUS [Niaki et al., Mol Cell 77 (2020) 82-94.e4], and CAPRIN1 [Kim et al., PNAS 118 (2021) 1-11]. The latter work applies NMR and reveals that promotion of LLPS is not uniform but centered in hot-spot residues of CAPRIN1.

      While the precise molecular mechanism for LLPS of the N-protein is unclear, we can speculate how the effect of 203K/204R might be amplified. As shown by the coarse-grained MD simulations from Rozycki & Boura (Biophys. Chem. 2022), the linker IDR is highly flexible and the 203/204 residues make transient contacts to other residues throughout the linker as well as to distinct sites on the NTD. Furthermore, recent NMR data from the Blackledge lab (Botova et al., bioRxiv 2024, doi:10.1101/2024.02.22.579423) have revealed intra-molecular interactions, including a state where the L-rich (C-terminal) portion of the linker IDR interacts with a site on the distant NTD. (We have included a reference to this preprint in the discussion.) This intra-molecular contact observed in NMR must cause significant chain compaction and may thereby modulate the accessibility of portions of the linker IDR available to inter-molecular interactions contributing to LLPS. The residues 203/204 are in the middle between the SR-rich and L-rich region where bending of the chain must occur to allow for the intra-molecular contacts. The 203K/204R mutation may alter the dynamics or population of this intra-molecular bound state, especially considering the introduction of a bulky positively charged R replacing G204.

      In summary, considering the dynamics of intra-molecular contacts and considering precedent of several other disordered proteins, we believe it is not unreasonable that the local mutation in the IDR R203K/G204R may cause a significant shift in LLPS phase boundaries. We note that this mutant also shows a very distinct behavior in the temperature-dependent DLS, entirely lacking particle formation below 70 °C. This observation seems consistent with altered inter-molecular interactions.

      Reviewer #3:

      I have only a few minor specific comments:

      (1) Page 4, last paragraph - typo: "The large number of structural and non-structural N-protein functions poses the question of how they are conserved...". This either needs a colon or to be changed to "... poses the question of how they are conserved...".

      Thank you – we have changed this sentence accordingly.

      (2) Page 7, 2nd and 3rd paragraphs of "Physicochemical properties" section: why is Figure 2B discussed before Figure 2A?

      Initially when we present the results of polarity and hydrophobicity we refer more generally to Figure 2, as the two properties are so closely related. Later, in the section on related coronaviruses we do refer once more to Figure 2. Here we begin this section by discussing Figure 2B since in this plot the symbols for the different viruses are most recognizable.

      (3) Page 11, lines 1-2: "Since this is a tell-tale of weak protein..." -> "tell-tale sign of ...".

      We thank the reviewer for pointing this out and have fixed this sentence.

      (4) Further down in the same paragraph, the meaning of "SV-AUC" should be spelled out at its first use.

      We have double checked that SV-AUC is spelled out at its first use.

      (5) Figures 1 and 2. Is there a good reason that the color scheme for the IDRs (magenta and cyan) is so close to the color scheme for the identifying mutations of Omicron and Delta (magenta and blue)? This initially led me to try to search for some connection, and it remains unclear to me if there is.

      We apologize for this confusion. This was indeed a poor color choice, and we have rectified this in the revised manuscript by changing the colors of the identifying mutations of Omicron and Delta to dashed green and dotted red, respectively, so that there is no connection to the shading of the IDRs. Thank you very much for pointing this out!

      (6) Figure 1: The physical limits of the subdomains, e.g. SR-rich, L-rich, C-arm1, and N3 could be more clearly delineated with lines, or some other visual representation.

      Once more, we thank the reviewer for pointing this out. We have revised Figure 1 to indicate the limits between these subdomains.

      (7) Figures 4, 5, and 6: are there any kind of error bars or confidence intervals on these measurements?

      We appreciate this concern and have addressed it in different ways for the different methods.

      For the spectra of intrinsic fluorescence in Figure 4A, we have now plotted an overlay of three acquired spectra, from which the experimental error as a function of wavelength may be assessed. It is clear that the differences between Nref and N:D63G are far greater than the measurement error.

      With regard to DSF, we have provided an error estimate of 0.3 °C for the Ti-values, a value that we have revised from the previously reported errors of sequential replicates to now include Ti variation observed with different preparations of the same protein over long time periods.

      For CD spectra we have included a new Supplementary Figure S3 that shows standard deviations of triplicate measurements as a function of wavelength. Since an overlay including errors for all species would be too crowded, we have created separate plots for all species in comparison with Nref. (On this occasion we discovered a 3% error in the magnitude of the Nref spectrum due to previously incorrect conversion to MRE, which we have now fixed.)

      In SV-AUC, for data with typical signal-noise ratio, the statistical error is very small due to the large number (> 10^4) of raw data points included in the calculation of each c(s) trace, with each data point carrying a statistical error that is usually better than 1%. Therefore, the dominant error is systematic. In the past we have carried out large studies quantifying the accuracy of the major peaks of the sedimentation coefficient distributions, and found they are typically ≈1% in s-value and 1-2% for relative peak areas. In the AUC methods section we have now included the sentence “Typical accuracy of c(s) peaks are on the order of ≈1% for peak s-values and ≈1-2% for relative peak areas (Zhao et al., 2015).”

      Finally, for the temperature-dependent DLS data we have to resort to the scatter in the temperature-dependent Rh-values. The calculated Rh-values can exhibit fluctuations once particles start to form and the distribution becomes highly polydisperse. As is characteristic for DLS under those conditions, individual Rh-values can be dominated by adventitious diffusion of few large particles into the laser focal spot. Although customarily autocorrelation functions can be filtered out through software filters (e.g., setting baseline and amplitude thresholds), this still presents the largest source of error in the Rh-values. These are systematic for the individual autocorrelation functions. We believe that the variation of Rh-values at similar temperatures outside the transition region provides a reasonable estimate for the experimental error.

      (8) Figure 7: My most major comment. It would be good to somehow quantify the differences between these images. The claim is made that the LLPS droplets are different sizes, or for the P13L/Δ31-33 variant that droplets are coalescing or changing shape over time. It would be good to quantify this rather than rely on eyeballing the pictures.

      We are grateful to the Reviewer for this suggestion. As mentioned above, to improve the LLPS analysis we have now carried out segmentation of the images in Figure 7 to quantify the droplet numbers and areas. Histograms and statistical analyses are now provided in the new Supplementary Figure S5. In addition, we have added a comparison of the droplet numbers and sizes at two time-points for Nref, N:R203K/G204R, in addition to the previously shown N:P13L/Δ31-33, provided in the new Supplementary Figure S6. The results corroborate the previous conclusions, and depict how droplets in the N:P13L/Δ31-33 merge and grow in area more strongly than those from Nref.
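
      For readers unfamiliar with this kind of quantification, the sketch below shows one generic way to count droplets and measure their areas once an image has been thresholded into a binary mask: labeling 4-connected foreground regions by breadth-first search. It is an illustrative stand-in, not the segmentation procedure used for Supplementary Figures S5 and S6, which is not described here; the function name and the toy mask are hypothetical.

```python
from collections import deque


def droplet_areas(mask):
    """Return the pixel area of every 4-connected foreground region.

    `mask` is a rectangular list of rows containing 0 (background) or
    1 (droplet). Regions are reported in row-major order of their first
    pixel.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Start a new region and flood-fill it breadth-first.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


if __name__ == "__main__":
    toy_mask = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
    ]
    areas = droplet_areas(toy_mask)
    print(f"{len(areas)} droplets with areas {areas} (pixels)")
```

      Comparing such area lists between two time points gives a simple numerical handle on droplet merging: fewer regions with larger areas at the later time point.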

    1. Author response:

      eLife assessment

      This study represents a fundamental contribution to our understanding of how gene expression levels are controlled in bacteria. Through a series of compelling and careful experiments, relying on a mutant that blocks DNA replication but permits growth, and using various methods, the authors reveal how genome concentration rapidly becomes limiting for growth when replication is inhibited. This work contributes to our understanding of the contributions and limiting roles of DNA, mRNA, and ribosomes for growth in bacteria, and will be of considerable interest within both systems biology and microbial physiology.

      Thank you!

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Mäkelä et al. presents compelling experimental evidence that the amount of chromosomal DNA can become limiting for the total rate of mRNA transcription and consequently protein production in the model bacterium Escherichia coli. Specifically, the authors demonstrate that upon inhibition of DNA replication the single-cell growth rate continuously decreases, in direct proportion to the concentration of active ribosomes, as measured indirectly by single-particle tracking. The decrease of ribosomal activity with filamentation, in turn, is likely caused by a decrease of the concentration of mRNAs, as suggested by an observed plateau of the total number of active RNA polymerases. These observations are compatible with the hypothesis that DNA limits the total rate of transcription and thus translation. The authors also demonstrate that the decrease of RNAp activity is independent of two candidate stress response pathways, the SOS stress response and the stringent response, as well as an anti-sigma factor previously implicated in variations of RNAp activity upon variations of nutrient sources.

      Remarkably, the reduction of growth rate is observed soon after the inhibition of DNA replication, suggesting that the amount of DNA in wild-type cells is tuned to provide just as much substrate for RNA polymerase as needed to saturate most ribosomes with mRNAs. While previous studies of bacterial growth have most often focused on ribosomes and metabolic proteins, this study provides important evidence that chromosomal DNA has a previously underestimated important and potentially rate-limiting role for growth.

      Thank you for the excellent summary of our work.

      Strengths:

      This article links the growth of single cells to the amount of DNA, the number of active ribosomes and to the number of RNA polymerases, combining quantitative experiments with theory. The correlations observed during depletion of DNA, notably in M9gluCAA medium, are compelling and point towards a limiting role of DNA for transcription and subsequently for protein production soon after reduction of the amount of DNA in the cell. The article also contains a theoretical model of transcription-translation that contains a Michaelis-Menten type dependency of transcription on DNA availability and is fit to the data. While the model fits well with the continuous reduction of relative growth rate in rich medium (M9gluCAA), the behavior in minimal media without casamino acids is a bit less clear (see comments below).
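
      The Michaelis-Menten type dependency mentioned above can be illustrated with a minimal sketch. It is not the authors' fitted model: the function name and the parameters `v_max` and `k_m` are hypothetical, and the values are arbitrary. It only shows how a saturating dependence makes the transcription rate fall once DNA concentration is diluted toward `k_m`.

```python
def transcription_rate(dna_conc, v_max=1.0, k_m=0.5):
    """Saturating (Michaelis-Menten type) dependence of the total
    transcription rate on DNA concentration.

    v_max and k_m are illustrative placeholders, not fitted values.
    """
    if dna_conc < 0:
        raise ValueError("dna_conc must be non-negative")
    return v_max * dna_conc / (k_m + dna_conc)


# Assumption: a fixed genome copy number in a growing cell, so that
# DNA concentration falls as 1/volume while replication is blocked.
genome_copies = 1.0
for volume in (1.0, 2.0, 4.0, 8.0):
    conc = genome_copies / volume
    rate = transcription_rate(conc)
    print(f"volume={volume:4.1f}  dna_conc={conc:.3f}  rate={rate:.3f}")
```

      In this form, the rate is nearly insensitive to DNA concentration when it is far above `k_m` and becomes roughly proportional to it when far below, which is the regime in which DNA would be limiting.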

      At a technical level, single-cell growth experiments and single-particle tracking experiments are well described, suggesting that different diffusive states of molecules represent different states of RNAp/ribosome activities, which reflect the reduction of growth. However, I still have a few points about the interpretation of the data and the measured fractions of active ribosomes (see below).

      Apart from correlations in DNA-deplete cells, the article also investigates the role of candidate stress response pathways for reduced transcription, demonstrating that neither the SOS nor the stringent response are responsible for the reduced rate of growth. Equally, the anti-sigma factor Rsd recently described for its role in controlling RNA polymerase activity in nutrient-poor growth media, seems also not involved according to mass-spec data. While other (unknown) pathways might still be involved in reducing the number of active RNA polymerases, the proposed hypothesis of the DNA substrate itself being limiting for the total rate of transcription is appealing.

      Finally, the authors confirm the reduction of growth in the distant Caulobacter crescentus, which lacks overlapping rounds of replication and could thus have shown a different dependency on DNA concentration.

      Weaknesses:

      There are a range of points that should be clarified or addressed, either by additional experiments/analyses or by explanations or clear disclaimers.

      First, the continuous reduction of growth rate upon arrest of DNA replication initiation observed in rich growth medium (M9gluCAA) is not equally observed in poor media. Instead, the relative growth rate is immediately/quickly reduced by about 10-20% and then maintained for long times, as if the arrest of replication initiation had an immediate effect but would then not lead to saturation of the DNA substrate. In particular, the long plateau of a constant relative growth rate in M9ala is difficult to reconcile with the model fit in Fig 4S2. Is it possible that DNA is not limiting in poor media (at least not for the cell sizes studied here) while replication arrest still elicits a reduction of growth rate in a different way? Might this have something to do with the naturally much higher oscillations of DNA concentration in minimal medium?

      We note that the total RNAP activity (abundance x active fraction) was also significantly reduced in poor media (Figure 3 -- supplement 4G and H) similarly to rich medium (Figure 3H). This is consistent with DNA being limiting. The main difference between rich and poor medium conditions is that the total ribosome activity in poor media (Figure 2 -- supplement 4G and H) was less affected in comparison to rich media (Figure 2H). Our interpretation of these results is that while DNA is limiting in all medium conditions (as shown by the RNAP data), changes in ribosome activity or mRNA degradation can compensate for the reduction in transcription in poor media and hence maintain better scaling of growth rates under DNA limitation. We understand how our current presentation made it confusing. We will reorganize the text and figures to better explain our results and interpretations. 

      The authors argue that DNA becomes limiting in the range of physiological cell sizes, in particular for M9gluCAA (Fig. 1BC). It would be helpful to know by how much (fold-change) the DNA concentration is reduced below wild-type (or multi-N) levels at t=0 in Fig 1B and how DNA concentration decays with time or cell area, to get a sense by how many-fold DNA is essentially 'overexpressed/overprovided' in wild-type cells.

      We will provide an estimate.

      Fig. 2: The distribution of diffusion coefficients of RpsB is fit to Gaussians on the log scale. Is this based on a model or on previous work or simply an empirical fit to the data? An exact analytical model for the distribution of diffusion constants can be found in the tool anaDDA by Vink, ..., Hohlbein Biophys J 2020. Alternatively, distributions of displacements are expressed analytically in other tools (e.g., in SpotOn).

      We use an empirical fit of a three-state Gaussian mixture model (GMM) to the data and extract the fraction of molecules in each state. This avoids making too many assumptions about the underlying processes, e.g. a Markovian system with Brownian diffusion. The model in anaDDA (Vink et al.) is currently limited to two transitioning states, with a maximum of 8 steps per track for a computationally efficient solution (longer tracks are truncated). Using a short subset of a trajectory is less accurate than using the entire trajectory, and because of this we consider full tracks with at least 9 displacements. Meanwhile, Spot-On supports a three-state model, but it is still based on a semi-analytical model with a pre-calculated library of parameters created by fitting simulated data. Neither of these models considers the effect of cell confinement, which plays a major role in single-molecule diffusion in small cells such as bacteria. For these reasons, we opted to use an empirical fit to the data. We note that the fractions of active ribosomes in WT cells grown in different media, which we extracted from these diffusion measurements, are consistent with estimates obtained by others using similar or different approaches (Forchhammer and Lindahl 1971; Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014).
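
      The kind of empirical fit described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the initial guesses, and the synthetic data (three states in log10 diffusion coefficient with fractions 0.5, 0.3 and 0.2) are all hypothetical. It fits a one-dimensional Gaussian mixture by expectation-maximization and reads the state fractions off the mixture weights.

```python
import math
import random


def fit_gmm_1d(values, init_means, n_iter=100):
    """Fit a 1D Gaussian mixture by expectation-maximization.

    values: log10 apparent diffusion coefficients, one per track.
    init_means: initial guesses for the state means (sets the number of states).
    Returns (weights, means, sigmas); the weights are the state fractions.
    """
    k = len(init_means)
    n = len(values)
    means = list(init_means)
    weights = [1.0 / k] * k
    start_sigma = max((max(values) - min(values)) / (2 * k), 1e-3)
    sigmas = [start_sigma] * k
    for _ in range(n_iter):
        # E-step: responsibility of each state for each value.
        # The 1/sqrt(2*pi) factor is omitted because it cancels out.
        resp = []
        for x in values:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                    for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens) + 1e-300
            resp.append([d / total for d in dens])
        # M-step: update fractions, means and widths.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)
    return weights, means, sigmas


def synthetic_log_d(seed=0):
    """Hypothetical data: three diffusive states on the log10 scale."""
    rng = random.Random(seed)
    states = [(-2.0, 1000), (-1.0, 600), (0.0, 400)]  # (mean, track count)
    return [rng.gauss(mu, 0.2) for mu, count in states for _ in range(count)]


values = synthetic_log_d()
weights, means, sigmas = fit_gmm_1d(values, init_means=(-2.2, -0.8, 0.3))
for w, m, s in zip(weights, means, sigmas):
    print(f"fraction={w:.2f}  mean log10(D)={m:.2f}  sigma={s:.2f}")
```

      A mixture fitted this way makes no statement about transitions between states or about confinement; it only partitions the observed distribution, which is the trade-off the response describes.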

      The estimated fraction of active ribosomes in wild-type cells shows a very strong reduction with decreasing growth rate (down from 75% to 30%), twice as strong as measured in bulk experiments (Dai et al Nat Microbiology 2016; decrease from 90% to 60% for the same growth rate range) and probably incompatible with measurements of growth rate, ribosome concentrations, and almost constant translation elongation rate in this regime of growth rates. Might the different diffusive fractions of RpsB not represent active/inactive ribosomes? See also the problem of quantification above. The authors should explain and compare their results to previous work.

      We agree that our measured range is somewhat larger than the estimated range from Dai et al, 2016. However, they use different media, strains, and growth conditions. We also note that Dai et al did not make actual measurements of the active ribosome fraction. Instead, they calculate the “active ribosome equivalent” based on a model that includes growth rate, protein synthesis rate, RNA/protein abundance, and the total number of amino acids in all proteins in the cell. Importantly, our measurements show the same overall trend as Dai et al, 2016. Furthermore, our results are in quantitative agreement with previous experimental measurements that use ribosome profiling (Forchhammer and Lindahl 1971) or single-ribosome tracking (Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014), which, we believe, validates our approach. We will clarify this point in the revised manuscript.

      To measure the reduction of mRNA transcripts in the cell, the authors rely on the fluorescent dye SYTO RNAselect. They argue that 70% of the dye signal represents mRNA. The argument is based on the previously observed reduction of the total signal by 70% upon treatment with rifampicin, an RNA polymerase inhibitor (Bakshi et al 2014). The idea here is presumably that mRNA should undergo rapid degradation upon rif treatment while rRNA or tRNA are stable. However, work from Hamouche et al. RNA (2021) 27:946 demonstrates that rifampicin treatment also leads to a rapid degradation of rRNA. Furthermore, the timescale of fluorescent-signal decay in the paper by Bakshi et al. (half-life about 10 min) is not compatible with the previously reported rapid decay of mRNA (2-4 min) but rather compatible with the slower, still somewhat rapid, decay of rRNA reported by Hamouche et al. A bulk method to measure total mRNA as in the cited Balakrishnan et al. (Science 2022) would thus be a preferred method to quantify mRNA. Alternatively, the authors could also test whether the mass contribution of total RNA remains constant, which would suggest that rRNA decay does not contribute to signal loss. However, since rRNA dominates total RNA, this measurement requires high accuracy. The authors might thus tone down their conclusions on mRNA concentration changes while still highlighting the compelling data on RNAp diffusion.

      Thank you for bringing the Hamouche et al. 2021 paper to our attention. We will address this point in the revised manuscript.

      The proteomics experiments are a great addition to the single-cell studies, and the correlations between distance from ori and protein abundance is compelling. However, I was missing a different test, the authors might have already done but not put in the manuscript: If DNA is indeed limiting the initiation of transcription, genes that are already highly transcribed in non-perturbed conditions might saturate fastest upon replication inhibition, while genes rarely transcribed should have no problem to accommodate additional RNA polymerases. One might thus want to test, whether the (unperturbed) transcription initiation rate is a predictor of changes in protein composition. This is just a suggestion the authors may also ignore, but since it is an easy analysis, I chose to mention it here.

      Thank you for the suggestion. We will provide the suggested analysis in the revised manuscript.

      Related to the proteomics, in l. 380 the authors write that the reduced expression close to the ori might reflect a gene-dosage compensatory mechanism. I don't understand this argument. Can the authors add a sentence to explain their hypothesis?

      We apologize for the confusion. This will be addressed in the revised manuscript.

      In Fig. 1E the authors show evidence that growth rate increases with cell length/area. While this is not a main point of the paper it might be cited by others in the future. There are two possible artifacts that could influence this experiment: a) segmentation: an overestimation of the physical length of the cell based on phase-contrast images (e.g., 200 nm would cause a 10% error in the relative rate of 2 µm cells, but not of longer cells). b) time-dependent changes of growth rate, e.g., due to change from liquid to solid or other perturbations. To test for the latter, one could measure growth rate as a function of time, restricting the analysis to short or long cells, or measuring growth rate for short/long cells at selected time points. For the former, I recommend comparison of phase-contrast segmentation with FM4-64-stained cell boundaries.

      As the reviewer notes, the small increase in relative growth was just a minor observation that does not affect our story whether it is biologically meaningful or the result of a technical artefact. But we agree with the reviewer that others might cite it in future works and thus should be interpreted with caution.

      An artefact associated with time-dependent changes (e.g. changing from liquid cultures to more solid agarose pads) is unlikely for two reasons. 1. We show that varying the time that cells spend on agarose pads relative to liquid cultures does not affect the cell size-dependent growth rate results (Figure 1 -- supplement 5B). 2. We show that the growth rate is stable from the beginning of the time-lapse with no transient effects upon cell placement on agarose pads for imaging (Figure 1 -- supplement 5B). These results were described in the Methods section where they could easily be missed. We will revise the text to discuss these controls more prominently in the Results section.

      As for cell segmentation, we have run simulations and agree with the reviewer that a small overestimation of cell area (which is possible with any cell segmentation methods including ours) could lead to a small increase in relative growth with increasing cell areas. Since the finding is not important to our story, we will simply alert the readers to the possibility that the observation may be due to a small cell segmentation bias.
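
      The segmentation effect discussed above can be illustrated with a short sketch. It is not the authors' simulation: it assumes, hypothetically, that cell length is overestimated by a constant additive offset (200 nm, as in the reviewer's example) and that the true relative growth rate is the same for all cell sizes.

```python
def apparent_relative_rate(true_length_um, offset_um=0.2, true_rate=1.0):
    """Relative growth rate inferred when length is overestimated
    by a constant offset (assumed; see lead-in).

    A constant offset leaves the absolute elongation rate unchanged,
    but inflates the length used to normalize it.
    """
    absolute_rate = true_rate * true_length_um
    measured_length = true_length_um + offset_um
    return absolute_rate / measured_length


for length in (2.0, 4.0, 8.0, 16.0):
    rate = apparent_relative_rate(length)
    print(f"length={length:5.1f} um  apparent relative rate={rate:.3f}")
```

      Under these assumptions the apparent relative rate is underestimated most for short cells and approaches the true value for long cells, which would look like a small increase of relative growth rate with cell size.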

      Reviewer #2 (Public Review):

      In this work, the authors uncovered the effects of DNA dilution on E. coli, including a decrease in growth rate and a significant change in proteome composition. The authors demonstrated that the decline in growth rate is due to the reduction of active ribosomes and active RNA polymerases because of the limited DNA copy numbers. They further showed that the change in the DNA-to-volume ratio leads to concentration changes in almost 60% of proteins, and these changes mainly stem from the change in the mRNA levels.

      Thank you for the support and accurate summary!

      Reviewer #3 (Public Review):

      Summary:

      Mäkelä et al. here investigate genome concentration as a limiting factor on growth. Previous work has identified key roles for transcription (RNA polymerase) and translation (ribosomes) as limiting factors on growth, which enable an exponential increase in cell mass. While a potential limiting role of genome concentration under certain conditions has been explored theoretically, Mäkelä et al. here present direct evidence that when replication is inhibited, genome concentration emerges as a limiting factor.

      Strengths:

      A major strength of this paper is the diligent and compelling combination of experiment and modeling used to address this core question. The use of origin- and ftsZ-targeted CRISPRi is a very nice approach that enables dissection of the specific effects of limiting genome dosage in the context of a growing cytoplasm. While it might be expected that genome concentration eventually becomes a limiting factor, what is surprising and novel here is that this happens very rapidly, with growth transitioning even for cells within the normal length distribution for E. coli. Fundamentally, it demonstrates the fine balance of bacterial physiology, where the concentration of the genome itself (at least under rapid growth conditions) is no higher than it needs to be.

      Weaknesses:

      One limitation of the study is that genome concentration is largely treated as a single commodity. While this facilitates their modeling approach, one would expect that the growth phenotypes observed arise due to copy number limitation in a relatively small number of rate-limiting genes. The authors do report shifts in the composition of both the proteome and the transcriptome in response to replication inhibition, but while they report a positional effect of distance from the replication origin (reflecting loss of high-copy, origin-proximal genes), other factors shaping compositional shifts and their functional effects on growth are not extensively explored. This is particularly true for ribosomal RNA itself, which the authors assume to grow proportionately with protein. More generally, understanding which genes exert the greatest copy number-dependent influence on growth may aid both efforts to enhance (biotechnology) and inhibit (infection) bacterial growth.

      We agree but feel that identifying the specific limiting genes is beyond the scope of the study. However, to examine other potential contributing factors and identify limiting gene candidates, we plan to carry out new correlation analyses between our proteomic/transcriptomic datasets and published genome-wide datasets that report various variables under unperturbed conditions (e.g., mRNA/protein concentration, mRNA degradation rates, fitness cost, transcription/translation initiation rates, and essentiality).

      Overall, this study provides a fundamental contribution to bacterial physiology by illuminating the relationship between DNA, mRNA, and protein in determining growth rate. While coarse-grained, the work invites exciting questions about how the composition of major cellular components is fine-tuned to a cell's needs and which specific gene products mediate this connection. This work has implications not only for biotechnology, as the authors discuss, but potentially also for our understanding of how DNA-targeted antibiotics limit bacterial growth.

      Good point about the DNA-targeted antibiotics. Thank you!

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      As a reviewer for this manuscript, I recognize its significant contribution to understanding the immune response to saprophytic Leptospira exposure and its implications for leptospirosis prevention strategies. The study is well-conceived, addressing an innovative hypothesis with potentially high impact. However, to fully realize its contribution to the field, the manuscript would benefit greatly from a more detailed elucidation of immune mechanisms at play, including specific cytokine profiles, antigen specificity of the antibody responses, and long-term immunity. Additionally, expanding on the methodological details, such as immunophenotyping panels, qPCR normalization methods, and the rationale behind animal model choice, would enhance the manuscript's clarity and reproducibility. Implementing functional assays to characterize effector T-cell responses and possibly investigating the microbiota's role could offer novel insights into the protective immunity mechanisms. These revisions would not only bolster the current findings but also provide a more comprehensive understanding of the potential for saprophytic Leptospira exposure in leptospirosis vaccine development. Given these considerations, I believe that after substantial revisions, this manuscript could represent a valuable addition to the literature and potentially inform future research and vaccine strategy development in the field of infectious diseases. 

      We have been interested in understanding how both pathogenic and non-pathogenic Leptospira species affect each other in a mammalian reservoir host. With the current study we continue to elucidate the immune mechanisms engaged by pathogenic Leptospira interrogans versus non-pathogenic L. biflexa, as a follow-up to our previous work (Shetty et al, 2021 PMID: 34249775, and Kundu et al 2022 PMID 35392072). We found that both species engaged partially overlapping myeloid immune cells and inflammatory signatures of infection. For example, some chemokines were increased, and macrophage and dendritic cells were engaged at 24h post inoculation with both species of Leptospira (PMID: 34249775). Thus, we questioned whether this robust innate immune response, raised to eliminate an immunogenic but rather non-pathogenic bacterium, could also help restrain L. interrogans pathogenesis. In this study we show that pre-exposure to L. biflexa before L. interrogans challenge mediates improved kidney homeostasis, mitigates leptospirosis severity and leads to increased shedding of L. interrogans in urine. This suggests an interspecies symbiotic commensalistic process that facilitates survival of the pathogenic species. These findings have high impact on the lives of millions of people in areas endemic for leptospirosis that are naturally exposed to non-pathogenic Leptospira species.

      We will expand on the methodological details and will update the introduction and discussion to include answers to questions raised by the three reviewers to further clarify the importance and impact of our study.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors try to achieve a method of protection against pathogenic strains using saprophytic species. It is undeniable that the saprophytic species, despite not causing the disease, activates an immune response. However, based on these results, using the saprophytic species does not significantly impact the animal's infection by a virulent species. 

      We separate the concept of exposure to a non-virulent bacterium that establishes a brief infection with engagement of an immune response (L. biflexa) from that of infection established by a virulent species of Leptospira that leads to pathogenesis (L. interrogans). While trying to understand how both pathogenic and non-pathogenic Leptospira species affect each other in a mammalian reservoir host, we previously found that L. biflexa induces immune responses that should affect immunity of populations naturally exposed to this spirochete. Thus, we designed this study to answer that question.

      Strengths: 

      Exposure to the saprophytic strain before the virulent strain reduces animal weight loss, reduces tissue kidney damage, and increases cellular response in mice.

      Weaknesses: 

      Even after the challenge with the saprophyte strain, kidney colonization and the release of bacteria through urine continue. Moreover, the authors need to determine the impact on survival if the experiment ends on the 15th. 

      Another novel and unexpected aspect of our findings in the single exposure experiment was that L. biflexa pre-exposure mediated a homeostatic environment in the kidney (lower ColA1, healthier renal physiology) that restrained pathogenesis of L. interrogans after challenge, which resulted in better health outcomes and increased shedding of L. interrogans in urine; in contrast, if the kidney is compromised (high ColA1) by L. interrogans (without L. biflexa pre-exposure), there was lower shedding of L. interrogans in urine. Interestingly, this suggests an interspecies symbiotic commensalistic process that facilitates survival of the pathogenic species. Thus, these data suggest that higher shedding of L. interrogans in urine may not be a hallmark of increased disease, but rather it could be the opposite.

      We will include these concepts in the updated discussion.

      We don’t think that extending this experiment to d21 or d28 would add relevant data to our findings. We provide survival curves for both experiments up to d15 post infection.

      Reviewer #3 (Public Review): 

      Summary: 

      Kundu et al. investigated the effects of pre-exposure to a non-pathogenic Leptospira strain in the prevention of severe disease following subsequent infection by a pathogenic strain. They utilized a single or double exposure method to the non-pathogen prior to challenge with a pathogenic strain. They found that prior exposure to a non-pathogen prevented many of the disease manifestations of the pathogen. Bacteria, however, were able to disseminate, colonize the kidneys, and be shed in the urine. This is an important foundational work to describe a novel method of vaccination against leptospirosis. Numerous studies have attempted to use recombinant proteins to vaccinate against leptospirosis, with limited success. The authors provide a new approach that takes advantage of the homology between a non-pathogen and a pathogen to provide heterologous protection. This will provide a new direction in which we can approach creating vaccines against this re-emerging disease. 

      Strengths: 

      The major strength of this paper is that it is one of the first studies utilizing a live non-pathogenic strain of Leptospira to immunize against severe disease associated with leptospirosis. They utilize two independent experiments (a single and double vaccination) to define this strategy. This represents a very interesting and novel approach to vaccine development. This is of clear importance to the field. 

      The authors use a variety of experiments to show the protection imparted by pre-exposure to the non-pathogen. They look at disease manifestations such as death and weight loss. They define the ability of Leptospira to disseminate and colonize the kidney. They show the effects infection has on kidney architecture and a marker of fibrosis. They also begin to define the immune response in both of these exposure methods. This provides evidence of the numerous advantages this vaccination strategy may have. Thus, this study provides an important foundation for future studies utilizing this method to protect against leptospirosis. 

      Weaknesses: 

      Although they provide some evidence of the utility of pretreatment with a non-pathogen, there are some areas in which the paper needs to be clarified and expanded. 

      The authors draw their conclusions based on the data presented. However, they state the graphs only represent one of two independent experiments. Each experiment utilized 3-4 mice per group. In order to be confident in the conclusions, a power analysis needs to be done to show that there is sufficient power with 3-4 mice per group. In addition, it would be important to show both experiments in one graph which would inherently increase the power by doubling the group size, while also providing evidence that this is a reproducible phenotype between experiments. Overall, this weakens the strength of the conclusions drawn and would require additional statistical analysis or additional replicates to provide confidence in these conclusions. 

      We will take these suggestions into consideration and will address as many of these issues as possible in the revised manuscript.
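
      The power analysis requested above could be approached along the following lines. This is a sketch under stated assumptions, not an analysis of the study's data: it assumes normally distributed measurements with equal variance in both groups, a two-sided pooled-variance t-test at the 5% level, and a hypothetical effect size expressed in units of the common standard deviation. The function name and the tabulated critical values are introduced here for illustration only.

```python
import random

# Two-sided 5% critical values of Student's t for df = 2n - 2,
# keyed by group size n (df = 4 for n = 3, df = 6 for n = 4).
T_CRIT = {3: 2.776, 4: 2.447}


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulated_power(n_per_group, effect_size, n_sim=5000, seed=1):
    """Monte Carlo power of a two-sample pooled-variance t-test.

    effect_size is the assumed true difference in means divided by
    the common standard deviation (a hypothetical input).
    """
    rng = random.Random(seed)
    t_crit = T_CRIT[n_per_group]
    hits = 0
    for _ in range(n_sim):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        mean_a, var_a = mean_and_variance(a)
        mean_b, var_b = mean_and_variance(b)
        pooled_var = (var_a + var_b) / 2  # equal group sizes
        std_err = (pooled_var * 2 / n_per_group) ** 0.5
        if abs((mean_b - mean_a) / std_err) > t_crit:
            hits += 1
    return hits / n_sim


for n in (3, 4):
    for d in (1.0, 2.0, 3.0):
        print(f"n={n}  effect size={d:.1f}  power={simulated_power(n, d):.2f}")
```

      With an effect size of zero the function returns the false-positive rate, which should sit near 0.05 and serves as a sanity check on the critical values.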

      A direct comparison between single and double exposure to the non-pathogen is not able to be determined. The ages of mice infected were different between the single (8 weeks) and double (10 weeks) exposure methods, thus the phenotypes associated with LIC infection are different at these two ages. The authors state that this is expected, but do not provide a reasoning for this drastic difference in phenotypes. It is therefore difficult to compare the two exposure methods, and thus determine if one approach provides advantages over the other. An experiment directly comparing the two exposure methods while infecting mice at the same age would be of great relevance to and strengthen this work. 

      Both experiments need to be analyzed as separate but complementary, as they provide different insights into L. interrogans pathogenesis and potential solutions to the problem. Optimal measurements of disease progression (weight loss, survival curves) require infection of mice at 8 weeks. Based on this, a new L. biflexa double exposure experiment would have to start when mice are 4 weeks old, which is just after weaning and before the mouse immune system is fully developed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable contribution to the electric fish community, and to studies of active sensing more generally, in that it provides evidence that a well-studied behavior (chirping) may serve in active sensing rather than communication. For the most part, the evidence is solid. In particular, the evidence showing increased chirping in more cluttered environments and the relationship between chirping and movement are convincing. Nevertheless, evidence to support the argument that chirps are mostly used for navigation rather than communication is incomplete.

      Thank you for the comment. In response to what seemed to be a generalized need for more evidence to support our hypothesis, we have extensively reviewed the manuscript, changed the existing figures and added new ones (3 new figures in the main text and 4 in the supplementary information section). Our edits include:

      (1) changes to the written text to remove categorical statements ruling out the possible communication function of chirps. When necessary, we have also added details on why we believe a social communication function of chirps could interfere with a role in electrolocation.

      (2) new experiments (and related figures) adding details on the behavioral correlates of chirping, on the effects of chirps on electric images (which are a way to represent current flow on the fish skin), and behavioral responses to ramp frequency playback EODs (used to test a continuous range of beat frequencies and fill the sampling gaps left by our experiments using real fish).

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      We thank the Reviewer for the extensive feedback received. Hereby we respond to each of the points raised.

      We have better clarified that our intention is not to propose chirps as tools for “conspecific localization”, understood as pinpointing a conspecific's particular location. Instead, based on our observation of chirps being employed at very close ranges, we suggest that chirps may serve to assess other parameters related to “conspecific positioning” (which, in a wide sense, is still “electrolocation”) and that could be derived from the beat. These parameters might include size, relative orientation, or subtle changes in position during movement. While the experiments discussed in the manuscript do not provide a conclusive answer in this regard, we prioritize here the presentation of broader evidence for a different use of chirping. We are actively working on another manuscript that explores this aspect in more detail, but, due to space limitations, additional results had to be excluded.

      In the abstract we mention a role of chirps in the enhancement of “electrolocation”, but, as mentioned above, it is meant here only in a broad sense. In the introduction (at the very end) we propose chirps as self-directed signals (homeoactive sensing). In the Results paragraph dedicated to the novel environment exploration experiment, the following lines were added: “Most chirps (90%) in fact are produced within a distance corresponding to 1% of the maximum field intensity (i.e. roughly 30 cm; Figure S12B), indicating that chirping occurs way below the threshold range for beat detection (i.e. roughly in the range of 60-120 cm, depending on the study; see appendix 1: Detecting beats at a distance) and likely does not represent a way to improve it”. We conclude this paragraph by stating “This further corroborates the hypothesized role of chirps in beat processing.”. The last Results paragraph (on chirping in cluttered environments) ends with “This supports the notion of chirps as self-referenced probing cues, potentially employed to optimize short-range aspects of conspecific electrolocation, such as conspecific size, orientation, and swimming direction - a hypothesis that will certainly be explored in future studies.”. In the Discussion paragraph entitled “probing with chirps”, we provide hints at possible mechanisms involved in the role of chirps in beat processing. As mentioned, we plan to add further details in another manuscript, currently in preparation.

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth being considered and explored further. However, the data they provide does not support strong conclusions that these chirps are used for localization purposes and is even less convincing at rejecting previously established hypotheses on the communication purpose of the chirps.

      We intentionally framed our aims somewhat provocatively to underscore that, to date, the role of chirps in social communication has been supported solely by correlative evidence. While the evidence we provide to support the role of chirps as probes is also correlative, it raises critical questions about the long-assumed role of chirps in social communication. In fact, chirping is strongly dependent on the reciprocal positioning of the fish, highly constrained by beat frequency, and patterned in ways that, in our opinion, make the links between chirp types and internal states suggested by the current view less likely. Moreover, the use of different chirp types does not appear specific to any of the social contexts analyzed but is primarily explained by DF (beat frequency). This observation, coupled with the analysis of chirp transitions (more self-referenced than reflecting an actual exchange between subjects), leads us to hypothesize with greater confidence that chirp production may be more related to sensing the environment than to transmitting information about a specific behavioral state.

      Nevertheless, the Reviewer's comment is valid. We've tempered the study's conclusions by introducing the possibility of chirps serving both communication and electrolocation functions, as stated in the conclusion paragraph: "While our results do not completely dismiss the possibility of chirps serving a role in electrocommunication—probing cues could, for instance, function as proximity signals to signal presence, deter approaches, or coordinate behaviors like spawning (Henninger et al., 2018).". Nonetheless, we do emphasize that our hypothesis is more likely to apply - based on our data. We refrain from categorically excluding a communicative function for chirps (between subjects), but we hypothesize that this communication - if occurring - may contain the same type of information as the self-directed signaling implied by the “chirps as probes” idea (i.e. spatial information).

      In response to the Reviewer's feedback, we've revised the end of the introduction, removing suggestions of conclusiveness: "Finally, by recording fish in different conditions of electrical 'visibility,' we provide evidence supporting a previously neglected role of chirps: homeoactive sensing." (edit: the word “validating” has been removed to give a less “conclusive” answer to the open functional questions about chirping).

      I would suggest thoroughly revising the manuscript to provide a neutral description of the results and leaving any speculations and interpretations for the discussion where the authors should be careful to separate strongly supported hypotheses from more preliminary speculations. I detail below several instances where the argumentation and/or the analysis are flawed.

      Following the Reviewer’s comment, we have revised the manuscript to emphasize the following points: 1) the need for a revision of the current view on chirping, 2) our proposal of an alternative hypothesis based on correlations between chirping and behavior, which were previously unexplored, and 3) our acknowledgment that while we offer evidence supporting a probing role of chirps (e.g., lack of behavioral correlation, DF-dependency, stereotypy in repeated trials, modulation by clutter and distance), we do not present here conclusive evidence for chirps detecting specific details of conspecific positioning. Nor do we categorically exclude a role of chirps in social communication.

      They analyze chirp patterning and show that, most likely, a chirp by an individual is followed by a chirp in the same individual. They argue that it is rare that a chirp elicits a "response" in the other fish. Even if there are clearly stronger correlations between chirps in the same individual, they provide no statistical analysis that discards the existence of occasional "response" patterns. The fact that these are rare, and that the authors don't do an appropriate analysis of probabilities, leads to this unsupported conclusion.

      We employed cross-correlation indices, calculated and assessed with a 3 standard deviation symmetrical boundary (which is a statistically sound and strict criterion). Median values were utilized to depict trends in each group/pair. To support our findings, we added new experiments and new figures: 1) a correlation analysis between chirps and behaviors, providing more convincing evidence of how chirps are employed during "scanning" swimming activity (backward swimming); 2) a text mining approach to underscore chirp-behavior correlations, employing alternative and statistically more robust methods.
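
      As an illustration only, the kind of test described above (a cross-correlogram judged against a symmetric 3 standard deviation boundary) can be sketched as follows. This is not the analysis code used in the study: the function names, the window and bin width, and in particular the choice of circularly shifted surrogates as the null distribution are all assumptions made for the example.

```python
import numpy as np

def cross_correlogram(ref_times, target_times, window=2.0, bin_width=0.1):
    """Histogram of lags (target minus reference) within +/- window seconds."""
    ref = np.asarray(ref_times, dtype=float)
    tgt = np.asarray(target_times, dtype=float)
    n_bins = int(round(2 * window / bin_width))
    edges = np.linspace(-window, window, n_bins + 1)
    # All pairwise lags; adequate for the modest event counts of a recording.
    lags = (tgt[None, :] - ref[:, None]).ravel()
    lags = lags[np.abs(lags) <= window]
    counts, _ = np.histogram(lags, bins=edges)
    return counts, edges

def three_sd_bounds(ref_times, target_times, duration,
                    n_surrogates=200, window=2.0, bin_width=0.1, seed=0):
    """Per-bin mean +/- 3 SD of correlograms from circularly shifted targets.

    Assumption: a random circular shift of the target train is used as the
    null model; the source does not state how its boundary was derived.
    """
    rng = np.random.default_rng(seed)
    tgt = np.asarray(target_times, dtype=float)
    surrogate_counts = []
    for _ in range(n_surrogates):
        shifted = (tgt + rng.uniform(0.0, duration)) % duration
        counts, _ = cross_correlogram(ref_times, shifted, window, bin_width)
        surrogate_counts.append(counts)
    surrogate_counts = np.array(surrogate_counts)
    mean = surrogate_counts.mean(axis=0)
    sd = surrogate_counts.std(axis=0)
    return mean - 3.0 * sd, mean + 3.0 * sd
```

      In such a sketch, lag bins in which the observed counts exceed the upper bound would be read as chirps of one fish following those of the other more often than expected by chance; passing the same train as both reference and target would give the within-fish case, after discarding the zero-lag self-pairs.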

      One of the main pieces of evidence that chirps can be used to enhance conspecific localization is based on their "interference" measure. The measure is based on an analysis of "inter-peak intervals". This in itself is a questionable choice. The nervous system encodes all parts of the stimulus, not just the peak, and disruption occurring at other phases of the beat might be as relevant. The interference will be mostly affected by the summed duration of intervals between peaks in the chirp AM. They do not explain why this varies with beat frequency. It is likely that the changes they see are simply an artifact of the simplistic measure. A clear demonstration that this measure is not adequate comes from the observation in Fig7E-H. They show that the interference value changes as the signal is weaker. This measure should be independent of the strength of the signal. The method is based on detecting peaks and quantifying the time between peaks. The only reason this measure could be affected by signal strength is if noisy recordings affect how the peak detection occurs. There is no way to argue that this phenomenon would happen the same way in the nervous system. Furthermore, they qualitatively argue that patterns of chirp production follow patterns of interference strength. No statistical demonstration is done. Even the qualitative appraisal is questionable. For example, they argue that there are relatively few chirps being produced for DFs of 60 or -60 Hz. But these are DFs where they have only a very small sample size. The single pair of fish that they recorded at some of these frequencies might not have chirped by chance and a rigorous statistical analysis is necessary. Similarly, in Fig 5C they argue that the position of the chirps falls on areas of the graph where the interferences are strongest (darker blue) but this is far from obvious and, again, not proven.

      We would like to clarify that the estimation of the effects of chirps on the beat (referred to as “beat interference”) was not intended to serve as the primary evidence supporting a different use of chirping. In fact, all the experiments conducted prior to that calculation already provide substantial evidence supporting the hypothesis we have proposed. To address the Reviewer’s concern and to avoid misleading interpretations, we have now moved this part to the Supplementary Information (see Figures S8 and S9), consistent with the non-central role of this approach. We also added the following statement to the Results paragraph entitled “Chirps significantly interfere with the beat and enhance electric image contrast”: “Obviously, measuring chirp-triggered beat interferences by using an elementary outlier detection algorithm on the distribution of beat cycles does not reflect any physiological process carried out by the electrosensory system and can be therefore used only as an oversimplified estimate.”

      Regarding the meaning of “beat interference” (as here estimated) from a perspective of brain physiology: chirp interference was calculated using the beat cycles as a reference. Beat peaks were used only to estimate beat cycle duration. Regardless of whether or not a beat peak is represented in the brain, beat cycle duration (estimated using the peaks) is the main determinant of p-unit rhythmic response to a beat. Regarding the effect of signal amplitude, this is also not very relevant. It is obvious that a chirp creates more - or less - interference based on the chirp FM and its duration (but also the sign of the DF and the magnitude of the amplitude modulation). If electroreceptor responses are entrained in waves of beat AMs and if “interference” is a measure of how such waves are scrambled, then “interference” is a measure of how chirps scramble waves of electroreceptor activity by affecting beat AMs.

      The reason the interference fades with the signal (previous Figure 7, now Figure S12) is that it is weighted by the signal strength (the signals used as carriers for chirps are recalculated based on real measurements of signal strength at different distances). Nonetheless, the Reviewer is right: mathematically speaking, the interference would not change at all, because it is just the result of an outlier detection algorithm. This outlier detection is set to a 1% threshold (percent of beat contrast).
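
      To make the idea of an outlier detection on beat cycles concrete, a minimal sketch is given below. It is a hypothetical illustration, not the procedure used in the study: the function names are invented, the beat envelope is assumed to be already extracted, and the `tolerance` parameter (a relative deviation from the median cycle duration) is a stand-in that does not reproduce the 1% beat-contrast threshold mentioned above.

```python
import numpy as np

def beat_cycle_durations(envelope, fs):
    """Estimate beat cycle durations (s) from the peaks of a beat envelope."""
    env = np.asarray(envelope, dtype=float)
    # A sample is a peak if it exceeds its left neighbour and is not
    # smaller than its right neighbour (first and last samples excluded).
    is_peak = (env[1:-1] > env[:-2]) & (env[1:-1] >= env[2:])
    peak_idx = np.flatnonzero(is_peak) + 1
    return np.diff(peak_idx) / fs

def interference_index(cycles, tolerance=0.01):
    """Sum the relative deviations of outlier cycles from the median cycle.

    Assumption: a cycle counts as an outlier when it deviates from the
    median duration by more than `tolerance` (as a fraction of the median).
    """
    cycles = np.asarray(cycles, dtype=float)
    reference = np.median(cycles)
    deviation = np.abs(cycles - reference) / reference
    outliers = deviation > tolerance
    return float(deviation[outliers].sum()), outliers
```

      Because this index depends only on the timing of the peaks, it is unaffected by the amplitude of a noise-free envelope, which is the Reviewer's point; any dependence on signal strength has to be introduced by an explicit weighting, as described above.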

      Regarding the comparison “chirps vs interference”, we did not perform a statistical analysis because we only wanted to show a qualitative observation. Similar results can be obtained for slightly shorter or longer time windows, within certain limits of course (see added Figure S9, in the Supplementary Information). We hope that moving this analysis to the Supplementary Information makes it clear that this approach is not central to our argument.

      The Reviewer’s point on the DF sampling is correct. We have reconsidered the low chirping at 60 Hz as potentially the result of sampling bias and edited the respective Results paragraph.

      They relate the angle at which one fish produces chirps relative to the orientation of the mesh enclosing. They argue that this is related to the orientation of electric field lines by doing a qualitative comparison with a simplified estimate of field lines. To be convincing this analysis should include a quantitative comparison using the exact same body position of the two fish when the chirps are emitted.

      We agree with the Reviewer: this type of experiment would be much better suited to illustrate the correlations between chirping and reciprocal positioning in fish. What we can see is that chirping occurs at certain orientations more often than at others. This could have something to do with either field geometry or with locomotion in the particular test environment we have used. As mentioned earlier, we are currently editing a second manuscript which will include the type of analysis/experiment the Reviewer is thinking of. We preferred to focus in this first study on the broader behavioral correlates of chirping. We removed the mention of the field current lines because, we agree, the argument is vague as presented here.

      They show that the very vast majority of chirps in Fig 6 occur when the fish are within a few centimeters (e.g. very large first bin in Fig6E-Type2). This is a situation when the other fish signal will be strongest and localization will be the easiest. It is hard to understand why the fish would need a mechanism to enhance localization in these conditions (this is the opposite of difficult conditions e.g. the "cluttered" environment).

      Agreed. In fact, we do not explicitly propose chirps as a means to improve “electrolocation” (this word is used only broadly in the abstract) but instead as probes to extract spatial information (e.g. shape, motion, orientation) from a beat source. In a broader sense, all these spatial parameters contribute to any given instance of "localization." Because we were unable to explore all these aspects in greater detail, we chose to maintain a broader perspective. If chirps contribute to a better resolution of fine spatial attributes of conspecific locations, it is reasonable to expect higher chirping rates in proximity to the target fish.

      The argumentation aimed at rejecting the well-established role of chirps in communication is weak at best. First, they ignored some existing data when they argue that there is no correlation between chirping and behavioral interactions. Particularly, Hupé and Lewis (2008) showed a clear temporal correlation between chirps and a decrease in bites during aggressive encounters. It could be argued that this is "causal evidence" (to reuse their wording) that chirps cause a decrease in attacks by the receiver fish (see Fig 8B of the Hupé paper and associated significant statistics). Also, Oboti et al. argue that social interactions involve "higher levels of locomotion" which would explain the use of chirps since they are used to localize. But chirps are frequent in "chirp chamber" paradigms where no movement is involved. They also point out that social context covaries with beat frequency and thus that it is hard to distinguish which one is linked to chirping propensity and then say that it is hard to disentangle this from "biophysical features of EOD fields affecting detection and localization of conspecific fish". But they don't provide any proof that beat frequency affects detection and localization so their argument is not clear. Last, they argue that tests in one species shouldn't be extrapolated to other species. But many of the studies arguing for the role of chirps in communication were done on brown ghosts. In conclusion of this point, they do not provide any strong argument that rejects the role of chirps as a communication signal. A perspective that would be better supported by their data and consistent with past research would be to argue that, in addition to a role in communication, chirps could sometimes be used to help localize conspecifics.

      We did not intend to disregard the extensive body of literature supporting a role of chirps in social communication. Rather, the primary goal of this study was to present a valid alternative perspective to this prevailing view. The existence of a well-established hypothesis does not imply that new evidence cannot change it; it simply indicates that changing it may be challenging either because it's genuinely difficult or because the idea has not been thoroughly explored. Whatever the case may be, proposing new hypotheses, whether complementary or alternative to established theories, is a challenging undertaking for a single study. We judged that starting from broad correlations would be the most desirable approach.

      We did not ignore data from Hupé and Lewis 2008. We cited this study repeatedly and compared their findings to those of others, not only for the chirp-behavior correlations but also for chirping distance considerations. However, following the Reviewer’s comment, we now also cite this study in the context of the recently added behavioral analysis (data from the PSTH plots could possibly confirm the observation of less chirping during attacks). We also cited the study by Triefenbach and Zakon 2008, which reports something along the same lines. See the statement: “Overall, these results provided mutually reinforcing evidence indicating that chirps are produced more often during locomotion or scanning-related motor activity and confirm previous reports of a lower occurrence of chirping during more direct aggressive contact (as shown also by Triefenbach and Zakon, 2008; Hupé and Lewis, 2008).”, in the Results paragraph related to the behavioral correlates of chirping.

      In our study we make it clear how we distinguish causal evidence (i.e. providing evidence that A is required for B) from correlation (i.e. evidence for A simply occurring together with B). We also make it clear that we are not going to provide causal evidence but we are going to provide new evidence for correlations that were so far not considered, in order to propose a new unexplored function of chirps.

      The Reviewer's point on chirping during motion and while caged in a chirp chamber is valid. Indeed, at first we were also puzzled by this finding. However, under the “chirps as probes” paradigm, chirping in a chirp chamber can be explained by the need to obtain spatial information from an otherwise unreachable beat source (brown ghosts typically explore new environmental objects or conspecifics by actively swimming around them - something caged fish can’t do). Thus, the observation of chirping under conditions of limited movement (such as in a chirp chamber experiment) does not contradict our hypothesis; rather, it can be used to support it. Further experiments are required - as rightfully pointed out - to evaluate the effects of beat frequency on beat detection. We added a note about this in the “probing with chirps” discussion paragraph.

      The Reviewer's comment regarding generalization is unclear. We acknowledge that most studies are conducted in brown ghosts, as stated in the abstract. Our intention was to highlight that insights gained from this species have been applied to broaden the understanding of chirps in other species. Specifically, the "behavioral meaning idea" of chirping has been extended to other gymnotiform species producing EOD frequency modulations.

      Our study's aim is not to dismiss the idea of chirps being used for communication but to present an alternative hypothesis and to provide supporting evidence. While our results may not align well with the communication theory, our intention is not dismissal but rather to engage in a discussion and exploration of alternative perspectives.

      The discussion they provide on the possible mechanism by which chirps could help with localization of the conspecific is problematic. They imply that chirps cause a stronger response in the receptors. For most chirps considered here, this is not true. For a large portion of the beat frequencies shown in this paper, chirps will cause a de-synchronization of the receptors with no increase in firing rate. They cannot argue that this represents an enhanced response. They also discuss a role for having a broader frequency spectrum -during the chirp- in localization by making a parallel with pulse fish. There is no evidence that a similar mechanism could even work in wave-type fish.

      We have already commented on the “localization” idea in our previous responses. The Reviewer is right in saying that we have provided only vague descriptions of the potential mechanisms implied by our hypothesis. The studies by Benda and others (2005, 2006) demonstrate a clear synchronizing effect of chirps on p-unit firing rates, especially at low DFs (at ranges similar to those considered in this study). This synchronization could lead to an enhanced response at the electroreceptor level, as described in these very studies, which in turn would result in a higher probability of firing in downstream neurons (E-cells in the ELL).

      As also reported within the same works, chirps may also exert an opposite effect on p-units (i.e. desynchronization). This is what happens for large chirps at high DFs. Desynchronization may cause temporary lapses of p-unit firing, which in turn may lead to increased activity of I-cells in the ELL (which are indeed specifically tuned to p-unit lack of activity).

      So, in general, if we consider both ON and OFF pyramidal cells (in the ELL) and small and large chirps, we could state that chirps can be potentially used to enhance the activity of peripheral electrosensory circuits through different mechanisms, contingent on the chirp type and beat frequency. Unfortunately, space constraints limited our ability to dig into these details in the present study.

      However, to address the Reviewer’s rightful point, we now mention this in the manuscript: “Since the beat AMs generated by the chirps always trigger reliable responses in primary electrosensory circuits (pyramidal cells in the ELL respond to both increases and decreases in beat AM), any chirp-triggered AM causing a sudden change in p-unit firing could potentially amplify the downstream signal (Marsat and Maler, 2010) and thus enhance EI contrast.” (see the Results paragraph on beat interference and electric images).

      They write the whole paper as if males and females had been identified in their experiments. Although EOD frequency can provide some guess of the sex the method is unreliable. We can expect a non-negligible percentage of error in assigning sex.

      We agree and in fact, in the method section we state:

      “The limitation of this approach is that females cannot be distinguished from immature males with absolute certainty, since no post-mortem gonadal inspection was carried out.”

      To this we added:

      “Although a more accurate way to determine the sex of brown ghosts would be to consider other morphological features such as the shape of the snout, the body size, the occurrence of developing eggs, EOD frequency has been extensively used for this purpose.”

      Moreover, the consistent behavioral differences observed in low frequency fish, measured with those behavioral experiments aimed at assessing responses to playback stimuli and swimming behavior in novel environments, could also be caused by a younger age (as opposed to femaleness). However, the size ranges of our fish (an admittedly unreliable proxy of age) were all comparable, making this possibility perhaps less likely.

      Reviewer #2 (Public Review):

      Studying the weakly electric brown ghost knifefish, the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. This is a behavior that has been very well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation. Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that could have a great impact on the field. The authors do provide convincing evidence that chirps may function in homeoactive sensing. However, their evidence arguing against a role for chirps in communication is not as strong, and neglects a large body of research. Ultimately, the manuscript has great potential but suffers from framing these two possibilities as mutually exclusive and dismissing evidence in favor of a communicative function.

      We thank the Reviewer for the comment. Overall, we have edited the manuscript to soften our conclusions and avoid any strong categorical statement excluding the widely accepted role of chirps in social communication. We have added some new experiments with the aim of adding more detail to the behavioral correlates of chirping and to the DF dependency of the production of different types of chirps. Nonetheless, based on our results, we are inclined to conclude that the communication idea - although widely accepted - is not as well substantiated as it should be.

      Although we do not dismiss the bulk of literature supporting a role of chirps in social communication, we think that our hypothesis (i.e. decoding of spatial parameters from the beat) may not be fully compatible with the social communication hypothesis for the following reasons:

      (1) Chirp type dependency on DF makes chirps likely to be adaptive responses to beat frequency. While this idea is compatible with a role of chirps in the detection of beat parameters, their concurrent role in social communication would imply that fish interacting at given beat frequencies (DFs) would communicate only (or mainly) by delivering a very limited range of “messages”. For instance, assuming type 2 chirps are related to aggression (as widely suggested), are female-male pairs - with larger DFs - interacting less aggressively than same-sex pairs? Our experiments often suggested this is not the case. In addition, large DFs are not always indicative of opposite-sex interactions, while they are very often characterized by the emission of large chirps. Moreover, although opposite-sex interactions in the absence of breeding-like conditions cannot be considered truly courtship-related, large chirps are often considered courtship signals, regardless of the reproductive state of the emitting fish.

      (2) Chirping is highly affected by locomotion (consider female/male pairs with or without mesh divider) and distance (as shown in the novel environment exploration experiment). While the involvement of both parameters is compatible with a role of chirps in active sensing, a role of chirps in social communication implies that such signaling would occur only when fish are in very close proximity to each other. In this case, the beat is heavily distorted not only by fish position/locomotion but also by chirps. This means that when fish are close to each other, the two different types of information relayed by the beat (electrolocation and electrocommunication) would certainly interfere (this idea is now better phrased in the Introduction).

      (3) In our playback experiments we could not see any meaningful matching (e.g. angry-chirp → angry-chirp or sexy-chirp → approach) between playback chirps and evoked chirps, raising doubts about the meaning associated so far with the different types. Considering that playback experiments are typically used to assess signal meaning based on how animals respond to them, this result suggests quite strongly that such meaning cannot be assigned to chirps.

      (4) In playback experiments in which the same stimulus is provided multiple times, chirp type transitions (i.e. emission of a different chirp type after a given chirp) become predictable (as shown in the added playback experiments using ramping signals). This confirms that the choice to emit a given chirp type has something to do with beat frequency (or a change in this parameter) and not a communication of internal states. It would be otherwise unclear how a fish could change its internal state so quickly - and so reliably - even in the span of a few seconds.
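
      The notion of predictable chirp type transitions in point (4) can be illustrated with a first-order transition table, sketched below. This is a hypothetical example rather than the analysis used in the study: the function name and the chirp type labels are placeholders, and only transitions between consecutive chirps in a single labelled sequence are considered.

```python
from collections import Counter

def transition_probabilities(chirp_types):
    """Estimate P(next type | current type) from one labelled chirp sequence."""
    # Count each (current, next) pair of consecutive chirps.
    pair_counts = Counter(zip(chirp_types[:-1], chirp_types[1:]))
    # Total number of transitions leaving each chirp type.
    row_totals = Counter()
    for (current, _following), n in pair_counts.items():
        row_totals[current] += n
    return {
        (current, following): n / row_totals[current]
        for (current, following), n in pair_counts.items()
    }

# Placeholder labels, not data from the study.
example = ["type2", "type2", "large", "type2", "rise"]
probabilities = transition_probabilities(example)
```

      Under this reading, transitions are "predictable" when, across repeated presentations of the same stimulus, each row of the table is dominated by a single following type.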

      Despite this evidence against a semantic content of chirps in the context of social communication, we conclude the manuscript by noting that we are not providing strong evidence dismissing the communication hypothesis, and that both hypotheses could coexist (see the example of “proximity signals” in the mating context given in the concluding paragraph).

      (1) The specific underlying question of this study is not made clear in the abstract or introduction. It becomes apparent in reading through the manuscript that the authors seek to test the hypothesis that chirps function in active sensing (specifically homeoactive sensing). This should be made explicitly clear in both the abstract and introduction, along with the rationale for this hypothesis.

      In the abstract we state “Despite the success of this model in neuroethology over the past seven decades, the underlying logic of their electric communication remains unclear. This study re-evaluates this view, aiming to offer an alternative, and possibly complementary, explanation for why these freshwater bottom dwellers emit electric chirps.”. This statement is meant as a summary of our aims. However, in order to convey a clearer message, we have revised the whole manuscript to articulate our objectives more explicitly. In particular, we stress that our experiments are intended to provide correlative evidence for a different, previously unexplored role of chirps, with the aim of stimulating a discussion and possibly a revision of the current theory about the functional role of chirps.

      In the introduction we have added a paragraph explaining our aim and also why we think that communicating through chirps could potentially interfere with efficient electrolocation: “Since both chirps and positional parameters (such as size, orientation or motion) can only be detected as perturbations of the beat (Petzold et al., 2016; Yu et al., 2012; Fotowat et al., 2013), and via the same electroreceptors, the inputs relaying both types of information are inevitably interfering. Moreover, as the majority of chirps are produced within a short range (< 50 cm; Zupanc et al., 2006; Hupé and Lewis 2008; Henninger et al., 2018; see appendix 1) this interference is likely to occur consistently during social interactions.

      Under the communication-hypothesis, the assumption that chirps and beats are conveying different types of information (i.e. semantic value as opposed to position and related geometrical parameters) is therefore leaving this issue unresolved.”.

      (2) My biggest issue with this manuscript is that it is much too strong in dismissing evidence that chirping correlates with context. This is captured in this sentence in the introduction, "We first show that the choice of different chirp types does not significantly correlate with any particular behavioral or social context." This very strong conclusion comes up repeatedly, and I disagree with it, for the following reasons:

      In your behavioral observations, you found sex differences in chirping as well as differences between freely interacting and physically separated fish. Your model of chirp variability found that environmental experience, social experience, and beat frequency (DF) are the most important factors explaining chirp variability. Are these not all considered "behavioral or social context"? Beat frequency (DF) in particular is heavily downplayed as being a part of "context" but it is a crucial part of the context, as it provides information about the identity of the fish you're interacting with.

      In your playback experiments, fish responded differently to small vs. large DFs, males chirped more than females, type 2 chirps became more frequent throughout a playback, and rises tended to occur at the end of a playback. These are all examples of context-dependent behavior.

      We agree with the Reviewer’s comment and we think that we were probably unclear about the meaning of that statement. We also agree with the Reviewer about what is defined as “context”, and that a given beat frequency (DF) can in the end represent a “behavioral context” as well. To make this clearer, we have rephrased this statement and changed it to: “We first show that the relative number of different chirp types in a given recording does not significantly correlate with any particular behavioral or social context.”. This new form refers specifically to the observation that - in all different social conditions examined - the relative amounts of different types of chirps are unchanged (see Figure S2). The Reviewer may have interpreted our statement as suggesting that chirp type choice is random or unaffected by any social variable. We agree with the Reviewer that this is not the case. We also reported that sex differences in chirping are present, but we have emphasized they may have something to do with the propensity of the brown ghosts of either sex to swim/explore as opposed to seeking refuge and waiting (as suggested by our experiments in which FM pairs were either divided or freely interacting and our novel environment exploration experiments).

      We agree that DF is important; in fact, it is the third most important factor explaining chirp variance in our model. In our fish pair recordings, we see a strong correlation of total chirp variance with tank experience (one naïve, one experienced, both fish equally experienced) and social context (novel to each other/familiar to each other, subordinate/dominant, breeding/non-breeding, accessible/not accessible), although data clustering seems to distinguish “divided” vs “freely moving” conditions better than other variables (sex may also play a role because of the reversal of sexual dimorphism in chirp rates in precisely this case). However, we do not see a specific effect of these variables on the proportion of different types of chirps in any recording (see Figure S2).

      We also edited the beginning of the first result paragraph and changed it to “Thus, if behavioral meaning can be attributed to different types of chirps, as posed by the prevailing view (e.g., Hagedorn and Heiligenberg, 1985; Larimer and MacDonald, 1968; Rose, 2004), one should be able to identify clear correlations between behavioral contexts characterizing different internal states and the relative amounts of different types of chirp”, to emphasize we are here assessing the meaning of different types of chirps (not of the total amount of chirping in general).

      Further, you only considered the identity of interacting fish or stimulated fish, not their behavior during the interaction or during playback. Such an analysis is likely beyond the scope of this study, but several other studies have shown correlations between social behavior and chirping. In the absence of such data here, it is too strong to claim that chirping is unrelated to context.

      We agree with the Reviewer. In fact, this analysis had already been carried out but was purposely left out in an attempt to limit the manuscript length. We have now made space for this experimental work and added it (see the new Figure 6).

      In summary, it is simply too strong to say that chirping does not correlate with context. Importantly, however, this does not detract from your hypothesis that chirping functions in homeoactive sensing. A given EOD behavior could serve both communication and homeoactive sensing. I actually suspect that this is quite common in electric fish. The two are not mutually exclusive, and there is no reason for you to present them as such. I recommend focusing more on the positive evidence for a homeoactive function and less on the negative evidence against a communication function.

      We aimed to clarify that our reference was to the lack of correlation between "chirp type relative numbers" and the analyzed context. Regarding the communication function, we tempered negative statements. However, as this study stems from evidence within the established paradigm of "chirps as communication signals", and aims at proposing an alternative hypothesis, eliminating all references to it could undermine the study's purpose.

      (3) The results were generally challenging to follow. In the first 4 sections, it is not made clear what the specific question is, what the approach to addressing that question is, and what specific experiment was carried out (the last two sections of the results were much clearer). The independent variables (contexts) are not clearly established before presenting the results. Instead they are often mentioned in passing when describing the results. They come across as an unbalanced hodgepodge of multiple factors, and it is not made clear why they were chosen. This makes it challenging to understand why you did what you did, the results, and their implications. For each set of major results, I recommend: First, pose a clear question. Then, describe the general approach to answering that question. Next, describe the specifics of the experimental design, with a rationale that appeals to the general approach described. Finally, describe the specific results.

      The introductory sentences of the first result paragraphs have been edited, rendering the aim of the experiments more explicit.

      (4) Results: "We thus predicted that, if behavioral meaning can be attributed to different types of chirps, as posed by the prevailing view (e.g., Hagedorn and Heiligenberg, 1985; Larimer and MacDonald, 1968; Rose, 2004)..." It should be made clear why this is the prevailing view, and this description should likely be moved to the introduction. There is a large body of evidence supporting this view and it is important to be complete in describing it, especially since the authors seem to seek to refute it.

      We understand the Reviewer’s question and we tried to express in the introduction the main reasons for why this is the current view. We state “Different types of chirps are thought to carry different semantic content based on their occurrence during either affiliative or agonistic encounters (Larimer and MacDonald 1968; Bullock 1969; Hopkins 1974; Hagedorn and Heiligenberg 1985; Zupanc and Maler 1993; Engler et al. 2000; Engler and Zupanc 2001; Bastian et al., 2001).”. To this we added: “Although supported mainly by correlative evidence, this idea gained popularity because it is intuitive and because it matches well enough with the numerous behavioral observations of interacting brown ghosts.”.

      We believe the prevailing view is based on intuition and a series of basic observed correlations repeated throughout the years. The crystallization of this idea is not due to negligence but mainly to technical limitations existing at the time of the first recordings. To assess the role of chirps in behaving fish, tight and precise temporal control over synchronized video-EOD recordings is most likely necessary, and this technical capability probably became available only much later than the 1950s-60s, when electric communication was first described.

      (5) I am not convinced of the conclusion drawn by the analysis of chirp transitions. The transition matrices show plenty of 1-2 and 2-1 transitions occurring. Further, the cross-correlation analysis only shows that chirp timing between individuals is not phase-locked at these small timescales. It is entirely possible that chirp rates are correlated between interacting individuals, even if their precise timing is not.

      We agree with the Reviewer: chirp repertoires recorded in different social contexts are not devoid of reciprocal chirp transitions (i.e. fish 1 chirp - to - fish 2 chirp, or vice versa). Yet our point is to emphasize that they are far less abundant than the self-referenced ones (i.e. 1-1 and 2-2). This is a fair concern, and to address it further we have added a whole new set of analyses and new experiments (see chirp-behavior correlations, PSTHs and more analysis based on more solid statistical methods; see Figure 6).

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, as well as with playback experiments. It applies state-of-the-art methods for reducing dimensionality and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The exceptional strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that a number of commonly accepted truths about which variable affects chirping must be carefully rewritten or nuanced. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats and objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a communication goal for most chirps. Rather, the key determinants of chirping are the difference frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. These conclusions by themselves will be hugely useful to the field. They will also allow scientists working on other "communication" systems to at least reconsider, and perhaps expand the precise goal of the probes used in those senses. There are a lot of data summarized in this paper, and thorough referencing to past work. For example, the paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-received chirp transitions beyond the known increase in chirp frequency during an interaction.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization.

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water.

      Weaknesses:

      My main criticism is that the alternative putative role for chirps as probe signals that optimize beat detection could be better developed. The paper could be clearer as to what that means precisely.

      We appreciate the Reviewer's kind comments. While we acknowledge that our exploration of chirp function in this study may be limited and not entirely satisfying, we made this decision due to space constraints, opting for a broader and diversified approach. We hope that future studies will build on these data and start filling the gaps. We are also working on another manuscript which is addressing this point more in detail.

      Nonetheless, we considered the Reviewer’s criticism and added not only a new figure (to show more explicitly what chirps can do to the perceived electric fields, as simulated by electric images) but also more descriptive parts explaining how we think chirps may act to improve the spatial resolution of beat processing (see the discussion paragraph “probing with chirps”). In this paragraph we rendered more clearly how chirps could improve beat processing by phase shifting EODs and recovering eventual blind-spots on the fish skin caused by disruptive EOD interferences (resulting in lower beat contrast). We also mention that enhancement of electrosensory input triggered by chirps, could be localized not only at the level of electroreceptors (consider the synchronizing effects small chirps have on p-units at low frequency beats) but also at the level of ON and OFF pyramidal cells in the ELL. Looked at from the perspective of these neurons, any chirp would enhance the activity of these input lines, yet in opposite ways.

      And there is an egg-and-chicken type issue as well, namely, that one needs a beat in order to "chirp" the beating pattern, but then how does chirping optimize the detection of the said beat? Perhaps the authors mean (as they wrote elsewhere in the paper) that the chirps could enhance electrosensory responses to the beat.

      Following the Reviewer’s comment, we have now revised several instances of the misleading phrasing identified.

      In the results on novel environment exploration: “If chirps enhance beat processing, for instance, chirping should occur within beat detection range but at a certain distance.”.

      “This, in turn, could be used to validate our beat-interference estimates as meaningfully related to beat processing.” and “In all this, rises may represent an exception as their locations are spread over larger distances and even in presence of obstacles potentially occluding the beat source (such as shelters, plants, or walls), all of which are conditions in which beat detection or beat processing could be more difficult (this, could be coherent with the production of rises right at the end of EOD playbacks; Figure S5).”

      Last result paragraph (clutter experiment): “Overall, these results indicate that chirping is significantly affected by the presence of environmental clutter partially disrupting - or simply obstructing - the processing of beat related information during locomotion”.

      In the probing with chirps discussion paragraph “In theory, chirps could also be used to improve electrolocation of objects as well (as opposed to the processing of the beat).”.

      In the conclusions: “optimizing the otherwise passive responses to the beat”.

      A second criticism is that the study links the beat detection to underwater object localization. I did not see a sufficiently developed argument in this direction, nor how the data provided support for this argument. It is certainly possible that the image on the fish's body of an object in the environment will be slightly modified by introducing a chirp on the waveform, as this may enhance certain heterogeneities of the object in relation to its environment. The thrust of this argument seems to derive more from the notion of Fourier analysis with pulse type fish (and radar theory more generally) that the higher temporal frequencies in the beat waveform induced by the chirp will enable a better spatial resolution of objects. It remains to be seen whether this is significant.

      The Reviewer is correct in noting that this point is not addressed in the manuscript. We introduced it as a speculative discussion point to mention alternative possibilities. These could be subject to further testing in future studies.

      I would also have liked to see a proposal for new experiments that could test these possible new roles.

      We have added clearer suggestions for future experiments throughout the discussion: these may be aimed at 1) improving playback experiments using more realistic copies of the brown ghost’s EODs (including harmonics), 2) assessing fish reciprocal positioning during chirping in better detail and 3) testing the use of chirping during target-reaching tasks in order to better assess the probing function of chirps.

      The authors should recall for the readers the gist of Bastian's 2001 argument that the chirp "can adjust the beat frequency to levels that are better detectable" in the light of their current. Further, at the beginning of the "Probing with chirps" section, the 3rd way in which chirps could improve conspecific localization mentions the phase-shifting of the EOD. The authors should clarify whether they mean that the tuberous receptors and associated ELL/toral circuitry could deal with that cue, or that the T_unit pathway would be needed?

      We thank the Reviewer for identifying this unclear point. We added reference to the p-units “Yet, this does not exclude the possibility that chirps could be used to briefly shift the EOD phase in order to avoid disruptive interferences caused by phase opposition (at the level of p-units)” in the above mentioned paragraph. We would prefer to omit a more detailed reference to t-units in order to avoid lengthy descriptions required to discuss the different electroreceptor types.

      On p.17 I don't understand what is meant by most chirps being produced, possibly aligned with the field lines, since field lines are everywhere. And what is one to conclude from the comparison of Fig.6D and 7A? Likewise it was not clear what is meant by chirps having a detectable effect on randomly generated beats.

      We agree on the valid point raised by the Reviewer and we have removed reference to current lines from the text.

      In the section on Inconsistencies between behaviour and hypothesized signal meaning, the authors could perhaps nuance the interpretation of the results further in the context of the unrealistic copy of natural stimuli using EOD mimics. In particular, Kelly et al. 2008 argued that electrode placement mattered in terms of representation of a mimic fish onto the body of a real fish, and thus, if I properly understand the set up here, the movement would cause the mimic to vary in quality. This may nevertheless be a small confounding issue.

      We agree with the Reviewer and added a comment at the beginning of the paragraph mentioned. “Nonetheless, it's plausible that playback stimuli, as employed in our study and others, may not faithfully replicate natural signals, thus potentially influencing the reliability of the observed behaviors. Future studies might consider replicating these findings using either natural signals or improved mimics, which could include harmonic components (excluded in this study).”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract: "...is probably the most intensely studied species..." is a weak, unsupported, and unnecessary statement. Just state that it has been heavily studied, or is one of the most well-studied,...

      rephrased

      (2) Abstract: "...are thus used as references to specific internal states during recordings - of either the brain or the electric organ..." This was not clear to me.

      rephrased

      (3) Abstract: "...the logic underlying this electric communication..." It is not clear to me what the authors mean here by "logic".

      rephrased

      (4) I strongly recommend clearly defining homeoactive sensing and distinguishing it from allocative sensing when this term is first introduced in the introduction. This is not a commonly used term. Most readers likely think they understand what is meant by the term active sensing, however I recommend first defining it, and then distinguishing amongst these two different types of active sensing.

      rephrased

      (5) Introduction: "Together with a few other species (Rose, 2004),..." More than a few. There are hundreds of species with electric organs. It is certainly not a "unique" capability.

      rephrased

      (6) Introduction: "But the real advantage of active electrolocation can be appreciated in the context of social interaction." This is unclear. Why is this the "real advantage" of active electrolocation when an electrically silent fish could detect an electrically communicating fish just fine without interference? Active electrolocation is needed to detect objects that are not actively emitting an electric field. It is not needed to detect signaling individuals.

      rephrased

      (7) Introduction: why is active sensing using EODs limited to distances of 6-12 cm? Why does it not work at closer range?

      Here we meant to give a range based on published data. We rephrased it to “up to 12”.

      (8) Introduction: electric fields decay with the cubed of distance, as you show in appendix 1.

      rephrased

      (9) Introduction: it is not clear what is meant by "blurred EOD amplitude".

      rephrased (“noisy”)

      (10) Figure 2C is very challenging to interpret. I recommend spending more time in the manuscript walking the reader through this analysis and its presentation.

      We are grateful for the comment, as we had probably overlooked this point. We have now added a short paragraph to explain these data in better detail.

      (11) Results: "This was done by calculating the ratio between the duration of the beat cycles affected by the chirp (beat interpeak intervals) and the total duration of the beat cycles detected within a fixed time window (roughly double the size of the maximum chirp duration, 700 ms)." This was not clear to me.

      To clarify this method, we have now rephrased this to “Estimates of beat interference were made by calculating the ratio between the cumulative duration of the beat cycles affected by a given chirp (1 beat cycle corresponding to the beat comprised by two consecutive beat peaks, or - more simply - the beat inter-peak interval) over the cumulative duration of all the beat cycles within the time window used as a reference (700 ms; other analysis windows were tested Figure S9)”.
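
      As an illustration only (this is not the analysis code used in the study), the ratio described above could be sketched as follows. The sketch assumes that beat peaks have already been detected and that a beat cycle counts as "affected" when it overlaps the chirp in time; the function name, its arguments, and the overlap criterion are all hypothetical.

```python
def beat_interference(peak_times_ms, chirp_start_ms, chirp_end_ms,
                      window_start_ms=0.0, window_ms=700.0):
    """Ratio of chirp-affected beat-cycle duration to total beat-cycle duration.

    Assumption (not from the manuscript): a beat cycle is 'affected' if it
    overlaps the chirp interval at all.
    """
    window_end_ms = window_start_ms + window_ms
    # Keep only the beat peaks that fall inside the reference window.
    peaks = sorted(t for t in peak_times_ms
                   if window_start_ms <= t <= window_end_ms)
    # One beat cycle = the interval between two consecutive beat peaks.
    cycles = list(zip(peaks[:-1], peaks[1:]))
    total = sum(end - start for start, end in cycles)
    if total == 0:
        return 0.0
    affected = sum(end - start for start, end in cycles
                   if start < chirp_end_ms and end > chirp_start_ms)
    return affected / total


# Toy example: beat peaks every 100 ms, chirp spanning 250-320 ms.
peaks = [100.0 * i for i in range(8)]
ratio = beat_interference(peaks, 250.0, 320.0)
print(ratio)
```

      In this toy case two of the seven 100 ms cycles overlap the chirp, so the ratio is 2/7. Averaging such values over several EOD phases, as described in point (12), would simply be a mean over repeated calls.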

      (12) Results: "For each chirp, the interference values obtained for 4 different phases (90° steps) were averaged." Why was this done?

      To consider an average effect across phases. Although it is true that chirp parameters may have a different impact on the beat, depending on EOD phase, including this parameter in our figure/s would have considerably increased the volume of data reported giving too much emphasis to an analysis we judged not crucially important. In addition, since we did not consider EOD phase in our recordings, we opted for an average estimate encompassing different phase values.

      (13) Discussion: "Third, observations in a few species are generalized to all other gymnotiforms without testing for species differences (Turner et al., 2007; Smith et al., 2013; Petzold et al., 2016)." I strongly disagree with this statement. First, the studies referenced here do explicitly compare chirps across species. Second, you only studied one species here, so it is not clear to me how this is a relevant concern in interpreting your findings.

      Here we have probably been unclear in the writing: the point we wanted to make is that the idea of chirps having semantic content has been generalized to other species without investigating the nature of their chirping with as much detail as done for brown ghosts.

      We have now rephrased the statement and changed it to: “Second, observations in a few species are generalized to all other gymnotiforms without testing whether chirping may have similar functions in other species (Turner et al., 2007; Smith et al., 2013; Petzold et al., 2016)”

      (14) Discussion: "The two beats could be indistinguishable (assuming that the mechanism underlying the discrimination of the sign of DF at low DFs, and thought to be the basis of the so called jamming avoidance response (JAR; Metzner, 1999), is not functional at higher DFs)." Why would you assume this?

      What we meant here is that it is unlikely that the two DFs are not discriminated by the same mechanisms implied in the JAR, even if the DF is higher than the levels at which usually JARs are detected (i.e. DF = 1-10 Hz?). To improve clarity, we rephrased this statement. “The two beats could be indistinguishable (assuming - perhaps not realistically - that the same mechanism involved in DF discrimination at lower DF values would not work in this case; Metzner, 1999)”.

      (15) Discussion: "...an idea which seems congruent with published electrophysiological studies..." How so?

      Rephrased to “Based on our beat interference estimates, we propose that the occurrence of the different types of chirps at more positive DFs (such as in male-to-female chirping) may be explained by their different effect on the beat (Figure 5D; Benda et al., 2006; Walz et al., 2013).”

      Reviewer #3 (Recommendations For The Authors):

      On p.2 there is a discrepancy between the quoted ranges for active sensing of objects, first 10-12 cm, and then 6-12 cm further down. And in the following paragraph right below this passage, electric fields are said to decay with the squared distance (appendix 1). That expression has a cos(theta) which is inversely proportional to the distance, and so one is really dealing, as expected for dipolar fields, with a drop-off that decays with the distance cubed.

      We thank the Reviewer for the comment. We have now corrected the mistake and added “cubed”. We also removed the imprecise reference to the range 6-12 cm and rephrased it to “up to 12 cm”.
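
      For reference, the cubic drop-off mentioned here is the standard result for an ideal point dipole. The expressions below are the textbook form, assuming a dipole of moment $p$ in a homogeneous medium of permittivity $\varepsilon$; they are not copied from Appendix 1, whose exact notation may differ.

```latex
% Potential of an ideal dipole at distance r and angle \theta from its axis
V(r,\theta) = \frac{p\,\cos\theta}{4\pi\varepsilon\, r^{2}}

% Field components obtained from \mathbf{E} = -\nabla V
E_r = -\frac{\partial V}{\partial r} = \frac{2p\,\cos\theta}{4\pi\varepsilon\, r^{3}},
\qquad
E_\theta = -\frac{1}{r}\frac{\partial V}{\partial \theta} = \frac{p\,\sin\theta}{4\pi\varepsilon\, r^{3}}

% Field magnitude: falls off with the cube of the distance
|\mathbf{E}| = \frac{p}{4\pi\varepsilon\, r^{3}}\sqrt{1 + 3\cos^{2}\theta}
```

      The potential decays with the squared distance, whereas the field itself decays with the distance cubed.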

      At the end of the section on Inconsistencies..., it is not clear what "activity levels" refers to. It should also be made clearer at the outset, and reminded in this section too, that for the authors, behavioural context does not include social experience, which is somewhat counter-intuitive.

      We now specified we meant “locomotor activity levels”. Regarding the social experience we included it as “behavioral context”, we now made it clearer in the first result paragraph. We hope we resolved the confusion.

      The caption of Fig.8 could use more clarity in terms of what is being compared in (C) (and is "1*2p" a typo?)

      We corrected the typo and edited the figure to make the references more clear.

      The concept of "high self-correlation of chirp time series" is presented only in the Conclusion using those words. The word self-correlation is not used beforehand. This needs to be fixed so the reader knows clearly what is being referred to.

      Thank you for noting this. We have now changed the wording using the term “auto-correlation” and changed a statement at the beginning of the “interference” result paragraph accordingly, removing references to self-correlation.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their thorough re-evaluation of our revised manuscript. Addressing final issues they raised has improved the manuscript further. We sincerely appreciate the detailed explanations that the reviewers provided in the "recommendations for authors" section. This comprehensive feedback helped us identify the sources of ambiguity within the analysis descriptions and in the discussion where we interpreted the results. Below, you will find our responses to the specific comments and recommendations.

      Reviewer #1 (Recommendations):

      (1) I find that the manuscript has improved significantly from the last version, especially in terms of making explicit the assumptions of this work and competing models. I think the response letter makes a good case that the existence of other research makes it more likely that oscillators are at play in the study at hand (though the authors might consider incorporating this argumentation a bit more into the paper too). Furthermore, the authors' response that the harmonic analysis is valid even when including x=y because standard correlation analysis were not significant is a helpful response. The key issue that remains for me is that I have confusions about the additional analyses prompted by my review to a point where I find it hard to evaluate how and whether they demonstrate entrainment or not. 

      First, I don't fully understand Figure 2B and how it confirms the Arnold tongue slice prediction. In the response letter the authors write: "...indicating that accuracy increased towards the preferred rate at fast rates and decreased as the stimulus rate diverged from the preferred rate at slow rates". The figure shows that, but also more. The green line (IOI < preferred rate) indeed increases toward the preferred rate (which is IOI = 0 on the x-axis; as I get it), but then it continues to go up in accuracy even after the preferred rate. And for the blue line, performance also continues to go up beyond preferred rate. Wouldn't the Arnold tongue and thus entrainment prediction be that accuracy goes down again after the preferred rate has passed? That is to say, shouldn't the pattern look like this (https://i.imgur.com/GPlt38F.png) which with linear regression should turn to a line with a slope of 0?

      This was my confusion at first, but then I thought longer about how e.g. the blue line is predicted only using trials with IOI larger than the preferred rate. If that is so, then shouldn't the plot look like this? (https://i.imgur.com/SmU6X73.png). But if those are the only data and the rest of the regression line is extrapolation, why does the regression error vary in the extrapolated region? It would be helpful if the authors could clarify this plot a bit better. Ideally, they might want to include the average datapoints so it becomes easier to understand what is being fitted. As a side note, colours blue/green have a different meaning in 2B than 2D and E, which might be confusing. 

      We thank the reviewer for their recommendation to clarify the additional analyses we ran in the previous revision to assess whether accuracy systematically increased toward the preferred rate estimate. We realized that the description of the regression analysis led to misunderstandings. In particular, we think that the reviewer interpreted (1) our analysis as linear regression (based on the request to plot raw data rather than fits), whereas we in fact used logistic regression, and (2) the regression lines in Figure 2B as plotted against raw IOI values, whereas they were in fact plotted against z-scored IOI values (from trials where the stimulus IOI was faster than an individual’s preferred rate, IOI < preferred rate, in green; and from trials where the stimulus IOI was slower than an individual’s preferred rate, IOI > preferred rate, in blue), as the x-axis label indicated. We are happy to have the opportunity to clarify these points in the manuscript. We have also revised Figure 2B, which was admittedly somewhat opaque, to show the “Arnold tongue slice” more clearly.

      The logic for using (1) logistic regression with (2) z-scored IOI values as the predictor is as follows. Since the response variable in this analysis, accuracy, was binary (correct response = 1, incorrect response = 0), we used a logistic regression. The goal was to quantify an across-subjects effect (increase in accuracy toward preferred rate), so we aggregated datasets across all participants into the model. The crucial point here is that each participant had a different preferred rate estimate. Let’s say participant A had an estimate at IOI = 400 ms, and participant B had an estimate at IOI = 600 ms. The trials where IOI was faster than participant A’s estimate would then be those ranging from 200 ms to 398 ms, and those that were slower would range from 402 ms to 998 ms. For participant B, the situation would be different: trials where IOI was faster than their estimate would range from 200 ms to 598 ms, and slower trials would range from 602 ms to 998 ms. For a fair analysis that assesses the accuracy increase regardless of a participant’s actual preferred rate, we normalized these IOI values (faster or slower than the preferred rate). Z-score normalization is a common method of normalizing predictors in regression models, and was especially important here since we were aggregating predictors across participants, and the predictor ranges varied across participants. Z-scoring ensured that the scale of the sample (which differs between participants A and B in this example) was comparable across the datasets. This is also important for the interpretation of Figure 2B. Since z-scoring involves mean subtraction, the zero point on the z-scaled IOI axis corresponds to the mean of the sample prior to normalization (for participant A: 299 ms, for participant B: 399 ms) and not to the preferred rate estimate. We have now revised Figure 2B in a way that we think makes this much clearer.
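
      The participant A / participant B example above can be reproduced with a short sketch. It assumes, for illustration only, that stimulus IOIs are spaced 2 ms apart between 200 and 998 ms and that the population standard deviation is used for z-scoring; neither detail is stated in the response, and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score a list of values (population standard deviation assumed)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


# Assumed stimulus set: IOIs from 200 to 998 ms in 2 ms steps.
iois_ms = list(range(200, 1000, 2))


def faster_than(preferred_ms):
    """IOIs from trials faster than a participant's preferred rate estimate."""
    return [x for x in iois_ms if x < preferred_ms]


faster_a = faster_than(400)  # participant A: 200..398 ms
faster_b = faster_than(600)  # participant B: 200..598 ms

mean_a = mean(faster_a)  # raw IOI that maps onto z = 0 for participant A
mean_b = mean(faster_b)  # raw IOI that maps onto z = 0 for participant B

z_a = zscore(faster_a)
z_b = zscore(faster_b)
print(mean_a, mean_b)
```

      Under these assumptions the two means are 299 ms and 399 ms, matching the values quoted above, and both z-scored predictors are centred on zero even though the raw ranges differ.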

      The manuscript text includes clarification that the analyses included logistic regression and stimulus IOI was z-scored: 

      “In addition to estimating the preferred rate as stimulus rates with peak performance, we investigated whether accuracy increased as a function of detuning, namely, the difference between stimulus rate and preferred rate, as predicted by the entrainment models (Large, 1994; McAuley, 1995; Jones, 2018). We tested this prediction by assessing the slopes of mixed-effects logistic regression models, where accuracy was regressed on the IOI condition, separately for stimulus rates that were faster or slower than an individual’s preferred rate estimate. To do so, we first z-scored IOIs that were faster and slower than the participant’s preferred rate estimates, separately to render IOI scales comparable across participants.” (p. 7)

      While thinking through the reviewer’s comment, we realized we could improve this analysis by fitting mixed effects models separately to sessions’ data. In these models, fixed effects were z-scored IOI and ‘detuning direction’ (i.e., whether IOI was faster or slower than the participant’s preferred rate estimate). To control for variability across participants in the predicted interaction between z-scored IOI and direction, this interaction was added as a random effect. 

      “Ideally, they might want to include the average datapoints so it becomes easier to understand what is being fitted.”

      Although we agree with the reviewer that including average datapoints in a figure in addition to model predictions usually better illustrates what is being fitted than the fits alone, this doesn’t work well for logistic regression, since the dependent variable is binary. To better illustrate single-participant data, we instead fitted logistic models to each participant’s single-session datasets, separately for conditions where z-scored IOI from faster-than-preferred rate trials, and those from slower-than-preferred rate trials, predicted accuracy. From these single-participant models, we obtained slope values, which we refer to as ‘relative detuning slopes’, for each condition and session type. This analysis allowed us to illustrate the effect of relative detuning on accuracy for each participant. Figure 2B now shows each participant’s best-fit lines from each detuning direction condition and session.

      Since we now had relative detuning slopes for each individual (which we did not before), we took advantage of this to assess the relationship between oscillator flexibility and the oscillator’s behavior in different detuning situations (how strongly leaving the preferred rate hurt accuracy, as a proxy for the width of the Arnold tongue slice). Theoretically, flexible oscillators should be able to synchronize to a wide range of rates, not suffering in conditions where detuning is large (Pikovsky et al., 2003). Conversely, synchronization of inflexible oscillators should depend strongly on detuning. To test whether our flexibility measure predicted this dependence on detuning, which is a different angle on oscillator flexibility, we first averaged each participant’s detuning slopes across detuning directions (after sign-flipping one of them). Then, we assessed the correlation between the average detuning slopes and flexibility estimates, separately for conditions where |-𝚫IOI| or |+𝚫IOI| predicted accuracy. The results revealed significant negative correlations (Fig. 2F), suggesting that performance of individuals with less flexible oscillators suffered more as detuning increased. Note that flexibility estimates quantified how much accuracy decreased as a function of trial-to-trial changes in stimulus rate (±𝚫IOI). Thus, these results show that oscillators that were robust to changes in stimulus rate were also less dependent on detuning to be able to synchronize across a wide range of stimulus rates. We are excited to be able to provide this extra validation of predictions made by entrainment models.

      To revise the manuscript with the updated analysis on detuning:

      • We added the descriptions of the analyses to the Experiment 1 Methods section.

      Calculation of detuning slopes and their averaging procedure are in Preferred rate estimates:

      “In addition to estimating the preferred rate as stimulus rates with peak performance, we investigated whether accuracy increased as a function of detuning, namely, the difference between stimulus rate and preferred rate, as predicted by the entrainment models (Large, 1994; McAuley, 1995; Jones, 2018). We tested this prediction by assessing the slopes of mixed-effects logistic regression models, where accuracy was regressed on the IOI condition, separately for stimulus rates that were faster or slower than an individual’s preferred rate estimate. To do so, we first z-scored IOIs that were faster and slower than the participant’s preferred rate estimates, separately to render IOI scales comparable across participants. The detuning direction (i.e., whether stimulus IOI was faster or slower than the preferred rate estimate) was coded categorically. Accuracy (binary) was predicted by these variables (z-scored IOI, detuning direction), and their interaction. The model was fitted separately to datasets from random-order and linear-order sessions, using the fitglme function in MATLAB. Fixed effects were z-scored IOI and detuning direction and random effect was their interaction. We expected a systematic increase in performance toward the preferred rate, which would result in a significant interaction between stimulus rate and detuning direction. To decompose the significant interaction and to visualize the effects of detuning, we fitted separate models to each participant’s single-session datasets, and obtained slopes from each direction condition, hereafter denoted as the ‘relative-detuning slope’. We treated relative-detuning slope as an index of the magnitude of relative detuning effects on accuracy. We then evaluated these models, using the glmval function in MATLAB to obtain predicted accuracy values for each participant and session.
To visualize the relative-detuning curves, we averaged the predicted accuracies across participants within each session, separately for each direction condition (faster or slower than the preferred rate). To obtain a single value of relative-detuning magnitude for each participant, we averaged relative detuning slopes across direction conditions. However, since slopes from IOI > preferred rate conditions quantified an accuracy decrease as a function of detuning, we sign-flipped these slopes before averaging. The resulting average relative detuning slopes, obtained from each participant’s single-session datasets, quantified how much the accuracy increase towards preferred rate was dependent on, in other words, sensitive to, relative detuning.” (p. 7-8)
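
      The single-participant step described above (one logistic fit per detuning direction, then sign-flipping and averaging the two slopes) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: a plain Newton-Raphson fit stands in for the MATLAB fitting routines, and the simulated accuracy data, intercepts, and slopes are hypothetical.

```python
import math
import random


def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def average_detuning_slope(slope_faster, slope_slower):
    """Average the two slopes after sign-flipping the slower-than-preferred one."""
    return (slope_faster + (-slope_slower)) / 2.0


def simulate(z_iois, intercept, slope, rng):
    """Draw hypothetical binary accuracy values from a logistic model."""
    trials = []
    for z in z_iois:
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * z)))
        trials.append(1 if rng.random() < p else 0)
    return trials


rng = random.Random(0)
z = [-1.7 + 3.4 * i / 199 for i in range(200)]  # stand-in for z-scored IOIs
acc_faster = simulate(z, 1.0, 0.8, rng)   # accuracy rises toward the preferred rate
acc_slower = simulate(z, 1.0, -0.8, rng)  # accuracy falls away from the preferred rate

_, slope_faster = fit_logistic(z, acc_faster)
_, slope_slower = fit_logistic(z, acc_slower)
print(average_detuning_slope(slope_faster, slope_slower))
```

      A larger average indicates that accuracy depends more strongly on relative detuning in both directions.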

      • We added the information on the correlation analyses between average detuning slopes in Flexibility estimates.

      “We further tested the relationship between the flexibility estimates (𝛽 from models where |-𝚫IOI| or |+𝚫IOI| predicted accuracy) and average detuning slopes (see Preferred rate estimates) from random-order sessions. We predicted that flexible oscillators (larger 𝛽) would be less severely affected by detuning, and thus have smaller detuning slopes. Conversely, inflexible oscillators (smaller 𝛽) should have more difficulty in adapting to a large range of stimulus rates, and their adaptive abilities should be constrained around the preferred rate, as indexed by steeper relative detuning slopes.” (p. 8)

      • We provided the results in Experiment 1 Results section.

      “Logistic models assessing a systematic increase in accuracy toward the preferred rate estimate in each session type revealed significant main effects of IOI (linear-order session: 𝛽 = 0.264, p < .001; random-order session: 𝛽 = 0.175, p < .001), and significant interactions between IOI and direction (linear-order session: 𝛽 = -0.444, p < .001; random-order session: 𝛽 = -0.364, p < .001), indicating that accuracy increased as fast rates slowed toward the preferred rate (positive slopes) and decreased again as slow rates slowed further past the preferred rate (negative slopes), regardless of the session type. Fig. 2B illustrates the preferred rate estimation method for an example participant’s dataset and shows the predicted accuracy values from models fitted to each participant’s single-session datasets. Note that the main effect and interaction were obtained from mixed effects models that included aggregated datasets from all participants, whereas the slopes quantifying the accuracy increase as a function of detuning (i.e., relative detuning slopes) were from models fitted to single-participant datasets.” (p. 9-10)

      “We tested the relationship between the flexibility estimates and single-participant relative detuning slopes from random-order sessions (Fig. 2B). The results revealed negative correlations between the relative detuning slopes and flexibility estimates, both with 𝛽 (r(23) = -0.529, p = 0.007) from models where |-𝚫IOI| predicted accuracy (adapting to speeding-up trials), and 𝛽 (r(23) = -0.580, p = 0.002) from models where |+𝚫IOI| predicted accuracy (adapting to slowing-down trials). That is, the performance of individuals with less flexible oscillators suffered more as detuning increased. These results are shown in Fig. 2F.” (p. 10)
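
      For readers who want to see the shape of this correlation analysis, a minimal Python sketch follows. The five value pairs are made-up toy numbers, not the study's data, and the helper name is an assumption. In the usual r(df) notation, df = n - 2, so r(23) corresponds to 25 value pairs.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy, made-up values for five hypothetical participants (not the study's data).
flexibility = [-0.9, -0.6, -0.4, -0.2, -0.1]   # larger value = more flexible
detuning_slope = [1.4, 1.1, 0.9, 0.5, 0.4]     # larger value = more detuning-sensitive

r = pearson_r(flexibility, detuning_slope)
df = len(flexibility) - 2  # degrees of freedom reported as r(df)
print(r, df)
```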

      • We modified Figure 2. In Figure 2B, there are now separate subfigures with the z-scored IOI faster (left) or slower (right) than the preferred rate predicting accuracy. We illustrated the correlations between average relative detuning slopes and flexibility estimates in Figure 2F. 

      Author response image 1.

      Main findings of Experiment 1. A Left: Each circle represents a single participant’s preferred rate estimate from the random-order session (x axis) and linear-order session (y axis). The histograms along the top and right of the plot show the distributions of estimates for each session type. The dotted and dashed lines respectively represent 1:2 and 2:1 ratio between the axes, and the solid line represents one-to-one correspondence. Right: permutation test results. The distribution of summed residuals (distance of data points to the closest y=x, y=2*x and y=x/2 lines) of shuffled data over 1000 iterations, and the summed residual from original data (dashed line) that fell below .008 of the permutation distribution. B Top: Illustration of the preferred rate estimation method from an example participant’s linear-order session dataset. Estimates were the stimulus rates (IOI) where smoothed accuracy (orange line) was maximum (arrow). The dotted lines originating from the IOI axis delineate the stimulus rates that were faster (left, IOI < preferred rate) and slower (right, IOI > preferred rate) than the preferred rate estimate and expand those separate axes, the values of which were Z-scored for the relative-detuning analysis. Bottom: Predicted accuracy, calculated from single-participant models where accuracy in random-order (purple) and linear-order (orange) sessions was predicted by z-scored IOIs that were faster than a participant’s preferred rate estimate (left), and by those that were slower (right). Thin lines show predicted accuracy from single-participant models, solid lines show the averages across participants and the shaded areas represent standard error of the mean. Predicted accuracy is maximal at the preferred rate and decreases as a function of detuning. C Average accuracy from random-order (left, purple) and linear-order (right, orange) sessions. Each circle represents a participant’s average accuracy. D Flexibility estimates. 
Each circle represents an individual’s slope (𝛽) obtained from logistic models, fitted separately to conditions where |-𝚫IOI| (left, green) or |+𝚫IOI| (right, blue) predicted accuracy, with greater values (arrow’s direction) indicating better oscillator flexibility. The means of the distributions of 𝛽 from both conditions were smaller than zero (dashed line), indicating a negative effect of between-trial absolute rate change on accuracy. E Participants’ average bias from |-𝚫IOI| (green), and |+𝚫IOI| (blue) conditions in random-order (left) and linear-order (right) sessions. Negative bias indicates underestimation of the comparison intervals, positive bias indicates the opposite. Box plots in C-E show median (black vertical line), 25th and 75th percentiles (box edges) and extreme datapoints (whiskers). In C and E, empty circles show outlier values that remained after data cleaning procedures. F Correlations between participants’ average relative detuning slopes, indexing the steepness of the increase in accuracy towards the preferred rate estimate (from panel B), and flexibility estimates from |-𝚫IOI| (top, green), and |+𝚫IOI| (bottom, blue) conditions (from panel C). Solid black lines represent the best-fit line, dashed lines represent 95% confidence intervals.

      • We discussed the results in General Discussion and emphasized that only entrainment models, compared to timekeeper models, predict a relationship between detuning and accuracy that is amplified by oscillator’s inflexibility: “we observed systematic increases in task accuracy (Experiment 1) toward the best-performance rates (i.e., preferred rate estimates), with the steepness of this increase being closely related to the effects of rate change (i.e., oscillator flexibility). Two interdependent properties of an underlying system together modulating an individual’s timing responses show strong support for the entrainment approach” (p. 24)

      “As a side note, colours blue/green have a different meaning in 2B than 2D and E, which might be confusing.” 

      Upon the reviewer’s recommendation, we changed the color scale across Figure 2, such that colors refer to the same set of conditions across all panels. 

      (2) Second, I don't understand the additional harmonic relationship analyses in the appendix, and I suspect other readers will not either. As with the previous point, it is not my view that the analyses are faulty or inadequate, it is rather that the lack of clarity makes it challenging to evaluate whether they support an entrainment model or not. 

      We decided to remove the analysis that was based on a circular approach, and we have clarified the analysis that was based on a modular approach by giving example cases: 

      “We first calculated how much the slower estimate (larger IOI value) diverts, proportionally from the faster estimate (smaller IOI value) or its multiples (i.e., harmonics) by normalizing the estimates from both sessions by the faster estimate. The outcome measure was the modulus of the slower, with respect to the faster estimate, divided by the faster estimate, described as mod(max(X), min(X))/min(X) where X = [session1_estimate session2_estimate]. An example case would be a preferred rate estimate of IOI = 603 ms from the linear-order session and an estimate of IOI = 295 ms from the random-order session. In this case, the slower estimate (603 ms) diverts from the multiple of the faster estimate (295*2 = 590 ms) by 13 ms, a proportional deviation of 4% of the faster estimate (295 ms). The outcome measure in this example is calculated as mod(603,295)/295 = 0.04.” (Supplementary Information, p. 2)
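
      The outcome measure quoted above, mod(max(X), min(X))/min(X), can be written as a short Python sketch. The function name is an assumption; the numbers are the example case from the quoted text.

```python
def harmonic_deviation(estimate_a, estimate_b):
    """Proportional deviation of the slower estimate from a multiple of the faster one.

    Both arguments are preferred-rate estimates expressed as IOIs in ms.
    """
    faster = min(estimate_a, estimate_b)  # smaller IOI
    slower = max(estimate_a, estimate_b)  # larger IOI
    return (slower % faster) / faster


# Example from the text: 603 ms (linear-order) vs 295 ms (random-order).
deviation = harmonic_deviation(603, 295)
print(603 % 295)            # 13 ms beyond 2 * 295 = 590 ms
print(round(deviation, 2))  # 0.04
```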

      Crucially, the ability of oscillators to respond to harmonically-related stimulus rates is a main distinction between entrainment and interval (timekeeper) models. In the current study, we found that each participant’s best-performance rates, the preferred rate estimates, had harmonic relationships. The additional analyses further showed that these harmonic relationships were not due to chance. This finding speaks against the interval (timekeeper) approaches and is maximally compatible with the entrainment framework. 

      Here are a number of questions I would like to list to sketch my confusion: 

      • The authors write: "We first normalized each participant's estimates by rescaling the slower estimate with respect to the faster one and converting the values to radians". Does slower estimate mean: "task accuracy in those trials in which IOI was slower than a participant's preferred frequency"? 

      Preferred rate estimates were stimulus rates (IOI) with best performance, as described in Experiment 1 Methods section. 

      “We conceptualized individuals' preferred rates as the stimulus rates where duration-discrimination accuracy was highest. To estimate preferred rate on an individual basis, we smoothed response accuracy across the stimulus-rate (IOI) dimension for each session type, using the smoothdata function in Matlab. Estimates of preferred rate were taken as the smoothed IOI that yielded maximum accuracy” (p. 7). 
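
      The estimation step in this quote can be illustrated with a minimal Python sketch. It assumes a simple moving average with windows that shrink at the endpoints as a stand-in for the smoothdata function; the window length, the IOI grid, and the noise-free accuracy curve are hypothetical choices made for illustration only.

```python
import math


def moving_average(values, window=5):
    """Centered moving average; the window shrinks near the endpoints."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def preferred_rate(iois, accuracy, window=5):
    """Return the IOI at which smoothed accuracy is maximal."""
    smoothed = moving_average(accuracy, window)
    best = max(range(len(iois)), key=lambda i: smoothed[i])
    return iois[best]


# Hypothetical accuracy curve peaking at IOI = 500 ms on a 2-ms grid.
iois = list(range(200, 1000, 2))
accuracy = [0.5 + 0.4 * math.exp(-0.5 * ((ioi - 500) / 150) ** 2) for ioi in iois]
estimate = preferred_rate(iois, accuracy)
print(estimate)
```

      With real binary responses, accuracy per IOI would first be averaged or smoothed over trials, and the estimate would depend on the smoothing window.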

      The estimation method and the resulting estimate for an example participant were provided in Figure 2B. The updated figure in the current revision has this illustration only for the linear-order session. 

      “Estimates were the stimulus rates (IOI) where smoothed accuracy (orange line) was maximum (arrow)” (Figure caption, p. 9).

      • "We reasoned that values with integer-ratio relationships should correspond to the same phase on a unit circle". What is values here; IOI, or accuracy values for certain IOIs? And why should this correspond to the same phase? 

      We removed the analysis on integer-ratio relationships that was based on a circular approach that the reviewer is referring to here. We clarified the analysis that was based on a modular approach and avoided using the term ‘values’ without specifying what values corresponded to.

      • Do "integer-ratio relationships" have to do with the y=x, y=x*2 and y=x/2 relationships of the other analyses?

      Integer-ratio relationships indeed refer to y=x, y=x*2 and y=x/2 relationships. For example, if a number y is double of another number x (y = x*2), these values have an integer-ratio relationship, since 2 is an integer. This holds true also for the case where y = x/2 since x = y*2. 

      • Supplementary Figure S2c shows a distribution of median divergences resulting from the modular approach. The p-value is 0.004 but the dashed line appears to be at a much higher percentile of the distribution. I find this hard to understand. 

      We thank the reviewer for a detailed inspection of all figures and information in the manuscript. The reviewer’s comment led us to realize that this figure had an error. We updated the figure in Supplementary Information (Supplementary Figure S2). 

      Reviewer #2 (Public Review):

      To get a better understanding of the mechanisms underlying the behavioral observations, it would have been useful to compare the observed pattern of results with simulations done with existing biophysical models. However, this point is addressed if the current study is read along with this other publication of the same research group: Kaya, E., & Henry, M. J. (2024, February 5). Modeling rhythm perception and temporal adaptation: top-down influences on a gradually decaying oscillator. https://doi.org/10.31234/osf.io/q9uvr 

      We agree with the reviewer that the mechanisms underlying behavioral responses can be better understood by modeling approaches. We thank the reviewer for acknowledging our computational modeling study that addressed this concern. 

      Reviewer #2 (Recommendations):

      I very much appreciate the thorough work done by the authors in assessing all reviewers' concerns. In this new version they clearly state the assumptions to be tested by their experiments, added extra analyses further strengthening the conclusions and point the reader to a neurocomputational model compatible with the current observations. 

      I only regret that the authors misunderstood the take home message of our Essay (Doelling & Assaneo 2021). Despite this being obviously out of the scope of the current work, I would like to take this opportunity to clarify this point. In that paper, we adopted a Stuart-Landau model not to determine how an oscillator should behave, but as an example to show that some behaviors usually used to prove or refute an underlying "oscillator like" mechanism can be falsified. We obviously acknowledge that some of the examples presented in that work are attainable by specific biophysical models, as explicitly stated in the essay: "There may well be certain conditions, equations, or parameters under which some of these commonly held beliefs are true. In that case, the authors who put forth these claims must clearly state what these conditions are to clarify exactly what hypotheses are being tested." 

      This work did not mean to delineate what an oscillator is (or is not), but to stress the importance of explicitly introducing biophysical models to be tested instead of relying on vague definitions sometimes reflecting the researchers' own beliefs. The take home message that we wanted to deliver to the reader appears explicitly in the last paragraph of that essay: "We believe that rather than concerning ourselves with supporting or refuting neural oscillators, a more useful framework would be to focus our attention on the specific neural dynamics we hope to explain and to develop candidate quantitative models that are constrained by these dynamics. Furthermore, such models should be able to predict future recordings or be falsified by them. That is to say that it should no longer be sufficient to claim that a particular mechanism is or is not an oscillator but instead to choose specific dynamical systems to test. In so doing, we expect to overcome our looping debate and to ultimately develop-by means of testing many model types in many different experimental conditions-a fundamental understanding of cognitive processes and the general organization of neural behavior." 

      We appreciate the reviewer’s clarification of the take-home message from Doelling and Assaneo (2021). We concur with the assertions made in this essay, particularly regarding the benefits of employing computational modeling approaches. Such methodologies provide a nuanced and well-structured foundation for theoretical predictions, thereby minimizing the potential for reductionist interpretations of behavioral or neural data.

      In addition, we would like to underscore the significance of delineating the level of analysis when investigating the mechanisms underlying behavioral or neural observations. The current study or Kaya & Henry (2024) involved no electrophysiological measures. Thus, we would argue that the appropriate level of analysis across our studies concerns the theoretical mechanisms rather than how these mechanisms are implemented on the neural (physical) level. In both studies, we aimed to explore or approximate the theoretical oscillator that guides dynamic attention rather than the neural dynamics underlying these theoretical processes. That is, theoretical (attentional) entrainment may not necessarily correspond to neural entrainment, and differentiating these levels could be informative about the parallels and differences between these levels. 

      References

      Doelling, K. B., & Assaneo, M. F. (2021). Neural oscillations are a start toward understanding brain activity rather than the end. PLoS Biol, 19(5), e3001234. https://doi.org/10.1371/journal.pbio.3001234 

      Jones, M. R. (2018). Time will tell: A theory of dynamic attending. Oxford University Press. 

      Kaya, E., & Henry, M. J. (2024). Modeling rhythm perception and temporal adaptation: top-down influences on a gradually decaying oscillator. PsyArXiv. https://doi.org/10.31234/osf.io/q9uvr 

      Large, E. W. (1994). Dynamic representation of musical structure. The Ohio State University. 

      McAuley, J. D. (1995). Perception of time as phase: Toward an adaptive-oscillator model of rhythmic pattern processing. Indiana University Bloomington. 

      Pikovsky, A., Rosenblum, M., & Kurths, J. (2003). Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge University Press.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank you for the time you took to review our work and for your feedback! 

      The major changes to the manuscript are:

      (1) We have added visual flow speed and locomotion velocity traces to Figure 5 as suggested.

      (2) We have rephrased the abstract to more clearly indicate that our statement regarding acetylcholine enabling faster switching of internal representations in layer 5 is speculative.

      (3) We have further clarified the positioning of our findings regarding the basal forebrain cholinergic signal in visual cortex in the introduction.

      (4) We have added a video (Video S1) to illustrate different mouse running speeds covered by our data.

      A detailed point-by-point response to all reviewer concerns is provided below.

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed most of the concerns raised in the initial review. While the paper has been improved, there are still some points of concern in the revised version. 

      Major comments

      (1) Page 1, Line 21: The authors claim, "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, enabling faster switching between internal representations during locomotion." However, it is not clear which specific data or results support the claim of "switching between internal representations." ... 

      Authors' response: "... That acetylcholine enables a faster switching between internal representations in layer 5 is a speculation. We have attempted to make this clearer in the discussion. ..." 

      In the revised version, there is no new data added to directly support the claim - "Our results suggest acetylcholine ..., enabling faster switching between internal representations during locomotion" (in the abstract). The authors themselves acknowledge that this statement is speculative. The present data only demonstrate that ACh reduces the response latency of L5 neurons to visual stimuli, but not that ACh facilitates quicker transitions in neuronal responses from one visual stimulus to another. To maintain scientific rigor and clarity, I recommend the authors amend this sentence to more accurately reflect the findings. 

      This might be a semantic disagreement? We would argue both a gray screen and a grating are visual stimuli. Hence, we are not sure we understand what the reviewer means by “but not that ACh facilitates quicker transitions in neuronal responses from one visual stimulus to another”. We concur, our data only address one of many possible transitions, but it is a switch between distinct visual stimuli that is sped up by ACh. Nevertheless, we have rephrased the sentence in question by changing “our data suggest” to “based on this we speculate” - but are not sure whether this addresses the reviewer’s concern.  

      (2) Page 4, Line 103: "..., a direct measurement of the activity of cholinergic projection from basal forebrain to the visual cortex during locomotion has not been made." This statement is incorrect. An earlier study by Reimer et al. indeed imaged cholinergic axons in the visual cortex of mice running on a wheel. 

      Authors' response: "We have clarified this as suggested. However, we disagree slightly with the reviewer here. The key question is whether the cholinergic axons imaged originate in basal forebrain. While Reimer et al. 2016 did set out to do this, we believe a number of methodological considerations prevent this conclusion: ... Collins et al. 2023 inject more laterally and thus characterize cholinergic input to S1 and A1, ..."

      The authors pointed out some methodological caveats in previous studies that measured the BF input in V1, and I agree with them on several points. Nonetheless, the statement that "a direct measurement of the activity of cholinergic projection from basal forebrain to visual cortex during locomotion has not been made. ... Prior measurements of the activity of cholinergic axons in visual cortex have all relied on data from a cross of ChAT-Cre mice with a reporter line ..." (Page 4, Line 103) seems to be an oversimplification. In fact, contrary to what the authors noted, Collins et al. (2023) conducted direct imaging of BF cholinergic axons in V1 (Fig. 1) - "Selected axon segments were chosen from putative retrosplenial, somatosensory, primary and secondary motor, and visual cortices". They used a viral approach to express GCaMP in BF axons to bypass the limitations associated with the use of a GCaMP reporter mouse line - "Viral injections were used for BF- ACh studies to avoid imaging axons or dendrites from cholinergic projections not arising from the BF (e.g. cortical cholinergic interneurons)." The authors should reconsider the text. 

      The reason we think that our statement here was – while simplified – accurate, is that Collins et al. do record from cholinergic axons in V1, but they don’t show these data (they only show pooled data across all recording sites). By superimposing the recording locations of the Collins paper on the Allen mouse brain atlas (Figure R1), we estimate that of the approximately 50 recording sites, most are in somatosensory and somatomotor areas of cortex, and only 1 appears to be in V1, something that is often missed as it is not really highlighted in that paper. If this is indeed correct, we would argue that the data in the Collins et al. paper are not representative of cholinergic activity in visual cortex (we fear only the authors would know for sure). Nevertheless, we have rephrased again. 

      Author response image 1.

      Overlay of the Collins et al. imaging sites (red dots, black outline and dashed circle) on the Allen mouse brain atlas (green shading). Very few (we estimate that it was only 1) of the recording sites appear to be in V1 (the lightest green area), and maybe an additional 4 appear to be in secondary visual areas.  

      Minor comments

      (1) It is unclear which BF subregion(s) were targeted in this study. 

      Authors' response: Thanks for pointing this out. We targeted the entire basal forebrain (medial septum, vertical and horizontal limbs of the diagonal band, and nucleus basalis) with our viral injections. ... We have now added the labels for basal forebrain subregions targeted next to the injection coordinates in the manuscript. 

      The authors provided the coordinates for their virus injections targeting the BF subregions - "(AP, ML, DV (in mm): ... ; +0.6, +0.6, -4.9 (nucleus basalis) ..." Are these the right coordinates for the nucleus basalis? 

      Thank you for catching this - this was indeed incorrect. The coordinates were correct, but our annotation of brain region was not (as the reviewer correctly points out, these coordinates are in the horizontal limb of the diagonal band, not the nucleus basalis). We have corrected this.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for addressing most of the points raised in my original review. I still have some concerns relating to the analysis of the data. 

      (1) I appreciate the authors' point that getting mice to reliably run during head-fixed recordings can require training. Since mice in this study were not trained to run, their low speed of locomotion limits the interpretation of the results. I think this is an important potential caveat and I have retained it in the public review. 

      This might be a misunderstanding. The Jordan paper was a bit of an outlier in that we needed mice to run at very high rates due to the fact that our recording times were only minutes. Mice were chosen such that they would more or less continuously run, to maximize the likelihood that they would run during the intracellular recordings. This was what we tried to convey in our previous response. The speed range covered by the analysis in this paper is 0 cm/s to 36 cm/s. 36 cm/s is not far away from the top speed mice can reach on this treadmill (30 cm/s is 1 revolution of the treadmill per second). In our data, the top speed we measured across all mice was 36 cm/s. In the Jordan paper, the peak running speed across the entire dataset was 44 cm/s. Based on the reviewer’s comment, we suspect that the reviewer may be under the impression that 30 cm/s is a relatively slow running speed. To illustrate what this looks like, we have added a video (Video S1) showing different running speeds. 

      (2) The majority of the analyses in the revised manuscript focus on grand average responses, which may mask heterogeneity in the underlying neural populations. This could be addressed by analysing the magnitude and latency of responses for individual neurons. For example, if I understand correctly, the analyses include all neurons, whether or not they are activated, inhibited, or unaffected by visual stimulation and locomotion. For example, while on average layer 2/3 neurons are suppressed by the grating stimulus (Figure 4A), presumably a subset are activated. Evaluating the effects of optogenetic stimulation and locomotion without analyzing them at the level of individual neurons could result in misleading conclusions. This could be presented in the form of a scatter plot, depicting the magnitude of neuronal responses in locomotion vs stationary condition, and opto+ vs no opto conditions. 

      We might be misunderstanding. The first part of the comment is not specific enough to address directly. In cases in which we find the variability is relevant to our conclusions, we do show this for individual cells (e.g., the latencies to running onset are shown as histograms for all cells and axons in Figure S1). It is also unclear to us what the reviewer means by “Evaluating the effects of optogenetic stimulation and locomotion without analyzing them at the level of individual neurons could result in misleading conclusions”. Our conclusions relate to the average responses in L2/3, consistent with the analysis shown. All data will be freely available for anyone to perform follow-up analysis of things we may have missed. For example, the specific suggestion of presenting the data shown in Figure 4 as a scatter plot is shown below (Figure R2). This is something we had looked at but found not to be relevant to our conclusions. The problem with this analysis is that it is difficult to estimate how much the different sources of variability contribute to the total variability observed in the data, and no interesting pattern is clearly apparent. All relevant and clear conclusions are already captured by the mean differences shown in Figure 4.

      Author response image 2.

      Optogenetic activation of cholinergic axons in visual cortex primarily enhances responses of layer 5, but not layer 2/3 neurons. Related to Figure 4. (A) Average calcium response of layer 2/3 neurons in visual cortex to full field drifting grating in the absence or presence of locomotion. Each dot is the average calcium activity of an individual neuron during the two conditions. (B) As in A, but for layer 5 neurons. (C) As in A, but comparing the average response while the mice were stationary, to that while cholinergic axons were optogenetically stimulated. (D) As in C, but for layer 5 neurons. (E) Average calcium response of layer 2/3 neurons in visual cortex to visuomotor mismatch, without and with optogenetic stimulation of cholinergic axons in visual cortex. (F) As in E, but for layer 5 neurons. (G) Average calcium response of layer 2/3 neurons in visual cortex to locomotion onset in closed loop, without and with optogenetic stimulation of cholinergic axons in visual cortex. (H) As in G, but for layer 5 neurons.
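
      The per-neuron comparison described in this caption can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `per_neuron_means`, the trial-by-neuron list layout, and the toy numbers are all assumptions made for the example.

```python
def per_neuron_means(responses, is_condition):
    """Return one (mean_without, mean_with) point per neuron.

    responses[t][n] is the response of neuron n on trial t (hypothetical layout);
    is_condition[t] is True when the condition (e.g. locomotion) was present.
    """
    n_neurons = len(responses[0])
    points = []
    for n in range(n_neurons):
        without = [row[n] for row, flag in zip(responses, is_condition) if not flag]
        with_ = [row[n] for row, flag in zip(responses, is_condition) if flag]
        points.append((sum(without) / len(without), sum(with_) / len(with_)))
    return points


# Toy data (made up): 4 trials x 2 neurons, last two trials with the condition.
responses = [[1.0, 0.5], [3.0, 0.5], [2.0, 1.5], [4.0, 0.5]]
is_condition = [False, False, True, True]
points = per_neuron_means(responses, is_condition)

# Number of neurons lying above the unity line of the scatter plot.
n_enhanced = sum(1 for without, with_ in points if with_ > without)
```

      Each tuple in `points` corresponds to one dot in a scatter plot of this kind; counting dots above the unity line is one simple way to summarize heterogeneity that a grand average would hide.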

      (3) To help the reader understand the experimental conditions in open loop experiments, please include average visual flow speed traces for each condition in Figure 5. 

      We have added the locomotion velocity and visual flow speeds to the corresponding conditions in Figure

    1. Author response: 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Meissner-Bernard et al present a biologically constrained model of telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing.

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks.

      Strengths:

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models.

      (1) The authors have done a good job of gathering the current experimental information to inform a biological-constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation.

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Weaknesses:

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model.

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper.

      We agree with the reviewer that a mechanistic analysis of manifold geometry is of high interest and we will address this issue in our revisions. We are currently exploring approaches to better understand how amplification of activity is controlled in E/I assemblies, and how geometric modifications can be described in terms of elementary excitatory and inhibitory interactions. We expect these approaches to provide new mechanistic insights into representational manifolds.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all the parameters. For instance, it is not clear what the justification for the choice of synaptic time scales is (with E synaptic time constants being larger than inhibition: tau_syn_i = 10 ms, tau_syn_E = 30 ms). How would the results change if these - and other unconstrained - parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.

      We varied neuronal and network parameters in the past and we are currently performing additional systematic parameter variations to further address this comment. Preliminary results indicate that networks with similar properties can be obtained with equal synaptic time constants and biophysical parameters for all E and I neurons, thus supporting the notion that representational geometry is determined primarily by connectivity. Results of parameter variations will be reported in the revised manuscript.

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning.

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function.

      We agree that further insights into potential benefits of manifold representations would be interesting. In the initial manuscript we performed analyses of pattern classification primarily to examine whether the structured E/I networks studied here can support pattern classification at all, given that they do not exhibit discrete attractor states or global pattern completion. As structured E/I networks still support pattern classification when activity is read out from neuronal subsets, we concluded that structured E/I networks are not in conflict with the general notion of pattern classification by autoassociation. In addition, manifold representations may support a variety of other computations that we discussed only superficially. In the revised manuscript we are planning to address this issue in more depth by additional discussion and analyses. In particular, we are planning to address the hypothesis that manifold geometry provides a continuous distance metric to analyze relationships between inputs and relevant stimuli (learned odors) in the presence of irrelevant stimulus components (non-learned odors).

      Reviewer #2 (Public Review):

      Summary:

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks.

      Strengths:

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. Particularly, the result regarding the projection of input patterns onto local manifolds and continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper.

      Weaknesses:

      Intuitively, classification (decodability) in discrete attractor networks is much better than in networks that have continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. The authors may consider designing a decoder to quantify and compare the classification performance of all four networks.

      As suggested by the reviewer, we will explicitly examine decodability by different types of networks in the revised manuscript.

      Networks featuring E/I assemblies could potentially represent multistable attractors by exploring the parameter space for their reciprocal connectivity and connectivity with the rest of the network. However, for co-tuned E-I networks, the scope for achieving multistability is relatively constrained compared to networks employing global or lateral inhibition between assemblies. It would be good if the authors mentioned this in the discussion. Also, the fact that reciprocal inhibition increases network stability has been shown before and should be cited in the statements addressing network stability (e.g., some of the citations in the manuscript, including Rost et al. 2018, Lagzi & Fairhall 2022, and Vogels et al. 2011 have shown this).

      We thank the reviewer for this comment and will revise the manuscript accordingly.

      Providing raster plots of the pDp network for familiar and novel inputs would help with understanding the claims regarding continuous versus discrete representation of inputs, allowing readers to visualize the activity patterns of the four different networks (similar to Figure 1B).

      We will follow the suggestion by the reviewer and include raster plots of responses to both familiar and novel inputs in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      This work investigates the computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli.

      Strengths:

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies exploring the functional advantage of these locally defined manifolds, and how other network properties shape those manifolds.

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor stimulus triggered spiking activity is within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions.

      Weaknesses:

      I find the point about pattern completion a bit confusing. In Fig. 3 the authors argue that only the Scaled I network can lead to pattern completion for morphed inputs since the output correlations are higher than the input correlations. For me, this sounds less like the network can perform pattern completion but it can nonlinearly increase the output correlations. Furthermore, in Suppl. Fig. 3 the authors show that activating half the assembly does lead to pattern completion in the sense that also non-activated assembly cells become highly active and that this pattern completion can be seen for Scaled I, Tuned E+I, and Tuned I networks. These two results seem a bit contradictory to me and require further clarification, and the authors might want to clarify how exactly they define pattern completion.

      We believe that this comment concerns a semantic misunderstanding and apologize for any lack of clarity. The reviewer is correct that “pattern completion” in morphing experiments can be described as a nonlinear increase in output correlations in response to related inputs. This is different from the results obtained by simulated current injections because currents were targeted to subsets of assembly neurons and the analysis focused on firing rates within and outside assemblies. We referred to results of both experiments as “pattern completion” because this has been standard in the neurobiological and in the computer science literature, respectively. However, we agree that this can cause confusion and we will revise the manuscript to clarify this issue.

      The authors argue that Tuned E+I networks have several advantages over Scaled I networks. While I agree with the authors that in some cases adding this localized E/I balance is beneficial, I believe that a more rigorous comparison between Tuned E+I networks and Scaled I networks is needed: quantification of variance (Fig. 4G) and angle distributions (Fig. 4H) should also be shown for the Scaled I network. Similarly in Fig. 5, what is the Mahalanobis distance for Scaled I networks and how well can the Scaled I network be classified compared to the Tuned E+I network? I suspect that the Scaled I network will actually be better at classifying odors compared to the E+I network. The authors might want to speculate about the benefit of having networks with both sources of inhibition (local and global) and hence being able to switch between locally defined manifolds and discrete attractor states.

      As pointed out already in response to reviewer 1, we agree that the potential computational benefits of continuous manifold representations in comparison to discrete attractor states is an important point that merits further exploration and discussion. We are therefore planning to include a more in-depth discussion and to perform further analyses. The specific suggestions of the reviewer will be addressed.
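
      To make the classification comparison discussed above concrete, the following is a minimal sketch of a nearest-class decoder based on the Mahalanobis distance. It is an illustration under stated assumptions, not the analysis used in the manuscript: the function names `mahalanobis` and `classify`, the use of a pseudo-inverse of the sample covariance, and the synthetic Gaussian "activity patterns" are all hypothetical choices made for this example.

```python
import numpy as np


def mahalanobis(x, samples):
    """Mahalanobis distance between pattern x and the distribution of `samples`.

    samples has shape (n_patterns, n_neurons). The covariance is estimated from
    the samples; a pseudo-inverse is used so that a rank-deficient covariance
    does not raise an error (an assumption of this sketch).
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))


def classify(x, classes):
    """Assign x to the class whose reference patterns are closest."""
    return min(classes, key=lambda name: mahalanobis(x, classes[name]))


# Synthetic reference patterns for two hypothetical learned odors (made-up data).
rng = np.random.default_rng(0)
odor_a = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
odor_b = rng.normal(loc=3.0, scale=1.0, size=(200, 5))
classes = {"odor_a": odor_a, "odor_b": odor_b}

# A test pattern close to the centre of the odor_b distribution.
label = classify(np.full(5, 2.9), classes)
```

      Running the same decoder on activity from each of the four network types, with the same inputs and the same neuronal subsets, would give a directly comparable measure of decodability.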

      At a few points in the manuscript, the authors use statements without actually providing evidence in terms of a Figure. Often the authors themselves acknowledge this, by adding the term "not shown" to the end of the sentence. I believe it will be helpful to the reader to be provided with figures or panels in support of the statements.

      Thank you for this comment. We shall be happy to include additional data figures in the revised manuscript.

    1. Author response:

      eLife assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. We wish to emphasize that both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although we failed to sufficiently emphasize this in the main text.

      We will extend the benchmarking to more TI methods and we will improve the results and discussion sections to present those facts more clearly to the reader.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.
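
      Steps (i) and (ii) of this summary can be illustrated on a toy graph. The sketch below is not tviblindi's implementation: the function names are hypothetical, the graph is a hand-written adjacency list rather than a k-NN graph built from data, pseudotime is assumed here to be the expected number of steps a simple random walk needs to first reach each node from the origin, and the linear systems are solved by plain Gauss-Seidel iteration. Step (iii), clustering of walks via persistent homology, is not shown.

```python
import random


def hitting_time_pseudotime(neighbors, origin, tol=1e-12, max_sweeps=100_000):
    """Expected number of steps from `origin` to first reach each node.

    neighbors[x] lists the nodes adjacent to x in a connected undirected graph.
    For each target, h(target) = 0 and h(x) = 1 + mean of h over the neighbors
    of x; the system is solved by Gauss-Seidel sweeps.
    """
    n = len(neighbors)
    pseudotime = [0.0] * n
    for target in range(n):
        if target == origin:
            continue
        h = [0.0] * n
        for _ in range(max_sweeps):
            change = 0.0
            for x in range(n):
                if x == target:
                    continue
                new = 1.0 + sum(h[y] for y in neighbors[x]) / len(neighbors[x])
                change = max(change, abs(new - h[x]))
                h[x] = new
            if change < tol:
                break
        pseudotime[target] = h[origin]
    return pseudotime


def orient_edges(neighbors, pseudotime):
    """Keep only edges that point towards strictly larger pseudotime (a DAG)."""
    return [[v for v in nbrs if pseudotime[v] > pseudotime[u]]
            for u, nbrs in enumerate(neighbors)]


def random_walk(dag, origin, rng):
    """Walk along oriented edges until a node without outgoing edges is reached."""
    walk = [origin]
    while dag[walk[-1]]:
        walk.append(rng.choice(dag[walk[-1]]))
    return walk


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the cell of origin.
path = [[1], [0, 2], [1, 3], [2]]
pseudotime = hitting_time_pseudotime(path, origin=0)
dag = orient_edges(path, pseudotime)
walk = random_walk(dag, 0, random.Random(0))
```

      On this path graph the pseudotime grows quadratically with distance from the origin, and every simulated walk ends at node 3, the only node without outgoing edges. On a branching graph, walks would end at different terminal nodes, which is what makes them candidates for grouping into trajectories.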

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point-by-point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary note section 8.2: Comparison to state-of-the-art methods, namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019)). We will add thorough and systematic comparisons to the other algorithms mentioned by reviewers. We will include extended evaluation on publicly available datasets.

      Also, we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (manuscript in revisions) and double negative T-cell development in ALPS (Autoimmune Lymphoproliferative Syndrome) by mass cytometry (project in progress).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      In order to satisfy both reader types: the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We will improve the Results text to better point the reader to the mathematical foundations in the Supplementary Note.

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we applied the framework successfully on imputed 2000 dimensional data.

      The idea behind tviblindi is to be able to work without the necessity to use non-linear dimensionality reduction techniques, which reduce the dimensionality to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand, the use of linear dimensionality reduction techniques such as PCA, which effectively suppress noise in the data, is good practice (see also response to reviewer 2). We will emphasize this in the revised version and add the results of the corresponding analysis.

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of the topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k does, however, affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We will expand the “sensitivity to hyperparameters” section, also in response to reviewer 2.
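
      The kind of stability check described here can be sketched as a rank correlation between two pseudotime vectors computed for the same cells. The code below is a generic illustration, not the authors' evaluation script: the helper names are hypothetical and the two pseudotime lists are made-up toy values, not results from the manuscript.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy pseudotime values for the same five cells under two settings of k.
pseudotime_k10 = [0.0, 1.2, 2.9, 4.1, 7.5]
pseudotime_k50 = [0.0, 1.0, 3.5, 3.9, 9.0]
rho = spearman(pseudotime_k10, pseudotime_k50)
```

      Because the Spearman coefficient depends only on the ordering of cells, a value close to 1 indicates that changing k leaves the pseudotemporal ordering essentially unchanged, even if the absolute pseudotime values differ.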

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point-by-point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data.

      This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, by allowing for user bias leading to possible overfitting for a specific data.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution on a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high dimensional point cloud. At all times tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the KNN-graph etc.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., published in Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, published in Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data and comparison than the very many trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical in real world data (as was even demonstrated in the little bit of application to real world data provided).

      We provide comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare them to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We use two datasets: the artificial Dyntoy dataset and a real mass cytometry thymus+peripheral blood dataset. We thank the reviewer for suggesting specific methods. CellRank was excluded from the benchmarking as it was originally designed for RNA-velocity data (not available in mass cytometry data), but we will include the recent upgrade CellRank2 (preprint at doi.org/10.1101/2023.07.19.549685), which offers more flexibility.

      We will add further benchmarking as suggested by the reviewer in the course of revisions.

      Beyond the general lack of benchmarking there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as how to ensure data-driven inference and avoid over-fitting, would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved, and we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but they are an (unbiased) representation of what the data look like based on the diffusion model. This is a property of the data (or of the panel design) that has to be interpreted properly, rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality), the other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see whether there is additional structure of interest and/or relevance. This investigation is still data-driven (although not fully automated). Anecdotally, in this particular case the branching was discovered by a bioinformatician who knew nothing about the presence of beta-selection in the data.

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected again at a later moment (immediately, from the point of view of the beta-selection failure trajectory), is a challenge for TI algorithms, and none of the tested methods gave a correct result. More importantly, there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default).
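
      The display rule described above can be illustrated with a minimal sketch. This is not tviblindi code: the function name `visible_branches`, the branch labels, and the toy counts are all hypothetical, and "zooming in" is modeled here simply as restricting the analysis to a subset of the simulated walks. Under those assumptions, it shows how a 10% cut-off can hide a small branch until the denominator changes.

```python
from collections import Counter

def visible_branches(walk_labels, min_fraction=0.10):
    """Return {branch: fraction} for branches holding at least
    `min_fraction` of the walks (hypothetical display rule)."""
    total = len(walk_labels)
    if total == 0:
        return {}
    counts = Counter(walk_labels)
    return {
        branch: n / total
        for branch, n in counts.items()
        if n / total >= min_fraction
    }

# Toy data (assumed): 1000 walks, only 60 of which follow the rare branch.
walks = (
    ["conventional"] * 700
    + ["apoptosis_late"] * 240
    + ["apoptosis_early"] * 60
)
full_view = visible_branches(walks)

# Restricting to the apoptotic walks changes the denominator,
# so the rare branch now exceeds the 10% threshold.
zoomed = [w for w in walks if w.startswith("apoptosis")]
zoomed_view = visible_branches(zoomed)
```

      In this toy case the rare branch holds 6% of all walks and is hidden in the full view, but it holds 20% of the apoptotic walks and becomes visible after restriction, without any new class being added.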

      In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We will improve the discussion of robustness in the revised version.

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data where there is less of an issue of dropouts and the proteins may be chosen to be large independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as high degree of noise (e.g. ambient RNA), introduces very large degree of noise so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason, otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can operate directly in high-dimensional space (thousands of dimensions), and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We will add this analysis to the study.

      In summary, the fact that tviblindi scales well with the dimensionality of the data and is able to work in the original space does not mean that this is always the best option. We will emphasize in the revised paper that we aim to avoid non-linear dimensionality reduction techniques as a data preprocessing tool, as the effect of the reduction is difficult to predict. We will also discuss the preprocessing of scRNA-seq data in greater detail.

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline").

      We thank the reviewer for the suggestion and we will accommodate it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provide comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare them to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We use two datasets: an artificial Dyntoy dataset and a real mass cytometry thymus and peripheral blood dataset. We will also add CellRank2 to the comparisons and strengthen the message of the benchmarking results in the Discussion section.

      (2) Regarding the discussion in Figure 4 the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality, it can be a checkpoint as authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe authors can consider running other algorithms on those cells and see which tracks they identify and if the directionality matches with the tviblindi.

      We have indeed used the thymus dataset to compare all TI algorithms listed above. Except for Monocle3, they failed to discover the negative selection branch (Monocle3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by the other algorithms.

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript focuses on the role of the deubiquitinating enzyme UPS-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which UPS-50/USP8 regulates endosome maturation. The results support their conclusions that UPS-50 acts by disassociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major strengths of this work lie in the well-designed experiments used to examine the effects of UPS-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of the USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the ups-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs.

      Excellent suggestions. The EM imaging indeed revealed an increase in enlarged cellular vesicles containing various contents in usp-50 mutants. However, the detailed molecular features of these vesicles remain unclear. Therefore, we plan to utilize ESCRT components for double staining with early or late endosome markers. This will enable us to accurately characterize the anomalous structures detected in the usp-50 mutants.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta which likely represents late endosomes are diminished, although Rabex5 is accumulated in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      -The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion.

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      -The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to which extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model.

      Excellent point. We plan to conduct additional genetic analyses, including the construction of double mutants between usp-50 and various rabex-5 mutations, to further elucidate the extent to which USP8 regulates endosome maturation via Rabex5.

      Reviewer #3 (Public Review):

      Summary:

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size in lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What would have been their hypothesis addressing whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? Then they observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but in the same number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of UPS8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes facilitating endosome maturation.

      Weaknesses:

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approximation is somehow difficult to follow. Finally, the use of two models, C. elegans and a mammalian cell line, which would strengthen the observations, contributes to the difficulty in reading the manuscript.

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. In contrast, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it.

      We thank this reviewer for the insightful comments. Fluorescence-fused proteins serve as potent tools for visualizing subcellular organelles both in vivo and in live settings. Specifically, in epidermal cells of worms, the tissue-specific expression of these fused proteins is indispensable for studying organelle dynamics within living organisms. This approach is necessitated by the inherent limitations of endogenously tagged proteins, whose fluorescence signals are often weak and unsuitable for live imaging or genetic screening purposes. Acknowledging concerns raised by the reviewer regarding potential alterations in organelle morphology due to overexpression of certain fused proteins, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion. Specifically, we discovered that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed enhanced RABX-5/Rabex5 signaling and mis-localization of SAND-1/Mon1 proteins from endosomes. Consequently, this disruption impairs endolysosomal trafficking, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified by cultured mammalian cell studies, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosomes/late endosomes. To elucidate the underlying mechanisms, we investigated the formation of multivesicular bodies (MVBs), a process tightly linked to USP8 function. Extensive electron microscopy (EM) analysis indicated that MVB-like structures are largely intact in usp-50 mutant cells, suggesting that USP8/USP-50 likely regulates lysosome formation through alternative pathways in addition to its roles in MVB formation and ESCRT component function. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Interestingly, loss-of-function mutations in usp8 often lead to the enlargement of early endosomes, yet the mechanisms underlying this phenomenon remain unclear. Given that lysosomes receive and degrade materials generated by endocytic pathways, we hypothesized that the abnormally enlarged MVB-like vesicular structures observed in usp-50 or usp8 mutant cells correspond to the enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced, while early endosomes are significantly enlarged. Given that the Rab5 guanine nucleotide exchange factor (GEF), Rabex5, is essential for Rab5 activation, we further investigated its dynamics. Additional analyses conducted in both worm hypodermal cells and cultured mammalian cells revealed an increase of endosomal Rabex5 in response to usp8/usp-50 loss-of-function. Live imaging studies further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, aligning spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. Comprehensive genetics and biochemistry experiments demonstrated that USP8 acts through K323 site de-ubiquitination to dissociate Rabex5 from early endosomes and promotes the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and subsequent examination of established theories, leading to the formulation of our own hypothesis. Through multifaceted approaches, we unveiled a novel function of USP8 in early-to-late endosome conversion.

    1. Author response:

      We would like to thank all the reviewers and editors for their thoughtful and detailed comments, critiques and suggestions. We will revise our manuscript in accordance with all the points raised by the reviewers. Here we summarize some of the main points that we intend to address in our revised manuscript.

      The reviewers noted that we were not sufficiently careful in identifying possible exogenous cues that the mice might be using to locate the food and that we did not consider why such cues might be ineffective. As the reviewers point out, the mice may be ignoring the visual landmarks (and floor scratches) because they are not reliable cues and their relation to the food varies with the entrance the mice have used. In particular, a reviewer refers to papers that show that “in environments with 'unreliable' landmarks, place cells are not controlled by landmarks”. These papers were known to the authors but did not make the final cut of our extensive discussion. This important point will be thoroughly addressed.

      Another critical point was that the mice often exhibited thigmotaxis. The literature on thigmotaxis was known to us and we will now directly refer to this point. We do note that the final average start-to-food trajectory (TEV) points directly to the food. In other words, the thigmotactic trajectories and the “towards the center” trajectories effectively average out.
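
      The averaging argument can be made concrete with a small sketch. It is not the analysis used in the manuscript: the function `mean_heading` and the toy heading values are assumptions chosen for illustration, with 0 degrees taken as the start-to-food direction. It computes a circular mean by averaging unit vectors, which is one common way to average directions.

```python
import math

def mean_heading(headings_deg):
    """Circular mean of heading angles in degrees,
    computed by summing the corresponding unit vectors."""
    x = sum(math.cos(math.radians(h)) for h in headings_deg)
    y = sum(math.sin(math.radians(h)) for h in headings_deg)
    return math.degrees(math.atan2(y, x))

# Hypothetical headings relative to the start-to-food direction (0 degrees).
wall_following = [40.0, 35.0, 45.0]     # biased towards the wall
toward_center = [-40.0, -35.0, -45.0]   # biased towards the center

# Symmetric deviations cancel, so the mean heading points at the food.
average = mean_heading(wall_following + toward_center)
```

      A circular mean summarizes direction only; it says nothing about the spread of the individual trajectories around that direction.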

      There was a very cogent point about the difficulty of totally eliminating odor cues that we will now address. Finally, based on studies using a virtual reality environment, one reviewer questioned the use of “path integration” as a signal that encodes goal location. The relevance of path integration to spatial learning and performance is a very difficult issue that, to our knowledge, has never been entirely settled in the vast spatial learning literature. We do not think that our data can “settle” this issue, but we will try to at least be explicit about the complexity of the path integration hypothesis as it applies to both our own data and the virtual reality literature. In particular, we will discuss the potential roles of optic flow versus proprioceptive and vestibular inputs to a putative path integration mechanism.

      Finally, the reviewers raised many important technical points regarding the reporting of statistics and the presentation of the figures. In our revision, we will fully address all of these helpful critiques.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of Vglut2 in noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice.

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice.

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study fails to document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. This limits the reader's understanding of why conditional Vglut2 knockdown is dispensable for breathing under the conditions tested.

      We thank the reviewers for their positive evaluation of our work. First, we would like to highlight that multiple studies have provided anatomical evidence of innervation of multiple cardio-respiratory nuclei by Vglut2+ noradrenergic fibers. Thus, the anatomical substrates are present for noradrenergic-based Vglut2 signaling to either play a direct role in breathing control or, upon perturbation, to indirectly affect breathing through disrupted metabolic or cardiovascular control. We have included supplemental table 1, which summarizes central noradrenergic Vglut2+ innervations of respiratory and autonomic nuclei. Additionally, ultrastructural evidence shows asymmetric synaptic contacts, assumed to reflect glutamatergic transmission, between C1 neurons and LC, A1, A2 and the dorsal motor nucleus of the vagus (DMV) (Milner et al., 1989; Abbott et al., 2012; Holloway et al., 2013; DePuy et al., 2013).

      Functionally, electrophysiological evidence showed that photostimulating C1 neurons activates LC, A1, and A2 noradrenergic neurons monosynaptically by releasing glutamate (Holloway et al., 2013; DePuy et al., 2013) and that optogenetic stimulation of LC neurons excites the downstream parabrachial nucleus (PBN) neurons by releasing glutamate. Thus, at least the glutamatergic signaling from C1 and LC noradrenergic neurons (two noradrenergic nuclei that have been shown to play a role in breathing control) is evident at the cellular level under normal conditions. Other evidence, highlighted in our manuscript, is more circumstantial.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporters (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. Authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first-time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies.

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons using a DBH-Cre; Vglut2cKO mice on breathing and control in unanesthetized mice. Finally, they injected the AAV virus containing Cre-dependent Td tomato into LC of v-Glut2 Cre mice to verify the VGlut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring the sex specific differences. The reason I think this is particularly relevant is the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018). Therefore, the authors should compare the fate maps, Vglut transporters in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance.

      All experiments contained both males and females as described in the original submission. In our analysis of breathing and metabolism, sex was included in the analysis and no significant phenotypic difference was observed. For the fate map and in situ experiments, we did not see obvious differences in the expression patterns in the three glutamate transporters between females and males, though the group size is small. Though all the anatomical and phenotypic data in this manuscript are presented as combined graphs, we have differentially labeled our data points by sex. The reviewer does raise important questions regarding possible sexual dimorphisms in the central noradrenergic system and whether such dimorphisms may extend to glutamate transporter co-expression. Our thorough interrogation of respiratory-metabolic parameters fails to reveal any sex specific differences in control or experimental mice. Thus, it is unclear if any of the previously described and cited dimorphisms are functionally relevant in this setting. Given the large differences in the real time expression and cumulative fate maps of Vglut2, a worthwhile interrogation of differential glutamate transporter expression would be best served by longitudinal studies with large group sizes across age as it is not clear what underlies the dynamic VGlut2 expression changes. Such changes may at times be greater in males and other times in females, driven by experience or physiological challenges etc., but resulting in averaged cumulative fatemaps that are similar between sexes. Such a longitudinal quantitative study of real-time and fatemapped cell populations across the central NA system would be of a scale that is beyond the scope of this report, especially when no phenotypic changes have been observed in our respiratory data.

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis.

      As noted, we discuss that we only address requirement, not sufficiency, of NA Vglut2 in breathing. Functional sufficiency experiments usually involve increasing the relevant output. However, these experiments can lead to non-specific, pleiotropic effects that would be difficult to disambiguate, even if done with high cellular specificity. Viral or genetic overexpression of Vglut2 in NA neurons may be a feasible approach. Conditional ablation of TH or DBH with concurrent chemo or optogenetic stimulation may also be informative. These approaches would require significant investments in mouse model generation and suffer additional experimental limitations.

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables.

      While surgical implantation of sensors would provide a more direct assessment of temperature, it requires components that were not available at the time of the study and addresses a question (temperature changes during a time course of gas exposure) that goes beyond the scope of the current work focused on respiratory response. As we have done for prior experiments (Martinez et al., 2019; Ray et al., 2011), the body temperature was measured immediately before and after measuring breathing only. Our flow-through system using inline gas sensors (AEI P-61B CO2 sensor and AEI N-22M O2 sensor) ensures that gas challenges were constant and consistent across all measurements. Any disruption in gas composition would have been noted by our software analysis system, Breathe Easy, and the data rejected. We did not observe any such perturbations.

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation?

      We agree that compensation is always a possibility at the synaptic, cellular, and circuit levels that may involve a variety of transcriptional, translational, cellular, and circuit mechanisms (i.e., synaptic strength). This could be interrogated by combining multiple conditional alleles and recombinase drivers for various transmitters and receptors, but would, in our experience, take multiple years for the requisite breeding to be completed.

      Continuing along the same line of inquiry, is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was found in previous studies of VGLUT2 cKO from DA neurons (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate?

      These are all excellent points, but prior studies suggest that reductions in NA signaling would themselves have an apparent effect (Zanella et al., 2006; Kuo et al., 2016). Although several studies showed that LC and C1 NA neurons co-release noradrenaline and glutamate, no direct evidence yet makes clear that glutamate facilitates NA release or vice versa. However, it would be of great interest to test in the future whether reduced or absent NA compensated for the loss of glutamate. We do fully acknowledge in the manuscript that any number of compensatory events could be at play in these findings.

      Reviewer #3 (Public Review):

      Summary:

      The authors, Y Chang and colleagues, have performed elegant studies in transgenic mouse models that were designed to examine glutamatergic transmission in noradrenergic neurons, with a focus on respiratory regulation. They generated 3 different transgenic lines, in which a red fluorophore was expressed in dopamine-β-hydroxylase (DBH; noradrenergic and adrenergic neurons) neurons that did not express a vesicular glutamate transporter (Vglut) and a green fluorophore in DBH neurons that did express one of either Vglut1, Vglut2 or Vglut3.

      Further experiments generated a transgenic mouse with knockout of Vglut2 in DBH neurons. The authors used plethysmography to measure respiratory parameters in conscious, unrestrained mice in response to various challenges.

      Strengths:

      The distribution of the Vglut expression is broadly in agreement with other studies, but with the addition of some novel Vglut3 expression. Validation of the transgenic results, using in situ hybridization histochemistry to examine mRNA expression, revealed potential modulation of Vglut2 expression during phases of development. This dataset is comprehensive, well-presented and very useful.

      In the physiological studies the authors observed that neither baseline respiratory parameters, nor respiratory responses to hypercapnia (5, 7, 10% CO2) or hypoxia (10% O2) were different between knockout mice and littermate controls. The studies are well-designed and comprehensive. They provide observations that are supportive of previous reports using similar methodology.

      Weaknesses:

      In relation to the expression of Vglut2, the authors conclude that modulation of expression occurs, such that in adulthood there are differences in expression patterns in some (nor)adrenergic cell groups. Altered sensitivity is provided as an explanation for different results between studies examining mRNA expression. These are likely explanations; however, the conclusion would really be definitive with inclusion of a conditional cre expressing mouse. Given the effort taken to generate this dataset, it seems to me that taking that extra step would be of value for the overall understanding of glutamatergic expression in these catecholaminergic neurons.

      The seemingly dynamic Vglut2 expression pattern across the NA system is intriguing. As noted in our comments to reviewer 2, a robust age dependent interrogation would require a large magnitude study. The reviewer correctly points out that a temporally controlled recombinase fate mapping experiment would offer greater insight into the dynamic expression of Vglut2. We strongly agree with that idea and did work to develop a Vglut2-CreER targeted allele that, despite our many other successes in mouse genetic engineering (Lusk et al., 2022; Sun and Ray, 2016), did not succeed on the first attempt. We aim to complete the line in the near future so that we may better understand the Vglut2 expression pattern in central noradrenergic neurons in a time-specific and sex-specific manner.

      The respiratory physiology is very convincing and provides clear support for the view that Vglut2 is not required for modulation of the respiratory parameters measured and the reflex responses tested. It is stated that this is surprising. However, comparison with the data from Abbott et al., Eur J Neurosci (2014) in which the same transgenic approach was used, shows that they also observed no change in baseline breathing frequency. Differences were observed with strong, coordinated optogenetic stimulation, but, as discussed in this manuscript, it is not clear what physiological function this is relevant to. It just shows that some C1 neurons can use glutamate as a signaling molecule. Further, Holloway et al., Eur J Neurosci (2015), using the same transgenic mouse approach, showed that the respiratory response to optogenetic activation of Phox2 expressing neurons is not altered in DBH-Vglut2 KO mice. The conclusion seems to be that some C1 neuron effects are reliant upon glutamatergic transmission (C1DMV for example), and some not.

      We agree that activation of C1 neurons may be sufficient to modulate breathing when artificially stimulated and that such stimulation relies on glutamatergic transmission for its effect. This is why we find our results surprising and important in clarifying for the field that glutamatergic signaling in noradrenergic cells is dispensable for breathing and hypoxic and hypercapnic responses under physiological conditions.

      Further contrast is made in this manuscript to the work of Malheiros-Lima and colleagues (eLife 2020) who showed that the activation of abdominal expiratory nerve activity in response to peripheral chemoreceptor activation with cyanide was dependent upon C1 neurons and could be attenuated by blockade of glutamate receptors in the pFRG - i.e. the supposition that glutamate release from C1 neurons was responsible for the function. However, it is interesting to observe that diaphragm EMG responses to hypercapnia (10% CO2) or cyanide, and the expiratory activation to hypercapnia, were not affected by the glutamate receptor blockade. Thus, a very specific response is affected and one that was not measured in the current study.

      As we mention above, we do not dispute that glutamate signaling can be manipulated to create a response in non-physiological conditions – we suggest that framing the interpretation around the glutamatergic role in a model that better matches physiological conditions should inform our interpretation. Furthermore, we do include an examination of expiratory flow – which was not impacted by loss of glutamatergic activity in NA neurons – which would be likely to have been impacted if abdominal expiratory nerve activity was modified.

      These previously published observations are consistent with the current study, which provides a more comprehensive analysis of glutamatergic contributions to respiratory physiology. A more nuanced discussion of the data and acknowledgement of the differences, which are not actually at odds, would improve the paper and place the information within a more comprehensive model.

      Thank you for the comments. As noted in the original and extended discussion, we respectfully disagree with the perspective that our results align with prior results.

      Recommendations for the authors:

      The three reviewers believe this is an important study. They have numerous suggestions for improvement of the manuscript (outlined below), but no new experiments are required. The Editor requests some nomenclature changes as indicated in attachment 1.

      Reviewer #1 (Recommendations For The Authors):

      Abstract/Introduction: Although the need for this study is obvious, it is important that the authors explicitly communicate their working hypothesis <before the start of the work> to the reader. In the current form, it is unclear whether the authors aimed to test the hypothesis that glutamatergic signaling from noradrenergic neurons is important to breathing or whether to test the hypothesis that glutamatergic signaling from noradrenergic neurons is not important to breathing. If it is the latter (it is not important), then the study (related to the breathing measurements) is poorly justified and designed, as additional orthogonal approaches (e.g., actual measurements of glutamatergic signaling at the cellular level) are almost requisite. If the authors' hypothesis was originally based on existing literature suggesting that glutamatergic signaling from noradrenergic neurons is important to breathing, then the experimental design is appropriate.

      Thank you for the suggestion. The working hypothesis has been added in the abstract (line 24-25) and the introduction (line 92-94), making clear that we initially hypothesized that glutamatergic signaling from noradrenergic neurons is important in breathing.

      Results: While the steady state measurements for breathing metrics are clearly important in defining how glutamatergic signaling may contribute to pulmonary function, glutamatergic signaling may have a greater role in the dynamics of breathing patterns (i.e., regularity of the breathing rhythms). Such traits can be described using SD1 and SD2 from Poincaré maps, and/or entropy measurements. Such an analysis should be performed.

      Thank you for the suggestion. The dynamic patterns of respiratory rate (Vf), tidal volume (VT), minute ventilation (VE), inspiratory duration (TI), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/TI), and expiratory flow rate (VT/TE) have been shown as Poincaré plots and quantified and tested using the SD1 and SD2 statistics in the supplemental figures of Figures 4-7.
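
      The SD1 and SD2 statistics mentioned above can be illustrated with a minimal sketch. It uses the conventional Poincaré definitions (spread of consecutive-breath pairs across and along the line of identity) and is not taken from the authors' analysis software; the function name `poincare_sd` and the use of the sample standard deviation are assumptions made for illustration.

```python
import math
import statistics


def poincare_sd(values):
    """Return (SD1, SD2) for a breath-by-breath series (hypothetical helper).

    Each breath value x[n] is paired with the next one, x[n+1]. SD1 is the
    spread perpendicular to the line of identity (short-term variability);
    SD2 is the spread along it (longer-term variability).
    """
    if len(values) < 3:
        raise ValueError("need at least three values to form two pairs")
    current = values[:-1]
    following = values[1:]
    # Project each (x[n], x[n+1]) pair onto axes rotated by 45 degrees.
    across = [(b - a) / math.sqrt(2) for a, b in zip(current, following)]
    along = [(b + a) / math.sqrt(2) for a, b in zip(current, following)]
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(across), statistics.stdev(along)
```

      Under these definitions a perfectly alternating series gives SD2 of zero with SD1 above zero, while a steady ramp gives the opposite, which is why the two values are read together as a description of breathing regularity.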

      Results: Analyses of Inspiratory time (Ti) and flow rate (i.e., Tidal Volume / Ti) should be assessed and included.

      Thank you for the suggestion. Inspiratory duration (Ti), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/Ti), and expiratory flow rate (VT/TE) have been included in Figures 4-7.
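
      The timing and flow variables listed above are related by simple arithmetic, sketched here under conventional definitions (TTOT as TI plus TE, respiratory rate as breaths per minute, minute ventilation as tidal volume times rate). The function name `breath_metrics`, the per-breath inputs, and the choice of seconds as the time unit are assumptions for illustration; the authors' own pipeline may compute or normalize these differently.

```python
def breath_metrics(vt, ti, te):
    """Derive per-breath metrics from tidal volume and phase durations.

    vt: tidal volume (any volume unit); ti, te: inspiratory and expiratory
    durations in seconds (assumed unit). Hypothetical helper.
    """
    if ti <= 0 or te <= 0:
        raise ValueError("durations must be positive")
    ttot = ti + te                 # breath cycle duration (s)
    vf = 60.0 / ttot               # respiratory rate (breaths per minute)
    return {
        "TTOT": ttot,
        "Vf": vf,
        "VE": vt * vf,             # minute ventilation (volume per minute)
        "VT/TI": vt / ti,          # inspiratory flow rate
        "VT/TE": vt / te,          # expiratory flow rate
    }
```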

      Results/Methods: If similar analytical approaches were used in the current study as to that in Lusk et al. 2022, it appears that data was discontinuously sampled, rejecting periods of movement and only including periods of quiescent breathing. Were the periods of quiescent breathing different? Information should be provided to describe the total sampling duration included.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5 minutes of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5-minute epochs following initiation of the gas exposure separately, e.g., epoch 1 = 5-10min, epoch 2 = 10-15min, and epoch 3 = 15-20min. All breaths included as quiescent breathing were analyzed in the aggregate for each group and experimental condition; we did not compare individual periods of quiescent breathing within or across an animal(s)/group(s)/experimental condition(s). We have added the details in the Materials and Methods (line 637-642).
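
      The analysis windows described above can be written down as a small lookup, shown here as a sketch. The hypoxia epochs are taken from the response; the hypercapnia window of 15-20 min assumes the 20-minute gas challenge described elsewhere in this response; the names `WINDOWS_MIN` and `epoch_index`, and the half-open interval convention, are hypothetical choices for illustration.

```python
# Minutes after onset of the gas challenge. None means the whole
# condition is analyzed (room air).
WINDOWS_MIN = {
    "room_air": None,
    "hypercapnia": [(15, 20)],  # assumed: last 5 min of a 20 min challenge
    "hypoxia": [(5, 10), (10, 15), (15, 20)],
}


def epoch_index(condition, minutes_since_onset):
    """Return the 1-based analysis epoch for a breath, or None if excluded."""
    windows = WINDOWS_MIN[condition]
    if windows is None:
        return 1
    for index, (start, end) in enumerate(windows, start=1):
        # Assumption: windows are half-open, so start <= t < end.
        if start <= minutes_since_onset < end:
            return index
    return None
```

      A breath at 12.5 min of hypoxia would fall in epoch 2 under this convention, while a breath in the first 5 min of any gas challenge would be excluded.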

      Results: As mice were conscious in this study, were sniff periods (transient periods of fast breathing, i.e.,>8Hz) included in the analysis?

      No, only regular quiescent breathing periods were included in the analysis.

      Discussion: The authors need to discuss the limitations of their findings.

      • How should the reader interpret the findings? Concluding that glutamatergic signaling is dispensable implies that it occurs in room air, hypoxia, and hypercapnia.

      We have edited our discussion for clarity to highlight our conclusions that Vglut2-based glutamatergic signaling from noradrenergic neurons is ultimately dispensable for baseline breathing and the hypercapnic and hypoxic chemoreflexes in unanesthetized and unrestrained mice.

      • Assuming that glutamatergic signaling is active during the conditions tested, then the authors should discuss what may be the potential compensations.

      We have provided additional discussion surrounding potential compensatory events that may have taken place and could result in the unchanged phenotype in the experimental group.

      • The authors need to discuss how age and state of consciousness may play a role in their findings. The current discussion gives the impression that their findings are broadly applicable in all cases, but the lack of differences in this study may not hold true under different conditions.

      The study was done in adult (6–8-week-old) unanesthetized and unrestrained mice. In the discussion (line 472-474), we highlight that in our unpublished results, loss of NA-expressed Vglut2 does not change the survival curve in P7 neonate mice undergoing repeated bouts of autoresuscitation until death. Thus, we believed that Vglut2-based glutamatergic signaling in central NA neurons is dispensable for baseline breathing and the hypercapnic and hypoxic chemoreflexes in unanesthetized and unrestrained mice across different ages. Otherwise, we do not imply that we have interrogated any other aspects of breathing in our discussion.

      Methods: Further description of the analysis window for the respiratory metrics should be provided. Were breath values for each condition taken throughout the entire condition? This is particularly important for hypoxia, where the stereotypical respiratory response is biphasic.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5min of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5min time periods separately (5-10min, 10-15min, and 15-20min during the hypoxic challenge). As noted in our original manuscript, we graph and assess three 5min epochs during hypoxic exposure to capture the dynamic nature of the hypoxic ventilatory response. We have added the details in the Materials and Methods (line 637-642).

      Methods: How was consciousness determined?

      The conscious mice mentioned in the manuscript refer to the mice without anesthesia. We have replaced “awake” and “conscious” with “unanesthetized” in the text.

      Reviewer #2 (Recommendations For The Authors):

      Since no EEG/EMG recording was performed it would be more appropriate to remove "awake" and "conscious" throughout the manuscript and include the term "unanesthetized".

      Thank you for the suggestion. “Awake” and “conscious” have been replaced by “unanesthetized” in the text.

      Line 545: Why 32C? Isn't this temperature too high for animals?

      30-32°C is the thermoneutral zone for mice. It is the range of ambient temperature where mice can maintain a stable core temperature with their minimal metabolic rate (Gordon, 1985). Whole-body plethysmography uses the barometric technique to detect pressure oscillations caused by changes in temperature and humidity with each breathing act when an animal sits in a sealed chamber (Mortola et al., 2013). Thus, maintaining the chamber temperature near the thermoneutral zone during the plethysmography assay is required to maintain constancy in respiratory and metabolic parameters from trial to trial as well as to maintain linearity of ventilatory pressure changes due to humidification, rarefaction, and thermal expansion and contraction during inspiration and expiration (Ray et al., 2011). The chamber temperature that has been used for adult plethysmography has been set across a range 30-34°C (Hodges et al., 2008; Ray et al., 2011; Hennessy et al., 2017). We use 32°C in this manuscript which is consistent with previously published literature from other groups and our own work (Sun et al., 2017; Lusk et al., 2022).

      I would include the units of the physiological variables in the tables.

      Thank you for the suggestion. The units of the physiological variables have been added in all the tables.

      Reviewer #3 (Recommendations For The Authors):

      Why is the C3 group not considered in this study?

      The C3 adrenergic group, best characterized in rat, is only seen in rodents but not in many other species including primates (including human) (Kitahama et al., 1994). Thus, the C3 group is not the focus of this study where we aim to discuss if glutamate derived from noradrenergic neurons could be the potential therapeutic target of human respiratory disorders. The C3 adrenergic group is typically described as a population containing only about 30 neurons. We have added the fate map data and the adult expression pattern for the three vesicular glutamate transporters for the C3 group in the figure 1 and 2 supplements for reference.

      Sub CD/CV does not appear to be defined in the manuscript.

      Thank you for the point. The definition of sub CD/CV has been added in the text (line 126).

      The data on line 131-133 is interesting but could be described more effectively and clearly.

      Thank you for the suggestion. The text has been modified accordingly.

      The end of the paragraph at lines 140 onwards is rather repeated in the paragraph that starts at line 146.

      The repeated text has been removed accordingly.

      Whilst anterior and posterior are correct anatomical terms, for a quadruped, rostral and caudal are more widely used - particularly in the brainstem field. Is there a particular reason for using anterior/posterior?

      We followed the anatomical terminology of Robertson et al. (2013), who used anterior/posterior to describe C2/A2 and C1/A1.

      On the protocol lines included in Figures 4-7 it would be worth adding the test day. This seems a little strange. Why wait up to one week after the habituation to perform the stimulation? How many mice were left for each day between habituation and experimentation, and does this timing affect responses? Do mice forget the habituation after a period?

      Thank you for the point. We have added the test day for plethysmography in figures 4-7. After the 5 days of habituation, we began the plethysmography recordings on the sixth day. A maximum of 6 mice can be assayed for plethysmography per day due to the limited number of barometric flow through plethysmography and metabolic measurement systems we have. Thus, all animals were finished with plethysmography “within” one week of the last day of habituation. This protocol is consistent with our previous published work (Martinez et al., 2019; Lusk et al., 2022; Lusk et al., 2023). For the experiments in this manuscript, mice were assayed within 3 days after habituation. As noted in our methods and figures, each mouse is given as much as 40 mins to acclimate to the chamber (determined by directly observed quiet breathing) before data acquisition. We have no reason or evidence that indicates testing order and thus timing was a factor. The detailed explanation for the plethysmography protocol has been added in the material and methods section (line 606-625).

      Please state clearly that each mouse is only exposed to one gas mixture (what I interpret is the case), or could one mouse be exposed to several different stimuli?

      Each mouse is only exposed to one gas challenge (5% CO2, 7% CO2, 10% CO2, or 10% O2) in a testing period. Testing periods for an individual mouse were separated by 24 h to allow for a full recovery. The protocol is to put the mouse under room air for 45 min, switch to one gas challenge for 20 min, and switch back to room air for 20 min.

      With apologies if I missed this, but did each of the respiratory stimuli produce a statistically significant response in the control mice? For example, the response to 10%O2?

      Yes, each respiratory stimulus, including 5/7/10% CO2 and 10% O2, produced a statistically significant response in both mutant and control mice. We have labeled the statistical significance in Figures 4-7. Thank you for pointing this out.

      Line 312: Optogenetic stimulation induced an increase from 130 to 180 breaths per min (Abbott et al., EJN 2014). It is surprising that this is called "modest". Baseline respiratory frequency was presented.

      Thank you for the point. The word “modest” has been removed and the discussion has been changed accordingly (line 355-360).

      Line 338: This discussion is not sufficiently nuanced. It is the increased Dia amplitude (to KCN only, not 10% CO2) and the stimulation of active expiration, to both stimuli, that is blocked by kyn in pFRG. There is no effect on breathing frequency. The current study would not detect such differences in active expiration.

      Thank you for the suggestion. The discussion has been modified accordingly (line 382-388).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments and the lovely, crisp presentation of the data. The separation of neurons into tonic, phasic and adapting classes is also interesting and informative. The ability to successfully isolate and dissociate peripheral ganglia from such older animals is also quite rare and commendable! There is much useful detail here.

      Thank you for recognizing the effort we put into presenting the data and analyzing the neuronal populations. I also believe the ability to isolate neurons from old animals is worth communicating to the scientific community.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTor/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTor action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      Whereas the description of the data is very nice and useful, the manuscript does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe changes in signaling mechanisms, such as that of M1 mAChRs, to the phenomenon in a way that is supported by data.

      I appreciate the new comment. We had agreed that our rapamycin experiments did not allow us to ascribe the mechanism to the signaling pathway of mTOR. The new comment mentions M1 mAChR signaling as another potential signaling mechanism. Our work centered on determining whether aging altered the function of sympathetic motor neurons and defining the mechanism. We presented evidence showing that the mechanism is a reduction of the M-current. We did not attempt to identify the signaling mechanism linking aging to a reduction in M-current. Therefore, we agree with the reviewer that we do not provide further details on the mechanism and that this remains an open question. However, I find it harsh to say that “the effect is more of an epiphenomenon of unclear insight”. How could we possibly test that the effect of aging on the excitability of these neurons only arises as a secondary effect or that it is not causal? How could we test for sufficiency and necessity of aging? How could we modify the state of aging to test for causality? We would have to reverse aging and show that the effect on the excitability is gone. And that is exactly what we tried to do with the rapamycin experiment.

      Reviewer #1 (Recommendations For The Authors):

      (1) The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. Fig. 5 is a good example. What does p = 0.7 mean? Or p = 0.6? Does this help the reader with useful information?

      I thank Reviewer 1 for raising this question. We have attempted different versions of how we report p values, as we want to make sure to address rigor and transparency in reporting data. As corresponding author, I favor reporting p values for all statistical comparisons. To help the reader identify what we considered statistically significant, we color coded the p values, with red for p-value<0.05 and black for p-value>0.05. As a reader, seeing a p-value=0.7 allows me to know that the authors performed an analysis comparing these conditions and found the means not to be different. Not presenting the p-value makes me wonder whether the authors even analyzed those groups. In other words, I place more value on being able to assess the data with all p-values shown than on avoiding the distraction of non-significant p-values. This is just my preference.

      (2) Fig. 1 is not informative and should be removed.

      I thank Reviewer 1 for the suggestion. In previous drafts of the manuscript, this figure was included only as a panel. However, we decided it was better to guide the reader into the scope of our work. This is part of our scientific style and, therefore, we prefer to keep the figure.

      (3) The emphasis on a particular muscarinic agonist favored by many ion channel physiologists, oxotremorine, is not meaningful (lines 192, 198). The important point is stimulation of muscarinic AChRs, which physiologically are stimulated by acetylcholine. The particular muscarinic agonist used is unimportant. Unless mandated by eLife, "cholinergic type 1 muscarinic receptors" are usually referred to as M1 mAChRs, or even better is "Gq-coupled M1 mAChRs." I don't think that Kruse and Whitten, 2021 were the first to demonstrate the increase in excitability of sympathetic neurons from stimulation of M1 mAChRs. Please try and cite in a more scholarly fashion.

      A) I have modified lines 192 and 198 to remove the mention of oxotremorine.

      B) I have modified the nomenclature used to refer to cholinergic type 1 muscarinic receptors.

      C) I cited references on the role of M current on sympathetic motor neuron excitability. I also removed the reference (Kruse and Whitten, 2021) referring only to the temporal correlation between the decrease of KCNQ current with excitability.

      (4) The authors may want to use the term "M current" (after defining it) as the current produced by KCNQ2&3-containing channels in sympathetic neurons, and reserve "KCNQ" or "Kv7" currents as those made by cloned KCNQ/Kv7 channels in heterologous systems. A reason for this is to exclude currents from KCNQ1-containing channels, which most definitely do not contribute to the "KCNQ" current in these cells. I am not mandating this, but rather suggesting it to conform with the literature.

      Thank you for the suggestion. I have modified the text to use the term M current. I maintain the use of KCNQ only when referring to KCNQ channel, such as in the section describing the abundance of KCNQ2.

      (5) The section in the text on "Aging reduces KCNQ current" is confusing. Can the authors describe their results and their interpretation more directly?

      I am not sure I understand the request. I assumed points 5 and 6 are related and decided to answer point 6.

      (6) Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlied by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? What about KCNQ3? It would be very enlightening if the authors would just quantify the ratio of KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves (see Shapiro et al., JNS, 2000; Selyanko et al., J. Physiol., Hadley et al., Br. J. Pharm., 2001 and a great many more). It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry.

      A. Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlied by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? Our interpretation is that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 channels. We do not claim that changes in excitability are caused by a reduction in the expression or density of KCNQ2 channels. On the contrary, our working hypothesis is that the reduction in M current is caused by changes in trafficking, degradation, posttranslational modifications, or cofactors for KCNQ2 or KCNQ3 channels. We have modified the description in the Results section to clarify this concept.

      B. What about KCNQ3? Unfortunately, we did not find an antibody to detect KCNQ3 channels. I have added a sentence to state this.

      C. KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves. This is a great idea. Thank you for the suggestion. Is this a necessary experiment for the acceptance of this manuscript?

      D. It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry. Reviewer 1 is correct. We did not assess differences in the suppression of M current by mAChR activation. We do not see how this experiment connects to the scope of the current investigation.

      (7) Why do the authors use linopirdine instead of XE-991? Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error?

      A. Why do the authors use linopirdine instead of XE-991? After validating KCNQ2/3 inhibition by linopirdine, we found its effect on membrane potential recordings to be reproducible. Linopirdine has also been reported to be reversible, and we wanted to assess reversibility on the excitability of young neurons; however, we did not find the effect to be reversible. We also performed experiments applying XE-991 while recording the membrane potential. XE-991 did not show a clear effect. I was not surprised by this, as it is very likely that the pharmacological inhibition of one channel leads to the activation of other channel types. This is highlighted in the work by Kimm, Khaliq, and Bean (2015): “Further experiments revealed that inhibiting either BK or Kv2 alone leads to recruitment of additional current through the other channel type during the action potential as a consequence of changes in spike shape.” In fact, it was quite remarkable that the aged and young phenotypes were mimicked by targeting KCNQ pharmacologically.

      B. Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. I have added a sentence to point out that linopirdine is less potent than XE-991. It reads: “We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.”

      C. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error? Thank you for pointing this out. I have added information for both retigabine and linopirdine in the Methods section; both were missing.

      (8) Can the authors use a more scientific explanation of RTG action than "activating KCNQ channels?" For instance, RTG induces both a negative shift in the voltage dependence of activation and a voltage-independent increase in the open probability, both of which differ in detail between KCNQ2 and KCNQ3 subunits. The authors are free to use these exact words. Thus, the degree of "activation" is very dependent upon voltage at any voltages negative to the saturating voltages for channel activation.

      I have modified the text to reflect your suggestion.

      (9) Methods: did the authors really use "poly-l-lysine-coated coverslips?" Almost all investigators use poly-D-lysine as a coating for mammalian tissue-culture cells and more substantial coatings such as poly-D-lysine + laminin or rat-tail collagen for peripheral neurons, to allow firm attachment to the coverslip.

      That is correct. We used poly-L-lysine-coated coverslips. Sympathetic motor neurons do not adhere to poly-D-lysine.

      (10) As a suggestion, sampling M-type/KCNQ/Kv7 current at 2 kHz is not advised, as this is far faster than the gating kinetics of the channels. Were the signals filtered?

      That is correct. Currents were sampled at 2 kHz, and data were low-pass filtered at 3 kHz. Our conditions are not far from those reported by others: some sample at 10 kHz and even 50 kHz, while others do not report the sampling frequency.

      Reviewer #2:

      Weaknesses:

      None, the revised version of the manuscript has addressed all my concerns.

      I am glad we were able to satisfy previous concerns.

      Reviewer #3:

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality.

      Allow me to clarify our previous responses and determine how they align with your concerns. In the previous revision, Reviewer 3 wrote: “It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability,” and suggested the “use of blockers and activators to provide greater relevance.” I assumed these comments were the main concern and that doing such experiments was enough to satisfy the criticism. It is discouraging to see that our experiments did not resolve the reviewer's concern that the study is correlative.

      If Reviewer 3 is referring to establishing causality between aging and a reduction in M current, I would like to emphasize that such an endeavor is complicated, as there is no clear experiment to resolve that issue. Our best attempt was to reverse aging with rapamycin, but the recommendation was to remove those experiments.

      … but the specifics of the effects and relevance to intact preparations are unclear. Additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations.

      I apologize for missing this point in the previous revision. The proposed experiments would require an upright microscope coupled to an electrophysiology rig. Unfortunately, I do not have the equipment to do these experiments.

      Summary of recommendations from the three reviewers:

      Please make corrections as suggested by reviewer 1 to improve the manuscript. Specifically, reviewer 1 suggests making changes to p values in Figure 5,

      It is not clear what the suggested changes are. The comment from Reviewer 1 says: The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. If the suggested change is to remove p values > 0.05, I have explained my rationale for keeping those values. If the journal has a specific format for reporting p-values, I will be happy to make appropriate changes.

      and the importance of citing original scholarly works related to effects of increase in excitability of sympathetic neurons by M1 receptors, and the terminology for M currents and KCNQ currents. These changes will improve the manuscript and are strongly recommended.

      I cited original papers in that area and changed the terminology to M current. I kept KCNQ when referring to the channel protein or its abundance.

      The section dealing with Aging Reduces KCNQ currents seems to contain a lot of extraneous information especially in the last part of the long paragraph and this section should be rewritten for improved clarity… and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates.

      A. I removed extraneous information in that section. It now reads: Previous work by our group and others demonstrated that cholinergic stimulation leads to a decrease in M current and increases the excitability of sympathetic motor neurons at young ages \cite{RN67,RN68,RN69,RN71, RN72, RN73, RN74, RN75}. The molecular determinants of the M current are channels formed by KCNQ2 and KCNQ3 in these neurons \cite{RN76, RN77, RN70}. Thus, Figure 6A shows a voltage response (measured in current-clamp mode) and a consecutive M current recording (measured in voltage-clamp mode) in the same neuron upon stimulation of cholinergic type 1 muscarinic receptors. It illustrates the temporal correlation between the decrease of M current and the increase in excitability and firing of APs upon activation with oxotremorine. This strong dependence led us to hypothesize that aging decreases M current, leading to a depolarized RMP and hyperexcitability (Figure 6B). For these experiments, we measured the RMP and evoked activity using perforated patch, followed by the amplitude of M current using a whole-cell voltage clamp in the same cell. We also measured the membrane capacitance as a proxy for cell size. Interestingly, M current density was smaller by 29% in middle age (7.5 ± 0.7 pA/pF) and by 55% in old (4.8 ± 0.7 pA/pF) compared to young (10.6 ± 1.5 pA/pF) neurons (Figure 6C-D). The average capacitance was similar in young (30.8 ± 2.2 pF), middle-aged (27.4 ± 1.2 pF), and old (28.8 ± 2.3 pF) neurons (Figure 6E), suggesting that aging is not associated with changes in cell size of sympathetic motor neurons, and supporting the hypothesis that aging alters the levels of M current. Next, we tested the effect on the abundance of the channels mediating M current. Contrary to our expectation, we observed that KCNQ2 protein levels were 1.5 ± 0.1-fold higher in old compared to young neurons (Figure 6F-G). Unfortunately, we did not find an antibody that consistently detects KCNQ3 channels. We concluded that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 protein.
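      The percentage reductions quoted in this passage follow from the reported mean current densities. The sketch below reproduces that arithmetic, assuming that current density is the current amplitude divided by membrane capacitance; the function names are illustrative and are not taken from the manuscript.

```python
def current_density(amplitude_pa, capacitance_pf):
    """Current density in pA/pF: amplitude normalized to membrane capacitance."""
    return amplitude_pa / capacitance_pf


def percent_reduction(reference, value):
    """Percentage by which `value` is smaller than `reference`."""
    return 100.0 * (1.0 - value / reference)


# Mean M current densities (pA/pF) as reported in the text above.
young, middle_aged, old = 10.6, 7.5, 4.8

print(round(percent_reduction(young, middle_aged)))  # middle-aged vs. young
print(round(percent_reduction(young, old)))          # old vs. young
```

      Because the mean capacitance is similar across age groups, a drop in density of this size reflects a drop in current amplitude rather than a change in cell size.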

      B. and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates. I am not sure I understand the request regarding the section on the correlation of KCNQ with AP firing rate. I divided the long paragraph.

      The apparent lack of correlation between KCNQ current and KCNQ2 protein needs to be better explained. This is a central part of the study and this result undercuts the premise of the paper.

      Indeed, total KCNQ2 protein abundance increases while M current decreases. We do not claim in our work that changes in excitability are caused by a reduction in the expression or density of KCNQ2 channels. On the contrary, our current working hypothesis is that the reduction in M current is caused by changes in trafficking, degradation, posttranslational modifications, or cofactors for KCNQ2 or KCNQ3 channels. I have modified the description in the Results section and the Discussion to clarify this concept.

      Additionally, the poor specificity of linopirdine for KCNQ should be pointed out in the limitations.

      I have pointed out this limitation. It reads: We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.

      Finally, the editor notes that the author response should not contain ambiguities in what was addressed in the revision. In the original summary of consolidated revisions that were requested, one clearly and separately stated point (point 4) was that experiments in slice cultures should be strongly considered to extend the significance of the work to an intact brain preparation. The author response letter seems to imply that this was done, but this is not the case. The author response seems to have combined this point with another separate point (point 3) about using KCNQ drugs, and imply that all concerns were addressed. Authors should be clear about what revisions were in fact addressed.

      As the corresponding author, and the person directly responsible for the reply to the reviewers, I apologize for my mistake. After reviewing this comment, I realized that I did not respond to the Major points in the Recommendations for the authors section from Reviewer 3; I missed that entire section. My previous responses addressed the Public review of Reviewer 3. In doing so, I did not separate the sentences and thereby omitted the request to perform the experiments in slices.


      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years. The authors suggest the ingestion of rapamycin may partially reverse the age-related decrease in M-channel expression. With the rapamycin part included, it is unclear how this work will impact the field of age-related neuronal dysfunction, as the mechanistic information is not strong.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments, the lovely, crisp presentation of the data, and the expert statistics. The separation of neurons into tonic, phasic, and adapting classes is also interesting, and informative. The writing is also elegant, and crisp. The above is especially true of the manuscript up until the part dealing with the effects of rapamycin, which becomes less compelling.

      We appreciate the thoughtful comments and constructive feedback to improve the impact of the manuscript.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTOR/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTOR action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      We agree with the reviewer that we cannot ascribe a signaling mechanism to the reversibility observed with rapamycin. Therefore, we are following the recommendation of the reviewer and have removed the rapamycin section.

      We want to emphasize that, in the aging field, any advancement in the knowledge of how drugs such as rapamycin reverse age-associated phenotypes is of crucial importance. These drugs, commonly referred to as aging interventions, include rapamycin, calorie restriction, elamipretide, and metformin. We could have used any of these interventions. And yet, the cellular and molecular mechanisms for each one of these anti-aging drugs are unknown.

      We want to note that, although the nature of the mTOR field is controversial, the effect of rapamycin in extending lifespan and improving health is not. We, at least, have not been able to find retracted papers on that subject or notices from the NIA alerting to this issue. We kindly request that the reviewer provide the references related to rapamycin that were retracted so we can evaluate how that affects the rigor of the premise for our future work.

      As authors, we also find it important to note that we are confident of our observations regarding the effect of rapamycin, and that we are not removing this section because we are retracting our claims. We will use these data to continue our research of the mechanism behind the effect of aging on sympathetic motor neurons.

      Reviewer #2:

      Summary:

      This research shows compelling and detailed evidence showing that aging influences intrinsic membrane properties of peripheral sympathetic motor neurons such that they become more excitable. Furthermore, the authors present convincing evidence that the oral administration of the anti-aging drug Rapamycin partially reversed hyperexcitability in aged neurons. This study also investigates the molecular mechanisms underlying age-associated hyperexcitability in mouse sympathetic motor neurons. In that regard, the authors found an age-associated reduction of an outward current having properties similar to KCNQ2/Q3 potassium current. They suggested a reduction of KCNQ2/Q3 current density in aged neurons as a potential mechanism behind their overactivity.

      Strengths:

      Detailed and rigorous analysis of electrical responses of peripheral sympathetic motor neurons using electrophysiology (perforated patch and whole-cell recordings). Most of the conclusions of this paper are well supported by the data.

      We thank the reviewer for valuing our effort to present a detailed and rigorous analysis.

      Weaknesses:

      (1) The identity of the age-associated reduced current as KCNQ2/Q3 is not corroborated by pharmacology (blocking the current with the specific blocker XE-991).

      We have performed experiments using blockers of KCNQ channels. See responses below.

      (2) The manuscript does not include a direct test of the reduction of KCNQ current as the mechanism behind age-induced hyperexcitability.

      Thank you for raising this point. We have performed experiments blocking KCNQ channels with linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. We present the results in a new figure. We also added the description in the Results section.

      Reviewer #3:

      This is a descriptive study of membrane excitability and Na+ and K+ current amplitudes of sympathetic motor neurons in culture. The main findings of the study are that neurons isolated from aged animals show increased membrane excitability manifested as increased firing rates in response to electrical stimulation and changes in related membrane properties including depolarized resting membrane potential, increased rheobase, and spontaneous firing. By contrast, neuron cultures from young mice show little to no spontaneous firing and relatively low firing rates in response to current injection. These changes in excitability correlate with significant reductions in the magnitude of KCNQ currents in aged neurons compared to young neurons. Treating cultures with the immunosuppressive drug, rapamycin, which has known antiaging effects in model animals appears to reverse the firing rates in aged neurons and enhance KCNQ current. The authors conclude that aging promotes hyperexcitability of sympathetic motor neurons.

      The electrophysiological cataloging of the neuronal properties is generally well done, and the experiments are performed using perforated patch recordings which preserve the internal constituents of neurons, providing confidence that the effects seen are not due to washout of regulators from the cells.

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality. It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability.

      We appreciate the constructive criticism. In an attempt to assess whether changes in KCNQ are in fact directly responsible for the changes in membrane excitability, we have performed experiments blocking KCNQ channels with linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. Conversely, we activated KCNQ channels in old neurons with retigabine and found that the pharmacological activation was enough to hyperpolarize the membrane potential and stop the firing of action potentials. This effect was reversible. These two experiments provide solid evidence for our statement that the age-associated reduction of KCNQ activity is responsible for the hyperexcited state in sympathetic motor neurons. We present the results in a new figure (Figure 8). We also added the description in the Results section.

      Furthermore, a notable omission seems to be the analysis of Ca2+ currents which have been widely linked to alterations in membrane properties in aging.

      We thank the reviewer for the comment. We did omit the data from our studies of calcium currents. We agree that the study of calcium currents is relevant, as they can influence the afterhyperpolarization. Furthermore, we believe that potential effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. Adding this information to this manuscript would only contribute to the tabulation of effects that we observe in sympathetic motor neurons with aging. As our main goal was to determine the ion channels responsible for the hyperexcited state, voltage-gated calcium channels or other calcium sources could have reflected a more indirect mechanism as compared to changes in sodium or potassium currents. We will continue our investigation of calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      As well, additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations. Finally, experiments using KCNQ blockers and activators could provide greater relevance that the observed changes in KCNQ are indeed connected to changes in membrane excitability.

      We are happy to report that we have performed these experiments and that the results strengthen the conclusion that changes in KCNQ are connected to changes in membrane excitability.

      Recommendations for the authors:

      We recommend the following essential revisions summarized from the reviews:

      (1) Is the change in KCNQ current responsible for the altered membrane excitability? What happens to membrane excitability when KCNQ is partially blocked (see reviewer 2 comment below)? Conversely, what happens to the excitability of aged neurons if KCNQ is activated (e.g., with retigabine)? (see reviewer 3 comment below). Results of these important experiments are needed to support the argument that KCNQ underlies the alterations in firing and membrane excitability.

      We have responded to this point. Thank you for the suggested experiments. In summary, the new experiments show that blocking KCNQ channels in young neurons leads to depolarization and, in some cases, the firing of action potentials. Conversely, the activation of KCNQ channels in aged neurons leads to hyperpolarization and a cessation of firing. We have added a new figure and reported the results in the Results section.

      (2) Rapamycin experiments are underdeveloped and weak. These should be further developed by examining the effects of KCNQ blockers to see if their effects on membrane excitability are reversed. Also, see comment 2 from reviewer 1.

      We have followed the recommendation by reviewer 1 and removed the section on rapamycin.

      (3) The study should examine voltage-gated calcium currents to determine potential changes in these currents with aging. See reviewer 3 comments.

      We thank the reviewer for the comment. We performed preliminary experiments and found that aging impacts calcium currents. However, we did not include the data. In our opinion, the changes in calcium currents are outside the scope of this work, as the changes could be related to physiological processes that go beyond the control of firing. Effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. The study of the relationship between changes in calcium currents and those physiological processes would require multiple experiments and detailed analysis. We will continue our investigation of calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      We have also made the suggested edits to the figures and legends.

      (2) In Fig.4 panel H, Y-axis must be # AP at 100 pA.

      We corrected the axis in Figure 4H.

      (3) In Legend Fig. 5, the number of cells for each subpopulation (n) needs to be corrected. In plots F-I, n= 9, 7, and 3 seem to be the number of adapting cells for 12-, 64- and 115w-old, respectively, instead of the number of single, phasic, and old cells for 12-week-old mice. A similar correction seems to be needed for 64-week-old and 115-week-old.

      We corrected the n number in Figure 5.

      (4) In Figure 6 panel C, it would be helpful for a reader to align the voltage protocol depicted with the current shown.

      We have aligned the voltage protocol to the current traces.

      (5) In the legend of Figure 7, the description of panel A ends with "Magnitude of voltage step to elicit each trace is shown in black", however in panel A there is no voltage depiction. In the description of panel D, "N = X animals, n=x cells" must be corrected.

      We have modified the legend to clarify. It now reads: “Text at the right of each current trace corresponds to the voltage used to elicit that current.”

      New Figure 8

      Author response image 1.

      Pharmacological inhibition and activation of KCNQ channels mimic the age-dependent phenotype. A. Membrane potential recordings from two young neurons treated with 25 μM linopirdine during the time illustrated by the light gray box. No holding current was applied. B. Left: Summary of the resting membrane potential measured before (light orange) and after (dark orange) the application of linopirdine. Right: Summary of the depolarization produced by linopirdine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 8 cells, 14-week-old mice. C. Membrane potential recordings from two aged neurons treated with 10 μM retigabine during the time illustrated by the light gray box. No holding current was applied. D. Left: Summary of the resting membrane potential measured before (light purple) and after (dark purple) the application of retigabine. Right: Summary of the hyperpolarization produced by retigabine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 7 cells, 120-week-old mice. P-values are shown at the top of the graphs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important paper, Blin and colleagues develop a high-throughput behavioral assay to test spontaneous swimming and olfactory preference in individual Mexican cavefish larvae. The authors present compelling evidence that the surface and cave morphs of the fish show different olfactory preferences and odor sensitivities and that individual fish show substantial variability in their spontaneous activity that is relevant for olfactory behaviour. The paper will be of interest to neurobiologists working on the evolution of behaviour, olfaction, and the individuality of behaviour.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors posed a research question about how an animal integrates sensory information to optimize its behavioral outputs and how this process evolved. Their data (behavioral output analysis with detailed categories in response to the different odors in different concentrations by comparing surface and cave populations and their hybrid) partially answer this tough question. They built a new low-disturbance system to answer the question. They also found that the personality of individual fish is a good predictor of behavioral outputs against odor response. They concluded that cavefish evolved to specialize their response to alanine and histidine while surface fish are more general responders, which was supported by their data.

      Strengths:

      With their new system, the authors could generate clearer results without mechanical disturbances. The authors characterize multiple measurements to score the odor response behaviors, and also brought a new personality analysis. Their conclusion that cavefish evolved as a specialist to sense alanine and histidine among 6 tested amino acids was well supported by their data.

      Weaknesses:

      The authors posed a big research question: How do animals evolve the processes of sensory integration to optimize their behavioral outputs? I personally feel that, to answer the questions about how sensory integration generates proper (evolved) behavior, the authors at least need to show the ecological relevance of their response. For the alanine/histidine preference in cavefish, they need data for the alanine and other amino acid concentrations in the local cave water and compare them with those of surface water.

      We agree with the reviewer. This is why, in the Discussion section, we had written: “…Such significant variations in odor preferences or value may be adaptive and relate to the differences in the environmental and ecological conditions in which these different animals live. However, the reason why Pachón cavefish have become “alanine specialists” remains a mystery and prompts analysis of the chemical ecology of their natural habitat. Of note, we have not found an odor that would be repulsive for Astyanax so far, and this may relate to their opportunist, omnivorous and detritivore regime (Espinasa et al., 2017; Marandel et al., 2020).” This is also why we currently develop field work projects aimed at clarifying this question. However, such experiments and analyses are challenging, practically and technically. We hope we can reach some conclusions in the future.

      To complete the discussion we have also added an important hypothesis: “Alternatively, specialization for alanine may not need to be specific for an olfactory cue present only, or frequently, or in high amounts in caves. Bat guano for example, which is probably the main source of food in the Pachón cave, must contain many amino acids. Enhanced recognition of one of them - in the present case alanine but evolution may have randomly acted for enhanced recognition of another amino acid – should suffice to confer cavefish with augmented sensitivity to their main source of nutriment.”

      Also, as for "personality matters", I read that personality explains a large variation in surface fish. Also, thigmotaxis or wall-following cavefish individuals are exceeded to respond well to odorants compared with circling and random swimming cavefish individuals. However, I failed to understand the authors' point about how much percentages of the odorant-response variations are explained (PVE) by personality. Association (= correlation) was good to show as the authors presented, but showing proper PVE or the effect size of personality to predict the behavioral outputs is important to conclude "personality is matter"; otherwise, the conclusion is not so supported.

      From the above, I recommend the authors reconsider the title also their research questions well. At this moment, I feel that the authors' conclusions and their research questions are a little too exaggerated, with less supportive evidence.

      Thank you for this interesting suggestion, which we have fully taken into consideration. We have therefore now calculated and plotted PVE (the percentage of variation explained on the olfactory score) as a function of swimming speed or as a function of swimming pattern. The results are shown in modified Figure 8 of our revised ms and they suggest that the personality (here, swimming patterns or swimming speed) indeed predicts the olfactory response skills. Therefore, we would like to keep our title as we provide support for the fact that “personality matters”.
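
      For readers unfamiliar with PVE, the following is a minimal sketch of how such a percentage can be computed. It assumes a simple linear regression (R squared) for a continuous predictor such as swimming speed, and a one-way between-group decomposition (eta squared) for a categorical predictor such as swimming pattern. The function names and the toy values are hypothetical, and this is not necessarily the procedure used for Figure 8.

```python
from collections import defaultdict


def pve_continuous(x, y):
    """Percentage of variance in y explained by a linear fit on x (R^2 * 100)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 100.0 * sxy ** 2 / (sxx * syy)


def pve_categorical(groups, y):
    """Percentage of variance in y explained by group membership (eta^2 * 100)."""
    mean_y = sum(y) / len(y)
    by_group = defaultdict(list)
    for g, v in zip(groups, y):
        by_group[g].append(v)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - mean_y) ** 2 for vs in by_group.values()
    )
    return 100.0 * ss_between / ss_total


# Hypothetical toy data: one olfactory score per fish.
speed = [1.0, 2.0, 3.0, 4.0]        # swimming speed (arbitrary units)
pattern = ["WF", "WF", "T", "T"]    # baseline swimming pattern
score = [1.0, 3.0, 2.0, 4.0]        # olfactory score

print(pve_continuous(speed, score))
print(pve_categorical(pattern, score))
```

      With real data, the same two functions would be applied to the per-fish olfactory scores, using swimming speed or swimming pattern as the predictor.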

      Also, for the statistical method, Fisher's exact test is not appropriate for the compositional data (such as Figure 2B). The authors may quickly check it at https://en.wikipedia.org/wiki/Compositional_data or https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436.

      The authors may want to use centered log transformation or other appropriate transformations (Rpackage could be: https://doi.org/10.1016/j.cageo.2006.11.017). According to changing the statistical tests, the authors' conclusion may not be supported.

      Actually, in most cases, the distributions are so different (as seen by the completely different colors in the distribution graphs) that there is little doubt that swimming behaviors are indeed different between surface and cavefish, or between ‘before’ and ‘after’ odor stimulation. However, it is true that Fisher’s exact test is not fully appropriate, because the data can be considered compositional. For this kind of data, the centered log-ratio transformation has been suggested. However, our dataset contains many zeros, which log transformations handle poorly.

      To help us deal with our data, the reviewer suggested the paper by Greenacre (2021) (https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436). In that paper, Greenacre clearly states: "Zeros in compositional data are the Achilles heel of the logratio approach (LRA)."
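
      The zero problem can be illustrated with a minimal sketch of the centered log-ratio (clr) transformation. The function name and the example compositions are hypothetical; the only point is that the transform takes the logarithm of every part, so a single zero part makes it undefined.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs
    (equivalently, log of each part over the geometric mean of all parts)."""
    if any(p <= 0 for p in composition):
        # log(0) is undefined, so a zero part breaks the transform.
        raise ValueError("clr is undefined when a part is zero")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical proportions of fish per swimming pattern.
full = [0.5, 0.25, 0.25]    # every pattern observed
sparse = [0.7, 0.3, 0.0]    # one pattern never observed

print(clr(full))            # transformed values sum to zero
try:
    clr(sparse)
except ValueError as err:
    print("cannot transform:", err)
```

      Workarounds such as replacing zeros by a small value exist, but they require an arbitrary choice, which is one reason to prefer a method that accepts zeros directly.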

      Therefore, we have now tested our data using Correspondence Analysis (CA), which can deal with tables containing many zeros and is a reliable alternative to LRA (Cook-Thibeau, 2021; Greenacre, 2011).

      The CA results are shown in Supplemental Figure 8. They fully confirm the difference in baseline swimming patterns between morphs, as well as the changes (or absence of changes) in behavioral patterns after odor stimulation suggested by the colored bar plots in the main figures, with confidence ellipses overlapping or not overlapping depending on the case. Therefore, the CA method fully confirms and even strengthens our initial interpretations.
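
      To show why CA tolerates zeros, here is a minimal sketch of the two quantities that CA decomposes and displays: the total inertia (chi-square statistic divided by the grand total) and the chi-square distance between row profiles. The count table and function names are hypothetical, and this is not the analysis behind Supplemental Figure 8; a full CA would additionally take a singular value decomposition of the standardized residuals to obtain the plotted coordinates.

```python
def column_masses(table):
    """Share of the grand total found in each column."""
    total = sum(sum(row) for row in table)
    return [sum(row[j] for row in table) / total for j in range(len(table[0]))]


def row_profiles(table):
    """Each row divided by its own sum (a composition per morph)."""
    return [[c / sum(row) for c in row] for row in table]


def chi2_distance(p, q, masses):
    """Chi-square distance between two row profiles, weighted by column masses."""
    return sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)) ** 0.5


def total_inertia(table):
    """Pearson chi-square statistic divided by the grand total."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(len(table[0]))]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / total


# Hypothetical counts of fish per swimming pattern (columns: R, WF, T, C).
counts = [
    [8, 0, 2, 0],   # surface fish
    [0, 8, 0, 2],   # cavefish
]

profiles = row_profiles(counts)
masses = column_masses(counts)
print(total_inertia(counts))
print(chi2_distance(profiles[0], profiles[1], masses))
```

      No logarithm is taken anywhere, so cells with zero counts are used as they are; the only requirement is that no row or column is empty.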

      Finally, we have kept our initial graphical representation in the ms (color-coded bar plots; the complete color code is now given in Suppl. Fig7), and CA results are shown in Suppl. Figure 8 and added in text.

      Reviewer #2 (Public Review):

      In their submitted manuscript, Blin et al. describe differences in the olfactory-driven behaviors of river-dwelling surface forms and cave-dwelling blind forms of the Mexican tetra, Astyanax mexicanus. They provide a dataset of unprecedented detail, that compares not only the behaviors of the two morphs but also that of a significant number of F2 hybrids, therefore also demonstrating that many of the differences observed between the two populations have a clear (and probably relatively simple) genetic underpinning.

      To complete the monumental task of behaviorally testing 425 six-week-old Astyanax larvae, the authors created a setup that allows for the simultaneous behavioral monitoring of multiple larvae and the infusion of different odorants without introducing physical perturbations into the system, thus biasing the responses of cavefish that are particularly fine-tuned for this sensory modality. During the optimization of their protocol, the authors also found that for cave-dwelling forms one hour of habituation was insufficient and a full 24 hours were necessary to allow them to revert to their natural behavior. It is also noteworthy that this extremely large dataset can help us see that population averages of different morphs can mask quite significant variations in individual behaviors.

      Testing with different amino-acids (applied as relevant food-related odorant cues) shows that cavefish are alanine- and histidine-specialists, while surface fish elicit the strongest behavioral responses to cysteine. It is interesting that the two forms also react differently after odor detection: while cave-dwelling fish decrease their locomotory activity, surface fish increase it. These differences are probably related to different foraging strategies used by the two populations, although, as the observations were made in the dark, it would be also interesting to see if surface fish elicit the same changes in light as well.

      Thank you for these nice comments.

      Further work will be needed to pinpoint the exact nature of the genetic changes that underlie the differences between the two forms. Such experimental work will also reveal how natural selection acted on existing behavioral variations already present in the SF population.

      Yes. Searching for genetic underpinnings of the sensory-driven behavioral differences is our current endeavor through a QTL study and we should be able to report it in the near future.

      It will be equally interesting, however, to understand what lies behind the large individual variation of behaviors observed both in the case surface and cave populations. Are these differences purely genetic, or perhaps environmental cues also contribute to their development? Does stochasticity provided by the developmental process has also a role in this? Answering these questions will reveal if the evolvability of Astyanax behavior was an important factor in the repeated successful colonization of underground caves.

      Yes. We will also address (at least partially) most of these questions in our current QTL study.

      Reviewer #3 (Public Review):

      Summary:

      The paper explores chemosensory behaviour in surface and cave morphs and F2 hybrids in the Mexican cavefish Astyanax mexicanus. The authors develop a new behavioural assay for the long-term imaging of individual fish in a parallel high-throughput setup. The authors first demonstrate that the different morphs show different basal exploratory swimming patterns and that these patterns are stable for individual fish. Next, the authors test the attraction of fish to various concentrations of alanine and other amino acids. They find that the cave morph is a lot more sensitive to chemicals and shows directional chemotaxis along a diffusion gradient of amino acids. For surface fish, although they can detect the chemicals, they do not show marked chemotaxis behaviour and have an overall lower sensitivity. These differences have been reported previously but the authors report longer-term observations on many individual fish of both morphs and their F2 hybrids. The data also indicate that the observed behavior is a quantitative genetic trait. The approach presented will allow the mapping of genes' contribution to these traits. The work will be of general interest to behavioural neuroscientists and those interested in olfactory behaviours and the individual variability in behavioural patterns.

      Strengths:

      A particular strength of this paper is the development of a new and improved setup for the behavioural imaging of individual fish for extended periods and under chemosensory stimulation. The authors show that cavefish need up to 24 h of habituation to display a behavioural pattern that is consistent and unlikely to be due to the stressed state of the animals. The setup also uses relatively large tanks that allow the build-up of chemical gradients that are apparently present for at least 30 min.

      The paper is well written, and the presentation of the data and the analyses are clear and to a high standard.

      Thank you for these nice comments.

      Weaknesses:

      One point that would benefit from some clarification or additional experiments is the diffusion of chemicals within the behavioural chamber. The behavioural data suggest that the chemical gradient is stable for up to 30 min, which is quite surprising. It would be great if the authors could quantify e.g. by the use of a dye the diffusion and stability of chemical gradients.

      OK. We had tested the diffusion of dyes in our previous setup and we also did in the present one (not shown). We think that, due to differences of molecular weight and hydrophobicity between the tested dyes and the amino acid molecules we are using, their diffusion does not constitute a proper read-out of actual amino acid diffusion. We anticipate that amino acid diffusion is extremely complex in the test box, possibly with odor plumes diffusing and evolving in non-gradient patterns, in the 3 dimensions of the box, and potentially further modified by the fish swimming through it, the flow coming from the opposite water injection side and the borders of the box. This is the reason why we have designed the assay with contrasting “odor side” and “water control side”. Moreover, our question here is not to determine the exact concentration of amino acid to which the fish respond, but to compare the responses in cavefish, surface fish and F2 hybrids. Finally and importantly, we have performed dose/response experiments whereby varying concentrations have been presented for 3 of the 6 amino acids tested, and these experiments clearly show a difference in the threshold of response of the different morphs.

      The paper starts with a statement that reflects a simplified input-output (sensory-motor) view of the organisation of nervous systems. "Their brains perceive the external world via their sensory systems, compute information and generate appropriate behavioral outputs." The authors' data also clearly show that this is a biased perspective. There is a lot of spontaneous organised activity even in fish that are not exposed to sensory stimulation. This sentence should be reworded, e.g. "The nervous system generates autonomous activity that is modified by sensory systems to adapt the behavioural pattern to the external world." or something along these lines.

      Done

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In addition to my comments in the "weakness" section above, here are my other comments.

      How many times fish were repeatedly assayed and what the order (alanine followed by cysteine, etc) was, is not clear (Pg 24, Materials and Methods). I am afraid that fish memorize the prior experience to get better/worse their response to the higher conc of alanine, etc. Please clarify this point.

      Many fish were tested in different conditions on consecutive days, indeed. Most often, control experiments (eg, water/nothing; water/water; nothing/nothing) were followed by odor testing. In such cases, there is no risk that fish memorize prior experience and that such previous experience interferes with response to odor. In other instances, fish were tested with a low concentration of one amino acid, followed by a high concentration of another amino acid, which is also on the safe side. Of note, on consecutive days, the odors were always perfused on alternate sides of the test box, to avoid possibility of spatial memory. Finally, in the few cases where increasing concentrations of the same amino acids were perfused consecutively, 1) they were perfused on alternate sides, 2) if the fish does not detect a low concentration below threshold / does not respond, then prior experience should not interfere for responding to higher concentrations, and 3) we have evidence (unpublished, current studies) that when a fish is given increasing concentrations of the same amino acid above detection threshold, then the behavioral response is stable and reproducible (eg does not decrease or increase).

      Minor points:

      Thygmotaxis and wall following.

      Classically, thigmotaxis and wall following are treated as the same (sharma et al., 2009; https://pubmed.ncbi.nlm.nih.gov/19093125/) but the authors discriminate it in thigmotaxis at X-axis and Y-axis because fish repeatedly swam back and forth on x-axis wall or y-axis wall. I understand the authors' point to discriminate WF and T but present them with more explanations (what the differences between them) in the introduction and result sections.

      Done

      Pg5 "genetic architecture" in the introduction.

      "Genetic architecture" analysis needs a more genomic survey, such as GWAS, QTL mapping, and Hi-C. Phenotype differences in F2 generation can be stated as "genetic factor(s)" "genetic component(s)", etc. please revise.

      Done

      Pg10 At the serine treatment, the authors concluded that "...suggesting that their detection threshold for serine is lower than for alanine." I believe that the 'threshold for serine is higher' according to the authors' data. Their threshold-related statement is correct in Pg21 "as SF olfactory concentration detection threshold are higher than CF,..." So the statement on page 10 is a just mistake, I think. Please revise.

      Done (mistake indeed)

      Pg11 After explaining Fig5, the statement "In sum, the responses of the different fish types to different concentrations of different amino acids were diverse and may reflect complex, case-by-case, behavioral outputs" does not convey any information. Please revise.

      OK. Done: “In sum, the different fish types show diverse responses to different concentrations of different amino acids.”

      For the personality analysis (Fig 7)

      The index value needs more explanation. I read the materials and methods three times but am still confused. From the equation, the index does not seem to exceed 1.0, unless the "before score" was a negative value, and the "after score" value was positive. I could not get why the authors set a score of 1.5 as the threshold for the cumulative score of these different behavior index values (= individual score). Please provide more description. Currently, I am skeptical about this index value in Fig 7.

      Done, in results and methods.

      Pg15 the discussion section

      Please discuss well the difference between the authors' finding (cavefish respond 10^-4M for position and surface fish responded 10^-4 for thig-Y; Fig 4AB), and those in Hinaux et al. 2016 (cavefish responded 10^-10M alanine but surface fish responded 10^-5M or higher). It seems that surface fish could respond to the low conc of alanine as cavefish do, which is opposed to the finding in Hinaux 2016.

      The increase in NbrtY at the population level for surface fish with 10^-4M alanine (~10^-6M in box) was most probably due to only a few individuals. Unlike in cavefish, all other parameters were unchanged in surface fish at this concentration. Moreover, at the individual level, only 3.2% of surface fish had significant olfactory scores (compared with 81.3% for cavefish). Thus, we think that overall this result does not contradict our previous findings in Hinaux et al. (2016), and solely represents the natural, unexplained variation inherent to the analysis of complex animal behaviors, even when we attempt to use the highest standards of controlled conditions.

      Of note, in the revised version, we have now included a full dose/response analysis for alanine concentrations ranging from 10^-2M to 10^-10M, on cavefish. Alanine 10^-5M has significant effects (now shown in Suppl Fig2 and indicated in text; a column has been added for 10^-5M in Summary Table 1). Lower concentrations have milder effects (described in text) but confirm the very low detection threshold of cavefish for this amino acid.

      Pg19, "In sum, CF foraging strategy has evolved in response to the serious challenge of finding food in the dark"

      My point is the same as explained in the 'weakness' section above: how this behavior is effective in the cave life, if they conclude so? Please explain or revise this statement.

      The present manuscript reports on experiments performed in “artificial” and controlled laboratory conditions. We are fully aware that these conditions are probably distantly related to conditions encountered in the wild. Note that we had written in original version (page 20) “…for 6-week old juveniles in a rectangular box - but the link may be more elusive when considering a fish swimming in a natural, complex environment.” As the reviewer may know, we also perform field studies in a more ethological approach of animal behaviors, thus we may be able to discuss this point more accurately in the future.

      Pg20 "To our knowledge, this is the first time individual variations are taken into consideration in Astyanax behavioral studies."

      This is wrong. Please see Fernandes et al., 2022. (https://pubmed.ncbi.nlm.nih.gov/36575431/).

      OK. The sentence is wrong if taken in its absolute sense, i.e., considering inter-individual variations of a given parameter (e.g., number of neuromasts per individual or number of approaches to vibrating rod in Fernandes et al., 2022). In this same sense, past Astyanax QTL studies on behaviors also took into account variations among F2 individuals. Here, we wanted to stress that personality was taken into consideration. The sentence has been changed: “To our knowledge, this is the first time individual temperament is taken into consideration in Astyanax behavioral studies.”

      Figure 2B and others.

      The order of categories (R, R-TX, etc) should match in all columns (SF, F2, and CF). Currently, the category orders seem random or the larger ratio categories at the bottom, which is quite difficult to compare between SF, F2, and CF. Also, the writings in Fig 2A (times, Y-axis labels, etc), and the bargraphs' writings are quite difficult to read in Fig 2B, Fig 3B 4H, 5GN, 6EFG. Also, no need to show fish ID in Fig 2C in the current way, but identify the fish data points of the fish in Fig 2D (SF#40, CF#65, and F2#26) in Fig 2C if the authors want to show fish ID numbers in the boxplots. Fish ID numbers in other boxplot figures are recommended to be removed too.

      We have thought a lot about how best to represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations (33 possibilities in total, see new Suppl Fig7), which are never the same in different plots/conditions because the individual fish tested are different. We decided that the best way was to represent, from bottom to top, the most used to the least used swimming patterns, and to use a color code that matches the different combinations as closely as possible. It was impossible to give the full color code on each figure, so it was simplified, and we believe that the results are well conveyed by the graphs. We would like to keep it as it is. To respond (partially) to the reviewer’s concern, we have now added a full color code description in a new Supplemental Figure 7 (associated with Methods).

      Size of lettering has been modified in all pattern graphs like Fig2A. Thanks for the suggestion, it reads better now.

      Finally, we would like to keep the fish ID numbers because this contributes to conveying the message of the paper, that individuality matters.

      Raw data files were not easy to read in Excel or LibreOffice. Please convert them into the csv format to support the rigor in the authors' conclusion.

      We do not understand this request. Our very large dataset must be analysed with R, not Excel, for statistics, plotting and pattern analysis. However, raw data files can be opened in Excel with format conversion.

      Reviewer #2 (Recommendations For The Authors):

      I think most of the experimental procedures (with few exceptions, see below) are well-defined and nicely described, so the majority of my suggestions will be related to the visualization of the data. I think the authors have done a great job in presenting this complex dataset, but there are still some smaller tweaks that could be used to increase the legibility of the presented data.

      First and perhaps foremost, a better definition of the swimming pattern subsets is needed. I have no problem understanding the main behavioral types, but whereas the color codes for these suggest that there is continuous variance within each pattern, it is not clear (at least to me), what particular aspect(s) of the behaviors vary. Also, whereas the sidebars/legends suggest a continuum within these behaviors, the bar charts themselves clearly present binned data. I did not find a detailed description of how the binning was done. As this has been - according the Methods section - a manual process, more clarity about the details of the binning would be welcome. I would also suggest using binned color codes for the legends as well.

      Done, in Results and Methods. We hope it is now clear that there is no “continuum”, but rather multiple combinations of discrete swimming patterns. The gradient aspect of the color code in the figures has been removed to avoid the idea of a continuum. According to the chosen color code, WF is in red, R in blue, T in yellow and C in green. Combinations are then represented by the colors in between; for example, R+WF is purple. We have now added a full color code description for the swimming patterns and their combinations in a new Supplemental Figure 7 (associated with Methods).

      Also, to better explain the definition of the swimming patterns and the graphical representation, it now reads (in Methods):

      “The determination of baseline swimming patterns and swimming patterns after odor injection was performed manually based on graphical representations such as in Figure 2A or Figure 3A. Four distinctive baseline behaviors clearly emerged: random swim (R; defined as haphazard swimming with no clear pattern, covering entirely or partly the surface of the arena), wall following (WF; defined as the fish continuously following along the 4 sides of the box and turning around it, in a clockwise or counterclockwise fashion), large or small circles (C; self explanatory), and thigmotactism (T, along the X- or the Y-axis of the box; defined as the fish swimming back and forth along one of the 4 sides of the box). On graphical representations of swimming pattern distributions, we used the following color code: R in blue, WF in red, C in green, T in yellow. Of note, many fish swam according to combination(s) of these four elementary swimming patterns (see descriptions in the legends of Supplemental figures, showing many examples). To fully represent the diversity and the combinations of swimming patterns used by individual fish, we used an additional color code derived from the “basic” color code described above and where, for example R+WF is purple. The complete combinatorial color code is shown in Suppl. Fig7.”

      It would be also easier to comprehend the stacked bar charts, presenting the particular swimming patterns in each population, if the order of different swimming patterns was the same for all the plots (e.g. the frequency of WF always presented at the bottom, R on the top, and C and T in the middle). This would bring consistency and would highlight existing differences between SF, CF, and F2s. Furthermore, such a change would also make it much easier to see (and compare) shifts in behaviors.

      We have thought a lot on how to best represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations, which are never the same in different plots/conditions because the individual fish tested are different. We decided to keep it as it currently stands, because we think re-doing all the graphs and figures would not significantly improve the representation. In fact, we think that the differences between morphs (dominant blue in SF, dominant red in CF) and between conditions (bar charts next to each other) are easy to interpret at first glance in the vast majority of cases. Moreover, they are now completed by CA analyses (Suppl Figure 8).

      While the color coding of the timeline in the "3D" plots presented for individual animals is a nice feature, at the moment it is slightly confusing, as the authors use the same color palette as for the stacked bar charts, representing the proportionality of the particular swimming patterns. As the y-axis is already representing "time" here, the color coding is not even really necessary. If the authors would like to use a color scheme for aesthetic reasons, I would suggest using another palette, such as "grey" or "viridis".

      We would like to keep the graphical aspect of our figures as they are, for aesthetic reasons. To avoid confusion with stacked bar chart color code, we have added a sentence in Methods and in the legend of Figure 2, where the colors first appear:

      “The complete combinatorial color code is shown in Suppl. Figure 7. Of note, in all figures, the swimming pattern color code does not relate whatsoever with the time color code used in the 2D plus time representation of swimming tracks such as in Figure 2A”.

      I would also suggest changing the boxplots to violin-plots. Figure 7 clearly shows bimodality for F2 scores (something, as the authors themselves note, not entirely surprising given the probably polygenic nature of the trait), but looking at SF and CF scores I think there are also clear hints for non-normal distributions. If non-normal distribution of traits is the norm, violin-plots would capture the variance in the data in a more digestible way. (The existence of differently behaving cohorts within the population of both SF and CF forms would also help to highlight the large pre-existing variance, something that was probably exploited by natural selection as well, as mentioned briefly in the Discussion by the authors, too.)

      The bimodal distribution of scores shown by F2s in Figure 7B is indeed probably due to the polygenic nature of the trait. However, such a distribution is the exception rather than the norm. Moreover, the boxplot representations we have used throughout the figures include all the individual points, and outliers can be identified because they have the fish ID number next to them. This allows the reader to grasp the variance of the data. Again, redoing all graphs and figures would constitute a lot of work for little gain in terms of conveying the results. Therefore, we choose not to replace the boxplots with violin plots.

      The summary data of individual scores in Table 1B shows some intriguing patterns, that warrant a bit further discussion, in my opinion. For example, we can see opposite trends in scores of SF and CF forms with increasing alanine concentration. Is there an easy explanation for this? Also, in the case of serine, the CF scores do not seem to respond in a dose-dependent manner and puzzlingly at 10^(-3)M serine concentration F2 scores are above those of both grandparental populations.

      That is true. However, we have no simple explanation for this. To begin responding to this question, we have now performed full dose/response experiments for alanine (concentrations tested from 10^-2M to 10^-10M on cavefish; these confirm that CF are bona fide “alanine specialists”) and for serine (10^-2M to 10^-4M tested on both morphs; these confirm that both morphs respond well to this amino acid). These complementary results are now included in the text and figures (partially) and in summary Table 1.

      If anything is known about this, I would also welcome some discussion on how thigmotactic behavior, a marker of stress in SF, could have evolved to become the normal behavior of CF forms, with lower cortisol levels and, therefore lower anxiety.

      We actually think thigmotactism is a marker of stress in both morphs. See Pierre et al, JEB 2020, Figure S3A: in both SF and CF, thigmotaxis behavior decreases after long habituation times. In our hands, the only difference between the two morphs is that surface fish (at 5 months of age) express stress by thigmotactism but also by freezing and rapid erratic movements, while cavefish have a more restricted stress repertoire.

      This is why in the present paper we have carefully made the distinction between thigmotactism (= possible stress readout) and wall following (= exploratory behavior). Our finding that WF and large circles confer better olfactory response scores to cavefish strongly supports the different nature of these two swimming patterns. Then, why is swimming along the 4 walls of a tank fundamentally different from swimming along one wall? The question is open, although the number of changes of direction is probably an important parameter: in WF the fish always swims forward in the same direction, while in T the fish constantly changes direction when reaching the corner of the tank, which is similar to erratic swim in stressed surface fish.

      Finally two smaller suggestions:

      • When referring to multiple panels on the same figure it would be better to format the reference as "Figure 4D-G" instead of "Figure 4DEFG";

      Done

      • On page 4, where the introduction reads as "although adults have a similar olfactory rosette with 20-25 lamellae", in my opinion, it would be better to state that "while adults of the two forms have a similar olfactory rosette with 20-25 lamellae".

      Done

      Reviewer #3 (Recommendations For The Authors):

      Consider moving Figure 3 to be a supplement of Figure 4. This figure shows a water control and therefore best supplements the alanine experiment.

      We would like to keep this figure as a main figure: we consider it very important to establish the validity of our behavioral setup at the beginning of the ms, and to establish that in all the following figures we are recording bona fide olfactory responses.

      "sensory changes in mecano-sensory and gustatory systems " - mechano-sensory.

      Done

      Figure 2 legend: "(3) the right track is the 3D plus time (color-coded)" - shouldn't it be 2D plus time or 3D (x,y, time).

      True! Thanks for noting this, corrected.

      Figure 4 legend "E, Change in swimming patterns" should be H.

      Done

      "suggesting that their detection threshold for serine is lower than for alanine" - higher?

      Done

      In the behavioural plots, I assume that the "mean position" value represents the mean position along the X-axis of the chamber - this should be clarified and the axis label updated accordingly.

      That is correct and has been updated in Methods and Figures and legends.

      "speed, back and forth trips in X and Y, position and pattern changes (see Methods; Figure 7A)." - here it would be helpful to add an explanation like "to define an olfactory score for individual fish."

      This has been changed in Results and more detailed explanations on score calculations are now given in Methods.

      "possess enhanced mecanosensory lateral line" - mechanosensory.

      Done

    1. Author response:

      Reviewer #1 (Public Review):

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      Thank you for your comment. (1) In Figure 1c, "HSV-wt" refers to the virus rescued from pBAC—GFP-HSV (as mentioned in the “Method” section), which itself carries GFP. Therefore, detecting GFP cannot distinguish between HSV infection and HIV reactivation. Hence, we assess the reactivation effect by measuring the mRNA levels of HIV LTR. (2) Our data indicate that overexpression of ICP34.5 inhibits the reactivation of the HIV latent reservoir, but this effect is not equivalent in magnitude to the activation observed with ICP34.5-deleted HSV-1. There are some possible reasons. One is that the lentivirus used to overexpress ICP34.5 integrates randomly into the genome of the J-Lat cell line, which may itself reactivate latent HIV to some extent. The other is that ICP34.5 mainly inhibits HIV reactivation through modulation of the host NF-κB or HSF1 pathways, while PMA, TNF-a, and ICP34.5-deleted HSV-1 can reactivate latent HIV by other mechanisms that have yet to be determined. As a result, ICP34.5 overexpression exerts only a small inhibitory effect. We will further discuss this issue in the revised version. Thank you.

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your comment. We plan to conduct several experiments to demonstrate a reduction in HSV-1 replication after ICP34.5 deletion: (1) Measure the growth curve of ICP34.5-deleted HSV-1 in Vero cells. A growth curve lower than that of wild-type HSV-1 would demonstrate a reduction in HSV-1 replication after ICP34.5 deletion. (2) Measure the level of inflammatory factors in tumor cells after infection with ICP34.5-deleted HSV-1.

      We believe that the effect is specific, as we previously tested poxviruses and adenoviruses and found no activation of the latent reservoir. We consider the activation observed with HSV-1 virus and HSV-1 with deleted ICP34.5 to be specific. We will supplement relevant data in the revised version.

      In addition, we will provide the corresponding RNA-seq data to assess its effect on cellular genes.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      We agree with you that this is a pilot study with a limited number of rhesus macaques. There were only 3 monkeys per group in this study, but our results were encouraging. Although the number of macaques was relatively limited, these nine macaques were distributed very carefully based on age, sex, weight and genotype. All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our rhesus model, ART effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation, virus rapidly rebounded, which is very similar to what is seen in ART-treated HIV patients. In future projects, we will expand the number of animals and then proceed to preclinical and clinical studies. Thank you for your understanding.

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      We will provide more data about the safety assessment of HSV-1 vector in SIV-infected macaques, and also further discuss the potential of inflammatory HSV vector in PLWH in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      We agree with your suggestion. In fact, we are currently further exploring some viral genes of HSV-1 that play a role in activation. We have found that the ICP0 gene of HSV-1 virus can activate HIV, and the specific mechanism is under investigation.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      Thank you for your suggestion. We plan to conduct IPDA experiments to further supplement data on the overall reduction in circulating latent cell numbers in animals.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      We plan to use primary cells for related experiments to further validate the results of the cell experiments.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      Thank you for your comments. In fact, our study adopts the "shock and kill" strategy, with a focus on the "kill" aspect leaning towards T-cell therapy. Although the vaccine in the paper also utilizes Env antigen, we believe these antibodies are insufficient for neutralizing the mutated SIV virus. We strongly agree with your suggestion that in HIV/AIDS treatment, effective T-cell killing combined with broad-spectrum neutralizing antibodies would be more effective. This aligns with our findings, as our treatment has partially delayed viral rebound but with a relatively short duration of suppression. This may indicate insufficient killing activity. In future research, we will further consider the role of broad-spectrum neutralizing antibodies. Our revised manuscript will elaborate on this in the discussion section.

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

      We agree with you that the HSV-1 empty vector does exhibit a somewhat delayed rebound. The reason is that our treatment simultaneously utilizes both the HSV vector vaccine and ART. Although the empty HSV vector cannot elicit an SIV-specific CTL response, it effectively activates the latent SIV reservoirs, and these activated virions can then be partially killed by ART. Therefore, even without carrying antigens, a slight delay may be achieved.

    1. Author response:

      We would like to thank the eLife Editors and Reviewers for their positive assessment and constructive comments, and for the opportunity to revise our manuscript. We greatly appreciate the Reviewers’ recommendations and believe that they will further improve our manuscript.

      In revising the manuscript, our primary focus will be enhancing the clarity surrounding testing procedures and addressing corrections for multiple comparisons. Additionally, we intend to offer more explicit information about the statistical tests employed, along with the details about the number of models/comparisons for each test. We will also include an extended discussion on potential limitations of the dopaminergic receptor mapping methods used, addressing the Reviewers’ comments relating to the quality of PET imaging with different dopaminergic tracers in mesiotemporal regions such as the hippocampus. While the code used for connectopic mapping is publicly available through the ConGrads toolbox, we will provide the additional code we have used for data processing and analysis, visualization of hippocampal gradients, and the cortical projections. The data used in the current study is not publicly available due to ethical considerations concerning data sharing, but can be shared upon reasonable request from the senior author. Additional plans include clarifying and discussing which findings were successfully replicated, and addressing Reviewers’ suggestions for using other openly available cohorts for replication, and implementing alternative coordinate systems to quantify connectivity change along gradients.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "comparative transcriptomics reveal a novel tardigrade specific DNA binding protein induced in response to ionizing radiation" aims to provide insights into the mediators and mechanisms underlying tardigrade radiation tolerance. The authors start by assessing the effect of ionizing radiation (IR) on the tardigrade lab species, H. exemplaris, as well as the ability of this organism to recover from this stress - specifically, they look at DNA double and single-strand breaks. They go on to characterize the response of H. exemplaris and two other tardigrade species to IR at the transcriptomic level. Excitingly, the authors identify a novel gene/protein called TDR1 (tardigrade DNA damage response protein 1). They carefully assess the induction of expression/enrichment of this gene/protein using a combination of transcriptomics and biochemistry - even going so far as to use a translational inhibitor to confirm the de novo production of this protein. TDR1 binds DNA in vitro and co-localizes with DNA in tardigrades.

      Reverse genetics in tardigrades is difficult, thus the authors use a heterologous system (human cells) in which to express TDR1. They find that, when transiently expressed, TDR1 helps improve human cell resistance to IR.

      This work is a masterclass in integrative biology incorporating a holistic set of approaches spanning next-gen sequencing, organismal biology, biochemistry, and cell biology. I find very little to critique in their experimental approaches.

      Strengths:

      (1) Use of trans/interdisciplinary approaches ('omics, molecular biology, biochemistry, organismal biology)

      (2) Careful probing of TDR1 expression/enrichment

      (3) Identification of a completely novel protein seemingly involved in tardigrade radio-tolerance.

      (4) Use of multiple, diverse tardigrade species for 'omics comparison.

      Weaknesses:

      (1) No reverse genetics in tardigrades - all insights into TDR1 function from heterologous cell culture system.

      (2) Weak discussion of Dsup's role in preventing DNA damage in light of DNA damage levels measured in this manuscript.

      (3) Missing sequence data which is essential for making a complete review of the work.

      Overall, I find this to be one of the more compelling papers on tardigrade stress-tolerance I have read. I believe there are points still that the authors should address, but I think the editor would do well to give the authors a chance to address these points as I find this manuscript highly insightful and novel.

      We thank the reviewer for his comments.

      We agree that it will be important to further investigate the role of Dsup in radio-tolerance. We briefly mentioned this point in the discussion (p14). Our findings show that tardigrades undergo DNA damage at levels roughly similar to radio-sensitive organisms and therefore support a major role for DNA repair in the maintenance of genome integrity after exposure to IR. Nevertheless, we believe that more precise quantification of DNA damage may still reveal a contribution of genome protection to radio-tolerance of tardigrades compared to radio-sensitive organisms. Dsup loss of function experiments in tardigrades would clearly be the best way to assess this possibility. In the absence of experiments directly addressing the function of Dsup, we prefer to refrain from drawing any firm conclusion on prevention of DNA damage by Dsup and thus to keep a more open position. In any case, as discussed in the text, we note that Dsup has only been reported in Hypsibioidea and other molecular players, such as TDR1, are likely involved in radio-tolerance in other tardigrade species.

      The sequence data can be accessed at the NCBI SRA database with Bioproject ID PRJNA997229.

      Reviewer #3 (Public Review):

      Summary:

      This paper describes transcriptomes from three tardigrade species with or without treatment with ionizing radiation (IR). The authors show that IR produces numerous single-strand and double-strand breaks as expected and that these are substantially repaired within 4-8 hours. Treatment with IR induces strong upregulation of transcripts from numerous DNA repair proteins including Dsup specific to the Hypsibioidea superfamily. Transcripts from the newly described protein TDR1 with homologs in both Hypsibioidea and Macrobiotoidea superfamilies are also strongly upregulated. They show that TDR1 transcription produces newly translated TDR1 protein, which can bind DNA and co-localizes with DNA in the nucleus. At higher concentrations, TDR1 appears to form aggregates with DNA, which might be relevant to a possible function in DNA damage repair. When introduced into human U2OS cells treated with bleomycin, TDR1 reduces the number of double-strand breaks as detected by gamma-H2A.X spots. This paper will be of interest to the DNA repair field and to radiobiologists.

      Strengths:

      The paper is well-written and provides solid evidence of the upregulation of DNA repair enzymes after irradiation of tardigrades, as well as upregulation of the TDR1 protein. The reduction of gamma-H2A.X spots in U2OS cells after expression of TDR1 supports a role in DNA damage.

      Weaknesses:

      Genetic tools are still being developed in tardigrades, so there is no mutant phenotype to support a DNA repair function for TDR1, but this may be available soon.

      We thank the reviewer for his comments.

      Reviewer #4 (Public Review):

      The manuscript brings convincing results regarding genes involved in the radio-resistance of tardigrades. It is nicely written and the authors used different techniques to study these genes. There are sometimes problems with the structure of the manuscript but these could be easily solved. According to me, there are also some points which should be clarified in the result sections. The discussion section is clear but could be more detailed, although some results were actually discussed in the results section. I wish that the authors would go deeper in the comparison with other IR-resistant eucaryotes. Overall, this is a very nice study and of interest to researchers studying molecular mechanisms of ionizing radiation resistance.

      I have two small suggestions regarding the content of the study itself.

      (1) I think the study would benefit from the analyses of a gene tree (if feasible) in order to verify if TDR1 is indeed tardigrade-specific.

      (2) It would be appreciated to indicate the expression level of the different genes discussed in the study, using, for example, transcripts per million (TPM).
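      For readers unfamiliar with the unit the reviewer mentions, the following is a minimal sketch of how transcripts per million are conventionally computed from read counts and transcript lengths. The function name and the toy numbers are illustrative assumptions; this is not the authors' pipeline.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- read counts per transcript (hypothetical example data)
    lengths_kb -- transcript lengths in kilobases, in the same order
    """
    # Step 1: length-normalize each count (reads per kilobase).
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    # Step 2: rescale so that the values of one sample sum to one million.
    total = sum(rates)
    return [rate / total * 1_000_000 for rate in rates]


# Toy sample with three transcripts (made-up numbers).
example = tpm(counts=[100, 300, 600], lengths_kb=[1.0, 3.0, 2.0])
```

      Because the values of each sample sum to one million, TPM expresses the relative abundance of a transcript within a sample, which is what makes it convenient for reporting expression levels.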

      We thank the reviewer for his comments.

      (1) To identify TDR1 homologous sequences in non-tardigrade species, we conducted extensive homology searches using multiple homology-based approaches (Blastp and Diamond against the NCBI non-redundant protein sequences (nr) database and hmmsearch against the EBI reference proteomes), which failed to identify TDR1 homologs in non-tardigrade ecdysozoans, thus strongly supporting that TDR1 is indeed tardigrade-specific.

      To be clearer in the manuscript, we now state the absence of hits for TDR1 in non-tardigrade ecdysozoans. Given the absence of homologs in non-tardigrade species, it is not possible to make a gene tree with non-tardigrade species.

      (2) To further document expression levels (which were already available from the Tables in the initial submission), we added MA plots (representing log2 fold change and log-normalized read counts) in the supplementary materials (Supp Figure 3 and Supp Figure 8). These additional figures clearly document that the DNA repair genes discussed in the main text and TDR1 are highly expressed genes after IR and after Bleomycin treatment.
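      As an illustration of what each point on an MA plot encodes, here is a minimal sketch that computes the two coordinates for one gene from normalized counts. The function name, the pseudocount, and the choice of log10 for the abundance axis are assumptions made for the example; differential expression packages compute shrunken or model-based values, so this is not the calculation behind the supplementary figures.

```python
import math


def ma_point(control_counts, treated_counts, pseudocount=1.0):
    """Return (A, M) for one gene from normalized read counts.

    A -- log10 of the mean normalized count across both conditions
    M -- log2 fold change, treated versus control
    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    mean_control = sum(control_counts) / len(control_counts)
    mean_treated = sum(treated_counts) / len(treated_counts)
    m = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
    a = math.log10((mean_control + mean_treated) / 2 + pseudocount)
    return a, m


# Toy gene: strongly induced after treatment (made-up counts).
a_value, m_value = ma_point(control_counts=[7, 7], treated_counts=[63, 63])
```

      A gene that is both highly expressed and strongly induced lands in the upper right of such a plot, which is the pattern the response describes for the DNA repair genes and TDR1.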

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      (1) It has always seemed strange to me that tardigrades accumulate just as much DNA damage as any other organism when irradiated and yet their Dsup protein is supposed to shield and protect their DNA from damage. Perhaps this is an appropriate time for this idea to be reconsidered given that Dsup was NOT induced by IR in this study and the authors found that their animals incurred just as much damage as other biological systems. While Dsup is clearly not the focus of this manuscript, it is the protein most associated with tardigrade radio-tolerance and I would argue this new paper would call into question previous conclusions made about Dsup.

      We agree that it will be important to further investigate the role of Dsup in radio-tolerance. We briefly mentioned this point in the discussion (p14). Our findings show that tardigrades undergo DNA damage at levels roughly similar to radio-sensitive organisms and therefore support a major role for DNA repair in the maintenance of genome integrity after exposure to IR. Nevertheless, we believe that more precise quantification of DNA damage may still reveal a contribution of genome protection to radio-tolerance of tardigrades compared to radio-sensitive organisms. Dsup loss of function experiments in tardigrades would clearly be the best way to assess this possibility. In the absence of experiments directly addressing the function of Dsup, we prefer to refrain from drawing any firm conclusion on prevention of DNA damage by Dsup and thus to keep a more open position. In any case, as discussed in the text, we note that Dsup has only been reported in Hypsibioidea and other molecular players, such as TDR1, are likely involved in radio-tolerance in other tardigrade species.

      (2) While reverse genetics are difficult in tardigrades, they are not impossible, and RNAi can be used to good effect in these animals. In fact several authors on this manuscript have used RNAi to examine the necessity of genes in tardigrade stress tolerance in the past. Was an attempt made to RNAi TDR1? If not, why? With the large amount of work that the authors put into showing the sufficiency of TDR1 for increasing radiotolerance in cell culture, one would think looking at necessity in tardigrades would be of great interest. If RNAi was performed, what were the results? Even a negative result here is informative since a protein can be sufficient but not necessary for a function - if this were the case it would mean tardigrades have some redundant mechanism(s) for surviving radiation exposure beyond TDR1.

      We have attempted RNAi experiments targeting TDR1 or a mix of DNA repair genes (including XRCC5) and examined the response to a 2-week bleomycin treatment. Unfortunately, we could not distinguish any difference between uninjected animals and animals injected with TDR1 dsRNAs or with the mix of DNA repair gene dsRNAs. We concluded that bleomycin treatment, which we used because it is much easier to perform than irradiation, was perhaps not the best way to assay a potential impact of RNAi on survival, since it required long-term treatment for several days during which the effect of RNAi may have waned. We therefore made another attempt, injecting TDR1 or control GFP dsRNAs and exposing animals to a 2000 Gy IR treatment. We noticed that the viability was lower after injection with GFP dsRNAs than with TDR1 dsRNAs (likely due to problems we had with the injection needle during injections). The next day, animals were irradiated and we observed after 24h that animals injected with GFP dsRNAs exhibited higher lethality rates than animals injected with TDR1 dsRNAs or uninjected animals. We found that this set of experiments was not conclusive. Our current experimental setup makes it difficult to distinguish lethality due to injections from lethality due to potentially decreased resistance to IR. Moreover, many key controls are difficult to perform (in particular, we could not confirm the efficiency of target gene knockdown, as it is very challenging given the low amount of biological material available and the poor expression of these genes without irradiation). From a practical point of view, performing these experiments is thus very challenging. We nevertheless agree that, in future work, further experimentation is needed to examine the impact of knock-down by RNAi of TDR1 or of other genes, such as DNA repair genes or Dsup, on tardigrade DNA repair and survival after IR.
      Gene knock-out with CRISPR-Cas9 is a very promising alternative to RNAi given that studies in mutant lines will eliminate the confounding effect of lethality due to injections.

      (3) Regarding the U2OS experiments. I have several questions/points of clarification:

      (a) Were survival/proliferation levels tested or only H2AX foci? I think that showing decreased H2AX foci (fewer double-stranded breaks) correlates with higher survival rates would be important.

      In the experiments reported in Figure 6, cells were transiently transfected with expression vectors and we did not examine the impact on survival rates. U2OS cells are resistant to high doses of Bleomycin and testing survival would require longer exposure at much higher concentrations (Buscemi et al, 2014, PMID: 25486478). In order to better address an impact on cell survival, we therefore generated populations of cells stably expressing the candidate tardigrade proteins fused to GFP. Despite trying different experimental conditions for treatment with Bleomycin, we could not detect a reproducibly significant benefit on cell survival for any of the tardigrade proteins tested, including RvDsup, which was used as a positive control (since it was previously reported to improve cell survival in response to X-rays). One possibility is that the analysis should be performed in clones and not in populations of cells with heterogeneous expression levels of the tardigrade protein tested. For example, expression levels of the tardigrade protein needed to reduce the number of phospho-H2AX foci in response to DNA damage may interfere with cell division. We note that in the original Dsup paper, the benefit of RvDsup on cell survival was reported in specific transgenic clones. Experiments in different biological systems have also started to document toxic effects of RvDsup expression, illustrating the challenge, when performing experiments in heterologous systems, of achieving suitable expression levels of the tested protein. Performing such a finer analysis would, in our opinion, go beyond the scope of our manuscript and will be best addressed in future studies. We are therefore careful in the text not to make any claim on the benefit of TDR1 expression on cell survival in response to Bleomycin in human cultured cells.

      (b) From the methods I am a bit confused as to how the images were treated/foci quantified. With the automatic segmentation and foci identification, is this done through the entire Z-series or a single layer? If the latter then I am not sure the results are meaningful, since we do not know how many foci might be present in other layers of the nuclei analyzed. If the former, please clarify this in the method since it is a very important consideration.

      We have acquired images throughout the entire Z-series and edited the text to make this clearer. We now write: “Z-stacks were maximum projected and analyzed with Zen Blue software (v2.3)...”. To limit the time needed for image analysis, we have generated an artificial image by projecting the entire Z-series into a single image and counted foci in that single maximum projection image. Although there are potential drawbacks, such as potentially only counting one focus when two foci are superposed along the Z axis, this approach overcomes the limitations of quantification from a single layer. We further ensured statistical robustness of the analysis by performing quantification from several independent fields of the labelled cells and several independent biological replicates (n>=3 as now specified in the legend of Figure 6a).
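      The trade-off described above can be illustrated with a minimal sketch in plain Python. It is not the Zen Blue implementation: the function names, the threshold, the 4-connectivity rule, and the toy Z-stack are all assumptions chosen for the example.

```python
def max_project(stack):
    """Collapse a Z-stack (list of 2D layers) into one 2D image
    by keeping the brightest value at each XY position."""
    return [[max(values) for values in zip(*rows)] for rows in zip(*stack)]


def count_foci(image, threshold):
    """Count 4-connected groups of pixels brighter than the threshold."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] <= threshold:
                continue
            count += 1  # new focus found; flood-fill the rest of it
            seen[y][x] = True
            todo = [(y, x)]
            while todo:
                cy, cx = todo.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        todo.append((ny, nx))
    return count


# Toy stack: three separate foci, two of which share the same XY position
# in non-adjacent layers (top and bottom), so they overlap after projection.
EMPTY = [0, 0, 0, 0, 0]
z_stack = [
    [EMPTY, [0, 200, 0, 180, 0], EMPTY],  # layer 0: foci at x=1 and x=3
    [EMPTY, EMPTY, EMPTY],                # layer 1: no signal
    [EMPTY, [0, 190, 0, 0, 0], EMPTY],    # layer 2: a second focus at x=1
]

projected_count = count_foci(max_project(z_stack), threshold=50)
middle_layer_count = count_foci(z_stack[1], threshold=50)
```

      In this toy case the projection recovers foci that a single middle layer would miss entirely, while the two foci stacked along Z at the same XY position merge into one, which is the drawback acknowledged in the response.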

      (c) RvDsup reduced levels of H2AX foci in these experiments, however, HeDsup was not found to be enriched in the transcriptomic analysis performed here. Was there a reason HeDsup was not used in the cell-based experiments? One could argue that RvDsup is from a different species of tardigrade, but it is a bit concerning that an ortholog of a protein found NOT to be induced by radiation exposure seems to perform as well (if not better) than some versions of TDR1.

      RvDsup is the protein initially shown to increase survival of human HEK293 cells treated with X-rays and reduce the number of phospho-H2AX foci induced: it was therefore used as a positive control in our experiments. The sequence of HeDsup is only poorly similar to RvDsup (with 26% identity) and activity of HeDsup in cultured cells has not been reported before. We therefore believe that HeDsup is not well suited to provide a positive control for the experiments performed in our manuscript.

      (d) From the methods, it seems that cells were treated with Bleomycin and then immediately fixed without any sort of recovery time. In this short timeframe, the presence of TDR1 appears to be enough to deal with a substantial amount of double-stranded breaks (as evidenced by the reduced number of H2AX foci). Does this make sense? How quickly could one expect DNA repair machinery to make significant progress in resolving damaged DNA? This response seems much faster than what was observed in tardigrades. Perhaps the authors could comment on this.

      Kinetic studies in human cells show extremely rapid repair of DNA double-strand breaks. Sensing of DNA double-strand breaks by PARP proteins takes place within seconds after irradiation by IR (Pandey and Black, 2021, PMID: 33674152). NHEJ is then observed to take place by formation of 53BP1 foci within 15 minutes (Schultz et al, 2000, PMID: 11134068). The number of phospho-H2AX and 53BP1 foci peaks at 30 minutes and starts declining thereafter, showing that at a significant number of sites, DNA repair is proceeding very rapidly (by NHEJ). Although we are not aware of any studies of DNA repair kinetics in U2OS cells after addition of Bleomycin, DNA damage must be instantaneous and continue during exposure to the drug in parallel with DNA repair, which would be expected to have kinetics similar to those after irradiation with IR.

      In our experiments, several mechanisms may be involved in reducing the number of phospho-H2AX foci induced by Bleomycin, such as DNA protection (for Dsup expression) or stimulation of DNA repair (for RNF146 expression). For TDR1, the molecular mechanism involved remains to be determined. Given our finding that TDR1 can form aggregates with DNA, an additional possibility is that clustering of phospho-H2AX foci is induced.

      (4) I could not find the sequences of the TDR1 proteins studied here. I did find the cDNA sequence of HeTDR1 in the final supplementary file, but not the other TDR1 orthologs. Where the TDR1 sequences from other tardigrades should have appeared, there were only very short segments of the HeTDR1 sequence. All sequences of proteins used in this study should be easily accessible to the reader and reviewers as it is not possible to review this work without accessing the sequences.

      Our apologies for the inadequate documentation of TDR1 sequences in the original manuscript. As requested, we have now included the TDR1 sequences in Supplementary Table 4.

      (5) Likewise, the RNA sequence data is said to be deposited in NCBI under PRJNA997229, but I do not find this available on NCBI.

      The RNA sequence data was deposited in NCBI under the indicated reference before submission of the manuscript. The data has now been released and is fully available on NCBI.

      (6) A few typographical errors: e.g., Page 10 - sentence 4 has two periods ". ." or page 14 which has an open parenthesis that is not closed.

      These typos have been corrected in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      In Figure 4C, what fraction of the 50 genes upregulated in all species and treatments are DNA repair genes? Is there any other notable commonality between these 50 genes? The bulk of upregulated genes are specific to a species and to treatment with IR or bleomycin. What fraction of DNA repair genes are specific to a species or treatment?

      The results in Figure 4C on the 50 putative orthologous genes upregulated in all species and treatments are further detailed in supp Figure 10. The legend to supp Figure 10 now provides the requested information: 14/50 genes are DNA repair genes and the other notable commonality is that 21/50 are “stress response genes”. We did not further break down the analysis to evaluate the fraction of DNA repair genes specific to a species or treatment. It will be interesting to gather data in more species to shed light on the evolutionary history of DNA repair gene regulation in response to IR.

      How does the suite of upregulated tardigrade DNA repair proteins after IR or bleomycin compare with DNA repair proteins upregulated under similar treatments in human cells? Are they quantitatively or qualitatively different, or both?

      There is a great wealth of studies documenting genes differentially expressed in human cells in response to IR (e.g. Borras-Fresneda et al, 2016, PMID: 27245205; Rieger and Chu, 2004, PMID: 15356296; Budworth et al, 2012, PMID: 23144912; Rashi-Elkeles et al, 2011, PMID: 21795128; Jen and Cheung, 2003, PMID: 12915489...). Upregulation of DNA repair and cell cycle genes is commonly found. However, the number of DNA repair genes induced is always very limited and the fold stimulation very modest compared to the massive upregulation observed in tardigrades.

      On page 14, please explain the acronym BER. Do the authors mean Base Excision Repair? Or something else?

      As assumed by the reviewer, the acronym BER stands for Base Excision Repair. The acronym has been removed from the main text and replaced by the full name.

      Reviewer #4 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      Abstract:

      The abstract is fine. What was hard to grasp at the beginning is why the TDR1 gene was named that way. It should be clearer that this study decided to further focus on that gene, one of the most overexpressed genes after IR, with an unknown function. Then maybe introduce that it was found to be unique to tardigrades and to interact with DNA. Therefore, it was named TDR1.

      Introduction:

      The introduction has been modified according to the suggestions of Reviewer#4 below. One of the suggested references, Nicolas et al 2023 from the Van Doninck lab, was published while our manuscript was under review and cannot be considered as background information for our study.

      1st paragraph:

      The study is on tardigrades, I found it strange that the first paragraph is on D. radiodurans. I think it is fine to mention what is known in bacteria and eucaryotes but we should already know what will be the main topic in the first paragraph of the introduction. Some details about D. radiodurans seem less important and distracting from the main topic (3D conformation).

      2nd paragraph:

      When mentioning radio-resistant eucaryotes the authors do not mention the larvae of the anhydrobiotic insect Polypedilum vanderplanki. Stating that the mechanisms of resistance are poorly characterized should perhaps be nuanced. There are some recent studies on D. radiodurans (Ujaoney et al., 2017), the insect P. vanderplanki (Ryabova et al., 2017), tardigrades (Kamilari et al., 2019), and rotifers (Nicolas et al., 2023, Moris et al., 2023). Perhaps these papers are worth indicating that if mechanisms are not elucidated yet, recent studies suggest some actors involved in their resistance. The sentence stating that DNA repair rather than DNA protection plays a predominant role in the radio-resistance of bdelloid rotifers should also be nuanced. Indeed, many chaperones and antioxidants were mentioned to play a role in the radio-resistance of bdelloid rotifers (Moris et al., 2023). The authors mentioned the reference Hespeels et al., 2023, which is not found in their list of references; I am not sure which paper they refer to. The last sentence of the second paragraph does not mean much. I am not sure what the authors want to state with this. Perhaps they should specify if they mean that the function of many other genes overexpressed after IR remains unknown.

      Still, in the second paragraph, the authors focus on rotifers. They also do not mention what is known in the insect P. vanderplanki, which should be added. They still do not mention tardigrades. I think it is nice to first start with eucaryotes and then focus on tardigrades but as I mentioned before it would help to understand the aim of the paper if the first paragraph mentioned briefly the tardigrades and then could go into detail in the third paragraph.

      3rd paragraph:

      The sentence starting "with over 1400 species" best to remove from it "but they can differ in their resistance" and start the next sentence with that.

      4th paragraph:

      Very clear, we finally understand what is the focus of the manuscript.

      5th paragraph:

      Very clear. The authors should mention the names of the three studied species. Here, A. antarcticus is missing. The sentence "Further analyses in H. exemplaris... showed that TDR1 protein is present and upregulated". The authors should mention in which conditions the protein is upregulated. In that paragraph the authors mention phospho-H2AX: it might be good to introduce its functions before in the introduction (it is mentioned in the second sentence of the results: best to move it to the introduction).

      Results:

      There are a few sentences in this section which rather discuss the results than describe them. I think the manuscript might gain in quality if these interpretations of the results are moved into the discussion section. That would make the result section more concise and the discussion enriched.

      For instance, I suggest to move these sentences into the discussion:

      • "the finding of persistent DSBs in gonads at 72h.... likely explains...".

      • "suggesting that (i) DNA synthesis..."

      • " Phospho-H2AX....also suggested"

      • "Moreover, expression of TDR1-GFP..., supporting the potential role of TDR1 proteins..."

      • "our results suggest that RNF146 upregulation could contribute..."

      • "AMNP gene g12777 was shown to increase...Based on our results, it is possible that..."

      The interpretations mentioned above were always introduced cautiously (-"suggesting that (i) DNA synthesis..." ; -" Phospho-H2AX....also suggested" ; -"Moreover, expression of TDR1-GFP..., supporting the potential role of TDR1 proteins..." ; -"our results suggest that RNF146 upregulation could contribute..." ). These cautious interpretations were usually important in deciding the next steps of the work. We therefore believe it is important to mention these interpretations in the results section to clearly expose the milestones marking the progression of the study.

      For some results, they were directly discussed in the results section for the sake of concision (for example -"the finding of persistent DSBs in gonads at 72h.... likely explains..."; -"AMNP gene g12777 was shown to increase...Based on our results, it is possible that..." ) since, in our opinion, there was no need to mention them again in the main discussion.

      Some other parts could be good to be moved into the introduction:

      • "Previous studies have indicated that irradiation with IR increases expression of Rad51,..." none of the actors involved in DNA repair are mentioned in the introduction. Also, change resistant into resistance

      • "A. antarcticus ..., known for its resistant to high doses of UV....

      We have moved these parts to the introduction as recommended.

      It was in O. areolatus.... that the first demonstration..."

      This piece of information is somewhat anecdotal. We chose to keep it here in the results section. This information on the radio-resistance of the species P. areolatus is only relevant at this specific step of the study because it encouraged us to consider that P. fairbanksi, which we isolated fortuitously, would be a good model species for studying radio-resistance of tardigrades.

      Here are some additional comments/suggestions on the result section:

      1st section

      • Remove the Gross et al., 2018 from the sentence "using confocal microscopy", it looks otherwise that these results are from their study, not yours.

      We have changed the text to make it clear that this is indeed a finding of Gross et al which was previously made in non-irradiated tardigrades. We replicated this finding, which showed that the protocol was working appropriately, and that we could use this control result for comparison with irradiated animals. We apologize for this confusion.

      The text now states: “Using confocal microscopy, we could detect DNA synthesis in replicating intestinal cells of control animals, as previously shown by (Gross et al. 2018).”

      2nd section

      • It is confusing what has been found induced by IR and/or by Bleomycin.

      • I think it might help if the authors first present what is induced after IR, then write if it is similar after Bleomycin. Especially since they start to do it in the first paragraph of that section. However, they only mention TDR1 in the second paragraph dedicated to Bleomycin treatment which is confusing as it is also overexpressed after IR. It is also not clear if RNF146 is also induced by Bleomycin.

      As recommended, the text presents first what is induced after IR and then what is induced by Bleomycin in the following paragraph. When reporting results with Bleomycin, we have provided a global assessment of what is common to both treatments in Supp Figure 3 and in Supp Table 3. In this figure, we also specifically highlighted several key genes of DNA repair induced by both treatments. These are also mentioned in the text (p8) to illustrate the point that many key DNA repair genes are common to both treatments. We have now added RNF146 to that list as recommended.

      • Regarding TDR1, it is not clear when introduced in the text as "promising candidate" why it is the case. It is clear in the figures but perhaps the authors should explain why they chose these genes for further analyses: high log2foldchange and expression level for instance. Regarding that last comment, it would be interesting to have an idea about the expression level of the genes with high log2foldchange. In Figures 2, 3, and 4 the pvalue and log2foldchange are represented but not the expression level (ideally Transcript per Millions). These values would give an additional idea on the importance of that gene. While looking at the figures, it is unclear why you did not further characterize other genes with high log2foldchange (some with even hints of their function): the mentioned RNF146, macroH2A1 (not even mentioned in the results), some genes unannotated in the figures with likely unknown functions,

      When selecting genes of interest, we did indeed take into account high expression levels. To more clearly document expression levels (which were already available from the Tables), we added MAplots (representing log2foldchange and logNormalized read counts) in the supplementary materials (Supp Figure 3 and Supp Figure 8).

      • It is also unclear at that stage why you named it "Tardigrade DNA damage response protein", as it is characterized as DNA repair/damage proteins by specific GO id or is it based on your downstream analyses, I think it might be worth to quickly mention the reason of that name.

      The name illustrates two points which were already characteristic at this point in time of the study i.e. 1) it is a tardigrade specific protein and 2) it is induced in response to DNA damage.

      • Regarding the BLAST analyses the protein was searched in C. elegans, D. melanogaster and H. sapiens. Why only these three species? What were the threshold evalues used for these analyses. As mentioned in the main comment, it would be worth searching species phylogenetically close to tardigrades to verify if it is well-tardigrade specific. Did you try to make a gene tree, after looking for a conserved domain (using hmmersearch)?

      As indicated in the methods section, the “Tardigrade-specific” annotation was determined by absence of hits after high-throughput alignment (with diamond using the --ultra-sensitive option) on the NCBI nr database and absence of hits after blast search on C. elegans, D. melanogaster and H. sapiens proteomes as a complementary criterion (the latter blast search was primarily performed to enrich for functional annotations). Based on these criteria, TDR1 was annotated as “Tardigrade-specific”. As stated in the text, we also searched for TDR1-related sequences with 1) blastp (which is more sensitive than diamond) on the NCBI nr database and 2) HMMER on Reference Proteomes, and no hits were found among non-tardigrade ecdysozoan organisms, confirming that TDR1 is specific to tardigrades. For the blastp search, for example, there were five hits in non-ecdysozoan organisms (two cephalochordates, one mollusc and two echinoderms). The blastp and HMMER results are now included in the revised supplementary material (Supp Table 5). These very few hits in species phylogenetically distant from tardigrades cannot be taken to support the existence of TDR1 genes outside tardigrades.

      To be clearer in the manuscript, we now state the absence of hits for TDR1 in non-tardigrade ecdysozoans. Given the absence of homologs in non-tardigrade species, it is not possible to make a gene tree with non-tardigrade species.

      • Page 9: "Proteins extracts from H. exemplaris... at 4h and 24h..." I think this sentence can be removed as this is mentioned again 2 paragraphs after: "...we conducted an unbiased proteome analysis... at 4h..." The log2foldchange threshold mentioned for the proteomic analyses is 0.3: why this threshold, was it chosen randomly?

      This threshold is commonly used when considering log2foldchange with the technology used in our study, an isobaric multiplexed quantitative proteomic strategy which is known to compress ratios (Hogrebe et al. 2018).

      • Page 10:

      It would be good for more clarity to indicate at the beginning of the new section which species were investigated after IR or Bleomycin treatment.

      TDR1 homologs in the other tardigrade species were identified based on what? Best reciprocal hit?

      As indicated in the methods section of the manuscript, we searched for homologs in other tardigrade species by BLAST. A best reciprocal hit approach was not performed to try to determine which homologs might be orthologs. In particular, most TDR1 homologs identified are known from transcriptome assemblies and high-contiguity genome assemblies are needed to more confidently identify orthology (using synteny). The results of the BLASTP search are now provided as supplementary material (Supp Table 5).

      Preliminary experiments indicated that A. antarcticus and P. fairbanksi survived exposure to 1000 Gy: is there a supplementary graph showing this?

      We have corrected the text to avoid any confusion. We have not rigorously examined the dose-dependent survival of P. fairbanksi in response to irradiation. Text was changed to: “We found by visual inspection of animals after IR that A. antarcticus and P. fairbanksi readily survived exposure to 1000 Gy.”

      • Page 11:

      "A set of 50 genes was upregulated in the three species": please be precise if only after IR.

      Done

      These genes cannot be the same as they are from different species. Did the author mean that they are coding for similar proteins? It might be good to give some more details even if the supplementary figure is mentioned.

      Obviously, these genes are putative orthologs. We have changed the text to:

      “a set of 50 putative orthologous genes was upregulated in response to IR in all three species”

      Discussion:

      • General comment: the discussion is focused mainly on TDR1, it would be nice to also discuss the other results: DNA repair genes, RNF146.

      A whole paragraph is devoted to the discussion of results on DNA repair genes and RNF146. We have extended that discussion following the suggestion of the reviewer. In particular, we have explicitly mentioned the apparent paradox that XRCC5 and XRCC6, which are among the most highly stimulated genes at the mRNA level, only display modest upregulation at the protein level. Although further studies would be needed to examine the mechanisms involved, we propose that upregulation of RNF146, whose human homolog has been shown to drive degradation of PARylated XRCC5 and XRCC6 proteins in response to IR (Kang et al. 2011), may be responsible for higher degradation rates and may thus counterbalance increased levels of protein synthesis.

      • Pulse field electrophoresis would be nice to be performed. It has been used to assess DSBs in bdelloid rotifers, is it possible in tardigrades?

      As stated in the discussion, we believe that it would be challenging to perform pulse field electrophoresis in tardigrades. However, if possible, these experiments would certainly bring invaluable information to complement our analysis of DNA damage induced by IR.

      • "By comparative transcriptomics": please rephrase that sentence.

      • Proteins acting early in DNA repair: I am not sure I understand this sentence. Actors as ligases act not at the beginning of the repair pathways.

      Well noted. We have removed ligases from the list.

      • It is confusing that the authors mention NHEJ and double-strand break repair pathways as different pathways. There are 2 main pathways to repair DSBs: NHEJ and HR. It would be nice to add a reference to the sentence "PARP proteins act as sensors of DNA damage etc."

      A typo in the sentence gave rise to the misleading suggestion that NHEJ is not a double strand repair pathway. It has been corrected.

      A reference has been added for PARP proteins.

      • It would be nice if the authors can explain deeper their suggestion that degradation of DNA repair actors is essential for tardigrade IR resistance.

      We have expanded this part of the discussion and hope that it is clearer.

      “For XRCC5 and XRCC6, our study established, by two independent methods, proteomics and Western blot analyses, that the stimulation at the protein level could be much more modest (6- and 20-fold at most; Supp Figure 6) than at the RNA level (420- and 90-fold, respectively). This finding suggests that the abundance of DNA repair proteins does not simply increase massively to quantitatively match high numbers of DNA damages. Interestingly, in response to IR, the RNF146 ubiquitin ligase was also found to be strongly upregulated. RNF146 was previously shown to interact with PARylated XRCC5 and XRCC6 and to target them for degradation by the ubiquitin-proteasome system (Kang et al. 2011). To explain the lower fold stimulation of XRCC5 and XRCC6 at the protein level, it is therefore tempting to speculate that XRCC5 and XRCC6 protein levels (and perhaps those of other scaffolding complexes of DNA repair as well) are regulated by a dynamic balance of synthesis, promoted by gene overexpression, and degradation, made possible by RNF146 upregulation. Consistent with this hypothesis, we found that, similar to human RNF146 (Kang et al. 2011), He-RNF146 expression in human cells reduced the number of phospho-H2AX foci detected in response to Bleomycin (Figure 6).”

      • Page 15: Please add a reference for the sentence "Functional analysis of promotor sequences in transgenic tardigrades etc."

      The reference has been added to fix this omission.

      Material and Methods:

      Small comments:

      • 40 μm mesh: space missing

      • 100 μm mesh: space missing

      • (for Bleomycin)): parenthesis missing

      • remove "as indicated in the text"

      • The investigated time points after radiation need to be clearly stated in the method section. It is also unclear in the IR and Bleomycin section which tardigrades were treated with what. Not all were treated with Bleomycin.

      The small comments above have been fixed in the revised version of the manuscript.

      • Page 21: please precise the coverage of the RNA sequencing

      Statistics on mapping of RNAseq reads are now provided in Supp Table 10.

      • Page 22: Was any read trimming performed? Anything about the quality check of the reads?

      Trimming was conducted using Trimmomatic (v0.39) and quality check using FastQC (v. ?). This information has been added to the Methods section.

      • Were the analyses confirmed by a second approach: for instance, edgeR? DESeq2 and edgeR do not always have the same results. For more robust analyses it is advised to use both.

      Differential transcriptome analyses were conducted with DESeq2 only. The robustness of our identification of differentially expressed genes in response to IR stems from performing comparative analyses in three different species, rather than from using two bioinformatics pipelines in a single species. We also note that benchmarking reported in the initial DESeq2 paper showed that identification of differentially expressed genes with large log fold changes (which, as reported in our manuscript, is characteristic of many DNA repair genes in response to IR) is very consistent between DESeq2 and edgeR.

      Figures:

      • Figure 2: Legend vertical dotted line does not indicate log2foldchange value of 4 in all panels: it would be good to indicate for panels a and c as well.

      Figure 2 has been improved following the suggestions of the reviewer. Dotted lines now show a log2foldchange value of 2 in all panels (i.e. a Fold Change of 4, as mentioned in the main text).
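
      The correspondence between log2foldchange values and fold changes used in this response can be checked with a short sketch. The function names below are illustrative and are not taken from the manuscript's analysis code; the input values (a log2foldchange of 2 for the RNA-seq figures, 0.3 for the proteomic threshold, and a 420-fold RNA induction) are the ones quoted in this response.

```python
import math


def log2fc_to_fold(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


def fold_to_log2fc(fold):
    """Convert a linear fold change (> 0) into a log2 fold change."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold)


# Dotted line in Figure 2: log2foldchange of 2.
print(log2fc_to_fold(2))
# Proteomic threshold: log2foldchange of 0.3.
print(round(log2fc_to_fold(0.3), 3))
# RNA-level induction of 420-fold, expressed on the log2 scale.
print(round(fold_to_log2fc(420), 2))
```

      On this scale the proteomic threshold of 0.3 corresponds to a change of roughly 23%, which is why it is far less stringent than the transcriptomic cutoff.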

      • Figure 2C: There are a few points with high log2foldchange which are not annotated: was it because nothing was found in the blast research? If yes, it would be good to indicate their functions. If not, it would be good to mention in the discussion that there are some genes with still unknown functions which might play an important role in the resistance of tardigrades to IR.

      The few points which are not annotated in Figure 2C can now be found in Supp Table 3. Some of them have no hit in the Blast search; others, such as BV898_09662 or BV898_07145, have hits on DNA repair genes (RBBP8/CtIP and XRCC6, respectively) but are not annotated as such by eggnog in the KEGG pathway.

      • Figure 4C: Why not have included the response of P. fairbanksi to bleomycin? I guess it was not done, but it is unclear in the results and methods sections.

      The response of P. fairbanksi to bleomycin wasn’t assessed as we didn’t get enough animals to run the study. The methods section has been modified to clarify this point.

    1. Author response:

      Reviewer #1 (Public Review):

      This study makes a substantial contribution to our understanding of the molecular evolutionary dynamics of microbial genomes by proposing a model that incorporates relatively frequent adaptive reversion mutations. In many ways, this makes sense from my own experience with evolutionary genomic data of microbes, where reversions are surprisingly familiar as evidence of the immense power of selection in large populations.

      One criticism is the reliance on one major data set of B. fragilis to test fits of these models, but this is relatively minor in my opinion and can be caveated by discussion of other relevant datasets for parallel investigation.

      We analyze data from 10 species of the Bacteroidales family, and we compare it to a dataset of Bacteroides fragilis. We have now added a reference to a recent manuscript from our group showing phenotypic alteration by reversion of a stop codon and further breaking of the same pathway through stop codons in other genes in Burkholderia dolosa on page 9, and have added a new analysis of codon usage in support of the reversion model on page 14.

      We have chosen not to analyze other species as there are no large data sets with rigorous and evenly-applied quality control across scales. We anticipate the reversion model would be able to fit the data in these cases. We now note that this work remains to be done in the discussion.

      Another point is that this problem isn't as new as the manuscript indicates, see for example https://journals.asm.org/doi/10.1128/aem.02002-20 .

      Loo et al put forward an explanation similar to the purifying model proposed by Rocha et al, which we refute here. Quoting from Loo et al: “Our results confirm the observation that nonsynonymous SNPs are relatively elevated under shorter time periods and that purifying selection is more apparent over longer periods or during transmission.” While there is some linguistic similarity between the weak purifying model and our model of strong local adaptation and strong adaptive reversion, we believe that the dynamical and predictive implications suggested by the reversion model are an important conceptual leap and correction to the literature. We now cite Loo et al and additional works cited therein. We have updated the abstract, introduction, and discussion to further emphasize the distinction of the reversion model from previous models: namely, the implication of the reversion model that long-time-scale dN/dS hides dynamics.

      Nonetheless, the paper succeeds by both developing theory and offering concrete parameters to illustrate the magnitudes of the problems that distinguish competing ideas, for example, the risk of mutational load posed in the absence of frequent back mutation.

      Reviewer #2 (Public Review):

      This manuscript asks how different forms of selection affect the patterns of genetic diversity in microbial populations. One popular metric used to infer signatures of selection is dN/dS, the ratio of nonsynonymous to synonymous distances between two genomes. Previous observations across many bacterial species have found dN/dS decreases with dS, which is a proxy for the divergence time. The most common interpretation of this pattern was proposed by Rocha et al. (2006), who suggested the excess in nonsynonymous mutations on short divergence times represent transient deleterious mutations that have not yet been purged by selection.
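
      As a point of reference for the metric discussed here, dN/dS can be sketched from raw pairwise counts as below. This is a minimal illustration and not the pipeline used in the manuscript: it applies no correction for multiple substitutions at the same site, and the counts in the example are hypothetical.

```python
import math


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) for one pair of genomes.

    dN = nonsynonymous differences per nonsynonymous site
    dS = synonymous differences per synonymous site
    No multiple-hit correction is applied (simplifying assumption).
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_diffs / nonsyn_sites
    ds = syn_diffs / syn_sites
    # The ratio is undefined when no synonymous differences are observed.
    ratio = dn / ds if ds > 0 else math.nan
    return dn, ds, ratio


# Hypothetical counts for a distantly related pair of strains.
dn, ds, ratio = dn_ds(nonsyn_diffs=30, nonsyn_sites=3_000_000,
                      syn_diffs=100, syn_sites=1_000_000)
print(dn, ds, ratio)
```

      Plotting the ratio against dS across many pairs gives the decreasing trend that the competing models attempt to explain.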

      In this study, the authors propose an alternative model based on the population structure of human gut bacteria, in which dN is dominated by selective sweeps of SNPs that revert previous mutations within local populations. The authors argue that contrary to standard population genetics models, which are based on the population dynamics of large eukaryotes, the large populations in the human gut mean that reversions may be quite common and may have a large impact on evolutionary dynamics. They show that such a model can fit the decrease of dN/dS in time at least as well as the purifying selection model.

      Strengths

      The main strength of the manuscript is to show that adaptive sweeps in gut microbial populations can lead to small dN/dS. While previous work has shown that using dN/dS to infer the strength of selection within a population is problematic (see Kryazhimskiy and Plotkin, 2008, cited in the paper) the particular mechanism proposed by the authors is new to my knowledge. In addition, despite the known caveats, dN/dS values are still routinely reported in studies of microbial evolution, and so their interpretation should be of considerable interest to the community.

      The authors provide compelling justification for the importance of adaptive reversions and make a good case that these need to be carefully considered by future studies of microbial evolution. The authors show that their model can fit the data as well as the standard model based on purifying selection and the parameters they infer appear to be plausible given known data. More generally, I found the discussion on the implications of traditional population genetics models in the context of human gut bacteria to be a valuable contribution of the paper.

      Thank you for the kind words and appreciation of the manuscript.

      Weaknesses

      The authors argue that the purifying selection model would predict a gradual loss in fitness via Muller's ratchet. This is true if recombination is ignored, but this assumption is inconsistent with the data from Garud, et al. (2019) cited in the manuscript, who showed a significant linkage decrease in the bacteria also used in this study.

      We now investigate the effect of recombination on the purifying selection model on page 8 and in Supplementary Figure S6. In short, we show that reasonable levels of recombination (obtained from literature r/m values) cannot rescue the purifying selection model from Muller’s ratchet when s is so low and the influx of new deleterious mutations is so high. We thank the reviewers for prompting this improvement.

      I also found that the data analysis part of the paper added little new to what was previously known. Most of the data comes directly from the Garud et al. study and the analysis is very similar as well. Even if other appropriate data may not currently be available, I feel that more could be done to test specific predictions of the model with more careful analysis.

      In addition to new analyses regarding recombination and compensatory mutations using the Garud et al data set, we have now added two new analyses, both using Bacteroides fragilis. First, we show that de novo mutations in the Zhao & Lieberman et al dataset include an enrichment of premature stop codons (page 9). Second, we show that genes expected to be under fluctuating selection in B. fragilis display a significant closeness to stop codons, consistent with recent stop codons and reversions. We thank the reviewer for prompting the improvement.

      Finally, I found the description of the underlying assumptions of the model and the theoretical results difficult to understand. I could not, for example, relate the fitting parameters nloci and Tadapt to the simulations after reading the main text and the supplement. In addition, it was not clear to me if simulations involved actual hosts or how the changes in selection coefficients for different sites was implemented. Note that these are not simply issues of exposition since the specific implementation of the model could conceivably lead to different results. For example, if the environmental change is due to the colonization of a different host, it would presumably affect the selection coefficients at many sites at once and lead to clonal interference. Related to this point, it was also not clear that the weak mutation strong selection assumption is consistent with the microscopic parameters of the model. The authors also mention that "superspreading" may somehow make a difference to the probability of maintaining the least loaded class in the purifying selection model, but what they mean by this was not adequately explained.

      We apologize for leaving the specifics of the implementation out of the paper, accessible only through the GitHub page, and have corrected this. We have added a new section in the methods further detailing the reversion model and the specifics of how nloci and Tadapt (now tau_switch as of the edits) are implemented in the code.

      The possibility for clonal interference is indeed included in the simulation. Switching is not correlated with transmissions in our main figure simulations (Figure 4a). When we run simulations in which transmission and selection are correlated, the results remain essentially the same, barring higher variance at lower divergences (new Figure S10). We have now clarified these points in the results, and have also better clarified the selection only at transmission model in the main results.

      Reviewer #3 (Public Review):

      The diversity of bacterial species in the human gut microbiome is widely known, but the extensive diversity within each species is far less appreciated. Strains found in individuals on opposite sides of the globe can differ by as little as handfuls of mutations, while strains found in an individual's gut, or in the same household, might have a common ancestor tens of thousands of years ago. What are the evolutionary, ecological, and transmission dynamics that established and maintain this diversity?

      The time, T, since the common ancestor of two strains, can be directly inferred by comparing their core genomes and finding the fraction of synonymous (non-amino acid changing) sites at which they differ: dS. With the per-site per-generation mutation rate, μ, and the mean generation times roughly known, this directly yields T (albeit with substantial uncertainty of the generation time.) A traditional way to probe the extent to which selection plays a role is to study pairs of strains and compare the fraction of non-synonymous (amino acid or stop-codon changing) sites, dN, at which the strains differ with their dS. Small dN/dS, as found between distantly related strains, is attributed to purifying selection against deleterious mutations dominating over mutations that have driven adaptive evolution. Large dN/dS as found in laboratory evolution experiments, is caused by beneficial mutations that quickly arise in large bacterial populations, and, with substantial selective advantages, per generation, can rise to high abundance fast enough that very few synonymous mutations arise in the lineages that take over the population.
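
      The step from dS to a divergence time can be made explicit with a small sketch. It assumes that both lineages accumulate synonymous mutations independently since the common ancestor, so that dS is approximately 2μT; this factor of two is an assumption of the sketch and is not stated in the text above. The mutation rate and generation time used in the example are hypothetical placeholders.

```python
def divergence_time(ds, mu_per_site_per_gen, gens_per_year=None, lineages=2):
    """Estimate the time since the common ancestor from synonymous divergence.

    Assumes dS ~= lineages * mu * T (neutral accumulation, no multiple hits).
    Returns (T in generations, T in years or None if gens_per_year is omitted).
    """
    if mu_per_site_per_gen <= 0:
        raise ValueError("mutation rate must be positive")
    t_generations = ds / (lineages * mu_per_site_per_gen)
    t_years = t_generations / gens_per_year if gens_per_year else None
    return t_generations, t_years


# Hypothetical parameters: mu = 1e-9 per site per generation,
# one generation per day.
t_gen, t_years = divergence_time(ds=1e-4, mu_per_site_per_gen=1e-9,
                                 gens_per_year=365)
print(t_gen, t_years)
```

      Because the generation time in the gut is only roughly known, the estimate in years inherits that uncertainty directly, as the parenthetical remark above notes.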

      A number of studies (including by Lieberman's group) have analyzed large numbers of strains of various dominant human gut species and studied how dN/dS varies. Although between closely related strains the variations are large -- often much larger than attributable to just statistical variations -- a systematic trend from dN/dS around unity or larger for close relatives to dN/dS ~ 0.1 for more distant relatives has been found in enough species that it is natural to conjecture a general explanation.

      The conventional explanation is that, for close relatives, selection over the time since they diverged has not yet purged weakly deleterious mutations that arose by chance -- roughly mutations with sT<1 -- while since the common ancestor of more distantly related strains, there is plenty of time for most of those that arose to have been purged.

      Torrillo and Lieberman have carried out an in-depth -- sophisticated and quantitative -- analysis of models of some of the evolutionary processes that shape the dependence of dN/dS on dS -- and hence on their divergence time, T. They first review the purifying selection model and show that -- even ignoring its inability to explain dN/dS > 1 for many closely related pairs -- the model has major problems explaining the crossover from dN/dS somewhat less than unity to much smaller values as dS goes through -- on a logarithmic scale -- the 10^-4 range. The first problem, already seen in the infinite-population-size deterministic model, is that a very large fraction of non-synonymous mutations would have to have deleterious s's in the 10^-5 per generation range to fit the data (and a small fraction effectively neutral). As the s's are naturally expected (at least in the absence of quantitative analysis to the contrary) to be spread out over a wide range on a logarithmic scale of s, this seems implausible. But the authors go further and analyze the effects of fluctuations that occur even in the very large populations: ~ >10^12 bacteria per species in one gut, and 10^10 human guts globally. They show that Muller's ratchet -- the gradual accumulation of weakly deleterious mutations that are not purged by selection - leads to a mutational meltdown with the parameters needed to fit the purifying selection model. In particular, with N_e the "effective population size" that roughly parametrizes the magnitude of stochastic birth-death and transition fluctuations, and U the total mutation rate to such deleterious mutations this occurs for U/s > log(sN_e) which they show would obtain with the fitted parameters.
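
      The ratchet criterion quoted above, U/s > log(sN_e), can be checked for a given parameter set with a few lines of code. This is a minimal sketch that assumes the logarithm is natural; the function name and all numerical values are hypothetical and chosen only to show the condition flipping, not to reproduce the authors' fitted parameters.

```python
import math


def ratchet_condition_met(U, s, N_e):
    """Return True when U / s > ln(s * N_e).

    U   : total rate of mutation to weakly deleterious alleles (per genome
          per generation)
    s   : selective disadvantage of each such mutation (per generation)
    N_e : effective population size
    The natural logarithm is an assumption made for this sketch.
    """
    if U <= 0 or s <= 0 or N_e <= 0:
        raise ValueError("U, s, and N_e must all be positive")
    return U / s > math.log(s * N_e)


# Hypothetical values for illustration only.
s = 1e-5
N_e = 1e9  # ln(s * N_e) = ln(1e4), about 9.2

high_load = ratchet_condition_met(U=1e-4, s=s, N_e=N_e)  # U/s = 10
low_load = ratchet_condition_met(U=1e-5, s=s, N_e=N_e)   # U/s = 1
```

      The logarithmic dependence on N_e means that even very large populations raise the threshold on U/s only modestly, which is why large gut population sizes alone do not prevent the ratchet in this criterion.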

      Torrillo and Lieberman promise an alternate model: that there are a modest number of "loci" at which conditionally beneficial mutations can occur that are beneficial in some individual guts (or other environmental conditions) at some times, but deleterious in other (or the same) gut at other times. With the ancestors of a pair of strains having passed through one too many individuals and transmissions, it is possible for a beneficial mutation to occur and rise in the population, only later to be reverted by the beneficial inverse mutation. With tens of loci at which this can occur, they show that this process could explain the drop of dN/dS from short times -- in which very few such mutations have occurred -- to very long times by which most have flipped back and forth so that a random pair of strains will have the same nucleotide at such sites with 50% probability. Their qualitative analysis of a minimally simple model of this process shows that the bacterial populations are plenty big enough for such specific mutations to occur many times in each individual's gut, and with modest beneficials, to take over. With a few of these conditionally beneficial mutations or reversions occurring during an individual's lifetime, they get a reasonably quantitative agreement with the dN/dS vs dS data with very few parameters. A key assumption of their model is that genetically exact reversion mutations are far more likely to take over a gut population -- and spread -- than compensatory mutations which have a similar phenotypic-reversion effect: a mutation that is reverted does not show up in dN, while one that is compensated by another shows up as a two-mutation difference after the environment has changed twice.
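
      The saturation described above, in which a random pair of strains agrees at each switchable locus with 50% probability at long times, follows from treating each locus as a symmetric two-state process. The sketch below is a simplified illustration under that assumption (independent loci, a single constant flip rate per lineage); it is not the authors' simulation, and the function name and parameter values are hypothetical.

```python
import math


def expected_differing_loci(n_loci, flip_rate, divergence_time):
    """Expected number of switchable loci at which two strains differ.

    Each locus is modelled as a symmetric two-state process that flips at
    `flip_rate` per unit time along a lineage. Two strains that split
    `divergence_time` ago are separated by a total path of
    2 * divergence_time, so each locus differs with probability
    (1 - exp(-2 * flip_rate * 2 * divergence_time)) / 2.
    """
    if n_loci < 0 or flip_rate < 0 or divergence_time < 0:
        raise ValueError("arguments must be non-negative")
    path_length = 2.0 * divergence_time
    p_differ = 0.5 * (1.0 - math.exp(-2.0 * flip_rate * path_length))
    return n_loci * p_differ


# Hypothetical values: 50 loci, one flip per 100 time units per lineage.
n_loci = 50
flip_rate = 0.01

recent = expected_differing_loci(n_loci, flip_rate, divergence_time=1.0)
ancient = expected_differing_loci(n_loci, flip_rate, divergence_time=1e6)
```

      In this toy picture the count of adaptive non-synonymous differences plateaus at n_loci / 2 while synonymous differences keep accumulating, so dN/dS falls with increasing dS.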

      Strengths:

      The quantitative arguments made against the conventional purifying selection model are highly compelling, especially the consideration of multiple aspects that are usually ignored, including -- crucially -- how Muller's ratchet arises and depends on the realistic and needed-to-fit parameters; the effects of bottlenecks in transmission and the possibility that purifying selection mainly occurs then; and complications of the model of a single deleterious s, to include a distribution of selective disadvantages. Generally, the authors' approach of focusing on the simplest models with as few as possible parameters (some roughly known), and then adding in various effects one-by-one, is outstanding and, in being used to analyze environmental microbial data, exceptional.

      The reversion model the authors propose and study is a simple general one and they again explore carefully various aspects of it -- including dynamics within and between hosts -- and the consequent qualitative and quantitative effects. Again, the quantitative analysis of almost all aspects is exemplary. Although it is hard to make a compelling guess of the number of loci that are subject to alternating selection on the needed time-scales (years to centuries), they make a reasonable argument for a lower bound in terms of the number of known invertible promoters (that can genetically switch gene expression on and off).

      We are very grateful for the reviewer’s kind words and careful reading.

      Weaknesses:

      The primary weakness of this paper is one that the authors are completely open about: the assumption that, collectively, any of possibly-many compensatory mutations that could phenotypically revert an earlier mutation, are less likely to arise and take over local populations than the exact specific reversion mutation. While detailed analysis of this is, reasonably enough, beyond the scope of the present paper, more discussion of this issue would add substantially to this work. Quantitatively, the problem is that even a modest number of compensatory mutations occurring as the environmental pressures change could lead to enough accumulation of non-synonymous mutations that they could cause dN/dS to stay large -- easily >1 -- to much larger dS than is observed. If, say, the appropriate locus is a gene, the number of combinations of mutations that are better in each environment would play a role in how large dN would saturate to in the steady state (1/2 of n_loci in the authors' model). It is possible that clonal interference between compensatory and reversion mutations would result in the mutations with the largest s -- e.g., as mentioned, reversion of a stop codon -- being much more likely to take over, and this could limit the typical number of differences between quite well-diverged strains. However, the reversion and subsequent re-reversion would have to both beat out other possible compensatory mutations -- naively less likely. I recommend that a few sentences in the Discussion be added on this important issue along with comments on the more general puzzle -- at least to this reader! -- as to why there appear to be so few adaptive genetic changes in core genomes on time scales of human lifetimes and civilization.

      We now directly consider compensatory mutations (page 14, SI text 3.2, and Supplementary Figure 12). We show that as long as true reversions are more likely than compensatory mutations overall, (adaptive) nonsynonymous mutations will still tend to revert towards their initial state and not contribute to asymptotic dN/dS, and show that true reversions are expected in a large swath of parameter space. Thank you for motivating this improvement!

      We note in the discussion that directional selection could be incorporated into the parameter alpha (assuming even more of the genome is deleterious) on page 16.

      An important feature of gut bacterial evolution that is now being intensely studied is only mentioned in passing at the end of this paper: horizontal transfer and recombination of core genetic material. As this tends to bring in many more mutations overall than occur in regions of a pair of genomes with asexual ancestry, the effects cannot be neglected. To what extent can this give rise to a similar dependence of dN/dS on dS as seen in the data? Of course, such a picture begs the question as to what sets the low dN/dS of segments that are recombined --- often from genetic distances comparable to the diameter of the species.

      We now discuss the effect of recombination on the purifying selection model on page 8 and in Supplementary Figure S6. In short, we now show that reasonable levels of recombination cannot rescue the purifying selection model from Muller’s ratchet when s is so low and the influx of new deleterious mutations is so high. We thank the reviewers for prompting this improvement.

    1. Author response:

      Reviewer #1 (Public Review):

      1) Naphthylamine (1NA), an industrial reagent used in the manufacturing of dyes and pesticides, is harmful to humans and the environment. In the current manuscript, the authors report the successful isolation of a Pseudomonas strain from a former naphthylamine manufacturing site that is capable of degrading 1NA. Using genetic and enzymatic analysis they identified the initial stages of 1NA degradation and the enzymes responsible for downstream processing of 1,2-dihydroxynaphthalene and salicylate. The authors determined the molecular structure of NpaA1, the first enzyme in the pathway responsible for glutamylation of 1NA. NpaA1 has a broader substrate specificity compared to previously characterized enzymes involved in aromatic amine degradation. They carried out structural comparison of NpaA1 with glutamine synthase structures and AlphaFold models of similar enzymes, and put forth a hypothesis to explain the broad substrate specificity of NpaA1.

      The manuscript is well written and easy to understand. The authors carried out careful genetic analysis to identify the genes/enzymes responsible for degradation of 1NA to catechol. They characterized the first enzyme in the pathway, NpaA1, which is responsible for glutamylation of 1NA, and determined the molecular structures of apo-NpaA1, the NpaA1 - AMPPNP complex, and the NpaA1 - ADP - Met-Sox-P complex using X-ray crystallography.

      The proposed mechanism of broad substrate specificity of NpaA1, however, is based on comparison of 1NA docked NpaA1 structure with St-GS (Glutamate synthase) and Alphafold2 predicted model of AtdA1 from an aniline degrading strain of Acinetobacter sp. Lack of molecular structure or mutational studies to back the proposed mechanism makes it difficult to agree with the proposed mechanism.

      We appreciate your valuable comments. To further demonstrate that the structure of the aromatic amine binding tunnel and active pocket determines the broad substrate specificity of NpaA1, we have conducted additional experiments with several key residue mutants of the binding tunnel for naphthylamine and monocyclic aniline activities. The results provide a more detailed elucidation of the reasons for NpaA1's broad substrate specificity. Specific results and analyses are provided in the subsequent response.

      Reviewer #2 (Public Review):

      Microbial degradation of synthetic organic compounds is the basis of bioremediation. Biodegradation of 1NA has not been previously reported. The report describes a complete study of 1NA biodegradation by a new isolate Pseudomonas sp. strain JS3066. The study includes the enrichment and isolation of the 1NA-degrading bacterium Pseudomonas sp. strain JS3066, the identification of the genes and enzymes involved in 1NA degradation, and the detailed characterization of γ-glutamylorganoamide synthetase by using biochemical and structural analysis. In the discussion, the potential evolution of 1NA degradation pathway, the similarity and difference between γ-glutamylorganoamide synthetase and glutamine synthetase, and the significance were explained. The conclusions were well supported by the results presented.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

    1. Author response:

      Reviewer #1 (Public Review):

      “… it remains unclear how ninein reduction causes bone defects …”

      We have added several control experiments that permit us to conclude that osteoblast numbers remain unaltered in the ninein-knockout embryos, and that bone abnormalities in vivo are caused by fusion defects of osteoclast precursor cells, whereas the proliferation, viability, or the adhesion of these precursor cells remain unaffected. For details, please see our comments below.

      “Discussion includes several unfounded potential mechanisms that really need to be thoroughly analyzed to gain a mechanistic understanding of the bone defects…”

      The new data back up our claim of fusion defects as a cause for limited osteoclast function. We have re-written parts of the discussion, to take into account our new findings.

      “Data showing normal osteoblasts in ninein-null mice was qualitative and requires further in-depth analysis and quantification of osteoblast …”

      To address this point, quantification of osteoblast numbers in tibiae at E16.5 and E18.5 was performed in control and ninein-deleted mouse embryos. The data are presented in the new Figures 3G and J.

      “In ninein knock-out mice, reduced TRAP+ve multinuclear cells were observed (Figure 6A and 6B). However, the magnitude of difference (about 5% decrease in multinucleated cells) is not consistent with the skeletal deformities reported in Figures 2-4, potentially suggesting the contribution of additional mechanisms.”

      We agree that the difference appears to be small at first glance, but nevertheless it remains statistically significant (a more than three-fold difference). We would like to recall that these observations (Fig. 6A) were performed at E14.5, i.e. at a stage when no ossification has occurred yet. We are looking at the first fusion events of myeloid precursors, likely derived from the fetal liver, that colonize the area of the first bone to form, and small differences in the number of functional osteoclasts may account for different timing of ossification. We think that differences in osteoclast fusion also account for the premature appearance of ossification centers for other skeletal elements, at later time points during development.

      “The fusion assay in Figure 6C needs further clarification. How was the syncytia perimeter defined to measure cell surface? The x-axis suggests that there are syncytia that contain up to 160 nuclei at day 3. How were the nuclei differentially stained and quantified?”

      We now provide additional information on the experimental approach in the revised manuscript, on pages 16-17 (Materials and Methods). For information: high numbers of syncytial nuclei in cultures were also observed by other groups in the past (Tiedemann et al., 2017, Front Cell Dev Biol. 5:54). In addition, we performed new experiments and quantified the fusion of osteoclast precursors by staining for actin and nuclei (new Figure 7C). This allowed us to quantify several additional parameters related to cell fusion (as initially performed in Raynaud-Messina et al., 2018, PNAS, 115:E2556-E2565).

      “Some text needs clarification. … What is the definition of "large syncytia"? Is the fusion index increase by day 5 diminished in later days? A graph of the syncytia size/ nuclei number or fusion index in the above-mentioned days will be helpful.”

      Information on the definition of “large syncytia” is now provided on page 10 (1st paragraph). We added further experimental details on osteoclast size for days 3, 4, and 5 in the supplemental Figures 7A and B. Most importantly, we performed additional assays of the fusion index by quantifying syncytial versus non-syncytial nuclei in a semi-automated manner. The new data are presented in Figure 7C, and the methods are explained on page 17. Together with our new analysis of cell proliferation, cell viability, and cell adhesion (Figure 7C, D, suppl. Fig. 7C-G), we now provide solid evidence for a fusion defect at the origin of impaired formation of ninein del/del osteoclasts.

      “Assessment of resorption was qualitative in Figure 6E and since the fusion deficiencies are transient, quantification of a corresponding resorption activity is needed. This should be described in the Materials and Methods section.”

      Quantifications of the bone resorption activities are now provided in the new Figure 7E, and a reference for the methods is provided on page 16.

      “Further experiments are needed to show connections between reduced centrosome clustering and reduced osteoclast formation as there is no evidence to date that suggest centrosome clustering is required for cell fusion. Multi-color live imaging and dynamic analysis can be used to determine if the ninein deficient cells show defective movement/migration/ fusion dynamics.”

      We agree that it is an important question, and studying potential links between centrosomal microtubule organization and osteoclast fusion is an ongoing project of the team. However, we estimate that in order to obtain conclusive results this will require 1-2 additional years of research activity, and we intend to present this as a separate project in the future. At the current point of our investigation, we think that providing a solid link between ninein, osteoclast fusion, and controlled timing of ossification, as shown in this manuscript, represents valuable progress to understand previously published bone abnormalities in patients with ninein mutations.

      “Quantification of the % of multinucleated osteoclasts that contain clustered and dispersed centrosomes is needed.”

      New quantification experiments on centrosome clustering are now provided in Figure 8H. These quantifications demonstrate that the potential of centrosome clustering is almost completely lost in osteoclasts without ninein.

      Reviewer #2 (Public Review):

      “Based on the decrease in the number of osteoclasts (Fig 5E, G, and also per coverslip after 2 days in culture), the authors suggest that the loss of ninein impacts osteoclast proliferation. First, proliferation can be directly quantified using Ki67 staining or EdU incorporation. Second, other interpretations are also plausible and can also be experimentally tested. These include less adhesion and attachment of the mutants to the coverslips, but perhaps more relevant in vivo is cell death of the ninein mutant osteoclasts. It has been established that the loss of centrosome function activates p53- dependent cell death and osteoclasts might be a vulnerable cell population. Quantifying p53 immunoreactivity and/or cell death in osteoclasts might help clarify the phenotype of osteoclast reduction.”

      In response to the reviewers, we have performed a series of new experiments that include:

      1) A careful analysis of the fusion index, using a semi-automated approach, indicating significant differences in the fusion of precursor cells into osteoclasts (Fig. 7C).

      2) We have repeated the quantification of cell numbers prior to fusion and find variations between samples from different mice (also among mice of the same genotype), but we see on average comparable cell adhesion between samples from control mice and ninein-del/del mice. The data are provided in the supplemental Figure 7F. Moreover, we have quantified the expression of three main beta-integrins at the surface of control and ninein del/del osteoclast precursors (suppl. Fig. 7G), without detecting significant differences. Altogether, these data suggest that cell adhesion is comparable for the two genotypes.

      3) We have addressed the question of altered cell proliferation, by performing flow cytometry experiments and by quantifying the different cell cycle stages (Fig. 7D), and by quantifying Ki67 expression (suppl. Fig. 7C). We see no significant differences between samples from control and ninein-del/del mice.

      4) We have addressed the question of cell death, by performing Annexin V staining and flow cytometry (suppl. Fig. 7D), and by immunoblotting for cleaved caspase 3 and PARP (suppl. Fig. 7E). These experiments reveal no significant differences between the control and ninein del/del samples. Our data permit us to exclude cell death as a likely cause for the reduction of fused osteoclasts in the absence of ninein.

      Overall, the new experiments show that the defects in osteoclast formation from ninein-deleted samples are due to defects in cell fusion, but not in cell proliferation, cell adhesion or viability.

      Reviewer #3 (Public Review):

      “The authors put much emphasis on the centrosome in the Introduction section. However, it was not until Figure 7 did they show abnormal centriole clustering in osteoclasts. The introduction should include more background on osteoclast and osteoblast balance during skeletal development.”

      To address this, we included more background on the role of osteoclasts and osteoblasts in the revised introduction (page 4).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      Results showing reactivation for near and far items separately are now included in Fig. 5 and convincingly suggest a simultaneous reactivation. For me, the open question remaining (see public) review is the degree to which the methods used here to show clustered vs sequential reactivation are mutually exclusive; and if the pre-selection of a time window of peak reactivation (based on all future items) biases the analyses towards clustered reactivation. The discussion would benefit from a brief discussion of these issues.

      We have added a brief discussion of the issues. However, we want to clarify a minor point of the public review: While our interpretation implies that replay and reactivation are probably mutually exclusive within a single retrieval event, it does not imply that strategies cannot vary within different retrieval events of the same participant. Nevertheless, we want to address this raised concern (that is, if we understand correctly, that replay events that are contained within the time window of the reactivation analysis could not be distinguished by the chosen methods) and have added it to the discussion.

      The corresponding sentence reads:

      “[…] Finally, we want to acknowledge that by selecting a time window for the clustered reactivation we cannot distinguish very fast replay events (<=30ms) from clustered reactivation if they are contained exactly within the specific reactivation analysis time window.”

      Reviewer #2 (Recommendations For The Authors):

      Figure 5D shows the difference scores between near vs. distant items for learning and retrieval. Similar to Figure 5 from the first version of your paper, the difference score does not show whether reactivation of the near vs. distant items change from learning to retrieval. You could show this change in a 2 (near vs. distant) x 2 (learning vs. retrieval) box plot (corresponding to Figure 5A).

      We have added the requested plot as supplement 9 and referred to it in the figure description. However, comparing absolute, raw probabilities between different blocks is tricky, as baseline probabilities vary over time (e.g. due to shifts in distance to sensors). Differential reactivation might therefore be better suited, as it is a relative measure for comparing between blocks.

      At the end of the results section, you state: "On average, differential reactivation probability increased from pre to post resting state (Figure 5D).". I would suggest providing some statistical comparison and the corresponding values.

      We have calculated and added the respective p-values from a t-test and now report that the increase is only descriptive and not statistically significant.

    1. Author response:

      We thank both the reviewers for their thorough reading of our manuscript and insightful suggestions. We thank the editors for their assessment of our article. We will submit a revised manuscript that addresses several comments and include a point-by-point response to the reviewers.

      (1) With respect to how our data compares with previously published datasets, we will provide a table comparing cell numbers. Study differences such as read depth, strain of animals used (including pigmented vs albino), method of cell isolation (including drug exposure), and number of cells profiled raise a significant impediment to integration with previously published datasets. We would like to highlight that ours is the first SEC single cell study that uses pigmented mouse eyes on C57BL/6J background. Integrating with the albino mouse data (Thompson et al. 2021) hindered pathway analyses possibly due to the variable drop out of genes across studies that was likely impacted by differences in method of cell isolation and increased representation of stress response genes in their dataset. We also attempted an integrated analysis with published mouse data (Van Zyl et al. 2020) but did not obtain additional meaningful information due to their low SEC numbers.

      (2) The reviewers commented that our integration of single cell and single nuc data should be done with caution: we agree and had given careful consideration to the integration process. We will demonstrate the contribution of different samples and datasets to show how our datasets have integrated.

      (3) To address the purity of bulk RNA seq, we will add more details for isolation of SECs for bulk seq. The markers to distinguish the three cell types were informed by immunofluorescence. Using these markers, we performed FACS using gates that were well separated. We have provided a heatmap with hierarchical clustering based on Euclidean distance of the EC subtypes (Figure 1B) analyzed by bulk RNA seq in addition to number of DE genes between subtypes.

      (4) To address the immunostaining of NPNT and CCL21A, since both our antibodies are derived from the same species (goat), a co-labeling wasn’t possible. To be prudent, we used adjacent sections, flat-mounts, and RNAscope and provided further evidence of the anterior/posterior “bias” in supplemental figures. Although we agree on its importance, work with human tissue will be a focus of future work.

      (5) Regarding the reviewer’s comments on substructure and that profiling may still not be comprehensive, we agree that even more comprehensive studies are still needed. Profiling more cells will determine the robustness of the detected cell state difference and will help to resolve whether substructure within clusters is due to incomplete profiling of cell types/states or to more stochastic differences. We will add a comment to the discussion.

    1. Author response:

      Reviewer #1:

      The phenomenon of stress-inducible mutagenesis in bacterial evolution remains a topic of heated debate. Consequently, the emergence of genetically encoded resistance may stem from either microevolution or the dissemination of pre-existing variants from polyclonal infections under drug pressure. We believe that the Introduction presents both of these hypotheses in a balanced manner to elucidate the rationale behind our mutation accumulation investigations.

      While we acknowledge the well-known existence of phenotypic antibiotic resistance, it's worth noting that conclusions regarding mutation rates are often drawn from fluctuation assays without confirmation of genetic-level changes. This discrepancy persists despite fluctuation assays accounting for both phenotypic and genotypic alterations. Combining genome sequencing with fluctuation assays underscores the importance of making this distinction.

      Thank you for the suggestion regarding improving the figures; we will incorporate these changes accordingly in the revised version. Additionally, we will address the rationale for using sub-lethal doses of antibiotics and compare our results with the referenced papers.

      Reviewer #2:

      Thank you for acknowledging the values of the manuscript and for the insightful suggestions for improvement. We agree on the necessity to directly connect the mutation accumulation experiments with the tolerance assay, and we have already initiated additional experiments to integrate into a revised version.

      We also agree with and have been aware of the notion that cell death affects the calculation of the mutation rate. However, the error in the estimation of the generation time leads to an overestimation of the mutation rate, which, in our case, reinforces the conclusion that no discernible increase in mutation rate occurs in our mutation accumulation experiment. In the revised version, we aim to address i) the source of variation in cell death degree and ii) its influence on calculations.

      The SNPs identified from the lineages of each treatment are compiled in the "unique muts.xls" file within the Figshare document bundle we included with the manuscript. We regret not providing a detailed reference to this in the manuscript; instead, the Figshare files were merely mentioned under the Data Availability section (No. 6) without specifics. As advised, we will create a supplementary table containing this data.

      Reviewer #3:

      Thank you for appreciating the manuscript's merits and for the instructive suggestions (also articulated in the specific comments). We agree that we should show the data on reduced colony growth on agar plates to demonstrate that the drug concentrations used in the study are relevant. We will include this in the revised version, as well as changes in response to all specific comments.

      We acknowledge that the observed upregulation of DNA repair enzymes and the low mutation rates under drug pressure represent correlative data. Therefore, we opted against presenting the qPCR results as a mechanistic explanation. In the manuscript, we carefully stated: "The observed upregulation of the relevant DNA repair enzymes might account for the low mutation rate even under drug pressure." We did not establish a mechanistic link or emphasize the repair activation in the title, abstract, or discussion. We recognize the necessity for a new series of targeted experiments to provide mechanistic explanations. In this paper, our aim is to convincingly demonstrate that antibiotic pressure did not induce the occurrence of new adaptive mutations.

    1. Author response:

      eLife assessment

      This paper presents a valuable optimization algorithm for determining the spatio-temporal organization of chromatin. The algorithm identifies the polymer model that best fits population averaged Hi-C data and makes predictions about the spatio-temporal organization of specific genomic loci such as the oncogenic Myc locus. While the algorithm will be of value to biologists and physicists working in the field of genome organization, the provided methodological details and evidence are incomplete to fully substantiate the conclusions. In particular, the following would be beneficial: analysis of single-cell data, the inclusion of loci beyond Myc, testing the dependence of results on the chosen parameters, providing more details on CTCF occupancy at loop anchors, and better substantiating the claim about predictions of single-cell heterogeneity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this study aim to use an optimization algorithm approach, based on the established Nelder-Mead method, to infer polymer models that best match input bulk Hi-C contact data. The procedure infers the best parameters of a generic polymer model that combines loop-extrusion (LE) dynamics and compartmentalization of chromatin types driven by weak biochemical affinities. Using this and DNA FISH, the authors investigate the chromatin structure of the MYC locus in leukemia cells, showing that loop extrusion alone cannot explain local pathogenic chromatin rearrangements. Finally, they study the locus single-cell heterogeneity and time dynamics.

      Strengths:

      • The optimization method provides a fast computational tool that speeds up the parameter search of complex chromatin polymer models and is a good technical advancement.

      • The method is not restricted to short genomic regions, as in principle it can be applied genome-wide to any input Hi-C dataset, and could be potentially useful for testing predictions on chromatin structure.

      Weaknesses:

      (1) The optimization is based on the iterative comparison of simulated and Hi-C contact matrices using the Spearman correlation. However, the inferred set of the best-fit simulation parameters could sensitively depend on such a specific metric choice, questioning the robustness of the output polymer models. How do results change by using different correlation coefficients?

      This is an important question. We have tested several metrics in the process of building the fitting procedure. We will showcase side-by-side comparisons of the fitting results obtained using these different metrics in an upcoming version of the preprint.
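
      As a minimal sketch of what such a side-by-side comparison could look like (this is not the authors' fitting code), the snippet below scores a simulated contact matrix against an experimental one with both the Spearman and the Pearson coefficient. The function names, the use of the upper triangle only, and the toy matrices in the usage note are assumptions made for illustration.

```python
import math


def upper_triangle(matrix, offset=1):
    """Flatten the entries above the diagonal (contacts between distinct bins)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + offset, n)]


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare_metrics(simulated, experimental):
    """Score one simulated map against one experimental map with both metrics."""
    sim = upper_triangle(simulated)
    exp = upper_triangle(experimental)
    return {"spearman": spearman(sim, exp), "pearson": pearson(sim, exp)}
```

      For example, if the simulated map is a monotone but nonlinear transform of the experimental map (say, every entry squared), `compare_metrics` returns a Spearman score of 1 and a Pearson score below 1. This kind of metric-dependent difference is what the reviewer's question is about.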

      (2) The best-fit contact threshold of 420nm seems a quite large value, considering that contact probabilities of pairs of loci at the mega-base scale are defined within 150nm (see, e.g., (Bintu et al. 2018) and (Takei et al. 2021)).

      This is a good point. Unfortunately, there is no established standard distance cutoff to map distances to Hi-C contact frequency data. Indeed, previous publications have used anywhere from 120 nm to 500 nm (see e.g. (Cardozo Gizzi et al. 2019), (Cattoni et al. 2017), (Mateo et al. 2019), (Hafner et al. 2022), (Murphy and Boettiger 2022), (Takei et al. 2021), (Fudenberg and Imakaev 2017), (Wang et al. 2016), (Su et al. 2020), (Chen et al. 2022), (Finn et al. 2019)). We will include a supplementary table in the upcoming revised preprint listing these values to demonstrate the lack of consensus. This large variation could reflect different chromatin compaction levels across distinct model systems, and different spatial resolutions in DNA FISH experiments performed by different labs. The variance in the threshold choice is also likely partially explained by Hi-C experimental details, e.g. the enzyme used for digestion, which biases the effective length scale of interactions detected (Akgol Oksuz et al. 2021). Among commonly used restriction enzymes, HindIII has a relatively low cutting frequency which results in a lower sensitivity to short-range interactions; on the other hand, MboI has a higher cutting frequency which results in a higher sensitivity to short-range interactions (Akgol Oksuz et al. 2021). Because the Hi-C data we used for the Myc locus in (Kloetgen et al. 2020) was generated using HindIII, we chose a distance cutoff close to the larger end of published values (420 nm).
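
      To illustrate how strongly the cutoff choice affects the inferred contact frequency, here is a small sketch that converts a set of pairwise distances into a contact frequency at several cutoffs. The function name and the distance values in the usage note are hypothetical and are not taken from the manuscript or from any cited dataset. Counting a distance equal to the cutoff as a contact is also an assumption.

```python
def contact_frequency(distances_nm, cutoff_nm):
    """Fraction of distances (in nm) that fall at or below the contact cutoff."""
    if not distances_nm:
        raise ValueError("need at least one distance")
    return sum(d <= cutoff_nm for d in distances_nm) / len(distances_nm)


def sweep_cutoffs(distances_nm, cutoffs_nm):
    """Contact frequency for each candidate cutoff, keyed by cutoff."""
    return {cutoff: contact_frequency(distances_nm, cutoff) for cutoff in cutoffs_nm}
```

      With ten hypothetical distances such as `[90, 130, 180, 260, 310, 400, 450, 520, 610, 800]`, calling `sweep_cutoffs(distances, (150, 420))` gives a contact frequency of 0.2 at 150 nm and 0.6 at 420 nm. The same structures therefore yield very different contact maps depending on the threshold.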

      (3) In their model, the authors consider the presence of LE anchor sites at Hi-C TAD boundaries. Do they correspond to real, experimentally found CTCF sites located at genomic positions, or they are just assumed? A track of CTCF peaks of the considered chromatin loci would be needed.

      We apologize this was not clear. The LE anchor sites in the simulation model were chosen because they correspond to experimental CTCF sites and ChIP-seq peaks located at the corresponding genomic positions. Representative CTCF ChIP-seq tracks from (Kloetgen et al. 2020) will be added to figure 2 in the revised preprint version to emphasize this point.

      (4) In the model, each TAD is assigned a specific energy affinity value. Do the different domain types (i.e., different colors) have a mutually attractive energy? If so, what is its value and how is it determined? The simulated contact maps (e.g., Figure 2C) seem to allow attractions between different blocks, yet this is unclear.

      Sorry this was not explicit. The attraction energy between a pair of monomers in the simulation is determined using the geometric mean of the affinities of the two monomers. This applies to both monomers within the same domain and in different domains. This detail will be clarified in the upcoming revised preprint.
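
      The geometric-mean rule described above can be sketched as follows. The function names, the example affinity values, and the convention that affinities are non-negative magnitudes are assumptions for illustration; the actual units and sign convention of the simulation are not specified here.

```python
import math


def pair_attraction(affinity_i, affinity_j):
    """Attraction between two monomers as the geometric mean of their affinities."""
    if affinity_i < 0 or affinity_j < 0:
        raise ValueError("affinities are assumed to be non-negative magnitudes")
    return math.sqrt(affinity_i * affinity_j)


def attraction_matrix(monomer_affinities):
    """Pairwise attraction for every pair of monomers, same domain or not."""
    return [
        [pair_attraction(a, b) for b in monomer_affinities]
        for a in monomer_affinities
    ]
```

      Under this rule, two monomers in the same domain with affinity 0.3 attract with strength 0.3, while monomers from domains with affinities 0.1 and 0.4 attract with strength 0.2. Cross-domain attraction is therefore nonzero whenever both domains have nonzero affinity, and it vanishes if either affinity is zero.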

      (5) To substantiate the claim that the simulations can predict heterogeneity across single cells, the authors should perform additional analyses. For instance, they could plot the histograms (models vs. experiments) of the TAD2-TAD4 distance distributions and check whether the models can recapitulate the FISH-observed variance or standard deviation. They could also add other testable predictions, e.g., on gyration radius distributions, kurtosis, all-against-all comparison of single-molecule distance matrices, etc.

      We agree that heterogeneity prediction is a key advantage of the simulations. We do note that the histograms (models vs. experiments) of the TAD2-TAD4 distance distributions measured by FISH were plotted in Fig. 3C as empirical cumulative probability distributions (as is standard in the field), side by side with the simulation predictions. Simulations indeed recapitulate the variance observed by FISH. We had also emphasized this important point in the main text: “Importantly, not just the average distances, but the shape of the distance distribution across individual cells closely matches the predictions of the simulations in both cell types, further confirming that the simulations can predict heterogeneity across cells.”
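
      A comparison of this kind can be made quantitative with a few lines of code. The sketch below builds empirical cumulative distributions for two samples of distances, computes the maximum gap between them (the two-sample Kolmogorov-Smirnov statistic), and reports the sample standard deviations. It is offered as an illustration under assumed inputs, not as the analysis used for Fig. 3C; the function names are hypothetical.

```python
import statistics
from bisect import bisect_right


def ecdf(sample):
    """Return a function giving the fraction of the sample at or below x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical cumulative distributions."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


def compare_distributions(fish_distances, simulated_distances):
    """Summarize how closely simulated distances track measured ones."""
    return {
        "ks": ks_statistic(fish_distances, simulated_distances),
        "std_fish": statistics.stdev(fish_distances),
        "std_sim": statistics.stdev(simulated_distances),
    }
```

      A statistic near 0 together with similar standard deviations would indicate that the simulated distribution matches both the shape and the spread of the measured one. A value near 1 would indicate almost no overlap.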

      (6) The authors state that loop extrusion is crucial for enhancer function only at large distances. How does that reconcile, e.g., with Mach et al. Nature Gen. (2022) where LE is found to constrain the dynamics of genomically close (150kb) chromatin loci?

      This is an interesting question. In (Mach et al. 2022), the authors tracked the physical distance between two fluorescent labels positioned next to either anchor of a ~150 kb engineered topological domain using live-cell imaging. They found that abrogation of the loop anchors by ablation of the CTCF binding motifs, or knock-down of the cohesin subunit Rad21 resulted in increased physical distance between the loci. HMM modeling of the distance over time traces suggests that the increased distance resulted from rarer and shorter contacts between the anchors. While this might seem at odds with the results of Fig. 4L, we note a key difference between the loci. While (Mach et al. 2022) observed the dynamics of the distance separating two CTCF loop anchors, in our model only the MYC promoter is proximal to a loop anchor, while the position of the second locus is varied, but remains far from the other anchor. The deletion of the CTCF sites at both anchors in (Mach et al. 2022) indeed results in a lowered sensitivity of the physical distance to Rad21 knock-down, reminiscent of the results of Fig. 4L in our work. This result demonstrates that loop extrusion disruption disproportionately impacts distances between loci close to loop anchors, consistent with Hi-C results (Rao et al. 2017; Nora et al. 2017). We therefore believe that the models in our work and (Mach et al. 2022) are not at odds, but simply reflect that loop extrusion perturbations impact distances between loop anchors the most. Enhancer-Promoter loops are generally distinct from CTCF-mediated loops (Hsieh et al. 2020, 2022). While (Mach et al. 2022) represents a landmark study in our understanding of the dynamics of genomic folding by loop extrusion, we therefore believe that the locus we chose here - which matches the endogenous MYC architecture - may more accurately represent Enhancer-Promoter dynamics than a synthetic CTCF loop.

      To better articulate the similarities between model predictions and differences between the two loci, we will simulate a locus matching that of (Mach et al. 2022) in the upcoming revised preprint, and test the sensitivity of contact frequency and duration to in silico cohesin knock-down. This will also serve to extend the generality of our conclusions to different categories of genomic architectures, and the text will be clarified accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The authors, Fu et al., developed polymer models that combine loop extrusion with attractive interactions to best describe Hi-C population average data. They analyzed Hi-C data of the MYC locus as an example and developed an optimization strategy to extract the parameters that best fit this average Hi-C data.

      Strengths:

      The model has an intuitive nature and the authors masterfully fitted the model to predict relevant biology/Hi-C methodology parameters. This includes loop extrusion parameters, the need for self-interaction with specific energies, and the time and distance parameters expected for Hi-C capture.

      Weaknesses:

      (1) We are no longer in the age in which the community only has access to population average Hi-C. Why was only the population average Hi-C used in this study?

      Can single-cell data, i.e. single-cell Hi-C/Dip-C data or chromatin tracing data (see Tan et al. Science 2018 for Dip-C; Bintu et al. Science 2018 and Su et al. Cell 2020 for chromatin tracing), or even two-color DNA FISH data (used here only as validation), better constrain these models? At the very least the simulations themselves could be used to answer this essential question.

      I am expecting that the single-cell variance and overall distributions of distances between loci might better constrain the models, and the authors should at least comment on it.

      We agree that it is possible to recapitulate single-cell Hi-C or chromatin tracing data with simulations, and that these data modalities have a superior potential to constrain polymer models because they provide an ensemble of single allele structures rather than population-averaged contact frequencies. However, these data remain out of reach for most labs compared to Hi-C. Our goal with this work was to provide an approachable method that anyone interested could deploy on their locus of choice, and we reasoned that Hi-C currently remains the data modality available to most. We envision this strategy will help reach labs beyond the small number of groups expert in single-cell chromatin architecture, and thus hopefully broaden the impact of polymer simulations in the chromatin organization field.

      Nevertheless, we do agree that the comparison of single-cell chromatin architectures to simulations is a fertile ground for future studies. We will include a brief discussion of the potential of single-cell architectures in an upcoming version of the manuscript.

      (2) The authors claimed "Our parameter optimization can be adapted to build biophysical models of any locus of interest. Despite the model's simplicity, the best-fit simulations are sufficient to predict the contribution of loop extrusion and domain interactions, as well as single-cell variability from Hi-C data. Modeling dynamics enables testing mechanistic relationships between chromatin dynamics and transcription regulation. As more experimental results emerge to define simulation parameters, updates to the model should further increase its power." The focus on the Myc locus in this study is too narrow for this claim. I am expecting at least one more locus for testing the generality of this model.

      We note that we used two distinct loci in the study, the MYC locus in leukemia vs T cells (Figs. 2-3) and a representative locus in experiments comparing WT CTCF with a mutant that leads to loss of a subset of CTCF binding sites (Fig. 1L). To further demonstrate generality, we will add to the upcoming revised preprint a demonstration of the simulation fitting to other loci acquired in different cell types.

      Akgol Oksuz, Betul, Liyan Yang, Sameer Abraham, Sergey V. Venev, Nils Krietenstein, Krishna Mohan Parsi, Hakan Ozadam, et al. 2021. “Systematic Evaluation of Chromosome Conformation Capture Assays.” Nature Methods 18 (9): 1046–55.

      Bintu, Bogdan, Leslie J. Mateo, Jun-Han Su, Nicholas A. Sinnott-Armstrong, Mirae Parker, Seon Kinrot, Kei Yamaya, Alistair N. Boettiger, and Xiaowei Zhuang. 2018. “Super-Resolution Chromatin Tracing Reveals Domains and Cooperative Interactions in Single Cells.” Science 362 (6413). https://doi.org/10.1126/science.aau1783.

      Cardozo Gizzi, Andrés M., Diego I. Cattoni, Jean-Bernard Fiche, Sergio M. Espinola, Julian Gurgo, Olivier Messina, Christophe Houbron, et al. 2019. “Microscopy-Based Chromosome Conformation Capture Enables Simultaneous Visualization of Genome Organization and Transcription in Intact Organisms.” Molecular Cell 74 (1): 212–22.e5.

      Cattoni, Diego I., Andrés M. Cardozo Gizzi, Mariya Georgieva, Marco Di Stefano, Alessandro Valeri, Delphine Chamousset, Christophe Houbron, et al. 2017. “Single-Cell Absolute Contact Probability Detection Reveals Chromosomes Are Organized by Multiple Low-Frequency yet Specific Interactions.” Nature Communications 8 (1): 1753.

      Chen, Liang-Fu, Hannah Katherine Long, Minhee Park, Tomek Swigut, Alistair Nicol Boettiger, and Joanna Wysocka. 2022. “Structural Elements Facilitate Extreme Long-Range Gene Regulation at a Human Disease Locus.” bioRxiv. https://doi.org/10.1101/2022.10.20.513057.

      Finn, Elizabeth H., Gianluca Pegoraro, Hugo B. Brandão, Anne-Laure Valton, Marlies E. Oomen, Job Dekker, Leonid Mirny, and Tom Misteli. 2019. “Extensive Heterogeneity and Intrinsic Variation in Spatial Genome Organization.” Cell 176 (6): 1502–15.e10.

      Fudenberg, Geoffrey, and Maxim Imakaev. 2017. “FISH-Ing for Captured Contacts: Towards Reconciling FISH and 3C.” Nature Methods 14 (7): 673–78.

      Hafner, Antonina, Minhee Park, Scott E. Berger, Elphège P. Nora, and Alistair N. Boettiger. 2022. “Loop Stacking Organizes Genome Folding from TADs to Chromosomes.” bioRxiv. https://doi.org/10.1101/2022.07.13.499982.

      Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Xavier Darzacq, and Robert Tjian. 2022. “Enhancer-Promoter Interactions and Transcription Are Largely Maintained upon Acute Loss of CTCF, Cohesin, WAPL or YY1.” Nature Genetics 54 (12): 1919–32.

      Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Oliver J. Rando, Robert Tjian, and Xavier Darzacq. 2020. “Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding.” Molecular Cell 78 (3): 539–53.e8.

      Kloetgen, Andreas, Palaniraja Thandapani, Panagiotis Ntziachristos, Yohana Ghebrechristos, Sofia Nomikou, Charalampos Lazaris, Xufeng Chen, et al. 2020. “Three-Dimensional Chromatin Landscapes in T Cell Acute Lymphoblastic Leukemia.” Nature Genetics 52 (4): 388–400.

      Mach, Pia, Pavel I. Kos, Yinxiu Zhan, Julie Cramard, Simon Gaudin, Jana Tünnermann, Edoardo Marchi, et al. 2022. “Cohesin and CTCF Control the Dynamics of Chromosome Folding.” Nature Genetics 54 (12): 1907–18.

      Mateo, Leslie J., Sedona E. Murphy, Antonina Hafner, Isaac S. Cinquini, Carly A. Walker, and Alistair N. Boettiger. 2019. “Visualizing DNA Folding and RNA in Embryos at Single-Cell Resolution.” Nature 568 (7750): 49–54.

      Murphy, Sedona, and Alistair Nicol Boettiger. 2022. “Polycomb Repression of Hox Genes Involves Spatial Feedback but Not Domain Compaction or Demixing.” bioRxiv. https://doi.org/10.1101/2022.10.14.512199.

      Nora, Elphège P., Anton Goloborodko, Anne-Laure Valton, Johan H. Gibcus, Alec Uebersohn, Nezar Abdennur, Job Dekker, Leonid A. Mirny, and Benoit G. Bruneau. 2017. “Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization.” Cell 169 (5): 930–44.e22.

      Nuebler, Johannes, Geoffrey Fudenberg, Maxim Imakaev, Nezar Abdennur, and Leonid A. Mirny. 2018. “Chromatin Organization by an Interplay of Loop Extrusion and Compartmental Segregation.” Proceedings of the National Academy of Sciences of the United States of America 115 (29): E6697–6706.

      Rao, Suhas S. P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M. Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn, et al. 2017. “Cohesin Loss Eliminates All Loop Domains.” Cell 171 (2): 305–20.e24.

      Su, Jun-Han, Pu Zheng, Seon S. Kinrot, Bogdan Bintu, and Xiaowei Zhuang. 2020. “Genome-Scale Imaging of the 3D Organization and Transcriptional Activity of Chromatin.” Cell 182 (6): 1641–59.e26.

      Takei, Yodai, Shiwei Zheng, Jina Yun, Sheel Shah, Nico Pierson, Jonathan White, Simone Schindler, Carsten H. Tischbirek, Guo-Cheng Yuan, and Long Cai. 2021. “Single-Cell Nuclear Architecture across Cell Types in the Mouse Brain.” Science 374 (6567): 586–94.

      Wang, Siyuan, Jun-Han Su, Brian J. Beliveau, Bogdan Bintu, Jeffrey R. Moffitt, Chao-Ting Wu, and Xiaowei Zhuang. 2016. “Spatial Organization of Chromatin Domains and Compartments in Single Chromosomes.” Science 353 (6299): 598–602.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank all of the reviewers for their helpful comments and the effort they made in reading and evaluating our manuscript. In response to them, we have made major changes to the text and figures and performed substantial new experiments. These new data and changes to the text and figures have substantially strengthened the manuscript. We believe that the manuscript is now very strong in both its impact and scope and we hope that reviewers will find it suitable for publication in eLife.

      A point-by-point response to the reviewers' specific comments is provided below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this report, Yu et al ascribe potential tumor suppressive functions to the non-core regions of RAG1/2 recombinases. Using a well-established BCR-ABL oncogene-driven system, the authors model the development of B cell acute lymphoblastic leukemia in mice and found that RAG mutants lacking non-core regions show accelerated leukemogenesis. They further report that the loss of non-core regions of RAG1/2 increases genomic instability, possibly caused by increased off-target recombination of aberrant RAG-induced breaks. The authors conclude that the non-core regions of RAG1 in particular not only increase the fidelity of VDJ recombination, but may also influence the recombination "range" of off-target joints, and that in the absence of the non-core regions, mutant RAG1/2 (termed cRAGs) catalyze high levels of off-target recombination leading to the development of aggressive leukemia.

      Strengths:

      The authors used a genetically defined oncogene-driven model to study the effect of RAG non-core regions on leukemogenesis. The animal studies were well performed and generally included a good number of mice. Therefore, the finding that cRAG expression led to the development of more aggressive BCR-ABL+ leukemia compared to fRAG is solid.

      Weaknesses:

      In general, I find the mechanistic explanation offered by the authors to explain how the non-core regions of RAG1/2 suppress leukemogenesis to be less convincing. My main concern is that cRAG1 and cRAG2 are overexpressed relative to fRAG1/2. This raises the possibility that the observed increased aggressiveness of cRAG tumors compared to fRAG tumors could be solely due to cRAG1/2 overexpression, rather than any intrinsic differences in the activity of cRAG1/2 vs fRAG1/2; and indeed, the authors allude to this possibility in Fig S8, where it was shown that elevated expression of RAG (i.e. fRAG) correlated with decreased survival in pediatric ALL. Although it doesn't mean the authors' assertions are incorrect, this potential caveat should nevertheless be discussed.

      We appreciate the valuable suggestions from the reviewer. BCR-ABL1+ B-ALL is characterized by halted early B-lineage differentiation. In BCR-ABL1+ B cells, RAG recombinases are highly expressed, leading to the inactivation of genes that encode essential transcription factors for B-lineage differentiation. This results in cells being trapped within the precursor compartment, thereby elevating RAG gene expression. Our interpretation of the data suggests that, in BCR-ABL1+ B-ALL mouse models, the high expression of both cRAG and fRAG and the deletion of the non-core regions influence the precision of RAG targeting within the genome. This causes more genomic damage in cRAG tumors than in fRAG tumors, consequently leading to the observed increased aggressiveness of cRAG tumors compared to fRAG tumors. We discussed the issues on Page 12, lines 295-307 in the revised manuscript.

      Some of the conclusions drawn were not supported by the data.

      (1) I'm not sure that the authors can conclude based on μHC expression that there is a loss of pre-BCR checkpoint in cRAG tumors. In fact, Fig. 2B showed that the differences are not statistically significant overall, and more importantly, μHC expression should be detectable in small pre-B cells (CD43-). This is also corroborated by the authors' analysis of VDJ rearrangements, showing that it has occurred at the H chain locus in cRAG cells.

      We appreciate the insightful comment from the reviewer. Upon reevaluation of the data presented in Fig. 2B, we identified and rectified certain errors. The revised analysis now shows that the differences in μHC expression are statistically significant. This significant expression of μHC in fRAG leukemic cells implies that these cells may progress further in differentiation, potentially acquiring an immune phenotype. These modifications have been incorporated into the manuscript on page 7, lines 153-156 in the revised manuscript.

      (2) The authors found a high degree of polyclonal VDJ rearrangements in fRAG tumor cells but a much more limited oligoclonal VDJ repertoire in cRAG tumors. They concluded that this explains why cRAG tumors are more aggressive because BCR-ABL induced leukemia requires secondary oncogenic hits, resulting in the outgrowth of a few dominant clones (Page 19, lines 381-398). I'm not sure this is necessarily a causal relationship since we don't know if the oligoclonality of cRAG tumors is due to selection based on oncogenic potential or if it may actually reflect a more restricted usage of different VDJ gene segments during rearrangement.

      Thank you for your insightful comments and questions regarding the relationship between the oligoclonality of V(D)J rearrangements and the aggressiveness of cRAG tumors. You raise an important point regarding whether the observed oligoclonality is a result of selective pressure favoring clones with specific oncogenic potential, or if it reflects inherent limitations in V(D)J segment usage during rearrangement in cRAG models. In our study, we observed a marked difference in the V(D)J rearrangement patterns between fRAG and cRAG tumor cells, with cRAG tumors exhibiting a more limited, oligoclonal repertoire. This observation led us to speculate that the aggressive nature of cRAG tumors might be linked to a selective advantage conferred by specific V(D)J rearrangements that cooperate with the BCR-ABL1 oncogene to drive leukemogenesis. However, we acknowledge that our current data do not definitively establish a causal relationship between oligoclonality and tumor aggressiveness. The restricted V(D)J repertoire in cRAG tumors could indeed be due to a more constrained rearrangement process, possibly influenced by the altered expression or function of RAG1/2 in the absence of non-core regions. This could limit the diversity of V(D)J rearrangements, leading to the emergence of a few dominant clones not necessarily because they have greater oncogenic potential, but because of a narrowed field of rearrangement possibilities.

      To address this question more thoroughly, future studies could examine the functional consequences of specific V(D)J rearrangements found in dominant cRAG tumor clones. This could include assessing the oncogenic potential of these rearrangements in isolation and in cooperation with BCR-ABL1, as well as exploring the mechanistic basis for the restricted V(D)J repertoire. Such studies would provide deeper insight into the interplay between RAG-mediated recombination, clonal selection, and leukemogenesis in BCR-ABL1+ B-ALL.

      We appreciate your feedback on this matter and agree that further investigation is required to unravel the precise relationship between V(D)J rearrangement diversity and leukemic progression in cRAG models. We have revised our discussion to reflect these considerations and to clarify the speculative nature of our conclusions regarding the link between oligoclonality and tumor aggressiveness. We added more discussion on this issue on Page 7, lines 166-170 in the revised manuscript.

      (3) What constitutes a cancer gene can be highly context- and tissue-dependent. Given that there is no additional information on how any putative cancer gene was disrupted (e.g., truncation of regulatory or coding regions), it is not possible to infer whether increased off-target cRAG activity really directly contributed to the increased aggressiveness of leukemia.

      We fully agree with the issues you raised. In Supplementary Table 3, we have presented data on off-target gene disruptions, specifically in introns, exons, downstream regions, promoters, 3' UTRs, and 5' UTRs. However, this dataset alone does not suffice to conclusively determine whether the increased off-target activity of cRAG directly influences the heightened aggressiveness of leukemia. To bridge this knowledge gap, our future research will extend to include both knockout and overexpression experiments targeting these off-target genes.

      (4) Fig. 6A, it seems that it is really the first four nucleotides (CACA) that determine fRAG binding and the first three (CAC) that determine cRAG binding, as opposed to five for fRAG and four for cRAG, as the authors wrote (page 24, lines 493-497).

      We thank the reviewer for the insightful comment. In response, we have revised the text to accurately reflect the nucleotide sequences responsible for RAG binding and cleavage. Specifically, we now clarify that the first four nucleotides (CACA) are crucial for fRAG binding and cleavage, while the initial three nucleotides (CAC) are essential for cRAG binding and cleavage. These updates have been made on page 10, lines 242-245 of the revised manuscript.

      (5) Fig S3B, I don't really see why "significant variations in NHEJ" would necessarily equate "aberrant expression of DNA repair pathways in cRAG leukemic cells". This is purely speculative. Since it has been reported previously that alt-EJ/MMEJ can join off target RAG breaks, do the authors detect high levels of microhomology usage at break points in cRAG tumors?

      We appreciate the reviewer's comment. Currently, we have not observed microhomology usage at breakpoints in cRAG tumors. We plan to address this aspect in a future, more detailed study. Regarding the 'aberrant expression of DNA repair pathways in cRAG leukemic cells', we acknowledge that this is speculative. Therefore, we have carefully rephrased this to 'suggesting a potential aberrant expression of DNA repair pathways in cRAG leukemic cells.' This modification is reflected on page 12, lines 290-291 of the revised manuscript.

      (6) Fig. S7, CDKN2B inhibits CDK4/6 activation by cyclin D, but I don't think it has been shown to regulate CDK6 mRNA expression. The increase in CDK6 mRNA likely just reflects a more proliferative tumor but may have nothing to do with CDKN2B deletion in cRAG1 tumors.

      We fully concur with the reviewer's comment. We have deleted this inappropriate part from the text.

      Insufficient details in some figures. For instance, Fig. 1A, please include statistics in the plot showing a comparison of fRAG vs cRAG1, fRAG vs cRAG2, cRAG1 vs cRAG2. As of now, there's a single p-value (0.0425) stated in the main text and the legend but why is there only one p-value when fRAG is compared to cRAG1 or cRAG2? Similarly, the authors wrote "median survival days 11-26, 10-16, 11-21 days, P < 0.0023-0.0299, Fig. S2B." However, it is difficult for me to figure out what are the numbers referring to. For instance, is 11-26 referring to median survival of fRAG inoculated with three different concentrations of GFP+ leukemic cells or is 11-26 referring to median survival of fRAG, cRAG1, cRAG2 inoculated with 10^5 cells? It would be much clearer if the authors can provide the numbers for each pair-wise comparison, if not in the main text, then at least in the figure legend. In Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells? Also in Fig. 5, why did 24 SVs give rise to 42 breakpoints, and not 48? Doesn't it take 2 breaks to accomplish rearrangement? In Fig. 6B-C, it is not clear how the recombination sizes were calculated. In the examples shown in Fig. 4, only cRAG1 tumors show intra-chromosomal joins (chr 12), while fRAG and cRAG2 tumors show exclusively inter-chromosomal joins.

      We appreciate the reviewer's feedback and have made the following revisions:

      (1) The text has been adjusted to rectify the previously mentioned error in the figure legends (page 1, lines 5-6).

      (2) We have clarified the intended message in the revised text (page 6, lines 129-130) and the figure legend (page 4-5, lines 107-113) for greater precision.

      (3) Figure 5A-B now presents an overview of all structural variants (SVs) identified in both cRAG and fRAG cells, offering a comprehensive comparison.

      (4) Among the analyzed SVs, 24 generated a total of 48 breakpoints, with 41 occurring within gene bodies and the remaining 7 in adjacent flanking sequences. This informs our exon-intron distribution profile analysis.

      (5) We have defined recombination sizes as ‘the DNA fragment size spanning the two breakpoints’ for clarity (page 10, lines 251-252).

      (6) All off-target recombinations identified in the genome-wide analyses of fRAG, cRAG1, and cRAG2 leukemic cells were determined to be intra-chromosomal joins, highlighting their specific nature within the genomic context.
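
      The bookkeeping in points (4) and (5) above can be sketched in code: each intra-chromosomal structural variant contributes two breakpoints, and its recombination size is the length of the DNA fragment spanning them. The class name, field names, and coordinate convention (size taken as the absolute difference of the two breakpoint positions) are assumptions for illustration and do not describe the authors' analysis pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntraChromosomalSV:
    """A structural variant joining two breakpoints on the same chromosome."""

    chrom: str
    first_break: int   # genomic coordinate of one breakpoint
    second_break: int  # genomic coordinate of the other breakpoint

    def breakpoints(self):
        """Both breakpoints of the variant as (chromosome, position) pairs."""
        return [(self.chrom, self.first_break), (self.chrom, self.second_break)]

    def recombination_size(self):
        """Size of the DNA fragment spanning the two breakpoints."""
        return abs(self.second_break - self.first_break)


def total_breakpoints(variants):
    """Total number of breakpoints across a collection of variants."""
    return sum(len(sv.breakpoints()) for sv in variants)
```

      With this convention, 24 variants always yield 48 breakpoints, and a variant with breakpoints at positions 1,000 and 1,500 on the same chromosome has a recombination size of 500 bp.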

      Insufficient details on certain reagents/methods. For instance, are the cRAG1/2 mice of the same genetic background as fRAG mice (C57BL/6 WT)? On Page 23, line 481, what is a cancer gene? How are they defined? In Fig. 3C, are the FACS plots gated on intact cells? Since apoptotic cells show high levels of gH2AX, I'm surprised that the fraction of gH2AX+ cells is so much lower in fRAG tumors compared to cRAG tumors. The in vitro VDJ assay shown in Fig 3B is not described in the Method section (although it is described in Fig S5b). Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells?

      We are grateful for the reviewer's feedback and have incorporated their insights as follows:

      (1) We clarify that both cRAG1/2 and fRAG mice share the same genetic background, specifically the C57BL/6 WT strain, ensuring consistency across experimental models.

      (2) We define a 'cancer gene' as one harboring somatic mutations implicated in cancer. To support our analysis, we refer to the Catalogue Of Somatic Mutations In Cancer (COSMIC) at http://cancer.sanger.ac.uk/cosmic. COSMIC serves as the most extensive repository for understanding the role of somatic mutations in human cancers.

      (3) Upon thorough review of the raw data for γ-H2AX and the fluorescence-activated cell sorting (FACS) plots gated on intact cells, we propose that the observed discrepancies might stem from the limited sensitivity of the γ-H2AX flow cytometry detection method. This insight prompts our commitment to employing more efficient detection methodologies in forthcoming studies.

      (4) Detailed procedures for the in vitro V(D)J recombination assay have been included in the Methods section (page 15, lines 384-388) to enhance the manuscript's comprehensiveness and reproducibility.

      (5) The presented plots offer a comprehensive overview of structural variants (SVs) identified in both cRAG and fRAG cells, providing a holistic view of the genomic landscape across different models.

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript, the authors summarized and introduced the correlation between the non-core regions of RAG1 and RAG2 in BCR-ABL1+ acute B lymphoblastic leukemia and off-target recombination, which has certain innovative and clinical significance.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I would suggest that the authors tone down some of their conclusions, which are not necessarily supported by their own data. In addition, there are some minor mistakes in figure assembly/presentation. For instance, I believe that the axes labels in Fig. 1E were flipped. BrdU should be on the y-axis and 7-AAD on the x-axis. Fig. 3B, the y-axis contains a typo, it should be "CD90.1..." and not "D90.1...". In Fig. 5C, the numbers seem to be flipped, with 93% corresponding to cRAG1 and 100% to cRAG2 (compare with the description on page 23, lines 474-475). Fig. 5C, y-axis, "hybrid" is a typo. Page 3, line 59: The abbreviation of RSS has already been described earlier (p4, line 53).

      We thank the reviewer for these suggestions. We carefully checked the raw data and corrected these mistakes in the revised manuscript.

      Page 3, line 63: "signal" segment (commonly referred to as signal ends), not "signaling" segment.

      We have changed “signaling segment” to “signal ends” in the revised manuscript (page 3, lines 54-55).

      Page 3, lines 64-65: VDJ recombination promotes the development of both B and T cells, and aberrant recombination can cause both B and T cell lymphomas.

      The statement about the role of V(D)J recombination in B and T cell development and its link to lymphomagenesis is grounded in a substantial body of research. Theoretical frameworks and empirical studies delineate how aberrations in the recombination process can lead to genomic instability, potentially triggering oncogenic events. This connection is extensively documented in immunology and oncology literature, illustrating the critical balance between necessary genetic rearrangements for immune diversity and the risk of malignancy when these processes are dysregulated (Thomson et al., 2020; Mendes et al., 2014; Onozawa and Aplan, 2012).

      Page 4, line 72: "recombinant dispensability" is not a commonly used phrase. Do the authors mean to say that the non-core regions of RAG1/2 are not strictly required for VDJ recombination?

      We thank the reviewer for this insightful suggestion. We have revised the sentence to read, 'Although the non-core regions of RAG1/2 are not essential for V(D)J recombination, the evolutionary conservation of these regions suggests their potential significance in vivo, possibly affecting RAG activity and expression in both quantitative and qualitative manners.' This revision appears on page 3, lines 61-62, in the revised manuscript.

      Fig. 4. It would have been nice to show at least one more cRAG1 tumor Circos plot.

      We appreciate the reviewer's comment and concur with the suggestion. In future sequencing experiments, we will consider including additional replicates. However, due to time and financial constraints, the current sequencing effort was limited to a maximum of three replicates.

      Reviewer #3 (Recommendations For The Authors):

      In the manuscript, the authors summarized and introduced the correlation between the non-core regions of RAG1 and RAG2 in BCR-ABL1+ acute B lymphoblastic leukemia and off-target recombination, which has certain innovative and clinical significance. The following issues need to be addressed by the authors.

      (1) Authors should check and review extensively for improvements to the use of English.

      We thank the reviewer for their comment. With assistance from a native English speaker, we have carefully revised the manuscript to enhance its readability.

      (2) Authors should revise the conclusion so that the above can be clearly reviewed and summarized.

      The conclusion has been partially revised in the revised manuscript.

      (3) The article should state that the experiment was independently repeated three times.

      The experiment was repeated three times under the same conditions, and this information has been described in the Statistics section on page 19, lines 473-475, of the revised manuscript.

      (4) The article will be more convincing if it uses references in the last 5 years.

      We are grateful to the reviewer for their guidance in enhancing our manuscript. We have incorporated additional references from the past five years in the revised version.

      (5) Additional experiments are suggested to elucidate the molecular mechanisms related to off-target recombination.

      We thank the reviewer for this suggestion. In future experiments, we plan to perform ChIP-seq analysis to investigate the relationship between chromatin accessibility and off-target effects, as well as to examine the impact of knocking out and overexpressing off-target genes on cancer development and progression.

      (6) It is suggested to further analyze the effect of the absence of non-core RAG region on the differentiation and development of peripheral B cells in mice by flow analysis and expression of B1 and B2.

      Thank you very much for highlighting this crucial issue. FACS analysis was performed, revealing that leukemia cells in peripheral B cells in mice did not express CD5. The data are presented as follows:

      Author response image 1.

      (7) Fig3A should have three biological replicates and the molecular weight should be labeled on the right side of the strip.

      Thank you for this suggestion. The experiment was independently repeated three times, and the molecular weights have been labeled on the right side of the bands in the revised version.

      References:

      Mendes RD, Sarmento LM, Canté-Barrett K, Zuurbier L, Buijs-Gladdines JG, Póvoa V, Smits WK, Abecasis M, Yunes JA, Sonneveld E, Horstmann MA, Pieters R, Barata JT, Meijerink JP. 2014. PTEN microdeletions in T-cell acute lymphoblastic leukemia are caused by illegitimate RAG-mediated recombination events. Blood 124:567-578. doi:10.1182/blood-2014-03-562751

      Onozawa M, Aplan PD. 2012. Illegitimate V(D)J recombination involving nonantigen receptor loci in lymphoid malignancy. Genes Chromosomes Cancer 51:525-535. doi:10.1002/gcc.21942

      Thomson DW, Shahrin NH, Wang P, Wadham C, Shanmuganathan N, Scott HS, Dinger ME, Hughes TP, Schreiber AW, Branford S. 2020. Aberrant RAG-mediated recombination contributes to multiple structural rearrangements in lymphoid blast crisis of chronic myeloid leukemia. Leukemia 34:2051-2063. doi:10.1038/s41375-020-0751-y

    1. Author response:

      The authors express their gratitude to the reviewers for their insightful comments.

      Reviewer #1: We are uncertain about the reference to an overjudgement of the recovery of spermatogonial stem cells, as we did not draw any conclusions on this in the current study. Additionally, we have received feedback mentioning the multitude and diversity of datasets as both a strength and a weakness. However, we would appreciate clarification on which datasets may have been insufficiently reviewed and how our selection of highlights may have introduced bias to the interpretation and conclusion of the study. It is important to note that we did not select any patients or data; all patient data were incorporated into our results section. We acknowledge the need for clarification regarding our study population for the germ cell stainings. As stated in our Materials and Methods section, our current study population includes the cohort from our previous publication (Vereecke et al., 2020), supplemented by nine additional participants, totaling n=106 trans women. While Fig. 1C incorporates both previous and new data on germ cells, we understand the need to clarify this to avoid confusion. Additionally, we will include information on the Tanner stages of the trans women in our cohort (all G5), as well as details on the selection criteria for our controls and their Tanner stages. As briefly touched upon in the discussion, a marker such as delta-like homolog 1 would indeed be valuable to assess the presence of truly immature Leydig cells. Unfortunately, our attempts to optimize the immunofluorescence protocol for this marker were unsuccessful, resulting in a double staining instead of a triple staining for the Leydig cells. The suboptimal resolution of Fig. 1 will be improved.

      Reviewer #2 raises concerns regarding the suitability of rejuvenated testicular tissue for research purposes. However, we emphasize that this tissue source holds significant value. Although adult testicular tissue is widely available (from prostate cancer patients or vasectomy reversal patients), we are specifically looking for alternatives to the scarce prepubertal/pubertal tissue for research on in vitro spermatogenesis. While we acknowledge that transgender tissue with severe hyalinization or without spermatogonia may not be suitable for such research, the abundance of transgender tissue without these issues emphasizes the value of this tissue source.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for their valuable time, careful reading, and constructive comments. The comments have been highly valuable and useful for improving the quality of our study, as well as important in guiding the direction of our present and future research. In the revised manuscript, we have incorporated the necessary changes, including additional experimental data as suggested; please find below our detailed point-by-point response to the reviewers’ comments and the changes we have made in the manuscript.

      Reviewer #1 (Public Review):

      In this work, the authors have explored how treating C. albicans fungal cells with EDTA affects their growth and virulence potential. They then explore the use of EDTA-treated yeast as a whole-cell vaccine in a mouse model of systemic infection. In general, the results of the paper are unsurprising. Treating yeast cells with EDTA affects their growth and the addition of metals rescues the phenotype. Because of the significant growth defects of the cells, they don't infect mice and you see reduced virulence. Injection with these cells effectively immunises the mice, in the same way that heat-killed yeast cells would. The data is fairly sound and mostly well-presented, and the paper is easy to follow. However, I feel the data is an incremental advance at best, and the immune analysis in the paper is very basic and descriptive.

      Strengths:

      Detailed analysis of EDTA-treated yeast cells

      Weaknesses:

      • Basic immune data with little advance in knowledge.

      • No comparison between their whole-cell vaccine and others tried in the field.

      • The data is largely unsurprising and not novel.

      Reply: Thank you so much for appreciating our effort to generate a whole-cell anti-fungal vaccine by treating C. albicans cells with EDTA. We also appreciate your comment that the manuscript is sound and well-presented. However, we are afraid that the respected reviewer assumed the CAET cells to be dead cells, whereas they only divide more slowly than the untreated cells. In the revised manuscript, we have presented additional evidence to show that CAET are live cells (Supp. Fig. 2), and based on the new data, we expect a positive change in the reviewer’s opinion. Since CAET is a live strain, the data presented here is novel.

      Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat with limited drug options. With the increasing concern of drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied the metal metabolism in Candida albicans by testing some chelators, including EDTA, to block the metal acquisition and metabolism by the fungus. Interestingly, they found EDTA-treated yeast cells grew poorly in vitro and were non-pathogenic in vivo in a murine model. Mice immunized by EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis to survey the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune response in response to CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data both on in vitro phenotype, gene expression profile, and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data is somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and antibody response may be investigated.

      Reply: Thank you very much for appreciating our work and finding our strain to be a live whole-cell anti-fungal vaccine strain with translational potential. Since the current study focused on the identification and detailed characterizations of a non-genetically modified live-attenuated strain and determination of its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. In a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new as many other papers have examined this effect. The novelty here is the use of this EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Figure 5 and Figure 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. The methodology used is also an issue. As it stands, the impact is minor.

      Reply: Thank you so much for appreciating our efforts to develop a novel vaccine against fungal infections. We are extremely sorry for the lack of clarity in our writing related to Figs. 5 and 6. We have now modified the text and hope that the respected reviewer will find these results convincing.

      Recommendations for the authors:

      Although the reviewers recognize the importance of the manuscript, they would like to see: 1) comparisons between their whole-cell vaccine and others tried in the field, 2) an investigation of the immune response in infected tissues and antibody response, and 3) more controls in Figures 5 and 6, and a time-dependent effect on the colony-forming units of their vaccine formulation. Please, address the questions and submit a revised version together with a rebuttal letter addressing point-by-point raised by each reviewer.

      Reply: (1) We are afraid that a comparative study of live and heat-killed cell vaccines would distort the information presented here. This is the only non-genetically modified antifungal vaccine candidate; therefore, a comparison with a dead strain at present is unwarranted. We have now added supporting data to confirm that the survivability of C. albicans cells was unaffected at 6 hr of EDTA treatment (CAET, Supp. Fig. S2). (2) Since the current study focused on the identification and detailed characterization of a non-genetically modified live-attenuated strain and its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. However, in a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice. (3) The results of Figs. 5 and 6 were misinterpreted by the respected reviewer; please see the explanation below.

      Reviewer #1 (Recommendations For The Authors):

      Some specific comments/suggestions for the authors: (1) What was the viability of the yeast after EDTA treatment? Is the delayed growth response because many cells died and it takes a while for remaining viable cells to catch up? This is important to know because it may mean the dose given to mice is substantially different and that should be accounted for. Some PI staining of the cells after treatment would help.

      Reply: The growth curve assays (Fig. 1A and 1E) were initiated with each culture at O.D.600nm = 0.5 (~10^7 cells/mL), and the analyses suggested that the EDTA-treated C. albicans cells grew slower than the untreated cells. Fig. 1B and 1F further demonstrated that EDTA has minimal effect on the survival of the strain up to 8 hrs post-exposure. The proportional increase in cell number with and without metal chelators remained almost the same over this period (0 – 8 hrs). Therefore, for subsequent analyses, 6 hr treatment was selected and such treated cells were considered as CAET, which were actively dividing live cells, albeit slower than untreated cells. As suggested, and to strengthen our finding, time-dependent SYTOX Green and propidium iodide staining of C. albicans cells with and without EDTA treatment was carried out and analysed by flow cytometry and microscopy, respectively. Both analyses revealed that the percentage of dead cells remained the same with and without EDTA treatment for up to 12 hrs. The new data has now been added in the revised version of the manuscript as Supplementary figure 2.

      Author response image 1.

      (2) In line with the above, what was the viability of the CAET cells after 3h in media? In the macrophage in vitro experiments, how do you know the reduced viability of the CAET cells is macrophage-specific? Did you run a control of CAET cells in media on their own to determine how CFU changed in macrophage-free conditions? Are the proliferation rates of untreated and CAET cells different? That would affect CFSE labelling and results. These experiments would work better with a GFP-expressing C. albicans strain, which is widely available. In the images in Figure 4c, it looks like there are more hyphae in CAET than untreated - was hyphal induction checked/measured? That's important to know because more hyphae usually means more clumping and this can affect CFU counts (giving the impression of less CFU when actually there is more). Because of all the issues above, I'm not fully convinced by the uptake/killing data.

      Reply: As explained in response 1, we used actively dividing WT and CAET cells, and equal numbers of these cells were CFSE labelled. As can be seen in Fig. 4A, the rate of phagocytosis was the same in 1 hr of pre-culture, but at the subsequent time points the double-positive cells were reduced in the case of CAET cells, which is due to fungal killing by macrophages. Fungal cells were released from the macrophages by warm water treatment and CFU was determined. Fig. 4B suggested that at 1 hr of co-culture, the CFU of both fungal cells (WT and CAET) were the same and the fungal clearance was observed at later time points. Thus, the reduced viability of CAET cells was macrophage-specific. EDTA has minimal effect on hyphal transition with or without serum, and the new data has now been provided in the revised version (Supplementary Fig. 3).

      Author response image 2.

      (3) Pooled data should be shown for all animal experiments.

      Reply: Thank you for the suggestion. Wherever it was meaningful, pooled data for the animal experiments have now been provided.

      (4) Immune cell counts/analysis in the kidney and bone marrow would be hugely helpful and more relevant to understanding immune responses following immunisation/infection. I think a more interesting analysis for the authors to consider would be to immunise with heat-killed yeast vs EDTA-treated yeast and see if there is a qualitative difference or better protection, i.e. is the EDTA-treated whole-cell vaccine superior to the heat-killed version? That is a better question to address. As it stands, the data in the paper is not surprising.

      Reply: The studies on cellular and molecular mechanisms underlying protective immunity in CAET-vaccinated mice are in progress in a separate study. This study mostly focused on the identification and detailed characterization of a non-genetically modified live-attenuated strain and its safety and efficacy as a potential vaccine candidate in a preclinical model. We are afraid that a comparison of a live cell (CAET) with a dead cell (heat-killed) will dilute the content of the manuscript and will not be meaningful. It is well accepted that the heat-killed C. albicans strain only provides partial, short-lived protection against re-challenge (Refs-PMIDs: 12146759 and 9916097); thus, it does not warrant any comparison with CAET.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a highly interesting study. I have the following specific comments for clarification.

      (1) In the introduction, the authors mentioned other anti-candida vaccines that are mostly effective against Candida infection by inducing neutralizing antibodies. However, in their CAET vaccine candidate, they only checked the cellular immunity in blood and found a balanced immune response (both pro- and anti-inflammatory responses are induced). How about the antibody production in these mice? It is a bit surprising that both untreated Candida infection and CAET Candida infection produced similar immune activation based on Figure 6, yet the CAET immunization provides protection. Some innate cell recruitment is higher in untreated Ca infection than the CAET infected mice (Figure 5F). The overall results on immune response characterization did not seem to explain why the CAET infection led to host protection while untreated Ca infection cannot. Characterizing infected tissue immune cell differentiation and cytokine production may offer some additional insights.

      Reply: We agree with you that in this manuscript we have not provided any mechanistic study on the protective immunity in CAET-vaccinated mice. This will be demonstrated in a subsequent study.

      (2) In Figure 5, some critical data seem to be missing in panels B and C. The CFU and histopathological images for CAET-treated mice challenged by Ca should also be shown there for comparison. Although they did show some data in Figure 5E and Figure S4, it is necessary to have that data in 5B and 5C from the same experiment. Figure S4 is a very busy figure and the images are quite small. It may be necessary to use arrows to point out what information authors want to emphasize.

      Reply: Fig. 5B and 5C showed the data for mice that succumbed to infection. Since the other mice (saline control groups, CAET infected, CAET vaccinated, and re-challenged groups) survived, they were not sacrificed; therefore, the CFU data was not collected. In addition, we wanted to see the longevity of these surviving mice, and after 1 year of observation, they were handed over to the animal house for clearance as per the institutional guidelines. However, Figure 5E and Figure S4 (now Fig. S6) included all the mice groups, as they were sacrificed at various time points irrespective of humane end points. As suggested, Fig. S6 has now been modified and fungal cells are denoted by yellow arrows.

      (3) EDTA-treated yeast cells showed poor growth but also had thicker cell walls with high chitin, glucan, and mannan levels. What leads to its clearance in vivo remains unclear, as usually, cells with thick cell wall structures and low metabolism are more resistant to stress, e.g., dormant cells. Macrophages were shown to contribute to CAET killing in a phagocytosis assay (Figure 4). Checking cytokines produced by macrophages during co-incubation may offer some insights. In all, additional discussion on what caused in vivo clearance would be helpful.

      Reply: A mechanistic study of the protective immune responses to CAET will be presented in a separate study. As suggested, the third paragraph of the discussion section now contains additional information on the in vivo clearance of CAET cells.

      (4) Long paragraphs in the discussion section could be divided into a bigger number of shorter paragraphs.

      Reply: Thank you for the suggestion, it has now been modified in the revised version (7 short paragraphs). To make it more comprehensive, some of the content has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) It is unclear how many cells were treated with 250 micromolar of EDTA for 6 hours before preparing the inoculum. It seems that only the OD was measured before adding EDTA. This is not a very rigorous and reproducible method.

      Reply: In this manuscript, we have repeatedly used the same protocol to generate CAET cells for various analyses. A culture at O.D.600nm = 0.5 is equivalent to 10^7 C. albicans cells per mL, and this information has now been added in the revised manuscript.

      (2) Upon treatment with 250 micromolar of EDTA, cells were harvested and counted to prepare the inoculum (5x10e5) for injecting it in mice. However, it appears that CFU of the inoculum was not done. Based on data shown in Fig. 1B, 250 micromolar of EDTA does inhibit Candida cell replication. Thus, the authors may have counted dead cells and, thus, injected dead cells together with live cells for the CAET inoculum. Thus, mice receiving this inoculum may have been infected (and vaccinated) with a lower number of live Candida cells.

      Reply: Please see a similar response to reviewer #1. EDTA has minimal effect on the survival of C. albicans cells at 6 hr (also see supp. Fig. S2). We have already mentioned the CFU analysis of untreated and CAET cells in the methodology section related to inoculum preparation.

      (3) It is unclear if 6 hours of treatment with 250 micromolar of EDTA is enough to induce a block of Candida cell replication. In Figure 1B, the authors treated for 24h. The authors are encouraged to wash the cells after 6 hours of treatment and see if their cell division will recover upon removal of EDTA.

      Reply: Thank you for the suggestion. At 6 hr of treatment, the survivability of C. albicans cells was unaffected by EDTA exposure, as confirmed by PI and SYTOX Green staining (Supp. Fig. 2). Additionally, as suggested, a rescue experiment was carried out by exogenous addition of divalent metals after 6 hr of EDTA treatment, followed by growth/CFU analyses. A modified Fig. 1A and B with new data has been provided.

      (4) The data shown in Figure 5A is extremely exciting. However, the number of mice in each group (n=6) is too low. Normally, 10 mice per group are used for virulence studies unless the authors provide a power analysis that 6 mice per group will be sufficient. Also, CFU data were only provided for Ca and saline-Ca groups (Fig. 5B) and not for the other groups. CFU data should be provided for all mice.

      Reply: Thank you for the suggestion. A statistical analysis of Fig. 5A has been provided in the revised version. The rationale behind not including all mice groups in Fig. 5B is already explained in a response to reviewer #2.

      (5) It is unclear how the authors differentiate between CFU arising from CAET or from WT Candida.

      Reply: Since Fig. 5E demonstrated that no CAET cells were detected in the kidney beyond 10 days of inoculation, the fungal cells detected on the 3rd and 7th days in the re-challenged mice group (1CAET 2 Ca) were from the later-inoculated cells (brown colour).

      (6) Figure 5E: it is unclear if a 1 saline-2 saline (Figure legend) or if 1 saline-2 Ca (text) group was included. If the latter, where are the CFU? It is impossible that 1 saline-2 Ca mice have no CFU.

      Reply: Thank you so much for pointing this out. The legend has now been modified to include 1saline-2saline and 1CAET-2Ca.

      (7) It seems that CFU is significantly present in the kidney in the 1 CAET - 2 Ca group at day 7 but not at day 3. How is this possible? This is an extremely invasive model of infection, and the authors are challenging intravenously 500,000 live Candida cells. If by the 3rd day, the authors detect no CFU, then how is it possible that CFUs are arising on day 7?

      Reply: We do detect fungal cells on the 3rd day in the 1CAET 2 WT mice group (~2000 cells), albeit far fewer than on the 7th day (~11200 cells). A Log10-scale graph has now been provided for better representation.

      (8) Most importantly, if the authors are not detecting CFU at day 3, then earlier time points (e.g. day 2, day 1, or even 12 hours post-challenge) must be analyzed. The authors should show that CFU from the organs is decreasing in a time-dependent manner. Also, all CFU should be shown as Log10.

      Reply: Please see the previous response.

      (9) Fig. 6: because it is unclear if the mice were challenged with the same inoculum of live Candida cells (untreated and treated with EDTA), the different cytokine profiles between the two groups could be simply due to the different inoculum sizes and not to the effect of EDTA on Ca.

      Reply: Please see the previous response, as also given to Reviewer #1.

    1. Author response:

      Reviewer #1

      […] it seems that the readout units are not operating in continuous time, and that interval discrimination relies in part on external information. Specifically, the readout units only look at the spike counts during the window delta_t_w.

      In the first version of the review, the reviewer implied that each readout unit only receives input during a small window around the interval it represents. However, this is not the case. The small window that is depicted in Fig. 16 is a sliding window that is used to compute the states (i.e., an estimate of the instantaneous firing rate) at each point in time. The fact that the readout units do operate in continuous time is apparent from Fig. 2A, which shows the activity of all output units as a function of time: There is gradually changing activity with a peak at the represented interval. If each unit only received input during a window of a couple of milliseconds, there would be a single peak of activity at the represented interval, and near-zero activity at any other time.
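
      As an illustration only, a sliding-window rate estimate of the kind described above can be sketched as follows. This is not the authors' implementation: the function name, the trailing (rather than centered) window, and the default window and step sizes are all assumptions made for the example.

```python
def sliding_window_rates(spike_times_ms, t_end_ms, window_ms=10.0, step_ms=1.0):
    """Estimate a firing rate (Hz) at every time step from the spike count
    in a trailing window (t - window_ms, t]. All parameters are hypothetical."""
    spikes = sorted(spike_times_ms)
    rates = []
    lo = hi = 0  # spikes[lo:hi] are the spikes inside the current window
    n_steps = int(t_end_ms / step_ms) + 1
    for k in range(n_steps):
        t = k * step_ms
        # Include every spike that has occurred up to and including time t.
        while hi < len(spikes) and spikes[hi] <= t:
            hi += 1
        # Drop spikes that have fallen out of the trailing window.
        while lo < len(spikes) and spikes[lo] <= t - window_ms:
            lo += 1
        count = hi - lo
        # Convert a count per window (ms) into a rate in Hz.
        rates.append(count * 1000.0 / window_ms)
    return rates


# Three spikes close together give a rate that rises and then decays to zero,
# while the estimate itself is defined at every time step.
rates = sliding_window_rates([5.0, 6.0, 7.0], t_end_ms=20.0)
```

      The point of the sketch is that the window slides along the whole trial, so the state is available at every time step rather than only near the represented interval.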

      This misunderstanding has been cleared up in the current version of the review (see last paragraph of review #1).

      Stimulus onset occurs at 1500 ms in order to allow the network to stabilize. Ideally, this value should be randomized across trials to ensure performance generalizes across initial states.

      This is a valid point which we will address in the revision. However, we note that experimentation with different onset values did not change the dynamics of the network systematically in previous studies (i.e., Hass et al., 2022).

      Why does StDev saturate? Is that because subjective time saturates as well?

      Indeed, the two phenomena are closely related. In section “Deviations from the scalar property and the origin on Vierordt’s law”, we discuss that both are caused by the broadening of the tuning curves of the readout units (Fig 1A) as the longest time constants of the network are exceeded.

      In the discussion, it would be nice to explain that dopaminergic modulation of subjective timing is not as universally observed as the linear psychophysical law or the scalar property, and I believe somewhat controversial (e.g., Ward, ..., Balsam, 2009).

      We are thankful for this advice and will adapt the discussion accordingly in the revision. Still, we note that dopaminergic modulation of subjective timing is one of the more robust effects observed in several time perception experiments.

      Reviewer #2:

      (1) Lack of Empirical Data: […] The paper would benefit from quantitative and qualitative simulations of results from specific, large-sample studies to anchor the model's predictions in concrete empirical evidence.

      While it is correct that this study does not attempt to replicate a concrete empirical study, we note that we do compare the model's results with specific studies wherever possible. The comparison is done on the level of parameters of functional relationships: For the linear psychophysical law, we compare the slope and the indifference point of the model with those from experimental studies. For the scalar property, we compare the Weber fraction of the model to those computed from experiments. For dopaminergic modulation of subjective duration, no direct comparison with experimental data is possible, as the levels of modulation are estimated from in vitro experiments and cannot be directly compared with modulations in vivo. However, we discuss a range of qualitative observations in experiments that are reproduced (and explained) by the model.

      The above arguments notwithstanding, one can discuss whether the presentation of the experimental results and the comparison with the simulations is appropriate, and we do plan to extend this presentation in a revision.

      (2) Methodological Ambiguities: The training and testing procedures lack robust checks for generalization, leading to potential overfitting issues.

      It is correct that formal checks for generalization, such as cross-validation protocols, are missing, and we will include them in the revision. However, as we obtained a mechanistic understanding of how the model tells time, we are confident that our results are not due to overfitting.

      (3) Inadequate Visualization of Empirical Data: References to empirical data are vague and not directly visualized alongside model outputs. Future iterations should include empirical data, not general trends from psychophysics, in figures for a clear comparison.

      As mentioned above, the comparison between simulation and empirical data will be extended in a revision. However, we argue that the “general trends”, namely adherence of the model to the often-reported psychophysical regularities, are of greater importance compared to the replication of, e.g. one specific slope of the linear psychophysical law, which does vary a lot between experiments.

      (4) Limitations in Model Scope and Dynamics: […] Expanding the model limitations to consider isochronous pulse processing and the emergence of limit-cycle behaviors after prolonged stimulation would provide a more comprehensive understanding of the model's capabilities and limitations.

      The current research focuses on the estimation of a single duration rather than the processing of sequences of durations. Sequence processing is a vast field, and it has been argued that it comprises different mechanisms compared to duration estimation. Thus, we feel that including sequence processing would be beyond the scope of the already quite extensive paper. However, we will discuss a possible extension of the model to sequence processing in the revision.

      Additionally, the justification for using \(N_{Poisson}\) as a proxy for more connections is unclear and warrants a more direct approach.

      We considered different means to vary the noise input into the network, including changes in the number of connections. We ultimately chose to vary the firing rate of a fixed number of Poisson input neurons. As the sum of the firing rates of N independent Poisson neurons with the same f is simply N*f and the synaptic contributions from each spike also linearly add up, this is equivalent to adding more Poisson neurons and thus, more connections.
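
      The additivity argument above can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of N, f, the trial duration and the number of trials are illustrative assumptions and are not taken from the model. It compares the mean pooled spike count of N independent Poisson neurons firing at rate f with that of a single Poisson neuron firing at rate N*f.

```python
import random

def poisson_count(rate_hz, duration_s, rng):
    """Count events of a homogeneous Poisson process by summing exponential gaps."""
    t, n = rng.expovariate(rate_hz), 0
    while t < duration_s:
        n += 1
        t += rng.expovariate(rate_hz)
    return n

def mean_pooled_count(n_neurons, rate_hz, duration_s, trials, rng):
    """Mean total spike count across n_neurons independent Poisson neurons."""
    total = 0
    for _ in range(trials):
        total += sum(poisson_count(rate_hz, duration_s, rng) for _ in range(n_neurons))
    return total / trials

rng = random.Random(0)
N, f, T, TRIALS = 10, 5.0, 1.0, 2000  # hypothetical values for illustration only

many_slow = mean_pooled_count(N, f, T, TRIALS, rng)     # N neurons at rate f
one_fast = mean_pooled_count(1, N * f, T, TRIALS, rng)  # 1 neuron at rate N*f
print(many_slow, one_fast)  # both should lie close to N * f * T
```

      Both estimates should approach N*f*T, which is the sense in which scaling the rate of a fixed population mimics adding more Poisson inputs.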

      (5) Omissions and Redundancies: Certain omissions, such as the lack of a condition in Figure 7A or missing references to relevant models and reviews, detract from the paper's thoroughness.

      The reviewer refers to a condition where everything is ablated except NMDA. We will include such a condition in the revision. Regarding missing references, the reviewer requests including references that focus on sequence processing. While the focus of the current work is on estimating a single duration rather than a sequence of durations (see above), we will include a review on this topic as an outlook on this possible extension of the model.

      Moreover, some statements and terms like "internal clock" are used without a clear mechanistic definition within the model.

      We are thankful for this advice and will adapt the revision accordingly.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach as mentioned by the editor. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to #3 below

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below.

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And this is the third goal of this present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
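
      To make the notion of a "unique effect" concrete, here is a minimal sketch of how unique effects can be obtained from in-sample R2 values of nested regressions. It is written with the Python standard library only and run on simulated data; the variable names, effect sizes and sample size are hypothetical, and the sketch computes only the unique effects plus the total shared remainder, not the full partition of common effects reported in the manuscript.

```python
import random

def ols_r2(columns, y):
    """In-sample R2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in columns]
    p = len(cols)
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination.
    m = [[sum(a[k] * b[k] for k in range(n)) for b in cols]
         + [sum(a[k] * y[k] for k in range(n))] for a in cols]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(m[r][i]))  # partial pivoting
        m[i], m[piv] = m[piv], m[i]
        for r in range(i + 1, p):
            factor = m[r][i] / m[i][i]
            for k in range(i, p + 1):
                m[r][k] -= factor * m[i][k]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (m[i][p] - sum(m[i][k] * beta[k] for k in range(i + 1, p))) / m[i][i]
    pred = [sum(b * col[k] for b, col in zip(beta, cols)) for k in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot

# Simulated data (assumption): cognition depends on age and on an
# age-independent brain signal that only the "brain_cog" predictor carries.
rng = random.Random(1)
n = 500
age = [rng.gauss(0, 1) for _ in range(n)]
signal = [rng.gauss(0, 1) for _ in range(n)]
cognition = [-a + s + rng.gauss(0, 1) for a, s in zip(age, signal)]
predictors = {
    "age": age,
    "brain_age": [a + rng.gauss(0, 0.5) for a in age],
    "brain_cog": [-0.5 * a + s + rng.gauss(0, 0.5) for a, s in zip(age, signal)],
}

r2_full = ols_r2(list(predictors.values()), cognition)
unique = {}
for name in predictors:
    others = [col for key, col in predictors.items() if key != name]
    unique[name] = r2_full - ols_r2(others, cognition)  # R2 lost when `name` is dropped
shared = r2_full - sum(unique.values())  # total of all common effects
print(r2_full, unique, shared)
```

      In this simulation the predictor that carries age-independent signal shows a sizeable unique effect, whereas the noisy copy of age shows almost none. Note that the shared remainder can be negative when suppression is present.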

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally.

      Thank you for your comments on this issue.

      We now discuss the broader considerations in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021)

      and

      (3) suggested solutions we and others made to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used to ensure that these models are not overfit.

      Thank you for allowing us an opportunity to clarify our stacked model. We made additional clarification to make this clearer (see below). We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models.

      From Methods: “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We, then, re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the sets of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
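
      The three performance measures named above can be written out as follows. This is a small illustrative sketch in plain Python with made-up numbers, not the code used in the study. It also shows why the sum-of-squares definition of R2 matters: predictions that are perfectly correlated with the observations but systematically offset keep r = 1 while R2 drops.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    mt, mp = sum(y_true) / len(y_true), sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r2_sum_of_squares(y_true, y_pred):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated, but offset by +1

# Here r = 1, MAE = 1 and R2 is about 0.2 (1 - 4/5).
print(pearson_r(observed, predicted),
      r2_sum_of_squares(observed, predicted),
      mae(observed, predicted))
```

      Unlike the squared correlation, the sum-of-squares R2 penalises systematic bias in the predictions.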

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
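
      The fold discipline described in the Methods excerpt (tune on inner folds of the outer-fold training set, test once on the untouched outer-fold test set) can be sketched as follows. This is a deliberately reduced illustration in plain Python: it uses a single simulated feature, a one-dimensional ridge regression in place of Elastic Net, a made-up penalty grid, and it omits the stacking layer entirely. Refitting the selected model on the full outer-fold training set is an assumption of the sketch, and all function names are hypothetical.

```python
import random

def k_folds(indices, k, seed):
    """Shuffle the indices and deal them into k roughly equal folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge_1d(x, y, lam):
    """Closed-form ridge fit for one feature: slope = Sxy / (Sxx + lam)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx

def r2(y_true, y_pred):
    """Sum-of-squares R2: 1 - SS_res / SS_tot."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / sum((t - my) ** 2 for t in y_true)

def score(x, y, fit_idx, eval_idx, lam):
    """Fit on fit_idx only, then evaluate R2 on eval_idx."""
    slope, icpt = fit_ridge_1d([x[i] for i in fit_idx], [y[i] for i in fit_idx], lam)
    return r2([y[i] for i in eval_idx], [slope * x[i] + icpt for i in eval_idx])

def nested_cv(x, y, grid=(0.0, 1.0, 10.0, 100.0), k_outer=5, k_inner=5):
    outer = k_folds(range(len(x)), k_outer, seed=0)
    outer_scores = []
    for o, test_idx in enumerate(outer):
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        assert not set(train_idx) & set(test_idx)  # test data never used for tuning
        inner = k_folds(train_idx, k_inner, seed=1)

        def mean_inner_r2(lam):
            scores = []
            for v, val_idx in enumerate(inner):
                fit_idx = [i for f, fold in enumerate(inner) if f != v for i in fold]
                scores.append(score(x, y, fit_idx, val_idx, lam))
            return sum(scores) / len(scores)

        best_lam = max(grid, key=mean_inner_r2)  # grid search on inner folds only
        outer_scores.append(score(x, y, train_idx, test_idx, best_lam))
    return outer_scores

rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(500)]     # 500 simulated participants
y = [2 * xi + rng.gauss(0, 1) for xi in x]    # simulated target
scores = nested_cv(x, y)
print(scores)  # one out-of-sample R2 per outer fold
```

      With 500 simulated participants and five outer folds, each outer-fold test set holds 100 observations, mirroring the fold size mentioned in the Methods excerpt.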

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found that Spearman’s ρ varied dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.
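
      The rank-stability check described above amounts to computing Spearman’s ρ for every pair of outer folds, which gives 10 values for five folds. A minimal sketch in plain Python follows; the coefficient vectors are simulated for illustration and the function names are hypothetical.

```python
import math
import random
from itertools import combinations

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def spearman_rho(a, b):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson_r(ranks(a), ranks(b))

def pairwise_stability(coef_by_fold):
    """Spearman's rho for every pair of folds' coefficient vectors."""
    pairs = combinations(range(len(coef_by_fold)), 2)
    return [spearman_rho(coef_by_fold[i], coef_by_fold[j]) for i, j in pairs]

# Simulated coefficients (assumption): a shared pattern plus fold-specific noise.
rng = random.Random(3)
base = [rng.gauss(0, 1) for _ in range(20)]
coef_by_fold = [[b + rng.gauss(0, 0.5) for b in base] for _ in range(5)]

rhos = pairwise_stability(coef_by_fold)
print(len(rhos), min(rhos), max(rhos))  # 10 pairwise values for five folds
```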

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Thank you for the opportunity for us to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009).

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.”

      Second, for MRI processing procedures, we included the following statements.

      From Methods: “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 19 sets of features.”

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase encoding, these files were aligned using MSMAll (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned from potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high-pass cutoff at 200 s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Gordon et al., 2016) for cortical surface regions and FreeSurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions and, in turn, 379 features for each Task Contrast set of features.”
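      The contrast-extraction steps described above can be illustrated with a minimal sketch. This is not FSL FEAT: it omits the high-pass filter and the autocorrelation modelling that FEAT applies, and the HRF shape parameters, the repetition time `tr`, the block timings and the toy parcel labels are all illustrative assumptions rather than values taken from the study. The helper names (`double_gamma_hrf`, `task_regressor`, `fit_contrast`, `parcellate`) are hypothetical.

```python
import math
import numpy as np


def double_gamma_hrf(tr, duration=32.0, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Generic double-gamma HRF sampled every `tr` seconds (assumed parameters, not FSL's exact defaults)."""
    t = np.arange(0.0, duration, tr)

    def gamma_pdf(x, shape):
        # Gamma density with unit scale
        return x ** (shape - 1.0) * np.exp(-x) / math.gamma(shape)

    hrf = gamma_pdf(t, peak_shape) - under_ratio * gamma_pdf(t, under_shape)
    return hrf / hrf.sum()


def task_regressor(onsets, block_len, n_frames, tr):
    """Boxcar for task blocks (onsets and length in seconds) convolved with the HRF."""
    boxcar = np.zeros(n_frames)
    for onset in onsets:
        start = int(round(onset / tr))
        stop = int(round((onset + block_len) / tr))
        boxcar[start:stop] = 1.0
    return np.convolve(boxcar, double_gamma_hrf(tr))[:n_frames]


def fit_contrast(timeseries, regressor):
    """Ordinary least squares per column; returns the task beta for each column."""
    design = np.column_stack([regressor, np.ones_like(regressor)])
    betas, *_ = np.linalg.lstsq(design, timeseries, rcond=None)
    return betas[0]


def parcellate(values, labels):
    """Average a per-location map within each parcel label."""
    return np.array([values[labels == k].mean() for k in np.unique(labels)])


# Toy data: 12 locations grouped into 3 parcels (hypothetical values)
rng = np.random.default_rng(0)
tr, n_frames = 0.8, 300
regressor = task_regressor(onsets=[20, 100, 180], block_len=20, n_frames=n_frames, tr=tr)
labels = np.repeat([0, 1, 2], 4)
true_beta = np.repeat([2.0, -1.0, 0.5], 4)
timeseries = regressor[:, None] * true_beta[None, :] + 0.1 * rng.standard_normal((n_frames, 12))

cope = fit_contrast(timeseries, regressor)
parcel_cope = parcellate(cope, labels)
```

      In the study, the same parcel-averaging idea applied to the 379 regions would give one 379-element feature vector per contrast.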

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”, as for the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a high-pass filter at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations for each pair of the 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) with 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC)

      Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC occurred during the resting (as opposed to task-performing) period. HCP-A collected resting-state data in four 6.42-min (488 frames) runs across two days, leading to approximately 26 min of data (Harms et al., 2018). On each day, the study scanned two resting-state runs, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file that was pre-processed and concatenated across the four runs. We applied the same computations (i.e., high-pass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for Task FC.
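      The FC feature construction described for Task FC and Rest FC can be sketched as follows. This is a minimal illustration on random data, assuming the time series have already been filtered, parcellated into 379 regions and (for Task FC) residualised against the task regressors. The function name `fc_features`, the subject and frame counts, and the 80/20 split are hypothetical. The point it shows is that the PCA is fitted on the training set only and then applied unchanged to the test set.

```python
import numpy as np
from sklearn.decomposition import PCA


def fc_features(timeseries):
    """(n_frames, n_regions) time series -> Fisher z-transformed FC for each unique region pair."""
    r = np.corrcoef(timeseries, rowvar=False)
    upper = np.triu_indices_from(r, k=1)  # unique pairs, diagonal excluded
    r_pairs = np.clip(r[upper], -0.999999, 0.999999)  # guard against infinite z values
    return np.arctanh(r_pairs)  # r-to-z transformation


# Toy data standing in for parcellated time series (hypothetical sizes)
rng = np.random.default_rng(0)
n_subjects, n_frames, n_regions = 100, 60, 379
fc = np.stack([
    fc_features(rng.standard_normal((n_frames, n_regions)))
    for _ in range(n_subjects)
])
n_pairs = fc.shape[1]  # 379 * 378 / 2 unique region pairs

# Fit the PCA on the training participants only, then reuse it for the test participants
train_fc, test_fc = fc[:80], fc[80:]
pca = PCA(n_components=75, random_state=0).fit(train_fc)
train_pcs = pca.transform(train_fc)
test_pcs = pca.transform(test_fc)
```

      Within cross-validation, the fit-then-transform step would be repeated inside every training fold, for example by placing the PCA in a scikit-learn `Pipeline`.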

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established pre-processing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, many non-linear and more complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the features’ coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l1 ratio=0) or absolute (known as ‘Lasso’; l1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      $$\frac{1}{2\,n_{\text{samples}}} \lVert y - X\beta \rVert_2^2 + \alpha \cdot \text{l1\_ratio} \cdot \lVert \beta \rVert_1 + 0.5 \cdot \alpha \cdot (1 - \text{l1\_ratio}) \cdot \lVert \beta \rVert_2^2$$

      where X is the features, y is the target, β is the coefficient and n_samples is the number of observations. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l1 ratio using 25 numbers in linear space, ranging from 0 to 1.
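      The hyperparameter grid described above, combined with nested cross-validation, could be set up roughly as in the sketch below. It uses scikit-learn's `ElasticNet`, `GridSearchCV`, `KFold` and `cross_val_predict`. The number of inner folds, the scoring metric, `max_iter`, the random seeds and the synthetic data are assumptions made for illustration, and the helper names `make_param_grid` and `nested_cv_predict` are hypothetical. The demonstration runs on a much smaller grid to keep it quick.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict


def make_param_grid(n_alpha=70, n_l1=25):
    """alpha: log-spaced from 0.1 to 100; l1_ratio: linearly spaced from 0 to 1."""
    return {
        "alpha": np.logspace(-1, 2, n_alpha),
        "l1_ratio": np.linspace(0.0, 1.0, n_l1),
    }


def nested_cv_predict(X, y, param_grid, n_outer=5, n_inner=5, seed=0):
    """Out-of-fold predictions: hyperparameters are tuned inside each outer training fold."""
    inner = KFold(n_splits=n_inner, shuffle=True, random_state=seed)
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed + 1)
    search = GridSearchCV(
        ElasticNet(max_iter=10000),
        param_grid,
        cv=inner,
        scoring="neg_mean_absolute_error",  # assumed metric
    )
    return cross_val_predict(search, X, y, cv=outer)


full_grid = make_param_grid()  # the 70 x 25 grid described in the text

# Synthetic demonstration with a reduced grid (hypothetical data)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X @ np.array([1.0, 2.0] + [0.0] * 8) + 0.1 * rng.standard_normal(100)
y_pred = nested_cv_predict(X, y, make_param_grid(n_alpha=5, n_l1=3))
```

      Because the search object is refitted within each outer training fold, the selected α and l1 ratio can differ from fold to fold.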

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still infer that a brain feature with a larger coefficient magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross validation, different outer folds may have different degrees of ‘α’ and ‘l1 ratio’, so the final coefficients may differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using the Brainspace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
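      The back-projection of Elastic Net coefficients through the PCA described above can be sketched with plain NumPy. The sketch assumes `components` has the layout of scikit-learn's `components_` attribute, i.e. shape (n_components, n_features), and that `coefs` holds one Elastic Net coefficient per component. The function names and the tiny three-pair example are hypothetical.

```python
import numpy as np


def pca_feature_importance(components, coefs):
    """Multiply absolute PCA components by Elastic Net coefficients and sum over components.

    components: (n_components, n_features), as in sklearn's PCA.components_
    coefs:      (n_components,), one Elastic Net coefficient per component
    returns:    (n_features,), one importance value per original feature
    """
    components = np.asarray(components, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    return (np.abs(components) * coefs[:, None]).sum(axis=0)


def vector_to_matrix(values, n_regions):
    """Place per-pair values back into a symmetric region-by-region matrix for plotting."""
    matrix = np.zeros((n_regions, n_regions))
    upper = np.triu_indices(n_regions, k=1)
    matrix[upper] = values
    return matrix + matrix.T


# Tiny example: 2 components over 3 region pairs (i.e., 3 regions)
components = np.array([[1.0, -2.0, 0.0],
                       [0.5, 0.5, -1.0]])
coefs = np.array([2.0, -1.0])
importance = pca_feature_importance(components, coefs)
importance_matrix = vector_to_matrix(importance, n_regions=3)
```

      With 75 components and 71,631 region pairs, the same function would return one value per ROI pair, which `vector_to_matrix` could reshape into a 379 × 379 matrix.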

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the Preclinical AD Consortium, ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x


      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Public Reviews:

      Reviewer 1 (Public Review):

      In this paper, the authors evaluate the utility of brain-age-derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain-age-derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ("brain-cognition") as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      (1) I thank the authors for addressing many of my concerns with this revision. However, I do not feel they have addressed them all. In particular I think the authors could do more to address the concern I raised about the instability of the regression coefficients and about providing enough detail to determine that the stacked regression models do not overfit.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 and #2 (see below).

      (2) In considering my responses to the authors' revision, I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular, the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. To be fair, these conceptual problems are more widespread than this paper alone, so I do not believe the authors should be penalised for that. However, I would recommend making these concerns more explicit in the manuscript.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 2 (Public Review):

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration.

      The study employs suitable methods and data to address the research questions, and the methods and results sections are generally clear and easy to follow.

      I appreciate the authors' efforts in significantly improving the paper, including some considerable changes, from the original submission. While not all reviewer points were tackled, the majority of them were adequately addressed. These include additional analyses, more clarity in the methods and a much richer and nuanced discussion. While recognising the merits of the revised paper, I have a few additional comments.

      (1) Perhaps it would help the reader to note that it might be expected for brain-cognition to account for a significantly larger variance (11%) in fluid cognition, in contrast to brain-age. This stems from the fact that the authors specifically trained brain-cognition to predict fluid cognition, the very variable under consideration. In line with this, the authors later recommend that researchers considering the use of brain-age should evaluate its utility using a regression approach. The latter involves including a brain index (e.g. brain-cognition) previously trained to predict the regression's target variable (e.g. fluid cognition) alongside a brain-age index (e.g., corrected brain-age gap). If the target-trained brain index outperforms the brain-age metric, it suggests that relying solely on brain-age might not be the optimal choice. Although not necessarily the case, is it surprising for the target-trained brain index to demonstrate better performance than brain-age? This harks back to the broader point raised in the initial review: while brain-age may prove useful (though sometimes with modest effect sizes) across diverse outcomes as a generally applicable metric, a brain index tailored for predicting a specific outcome, such as brain-cognition in this case, might capture a considerably larger share of variance in that specific context but could lack broader applicability. The latter aspect needs to be empirically assessed.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (please see our responses to Reviewer 1 Recommendations For The Authors #3 below).

      Briefly, as in our 2nd revision, we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      (2) Furthermore, the discussion pertaining to training brain-age models on healthy populations for subsequent testing on individuals with neurological or psychological disorders seems somewhat one-sided within the broader debate. This one-sidedness might potentially confuse readers. It is worth noting that the choice to employ healthy participants in the training model is likely deliberate, serving as a norm against which atypical populations are compared. To provide a more comprehensive understanding, referencing Tim Hahn's counterargument to Bashyam's perspective could offer a more complete view (https://academic.oup.com/brain/article/144/3/e31/6214475?login=false).

      Thank you Reviewer 2 for bringing up this issue. We have now revised the paragraph in question and added nuances on the usage of Brain Age for normative vs. case-control studies. We also cited Tim Hahn’s article that explained the conceptual foundation of the use of Brain Age in case-control studies. Please see below. Additionally, we also made a statement about our study not being able to address issues about the case-control studies directly in the newly written conclusion (see Reviewer 3 Recommendations for the Authors #3).

      Discussion:

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie et al., 2023), and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      (3) Overall, this paper makes a significant contribution to the field of brain-age and related brain indices and their utility.

      Thank you for the encouragement.

      Reviewer 3 (Public Review):

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" This question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age.

      (1) Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain-age metrics.

      Thank you Reviewer 3 for the comment. We addressed them in our response to Reviewer 3 Recommendations For The Authors #1-3 (see below).

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) I do not feel the authors have fully addressed the concern I raised about the stacked regression models. Despite the new figure, it is still not entirely clear what the authors are using as the training set in the final step. To be clear, the problem occurs because of the parameters, not the hyperparameters (which the authors now state that they are optimising via nested grid search). In other words, given a regression model y = X*beta, if the X are taken to be predictions from a lower level regression model, then they contain information that is derived from both the training set and the test set for the model that this was trained on. If the split is the same (i.e. the predictions are derived on the same test set as is being used at the second level), then this can lead to overfitting. It is not clear to me whether the authors have done this or not. Please provide additional detail to clarify this point.

      Thank you for allowing us an opportunity to clarify our stacked model. We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models. We made additional clarification to make this clearer (see below). Let us explain what we did and provide the rationales below.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. As in the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. As in the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
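
      As a minimal sketch of the three performance measures named above (this is not the authors' code; the function names and the toy numbers are assumptions made purely for illustration), the following Python shows why the sum-of-squares definition of R2 is not interchangeable with a squared Pearson's r. Predictions that are perfectly correlated with the observed values but shifted by a constant keep r at 1 while R2 drops below zero.

```python
import math


def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: predictions are the observed values plus a constant bias.
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [o + 3.0 for o in observed]

print("Pearson r:", pearson_r(observed, shifted))          # correlation ignores the bias
print("R2       :", r2_sum_of_squares(observed, shifted))  # penalises the bias, negative here
print("MAE      :", mean_absolute_error(observed, shifted))
```

      Under this definition R2 can be negative out of sample, which is one reason it is often reported alongside Pearson's r and MAE rather than in place of them.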

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models.

      Note some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
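
      The three steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: closed-form ridge regression stands in for Elastic Net, two synthetic feature sets stand in for the 18 real ones, a single stacked model is built, and every function name, grid value and data-generating choice is hypothetical. What the sketch does preserve is the key property of the design: the outer-fold test set is touched only in the third step.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge with an unpenalised intercept (stand-in for Elastic Net)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coefs = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ coefs, coefs


def predict(model, X):
    intercept, coefs = model
    return intercept + X @ coefs


def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def k_folds(n, k, rng):
    """Randomly partition row indices 0..n-1 into k folds."""
    return np.array_split(rng.permutation(n), k)


def tune(X, y, alphas, rng, k=5):
    """Inner-fold CV grid search, then refit the best alpha on all rows passed in."""
    folds = k_folds(len(y), k, rng)

    def mean_validation_r2(alpha):
        scores = []
        for i, val in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            model = fit_ridge(X[train], y[train], alpha)
            scores.append(r2(y[val], predict(model, X[val])))
        return np.mean(scores)

    best_alpha = max(alphas, key=mean_validation_r2)
    return fit_ridge(X, y, best_alpha)


def nested_cv_stack(feature_sets, y, alphas=(0.1, 1.0, 10.0), k_outer=5, seed=0):
    rng = np.random.default_rng(seed)
    outer = k_folds(len(y), k_outer, rng)
    stacked_pred = np.empty(len(y))
    for i, test in enumerate(outer):
        train = np.concatenate([f for j, f in enumerate(outer) if j != i])
        # Step 1: tune one model per feature set, using the outer-fold training set only.
        base = [tune(X[train], y[train], alphas, rng) for X in feature_sets]
        # Step 2: predicted values on the same training set become stacked-model features;
        # tune() re-divides the training set into new inner folds.
        Z_train = np.column_stack([predict(m, X[train]) for m, X in zip(base, feature_sets)])
        stacker = tune(Z_train, y[train], alphas, rng)
        # Step 3: only now apply the already tuned models to the outer-fold test set.
        Z_test = np.column_stack([predict(m, X[test]) for m, X in zip(base, feature_sets)])
        stacked_pred[test] = predict(stacker, Z_test)
    return stacked_pred


# Synthetic example data (hypothetical, for illustration only).
rng = np.random.default_rng(42)
n = 100
X1 = rng.normal(size=(n, 5))
X2 = rng.normal(size=(n, 3))
y = X1[:, 0] + 0.5 * X2[:, 0] + rng.normal(scale=0.5, size=n)

pred = nested_cv_stack([X1, X2], y)
print("Out-of-sample R2 of the stacked model:", round(r2(y, pred), 2))
```

      Because the stacked model's inputs and parameters are both derived from the outer-fold training set, no information from the outer-fold test set can leak into the second level in this arrangement.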

      (2) I also do not feel the authors have fully addressed the concern I raised about stability of the regression coefficients over splits of the data. I wanted to see the regression coefficients, not the predictions. The predictions can be stable when the coefficients are not.

      The focus of this article is on the predictions. Still, as pointed out by reviewer 1, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.
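
      The pairwise comparison behind the 10 values per model can be sketched as below. This is a hedged illustration rather than the authors' code: the coefficient vectors are made-up numbers, the function names are hypothetical, and whether raw or absolute coefficients were ranked is not stated in the text, so raw values are assumed here. Five folds give 5 × 4 / 2 = 10 unordered pairs.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_stability(coefs_by_fold):
    """Spearman's rho for every unordered pair of folds."""
    return [spearman_rho(a, b) for a, b in combinations(coefs_by_fold, 2)]


# Hypothetical coefficients of one model with four features, one row per outer fold.
coefs_by_fold = [
    [0.50, 0.10, -0.30, 0.00],
    [0.45, 0.12, -0.25, 0.02],
    [0.40, -0.05, -0.35, 0.10],
    [0.55, 0.08, -0.20, 0.01],
    [0.10, 0.50, -0.30, 0.00],
]

rhos = rank_stability(coefs_by_fold)
print("number of pairs:", len(rhos))
print("mean rho       :", sum(rhos) / len(rhos))
```

      A fold whose coefficients are reordered relative to the others (the last row in this toy example) pulls the mean down even when the coefficient magnitudes look similar, which is the kind of instability the plot summarises.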

      (3) I also must say that I agree with Reviewer 3 about the limitations of the brain-age and brain-cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain-age model that is trained to predict age. This suffers from the same problem the authors raise with brain-age and I agree that this would probably disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain-age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain-cognition.

      Thank you so much for raising this point. Reviewer 2 (Public Review #1) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”

      Reviewer #3 (Recommendations For The Authors):

      Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to: 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain age metrics.

      (1) I understand your point here. I think the distinction is that it is fine to build predictive models, but then there is no need to go through this intermediate step of "brain-cognition". Just say that brain features can predict cognition XX well, and brain-age (or some related metric) can predict cognition YY well. It creates a confusing framework for the reader that can lead them to believe that "brain-cognition" is not just a predicted value of fluid cognition from a model using brain features to predict cognition. While you clearly state that that is in fact what it is in the text, which is a huge improvement, I do not see what is added by going through brain-cognition instead of simply just obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa, depending on the question. Please do this analysis, and either compare and contrast it with going through "brain-cognition" in your paper, or switch to this analysis, as it more directly addresses the question of the incremental predictive utility of brain-age above and beyond brain features.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 2 (Public Review #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see our responses to Reviewer 1 Recommendations For The Authors #3 above).

      Briefly, as in our 2nd revision, we made it explicitly clear that we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. In addition, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      We have thought about changing the name Brain Cognition into something along the lines of “predicted values of prediction models predicting fluid cognition based on brain MRI.” However, this made the manuscript hard to follow, especially with the commonality analyses. For instance, the sentence, “Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition” would become “Here, we tested predicted values of prediction models predicting fluid cognition based on brain MRI unique effects in multiple regression models with a Brain Age index, chronological age and predicted values of prediction models predicting fluid cognition based on brain MRI as regressors to explain fluid cognition.” We believe, given our additional explanation (see our responses to Reviewer 1 Recommendations For The Authors #3 above), readers should understand what Brain Cognition is, and that we did not intend to compare Brain Age and Brain Cognition directly.

      As for the suggested analysis, “obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa,” we have already done this in the form of commonality analysis (Nimon et al., 2008) (see Figure 7 below). That is, to obtain unique and common effects of the regressors, we need to look at all of the possible changes in R2 when all possible subsets of regressors are excluded or included (see equations 12 and 13 below).

      From Methods:

      “Similar to the above multiple regression model, we had chronological age, each Brain Age index and Brain Cognition as the regressors for fluid cognition:

      $$\text{Fluid Cognition}_i = \beta_0 + \beta_1\,\text{Chronological Age}_i + \beta_2\,\text{Brain Age Index}_{i,j} + \beta_3\,\text{Brain Cognition}_i + \varepsilon_i, \tag{12}$$

      Applying the commonality analysis here allowed us, first, to investigate the additive, unique effects of Brain Cognition, over and above chronological age and Brain Age indices. More importantly, the commonality analysis also enabled us to test the common, shared effects that Brain Cognition had with chronological age and Brain Age indices in explaining fluid cognition. We calculated the commonality analysis as follows (Nimon et al., 2017):

      $$
      \begin{aligned}
      \text{Unique Effect}_{\text{chronological age}} &= \Delta R^2_{\text{chronological age}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Age index}} &= \Delta R^2_{\text{Brain Age index}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Cognition}} &= \Delta R^2_{\text{Brain Cognition}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index}} &= R^2_{\text{chronological age, Brain Cognition}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{chronological age}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age}} + R^2_{\text{Brain Age index}} + R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} - R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} + R^2_{\text{chronological age, Brain Age index, Brain Cognition}}
      \end{aligned}
      \tag{13}
      $$”
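
      To make the link between the commonality analysis and "changes in R2" concrete, here is a small sketch of the standard three-regressor partition. It is not the authors' code: the function name, the regressor labels and the R2 values are all hypothetical, chosen only so the arithmetic is easy to follow. The function takes the R2 of every subset of regressors and returns the three unique and four common effects. In the standard partition the three-way common effect takes the full-model R2 with a positive sign, which is what makes the seven components sum to the full-model R2.

```python
def commonality_three(r2, a="age", b="brain_age", c="brain_cog"):
    """Partition the full-model R2 of three regressors into unique and common effects.

    `r2` maps a frozenset of regressor names to the R2 of the model
    that uses exactly that subset of regressors.
    """
    def R(*names):
        return r2[frozenset(names)]

    full = R(a, b, c)
    return {
        f"unique_{a}": full - R(b, c),
        f"unique_{b}": full - R(a, c),
        f"unique_{c}": full - R(a, b),
        f"common_{a}_{b}": R(a, c) + R(b, c) - R(c) - full,
        f"common_{a}_{c}": R(a, b) + R(b, c) - R(b) - full,
        f"common_{b}_{c}": R(a, b) + R(a, c) - R(a) - full,
        f"common_{a}_{b}_{c}": R(a) + R(b) + R(c) - R(a, b) - R(a, c) - R(b, c) + full,
    }


# Hypothetical R2 values for all seven subsets (not results from the study).
r2_by_subset = {
    frozenset(["age"]): 0.30,
    frozenset(["brain_age"]): 0.20,
    frozenset(["brain_cog"]): 0.40,
    frozenset(["age", "brain_age"]): 0.32,
    frozenset(["age", "brain_cog"]): 0.45,
    frozenset(["brain_age", "brain_cog"]): 0.42,
    frozenset(["age", "brain_age", "brain_cog"]): 0.46,
}

effects = commonality_three(r2_by_subset)
for name, value in effects.items():
    print(f"{name:32s} {value: .2f}")
print("sum of all effects:", round(sum(effects.values()), 2))
```

      Each unique effect is exactly the change in R2 obtained by adding that regressor last, so the reviewer's suggested comparison of nested models is one row of this table.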

      (2) I agree that the solution is not to exclude age as a covariate, and that there is a big difference between inevitable and obvious. I simply think a further discussion of the inevitability of the results would be clarifying for the readers. There is a big opportunity in the brain-age literature to be as direct as possible about why you are finding what you are finding. People need to know not only what you found, but why you found what you found.

      Thank you. We agree that we need to make this point more explicit and direct. In the revised manuscript, we included statements in both the Introduction and Discussion (see below) about the tight relationship between Brain Age and chronological age by design, which makes the small unique effects of Brain Age inevitable.

      Introduction:

      “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition, to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      Discussion:

      “First, Brain Age itself did not add much more information to help us capture fluid cognition than what we had already known from a person’s chronological age. This can clearly be seen from the small unique effects of Brain Age indices in the multiple regression models having Brain Age and chronological age as the regressors. While the unique effects of some Brain Age indices from certain age-prediction models were statistically significant, they were all relatively small. Without Brain Age indices, chronological age by itself already explained around 32% of the variation in fluid cognition. Including Brain Age indices only added around 1.6% at best. We believe the small unique effects of Brain Age were inevitable because, by design, Brain Age is closely tied to chronological age. Therefore, chronological age and Brain Age captured mostly a similar variation in fluid cognition.

      Investigating the simple regression models and the commonality analysis between each Brain Age index and chronological age provided additional insights….”

      (3) I believe it is very important to critically examine the use of brain-age and related metrics. As part of this process, I think we should be asking ourselves the following questions (among others): Why go through age prediction? Wouldn't the predictions of cognition (or another variable) using the same set of brain features always be as good or better? You still have not justified the use of brain-age. As I said before, if you are going to continue to recommend the use of brain-age, you need a very strong argument for why you are recommending this. What does it truly add? Otherwise, temper your statements to indicate possible better paths forward.

      Thank you Reviewer 3 for making an argument against the use of Brain Age. We largely agree with you. However, our work only focuses on one phenotype, fluid cognition, and on the normative situation (i.e., not having a case vs control group). As Reviewer 2 pointed out, Brain Age might still have utility in other cases, not studied here. Still, future studies that focus on other phenotypes may consider using our approach as a template to test the utility of Brain Age in other situations. We added the conclusion statement to reflect this.

      From Discussion:

      “Altogether, we examined the utility of Brain Age as a biomarker for fluid cognition. Here are the three conclusions. First, Brain Age failed to add substantially more information over and above chronological age. Second, a higher ability to predict chronological age did not correspond to a higher utility to capture fluid cognition. Third, Brain Age missed up to around one-third of the variation in fluid cognition that could have been explained by brain MRI. Yet, given our focus on fluid cognition, future empirical research is needed to test the utility of Brain Age on other phenotypes, especially when Brain Age is used for anomaly detection in case-control studies (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We hope that future studies may consider applying our approach (i.e., using the commonality analysis that includes predicted values from a model that directly predicts the phenotype of interest) to test the utility of Brain Age as a biomarker for other phenotypes.”

      References

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper provides useful information about how the ionome of Arabidopsis thaliana adapts to very high CO2-levels, backed up by solid evidence and carefully designed studies. However, the broader claims of the paper about climate change and food security - heavily emphasized in the abstract, introduction, and discussion - are inappropriate, as there is no direct link to the presented work.

      We sincerely thank you for the work you have done in reviewing our manuscript. We very much appreciate your overall positive assessment of the experimental work as a whole, its value and robustness.

      In this revised version, we took on board the majority of your suggestions and your comments. In particular, we understood your critical point about overstating our objectives, which might in turn seem uncorrelated with our results. We fully agree with the comments that have been made on this point. Consequently, we have made substantial modifications and corrections in order to clarify our objectives and their implications: exploring in depth the natural variation of the shoot ionome response to elevated CO2, and generating a valuable resource allowing a better understanding of the genetic and molecular mechanisms involved in the regulation of plant mineral nutrition by the elevation of atmospheric CO2.

      We also made modifications in response to the other suggestions, including a clarification of the functional experiments carried out around the function of TIP2;2 in response to elevated CO2. Figure 7 now comprises the comparison between both ambient and elevated CO2 conditions, which is much more informative than what appeared in the previous version.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study's abstract, introduction, and conclusions are not supported by the methods and results conducted. In fact, the results presented suggest that Arabidopsis could easily adapt to an extremely high CO2 environment.

      We understand the reviewer’s comment. Although our work is considered useful, robust and well designed, we agree with the reviewer's point. We have certainly overemphasized the significance of our work to address the issue of food security in response to rising atmospheric CO2, at the expense of the factual description of the results of our fundamental study of the mechanisms at the interface between CO2 and mineral nutrition. We have clarified this focus by modifying the text of the introduction, objectives and discussion. We hope that these modifications will enable readers to better appreciate the core of this work.

      Regarding the last part of the comment, our results do suggest that genetic variation could allow adaptation to rising atmospheric CO2, and our study does indeed aim to identify the extent and basis of this genetic variation.

      This study offers good evidence pointing to a genetic basis for Arabidopsis thaliana's response to elevated CO2 (eCO2) levels and its subsequent impact on the leaf ionome. The natural variation analyses in the study support the hypothesis that genetic factors, rather than local adaptation, guide the influence of eCO2 on the ionome of rosette leaves in Arabidopsis. However, the manuscript's claim regarding its role in "the development of biofortified crops adapted to a high-CO2 world" (line 23) is overstated, especially given the absence of any analysis on the influence of eCO2 on the seed ionome and Arabidopsis is a poor model for harvest index for any crop. The manuscript, in its current form, necessitates massive revisions, particularly in clarifying its broader implications and in providing more substantial evidence for some of its assertions.

      We thank the reviewer for this comment and for the positive appreciation of our identification of a genetic basis for Arabidopsis thaliana's response to elevated CO2 and its subsequent impact on the leaf ionome. Nevertheless, it is true that the study of the leaf ionome is far from being able to lead to the development of biofortified plants. Some papers have described the nutrient harvest index in Arabidopsis as a potential indicator of nutrient use efficiency (for instance, Masclaux-Daubresse and Chardon, Journal of Experimental Botany 2011 or Aranjuelo et al., Journal of Experimental Botany 2013). However, as we did not include any seed ionome data in the paper, we added clear mentions that our analyses were made on leaves (lines 56/57/250/319) and a comment in the discussion section to address this limitation (lines 325-328).

      Major Drawbacks and Questions:

      (1) Evidence for the Central Premise:

      The foundational premise of the study is the assertion that rising atmospheric CO2 levels result in a decline in plant mineral content. This phenomenon is primarily observed in C3 plants, with C4 plants seemingly less affected. The evidence provided on this topic is scant and, in some instances, contradicts the authors' own references. The potential reduction of certain minerals, especially in grains, can be debated. For instance, reduced nitrogen (N) and phosphorus (P) content in grains might not necessarily be detrimental for human and animal consumption. In fact, it could potentially mitigate issues like nitrogen emissions and phosphorus leaching. Labeling this as a "major threat to food security" (line 30) is exaggerated. While the case for microelements might be more compelling, the introduction fails to articulate this adequately. Furthermore, the introduction lacks any discussion on how eCO2 might influence nutrient allocation to grains, which would be crucial in substantiating the claim that eCO2 poses a threat to food security. A more comprehensive introduction that clearly delineates the adverse effects of eCO2 and its implications for food security would greatly enhance the manuscript.

      We partially agree with this comment. The decline in mineral status of C3 plants under conditions of elevated atmospheric CO2 has been widely described in the literature, and specifically documented for the cereal grains. While there are variations in this effect (depending on species, ecotype, cultivar), there is no debate about its acceptance. Here are just a few of the many works describing this effect, both on a global scale and at the level of the individual plant (Cotrufo MF (1998) Elevated CO2 reduces the nitrogen concentration of plant tissues. Global Change Biology 4: 43-54; Loladze I (2014) Hidden shift of the ionome of plants exposed to elevated CO(2) depletes minerals at the base of human nutrition. eLife 3: e02245; Myers SS (2014) Increasing CO2 threatens human nutrition. Nature 510: 139-142; Poorter H (1997) The effect of elevated CO2 on the chemical composition and construction costs of leaves of 27 C3 species. Plant, Cell & Environment 20: 472-482; Soares JC (2019) Preserving the nutritional quality of crop plants under a changing climate: importance and strategies. Plant and Soil 443: 1-26; Stitt M (1999) The interaction between elevated carbon dioxide and nitrogen nutrition: the physiological and molecular background. Plant, Cell & Environment 22: 583-621; Uddling J (2018) Crop quality under rising atmospheric CO2. Curr Opin Plant Biol 45: 262-267).

      In addition to this, the threat to food security posed by this alteration in plant mineral status has also been well described in the literature by several modeling approaches (Beach RH (2019) Combining the effects of increased atmospheric carbon dioxide on protein, iron, and zinc availability and projected climate change on global diets: a modelling study. Lancet Planet Health 3: e307-e317; Ebi KL (2019) Elevated atmospheric CO(2) concentrations and climate change will affect our food's quality and quantity. Lancet Planet Health 3: e283-e284; Medek DE (2017) Estimated Effects of Future Atmospheric CO2 Concentrations on Protein Intake and the Risk of Protein Deficiency by Country and Region. Environ Health Perspect 125: 087002; Smith MR (2018) Impact of anthropogenic CO2 emissions on global human nutrition. Nature Climate Change 8: 834-839; Weyant C (2018) Anticipated burden and mitigation of carbon-dioxide-induced nutritional deficiencies and related diseases: A simulation modeling study. PLoS Med 15: e1002586; Zhu C (2018) Carbon dioxide (CO2) levels this century will alter the protein, micronutrients, and vitamin content of rice grains with potential health consequences for the poorest rice-dependent countries. Sci Adv 4: eaaq1012). To reinforce this point, we have added a sentence and references (lines 30-33). Nevertheless, we understand the reviewer's comment on the nuance to be given to the intensity of this potential threat. We have therefore modified the text, replacing "major threat" by "significant threat" (lines 3 and 29).

      We would also like to answer the reviewer’s comment on the potential environmental benefit associated with reduced N and P content in grains (mitigation of N emissions and P leaching). Indeed, if this reduced N and P content results from a lowered use efficiency of soil nutrients by plants, as suggested by several studies (Bloom 2010, Cassan 2023, Gojon 2023 and references therein), this may, on the contrary, favor N oxide emissions and P leaching from the soil.

      (2) Exaggerated Concerns:

      The paper begins with the concern that carbon fertilization will lead to carbon dilution in our foods. While we indeed face numerous genuine threats in the coming decades, this particular issue is manageable. The increase in CO2 alone offers many opportunities for boosting yield. However, the heightened heat and increased evapotranspiration will pose massive challenges in many environments.

      While there are indeed multiple threats that we are facing in the coming decades, we don't fully agree with this comment. At present, there's no evidence to say that the negative effect of CO2 on plant mineral content will be manageable. Furthermore, there is compelling evidence that altered mineral nutrition and mineral status of plants will be an important factor limiting the high CO2-induced increase in yield, as will heat or increased evapotranspiration (see for instance Coskun et al (2016) Nutrient constraints on terrestrial carbon fixation: The role of Nitrogen. J. Plant Physiol. 203: 95-109; Jiang M (2020) Low phosphorus supply constrains plant responses to elevated CO2: A meta-analysis. Glob Chang Biol 26: 5856-5873; Reich PB (2006) Nitrogen limitation constrains sustainability of ecosystem response to CO2. Nature 440: 922-925). Thus, although we do not negate the crucial importance of heat and water stress, we believe it is relevant to study the basic mechanisms responsible for the negative effect of CO2 on plant mineral composition.

      Figure 4 in fact suggests that 43% of the REGMAP panel (cluster 3) is already pre-adapted to very high CO2 levels. This suggests annual species could adapt very rapidly.

      We agree with the reviewer. However, this suggests that genetic variation exists in some ecotypes to support adaptation to elevated CO2. The purpose of this work is indeed to identify this genetic variation, in order to characterize the mechanisms behind it.

      (3) Assumptions on CO2 Levels:

      The assumption of 900ppm seems to be based on a very extreme climate change scenario. Most people believe we will overshoot the 1.5°C scenario, however, it seems plausible that 2.5 to 3°C scenarios are more likely. This would correspond to around 500ppm of CO2. https://www.nature.com/articles/s41597-022-01196-7/tables/4

      We agree with the reviewer that the CO2 concentration we used corresponds to a high value in the IPCC projections. That said, this value is currently considered very plausible: the following figure (from Smith and Myers (2018) Nature Climate Change) shows that current CO2 emissions align with the IPCC's most extreme model (RCP 8.5), which would result in a CO2 concentration of around 900 ppm in 2100. Furthermore, nothing in the 6th IPCC report allows the 4°C scenario to be excluded.

      Author response image 1.

      (4) Focus on Real Challenges:

      We have numerous real challenges, such as extreme heat and inconsistent rainfall, to address in the context of climate change. However, testing under extreme CO2 conditions and then asserting that carbon dilution will negatively impact nutrition is exaggerated.

      While we fully agree that several threats linked to climate change exist, and all deserve to be studied, we find it questionable to consider that the potential effect of high CO2 on the mineral nutrition of plants is not a real challenge. The mineral nutrition of plants is already a current major environmental challenge. This perspective seems to reflect the reviewer's personal opinion rather than an analysis of our work.

      In contrast, the FACE experiments are fundamental and are conducted at more realistic eCO2 levels. Understanding the interaction between a 20% increase in CO2 and new precipitation patterns is key for global carbon flux prediction.

      Again, we do not fully understand this comment, as the aim of our study was not to perform a global carbon flux prediction, but to unravel genes and mechanisms underlying the negative effect of elevated CO2 on the nutrient content of Arabidopsis rosettes. However, we agree with the reviewer’s comment that FACE facilities are useful for exploring the CO2 response in more natural environments, and we highlight the fact that the decrease in mineral status of C3 plants has been widely documented in FACE studies. FACE experiments do not, however, allow fully controlled experiments (temperature, rainfall, wind and light intensities are not controllable in FACE) of the kind needed to disentangle the mechanisms by which elevated CO2 regulates the signaling pathways associated with the plant mineral composition. In the longer term, studying the mechanisms we have identified in a more global context of climate change could be highly relevant.

      As I look at the literature on commercial greenhouse tomato production, 1000ppm of eCO2 is common, but it also looks like the breeders and growers have already solved for flavor and nutrition under these conditions.

      Indeed, tomato is often cultivated in CO2-enriched greenhouses at 1000 ppm. According to the literature, this results in a 20-25% reduction in vitamin C or lycopene, and requires a significantly higher nitrogen and water intake to reach expected sugar levels (Doddrell H (2023) Horticulture Research). In addition, the negative effect of elevated CO2 on tomato nutrient content seems to have significant repercussions on nutrition-health properties (Boufeldja (2023), Molecules).

      Conclusion:

      While the study provides valuable insights into the genetic underpinnings of Arabidopsis thaliana's response to elevated CO2 levels, it requires an entirely revised writeup, especially in its abstract, broader claims and implications. The manuscript would benefit from a more thorough introduction, a clearer definition of its scope, and a clear focus on the limits of this study.

      We thank the reviewer for the comments made on our manuscript. In addition to the responses that we provide to these comments, we have modified the main text of the introduction, objectives and discussion to take these comments into consideration. We believe that this will significantly improve the manuscript.

      Reviewer #2 (Public Review):

      Strengths:

      The authors have conducted a large, well-designed experiment to test the response to eCO2. Overall, the experimental design is sound and appropriate for the questions about how a change in CO2 affects the ionome of Arabidopsis. Most of the conclusions in this area are well supported by the data that the authors present.

      We thank the reviewer for this positive appreciation.

      Weakness:

      While the authors have done good experiments, it is a big stretch from Arabidopsis grown in an arbitrary concentration of CO2 to relevance to human and animal nutrition in future climates. Arabidopsis is a great model plant, but its leaves are not generally eaten by humans or animals.

      We agree with the reviewer’s comment. We recognize that implying a direct contribution of our work to human nutrition in future climates is overstated, as also noted by Reviewer 1. This was not an intentional overstatement, as we have always been convinced that our work contributed to the understanding of the basic mechanisms involved in the negative regulation of plant mineral nutrition by high CO2. We have significantly modified the text to correct any misunderstanding of our work’s implication.

      The authors don't justify their choice of a CO2 concentration. Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed. And CO2 is just one of the variables that plants will have to contend with in future climates, other variables will also affect elemental concentrations.

      We agree with this comment. We added a justification of the high CO2 concentration used in this work in the Material and Methods section (lines 343-344). This choice is also explained in our response to Reviewer 1’s point 3.

      Given these concerns, I think the emphasis on biofortification for future climates is unwarranted for this study.

      Again, we agree with this comment and we have significantly modified the text to correct any misunderstanding of our work’s implication.

      Additionally, I have trouble with these conclusions:

      -Abstract "Finally, we demonstrate that manipulating the function of one of these genes can mitigate the negative effect of elevated CO2 on the plant mineral composition."

      -Discussion "Consistent with these results, we show that manipulating TIP2;2 expressions with a knock-out mutant can modulate the Zn loss observed under high CO2."

      The authors have not included the data to support this conclusion as stated. They have shown that this mutant increases the Zn content of the leaves when compared to WT but have not demonstrated that this response is different than in ambient CO2. This is an important distinction: one way to ameliorate the reduction of nutrients due to eCO2 is to try to identify genes that are involved in the mechanism of eCO2-induced reduction. Another way is to increase the concentration of nutrients so that the eCO2-induced reduction is not as important (i.e. a 10% reduction in Zn due to eCO2 is not as important if you have increased the baseline Zn concentration by 20%). The authors identified tip2 as a target from the GWAS on difference, but their validation experiment only looks at eCO2.

      We thank the reviewer for this comment, and we agree with it. It is much more interesting, especially in the context of this paper, to analyze the function of a candidate gene not only in elevated CO2, but in both ambient and elevated CO2. Therefore, we added in Figure 7 data for the expression of TIP2;2 in contrasted haplotypes under ambient CO2, in comparison to those already presented under elevated CO2 (now Fig. 7C and 7D). This showed that TIP2;2 expression is lower in haplotype 0 also under ambient CO2. We also added in Figure 7 (Fig. 7E) the Zn level in WT and tip2;2-1 mutant under ambient CO2, in comparison to those already presented under elevated CO2. This showed that the tip2;2-1 mutant line did not present any decrease in Zn shoot content in response to elevated CO2, in contrast to what is observed for the WT.

      We have added comments on these new results in the Results section (lines 233-242) and in the Discussion section (lines 310-314).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer Comments on the Article's Approach to Ionome Analysis

      (1) Omission of Phosphorus from the Ionome:

      It's surprising that phosphorus (P) was not measured in the ionome. After nitrogen (N), P is often the most limiting mineral for plant development and yield, making it a significant component of the ionome. Why did the authors omit this crucial element?

      We agree with the reviewer that P is an important mineral for plant growth. The absence of data related to P content is due to feasibility constraints rather than oversight. The MP-AES instrument we used to analyze the ionome (except N and C, which we obtained from an Elementar Analyzer) would have required an extra step and an extra analysis to obtain data for macronutrients such as P or K. In the context of this large-scale experiment, we had to compromise and proceed without these data.

      (2) Relationship Between Leaf Ionome and Seed:

      The manuscript lacks evidence demonstrating the relationship between the leaf ionome and the seed. This connection is vital to establish the study's aims as outlined in lines 20-24. If the central argument is that eCO2 threatens food security, it's essential for the authors to either:

      • Provide evidence that eCO2 induces changes in the ionome profiles of seeds.

      • Show that changes in the rosette leaf ionome lead to alterations in seed ionome profiles.

      We agree with the reviewer. Although we know that the seed ionome composition of an Arabidopsis model accession such as Columbia is indeed negatively affected by eCO2, we do not provide the data that support some of the terms used in lines 20-24. The correspondence between leaf and seed ionome in natural populations under eCO2 is certainly a next question that we will address. Therefore, to align our stated objectives with our data, we have modified the sentence in lines 20-24. We also added a comment on this point in the discussion section (lines 324-328).

      (3) Analysis of Ionome in Rosette Leaves:

      Why did the authors choose to analyze the ionome specifically in rosette leaves? Is there a known correlation between the ionome profile in rosette leaves and seeds?

      See our answer to the above comment.

      (4) Experimental Design Comments:

      • The layout of the accession growouts, the methods of randomization, blocking, and controls/checks should be detailed.

      • Were BLUEs (Best Linear Unbiased Estimators) or BLUPs (Best Linear Unbiased Predictors) employed to account for experimental design conditions? If not, it's recommended that they be used.

      We thank the reviewer for this comment. A note on replicates has been added in the Method/Plant Material section. Concerning the BLUEs/BLUPs, although I am not familiar with their use, I do not think that these approaches are relevant in our experimental design. Indeed, we pooled 3 to 5 replicates for each accession to measure the ionome (as mentioned in the Method/Ionome analysis section – we realized this was perhaps not clear enough, and thus we reinforced this point in this section). Therefore, we do not have the variance data required to perform BLUEs/BLUPs.

      (5) Carbon Dilution Effect:

      The statement, "The first component of the PCA described a clear antagonistic trend between C content and the change of other mineral elements (Fig. 3B)..." suggests a well-understood carbon dilution effect. These results are anticipated and align with existing knowledge.

      We thank the reviewer for this comment. However, this sentence does not relate to the biomass dilution hypothesis referred to by the reviewer. Indeed, the composition of each mineral (C and others) is expressed as a percentage of biomass, not as an absolute value. Therefore, this reflects more a probable effect of the increase in carbon compounds (notably soluble sugars), which could influence mineral composition.

      (6) Heritability Estimates:

      The authors should report both the broad-sense heritability and an estimate of heritability based on a GRM or Kinship matrix.

      We thank the reviewer for this suggestion. We are skeptical of using a kinship matrix to estimate heritability in our study. Estimating narrow-sense heritability using a kinship matrix is conceptually based on the infinitesimal model of Fisher, thereby meaning that phenotypic variation is driven by hundreds to thousands of QTLs with small effects. If this is the case, GWAS conducted on several hundred (or even thousands) of genotypes will not be powerful enough to detect such QTLs. Accordingly, estimates of broad-sense heritability based on estimates of variance components can drastically differ from estimates of narrow-sense heritability based on the use of a kinship matrix, as illustrated in the study of Bergelson et al. (2019 Scientific Reports).

      (7) Application of the Breeder's Equation:

      It would be beneficial if the authors applied the breeder's equation to estimate the species' potential rate of response. Based on the allele frequency of the adapted cluster 3 (69 ecotypes or 43% frequency of Figure 3B), it seems plausible that the populations could adapt within 23 generations.

      We thank the reviewer for this suggestion. Indeed, it would be really interesting to test whether sub-populations could adapt in comparison with others, and over what period of time. It is nevertheless not possible to do so using the Breeder’s equation in our case, as this requires fitness data under conditions of ambient or elevated CO2 (i.e. production of seeds) to be applied, and we do not have these data at the level of the whole population.

      (8) Overall Quality:

      In general, the authors have executed a high-quality ionome mapping experiment. However, the abstract, introduction, and discussion should be entirely rewritten and reframed.

      We thank the reviewer for the positive evaluation of our experiment. As previously mentioned, we are for the most part in agreement with the comments made about the need to align our stated objectives with our experimental data and conclusions. To do so, we have rewritten part of the abstract, introduction and discussion. The details of these modifications are described in the responses made to each comment.

      Here's a line-by-line list of suggestions on writing:

      Line 30 would read better with a comma after thus (or by replacing thus with therefore and then a comma at the start of the sentence).

      Line 33 nevertheless would read better in between commas.

      Lines 45 - 48 sentence is too long, could probably divide it into two.

      Lines 90 - 94 are hard to interpret, recommend rephrasing for clarity.

      Line 130 - keep verbs in the past tense for consistency (ran instead of run).

      Line 194 - what do the authors mean by crossed? I'm inferring they looked at the intersection of DEGs with the list of genes identified by GWA mapping, probably should use a more concise word.

      There's a concurrent use of the adjective strong (Lines 80, 142, 144, 197, 245). I would advise using a more concise adjective or avoiding its use to let the reader form their own opinion on the data.

      Lines 174-176 the cited reference (No. 15) is incorrect. The study by Katz et al. (2022) does not provide information on the role of ZIF1 in zinc sequestration mechanisms under elevated CO2 conditions.

      We thank the reviewer for these detailed recommendations. We have corrected or rephrased the text according to these suggestions.

      Reviewer #2 (Recommendations For The Authors):

      Technical points:

      900 ppm as elevated CO2: Given the importance of the parameter for the experiment, the rationale for selection 900 ppm as elevated CO2 compared to any other concentration should be addressed.

      We acknowledge the reviewer's point and have addressed related aspects earlier in our response. In line with this, we have included a justification for this particular parameter in the Method section.

      The authors do not mention what genotype was used for their root/shoot RNAseq experiment.

      We thank the reviewer for this comment; indeed, this information was not mentioned. It is now stated in the Method section.

      Line 125: Spelling error "REGMPA".

      This has been corrected.

      Line 338: Removal of outlier observations - "Prior to GWAS and multivariate analyses such as PCA or clustering, mineral composition measures were pre-processed to remove technical outliers". The authors should mention the exact number of outliers that were removed and what the explicit criteria were for removal.

      The number of outliers removed from each dataset is now indicated in Supplemental Table 7 (this is cited in the Method section). The explicit criterion used for this analysis is mentioned in the corresponding Method section: “the values positioned more than 5 median absolute deviations away from the median were removed from the dataset”.
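
      The criterion quoted above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `remove_mad_outliers`, the example values in `zn_ppm`, and the use of the raw (unscaled) median absolute deviation are all assumptions made for the example.

```python
from statistics import median


def remove_mad_outliers(values, n_mads=5.0):
    """Drop values lying more than n_mads median absolute deviations
    (MADs) from the median. The unscaled MAD is an assumption here."""
    data = list(values)
    med = median(data)
    mad = median([abs(v - med) for v in data])
    if mad == 0:
        # No spread around the median: nothing can be flagged.
        return data
    return [v for v in data if abs(v - med) <= n_mads * mad]


# Hypothetical Zn measurements with one obvious technical outlier.
zn_ppm = [10, 11, 12, 11, 10, 12, 11, 50]
cleaned = remove_mad_outliers(zn_ppm)
n_removed = len(zn_ppm) - len(cleaned)
```

      Counting `n_removed` per dataset would give the kind of per-dataset tally that the response says is reported in Supplemental Table 7.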

      Line 379: "Lowly expressed genes with an average value across conditions under 25 reads were excluded from the analysis". Providing information about the number of the lowly expressed genes that were removed from the analysis can help with the interpretation of the likelihood of the candidates selected being correct.

      This is a standard procedure in RNAseq analysis. It avoids many false positives in the differential analysis of gene expression based on ratios (where a very small number in the denominator can lead to a very high variation in expression, of no real significance). For information, this step led to the removal of 11607 and 10121 genes for the shoot and root datasets.
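
      The filtering step described here might look like the sketch below. It assumes a simple mapping from gene identifier to read counts across conditions; the function name `filter_low_expression` and the gene names and counts are hypothetical, and only the threshold of 25 reads comes from the text.

```python
from statistics import mean


def filter_low_expression(counts, min_mean_reads=25):
    """Split genes into those kept and those removed, excluding genes
    whose average read count across conditions is under the threshold."""
    kept = {gene: reads for gene, reads in counts.items()
            if mean(reads) >= min_mean_reads}
    removed = sorted(set(counts) - set(kept))
    return kept, removed


# Hypothetical read counts per gene across three conditions.
counts = {
    "geneA": [100, 120, 90],
    "geneB": [3, 0, 10],    # lowly expressed: ratios here are mostly noise
    "geneC": [25, 25, 25],  # exactly at the threshold, so it is kept
}
kept, removed = filter_low_expression(counts)
```

      The length of `removed` corresponds to the number of excluded genes that the reviewer asked the authors to report.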

      Line 384: It's not clear how many biological replicates were used.

      This has been corrected.

      Additional comment: We have also become aware of a confusion concerning one of the candidate genes located close to GWA peaks: in line 180 of the first version, we mentioned CAX1 (AT1G16380) for its role in the nutrient deficiency response. There are actually two genes annotated as CAX1 in TAIR (both are cation exchangers), but the one involved in the nutrient deficiency response is AT2G38170. We therefore removed the sentence mentioning AT1G16380/CAX1 as a potential candidate gene.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments and suggestions. We have prepared a revised manuscript with updated quantification of theta cycle skipping, new statistical comparisons of the difference between the two behavioral tasks, and general improvements to the text and figures.

      Reviewer #1 (Public Review):

      Summary

      The authors provide very compelling evidence that the lateral septum (LS) engages in theta cycle skipping.

      Strengths

      The data and analysis are highly compelling regarding the existence of cycle skipping.

      Weaknesses

      The manuscript falls short in describing the behavioral or physiological importance of the witnessed theta cycle skipping, and there is a lack of attention to detail with some of the findings and figures:

      More/any description is needed in the article text to explain the switching task and the behavioral paradigm generally. This should be moved from only being in methods as it is essential for understanding the study.

      Following this suggestion, we have expanded the description of the behavioral tasks in the Results section.

      An explanation is needed as to how a cell can be theta skipping if it is not theta rhythmic.

      A cell that is purely theta skipping (i.e., always fires on alternating theta cycles and never on adjacent theta cycles) will only have enhanced power at half theta frequency and not at theta frequency. Such a cell will therefore not be considered theta rhythmic in our analysis. Note, however, that there is a large overlap between theta rhythmic and theta skipping cell populations in our data (Figure 3 - figure supplement 2), indicating that most cells are not purely theta skipping.

      The most interesting result, in my opinion, is the last paragraph of the entire results section, where there is more switching in the alternation task, but the reader is kind of left hanging as to how this relates to other findings. How does this relate to differences in decoding of relative arms (the correct or incorrect arm) during those theta cycles or to the animal's actual choice? Similarly, how does it relate to the animal's actual choice? Is this phenomenon actually behaviorally or physiologically meaningful at all? Does it contribute at all to any sort of planning or decision-making?

      We agree that the difference between the two behavioral tasks is very interesting. It may provide clues about the mechanisms that control the cycle-by-cycle expression of possible future paths and the potential impact of goal-directed planning and (recent) experience. In the revised manuscript, we have expanded the analysis of the differences in theta-cycle dynamics between the two behavioral tasks. First, we confirm the difference through a new quantification and statistical comparison. Second, we performed additional analyses to explore the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal (Figure 11 – figure supplements 2 and 3), but this did not appear to be the case. However, these results provide a starting point for future studies to clarify the task dependence of the theta-cycle dynamics of spatial representations and to address the important question of behavioral/physiological relevance.

      The authors state that there is more cycle skipping in the alternation task than in the switching task, and that this switching occurs in the lead-up to the choice point. Then they say there is a higher peak at ~125 in the alternation task, which is consistent. However, in the final sentence, the authors note that "This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to reward." Doesn't either arm potentially lead to a reward (but different amounts) in the switching task, not the alternation task? Yet switching is stronger in the alternation task, which is not consistent with, and in fact contradicts, this last sentence.

      The reviewer is correct that both choices lead to (different amounts of) reward in the switching task. As written, the sentence that the reviewer refers to is indeed not accurate and we have rephrased it to: “This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to a desirable high-value reward.”

      Additionally, regarding the same sentence - "representations of the goal arms alternate more strongly ahead of the choice point when the animals performed a task in which either goal arm potentially leads to reward." - is this actually what is going on? Is there any reason at all to think this has anything to do with reward versus just a navigational choice?

      We appreciate the reviewer’s feedback and acknowledge that our statement needs clarification. At the choice point in the Y-maze there are two physical future paths available to the animal (disregarding the path that the animal took to reach the choice point) – we assume this is what the reviewer refers to as “a navigational choice”. One hypothesis could be that alternation of goal arm representations is present whenever there are multiple future paths available, irrespective of the animal’s (learned) preference to visit one or the other goal arm. However, the reduced alternation of goal arm representations in the switching task that we report suggests that the animal’s recent history of goal arm visits and reward expectations likely do influence the theta-cycle representations ahead of the choice point. We have expanded our analysis to test if theta cycle dynamics differ for trials before and after a switch in reward contingency in the switching task, but there was no statistical difference in our data. We have rewritten and expanded this part of the results to make our point clearer.

      Similarly, the authors mention several times that the LS links the HPC to 'reward' regions in the brain, and it has been found that the LS represents rewarded locations comparatively more than the hippocampus. How does this relate to their finding?

      Indeed, Wirtshafter and Wilson (2020) reported that lateral septum cells are more likely to have a place field close to a reward site than elsewhere in their double-sided T-maze. It is possible that this indicates a shift towards reward or value representations in the lateral septum. In our study we did not look at reward-biased cells and whether they are more or less likely to engage in theta cycle skipping. This could be a topic for future analyses. It should be noted that the study by Wirtshafter and Wilson (2020) reports that a reward bias was predominantly present for place fields in the direction of travel away from the reward site. These reward-proximate LS cells may thus contribute to theta-cycle skipping in the inbound direction, but it is not clear if these cells would be active during theta sweeps when approaching the choice point in the outbound direction.

      Reviewer #2 (Public Review)

      Summary

      Recent evidence indicates that cells of the navigation system representing different directions and whole spatial routes fire in a rhythmic alternation during 5-10 Hz (theta) network oscillation (Brandon et al., 2013, Kay et al., 2020). This phenomenon of theta cycle skipping was also reported in broader circuitry connecting the navigation system with the cognitive control regions (Jankowski et al., 2014, Tang et al., 2021). Yet nothing was known about the translation of these temporally separate representations to midbrain regions involved in reward processing as well as the hypothalamic regions, which integrate metabolic, visceral, and sensory signals with the descending signals from the forebrain to ensure adaptive control of innate behaviors (Carus-Cadavieco et al., 2017). The present work aimed to investigate theta cycle skipping and alternating representations of trajectories in the lateral septum, neurons of which receive inputs from a large number of CA1 and nearly all CA3 pyramidal cells (Risold and Swanson, 1995). While spatial firing has been reported in the lateral septum before (Leutgeb and Mizumori, 2002, Wirtshafter and Wilson, 2019), its dynamic aspects have remained elusive. The present study replicates the previous findings of theta-rhythmic neuronal activity in the lateral septum and reports a temporal alternation of spatial representations in this region, thus filling an important knowledge gap and significantly extending the understanding of the processing of spatial information in the brain. The lateral septum thus propagates the representations of alternative spatial behaviors to its efferent regions. The results can instruct further research of neural mechanisms supporting learning during goal-oriented navigation and decision-making in the behaviourally crucial circuits entailing the lateral septum.

      Strengths

      To this end, cutting-edge approaches for high-density monitoring of neuronal activity in freely behaving rodents and neural decoding were applied. Strengths of this work include comparisons of different anatomically and probably functionally distinct compartments of the lateral septum, innervated by different hippocampal domains and projecting to different parts of the hypothalamus; large neuronal datasets including many sessions with simultaneously recorded neurons; consequently, the rhythmic aspects of the spatial code could be directly revealed from the analysis of multiple spike trains, which were also used for decoding of spatial trajectories; and comparisons of the spatial coding between the two differently reinforced tasks.

      Weaknesses

      Although possible in principle with the present data across sessions, a longitudinal analysis of the spatial coding during learning of the task was not performed. Without using perturbation techniques, the present approach could not identify the aspects of the spatial code actually influencing the generation of behaviors by downstream regions.

      Reviewer #3 (Public Review)

      Summary

      Bzymek and Kloosterman carried out a complex experiment to determine the temporal spike dynamics of cells in the dorsal and intermediate lateral septum during the performance of a Y-maze spatial task. In this descriptive study, the authors aim to determine if inputting spatial and temporal dynamics of hippocampal cells carry over to the lateral septum, thereby presenting the possibility that this information could then be conveyed to other interconnected subcortical circuits. The authors are successful in these aims, demonstrating that the phenomenon of theta cycle skipping is present in cells of the lateral septum. This finding is a significant contribution to the field as it indicates the phenomenon is present in neocortex, hippocampus, and the subcortical hub of the lateral septal circuit. In effect, this discovery closes the circuit loop on theta cycle skipping between the interconnected regions of the entorhinal cortex, hippocampus, and lateral septum. Moreover, the authors make 2 additional findings: 1) There are differences in the degree of theta modulation and theta cycle skipping as a function of depth, between the dorsal and intermediate lateral septum; and 2) The significant proportion of lateral septum cells that exhibit theta cycle skipping, predominantly do so during 'non-local' spatial processing.

      Strengths

      The major strength of the study lies in its design, with 2 behavioral tasks within the Y-maze and a battery of established analyses drawn from prior studies that have established spatial and temporal firing patterns of entorhinal and hippocampal cells during these tasks. Primary among these analyses, is the ability to decode the animal's position relative to locations of increased spatial cognitive demand, such as the choice point before the goal arms. The presence of theta cycle skipping cells in the lateral septum is robust and has significant implications for the ability to dissect the generation and transfer of spatial routes to goals within and between the neocortex and subcortical neural circuits.

      Weaknesses

      There are no major discernable weaknesses in the study, yet the scope and mechanism of the theta cycle phenomenon remain to be placed in the context of other phenomena indicative of spatial processing independent of the animal's current position. An example of this would be the ensemble-level 'scan ahead' activity of hippocampal place cells (Gupta et al., 2012; Johnson & Redish, 2007). Given the extensive analytical demands of the study, it is understandable that the authors chose to limit the analyses to the spatial and burst firing dynamics of the septal cells rather than the phasic firing of septal action potentials relative to local theta oscillations or CA1 theta oscillations. Yet, one would ideally be able to link, rather than parse, the phenomena of temporal dynamics. For example, Tingley et al. recently showed that there was significant phase coding of action potentials in lateral septum cells relative to spatial location (Tingley & Buzsaki, 2018). This begs the question as to whether the non-uniform distribution of septal cell activity within the Y-maze may have a phasic firing component, as well as a theta cycle skipping component. If so, these phenomena could represent another means of information transfer within the spatial circuit during cognitive demands. Alternatively, these phenomena could be part of the same process, ultimately representing the coherent input of information from one region to another. Future experiments will therefore have to sort out whether theta cycle skipping is a feature of either rate or phase coding, or perhaps both, depending on circuit and cognitive demands.

      The authors have achieved their aims of describing the temporal dynamics of the lateral septum, at both the dorsal extreme and the intermediate region. All conclusions are warranted.

      Reviewer #1 (Recommendations For The Authors)

      The text states: "We found that 39.7% of cells in the LSD and 32.4% of cells in LSI had significantly higher CSI values than expected by chance on at least one of the trajectories." The text in the supplemental figure indicates a p-value of 0.05 was used to determine significance. However, four trajectory categories are being examined so a Bonferroni correction should be used (significance at p<0.0125).

      Indeed, a p-value correction for multiple tests should be performed when determining theta cycle skipping behavior for each of the four trajectories. We thank the reviewer for pointing out this oversight. We have implemented a Holm-Sidak p-value correction for the number of tested trajectories per cell (excluding trajectories with insufficient spikes). As a consequence, the number of cells with significant cycle-skipping activity decreased, but overall the results have not changed.
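
      For readers unfamiliar with the procedure, the step-down Holm-Sidak adjustment can be sketched as follows. This is an illustrative implementation with hypothetical p-values, not the code or data from the study; the number of tests corresponds to the number of trajectories that were actually tested for a given cell.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment over the (m - rank) hypotheses not yet rejected
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Step-down procedure: adjusted values must not decrease with rank
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical p-values for the four trajectory types of a single cell
p_raw = [0.01, 0.04, 0.03, 0.20]
p_adj = holm_sidak(p_raw)
significant = [p < 0.05 for p in p_adj]
print(p_adj, significant)
```

      With these made-up values, three trajectories pass an uncorrected threshold of 0.05 but only one remains significant after adjustment, which shows why the number of cells classified as cycle skipping can decrease once the correction is applied.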

      Figure 4 is very confusing as raster plots are displayed for multiple animals but it is unclear which animal the LFP refers to. The bottom of the plot is also referenced twice in the figure caption.

      We apologize for the confusion. We have removed this figure in the revised manuscript, as it was not necessary to make the point about the spatial distribution of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2) and we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells in Figure 5A.

      Figure 6 has, I think, an incorrect caption or figure. Only A and B are marked in the figure but A-G are mentioned in the caption but do not appear to correspond to anything in the figure.

      Indeed, the caption was outdated. This has now been corrected.

      Figure 8 is also confusing for several reasons: how is the probability scale on the right related to multiple semi-separate (top and middle) figures? In the top and bottom figures, it is not clear what the right and left sides refer to. It is also unclear why a probability of 0.25 is used for position (seems potentially low). The caption also mentions Figure A but there are no lettered "sub" figures in Figure 8.

      The color bar on the right applies to both the top plot (directional decoding) and the middle plot (positional decoding). However, the maximum probability that is represented by black differs between the top and middle plots. We acknowledge that a shared color bar may lead to confusion and we have given each of the plots a separate color bar.

      As for the maximum probability of 0.25 for position: this was a typo in the legend. The correct maximum value is 0.5. In general, the posterior probability will be distributed over multiple (often neighboring) spatial bins, and the distribution of maximum probabilities will depend on the number of spatial bins, the level of spatial smoothing in the decoding algorithm, and the amount of decodable information in the data. It would be more appropriate to consider the integrated probability over a small section of the maze, rather than the peak probability that is assigned to a single 5 cm bin. Also, note that a posterior probability of 0.5 is many times higher than the probability associated with a uniform distribution, which is in our case.
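
      The difference between peak and integrated posterior probability can be made concrete with a toy example. This is only a sketch under stated assumptions: the number of spatial bins (60) and the Gaussian shape and width of the posterior are hypothetical and are not taken from the manuscript.

```python
import math

N_BINS = 60  # hypothetical number of 5 cm spatial bins


def gaussian_posterior(center, sd_bins, n_bins=N_BINS):
    """Normalized posterior over spatial bins with an assumed Gaussian shape."""
    weights = [math.exp(-0.5 * ((i - center) / sd_bins) ** 2) for i in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


posterior = gaussian_posterior(center=30, sd_bins=1.0)

peak = max(posterior)                      # probability in the single best bin
peak_bin = posterior.index(peak)
# Probability integrated over a 25 cm section centred on the peak (5 bins)
integrated = sum(posterior[peak_bin - 2:peak_bin + 3])
uniform = 1.0 / N_BINS                     # per-bin probability of a uniform posterior

print(peak, integrated, peak / uniform)
```

      Even when the decoded position is confined to a few neighboring bins, the single-bin peak stays well below 1, while the probability integrated over the local section is close to 1 and the peak is many times the uniform level.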

      The left and right sides of the plots represent two different journeys that the animal ran. On the left an outbound journey is shown, and on the right an inbound journey. We have improved the figure and the description in the legend to make this clearer.

      The reviewer is correct that there are no panels in Figure 8 and we have corrected the legend.

      Some minor concerns

      The introduction states that "a few studies have reported place cell-like activity in the lateral septum (Tingley and Buzsaki, 2018; Wirtshafter and Wilson, 2020, 2019)." However, notably and controversially, the Tingley study is one of the few studies to find NO place cell activity in the lateral septum. This is sort of mentioned later but the citation in this location should be removed.

      The reviewer is correct, Tingley and Buzsaki reported a spatial phase code but no spatial rate code. We have removed the citation.

      Stronger position/direction coding in the dLS is consistent with prior studies, which should be cited in the text (this is not a novel finding).

      Thank you for pointing out this omission. Indeed, a stronger spatial coding in the dorsal lateral septum has been reported before, for example by Van der Veldt et al. (2021). We now cite this paper when discussing these findings.

      Why is the alternation task administered for 30 min but the switching task for 45 min?

      The reason is that rats received a larger reward in the switching task (in the high-reward goal arm) and took longer to complete trials on average. To obtain a more-or-less similar number of trials per session in both tasks, we extended the duration of switching task sessions to 45 minutes. We have added this explanation to the text.

      Regarding the percentage of spatially modulated cells in the discussion, it is also worth pointing out that bits/sec information is consistent with previous studies.

      Thank you for the suggestion. We now point out that the spatial information in our data is consistent with previous studies.

      Reviewer #2 (Recommendations For The Authors)

      While the results of the study are robust and timely, further details of behavioural training, additional quantitative comparisons, and improvements in the data presentation would make the study more comprehensible and complete.

      Major comments

      (1) I could not fully comprehend the behavioural protocols. They require a clearer explanation of both the specific rationale of the two tasks as well as a more detailed presentation of the protocols. Specifically:

      (1.1) In the alternation task, were the arms baited in a random succession? How many trials were applied per session? Fig 1D: how could animals reach high choice accuracy if the baiting was random?

      We used a continuous version of the alternation task, in which the animals were rewarded for left→home→right and right→home→left visit sequences. In addition, animals were always rewarded on inbound journeys. There was no random baiting of goal arms. Perhaps the confusion stems from our use of the word “trial” to refer to a completed lap (i.e., a pair of outbound/inbound journeys). On average, animals performed 54 of such trials per 30-minute session in the alternation task. We have expanded the description of the behavioral tasks in the Results and further clarified these points in the Methods section.
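
      The reward rule for outbound journeys in the continuous alternation task can be summarized in a few lines. This is a minimal sketch rather than the task-control software used in the experiments; in particular, treating the first visit of a session as correct is an assumption made here for illustration.

```python
def score_outbound_visits(visits):
    """Score successive outbound goal-arm visits ('L' or 'R') in the alternation task.

    A visit counts as correct when it differs from the previous goal-arm visit,
    i.e. left -> home -> right or right -> home -> left.
    Assumption: the first visit has no predecessor and is scored as correct.
    """
    correct = []
    previous = None
    for arm in visits:
        correct.append(previous is None or arm != previous)
        previous = arm
    return correct


# Hypothetical sequence of six outbound journeys
example = score_outbound_visits("LRLLRR")
print(example)
```

      Inbound journeys are not modelled here because, as stated above, they were always rewarded.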

      (1.2) Were they rewarded for correct inbound trials? If there was no reward, why were they considered correct?

      Yes, rats received a reward at the home platform for correct inbound trials. We have now explicitly stated this in the text.

      (1.3) In the switch alternation protocol, for how many trials was one arm kept more rewarding than the other, and how many trials followed after the rewarding value switch?

      A switch was triggered when rats (of their own volition) visited the high-reward goal arm eight times in a row. Following a switch, the animals could complete as many trials as necessary until they visited the new high-reward goal arm in eight consecutive trials, which triggered another switch. As can be seen in Figure 1D, at the population level, animals needed ~13 trials to fully commit to the high-reward goal arm following a switch. We have further clarified the switching task protocol in the Results and Methods sections.
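
      The switching rule can be expressed as a small state machine. This is an illustrative sketch, not the software that controlled the experiments; the class name, the arm labels, and the "large"/"small" reward labels are hypothetical placeholders.

```python
class SwitchingTask:
    """Self-paced switching task: the reward sizes of the two goal arms swap
    after a run of consecutive visits to the current high-reward arm."""

    def __init__(self, high_arm="L", streak_to_switch=8):
        self.high_arm = high_arm
        self.streak_to_switch = streak_to_switch
        self.streak = 0        # consecutive visits to the current high-reward arm
        self.n_switches = 0

    def visit(self, arm):
        """Register a goal-arm visit ('L' or 'R') and return the reward label."""
        if arm == self.high_arm:
            reward = "large"
            self.streak += 1
        else:
            reward = "small"
            self.streak = 0    # a visit to the other arm breaks the run
        if self.streak == self.streak_to_switch:
            self.high_arm = "R" if self.high_arm == "L" else "L"
            self.streak = 0
            self.n_switches += 1
        return reward


task = SwitchingTask(high_arm="L")
rewards = [task.visit("L") for _ in range(8)]   # eight visits in a row
reward_after_switch = task.visit("L")           # left arm is now the low-reward arm
print(task.high_arm, task.n_switches, reward_after_switch)
```

      Because the switch depends only on the animal's own run of choices, the number of switches per session is not fixed in advance.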

      (1.4) What does the phrase "the opposite arm (as 8 consecutive visits)" exactly mean? Sounds like 8 consecutive visits signalled that the arm was rewarded (as if it were not predefined in the protocol).

      The task is self-paced and the animals initially visit both goal arms, before developing a bias for the high-reward goal arm. A switch of reward size was triggered as soon as the animal visited the high-reward goal arm for eight consecutive trials. We have rewritten the description of the switching task protocol, including this sentence, which hopefully clarifies the procedure.

      (1.5) P. 15, 1st paragraph, Theta cycle skipping and alternation of spatial representations is more prominent in the alternation task. Why in the switching task, did rats visit the left and right arms approximately equally often if one was more rewarding than the other? How many switches were applied per recording session, and how many trials were there in total?

      Both the left and right goal arms were sampled more or less equally by the animals because both goal arms at various times were associated with a large reward following switches in reward values during sessions. The number of switches per session varied from 1 to 3. Sampling of both goal arms was also evident at the beginning of each session and following each reward value switch, before animals switched their behavior to the (new) highly rewarded goal arm. In Table 1, we have now listed the number of trials and the number of reward-value switches for all sessions.

      (1.6) Is the goal arm in figures the rewarded/highly rewarded arm only or are non-baited arms also considered here?

      Both left and right arms are considered goal arms and were included in the analyses, irrespective of the reward that was received (or not received).

      (2) The spatial navigation-centred behavioural study design and the interpretation of results highlight the importance of the dorsal hippocampal input to the LS. Yet, the recorded LSI cells are innervated by intermediate and ventral aspects of the hippocampus, and LS receives inputs from the amygdala and the prefrontal cortex, which together may bring about - crucial for the adaptive behaviours regulated by the LS - reward, and reward-prediction-related aspects in the firing of LS cells during spatial navigation. Does success or failure to acquire reward in a trial modify spatial coding and cycle skipping of LSD vs. LSI cells in ensuing inbound and outbound trials?

      This is an excellent question and given the length of the current manuscript, we think that exploration of this question is best left for a future extension of our study.

      A related question: in Figure 10, it is interesting that cycle skipping is prominent in the goal arm for outbound switching trials and inbound trials of both tasks. Could it be analytically explained by task contingencies and behaviour (e.g. correct/incorrect trial, learning dynamics, running speed, or acceleration)?

      Our observation of cycle skipping at the single-cell level in the goal arms is somewhat surprising and, we agree with the reviewer, potentially interesting. However, it was not accompanied by alternation of representations at the population level. Given the current focus and length of the manuscript, we think further investigation of cycle skipping in the goal arm is better left for future analyses.

      (3) Regarding possible cellular and circuit mechanisms of cycle skipping and their relation to the alternating representations in the LS. Recent history of spiking influences the discharge probability; e.g. complex spike bursts in the hippocampus are associated with a post-burst delay of spiking. In LS, cycle skipping was characteristic for LS cells with high firing rates and was not uniformly present in all trajectories and arms. The authors propose that cycle skipping can be more pronounced in epochs of reduced firing, yet the opposite seems also possible - this phenomenon can be due to an intermittently increased drive onto some LS cells. Was there a systematic relationship between cycle skipping in a given cell and the concurrent firing rate or a recent discharge with short interspike intervals?

      In our discussion, we tried to explain the presence of theta cycle skipping in the goal arms at the single-cell level without corresponding alternation dynamics at the population level. We mentioned the possibility of a decrease in excitatory drive. As the reviewer suggests, an increase in excitatory drive combined with post-burst suppression or delay of spiking is an alternative explanation. We analyzed the spatial tuning of cells with theta cycle skipping and found that, on average, these cells have a higher firing rate in the goal arm than the stem of the maze in both outbound and inbound run directions (Figure 5 – figure supplement 1). In contrast, cells that do not display theta cycle skipping do not show increased firing in the goal arm. These results are more consistent with the reviewer’s suggested mechanism and we have updated the discussion accordingly.

      (4) Were the differences between the theta modulation (cycle skipping) of local vs. non-local representations (P.14, line 10-12, "In contrast...", Figure 9A) and between alternation vs. switching tasks (Figure 10 C,D) significantly different?

      We have added quantification and statistical comparisons for the auto- and cross-correlations of the local/non-local representations. The results indeed show significantly stronger theta cycle skipping of the non-local representations as compared to the local representations (Figure 10 - figure supplement 1A), a stronger alternation of non-local representations in the outbound direction (Figure 10 - figure supplement 1B), and significant differences between the two tasks (Figure 11E,F).

      (5) Regarding the possibility of prospective coding in LS, is the accurate coding of run direction not consistent with prospective coding? Can the direction be decoded from the neural activity in the start arm? Are the cycling representations of the upcoming arms near the choice point equally likely or preferential for the then-selected arm?

      The coding of run direction (outbound or inbound) is distinct from the prospective/retrospective coding of the goal arm. As implemented, the directional decoding model does not differentiate between the two goal arms and accurate decoding of direction with this model cannot inform us whether or not there is prospective (or retrospective) coding. To address the reviewer’s comments, we performed two additional analyses. First, we analyzed the directional (outbound/inbound) decoding performance as a function of location in the maze (Figure 6 - figure supplement 3E). The results show that directional decoding performance is high in both stem and goal arms. Second, we analyzed how well we can predict the trajectory type (i.e., to/from the left or right goal arm) as a function of location in the maze, and separately for outbound and inbound trajectories (Figure 6 - figure supplement 3C,D). The results show that on outbound journeys, decoding the future goal arm is close to chance when the animals are running along the stem. The decoding performance goes up around the choice point and reaches the highest level when animals are in the goal arm.

      (6) Figure 10 seems to show the same or similar data as Figures 5 (A,B) and 9 (C,D).

      Figure 10 (Figure 11 in the revised manuscript) re-analyzes the same data as presented in Figures 5 and 9, but separates the experimental sessions according to the behavioral task. We now explicitly state this.

      Minor comments

      (1) If cycle skipping in the periodicity of non-local representations was more prominent in alternation than in the switching task, one might expect them to be also prominent in early trials of the switching task, when the preference of a more rewarding arm is not yet established. Was this the case?

      The reviewer makes an interesting suggestion. Indeed, if theta cycle skipping and the alternation of non-local representations reflect that there are multiple paths that the animal is considering, one may predict that the theta skipping dynamics are similar between the two tasks in early trials (as the reviewer suggests). Similarly, one may predict that in the switching task, the alternation of non-local representations is weaker immediately before a reward contingency switch (when the animal has developed a bias towards the goal arm with a large reward) as compared to after the switch.

      We have now quantified the theta cycle dynamics of spatial representations in the early trials in each session of both tasks (Figure 11 - figure supplement 2) and in the trials before and after each switch in the switching task (Figure 11 - figure supplement 3).

      The results of the early trial analysis indicate stronger alternation of non-local representations in the alternation task than in the switching task (consistent with the whole session analysis), which is contrary to the prediction.

      The pre-/post-switch analysis did not reveal a significant difference between the trials before and after a reward contingency switch. If anything, there was a trend towards stronger theta cycle skipping/alternation in the trials before a switch, which would be opposite to the prediction.

      These results do not appear to support the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal. We have updated the text to incorporate these new data and discuss the implications.

      (2) Summary: sounds like the encoding of spatial information and its readout in the efferent regions are equally well established.

      Thank you for pointing this out.

      (3) Summary: "motivation and reward processing centers such as the ventral tegmental area." How about also mentioning here the hypothalamus, which is a more prominent output of the lateral septum than the VTA?

      We have now also mentioned the hypothalamus.

      (4) "lateral septum may contribute to the hippocampal theta" - readers not familiar with details of the medial vs. lateral septum research may misinterpret the modest role of LS in theta compared to MS.

      We have added “in addition to the strong theta drive originating from the medial septum” to make clear that the lateral septum has a modest role in hippocampal theta generation.

      (5) "(Tingley and Buzsáki, 2018) found a lack of spatial rate coding in the lateral septum and instead reported a place coding by specific phases of the hippocampal theta rhythm (Rizzi-Wise and Wang, 2021)" needs rephrasing.

      Thank you, we have rephrased the sentence.

      (6) Figure 4 is a bit hard to generalize. The authors may additionally consider a sorted raster presentation of the dataset in this main figure.

      We have removed this figure in the revised manuscript, as it was not necessary to make the point about the location of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2), and, following the reviewer’s suggestion, we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells (Figure 5A).

      (7) It would help if legends of Figure 5 (and related supplementary figures) state in which of the two tasks the data was acquired, as it is done for Figure 10.

      Thank you for the suggestion. The legends of Figure 4A,B (formerly Figure 5 – supplemental figures 1 and 2) and Figure 5 now include in which behavioral task the data was acquired.

      (8) Page 10, "Spatial coding...", 1st paragraph: Citing the initial report by Leutgeb and Mizumori would be appropriate here too.

      The reviewer is correct. We have added the citation.

      (9) The legend in Figure 6 (panels A-G) does not match the figure (only panels A,B). What is shown in Fig. 6B, the legend does not seem to fully match.

      Indeed, the legend was outdated. This has now been corrected.

      (10) Figure 7 suppl., if extended to enable comparisons, could be a main figure. Presently, Figure 7C does not account for the confounding effect of population size and is therefore difficult to interpret without complex comparisons with the Supplementary Figure, which is revealing per se.

      We thank the reviewer for their suggestion. We have changed Figure 7 such that it only shows the analysis of decoding performed with all LSD and LSI cells. Figure 7 – supplemental figure 1 has been transformed into main Figure 8, with the addition of a panel to show a statistical comparison between decoding performance in LSD and LSI with a fixed number of cells.
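      The population-size confound raised in this point is commonly handled by repeatedly drawing equal-sized random subsets of cells from each region before decoding. The sketch below only illustrates that idea; the function name, the cell counts, and the number of repeats are hypothetical and are not taken from the manuscript.

```python
import random


def subsample_populations(cell_ids, n_cells, n_repeats, seed=0):
    """Draw `n_repeats` random subsets of `n_cells` cells (without replacement).

    Hypothetical helper: decoding each subset separately gives a distribution
    of decoding performance at a fixed population size.
    """
    if n_cells > len(cell_ids):
        raise ValueError("n_cells exceeds the number of available cells")
    rng = random.Random(seed)  # seeded so the draws are reproducible
    return [rng.sample(cell_ids, n_cells) for _ in range(n_repeats)]


# Illustrative numbers only: two regions with different cell counts,
# both reduced to the same population size before comparison.
lsd_cells = list(range(120))
lsi_cells = list(range(60))
lsd_subsets = subsample_populations(lsd_cells, n_cells=40, n_repeats=5)
lsi_subsets = subsample_populations(lsi_cells, n_cells=40, n_repeats=5)
```

      Each subset would then be passed to the same decoder, so that any remaining difference between regions cannot be attributed to the number of cells alone.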

      (11) 14, line 10 there is no Figure 8A

      This has been corrected.

      (12) 15 paragraph 1, is the model discussed here the one from Kay et al?

      From Kay et al. (2020) and also Wang et al. (2020). We have added the citations.

      (13) Figure 5 - Figure Supplement 1 presents a nice analysis that, in my view, can merit a main figure. I could not find the description of the colour code in CSI panels, does grey/red refer to non/significant points?

      Indeed, grey and red refer to non-significant and significant points, respectively. We have clarified the color code in the figure legend. Following the reviewer’s suggestion, we have made Figure 5 Supplement 1 and 2 a main figure (Figure 4).

      (14) Figure 5 - Figure Supplement 2. Half of the cells (255 and 549) seem not to be representative of the typically high CSI in the goal arm in left and right inbound trials combined (Figure 5 A). Were the changes in CSI in the right and left inbound trials similar enough to be combined in Fig 5A? Otherwise, considering left and right inbound runs separately and trying to explain where the differences come from would seem to make sense.

      Figure 5 – figure supplement 2 is now part of the new main Figure 4. Originally, the examples were from a single session and the same cells as shown in the old Figure 4. However, since the old Figure 4 has been removed, we have selected examples from different sessions and both left/right trajectories that are more representative of the overall distribution. We have further added a plot with the spatially-resolved cycle skipping for all analyzed cells in Figure 5A.

      (15) In the second paragraph of the Discussion, dorso-ventral topography of hippocampal projections to the LS (Risold and Swanson, Science, 90s) could be more explicitly stated here.

      Thank you for the suggestion. We have now explicitly mentioned the dorsal-ventral topography of hippocampal-lateral septum projections and cite Risold & Swanson (1997).

      (16) Discussion point: why do the differences in spatial information of cells in the ventral/intermediate vs. dorsal hippocampus not translate into similarly prominent differences in LSI vs. LSD?

      In our data, we do observe clear differences in spatial coding between LSD and LSI. Specifically, cell activity in the LSD is more directional, has higher goal arm selectivity, and higher spatial information (we have now added statistical comparisons to Figure 6 – figure supplement 1). As a result, spatial decoding performance is much better for LSD cell populations than LSI cell populations (see updated Figure 8, with statistical comparison of decoding performance). Spatial coding in the LS is not as strong as in the hippocampus, likely because of the convergence of hippocampal inputs, which may give the impression of a less prominent difference between the two subregions.

      (17) Discussion, last paragraph: citation of the few original anatomical and neurophysiological studies would be fitting here, in addition to the recent review article.

      Thank you for the suggestion. We have added selected citations of the original literature.

      (18) Methods, what was the reference electrode?

      We used an external reference electrode that was soldered to a skull screw, which was positioned above the cerebellum. We have added this to the Methods section.

      (19) Methods, Theta cycle skipping: bandwidth = Gaussian kernel parameter?

      The bandwidth is indeed a parameter of the Gaussian smoothing kernel and is equal to the standard deviation.
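      To make the convention concrete, the following minimal sketch builds a discrete Gaussian smoothing kernel whose `bandwidth` argument is its standard deviation. The function names, the truncation at four standard deviations, and the edge handling are assumptions chosen for illustration; they are not taken from the authors' analysis code.

```python
import math


def gaussian_kernel(bandwidth, dt, n_sigma=4.0):
    """Normalized discrete Gaussian kernel.

    `bandwidth` is the standard deviation of the Gaussian, in the same
    units as the bin width `dt`. The kernel is truncated at `n_sigma`
    standard deviations on either side (an assumption of this sketch).
    """
    half = int(math.ceil(n_sigma * bandwidth / dt))
    weights = [math.exp(-0.5 * ((i * dt) / bandwidth) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, bandwidth, dt):
    """Smooth a binned signal, renormalizing the kernel at the edges."""
    kernel = gaussian_kernel(bandwidth, dt)
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):  # ignore bins outside the signal
                acc += w * values[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed
```

      With this convention, doubling the bandwidth doubles the width of the kernel in time, which is the quantity a reader would need in order to reproduce the smoothing.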

      Reviewer #3 (Recommendations For The Authors)

      Below I offer a short list of minor comments and suggestions that may benefit the manuscript.

      (A) I was not able to access the Open Science Framework Repository. Can this be rectified?

      Thank you for checking the OSF repository. The data and analysis code are now publicly available.

      (B) In the discussion the authors should attempt to flesh out whether they can place theta cycle skipping into context with left/right sweeps or scan ahead phenomena, as shown in the Redish lab.

      Thank you for the excellent suggestion. We have now added a discussion of the possible link between theta cycle skipping and the previously reported scan-ahead theta sweeps.

      (C) What is the mechanism of cycle skipping? This could be relevant to intrinsic vs network oscillator models. Reference should also be made to the Deshmukh model of interference between theta and delta (Deshmukh, Yoganarasimha, Voicu, & Knierim, 2010).

      We had discussed a potential mechanism in the discussion (2nd to last paragraph in the revised manuscript), which now includes a citation of a recent computational study (Chu et al., 2023). We have now also added a reference to the interference model in Deshmukh et al, 2010.

      (D) Little background was given for the motivation and expectation for potential differences between the comparison of the dorsal and intermediate lateral septum. I don't believe that this is the same as the dorsal/ventral axis of the hippocampus, but if there's a physiological justification, the authors need to make it.

      We have added a paragraph to the introduction to explain the anatomical and physiological differences across the lateral septum subregions that provide our rationale for comparing dorsal and intermediate lateral septum (we excluded the ventral lateral septum because the number of cells recorded in this region was too low).

      (E) It would help to label "outbound" and "inbound" on several of the figures. All axes need to be labeled, with appropriate units indicated.

      We have carefully checked the figures and added inbound/outbound labels and axes labels where appropriate.

      (F) In Figure 6, the legend doesn't match the figure.

      Indeed, the legend was outdated. This has now been corrected.

      (G) The firing rate was non-uniform across the Y-maze. Does this mean that the cells tended to fire more in specific positions of the maze? If so, how would this affect the result? Would increased theta cycle skipping at the choice point translate to a lower firing rate at the choice point? Perhaps less overdispersion of the firing rate (Fenton et al., 2010)?

      Individual cells indeed show a non-uniform firing rate across the maze. To address the reviewer’s comment and test if theta cycle skipping cells were active preferentially near the choice point or other locations, we computed the mean-corrected spatial tuning curves for cell-trajectory pairs with and without significant theta cycle skipping. This additional analysis indicates that, on average, the population of theta cycle skipping cells showed a higher firing rate in the goal arms than in the stem of the maze as compared to non-skipping cells for outbound and inbound directions (shown in Figure 5 - figure supplement 1).

      (H) As mentioned above, it could be helpful to look at phase preference. Was there an increased phase preference at the choice point? Would half-cycle firing correlate with an increased or decreased phase preference? Based on prior work, one would expect increased phase preference, at least in CA1, at the choice point (Schomburg et al., 2014). In contrast, other work might predict phasic preference according to spatial location (Tingley & Buzsaki, 2018). Including phase analyses is a suggestion, of course. The manuscript is already sufficiently novel and informative. Yet, the authors should state why phase was not analyzed and that these questions remain for follow-up analyses. If the authors did analyze this and found negative results, it should be included in this manuscript.

      We thank the reviewer for their suggestion. We have not yet analyzed the theta phase preference of lateral septum cells or other relations to the theta phase. We agree that this would be a valuable extension of our work, but prefer to leave it for future analyses.

      (I) One of the most important aspects of the manuscript, is that there is now evidence of theta cycle skipping in the circuit loop between the EC, CA1, and LS. This now creates a foundation for circuit-based studies that could dissect the origin of route planning. Perhaps the authors should state this? In the same line of thinking, how would one determine whether theta cycle skipping is necessary for route planning as opposed to a byproduct of route planning? While this question is extremely complex, other studies have shown that spatial navigation and memory are still possible during the optogenetic manipulation of septal oscillations (Mouchati, Kloc, Holmes, White, & Barry, 2020; Quirk et al., 2021). However, pharmacological perturbation or lesioning of septal activity can have a more profound effect on spatial navigation (Bolding, Ferbinteanu, Fox, & Muller, 2019; Winson, 1978). As a descriptive study, I think it would be helpful to remind the readers of these basic concepts.

      We thank the reviewer for their comment and for pointing out possible future directions for linking theta cycle skipping to route planning. Experimental manipulations to directly test this link would be very challenging, but worthwhile to pursue. We now mention how circuit-based studies may help to test if theta cycle skipping in the broader subcortical-cortical network is necessary for route planning. Given that the discussion is already quite long, we decided to omit a more detailed discussion of the possible role of the medial septum (which is the focus of the papers cited by the reviewer).

      Very minor points

      (A) In the introduction, "one study" begins the sentence but there is a second reference.

      Thank you, we have rephrased the sentence.

      (B) Also in the introduction, it could be helpful to have an operational definition of theta cycle skipping (i.e., 'enhanced rhythmicity at half theta frequency').

      We followed the reviewer’s suggestion.

      (C) The authors should be more explicit in the introduction about their main question. State that theta cycle skipping exists in CA1, and then import some of the explanations mentioned in the discussion into the introduction (i.e., attractor states of multiple routes). The main question is then whether this phenomenon, and others from CA1, translate to the output in LS.

      We have edited the introduction to more clearly state the main question of our study, following the suggestion from the reviewer.

      (D) There are a few instances of extra closing parentheses.

      We checked the text but did not find instances of erroneous extra closing parentheses. There are instances of nested parentheses, which may have given the impression that closing parentheses were duplicated.

      (E) The first paragraph of the Discussion lacks sufficient references.

      We have now added references to the first paragraph of the discussion.

      (F) At the end of the 2nd paragraph in the Discussion, the comparison is missing. More than what? It's not until the next reference that one can assume that the authors are referring to a dorsal/ventral axis. However, the physiological motivation for this comparison is lacking. Why would one expect a dorsal/intermediate continuum for theta modulation as there is along the dorsal/ventral axis of the hippocampus?

      Thank you for spotting this omission. We have rewritten the paragraph to more clearly make the parallel between dorsal-ventral gradients in the lateral septum and hippocampus and how this relates to the topographical connections between the two structures.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations For The Authors):

      In this revision the authors address some of the key concerns, including clarification of the balanced nature of the RL driven pitch changes and conducting analyses to control for the possible effects of singing quantity on their results. The paper is much improved but still has some sources of confusion, especially around Fig. 4, that should be fixed. The authors also start the paper with a statistically underpowered minor claim that seems unnecessary in the context of the major finding. I recommend the authors may want to restructure their results section to focus on the major points backed by sufficient n and stats.

      Major issues.

      (1) The results section begins very weak - a negative result based on n=2 birds and then a technical mistake of tube clogging re-spun as an opportunity to peek at intermittent song in the otherwise muted birds. The logic may be sound but these issues detract from the main experiment, result, analysis, and interpretation. I recommend re-writing this section to home in on, from the outset, the well-powered results. How much is really gained from the n=2 birds that were muted before ANY experience? These negative results may not provide enough data to make a claim. Nor is this claim necessary to motivate what was done in the next 6 birds. I recommend dropping the claim.

      We thank the reviewer for the recommendation. We moved the information to the Methods.

      (2) Fig. 4 is very important yet remains very confusing, as detailed below.

      Fig. 4a. Can the authors clarify if the cohort of WNd birds that give rise to the positive result in Fig 4 ever experienced the mismatch in the absence of ongoing DAF reinforcement pre-deafening? Neither Fig. 4a nor the text clearly specifies this. This is important because we know that there are day timescale delays in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway (Andalman and Fee, 2009). Thus, if birds experienced mismatch pre-deafening in the absence of DAF, then an early learning phase in Area X could be set in place. Then deafening occurs, but these weight changes in X could result in LMAN bias that expresses only days later, independent of auditory feedback. Such a process would not require an internal model as the authors are arguing for here. It would simply arise from delays in implementing reinforcement-driven feedback. If the birds in Fig 4 always had DAF on before deafening, then this is not an issue. But if the birds had hours of singing with DAF off before deafening, and therefore had the opportunity to associate DA error signals with the targeted time in the song (e.g. pauses on the far-from-target renditions (Duffy et al, 2022)), then the return-to-baseline would be expected to be set in place independent of auditory feedback. Please clarify exactly if the pitch-contingent DAF was on or off in the WNd cohort in the hours before deafening. In Fig. 3b it looks like the answer is yes but I cannot find this clearly stated in the text.

      We did not provide DAF-free singing experience to the birds in Fig. 4 before deafening. Thus, according to the reviewer, the concern does not apply.

      Note that we disagree with the reviewer’s premise that there is ‘day timescale delay in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway’. More recent data reveals immediate consolidation of the anterior forebrain bias without a night-time effect (Kollmorgen, Hahnloser, Mante 2020; Tachibana, Lee, Kai, Kojima 2022). Thus, the single bird in (Andalman and Fee 2009) seems to be somewhat of an outlier.

      Hearing birds can experience the mismatch regardless of whether they experience DAF-free singing (provided their song was sufficiently shifted): even the renditions followed by white noise can be assessed with regards to their pitch mismatch, so that DAF imposes no limitation on mismatch assessment.

      We disagree with their claim that no internal model would be needed in case consolidation was delayed in Area X. If indeed, Area X stores the needed change and it takes time to implement this change in LMAN, then we would interpret the change in Area X as the plan that birds would be able to implement without auditory feedback. Because pitch can either revert (after DAF stops) or shift further away (when DAF is still present), there is no rigid delay that is involved in recovering the target, but a flexible decision making of implementing the plan, which in our view amounts to using a model.

      Fig 4b. Early and Late colored dots in legend are both red; late should be yellow? Perhaps use colors that are more distinct - this may be an issue of my screen but the two colors are difficult to discern.

      We used colors yellow to red to distinguish different birds and not early and late. We modified the markers to improve visual clarity: Early is indicated with round markers and late with crosses.

      Fig 4b. R, E, and L phases are only plotted for 4c; not in 4b. But the figure legend says that R, E and L are on both panels.

      In Fig. 4b E and L are marked with markers because they are different for different birds. In Fig. 4c the phases are the same for all birds and thus we labeled them on top. We additionally marked R in Fig. 4b as in Fig. 4c.

      Fig 4e. Did the color code switch? In the rest of Fig 4, DLO is red and WND is blue. Then in 4e it swaps. Is this a typo in the caption? Or are the colors switched? Please fix this; it's very confusing.

      Thank you for pointing out the typo in the caption. We corrected it.

      The y axes in Fig 4d-e are both in std of pitch change - yet they have different ylim, which makes it difficult to compare by eye. Is there a reason for this? Can the authors make the ylim the same for Fig 4d-e?

      We added dashed lines to clarify the difference in ylim.

      Fig 4d-e is really the main positive finding of the paper. Can the authors show an example bird that showcases this positive result, plotted as in Fig 3b? This will help the audience clearly visualize the raw data that go into the d' analyses and get a more intuitive sense of the magnitude of the positive result.

      We added example birds to figure 4, one for WNd and one for dLO.

      Please define 'late' in Fig.4 legend.

      Done

      Minor

      Define NRP in the text with an example. Is an NRP of 100 where the bird was before the withdrawal of reinforcement?

      We added the sentence to the results:

      "We quantified recovery in terms of 𝑵𝑹𝑷 to discount for differences in the amount of initial pitch shift where 𝑵𝑹𝑷 = 𝟎% corresponds to complete recovery and 𝑵𝑹𝑷 = 𝟏𝟎𝟎% corresponds to pitch values before withdrawal of reinforcement (R) and thus no recovery."
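      The two reference points in the quoted sentence are consistent with a simple linear normalization of pitch between baseline and the value at R. The sketch below is an assumption that merely satisfies those two anchor points; the exact formula used in the manuscript is not given in this response, and the function and argument names are hypothetical.

```python
def nrp(pitch, baseline_pitch, pitch_at_r):
    """Recovery measure in percent, assuming a linear normalization.

    Returns 0 when `pitch` equals `baseline_pitch` (complete recovery) and
    100 when `pitch` equals `pitch_at_r`, the pitch just before withdrawal
    of reinforcement (no recovery). The linear form is an assumption.
    """
    shift = pitch_at_r - baseline_pitch
    if shift == 0:
        raise ValueError("no initial pitch shift to normalize by")
    return 100.0 * (pitch - baseline_pitch) / shift


# Illustrative values only (arbitrary units): a bird shifted upward by 20
# that has returned halfway toward baseline.
halfway = nrp(pitch=610.0, baseline_pitch=600.0, pitch_at_r=620.0)
```

      Dividing by the initial shift is what lets birds with different amounts of shift, or shifts in opposite directions, be compared on the same scale.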

      Reviewer #3 (Recommendations For The Authors):

      The use of "hierarchically lower" to refer to the flexible process is confusing to me, and possibly to many readers. Some people think of flexible, top-down processes as being _higher_ in a hierarchy. Regardless, it doesn't seem important, in this paper, to label the processes in a hierarchy, so perhaps avoid using that terminology.

      We reformulated the paragraph using ‘nested processes’ instead of hierarchical processes.

      In the statement "a seeming analogous task to re-pitching of zebra finch song, in humans, is to modify developmentally learned speech patterns", a few suggestions: it is not clear whether "re-pitching" refers to planning or feedback-dependent learning (I didn't see it introduced anywhere else). And if this means planning, then it is not clear why this would be analogous to "humans modifying developmentally learned speech patterns". As you mentioned, humans are more flexible at planning, so it seems re-pitching would _not_ be analogous (or is this referring to the less flexible modification of accents?).

      We changed the sentence to:

      "Thus, a seeming analogous task to feedback-dependent learning of zebra finch song, in humans, is to modify developmentally learned speech patterns."

    1. Author response:

      We would first like to thank the editor for considering our findings for publication in eLife. Furthermore, we thank the reviewers and editors for their encouraging reviews and for providing helpful and insightful comments.

      Reviewer #1 (Public Review):

      Summary:

      The pituitary gonadotropins, FSH and LH, are critical regulators of reproduction. In mammals, synthesis and secretion of FSH and LH by gonadotrope cells are controlled by the hypothalamic peptide, GnRH. As FSH and LH are made in the same cells in mammals, variation in the nature of GnRH secretion is thought to contribute to the differential regulation of the two hormones. In contrast, in fish, FSH and LH are produced in distinct gonadotrope populations and may be less (or differently) dependent on GnRH than in mammals. In the present manuscript, the authors endeavored to determine whether FSH may be independently controlled by a distinct peptide, cholecystokinin (CCK), in zebrafish.

      Strengths:

      The authors demonstrated that the CCK receptor is enriched in FSH-producing relative to LH-producing gonadotropes, and that genetic deletion of the receptor leads to dramatic decreases in gonadotropin production and gonadal development in zebrafish. Also, using innovative in vivo and ex vivo calcium imaging approaches, they show that LH- and FSH-producing gonadotropes preferentially respond to GnRH and CCK, respectively. Exogenous CCK also preferentially stimulated FSH secretion ex vivo and in vivo.

      Weaknesses:

      The concept that there may be a distinct FSH-releasing hormone (FSHRH) has been debated for decades. As the authors suggest that CCK is the long-sought FSHRH (at least in fish), they must provide data that convincingly leads to such a conclusion. In my estimation, they have not yet met this burden. In particular, they show that CCK is sufficient to activate FSH-producing cells, but have not yet demonstrated its necessity. Their one attempt to do so was using fish in which they inactivated the CCK receptor using CRISPR-Cas9. While this manipulation led to a reduction in FSH, LH was affected to a similar extent. As a result, they have not shown that CCK is a selective regulator of FSH.

      Our conclusion regarding the necessity of CCK signaling for FSH secretion is based on the following evidence:

      (1) CCK-like receptors are expressed in the pituitary gland predominantly on FSH cells.

      (2) Application of CCK to pituitaries elicits FSH cell activation and FSH release, and, to a lesser degree, activation of LH cells.

      (3) Mutating the CCK-like receptor causes a decrease in fsh and lh mRNA synthesis.

      (4) Mutating the CCK-like receptor gives rise to a phenotype which is identical to that caused by mutation of both lh and fsh genes in zebrafish.

      (5) Mutating the FSH-specific CCK receptor in a different species of fish (medaka) also causes a complete shutdown of FSH production and phenocopies a fsh-mutant phenotype (Uehara et al, BioRxiv, DOI: 10.1101/2023.05.26.542428).

      Taken together, we believe that this data strongly supports the conclusion that CCK is necessary for FSH production and release from the fish pituitary. Admittedly, the overlapping effects of CCK on both FSH and LH cells in zebrafish (evident in both our calcium imaging experiments and the KO phenotype) complicates the interpretation of the phenotype. We speculate that the effect of CCK on LH cells in zebrafish can be caused either by paracrine signaling within the gland or by the effects of CCK on higher levels of the axis. In our revised manuscript we will make sure to highlight the overlapping effects of CCK on LH cells rather than portray it as a selective activator of FSH cells.

      Moreover, they do not yet demonstrate that the effects observed reflect the loss of the receptor's function in gonadotropes, as opposed to other cell types.

      Although there is evidence for the expression of CCK receptor in other tissues, we do show a direct decrease of FSH and LH expression in the gonadotrophs of the pituitary of the mutant fish; taken together with its significant expression in FSH cells, it is the most reasonable and straightforward explanation for the mutant phenotype. Unfortunately, unlike in mice, technologies for conditional knockout of genes in specific cell types are not yet available for our model and cell types. However, in the revised manuscript we will add a supplementary figure describing the distribution of this receptor in other tissues.

      It also is not clear whether the phenotypes of the fish reflect perturbations in pituitary development vs. a loss of CCK receptor function in the pituitary later in life. Ideally, the authors would attempt to block CCK signaling in adult fish that develop normally. For example, if CCK receptor antagonists are available, they could be used to treat fish and see whether and how this affects FSH vs. LH secretion.

      While the observed gonadal phenotype of the KO (sex inversion) should have a developmental origin since it requires a long time to manifest, the effect of the KO on FSH and LH cells is probably more acute.

      In the Discussion, the authors suggest that CCK, as a satiety factor, may provide a link between metabolism and reproduction. This is an interesting idea, but it is not supported by the data presented. That is, none of the results shown link metabolic state to CCK regulation of FSH and fertility. Absent such data, the lengthy Discussion of the link is speculative and not fully merited.

      In the revised manuscript, we will address this comment by either providing data to link CCK with metabolic status or toning down the Discussion of this topic.

      Also in the Discussion, the authors argue that "CCK directly controls FSH cells by innervating the pituitary gland and binding to specific receptors that are particularly abundant in FSH gonadotrophs." However, their imaging does not demonstrate innervation of FSH cells by CCK terminals (e.g., at the EM level).

      Innervation of the fish pituitary does not imply a synaptic-like connection between axon terminals and endocrine cells. In fact, such connections are extremely rare, and their functionality is unclear. Instead, the mode of regulation between hypothalamic terminals and endocrine cells in the fish pituitary is more similar to "volume transmission" in the CNS, i.e. peptides are released into the tissue and carried to their endocrine cell targets by the circulation or via diffusion.

      Moreover, they have not demonstrated the binding of CCK to these cells. Indeed, no CCK receptor protein data are shown.

      Our revised manuscript will include detailed experiments showing the activation of the receptor by its ligand. Unfortunately, no antibody is available against this fish-specific receptor (one of the caveats of working with fish models); therefore, we cannot present receptor protein data.

      The calcium responses of FSH cells to exogenous CCK certainly suggest the presence of functional CCK receptors therein; but, the nature of the preparations (with all pituitary cell types present) does not demonstrate that CCK is acting directly in these cells.

      We agree with the reviewer that there are some disadvantages in choosing to work with a whole-tissue preparation. However, we believe that the advantages of working in a more physiological context far outweigh the drawbacks, as it reflects the natural dynamics more precisely. Since our transcriptome data, as well as our ISH staining, show that the CCK receptor is exclusively expressed on FSH cells, it is improbable that the observed calcium response is mediated via a different pituitary cell type.

      Indeed, the asynchrony in responses of individual FSH cells to CCK (Figure 4) suggests that not all cells may be activated in the same way. Contrast the response of LH cells to GnRH, where the onset of calcium signaling is similar across cells (Figure 3).

      The difference between the synchronization levels of LH and FSH cell activity stems from the gap-junction-mediated coupling between LH cells, which does not exist between FSH cells (Golan et al 2016, DOI: 10.1038/srep23777). Therefore, the onset of calcium response in FSH cells is dependent on the irregular diffusion rate of the peptide within the preparation, whereas the tight homotypic coupling between LH cells generates a strong and synchronized calcium rise that propagates quickly throughout the entire population; we will make sure this is clear in the final revision.

      Finally, as the authors note in the Discussion, the data presented do not enable them to conclude that the endogenous CCK regulating FSH (assuming it does) is from the brain as opposed to other sources (e.g., the gut).

      We agree with the reviewer that, for now, we are unable to determine whether hypothalamic or peripheral CCK is the main driver of FSH cells. While the strong innervation of the gland by CCK-secreting hypothalamic neurons strengthens the notion of a hypothalamic-releasing hormone and also fits with the dogma of the neural control of the pituitary gland in fish (Ball, 1981; doi: 10.1016/0016-6480(81)90243-4), more experiments are required to resolve this question.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript builds on previous work suggesting that the CCK peptide is the releasing hormone for FSH in fishes, which is different than that observed in mammals where both LH and FSH release are under the control of GnRH. Based on data using calcium imaging as a readout for stimulation of the gonadotrophs, the researchers present data supporting the hypothesis that CCK stimulates FSH-containing cells in the pituitary. In contrast, LH-containing cells show a weak and variable response to CCK but are highly responsive to GnRH. Data are presented that support the role of CCK in the release of FSH. Researchers also state that functional overlap exists in the potency of GnRH to activate FSH cells, thus the two signalling pathways are not separate.

      The results are of interest to the field because for many years the assumption has been that fishes use the same signalling mechanism. These data present an intriguing variation where a hormone involved in satiation acts in the control of reproduction.

      Strengths:

      The strengths of the manuscript are that researchers have shed light on different pathways controlling reproduction in fishes.

      Weaknesses:

      Weaknesses are that it is not clear if multiple ligand/receptors are involved (more than one CCK and more than one receptor?). The imaging of the CCK terminals and CCK receptors needs to be reinforced.

      Reviewer consultation summary:

      • The data presented establish sufficiency, but not necessity of CCK in FSH regulation. The paper did not show that CCK endogenously regulates FSH in fish. This has not been established yet.

      This is a very important comment, also raised by reviewer 1. To avoid repetition, please see our detailed response to the comment above.

      • The paper presents the pharmacological effects of CCK on ex vivo preparations but does not establish the in vivo physiological function of the peptide. The current evidence for a novel physiological regulatory mechanism is incomplete and would require further physiological experiments. These could include the use of a CCK receptor antagonist in adult fish to see the effects on FSH and LH release, the generation of a CCK knockout, or cell-specific genetic manipulations.

      As detailed in the responses to the first reviewer, we cannot conduct conditional, cell-specific gene knockout in our model.

      • Zebrafish have two CCK ligands: ccka, cckb and also multiple receptors: cckar, cckbra and cckbrb. There is ambiguity about which CCK receptor and ligand are expressed and which gene was knocked out.

      In the revised manuscript, we will clarify which of the receptors are expressed and which receptor is targeted. We will also provide data showing the specificity of the receptors (both WT and mutant) to the ligands.

      • Blocking CCK action in fish (with receptor KO) affects FSH and LH. Therefore, the work did not demonstrate a selective role for CCK in FSH regulation in vivo and any claims to have discovered FSHRH need to be more conservative.

      We agree with the reviewer that the overlap in the effect of CCK measured in the calcium activation of cells and in the KO model does not allow us to conclude selectivity. In this context, it is crucial to highlight that CCK-R exhibits high expression on FSH cells but not on LH cells. Therefore, the effect of CCK on LH cells is likely paracrine rather than solely endocrine. We will tone down our claims of selectivity in the revised manuscript.

      • The labelling of the terminals with anti-CCK looks a lot like the background and the authors did not show a specificity control (e.g. anti-CCK antibody pre-absorbed with the peptide or anti-CCK in morphant/KO animals).

      We will update the colors of the image for better clarity. Also, the same antibody had previously been used to mark CCK-positive cells in the gut of the red drum fish (K.A. Webb, Jr. 2010; DOI: https://doi.org/10.1016/j.ygcen.2009.10.010), where a control experiment (antibody pre-absorbed with the peptide) had been conducted.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      The authors have addressed my comments. As a final minor point, regarding comment 2, these condensates are likely viscoelastic rather than purely viscous. It is prudent to indicate that the data may refer to an apparent viscosity.

      We added the following text to the manuscript to highlight the viscoelastic nature of ELP condensates, and the relationship of reported values with the steady state viscosity. “It is worth noting that the reported values, although related, may not quantitatively represent the steady-state viscosity. This discrepancy arises from the slow relaxation timescale inherent in ELP condensates with viscoelastic properties.”

    1. Author response:

      We thank eLife and the reviewers for the thoughtful summary and valuable review of our manuscript. We largely agree with the summary and review and have provided our responses to the comments below. We believe BADGERS is a significant new tool for identifying associated risk factors for complex diseases, and the associations we observed in the analysis provide insights into the genetic basis of Alzheimer's disease.

      Reviewer #1 (Public Review):

      The major aim of the paper was a method for determining genetic associations between two traits using common variants tested in genome-wide association studies. The work includes a software implementation and application of their approach. The results of the application of their method generally agree with what others have seen using similar AD and UKB data.

      The paper has several distinct portions. The first is a method for testing genetic associations between two or more traits using genome-wide association tests statistics. The second is a python implementation of the method. The last portion is the results of their method using GWAS from AD and UK Biobank.

      We thank the reviewer for the conclusion and positive comments.

      Regarding the method, it seems like it has similarities to LDSC, and it is not clear how it differs from LDSC or other similar methods. The implementation of the method used python 2.7 (or at least was reportedly tested using that version) that was retired in 2020. The implementation was committed between Wed Oct 3 15:21:49 2018 to Mon Jan 28 09:18:09 2019 using data that existed at the time so it was a bit surprising it used python 2.7 since it was initially going to be set for end-of-life in 2015. Anyway, trying to run the package resulted in unmet dependency errors, which I think are related to an internal package not getting installed. I would expect that published software could be installed using standard tooling for the language, and, ideally, software should have automated testing of key portions.

      We thank the reviewer for their comments. To clarify, the primary difference between our proposed method, BADGERS, and LDSC lies in their respective objectives and applications. LDSC is designed to estimate heritability and genetic correlations between traits by utilizing GWAS summary statistics, thereby aiding in the elucidation of the genetic architecture of complex traits and diseases. Conversely, BADGERS is specifically developed to explore causal relationships between risk factors, such as biomarkers, and diseases of interest. It employs genetic variants as variables to deduce causality, thereby addressing the challenges of confounding and reverse causation that are common in observational studies. Although BADGERS utilizes the LD reference panel derived from LDSC, the LD reference panel is used to obtain the predicted trait expression. The ultimate goal is to focus on linking biobank traits with Alzheimer’s disease and building causal relationships instead of identifying genetic architecture.

      Regarding the technical aspects mentioned, we acknowledge the concerns about the use of Python 2.7 and the issues encountered during the package installation. We are in the process of updating the software to ensure compatibility with current versions of Python and to enhance the installation process with standard tooling and automated testing for a more user-friendly experience. We have provided tests for each portion of the software so the user can test if the software is working properly.

      Regarding the main results, they find what has largely been shown by others using the same data or similar data, which adds prima facie validity to the work. The portions of the work dealing with AD subgroups, pathology, biomarkers, and cognitive traits are of interest. I was puzzled why the authors suggested surprise regarding parental history and high cholesterol not being associated with MCI or cognitive composite scores, since this would seem like the likely fallout of selection of the WRAP cohort. The discussion paragraph that started "What's more, environmental factors may play a big role in the identified associations." confused me. I think what the authors are referring to is how selection, especially in a biobank dataset, can induce correlations, which is not what I think of as an environmental effect.

      We thank the reviewer very much for their comment. We're glad that our findings align with existing research using similar data, increasing the validity of our work and the proposed BADGERS algorithm. Your point about the lack of association between parental history, high cholesterol, and mild cognitive impairment (MCI) or cognitive composite scores in the WRAP cohort is well-taken. We agree that the selection criteria of the WRAP cohort may influence these findings, as it consists of individuals with a specific risk profile for Alzheimer's disease. This selection could indeed mitigate the observed association between these factors and cognitive outcomes, which we initially found surprising.

      Regarding the environmental factors, we appreciate your clarification and understand the confusion. Our intention was to discuss the potential for selection bias and confounding factors in biobank datasets for the identified associations, which might not necessarily be direct environmental effects.

      Overall, the work has merit, but I am left without a clear impression of the improvement in the approach over similar methods. Likewise, the results are interesting, but similar findings are described with the data that was used in the study, which are over 5 years old at the time of this review.

      We thank the reviewer for their endorsement of the BADGERS framework. We believe that our method, BADGERS, improves on existing approaches by effectively linking genetic data with the detailed phenotypic information in biobanks and large disease GWAS. This enhances our ability to detect associations without needing individual-level data, offering clearer insights while reducing issues like reverse causality and confounding factors.

      Even though the IGAP dataset is over five years old, it remains one of the largest publicly available datasets for Alzheimer’s disease. Likewise, the UK Biobank is one of the largest publicly available human traits datasets, which researchers continue to use. The continued use of these datasets demonstrates their value to the research community. Additionally, the versatility of the BADGERS framework makes it suitable for future research investigating the relationship between human traits and various diseases using different datasets.

      Reviewer #2 (Public Review):

      Summary:

      Yan, Hu, and colleagues introduce BADGERS, a new method for biobank-wide scanning to find associations between a phenotype of interest and the genetic component of a battery of candidate phenotypes. Briefly, BADGERS capitalizes on publicly available weights of genetic variants for a myriad of traits to estimate polygenic risk scores for each trait, and then identifies associations with the trait of interest. Of note, the method works using summary statistics for the trait of interest, which is especially beneficial for running in population-based cohorts that are not enriched for any particular phenotype (i.e., with few actual cases of the phenotype of interest).

      Here, they apply BADGERS on Alzheimer's disease (AD) as the trait of interest, and a battery of circa 2,000 phenotypes with publicly available precalculated genome-wide summary statistics from the UK Biobank. They run it on two AD cohorts, to discover at least 14 significant associations between AD and traits. These include expected associations with dementia, cognition (educational attainment), and socioeconomic status-related phenotypes. Through multivariate modelling, they distinguish between (1) clearly independent components associated with AD, from (2) by-product associations that are inflated in the original bivariate analysis. Analyses stratified according to APOE inclusion show that this region does not seem to play a role in the association of some of the identified phenotypes. Of note, they observe overlap but significant differences in the associations identified with BADGERS and other Mendelian randomization (MR), hinting at BADGERS being more powerful than classical top variant-based MR approaches. They then extend BADGERS to other AD-related phenotypes, which serves to refine the hypotheses about the underlying mechanisms accounting for the genetic correlation patterns originally identified for AD. Finally, they run BADGERS on a pre-clinical cohort with mild cognitive impairment. They observe important differences in the association patterns, suggesting that this preclinical phenotype (at least in this cohort) has a different genetic architecture than general AD.

      We thank the reviewer a lot for the conclusion and positive comments.

      Strengths:

      BADGERS is an interesting new addition to a stream of attempts to "squeeze" biobank data beyond pure association studies for diagnosis. Increasingly available biobank cohorts do not usually focus on specific diseases. However, they tend to be data-rich, opening for deep explorations that can be useful to refine our knowledge of the latent factors that lead to diagnosis. Indeed, the possibility of running genetic correlation studies in specific sub-settings of interest (e.g. preclinical cohorts) is arguably the most interesting aspect of BADGERS. Classical methods like LDSC or two-sample MR capitalize on publicly available summary statistics from large cohorts, or having access to individual genotype data of large cohorts to ensure statistical power. Seemingly, BADGERS provides a balanced opportunity to dissect the correlation between traits of interest in settings with small sample size in which other methods do not work well.

      We thank the reviewer a lot for the conclusion and positive comments.

      Weaknesses:

      However, the increased statistical power is just hinted, and for instance, they do not explore if LDSC would have identified these associations. Although I suspect that is the case, this evidence is important to ensure that the abovementioned balance is right. Finally, as discussed by the authors, the reliance on polygenic risk scoring necessarily undermines the causality evidence gained through BADGERS. In this sense, BADGERS provides an alternative to strict instrumental-variable based analysis, which can be particularly useful to generate new mechanistic hypotheses.

      We thank the reviewer for the comments. We understand the importance of comparing BADGERS to other methods. The comparison with LDSC, while not directly relevant to the causal inference aims of BADGERS, is indeed an interesting aspect to consider for future studies. In this paper, we focused on comparing BADGERS with Mendelian Randomization (MR), which shares its causal inference objective.

      In this comparison, BADGERS identified a total of 48 traits that reached Bonferroni-corrected statistical significance. In contrast, MR-IVW identified only nine traits with Bonferroni-corrected statistical significance. Among these nine traits, seven were also identified by BADGERS. This suggests that BADGERS has higher power to detect causal relationships.
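      As a minimal sketch of the kind of comparison described above, the following Python snippet applies a Bonferroni-corrected threshold to two sets of per-trait p-values and reports the overlap between the significant traits. The trait names, p-values, and function names are hypothetical and chosen only for illustration; this is not the BADGERS or MR-IVW implementation, and the numbers are not taken from the manuscript.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_traits(p_values: dict, alpha: float = 0.05) -> set:
    """Return the traits whose p-value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {trait for trait, p in p_values.items() if p < threshold}


# Hypothetical p-values for the same four traits under two methods.
method_a = {"trait_1": 1e-9, "trait_2": 3e-4, "trait_3": 0.2, "trait_4": 1e-6}
method_b = {"trait_1": 1e-5, "trait_2": 0.04, "trait_3": 0.5, "trait_4": 0.03}

hits_a = significant_traits(method_a)
hits_b = significant_traits(method_b)
shared = hits_a & hits_b  # traits significant under both methods

print(sorted(hits_a), sorted(hits_b), sorted(shared))
```

      A larger set of significant traits under one method is compatible with higher power, but it does not by itself establish it, because the additional hits could also include false positives.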

      Regarding the use of polygenic risk scoring, we agree that it holds challenges in directly inferring causality. While BADGERS offers an innovative way to explore genetic correlations and can help generate new hypotheses about disease mechanisms, it does not replace the causal inferences that can be drawn from instrumental-variable-based analyses. Instead, it should be viewed as a complementary tool that can illuminate potential genetic relationships and guide further causal investigations.

      In summary, after 15 years of focus on diagnosis that would require having individual access to large patient cohorts, BADGERS can become an excellent tool to dig into trait heterogeneity, especially if it turns out to be more powerful than other available methodologies.

      We thank the reviewer a lot for the conclusion and positive comments.

    1. Author response:

      We thank the reviewers and editors for their time and effort reviewing and improving this manuscript. We also thank them for their support.

      Following the guidelines received from eLife, we submit here the preliminary authors’ response to the Public Review, with our planned changes to the manuscript.

      Reviewer 1.

      Comment 1. Issue on cross-reactivities of MafB antibodies.

      We are confident that our description of MafB V1 interneurons is correct despite some cross-reactivity with one of the antibodies used. We test all antibodies we use, and unfortunately, we found an inverse relationship between sensitivity and specificity with the two MafB antibodies used in this study. We chose for quantification the one with highest sensitivity, despite the presence of some cross-reactivity in interneurons other than the dorsal and ventral (Renshaw) V1 populations we focus on. The dorsal and ventral (Renshaw) V1 populations we describe here are also reactive with the more specific antibody (although with lower sensitivity) and both are neatly labeled in a MafB-GFP reporter mouse as described in Figure 3. We will add an image to the supplement with MafB-GFP V1 Interneurons at P5 showing the immunoreactivity of both MafB antibodies as suggested by the reviewer. We agree with the reviewer that this will give further support to the characterization of these populations by either immunocytochemical or genetic means at P5.

      Unfortunately, we cannot show lack of immunoreactivity for MafB antibodies in MafB GFP/GFP knockout mice at P5 because MafB global KOs die at birth as a result of respiratory failure. This is due to removal of inhibitory interneurons in brainstem centers critical for respiration (Blanchi et al. 2003 MafB deficiency causes defective respiratory rhythmogenesis and fatal central apnea at birth. Nat Neurosci. 6(10):1091-100. doi: 10.1038/nn1129. PMID: 14513037). This is why we used tissues from late embryos for testing antibody specificity in KO spinal cords. We will make this clearer in the text.

      Comment 2. Overlap of V1 clades with lineage labeled Foxp2-V1s at P5.

      We collected the data requested by the reviewer for P5 Foxp2-V1 interneurons and this will be added to an updated version of this figure. In comparison to the results with the OTP mouse, we only found marginal overlap at P5 with Renshaw cells, Pou6f2, and Sp8 V1s in our genetic intersection to label Foxp2-V1s. We apologize for not showing the data. We will make this clearer.

      Reviewer 2.

      Comment 1. Paper VERY hard to read.

      We will make every effort to make the paper more readable by moving methodological discussions to supplementary materials. We strive to keep our methods as rigorous, clean, and replicable as possible, and that sometimes requires lengthy explanations of the details and reasoning behind our approaches. We will make sure this does not distract from the principal scientific messages we want to convey. We agree with the reviewer that these should be emphasized over methodological detail, and we will correct any mistakes in the text that lead to confusion. Thank you for pointing out this problem, which we hope to correct in a new version.

      Why focus on Foxp2 V1s? We focus on the Foxp2 population for several reasons: 1) This is the largest population of V1s, and it is the one with a close spatial association to motoneurons, in particular limb motoneurons; 2) Given previous results (Benito-Gonzalez and Alvarez, 2012, cited in bibliography), it likely includes many reciprocal inhibitory interneurons; 3) We do not have the mice for studying the Pou6f2 (or Sp8) population, but similar studies are now being carried out in the Bikoff lab.

      Comment 2. Lack of functional studies.

      Functional studies are currently being carried out, both during development of limb function in postnatal mice as well as in adult animals. These studies required the creation of several new animal models and reagents. As with the present manuscript, we thoroughly characterize all animals and methods. This takes time and space. These studies are beyond the goals and length of the current manuscript, but we agree with the reviewer that these are the critical next experiments that need to be performed. We are now finalizing studies on the role of Foxp2-V1 interneurons in the postnatal development of limb coordination and validating approaches for silencing them in the adult while also optimizing behavioral assays and recordings. The data presented here on Foxp2-V1 interneuron heterogeneity and relations with limb motoneurons gives the necessary context for raising stronger hypotheses and aiding in the interpretation of future results in functional studies.

      Synapse counts.

      We respectfully disagree with the reviewer’s comments on our synapse density estimates. To fully explain the reasons and prevent any ambiguity, we need to focus on detailed methodological aspects. We apologize for the lengthy response. Two major issues were raised:

      (1) Focus on the cell body.

      The issue pointed out by the reviewer of potential synapses in distal dendrites from V1 subgroups not projecting proximally was already discussed in the text. The reason we focus on the cell body is that 1) it is not feasible to study the full dendritic arbor of so many different types of motoneurons and 2) it allows us to identify V1 subpopulations that likely exert stronger modulation of motoneuron firing by targeting the proximal somatodendritic membrane. The fact that synaptic organization on motoneurons is similar on cell bodies and proximal dendrites (first 100 µm) suggests that inputs from V1 clades other than Renshaw cells are likely further away, and therefore there is limited benefit in including analyses of proximal dendrites in these data. Additionally, dendrites would be difficult to consistently follow in Chat-immunostained tissue. We are currently using novel viral approaches to obtain labeling of single motoneurons and their full dendritic trees for more in-depth dendritic analyses in the mouse. The classical method based on single-cell in vivo intracellular labeling using micropipettes is presently very low yield in the adult mouse. We are experienced with detailed single motoneuron dendritic arbor analyses in cat and rat motoneurons (Alvarez et al. 1997 Cell-type specific organization of glycine receptor clusters in the mammalian spinal cord. J Comp Neurol. 379(1):150-70; Alvarez et al., 1998 Distribution of 5-hydroxytryptamine-immunoreactive boutons on alpha-motoneurons in the lumbar spinal cord of adult cats. J Comp Neurol. 393(1):69-83; Rotterman et al., 2014. Normal distribution of VGLUT1 synapses on spinal motoneuron dendrites and their reorganization after nerve injury. J Neurosci. 34(10):3475-92. doi: 10.1523/JNEUROSCI.4768-13.2014). Based on this experience, we do not believe it is feasible to include similar analyses to compare all motor columns throughout 6 segments of the spinal cord in this study. We agree with the reviewer that these are important data sets that need to be collected and they are planned for future experiments. These analyses will address different questions than the ones posed and answered in our current manuscript.

      (2) Number of motoneurons analyzed.

      We disagree with the reviewer's assessment that our conclusions might be biased because of the numbers of motoneurons analyzed. We sampled a total of 295 motoneurons in 5 different mice (117 LMC/HMC, 99 MMC, and 79 PGC motoneurons), and we used stringent methods for synapse detection. Due to a technical error, Mouse 3 lacked data in upper lumbar and Th13, but all other mice included data in almost all motor columns and segments. We disagree with the characterization that these are small samples. For full transparency, all motoneurons analyzed were identified in Figure 6D. Each of the nearly 300 motoneuron cell bodies was carefully reconstructed through several optical planes to obtain an accurate estimate of synapse density. More automatic methods in current use in the literature sometimes analyze larger samples, but our methods are designed to avoid methodological biases inherent to these automatic methods. We do not use image thresholding to extract synaptic contacts because it lacks accuracy in identifying single synapses. Thus, estimates using this technique frequently refer to coverage, not synapse density. In addition, it is hard to keep threshold criteria consistent across multiple optical planes to analyze enough section thickness to estimate a motoneuron surface. This is because tissue light diffraction alters thresholding levels continuously across optical planes. Thus, many authors present data as linear densities across a perimeter (in a single plane), measuring many cells in one field in one plane. We avoid cell body linear densities (or coverage) because they bias counts towards larger synapses that have a higher probability of being present at any single confocal plane. Moreover, estimates along a surface reduce synapse sampling variability and better estimate synaptic coverage compared to estimates derived from analyzing single cross-sections. We also confirm each genetically labeled varicosity as a likely synapse by accumulation of VGAT. In this manner we restrict our counts to synaptic boutons and not axons or intervaricose regions. Previously, we used bassoon to show the accuracy of our methods (Wootz et al. 2013 Alterations in the motor neuron-Renshaw cell circuit in the Sod1(G93A) mouse model. J Comp Neurol. 521(7):1449-69. doi: 10.1002/cne.23266). That means that our densities are true synaptic densities, which are difficult to extract from automatic methods that estimate fluorescence coverage over larger samples of somatic profiles but fail to individualize synapses and frequently bias results. These bulk methods introduce significant confounds in data interpretation: Is higher coverage due to bigger synapses or more synapses? Do thresholded structures represent true synapses or also include axons? To what extent does sub- or over-thresholding in different planes affect identification of structures in contact with the motoneuron surface? We avoid all these problems. Not surprisingly, a nested ANOVA demonstrated consistent significant differences among motor columns and segments.

      In summary, while more automatic methods allow larger samples, they disregard true synaptic densities and are based on thresholding methods with high variability across motoneurons, optical planes, and histological sections; they therefore require much larger numbers of motoneurons to overcome their many biases and sources of error. This is not our case. Our sample size is large enough considering the accuracy of our methods and data quality. This is demonstrated by consistency in statistical results across motor columns in different segments and mice.

      Comment 3. Possibility of anterograde transsynaptic labeling from primary afferents infected with rabies virus.

      This is a fair question about a point that we did not clearly explain. The reviewer compares our results with those of Pimpinella et al., 2022. The methods used are different. To obtain anterograde tracing, these authors used Cre lines to achieve high levels of expression of TVA and RV glycoprotein in specific subtypes of sensory neurons including proprioceptors. Then EnvA-coated rabies virus was injected directly inside the spinal cord for cell-type specificity. This method transsynaptically labeled, in the anterograde direction, interneurons receiving inputs from specific types of sensory afferents, but the method does not have the muscle specificity required in our analyses. In our case, we used intramuscular injections at P5 of AAV1-G for transcomplementation with rabies virus delta G injected in the same muscles later, at P15. In previous studies in which we used the RV-delta G virus without AAV1-G, we analyzed motoneuron and primary afferent infection rates and found both to be considerably reduced with injection age. In our hands, there is almost no RV infection of primary afferents when rabies virus is injected i.m. at P15, but there is some limited motoneuron infection remaining (that we used to our advantage in this paper to avoid primary afferent and developmental confounds).

      Unfortunately, these methodological studies are presently communicated only in abstract form (GomezPerez et al., 2015 and 2016; Program Nos. 242.08 and 366.06). Therefore, we will add to the supplementary information some images from sections serial to those illustrated in the paper; these will show the few “start” LG motoneurons that remained labeled at this survival time point and the lack of any dorsal horn primary afferent labeling. This is consistent with our as yet unpublished data, which are based on a larger number of animals and more extensive time courses.

      Comment 4. Temporal resolution of birth-dating.

      We agree with the reviewer, and that is the reason we explicitly discuss that temporal resolution is not perfect (we also add a few more caveats that affect temporal resolution beyond the reviewers’ comments). However, the method is good enough to differentiate temporal sequences of neurogenesis with close to 12-hour resolution, once enough animals are analyzed to compensate for methodological temporal overlaps. That is the reason for our Figure 1D.

      Reviewer 3

      Comment 1. Text is too long and main message buried in technical details.

      We agree and, similar to our response to the first comment of Reviewer 2, we will revise the writing to make it more straightforward while moving some of the information on methods and technical discussion to supplementary materials. As demonstrated by Reviewer 2's comments, methodological discussions are still important to best interpret the data presented in this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable contribution to cardiac arrhythmia research by demonstrating long noncoding RNA Dachshund homolog 1 (lncDACH1) tunes sodium channel functional expression and affects cardiac action potential conduction and rhythms. Whereas the evidence for functional impact of lncDACH1 expression on cardiac sodium currents and rhythms is convincing, biochemical experiments addressing the mechanism of changes in sodium channel expression and subcellular localization are incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors show that a long non-coding RNA, lncDACH1, inhibits sodium currents in cardiomyocytes by binding to and altering the localization of dystrophin. The authors use a number of methodologies to demonstrate that lncDACH1 binds to dystrophin and disrupts its localization to the membrane, which in turn downregulates NaV1.5 currents. Knockdown of lncDACH1 upregulates NaV1.5 currents. Furthermore, in heart failure, lncDACH1 is shown to be upregulated, which suggests that this mechanism may have pathophysiological relevance.

      Strengths:

      (1) This study presents a novel mechanism of Na channel regulation which may be pathophysiologically important.

      (2) The experiments are comprehensive and systematically evaluate the physiological importance of lncDACH1.

      Weaknesses:

      (1) What is indicated by the cytoplasmic level of NaV1.5, a transmembrane protein? The methods do not provide details regarding how this was determined. Do the authors mean NaV1.5 retained in various intracellular organelles?

      Thank you for the good suggestion. Our study showed that Nav1.5 is transferred to the cell membrane by the scaffold protein dystrophin under the regulation of LncDACH1, but not all Nav1.5 in the cytoplasm is transferred to the cell membrane. Therefore, the cytoplasmic level of Nav1.5 represents the Nav1.5 protein that is not transferred to the cell membrane but remains in the cytoplasm and in various organelles within the cytoplasm when Nav1.5 is regulated by LncDACH1.

      (2) What is the negative control in Fig. 2b, Fig. 4b, Fig. 6e, Fig. 7c? The maximum current amplitude in these seem quite different. -40 pA/pF in some, -30 pA/pF in others and this value seems to be different than in CMs from WT mice (<-20 pA/pF). Is there an explanation for what causes this variability between experiments and/or increase with transfection of the negative control? This is important since the effect of lncDACH1 is less than 50% reduction and these could fall in the range depending on the amplitude of the negative control.

      Thank you for the insightful comment. The negative controls in Fig. 2b, Fig. 4b, and Fig. 6e are primary cardiomyocytes transfected with empty plasmids. The negative control in Fig. 7c is cardiomyocytes from wild-type mice injected with control virus. When we prepare cells for the patch-clamp experiments, differences in the transfection efficiency of the transfection reagent across batches of cells, together with differences in cell size, ultimately lead to differences among CMs.

      (3) NaV1.5 staining in Fig. 1E is difficult to visualize and to separate from lncDACH1. Is it possible to pseudocolor differently so that all three channels can be visualized/distinguished more robustly?

      Thank you for the good suggestion. We have re-colored the original image to distinguish between the three channels.

      Author response image 1.

      (4) The authors use shRNA to knockdown lncDACH1 levels. It would be helpful to have a scrambled ShRNA control.

      Thank you for the insightful comment. The control group we used was in fact the scrambled shRNA, but we labeled the control group as NC in the article, which may have caused the misunderstanding.

      (5) Is there any measurement of the baseline levels of LncDACH1 in wild-type mice? It seems quite low, and yet there is a substantial increase in NaV1.5 currents upon knocking down LncDACH1. By comparison, the level of LncDACH1 seems to be massively upregulated in TAC models. Have the authors measured NaV1.5 currents in these cells? Furthermore, does LncDACH1 knockdown evoke a larger increase in NaV1.5 currents?

      Thank you for the insightful comment.

      (1) The baseline protein levels of LncDACH1 in wild-type mice and LncDACH1-CKO mice have been verified in a previously published article (Figure 3) (Hypertension. 2019;74:00-00. DOI: 10.1161/HYPERTENSIONAHA.119.12998).

      Author response image 2.

      (2) We did not measure the Nav1.5 currents in cardiomyocytes of the TAC model mice in this article, but in another published paper we found that the Nav1.5 current in the TAC model mice was remarkably lower than that in wild-type mice (Figure 4) (Gene Ther. 2023 Feb;30(1-2):142-149. DOI: 10.1038/s41434-022-00348-z).

      Author response image 3.

      This is consistent with our results in this article: LncDACH1 levels are significantly upregulated in the TAC model, and in the LncDACH1-TG group the Nav1.5 current is significantly reduced after LncDACH1 upregulation (Figure 3).

      Author response image 4.

      (6) What do error bars denote in all bar graphs, and also in the current voltage relationships?

      Thank you for the good comment. All data are presented as mean ± SEM, and the error bars denote the SEM. The SEM reflects the variability of the individual measurements around the mean of a data set, scaled by the sample size.
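
      As an illustration of the mean ± SEM convention, the following is a minimal Python sketch. The helper name `mean_sem` and the values in `current_density` are assumptions made up for this example; they are not data or code from the study.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return mean, sd / math.sqrt(n)


# Hypothetical peak current densities (pA/pF); illustrative numbers only.
current_density = [-38.0, -41.5, -36.2, -44.1, -39.7]
mean, sem = mean_sem(current_density)
print(f"{mean:.2f} ± {sem:.2f} pA/pF (n = {len(current_density)})")
```

      Because the SEM shrinks as n grows, reporting n alongside each bar (as done in the figure captions) lets the reader recover the underlying spread.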

      Reviewer #2 (Public Review):

      This manuscript by Xue et al. describes the effects of a long noncoding RNA, lncDACH1, on the localization of Nav channel expression, the magnitude of INa, and arrhythmia susceptibility in the mouse heart. Because lncDACH1 was previously reported to bind and disrupt membrane expression of dystrophin, which in turn is required for proper Nav1.5 localization, much of the findings are inferred through the lens of dystrophin alterations.

      The results report that cardiomyocyte-specific transgenic overexpression of lncDACH1 reduces INa in isolated cardiomyocytes; measurements in whole heart show a corresponding reduction in conduction velocity and enhanced susceptibility to arrhythmia. The effect on INa was confirmed in isolated WT mouse cardiomyocytes infected with a lncDACH1 adenoviral construct. Importantly, reducing lncDACH1 expression via either a cardiomyocyte-specific knockout or using shRNA had the opposite effect: INa was increased in isolated cells, as was conduction velocity in heart. Experiments were also conducted with a fragment of lncDACH1 identified by its conservation with other mammalian species. Overexpression of this fragment resulted in reduced INa and greater proarrhythmic behavior. Alteration of expression was confirmed by qPCR.

      The mechanism by which lncDACH1 exerts its effects on INa was explored by measuring protein levels from cell fractions and immunofluorescence localization in cells. In general, overexpression was reported to reduce Nav1.5 and dystrophin levels and knockout or knockdown increased them.

      Thank you for summarizing our work, and thank you very much for your appreciation of it.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report the first evidence of Nav1.5 regulation by a long noncoding RNA, LncRNA-DACH1, and suggest its implication in the reduction in sodium current observed in heart failure. Since no direct interaction is observed between Nav1.5 and the LncRNA, they propose that the regulation is via dystrophin and targeting of Nav1.5 to the plasma membrane.

      Strengths:

      (1) First evidence of Nav1.5 regulation by a long noncoding RNA.

      (2) Implication of LncRNA-DACH1 in heart failure and mechanisms of arrhythmias.

      (3) Demonstration of LncRNA-DACH1 binding to dystrophin.

      (4) Potential rescuing of dystrophin and Nav1.5 strategy.

      Thank you very much for your appreciation of our work.

      Weaknesses:

      (1) Main concern is that the authors do not provide evidence of how LncRNA-DACH1 regulates Nav1.5 protein level. The decrease in total Nav1.5 protein by about 50% seems to be the main consequence of the LncRNA on Nav1.5, but no mechanistic information is provided as to how this occurs.

      Thank you for the insightful comment.

      (1) The mechanism of the whole article is as described in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5. Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      We performed pulldown and RNA immunoprecipitation experiments to verify this (Figure 1).

      Author response image 5.

      (2) We then found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 6.

      (3) Lastly, we found that lncDACH1 failed to pull down Nav1.5 and that anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1).

      Author response image 7.

      These data indicate that lncDACH1 does not interact with Nav1.5 directly. It participates in the regulation of Nav1.5 by binding to dystrophin. Cytoplasmic Nav1.5 that fails to reach the plasma membrane may be quickly recognized and then degraded by the ubiquitination enzymes.

      (2) The fact that the total Nav1.5 protein is reduced by 50%, which is similar to the reduction in membrane Nav1.5, questions the main conclusion of the authors implicating dystrophin in the reduced Nav1.5 targeting. The reduction in membrane Nav1.5 could simply be due to the reduction in total protein.

      Thank you for the insightful comment. We do not rule out the possibility that the reduction in membrane Nav1.5 may be due to the reduction in total protein, but we do not think this is the main mechanism. Our data indicate that the membrane and total protein levels of Nav1.5 were reduced by 50%. However, cytoplasmic Nav1.5 was increased in the hearts of lncDACH1-TG mice compared with WT controls, rather than reduced like the membrane and total protein (Figure 1).

      Author response image 8.

      Therefore, we think the main mechanism of the whole article is as described in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig. 6E the error bars are only in one direction for cF-lncDACH1. It seems that this error overlaps for NC and cF-lncDACH1 at several voltages, yet it is marked as statistically significant. Also in Fig. 7C, what statistical test was used? Do the authors account for multiple comparisons?

      Thank you for the insightful comment.

      (1) We have recalculated the two sets of data and confirmed that the difference between NC and cF-lncDACH1 in Fig. 6E is indeed statistically significant. The overlap in the figure may be only a visual impression.

      (2) The data in Fig. 7C are expressed as mean ± SEM. Statistical analysis was performed using unpaired Student’s t test or One-Way Analysis of Variance (ANOVA) followed by Tukey’s post-hoc analysis.

      (2) line 57, "The Western blot" remove "The"

      Sorry for the mistake. We have corrected it.

      (3) line 61, "The opposite data were collected" It is unclear what is meant by opposite.

      Sorry for the mistake. We have corrected it.

      (4) Lines 137-140. This sentence is complex, I would simplify as two sentences.

      Sorry for the mistake. We have corrected it.

      (5) Line 150, "We firstly validated" should be "we first validated"

      Sorry for the mistake. We have corrected it.

      (6) Line 181, "Consistently, the membrane" Is this statement meant to indicate that the experiments yielded consistent results or that this statement is consistent with the previous one? In either case, this sentence should be reworded for clarification.

      Sorry for the mistake. We have corrected it.

      (7) Line 223, "In consistent, the ex vivo" I am not sure what In consistent means here.

      Thank you for the good suggestion. We mean that the ex vivo results are consistent with the in vivo results. We have corrected the sentence to make this clearer.

      (8) Line 285. "a bunch of studies" could be rephrased as "multiple studies"

      Sorry for the mistake. We have corrected it.

      (9) Line 299 "produced no influence" Do you mean produced no change?

      Thank you for the good suggestion. As you put it, we mean that it produced no change.

      (10) Line 325 "is to interact with the molecules" no need for "the molecules"

      Sorry for the mistake. We have corrected it.

      (11) lines 332-335. This sentence is very confusing.

      Thank you for the insightful comment. We have corrected it.

      (12) Lines 341-342. It is unnecessary to claim primacy here.

      Thank you for the good suggestion. We have removed this sentence.

      (13) Line 373. "Sodium channel remodeling is commonly occured in" perhaps rephrase as occurs commonly

      Thank you for the insightful comment. We have corrected it.

      Reviewer #2 (Recommendations For The Authors):

      Critique

      (1) Aside from some issues with presentation noted below, these data provide convincing evidence of a link between lncDACH1 and Na channel function. The identification of a lncDACH1 segment conserved among mammalian species is compelling. The observation that lncDACH1 is increased in a heart failure model provides a plausible hypothesis for disease mechanism.

      Thank you very much for your appreciation of our work.

      (2) Has a causal link between dystrophin and Na channel surface expression been made, or is it an argument based on correlation? Is it possible to rule out a direct effect of lncDACH1 on Na channel expression? A bit more discussion of the limitations of the study would help here.

      Thank you for the insightful comment.

      (1) Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      Author response image 9.

      (2) We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 failed to pull down Nav1.5 and that anti-Nav1.5 did not precipitate lncDACH1 (Online Supplementary Figure 11). These data indicate that lncDACH1 does not interact with Nav1.5 directly (Supplementary Fig. 1).

      Author response image 10.

      (3) What normalization procedures were used for qPCR quantification? I could not find these.

      Thank you for the good suggestion. The expression levels of mRNA were calculated using the comparative cycle threshold (Ct) method (2^−ΔΔCt). Each data point was then normalized to ACTIN as an internal control in each sample. The final results are expressed as fold changes by normalizing the data to the values from control subjects. We have added the normalization procedures to the methods section of the article.
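
      The normalization described above can be sketched in a few lines of Python. This is only an illustration of the generic comparative Ct calculation: the function name `fold_change` and all Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct method, 2^-(ddCt)."""
    # Step 1: normalize the target gene to the reference gene (e.g. ACTIN).
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Step 2: normalize the treated sample to the control group.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values; illustrative only.
# Sample dCt = 24 - 18 = 6, control dCt = 26 - 18 = 8, ddCt = -2.
print(fold_change(24.0, 18.0, 26.0, 18.0))  # prints 4.0
```

      A fold change of 1.0 means no difference from the control group; values above 1.0 indicate higher expression in the sample.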

      (4) In general, I found the IF to be unconvincing - first, because the reported effects were not very apparent to me, but more importantly, because only exemplars were shown without quantification of a larger sample size.

      Thank you for the good suggestion. Accordingly, we quantified the immunostaining data. The data have been included in Supplementary Figures 2-16. The sample size is given in each caption.

      Author response image 11.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-TG mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1. N=10. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 12.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocyte overexpressing lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=9. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9 for dys. N=12 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 13.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-cKO mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=12 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1 expression. N=8. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 14.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes after knocking down of lncDACH1. a,b, Distribution of membrane levels of dystrophin and Nav1.5. N=11 for dys. N=8 for Nav1.5. P<0.05 versus NC group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12 for dys. N=9 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 15.

      Fluorescence intensity of dystrophin and Nav1.5 in isolated cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=7 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=6 for dys. N=7 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 16.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=10 for dys. N=11 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=7 for dys. N=6 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 17.

      Fluorescence intensity of Nav1.5 in human iPS differentiated cardiomyocytes overexpressing cF-lncDACH1. a, Membrane levels of Nav1.5. N=8 for Nav1.5. P<0.05 versus NC group. b, Cytoplasm levels of Nav1.5. N=10 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      (5) More information on how the fractionation kit works would be helpful. How are membrane v. cytoplasm fractions identified?

      a. I presume the ER is part of the membrane fraction? When Nav1.5 is found in the cytoplasmic fraction, what subcompartment is it in - the proteasome?

      b. In the middle panel of A - is the dystrophin signal visible on the WB for WT? I assume the selected exemplar is the best of the blots and so this raises concerns. Much is riding on the confidence with which the fractions report "membrane" v "cytoplasm."

      Thank you for the insightful comment.

      (1). How the fractionation kit works:

      The kit uses centrifuge column technology to obtain plasma membrane structures with native activity and minimal cross-contamination from organelles, without the need for an ultracentrifuge, and the fractions can be used for a variety of downstream assays. Separation principle: cells/tissues are sensitized by Buffer A and then passed through the centrifuge column at 16,000 × g, which cuts the cell membrane and ruptures the cells. The four components (nucleus, cytoplasm, organelles and plasma membrane) are then obtained sequentially through differential centrifugation and density centrifugation and can be used for downstream detection.

      Author response image 18.

      (2). How are membrane v. cytoplasm fractions identified:

      The membrane proteins and cytosolic proteins were isolated with the kit. The internal controls we chose for the western blot experiments were N-cadherin for membrane protein and β-Actin for cytosolic protein.

      Most importantly, when we incubate either the primary antibody against N-cadherin with the PVDF membrane carrying the cytosolic protein, or the primary antibody against the cytosolic control β-Actin with the PVDF membrane carrying the membrane protein, no protein bands are detected in the scan results.

      Author response image 19.

      (6) More detail in Results, figures, and figure legends will assist the reader.

      a. In Fig. 5, it would be helpful to label sinus rhythm vs. arrhythmia segments.

      Thank you for the good suggestion. We've marked the sinus rhythm and arrhythmia segments with arrows.

      Author response image 20.

      b. Please explain in the figure legend what the red bars in 5A are

      Thank you for the insightful comment. We've added the explanation to the figure legend. The red lines in the ECG traces indicate VT duration.

      c. In 5C, what the durations pertain to.

      Thank you for the good suggestion. 720ms-760ms refers to the duration of one action potential, with 720ms being the peak of one action potential and 760ms being the peak of another action potential. The interval duration is not fixed; in this article, we use 10ms as the interval to count the phase singularities from the consecutive phase maps, because the shorter the interval duration, the larger the sample size and the more convincing the data.

      d. In the text, please define "breaking points" and explain what the physiological underpinning is. Define "phase singularity."

      Thank you for the insightful comment. Cardiac excitation can be viewed as an electrical wave, with a wavefront corresponding to the action potential upstroke (phase 0) and a waveback corresponding to rapid repolarization (phase 3). Under normal circumstances, cardiac conduction is composed of a sequence of well-ordered action potentials, and in the results of optical mapping experiments, different colors represent different phases. When a wave propagates normally through cardiac tissue, the wavefront and waveback never touch. When arrhythmias occur in the heart, due to factors such as the reentrant phenomenon, the activation contour meets the refractory contour and the waves break up, initiating a new spiral reentry. In the optical mapping phase maps, the different colors representing different phases (including depolarization and repolarization) then come together to form a vortex, and the center of the vortex is defined as the phase singularity.

      (7) In reflecting on why enhanced INa is not proarrhythmic, it is noted that the kinetics are not altered. I agree that is key, but perhaps the consequence could be better articulated. Because lncDACH1 does not alter Nav1.5 gating, the late Na current may not be enhanced to the same effect as observed with LQT gain-of-function Nav1.5 mutations, in which APD prolongation is attributed to gating defects that increase late Na current.

      Thank you for the good suggestion. Your explanation is very brilliant and important for this article. We have revised the discussion section of the article and added these explanations to it.

      Reviewer #3 (Recommendations For The Authors):

      (1) Experiments to specifically address the reduction in total Nav1.5 protein should be included.

      Thank you for the insightful comment. We examined the ubiquitination of Nav1.5. We found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 21.

      (2) Experiments to convincingly demonstrate that LncRNA-DACH1 regulates Nav1.5 targeting via dystrophin are missing. As it is, total reduction in Nav1.5 seems to be the explanation as to why there is a decrease in membrane Nav1.5.

      Thank you for the insightful comment. We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 can pull down dystrophin (Figure 1) but failed to pull down Nav1.5, and that anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1). These data indicate that lncDACH1 does not interact with Nav1.5 directly. It participates in the regulation of Nav1.5 by binding to dystrophin.

      Author response image 22.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      In this revised manuscript, Aguillon and collaborators convincingly demonstrate that CLK is required for free-running behavioral rhythms under constant conditions in the Cnidarian Nematostella. The results also convincingly show that CLK impacts rhythmic gene expression in this organism. This original work thus demonstrates that CLK was recruited very early during animal evolution in the circadian clock mechanism to optimize behavior and gene expression with the time-of-day. The manuscript could still benefit from some improvements so that it is more accessible for a wide readership.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Aguillon and collaborators have deeply revised, and in the process significantly improved, the presentation of their interesting results with the first Cnidarian circadian gene mutant. The results now very convincingly demonstrate that CLK is required for free-running behavioral rhythms under constant conditions. The results also now more convincingly show that CLK impacts rhythmic gene expression, although interpretation of the transcriptomics data is not straightforward. I think there are still improvements needed to make the manuscript more accessible. The authors need to keep in mind that a broad audience will read their report, not just chronobiologists. I have listed below several issues that I think should be addressed, and some editing suggestions.

      General comment to Editor and Reviewers:

      We are genuinely grateful to both the reviewers and the editors for all the feedback, which helped us make the best of our data and question our analysis to the point that we redefined our approach and ended up with a great article that we are proud of. Only the names of the authors are visible on the article, and considering how much the reviewing system helps to improve the research, this seems almost unfair. As such, we thank all of you and really appreciate the new eLife system. Bravo all.

      Abstract:

      (1) Line 40: It should read "transcript levels" instead of "transcription". There is no measurement of transcription rates in this manuscript, only mRNA levels.

      Modified accordingly.

      (2) Line 41: the authors mention "constant light". Does this refer to previous work? Their data in Figure 4 were in constant darkness, not in LL.

      Modified accordingly.

      (3) Line 46 and throughout the manuscript, the allelic nomenclature is not standard. 1-/- seems to indicate there are two different alleles. Since the allele might not be a null, I would suggest simply using 1/1, or perhaps delta/delta since the mutation results in a truncated CLK.

      NvClk1-/- became NvClkΔ/Δ, except in the .xls supplementary table, where the mutant kept the NvClk-/- nomenclature. There it is not possible to replace only part of a word with a different font, so generating the delta sign would have to be done one entry at a time.

      (4) The last sentence of the abstract needs to be rephrased, as it suggests that CLK evolved to maintain circadian rhythms under constant conditions. Constant conditions very rarely exist on Earth, and thus cannot be an evolutionary driving force. Different explanations have been proposed on why a self-sustained clock is the evolutionary solution to timekeeping, but the purpose of the clock and of clock genes is not to maintain oscillations in constant conditions. Actually, this sentence conflicts with the title.

      Modified to: the Clock gene has evolved in cnidarians to sustain 24-hour rhythmic physiology and behavior in absence of diel environmental conditions. From my current understanding, you are right: the purpose of clock genes is not to maintain oscillation in constant conditions (this is simply the result of the experiment) but to synchronize physiology to the day/night rhythm, and surely to sustain 24h oscillations in case the environment challenges the perception of the diel cues. DD or LL is just an artificial experimental design to reveal the endogenous time-keeping pacemaker.

      Results:

      (1) Line 148 and elsewhere in the MS: I would not use the word "lower" or "higher" to qualify acrophases. I would suggest advanced/delayed or earlier/later.

      Modified accordingly.

      (2) Line 157-9: The introductory sentence does not clearly present the rationale for the 6/6 experiments.

      We modified the paragraph accordingly: The presence of a 24-hour rhythm of NvClkΔ/Δ polyps under LD conditions could be attributed to either a direct light-response or the partial functioning of the circadian clock due to the nature of the mutation….

      (3) At the end of the behavior section, or perhaps at the end of each paragraph in this section, it would be helpful to have a summary of the results and to explain their interpretation more clearly. The authors need to guide the readers, particularly non-chronobiologists, so that they can understand what the really neat data that were obtained mean. For example, what does it mean that the acrophase is different between mutant and wild-type, why are Clk mutants rhythmic under LD12/12 or 6/6, etc.

      We added a concluding sentence to help non-specialists understand each result.

      (4) Line 172 and elsewhere: "true rhythmic genes" sounds odd to me. Either they are, or they are not rhythmic.

      Modified to “rhythmic genes.”

      (5) Paragraph starting with line 184: I do not follow what is important about the number of genes per time cluster. What does it tell us, beyond the simple fact that fewer genes are rhythmic in the Clk mutants?

      We rewrote the result paragraph to make it clearer why we performed this clustering analysis. This clustering analysis became Extended Data Fig.2 with modification of the figures (see my comments in your review about Figure 3).

      (6) Line 197: The authors need to explain what they saw with circadian clock genes and their expression in Clk mutants. In some cases, amplitude increased in LD. This surprising observation deserves some explanation. "Complex regulatory effect" is too vague.

      We replaced the vague “complex regulatory effect” with a more thorough description of Figure 3a.

      (7) Line 198-203: Again, help the reader understand the significance of these observations.

      We rewrote the paragraph to help the reader to better understand the significance of these observations.

      Discussion:

      (1) Line 236-40. Careful with the use of -/-, which implies that an allele is a null. The first Clk mutants in mammals and flies, which the authors refer to, were actually dominant negatives.

      I went over the citations we used for this paragraph, and the first mutation in fly, dClkar, is a null, not a dominant negative. Flies are still rhythmic in the dark. Unless there is an older mutation? However, you are right that the first mutation identified in mouse was a dominant negative with loss of rhythmicity, while the gene deletion did not show any effect on behavior, suggesting compensation by a paralog. I removed two references which were not relevant to the discussion.

      (2) Line 265-268 are not very clear. Do the authors mean that the lack of overlap for non-circadian pacemaker genes is because of different experimental conditions? What would be those differences? It is reassuring that the Leach/Reitzel study and the present share pacemaker genes as rhythmic, but it is also surprising that there is almost no overlap beyond these genes. How robust are those other rhythms compared to circadian clock genes?

      We revised this paragraph and raised major points regarding the differences in the rearing conditions of the polyps between labs, and their potential genetic differences, which could explain these discrepancies.

      (3) Line 270. I am not sure "compensation" is the right word, since there is no overlap between the rhythmic genes in mutants under LD and wild-type under either LD or DD. Also, saying on line 273 that the transcriptional pattern is not fully reproduced is a rather striking understatement, given the absence of rhythmic gene overlap.

      We rewrote the paragraph accordingly. We replaced the word with “alternative way to drive rhythmicity under LD condition”.

      (4) Line 279. The authors mention the possibility of false positives. Based on the FDR, are there more rhythmic genes than expected by chance?

      The possibility of false positives is a risk to consider when no multiple-testing correction is performed. We added to the results paragraph the number of rhythmic genes identified with BH.Q or p.adj, which are the multiple-testing-corrected values for each of the algorithms we used (RAIN and JTK).
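
      For readers unfamiliar with multiple-testing correction, here is a minimal Python sketch of the generic Benjamini-Hochberg adjustment. It is an assumption-laden illustration rather than the code used by RAIN or JTK: the function name `benjamini_hochberg`, the example p-values and the 0.05 cutoff are all hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene rhythmicity p-values; illustrative only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(p_values)
rhythmic = [q < 0.05 for q in q_values]
print(q_values)
print(rhythmic)
```

      In this toy example, two genes with raw p-values below 0.05 no longer pass the cutoff after adjustment, which is the kind of false-positive control the reviewer asks about.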

      (5) Line 279-82. The reference to the Ray study is rather obscure. What is the point the authors are trying to make here?

      In the end, we removed this reference from the article and modified the paragraph of the discussion. Indeed, the discussion around the Ray study did not give an interesting direction in which to discuss our results and analysis approach.

      (6) Line 284: define BHQ and p.adj

      Defined and referenced.

      (7) The way Lines 283-288 are worded does not provide a good rationale for how transcriptional rhythms were analyzed. The idea to combine two different approaches (JTK and RAIN) to be selective with rhythmicity was great. The authors need however to justify these choices in a more convincing manner. The goal is to detect rhythmic genes in a reliable manner, irrespective of the number of rhythmic genes observed. Also, explaining the choice of methodology belongs in the results section.

      We explained our choice of methodology and moved it to the result section as suggested.

      (8) Line 292-3. There are known mechanisms that explain how transcriptional time clusters are generated. In particular, the use of interlocked feedback loop with antiphase peaks of transcriptions is well documented. Actually, it seems to me the clustering shown in Fig 4 might hint at such a mechanism.

      Indeed, you are right: the clustering shown in Fig 3 (former Fig 4) reveals such a mechanism.

      Figures:

      Figure 2: Define relative amplitude

      We added the definition of the relative amplitude within the results. If this is what you asked for?

      Figure 3: Some of the cycles look odd (first row of graphs in panel C). Why would the first and last data point be so different in three of these graphs?

      We decided to modify this figure because we realized it was neither informative nor objective enough: we had selected only a few “representatives” from among multiple patterns. In the new figure, we combined the cluster analysis with the behavior. Readers can now pick a cluster according to a specific behavioral activity level (or ZT/CT) and find the “genes of potential interest” in supp. Table 4. However, generally speaking, this figure does not further explain the consequences of the mutation, so we moved it to Extended Data Fig. 2.

      Figure 4: define the color coding in the correlation panels (blue to red)

      The values from -1 to 1 are Pearson correlation coefficients. They are now indicated on the figure with a color-coding legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides an important cell atlas of the gill of the mussel Gigantidas platifrons using a single nucleus RNA-seq dataset, a resource for the community of scientists studying deep sea physiology and metabolism and intracellular host-symbiont relationships. The work, which offers solid insights into cellular responses to starvation stress and molecular mechanisms behind deep-sea chemosymbiosis, is of relevance to scientists interested in host-symbiont relationships across ecosystems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Wang et al have constructed a comprehensive single nucleus atlas for the gills of the deep sea Bathymodioline mussels, which possess intracellular symbionts that provide a key source of carbon and allow them to live in these extreme environments. They provide annotations of the different cell states within the gills, shedding light on how multiple cell types cooperate to give rise to the emergent functions of the composite tissues and the gills as a whole. They pay special attention to characterizing the bacteriocyte cell populations and identifying sets of genes that may play a role in their interaction with the symbiotes.

      Wang et al sample mussels from 3 different environments: animals from their native methane-rich environment, animals transplanted to a methane-poor environment to induce starvation, and animals that have been starved in the methane-poor environment and then moved back to the methane-rich environment. They demonstrated that starvation had the biggest impact on bacteriocyte transcriptomes. They hypothesize that the upregulation of genes associated with lysosomal digestion leads to the digestion of the intracellular symbiont during starvation, while the non-starved and reacclimated groups more readily harvest the nutrients from symbiotes without destroying them.

      Strengths:

      This paper makes available a high-quality dataset that is of interest to many disciplines of biology. The unique qualities of this non-model organism and the collection of conditions sampled make it of special interest to those studying deep sea adaptation, the impact of environmental perturbation on Bathymodioline mussels populations, and intracellular symbiotes. The authors do an excellent job of making all their data and analysis available, making this not only an important dataset but a readily accessible and understandable one.

      The authors also use a diverse array of tools to explore their data. For example, the quality of the data is augmented by the use of in situ hybridizations to validate cluster identity and KEGG analysis provides key insights into how the transcriptomes of bacteriocytes change.

      The authors also do a great job of providing diagrams and schematics to help orient non-mussel experts, thereby widening the audience of the paper.

      We thank the reviewer for the valuable feedback on our study. We are grateful that the reviewers found our work to be interesting and we appreciate their thorough evaluation of our research. Their constructive comments will be considered as we continue to develop and improve our study.

      Weaknesses:

      One of the main weaknesses of this paper is the lack of coherence between the images and the text, with some parts of the figures never being referenced in the body of the text. This makes it difficult for the reader to interpret how they fit in with the author's discussion and assess confidence in their analysis and interpretation of data. This is especially apparent in the cluster annotation section of the paper.

      We appreciate the feedback and suggestions provided by the reviewer, and we have revised our manuscript to make it more accessible to general audiences.

      Another concern is the linking of the transcriptomic shifts associated with starvation with changes in interactions with the symbiotes. Without examining and comparing the symbiote population between the different samples, it cannot be concluded that the transcriptomic shifts correlate with a shift to the 'milking' pathway and not other environmental factors. Without comparing the symbiote abundance between samples, it is difficult to disentangle changes in cell state that are due to their changing interactions with the symbiotes from other environmental factors.

      We are grateful for the valuable feedback and suggestions provided by the reviewer. Our keen interest lies in understanding symbiont responses, particularly at the single-cell level. However, it's worth noting that existing commercial single-cell RNA-seq technologies rely on oligo dT priming for reverse transcription and barcoding, thus omitting bacterial gene expression information from our dataset. We hope that advancements in technology will soon enable us to perform an integrated analysis encompassing both host and symbiont gene expression.

      Additionally, conclusions in this area are further complicated by using only snRNA-seq to study intracellular processes. This is limiting since cytoplasmic mRNA is excluded and only nuclear reads are sequenced after the organisms have had several days to acclimate to their environment and major transcriptomic shifts have occurred.

      We appreciate the comments shared by the reviewer and agree that scRNA-seq provides more comprehensive transcriptional information by targeting the entire mRNA of the cell. However, we would like to highlight that snRNA-seq has some unique advantages over scRNA-seq. Notably, snRNA-seq allows for simple snap-freezing of collected samples, facilitating easier storage, particularly for samples obtained during field trips involving deep-sea animals and other ecologically significant non-model animal samples. Additionally, unlike scRNA-seq, snRNA-seq eliminates the need for tissue dissociation, which often involves prolonged enzymatic treatment of deep-sea animal tissue/cells under atmospheric pressure. This process can potentially lead to the loss of sensitive cells or alterations in gene expression. Moreover, snRNA-seq procedures are not constrained by the size and shape of animal cells, making it a superior technology for constructing the cell atlas of animal tissues. Consequently, we assert that snRNA-seq offers flexibility and represents a suitable choice for the subjects of our current research.

      Reviewer #2 (Public Review):

      Wang, He et al. shed insight into the molecular mechanisms of deep-sea chemosymbiosis at the single-cell level. They do so by producing a comprehensive cell atlas of the gill of Gigantidas platifrons, a chemosymbiotic mussel that dominates the deep-sea ecosystem. They uncover novel cell types and find that the gene expression of bacteriocytes, the symbiont-hosting cells, supports two hypotheses of host-symbiont interactions: the "farming" pathway, where symbionts are directly digested, and the "milking" pathway, where nutrients released by the symbionts are used by the host. They perform an in situ transplantation experiment in the deep sea and reveal transitional changes in gene expression that support a model where starvation stress induces bacteriocytes to "farm" their symbionts, while recovery leads to the restoration of the "farming" and "milking" pathways.

      A major strength of this study includes the successful application of advanced single-nucleus techniques to a non-model, deep-sea organism that remains challenging to sample. I also applaud the authors for performing an in situ transplantation experiment in a deep-sea environment. From gene expression profiles, the authors deftly provide a rich functional description of G. platifrons cell types that is well-contextualized within the unique biology of chemosymbiosis. These findings offer significant insight into the molecular mechanisms of deep-sea host-symbiont ecology, and will serve as a valuable resource for future studies into the striking biology of G. platifrons.

      The authors' conclusions are generally well-supported by their results. However, I recognize that the difficulty of obtaining deep-sea specimens may have impacted experimental design. In this area, I would appreciate more in-depth discussion of these impacts when interpreting the data.

      We thank the reviewer for their valuable feedback on our study. We're grateful that the reviewers found our work interesting, and we appreciate their thorough evaluation of our research. We'll consider their constructive comments as we continue to develop and improve our study.

      Because cells from multiple individuals were combined before sequencing, the in situ transplantation experiment lacks clear biological replicates. This may potentially result in technical variation (ie. batch effects) confounding biological variation, directly impacting the interpretation of observed changes between the Fanmao, Reconstitution, and Starvation conditions. It is notable that Fanmao cells were much more sparsely sampled. It appears that fewer cells were sequenced, resulting in the Starvation and Reconstitution conditions having 2-3x more cells after doublet filtering. It is not clear whether this is due to a technical factor impacting sequencing or whether these numbers are the result of the unique biology of Fanmao cells. Furthermore, from Table S19 it appears that while 98% of Fanmao cells survived doublet filtering, only ~40% and ~70% survived for the Starvation and Reconstitution conditions respectively, suggesting some kind of distinction in quality or approach.

      There is a pronounced divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation (Fig. S11). This is potentially a very interesting finding, but it is difficult to know if these differences are the expected biological outcome of the experiment or the fact that Fanmao cells are much more sparsely sampled. The study also finds notable differences in gene expression between Fanmao and the other two conditions- a key finding is that bacteriocytes had the largest Fanmao-vs-starvation distance (Fig. 6B). But it is also notable that for every cell type, one or both comparisons against Fanmao produced greater distances than comparisons between Starvation and Reconstitution (Fig. 6B). Again, it is difficult to interpret whether Fanmao's distinctiveness from the other two conditions is underlain by fascinating biology or technical batch effects. Without biological replicates, it remains challenging to disentangle the two.

      As highlighted by the reviewer, our experimental design involves pooling multiple biological samples within a single treatment state before sequencing. We acknowledge the concern regarding the absence of distinct biological replicates and the potential impact of batch effects on result interpretation. While we recognize the merit of conducting multiple sequencing runs for a single treatment to provide genuine biological replicates, we contend that batch effects may not exert a strong influence on the observed patterns.

      In addition, we applied a bootstrap sampling algorithm to assess whether the gene expression patterns within a cluster are more similar than those between clusters. This algorithm involves selecting a portion of cells per cluster and examining whether this subset remains distinguishable from other clusters. Our assumption was that if different samples exhibited distinct expression patterns due to batch effect, the co-assignment probabilities of a cluster would be very low. This expectation was not met in our data, as illustrated in Fig. S2. The lack of significantly low co-assignment probabilities within clusters suggests that batch effects may not exert a strong influence on our results.
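
      The idea behind this check can be illustrated with a minimal sketch. It is not the algorithm from the cited preprint but a simplified stand-in: it assumes cells are points in a 2-D embedding and uses nearest-centroid reassignment on repeated subsamples. The function names, the 80% subsampling fraction, and the toy coordinates are all hypothetical.

```python
import random


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coassignment(cells, labels, fraction=0.8, n_boot=100, seed=0):
    """Per cluster, the share of subsampled cells that fall back into
    their own cluster when reassigned to the nearest subsample centroid."""
    rng = random.Random(seed)
    clusters = sorted(set(labels))
    by_cluster = {c: [x for x, l in zip(cells, labels) if l == c] for c in clusters}
    hits = {c: 0 for c in clusters}
    total = {c: 0 for c in clusters}
    for _ in range(n_boot):
        # Draw a subsample of each cluster without replacement.
        sample = {c: rng.sample(pts, max(1, int(fraction * len(pts))))
                  for c, pts in by_cluster.items()}
        cents = {c: centroid(pts) for c, pts in sample.items()}
        for c, pts in sample.items():
            for p in pts:
                nearest = min(clusters, key=lambda k: sq_dist(p, cents[k]))
                hits[c] += int(nearest == c)
                total[c] += 1
    return {c: hits[c] / total[c] for c in clusters}


# Toy embedding: two well-separated clusters (illustrative coordinates only).
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = ["A"] * 5 + ["B"] * 5
probabilities = coassignment(cells, labels)
```

      With clearly separated clusters, every subsampled cell is reassigned to its own cluster. A cluster split by a strong batch effect would be expected to show a much lower value under a check of this kind.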

      Indeed, we acknowledge a noticeable shift in the expression patterns of certain cell types, such as the bacteriocyte. However, this is not universally applicable across all cell types. For instance, the UMAP figure in Fig. 6A illustrates a substantial overlap among basal membrane cell 2 from Fanmao, Starvation, and Reconstitution treatments, and the centroid distances between the three treatments are subtle, as depicted in Fig. 6B. This consistent pattern is also observed in DEPC, smooth muscle cells, and the food groove ciliary cells.
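
      As a hedged illustration of what a centroid distance between treatments measures, the sketch below computes pairwise Euclidean distances between the mean embedding positions of one cell type under each treatment. The 2-D coordinates are invented, and the choice of Euclidean distance in the embedding is an assumption rather than a description of how Fig. 6B was produced.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean position of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def pairwise_centroid_distances(groups):
    """Map each pair of group names to the distance between their centroids."""
    cents = {name: centroid(pts) for name, pts in groups.items()}
    return {(a, b): math.dist(cents[a], cents[b])
            for a, b in combinations(sorted(cents), 2)}


# Hypothetical embedding coordinates for one cell type, split by treatment.
groups = {
    "Fanmao": [(0.0, 0.0), (2.0, 0.0)],
    "Starvation": [(4.0, 4.0), (4.0, 4.0)],
    "Reconstitution": [(1.0, 0.0), (1.0, 2.0)],
}
distances = pairwise_centroid_distances(groups)
```

      A cell type whose treatments overlap in the embedding yields small distances for all three pairs, whereas a strongly shifted cell type yields at least one large distance.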

      The reviewer also noted variations in the number of cells per treatment. Specifically, Fanmao sequencing yielded fewer than 10 thousand cells, whereas the other two treatments produced 2-3 times more cells after quality control (QC). It is highly probable that the technician loaded different quantities of cells into the machine for single-nucleus sequencing—a not uncommon occurrence in this methodology. While loading more cells may increase the likelihood of doublets, it is crucial to emphasize that this should not significantly impact the expression patterns post-QC. It's worth noting that overloading samples has been employed as a strategic approach to capture rare cell types, as discussed in a previous study (reference: 10.1126/science.aay0267).

      The reviewer highlighted the discrepancy in cell survival rates during the 'doublet filtering' process, with 98% of Fanmao cells surviving compared to approximately 40% and 70% for the Starvation and Reconstitution conditions, respectively. It's important to clarify that the reported percentages reflect the survival of cells through a multi-step QC process employing various filtering strategies.

      Post-doublet removal, we filtered out cells with <100 or >2500 genes and <100 or >6000 unique molecular identifiers (UMIs). Additionally, genes with <10 UMIs in each data matrix were excluded. The observed differences in survival rates for Starvation and Reconstitution cells can be attributed to the total volume of data generated in Illumina sequencing. Specifically, we sequenced approximately 91 GB of data for Fanmao, ~196 GB for Starvation, and ~249 GB for Reconstitution. As a result, the qualified data obtained for Starvation and Reconstitution conditions was only about twice that of Fanmao due to the limited data volume.
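
      The thresholds above can be expressed as a minimal sketch. The dict-of-dicts data layout, the function names, and the order of the two steps (cells first, then genes) are assumptions for illustration and do not reproduce the actual pipeline.

```python
# Thresholds as stated in the response.
MIN_GENES, MAX_GENES = 100, 2500
MIN_UMIS, MAX_UMIS = 100, 6000
MIN_GENE_UMIS = 10


def cell_passes(n_genes, n_umis):
    """True if a cell lies within both the gene-count and UMI-count windows."""
    return MIN_GENES <= n_genes <= MAX_GENES and MIN_UMIS <= n_umis <= MAX_UMIS


def filter_matrix(counts):
    """counts: {cell_id: {gene_id: umi_count}}. Returns a filtered copy."""
    kept_cells = {}
    for cell, genes in counts.items():
        n_genes = sum(1 for c in genes.values() if c > 0)
        n_umis = sum(genes.values())
        if cell_passes(n_genes, n_umis):
            kept_cells[cell] = genes
    # Total UMIs per gene across the retained cells.
    gene_totals = {}
    for genes in kept_cells.values():
        for gene, c in genes.items():
            gene_totals[gene] = gene_totals.get(gene, 0) + c
    keep_genes = {g for g, t in gene_totals.items() if t >= MIN_GENE_UMIS}
    return {cell: {g: c for g, c in genes.items() if g in keep_genes}
            for cell, genes in kept_cells.items()}


# Toy matrix with hypothetical cell and gene identifiers.
toy = {
    "c1": {**{f"g{i}": 1 for i in range(150)}, "rare": 1},  # passes; "rare" has 1 UMI in total
    "c2": {f"g{i}": 1 for i in range(50)},                  # too few genes, removed
    "c3": {f"g{i}": 10 for i in range(150)},                # passes
}
filtered = filter_matrix(toy)
```

      Here the cell with only 50 detected genes is dropped, and the gene with a single UMI across retained cells is removed from the matrix.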

      The reviewer also observed a divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation, as depicted in Fig. S1. This discrepancy may hold true biological significance, presenting a potentially intriguing finding. However, our discussion on this pattern was rather brief, as we acknowledge that the observed differences could be influenced by the sample preparation process for dissection and digestion. It is crucial to consider that cutting a slightly different area during dissection may result in variations in the proportion of cells obtained. While we recognize the potential impact of this factor, we do not think that the sparsity of sampling alone could significantly affect the relative proportions of cells per cell type.

      In conclusion, we acknowledge the reviewer's suggestion that sequencing multiple individual samples per treatment condition would have been ideal, rather than pooling them together. However, the homogeneous distribution observed in UMAP and the consistent results obtained from bootstrap sampling suggest that the impact of batch effects on our analyses is likely not substantial. Additionally, based on our understanding, the smaller number of cells in the Fanmao sample should not have any significant effect on the relative proportions of cells or the expression patterns per cluster.

      Reviewer #3 (Public Review):

      Wang et al. explored the unique biology of the deep-sea mussel Gigantidas platifrons to understand the fundamental principles of animal-symbiont relationships. They used single-nucleus RNA sequencing, together with validation and visualization of many of the important cellular and molecular players that allow these organisms to survive in the deep sea. They demonstrate that a diversity of cell types supports the structure and function of the gill, including bacteriocytes, specialized epithelial cells that host sulfur-oxidizing or methane-oxidizing symbionts, as well as a suite of other cell types including supportive, ciliary, and smooth muscle cells. By transplanting mussels from a methane-rich habitat to methane-limited environments, the authors showed that starved mussels may consume endosymbionts, whereas mussels in methane-rich environments upregulated genes involved in glutamate synthesis. These data add to the growing body of literature that organisms control their endosymbionts in response to environmental change.

      The conclusions of the data are well supported. The authors adapted a technique that would have been technically impossible in their field environment by preserving the tissue and then performing nuclear isolation after the fact. The use of single-nucleus sequencing opens the possibility of new cellular and molecular biology that is not possible to study in the field. Additionally, the in-situ data (both WISH and FISH) are high-quality and easy to interpret. The use of cell-type-specific markers along with a symbiont-specific probe was effective. Finally, the SEM and TEM were used convincingly for specific purposes in the case of showing the cilia that may support water movement.

      We appreciate the valuable feedback provided by the reviewer on our study. It is encouraging to know that our work was found to be interesting and that they conducted a thorough evaluation of our research. We will take their constructive comments into account as we strive to develop and enhance our study. We thank the reviewer for all the input.

      The one particular area for clarification and improvement surrounds the concept of a proliferative progenitor population within the gill. The authors imply that three types of proliferative cells within gills have long been known, but their study may be the first to recover molecular markers for these putative populations. The markers the authors present for gill posterior end budding zone cells (PEBZCs) and dorsal end proliferation cells (DEPCs) are not intuitively associated with cell proliferation and some additional exploration of the data could be performed to strengthen the argument that these are indeed proliferative cells. The authors do utilize a trajectory analysis tool called Slingshot which they claim may suggest that PEBZCs could be the origin of all gill epithelial cells, however, one of the assumptions of this analysis is that differentiated cells are developed from the same precursor PEBZC population.

      However, these conclusions do not detract from the overall significance of the work of identifying the relationship between symbionts and bacteriocytes and how these host bacteriocytes modulate their gene expression in response to environmental change. It will be interesting to see how similar or different these data are across animal phyla. For instance, the work of symbiosis in cnidarians may converge on similar principles or there may be independent ways in which organisms have been able to solve these problems.

      We are grateful for the valuable comments and suggestions provided by the reviewer. All suggestions have been carefully considered, and the manuscript has been revised accordingly. We particularly value the reviewer's insights regarding the characterization of the G. platifrons gill proliferative cell populations. In a separate research endeavor, we have conducted experiments utilizing both cell division and cell proliferation markers on these proliferative cell populations. While these results are not incorporated into the current manuscript, we would be delighted to share our preliminary findings with the reviewer. Our preliminary results indicate that the proliferative cell populations are positive for cell proliferation markers and contain a significant number of mitotic cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Further experiments are needed to link the changes in transcriptomes of Bathymodioline mussels in the different environmental conditions to changes in their interactions with symbiotes. For example, quantifying the abundance and comparing the morphology of symbiotes between the environmental conditions would lend much support for shifting between milking and farming strategies. Without analyzing the symbiotes and comparing them across populations, it is difficult to comment on the mechanisms of interactions between symbiotes and the hosts. Without this analysis, this data is better suited towards comments about the general effect of environmental perturbation and stress on gene expression in these mussels.

      We appreciate the reviewer’s comments. We are also very curious about the symbiont responses, especially at the single-cell level. However, all the current commercial single-cell RNA-seq technologies are based on oligo dT priming for reverse transcription and barcoding. Therefore, the bacterial gene expression information is omitted from our dataset. Hopefully, with the development of technology, we could conduct an integrated analysis of both host and symbiont gene expression soon.

      Additionally, clarification is needed on which types of symbiotes are being looked at. Are they MOX or SOX populations? Are they homogenous? What are the concentrations of sulfur at the sampled sites?

      We thank you for your valuable comments and suggestions. Gigantidas platifrons harbors a MOX endosymbiont population characterized by a single 16S rRNA phylotype. We apologize for any confusion resulting from our previous wording. To clarify, we have revised lines 57-59 of our introduction.

      In the text and images, consider using standardized gene names and leaving out the genome coordinates. This would greatly help with readability. Also, be careful to properly follow gene naming and formatting conventions (ie italicizing gene names and symbols).

      We appreciate the reviewer’s insightful comments. In model animals, gene nomenclature often stems from forward genetic approaches, such as the identification of loss-of-function mutants. These gene names, along with their protein products, typically correspond to unique genome coordinates. Conversely, in non-model invertebrates (e.g., Gigantidas platifrons in the present study), gene prediction relies on a combination of bioinformatics methods, including de novo prediction, homolog-based prediction, and transcriptomics mapping. Subsequently, the genes are annotated by identifying their best homologs in well-characterized databases. Given that different genes may encode proteins with similar annotated functions, we chose to include both the gene ID (genome coordinates) and the gene name in our manuscript. This dual labeling approach ensures that our audience receives accurate and comprehensive information regarding gene identification and annotation.

      Additionally, extending KEGG analysis to the atlas annotation section could help strengthen the confidence of annotations. For example, when identifying bacteriocyte populations, the functional categories of individual marker genes (lysosomal proteases, lysosomal traffic regulators, etc) are used to justify the annotation. Presenting KEGG support that these functional categories are upregulated in this population relative to others would help further support how you characterize this cluster by showing it's not just a few specific genes that are enriched in this cell group, but rather an overall functionality.

      We appreciate the valuable suggestion provided by the reviewer. Indeed, incorporating KEGG analysis into the atlas annotation section could further enhance the confidence in our annotations. However, in our study, we encountered some limitations that impeded us from conducting a comprehensive KEGG enrichment analysis.

      Firstly, the number of differentially expressed genes (DEGs) that we identified for certain cell populations was relatively small, making it challenging to meet the threshold required for meaningful KEGG enrichment analysis. For instance, among the 97 marker genes identified for the Bacteriocyte cluster, only two genes, Bpl_scaf_59648-4.5 (lysosomal alpha-glucosidase-like) and Bpl_scaf_52809-1.6 (lysosomal-trafficking regulator-like isoform X1), were identified as lysosomal genes. To generate reliable KEGG enrichments, a larger number of genes is typically required.

      Secondly, single-nucleus sequencing, as employed in our study, tends to yield a relatively smaller number of genes per cell compared to bulk RNA sequencing. This limited gene yield can make it challenging to achieve sufficient gene representation for rigorous KEGG enrichment analysis.

      Furthermore, many genes in the genome still lack comprehensive KEGG and GO annotation. In our dataset, out of the 33,584 genes obtained through single-nucleus sequencing, 26,514 genes have no KEGG annotation, and 25,087 genes have no GO annotation. This lack of annotation further restricts the comprehensive application of KEGG analysis in our study.
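
      To make the scale of this gap explicit, the counts above can be converted into annotation coverage with a short calculation. The variable and function names are illustrative, and only the three gene counts come from the text.

```python
TOTAL_GENES = 33_584   # genes obtained in the dataset
NO_KEGG = 26_514       # genes without a KEGG annotation
NO_GO = 25_087         # genes without a GO annotation


def coverage(total, unannotated):
    """Return (number of annotated genes, annotated fraction of the total)."""
    annotated = total - unannotated
    return annotated, annotated / total


kegg_n, kegg_frac = coverage(TOTAL_GENES, NO_KEGG)
go_n, go_frac = coverage(TOTAL_GENES, NO_GO)

print(f"KEGG-annotated: {kegg_n} genes ({kegg_frac:.1%})")
print(f"GO-annotated:   {go_n} genes ({go_frac:.1%})")
```

      This corresponds to 7,070 KEGG-annotated genes (about 21.1%) and 8,497 GO-annotated genes (about 25.3%), so roughly four in five genes cannot contribute to a KEGG enrichment test.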

      The claim that VEPCs are symbiote free is not demonstrated. Additional double in situs are needed to show that markers of this cell type localize in regions free of symbiotes.

      We appreciate your comments and suggestions. In Figure 5B, our results demonstrate that the bacteriocytes (green fluorescent signal) are distant from the VEPCs, which are located around the tip of the gill filaments (close to the food groove). We have revised Figure 5B to make this clear.

      Additionally, it does not seem like trajectory analysis is appropriate for these sampling conditions. Generally, to create trajectories confidently, more closely sampled time points are needed to sufficiently parse out the changes in expression. More justification is needed for the use of this type of analysis here and a discussion of the limitations should be mentioned, especially when discussing the hypotheses relating to PEBZCs, VEPCs, and DEPCs.

      We greatly appreciate your thoughtful commentary. It is important to acknowledge that in the context of a developmental study, incorporating more closely spaced time points indeed holds great value. In our ongoing project investigating mouse development, for instance, we have implemented time points at 24-hour intervals. However, in the case of deep-sea adult animals, we hypothesized a slower transcriptional shift in such an extreme environment, which led us to opt for a time interval of 3-7 days. Examining the differential expression profiles among the three treatments, we observed that most cell types exhibited minimal changes in their expression profiles. For the cell types strongly impacted by in situ transplantation, the expression profiles per cell type still exhibited a high degree of overlap in the UMAP analysis (Figure 6A), thus enabling meaningful comparisons. Nevertheless, we recognize that our sampling strategy may not be flawless. Additionally, the challenging nature of conducting in situ transplantation at depths of 1000 meters limited the number of sampling occasions available to us. We sincerely appreciate your input and understanding.

      Finally, more detail should be added on the computational methods used in this paper. For example, the single-cell genomics analysis protocol should be expanded on so that readers unfamiliar with BD single-cell genomics handbooks could replicate the analysis. More detail is also needed on what criteria and cutoffs were used to calculate marker genes. Also, please be careful to cite the algorithms and software packages mentioned in the text.

      Acknowledged, thank you for highlighting this. In essence, the workflow closely resembles the 10x Genomics workflow (although the latter uses different software, i.e., Cell Ranger). We explain the workflow in more detail below, and also note that this information may no longer be relevant for newer users of BD or individuals who are not acquainted with BD, given that the workflow underwent a complete overhaul in the summer of 2023.

      References to lines

      Line 32: typo "..uncovered unknown tissue heterogeny" should read "uncovering" or "and uncovered"

      Overall abstract could include more detail of findings (ex: what are the "shifts in cell state" in line 36 that were observed)

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 60: missing comma "...gill filament structure, but also"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 62-63: further discussion here, or in the relevant sections of the specific genes identified in the referenced bulk RNA-seq project could help strengthen confidence in annotation

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 112: what bootstrapping strategy? Applied to what?

      This is a bootstrap sampling algorithm, developed in a recent bioRxiv preprint, that assesses the robustness of each cell cluster (Singh, P. & Zhai, Y. Deciphering Hematopoiesis at single cell level through the lens of reduced dimensions. bioRxiv, 2022.06.07.495099 (2022). https://doi.org/10.1101/2022.06.07.495099).

      Lines 127-129: What figures demonstrate the location of the inter lamina cells? Are there in situs that show this?

      We apologize for any errors; the referencing of figures in the manuscript has been revised for clarity.

      Lines 185-190: does literature support these as markers of SMCs? Are they known smooth muscle markers in other systems?

      We characterized the SMCs by the expression of LDL-associated protein, angiotensin-converting enzyme-like protein, and the "molecular spring" titin-like protein, all of which are commonly found in human vascular smooth muscle cells. Based on this analysis, we hypothesize that these cells belong to the smooth muscle cell category.

      Line 201: What is meant by "regulatory roles"?

      In this context, we are discussing the expression of genes encoding regulatory proteins, such as SOX transcription factors and secreted-frizzled proteins.

      Line 211: which markers disappeared? What in situs show this?

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 211: typo, "role" → "roll"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 214: what are these "hallmark genes"

      We apologize for the mistakes; here we are referring to the genes listed in Figure 4B. We have revised the manuscript accordingly.

      Line 220: are there meristem-like cells in metazoans? If so, this would be preferable to a comparison with plants.

      In this context, we are discussing the morphological characteristics of gill proliferative cell populations found in filibranch bivalves. These populations, namely PEPC, VEPC, and DEPC, consist of cells exhibiting morphological traits akin to those of plant cambial-zone meristem cells. These cells typically display small, round shapes with a high nucleus-to-plasma ratio. We acknowledge that while these terms are utilized in bivalve studies (citations below), they lack the robust support seen in model systems backed by molecular biology evidences. The present snRNA-seq data, however, may offer valuable cell markers for future comprehensive investigations.

      Leibson, N. L. & Movchan, O. T. Cambial zones in gills of Bivalvia. Mar. Biol. 31, 175-180 (1975). https://doi.org/10.1007/BF00391629

      Wentrup, C., Wendeberg, A., Schimak, M., Borowski, C. & Dubilier, N. Forever competent: deep-sea bivalves are colonized by their chemosynthetic symbionts throughout their lifetime. Environ. Microbiol. 16, 3699-3713 (2014). https://doi.org/10.1111/1462-2920.12597

      Cannuel, R., Beninger, P. G., McCombie, H. & Boudry, P. Gill development and its functional and evolutionary implications in the blue mussel Mytilus edulis (Bivalvia: Mytilidae). Biol. Bull. 217, 173-188 (2009). https://doi.org/10.1086/BBLv217n2p173

      Line 335: what is slingshot trajectory analysis? Does this differ from the pseudotime analysis?

      Slingshot is an algorithm that uses the principal graph of the cells to infer trajectories. It models trajectories as curves on the principal graph, capturing the progression and transitions between different cellular states.

      Both Slingshot and pseudotime analysis aim to infer cellular trajectories. Slingshot focuses on capturing branching patterns and is fully compatible with graphs generated by dimensionality reduction methods such as UMAP and PHATE, whereas pseudotime analysis aims to order cells along a continuous trajectory and does not rely on dimensionality reduction graphs. We used both in the manuscript for different purposes.

      Line 241: introduce FISH methodology earlier in the paper, when in situ images are first referenced

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 246-249: can you quantify the decrease in signal or calculate the concentration of symbiotes in the cells? Was 5C imaged whole? This can impact the fluorescent intensity in tissues of different thicknesses.

      We appreciate your comment. In Figure 5C, most of the typical gill filament is visible (the ventral tip and the mid part), except for the dorsal end. The gill filament of bathymodioline mussels has a simple structure: a single layer of bacteriocytes grows on the basal membrane. Consequently, the gill slices have a fairly uniform thickness (two layers of bacteriocytes with one layer of interlamina cells in between), which minimizes any potential impact on fluorescence intensity. At present, detailed quantification of intracellular symbionts would require continuous TEM or ultra-resolution confocal sections to reconstruct the bacteriocytes in 3D, which is beyond the scope of the current study. Therefore, fluorescence intensity remains the only method available to us for estimating bacterial density and distribution across the gill filament.

      Line 249: What is meant by 'environmental gradient?'

      Here we are referring to the gases needed for the symbionts’ chemosynthesis. We have revised the manuscript to make this clear.

      Lines 255-256: Were the results shown in the TEM images previously known? Not clear what novel information is conveyed in images Fig 5 C and D

      In Fig. 5C and D, we provide high-quality SEM and TEM images of a typical bacteriocyte that clearly show its morphology and subcellular machinery. These electron microscopy images give readers a comprehensive introduction to the cellular function of bacteriocytes. They also serve as supporting evidence for the bacteriocyte snRNA-seq data.

      Line 295-296: Can you elaborate on what types of solute carrier genes have been shown to be involved with symbioses?

      We appreciate the comment, and have revised the manuscript accordingly. The putative functions of the solute carriers can be found in Figure 5I.

      Line 297-301: Which genes from the bulk RNA-seq study? Adding more detail and references in cluster annotation would help readers better understand the justifications.

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 316 -322: Can you provide the values of the distances?

      We now provide the values in the main text, in addition to Fig. 6B. We also provide a supplementary table (Supplementary Table S19).

      Line 328: What are the gene expression patterns?

      We observed genes that are up- and down-regulated during starvation and reconstitution.

      Line 334-337: A visualization of the different expression levels of the specific genes in clusters between sites might be helpful to demonstrate the degree of difference between sites.

      We have prepared a new supplementary file showing the different expression levels.

      Line 337: Citation needed

      We appreciate the comment. Here, we hypothesize the cellular responses based on the genes’ functions and their expression patterns.

      Line 402-403: Cannot determine lineages from data presented. Need lineage tracing over time to determine this

      We acknowledge the necessity of conducting lineage tracing over time to validate this hypothesis. Nonetheless, in practical terms, it is difficult to obtain samples for testing this. It might be easier to test this hypothesis in their shallow-sea relatives, although in practice that is also very difficult.

      413-414: What are the "cell-type specific responses to environmental change"? It could be interesting to present these results in the "results and discussion" section

      These results are shown in Supplementary Figure S8.

      Line 419-424: Sampling details might go better earlier on in the paper, when the sampling scheme is introduced.

      We appreciate the comments. Here, we are discussing the limitations of our current study, not sampling details.

      Line 552: What type of sequencing? Paired end? How long?

      We conducted 150bp paired-end sequencing.

      556-563: More detail here would be useful to readers not familiar with the BD guide. Also be careful to cite the software used in analysis!

      The provided guide and handbook elucidate the intricacies of gene name preparation, data alignment to the genome, and the generation of an expression matrix. It is worth mentioning that we relied upon outdated versions of the aforementioned resources during our data analysis phase, as they were the only ones accessible to us at the time. However, we have since become aware of a newer pipeline available this year, rendering the information presented here of limited significance to other researchers utilizing BD.

      Many thanks for the kind reminder. We have now included a reference for STAR. All other software was cited accordingly. There are no scholarly papers or publications on the BD pipeline that we can cite.

      Line 577-578: How was the number of clusters determined? What is meant by "manually combine the clusters?" If cells were clustered by hand, more detail on the method is needed, as well as direct discussion and justification in the body of the paper.

      It would be more appropriate to emphasize the determination of cell types rather than clusters. The clusters were identified using a clustering function, as mentioned in the manuscript. It's important to note that the clustering function (in our case, the FindClusters function of Seurat) provides a general overview based on diffuse gene expression. Technically speaking, there is no guarantee that one cluster corresponds to a single cell type. Therefore, it is crucial to manually inspect the clustering results to assign clusters to the appropriate cell types. In some cases, multiple clusters may be assigned to the same cell type, while in other cases, a single cluster may need to be further subdivided into two or more cell types or sub-cell types, depending on the specific circumstances.

      For studies conducted on model species such as humans or mice, highly and specifically expressed genes within each cluster can be compared to known marker genes of cell types mentioned in previous publications, which generally suffices for annotation purposes. However, in the case of non-model species like Bathymodioline mussels, there is often limited information available about marker genes, making it challenging to confidently assign clusters to specific cell types. In such situations, in situ hybridisation proves to be incredibly valuable. In our study, WISH was employed to visualise the expression and morphology of marker genes within clusters. When WISH revealed the expression of marker genes from a cluster in a specific type of cell, we classified that cluster as a genuine cell type. Moreover, if WISH demonstrated uniform expression of marker genes from different clusters in the same cell, we assigned both clusters to the same cell type.

      We expanded the description of the strategy in the Method section.
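
      As an illustration only (not the code used in the study), the manual step described above, in which several clusters can be assigned to one cell type after inspecting marker expression, could be sketched as a simple lookup. The cluster numbers and the mapping table are hypothetical; only the cell-type names are taken from the manuscript.

```python
from collections import Counter

# Hypothetical cluster-to-cell-type table, filled in by hand after
# inspecting marker genes and in situ hybridisation results.
# Clusters 0 and 1 are merged into one cell type as an example.
CLUSTER_TO_CELL_TYPE = {
    0: "bacteriocyte",
    1: "bacteriocyte",
    2: "ciliary cell",
    3: "smooth muscle cell",
}


def annotate_cells(cell_clusters, mapping):
    """Translate per-cell cluster IDs into cell-type labels.

    Clusters that have not been annotated yet are labelled 'unassigned'
    so that they remain visible instead of being silently dropped.
    """
    return [mapping.get(cluster, "unassigned") for cluster in cell_clusters]


def count_cell_types(labels):
    """Count how many cells carry each cell-type label."""
    return dict(Counter(labels))


# Toy example: six cells with their cluster IDs (cluster 7 is unannotated).
toy_clusters = [0, 1, 1, 2, 3, 7]
toy_labels = annotate_cells(toy_clusters, CLUSTER_TO_CELL_TYPE)
print(toy_labels)
print(count_cell_types(toy_labels))
```

      Keeping the table explicit makes each merge decision easy to audit against the supporting in situ evidence.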

      Line 690-692: When slices were used, what part of the gill were they taken from?

      We sectioned the gill around the mid part which could represent the mature bacteriocytes.

      References to figures:

      General

      Please split the fluorescent images into different channels with an additional composite. It is difficult to see some of the expression patterns. It would also make it accessible to colorblind readers.

      We appreciate the comments and suggestions from the reviewer. We have converted our figures to CMYK colour, which should make the paper easier for colorblind readers to read.

      Please provide the number of replicates for each in situ and what proportion of those displayed the presented pattern.

      We appreciate the reviewer’s comments. We have explained this in the Materials and Methods section of the manuscript.

      Figure 2.C' is a fantastic summary and really helps the non-mussel audience understand the results. Adding schematics like this to Figures 3-5 would be helpful as well.

      We value the reviewer's comments. We propose that Figures 3K, 4C, and 5A-D could offer similar schematic explanations to assist the audience.

      Figure 2:

      Figures 2.C-F, 2.C', 2.H-J are not referenced in the text. Adding in discussions of them would help strengthen your discussions on the cluster annotation

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 2.B. 6 genes are highlighted in red and said to be shown in in situs, but only 5 are shown.

      We apologize for the mistake. We did not include the WISH result for 20639-0.0 in the present study. We have changed the label to black.

      Figure 3:

      Fig 2C-E not mentioned.

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 3.B 8 genes are highlighted in red and said to be shown in in situs. Only 6 are.

      The WISH results are provided in Supplementary Figures S4 and S5.

      Figure 3.K is not referenced in the legend.

      We appreciate the comment, and have revised the manuscript accordingly.

      Figure 4:

      In Figure D, it might be helpful to indicate the growth direction.

      We appreciate the comment, and have revised the manuscript accordingly by adding an arrow in panel D to indicate growth direction.

      4F: A double in situ with the symbiote marker is needed to demonstrate the nucleolin-like positive cells are symbiote free.

      We appreciate the comment. The symbiont-free region can be seen in Figure 5A.

      Figure 5:

      In 5.A, quantification of symbiote concentration would help support your conclusion that they are denser around the edges.

      We appreciate the comment. As mentioned above, detailed quantification of intracellular symbionts would require continuous TEM or ultra-resolution confocal sections to reconstruct the bacteriocytes in 3D, which is beyond the scope of the current study. Therefore, fluorescence intensity remains the only method available to us for estimating bacterial density and distribution across the gill filament.

      In 5.D, the annotation is not clear. Adding arrows like in 5.C would be helpful.

      We appreciate the comment, and have revised the manuscript accordingly.

      A few genes in 5.F are not mentioned in the paper body when listing other genes. Mentioning them would help provide more support for your clustering.

      We appreciate the comment, and have revised the manuscript accordingly.

      Is 5.I meant to be color coded with the gene groups from 5.F? Color Coding the gene names, rather than organelles or cellular structures might portray this better and help visually strengthen the link between the diagram and your dot plot.

      We appreciate the suggestions. We've experimented with color-coding the gene names, but some colors are less discernible against a white background.

      Figure 6:

      6.B Is there a better way to visualize this data? The color coding is confusing given the pairwise distances. Maybe heatmaps?

      We attempted a heatmap, as shown in the figure below. However, all co-authors agree that a bar plot provides clearer visualization than the heatmap. We agree that the color scheme may be confusing because it uses the same colors as those used for the individual treatments, so we have changed the colors.

      Author response image 1.

      Figure 6.D: Why is the fanmao sample divided in the middle?

      Fig. 6C shows that the single-cell trajectories include branches. The branches occur because cells execute alternative gene expression programs. Thus, in Fig. 6D, we show changes for genes that are significantly branch dependent in both lineages at the same time. Specifically, in cluster 2, the genes are upregulated during starvation but downregulated during reconstitution. Conversely, genes in cluster 1 are downregulated during starvation but upregulated during reconstitution. Note that Fig. 6D displays only a small subset of the significantly branch-dependent genes.

      Figure 6.D: Can you visualize the expression in the same format as in figures 2-5?

      We appreciate the comments from the reviewer. As far as we know, this heatmap is the best format for presenting this type of gene expression profile.

      Supplementary Figure S2:

      Please provide a key for the cell type abbreviations

      We appreciate the comment, and have added the abbreviations of cell types accordingly.

      Supplementary Figures S4 and S5:

      What part of the larger images are the subsetted image taken from?

      We appreciate the comment. These images were taken from the ventral tip and the mid part of the gill slices, respectively. We have revised the figure legends to make this clear.

      Supplemental Figure S7:

      If clusters 1 and 2 show genes up and downregulated during starvation, what do clusters 4 and 3 represent?

      Cluster 1: genes that are clearly upregulated during starvation and downregulated during reconstitution. Cluster 4: genes that are downregulated during reconstitution but not clearly upregulated during starvation.

      Cluster 2: genes that are upregulated during reconstitution. Cluster 3: genes that are clearly downregulated during starvation.

      Author response table 1.

      Supplemental Figure S8:

      This is a really interesting figure that I think shows some of the results really well! Maybe consider moving it to the main figures of the paper?

      We appreciate the comments and suggestions. We concur with the reviewer on the significance of the results presented. However, considering the length of this manuscript, we have prioritized the most pertinent information in the main figures. Supplementary materials containing additional figures and details on the genes involved in these pathways are provided for interested readers.

      Supplemental Figure S11:

      Switching the axes might make this image easier for the reader to interpret. Additionally, calculating the normalized contribution of each sample to each cluster could help quantify the extent to which bacteriocytes are reduced when starving.

      Thank you for the insightful suggestion, which we have implemented as detailed below. We acknowledge the importance of understanding the changes in bacteriocyte proportions across different treatments. However, it's crucial to note that the percentage of cells per treatment is highly influenced by factors such as the location of digestion and sequencing, as previously mentioned.

      Author response image 2.
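
      For readers who wish to reproduce the kind of normalization the reviewer suggests, a minimal sketch is given here. It is an assumption about one reasonable way to compute a "normalized contribution", not necessarily the calculation behind the image above: counts are first divided by the total number of cells in each sample, then rescaled so that the contributions within each cluster sum to 1. The sample and cluster names in the toy data are hypothetical.

```python
from collections import Counter


def normalized_contribution(cells):
    """Compute each sample's size-corrected share of every cluster.

    `cells` is an iterable of (sample, cluster) pairs, one per cell.
    Step 1 divides each (sample, cluster) count by the sample's total
    cell count, so samples with more captured cells do not dominate.
    Step 2 rescales those fractions so they sum to 1 within a cluster.
    """
    cells = list(cells)
    sample_totals = Counter(sample for sample, _ in cells)
    pair_counts = Counter(cells)

    fractions = {}
    for (sample, cluster), n in pair_counts.items():
        fractions.setdefault(cluster, {})[sample] = n / sample_totals[sample]

    result = {}
    for cluster, by_sample in fractions.items():
        total = sum(by_sample.values())
        result[cluster] = {s: f / total for s, f in by_sample.items()}
    return result


# Toy data: sample "A" has 4 cells, sample "B" has 10 cells.
toy_cells = (
    [("A", "cluster_1")] * 2
    + [("A", "cluster_2")] * 2
    + [("B", "cluster_1")] * 1
    + [("B", "cluster_2")] * 9
)
for cluster, shares in sorted(normalized_contribution(toy_cells).items()):
    print(cluster, {s: round(v, 3) for s, v in sorted(shares.items())})
```

      As the response notes, such proportions still depend on where tissue was dissociated and sequenced, so they should be read as descriptive rather than as a direct measure of cell loss.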

      Reviewer #2 (Recommendations For The Authors):

      The following are minor recommendations for the text and figures that may help with clarity:

      Fig. 3K: This figure describes water flow induced by different ciliary cells. It is not clear what the color of the arrows corresponds to, as they do not match the UMAP (i.e. the red arrow) and this is not indicated in the legend. Are these colours meant to indicate the different ciliary cell types? If so it would be helpful to include this in the legend.

      We appreciate the reviewer's comments and suggestions. The arrows indicate the water flow that might be driven by certain types of cilia. We have revised our figure and figure legend to make this clear.

      Line 369: The incorrect gene identifier is given for the mitochondrial trifunctional enzyme. This gene identifier is identical to the one given in line 366, which describes long-chain-fatty-acid-ligase ACSBG2-like (Bpl_scaf_28862-1.5).

      We appreciate the reviewer's comments and suggestions. We have revised our manuscript accordingly.

      Line 554: The Bioproject accession number (PRJNA779258) does not appear to lead to an existing page in any database.

      We appreciate the reviewer's comments and suggestions. We have released this Bioproject to the public.

      Line 597-598: it would be helpful to know the specific number of cells that the three sample types were downsampled to, and the number of cells remaining in each cluster, as this can affect the statistical interpretation of differential expression analyses.

      The number of cells per cluster in our analysis ranged from 766 to 14633. To mitigate potential bias introduced by varying cell numbers, we implemented downsampling, restricting the number of cells per cluster to no more than 3500. This ensured that cluster sizes differed by less than five-fold. We also tested other downsampling limits (4500 and 2500 cells) and consistently observed similar patterns across these variations.
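
      The arithmetic behind this cap can be checked with a short sketch. This is an illustration under stated assumptions, not the code used in the study: the function names, the random seed, and the two toy clusters are hypothetical, and only the cluster sizes (766 and 14633) and the cap (3500) come from the response above.

```python
import random


def downsample_clusters(cluster_to_cells, max_cells=3500, seed=0):
    """Randomly cap every cluster at `max_cells` cells.

    Clusters at or below the cap are kept unchanged; larger clusters are
    subsampled without replacement using a fixed seed for reproducibility.
    """
    rng = random.Random(seed)
    capped = {}
    for cluster, cells in cluster_to_cells.items():
        cells = list(cells)
        if len(cells) > max_cells:
            capped[cluster] = rng.sample(cells, max_cells)
        else:
            capped[cluster] = cells
    return capped


def max_fold_difference(cluster_to_cells):
    """Ratio of the largest to the smallest cluster size."""
    sizes = [len(cells) for cells in cluster_to_cells.values()]
    return max(sizes) / min(sizes)


# Toy clusters with the smallest and largest sizes reported above.
toy = {"smallest": range(766), "largest": range(14633)}
capped = downsample_clusters(toy, max_cells=3500)

print(f"before: {max_fold_difference(toy):.1f}-fold")
print(f"after:  {max_fold_difference(capped):.2f}-fold")
```

      With a cap of 3500 the largest-to-smallest ratio is 3500/766, roughly 4.6, which is consistent with the stated goal of keeping the difference below five-fold.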

      Data and code availability:

      The supplementary tables and supplementary data S1 appear to be the final output of the differential expression analyses. Including the raw data (e.g. reads) and/or intermediate data objects (e.g. count matrices, R objects), in addition to the code used to perform the analyses, may be very helpful for replication and downstream use of this dataset. As mentioned above, the Bioproject accession number appears to be incorrect.

      We appreciate the reviewer's comments and suggestions. Regarding our sequencing data, we have deposited all relevant information with the National Center for Biotechnology Information (NCBI) under Bioproject PRJNA779258. Additionally, we have requested the release of the Bioproject. Furthermore, as part of this round of revision, we have included the count matrices for reference.

      Reviewer #3 (Recommendations For The Authors):

      As noted in the public review, my only major concerns are around the treatment of progenitor cell populations. I am sympathetic to the challenges of these experiments but suggest a few possible avenues to the authors.

      First, there could be some demonstration that these cells in G. platifrons are indeed proliferative, using EdU incorporation labeling or a conserved epitope such as the phosphorylation of serine 10 in histone 3. It appears in Mytilus galloprovincialis that proliferating cell nuclear antigen (PCNA) and phospho-histone H3 have previously been used as good markers for proliferative cells (Maiorova and Odintsova 2016). The use of any of these markers along with the cell type markers the authors recover for PEBZCs for example would greatly strengthen the argument that these are proliferative cells.

      If performing these experiments would not be currently possible, the authors could use some computation approaches to strengthen their arguments. Based on conserved cell cycle markers and the use of Cell-Cycle feature analysis in Seurat could the authors provide evidence that these progenitors occupy the G2/M phase at a greater percentage than other cells? Other than the physical position of the cells is there much that suggests that these are proliferative? While I am more convinced by markers in VEPCs the markers for PEBZCs and DEPCs are not particularly compelling.

      While I do not think the major findings of the paper hinge on this, comments such as "the PBEZCs gave rise to new bacteriocytes that allowed symbiont colonization" should be taken with care. It is not clear that the PBEZCs are proliferative and there does not seem to be any direct evidence that PBEZCs (or DEPCs or VEPCS for that manner) are the progenitor cells through any sort of labeling or co-expression studies.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly. We especially appreciate the reviewer’s suggestions about the characterisation of the G. platifrons gill proliferative cell populations. In a separate research project, we have tested both cell division and cell proliferation markers on the proliferative cell populations. Although we are not able to include these results in the current manuscript, we are happy to share our preliminary results with the reviewer. Our results show that the proliferative cell populations, particularly the VEPCs, are positive for cell proliferation markers and contain a large number of mitotic cells.

      Author response image 3.

      Finally, there is a body of literature that has examined cell proliferation and zones of proliferation in mussels (such as Piquet, B., Lallier, F.H., André, C. et al. Regionalized cell proliferation in the symbiont-bearing gill of the hydrothermal vent mussel Bathymodiolus azoricus. Symbiosis 2020) or other organisms (such as Bird, A. M., von Dassow, G., & Maslakova, S. A. How the pilidium larva grows. EvoDevo. 2014) that could be discussed.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly (line 226-229).

      Minor comments also include:

      Consider changing the orientation of diagrams in Figure 2C' in relationship to Figure 2C and 2D-K.

      We appreciate the comments and suggestions from the reviewer. Figure 2 has been reorganized.

      For the diagram in Figure 3K, please clarify if the arrows drawn for the direction of inter lamina water flow is based on gene expression, SEM, or some previous study.

      We are grateful for the reviewer's valuable feedback and suggestions. The arrows in the figure indicate the direction of water flow that could be driven by specific types of cilia. Our prediction is based on both gene expression and SEM results. To clarify this point, we have revised the figure legend of Fig. 3.

      Please include a label for the clusters in Figure 5E for consistency.

      We have revised our Figure 5E to keep our figures consistent.

      Please include a note in the Materials and Methods for Monocle analysis in Figure 6.

      We conducted Monocle analyses using Monocle 2 and Monocle 3 in the R environment. We have revised the Materials and Methods to include further information on the analyses shown in Figure 6.

      In Supplement 2, the first column is labeled PEBC while the first row is labeled PEBZ versus all other rows and columns have corresponding names. I am guessing this is a typo and not different clusters?

      We appreciate the great effort of the reviewer in reviewing our manuscript. We have corrected the typo in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The authors' findings are primarily rooted in a series of well-conducted in vitro experiments using two CML cell lines, K562 and MEG-01. While the findings are interesting and novel, further work to corroborate these findings in primary CML samples would have greatly strengthened the potential real-world relevance of these discoveries. The authors appear to have some PBMCs from primary CML patients and a BM sample from a Ph+ ALL in which they performed western blot analyses (Fig 1). Couldn't these samples have been used to at least confirm some of the key discoveries? For example, the neddylation of BCR-ABL, or; sensitivity of primary leukemic cells to RAPSYN knockdown, and/or; phosphorylation of RAPSYN by SRC?

      We agree with your points and really appreciate your comments. To demonstrate the clinical relevance, we have conducted a series of experiments to address your concerns.

      (1) After thorough optimization of the transduction process, we were able to show that shRNA-mediated gene silencing of RAPSYN impaired the growth of primary CML samples. These additional data are presented as Figure 1D in the revised manuscript with the corresponding figure legend and description, lines 136-141.

      (2) We invested considerable time and effort in confirming the “key discoveries” despite the great technical difficulty. With 5 mL of PBMCs (per ethical approval) on hand, we managed to confirm BCR-ABL neddylation by IP in samples from two newly acquired CML patients. The results are presented in Figure 2F in the revised manuscript with the corresponding figure legend and description, lines 186-187.

      (2) The authors initially interrogated a fairly dated (circa 2009) microarray-based primary dataset to show that the increase in RAPSYN is primarily a post-transcriptional event, as mRNA levels are not different between healthy and CML samples. It would be interesting to see whether differences might be more readily seen in more recent RNA-seq datasets from CML patients, given the well-known differences in sensitivity between the two platforms. Additionally, I wonder if there would be transcriptional signatures of increased NEDDylation (or RAPSYN-induced NEDDylation) that could be interrogated in primary samples? Furthermore, there are proteomics datasets of CML cells made resistant to TKIs (through in vitro selection experiments) that could be interrogated for independent validation of the authors' discoveries. For example: from K562 cells, PMID: 30730747 or PMID: 34922009).

      Thank you very much for your constructive comments. Based on your suggestion, we have 1) analyzed the mRNA level of RAPSYN in the RNA-seq datasets GSE13159 (2009), GSE138883 (2020) and GSE140385 (2020), which indicated no difference between CML patients and healthy donors. We have included the results in Figure1-figure supplementary 1A and in the revised manuscript (lines 123-127); 2) examined the RNA levels of RAPSYN-related neddylation enzymes, including E1 (NAE1), E2 (UBE2M), NEDD8 and NEDP1, in these datasets, and likewise found no significant differences in these neddylation-related genes between CML patients and healthy donors (Supplementary Figure 2C, lines 168-172).

      We have also analyzed the proteomics datasets from PMID: 30730747 and PMID: 34922009 according to your suggestion. Unfortunately, no information on RAPSYN expression is available in these datasets. To avoid overlooking relevant data, we examined all CML-related proteomics datasets from 2002 to 2024 and still found no information on RAPSYN protein expression. Consequently, our finding of higher RAPSYN expression in the PBMCs of Ph+ patients appears to be the first such observation. We also believe that our results are more clinically relevant than those, if any, from cells obtained by in vitro selection.

      Reviewer #2 (Public Review):

      Most of the conclusions drawn in this paper are well supported by data, but some aspects of the data need to be clarified and extended:

      (1) The authors propose that targeting RAPSYN in Ph+ leukemia could have a high therapeutic index, suggesting that inhibition of RAPSYN may lead to cytotoxicity in Ph+ leukemia with high specificity and minimal side effects. To substantiate this assertion, the authors should investigate the impact on cell viability upon RAPSYN knockdown in non-Ph leukemic cell lines or HS-5 cells (similar to Figure 1C), despite their lower RAPSYN protein levels.

      We appreciate your valuable comments. When we used shRNA to knock down the expression of RAPSYN in HS-5 cells, it did not affect their growth. We have included the data in Figure 1C, modified its figure legend, and added the corresponding description, lines 136-141.

      (2) The authors intriguingly show that the protein levels of RAPSYN are significantly enriched in Ph+ patient samples and cell lines (Figure 1A, B), even though the mRNA levels remain unchanged (Supplementary Figure 1 A-C). This observation merits a clear explanation in the context of the presented results. The data in the manuscript does imply a feedforward loop mechanism (Figure 7), where BCR-ABL activates SRC, which subsequently stabilizes RAPSYN, which in turn helps protect BCR-ABL from c-CBL-mediated degradation. If this is the working hypothesis, it would be beneficial for the reader to see supporting evidence.

      Thank you very much for pointing out the issue. We have realized that Figure 7, which was originally included as a summary figure, was inappropriate. To avoid potential confusion, this figure has been deleted; this does not affect the results and conclusions of the study. In addition, the difference between the mRNA and protein levels is addressed in our response to Reviewer #1.

      (3) The authors present compelling evidence to suggest that RAPSYN may possess direct NEDD8-ligase activity on BCR-ABL. To strengthen this claim, it may be valuable to conduct further assays involving a ligase-deficient mutant, such as C366A, beyond its use in Figure 2J. Incorporating this mutant into the in vitro assay illustrated in Figure 2K, for instance, could offer substantial validation for the claim. In addition, showing whether the ligase-deficient mutant is capable of phenocopying the phosphorylation-mutant Y336F, as showcased in Figures 5E, F, and 6D, F, would be beneficial.

      We are grateful for your comments. In the manuscript, we have provided sufficient data to support the direct neddylation of BCR-ABL by RAPSYN, as you commented: “The authors present compelling evidence to suggest that RAPSYN may possess direct NEDD8-ligase activity on BCR-ABL.” Cys366 was previously demonstrated to be the catalytic residue essential for the E3 activity of RAPSYN (Li et al. 2016, PMID: 27839998), and the phosphorylation at Tyr336 was thoroughly verified by site-directed mutagenesis and by treatment with the SRC-specific inhibitor saracatinib in the present cellular experiments. Therefore, while we fully respect your opinion, we do not think it necessary to perform laborious in vitro reactions for expected negative results, which is why we did not conduct enzymatic reactions with known inactive mutants, such as C366A and Y336F, in the first place.

      (4) The observations presented in Figures 6 C-G require additional clarification. Notably, there are discrepancies in relative cell viability effects in K562 cells, and to some extent in MEG-01 cells, under conditions that are indicated as being either identical or highly similar. For instance, this inconsistency is observable when comparing the left panels of Figure 6C and 6D in the case of NC overexpression + shSRC#2, and the left panels of Figure 6E and 6G with NC overexpression or shNC, respectively. Listing potential causes of these discrepancies would strengthen the overall validity of the findings and their subsequent interpretation.

      Thank you for your comments; we apologize for the confusion. To allow a meaningful comparison, we have revised the Methods section “Preparation of stable RAPSYNWT, RAPSYNY336F or SRC expression cell lines” (lines 625-627) and reorganized Figure 6 to reflect the differences in the negative controls. We first used the LV6 (EF-1a/Puro; OE-NC1) vector for the overexpression of RAPSYNWT and SRC. Because of the low expression level with LV6 and the long time required for subsequent selection, we switched to LV18 (CMV/Puro; OE-NC2) for the overexpression of RAPSYNY336F. Since the sensitivities of K562/MEG01-OE-NC cells to shSRC transduction in Figure 6C (now revised to K562/MEG01-OE-NC1) and 6D (now revised to K562/MEG01-OE-NC2) were noticeably different, we have separated RAPSYNWT and RAPSYNY336F cells into 6C and 6D, each with its own corresponding empty vector as negative control, instead of merging the results into a single figure with one negative control of OE-NC. In addition, given that K562/MEG01 cells reacted differently to saracatinib treatment after transduction with the empty vector, we have also distinguished the negative controls as OE-NC1 in Figure 6E, OE-NC2 in Figure 6F and shNC in Figure 6G. In summary, the transduction of K562/MEG01 cells with different expression vectors and viral particles caused the discrepancies in the cell viability experiments, which has been clarified by reorganizing Figure 6 in the revision.

      (5) Throughout the manuscript, immunoblots which showcase immunoprecipitations of BCR-ABL or His-BCR-ABL depict poly-neddylation (e.g. Figures 2E-M, 3D-G, and 5A-E) and poly-ubiquitination (e.g. Figures 3D-G) patterns/smears where these patterns seem to extend below the molecular weight of BCR-ABL. To enhance clarity, it would be valuable for the authors to provide an explanation in the text or the figure legend for this observation. Is it reflective of potential degradation of BCR-ABL or is there another explanation behind it?

      Thank you for your valuable comments. After carefully checking the original immunoblots, we have ascertained that the BCR-ABL protein band was at 250 kDa and that the smeared bands appearing above 250 kDa were likely caused by the conjugation of NEDD8 (neddylation) or ubiquitin (ubiquitination) onto BCR-ABL. As for modified BCR-ABL appearing at a lower molecular weight than expected, whether this is a common feature as previously reported (Mao, J., et al, 2010, PMID: 21118980) or reflects possible degradation during the modification process or sample preparation requires further investigation. We have corrected the labeling of figures in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) It would really nail the real-world relevance of these nice findings if the authors are able to confirm some aspects of their cell line-based discoveries in publicly available 'omics datasets generated from primary CML samples. I have suggested some of these in the public review as well.

      Alternatively, if they are able to investigate samples from murine CML models (eg. BALB/c CML models), it would represent a step towards real-world relevance.

      Thank you very much for your constructive comments. Following your suggestion, we have examined and analyzed RAPSYN mRNA and protein in updated, publicly available datasets, as described in our public response.

      (2) The Discussion repeats some of the information already presented in the Introduction (for example, lines 311-327 of the merged document, or lines 349-358). I would urge the authors to instead expand more about how RAPSYN might be upregulated at the post-transcriptional level, or its potential post-translational regulation by SRC-mediated phosphorylation.

      Thank you for your constructive suggestion. We have rewritten this part accordingly and marked the changes in red in the revised manuscript (lines 319-325 and lines 351-378).

      (3) There are instances of clunky phrases/grammatical mistakes in the manuscript which detract from its readability (eg: lines 142-143: "...empty body transduced shRAPSN#3 or K562 cells into...."; lines 163-164: "Despite AChR subunits α7, M2, M3, and M4 were expressed in all tested cells, no change..."; line 178: "Preeminent BCR-ABL neddylation was detected in..."). A closer proof-reading of the final manuscript is advisable.

      We appreciate the valuable comments. We have made the corresponding improvements, which are marked in red in the revised manuscript (lines 145-147, lines 166-168 and line 185).

      (4) The western blot in Fig 5C (particularly the control "OE-NC" of K562) looks drastically different from the corresponding control lanes in Figs 5A and 5B. Similarly, the cell viability curves presented in Fig 6D and 6F (for both K562 and MEG-01, control conditions) look very different from the corresponding curves in Figs 6A and 6B.

      We appreciate your valuable comments. Because we accidentally used images with different exposure times, the western blots in Fig 5C (particularly the control "OE-NC" of K562) looked very different from the corresponding control lanes in Figs 5A and 5B. We have replaced them with images of the same exposure time in the revised manuscript.

      For clarity, we have revised the Methods section “Preparation of stable RAPSYNWT, RAPSYNY336F or SRC expression cell lines” (lines 625-627) and the related figure legends to reflect the differences.

      We have addressed the discrepancy in cell viability in our public response.

      Reviewer #2 (Recommendations For The Authors):

      In reviewing your study, I must insist that the completeness and robustness of your work would significantly benefit from a more exhaustive listing of the antibodies used for immunoblotting and immunoprecipitation within the Materials and Methods section. A number of antibodies have been accounted for, however, crucial ones targeting BCR-ABL, c-CBL, Ubiquitin, NEDD8, HA, Myc, and others appear to be omitted. To maintain rigorous scientific standards, I strongly encourage you to include these.

      We appreciate your comments. We have carefully checked the Methods section and added detailed information on the antibodies used for immunoblotting and immunoprecipitation in the revised manuscript (lines 502-516).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors have made important contributions to our understanding of the pathogenesis of erectile dysfunction (ED) in diabetic patients. They have identified the gene Lbh, expressed in pericytes of the penis and decreased in diabetic animals. Overexpression of Lbh appears to counteract ED in these animals. The authors also confirm Lbh as a potential marker in cavernous tissues in both humans and mice. While solid evidence supports Lbh's functional role as a marker gene, further research is needed to elucidate the specific mechanisms by which it exerts its effects. This work is of interest to those working in the fields of ED and angiogenesis.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the researchers aimed to investigate the cellular landscape and cell-cell interactions in cavernous tissues under diabetic conditions, specifically focusing on erectile dysfunction (ED). They employed single-cell RNA sequencing to analyze gene expression patterns in various cell types within the cavernous tissues of diabetic individuals. The researchers identified decreased expression of genes associated with collagen or extracellular matrix organization and angiogenesis in several cell types, including fibroblasts, chondrocytes, myofibroblasts, valve-related lymphatic endothelial cells, and pericytes. They also discovered a newly identified marker, LBH, that distinguishes pericytes from smooth muscle cells in mouse and human cavernous tissues. Furthermore, the study revealed that pericytes play a role in angiogenesis, adhesion, and migration by communicating with other cell types within the corpus cavernosum. However, these interactions were found to be significantly reduced under diabetic conditions. The study also investigated the role of LBH and its interactions with other proteins (CRYAB and VIM) in maintaining pericyte function and highlighted their potential involvement in regulating neurovascular regeneration. Overall, the manuscript is well-written and the study provides novel insights into the pathogenesis of ED in patients with diabetes and identifies potential therapeutic targets for further investigation.

      Comments on revised version:

      For Figure 4, immunofluorescent staining of LBH following intracavernous injections with lentiviruses is required to justify overexpression and tissue specificity.

      We agree with this comment. We have therefore performed immunofluorescent staining of LBH in cavernous tissues after infection with LBH O/E lentiviruses. We found that LBH expression is significantly decreased in the DM and DM+NC groups, whereas after infection with LBH O/E lentiviruses it is significantly increased, as shown in Supplementary Fig. 10. (Please see revised ‘Result’ and ‘Supplementary Fig. 10’)

      Reviewer #3 (Public Review):

      Bae et al. described the key roles of pericytes in cavernous tissues in diabetic erectile dysfunction using both mouse and human single-cell transcriptomic analysis. Erectile dysfunction (ED) is caused by dysfunction of the cavernous tissue and affects a significant proportion of men aged 40-70. The most common treatment for ED is phosphodiesterase 5 inhibitors; however, these are less effective in patients with diabetic ED. Therefore, there is an unmet need for a better understanding of the cavernous microenvironment, cell-cell communications in patients with diabetic ED, and the development of new therapeutic treatments to improve the quality of life.

      Pericytes are mesenchymal-derived mural cells that directly interact with capillary endothelial cells (ECs). They play a vital role in the pathogenesis of erectile function as their interactions with ECs are essential for penile erection. Loss of pericytes has been associated with diabetic retinopathy, cancer, and Alzheimer's disease and has been investigated in relation to the permeability of cavernous blood vessels and neurovascular regeneration in the authors' previous studies. This manuscript explores the mechanisms underlying the effect of diabetes on pericyte dysfunction in ED. Additionally, the cellular landscape of cavernous tissues and cell type-specific transcriptional changes were carefully examined using both mouse and human single-cell RNA sequencing in diabetic ED. The novelty of this work lies in the identification of a newly identified pericyte (PC)-specific marker, LBH, in mouse and human cavernous tissues, which distinguishes pericytes from smooth muscle cells. LBH not only serves as a cavernous pericyte marker, but its expression level is also reduced in diabetic conditions. The LBH-interacting proteins (Cryab and Vim) were further identified in mouse cavernous pericytes, indicating that these signaling interactions are critical for maintaining normal pericyte function. Overall, this study demonstrates the novel marker of pericytes and highlights the critical role of pericytes in diabetic ED.

      Comments on revised version:

      Bae and colleagues substantially improved the data quality and revised their manuscript "Pericytes contribute to pulmonary vascular remodeling via HIF2a signaling". While these revisions clarify some of the concerns raised, others remain. In my view, the following question must be addressed.

      In my prior question on #3, I completely disagree with the statement that "identified cells with pericyte-like characteristics in the walls of large blood vessels". The staining that the authors provided for LBH clearly labeled SMCs, not pericytes. Per Fig 2E, the authors are correct that LBH is colocalized with SMA+ cells (SMCs). However, the red signal from LBH clearly stains endothelial cells. In the rest of 2E and 2D, LBH is CD31- and their location suggests LBH stained for SMCs in the Aorta, Kidney vasculature, Dorsal vein, and Dorsal Artery.

      We respect the reviewer's comments and provide further justification in response to these concerns. We first performed double staining of LBH and CD31 on dorsal artery and dorsal vein tissues. We found that LBH-expressing cells are completely distinct from CD31-expressing cells (Figure 2D, indicated by arrows, and Supplementary Fig. 10A) and that expression is higher in veins than in arteries. This is consistent with previous understanding. In addition, in the double staining of LBH and α-SMA, we found that there was no overlap between LBH-expressing cells and α-SMA-expressing smooth muscle cells in the cavernosum tissues, but there was some overlap in the dorsal artery and dorsal vein (Figure 2E, indicated by arrows). This may indicate that LBH is expressed slightly differently in different types of blood vessels, which will require further experiments to confirm. In addition, to avoid confusion among readers, we have modified our previous discussion regarding the identification of cells with pericyte-like characteristics in the walls of large blood vessels. We removed the associated immunofluorescence staining in the aorta and kidneys and replaced it with staining of the dorsal artery and dorsal vein. (Please see revised ‘Result’ and ‘Figure 2’ and ‘Supplementary Fig. 10A’)

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment. The patch clamp experiments are comprehensive and overall solid but a direct demonstration of the role of these conductances in being necessary for surge generation (or at least having a direct physiological consequence on surge properties) is lacking, substantially reducing the impact of the findings.

      Strengths:

      (1) Examination of multiple types of calcium and potassium currents, both through electrophysiology and molecular biology.

      (2) Focus on arcuate kisspeptin neurons during the surge is relatively conceptually novel as the anteroventral periventricular nucleus (AVPV) kisspeptin neurons have received much more attention as the "surge generator" population.

      (3) The modeling studies allow for direct examination of manipulation of single and multiple conductances, whereas the electrophysiology studies necessarily require examination of each current in isolation. The construction of an arcuate kisspeptin neuron model promises to be of value to the reproductive neuroendocrinology field.

      We thank the reviewer for recognizing our comprehensive examination of Kiss1-ARH neurons through electrophysiological, molecular and computational modeling of their activity during the preovulatory surge, which as the reviewer pointed out is “conceptually novel.” With the addition of new experiments, we will bolster our argument that Kiss1-ARH neurons transition from synchronized firing to burst firing with the E2-mediated regulation of channel expression. We will address the weaknesses as follows:

      Weaknesses:

      (1) The novelty of some of the experiments needs to be clarified. This reviewer's understanding is that prior experiments largely used a different OVX+E2 treatment paradigm mimicking periods of low estradiol levels, whereas the present work used a "high E2" treatment model. However, Figures 10C and D are repeated from a previous publication by the same group, according to the figure legend. Findings from "high" vs. "low" E2 treatment regimens should be labeled and clearly separated in the text. It would also help to have direct comparisons between results from low E2 and high E2 treatment conditions.

      We will revise Figures 10C and 10D to include new findings on Tac2 and Vglut2 expression in Kiss1ARH neurons from OVX and E2-treated females. We did show the previously published data (Qiu, eLife 2018) to contrast with Figures 10E, F showing the downregulation of TRPC5 and GIRK2 channels following E2 treatment. Most importantly, our E2 treatment regime is clearly stated in the Methods and is exactly the same as that used previously (Qiu, eLife 2016 and Qiu, eLife 2018) for the induction of the LH surge in OVX mice (Bosch, Molecular and Cellular Endocrinology 2013).

      (2) In multiple places, links are made between the changes in conductances and the transition from peptidergic to glutamatergic neurotransmission. However, this relationship is never directly assessed. The data that come closest are the qPCR results showing reduced Tac2 and increased Vglut2 mRNA, but in the figure legend, it appears that these results are from a prior publication using a different E2 treatment regimen.

      In the revised Figure 1, we will now include a clear depiction of the transition from synchronized firing driven by NKB signaling in OVX females to burst firing driven by glutamate in E2-treated females. We have used the same E2 treatment paradigm as previously published (Qiu, eLife 2018).

      (3) Similarly, no recordings of arcuate-AVPV glutamatergic transmission are made so the statements that Kiss1ARH neurons facilitate the GnRH surge via this connection are still only conjecture and not supported by the present experiments.

      Using a horizontal hypothalamic slice preparation, we have shown that Kiss1-ARH neurons excite GnRH neurons via Kiss1ARH glutamatergic input to Kiss1AVPV neurons (summarized in Fig. 12, Qiu, eLife 2016). We do not think that it is necessary to repeat these experiments in the current manuscript.

      (4) Figure 1 is not described in the Results section and is only tenuously connected to the statement in the introduction in which it is cited. The relevance of panels C and D is not clear. In this regard, much is made of the burst firing pattern that arises after E2 treatment in the model, but this burst firing pattern is not demonstrated directly in the slice electrophysiology examples.

      We will revise Figure 1 to include new whole-cell, current clamp recordings documenting the burst firing in response to glutamate in E2-treated, OVX females.

      (5) In Figure 3, it would be preferable to see the raw values for R1 and R2 in each cell, to confirm that all cells were starting from a similar baseline. In addition, it is unclear why the data for TTA-P2 is not shown, or how many cells were recorded to provide this finding.

      Before initiating photo-stimulation for each Kiss1-ARH neuron, we adjust the resting membrane potential to -70 mV, as noted in each panel in Figure 3, through current injections. We will include new findings on the effects of the T-channel blocker TTA-P2 on slow EPSP in the revised Figure 3. The number of cells tested with each calcium channel blocker is depicted in each of the bar graphs summarizing the effects of the blockers.

      (6) In Figure 5, panel C lists 11 cells in the E2 condition but panel E lists data from 37 cells. The reason for this discrepancy is not clear.

      In Figure 5E, we measured the L-, N-, P/Q and R channel currents after pretreatment with TTA-P2 to block the T-type current, whereas in Figure 5C, we measured the current without TTA-P2.

      (7) In all histogram figures, it would be preferable to have the data for individual cells superimposed on the mean and SEM.

      In all revised Figures we will include the individual data points for the individual neurons.

      (8) The CRISPR experiments were only performed in OVX mice, substantially limiting interpretation with respect to potential roles for TRPC5 in shaping arcuate kisspeptin neuron function during the preovulatory surge.

      The TRPC5 channels are most important for generating slow EPSPs when expression of NKB is high in the OVX state. Conversely, the glutamatergic response becomes more significant when the expression of NKB and TRPC5 channels is muted. Therefore, the CRISPR experiments were specifically conducted in OVX mice to maximize the effects.

      (9) Furthermore, there are no demonstrations that the CRISPR manipulations impair or alter the LH surge.

      In this manuscript, our focus is on the cellular electrophysiological activity of the Kiss1ARH neurons in OVX and E2-treated females. Exploration of CRISPR manipulations related to the LH surge is certainly slated for future experiments, but these in vivo experiments are beyond the scope of these comprehensive cellular electrophysiological and molecular studies.

      (10) The time of day of slice preparation and recording needs to be specified in the Methods.

      We will provide the times of slice preparation and recordings in the revised Methods and Materials.

      Reviewer #2 (Public Review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin neurons coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of the neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels, and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense, glutamatergic burst-like firing pattern that could favor glutamate release from ARC kisspeptin neurons. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology, and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards. Robust statistical analyses are provided throughout, although some experiments (illustrated in Figures 7 and 8) do have rather low sample numbers.

      The impact of E2 on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      We thank the reviewer for recognizing that the “pharmacological and electrophysiological experiments appear of the highest standards” and “the addition of the computer modeling for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.” However, we agree with the reviewer that we need to provide a direct demonstration of “burst-like” firing of Kiss1-ARH neurons. We will address the weaknesses as follows:

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One has to do with the fact that "burst-like" firing that the authors postulate ARC kisspeptin neurons transition to after E2 replacement is only seen in computer simulations, and not in slice patch-clamp recordings. A more direct demonstration of the existence of this firing pattern, and of its prominence over neuropeptide-dependent sustained firing under conditions of high E2 would make a more convincing case for the authors' hypothesis.

      We will provide a more direct demonstration of the existence of this firing pattern in the whole-cell current clamp experiments in the revised Figure 1.

      In addition, and quite importantly, the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions (the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle) under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place. This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of these ionic currents will vary during the estrous cycle.

      We have published that the magnitude of the slow EPSP, which is TRPC5 channel mediated, varies throughout the estrous cycle in a manner similar to the difference found between OVX and E2-treated, OVX females (Figure 2, Qiu, eLife 2016). Moreover, TRPC5 channel mRNA expression, similar to that of the peptides, is downregulated by an E2 treatment (Figure 10 this manuscript) that mimics proestrus levels of the steroid (Bosch, Mol Cell Endocrinology 2013). Furthermore, the magnitude of ionic currents is directly proportional to the number of ion channels expressed in the plasma membrane, which we have found correlates with mRNA expression. Therefore, it is likely that the magnitude of these ionic currents will vary during the estrous cycle.

      Lastly, the results of some of the pharmacological and genetic experiments may be difficult to interpret as presented. For example, in Figure 3, although it is possible that blockade of individual calcium channel subtypes suppresses the slow EPSP through decreased calcium entry at the somato-dendritic compartment to sustain TRPC5 activation and the slow depolarization (as the authors imply), a reasonable alternative interpretation would be that at least some of the effects on the amplitude of the slow EPSP result from suppression of presynaptic calcium influx and, thus, decreased neurotransmitter and neuropeptide secretion. Along the same lines, in Figure 12, one possible interpretation of the observed smaller slow EPSPs seen in mice with mutant TRPC5 could be that at least some of the effect is due to decreased neurotransmitter and neuropeptide release due to the decreased excitability associated with TRPC5 knockdown.

      The reviewer raises a good point, but our previous findings clearly demonstrate that chelating intracellular calcium with BAPTA in whole-cell current clamp recordings abolishes the slow EPSP and persistent firing (Qiu, J. Neurosci 2021), which we have noted is the rationale for dissecting out the contribution of T, R, N, L and P/Q calcium channels to the slow EPSP in our current studies (revised Figure 3 will include the effects of T-channel blocker).

      However, to further bolster the argument for the post-synaptic contribution of the calcium channels to the slow EPSP and eliminate the potential presynaptic effects of calcium channel blockers on the postsynaptic slow EPSP amplitude, which may result from reduced presynaptic calcium influx and subsequently decreased neurotransmitter release, we will utilize an additional strategy. Specifically, we will measure the response to the externally administered TACR3 agonist senktide under conditions in which the extracellular calcium influx, as well as neurotransmitter and neuropeptide release, are blocked (new Figure 3).

    1. Author response:

      eLife assessment

      Unlocking the potential of molecular genetic tools (optogenetics, chemogenetics, sensors, etc.) for the study of systems neuroscience in nonhuman primates requires the development of effective regulatory elements for cell-type specific expression to facilitate circuit dissection. This study provides a valuable building block, by carefully characterizing the laminar expression profile of two viral vectors, one designed for general GABAergic neurons and the second for parvalbumin+ cell-type selective expression in the marmoset primary visual cortex. The authors provide solid evidence for the first enhancer S5E2 and incomplete evidence for the second one, h56D. This study contributes to our understanding of these tools but is limited by the understandably small number of animals used.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Federer et al. tested AAVs designed to target GABAergic cells and parvalbumin-expressing cells in marmoset V1. Several new results were obtained. First, AAV-h56D targeted GABAergic cells with >90% specificity, and this varied with serotype and layer. Second, AAV-PHP.eB.S5E2 targeted parvalbumin-expressing neurons with up to 98% specificity. Third, the immunohistochemical detection of GABA and PV was attenuated near viral injection sites.

      Strengths:

      Vormstein-Schneider et al. (2020) tested their AAV-S5E2 vector in marmosets by intravenous injection. The data presented in this manuscript are valuable in part because they show the transduction pattern produced by intraparenchymal injections, which are more conventional and efficient.

      Our manuscript additionally provides detailed information on the laminar specificity and coverage of these viral vectors, which was not investigated in the original studies.

      Weaknesses:

      The conclusions regarding the effects of serotype are based on data from single injection tracks in a single animal. I understand that ethical and financial constraints preclude high throughput testing, but these limitations do not change what can be inferred from the measurements. The text asserts that "...serotype 9 is a better choice when high specificity and coverage across all layers are required". The data presented are consistent with this idea but do not make a strong case for it.

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      A related criticism extends to the analysis of injection volume on viral specificity. Some replication was performed here, but reliability across injections was not reported. My understanding is that individual ROIs were treated as independent observations. These are not biological replicates (arguably, neither are multiple injection tracks in a single animal, but they are certainly closer). Idiosyncrasies between animals or injections (e.g. if one injection happened to hit one layer more than another) could have substantial impacts on the measurements. It remains unclear which results regarding injection volume or serotype would hold up had a large number of injections been made into a large number of marmosets.

      For the AAV-S5E2, we made a total of 7 injections (at least 2 at the same volume), all of which, irrespective of volume, resulted in high specificity and efficiency for PV interneurons. Our conclusion is that larger volumes are slightly less specific, but the differences are minimal and do not warrant additional injections. Additionally, all of our injections involved all cortical layers, and the ROIs we selected for counts encompassed reporter protein expression across all layers. To provide a better sense of the reliability of the results across injections, in the revised version of the manuscript we will provide a supplementary table with results for each injection case separately.

      Reviewer #2 (Public Review):

      This is a straightforward manuscript assessing the specificity and efficiency of transgene expression in marmoset primary visual cortex (V1), for 4 different AAV vectors known to target transgene expression to either inhibitory cortical neurons (3 serotypes of AAV-h56D-tdTomato) or parvalbumin (PV)+ inhibitory cortical neurons in mice. Vectors are injected into the marmoset cortex and then postmortem tissue is analyzed following antibody labeling against GABA and PV. It is reported that: "in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% efficiency, depending on viral serotype and cortical layer. AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency."

      These claims are largely supported but slightly exaggerated relative to the actual values in the results presented. In particular, the overall efficiency for the best h56D vectors described in the results is: "Overall, across all layers, AAV9 and AAV1 showed significantly higher coverage (66.1±3.9 and 64.9%±3.7)". The highest coverage observed is just in middle layers and is also less than 80%: "(AAV9: 78.5%±9.1; AAV1: 76.9%±7.4)".

      In the abstract, we indeed summarize the overall data and round up the decimals, and state that these percentages are upper bounds and that they vary by serotype and layer, while in the Results we report the detailed counts with decimals. To clarify this, in the revised version of the Abstract we will change 80% to 79% and emphasize even more clearly the dependence on serotype and layer. We will amend this sentence of the Abstract as follows: “We show that in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 79% efficiency, but this depends on viral serotype and cortical layer.”

      For the AAV-PHP.eB-S5E2 the efficiency reported in the abstract ("86-90%") is also slightly exaggerated relative to the results: "Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl."

      Indeed, the numbers in the Abstract are upper bounds; for example, efficiency in L4A/B with S5E2 reaches 90%. To further clarify this important point, in the revised abstract we will state: “AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency, depending on layer”.

      These data will be useful to others who might be interested in targeting transgene expression in these cell types in monkeys. Suggestions for improvement are to include more details about the vectors injected and to delete some comments about results that are not documented based on vectors that are not described (see below).

      Major comments:

      Details provided about the AAV vectors used with the h56D enhancer are not sufficient to allow assessment of their potential utility relative to the results presented. All that is provided is: "The fourth animal received 3 injections, each of a different AAV serotype (1, 7, and 9) of the AAV-h56D-tdTomato (Mehta et al., 2019), obtained from the Zemelman laboratory (UT Austin)." At a minimum, it is necessary to provide the titers of each of the vectors. It would also be helpful to provide more information about viral preparation for both these vectors and the AAVPHP.eB-S5E2.tdTomato. Notably, what purification methods were used, and what specific methods were used to measure the titers?

      We thank the Reviewer for this comment. In the revised version of the manuscript, we will provide a Table with titers of each viral vector injected as well as more information regarding viral preparation methods. In fact, the methods for viral preparation and purification are detailed in the original publications, so we feel it may be sufficient to cite the original papers.

      The first paragraph of the results includes brief anecdotal claims without any data to support them and without any details about the relevant vectors that would allow any data that might have been collected to be critically assessed. These statements should be deleted. Specifically, delete: "as well as 3 different kinds of PV-specific AAVs, specifically a mixture of AAV1-PaqR4-Flp and AAV1-h56D-mCherry-FRT (Mehta et al., 2019), an AAV1-PV1-ChR2-eYFP (donated by G. Horwitz, University of Washington)," and delete "Here we report results only from those vectors that were deemed to be most promising for use in primate cortex, based on infectivity and specificity. These were the 3 serotypes of the GABA-specific pAAV-h56D-tdTomato, and the PV-specific AAVPHP.eB-S5E2.tdTomato." These tools might in fact be just as useful or even better than what is actually tested and reported here, but maybe the viral titer was too low to expect any expression.

      These data are indeed anecdotal, and while we could delete them from the manuscript, as suggested by the Reviewer, we feel they could be useful information for the scientific community. They could prevent other labs from wasting resources, animals and time, particularly as some of these vectors have been reported to be selective and efficient in the primate cortex, which we have not been able to confirm. We made several injections in several animals of those vectors that either failed to infect a sufficient number of cells or turned out to be poorly specific. Therefore, the negative results have been consistent. But we agree with the Reviewer that our negative results could have depended on factors such as titer. In the revised version of the manuscript, we will provide a supplementary Methods section in which we will report the specifics of the vectors that failed in our hands (i.e. number of injections made in how many animals, volumes, survival time, and titers).

      Based on the description in the Methods it seems that no antibody labeling against TdTomato was used to amplify the detection of the transgenes expressed from the AAV vectors. It should be verified that this is the case - a statement could be added to the Methods.

      That is indeed the case. We used no immunohistochemistry to enhance the reporter proteins as this was unnecessary. The native, non-amplified tdT signal was strong.

      Reviewer #3 (Public Review):

      Summary:

      Federer et al. describe the laminar profiles of GABA+ and of PV+ neurons in marmoset V1. They also report on the selectivity and efficiency of expression of a PV-selective enhancer (S5E2). Three further viruses were tested, with a view to characterizing the expression profiles of a GABA-selective enhancer (h56d), but these results are preliminary.

      Strengths:

      The derivation of cell-type specific enhancers is key for translating the types of circuit analyses that can be performed in mice - which rely on germline modifications for access to cell-type specific manipulation - in higher-order mammals. Federer et al. further validate the utility of S5E2 as a PV-selective enhancer in NHPs.

      Additionally, the authors characterize the laminar distribution pattern of GABA+ and PV+ cells in V1. This survey may prove valuable to researchers seeking to understand and manipulate the microcircuitry mediating the excitation-inhibition balance in this region of the marmoset brain.

      Weaknesses:

      Enhancer/promoter specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      This is an important point that was also brought up by Reviewer 1, which we thoroughly addressed in our comments. For clarity and convenience, we copied our response to Reviewer 1 below:

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      The language used throughout conflates the cell-type specificity conferred by the regulatory elements with that conferred by the serotype of the virus.

      In the revised version of the manuscript we will correct ambiguous language.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General responses to the weaknesses of this work:

      The two reviewers mentioned two major weaknesses of this work:

      (1) The one unexplained step in this intricately described mechanism is how HSCB functions to promote TACC3 degradation. It appears that the proteasome is involved since MG-132 reverses the effect of HSCB deficiency, but no other details are provided. Does HSCB target TACC3 for ubiquitination somehow? Future studies will be required to understand this portion of the mechanism.

      We totally agree that the detailed mechanisms through which HSCB promotes TACC3 degradation should be clarified. We tried to find the ubiquitin ligases involved in this regulatory process but could not identify such a key protein so far. We also investigated whether HSCB itself is a ubiquitin ligase but found that the protein does not possess this activity. We therefore consider this weakness another limitation of this research and have added one sentence to the penultimate paragraph of the Discussion section to address this issue.

      (2) This study only uses cell models. The significance of this work may be broadened by further studies using animal models.

      We totally agree that in vivo models should be adopted to validate the major findings of this study. As we stated in the penultimate paragraph of the Discussion section, we did not have access to biological samples from the patient harboring the HSCB mutation. Additionally, HSCB constitutive knockout mice died during the embryonic stage, while conditional knockout did not cause embryonic death but resulted in almost no erythroid cells in the bone marrow. Therefore, we were not able to further validate our findings in in vivo models.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Figure 3A - Should include FOG1 on the total cell lysate blots to show if total FOG1 is changing or only the cytoplasmic/nuclear ratio. This is shown later but would be good to include here.

      We would like to thank the reviewer for the nice suggestion. We have added the blots for total FOG1 to updated Figure 3A as requested.

      • Figures 3C and 4F - Should include the qPCR results from control cultures on the graphs (EPO + CRISPR NC and shNC, respectively).

      We would like to thank the reviewer for the good suggestion. We have added the control groups for all qPCR assays to the updated figures throughout the study.

      • Figure 4 - The addition of genetic manipulation of TACC3 to confirm its role in the cytoplasmic retention of FOG1 and failed erythroid differentiation in HSCB-deficient cells would strengthen the conclusions of this figure.

      We would like to thank the reviewer for the good suggestion. We initially tried to knock down TACC3 expression through siRNAs to confirm its role in the cytoplasmic retention of FOG1. However, we found that siRNAs that worked well in untreated K562 and erythroid progenitor cells as well as several other cell lines had poor efficiency of knocking down gene expression upon HSCB deficiency. This happened not only to siRNAs targeting TACC3, but also to those targeting several other genes. Interestingly, gene overexpression plasmids worked especially well in HSCB-deficient cells. We were not able to explain these phenomena and chose to use an inhibitor of TACC3 to study its functional implications in this research.

      • Text should be added to discuss the implications of this work for the lineage-specifying function of GATA-1. There are papers by John Crispino and Alan Cantor/Stu Orkin using the FOG-binding mutant of GATA-1 that implicate FOG1-dependent GATA-1 activity as Meg/Ery specifying, whereas FOG1-independent GATA-1 activity promotes mast cell or eosinophil fate. This work suggests that GATA1-expressing myeloid progenitors where FOG1 is kept cytoplasmic (no EPO signaling) would be driven towards the mast cell fate.

      We would like to thank the reviewer for the valuable suggestion. We have added a new paragraph in the Discussion section of the updated manuscript to discuss the implication of this work for the lineage-specifying function of GATA-1.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      (1) In the model provided in Figure 7H, HSCB and FOG1 bind TACC3 simultaneously. However, based on the data provided in Figure 6B and other figures, it seems that their interactions are more likely to be mutually exclusive. Is there a possibility that, besides inducing the degradation of TACC3, the binding of HSCB can inhibit the interaction between TACC3 and FOG1?

      We would like to thank the reviewer for the insightful comment. According to the data presented in the updated Figure 5F, TACC3 can simultaneously bind with HSCB and FOG1 in E 2-day HSCs. That is why we depict the simultaneous binding pattern in the model provided in Figure 7H. However, we agree that there is a possibility that the binding of HSCB can inhibit the interaction between TACC3 and FOG1 and have mentioned this possibility in the “Phosphorylation of HSCB by PI3K was necessary for its functionalization during human erythropoiesis” subsection of the “Results” section in the updated manuscript.

      (2) Is the decreased TACC3 protein abundance (Figure 5D) during erythroblast differentiation mainly due to the effect of HSCB? Can silencing of HSCB block this reduction?

      We would like to thank the reviewer for the great question. We have analyzed the protein abundance of TACC3 in HSCB-deficient hematopoietic stem cells induced for erythropoiesis for 0, 2 and 4 days and summarized the results as a new Figure 5E. According to the results, TACC3 protein abundance in HSCB-deficient hematopoietic stem cells exhibited no obvious change when the cells were induced for erythropoiesis for 0, 2 and 4 days. These results suggest that the decreased TACC3 protein abundance during early erythroblast differentiation was indeed due to the effect of HSCB. We only investigated the effect of HSCB on TACC3 abundance in early erythroid progenitors because, as shown in Figure 1, HSCB-deficient hematopoietic stem cells stopped differentiation at an early phase of their erythropoiesis. We have also mentioned these data in the “HSCB facilitated FOG1 nuclear translocation by binding with and mediating the proteasomal degradation of TACC3 upon activation of the EPO/EPOR signaling” subsection of the “Results” section in the updated manuscript.

      (3) This study shows that HSCB can be phosphorylated by PI3K, and this modification is important for its role in regulating FOG1 distribution. Does the phosphorylation of HSCB also affect its function in ISC biogenesis?

      We would like to thank the reviewer for the instructive question. We have analyzed the mitochondrial and cytosolic aconitase activities in wortmannin-treated K562 and E 2-day HSCs and their respective controls. The results have been summarized as a new Figure S5. According to the results, wortmannin treatment did not significantly affect mitochondrial and cytosolic aconitase activities. Therefore, it seems that HSCB phosphorylation does not affect its function in ISC biogenesis. We have also mentioned these data in the “Phosphorylation of HSCB by PI3K was necessary for its functionalization during human erythropoiesis” subsection of the “Results” section in the updated manuscript.

      (4) The method of isolation of nuclear fraction needs to be provided in the "Materials and Methods" section.

      We would like to thank the reviewer for the thoughtful suggestion. We have added the required information to the “Nuclear proteomics analysis” subsection of the "Materials and Methods" section in the updated manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Following small molecule screens, this study provides convincing evidence that 7,8 dihydroxyflavone (DHF) is a competitive inhibitor of pyridoxal phosphatase. These results are important since they offer an alternative mechanism for the effects of 7,8 dihydroxyflavone in cognitive improvement in several mouse models. This paper is also significant due to the interest in the protein phosphatases and neurodegeneration fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zink et al set out to identify selective inhibitors of the pyridoxal phosphatase (PDXP). Previous studies had demonstrated improvements in cognition upon removal of PDXP, and here the authors reveal that this correlates with an increase in pyridoxal phosphate (PLP; PDXP substrate and an active coenzyme form of vitamin B6) with age. Since several pathologies are associated with decreased vitamin B6, the authors propose that PDXP is an attractive therapeutic target in the prevention/treatment of cognitive decline. Following high throughput and secondary small molecule screens, they identify two selective inhibitors. They follow up on 7, 8 dihydroxyflavone (DHF). Following structure-activity relationship and selectivity studies, the authors then solve a co-crystal structure of 7,8 DHF bound to the active site of PDXP, supporting a competitive mode of PDXP inhibition. Finally, they find that treating hippocampal neurons with 7,8 DHF increases PLP levels in a WT but not PDXP KO context. The authors note that 7,8 DHF has been used in numerous rodent neuropathology models to improve outcomes. 7, 8 DHF activity was previously attributed to activation of the receptor tyrosine kinase TrkB, although this appears to be controversial. The present study raises the possibility that it instead/also acts through modulation of PLP levels via PDXP, and is an important area for future work.

      Strengths:

      The strengths of the work are in the comprehensive, thorough, and unbiased nature of the analyses revealing the potential for therapeutic intervention in a number of pathologies.

      Weaknesses:

      Potential weaknesses include the poor solubility of 7,8 DHF that might limit its bioavailability given its relatively low potency (IC50= 0.8 uM), which was not improved by SAR. However, the compound has an extended residence time and the co-crystal structure could aid the design of more potent molecules and would be of interest to those in the pharmaceutical industry. The images related to crystal structure could be improved.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors performed a screening for PDXP inhibitors to identify compounds that could increase levels of pyridoxal 5'- phosphate (PLP), the co-enzymatically active form of vitamin B6. For the screening of inhibitors, they first evaluated a library of about 42,000 compounds for activators and inhibitors of PDXP and secondly, they validated the inhibitor compounds with a counter-screening against PGP, a close PDXP relative. The final narrowing down to 7,8-DHF was done using PLP as a substrate and confirmed the efficacy of this flavonoid as an inhibitor of PDXP function. Physiologically, the authors show that, by acutely treating isolated wild-type hippocampal neurons with 7,8-DHF they could detect an increase in the ratio of PLP/PL compared to control cultures. This effect was not seen in PDXP KO neurons.

      Strengths:

      The screening and validation of the PDXP inhibitors have been done very well because the authors have performed crystallographic analysis, a counter screening, and mutation analysis. This is very important because such rigor has not been applied to the original report of 7,8 DHF as an agonist for TrkB, which is why there is so much controversy on this finding.

      Weaknesses:

      As mentioned in the summary report the study may benefit from some in vivo analysis of PLP levels following 7,8-DHF treatment, although I acknowledge that it may be challenging because of the working out of the dosage and timing of the procedure.

      Reviewer #3 (Public Review):

      This is interesting biology. Vitamin B6 deficiency has been linked to cognitive impairment. It is not clear whether supplements are effective in restoring functional B6 levels. Vitamin B6 is composed of pyridoxal compounds and their phosphorylated forms, with pyridoxal 5-phosphate (PLP) being of particular importance. The levels of PLP are determined by the balance between pyridoxal kinase and phosphatase activities. The authors are testing the hypothesis that inhibition of pyridoxal phosphatase (PDXP) would arrest the age-dependent decline in PLP, offering an alternative therapeutic strategy to supplements. Published data illustrating that ablation of the Pdxp gene in mice led to increases in PLP levels and improvement in learning and memory trials are consistent with this hypothesis.

      In this report, the authors conduct a screen of a library of ~40k small molecules and identify 7,8-dihydroxyflavone (DHF) as a candidate PDXP inhibitor. They present an initial characterization of this micromolar inhibitor, including a co-crystal structure of PDXP and 7,8-DHF. In addition, they demonstrate that treatment of cells with 7,8-DHF increases PLP levels. Overall, this study provides further validation of PDXP as a therapeutic target for the treatment of disorders associated with vitamin B6 deficiency and provides proof-of-concept for inhibition of the target with small-molecule drug candidates.

      Strengths include the biological context, the focus on an interesting and under-studied class of protein phosphatases that includes several potential therapeutic targets, and the identification of a small molecule inhibitor that provides proof-of-concept for a new therapeutic strategy. Overall, the study has the potential to be an important development for the phosphatase field in general.

      Weaknesses include the fact that the compound is very much an early-stage screening hit. It is an inhibitor with micromolar potency for which mechanisms of action other than inhibition of PDXP have been reported. Extensive further development will be required to demonstrate convincingly the extent to which its effects in cells are due to on-target inhibition of PDXP.

      Recommendations for the authors:

      There is general agreement that the study represents an advance regarding the mechanisms of pyridoxal phosphatase and 7,8 DHF. From the reviewers' comments, several major questions and considerations are raised, followed by their detailed remarks:

      (1) More analysis of the solubility and dose of 7,8 DHF with regard to the 50% inhibition and the salt bridge of the B protomer, as raised by the reviewers.

      (2) Is there a possible involvement of another phosphatase?

      (3) Does 7,8 DHF cause an effect upon TrkB tyrosine phosphorylation?

      We thank the Reviewers and Editors for their fair and constructive comments and suggestions. We have performed additional experiments to address these questions and considerations. In addition, we have generated two new high-resolution (1.5 Å) crystal structures of human PDXP in complex with 7,8-DHF that substantially expand our understanding of 7,8-DHF-mediated PDXP inhibition. The scientist who performed this work for the revision of our manuscript has been added as an author (shared first authorship).

      We believe that the insights gained from these new data have further strengthened and improved the quality of our manuscript. Together, our data provide compelling evidence that 7,8-dihydroxyflavone is a direct and competitive inhibitor of pyridoxal phosphatase.

      Please find our point-by-point responses to the Public Reviews that are not addressed in the Recommendations for the Authors, and the Recommendations for the Authors below.

      Reviewer #2:

      As mentioned in the summary report the study may benefit from some in vivo analysis of PLP levels following 7,8-DHF treatment, although I acknowledge that it may be challenging because of the working out of the dosage and timing of the procedure.

      We agree that an in vivo analysis of PLP levels following 7,8-DHF treatment could be informative for the further evaluation of a possible mechanistic link between the reported effects of this compound and PDXP/vitamin B6. However, we currently do not have a corresponding animal experimentation permission in place and are unlikely to obtain such a permit within a reasonable time frame for this revision.

      Recommendations For The Authors:

      Reviewer #1:

      The work is already well-written, comprehensive, and convincing.

      Suggestions that could improve the manuscript.

      (1) Include a protein tyrosine phosphatase (PTP) in the selectivity analysis. One possibility is that 7,8 DHF acts on a PTP (such as PTP1B), leading to TrkB activation by preventing dephosphorylation. I note that a previous study has looked at SAR for flavones with PTP1B (PMID: 29175190), which is worth discussion.

      We thank the reviewer for bringing this interesting possibility to our attention. We were not aware of the SAR study for flavonoids with PTP1B by Proenca et al. but have now tested the effect of 7,8-DHF on PTP1B, referring to this paper. As shown in Figure 2d, PTP1B was not inhibited by 7,8-DHF at a concentration of 5 or 10 µM. At the highest tested concentration of 40 µM, 7,8-DHF inhibited PTP1B merely by ~20%. For comparison, compound C13 (3-hydroxy-7,8-dihydroxybenzylflavone-3’,4’dihydroxymethyl-phenyl), which emerged as the most active flavonoid in the SAR study by Proenca et al. inhibited PTP1B with an IC50 of 10 µM. Consistent with the results of these authors, our finding confirms that less polar substituents, such as O-benzyl groups at positions 7 and 8, and O-methyl groups at positions 3’ and 4’ of the flavone scaffold, are important for the ability of flavonoids to effectively inhibit PTP1B. We conclude that PTP1B inhibition by 7,8-DHF is unlikely to be a primary contributor to the reported cellular and in vivo effects of this flavone.

      In addition to PTP1B, we have now tested the effect of 7,8-DHF on the serine/threonine protein phosphatase calcineurin/PP2B, the DNA/RNA-directed alkaline phosphatase CIP, and three other metabolite-directed HAD phosphatases, namely NANP, NT5C1A and PNKP. PP2B, CIP and NANP were not inhibited by 7,8-DHF. Similar to PTP1B, PNKP activity was attenuated (~30%) only at 40 µM 7,8-DHF. In contrast, 7,8-DHF effectively inhibited NT5C1A (IC50 ~10 µM). NT5C1A is an AMP hydrolase expressed in skeletal muscle and heart. To our knowledge, a role of NT5C1A in the brain has not been reported. Based on currently available information, the inhibition of NT5C1A therefore appears unlikely to contribute to 7,8-DHF effects in the brain.

      The results of these experiments are shown in the revised Figure 2d. Taken together, the extended selectivity analysis of 7,8-DHF on a total of 12 structurally and functionally diverse protein- and nonprotein-directed phosphatases supports our initial conclusion that 7,8-DHF preferentially inhibits PDXP.

      (2) Line 144: It is unclear how fig 2c supports the statement here. Remove call out for clarity.

      Our intention was to highlight the fact that 7,8-DHF concentrations >12.5 µM could not be tested in the BLI assay (shown in Figure 2c) due to 7,8-DHF solubility issues under these experimental conditions. However, since this is discussed in the text, but not directly visible in Figure 2c, we agree with the Reviewer and have removed this call out.

      (3) Figure 3a. It is difficult to see the pink 7,8 DHF on top of the pink ribbon backbone. A better combination of colours could be used. Likewise in Figure 3b it is pink on pink again.

      We have improved the combination of colors to enhance the visibility of 7,8-DHF and have consistently color-coded murine and the new human PDXP structures throughout the manuscript.

      (4) Figure 3c and d. These are the two protomers I believe, but the colour coding is not present in 3c where the ribbon is now gray. Please choose colours that can be used to encode protomers throughout the figure.

      Please see response to point 3 above.

      (5) Figure 3f. I think this is the same protomer as 3c but a 180-degree rotation. Could this be indicated, or somehow lined up between the two figures for clarity? It would also be useful to have 3e in the same orientation as 3f, to better visualise the overlap with PLP binding. PLP and 7,8 DHF could be labelled similarly to the amino acids in 3f (the colour coding here is helpful).

      Please see response to point 3 above. We have substantially revised the structural figures and have used consistent color coding and the same perspective of 7,8-DHF in the PDXP active sites.

      (6) Figure 3g. The colours of the bars relating to specific mutations do not quite match the colours in Figure 3f, which I think was the aim and is very helpful.

      We have adapted the colours of the residues in Figure 3f (now Fig. 3b and additionally Fig. 3 – figure supplement 1e) so that they exactly match the colours of the bars in Figure 3g (now Fig. 3d).

      Reviewer #2:

      No further comments.

      Reviewer #3:

      Page 4: The authors describe 7,8DHF as a "selective" inhibitor of PDXP - in my opinion, they do not have sufficient data to support such a strong assertion. Reports that 7,8DHF may act as a TRK-B-agonist already highlight a potential problem of off-target effects. Does 7,8DHF promote tyrosine phosphorylation of TRK-B in their hands? The selectivity panel presented in Figure 2, focusing on 5 other HAD phosphatases, is much too limited to support assertions of selectivity.

      We agree with the Reviewer that our previous selectivity analysis with six HAD phosphatases was limited. To further explore the phosphatase target spectrum of 7,8-DHF, we have now analyzed six other enzymes: three other non-HAD phosphatases (the tyrosine phosphatase PTP1B, the serine/threonine protein phosphatase PP2B/calcineurin, and the DNA/RNA-directed alkaline phosphatase/CIP) and three other non-protein-directed C1/C0-type HAD phosphatases (NT5C1A, NANP, and PNKP). The C1-capped enzymes NT5C1A and NANP were chosen because we previously found them to be sensitive to small molecule inhibitors of the PDXP-related phosphoglycolate phosphatase PGP (PMID: 36369173). PNKP was chosen to increase the coverage of C0-capped HAD phosphatases (previously, only the C0-capped MDP1 was tested).

      We found that calcineurin, CIP and NANP were not inhibited by up to 40 µM 7,8-DHF. The activities of PTP1B and PNKP were attenuated (by ~20 and 30%, respectively) only at 40 µM 7,8-DHF. In contrast, 7,8-DHF effectively inhibited NT5C1A (IC50 ~10 µM). We have previously found that NT5C1A was sensitive to small-molecule inhibitors of the PDXP paralog PGP, although these molecules are structurally unrelated to 7,8-DHF (PMID: 36369173). NT5C1A is an AMP hydrolase expressed in skeletal muscle and heart (PMID: 12947102). To our knowledge, a role of NT5C1A in the brain has not been reported. Based on currently available information, the inhibition of NT5C1A therefore appears unlikely to contribute to 7,8-DHF effects in the brain. The results of these experiments are shown in the revised Figure 2d. Taken together, the extended selectivity analysis of 7,8-DHF on a total of 12 structurally and functionally diverse protein- and non-protein-directed phosphatases supports our initial conclusion that 7,8-DHF preferentially inhibits PDXP. To nevertheless avoid any overstatement, we have now also replaced “selective” by “preferential” in this context throughout the manuscript.

      We have not tested if 7,8-DHF promotes tyrosine phosphorylation of TRK-B. Being able to detect 7,8-DHF-induced TRK-B phosphorylation in our hands would not exclude an additional role for PDXP/vitamin B6-dependent processes. Not being able to detect TRK-B phosphorylation may indicate absence of evidence or evidence of absence. This would neither conclusively rule out a biological role for 7,8-DHF-induced TRK-B phosphorylation in vivo, nor contribute further insights into a possible involvement of vitamin B6-dependent processes in 7,8-DHF-induced effects.

      Page 6: The authors report that they obtained only two PDXP-selective inhibitor hits from their screen; 7,8DHF and something they describe as FMP-1. For the later, they state that it "was obtained from an academic donor, and its structure is undisclosed for intellectual property reasons". In my opinion, this is totally unacceptable. This is an academic research publication. If the authors wish to present data, they must do so in a manner that allows a reader to assess their significance; in the case of work with small molecules that includes the chemical structure. In my opinion, the authors should either describe the compound fully or remove mention of it altogether.

      We are unable to describe “FMP-1” because its identity has not been disclosed to us. The academic donor of this molecule informed us that they were not able to permit release of any details of its structure or general structural class due to an emerging commercial interest.

      We mentioned FMP-1 simply to highlight the fact that the screening campaign yielded more than one inhibitor. FMP-1 was also of interest due to its complete inhibition of PDXP phosphatase activity.

      Because the structure of this molecule is unknown to us, we have now removed any mention of this compound in the manuscript. For the same reason, we have removed the mention of the inhibitor hits “FMP-2” and “FMP-3” in Figure 2 – figure supplement 1 and Figure 2 – figure supplement 2. The number of PDXP inhibitor hits in the manuscript has been adapted accordingly.

      Page 7: The observed plateau at 50% inhibition requires further explanation. It is not clear how poor solubility of the compound explains this observation. For example, the authors state that "due to the aforementioned poor solubility of 7,8DHF, concentrations higher than 12.5µM could not be evaluated". Yet on page 8, they describe assays against the specificity panel at concentrations of compound up to 40µM. Do the analogues of 7,8DHF (Fig 2b) result in >50% inhibition at higher concentrations? Further explanation and data on the solubility of the compounds would be of benefit.

      We currently do not have a satisfactory explanation for the apparent plateau of ~50% PDXP inhibition by 7,8-DHF. Resolving this question will likely require other approaches, including computational chemistry such as molecular dynamics simulations, and we feel that this is beyond the scope of the present manuscript.

      We previously speculated that the limited solubility of 7,8-DHF may counteract a complete enzyme inhibition if higher concentrations of this molecule are required. Specifically, we referred to Todd et al. who have performed HPLC-UV-based solubility assays of 7,8-DHF (ref. 35). These authors found that immediately after 7,8-DHF solubilization, nominal 7,8-DHF concentrations of 5, 20 or 50 µM resulted in 0.5, 3.0 or 13 µM of 7,8-DHF in solution (i.e., 10, 15 or 26% of the respective nominal concentration). Seven hours later, 46, 26 or 26% of the respective nominal 7,8-DHF concentrations were found in solution. Hence, above a nominal concentration of 5 µM, 7,8-DHF solubility does not increase linearly with the input concentration, but plateaus at ~20% of the nominal concentration. This phenomenon could potentially contribute to the apparent plateau of human or murine PDXP inhibition by 7,8-DHF in vitro.
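      The solubility figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the variable and function names are illustrative assumptions, and the concentrations at seven hours are derived here from the quoted percentages rather than reported values.

```python
# Nominal and measured 7,8-DHF concentrations (in µM) immediately after
# solubilization, as quoted in the text from Todd et al. (ref. 35).
nominal_uM = [5.0, 20.0, 50.0]
measured_uM = [0.5, 3.0, 13.0]

# Percent of the nominal concentration found in solution after seven hours,
# as quoted in the text.
percent_after_7h = [46.0, 26.0, 26.0]


def percent_in_solution(nominal, measured):
    """Return the measured concentration as a rounded percentage of nominal."""
    return [round(100.0 * m / n) for n, m in zip(nominal, measured)]


def concentration_from_percent(nominal, percent):
    """Back-calculate the concentration in solution (µM) from a percentage.

    These values are derived for illustration; they are not reported data.
    """
    return [round(n * p / 100.0, 1) for n, p in zip(nominal, percent)]


initial_percent = percent_in_solution(nominal_uM, measured_uM)
later_uM = concentration_from_percent(nominal_uM, percent_after_7h)

print(initial_percent)  # percent in solution right after solubilization
print(later_uM)         # derived µM in solution after seven hours
```

      The first list reproduces the 10, 15 and 26% stated above. The second shows that, on these numbers, a tenfold increase in nominal concentration would raise the dissolved amount after seven hours by less than sixfold.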

      However, experiments performed during the revision of our manuscript show that the HAD phosphatase NT5C1A can be effectively inhibited by 7,8-DHF with an IC50 value of 10 µM (see revised Fig. 2). Together with the fact that the activity of the PDXP-Asn61Ser variant can be completely inhibited by 7,8-DHF (see Fig. 3d), we conclude that the reason for the observed plateau of PDXP inhibition is likely to be primarily structural, with Asn61 impeding 7,8-DHF binding. We have therefore removed the mention of the limited solubility of 7,8-DHF here. On p.14, we now say: “These data also suggest that Asn61 contributes to the limited efficacy of 7,8-mediated PDXP inhibition in vitro.”

      The solubility of 7,8-DHF is dependent on the specific assay and buffer conditions. In BLI experiments, interference patterns caused by binding of 7,8-DHF in solution to biotinylated PDXP immobilized on the biosensor surface are measured. In phosphatase selectivity assays, phosphatases are in solution, and the effect of 7,8-DHF on the phosphatase activity is measured via the quantification of free inorganic phosphate.

      In BLI experiments, we observed that the sensorgrams obtained with the highest tested 7,8-DHF concentration (25 µM) showed the same curve shapes as the sensorgrams obtained with 12.5 µM 7,8-DHF. This contrasts with the expected steeper slope of the curves at 25 µM vs. 12.5 µM 7,8-DHF. The same behavior was observed for the reference sensors (i.e., the SSA sensors that were not loaded with PDXP, but incubated with 7,8-DHF at all employed concentrations for referencing against nonspecific binding of 7,8-DHF to the sensors). The sensorgrams at 25 µM 7,8-DHF were therefore not included in the analysis (this is now specified in the Materials and Methods BLI section on p.27). To clarify this point, we now state that “As a result of the poor solubility of the molecule, a saturation of the binding site was not experimentally accessible” (p.7).

      In contrast, the phosphatase selectivity assays described on p.8 could be performed with nominal 7,8-DHF concentrations of up to 40 µM. Although the effective 7,8-DHF concentration in solution is expected to be lower (see ref. 35 and discussed above), the limited solubility of 7,8-DHF in phosphatase assays does not prevent the quantification of free inorganic phosphate. Nevertheless, we cannot exclude some interference with this absorbance-based assay (e.g., due to turbidity caused by insoluble compound). Indeed, 5,6-dihydroxyflavone and 5,6,7-trihydroxyflavone caused an apparent increase in PDXP activity at concentrations above 10 µM (see Figure 2b), which may be related to compound solubility issues. Alternatively, these flavones may activate PDXP at higher concentrations.

      We have tested the 7,8-DHF analogue 3,7,8,4’-tetrahydroxyflavone at concentrations of 70 and 100 µM. At concentrations >100 µM, the DMSO concentration required for solubilizing the flavone interferes with PDXP activity. PDXP inhibition by 3,7,8,4’-tetrahydroxyflavone was slightly increased at 70 µM compared to 40 µM (by ~18%) but plateaued between 70 and 100 µM. These results are now mentioned in the text (p.7): “The efficacy of PDXP inhibition by 3,7,8,4’-tetrahydroxyflavone was not substantially increased at concentrations >40 µM (relative PDXP activity at 40 µM: 0.46 ± 0.05; at 70 µM: 0.38 ± 0.15; at 100 µM: 0.37 ± 0.09; data are mean values ± S.D. of n=6 experiments).”
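      As a worked check of the values quoted above, the relative PDXP activities can be converted into percent inhibition and into the relative change between concentrations. This is a sketch under stated assumptions: it uses only the rounded mean values from the quoted sentence, the names are illustrative, and reading the "~18%" as a relative decrease in residual activity is one possible interpretation.

```python
# Mean relative PDXP activity at each nominal 3,7,8,4'-tetrahydroxyflavone
# concentration (µM), taken from the sentence quoted in the text.
relative_activity = {40: 0.46, 70: 0.38, 100: 0.37}


def percent_inhibition(activity):
    """Convert relative activity (1.0 = uninhibited) to percent inhibition."""
    return round((1.0 - activity) * 100.0)


def relative_decrease(before, after):
    """Percent decrease in residual activity going from `before` to `after`."""
    return round((before - after) / before * 100.0)


inhibition = {conc: percent_inhibition(a) for conc, a in relative_activity.items()}
drop_40_to_70 = relative_decrease(relative_activity[40], relative_activity[70])
drop_70_to_100 = relative_decrease(relative_activity[70], relative_activity[100])

print(inhibition)      # percent inhibition per concentration
print(drop_40_to_70)   # relative decrease in activity, 40 -> 70 µM
print(drop_70_to_100)  # relative decrease in activity, 70 -> 100 µM
```

      With the rounded means, the decrease in residual activity from 40 to 70 µM comes out at about 17%, close to the ~18% stated (the small difference presumably reflects rounding of the published means), whereas the change from 70 to 100 µM is only about 3%, consistent with a plateau.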

      Page 9: The authors report that PDXP crystallizes as a homodimer in which 7,8DHF is bound only to one protomer. Is the second protomer active? Does that contribute to the 50% inhibition plateau? If Arg62 is mutated to break the salt bridge, does inhibition go beyond 50%?

      We have no way to measure the activity of the second, inhibitor-free protomer in murine PDXP. We know that PDXP functions as a constitutive homodimer, and based on our current understanding, both protomers are active. We have previously shown that the experimental monomerization of PDXP (upon introduction of two point mutations in the dimerization interface) strongly reduces its phosphatase activity. Specifically, PDXP homodimerization is required for an inter-protomer interaction that mediates the proper positioning of the substrate specificity loop. Thus, homodimerization is necessary for effective substrate coordination and dephosphorylation (PMID: 24338687).

      In the murine structure, we observed that 7,8-DHF binding to the second subunit (the B-protomer) is prevented by a salt bridge between Arg62 and Asp14 of a symmetry-related A-protomer in the crystal lattice (i.e., this is not a salt bridge between Arg62 in the B-protomer and Asp14 in the A-protomer of a PDXP homodimer). As suggested, we have nevertheless tested the potential role of this salt bridge for the sensitivity of the PDXP homodimer to 7,8-DHF.

      The mutation of Arg62 is not suitable to answer this question, because this residue is involved in the coordination of 7,8-DHF (see Figure 3b), and the PDXP-Arg62Ala mutant is inhibitor resistant (see Figure 3d). We have therefore mutated Asp14, which is not involved in 7,8-DHF coordination. As shown in the new Figure 3 – figure supplement 1d, the 7,8-DHF-mediated inhibition of PDXP-Asp14Ala again reached a plateau at ~50%. This result suggests that while an Arg62-Asp14 salt bridge is stabilized in the murine crystal, it is not a determinant of the active site accessibility of protomer B in solution.

      To address this important question further, we have now also generated co-crystals of human PDXP bound to 7,8-DHF, and refined two structures to 1.5 Å. We found that in human PDXP, both protomers bind 7,8-DHF. These new, higher resolution data are now shown in the revised Figure 3 and its figure supplements, and we have moved the panels referring to the previously reported murine PDXP structure to the Figure 3 – figure supplement 1. Thus, both protomers of human PDXP, but only one protomer of murine PDXP bind 7,8-DHF in the crystal structure, yet the 7,8-DHF-mediated inhibition of human and murine PDXP plateaus at ~50% under the phosphatase assay conditions (see Figure 2a). We conclude that 7,8-DHF binding efficiency in the PDXP crystal does not necessarily reflect its inhibitory efficiency in solution.

      Taken together, these data indicate that the apparent partial inhibition of murine and human PDXP phosphatase activity by 7,8-DHF in our in vitro assays is not explained by an exclusive binding of 7,8-DHF to just one protomer of the homodimer.

      Page 10-12; Is it possible to generate a mutant form of PDXP in which activity is maintained but inhibition is attenuated - an inhibitor-resistant mutant form of PDXP? Can such a mutant be used to assess on-target vs off-target effects of 7,8DHF in cells?

      This is an excellent point, and we agree with the Reviewer that such an approach would provide further evidence for cellular on-target activity of 7,8-DHF. Indeed, the verification of the PDXP-7,8-DHF interaction sites has led to the generation of catalytically active, inhibitor-resistant PDXP mutants, such as Tyr146Ala and Glu148Ala (Fig. 3d). However, the biochemical analysis of such mutants in primary hippocampal neurons is a very difficult task.

      Primary hippocampal neurons are derived from pooled, isolated hippocampi of mouse embryos and are subsequently differentiated for 21 days in vitro. The resulting cellular yield is typically low and variable, and the viability (and contamination of the respective cultures with e.g. glial cells) varies from batch to batch. Although such cell preparations are suitable for electrophysiological or immunocytochemical experiments, they are far from ideal for biochemical studies. A meaningful experiment would require the efficient expression of a catalytically active, but inhibitor-resistant PDXP-mutant in PDXP-KO neurons. In parallel, PDXP-KO cells reconstituted with PDXP-WT (at phosphatase activity levels comparable with the PDXP mutant cells) would be needed for comparison. Unfortunately, the generation of (a) sufficient numbers of (b) viable cells that (c) efficiently express (d) functionally comparable levels of PDXP-WT or -mutant for downstream analysis (PLP/PL-levels upon inhibitor treatment) is currently not possible for us.

      Human iPSC-derived (hippocampal) spheroids are at present no alternative, due to the necessity of generating PDXP-KO lines first, and the difficulties with transfecting/transducing them. Such a system would require extensive validation. We have attempted to use SH-SY5Y cells (a metastatic neuroblastoma cell line), but PDXK expression in these cells is modest and they produce too little PLP. We therefore feel that this question is beyond the scope of our current study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study that performs scRNA-Seq on infected and uninfected wounds. The authors sought to understand how infection with E. faecalis influences the transcriptional profile of healing wounds. The analysis demonstrated that there is a unique transcriptional profile in infected wounds with specific changes in macrophages, keratinocytes, and fibroblasts. They also speculated on potential crosstalk between macrophages and neutrophils and macrophages and endothelial cells using NicheNet analysis and CellChat. Overall the data suggest that infection causes keratinocytes to not fully transition which may impede their function in wound healing and that the infection greatly influenced the transcriptional profile of macrophages and how they interact with other cells.

      Strengths:

      It is a useful dataset to help understand the impact of wound infection on the transcription of specific cell types. The analysis is very thorough in terms of transcriptional analysis and uses a variety of techniques and metrics.

      Weaknesses:

      Some drawbacks of the study are the following. First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study. Wound healing is a dynamic and variable process so understanding the full course of the wound healing response would be very important to understand the impact of infection on the healing wound. Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study. Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of reepithelialization like human wounds. So while the conclusions are generally supported the scope of the work is limited.

      Thank you for your thoughtful review and acknowledgment of the thoroughness of our analysis.

      First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study.

      We acknowledge your concerns regarding the limitations of our study, particularly regarding the small number of mice per group and the examination of only one time point post-wounding. We agree that a more comprehensive analysis across multiple time points would provide a deeper understanding of the temporal changes induced by infection. While our primary focus in this study was to elucidate the foundational responses to bacteria-infected wounds, we attempted to augment our analysis by incorporating publicly available datasets of similar nature. However, these datasets lacked power in terms of cell number and populations. Nonetheless, we have bolstered our analysis by applying a cross-entropy test on the integrated dataset and reporting its significance (Figure S1F), ensuring the robustness of our single-cell RNA sequencing datasets.

      Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study.

      We also recognize the significance of comparing infected wounds to unwounded skin to establish a baseline for transcriptional changes. While we attempted to incorporate publicly available unwounded skin samples into our analysis, we encountered limitations in the number of cells, particularly within the immune population. This constraint is addressed in the Limitations section of the manuscript.

      Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds.

      Regarding the concern about differences between murine and human wound healing mechanisms, we took measures during tissue isolation to mitigate this issue, extracting incisions of the wounds rather than contracted tissues. Despite the primary mode of wound closure in mice being contraction, we believe our analysis still offers valuable insights into cellular responses to infection relevant to human wound healing.

      We appreciate your constructive criticism of our study. Despite these constraints, we believe our work provides valuable insights into the transcriptional changes induced by infection in healing wounds.

      Reviewer #2 (Public Review):

      Summary:

      The authors have performed a detailed analysis of the complex transcriptional status of numerous cell types present in wounded tissue, including keratinocytes, fibroblasts, macrophages, neutrophils, and endothelial cells. The comparison between infected and uninfected wounds is interesting and the analysis suggests possible explanations for why infected wounds are delayed in their healing response.

      Strengths:

      The paper presents a thorough and detailed analysis of the scRNAseq data. The paper is clearly written and the conclusions drawn from the analysis are appropriately cautious. The results provide an important foundation for future work on the healing of infected and uninfected wounds.

      Weaknesses:

      The analysis is purely descriptive and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing. The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We are thankful for your acknowledgment of the thoroughness of our analysis and the cautious nature of our conclusions.

      The analysis is purely descriptive, and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing.

      Regarding your concern about the purely descriptive nature of our analysis and the lack of functional validation of identified factors, we agree on the importance of understanding the functional roles of transcriptional changes in wound healing. To address this limitation, we plan to conduct functional experiments, such as perturbation assays or in vivo validation studies, to validate the roles of specific factors identified in our analysis.

      The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We acknowledge the importance of comparing wounded tissue to unwounded skin to establish a baseline for understanding transcriptional changes. This point is noted and acknowledged in the limitations section of our manuscript.

      We appreciate your feedback and assure you that we will consider your suggestions in future iterations of our research.

      Recommendations For The Authors:

      We are grateful for the positive overall assessment of our revised work by the reviewers. Critical comments on specific aspects of our work are listed verbatim below followed by our responses.

      Reviewer 1 (Recommendations for the Authors):

      (1) The figures are a bit cluttered and hard to parse out. The different parts of the figure seem to be scattered all over the place with no consistent order.

      Thank you for your feedback regarding the figures in our manuscript. We acknowledge your concern that some panels may appear cluttered and challenging to navigate. In response, we made concerted efforts to declutter certain panels, taking into account page size constraints and ensuring a minimum font size for readability.

      (2) I didn't really understand what the last sentence on page 6 meant. Is this meant to say that these could be biomarkers of infection?

      We thank the reviewer for noting this lack of clarity. We revised the statement.

      Updated manuscript (lines 111-113)

      “Overall, the persistent E. faecalis infection contributed to higher Tgfb1 expression, whilst Pdgfa levels remained low, correlating with delayed wound healing.”

      (3) A reference on page 19 didn't format correctly.

      We thank the reviewer for catching the typos. We corrected the reference formatting.

      Updated manuscript (lines 503-505)

      “We confirm the immune-suppressive role of E. faecalis in wound healing, consistent with previous findings in different experimental settings (Chong et al., 2017; Kao et al., 2023; Tien et al., 2017).”

      (4) The title doesn't really address the scope of the finding which goes beyond immunomodulatory.

      The reviewer is correct! We therefore revised the title to cover all aspects of the study as:

      “Decoding the complexity of delayed wound healing following Enterococcus faecalis infection”

      Reviewer 2 (Recommendations for the Authors):

      (1) On page 6, the expression of Tgfb1 is described as "aggravated" by wounding alone. I am not sure whether this means Tgfb1 levels are increased or decreased. It appears from the data that it is increased, which was confusing to me since I interpreted "aggravated" as meaning decreased. So perhaps a different more straightforward word could be used to describe the data.

      We modified this ambiguous statement to:

      Updated manuscript (lines 105-106)

      “By contrast, wounding alone resulted in higher transforming growth factor beta 1 (Tgfb1) expression.”

      (2) On page 7, the authors state that "cells from infected wounds...demonstrated distinct clustering patterns compared to cells from uninfected wounds (Figure S1F)" but when I look at the data in this figure, I cannot really see a difference. Perhaps the differences could be more clearly highlighted?

      Thank you for pointing out this issue. We utilized the cross-entropy test for statistical comparison, employing UMAP embedding space data. While the data underwent batch correction based on infection status, the UMAP plots for each condition may appear visually similar. However, it is important to note that the number of cells per cluster varies significantly between the infected and uninfected conditions. This aspect influences the selection of points (cells) and their nearest neighbours for statistical testing within each cluster in the embedding space. To address this concern, we have included a table indicating the number of cells per cell type alongside the plot (Figure S1F), providing additional context for the interpretation of our results.

      Author response table 1.

      Author response image 1.

      (3) On page 8, Zeb2hi cells are described as "immunosuppressive" and yet the genes are highlighted to express in include Cxcl2 and IL1b which I would classify as inflammatory, not immunosuppressive. Can the authors be a bit more clear on why they describe the phenotype of these cells as "immunosuppressive"?

      We agree with the reviewer that this is a bit counterintuitive. Conventionally, CXCL2 is thought to be chemoattractant for neutrophil recruitment. However, the infection-specific keratinocyte cluster expressing Cxcl2, Il1b, Wfdc17 along with Zeb2 and Thbs1 indicate their myeloid-derived suppressor cell-like features, which play immunosuppressive roles during infection and in cancer (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021).

      Updated manuscript (lines 159-163)

      “As the barrier to pathogens, keratinocytes secrete a broad range of cytokines that can induce inflammatory responses (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021). However, Zeb2hi keratinocytes co-expressing Cxcl2, Il1b, and Wfdc17, indicate myeloid-derived suppressor cell-like phenotype which implies an immunosuppressive environment (Hofer et al., 2021; Veglia et al., 2021).”

      (4) On pages 8-9, Keratinocytes are described to express MHC class II. I find this quite unexpected since class II is usually thought to be expressed primarily by APCs such as DCs and B cells. Is there a precedent for keratinocytes to express class II? The authors should acknowledge that this is unexpected and in need of further validation, or support the claim with references in which class II expression has been previously observed on keratinocytes (and is thus not unexpected)

      Although MHC class II expression is predominantly on immune cells, an antigen-presenting role for keratinocytes has been reported in many studies (Banerjee et al., 2004; Black et al., 2007; Carr et al., 1986; Gawkrodger et al., 1987; Jiang et al., 2020; Li et al., 2022; Oh et al., 2019; Tamoutounour et al., 2019). Therefore, the antigen-presenting role of keratinocytes is known and expected, and we think that this should be further investigated in the context of wound infection.

      Updated manuscript (lines 177-179)

      “These genes are associated with the major histocompatibility complex (MHC) class II, suggesting a self-antigen presenting keratinocyte population, which have a role in costimulation of T cell responses (Meister et al., 2015; Tamoutounour et al., 2019).”

      REFERENCES

      Alshetaiwi, H., Pervolarakis, N., McIntyre, L. L., Ma, D., Nguyen, Q., Rath, J. A., Nee, K., Hernandez, G., Evans, K., Torosian, L., Silva, A., Walsh, C., & Kessenbrock, K. (2020). Defining the emergence of myeloid-derived suppressor cells in breast cancer using single-cell transcriptomics. Science Immunology, 5(44), eaay6017. https://doi.org/10.1126/sciimmunol.aay6017

      Banerjee, G., Damodaran, A., Devi, N., Dharmalingam, K., & Raman, G. (2004). Role of keratinocytes in antigen presentation and polarization of human T lymphocytes. Scandinavian Journal of Immunology, 59(4), 385–394. https://doi.org/10.1111/j.0300-9475.2004.01394.x

      Black, A. P. B., Ardern-Jones, M. R., Kasprowicz, V., Bowness, P., Jones, L., Bailey, A. S., & Ogg, G. S. (2007). Human keratinocyte induction of rapid effector function in antigen-specific memory CD4+ and CD8+ T cells. European Journal of Immunology, 37(6), 1485–1493. https://doi.org/10.1002/eji.200636915

      Carr, M. M., McVittie, E., Guy, K., Gawkrodger, D. J., & Hunter, J. A. (1986). MHC class II antigen expression in normal human epidermis. Immunology, 59(2), 223–227.

      Gawkrodger, D. J., Carr, M. M., McVittie, E., Guy, K., & Hunter, J. A. (1987). Keratinocyte expression of MHC class II antigens in allergic sensitization and challenge reactions and in irritant contact dermatitis. The Journal of Investigative Dermatology, 88(1), 11–16. https://doi.org/10.1111/1523-1747.ep12464641

      Jiang, Y., Tsoi, L. C., Billi, A. C., Ward, N. L., Harms, P. W., Zeng, C., Maverakis, E., Kahlenberg, J. M., & Gudjonsson, J. E. (2020). Cytokinocytes: The diverse contribution of keratinocytes to immune responses in skin. JCI Insight, 5(20), e142067. https://doi.org/10.1172/jci.insight.142067

      Li, D., Cheng, S., Pei, Y., Sommar, P., Kärner, J., Herter, E. K., Toma, M. A., Zhang, L., Pham, K., Cheung, Y. T., Liu, Z., Chen, X., Eidsmo, L., Deng, Q., & Xu Landén, N. (2022). Single-Cell Analysis Reveals Major Histocompatibility Complex II‒Expressing Keratinocytes in Pressure Ulcers with Worse Healing Outcomes. The Journal of Investigative Dermatology, 142(3 Pt A), 705–716. https://doi.org/10.1016/j.jid.2021.07.176

      Oh, S., Chung, H., Chang, S., Lee, S.-H., Seok, S. H., & Lee, H. (2019). Effect of Mechanical Stretch on the DNCB-induced Proinflammatory Cytokine Secretion in Human Keratinocytes. Scientific Reports, 9(1), 5156. https://doi.org/10.1038/s41598-019-41480-y

      Siriwach, R., Ngo, A. Q., Higuchi, M., Arima, K., Sakamoto, S., Watanabe, A., Narumiya, S., & Thumkeo, D. (2022). Single-cell RNA sequencing identifies a migratory keratinocyte subpopulation expressing THBS1 in epidermal wound healing. iScience, 25(4), 104130. https://doi.org/10.1016/j.isci.2022.104130

      Tamoutounour, S., Han, S.-J., Deckers, J., Constantinides, M. G., Hurabielle, C., Harrison, O. J., Bouladoux, N., Linehan, J. L., Link, V. M., Vujkovic-Cvijin, I., Perez-Chaparro, P. J., Rosshart, S. P., Rehermann, B., Lazarevic, V., & Belkaid, Y. (2019). Keratinocyte-intrinsic MHCII expression controls microbiota-induced Th1 cell responses. Proceedings of the National Academy of Sciences of the United States of America, 116(47), 23643–23652. https://doi.org/10.1073/pnas.1912432116

      Veglia, F., Sanseviero, E., & Gabrilovich, D. I. (2021). Myeloid-derived suppressor cells in the era of increasing myeloid cell diversity. Nature Reviews. Immunology, 21(8), 485–498. https://doi.org/10.1038/s41577-020-00490-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      Summary:

      It has been proposed in the literature that the ATP release channel Panx1 can be activated in various ways, including by tyrosine phosphorylation of the Panx1 protein. The present study reexamines the commercial antibodies used previously in support of the phosphorylation hypothesis, and the presented data indicate that the antibodies may recognize proteins unrelated to Panx1. Consequently, the authors caution about the use and interpretation of results obtained with these antibodies.

      Strengths:

      The manuscript by Ruan et al. addresses an important issue in Panx1 research, i.e. the activation of the channel formed by Panx1 via protein phosphorylation. If the authors' conclusions are correct, the previous claims for Panx1 phosphorylation on the basis of the commercial anti-phospho-Panx1 antibodies would be in question.

      This is a very detailed and comprehensive analysis making use of state-of-the-art techniques, including mass spectrometry and phos-tag gel electrophoresis.

      In general, the study is well-controlled as relating to negative controls.

      The value of this manuscript is that it could spawn new, more function-oriented studies on the activation of Panx1 channels.

      Weaknesses:

      Although the manuscript addresses an important issue, the activation of the ATP-release channel Panx1 by protein phosphorylation, the data provided do not support the firm conclusion that such activation does not exist. The failure to reproduce published data obtained with commercial anti-phospho Panx1 antibodies can only be of limited interest for a subfield.

      (1) The title claiming that "Panx1 is NOT phosphorylated..." is not justified by the failure to reproduce previously published data obtained with these antibodies. If, as claimed, the antibodies do not recognize Panx1, their failure cannot be used to exclude tyrosine phosphorylation of the Panx1 protein. There is no positive control for the antibodies.

      The full title of our manuscript is “Human Pannexin 1 Channel is NOT Phosphorylated by Src Tyrosine Kinase at Tyr199 and Tyr309”. The major conclusion of our manuscript should not be extended to the claim that “Panx1 is NOT phosphorylated”; this is by no means our conclusion. In fact, LC-MS/MS data from both our group and others have shown that PANX1 is phosphorylated at both serine and tyrosine sites (1). However, we provided solid evidence that Tyr199 and Tyr309 of human PANX1 are not effective substrates of the Src kinase.

      We did provide several positive controls for the antibodies in our study. We showed that the anti-PANX1 and anti-Src antibodies unambiguously recognized PANX1 and Src, respectively (Figure 3A), and that a pan-specific phosphotyrosine antibody (P-Tyr-100) unambiguously recognized phosphorylated Src (Figure 3A)—as expected—but did not recognize PANX1. In addition, we demonstrated that the two antibodies in question (anti-PANX1-pY198 and anti-PANX1-pY308) did produce signals in our western blot analysis, but we provided compelling evidence that the bands produced by these antibodies do not correspond to PANX1 (Figure 2B).

      (2) The authors claim that exogenous SRC expression does not phosphorylate Y198. DeLalio et al. 2019 show that Panx1 is constitutively phosphorylated at Y198, so an effect of exogenous SRC expression is not necessarily expected.

      We have unambiguously identified peptide fragments containing non-phosphorylated Y198 in our LC-MS/MS experiment; none corresponds to a phosphorylated Y198. Therefore, our LC-MS/MS data do not support the notion that Panx1 is constitutively phosphorylated at Y198.

      (3) The authors argue that the GFP tag of Panx1 at the COOH terminus does not interfere with folding since the COOH modified (thrombin cleavage site) Panx1 folds properly, forming an amorphous glob in the cryo-EM structure. However, they do not show that the COOH-modified Panx1 folds properly. It may not, because functional data strongly suggest that the terminal cysteine dives deep into the pore. For example, the terminal cysteine, C426, can form a disulfide bond with an engineered cysteine at position F54 (Sandilos et al. 2012).

      Our manuscript included results obtained with a non-GFP-tagged PANX1 construct (Figure 2-figure supplement 1). We did not notice any difference in PANX1 phosphorylation between GFP-tagged and non-GFP-tagged PANX1. Therefore, the folding of the C-terminal tail of PANX1 does not affect the conclusion of our study.

      (4) The authors dismiss the additional arguments for tyrosine phosphorylation of Panx1 given by the various previous studies on Panx1 phosphorylation. These studies did not, as implied, solely rely on the commercial anti-phospho-Panx1 antibodies, but also presented a wealth of independent supporting data. Contrary to the authors' assertion, in the previous papers the pY198 and pY308 antibodies recognized two protein bands in the size range of glycosylated and partially glycosylated Panx1.

      We did not dismiss additional arguments for Src-dependent PANX1 regulation. In fact, in the discussion of our manuscript, we acknowledged that Src may still be involved in PANX1 regulation, but probably through indirect mechanisms. In the two previous studies (2,3), it is unclear whether the multimeric bands detected by the pY198/pY308 antibodies correspond to glycosylated PANX1, as the authors did not overlay the protein markers with their blots. In particular, the migration pattern of PANX1 changes across different western blot images from DeLalio et al. (2). It is also worth noting that none of the “independent supporting data” in the two previous studies provided direct evidence that Src can phosphorylate Y198/Y308.

      (5) A phosphorylation step triggering channel activity of Panx1 would be expected to occur exclusively on proteins embedded in the plasma membrane. The membrane-bound fraction is small in relation to the total protein, which is particularly true for exogenously expressed proteins. Thus, any phosphorylated protein may escape detection when total protein is analyzed. Furthermore, to be of functional consequence, only a small fraction of the channels present in the plasma membrane need to be in the open state. Consequently, only a fraction of the Panx1 protein in the plasma membrane may need to be phosphorylated. Even the high resolution of mass spectrometry may not be sufficient to detect phosphorylated Panx1 in the absence of enrichment processes.

      We agree with the reviewer that only phosphorylation of plasma membrane-residing Panx1 is functionally relevant. Interestingly, however, previous studies analyzed total protein from cell lysate and concluded that PANX1 is phosphorylated at Y198 and Y308 (2,3). This motivated our analysis, in which we found that the phosphorylation events cannot be detected when using whole cell lysate. We therefore also conducted an electrophysiology experiment comparing conditions with and without active Src kinase (Figure 7). Our result indicates that PANX1 current is not affected by the presence of Src. This suggests that even if there is minor Src kinase phosphorylation below the detection limit of western blot or mass spectrometry, it is unlikely to be functionally significant.

      (6) In the electrophysiology experiments described in Figure 7, there is no evidence that the GFP-tagged Panx1 is in the plasma membrane. Instead, the image in Figure 7a shows prominent fluorescence in the cytoplasm. In addition, there is no evidence that the CBX-sensitive currents in 7b are mediated by Panx1-GFP and are not endogenous Panx1. Previous literature suggests that the hPanx1 protein needs to be cleaved (Chiu et al. 2014) or mutated at the amino terminus (Michalski et al 2018) to see voltage-activated currents, so it is not clear that the currents represent hPANX1 voltage-activated currents.

      Our previous analysis has already shown that the endogenous current of non-transfected cells is not sensitive to CBX (4). Therefore, the CBX-sensitive current in cells overexpressing PANX1 is from PANX1-GFP. It should be noted that when a protein is overexpressed, it tends to accumulate at different intracellular membranes during protein synthesis/maturation. However, this does not prevent a portion of the protein from being trafficked to the plasma membrane. In the paper from Michalski et al. 2018, it was shown that WT human/mouse PANX1 displayed voltage-dependent activation (5). Although the current is relatively small, it is clearly distinguishable from that of non-transfected HEK and CHO cells. This voltage-dependent activation is also sensitive to CBX, consistent with our measurement (Figure 7) (4). When GS is introduced at the N-terminus, the voltage-dependent activation of human/mouse PANX1 is significantly boosted, likely due to the altered NTH conformation resulting from the N-terminal extension.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Literature quotes are still problematic. Why are secondary papers quoted instead of the original work? At least quote reviews by authors who published the original findings.

      We appreciate the reviewer pointing this out. We have carefully checked our references and made sure that the original literature is cited.

      Why does wtPanx1 run close to the 37 kD marker (Figure 2 supplement 1) instead of close to 50 kD as shown in the previous papers using the pY198 and pY308 antibodies?

      It is a common observation that the migration of membrane proteins in SDS-PAGE gels does not correlate with their formula molecular weight, a phenomenon known as “gel shifting” (6–8). The molecular mechanism of this phenomenon remains complex. Therefore, relying on molecular weight standards alone cannot unambiguously identify the PANX1 protein band. This is an issue for identifying the PANX1 band, especially in light of the fact that some antibodies may not be very specific (see Figure 6B). In our experiment, we correlated the in-gel fluorescence and western blot signals, which allowed us to determine the protein band corresponding to PANX1. It is worth noting that in Figure S3 of DeLalio 2019, PANX1 is detected at 37 kDa (2). However, in many other panels of the paper, PANX1 is detected at close to 50 kDa (for example, Figure S2B).

      Figure 6, supplement 1: why are there oligomers observed in the absence of crosslinking? Why is there no shift in the size of the "oligomers" in response to glycosidase F?

      It is common to observe multimeric membrane proteins, including PANX1, forming oligomeric bands in SDS-PAGE gels, likely because they are not fully denatured or disassembled. PANX1 also contains several free cysteines, which may non-specifically crosslink subunits. There is actually a small shift for the 75 kDa band (dimer) in Figure 6, supplement 1. For higher molecular weight bands, this small shift may not be apparent due to the limited resolution of the gel.

      A positive control for the antibodies used is missing. The authors argue that such controls are not available, since these commercial antibodies are "proprietary".

      We did provide several positive controls for the antibodies in our study. We showed that the anti-PANX1 and anti-Src antibodies unambiguously recognized PANX1 and Src, respectively (Figure 3A), and that a pan-specific phosphotyrosine antibody (P-Tyr-100) unambiguously recognized phosphorylated Src (Figure 3A)—as expected—but did not recognize PANX1. In addition, we demonstrated that the two antibodies in question (anti-PANX1-pY198 and anti-PANX1-pY308) did produce signals in our western blot analysis, but we provided compelling evidence that the bands produced by these antibodies do not correspond to PANX1 (Figure 2B).

      Unfortunately, the epitopes that Millipore Sigma used to generate anti-PANX1-pY198 and anti-PANX1-pY308 are not available. The Millipore Sigma website describes the immunogens as “A linear peptide corresponding to 12 amino acids surrounding phospho-Tyr198 of murine Pannexin-1” and “A linear peptide corresponding to 13 amino acids surrounding phosphotyrosine 308 of rat pannexin-1”. However, these immunogen peptides are not available for us to purchase.

      References

      (1) Nouri-Nejad, D. et al. Pannexin 1 mutation found in melanoma tumor reduces phosphorylation, glycosylation, and trafficking of the channel-forming protein. Mol Biol Cell 32, (2021).

      (2) DeLalio, L. J. et al. Constitutive SRC-mediated phosphorylation of pannexin 1 at tyrosine 198 occurs at the plasma membrane. Journal of Biological Chemistry 294, (2019).

      (3) Weilinger, N. L. et al. Metabotropic NMDA receptor signaling couples Src family kinases to pannexin-1 during excitotoxicity. Nat Neurosci 19, (2016).

      (4) Ruan, Z., Orozco, I. J., Du, J. & Lü, W. Structures of human pannexin 1 reveal ion pathways and mechanism of gating. Nature 584, (2020).

      (5) Michalski, K., Henze, E., Nguyen, P., Lynch, P. & Kawate, T. The weak voltage dependence of pannexin 1 channels can be tuned by N-terminal modifications. Journal of General Physiology 150, (2018).

      (6) Rath, A., Cunningham, F. & Deber, C. M. Acrylamide concentration determines the direction and magnitude of helical membrane protein gel shifts. Proc Natl Acad Sci U S A 110, (2013).

      (7) Rath, A. & Deber, C. M. Correction factors for membrane protein molecular weight readouts on sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Anal Biochem 434, (2013).

      (8) Rath, A., Glibowicka, M., Nadeau, V. G., Chen, G. & Deber, C. M. Detergent binding explains anomalous SDS-PAGE migration of membrane proteins. Proc Natl Acad Sci U S A 106, (2009).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The evolution of non-shivering thermogenesis is of fundamental importance to understand. Here, in small mammals, the contractile apparatus of the muscle is shown to increase energy expenditure upon a drop in ambient temperature. Additionally, in the state of torpor, small hibernators did not show an increase in energy expenditure under the same challenge.

      Strengths:

      The authors have conducted a very well-planned study that has sampled the muscles of large and small hibernators from two continents. Multiple approaches were then used to identify the state of the contractile apparatus, and its energy expenditure under torpor or otherwise.

      Weaknesses:

      There was only one site of biopsy from the animals used (leg). It would be interesting to know if non-shivering thermogenesis is something that is regionally different in the animal, given the core body and distal limbs have different temperatures.

      We thank the reviewer for their time and effort in reviewing our manuscript. We agree that it would be of interest to perform similar experiments on different muscle sites in these animals. This is of particular interest because in some mammals, such as mice, distal limbs do not shiver, and non-shivering thermogenesis may therefore play a more prominent role in heat regulation. A paper from Aydin et al. demonstrated that when shivering muscles (soleus) were prevented from undergoing non-shivering thermogenesis via knock-out of UCP1 and were then exposed to cold temperatures, the force production of these muscles was significantly reduced due to prolonged shivering [1]. These results suggest that even in shivering muscle, non-shivering thermogenesis plays a key role in the generation of heat for survival and for the maintenance of muscle performance. Furthermore, there is evidence from garden dormice that muscle temperature during torpor is slightly warmer than abdominal temperature and slightly cooler than heart temperature, which is 7-8°C warmer than abdominal temperature, suggesting the existence of non-shivering thermogenesis in skeletal and cardiac muscles (Giroud et al. in prep) [2]. We have added this information and reference to our discussion to reflect this important point (Discussion, paragraph 6, “As the biopsies which were used…”).

      Reviewer #2:

      Summary:

      The authors utilized (permeabilized) fibers from muscle samples obtained from brown and black bears, squirrels, and Garden dormice, to provide interesting and valuable data regarding changes in myosin conformational states and energetics during hibernation and different types of activity in summer and winter. Assuming that myosin structure is similar between species then its role as a regulator of metabolism would be similar and not different, yet the data reveal some interesting and perplexing differences between the selected hibernating species.

      Strengths:

      The experiments on the permeabilized fibers are complementary, sophisticated, and well-performed, providing new information regarding the characteristics of skeletal muscle fibers between selected hibernating mammalian species under different conditions (summer, interarousal, and winter).

      The studies involve complementary assessments of muscle fiber biochemistry, sarcomeric structure using X-ray diffraction, and proteomic analyses of posttranslational modifications.

      Weaknesses:

      It would be helpful to put these findings on permeabilized fibers into context with the other anatomical/metabolic differences between the species to determine the relative contribution of myosin energetics (with these other contributors) to overall metabolism in these different species, including factors such as fat volume/distribution.

      We thank the reviewer for the time and effort they have put into reviewing our paper and are grateful for the helpful suggestions, which we believe enhance our work (please see below for detailed responses to the critiques).

      Reviewer #3:

      Summary and strengths:

      The manuscript, "Remodelling of skeletal muscle myosin metabolic states in hibernating mammals", by Lewis et al, investigates whether myosin ATP activity may differ between states of hibernation and activity in both large and small mammals. The study interrogates (primarily) permeabilized muscle strips or myofibrils using several state-of-the-art assays, including the mant-ATP assay to investigate ATP utilization of myosin, X-ray diffraction of muscles, proteomics studies, metabolic tests, and computational simulations. The overall data suggests that ATP utilization of myosin during hibernation is different than in active conditions.

      A clear strength of this study is the use of multiple animals that utilize two different states of hibernation or torpor. Two large animal hibernators (Eurasian Brown Bear, American Black Bear) represent large animal hibernators that typically undergo prolonged hibernation. Two small animal hibernators (Garden Dormouse, 13 Lined Ground Squirrel) undergo torpor with more substantial reductions in heart rate and body temperature, but whose torpor bouts are interrupted by short arousals that bring the animals back to near-summer-like metabolic conditions.

      Especially interesting, the investigators analyze the impact that body temperature may have on myosin ATP utilization by performing assays at two different temperatures (8 and 20 degrees C, in 13 Lined Ground Squirrels).

      The multiple assays utilized provide a more comprehensive set of methods with which to test their hypothesis that muscle myosins change their metabolic efficiency during hibernation.

      We thank this reviewer for the effort and time they have put into carefully reviewing our manuscript and have taken on board their valuable suggestions to improve our manuscript (please see below for detailed responses to the critiques).

      Suggestions and potential weaknesses:

      While the samples and assays provide a robust and comprehensive coverage of metabolic needs and testing, the data is less categorical. Some of these may be dependent on sample size or statistical analysis while others may be dependent on interpretation.

      (1) Statistical Analysis

      (1a) The results of this study often cannot be assessed properly due to a lack of clarity in the statistical tests.

      For example, the results related to the large animal hibernators (Figure 1) do not describe the statistical test (in the text of the results, methods, or figure legends). (Similarly for figure 6 and Supplemental Figure 1). Further, it is not clear whether or when the analysis was performed with paired samples. As the methods described, it appears that the Eurasian Brown Bear data should be paired per animal.

      We thank the reviewer for these important points and have added information on the statistical tests used to each figure legend where it was previously missing. Details of the statistical testing used for Figure 6 are listed in the methods section, paragraph 18, “All statistical analysis of TMT derived protein expression data…”

      (1b) The statistical methods state that non-parametric testing was utilized "where data was unevenly distributed". Please clarify when this was used.

      We have now clarified all statistical tests used in the figure legends.

      (1c) While there are two different myosin isoforms, the isoform may be considered a factor. It is unclear why a one-way ANOVA is generally used for most of the mant-ATP chase data.

      The reviewer is right: in our analysis, we have not considered ‘myosin isoform’ as a factor. One of the main reasons is that we decided to treat fibres expressing different myosin heavy chain isoforms as entirely separate (not interconnected) entities.

      (1d) While the technical replicates on studies such as the mant-ATP chase assay are well done, the total biological replicates are small. A consideration of the sample power should be included.

      Unfortunately, obtaining additional biological samples from these unique species is challenging. Hence, we have added a statement to the Discussion section that focuses on the potential benefits of increasing sample size to increase statistical power (Discussion, paragraph 2, “In contrast to our study hypothesis…”).
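
      As an illustration of the sample-power consideration raised in (1d), the following is a minimal sketch and not the analysis performed in the study. It assumes a two-sided, two-sample comparison of means and uses the standard normal approximation n = 2(z_{1-alpha/2} + z_{power})^2 / d^2, where d is a standardized effect size (Cohen's d). The function name `samples_per_group` and the default values are hypothetical choices made for this example.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate biological replicates needed per group.

    Normal approximation for a two-sided, two-sample comparison of means:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2
    where d is the standardized effect size (Cohen's d).
    This is an illustrative assumption, not the study's method.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical effect sizes, chosen only to show how n scales with d.
    for d in (0.5, 0.8, 1.5):
        print(f"d = {d}: about {samples_per_group(d)} animals per group")
```

      Because the required n scales with 1/d^2, modest seasonal shifts would need far more animals to detect than large ones, which is the practical constraint when samples come from rare hibernating species. For small groups, a t-distribution-based calculation would give slightly larger numbers than this approximation.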

      (1e) An analysis of the biological vs statistical significance should be considered, especially for the mant-ATP chase data from the American Black Bear, where there appear to be shifts between the summer and winter data.

      We agree that it is important to be careful when drawing conclusions from data based only on p-values. We agree that the modest differences observed in these data on the American Black Bear, whilst not significant, are worth noting, and we have added these considerations to the manuscript (Discussion, paragraph 2, “In contrast to our study hypothesis…”).

      (2) Consistency of DRX/SRX data.

      (2a) The investigators performed both mant-ATP chase and x-ray diffraction studies to investigate whether myosin heads are in an "on" or "off" state. The results of these two studies do not appear to be fully consistent with each other, which should not be a surprise. The recent work of Mohran et al (PMID 38103642) suggests that the mant-ATP-predicted SRX:DRX proportions are inconsistent with the position of the myosin heads. The discussion appears to lack a detailed assessment of this prior work and lack a substantive assessment contrasting the differing results of the two assays in the current study. i.e. why the current study's mant-ATP chase and x-ray diffraction results differ.

      Prior work on skeletal muscle observing discrepancies between the mant-ATP chase assay and X-ray diffraction is rather scarce. Adding a comprehensive discussion of this may be beyond the scope of the current study and would distract the reader from the main topic. For this reason, we have not added a section on it. Note that we have other manuscripts in preparation that are specifically dedicated to this discrepancy.

      (2b) The discussion of the current study's x-ray diffraction data relating to the I_1,1/I_1,0 ratio and how substantially different this is to the M6 results merits discussion. i.e. how can myosin both be more primed to contract during IBA versus torpor (according to intensity ratio), but also have less mass near the thick filament (M6).

      The I_1,1/I_1,0 ratio indicates a subtle mass shift towards the myosin thick filament, whilst the M6 spacing shows a more compliant thick filament. These results are not incompatible and rely on interpretation of the X-ray diffraction patterns. To avoid any confusion and to avoid distracting the reader from the main topic, we have decided not to speculate further.

      (3) Possible interactions with Heat Shock Proteins

      Heat Shock Proteins (HSPs), such as HSP70, have been shown to be differential during torpor vs active states. A brief search of HSP and myosin reveals HSPs related to thick filament assembly and Heat Shock Cognate 70 interacting with myosin binding protein C. Especially given the authors' discussion of protein stability and the potential interaction with myosin binding protein C and the SRX state, the limitation of not assessing HSPs should be discussed. (While HSP's relation to thick filament assembly might conceivably modify the interpretation of the M3 x-ray diffraction results, this reviewer acknowledges that possibility as a leap.)

      The reviewer raises an interesting and potentially important point about the potential impact of HSPs and their interaction with the thick filament during hibernation. We have added a section on this to the discussion of the manuscript, with particular emphasis on HSP70 acting as a chaperone for myosin binding protein C. However, we feel that it is important to point out that HSPs have only been shown to interact with MYBPC3, a cardiac isoform of this protein which is not present in skeletal muscle [3] (Discussion, paragraph 5, “Of potential further interest to the regulation of myosin…”).

      Despite the above substantial concerns/weaknesses, this reviewer believes that this manuscript represents a valuable data set.

      Other comments related to interpretation:

      (4) The authors briefly mention the study by Toepfer et al [Ref 25] and that it utilizes cardiac muscles. The manuscript would benefit from increased discussion regarding the possible differences in energetics between cardiac and skeletal muscle in these states.

      As this manuscript focuses solely on skeletal muscle, we believe that introducing comparisons between cardiac and skeletal muscles would confuse the reader. These types of muscle differ considerably; for example, their SRX/DRX states are regulated very differently. Note that we are preparing a manuscript focusing on cardiac muscle and hibernation.

      (5) The author's analysis of temperature is somewhat limited.

      (5a) First, the authors use 20 degrees C (room temperature), not 37 degrees C, a more physiologic body temperature for large mammals. While it is true that limbs are likely at a lower temperature, 20 degrees C seems substantially outside of a normal range. Thus, temperature differences may have been minimized by the author's protocol.

      The authors agree that the experimental set up to perform these single fiber studies at slightly higher temperatures may have been more beneficial to replicate the physiological conditions of these hind leg muscle in the analyzed animals. However, previous work has shown that the resting myosin dynamics are in fact stable at temperatures between 20-30 degrees Celsius in type I, type II and cardiac mammalian muscle fibers [4].

      (5b) Second, the authors discuss the possibility of myosin contributing to non-shivering thermogenesis. The magnitude of this impact should be discussed. The suggestion of myosin ATP utilization also implies that there is some basal muscle tone (contraction), as the myosin ATPase utilizes ATP to release from actin, before binding and hydrolyzing again. Evidence of this tone should be discussed.

      The reviewer raises an interesting point, and it would indeed be interesting to assess the magnitude of the impact and whether a basal muscle tone exists. Assessing the magnitude of the impact is not an easy task and would require very advanced simulations, in which we are unfortunately not experts. As for basal muscle tone, this is difficult to say, as myosin is not actually binding to actin but hydrolyzing ATP at a faster pace during hibernation. We therefore think that the relation between our data and basal muscle tone is unclear. Hence, we have decided not to discuss these points in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting paper. I have some minor suggestions to help improve it.

      Is there any way to estimate the contribution of contractile apparatus to energy expenditure in reference to what is being generated at SERCA in the resting muscle under the various states examined?

      This is an interesting idea; however, as far as we know, this would be challenging experimentally (in the hibernating mammals) and difficult to achieve in a reliable manner.

      It is important to emphasize that while BAT has been traditionally seen to be the site of NST, the skeletal muscle is very important, especially in large mammals, where BAT is going to be a very small % of the body and unlikely to be able to adequately provide heat. The addition of the contractile apparatus to SERCA as a heat generator at rest is very important -- also, the activation of ryanodine receptor Ca2+ to increase the local [Ca2+] at SERCA to generate heat has also recently been shown and should be mentioned (Meizoso-Huesca et al 2022, PNAS; Singh et al 2023, PNAS) alongside the work of Bal et al 2012 etc...

      We have included these mechanisms and references in the manuscript discussion [5, 6] (Discussion, paragraph 4, “A critical difference between the large hibernators…”).

      Are you able to report the likely proportion of type II fibers in the muscles you have sampled?

      The fiber type breakdown for all animals used in this study is reported in supplementary table 1.

      The sampling of muscle from the legs of live animals is sensible and convenient. Is it possible different muscles in the body have different levels of NST, changes in energy expenditure in torpor, and other states?

      As discussed in the public review, we have added to the discussion of this manuscript to reflect this important point about potentially different results from different muscle sites in these animals.

      Reviewer #2 (Recommendations For The Authors):

      Is it likely that the proportion of fast and slow myosin-heavy chains within the selected sample of myofibers from the different mammals contributes to the overall differences in the energetics of different conformational states? In living animals, how does the relative contribution of the energetics from different muscle fiber types compare with the contribution from other organs to the overall regulation of metabolism during activities in summer, winter, or periods of intermittent arousal?

      Fiber types in mammals can be vastly different between species as well as having a considerable amount of plasticity to change within each species upon specific stimuli. Furthermore, some mammals also have specific myosin heavy chain isoforms which have considerable expression, for example, myosin heavy chain 2B which is expressed in rodents such as mice but not larger mammals such as humans.

      In the manuscript, we demonstrate that there is no significant change in the ATP usage by myosin in resting muscle in any of the species which we examined (Fig 1 F, L; Fig 2 E, J). The relatively high mitochondrial density of type I fibers when compared to type II fibers may contribute to a higher overall requirement of energy storage primarily via lipid oxidation. However, mitochondrial respiration is heavily suppressed during hibernation, so questions remain over the overall energy demand in hibernating muscle beyond myosin [7]. The fact that myosin ATP demand is relatively preserved in hibernating muscle suggests that skeletal muscle may be a relatively energy-demanding organ even during hibernation; we speculate in the manuscript that this may be due to the requirement of maintaining muscular tone and function during this period of prolonged immobilization. This may be of relevance when one considers the almost complete shutdown of organs involved in food intake and breakdown, such as the stomach and liver, during hibernation. Furthermore, heart rate and breathing rate are vastly suppressed. Altogether, whilst it is difficult at this point to make an accurate estimate of energy demands between the different organs of hibernators, our data point to skeletal muscle being a relatively high energy demand organ during these periods. When considering the difference between fiber types, our data again suggest that both type I and type II fibers have relatively similar energy demands during hibernation.

      The supplementary data are quite revealing as to how the myosin isoform composition is stable in some species but highly plastic in others in response to the same environmental/metabolic challenges. Why is the myosin heavy chain isoform (I and II) composition stable for brown bears but not for black bears between summer and winter? This is very interesting. For the Ground squirrel, there is remarkable plasticity between myosin heavy chain isoforms ( I and II) between summer, interbout arousal, and torpor. Yet in the Garden Dormouse, the myosin heavy chain isoform (I and II) composition is stable between these three activity states. The inconsistencies between and within species are perplexing and worthy of closer interrogation.

      The measurements and role of myosin energetics in different conformational states are interesting but need to be explained in context with other metabolic regulators for these hibernating mammals, especially because some species show remarkable plasticity whereas others show remarkable stability. For example, compare brown and black bears, which show differences in the response of myosin composition to activity, interbout arousal, and torpor. Ground squirrels show remarkable plasticity in myosin isoform composition between activity states (and likely metabolic differences), but the Garden Dormouse has a remarkably stable myosin isoform composition during the three metabolic/environmental challenges. What mechanisms facilitate these modifications in some but not other mammals, even those of similar size? The differences are very interesting, worthy of follow-up, and may well contribute to further understanding the significance of the energetics of different myosin conformational states.

      We agree that the changes seen between these species are very interesting and worthy of further investigation. It would be of further interest to look at methods which would allow for even deeper phenotyping, such as single fiber proteomics, to allow for the assessment of the percentage of hybrid fibers and fibers undergoing any fiber type switch during hibernating periods. Our results do show a modest, albeit not significant, increase in the number of type I muscle fibers in 13-lined ground squirrels and Garden dormice during torpor, which is consistent with previous studies [8]. Previous studies have demonstrated that lower temperatures may promote a shift towards more oxidative type I muscle fibers in mammals [9]. This could be an explanation for why we see this specifically in the smaller hibernators; however, as we demonstrate and discuss, these lower temperatures are vital for the survival of these smaller mammals during hibernation, so it would be inconsistent to hypothesize that these shifts are for heat-production purposes. Further studies are warranted to understand the relevance of these shifts, particularly those with a larger sample size. It would also be of interest to examine fiber type percentages during the progression of these long hibernating periods to observe if these changes are progressive.

      As for the triggers and mechanisms which facilitate these changes to myosin dynamics, this is under current investigation by the field. One which may be of particular relevance to the changes seen during hibernation would be that of steroid hormones: previous research has demonstrated that steroid hormone levels in male and female bears change differentially [10]. This may be of relevance as the steroid hormone estradiol has been shown to slow the resting myosin ATP turnover via the binding of myosin RLC [11]. Considering these studies, future work which looks at hibernating animals of each sex as different groups may be fruitful.

      Reviewer #3 (Recommendations For The Authors):

      i. PDF Pg 8- Results- 'Myosin temperature sensitivity is lost in relaxed skeletal muscles fibers of hibernating Ictidomys tridecemlineatus.': An extra comma appears to be placed between "temperature, decrease".

      ii. PDF Pg 9- Results- 'Hyper-phosphorylation of Myh2 predictably stabilizes myosin backbone in hibernating Ictidomys tridecemlineatus.' (last paragraph): A parenthesis needs to be closed upon the first reference to "supplemental figures 2 and 3".

      iii. PDF Pg 15- Methods- 'Samples collection and cryo-preservation'- The authors use the term "individuals" in the 2nd line. Consider using "subjects".

      iv. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (2nd paragraph)- define "subadult" in approximate months or years.

      v. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (2nd paragraph)- The authors state that brown bears were located in "February and again ... in late June". Was this order of operations always held? If so, a comment about how the potential ageing from the hibernation (especially if sub-adult transitions to adulthood in this period) should be included.

      All samples were collected during the subadult period of the lifespan of each bear, and therefore we do not think that there would be a potential aging effect, considering that the lifespan of this species is 20-30 years.

      vi. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (3rd paragraph)- The justification for deprivation of feeding of black bears 24 hours prior to euthanasia should be included. A comment on how this might impact post-translational modifications or gene expression should be included.

      Animals are starved beforehand to prevent aspiration during euthanasia. Considering that these samples are to be compared to animals which have not consumed food or water for five months, the relative impact on PTMs and gene expression would be considered negligible.

      vii. PDF Pg 17- Methods- 'Mant-ATP chase experiments' (just after normalized fluorescence equation): The "Where" may be lowercase.

      viii. PDF Pg 17- Methods- 'Mant-ATP chase experiments' (last paragraph): The protocol for myosin staining, along with the antibody identification (source, catalog number) should be included.

      ix. PDF Pg 18- Methods- 'Post-translational Modification Peptide mapping': Define the makeup of the acrylamide gel and/or the source and catalog number.

      x. PDF Pg 18- Methods- 'Post-translational Modification Peptide mapping': The authors state that "Gel bands were washed..." Please specify which protein bands and if multiple bands (i.e. multiple isoforms) were isolated.

      We thank this reviewer for their careful reading of our manuscript; we have made the changes above as relevant.

      Reference list

      (1) Aydin, J., et al., Nonshivering thermogenesis protects against defective calcium handling in muscle. FASEB J, 2008. 22(11): p. 3919-24.

      (2) Stickler, S., Regional body temperatures and fatty acid compositions in hibernating garden dormice: a focus on cardiac adaptions. 2022, Vienna: Vienna. p. v, 49 pages, illustrations.

      (3) Glazier, A.A., et al., HSC70 is a chaperone for wild-type and mutant cardiac myosin binding protein C. JCI Insight, 2018. 3(11).

      (4) Walklate, J., et al., Exploring the super-relaxed state of myosin in myofibrils from fast-twitch, slow-twitch, and cardiac muscle. Journal of Biological Chemistry, 2022. 298(3).

      (5) Meizoso-Huesca, A., et al., Ca<sup>2+</sup> leak through ryanodine receptor 1 regulates thermogenesis in resting skeletal muscle. Proceedings of the National Academy of Sciences, 2022. 119(4): p. e2119203119.

      (6) Singh, D.P., et al., Evolutionary isolation of ryanodine receptor isoform 1 for muscle-based thermogenesis in mammals. Proceedings of the National Academy of Sciences, 2023. 120(4): p. e2117503120.

      (7) Staples, J.F., K.E. Mathers, and B.M. Duffy, Mitochondrial Metabolism in Hibernation: Regulation and Implications. Physiology, 2022. 37(5): p. 260-271.

      (8) Xu, R., et al., Hibernating squirrel muscle activates the endurance exercise pathway despite prolonged immobilization. Exp Neurol, 2013. 247: p. 392-401.

      (9) Yu, J., et al., Effects of Cold Exposure on Performance and Skeletal Muscle Fiber in Weaned Piglets. Animals (Basel), 2021. 11(7).

      (10) Frøbert, A.M., et al., Differential Changes in Circulating Steroid Hormones in Hibernating Brown Bears: Preliminary Conclusions and Caveats. Physiol Biochem Zool, 2022. 95(5): p. 365-378.

      (11) Colson, B.A., et al., The myosin super-relaxed state is disrupted by estradiol deficiency. Biochemical and Biophysical Research Communications, 2015. 456(1): p. 151-155.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      Comments on revised version:

      The authors have satisfactorily addressed my concerns.

      I suggest some minor edits, however. Line 747 does not mention MARK3 and neither does the Figure 8 legend (just MARK2). It would be helpful if the authors could include references to the papers reporting the shown structures in the Figure 8 legend.

      We have added MARK3 and related references in the revised Figure 8 legend.

      Reviewer #2:

      I would recommend that the catalog numbers from the different antibodies used in the study, mainly CST and Invitrogen are depicted in material and methods (see Methods/Recombinant proteins and general reagents).

      Thank you for the comment. We have now added the antibody catalog numbers in the revised methods section.

      I have one remark related to question number 5 (my question was not clear enough). I meant if the authors did look at the functional relevance of the residues implicated in the identified salt-bridge network/tethers. What happens to the proteins functionally when you mutate those residues? (represented on Fig. 8).

      Otherwise, the authors have satisfactorily addressed my concerns.

      Yes, we have analyzed the stability of the salt bridge interaction in the context of cysteine mutations, and our findings are described in the results section titled “Cysteine mutations alter critical structural interactions required for kinase allosteric regulation” (Figure 6). However, we have not performed mutational analysis of the salt bridge residues as we feel this would be beyond the scope of the current study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): Weaknesses:

      However, the molecular mechanisms leading to NPC dysfunction and the cellular consequences of resulting compartmentalization defects are not as thoroughly explored. Results from complementary key experiments using western blot analysis are less impressive than microscopy data and do not show the same level of reduction. The antibodies recognizing multiple nucleoporins (RL1 and Mab414) could have been used to identify specific nucleoporins that are most affected, while the selection of Nup98 and Nup107 is not well explained.

      The results for the Western blots are less impressive than single nuclei imaging analysis because the protocol for isolating brain nuclei is heterogeneous and includes non-neuronal cells. For this reason, we selected specific nucleoporins for Western blot studies to complement the nonspecificity of pan-NPC antibodies, for which the detection is based on the glycosylated moieties. We reasoned that a combination of pan-NPC and select NUPs will give the strongest complementary validation for the mutant phenotype. We have discussed the rationale of NUP selection in the discussion. In brief, we selected NUP107 as it is a major component of the Y-scaffold complex and is a long-lived subunit of the NPCs (Boehmer et al., 2003; D'Angelo et al., 2009). NUP98 is a mobile nucleoporin and is associated with the central pore, nuclear basket and cytoplasmic filaments. Both NUPs have been implicated in degenerative disorders (Eftekharzadeh et al., 2018; Wu et al., 2001).

      There is also no clear hypothesis on how Aβ pathology may affect nucleoporin levels and NPC function. All functional NCT experiments are based on reporters or dyes, although one would expect widespread mislocalization of endogenous proteins, likely affecting many cellular pathways.

      We agree that the interaction between Aβ pathology and the NPC remains a work in progress. We decided to rigorously characterize Aβ-mediated deficits in App KI neurons – using different approaches and in more than one animal model – before moving on to explore mechanisms in subsequent studies, which we think deserve more extensive experiments. We seek your understanding and have included in the discussion possible mechanisms for direct and indirect Aβ-mediated disruption of NPCs. We have also included an additional study to show the disruption in the localization of an endogenous nucleocytoplasmic protein – CRTC1 (cAMP Regulated Transcriptional Coactivator), which is a CREB coactivator responsive to neural activity. We observed that, under both basal and tetrodotoxin-silenced conditions, there is much more CRTC1 in the nucleus in App KI neurons relative to WT (Supplementary Figure S15). This reflects the compromised permeability barrier that we observed via FRAP studies.

      The second part of this manuscript reports that in App KI neurons, disruption in the permeability barrier and nucleocytoplasmic transport may enhance activation of key components of the necrosome complex that include receptor-interacting kinase 3 (RIPK3) and mixed lineage kinase domain-like (MLKL) protein, resulting in an increase in TNFα-induced necroptosis. While this is of potential interest, it is not well integrated in the study. This potential disease pathway is not shown in the very simple schematic (Fig. 8) and is barely mentioned in the Discussion section, although it would deserve a more thorough examination.

      The study of necroptosis is meant to showcase a single cellular pathway that requires nucleocytoplasmic transport for activation, is compromised, and is relevant for AD. We agree there is much more to explore in this pathway but feel this is outside the scope of this study. We have included a new illustration that models how damage to NPCs and the permeability barrier results in enhanced vulnerability of App KI neurons to necroptosis (Supplemental figure S12).

      Reviewer #2 (Public Review):

      (1) Adding statistics and comparisons between wild-type changes at different times/ages to determine if the nuclear pore changes with time in wild-type neurons. The images show differences in the Nuclear pore in neurons from the wild-type mice, with time in culture and age. However, a rigorous statistical analysis is lacking to address the impact of age/development on NUP function. Although the authors state that nuclear pore transport is reported to be altered in normal brain aging, the authors either did not design their experiments to account for the normal aging mechanisms or overlooked the analysis of their data in this light.

      All our quantifications and statistical comparisons in neuron cocultures are time-matched between WT and App KI neurons, and thus independent of age and maturity of the neurons in culture. The accelerated loss of NUP expression is evident across all time groups. However, we cannot compare across age groups in cultured neurons as the time-matched WT and App KI samples for each time point were processed and imaged separately as neurons matured over time (Fig. 1B-C). An experiment must be done simultaneously across all age groups to compare age-related effects for WT and App KI neurons in order to account for time-dependent changes. Given the unique challenges of studying “aging” in culture systems, we opted to be more conservative in our interpretation of the results and as such, we were careful to describe the accelerated nuclear pore deficits in App KI neurons relative to time-matched WT expression and to speculate on its relationship to normal brain aging only in the discussion section. We seek your understanding in this matter. That said, we are able to capture the decline of the NPC in histology of brain sections and observed a statistically significant drop in WT NUP levels in animal sections across age groups where we quantified and compared the raw nuclear intensities from brain sections that were processed and imaged simultaneously across independent experiments (Fig. 1D-E). We have included a statement in the results section to highlight that point.

      (2) Add experiments to assess the contribution of wild-type beta-amyloid accumulation with aging. It was described in 2012 (Guix FX, Wahle T, Vennekens K, Snellinx A, Chávez-Gutiérrez L, Ill-Raga G, Ramos-Fernandez E, Guardia-Laguarta C, Lleó A, Arimon M, Berezovska O, Muñoz FJ, Dotti CG, De Strooper B. 2012. Modification of γ-secretase by nitrosative stress links neuronal ageing to sporadic Alzheimer's disease. EMBO Mol Med 4:660-673, doi:10.1002/emmm.201200243) and 2021 (Burrinha T, Martinsson I, Gomes R, Terrasso AP, Gouras GK, Almeida CG. 2021. Upregulation of APP endocytosis by neuronal aging drives amyloid-dependent synapse loss. J Cell Sci 134. doi:10.1242/jcs.255752) that 28 DIV neurons are senescent and accumulate beta-amyloid 42. In addition, beta-amyloid 42 accumulates normally in the human brain (Baker-Nigh A, Vahedi S, Davis EG, Weintraub S, Bigio EH, Klein WL, Geula C. 2015. Neuronal amyloid-β accumulation within cholinergic basal forebrain in ageing and Alzheimer's disease. Brain 138:1722-1737. doi:10.1093/brain/awv024); thus, it would be important to determine if it contributes to NUP dysfunction. Unfortunately, the authors tested the Abeta contribution at DIV14, when wild-type Abeta accumulation was undetected. It would enrich the paper and allow the authors to conclude about normal aging if additional experiments were performed, namely, treating 28 DIV neurons with DAPT and assessing if NUP is restored.

      Your point is well-noted. We are intrigued at the potential contribution of WT Aβ to the decline in NUPs and NPC but decided to focus on mutant Aβ for this manuscript. We have observed negligible MOAB2-positive Aβ signals in WT neurons across all age groups (data not shown) but acknowledge the potential contributions of aging toward a reduction in NPC function. Instead, we have included a section in the discussion to highlight the aging-related expression of Aβ in WT neurons and a subset of the citations above to indicate a possible link with normal decay of NPCs.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) It does not consider the relationship of the findings here to other published work on the intraneuronal perinuclear and nuclear accumulation of amyloid in other transgenic mouse models and in humans.

      We have updated the discussion to further elaborate on intraneuronal and perinuclear accumulation of amyloid and how that relates to our NPC phenotype.

      (2) It appears to presume that soluble, secreted Abeta is responsible for the effect rather than the insoluble amyloid fibrils.

      At present, our data cannot fully discount the role of fibrils or other forms of Aβ causing the NPC deficits, but our studies do show that external presence of Aβ (e.g. addition of synthetic oligomeric Aβ or App KI conditioned media) leads to intracellular accumulation and NPC dysfunction. We are aware that endogenous formation of fibrils could also contribute to the NPC dysfunction but refrained from drawing any conclusions without further studies. We have stated this in the discussion.

      (5) It is not clear when the alteration in NUP expression begins in the KI mice as there is no time at which there is no difference between NUP expression in KI and Wt and the earliest time shown is 2 months. If NUP expression is decreased from the earliest times at birth, then this makes the significance of the observation of the association with amyloid pathology less clear.

      The phenotype we observed early in neuronal cultures and in very young animals is subtle and in all our studies, the severity of the NUP phenotypes consistently correlates with elevated intracellular Aβ. We expect that by looking at earlier/younger neurons, the deficits will not be present. However, neurons before DIV7 are immature, and hence we chose not to include those in our observations. In animals, we observed Aβ expression in neuronal soma in young mice (2 mo.), but it is not clear when the deficits manifest and how early to look. While the NUP expression is reduced at an early stage, we speculate in the discussion that cellular homeostatic mechanisms can compensate for any compromised nuclear functions and maintain viability up to the point where age-dependent degradation of cellular mechanisms eventually leads to progression of AD.

      Reviewer #1 (Recommendations For The Authors):

      While the App KI model is suitable for modeling one key aspect of human AD, the use of the term "AD neurons" throughout the manuscript is misleading and should be avoided when describing experiments with "App KI neurons".

      Noted and corrected.

      The claim that Aβ pathology causes NPC dysfunction via reduced nucleoporin protein expression would be stronger if it was better supported by biochemical evidence based on western blots (WBs) to complement the strong microscopy data. The results shown in Figure 2H show a very weak effect compared to microscopy data that does not appear to match the quantification (e.g. Lamin-B1 staining appears reduced after 2 months in WB but not the graph). It is also not clear why nuclear fractionation is required. WB analyses with RL1 and MAB414 (that recognizes multiple FG-Nups in ICCs and WBs) would help identify Nups that are most affected by Aβ pathology.

      The weaker Western blot results are due to the heterogeneity of the nuclei we isolated from the whole brain, which include non-neuronal cells. We reasoned that isolating the nuclear fraction would give us a cleaner Western blot with fewer background bands as the input lysate is more specific. We also decided to use antibodies against specific NUPs as a way to complement the pan-NPC antibodies that detect glycosylation-enriched epitopes in the nucleus. We reasoned that Western blot identification of individual subunits should provide complementary and stronger evidence for the reduction of NUPs at the peptide level. Overall, we used four different nuclear pore antibodies (RL1, Mab414, NUP98, NUP107) to demonstrate the same mutant phenotype in App KI neurons.

      While the observed NCT defects are discussed in detail, the authors do not present any potential mechanisms to be tested, how intracellular Aβ may impact NPCs. Does Aβ pathology affect nucleoporin expression or stability?

      We have observed the presence of Aβ adjacent to the nuclear membrane and also in the cytosol via high resolution confocal microscopy (Supplementary Figure S14). Our primary goal in this paper is to provide convincing evidence – using different assays and in more than one mouse model – for the reduction of NUPs and lower NPC counts. We feel that the mechanistic details of Aβ-driven NPC disruption require more extensive experimentation that is more suitable for subsequent publications.

      The very simple schematic just represents the loss of compartmentalization, without illustrating more complex concepts. It would also be improved by representing the outer and inner nuclear membrane fusing around the NPCs with a much wider perinuclear space between the membranes. As shown now, the nuclear envelope almost looks like a single membrane, while >60kDa proteins are shown at a similar size as the 125MDa NPC.

      We have updated the illustration along with a new schematic for necroptosis (Supplementary Figure S12). We have refrained from giving specific details of the damage to the nuclear pore complex because the nature of these deficits is not yet clear.

      Misspelling of "Hoechst" as "Hochest" in several figures (Fig. 1, 2, S5, S7).

      Noted and corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Additional data analysis is required concerning the wild-type controls. The figures show clear differences in the wild-type neurons with time in culture (referring to figures 1A, 1B, 1C; 2A, 2B, 2C, 2D,6E, 6F, 6G, s4) and in different ages (2E, 2F, 2G, 5B, 5C, 5D). The data analysis is shown for knockin vs the time-matched wild-type condition. The effect of time in wild-type neurons/mice should also be analyzed. All the data is suggested to be normalized to 7 DIV/2month wild-type neurons/mice. Were these experiments done with different time points of the same culture? This would be the best to conclude on the effect of time.

      We have noted a decline of NUPs in WT neurons over time in primary cultures and in animal sections. This is not surprising since the NPC and nuclear signaling pathways deteriorate with age (Liu and Hetzer, 2022; Mertens et al., 2015). However, we are unable to do a direct comparison across age groups in cultured neurons as the time-matched WT and App KI neuronal samples for each time point were processed and imaged separately as neurons matured over time (Fig. 1B-C). Hence, we performed statistical analysis for each time-matched pair of WT and App KI neurons. To be clear, multiple independent experiments across different cultures were performed at each time point. Given the inherent challenges of studying aging in culture systems, we opted to be more conservative in our interpretation of the results and as such, we were careful to describe the accelerated nuclear pore deficits in App KI neurons relative to WT levels without inferring the effect of time, and to speculate on its relationship to normal brain aging only in the discussion section. That said, we are able to capture the decline of the nuclear pore complex across different age groups in histology of brain sections, where we observed a drop in WT NUP levels in animal sections when we quantified and compared the raw nuclear intensities from brain sections that were processed and imaged simultaneously across independent experiments (Fig. 1D-E).

      Similarly, in Figure 2H, why aren't 2 months compared with 14 months? Why were these ages chosen? 2 months is a young adult, and 14 months is a middle-aged adult. To conclude, aging should have included an age between 18 and 24 months old.

      As with cultures, we isolated age-matched WT and App KI animals separately. We chose 2 to 14 months as they represent young and middle-aged adults, and we wanted to showcase the nuclear pore deficits induced by the presence of Aβ without drawing a conclusion on the effects of age or time. That said, we do show histology of brain sections at 18 months of age with individual NUPs. We agree that the temporal aspects of NPC loss in WT neurons are interesting; however, given our experimental parameters, we cannot draw conclusions across different age groups at the moment.

      In Figure 3, statistics between wild type should have been included.

      Similar to the above comment, samples were processed and imaged independently across different groups, hence we cannot compare the datapoints across time.

      (4) Additional quantification: The intensity of MOAB2 at 2 and 13 months should be measured as in Figure 3C.

      Intracellular Aβ signal in 2-mo. old App KI mice is diffuse throughout the soma, but in older animals it is punctate. This observation was similarly described by Lord et al. for tgAPPArcSwe mice (Lord et al., 2006). We have included a confocal micrograph of MOAB-2 immunocytochemistry of a 13-mo. App KI brain section in supplemental figures (Supplementary Figure S13). We found it challenging to differentiate whether the signal is localized intracellularly or as an extracellular aggregate. Regardless, the differences in the quality and uneven distribution of Aβ signal make any direct comparison of soma intensity across the different age groups harder to interpret in the context of the mutant phenotype.

      (5) Additional experiments: Because primary neurons differentiate, mature, and age with time in culture, they are required to control for the developmental stage of your cultures. Analyzing neuronal markers such as doublecortin for neuronal precursors, MAP2 (or Tau) for dendritic/axonal maturation, synapsin for synaptic maturation, and accumulation of senescence-associated beta-galactosidase (SA-Beta-Gal) as an aging marker.

      As part of the maintenance of cultures, we stain cultures for axodendritic markers (e.g. MAP2), glial cell distribution (e.g. GFAP), excitatory vs. inhibitory neuronal subpopulations (e.g. Gad65) and synaptic markers (e.g. PSD95) to ensure that growth, survival and viability of neurons are not compromised (data not shown). These markers for maturity are routinely tracked to ensure proper development. We also test the health of the cultures (e.g. apoptosis, necrosis) and look for cytoskeletal disruption or fragmentation of neuronal processes.

      (6) Additional methods: The quantification of Abeta intensity in Figure 3 is not clearly explained in the methods. Was the intensity measured per field, per cell body?

      The quantifications for Aβ are done for each MAP2-positive cell body, and we have included that statement in the methods.

      (7) Missing in discussion integration and references to these papers:

      a. Mertens J, Paquola ACM, Ku M, Hatch E, Böhnke L, Ladjevardi S, McGrath S, Campbell B, Lee H, Herdy JR, Gonçalves JT, Toda T, Kim Y, Winkler J, Yao J, Hetzer MW, Gage FH. 2015. Directly Reprogrammed Human Neurons Retain Aging-Associated Transcriptomic Signatures and Reveal Age-Related Nucleocytoplasmic Defects. Cell Stem Cell 17:705-718. doi:10.1016/j.stem.2015.09.001

      b. Guix FX, Wahle T, Vennekens K, Snellinx A, Chávez-Gutiérrez L, Ill-Raga G, Ramos-Fernandez E, Guardia-Laguarta C, Lleó A, Arimon M, Berezovska O, Muñoz FJ, Dotti CG, De Strooper B. 2012. Modification of γ-secretase by nitrosative stress links neuronal ageing to sporadic Alzheimer's disease. EMBO Mol Med 4:660-673. doi:10.1002/emmm.201200243

      c. Burrinha T, Martinsson I, Gomes R, Terrasso AP, Gouras GK, Almeida CG. 2021. Upregulation of APP endocytosis by neuronal aging drives amyloid-dependent synapse loss. J Cell Sci 134. doi:10.1242/jcs.255752

      d. Baker-Nigh A, Vahedi S, Davis EG, Weintraub S, Bigio EH, Klein WL, Geula C. 2015. Neuronal amyloid-β accumulation within cholinergic basal forebrain in ageing and Alzheimer's disease. Brain 138:1722-1737. doi:10.1093/brain/awv024

      We have cited a subset of the papers in the discussion section and also expanded the discussion to include the possibility of time-dependent changes for Aβ expression in WT neurons.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments:

      (1) Fig. 1D,E. Fig. 2E, F. This shows the change in NUP IR with time for the APP-KI, but there is also a difference between Wt and KI from the earliest time shown. How early is this difference apparent? From birth? The study should go back to the earliest time possible as the timing of the staining for NUP is important to correlate this with other events of intraneuronal Abeta and amyloid IR. Is the difference between 4 and 7-month ko mice in Figures 2G and 2F statistically significant? If not, perhaps we need a larger N to determine the timing accurately.

      The point is well taken. We have not examined the WT and App KI brains before 2 mo. of age. At this early time point, the extracellular amyloid deposits are very low but intracellular Aβ can be readily detected in neuronal soma. We expect that as the animal ages, the Aβ inside cells will directly impact the NPC mutant phenotype, but it is unclear how early this phenotype manifests in animals and when we should look. To be clear, in less mature neurons (DIV7), the phenotype is very subtle and can only be observed via high resolution microscopy. The differences between 4- and 7-mo. old animals (Fig. 2F and G) in terms of severity of the reduction cannot be assessed as the age-matched animals for each time point were processed separately, but at each time point, we observed a significant reduction of NPC relative to WT. Nevertheless, in Figure 1E, we performed immunohistochemistry experiments with pan-NPC antibodies and quantified raw intensities to show a difference between 4/7-mo. and 13-mo. old animals.

      (2) Similarly, the increase in Abeta IR is only shown for cultured neurons and only a single time point of 2 months is shown for CA1 in KI brain. Since a major point is that the decrease in NUP IR is correlated with an increase in Abeta IR, a more convincing approach would be to stain for both simultaneously in KI brain, especially since Abeta IR is quite sensitive to conformational variation between APP, Abeta, and aggregated forms and whether they are treated with denaturants for "antigen retrieval". The entire brain hemisphere should be shown as the pathology is not limited to CA1. There are many different Abeta antibodies that are specific to the amyloid state so it should be possible to come up with a set of antibodies and conditions that work for both Abeta and NUP staining.

      The intracellular Aβ signal in 2-mo. old App KI mice is diffuse throughout the soma, but in older animals it is punctate. We have included a confocal micrograph of MOAB-2 immunocytochemistry of a 13-mo. App KI brain section (Supplementary Figure S13). We did not quantify Aβ as it was challenging to differentiate whether the signal is intracellular Aβ or amyloid β plaques. Regardless, the differences in the quality and uneven distribution of the Aβ signal make any direct comparison of soma intensity across the different age groups much harder to interpret.

      (3) Figure 3A. The staining with MOAB 2 and 82E1 appears qualitatively different with 82E1 exhibiting larger perinuclear puncta. Both antibodies appear to stain puncta inside the nucleus consistent with previously published reports of intranuclear amyloid IR. If these are flattened images, then 3D Z stacks should be shown to clarify this. Figure 3H shows what appears to be Abeta immunofluorescence quantitation in DAPT-treated cells, but the actual images are apparently not shown. The details of this experiment aren't clear or what antibody is used, but this may not be Abeta as many APP fragments that are not Abeta also react with antibodies like MOAB2.

      Since 82E1 detects a larger epitope (aa1-16 as compared to 1-4 in MOAB-2), it is possible that some forms of Aβ are differentially detected inside the cell. MOAB-2 has been shown to detect the different forms of Aβ40 and 42, with a stronger selectivity for the latter. However, it is not known to react with APP or APP/CTFs (Youmans et al., 2012). DAPT-treated cells were processed and imaged as in the other experiments in Figure 3, using MOAB-2 antibodies to detect Aβ. We have included that information in the figure legends.

      We image each cell by collecting LSM800 confocal stacks and using IMARIS software to render the nucleus as a 3D object prior to quantifying the intensity or coverage. In this way, we capture and quantify the entire volume of the nucleus and not just a single plane. The majority of the MOAB-2-positive Aβ signal is punctate and located in the cytosol, with a subset adjacent to the nucleus (Supplementary Figure 14; Airyscan; single plane). We also detected MOAB-2 signals coming from within the nucleus. The nature of this interaction between Aβ and the nuclear membrane/perinuclear space/nucleoplasm remains unclear.

      (4) P20 L12. "We demonstrate an Aβ-driven loss of NUP expression in hippocampal neurons both in primary cocultures and in AD mouse models" It isn't clear that exogenous or extracellular Abeta drives this in living animals. All the data that demonstrate this is derived from cell culture and things may be very different (eg. Soluble Abeta concentration) in vivo. It is OK to speculate that the same thing happens in vivo, but to say it has been demonstrated in vivo is not correct.

      We have rewritten the opening statement in the paragraph to narrowly define our observations in the context of App KI. We understand the caveats of our studies in primary cultures, but we have done our due diligence to study the phenomenon in different assays, using at least four different nuclear pore antibodies, and in more than one mouse model to show the deficits. We mentioned Aβ-driven loss but did not conclude which Aβ peptide (e.g. 40 vs. 42) or form (e.g. fibrillar) drives the deficits. However, we have shown some data indicating that oligomers, but not monomers, as well as extracellular Aβ can accumulate in the soma and trigger NPC deficits. We also state in the discussion that other possible mechanisms of action, mainly via indirect interactions of Aβ with the cell, could result in the deficits.

      (5) P21, L21 "Inhibition of γ-secretase activity prevented cleavage of mutant APP and generation of Aβ, which led to the partial restoration of NUP levels". What the data actually shows is that treatment of the cells with DAPT led to partial restoration of NUP levels. Other studies have shown that DAPT is a gamma secretase inhibitor, so it is reasonable to suspect that the effect is due to gamma secretase activity, but the substrates and products are assumed rather than measured, so a little caution is a good idea here. For example, CTF alpha is also a substrate, producing P3, which is not considered Abeta. The products Abeta and P3 also typically are secreted, where they can be further degraded. Abeta and P3 can also aggregate into amyloid, so whether the effect is really due to Abeta per se as a monomer or Abeta-containing aggregates isn't clear.

      The point is noted. DAPT inhibition of γ-secretase can impact more than one substrate, as the complex can cleave multiple substrates. However, we have measured Aβ intensity which increases with DAPT, and while a single experiment is insufficient to show direct Aβ involvement, we have performed other experiments that show a correlation between Aβ levels inside the soma and the degree of NPC reduction. This includes the direct application of synthetic Aβ42 oligomers. We agree the data cannot fully exclude the involvement of other γ-secretase cleavage products, but we feel there is strong enough evidence that Aβ, in whatever form, is at least partially, if not mainly, the driver that promotes these deficits.

      (6) Discussion. The authors point to "intracellular Abeta" as a potential causative agent for decreased NUP expression and function and cite a number of papers reporting intracellular Abeta. (D'Andrea et al., 2001; Iulita et al., 2014; Kimura et al., 2003; LaFerla et al., 1997; Oddo et al., 2003b; Takahashi et al., 2004; Wirths et al., 2001). Most of these papers report immunoreactivity with Abeta antibodies and argue about whether this is really Abeta40 or 42 and not APP or APP-CTF immunoreactivity. What is missing from these papers and the discussion in this manuscript is that this is not just soluble Abeta, but Abeta amyloid of the same type that ends up in plaques because it has the same immunoreactivity with Abeta amyloid fibril-specific antibodies and even the classical anti-Abeta antibodies 6E10 and 4G8 after antigen retrieval as shown in papers by Pensalfini, et al., 2014 and Lee, et al., 2022 (1,2) who describe the evolution of neuritic plaques and their amyloid core beginning inside neurons. The term "dystrophic neurite" is a misnomer because the structures that resemble "neurites" morphologically are actually autophagic vesicles packed with Abeta and APP immunoreactive material which has the detergent insolubility properties of amyloid plaques. See (1,2). The apparent intranuclear IR of MOAB2 and 82E1 mentioned in comment 3 is relevant here. In Lee et al., the 3D serial section EM reconstruction of one of these neurons with perinuclear and nuclear amyloid shows abundant amyloid fibrils in the remnant of the nucleus. The nuclear envelope appears to break down as evidenced by the redistribution of NeuN immunoreactivity (Pensalfini et al.,) and other nuclear markers and the EM evidence (Lee et al.,). These papers are also improperly cited as evidence for a hypothetical intracellular source for soluble Abeta.

      We have devoted a section of the discussion to highlight some of these findings in the context of Pensalfini et al. 2014 and Lee et al. 2022. Lee et al. tested multiple animal strains to observe the Panthos structures but did not use the App KI mouse model. Since none of our experiments directly tested their observations (e.g. perinuclear fibrils or acidity of autophagic vesicles) in App KI, we decided to take a more conservative approach in our interpretations by framing the NPC deficits without specifying the nature of the intracellular Aβ. We note in discussion that it is entirely possible that App KI animals also show the same Panthos phenotypes and the perinuclear accumulation of Aβ which results in damaged NUPs. To do that, the Panthos phenotype must first be established in App KI mice.

      (7) The authors also cite the work of Ditaranto et al., 2001 and Ji et al., 2002 for Aβ-induced lysosomal leakage from these vesicular structures but overlook the original publications on Abeta-induced lysosomal leakage by Yang et al., (3) who further show that this is correlated with aggregation of Abeta42 upon internalization which also leads to the co-aggregation of APP and APP-CTFs in a detergent-insoluble form (4) and pulse-chase studies demonstrate that metabolically-labeled APP ultimately ends up as insoluble Abeta that have "ragged" N-termini (5). This work seems relevant to the results reported here as the perinuclear amyloid that the authors report here is likely to be the same insoluble, aggregated APP and APP-CTF-containing amyloid as that reported in references 1 and 2.

      We have included the literature references in the discussion, highlighting the possibility of lysosomal leakage contributing to the NPC damage.

      Minor points.

      (1) P2, L28 "permeability barrier facilities passive" should be 'facilitates'.

      (2) P7, L24 "homogenate and grounded for 5 additional strokes" One of the peculiarities of English is that the past tense of grind is ground. Grounded means something else.

      (3) P8, L9 "For synthetic Aβ experiments," Abeta what? 42? 40? It makes a difference and if it is Abeta42, you should be specific in the rest of the text where it is used.

      (4) P11, L14. "To determine if Aβ can trigger changes in nuclear structure and function" It seems a little early to start by presupposing that it is Abeta that triggers changes in nuclear structure and function. It sounds like you are starting out with a bias.

      (5) P11, L16,17 "While Aβ pathology is robustly detected in App KIs" At some point in the manuscript, either here or in the introduction, it would be useful to include a couple of sentences about what the pathology is in these mice along with the timing of the development of the pathology to compare with the results presented here. There are several types of amyloid deposits, "neuritic" plaques, diffuse plaques, and cerebrovascular amyloid. This is important because the early "neuritic" plaques are intraneuronal at least early on before the neuron dies. See (1,2).

      (6) P19, L10. "LMB is an inhibitor or CRM-1 mediated" should be of

      All minor points have been addressed in the manuscript and figures.

      References

      (1) Pensalfini, A., Albay, R., 3rd, Rasool, S., Wu, J. W., Hatami, A., Arai, H., Margol, L., Milton, S., Poon, W. W., Corrada, M. M., Kawas, C. H., and Glabe, C. G. (2014) Intracellular amyloid and the neuronal origin of Alzheimer neuritic plaques. Neurobiol Dis 71C, 53-61

      (2) Lee, J. H., Yang, D. S., Goulbourne, C. N., Im, E., Stavrides, P., Pensalfini, A., Chan, H., Bouchet-Marquis, C., Bleiwas, C., Berg, M. J., Huo, C., Peddy, J., Pawlik, M., Levy, E., Rao, M., Staufenbiel, M., and Nixon, R. A. (2022) Faulty autolysosome acidification in Alzheimer’s disease mouse models induces autophagic build-up of Abeta in neurons, yielding senile plaques. Nat Neurosci 25, 688-701

      (3) Yang, A. J., Chandswangbhuvana, D., Margol, L., and Glabe, C. G. (1998) Loss of endosomal/lysosomal membrane impermeability is an early event in amyloid Aβ1-42 pathogenesis. J. Neurosci. Res. 52, 691-698

      (4) Yang, A. J., Knauer, M., Burdick, D. A., and Glabe, C. (1995) Intracellular A beta 1-42 aggregates stimulate the accumulation of stable, insoluble amyloidogenic fragments of the amyloid precursor protein in transfected cells. J Biol Chem 270, 14786-14792

      (5) Yang, A., Chandswangbhuvana, D., Shu, T., Henschen, A., and Glabe, C. G. (1999) Intracellular accumulation of insoluble, newly synthesized Aβn-42 in APP transfected cells that have been treated with Aβ1-42. J. Biol. Chem. 274, 20650-20656

      References

      Boehmer, T., Enninga, J., Dales, S., Blobel, G., and Zhong, H. (2003). Depletion of a single nucleoporin, Nup107, prevents the assembly of a subset of nucleoporins into the nuclear pore complex. Proc Natl Acad Sci U S A 100, 981-985.

      D'Angelo, M.A., Raices, M., Panowski, S.H., and Hetzer, M.W. (2009). Age-dependent deterioration of nuclear pore complexes causes a loss of nuclear integrity in postmitotic cells. Cell 136, 284-295.

      Eftekharzadeh, B., Daigle, J.G., Kapinos, L.E., Coyne, A., Schiantarelli, J., Carlomagno, Y., Cook, C., Miller, S.J., Dujardin, S., Amaral, A.S., et al. (2018). Tau Protein Disrupts Nucleocytoplasmic Transport in Alzheimer's Disease. Neuron 99, 925-940 e927.

      Liu, J., and Hetzer, M.W. (2022). Nuclear pore complex maintenance and implications for age-related diseases. Trends Cell Biol 32, 216-227.

      Lord, A., Kalimo, H., Eckman, C., Zhang, X.Q., Lannfelt, L., and Nilsson, L.N. (2006). The Arctic Alzheimer mutation facilitates early intraneuronal Abeta aggregation and senile plaque formation in transgenic mice. Neurobiol Aging 27, 67-77.

      Mertens, J., Paquola, A.C., Ku, M., Hatch, E., Bohnke, L., Ladjevardi, S., McGrath, S., Campbell, B., Lee, H., Herdy, J.R., et al. (2015). Directly Reprogrammed Human Neurons Retain Aging-Associated Transcriptomic Signatures and Reveal Age-Related Nucleocytoplasmic Defects. Cell stem cell 17, 705-718.

      Wu, X., Kasper, L.H., Mantcheva, R.T., Mantchev, G.T., Springett, M.J., and van Deursen, J.M. (2001). Disruption of the FG nucleoporin NUP98 causes selective changes in nuclear pore complex stoichiometry and function. Proc Natl Acad Sci U S A 98, 3191-3196.

      Youmans, K.L., Tai, L.M., Kanekiyo, T., Stine, W.B., Jr., Michon, S.C., Nwabuisi-Heath, E., Manelli, A.M., Fu, Y., Riordan, S., Eimer, W.A., et al. (2012). Intraneuronal Abeta detection in 5xFAD mice by a new Abeta-specific antibody. Molecular neurodegeneration 7, 8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank both reviewers for their supportive comments. Reviewer 1 has suggested a different data processing strategy to better resolve subunits at the CALHM4/CALHM2 interface:

      I recommend an alternative data processing strategy. First, refine particles with 2-4 CALHM4 subunits with symmetry imposed. This is followed by symmetry expansion, signal subtraction of two adjacent subunits, and subsequent classification and refinement of the subtracted particles. This approach, while not guaranteed, can potentially provide a clearer definition of CALHM2 and CALHM4 interfaces and show whether CALHM2 subunits adopt different conformations based on their proximity to CALHM4 subunits.

      We have followed the recommended strategy in an attempt to improve the resolution and better resolve the structural heterogeneity in CALHM2/4 channels. To this end, we have combined symmetry expansion and partial signal subtraction, as suggested by the reviewer. Initially, a symmetrized (C11) 3.4 Å consensus map of undecameric CALHM2/4 channels bound to sybodies SbC2 and SbC4 was used. The particles of this reconstruction were subjected to symmetry expansion (C11) followed by signal subtraction of nine adjacent subunits. Next, we performed focused, alignment-free 3D classification of the remaining two subunits followed by refinement of these classes, leading to the classification of CALHM subunit pairs. The majority of the classes feature well-resolved CALHM2 pairs, consistent with the original approach (Author response image 1A). A minority of the classes contain CALHM4 subunits, revealing heterogeneity similar to regions of CALHM4 subunits observed in the non-symmetrized channel reconstruction (Author response image 1B). Unfortunately, this approach thus did not improve resolution or facilitate a more accurate subunit assignment. Consequently, we decided not to include these attempts in our manuscript. The resubmitted version thus contains only small corrections compared to the previous version.
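      For readers less familiar with the procedure, the bookkeeping behind symmetry expansion can be sketched as follows. This is a minimal illustration only, not the processing commands used for the reconstructions: the record fields (`id`, `rot_deg`, `sym_index`) are hypothetical, and applying each C11 operator as an offset to a single in-plane angle about the symmetry axis is an assumed convention that differs between software packages. Each particle is replicated once per symmetry operator, so that every subunit position can later be brought into the same reference frame before nine of the eleven subunits are subtracted.

```python
def symmetry_expand(particles, n_fold):
    """Replicate each particle record once per C_n symmetry operator.

    Assumption: the symmetry axis coincides with the axis of the angle
    stored in the hypothetical field "rot_deg", so each operator is a
    simple offset of k * 360 / n_fold degrees.
    """
    step = 360.0 / n_fold
    expanded = []
    for particle in particles:
        for k in range(n_fold):
            copy = dict(particle)
            copy["rot_deg"] = (particle["rot_deg"] + k * step) % 360.0
            copy["sym_index"] = k  # which symmetry copy this record is
            expanded.append(copy)
    return expanded


# Two made-up particle records, expanded with C11 symmetry.
particles = [{"id": "p1", "rot_deg": 10.0}, {"id": "p2", "rot_deg": 350.0}]
expanded = symmetry_expand(particles, n_fold=11)

if __name__ == "__main__":
    print(len(expanded), "expanded records from", len(particles), "particles")
```

      After expansion, each of the eleven copies of a particle places a different subunit pair at the position retained by the subtraction mask, which is why the subsequent classification operates on subunit pairs rather than on whole channels.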

      Author response image 1.

      Classification of subunit pairs of undecameric CALHM2/4 channels bound to sybodies SbC2 and SbC4 after the processing combining symmetry expansion and partial signal subtraction. (A) Classes showing CALHM2 subunit pairs. (B) Classes showing subunits at interfaces to CALHM4.

  3. Apr 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to express our gratitude to the reviewers for their suggestions and critiques as we continually strive to enhance the quality of the manuscript. We improved it by incorporating the reviewers’ suggestions, changing the content and numbering of figures (Figs 1, 3S1 were edited; 4 figures were moved to supplemental materials), and adding several analyses suggested by the reviewers along with accompanying figures (1S2, 1S3) and tables (1 and 2). These analyses include investigating the link between freezing behavior and 44-kHz calls as well as their sound mean power and duration. Also, we have introduced detailed information regarding the experiments performed and expanded the description and discussion of the results. Finally, we added information about 44-kHz calls reported by another group, a report which was inspired by our findings.

      Below is the point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Olszyński and colleagues present data showing variability from canonical "aversive calls", typically described as long 22 kHz calls rodents emit in aversive situations. Similarly long but higher-frequency (44 kHz) calls are presented as a distinct call type, including analyses both of their acoustic properties and animals' responses to hearing playback of these calls. While this work adds an intriguing and important reminder, namely that animal behavior is often more variable and complex than perhaps we would like it to be, there is some caution warranted in the interpretation of these data. The authors also do not provide adequate justification for the use of solely male rodents. With several reported sex differences in rat vocal behaviors this means caution should be exercised when generalizing from these findings.

      We fully agree that our data should be interpreted with caution and we followed the Reviewer’s suggestions along these lines (see below). Also, we appreciate the suggestion to explore the prevalence of 44-kHz calls in female subjects, which would indeed represent an important and intriguing extension of our research. However, due to present financial constraints, we can only plan such experiments. To address the comment, we have added the sentence: “Here we are showing introductory evidence that 44-kHz vocalizations are a separate and behaviorally-relevant group of rat ultrasonic calls. These results require further confirmations and additional experiments, also in form of repetition, including research on female rat subjects.”

      It is important to note that the data presented in the current manuscript originate primarily from previously conducted experiments. These earlier experiments employed male subjects only; this was due to established evidence indicating that the female estrus cycle significantly influences ultrasonic vocalization (Matochik et al., 1992). Controlling for the estrus cycle would require a greater number of female subjects than males, which would not only increase animal suffering but also escalate the demands on human labor and the financial costs.

      Firstly, the authors argue that the shift to higher-frequency aversive calls is due to an increase in arousal (caused by the animals having received multiple aversive foot shocks towards the end of the protocols). However, it cannot be ruled out that this shift would be due to factors such as the passage of time and increase in fatigue of the animals as they make vocalizations (and other responses) for extended periods of time. In fact the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day in testing is in line with this.

      Answer: We would like to point out that the “increased-arousal” hypothesis, declared in the manuscript, is only a hypothesis – as reflected by the wording used. However, we changed the beginning of the sentence in question from “It could be argued” to “We would like to propose a hypothesis” to emphasize the speculative aspect of the proposed explanation behind the increase of 44-kHz ultrasonic emissions.

      Also, we do agree that other factors could contribute to the increased emission of 44-kHz calls. These factors could include: heightened fear, stress/anxiety, annoyance/anger, disgust/boredom, grief/sadness, despair/helplessness, and weariness/fatigue. We list these potential factors in the discussion. Also, we added: “It is not possible, at this stage, to determine which factors played a decisive role. Please note that the potential contribution of these factors is not mutually exclusive”. However, we propose a list of arguments supporting the idea that 44-kHz vocalizations communicate an increased negative emotional state. Among these arguments are the conclusions drawn from additional analyses, mostly inspired by the fatigue hypothesis proposed by Reviewer #1. In particular, we investigated changes in the sound mean power and duration of 22-kHz and 44-kHz calls. Specifically, we showed that the mean power of 44-kHz vocalizations did not change, and was higher than that of 22-kHz vocalizations (Fig. 1S2EF).

      Finally, Reviewer #1 listed “the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day” as arguments for the fatigue hypothesis. We do not agree that the “increase” should be interpreted as a sign of fatigue [producing and maintaining higher-frequency calls requires greater effort from the vocalizer, on which we elaborated in the manuscript]. We are also not sure which “drop in 44 kHz calls” the Reviewer is referring to [we assume it refers to fewer 44-kHz calls during testing vs. training; we suppose that the levels of arousal are lower in the test due to the shorter session time and the lack of shocks, which additionally contributes to fear extinction].

      Secondly, regarding the analysis where calls were sorted using DBSCAN based on peak frequency and duration, it is not surprising that the calls cluster based on frequency and duration, i.e. the features that are used to define the 44 kHz calls in the first place. Thus presenting this clustering as evidence of them being truly distinct call types comes across as a circular argument.

      Answer: The DBSCAN sorting results were meant to convey that, when the clustering ε value (i.e. the degree of cluster separation) was changed, the 44-kHz vocalizations remained distinct from the 22-kHz and various short-call clusters, which merged. In other words, 44-kHz calls remained separate from long 22-kHz, short 22-kHz and 50-kHz vocalizations, which all consolidated into one common cluster. As a result, in this mathematical analysis, 44-kHz vocalizations remained distinct without applying human biases. Additionally, frequency and duration are the two most common features used to define all types of calls (Barker et al., 2010; Silkstone & Brudzynski, 2019a, 2019b; Willey & Spear, 2013). In summary, we did not expect the analysis to isolate the 44-kHz calls, and we were surprised by this result.
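      To make the logic of such an ε sweep concrete, a minimal sketch is given below. It is purely illustrative and is not the analysis pipeline used in the manuscript: the DBSCAN implementation is a simplified one written for this example, the points are made-up toy coordinates in a hypothetical normalized (peak frequency, duration) space rather than recorded calls, and `min_samples = 4` is an arbitrary assumption. The sketch only shows the general behaviour that, as ε grows, nearby groups merge while a well-separated group stays on its own.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: return one label per point (-1 marks noise)."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # All points (including i itself) within eps of point i.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def blob(cx, cy, step=0.02):
    """3x3 grid of toy points centred on (cx, cy)."""
    return [(cx + dx * step, cy + dy * step)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


# Hypothetical toy coordinates: two nearby groups and one distant group.
toy_calls = blob(0.0, 0.0) + blob(0.1, 0.0) + blob(0.0, 1.0)


def count_clusters(eps, min_samples=4):
    """Number of clusters (noise excluded) found at a given eps."""
    labels = dbscan(toy_calls, eps, min_samples)
    return len(set(labels) - {-1})


if __name__ == "__main__":
    for eps in (0.03, 0.08):
        print(f"eps={eps}: {count_clusters(eps)} clusters")
```

      In this toy setting, the two nearby groups are separate at the smaller ε and merge at the larger one, whereas the distant group keeps its own label at both settings.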

      The sparsity of calls in the 30-40 kHz range (shown in the individual animal panels in Figure 2C) could in theory be explained by some bioacoustics properties of rat vocal cords, without necessarily the calls below and above that range being ethologically distinct.

      Answer: We respectfully disagree with the argument regarding sparsity. It is important to note that, during prolonged fear conditioning experiments, we observed an increased incidence of 44-kHz calls (Fig. 1E-G) of up to >19% (Fig. 1S2AB) of the total ultrasonic vocalizations during specific inter-trial intervals. Also, it is possible that, under the observed experimental circumstances, almost every fifth call could be attributed to the vocal apparatus as an artifact of its functioning (assuming we are interpreting the Reviewer’s argument correctly). While we do not believe this to be the case, we acknowledge the importance of considering such a hypothesis.

      The behavioral response to call playback is intriguing, although again more in line with the hypothesis that these are not a distinct type of call but merely represent expected variation in vocalization parameters. Across the board animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls. This does raise interesting questions about how, ethologically, animals may interpret such variation and integrate this interpretation in their responses. However, the categorical approach employed here does not address these questions fully.

      Answer: We are unsure of the Reviewer’s critique in this paragraph and will attempt to address it to the best of our understanding. Our finding of up to >19% of long, seemingly aversive 44-kHz calls, at a frequency in the defined appetitive ultrasonic range (usually >32 kHz), is unexpected rather than “expected”. We would agree that aversive call variation is expected, but not in the appetitive frequency range.

      Kindly note the findings by Saito et al. (2019), which claim that frequency band plays the main role in rat ultrasonic perception. It is possible that the higher peak frequency of 44-kHz calls may be a strong factor in their perception by rats, which is, however, modified by the longer duration and the lack of modulation.

      Also, from our experience, it is quite challenging to demonstrate different behavioral responses of naïve rats to pre-recorded 22-kHz (aversive) vs. 50-kHz (appetitive) vocalizations. Therefore, we expected that demonstrating a difference in response to two distinct, potentially aversive calls, i.e., 22-kHz vs. 44-kHz calls, would be even more difficult (to our knowledge, a comparable experiment with short vs. long 22-kHz ultrasonic vocalizations has not been done before).

      Therefore, we do not take lightly the surprising and interesting finding that “animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls”. We would rather put this description in analogous words: “the rats responded similarly to hearing 44-kHz calls as they did to hearing aversive 22-kHz calls, especially regarding heartrate change, despite the 44-kHz calls occupying the frequency band of appetitive 50-kHz vocalizations” and “other responses to 44-kHz calls were intermediate, they fell between response levels to appetitive vs. aversive playback” – which we added to the Discussion.

      Finally, we acknowledge that our findings do not present a finite and complete picture of the discussed aspects of behavioral responses to the presented ultrasonic stimuli (44-kHz vocalizations). Therefore, we have incorporated the Reviewer’s suggestion in the discussion. The added sentence reads: “Overall, these initial results raise further questions about how, ethologically, animals may interpret the variation in hearing 22-kHz vs. 44-kHz calls and integrate this interpretation in their responses.”

      In sum, rather than describing the 44kHz long calls as a new call type, it may be more accurate to say that sometimes aversive calls can occur at frequencies above 22 kHz. Individual and situational variability in vocalization parameters seems to be expected, much more so than all members of a species strictly adhering to extremely non-variable behavioral outputs.

      Answer: The surprising fact that there are presumably aversive calls that lie beyond the commonly applied thresholds, i.e. >32 kHz, while sharing some characteristics with 22-kHz calls, is the main finding of the current publication. Whether they are ultimately classified as a new type or subtype (i.e. a separate category), or form a supergroup of aversive calls together with 22-kHz vocalizations, is of secondary importance and remains to be discussed with other researchers in the field.

      However, we would argue – by showing a comparison – that 22-kHz calls occur at durations of <300 ms and also >300 ms, and are, usually, referred to in literature as short and long 22-kHz vocalizations, respectively (not introduced with a description that “sometimes 22kHz calls can occur at durations below 300 ms”). These are then regarded and investigated as separate groups or classes usually referred to as two different “types” (e.g., Barker et al., 2010) or “subtypes” (e.g., Brudzynski, 2015). Analogously, 44-kHz vocalizations can also be regarded as a separate type or a subtype of 22-kHz calls. The problem with the latter is that 22-kHz vocalizations are traditionally and predominantly defined by 18–32 kHz frequency bandwidth (Araya et al., 2020; Barroso et al., 2019; Browning et al., 2011; Brudzynski et al., 1993; Hinchcliffe et al., 2022; Willey & Spear, 2013).

      Reviewer #2 (Public Review):

      Olszyński et al. claim that they identified a "new-type" ultrasonic vocalization around 44 kHz that occurs in response to prolonged fear conditioning (using foot-shocks of relatively high intensity, i.e. 1 mA) in rats. Typically, negative 22-kHz calls and positive 50-kHz calls are distinguished in rats, commonly by using a frequency threshold of 30 or 32 kHz. Olszyński et al. now observed so-called "44-kHz" calls in a substantial number of subjects exposed to 10 tone-shock pairings, yet call emission rate was low (according to Fig. 1G around 15%, according to the result text around 7.5%).

      Answer: We are thankful for the praise of the strengths. Please note that Figure 1G refers to 10-trial Wistar rats during the delay fear conditioning session, in which 44-kHz calls constituted 14.1% of ultrasonic vocalizations. The 7.5% number in the results refers to the total of vocalizations analyzed across all animal groups used in fear conditioning experiments. These values have been updated in the current version of the manuscript. Also, please note that 44-kHz calls constituted up to 19.4% of calls, on average, in one of the ITIs during the fear conditioning session. However, the prevalence of aversive calls, and of 44-kHz vocalizations in particular, varied between individual rats; we added the text: “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.” See also further for the description of the array of experiments analyzed and the prevalence/percentage of 44-kHz calls encountered (Tab. 1, Fig. 1S3).
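      As a quick arithmetic check of the quoted per-ITI tallies, the shares can be recomputed from the counts given above. The snippet below is only a sketch of that calculation; the variable names are illustrative, and the counts are the three examples quoted in the added text.

```python
from fractions import Fraction

# (44-kHz calls, all calls) for the three example ITIs quoted above.
counts = [(140, 142), (222, 231), (263, 265)]

# Exact shares, kept as fractions to avoid rounding before the comparison.
shares = [Fraction(calls_44, total) for calls_44, total in counts]

# Percentages rounded to one decimal place for reporting.
percentages = [round(float(share) * 100, 1) for share in shares]

# Check the ">95%" statement for every example.
all_above_95 = all(share > Fraction(95, 100) for share in shares)

if __name__ == "__main__":
    print(percentages, all_above_95)
```

      Each of the three example tallies exceeds the 95% threshold stated in the added sentence.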

      Weaknesses: I see a number of major weaknesses.

      While the descriptive approach applied is useful, the findings have only focused importance and scope, given the low prevalence of "44 kHz" calls and limited attempts made to systematically manipulate factors that lead to their emission. In fact, the data presented appear to be derived from reanalyses of previously conducted studies in most cases and the main claims are only partially supported. While reading the manuscript, I got the impression that the data presented here are linked to two or three previously published studies (Olszyński et al., 2020, 2021, 2023). This is important to emphasize for two reasons:

      (1) It is often difficult (if not impossible) to link the reported data to the different experiments conducted before (and the individual experimental conditions therein). While reanalyzing previously collected data can lead to important insight, it is important to describe in a clear and transparent manner what data were obtained in what experiment (and more specifically, in what exact experimental condition) to allow appropriate interpretation of the data. For example, it is said that in the "trace fear conditioning experiment" both single- and group-housed rats were included, yet I was not able to tell what data were obtained in single- versus group-housed rats. This may sound like a side aspect, however, in my view this is not a side aspect given the fact that ultrasonic vocalizations are used for communication and communication is affected by the social housing conditions.

      Answer: In preparing the current manuscript, we indeed used data collected during fear conditioning experiments that were described previously (Olszyński et al., 2021; Olszyński et al., 2022). Please note, however, that vocalization behavior during the fear conditioning itself was not the main subject of these publications. Our previous publications (Olszyński et al., 2020; Olszyński et al., 2021; Olszyński et al., 2022) primarily present ultrasonic-vocalization data from the playback part of the experiments, whereas here we analyze recordings obtained during the fear conditioning itself; thus, we are analyzing parts of previously published studies that had not yet been analyzed. We have also performed additional experiments.

      In the first version of the current manuscript, we did not attempt to demonstrate exactly which calls were recorded in which conditions, as the focus was to demonstrate that 44-kHz calls were emitted in several different fear-conditioning experiments. Also, as the experiments were not performed simultaneously and the results come from different experimental situations, we would prefer not to compare these results directly.

      However, in the current version of the manuscript, we have introduced an additional reference system, based on Tab. 1, to more clearly indicate which rats have been employed in each analysis, e.g. the group of “Wistar rats that undergone 10 trials of fear conditioning” are described as “Tab. 1/Exp. 1-3/#2,4,8,13; n = 46”, i.e., these are the rats listed in rows 2, 4, 8, and 13 of Tab. 1.

      We have also tried to unify the analyses, in terms of rats used, as much as possible. Finally, we have also introduced Fig. 1S3 to demonstrate the prevalence of 44-kHz calls in all experiments analyzed with the note that “the experiments were not performed in parallel”.

      Regarding the Reviewer’s concerns about analyzing single- and pair-housed rats together, we have examined the ultrasonic vocalizations emitted and the freezing behavior in these two groups.

      • Ultrasonic vocalizations. When comparing the number of vocalizations, their duration, peak frequency, and latency to first occurrence, both for all call types pooled and separately for each type (short 22-kHz, long 22-kHz, 44-kHz, 50-kHz), the only difference observed was in the peak frequency of 50-kHz vocalizations (50.7 ± 2.8 kHz for paired vs. 61.8 ± 3.1 kHz for single rats; p = 0.0280, Mann-Whitney). Since 50-kHz calls are not the subject of the current publication, we did not investigate this difference further. Also, this difference was not observed during playback experiments (Olszyński et al., 2020, Tab. 1).

      • Freezing. There were no differences between single- and pair-housed groups in freezing behavior, both in the time before first shock presentation and during fear conditioning training (Mann-Whitney).

      In summary, since the two groups did not differ in relevant ultrasonic features and freezing, we decided to present the results obtained from these rats together. However, we agree with the Reviewer that social housing conditions may in fact affect the emission of 44-kHz vocalizations; this could be the subject of another project involving, e.g., larger experimental groups observed under hypothesis-oriented and defined conditions.
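
      For readers unfamiliar with the Mann-Whitney comparison used above, the following is a minimal sketch of how its U statistic is obtained from two independent groups. The function name and the sample values are hypothetical and are not the study data; the p-values reported above would normally come from a statistics package rather than from this sketch.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (xi, yj) in which xi > yj, with ties counted as 0.5;
    U_y is the complementary count, so U_x + U_y == len(x) * len(y).
    """
    u_x = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical peak frequencies (kHz); illustrative values only, not study data.
paired_housed = [50, 52, 49]
single_housed = [60, 62, 61, 63]

u_paired, u_single = mann_whitney_u(paired_housed, single_housed)
print(u_paired, u_single)
```

      A U value close to zero for one group, as in this made-up example, means that nearly all of its values lie below those of the other group.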

      (2) In at least two of the previously published manuscripts (Olszyński et al., 2021, 2023), emission of ultrasonic vocalizations was analyzed (Figure S1 in Olszyński et al., 2021, and Fig. 1 in Olszyński et al., 2023). This includes detailed spectrographic analyses covering the frequency range between 20 and 100 kHz, i.e. including the frequency range, where the "new-type" ultrasonic vocalization, now named "44 kHz" call, occurs, as reflected in the examples provided in Fig. 1 of Olszyński et al. (2023). In the materials and methods there, it was said: "USV were assigned to one of three categories: 50-kHz (mean peak frequency, MPF >32 kHz), short 22-kHz (MPF of 18-32 kHz, <0.3 s duration), long 22-kHz (MPF of 18-32 kHz, >0.3 s duration)". Does that mean that the "44 kHz" calls were previously included in the count for 50-kHz calls? Or were 44 kHz calls (intentionally?) left out? What does that mean for the interpretation of the previously published data? What does that mean for the current data set? In my view, there is a lack of transparency here.

      Answer: As mentioned above, we indeed used data collected during fear conditioning experiments that were described previously (Olszyński et al., 2021; Olszyński et al., 2022). However, in these publications, ultrasonic vocalizations emitted during playback experiments were the main subject, while the ultrasonic calls emitted during fear conditioning (performed before the playback) were only analyzed in a preliminary way. As a result, the 44-kHz vocalizations analyzed in the current manuscript were not included in the previous analyses. In particular, in Olszyński et al. (2021), we counted the overall number of ultrasonic vocalizations before the fear conditioning session to determine the basal ultrasonic emissions (Fig. S1). Then, in our next article (Olszyński et al., 2022), we again analyzed the number of all ultrasonic vocalizations before fear conditioning (Fig. S1) and restricted the analysis of vocalizations during fear conditioning to 22-kHz calls (Tab. S1 and S2).

      Also, we re-reviewed all the data used in our previous playback publications. Overall, 44-kHz calls were extremely rare in the playback parts of the experiments. There were no 44-kHz calls in the playback data used in Olszyński et al. (2022) and Olszyński et al. (2020). In Olszyński et al. (2021), one rat produced eight 44-kHz calls. These 44-kHz calls constituted 0.03% of all vocalizations analyzed in the experiment (8/24888) and were included in the total number of calls analyzed (but not in the 50-kHz group); they were not described in further detail in that publication.
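
      To make the Reviewer's question concrete, the three-category rule quoted from the earlier Methods can be sketched as below. This is an illustration under stated assumptions, not the code used in any of the cited studies: the function name is hypothetical, a call of exactly 0.3 s is treated as long (the quoted rule leaves this case open), and calls below 18 kHz are labeled "unclassified". By frequency alone, a long, flat call near 44 kHz falls into the 50-kHz category, which is why such calls had to be identified separately. The last lines check the 8/24888 share reported above.

```python
def classify_usv(mpf_khz, duration_s):
    """Assign a call to a category using the thresholds quoted from the earlier Methods.

    mpf_khz    -- mean peak frequency in kHz
    duration_s -- call duration in seconds
    """
    if mpf_khz > 32:
        return "50-kHz"
    if 18 <= mpf_khz <= 32:
        # Assumption: a call of exactly 0.3 s is treated as long.
        return "short 22-kHz" if duration_s < 0.3 else "long 22-kHz"
    return "unclassified"  # assumption: the quoted rule does not cover MPF < 18 kHz


# Hypothetical example calls (frequency in kHz, duration in s).
print(classify_usv(44.0, 1.0))  # long, flat, high-frequency call
print(classify_usv(22.0, 1.0))
print(classify_usv(22.0, 0.1))

# Share of 44-kHz calls in the playback data of Olszyński et al. (2021).
share_percent = round(100 * 8 / 24888, 2)
print(share_percent)
```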

      Moreover, whether the newly identified call type is indeed novel is questionable, as also mentioned by the authors in their discussion section. While they wrote in the introduction that "high-pitch (>32 kHz), long and monotonous ultrasonic vocalizations have not yet been described", they wrote in the discussion that "long (or not that long (Biały et al., 2019)), frequency-stable high-pitch vocalizations have been reported before (e.g. Sales, 1979; Shimoju et al., 2020), notably as caused by intense cholinergic stimulation (Brudzynski and Bihari, 1990) or higher shock-dose fear conditioning (Wöhr et al., 2005)" (and I wish to add that to my knowledge this list provided by the authors is incomplete). Therefore, I believe, the strong claims made in abstract ("we are the first to describe a new-type..."), introduction ("have not yet been described"), and results ("new calls") are not justified.

      Answer: We would argue that 44-kHz vocalizations were indeed reported but not described. As far as we are aware, an in-depth analysis of the properties of long, high-frequency calls and of the experimental circumstances of their emission has not yet been performed. These researchers have observed calls that are, at least to a degree, similar to the ones we observed, as we mentioned in the discussion section. However, since these reported 44-kHz vocalizations were not fully described, we can only guess that they may be similar to ours. We speculate that, perhaps like us, these researchers unknowingly recorded 44-kHz calls in their experiments and may also be able to describe them more extensively when re-analyzing their data, as we have done here.

      Possibly, it was difficult to find reports on vocalizations similar to the 44-kHz calls that we observed because of the canonical and accepted definitions of ultrasonic vocalization types. Biały et al. (2019) allocated them to the 22-kHz group, perhaps because their calls were often of a step variation, having both low and high components. Shimoju et al. (2020) grouped them along with 50-kHz vocalizations because they appeared during stroking of rats held vertically; this procedure was compared to tickling, which usually elicits appetitive calls.

      Reviewer #2 states that there are other publications that would complete the list. We are aware of other articles authored by the same team as Shimoju et al. (2020) with different first authors. However, they report findings similar to those of the cited article. Otherwise, we would gladly cite a more complete list of publications showing atypical, long, monotonous high-frequency vocalizations similar to those observed in our experiments. Therefore, we would argue that ultrasonic vocalizations which were long, flat, high in frequency, and repeatedly occurring in a defined behavioral situation have not been reported before. However, concerning the strong claims of novelty of our finding, we toned them down where we found this was warranted.

      In general, the manuscript is not well written/ not well organized, the description of the methods is insufficient, and it is often difficult (if not impossible) to link the reported data to the experiments/ experimental conditions described in the materials and methods section.

      Answer: The description of the methods has been adjusted and expanded. We added the requested link to each particular experiment as a formula “Tab. 1/Exp. nos./# nos.”, which shows, each time, which experiments and experimental groups were analyzed. The list of the experiments and groups is found in Tab. 1.

      For example, I miss a clear presentation of basic information: 1) How many rats emitted "44 kHz" calls (in total, per experiment, and importantly, also per experimental condition, i.e. single- versus group-housed)?

      Answer: We now clearly show which experiments were performed and how many animals were tested in each condition (Tab. 1), while the prevalence of 44-kHz calls amongst experimental conditions and animal groups is shown in Fig. 1S3. Also, we included information regarding the number of animals and the treatment of each group of rats when reporting results. For example, we state that:

      (1a) “53 of all 84 conditioned Wistar rats (Tab. 1/Exp. 1-3/#2,4,6-8,13, Figs 1B, 1E, 1S1BC) displayed” 44-kHz vocalizations, as a general assessment; these numbers differ from those in the first version of the manuscript, where we mentioned only Wistar rats conditioned 6 or 10 times.

      (1b) “From this group of rats (n = 46), n = 41 (89.1%) emitted long 22-kHz calls, and 32 of them (69.6%) emitted 44-kHz calls” – this time referring only to 10-times conditioned Wistar rats as the biggest group that could be analyzed together (Figs 1F, 1G, 1S2A).

      (1c) “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.”
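
      As a quick arithmetic check of the proportions quoted in (1a) to (1c), the counts can be converted to percentages as follows. The helper name and dictionary labels are illustrative; the counts are the ones stated above.

```python
def pct(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Counts taken from the quoted manuscript text.
rats = {
    "44-kHz emitters among all conditioned Wistar rats": (53, 84),
    "long 22-kHz emitters among 10-trial Wistar rats": (41, 46),
    "44-kHz emitters among 10-trial Wistar rats": (32, 46),
}
single_iti_examples = [(140, 142), (222, 231), (263, 265)]

for label, (part, total) in rats.items():
    print(f"{label}: {pct(part, total)}%")

# Each example ITI should exceed the >95% threshold mentioned in (1c).
iti_shares = [pct(part, total) for part, total in single_iti_examples]
print(iti_shares, all(share > 95 for share in iti_shares))
```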

      (2) Out of the ones emitting "44 kHz" calls, what was the prevalence of "44 kHz" calls (relative to 22- and 50-kHz calls, e.g. shown as percentage)?

      Answer: The prevalence of 44-kHz vocalizations in all investigated experiments and groups is shown in Fig. 1S3CD. Also, more information regarding the percentage of 44-kHz calls is shown in Fig. 1S2AB, where we calculated the distribution of 44-kHz calls relative to 22-kHz calls in Wistar rats, in 10-trial fear conditioning, across the length of the session.

      Additionally, the values are listed in the sentence regarding all Wistar rats which underwent 10 trials of fear conditioning: “these vocalizations were less frequent following the first trial (1.2 ± 0.4% of all calls), and increased in subsequent trials, particularly after the 5th (8.8 ± 2.8%), through the 9th (19.4 ± 5.5%, the highest value), and the 10th (15.5 ± 4.9%) trials, where 44-kHz calls gradually replaced 22-kHz vocalizations in some rats (Fig. 1F, 1S2B, Video 1; comp Fig. 1D vs. 1E).”

      (3) How did this ratio differ between experiments and experimental conditions?

      Answer: The prevalence of 44-kHz vocalizations in all experimental conditions is shown in Fig. 1S3. However, a direct comparison of results obtained in different conditions was not the goal of the present work. Also, we would argue that such direct comparisons between different experiments would not be valid. These experiments were done with different groups of animals, at different times, and with different timetables of experimental manipulations.

      However, we are comfortable stating that:

      • There were more 44-kHz vocalizations during fear conditioning training than testing in all fear-conditioned Wistar rats;

      • We observed more 44-kHz vocalizations in Wistar rats compared to SHR.

      (4) Was there a link to freezing? Freezing was apparently analyzed before (Olszyński et al., 2021, 2023) and it would be important to see whether there is a correlation between "44-kHz" calls and freezing. Moreover, it would be important to know what behavior the rats are displaying while such "44-kHz" calls are emitted? (Note: Even not all 22-kHz calls are synced to freezing.) All this could help to substantiate the currently highly speculative claims made in the discussion section ("frequency increases with an increase in arousal" and "it could be argued that our prolonged fear conditioning increased the arousal of the rats with no change in the valence of the aversive stimuli"). Such more detailed analyses are also important to rule out the possibility that the "new-type" ultrasonic vocalization, the so-called "44 kHz" call, is simply associated with movement/ thorax compression.

      Answer: We analyzed freezing behavior and its association with ultrasonic emissions. The emission of 44-kHz vocalizations was associated with freezing. The results are now described and presented in the manuscript, i.e., Tab. 2, its legend and the description in Results: “Freezing during the bins of 22-kHz calls only (p < 0.0001, for both groups) and during 44-kHz calls only bins (p = 0.0003) was higher than during the first 5 min baseline freezing levels of the session. Also, the freezing associated with emissions of 44-kHz calls only was higher than during bins with no ultrasonic vocalizations (p = 0.0353), and it was also 9.9 percentage points higher than during time bins with only long 22-kHz vocalizations, but the difference was not significant (p = 0.1907; all Wilcoxon)” and “To further investigate this potential difference, we measured freezing during the emission of randomly selected single 44-kHz and 22-kHz vocalizations. The minimal freezing behavior detection window was reduced to compensate for the higher resolution of the measurements (3, 5, 10, or 15 video frames were used). There was no difference in freezing during the emission of 44-kHz vs. 22-kHz vocalizations for ≥150-ms-long calls (3 frames, p = 0.2054) and for ≥500-ms-long calls (5 frames, p = 0.2404; 10 frames, p = 0.4498; 15 frames, p = 0.7776; all Wilcoxon, Tab. 2B).”

      Please note that the general observation that "frequency increases with an increase in arousal" is not our claim but a general rule derived from a large body of observations and proposed by others (Briefer et al., 2012); we changed the wording of this statement to: “frequency usually increases with an increase in arousal (Briefer et al., 2012)”.

      The figures currently included are purely descriptive in most cases - and many of them are just examples of individual rats (e.g. majority of Fig. 1, all of Fig. 2 to my understanding, with the exception of the time course, which in case of D is only a subset of rats ("only rats that emitted 44-kHz calls in at least seven ITI are plotted" - is there any rationale for this criterion?)), or, in fact, just representative spectrograms of calls (all of Fig. 3, with the exception of G, all of Fig. 4).

      Answer: Please note that the former Figures 2, 4, 6, and 8 have now been moved to supplementary Figures 1S1, 2S1, 3S1, and 4S1 to better organize the presentation of data. Figures 1, 3, 5, and 7 are now Figures 1, 2, 3, and 4, respectively. With regard to presenting data from individual rats, this was done to show the general patterns of the ultrasonic call distributions observed. Showing the full data set, as in Fig. 5A (now Fig. 3A), would reduce the readability of the graph unless mathematical clustering techniques such as DBSCAN were used.

      Concerning Reviewer #2’s question regarding the criterion of “minimum seven ITI”, we selected the highest vocalizers by taking animals above the 75th percentile of the number of ITIs with 44-kHz calls. However, in the current version of the manuscript, we decided to omit this part of the analysis and the accompanying part of the figure, since it did not provide any additional information and relied on a questionable criterion.

      Moreover, the differences between Fig. 5 and Fig. 6 are not clear to me. It seems Fig. 5B is included three times - what is the benefit of including the same figure three times?

      Answer: We hope that designating Fig. 6 as supplementary to Fig. 5 (now Figs 3S1 and 3, respectively) will make them easier to interpret. Fig. 6A (now Fig. 3S1A) is a more detailed look at the information presented in Fig. 5B (now Fig. 3B), with spectrogram images of ultrasonic vocalizations from different areas of the plot. Also, Fig. 3B (former Fig. 5B) was removed from Fig. 3S1B (former Fig. 6B).

      A systematic comparison of experimental conditions is limited to Fig. 7 and Fig. 8, the figures depicting the playback results (which led to the conclusion that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in between responses to 22-kHz and 50-kHz playbacks", although it remains unclear to me why differences were seen b e f o r e the experimental manipulation, i.e. the different playback types in Fig. 8B).

      Answer: There were indeed instances of such before-differences. Such differences were observed in our previous studies (Olszyński et al., 2020, Tabs S9-12; Olszyński et al., 2021, Tab. S7; Olszyński et al., 2022, Tabs S4, S9, S13, S17, S18) and were most likely due to multiple comparisons. However, we think that the carry-over effect mentioned by Reviewer #2 (see below) also played a role.

      Related to that, I miss a clear presentation of relevant methodological aspects: 1) Why were some rats single-housed but not the others?

      Answer: As stated before, the data were collected from our previous experiments, and the observation of 44-kHz vocalizations in fear conditioning was an emergent discovery made when we decided to analyze ultrasonic recordings from the fear conditioning procedures. Single-housed animals were part of our experiment comparing the effects of fear conditioning and social situation on the perception of ultrasonic playback, as described in Olszyński et al. (2020). Aside from this experiment, all other rats were housed in pairs.

      (2) Is the experimental design of the playback study not confounded? It is said that "one group (n = 13) heard 50-kHz appetitive vocalization playback while the other (n = 16) 22-kHz and 44-kHz aversive calls". How can one compare "44 kHz" calls to 22- and 50-kHz calls when "44 kHz" calls are presented together with 22-kHz calls but not 50-kHz calls? What about carry-over effects? Hearing one type of call most likely affects the response to the other type of call. It appears likely that rats are a bit more anxious after hearing aversive 22-kHz calls, for example. Therefore, it would not be very surprising to see that the response to "44 kHz" calls is more similar to 22-kHz calls than 50-kHz calls.

      Of note, in case of the other playback experiment it is just said that rats "received appetitive and aversive ultrasonic vocalization playback" but it remains unclear whether "44 kHz" calls are seen as appetitive or aversive. Later it says that "rats were presented with two 10-s-long playback sets of either 22-kHz or 44-kHz calls, followed by one 50-kHz modulated call 10-s set and another two playback sets of either 44-kHz or 22-kHz calls not previously heard" (and wonder what data set was included in the figures and how - pooled?). Again, I am worried about carry-over effects here. This does not seem to be an experimental design that allows to compare the response to the three main call types in an unbiased manner.

      Answer: We apologize for the confusing and overly brief original description of the playback experiments. We wanted to avoid the confusion associated with including several additional playback signals (please note that some are not related to the current comparisons and include different 50-kHz ultrasonic subtypes and two different subtypes of short 22-kHz calls). We have lengthened the description of these playback experiments in the current version.

      In general, including more than one type of ultrasonic call as playback carries a risk of carry-over effects as well as habituation (the responses become weaker). However, it greatly reduces the number of animals required. Finally, regarding the first experiment, we chose three playbacks to compare the rats’ reactions, as this was the most conservative choice we could think of.

      We would like to highlight that we wanted to compare specifically the rats’ responses to 22-kHz vs. 44-kHz playback (as well as the effects of playback of different subtypes of 50-kHz calls, which is not the subject of the current work). Therefore, we would argue that the design of both experiments is actually unbiased regarding this key comparison (responses to 22-kHz vs. 44-kHz playback). In both experiments, 22-kHz and 44-kHz playbacks were included in the same sequences of stimuli, counterbalanced regarding their order (i.e., taking into account possible carry-over effects), and presented to the same rats. We regarded the group of rats that heard 50-kHz recordings as a baseline/control, since we know from previous playback studies what reactions to expect from rats exposed to these vocalizations (and 22-kHz playback), while in the second experiment, we reduced the 50-kHz playback to one set in order to minimize possible habituation to multiple playbacks.

      We agree that the design of both experiments does not allow for a full comparison of the effects of aversive playbacks with those of 50-kHz playback. Also, we agree that some carry-over effects could play a role. This is mentioned in the discussion: “Please factor in potential carryover effects (resulting from hearing playbacks of the same valence in a row) in the differences between responses to 50-kHz vs. 22/44-kHz playbacks, especially, those observed before the signal (Fig. 4AB).” However, we would still argue that the observed lack of difference in heart-rate response (Fig. 4A) and the differences regarding the number of 50-kHz calls emitted (e.g., Fig. 4S1F) are not subject to the constraints raised by Reviewer #2.

      We acknowledge that our studies do not give a complete picture of 44-kHz ultrasonic perception in relation to other ultrasonic bands and, given the possibility, we would like to perform more in-depth and focused experiments to study this aspect of 44-kHz calls in the future.

      Finally, regarding the second experiment, the description of the rats now includes that they “received 22-kHz, 44-kHz, and 50-kHz ultrasonic vocalization playback”, while the description of the experiment itself includes: “Responses to the pairs of playback sets were averaged”.
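
      The sequence structure quoted by the Reviewer for the second experiment (two sets of one aversive call type, one 50-kHz set, then two sets of the other aversive type), together with the averaging step, can be sketched as follows. This is only an illustration: the alternating assignment of rats to the two orders, the rat identifiers, and the response values are hypothetical, since the text states only that the order was counterbalanced.

```python
from itertools import cycle
from statistics import mean

# The two counterbalanced sequences implied by the quoted description.
ORDER_A = ("22-kHz", "22-kHz", "50-kHz", "44-kHz", "44-kHz")
ORDER_B = ("44-kHz", "44-kHz", "50-kHz", "22-kHz", "22-kHz")


def assign_orders(rat_ids):
    """Alternate the two orders across rats (an assumed assignment scheme)."""
    return dict(zip(rat_ids, cycle((ORDER_A, ORDER_B))))


def average_by_type(order, responses):
    """Average the responses to playback sets of the same call type."""
    grouped = {}
    for call_type, value in zip(order, responses):
        grouped.setdefault(call_type, []).append(value)
    return {call_type: mean(values) for call_type, values in grouped.items()}


orders = assign_orders(["rat1", "rat2", "rat3", "rat4"])

# Hypothetical responses (arbitrary units) of one rat tested in ORDER_A.
averaged = average_by_type(ORDER_A, [4.0, 2.0, 9.0, 5.0, 3.0])
print(orders["rat1"], orders["rat2"])
print(averaged)
```

      With an even number of rats, each aversive call type is heard first by half of the animals, so any carry-over from the first block affects the 22-kHz and 44-kHz responses equally.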

      Of note, what exactly is meant by "control rats" in the context of fear conditioning is also not clear to me. One can think of many different controls in a fear conditioning experiment.

      More concrete information is needed.

      Answer: This information was included in our previous publications. It is now also provided in the Methods section of the current version of the manuscript. In general, control rats were subjected to the same procedures but did not receive electric shocks.

      Literature included in the answers

      Araya, E. I., Baggio, D. F., Koren, L. O., Andreatini, R., Schwarting, R. K. W., Zamponi, G. W., & Chichorro, J. G. (2020). Acute orofacial pain leads to prolonged changes in behavioral and affective pain components. Pain, 161(12), 2830-2840. https://doi.org/10.1097/j.pain.0000000000001970

      Barker, D. J., Root, D. H., Ma, S., Jha, S., Megehee, L., Pawlak, A. P., & West, M. O. (2010). Dose-dependent differences in short ultrasonic vocalizations emitted by rats during cocaine self-administration. Psychopharmacology (Berl), 211(4), 435-442. https://doi.org/10.1007/s00213-010-1913-9

      Barroso, A. R., Araya, E. I., de Souza, C. P., Andreatini, R., & Chichorro, J. G. (2019). Characterization of rat ultrasonic vocalization in the orofacial formalin test: Influence of the social context. Eur Neuropsychopharmacol, 29(11), 1213-1226. https://doi.org/10.1016/j.euroneuro.2019.08.298

      Biały, M., Podobinska, M., Barski, J., Bogacki-Rychlik, W., & Sajdel-Sulkowska, E. M. (2019). Distinct classes of low frequency ultrasonic vocalizations in rats during sexual interactions relate to different emotional states. Acta Neurobiol Exp (Wars), 79(1), 1-12. https://www.ncbi.nlm.nih.gov/pubmed/31038481

      Briefer, E. F., Padilla de la Torre, M., & McElligott, A. G. (2012). Mother goats do not forget their kids' calls. Proc Biol Sci, 279(1743), 3749-3755. https://doi.org/10.1098/rspb.2012.0986

      Browning, J. R., Browning, D. A., Maxwell, A. O., Dong, Y., Jansen, H. T., Panksepp, J., & Sorg, B. A. (2011). Positive affective vocalizations during cocaine and sucrose self administration: a model for spontaneous drug desire in rats. Neuropharmacology, 61(1-2), 268-275. https://doi.org/10.1016/j.neuropharm.2011.04.012

      Brudzynski, S. M. (2015). Pharmacology of Ultrasonic Vocalizations in adult Rats: Significance, Call Classification and Neural Substrate. Curr Neuropharmacol, 13(2), 180-192. https://doi.org/10.2174/1570159x13999150210141444

      Brudzynski, S. M., & Bihari, F. (1990). Ultrasonic vocalization in rats produced by cholinergic stimulation of the brain. Neurosci Lett, 109(1-2), 222-226. https://doi.org/10.1016/0304-3940(90)90567-s

      Brudzynski, S. M., Bihari, F., Ociepa, D., & Fu, X. W. (1993). Analysis of 22 kHz ultrasonic vocalization in laboratory rats: long and short calls. Physiol Behav, 54(2), 215-221. https://doi.org/10.1016/0031-9384(93)90102-l

      Hinchcliffe, J. K., Jackson, M. G., & Robinson, E. S. (2022). The use of ball pits and playpens in laboratory Lister Hooded male rats induces ultrasonic vocalisations indicating a more positive affective state and can reduce the welfare impacts of aversive procedures. Lab Anim, 56(4), 370-379. https://doi.org/10.1177/00236772211065920

      Matochik, J. A., White, N. R., & Barfield, R. J. (1992). Variations in scent marking and ultrasonic vocalizations by Long-Evans rats across the estrous cycle. Physiol Behav, 51(4), 783-786. https://doi.org/10.1016/0031-9384(92)90116-j

      Olszyński, K. H., Polowy, R., Małż, M., Boguszewski, P. M., & Filipkowski, R. K. (2020). Playback of Alarm and Appetitive Calls Differentially Impacts Vocal, Heart-Rate, and Motor Response in Rats. iScience, 23(10), 101577. https://doi.org/10.1016/j.isci.2020.101577

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., & Filipkowski, R. K. (2021). Increased Vocalization of Rats in Response to Ultrasonic Playback as a Sign of Hypervigilance Following Fear Conditioning. Brain Sci, 11(8). https://doi.org/10.3390/brainsci11080970

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., Zieliński, J., & Filipkowski, R. K. (2022). Spontaneously hypertensive rats manifest deficits in emotional response to 22-kHz and 50-kHz ultrasonic playback. Prog Neuropsychopharmacol Biol Psychiatry, 120, 110615. https://doi.org/10.1016/j.pnpbp.2022.110615

      Saito, Y., Tachibana, R. O., & Okanoya, K. (2019). Acoustical cues for perception of emotional vocalizations in rats. Scientific Reports, 9(1), 10539.

      Sales, G. D. (1979). Strain Differences in the Ultrasonic Behavior of Rats (Rattus norvegicus) Am Zool, 19(2), 513-527. https://www.jstor.org/stable/3882331

      Shimoju, R., Shibata, H., Hori, M., & Kurosawa, M. (2020). Stroking stimulation of the skin elicits 50-kHz ultrasonic vocalizations in young adult rats. J Physiol Sci, 70(1), 41. https://doi.org/10.1186/s12576-020-00770-1

      Silkstone, M., & Brudzynski, S. M. (2019a). The antagonistic relationship between aversive and appetitive emotional states in rats as studied by pharmacologically-induced ultrasonic vocalization from the nucleus accumbens and lateral septum. Pharmacology Biochemistry and Behavior, 181, 77-85. https://doi.org/10.1016/j.pbb.2019.04.009

      Silkstone, M., & Brudzynski, S. M. (2019b). Intracerebral injection of R-(-)-Apomorphine into the nucleus accumbens decreased carbachol-induced 22-kHz ultrasonic vocalizations in rats. Behavioural Brain Research, 364, 264-273. https://doi.org/10.1016/j.bbr.2019.01.044

      Willey, A. R., & Spear, L. P. (2013). The effects of pre-test social deprivation on a natural reward incentive test and concomitant 50 kHz ultrasonic vocalization production in adolescent and adult male Sprague-Dawley rats. Behav Brain Res, 245, 107-112. https://doi.org/10.1016/j.bbr.2013.02.020

      Wöhr, M., Borta, A., & Schwarting, R. K. (2005). Overt behavior and ultrasonic vocalization in a fear conditioning paradigm: a dose-response study in the rat. Neurobiol Learn Mem, 84(3), 228-240. https://doi.org/10.1016/j.nlm.2005.07.004

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Additional considerations:

      The discussion of the "perfect fifth" and the proposition that this observation could be evidence of an evolutionary mechanism underlying it is rather far-fetched, especially for being presented in the Results section (with no supporting non-anecdotal evidence).

      Answer: We agree with Reviewer #1. The text was modified and the word “evolutionary” was deleted. Instead, we expanded on the possible reason for the prevalence of the perfect fifth in the current version of the manuscript; we added that the prevalence of the perfect fifth: “could be explained by the observation that all physical objects capable of producing tonal sounds generate harmonic vibrations, the most prominent being the octave, perfect fifth, and major third (Christensen, 1993, discussed in Bowling and Purves, 2015).”
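
      The intervals named in the added sentence correspond to simple frequency ratios between low harmonics of a common fundamental (2:1, 3:2, and 5:4). A minimal sketch of these ratios, and of their size in cents, is given below; the 30-kHz starting frequency in the example is hypothetical and is not a value from our data.

```python
import math
from fractions import Fraction

# Just-intonation ratios of the intervals named in the quoted sentence.
INTERVALS = {
    "octave": Fraction(2, 1),         # harmonic 2 relative to harmonic 1
    "perfect fifth": Fraction(3, 2),  # harmonic 3 relative to harmonic 2
    "major third": Fraction(5, 4),    # harmonic 5 relative to harmonic 4
}


def interval_cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))


def transpose(frequency_khz, interval):
    """Frequency lying the named interval above `frequency_khz`."""
    return float(frequency_khz * INTERVALS[interval])


for name, ratio in INTERVALS.items():
    print(f"{name}: {ratio} = {interval_cents(ratio):.1f} cents")

# Hypothetical example: a perfect fifth above a 30-kHz component.
print(transpose(30, "perfect fifth"))
```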

      It is not clear why Sprague-Dawleys were used as "receivers" in the playback experiment, when presumably the calls were recorded from Wistars and SHRs. While this does not critically impact the conclusions, within the species rats should be able to respond appropriately to calls made by rats of different genetic backgrounds, it adds an unnecessary source of variance.

      Answer: Sprague-Dawley rats were used to test another normotensive strain of rats. Regarding the Reviewer’s main point – we beg to differ as we think that it is worth testing playback stimuli in different strains. Diverging the stimuli between different rat strains would add unnecessary variance and it seemed logical to use the same recordings to test effects in different strains. Please note that finally, in spite of this additional variance, the results of both playback experiments are, in general, similar – which may point to a universal effect of 44-kHz playback across rat strains.

      It is pertinent to note that for the trace fear conditioning experiment, the rats had previously been exposed to a vocalization playback experiment. While such a pre-exposure is unlikely to be a very strong stressor, the possibility for it to influence the vocal behaviors of these rats in later experiments cannot be ruled out. It is also not clear what the control rats in this experiment experienced (home cage only?), nor what they were used for in analyses.

      Answer: In the current version of the manuscript, we have described in greater detail all the experiments performed and analyzed. We would like to emphasize that both delay and trace fear conditioning experiments with radiotelemetric transmitters were not performed specifically to elicit any particular response during fear conditioning, rather that our observation of 44-kHz vocalizations emerged as a result of re-examining the audio recordings. As a result, this work summarizes our observations of 44-kHz calls from several different experiments. It is relevant to note that 44-kHz vocalizations were observed “in rats which were exposed to vocalization playback experiment”, in rats before the playback experiments as well as in naïve rats, without implanted transmitters, trained in fear conditioning (Tab. 1/Exp. 1-3).

      Our main message is that 44-kHz vocalizations were present in several experiments, with different conditions and subjects, while we are not attempting to compare in detail the results across the different experiments. In other words, we agree that pre-exposure to playback (and even more likely – transmitter implantation) could influence, but is not necessary for, 44-kHz ultrasonic emissions by the rats. To demonstrate this, we added a prolonged fear conditioning group with naïve Wistar rats (Exp. 3) to verify the emission of 44-kHz calls in the absence of those experimental factors.

      We modified the methods section to clarify the circumstances under which these discoveries were made, such as including the information regarding the control rats in trace fear conditioning. In particular we mention that: “Control rats were subjected to the exact same procedures but did not receive the electric shock at the end of trace periods”.

      For Figure 1A-E, only example call distributions from individual rats are shown. It would perhaps be more informative to see the full data set displayed in this manner, with color/shape codes distinguishing individuals if desired.

      Answer: Please note that Fig. 1S1 shows more examples of ultrasonic call distribution. Showing all the data would make the figure more difficult to read and interpret. The problem is partly addressed in Fig. 3A.

      It is not clear what is presented in Figure 2D vs. E, i.e. panel D is shown only for "selected rats" but the legend does not clarify how and why these rats were selected. It is also not clear why the legend reports p-values for both Friedman and Wilcoxon tests; the latter is appropriate for paired data which seems to be the case when the question is whether the call peak frequency alters across time, but the Friedman assumes non-paired input data.

      Answer: The question refers to the current Fig. 1S2C panel (former Fig. 2E panel) and the former Fig. 2D panel. The latter was not included in the current version of the manuscript, since both reviewers opposed the presentation of “selected rats” only (see above). The full description of the Fig. 1S2C panel is now in the results section together with p-values for the Friedman and Wilcoxon tests. We used the latter to investigate the difference between the first and the last ITI (selected paired data), and the Friedman test to investigate the presence of change within the chain of ten ITI, since it is a suitable test for a difference between two or more paired samples.

      Reviewer #2 (Recommendations For The Authors):

      The weaknesses listed in the public review need to be addressed.

      Answer: We have done our best to address the weaknesses.

      Notes: 1) Page and line numbers would have been useful.

      Answer: We are including a separate manuscript version with page and line numbers.

      (2) English language needs to be improved.

      Answer: The text has been checked by two native English speakers (one with a scientific background). Both only identified minor changes to improve the text which we applied.

      (3) I am a bit unsure whether the comment about the Star Wars movie (1997) and the Game of Thrones series (2011) is supposed to be a joke.

      Answer: These are indeed two genuine examples of the perfect fifth in human music that we hope are easily recognizable and familiar to readers. Parts of the same examples of the perfect fifth can also be heard in the rat voice files provided.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      During the last decades, extensive studies (mostly neglected by the authors), using in vitro and in vivo models, have elucidated the five-step mechanism of intoxication of botulinum neurotoxins (BoNTs). The binding domain (H chain) of all serotypes of BoNTs binds polysialogangliosides and the luminal domain of a synaptic vesicle protein (which varies among serotypes). When bound to the synaptic membrane of neurons, BoNTs are rapidly internalized by synaptic vesicles (SVs) via endocytosis. Subsequently, the catalytic domain (L chain) translocates, a process triggered by the acidification of these organelles. Following translocation, the disulfide bridge connecting the H chain with the L chain is reduced by the thioredoxin reductase/thioredoxin system, and it is refolded by the chaperone Hsp90 on SV's surface. Once released into the cytosol, the L chains of different serotypes cleave distinct peptide bonds of specific SNARE proteins, thereby disrupting neurotransmission. In this study, Yeo et al. extensively revise the neuronal intoxication model, suggesting that BoNT/A follows a more complex intracellular route than previously thought. The authors propose that upon internalization, BoNT/A-containing endosomes are retro-axonally trafficked to the soma. At the level of the neuronal soma, this serotype then traffics to the endoplasmic reticulum (ER) via the Golgi apparatus. The ER SEC61 translocon complex facilitates the translocation of BoNT/A's LC from the ER lumen into the cytosol, where the thioredoxin reductase/thioredoxin system and HSP complexes release and refold the catalytic L chain. Subsequently, the L chain diffuses and cleaves SNAP25 first in the soma before reaching neurites and synapses.

      Strengths:

      I appreciate the authors' efforts to confirm that the newly established methods somehow recapitulate aspects of the BoNTs mechanism of action, such as toxin binding and uptake occurring at the level of active synapses. Furthermore, even though I consider the SNAPR approach inadequate, the genome-wide RNAi screen has been well executed and thoroughly analyzed. It includes well-established positive and negative controls, making it a comprehensive resource not only for scientists working in the field of botulinum neurotoxins but also for cell biologists studying endocytosis more broadly.

      Weaknesses:

      I have several concerns about the authors' main conclusions, primarily due to the lack of essential controls and validation for the newly developed methods used to assess toxin cleavage and trafficking into neurons. Furthermore, there is a significant discrepancy between the proposed intoxication model and existing studies conducted in more physiological settings. In my opinion, the authors have omitted over 20 years of work done in several labs worldwide (Montecucco, Montal, Schiavo, Rummel, Binz, etc.). I want to emphasize that I support changes in biological dogma only when these changes are supported by compelling experimental evidence, which I could not find in the present manuscript.

      We thank the reviewer for his reading and comments and for pointing out the discrepancy between our proposed model and the existing model. However, we respectfully disagree with the phrase “extensive studies have elucidated the five-step mechanism of intoxication…”. This sentence and those that follow imply that the model is well-established and demonstrated. They also highlight how convinced the reviewer is of this previous model.

      We contest this model for theoretical reasons and contest the strength of the evidence that supports it. We previously included references to previous work showing that the model is also being challenged by others. In light of the reviewer’s comments, we included more references in the introduction and we also made our main theoretical concern explicit in the introduction:

      “Arguably, the main problem of the model is its failure to propose a thermodynamically consistent explanation for the directional translocation of a polypeptide chain across a biological membrane. Other known instances of polypeptide membrane translocation such as the co-translational translocation into the ER indicate that it is an unfavorable process, which consumes significant energy (Alder and Theg 2003).”

      We also added the following text in the Discussion to address the reviewer’s concerns: “Our study contradicts the long-established model of BoNT intoxication, which is described in several reviews specifically dedicated to the subject 1–4. In short, these reviews support the notion that BoNTs are molecular machines able to mediate their own translocation across membranes; this notion has convinced some cell biologists interested in toxins and retrograde traffic, who describe the BoNT mode of translocation in their reviews 5,6.

      But is this notion well supported by data? A careful examination of the primary literature reveals that early studies indeed report that BoNTs form ion channels at low pH values 7,8. These studies have been extended by the use of patch-clamp 9,10. These works and others have led to various suppositions on how the toxin forms a channel and translocates the LC 1,11.

      However, only a single study claims to reconstitute in vitro the translocation of BoNT LC across membranes 12. In this paper, the authors report using a system of artificial membranes separating two aqueous compartments. They load the toxin in the cis compartment and measure the protease activity in the trans compartment after incubation. However, when the experimental conditions described are actually converted in terms of molarity, it appears that the cis compartment was loaded at 10e-8 M BoNT and that the reported translocated protease activity is equivalent to 10e-17 M (Figure 3D, 12). Thus, in this experiment, about 1 LC molecule in 100 million has crossed the membrane. Such an extremely low transfer rate does not tally with the extreme efficiency of intoxication in vivo, even while taking into account the difference between artificial and biological membranes.

      In sum, a careful analysis of the primary literature indicates that while there is ample evidence that BoNTs have the ability to affect membranes and possibly create ion channels, there is actually no credible evidence that these channels mediate translocation of the LC. As mentioned earlier, it is not clear how such a self-translocation mechanism would function thermodynamically. By contrast, our model proposes a mechanism without a thermodynamic problem, is consistent with current knowledge about other protein toxins, such as PE, Shiga and ricin, and can help explain previously puzzling features of BoNT effects. It is worth noting that a similar self-translocation model was proposed for other protein toxins such as Pseudomonas exotoxin, which has a molecular organisation similar to that of BoNT (68). However, it has since been demonstrated that the PE toxins require cellular machinery, in particular in the ER, for intoxication (21,69,70).”
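
      The dilution argument in the quoted Discussion text can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the two quoted concentrations are read as powers of ten (10^-8 M and 10^-17 M) and that the cis and trans compartments have comparable volumes, so that the concentration ratio approximates the fraction of molecules that crossed. The function names are hypothetical and do not come from the manuscript.

```python
from fractions import Fraction

# Concentrations as quoted in the response, read here as powers of ten (an assumption):
# toxin loaded on the cis side, and equivalent protease activity found on the trans side.
CIS_MOLAR = Fraction(1, 10**8)
TRANS_MOLAR = Fraction(1, 10**17)


def translocated_fraction(cis, trans):
    """Fraction of the loaded toxin recovered as activity on the trans side.

    Assumes equal compartment volumes, so the concentration ratio
    stands in for the ratio of molecule counts.
    """
    return trans / cis


def one_in_n(fraction):
    """Express a fraction as 'one molecule in N'."""
    return int(1 / fraction)


fraction = translocated_fraction(CIS_MOLAR, TRANS_MOLAR)
print(f"translocated fraction: {float(fraction):.0e}")
print(f"about 1 LC molecule in {one_in_n(fraction):,}")
```

      Taken at face value, the two figures differ by nine orders of magnitude, that is, roughly one molecule in a billion. This is an order of magnitude lower still than the rounded figure given in the text, so the qualitative point about the very low transfer rate is unaffected.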

      Reviewer #2 (Public Review):

      Summary:

      The study by Yeo and co-authors addresses a long-lasting issue about botulinum neurotoxin (BoNT) intoxication. The current view is that the toxin binds to its receptors at the axon terminus by its HCc domain and is internalized in recycled neuromediator vesicles just after the release of the neuromediators. Then, the HCn domain assists the translocation of the catalytic light chain (LC) of the toxin through the membrane of these endocytic vesicles into the cytosol of the axon terminus. There, the LC cleaves its SNARE substrate and blocks neurosecretion. However, other views involving kinetic aspects of intoxication suggest that the toxin follows the retrograde axonal transport up to the nerve cell body and then back to the nerve terminus before cleaving its substrate.

      In the current study, the authors claim that the BoNT/A (isotype A of BoNT) not only progresses to the cell body but once there, follows the retrograde transport trafficking pathway in a retromer-dependent fashion, through the Golgi apparatus, until reaching the endoplasmic reticulum. Next, the LC dissociates from the HC (a process not studied here) and uses the translocon Sec61 machinery to retro-translocate into the cytosol. Only then, does the LC traffic back to the nerve terminus following the anterograde axonal transport. Once there, LC cleaves its SNARE substrate (SNAP25 in the case of BoNT/A) and blocks neurosecretion.

      To reach their conclusion, Yeo and co-authors use a combination of engineered tools: a cell line able to differentiate into neurons (ReNcell VM), a reporter dual fluorescent protein derived from SNAP25, the substrate of BoNT/A (called SNAPR), the use of either native BoNT/A or a toxin to which three copies of fragment 11 of the reporter fluorescent protein Neon Green (mNG) are fused to the N-terminus of the LC (BoNT/A-mNG11x3), and finally ReNcell VM transfected with mNG1-10 (a protein consisting of the first 10 beta strands of the mNG).

      SNAPR is stably expressed throughout the ReNcell VM. SNAPR is yellow (red and green) when intact and becomes red only when cleaved by BoNT/A LC, the green tip being degraded by the cell. When the LC of BoNT/A-mNG11x3 reaches the cytosol in ReNcell VM transfected by mNG1-10, the complete mNG is reconstituted and emits a green fluorescence.

      In the first experiment, the authors show that the catalytic activity of the LC appears first in the cell body of neurons where SNAPR is cleaved first. This phenomenon starts 24 hours after intoxication and progresses along the axon towards the nerve terminus during an additional 24 hours. In a second experiment, the authors intoxicate the ReNcell VM transfected by mNG1-10 using the BoNT/A-mNG11x3. The fluorescence appears also first in the soma of neurons, then diffuses in the neurites in 48 hours. The conclusion of these two experiments is that translocation occurs first in the cell body and that the LC diffuses in the cytosol of the axon in an anterograde fashion.

      In the second part of the study, the authors perform a siRNA screen to identify regulators of BoNT/A intoxication. Their aim is to identify genes involved in intracellular trafficking of the toxin and translocation of the LC. Interestingly, they found positive and negative regulators of intoxication. Regulators could be regrouped according to the sequential events of intoxication.

      Genes affecting binding to the cell-surface receptor (SV2) and internalization.

      Genes involved in intracellular trafficking.

      Genes involved in translocation such as reduction of the disulfide bond linking the LC to the HC and refolding in the cytosol.

      Genes involved in signaling such as tyrosine kinases and phosphatases.

      All these groups of genes may be consistent with the current view of BoNT intoxication within the nerve terminus. However, two sets of genes were particularly significant to reach the main conclusion of the work and definitely constitute an original finding important to the field. One set of genes consists of those of the retromer, and the other relates to the Sec61 translocon. This should indicate that once endocytosed, the BoNT traffics from the endosomes to the Golgi apparatus, and then to the ER. Ultimately, the LC should translocate from the ER lumen to the cytosol using the Sec61 translocon. The authors further control that the SV2 receptor for the BoNT/A traffics along the axon in a retromer-dependent fashion and that BoNT/A-mNG11x3 traverses the Golgi apparatus by fusing the mNG1-10 to a Golgi resident protein.

      Strengths:

      The findings in this work are convincing. The experiments are carefully done and are properly controlled. In the first part of the study, the activity of the LC is monitored together with the physical presence of the toxin. In the second part of the work, the most relevant genes that came out of the siRNA screen are checked individually in the ReNcell VM / BoNT/A reporter system to confirm their role in BoNT/A trafficking and retro-translocation.

      These findings are important to the fields of toxinology and medical treatment of neuromuscular diseases by BoNTs. They may explain some aspects of intoxication such as slow symptom onset, aggravation, and appearance of central effects.

      Weaknesses:

      The findings antagonize the current view of the intoxication pathway that is sustained by a vast amount of observations. The findings are certainly valid, but their generalization as the sole mechanism of BoNT intoxication should be tempered. These observations are restricted to one particular neuronal model and engineered protein tools. Other models such as isolated nerve/muscle preparations display nerve terminus paralysis within minutes rather than days. Also, the tetanus neurotoxin (TeNT), whose mechanism of action involving axonal transport to the posterior ganglia in the spinal cord is well described, takes between 5 and 15 days. It is thus possible that different intoxication mechanisms co-exist for BoNTs or even vary depending on the type of neurons.

      Although the siRNA experiments are convincing, it would be nice to reach the same observations with drugs affecting the endocytic to Golgi to ER transport (such as Retro-2, golgicide or brefeldin A) and the Sec61 retrotranslocation (such as mycolactone). Then, it would be nice to check other neuronal systems for the same observations.

      We thank the reviewer for the careful reading of and comments on our manuscript. The reference to “a vast amount of observations” is an argument similar to that of Reviewer 1 and is used to suggest that our study may not be applicable as a general mechanism.

      We respectfully disagree, as described above, and posit on the contrary that the model we propose is much more likely to be general than the model presented in current reviews, for the several reasons cited (see added text in Introduction and Discussion). While we agree that more work is needed to confirm the proposed mechanisms of BoNT translocation in other models, these experiments fall outside the scope of our study.

      The fact that BoNT activity in nerve/muscle preparations has relatively fast kinetics does not contradict our model. Our model reveals primarily the requirement for trafficking to the ER membranes. This ER targeting requires trafficking through the Golgi complex, in turn explaining the requirement for trafficking to the soma of neurons in the experimental system we used. However, in neuronal cells in vivo, Golgi bodies can be found along the length of the axon; thus BoNT may not always require trafficking to the soma of the affected cells. The time required for intoxication could thus vary greatly depending on the neuronal structural organisation.

      TeNT is proposed to transfer from excitatory neurons into inhibitory neurons before exerting its action. While the details of this fascinating mechanism remain to be explored, it clearly falls beyond the purview of this manuscript.

      Regarding the use of drugs, we agree that it would be a nice addition; unfortunately we are unable to perform such experiments at this stage. Setting up a large-scale siRNA screen for the BoNT mechanism of action is challenging as it requires a special facility with controlled access and police authorisation (in Singapore) given the high toxicity of this molecule. Unfortunately, the authorisations have now lapsed.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Yeo et al. investigates the intracellular trafficking of Botulinum neurotoxin A (BoNT/A), a potent toxin used in clinical and cosmetic applications. Contrary to the prevailing understanding of BoNT/A translocation into the cytosol, the study suggests a retrograde migration from the synapse to the soma-localized Golgi in neurons. Using a genome-wide siRNA screen in genetically engineered neurons, the researchers identified over three hundred genes involved in this process. The study employs organelle-specific split-mNG complementation, revealing that BoNT/A traffics through the Golgi in a retromer-dependent manner before moving to the endoplasmic reticulum (ER). The Sec61 complex is implicated in the retro-translocation of BoNT/A from the ER to the cytosol. Overall, the research challenges the conventional model of BoNT/A translocation, uncovering a complex route from synapse to cytosol for efficient intoxication. The findings are based on a comprehensive approach, including the introduction of a fluorescent reporter for BoNT/A catalytic activity and genetic manipulations in neuronal cell lines. The conclusions highlight the importance of retrograde trafficking and the involvement of specific genes and cellular processes in BoNT/A intoxication.

      Strengths:

      The major part of the experiments are convincing. They are well-controlled and the interpretation of their results is balanced and sensitive.

      Weaknesses:

      In my opinion, the main weakness of the paper is in the interpretation of the data equating loss of tGFP signal (when using the Red SNAPR assay) with proteolytic cleavage by the toxin. Indeed, the first step for loss of tGFP signal by degradation of the cleaved part is the actual cleavage. However, this needs to be degraded (by the proteasome, I presume), a process that could in principle be affected (in speed or extent) by the toxin.

      We thank the reviewer for his comments and careful reading of our manuscript.

      Regarding the read-out of the assay, we agree that the assay could be sensitive to alteration in the protein degradation pathway. We have added the following sentence in the Discussion to take it into account:

      “As noted by one reviewer, the assay may be sensitive to perturbation in the general rate of protein degradation, a consideration to keep in mind when evaluating the results of large scale screens.”

      While this may be valid for some hits in the general list, it is important to note that the main hits have been shown to affect toxin trafficking by an independent, orthogonal assay based on the split GFP reconstitution.

      Recommendations to authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To assess the activity of BoNT/A in neurons, Yeo et al. have generated a neuronal stem line referred to as SNAPR. This cell line stably expresses a chimeric reporter protein that consists of SNAP25 flanked at its N-terminus with a tagRFPT and at its C-terminus with a tagGFP. After exposure to BoNT/A, SNAP25 is cleaved and, the C-terminal tGFP-containing moiety is rapidly degraded. I have many doubts about the validity of the described method. Indeed, BoNT/A activity is analysed in an indirect way by quantifying the degradation of the GFP moiety generated after toxin cleavage (Fig. 2). In this regard, the authors should consider that their approach is dependent, not only on the toxin's metalloprotease activity but also on the functionality of the proteasome in neurons. Therefore, considering the current dataset, it is impossible to rule out the possibility that the progression of GFP signal loss from the soma to the neurite terminals may be attributed to the different proteasome activity in these compartments. Is it conceivable that the GFP fragment generated upon toxin cleavage degrades more rapidly in the soma in comparison to axonal terminals? This alternative explanation could challenge the conclusion drawn in Fig. 2.

      The reviewer’s alternative explanation disregards the experiments performed with the split-GFP complementation approach, which indicate translocation in the soma first. The split GFP reporter is not dependent on proteasome activity. It also disregards the genetic data implicating many genes involved in membrane retrograde traffic, which are also not consistent with the hypothesis of the reviewer. These gene depletions not only affect SNAPR degradation but also BoNT/A-mNG11 trafficking: thus, their effect cannot be attributed to a completely hypothetical spatially heterogeneous distribution of the proteasome.

      For this reason, I strongly suggest using a more physiological approach that does not depend on proteasomal degradation or on the expression of the sensor in neurons. The authors should consider performing a time course experiment following intoxication and staining BoNT/A-cleaved SNAP25 by using specific antibodies (see Antonucci F. et al., Journal of Neuroscience, 2008 or Rheaume C. et al., Toxins 2015).

      For the above reason, we do not agree that there is a pressing need for confirmation by a third method using specific antibodies, especially considering that BoNT is very difficult to detect in cells when incubated at physiological levels. Incidentally, the cited paper by Antonucci F. et al. documents long-distance retrograde traffic of BoNT/A, which is in line with our data.

      An alternative approach could involve the use of microfluidic devices that physically separate axons from cell bodies. Such a separation will allow us to test the authors' primary conclusion that SNAP25 is initially cleaved in the soma. The suggested experiments will also rule out potential overexpression artifacts that could influence the authors' conclusions when using the newly developed SNAPR approach. Without these additional experiments, the authors' main conclusion that SNAP25 is cleaved first in the neuronal soma rather than at the nerve terminal is inadequate.

      As discussed above, we disagree with the doubts raised by the reviewer: we present three types of evidence (SNAPR, split GFP and genetic hits) and they all point in the same direction. Thus, we respectfully doubt that a fourth approach would convince this reviewer. Of note, we attempted to use microfluidic devices as suggested by the reviewer; however, the ReNcell VM neurons were not able to extend axons long enough to cross the device.

      (2) To detect BoNT/A translocation into the cytosol, the authors have used a complementation assay by intoxicating ReNcell VM cell expressing a cytosolic HA-tagged split monomeric NeonGreen (Cyt-mNG1-10) with an engineered BoNT/A, where the catalytic domain (LC) was fused to mNG1-11. When drawing conclusions regarding the detection of cytosolic LC in the neuronal soma, the authors should highlight the limitations of this assay and explicitly describe them to the readers. Firstly, the authors need to investigate whether the addition of mNG1-11 to the LC affects the translocation process itself (by comparing with a WT, not tagged, LC).

      Additionally, from the data shown in Fig. 2C, it is evident that the Cyt-mNG1-10 is predominantly expressed in the cytosol and less detected in neurites. This raises the question of whether there might be a bias for the cell soma in this assay. To address this important concern, I suggest quantifying MFI per cell (Fig. 2D) taking into consideration the amount of HA-tagged Cyt-mNG1-10. Furthermore, I strongly suggest targeting mNG1-10 to synapses and performing a similar time course experiment to observe when LC translocation occurs at nerve terminals. Alternative experiments, to prove that BoNT/A requires retrograde trafficking before it can translocate, may be done to repeat the experiments shown in Fig. 2D in the presence of inhibitors (or by KD some of the hits identified as microtubule stabilizers) that should interfere with BoNT/A trafficking to the neuronal somata. Without these additional experiments, the authors' main conclusion that the BoNT/A catalytic domain is first detected in the neuronal soma rather than at the nerve terminal is very preliminary.

      As with the SNAPR assay, the reviewer is raising the level of doubt to very high levels. We respect his thoroughness and eagerness to question the new model. However, we note that a similar level of scrutiny is not applied to the prevailing competing model. Indeed, the data supporting the self-translocation model is based on a single in vitro experiment published in one panel, as we have explained in the Discussion (see above).

      (3) In the genome-wide RNAi screening, rather than solely assessing SV2 surface levels, it would have been beneficial to directly investigate BoNT/A binding to the neuronal membrane. For instance, this could have been achieved by using a GFP-tagged HC domain of BoNT/A. At present, the authors cannot exclude the possibility that among the 135 hits that did not affect SV2 levels, some might still inhibit BoNT/A binding to the neuronal surface. These concerns, already exemplified by B4CALT4 (which is known to be involved in the synthesis of GT1b), should be explicitly addressed in the main text.

      We agree with the reviewer that perturbation of BoNT binding is possible. We added the following text:

      “Network analysis reveals regulators of signaling, membrane trafficking and thioreductase redox state involved in BoNT/A intoxication

      Among the positive regulators of the screen, 135 hits did not significantly influence surface SV2 levels and are thus likely to function in post-endocytic processes (Supplementary Table 2). However, we cannot formally exclude that they could affect binding of BoNT to the cell surface independently of SV2.”

      (4) The authors should clearly state which reagents they have tried to use in order to explain the challenges they faced when directly testing the trafficking of BoNT/A. The accumulation of Dendra-SV2 bulbous structures at the neurite tips in VPS35-depleted cells could be interpreted as a sign of neuronal stress/death. Have the authors investigated other proteins that do not undergo retro-axonal trafficking in a retromer-dependent manner? This control is essential. In this regard, the use of a GFP-tagged HC domain of BoNT/A could prove to be quite helpful.

      We tried multiple commercially available antibodies against BoNT but we could not get a very good signal. The postdoc in charge of this project has now gone to greener pastures and we are not in a position to provide the details corresponding to these antibodies. We did not observe significant cell death after VPS35 knockdown at the time of the experiment; however, longer-term treatment might indeed result in toxicity.

      (5) Considering my concerns related to the SNAPR system and the complementation assay to study SNAP25 cleavage and BoNT/A trafficking, I suggest validating some of their major hits (ex. VPS34 and Sec61) by performing WB or IF analysis to examine the cleavage of endogenous SNAP25. Furthermore, the authors should test VPS35 depletion in the context of the experiments performed in Fig. 6G-H, by validating that this protein is essential for BoNT/A retrograde trafficking.

      The reviewer’s concerns are well noted but, as discussed above, the two systems we used are completely orthogonal. Thus, for the reviewer’s concerns to be valid, two completely independent artefacts would have to give rise to the same result. The alternative explanation is that BoNT/A translocates in the soma. Ockham’s razor dictates that the simplest explanation is the likeliest.

      (6) The introduction and the discussion section of this paper completely disregard more than 20 years of research conducted by several labs worldwide (Montecucco, Montal, Schiavo, Rummel, Binz, etc). The authors should make an effort to contextualize their data within the framework of these studies and address the significant discrepancies between their proposed intoxication model and existing research that clearly demonstrates BoNTs translocating upon the endocytic retrieval of SVs at presynaptic sites. Nevertheless, even assuming that the model proposed by the authors is accurate, numerous questions emerge. One such question is: How can the authors explain the exceptional toxicity of botulinum neurotoxin in an ex vivo neuromuscular junction preparation devoid of neuronal cell bodies (see Cesare Montecucco and Andreas Rummel's seminal studies)?

      Please see above in the answer to public reviews.

      (7) Scale bars should be added to all representative pictures.

      This has been done. Thank you for the thorough reading of our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) The title overstates the results. It may be indicated "in differentiated ReNcell VM".

      Title changed to: “Botulinum toxin intoxication requires retrograde transport and membrane translocation at the ER in RenVM neurons”

      (2) In the provided manuscript there are two Figure 2 and no Figure 3. This made the reading and understanding extremely difficult and should be corrected. As a result, the Figure legends do not fit the numbering. There are also discrepancies between some Figure panels (A, B, C, etc), the text, and the Legends. All this needs to be carefully checked.

      We apologize for the confusion; the manuscript has gone through multiple rounds of revisions. We have carefully verified the labels and legends.

      (3) The BoNT/A-mNG11x3 may introduce some bias that could be discussed. Would these additional peptides block LC translocation from synaptic vesicles in the nerve termini? In addition, the mNG peptides that are unfolded before complementation may direct LC towards Sec61. These aspects should be discussed.

      The comment would be valid if BoNT/A-mNG11x3 were the only approach used in the paper; however, the SNAPR reporter is used with native BoNT and shows data consistent with the split GFP approach.

      (4) In the Figure about SV2 (Fig 3 or 4): The authors did not locate SV2. The cells seem not to have the same differentiated phenotype as in Figure 1 and Figure 2/3A.

      We apologized above for the mislabeling. It is not clear what the question is here.

      (5) The authors should check whether BoNT/A wt cleaves the endogenous SNAP25 by western blot for instance in the original ReNcell VM before SNAPR engineering. This should be compared with wt SNAP25 cleavage by the BoNT/A-LC-mNG.

      It is likely that BoNT/A-LC-mNG11 should have similar activity as it is only adding a small peptide at the end of the LC. At any rate, it is not clear why this is so important since both molecules translocate in the cytosol, with the same kinetics and in the same subcellular locale.

      (6) Perhaps I did not understand. How can the authors exclude that what is observed is the kinetic overproduction of the reporter substrate SNAPR?

      The authors could use SLO toxin (PNAS 98, 3185-3190, 2001) to permeabilize the cells all along their body and axon to introduce BoNT/A or LC (wt) and observe synchronized SNAPR cleavage throughout the cells.

      The concept mentioned here is not very clear to us. Is the reviewer proposing that the SNAPR is produced much more efficiently at the tips of the neurites and thus its cleavage takes longer to be detected and is apparent first in the soma? With all due respect, this is a strange hypothesis, at odds with what we know of protein dynamics in neurons (i.e. most proteins are largely made in the soma and are transported or diffuse into the neurites).

      Again, the two orthogonal approaches, split GFP and the SNAPR reporter, use different constructs and methods, yet converge on similar results. Perhaps the incredulity of the reviewer might be more productively directed at the current data “demonstrating” the translocation of LC in the synaptic bouton?

      (7) The authors could also use an assay on neurotransmitter release monitoring by electrophysiology measurements to check the functional consequences of the kinetic diffusion of LC activity along the axon. Can the authors exclude that some toxin molecules translocate from the endocytic vesicles and block neurotransmission within minutes or a few hours?

      It is well established that inhibition of neurotransmission does not occur within minutes in vivo and in vitro, but rather within hours or even days. This kinetic delay is experienced by many patients and is one of the key arguments against the current model of self-translocation at the synaptic vesicle level.

      Minor remarks

      Thank you for pointing all of these out.

      (1) Please check typos. There are many. Check space before the parenthesis, between numbers and h (hours), reference style etc.

      Thank you. We have reviewed the text and tried to eliminate all these instances.

      (2) Line 90: The C of HC should be capitalized.

      Fixed

      (3) Line 107: add space between "neurons(Donato".

      Fixed

      (4) Line 109: space "72 h".

      Fixed

      (5) Line 115: a word is missing ? ...to show retro-axonal... ? Please clarify this sentence.

      Fixed

      (6) Figure 1E: does nm refer to nM (nanomolar)? Please correct. No mention of panel F.

      Fixed

      (7) Line 161: do you mean ~16 µm/h? Please correct.

      Fixed

      (8) Line 168, words are missing.

      Fixed, thank you

      We verified that Cyt-mNG1-10 was expressed using the HA tag; the expression was homogeneously distributed in differentiated neurons and we observed no GFP signal (Figure 2C).

      (9) Line 171: Isn't mNG 11 the eleventh beta strand of the neon green fluorescent protein, not alpha helix? Otherwise, can the authors confirm it acquires the shape of an alpha helix? Same at line 326.

      We have corrected the mistake; thanks for pointing it out.

      (10) Figure 2 is doubled. The legend of Fig 2 refers to Figure 3. There is no legend for Figure 2. Then, some figures are shifted in their numbering.

      Fixed

      (11) The fluorescence in the cell body must appear before the fluorescence in the axon due to higher volume. Please discuss.

      The fluorescence progresses in the neurites extensions in a centripetal fashion. The volume of the neurite near the cell body is not significantly different from the end of the neurite. Thus the fluorescence data is consistent with translocation in soma and not with an effect due to higher volume in the soma.

      (12) Figure 2D, right: the term intoxication is improper for this experiment. Rather, it is the presence of the BoNT/A-mNG11 that is detected. I believe the authors should be particularly careful about the use of terms: intoxication means blockade of neurosecretion, SNAPR cleavage means activity etc.

      While the reviewer is correct that it is the presence of BoNT/A-mNG11 that is detected, it remains that it is an active toxin, so the neurons are effectively intoxicated; as they are when we use the wild type toxin. We do not imply that we are measuring intoxication, but simply that the neurons are put into contact with a toxin.

      (13) Line 196: Should we read TXNRD1 is required for BoNT/A LC translocation? TXNRD1 in the current model of translocation is located in the cytoplasm and is supposed to play a role in the cleavage of the disulfide bond linking LC to HC. In the model proposed by this study, LC is translocated through the Sec61 translocon. In this case, I would assume that the protein disulfide isomerase (PDI) in the endoplasmic reticulum would reduce the LC-HC disulfide bond. In that case, TXNRD1 would not be required anymore. Please discuss.

      Why should we assume that a PDI is involved in the reduction of the LC-HC disulfide bond? In our previous studies on A-B toxins (PE and Ricin), different reduction systems seemed to be at play. There is no conceptual imperative to assume reduction in the ER because the Sec61 translocon is implicated. Reduction might occur on the cytosolic side by TXNRD1 or the effect of this reductase could be indirect.

      (14) The legend of Figure 4 (in principle Figure 5?) is not matching with the panels and panel entries are missing (Figure 4F in particular).

      Fixed

      (15) Figure 6 panels E and H, please match colors with legend (grey and another color).

      Not clear

      (16) Please indicate BoNT/A construct concentrations in all Figure legends.

      Done

      (17) Line 416: isn't SV2 also involved in epilepsy?

      Yes it is.

      (18) Line 433: as above, shouldn't the disulfide bond linking LC to HC be cleaved by PDI in the ER in this model (as for other translocating bacterial toxins) rather than by thioredoxin reductases in the cytoplasm? Please discuss.

      See above

      (19) Identification of vATPase in the screen could be consistent with the endocytic vesicle acidification model of translocation.

      Yes

      (20) Did the authors add KCl in screening controls without toxins? This should be detailed in the Materials and Methods. Could there be a KCl effect on the cells? KCl exposure for 48 hours may be highly stressful for cells. The KCl exposure should last only several minutes for toxin entry.

      We did not observe significant cell death with the cell culture conditions used. Cell viability was controlled at multiple stages, for instance using nuclei number.

      Reviewer #3 (Recommendations For The Authors):

      Main comments:

      (1) In Figure 1B: could you devise a means to prevent proteasomal degradation of the tGFP cleaved part to assess whether this is formed?

      We have also used a FRET assay after intoxication and obtained similar results.

      (2) Line 152: Where it reads "was not surprising", maybe I missed something, but to me, this is indeed surprising. If the toxin is rapidly internalized and translocated (therefore, it is able to cleave SNAP25), the fact that tGFP requires 48 hours to be degraded seems surprising to me. Or does it mean that the toxin also slows down the degradation of the tGFP fragment? So, how can you differentiate between the effect being on cleavage of the fragment or in tGFP degradation?

      The reviewer is correct; the “not” was a typo introduced during rewriting. The long delay between adding the toxin and observing cleavage was indeed surprising. Our interpretation is that it is trafficking that takes time: the kinetics of the split-GFP data indicate that the toxin takes about 48 h to fill the entire cytosol (Fig. 2D).

      (3) Regarding the effect of Sec61G knockdown, is it possible that the observed effects are indirect and not due to the translocon being directly responsible for translocating the protein?

      As discussed in the last part of the Results, Sec61 knockdown blocks intoxication but does not prevent BoNT from reaching the lumen of the ER (Figure 6G,H). Thus, Sec61 “is instrumental to the translocation of BoNT/A LC into the neuronal cytosol at the soma.”

      Minor comments:

      (1) Fig. 3E: in the legend I think one of the NT3+ should be NT3-.

      Yes, thanks for spotting it

      (2) Would you consider adding Figure S4 as a main figure?

      Thanks for the suggestion

      (3) Please, check that all microscopy image panels have scale bars.

      Done

      (4) Figure 6B (bottom panels): why does it seem that there is a lot of mNeonGreen positive signal in regions that are not positive for HA? Shouldn't complementation keep HA in the complemented protein?

      Our assumption is that there is an excess of receptor protein (HA tag) over reconstituted protein (GFP protein), given the relatively low concentration of toxin being internalized and translocated.

      Refs:

      (1) Pirazzini M, Azarnia Tehran D, Leka O, Zanetti G, Rossetto O, Montecucco C. On the translocation of botulinum and tetanus neurotoxins across the membrane of acidic intracellular compartments. Biochim Biophys Acta. 2016 Mar;1858(3):467–474. PMID: 26307528

      (2) Pirazzini M, Rossetto O, Eleopra R, Montecucco C. Botulinum Neurotoxins: Biology, Pharmacology, and Toxicology. Pharmacol Rev. 2017 Apr;69(2):200–235. PMCID: PMC5394922

      (3) Dong M, Masuyer G, Stenmark P. Botulinum and Tetanus Neurotoxins. Annu Rev Biochem. Annual Reviews; 2019 Jun 20;88(1):811–837.

      (4) Rossetto O, Pirazzini M, Fabris F, Montecucco C. Botulinum Neurotoxins: Mechanism of Action. Handb Exp Pharmacol. 2021;263:35–47. PMCID: 6671090

      (5) Williams JM, Tsai B. Intracellular trafficking of bacterial toxins. Curr Opin Cell Biol. 2016 Aug;41:51–56. PMCID: PMC4983527

      (6) Mesquita FS, van der Goot FG, Sergeeva OA. Mammalian membrane trafficking as seen through the lens of bacterial toxins. Cell Microbiol. 2020 Apr;22(4):e13167. PMCID: PMC7154709

      (7) Hoch DH, Romero-Mira M, Ehrlich BE, Finkelstein A, DasGupta BR, Simpson LL. Channels formed by botulinum, tetanus, and diphtheria toxins in planar lipid bilayers: relevance to translocation of proteins across membranes. Proc Natl Acad Sci U S A. 1985 Mar;82(6):1692–1696. PMCID: PMC397338

      (8) Donovan JJ, Middlebrook JL. Ion-conducting channels produced by botulinum toxin in planar lipid membranes. Biochemistry. 1986 May 20;25(10):2872–2876. PMID: 2424493

      (9) Fischer A, Montal M. Single molecule detection of intermediates during botulinum neurotoxin translocation across membranes. Proc Natl Acad Sci U S A. 2007 Jun 19;104(25):10447–10452. PMCID: PMC1965533

      (10) Fischer A, Nakai Y, Eubanks LM, Clancy CM, Tepp WH, Pellett S, Dickerson TJ, Johnson EA, Janda KD, Montal M. Bimodal modulation of the botulinum neurotoxin protein-conducting channel. Proc Natl Acad Sci U S A. 2009 Feb 3;106(5):1330–1335. PMCID: PMC2635780

      (11) Fischer A, Montal M. Crucial role of the disulfide bridge between botulinum neurotoxin light and heavy chains in protease translocation across membranes. J Biol Chem. 2007 Oct 5;282(40):29604–29611. PMID: 17666397

      (12) Koriazova LK, Montal M. Translocation of botulinum neurotoxin light chain protease through the heavy chain channel. Nature structural biology. 2003. p. 13–18. PMID: 12459720

      (13) Moreau D, Kumar P, Wang SC, Chaumet A, Chew SY, Chevalley H, Bard F. Genome-wide RNAi screens identify genes required for Ricin and PE intoxications. Dev Cell. 2011 Aug 16;21(2):231–244. PMID: 21782526

      (14) Bassik MC, Kampmann M, Lebbink RJ, Wang S, Hein MY, Poser I, Weibezahn J, Horlbeck MA, Chen S, Mann M, Hyman AA, Leproust EM, McManus MT, Weissman JS. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell. 2013 Feb 14;152(4):909–922. PMCID: PMC3652613

      (15) Tian S, Muneeruddin K, Choi MY, Tao L, Bhuiyan RH, Ohmi Y, Furukawa K, Furukawa K, Boland S, Shaffer SA, Adam RM, Dong M. Genome-wide CRISPR screens for Shiga toxins and ricin reveal Golgi proteins critical for glycosylation. PLoS Biol. 2018 Nov;16(11):e2006951. PMCID: PMC6258472

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Importantly, it would be useful to have provided more detailed information on the structure and histological properties of the murine cysts and how such findings relate to human lung cysts. Also, the authors should examine whether there is any information on Bmpr1a in human cyst formation (i.e GWAS data).

      We fully agree that it is important to examine Bmpr1a in human cyst pathology. Unfortunately, there is no GWAS data on this. From the published RNA-seq data, which were obtained from postnatal lung specimens of congenital pulmonary airway malformation (CPAM) patients, “integrated suppression of BMP signaling pathway” was reported, although altered expression of BMPR1A was not presented. We speculate that (1) BMPR1A is critical in embryonic development, and a germline deficiency of BMPR1A may lead to early embryonic lethality prior to lung formation, as supported by mouse data; (2) as suggested by our previously published study related to TGF-beta signaling and prenatal pulmonary cysts (Miao et al., Am J Physiol Lung Cell Mol Physiol 2021), dysregulation of BMPR1A-mediated signaling in a particular time window of fetal lung development may be sufficient to cause cyst formation, so that the BMPR1A alteration may not persist in postnatal lung specimens.

      (2) Throughout the paper, there is a lack of quantification for the histological findings. Littermate controls should also be clearly defined genetically.

      We thank the reviewer for this suggestion and acknowledge the importance of quantitative measurement for the changes. We now add quantitative data on branching number and size of the airway tips to define the difference between wild-type and Bmpr1a CKO mouse lungs in Fig.1. “The littermate controls were the mice without any gene deletion due to lack of transgenes Tbx4-rtTA and/or TetO-Cre”, which is now added in Materials and Methods.

      (3) Figure 1 suppl: "Doxycycline" is misspelled.

      This has been corrected.

      (4) Figure1c Suppl: Hard to discern clear-cut expression of Bmpr1a protein in mesenchyme in WT. Comparable images with similar sizes of airways should be used.

      To provide a clearer comparison of Bmpr1a expression patterns between Bmpr1a CKO and control lungs, we have enlarged the fluorescent stained lungs presented in Supplemental Figure 1C, as suggested by the editor. Additionally, dotted lines have been added to delineate the airway boundaries from the surrounding mesenchyme to better visualize the Bmpr1a distribution in lung mesenchyme. Bmpr1a expression in fetal lung mesenchyme is easily detected at E15.5, when significant dilation of airways is present in the Bmpr1a CKO lung. It is rare to find peripheral airways of comparable size in the Bmpr1a CKO lung at this point.

      (5) Figure 2a: Expression of several genes studied and altered should be identified on scatter plot.

      As suggested by the reviewer, we now highlight the related genes, including Acta2, Myocd, Eln, Bmp4, Sox2, etc., in the scatter plot. In addition, we also highlight these critical genes in the heatmap (Fig. 2B and Fig. 7B).

      (6) Figure 2c: Authors should also consider staining for other smooth muscle markers.

      We now include a panel of Myh11 immunostaining in Figure 2E. Myh11 is another common marker for smooth muscle cells. Lack of Myh11 staining in Bmpr1a CKO lung airways further supports our conclusion that loss of mesenchymal Bmpr1a leads to defective airway smooth muscle growth.

      (7) Figure 3: ELN expression should be defined in a clear quantitative manner.

      We have presented RNA-seq data, real-time PCR results, immunostaining, and western blot data for in vivo samples. Additionally, we have included an in vitro experiment illustrating that Bmp4 induces Eln expression, suggesting that BMP signaling regulates Eln expression. We believe that these datasets collectively support our conclusion.

      (8) Figure 4: Additional information on p38 dependent signaling (Including in vivo studies) would potentially help to understand key molecular events and perhaps could help to address key mechanistic events, including their location and identity.

      We sincerely appreciate the insightful suggestion from the reviewer. While the study of p38-dependent signaling is definitely important to dissect the entire mechanisms, we are not going to include such experiments in this manuscript due to time constraints associated with in vivo studies.

      (9) Figure 6: Would be helpful to know whether Bmpr1a receptor is expressed in Myocd KO.

      Bmpr1a expression is not changed in Myocd KO lungs, which is now included as Figure 6C. Together with other data, this suggests that Myocd is a downstream target directly mediating Bmpr1a-regulated airway smooth muscle development.

      (10) Figure 7: Not clear how these findings, though interesting, relate to the body of studies and the pathogenesis of cyst formation. Other points: 1) The authors should re-examine/repeat co-staining in the KO mouse lung (right 2 images in the top group of 4) for Foxj1, Sox2, and CDH (right 2 images, Figure 7A). For one thing, the cadherin stain in the 2 KO images seems localized to the lumen. Secondly, the pattern of cadherin staining looks exactly the same in both KO images, suggesting an error and/or duplication 2) authors should place arrows on the heat map showing the location of SPC, Sox2, Sox9, and FoxJ1 bands 3) figure 7D graph needs numbers on y axis.

      Fig.7 provides an additional potential mechanism by which deficient Bmp signaling leads to abnormally increased Bmp ligand expression, which disrupts the formation of epithelial proximal-distal axis, and results in cystic defects. Further in vivo experiments are needed to test this, which is beyond the scope of this paper.

      The E-cadherin staining signal in the lumen is caused by the tissue section positioned at an interface between lumen and the apical membrane of the lining epithelial cells where the E-cadherin is localized.

      Triple immunostaining of E-Cadherin, Sox2, and FoxJ1 was performed on the same tissue section (upper two panels of Figure 7A), as these antibodies were derived from different species, but the images are presented in two different combinations for simplicity and clarity. For the lower two panels of Figure 7A, double immunostaining of Sox9/E-Cadherin and Spc/E-Cadherin was performed separately on different tissue sections because both the anti-Sox9 and anti-Spc antibodies were raised in rabbits.

      The genes listed in the heatmap are canonical and putative marker genes for differentiated lung epithelial cell lineages, such as Scgb1a1 for Clara cells and FoxJ1 for ciliated cells. Therefore, the progenitor cell markers Sox2 and Sox9 were not included. In the updated heatmap, four widely acknowledged epithelial cell markers (Scgb1a1, FoxJ1, Sftpb, and Sftpc) are shown in a distinct font color (red) to enhance their visibility.

      Label for the y axis of Fig.7D is now added.

      Reviewer #2 (Public Review):

      (1) The authors may be aware that a recent paper (https://doi.org/10.1038/s41598-022-24858-3) reported on transcriptional changes seen in human CPAM. It would seem that some of the molecular changes seen in human CPAM move in the opposite direction of what is reported in mice lacking mesenchymal Bmpr1a. Perhaps the authors could comment on these differences in the discussion and whether they potentially explain the etiology of CPAM or branching morphogenesis in general.

      We thank the reviewer for referring us to this paper on human CPAM. CPAM has a variety of histopathologies. Type 1 CPAM is assumed to develop from more proximal bronchial/bronchiolar airways, while type 2 CPAM develops from relatively distal bronchiolar airways. In that publication, surgically resected lung specimens were collected postnatally (0.5-1 year) from type 1 CPAM patients, in which the cysts were lined with ciliated pseudostratified columnar epithelial cells. Gene expression was compared between cystic lung tissues and adjacent non-cystic lung tissues. Interestingly, integrated suppression of the BMP signaling pathway was shown by their data analysis. In our mouse model, the histopathology resembles human type 2 CPAM, with back-to-back cysts lined with a simple layer of epithelial cells. Therefore, several factors could explain the differences between their published data and our study at the molecular level: (1) different types of CPAM based on the histopathology; (2) different sampling time points: developing cysts at the fetal stage in mouse samples vs. developed cysts in postnatal human samples; (3) different comparisons of diseased and normal tissues: separate normal lungs vs. cystic lungs in mice, and cystic tissues vs. non-cystic tissues in the same lungs in humans. We now include this reference in the Discussion.

      (2) Figure 4 shows that BMP4 increases SMADs, p38, and several muscle genes in mesenchymal cells. Figure 5 extends this finding with a clever strategy to label airway and vascular smooth muscle with different fluorescent molecules used to isolate different types of mesenchymal cells. It shows that non-vascular smooth muscle cells but not perivascular smooth muscles are responsive to BMP4 signaling as defined by increased expression of Myh11. Are there cell-restricted responses to the other genes shown in Figure 4? Given the lack of SMAD signaling and the increase seen in p38 signaling, would blocking p38 signaling influence the BMP responsiveness of these nonvascular smooth muscle cells?

      We thank the reviewer for this constructive comment. As addressed above, we will leave the role of p38-mediated signaling in cyst formation to a follow-up study due to the time constraints associated with these experiments.

      (3) Figure 6 shows that mesenchymal loss of Myocd causes a deficiency of airway smooth muscle cells, but this was not sufficient to create cysts. Did the authors ever check to see if it changed Sox2-Sox9 staining in the airway epithelium?

      There is no significant change in Sox2 expression in proximal airway epithelia of Myocd CKO lungs as detected by immunostaining. The result was not included in this manuscript.

      (4) Figure 7 shows that mesenchymal loss of Bmpr1a proximalizes the distal airway as defined by loss of Sox2 and FoxJ1 (a ciliated marker) and gain in (Sox9 and SP-C) staining. But Club cells expressing Scgb1a1 and Cyp2F2 are the predominant epithelial cells in the distal airway. The transcriptomics data in panel B shows expression of these genes is less in the mutant mice. Does this mean they fail to generate Club cells or there is just less expression per cell? In other words, what are the primary epithelial cells present in the airways of mice with loss of mesenchymal Bmpr1a?

      As shown in the heatmap of Fig. 7B, the dysregulated gene expression in the Bmpr1a CKO extends beyond the featured epithelial cell markers, encompassing alterations in numerous putative marker genes. For example, several putative Club cell markers in addition to Scgb1a1 and Cyp2F2 were reduced in the Bmpr1a CKO lungs, suggesting a compromised differentiation of Club cells. Additionally, we observed upregulation of some molecular markers for distal progenitors and differentiated cells in the proximal region of airways, again suggesting a significant disruption in epithelial differentiation in the Bmpr1a CKO lungs. These abnormal cells can be further defined by a single-cell transcriptomic approach in the future.

      Recommendations for Authors:

      Reviewer #1 (Recommendations For The Authors):

      As discussed above, there may be an issue with the histological images and staining in 2 images in Figure 7A. The precise images, problems and suggestions to resolve the issue are in the Review.

      Please see our response to Reviewer 1 above.

      Reviewer #2 (Recommendations For The Authors):

      Minor Weaknesses:

      (1) Please enlarge the fluorescent stained lungs presented in Supplemental Figure 1C.

      We have revised this panel accordingly.

      (2) Figure 1D and E show that loss of Bmpr1a does not change proliferation or apoptosis on E15.5. Was that also seen through E18.5?

      We thank the reviewer for the thoughtful question about proliferation and apoptosis at later embryonic stages. Our focus here was to elucidate the mechanisms underlying abnormal branching morphogenesis and lung cyst initiation that occur prior to E15.5 in our model. Measuring the dynamic changes in cell proliferation and apoptosis at later timepoints will help to understand cyst progression, which will be our next focus.

      (3) BMP inhibitors used in Figure 4 show that BMP signaling regulates mesenchymal myogenesis independent of SMAD. But the experiments don't show how the inhibitors impact the control cells.

      We have examined the effects of the BMPR1 inhibitor LDN on the control cells. At the same dose (200 nM) and under serum-free culture conditions, LDN did not affect the basal level of BMP signaling (data not included) but blocked the exogenous BMP4-induced elevation in signaling (Fig. 4E).

      (4) Bmpr1a was deleted by administering doxycycline to pregnant dams prior to lung bud formation. It caused cystic disorders by disrupting proximal airspace. Could the authors speculate on why it does not impact tracheal and bronchiolar development? In other words, does the TBX4 promoter not target these cells? Do these cells not express Bmpr1a?

      The Tbx4 enhancer does target mesenchymal cells surrounding the trachea and bronchioles. Deletion of Bmpr1a in tracheal mesenchymal cells results in disruption of tracheal cartilage formation and smooth muscle differentiation. These phenotypes are evident in the gross view of lungs from E15.5 and later (Fig. 1A). However, our manuscript focuses on the phenotype of prenatal lung cysts, and we have chosen not to include complex data on tracheal development.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Bath application of serotonin increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1; see below for more explanation). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As the reviewer suggested, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data have been added as Supplementary Figure 3 to the revised manuscript.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activity in individual PNs as we injected different levels of current (200–1000 pA). Note that locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons and, by extension, the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A two-second window when a positive 200-1000 pA current was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after the 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin for each recorded PN. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection is shown. The color progression represents the intensity of applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.
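
      The comparison described in panels B and C can be sketched as follows: average the spike counts across trials at each current level, then ask whether the averages rise monotonically with current. This is only an illustrative sketch. The spike counts, the five current steps, and the function names are hypothetical and are not taken from the recordings.

```python
from statistics import mean

def mean_spike_counts(trials_by_current):
    """Return mean spike count per current level, ordered by current (pA)."""
    return [mean(trials_by_current[pa]) for pa in sorted(trials_by_current)]

def is_monotonic_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))

# Hypothetical spike counts for one PN (5 trials per current step).
before_5ht = {
    200: [3, 4, 3, 4, 3],
    400: [6, 7, 6, 7, 6],
    600: [9, 9, 10, 9, 9],
    800: [12, 12, 11, 13, 12],
    1000: [15, 14, 16, 15, 15],
}
after_5ht = {
    200: [10, 11, 10, 12, 11],
    400: [11, 10, 11, 10, 12],
    600: [10, 11, 12, 10, 11],
    800: [11, 11, 10, 12, 10],
    1000: [10, 12, 11, 11, 10],
}

before_means = mean_spike_counts(before_5ht)
after_means = mean_spike_counts(after_5ht)

print("before 5HT:", before_means, is_monotonic_increasing(before_means))
print("after 5HT: ", after_means, is_monotonic_increasing(after_means))
```

      In this made-up example the pre-serotonin curve rises with current while the post-serotonin curve stays roughly flat, which is the qualitative pattern the response describes.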

      • There is another explanation for the theoretical discrepancy between physiology and behavior, which is that odor coding is further processed in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth instar locusts (Schistocerca americana) of either sex were starved for 24-48 hours before the experiment or taken straight from the colony and fed blades of grass for the satiated condition. Locusts were immobilized by placing them in a plastic tube and securing their body with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As can be noted, the head of the locust, along with the antennae and maxillary palps, protruded from the immobilization tube so that the locust could move them freely. Note that the maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process.

      It is worth noting that our earlier studies had shown that the presentation of ‘appetitive odorants’ triggers the locust to open their maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across different odorants (Chandak and Raman, 2023). We chose four odorants that had a diverse range of palp-opening: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Therefore, each locust in our experiments was presented with one concentration of four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activity.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed right in front of the locusts. The camera was fully automated with a custom MATLAB script to start recording 2 seconds before the odor pulse and end recording at odor termination. An LED was used to track the stimulus onset/offset. The POR responses were manually scored offline. Responses to each odorant were scored as 0 or 1 depending on whether the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation time window as shown on the locust schematic (Main Paper Figure 1).
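
      As a minimal sketch of how such binary scores turn into the POR percentages reported in the figures, the snippet below computes the fraction of locusts that responded to each odorant. The scores and the helper name are hypothetical, chosen only for illustration.

```python
def por_probability(scores):
    """Fraction of locusts showing a palp-opening response (scores are 0 or 1)."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(score not in (0, 1) for score in scores):
        raise ValueError("POR scores must be 0 (closed) or 1 (opened)")
    return sum(scores) / len(scores)

# Hypothetical scores for 10 locusts; one entry per locust and odorant.
scores_by_odorant = {
    "hexanol":      [1, 0, 1, 1, 0, 1, 0, 1, 0, 0],
    "benzaldehyde": [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    "linalool":     [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "ammonium":     [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}

por_by_odorant = {odor: por_probability(s) for odor, s in scores_by_odorant.items()}
for odor, p in por_by_odorant.items():
    print(f"{odor}: {p:.0%} PORs")
```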

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments, where we explored POR responses to mineral oil and hexanol, before and after serotonin injection. For this study, we used 10 locusts that were starved 24-48 hours before the experiment. Note that hexanol was diluted at 1% (v/v) concentration in mineral oil. Our results reveal that locust PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR response rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after serotonin (5HT) injection are summarized and shown as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p<0.05).
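
      For readers unfamiliar with the test named in the caption, here is a hedged sketch of the paired-sample t statistic computed from per-locust scores before and after injection. The ten score pairs are invented for illustration and are not the study's data. For a one-tailed test at the 5% level with 9 degrees of freedom, the critical value is about 1.833. A statistics package such as SciPy (`scipy.stats.ttest_rel` with `alternative="greater"`) would return the exact p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for the paired differences after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1

# Hypothetical binary POR scores for the same 10 locusts.
hexanol_before = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
hexanol_after = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

t_value, dof = paired_t_statistic(hexanol_before, hexanol_after)
T_CRITICAL_ONE_TAILED = 1.833  # alpha = 0.05, 9 degrees of freedom
print(f"t = {t_value:.3f}, df = {dof}, significant: {t_value > T_CRITICAL_ONE_TAILED}")
```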

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons only fire calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are confident that we are only recording from PNs. Furthermore, because LN signals are too small, they are not detected in extracellular recordings from the locust antennal lobe. We therefore stand by our claims and conclusions.

      Author response image 4.

      PN vs LN physiological differences: Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse is shown. Note that the local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only fire small calcium spikelets. Right: A raw voltage trace recorded from a representative projection neuron is shown for comparison. Sodium spikes are clearly visible during spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer makes the point raised by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we could conclude based on our data is that a segregation of neural activity based on behavioral relevance might provide the simplest approach to map non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants that had an increase and decrease in PORs after serotonin injection, had only minimal neural response overlap in the antennal lobe. These results suggest that the formatting of neural activity to support varying behavioral outcomes might already begin in the antennal lobe. We have added this to our discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. But, as reviewer 1 has also brought up a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 5).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity in the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered using our data. But this is something that we had pondered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, here is a plot showing the responses of four PNs that had positive weights to hexanol and linalool. As can be expected, each PN in this group had higher responses to hexanol and no response to linalool. Further, the four PNs that received negative weights responded only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).
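
      The idea behind the weighted ensembles can be illustrated with a toy linear readout. This is a sketch under stated assumptions, not the model fitted in the manuscript: the weights of +1 and -1, the firing rates, and the uniform gain of 1.5 standing in for serotonin are all hypothetical. It shows how a non-specific increase in PN responses can push the behavioral readout in opposite directions for two odorants that activate non-overlapping sets of neurons.

```python
def readout(responses, weights):
    """Weighted sum of PN responses; positive values favor a POR."""
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    return sum(w * r for w, r in zip(weights, responses))

def apply_gain(responses, gain):
    """Scale every PN response by the same factor (odor-independent change)."""
    return [gain * r for r in responses]

# Hypothetical ensemble: four PNs with positive and four with negative weights.
weights = [1, 1, 1, 1, -1, -1, -1, -1]
hexanol = [8, 6, 7, 5, 0, 0, 0, 0]   # drives only the positively weighted PNs
linalool = [0, 0, 0, 0, 6, 7, 5, 8]  # drives only the negatively weighted PNs
gain = 1.5                           # assumed uniform enhancement

hex_before = readout(hexanol, weights)
hex_after = readout(apply_gain(hexanol, gain), weights)
lool_before = readout(linalool, weights)
lool_after = readout(apply_gain(linalool, gain), weights)

print("hexanol: ", hex_before, "->", hex_after)
print("linalool:", lool_before, "->", lool_after)
```

      Because the two odorants recruit disjoint neurons in this toy case, the same gain raises the readout for hexanol and lowers it for linalool.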

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The POR responses to HEX (an appetitive odorant) were significantly different between the starved and fed locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes) before (dark shade) and after (lighter shade) 5HT injection. For comparison, PORs of starved locusts before 5HT injection are also plotted (without stripes). Significance was determined using a one-tailed paired-sample t-test (*p<0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. We found that the overall distribution was not significantly different between the two (one-tailed paired-sample t-test; p = 0.93).

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. One-tailed paired sample t-test was used to compare the two distributions.

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the least POR responses amongst a diverse panel of 22 odorants that were tested. We have clarified how we chose odorants based on the prior dataset in the Methods section.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments to show that the PN output to square pulses of current changes after serotonin application (Author response image 1).

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The logic behind this was to have a very diverse odor panel that consisted of a food odorant (hexanol), an aggregation pheromone (4-vinyl anisole), a sex pheromone (benzaldehyde), an acid (octanoic acid), a base (ammonium), and an alcohol (1-nonanol). Additionally, we selected these odors based on previous neural and behavioral data on these odorants (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested on a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for ephys experiments was chosen based on trial-and-error experiments: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) due to the presence of anatomical structures in the locust's head, such as air sacs and sheath, as well as hemolymph, which cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We performed the ephys experiments 5-10 minutes after bath application because it would be extremely hard to hold cells for the 3 hours used in the behavioral experiments.

      A longer delay was required for our behavioral experiments as the locusts tended to be more agitated, with larger spontaneous movements of the palps, and exhibited unprompted vomiting. A 3-hour period allowed the locusts to regain their baseline level of movement after 5HT introduction. [This information has been added to the methods section of the revised manuscript.]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, and LNs only fire calcium spikelets (Author response image 4). Since these signals are small, they will be buried under the noise floor when using extracellular recording electrodes for monitoring responses in the antennal lobe. Hence, we are confident about the type of cells we are recording from.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. Author response:

      We would like to thank the reviewers for their helpful comments. We note that both reviews are strongly supportive with comments including, “a biophysical tour de force” (rev #1), “the study is exemplary” (rev #2), and “represents a roadmap for future work” (rev #2). Below we respond to each reviewer comment.

      Reviewer #1

      This study provides a detailed and quantitative description of the allosteric mechanisms resulting in the paradoxical activation of BRAF kinase dimers by certain kinase inhibitors. The findings provide a much needed quantitative basis for this phenomenon and may lay the foundation for future drug development efforts aimed at the important cancer target BRAF. The study builds on evidence obtained by multiple independent biophysical methods.

      Summary:

      The authors quantitatively describe the complex binding equilibria of BRAF and its inhibitors resulting in some cases in the paradoxical activation of BRAF dimer when bound to ATP competitive inhibitors. The authors use a biophysical tour de force involving FRET binding assays, NMR, kinase activity assays and DEER spectroscopy.

      We are gratified by the reviewer’s supportive summary.

      Strengths:

      The strengths of the study are the beautifully conducted assays that allow for a thorough characterization of the allostery in this complex system. Additionally, the use of 19F-NMR and DEER spectroscopy provides important insights into the details of the process. The resulting model for binding of inhibitors and dimerization (Fig. 4) is very helpful.

      Weaknesses:

      This is a complex system and its communication is inherently challenging. It might be of interest to the broader readership to understand the implications of the model for drug development and therapy.

      We agree with the reviewer that this is a complicated system. With regard to inhibitor development, a key insight is that designing aC-in state inhibitors that avoid paradoxical activation may be non-trivial because these molecules not only induce dimers but also tend to bind the second dimer subunit more weakly than the first, due to allosteric asymmetry and/or inherently different affinities for each RAF isoform. We feel the full implications for future therapeutic development are an extensive topic that is beyond the scope of our work, which is focused on the properties of current inhibitors.

      Recommendations for the author:

      The experimental work, analysis and resulting model are excellent. I had some difficulty following the complex model in some instances and it may be useful to review the description of the model and see whether it can be made more palatable to the broader readership. I think it would be useful to discuss the model presented in reference 40 (Kholodenko) and to compare it to the presented model here.

      We regret any confusion with regard to the nature of the model. Our analysis was built upon the model developed by Boris Kholodenko as reported in his 2015 Cell Reports paper. This formed the theoretical framework that, combined with our experimental data, allowed us to parameterize the model and obtain experimental values for the equilibrium constants and allosteric coupling factors.

      Reviewer #2

      This manuscript combines elegant biophysical solution measurements to address paradoxical kinase activation by Type II BRAF inhibitors. The novel findings challenge prevailing models, through experiments that are rigorous and carefully controlled. The study is exemplary in the breadth of strategies it uses to address protein kinase dynamics and inhibitor allostery.

      Summary:

      This manuscript uses FRET, 19F-NMR and DEER/EPR solution measurements to examine the allosteric effects of a panel of BRAF inhibitors (BRAFi). These include first-generation aC-out BRAFi, and more recent Type I and Type II aC-in inhibitors. Intermolecular FRET measurements quantify Kd for BRAF dimerization and inhibitor binding to the first and second subunits. Distinct patterns are found between aC-in BRAFi, where Type I BRAFi bind equally well to the first and second subunits within dimeric BRAF. In contrast, Type II BRAFi show stronger affinity for the first subunit and weaker affinity for the second subunit, an effect named "allosteric asymmetry". Allosteric asymmetry has the potential for Type II inhibitors to promote dimerization while favoring occupancy of only one subunit (BBD form), leading to enrichment of an active dimer.

      Measurements of in vitro BRAF kinase activity correlate amazingly well with the calculated amounts of the half site-inhibited BBD forms with Type II inhibitors. This suggests that the allosteric asymmetry mechanism explains paradoxical activation by this class of inhibitors. DEER/EPR measurements further examine the positioning of helix aC. They show systematic outward movement of aC with Type II inhibitors, relative to the aC-in state with Type I inhibitors, and further show that helix aC adopts multiple states and is therefore dynamic in apo BRAF. This makes a strong case that negative cooperativity between sites in the BRAF dimer can account for paradoxical kinase activation by Type II inhibitors by creating a half site-occupied homodimer, BBD. In contrast, Type I inhibitors and aC-out inhibitors do not fit this model, and are therefore proposed to be explained by previously proposed models involving negative allostery between subunits in BRAF-CRAF heterodimers, RAS priming, and transactivation.

      Strengths:

      This study integrates orthogonal spectroscopic and kinetic strategies to characterize BRAF dynamics and determine how it impacts inhibitor allostery. The unique combination of approaches presented in this study represents a road map for future work in the important area of protein kinase dynamics. The work represents a worthy contribution not only to the field of BRAF regulation but protein kinases in general.

      Weaknesses:

      Some questions remain regarding the proposed model for Type II inhibitors and its comparison to Type I and aC-out inhibitors that would be useful to clarify. Specifically, it would be helpful to address whether the activation of BRAF by Type II inhibitors, while strongly correlated with BBD model predictions in vitro, also depends on CRAF via BRAF-CRAF in cells and therefore overlaps with the mechanisms of paradoxical activation by Type I and aC-out inhibitors.

      We agree with the reviewer that this is a worthy question to be pursued. However, given the substantial experimental effort required for such an endeavor, and the highly supportive nature of the reviewer comments, including that “This is a strong manuscript that I feel is well above the bar for publication”, we believe this effort is more appropriate for a future study.

      This is a strong manuscript that I feel is well above the bar for publication. Nevertheless, it is recommended that the authors consider addressing the following points in order to support their major conclusions.

      (1) Fig 3D shows similar effects of Type II and Type I inhibitors in the biphasic increase of cellular pMEK/pERK. From this, the authors argue that Type II inhibitors are explained by negative allostery in the BRAF homodimer (based on Fig 2E), while Type I inhibitors are not. But it seems possible that despite the terrific correlation between BBD and BRAF kinase activities measured in vitro, CRAF is still important to explain pathway activation in cells. It also seems conceivable that the calculated %BBD between different Type II inhibitors may not correlate as well with their effects on pathway activation in cells. These possibilities should be addressed.

      We agree with the reviewer that it is likely that CRAF contributes to paradoxical activation by type II inhibitors in cells. It is also likely that other cellular factors such as RAS-priming and membrane recruitment play a role in activation. However, we note that for the type II inhibitors there is good agreement between the biophysical predictions and the concentration regimes in which activation is observed in cells, suggesting that these predictions are capturing a key part of the activation process that occurs in cells.

      (2) In Fig 2A, is it possible to report the activity of dimeric BRAF-WT in the absence of inhibitor? This would help confirm that the maximal activity measured after titrating inhibitor is indeed consistent with the predicted %BBD population, which would be expected to have half of the specific activity of BB.

      In principle, it is possible to determine the catalytic activity of apo dimers (BB) by combining our model predictions for the concentration of BB dimers and our activity measurements. However, because the activity assays are performed at nanomolar kinase concentrations, whereas the baseline dimerization affinity of BRAF is in the micromolar range, the observed activity of apo BRAF arises from a small subpopulation of dimers (on the order of 4 percent under the conditions of our experiments) and is therefore difficult to define accurately. As a result, we deemed it more suitable to compare our results to published activity measurements derived from 14-3-3-activated dimers which should represent fully dimerized BRAF. This analysis, as reported in Figure 2E, suggests that the BBD activity is approximately half of that of BB.
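
      The argument above can be made concrete with a minimal sketch, assuming a simple monomer-dimer equilibrium 2M ⇌ D with dissociation constant Kd = [M]^2/[D] and no inhibitor present. The function name and the concentrations used below are hypothetical and chosen only for illustration; they are not the values of the study.

```python
import math


def dimer_fraction(total_nM, kd_nM):
    """Fraction of protomers found in dimers for 2M <-> D, Kd = [M]^2 / [D].

    Assumption: a plain two-state equilibrium without inhibitor or 14-3-3.
    """
    # Mass balance: total = [M] + 2[D] = [M] + 2[M]^2 / Kd, a quadratic in [M].
    monomer = kd_nM * (math.sqrt(1.0 + 8.0 * total_nM / kd_nM) - 1.0) / 4.0
    return 1.0 - monomer / total_nM


if __name__ == "__main__":
    kd = 1000.0  # hypothetical micromolar-range Kd, expressed in nM
    for total in (5.0, 20.0, 100.0):  # hypothetical nanomolar kinase concentrations
        print(f"total = {total:6.1f} nM -> dimer fraction = {dimer_fraction(total, kd):.3f}")
```

      When the kinase concentration is far below Kd, the dimer fraction is approximately 2·[total]/Kd, which is why only a few percent of apo BRAF is expected to be dimeric under such conditions.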

      (3) The 19F-NMR experiments make a good case for broadening of the helix aC signal in the BRAF dimer. From this, the study proposes that after inhibitor binds one subunit, the second unoccupied subunit retains dynamics. It would be useful to address this experimentally, if possible. For example, can the 19F-NMR signal be measured in the presence of inhibitor, to support the prediction that the unoccupied subunit is indeed dynamic and samples multiple conformations as in apo BRAF?

      We agree with the reviewer that it would be interesting to determine the dynamic response of BRAF to inhibitor binding. However, this is a challenging undertaking due to the biochemical heterogeneity that occurs at subsaturating inhibitor concentrations. For example, at any given inhibitor concentration, BRAF exists as a mixture of monomers, apo dimers, dimers with one inhibitor molecule bound, and dimers with two inhibitor molecules bound. This makes it challenging to relate the 19F NMR signal to a single biochemical state. Addressing this would require a substantial experimental effort that we feel is beyond the scope of this study.

    1. Author response:

      Reviewer 1:

      The paper “Quantifying gliding forces of filamentous cyanobacteria by self-buckling” combines experiments on freely gliding cyanobacteria, buckling experiments using two-dimensional V-shaped corners, and micropipette force measurements with theoretical models to study gliding forces in these organisms. The aim is to quantify these forces and use the results to perhaps discriminate between competing mechanisms by which these cells move. A large data set of possible collision events is analyzed, buckling events evaluated, and critical buckling lengths estimated. A line elasticity model is used to analyze the onset of buckling and estimate the effective (viscous type) friction/drag that controls the dynamics of the rotation that ensues post-buckling. This value of the friction/drag is compared to a second estimate obtained by consideration of the active forces and speeds in freely gliding filaments. The authors find that these two independent estimates of friction/drag correlate with each other and are comparable in magnitude. The experiments are conducted carefully, the device fabrication is novel, the data set is interesting, and the analysis is solid. The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion. While consistent with the data, this conclusion is inferred.

      We thank the reviewer for the positive evaluation of our work.

      Summary:

      The paper addresses important questions on the mechanisms driving the gliding motility of filamentous cyanobacteria. The authors aim to understand these by estimating the elastic properties of the filaments, and by comparing the resistance to gliding under a) freely gliding conditions, and b) in post-buckled rotational states. Experiments are used to estimate the propulsion force density on freely gliding filaments (assuming over-damped conditions). Experiments are combined with a theoretical model based on Euler beam theory to extract friction (viscous) coefficients for filaments that buckle and begin to rotate about the pinned end. The main results are estimates for the bending stiffness of the bacteria, the propulsive tangential force density, the buckling threshold in terms of the length, and estimates of the resistive friction (viscous drag) providing the dissipation in the system and balancing the active force. It is found that experiments on the two bacterial species yield nearly identical values of f (albeit with rather large variations). The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion.

      We appreciate this comprehensive summary of our work.

      Strengths of the paper:

      The strengths of the paper lie in the novel experimental setup and measurements that allow for the estimation of the propulsive force density, critical buckling length, and effective viscous drag forces for movement of the filament along its contour – the axial (parallel) drag coefficient, and the normal (perpendicular) drag coefficient (I assume this is the case, since the post-buckling analysis assumes the bent filament rotates at a constant frequency). These direct measurements are important for serious analysis and discrimination between motility mechanisms.

      We thank the reviewer for this positive assessment of our work.

      Weaknesses:

      There are aspects of the analysis and discussion that may be improved. I suggest that the authors take the following comments into consideration while revising their manuscript.

      The conclusion that adhesion via focal adhesions is the cause for propulsion rather than slime protrusion is consistent with the experimental results that the frictional drag correlates with propulsion force. At the same time, it is hard to rule out other factors that may result in this (friction) viscous drag - (active) force relationship while still being consistent with slime production. More detailed analysis aiming to discriminate between adhesion vs slime protrusion may be outside the scope of the study, but the authors may still want to elaborate on their inference. It would help if there was a detailed discussion on the differences in terms of the active force term for the focal adhesion-based motility vs the slime motility.

      We appreciate this critical assessment of our conclusions. Of course we are aware that many different mechanisms may lead to similar force/friction characteristics, and that a definitive conclusion on the mechanism would require the combination of various techniques, which is beyond the scope of this work. Therefore, we were very careful in formulating the discussion of our findings, refraining, in particular, from a singular conclusion on the mechanism but instead indicating “support” for one hypothesis over another, and emphasizing “that many other possibilities exist”.

      The most common competing hypotheses for bacterial gliding suggest that either slime extrusion at the junctional pore complex [A1], rhythmic contraction of fibrillar arrays at the cell wall [A2], focal adhesion sites connected to intracellular motor-microtubule complexes [A3], or modified type-IV pilus apparatuses [A4] provide the propulsion forces. For the slime extrusion hypothesis, which is still widely held today, one would rather expect an anticorrelation of force and friction: more slime extrusion would generate more force, but also enhance lubrication. The other hypotheses are more consistent with the trend we observed in our experiments, because both pili and focal adhesion require direct contact with a substrate. How contraction of fibrillar arrays would micromechanically couple to the environment is not clear to us, but direct contact might still facilitate force transduction. Please note that these hypotheses were all postulated without any mechanical measurements, solely based on ultra-structural electron microscopy and/or genetic or proteomic experiments. We see our work as complementary to that, providing a mechanical basis for evaluating these hypotheses.

      We agree with the referee that narrowing down this discussion to focal adhesion should have been avoided. We rewrote the concluding paragraph (page 8):

      “…it indicates that friction and propulsion forces, despite being quite variable, correlate strongly. Thus, generating more force comes, inevitably, at the expense of added friction. For lubricated contacts, the friction coefficient is proportional to the thickness of the lubricating layer (Snoeijer et al., 2013), and we conjecture active force and drag both increase due to a more intimate contact with the substrate. This supports mechanisms like focal adhesion (Mignot et al., 2007) or a modified type-IV pilus (Khayatan et al., 2015), which generate forces through contact with extracellular surfaces, as the underlying mechanism of the gliding apparatus of filamentous cyanobacteria: more contacts generate more force, but also closer contact with the substrate, thereby increasing friction to the same extent. Force generation by slime extrusion (Hoiczyk and Baumeister, 1998), in contrast, would lead to the opposite behavior: More slime generates more propulsion, but also reduces friction. Besides fundamental fluid-mechanical considerations (Snoeijer et al., 2013), this is rationalized by two experimental observations: i. gliding velocity correlates positively with slime layer thickness (Dhahri et al., 2013) and ii. motility in slime-secretion deficient mutants is restored upon exogenous addition of polysaccharide slime. Still we emphasize that many other possibilities exist. One could, for instance, postulate a regulation of the generated forces to the experienced friction, to maintain some preferred or saturated velocity.”

      Can the authors comment on possible mechanisms (perhaps from the literature) that indicate how isotropic friction may be generated in settings where focal adhesions drive motility? A key aspect here would probably be estimating the extent of this adhesion patch and comparing it to a characteristic contact area. Can lubrication theory be used to estimate characteristic areas of contact (knowing the radius of the filament, and assuming a height above the substrate)? If the focal adhesions typically cover areas smaller than this lubrication area, it may suggest the possibility that bacteria essentially present a flat surface insofar as adhesion is concerned, leading to a transversely isotropic response in terms of the drag. Of course, we will still require the effective propulsive force to act along the tangent.

      We thank the referee for suggesting to estimate the dimensions of the contact region. Both pili and focal adhesion sites would be of sizes below one micron [A3, A4], much smaller than the typical contact region in the lubricated contact, which is on the order of the filament radius (few microns). So indeed, isotropic friction may be expected in this situation [A5] and is assumed frequently in theoretical work [A6–A8]. Anisotropy may then indeed be induced by active forces [A9], but we are not aware of measurements of the anisotropy of friction in bacterial gliding.

      For a more precise estimate using lubrication theory, rheology and extrusion rate of the secreted polysaccharides would have to be known, but we are not aware of detailed experimental characterizations.

      We extended the paragraph in the buckling theory on page 5 regarding the assumption of isotropic friction:

      “We use classical Kirchhoff theory for a uniform beam of length L and bending modulus B, subject to a force density b⃗ = −f t⃗ − η v⃗, with an effective active force density f along the tangent t⃗, and an effective friction proportional to the local velocity v⃗, analogous to existing literature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). Presumably, this friction is dominated by the lubrication drag from the contact with the substrate, filled by a thin layer of secreted polysaccharide slime which is much more viscous than the surrounding bulk fluid. Speculatively, the motility mechanism might also comprise adhering elements like pili (Khayatan et al., 2015) or foci (Mignot et al., 2007) that increase the overall friction (Pompe et al., 2015). Thus, the drag due to the surrounding bulk fluid can be neglected (Man and Kanso, 2019), and friction is assumed to be isotropic, a common assumption in motility models (Fei et al., 2020; Tchoufag et al., 2019; Wada et al., 2013). We assume…”

      We also extended the discussion regarding the outcome of isotropic friction (page 7):

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      I am not sure why the authors mention that the power of the gliding apparatus is not rate-limiting. The only way to verify this would be to put these in highly viscous fluids where the drag of the external fluid comes into the picture as well (if focal adhesions are on the substrate-facing side, and the upper side is subject to ambient fluid drag). Also, the friction referred to here has the form of a viscous drag (no memory effect, and thus not viscoelastic or gel-like), and it is not clear if forces generated by adhesion involve other forms of drag such as chemical friction via temporary bonds forming and breaking. In quasi-static settings and under certain conditions such as the separation of chemical and elastic time scales, bond friction may yield overall force proportional to local sliding velocities.

      We agree with the referee that the origin of the friction is not easily resolved. Lubrication yields an isotropic force density that is proportional to the velocity, and the same could be generated by bond friction. Importantly, both types of friction would be assumed to be predominantly isotropic. We explicitly referred to lubrication drag because it has been shown that mutants deficient in slime extrusion do not glide [A4].

      Assuming, in contrast, that in free gliding, friction with the environment is not rate limiting, but rather the internal friction of the gliding apparatus, i.e., the available power, we would expect a rather different behavior during early-buckling evolution. During early buckling, the tangential motion is stalled, and the dynamics is dominated by the growing buckling amplitude of filament regions near the front end, which move mainly transversely. For geometric reasons, in this stage the (transverse) buckling amplitude grows much faster than the rear part of the filament advances longitudinally. Thus that motion should not be impeded much by the internal friction of the gliding apparatus, but by external friction between the buckling parts of the filament and the surroundings. The rate at which the buckling amplitude initially grows should be limited by the accumulated compressive stress in the filament and the transverse friction with the substrate. If the latter were much smaller than the (longitudinal) internal friction of the gliding apparatus, we would expect a snapping-like transition into the buckled state, which we did not observe.

      In our paper, we do not intend to evaluate the exact origin of the friction; quantifying the gliding force is the main objective. A linear force-velocity relation agrees with our observations. A detailed analysis of friction in cyanobacterial gliding would be an interesting direction for future work.

      To make these considerations clearer, we rephrased the corresponding paragraph on pages 7 & 8:

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      For readers from a non-fluids background, some additional discussion of the drag forces, and the forms of friction would help. For a freely gliding filament if f is the force density (per unit length), then steady gliding with a viscous frictional drag would suggest (as mentioned in the paper) f ∼ v! L η||. The critical buckling length is then dependent on f and on B the bending modulus. Here the effective drag is defined per length. I can see from this that if the active force is fixed, and the viscous component resulting from the frictional mechanism is fixed, the critical buckling length will not depend on the velocity (unless I am missing something in their argument), since the velocity is not a primitive variable, and is itself an emergent quantity.

      We are not sure what “f ∼ v! L η||” means, possibly the spelling was corrupted in the forwarding of the comments.

      We assumed an overdamped motion in which the friction force density ff (per unit length of the filament) is proportional to the velocity v0, i.e. ff ∼ η v0, with a friction coefficient η. Overdamped means that the friction force density is equal and opposite to the propulsion force density, so the propulsion force density is f ∼ ff ∼ η v0. The total friction and propulsion forces can be obtained by multiplication with the filament length L, which is not required here. In this picture, v0 is an emergent quantity and f and η are assumed as given and constant. Thus, by observing v0, f can be inferred up to the friction coefficient η. Therefore, by using two descriptive variables, L and v0, with known B, the primitive variable η can be inferred by logistic regression, and f then follows from the overdamped equation of motion.

      To clarify this, we revised the corresponding section on page 5 of the paper:

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an overdamped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed Lc ∼ (f/B)^(−1/3) and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting x = (L − Lc(v0))/∆Lc in Equation 1, with Lc(v0) = (η v0/(30.5722 B))^(−1/3), to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E,F show the buckling behavior…”
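
      The regressor described in this passage can be sketched as follows. Equation 1 is not reproduced in this response, so the standard logistic form p = 1/(1 + exp(−x)) is an assumption here; the function names, the units (µm, µm/s, nN) and the numerical values are hypothetical and serve only to illustrate the relation Lc(v0) = (η v0/(30.5722 B))^(−1/3).

```python
import math

BUCKLING_THRESHOLD = 30.5722  # dimensionless value of f * Lc**3 / B quoted in the text


def critical_length(eta, v0, B):
    """Critical length Lc(v0) for overdamped gliding with f = eta * v0."""
    return (eta * v0 / (BUCKLING_THRESHOLD * B)) ** (-1.0 / 3.0)


def buckling_probability(L, eta, v0, B, delta_Lc):
    """Assumed logistic form of Equation 1 with x = (L - Lc(v0)) / delta_Lc."""
    x = (L - critical_length(eta, v0, B)) / delta_Lc
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical parameters: eta in nN*s/um^2, v0 in um/s, B in nN*um^2.
    eta, v0, B, delta_Lc = 0.5, 2.0, 5.0e4, 10.0
    Lc = critical_length(eta, v0, B)
    for L in (0.5 * Lc, Lc, 1.5 * Lc):
        print(f"L = {L:7.1f} um -> p(buckle) = {buckling_probability(L, eta, v0, B, delta_Lc):.3f}")
```

      In an actual fit, η and ∆Lc would be the free parameters determined from the observed buckling outcomes, with L, v0 and B supplied from the measurements.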

      Reviewer 2:

      In the presented manuscript, the authors first use structured microfluidic devices with gliding filamentous cyanobacteria inside in combination with micropipette force measurements to measure the bending rigidity of the filaments.

      Next, they use triangular structures to trap the bacteria with the front against an obstacle. Depending on the length and rigidity, the filaments buckle under the propulsive force of the cells. The authors use theoretical expressions for the buckling threshold to infer propulsive force, given the measured length and stiffnesses. They find nearly identical values for both species, f ∼ (1.0 ± 0.6) nN/µm, nearly independent of the velocity.

      Finally, they measure the shape of the filament dynamically to infer friction coefficients via Kirchhoff theory. This last part seems a bit inconsistent with the previous inference of propulsive force. Before, they assumed the same propulsive force for all bacteria and showed only a very weak correlation between buckling and propulsive velocity. In this section, they report a strong correlation with velocity, and report propulsive forces that vary over two orders of magnitude. I might be misunderstanding something, but I think this discrepancy should have been discussed or explained.

      We regret the misunderstanding of the reviewer regarding the velocity dependence, which indicates that the manuscript should be improved to convey these relations correctly.

      First, in the Buckling Measurements section, we did not assume the same propulsion force for all bacteria. The logistic regression yields an ensemble median for Lc (and thus an ensemble median for f ), along with the width ∆Lc of the distribution (and thus also the width of the distribution of f ). Our result f ∼ (1.0 ± 0.6) nN/µm indicates the median and the width of the distribution of the propulsion force densities across the ensemble of several hundred filaments used in the buckling measurements. The large variability of the forces found in the second part is consistently reflected by this very wide distribution of active forces detected in the logistic regression in the first part.

      We made small modifications to the buckling theory paragraph to clarify that in the first part, a distribution of forces rather than a constant value is inferred (page 6):

      “Inserting the population median and quartiles of the distributions of bending modulus and critical length, we can now quantify the distribution of the active force density for the filaments in the ensemble from the buckling measurements. We obtain nearly identical values for both species, f ∼ (1.0±0.6) nN/µm, where the uncertainty represents a wide distribution of f across the ensemble rather than a measurement error.”

      The same holds, of course, when inferring the distribution of the friction coefficients (page 5):

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an overdamped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed Lc ∼ (f/B)^(−1/3) and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting x = (L − Lc(v0))/∆Lc in Equation 1, with Lc(v0) = (η v0/(30.5722 B))^(−1/3), to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E,F show the buckling behavior…”

      The (naturally) wide distribution of force (and friction) leads to a distribution of Lc as well. However, due to the small exponent of magnitude 1/3 in the buckling threshold Lc ∼ f^(−1/3), the distribution of Lc is not as wide as the distributions of the individually inferred f or η. This is visualized in panel G of Figure 3, plotting Lc as a function of v0 (v0 is equivalent to f, up to a proportionality coefficient η). The natural length distribution, in contrast, is very wide. Therefore, the buckling propensity of a filament is most strongly characterized by its length, while force variability, which alters Lc of the individual, plays a secondary role.
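
      As a brief numerical illustration of this scaling argument, the sketch below evaluates how much Lc changes when f changes by a given factor at fixed B, using only the relation that Lc scales as the inverse cube root of f. The function name and the chosen factors are illustrative assumptions.

```python
def critical_length_ratio(force_ratio):
    """Ratio Lc(f2) / Lc(f1) at fixed bending modulus, for f2 / f1 = force_ratio."""
    return force_ratio ** (-1.0 / 3.0)


if __name__ == "__main__":
    for k in (2.0, 8.0, 100.0):  # illustrative factors by which f is increased
        print(f"f increased {k:5.0f}-fold -> Lc multiplied by {critical_length_ratio(k):.3f}")
```

      An eight-fold change in the force density, for example, alters the critical length only by a factor of two, which is why a broad force distribution translates into a comparatively narrow distribution of Lc.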

      In order to clarify this, we edited the last paragraph of the Buckling Measurements section on page 5 of the manuscript:

      “…Within the characteristic range of observed velocities (1 − 3 µm/s), the median Lc depends only mildly on v0, as compared to its rather broad distribution, indicated by the bands in Figure 3 G. Thus a possible correlation between f and v0 would only mildly alter Lc. The natural length distribution (cf. Appendix 1—figure 1 ), however, is very broad, and we conclude that growth rather than velocity or force distributions most strongly impacts the buckling propensity of cyanobacterial colonies. Also, we hardly observed short and fast filaments of K. animale, which might be caused by physiological limitations (Burkholder, 1934 ).”

      Second, in the Profile analysis section, we did not report a correlation between force and velocity. As can be seen in Figure 4—figure Supplement 1, neither the active force nor the friction coefficient, as determined from the analysis of individual filaments, shows any significant correlation with the velocity. This is also written in the discussion (page 7):

      We see no significant correlation between L or v0 and f or η, but the observed values of f and η cover a wide range (Figure 4 B, C and Figure 4—figure Supplement 1 ).

      Note that this is indeed consistent with the logistic regression: Using v0 as a second regressor did not significantly reduce the width of the distribution of Lc as compared to the simple logistic regression, indicating that force and velocity are not strongly correlated.

      In order to clarify this in the manuscript, we modified that part (page 7):

      “…We see no significant correlation between L or v0 and f or η, but the observed values of f and η cover a wide range (Figure 4 B,C and Figure 4—figure Supplement 1). This is consistent with the logistic regression, where using v0 as a second regressor did not significantly reduce the width of the distribution of critical lengths or active forces. The two estimates of the friction coefficient, from logistic regression and individual profile fits, are measured in (predominantly) orthogonal directions: tangentially for the logistic regression where the free gliding velocity was used, and transversely for the evolution of the buckling profiles. Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic…”

      From a theoretical perspective, not many new results are presented. The authors repeat the well-known calculation for filaments buckling under propulsive load and arrive at the literature result of buckling when the dimensionless number (f L^3/B) is larger than 30.6 as previously derived by Sekimoto et al in 1995 [1] (see [2] for a clamped boundary condition and simulations). Other theoretical predictions for pushed semi-flexible filaments [1–4] are not discussed or compared with the experiments. Finally, the authors use molecular dynamics type simulations similar to [2–4] to reproduce the buckling dynamics from the experiments. Unfortunately, no systematic comparison is performed.

      [1] Ken Sekimoto, Naoki Mori, Katsuhisa Tawada, and Yoko Y Toyoshima. Symmetry breaking instabilities of an in vitro biological system. Physical Review Letters, 75(1):172, 1995.

      [2] Raghunath Chelakkot, Arvind Gopinath, Lakshminarayanan Mahadevan, and Michael F Hagan. Flagellar dynamics of a connected chain of active, polar, brownian particles. Journal of The Royal Society Interface, 11(92):20130884, 2014.

      [3] Rolf E Isele-Holder, Jens Elgeti, and Gerhard Gompper. Self-propelled worm-like filaments: spontaneous spiral formation, structure, and dynamics. Soft Matter, 11(36):7181–7190, 2015.

      [4] Rolf E Isele-Holder, Julia Jäger, Guglielmo Saggiorato, Jens Elgeti, and Gerhard Gompper. Dynamics of self-propelled filaments pushing a load. Soft Matter, 12(41):8495–8505, 2016.

      We thank the reviewer for pointing us to these publications, in particular the work by Sekimoto, of which we were not aware. We agree with the referee that the calculation is straightforward (basically known since Euler, up to modified boundary conditions). Our paper focuses on experimental work; the molecular dynamics simulations were included mainly as a consistency check and were not intended to generate the beautiful post-buckling patterns observed in references [2-4]. However, such shapes do emerge in filamentous cyanobacteria, and with the data provided in our manuscript, simulations can be quantitatively matched to our experiments, which will be covered by future work.

      We included the references in the revision of our manuscript, and a statement that we do not claim priority on these classical theoretical results.

      Introduction, page 2:

      “…Self-Buckling is an important instability for self-propelling rod-like micro-organisms to change the orientation of their motion, enabling aggregation or the escape from traps (Fily et al., 2020; Man and Kanso, 2019; Isele-Holder et al., 2015; Isele-Holder et al., 2016). The notion of self-buckling goes back to work of Leonhard Euler in 1780, who described elastic columns subject to gravity (Elishakoff, 2000). Here, the principle is adapted to the self-propelling, flexible filaments (Fily et al., 2020; Man and Kanso, 2019; Sekimoto et al., 1995) that glide onto an obstacle. Filaments buckle if they exceed a certain critical length Lc ∼ (B/f)^(1/3), where B is the bending modulus and f the propulsion force density…”

      Buckling theory, page 5:

      “…The buckling of gliding filaments differs in two aspects: the propulsion forces are oriented tangentially instead of vertically, and the front end is supported instead of clamped. Therefore, with L < Lc all initial orientations are indifferently stable, while for L > Lc, buckling induces curvature and a resultant torque on the head, leading to rotation (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). Buckling under concentrated tangential end-loads has also been investigated in literature (de Canio et al., 2017; Wolgemuth et al., 2005), but leads to substantially different shapes of buckled filaments. We use classical Kirchhoff theory for a uniform beam of length L and bending modulus B, subject to a force density b⃗ = −f t⃗ − η v⃗, with an effective active force density f along the tangent t⃗, and an effective friction proportional to the local velocity v⃗, analogous to existing literature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995)…”

      Further on page 6:

      “To derive the critical self-buckling length, Equation 5 can be linearized for two scenarios that lead to the same Lc: early-time small amplitude buckling and late-time stationary rotation at small and constant curvature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). […] Thus, in physical units, the critical length is given by Lc = (30.5722 B/f)^(1/3), which is reproduced in particle based simulations (Appendix Figure 2) analogous to those in Isele-Holder et al. (2015, 2016).”
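      As a quick numerical illustration of the critical-length relation quoted above, the following minimal sketch evaluates Lc = (30.5722 B/f)^(1/3). The function name and the input values for B and f are hypothetical placeholders chosen only to show the scaling; they are not measurements from the manuscript.

```python
PREFACTOR = 30.5722  # numerical prefactor from the quoted relation


def critical_length(bending_modulus, force_density):
    """Return Lc = (30.5722 * B / f) ** (1/3).

    Units must be consistent, e.g. B in N*m^2 and f in N/m give Lc in m.
    """
    if bending_modulus <= 0 or force_density <= 0:
        raise ValueError("B and f must both be positive")
    return (PREFACTOR * bending_modulus / force_density) ** (1.0 / 3.0)


# Hypothetical illustrative inputs (assumptions, not values from the manuscript).
B = 1.0e-16  # bending modulus, N*m^2
f = 1.0e-3   # propulsion force density, N/m

Lc = critical_length(B, f)
print(f"Lc = {Lc * 1e6:.1f} um")

# Because Lc scales as f ** (-1/3), an eightfold larger force density
# halves the critical length.
print(f"Lc at 8f = {critical_length(B, 8 * f) * 1e6:.1f} um")
```

      The weak cube-root dependence means that even a broad spread in f translates into a comparatively narrow spread in Lc.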

      Discussion, page 7 & 8:

      “…This, in turn, has dramatic consequences on the exploration behavior and the emerging patterns (Isele-Holder et al., 2015, 2016; Abbaspour et al., 2021; Duman et al., 2018; Prathyusha et al., 2018; Jung et al., 2020): (L/Lc)^3 is, up to a numerical prefactor, identical to the flexure number (Isele-Holder et al., 2015, 2016; Duman et al., 2018; Winkler et al., 2017), the ratio of the Peclet number and the persistence length of active polymer melts. Thus, the ample variety of non-equilibrium phases in such materials (Isele-Holder et al., 2015, 2016; Prathyusha et al., 2018; Abbaspour et al., 2021) may well have contributed to the evolutionary success of filamentous cyanobacteria.”

      Reviewer 3:

      Summary:

      This paper presents novel and innovative force measurements of the biophysics of gliding cyanobacteria filaments. These measurements allow for estimates of the resistive force between the cell and substrate and provide potential insight into the motility mechanism of these cells, which remains unknown.

      We thank the reviewer for the positive evaluation of our work. We have revised the manuscript according to their comments and detail our replies and modifications next to the individual points below.

      Strengths:

      The authors used well-designed microfabricated devices to measure the bending modulus of these cells and to determine the critical length at which the cells buckle. I especially appreciated the way the authors constructed an array of pillars and used it to do 3-point bending measurements and the arrangement the authors used to direct cells into a V-shaped corner in order to examine at what length the cells buckled at. By examining the gliding speed of the cells before buckling events, the authors were able to determine how strongly the buckling length depends on the gliding speed, which could be an indicator of how the force exerted by the cells depends on cell length; however, the authors did not comment on this directly.

      We thank the referee for the positive assessment of our work. Importantly, we do not see a significant correlation between buckling length and gliding speeds, and we also do not see a correlation with filament length, consistent with the assumption of a propulsion force density that is more or less homogeneously distributed along the filament. Note that each filament consists of many metabolically independent cells, which renders cyanobacterial gliding a collective effort of many cells, in contrast to gliding of, e.g., myxobacteria.

      In response also to the other referees’ comments, we modified the manuscript to reflect more on the absence of a strong correlation between velocity and force/critical length. We modified the Buckling measurements section on page 5 of the paper:

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an over-damped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed Lc ∼ (f/B)^(−1/3) and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting x = (L − Lc(v0))/∆Lc in Equation 1, with Lc(v0) = (η v0/(30.5722 B))^(−1/3), to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E, F show the buckling behavior…”
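      The substitution f = η v0 described in the quoted passage can be sketched as follows. The function names and example values are hypothetical, and Equation 1 itself is not reproduced in this response, so the sketch stops at computing the regressor x = (L − Lc(v0))/∆Lc rather than evaluating the regression.

```python
PREFACTOR = 30.5722  # numerical prefactor from the quoted relation


def critical_length_from_velocity(eta, v0, bending_modulus):
    """Lc(v0) = (eta * v0 / (30.5722 * B)) ** (-1/3), using f = eta * v0."""
    if eta <= 0 or v0 <= 0 or bending_modulus <= 0:
        raise ValueError("eta, v0 and B must all be positive")
    return (eta * v0 / (PREFACTOR * bending_modulus)) ** (-1.0 / 3.0)


def regressor(length, eta, v0, bending_modulus, delta_lc):
    """x = (L - Lc(v0)) / delta_Lc: signed distance from the buckling threshold."""
    if delta_lc <= 0:
        raise ValueError("delta_lc must be positive")
    lc = critical_length_from_velocity(eta, v0, bending_modulus)
    return (length - lc) / delta_lc


# Hypothetical illustrative inputs in SI units (assumptions, not fitted values).
eta = 500.0        # friction coefficient, N*s/m^2
v0 = 2.0e-6        # free gliding velocity, m/s
B = 1.0e-16        # bending modulus, N*m^2
delta_lc = 3.0e-5  # width of the transition, m

lc = critical_length_from_velocity(eta, v0, B)
print(f"Lc(v0) = {lc * 1e6:.1f} um")
for length in (1.0e-4, 2.0e-4):
    x = regressor(length, eta, v0, B, delta_lc)
    print(f"L = {length * 1e6:.0f} um -> x = {x:+.2f}")
```

      Positive x corresponds to filaments longer than the velocity-dependent critical length, negative x to shorter ones.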

      Further, we edited the last paragraph of the Buckling measurements section on page 5 of the manuscript:

      “Within the characteristic range of observed velocities (1 − 3 µm/s), the median Lc depends only mildly on v0, as compared to its rather broad distribution, indicated by the bands in Figure 3 G. Thus a possible correlation between f and v0 would only mildly alter Lc. The natural length distribution (cf. Appendix 1—figure 1 ), however, is very broad, and we conclude that growth rather than velocity or force distributions most strongly impacts the buckling propensity of cyanobacterial colonies. Also, we hardly observed short and fast filaments of K. animale, which might be caused by physiological limitations (Burkholder, 1934 ).”

      We also rephrased the corresponding discussion paragraph on page 7:

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      Weaknesses:

      There were two minor weaknesses in the paper.

      First, the authors investigate the buckling of these gliding cells using an Euler beam model. A similar mathematical analysis was used to estimate the bending modulus and gliding force for Myxobacteria (C.W. Wolgemuth, Biophys. J. 89: 945-950 (2005)). A similar mathematical model was also examined in G. De Canio, E. Lauga, and R.E Goldstein, J. Roy. Soc. Interface, 14: 20170491 (2017). The authors should have cited these previous works and pointed out any differences between what they did and what was done before.

      We thank the reviewer for pointing us to these references. The paper by Wolgemuth is theoretical work, describing A-motility in myxobacteria by a concentrated propulsion force at the rear end of the bacterium, possibly stemming from slime extrusion. This model was a little later refuted by [A3], who demonstrated that focal adhesion along the bacterial body and thus a distributed force powers A-motility, a mechanism that has by now been investigated in great detail (see [A10]). The paper by De Canio et al. contains a thorough theoretical analysis of a filament that is clamped at one end and subject to a concentrated tangential load on the other. Since both models comprise a concentrated end-load rather than a distributed propulsion force density, they describe a substantially different motility mechanism, leading also to substantially different buckling profiles. Consequently, these models cannot be applied to cyanobacterial gliding.

      We included both citations in the revision and pointed out the differences to our work in the introduction (page 2):

      “…A few species appear to employ a type-IV-pilus related mechanism (Khayatan et al., 2015; Wilde and Mullineaux, 2015), similar to the better-studied myxobacteria (Godwin et al., 1989; Mignot et al., 2007; Nan et al., 2014; Copenhagen et al., 2021), which are short, rod-shaped single cells that exhibit two types of motility: S (social) motility based on pilus extension and retraction, and A (adventurous) motility based on focal adhesion (Chen and Nan, 2022) for which also slime extrusion at the trailing cell pole was earlier postulated as mechanism (Wolgemuth et al., 2005). Yet, most gliding filamentous cyanobacteria do not exhibit pili and their gliding mechanism appears to be distinct from myxobacteria (Khayatan et al., 2015).”

      And in Buckling theory, page 5:

      “….The buckling of gliding filaments differs in two aspects: the propulsion forces are oriented tangentially instead of vertically, and the front end is supported instead of clamped. Therefore, with L < Lc all initial orientations are indifferently stable, while for L > Lc, buckling induces curvature and a resultant torque on the head, leading to rotation (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995 ). Buckling under concentrated tangential end-loads has also been investigated in literature (de Canio et al., 2017; Wolgemuth et al., 2005 ), but leads to substantially different shapes of buckled filaments.”

      The second weakness is that the authors claim that their results favor a focal adhesion-based mechanism for cyanobacterial gliding motility. This is based on their result that friction and adhesion forces correlate strongly. They then conjecture that this is due to more intimate contact with the surface, with more contacts producing more force and pulling the filaments closer to the substrate, which produces more friction. They then claim that a slime-extrusion mechanism would necessarily involve more force and lower friction. Is it necessarily true that this latter statement is correct? (I admit that it could be, but is it a requirement?)

      We thank the referee for raising this interesting question. Our claim regarding slime extrusion is based on three facts: i. Mutants deficient in slime extrusion do not glide, but start gliding as soon as slime is provided externally [A4]. ii. A positive correlation between speed and slime layer thickness was observed in Nostoc [A11]. iii. The fluid mechanics of lubricated sliding contacts is very well understood and predicts a decreasing resistance with increasing layer thickness.

      We included these considerations in the revision of our manuscript (page 8):

      “…it indicates that friction and propulsion forces, despite being quite variable, correlate strongly. Thus, generating more force comes, inevitably, at the expense of added friction. For lubricated contacts, the friction coefficient is inversely proportional to the thickness of the lubricating layer (Snoeijer et al., 2013), and we conjecture active force and drag both increase due to a more intimate contact with the substrate. This supports mechanisms like focal adhesion (Mignot et al., 2007) or a modified type-IV pilus (Khayatan et al., 2015), which generate forces through contact with extracellular surfaces, as the underlying mechanism of the gliding apparatus of filamentous cyanobacteria: more contacts generate more force, but also closer contact with the substrate, thereby increasing friction to the same extent. Force generation by slime extrusion (Hoiczyk and Baumeister, 1998), in contrast, would lead to the opposite behavior: More slime generates more propulsion, but also reduces friction. Besides fundamental fluid-mechanical considerations (Snoeijer et al., 2013), this is rationalized by two experimental observations: i. gliding velocity correlates positively with slime layer thickness (Dhahri et al., 2013) and ii. motility in slime-secretion deficient mutants is restored upon exogenous addition of polysaccharide slime. Still we emphasize that many other possibilities exist. One could, for instance, postulate a regulation of the generated forces to the experienced friction, to maintain some preferred or saturated velocity.”

      Related to this, the authors use a model with isotropic friction. They claim that this is justified because they are able to fit the cell shapes well with this assumption. How would assuming a non-isotropic drag coefficient affect the shapes? It may be that it does equally well, in which case, the quality of the fits would not be informative about whether or not the drag was isotropic or not.

      The referee raises another very interesting point. Given the typical variability and uncertainty in experimental measurements (cf. the errors in Figure 4 A), a model with a slightly anisotropic friction could be fitted to the observed buckling profiles as well, without a significant increase of the mismatch. Yet, strongly anisotropic friction would not be consistent with our observations.

      Importantly, however, we did not conclude on isotropic friction based on the fit quality, but based on a comparison between free gliding and early buckling (Figure 4 D). In early buckling, the dominant motion is in the transverse direction, while longitudinal motion is insignificant for geometric reasons. Thus, independent of the underlying model, mostly the transverse friction coefficient is inferred. In contrast, free gliding is a purely longitudinal motion, and thus only the friction coefficient for longitudinal motion can be inferred. These two friction coefficients are compared in Figure 4 D. Still, the scatter of that data would allow fitting a certain anisotropy within the error margins. What we can exclude based on our observations is the case of strongly anisotropic friction. If there is neither an ab-initio reason for anisotropy nor a measurement that indicates it, we prefer to stick with the simplest assumption. We carefully chose our wording in the Discussion as “mainly isotropic” rather than “isotropic” or “fully isotropic”.

      We added a small statement to the Discussion on page 7 & 8:

      “... Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces ...”

      Recommendations for the authors

      The discussion regarding how the findings of this paper imply that cyanobacteria filaments are propelled by adhesion forces rather than slime extrusion should be improved, as this conclusion seems questionable. There appears to be an inconsistency with a buckling force said to be only weakly dependent on the gliding velocity, while its ratio with the velocity correlates with a friction coefficient. Finally, data and source code should be made publicly available.

      In the revised version, we have modified the discussion of the force-generating mechanism according to the reviewers' suggestions. The perception of inconsistency in the velocity dependence of the buckling force was based on a misunderstanding, as we detailed in our reply to the referee. We revised the corresponding section to make it clearer. Data and source code have been uploaded to a public data repository.

      Reviewer #2 (recommendations for the authors)

      Despite eLife policy, the authors do not provide a Data Availability Statement. For the presented manuscript, data and source code should be provided “via trusted institutional or third-party repositories that adhere to policies that make data discoverable, accessible and usable.” https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-data-sharing-policies

      Most of the issues in this reviewer’s public review should be easy to correct, so I would strongly support the authors to provide an amended manuscript.

      We added the Data Availability Statement in the amended manuscript.

      References

      [A1] E. Hoiczyk and W. Baumeister. “The junctional pore complex, a prokaryotic secretion organelle, is the molecular motor underlying gliding motility in cyanobacteria”. In: Curr. Biol. 8.21 (1998), pp. 1161–1168. doi: 10.1016/s0960-9822(07)00487-3.

      [A2] N. Read, S. Connell, and D. G. Adams. “Nanoscale Visualization of a Fibrillar Array in the Cell Wall of Filamentous Cyanobacteria and Its Implications for Gliding Motility”. In: J. Bacteriol. 189.20 (2007), pp. 7361–7366. doi: 10.1128/jb.00706-07.

      [A3] T. Mignot, J. W. Shaevitz, P. L. Hartzell, and D. R. Zusman. “Evidence That Focal Adhesion Complexes Power Bacterial Gliding Motility”. In: Science 315.5813 (2007), pp. 853–856. doi: 10.1126/science.1137223.

      [A4] Behzad Khayatan, John C. Meeks, and Douglas D. Risser. “Evidence that a modified type IV pilus-like system powers gliding motility and polysaccharide secretion in filamentous cyanobacteria”. In: Mol. Microbiol. 98.6 (2015), pp. 1021–1036. doi: 10.1111/mmi.13205.

      [A5] Tilo Pompe, Martin Kaufmann, Maria Kasimir, Stephanie Johne, Stefan Glorius, Lars Renner, Manfred Bobeth, Wolfgang Pompe, and Carsten Werner. “Friction-controlled traction force in cell adhesion”. In: Biophysical journal 101.8 (2011), pp. 1863–1870.

      [A6] Hirofumi Wada, Daisuke Nakane, and Hsuan-Yi Chen. “Bidirectional bacterial gliding motility powered by the collective transport of cell surface proteins”. In: Physical Review Letters 111.24 (2013), p. 248102.

      [A7] Joël Tchoufag, Pushpita Ghosh, Connor B Pogue, Beiyan Nan, and Kranthi K Mandadapu. “Mechanisms for bacterial gliding motility on soft substrates”. In: Proceedings of the National Academy of Sciences 116.50 (2019), pp. 25087–25096.

      [A8] Chenyi Fei, Sheng Mao, Jing Yan, Ricard Alert, Howard A Stone, Bonnie L Bassler, Ned S Wingreen, and Andrej Kosmrlj. “Nonuniform growth and surface friction determine bacterial biofilm morphology on soft substrates”. In: Proceedings of the National Academy of Sciences 117.14 (2020), pp. 7622–7632.

      [A9] Arja Ray, Oscar Lee, Zaw Win, Rachel M Edwards, Patrick W Alford, Deok-Ho Kim, and Paolo P Provenzano. “Anisotropic forces from spatially constrained focal adhesions mediate contact guidance directed cell migration”. In: Nature communications 8.1 (2017), p. 14923.

      [A10] Jing Chen and Beiyan Nan. “Flagellar motor transformed: biophysical perspectives of the Myxococcus xanthus gliding mechanism”. In: Frontiers in Microbiology 13 (2022), p. 891694.

      [A11] Samia Dhahri, Michel Ramonda, and Christian Marliere. “In-situ determination of the mechanical properties of gliding or non-motile bacteria by atomic force microscopy under physiological conditions without immobilization”. In: PLoS One 8.4 (2013), e61663.

    1. Author response:

      We extend our gratitude to the two reviewers and the editors at eLife for their meticulous examination of our manuscript, as well as for their valuable feedback and positive assessment. We are particularly pleased to observe in both the reviews and the editorial evaluation the recognition of the importance of our findings. Through this provisional response, we wish to convey to the editors, reviewers, and the readership of eLife our intention to enhance the paper by incorporating a detailed description of the sections pertaining to MAD analysis, data interpretation with combined HS-AFM and PCA methods, and specific portions of the discussions. This will involve editing the manuscript accordingly and providing separate explanations in the “author response”. We acknowledge that such additions will strengthen the comprehensiveness of our work and render it more self-contained.

      Moreover, in alignment with the recommendations from the review team, we will provide a thorough discussion of published data and offer a clearer explanation of our utilized methods, thereby providing a more robust foundation for our conclusions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides valuable insights into how chromatin-bound PfMORC controls gene expression in the asexual blood stage of Plasmodium falciparum. By interacting with key nuclear proteins, PfMORC appears to affect expression of genes important for host invasion and subtelomeric var genes. Correlating transcriptomic data with in vivo chromatin insights, the study provides solid evidence for the central role of PfMORC in epigenetic transcriptional regulation through modulation of chromatin compaction.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study provides valuable insights into the role of PfMORC in Plasmodium's epigenetic regulation, backed by a comprehensive methodological approach. The overarching goal was to understand the role of PfMORC in epigenetic regulation during asexual blood stage development, particularly its interactions with ApiAP2 TFs and its potential involvement in the regulation of genes vital for Plasmodium virulence. To achieve this, they conducted various analyses. These include a proteomic analysis to identify nuclear proteins interacting with PfMORC, a study to determine the genome-wide localization of PfMORC at multiple developmental stages, and a transcriptomic analysis in PfMORCHA-glmS knockdown parasites. Taken together, this study suggests that PfMORC is involved in chromatin assemblies that contribute to the epigenetic modulation of transcription during the asexual blood stage development.

      Strengths:

      The study employed a multi-faceted approach, combining proteomic, genomic, and transcriptomic analyses, providing a holistic view of PfMORC's role. The proteomic analysis successfully identified several nuclear proteins that may interact with PfMORC. The genome-wide localization offered valuable insights into PfMORC's function, especially its predominant recruitment to subtelomeric regions. The results align with previous findings on PfMORC's interaction with ApiAP2 TFs. Notably, the authors meticulously contextualized their findings with prior research, including pre-prints, adding credibility to their work.

      Weaknesses:

      While the study identifies potential interacting partners and loci of binding, direct functional outcomes of these interactions remain an inference. The authors heavily rely on past research for some of their claims. While it strengthens some assertions, it might indicate a lack of direct evidence in the current study for particular aspects. The declaration that PfMORC may serve as an attractive drug target is substantial. While the data suggests its involvement in essential processes, further studies are required to validate its feasibility as a drug target.

      Reviewer #2 (Public Review):

      Summary:

      This paper, entitled "Plasmodium falciparum MORC protein modulates gene expression through interaction with heterochromatin", describes the role of PfMORC during the intra-erythrocytic cycle of Plasmodium falciparum. Garcia et al. investigated the PfMORC-interacting proteins and PfMORC genomic distribution in trophozoites and schizonts. They also examined the transcriptome of the parasites after partial knockdown of the transcript.

      Strengths:

      This study is a significant advance in the knowledge of the role of PfMORC in heterochromatin assembly. It provides an in-depth analysis of the PfMORC genomic localization and its correlation with other chromatin marks and ApiAP2 transcription factor binding.

      Weaknesses:

      However, most of the conclusions are based on the function of interacting proteins and the genomic localization of the protein. The authors did not investigate the direct effects of PfMORC depletion on heterochromatin marks. Furthermore, the results of the transcriptomic analysis are puzzling as 50% of the transcripts are downregulated, a phenotype not expected for a heterochromatin marker.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      • Figure 1A and Table 1: the authors should incorporate a volcano plot in their proteomic results presentation. This graphical representation can provide a more intuitive grasp of the most relevant proteins associated with PfMORC in terms of both their abundance and significance. It will aid in swiftly pinpointing proteins with the most notable differential associations. This will complement the comprehensive overview provided by the authors, referencing past research where PfMORC was detailed.

      We thank the reviewer for the suggestion. We agree that the volcano plot we now provide conveys comprehensive information on associations between PfMORC and other cellular proteins. The volcano plot, presented in the revised manuscript as Figure 1A, was generated using the normalized MS/MS counts from the anti-GFP and 3D7 (control) proteomics datasets (n=3). The potential PfMORC interacting proteins were determined using the fold changes and p-values between the two datasets, as provided in Table 1.

      Several protein interactors were strongly supported by statistical analysis (p-value), while others showed weaker p-values due to variability between replicates. Indeed, the total number of proteins identified in the three replicates, shown in the Venn diagram (Supplemental Figure 1D), exhibits a good overlap between the replicates but a lower number of identified proteins in the GFP-E1 sample. This variability was also reflected in the statistical analysis: when analyzing the GFP/3D7 ratios, some proteins have a substantial difference in abundance (fold change greater than 1.5x) in one of the groups but do not meet the statistical threshold. For more clarity, we have included the -log p-value for the proteins listed in Table 1.

      Overall, these results demonstrate that many ApiAP2 proteins and several chromatin-associated factors interact with PfMORC.

      • Given the plethora of proteins detected in the PfMORC eluate, it raises the question of how many are genuine MORC interactors versus those that are merely nearby molecules acting adjacently. These might incidentally end up in the immunoprecipitate due to unintended interactions with DNA or chromatin. While the M&M section mentions that the beads were thoroughly washed, there is no specification about the washing buffer or its stringency (i.e., salinity level). At higher salinities, one could isolate core complexes of interactors associated with DNA or even RNA carryover.

      We apologize for this omission and have now added the buffer composition used to wash the beads. This section now reads "To perform the co-immunoprecipitation we followed the manufacturer protocol (ChromoTek, gta-20). Samples were lysed in modified RIPA buffer (50 mM Tris, pH 7.5, 150 mM NaCl, 0.5% sodium deoxycholate, 1% Nonidet P-40, 10 µg/ml aprotinin, 10 µg/ml leupeptin, 10 µg/ml, 1 mM phenylmethylsulfonyl fluoride, benzamidine) for 30 min on ice. The lysate was precleared with 50 µl of protein A/G-Agarose beads at 4°C for 1 h and clarified by centrifugation at 10,000 × g for 10 min. The precleared lysate was incubated overnight with an anti-GFP antibody using anti-GFP-Trap-A beads (ChromoTek, gta-20). The magnetic beads were then pelleted using a magnet (Invitrogen) and washed 3 times with wash buffer (10 mM Tris/Cl pH 7.5, 150 mM NaCl, 0.05 % Nonidet™ P40 Substitute, 0.5 mM EDTA)."

      We used the same salt concentration for immunoprecipitation as was used in the lysis buffer to minimize the binding of non-specific proteins. The wash buffer composition is updated in the revised manuscript. The immunoprecipitations were done in biological triplicates to ensure reproducibility and statistical support. A number of proteins are common across all three replicates. We also used wild-type parasites (non-GFP) as a negative control to eliminate non-specific hits, and we used a log2-fold change ≥1.5 relative to wild type parasites as our cutoff between the comparison groups.
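      The selection logic described above (a log2 fold-change cutoff of 1.5 relative to the wild-type control, combined with a p-value) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the pseudocount, the 0.05 significance level, and the example counts are hypothetical and are not taken from the manuscript, and the p-values are treated as given inputs rather than recomputed.

```python
import math
from statistics import mean

LFC_CUTOFF = 1.5  # log2 fold-change cutoff stated in the response
P_CUTOFF = 0.05   # assumed significance level (not stated in the response)


def log2_fold_change(bait_counts, control_counts, pseudocount=1.0):
    """log2 ratio of mean normalized counts, bait over control.

    The pseudocount is an assumption that avoids division by zero for
    proteins absent from the control.
    """
    bait = mean(bait_counts) + pseudocount
    control = mean(control_counts) + pseudocount
    return math.log2(bait / control)


def volcano_point(bait_counts, control_counts, p_value):
    """Return volcano-plot coordinates and cutoff flags for one protein."""
    lfc = log2_fold_change(bait_counts, control_counts)
    enriched = lfc >= LFC_CUTOFF
    return {
        "log2_fc": lfc,
        "neg_log10_p": -math.log10(p_value),
        "enriched": enriched,
        "significant": enriched and p_value <= P_CUTOFF,
    }


# Hypothetical triplicate counts and p-values, for illustration only.
strong = volcano_point([30, 28, 35], [2, 3, 1], p_value=0.004)
variable = volcano_point([40, 2, 45], [2, 3, 1], p_value=0.20)
print(strong)
print(variable)
```

      The second example mirrors the situation described in the response: a protein can pass the fold-change cutoff while failing the statistical threshold because one replicate deviates strongly from the others.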

      We believe that these conditions provide the stringency required to identify high confidence PfMORC interacting proteins, although this still leaves a possibility for additional lower affinity interactions. Future studies will certainly follow up candidate interaction partners to better define this complex. However, the complexity of the complex resembles that reported previously in Toxoplasma gondii (Farhat et al. 2020, Nat Microbiol) as well as another report on the PfMORC complexes: https://elifesciences.org/reviewed-preprints/92499

      • The authors demonstrate that PfMORC creates distinct peaks in and around HP1-bound areas (Figure 2F), hinting at a specific role for PfMORC in heterochromatin compaction, boundary definition, and gene silencing. This pattern is clearly depicted in an example in Figure 2F. It would be beneficial to know if this enrichment profile is replicated elsewhere and, if so, it would be worthwhile to quantify it.

      This is an excellent point. Yes, this pattern is seen across the entire genome, where PfMORC is apposed to PfHP1-bound heterochromatic regions. As indicated in the manuscript, we have quantified this effect genome-wide; however, since we already display compiled data for Chromosome 2 (at both chromosome ends) pertaining to the position of PfMORC relative to PfHP1 we do not feel it is essential to provide such a figure for the entire genome as it does not alter the central message of our manuscript. Figure 2F is representative of the genome-wide distribution of PfMORC relative to PfHP1. The raw genome-wide data are available in Supplementary Information for further inspection of specific loci on other chromosomes.

      Recommendations for improving the writing and presentation.

      MAIN TEXT

      Panel e, referenced both in the main text and legend, is missing from Figure 4. This missing panel represents a significant finding of the study, highlighting according to the authors a low correlation between ChIP-seq gene targets and RNA-seq DEGs. This observation implies that PfMORC's global occupancy is more aligned with shaping chromatin architecture than directly regulating specific gene targets. In light of this, the authors should rephrase parts of their manuscript (including abstract and title) to avoid suggesting that PfMORC acts primarily (directly) as a gene regulator, emphasizing instead its role in influencing the topological structure of chromosomes.

      We have modified the title as suggested by the reviewer to more accurately reflect that PfMORC modulates chromatin architecture rather than acting as a direct regulator of specific genes. Our new title is: A Plasmodium falciparum MORC protein complex modulates epigenetic control of gene expression through interaction with heterochromatin

      We apologize for the omission of Figure 4e, which is now included in the revised manuscript. We found PfMORC occupancy on all chromosomes at subtelomeric regions, which are known to harbor genes related to immune evasion and antigenic variation (including most of the var genes). This study is also in agreement with Bryant et al. (PMID 32816370) which reported PfMORC occupancy along with PfISW1 at var gene promoters. PfMORC has also been identified in complexes with various ApiAP2 proteins in a proteome-wide study (Hillier et al. Cell Rep, PMID 31390575), as well as in immunoprecipitations of PfAP2-G2 (Singh et al., Mol Micro, PMID 33368818) and PfAP2-P (Subudhi et al., Nat Microbiol, PMID 37884813). The recent study by Subudhi et al. reports that PfAP2-P is involved in the regulation of var gene expression, antigenic variation, trophozoite development and parasite egress. It is therefore possible that PfMORC may have different effects on transcriptional regulation through interactions with different ApiAP2 transcription factors. Our comparison of PfMORC with known ApiAP2 protein occupancy reveals a high level of overlap, indicating that PfMORC may affect gene expression in various ways throughout the asexual cycle. Additionally, Hillier et al. show that PfMORC interaction is not limited to ApiAP2 but also implicates several other chromatin remodellers, which is consistent with our own results. We do not imply direct regulation of transcription via PfMORC in our manuscript. To the contrary, we suggest that it interacts with heterochromatin and thereby plays a role in the epigenetic control of asexual blood stage transcriptional regulation which is also clarified in the revised abstract.

      Another limitation of differential gene expression was use of the glmS ribozyme system, which resulted in only 50% depletion of the PfMORC transcript. There may still be enough PfMORC to rescue the gene expression we could not detect correctly. Therefore, it is challenging to interpret the function of PfMORC in only chromatin architecture but not in gene expression.

      If we believe that PfMORC in Plasmodium isn't mainly adjusting gene expression, the authors' suggestion that MORC is targeted by some AP2s becomes puzzling. How do we make sense of these different ideas? The authors need to clarify this to maintain consistency in their findings.

      Based on our data, we hypothesize that PfMORC acts as an accessory protein for ApiAP2 transcription factors. In a number of studies, including ours and the concurrent publication in eLife (https://elifesciences.org/reviewed-preprints/92499), PfMORC co-IPed with several ApiAP2 proteins, suggesting that it has multiple functions. In our previous study we showed that PfMORC expression is highest in mid and late asexual stages. A comparison of PfMORC occupancy with that of six ApiAP2 proteins (each with a different expression profile) suggests plasticity in PfMORC function. We have revised our discussion to make this hypothesis more transparent for the readers.

      The authors should cite Farhat et al. 2020 (Extended Data Fig. 1a), as it similarly identified 3 different ELM2-containing proteins in Toxoplasma MORC-associated complexes. This previous work provides context and supports the observations made with PfMORC in this study.

      Thank you for the suggestion and pointing out this omission. We have indeed cited the work of the Farhat group in the original manuscript and have now included this additional reference to corroborate the text and provide further support to our conclusions.

      Minor corrections to the text and figures.

      • Panel e is missing from Figure 4.

      As mentioned above, panel e is now included in Figure 4.

      • The captions are very minimally detailed. An effort must be made to better describe the panels as well as which statistical tests were used. As it stands, this is not really up to standard.

      We have elaborated the captions with more detailed descriptions, and we now provide additional information where further clarification was necessary.

      Reviewer #2 (Recommendations For The Authors):

      • The study lacks a direct correlation between the inferred function of PfMORC and the heterochromatin state of the genome after its depletion. It would be interesting to perform chip-seq on known heterochromatin markers such as H3K9me3, HP1 or H3K36me2/3 to measure the consequences of PfMORC depletion on global heterochromatin and its boundaries.

      While the proposed experiments are certainly interesting, they are beyond the scope of this study. The current manuscript is focused on PfMORC occupancy, its interacting partners, and its impact on differential gene regulation after PfMORC depletion in asexual parasites. Nonetheless, we did in fact compare PfMORC occupancy with that of various heterochromatin markers (H2A.Z, H3K9ac, H3K4me3, H3K27ac, H3K18ac, H3K9me3, H3K36me2/3, H4K20me3, and H3K4me1) at the 30hpi and 40hpi time points. These data are presented in Supplemental Figure 9. We did not find any significant colocalization, but documented the presence of PfMORC in H3K36me2-depleted regions.

      • The PfMORC depletion was performed using a glms-based genetic system and the reviewer did not find any quantification of the depletion level at 24h or 36h. This is particularly important as the authors present RNA-seq data at these time points.

      We would like to clarify that RNA-seq was performed on 32hpi parasites after approximately 48 h treatment with 2.5 mM GlcN. At the trophozoite and schizont stage, PfMORC expression is high, which is why we selected these time points for RNA-seq (32hpi) and ChIP-seq (30hpi and 40hpi). PfMORC protein expression after GlcN treatment is analyzed in our previous paper (Singh et al., Sci Rep, PMID 33479315), where treatment with 2.5 mM GlcN leads to 50% reduction in PfMORC transcript at 32hpi. This is referenced in the Results section; we decided not to repeat the same experiment in the current manuscript.

      • The authors performed a thorough analysis of the correlations between ApiAP2 binding, histone modification and genomic localization of PfMORC (their chip-seq data). However, they found an inverse relationship between H3K36me2, a known histone repressive mark, and PfMORC genomic localization. This is particularly surprising when PfMORC itself is presented as a heterochromatin marker. The wording of this data is confusing in the results section (lines 257-258) and never discussed further. This important data should at least be discussed to make sense of this apparent contradiction.

      H3K36me2 indeed acts as a global repressive mark in P. falciparum. However, our hypothesis implies that PfMORC not only overlaps with H3K36me2-depleted regions, but also interacts with other epigenetic regulators. Therefore, we propose that PfMORC is part of chromatin remodeling complexes involved in heterochromatin dynamics. Moreover, we did not see any overlap with several other heterochromatin markers, suggesting that PfMORC has a unique binding preference not shared with them. Based on this study and parallel work submitted by Chahine et al. (https://elifesciences.org/reviewed-preprints/92499#abstract), it is evident that PfMORC is crucial for gene regulation and chromatin structure maintenance, as shown in other organisms. Currently, we do not know what the apparent mutual exclusion between H3K36me2 and PfMORC implies mechanistically or how PfMORC interaction with heterochromatin aids in chromatin integrity. In Arabidopsis thaliana, MORC binding leads to chromatin compaction and reduces DNA accessibility to transcription factors, thereby repressing gene expression. In P. falciparum, overlap in the binding region of PfMORC with different transcription factors suggests several possibilities that require further investigation. Since there is only one gene encoding a PfMORC protein in P. falciparum, it is possible that PfMORC function is not limited to chromatin integrity, but that it may also function to modulate gene expression at different stages. Fully exploring the function of PfMORC will require investigating the functional role of the other interacting partners we and others have identified.

      We have modified the result section per the reviewer's suggestion, and we now also discuss this finding in more detail in the discussion section.

      • The ChIP-seq data are central to this manuscript. However, the presentation of this data in Figure 2A suggests that it is very noisy (particularly for Chr1). It would be of interest to present the called peaks together with the normalized data so that the reader can assess the quality of the ChIP-seq data.

      Our results clearly demonstrate the enrichment of PfMORC in sub-telomeric regions and internal heterochromatic islands. These results are consistent across all of our replicates taken at two independent time points of parasite asexual blood stage development and correlate well with the results of Le Roch: https://elifesciences.org/reviewed-preprints/92499. The raw data files have been provided and can be re-analyzed by any user.

      • The RNA-seq data showed that only a few genes are affected after 24 h of PfMORC depletion. Furthermore, there is an equal number of up- and down-regulated genes. It is not clear why depletion of a heterochromatin marker would induce down-regulation of genes. How these data relate to the partial depletion of PfMORC is not discussed.

      We would like to clarify that the RNA-seq experiment was performed at 32hpi following GlcN-induced knockdown, as previously described (Singh et al., Sci Rep, PMID 33479315). Briefly, synchronous, early trophozoite-stage (24hpi) PfMORCglmS-HA parasites were treated with 2.5 mM GlcN until they reached the trophozoite stage (32 hpi) in the next cycle. These parasites were then collected for analysis by RNA-seq. We did not detect a substantial log-fold change at this point because only 50% of the transcripts were depleted in the glmS-based PfMORC knockdown system. However, we observed a distinctive pattern of up-regulated (60) and down-regulated (103) DEGs consisting of egress-related genes or surface antigens. We believe that PfMORC interacts with different ApiAP2 proteins, as shown in Figure 3A, and consequently exhibits multiple functions. This finding has now been corroborated in several other recent studies (see response to Reviewer 1 above).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I will summarize my comments and suggestions below.

      (1) Abstract:

      "Non-catalytic (pseudo)kinase signaling mechanisms have been described in metazoans, but information is scarce for plants." To the best of my understanding EFR is an active protein kinase in vitro and in vivo and cannot be considered a pseudokinase. Consider rephrasing.

      We rephrased to: “Non-catalytic signaling mechanisms of protein kinase domains have been described in metazoans, but information is scarce for plants.”

      (2) Page 4: It should be noted, that while membrane associated Rap-RiD systems have been used in planta to activate receptor kinase intracellular domains by promoting interaction with a co-receptor kinase domain, this system does not resemble the actual activation mechanism in the plasma membrane. This would be worth discussing when introducing the system. For example, the first substrates of the RK signaling complex may also be membrane associated and not freely diffuse in solution, which may be important for enzyme-substrate interaction.

      We inserted on page 4: “The RiD system was previously applied in planta, maintaining membrane-association by N-terminal myristoylation (Kim et al., 2021). For the in vitro experiments, the myristoylation sites were excluded to facilitate the production of recombinant protein.”

      (3) Page 4 and Fig 1: The catalytic Asp in BRI1 is D1027 and not D1009 (https://pubmed.ncbi.nlm.nih.gov/21289069/). Please check and prepare the correct mutant protein if needed.

      We clarified this in the text by stating that we mutated the HRD-aspartate to asparagine in all our catalytic-dead mutants: “Kinase-dead variants with the catalytic residue (HRD-aspartate) replaced by asparagine (EFRD849N and BRI1D1009N), had distinct effects […]”. D1027 in BRI1 is the DFG-Asp, which was not mutated in our study.

      (4) Page 4 and Fig 1: Is BIK1 a known component of the BR signaling pathway and a direct BRI1 substrate? Or in other words how specific is the trans-phosphorylation assay? In my opinion, a more suitable substrate for BRI1/BAK1 would be BSK1 or BSK3 (for example https://pubmed.ncbi.nlm.nih.gov/30615605/).

      Kinase-dead BIK1 is a reported substrate of BRI1. We clarified this in the results section by inserting: “BIK1 was chosen as it is reported substrate of both, EFR/BAK1 and BRI1/BAK1 complexes (Lin et al., 2013).”

      (5) Fig. 1B Why is BIK1 D202N partially phosphorylated in the absence of Rap? I would suggest to add control lanes showing BRI1, EFR, FLS2, BAK1 and BIK1 in isolation. Given that a nice in vitro activation system with purified components is available, why not compare the different enzyme kinetics rather than band intensities at only 1 enzyme : substrate ratio?

      BIK1 D202N is partially phosphorylated due to the presence of active BAK1, which is capable of trans-phosphorylating BIK1 D202N, as reported in a previous study (DOI: 10.1038/s41586-018-0471-x).

      (6) Page 4 and Fig 1: Is the kinase dead variant of EFR indeed kinase dead? I could still see a decent autorad signal for this mutant when expressed in E. coli (Fig 1 A in Bender et al., 2021; https://pubmed.ncbi.nlm.nih.gov/34531323/)? If this mutant is not completely inactive, could this change the interpretation of the experiments performed with the mutant protein in vitro and in planta in the current manuscript? In my opinion, it could be possible that a partially active EFR mutant can be further activated by BAK1, and in turn can phosphorylate BIK1 D202N. The differences in autorad signal for BRI1D1009?N and EFRD849N is very small, and the entire mechanism hinges on this difference.

      We would like to emphasize that the mechanism hinges on the difference between non-dimerized and dimerized kinase domains in the in vitro kinase assay. BRI1 D1009N fails to enhance BIK1 D202N trans-phosphorylation compared to the non-dimerized sample, while EFR D849N is still capable of enhancing BIK1 transphosphorylation upon dimerization as indicated by quantification of autorads (Figure 1B/C). We have also addressed this point in a section on the limitations of our study.

      (7) Fig 1B. "Our findings therefore support the hypothesis that EFR increases BIK1 phosphorylation by allosterically activating the BAK1 kinase domain." To the best of my understanding presence of wild-type EFR in the EFR-BAK1 signaling complex leads to much better phosphorylation of BIK1D202N when compared to the EFRD849N mutant. How does that support the allosteric mechanism? By assuming that the D849N mutant is in an inactive conformation and fully catalytically inactive (see above)? Again, I think the data could also be interpreted in such a way that the small difference in autorad signal for BIK1 between BRI1 inactive (but see above) and EFR inactive are due to EFR not being completely kinase dead (see above), rather than EFR being an allosteric regulator. To clarify this point I would suggest to (1) perform quantitative auto- and trans-(generic substrate) phosphorylation assays with wt and D849N EFR to derive enzyme kinetic parameters, to (2) include the EFRD849N mutant in the HDX analysis and (3) to generate transgenic lines for EFRD849N/F761H/Y836F // EFRD849N/F761H/SSAA and compare them to the existing lines in Fig. 3.

      Mutations in proteins, especially those that require conformational plasticity for their function, can have pleiotropic effects: the mutation may alter conformational plasticity and, consequently, both the catalytic and non-catalytic functions that depend on it. In such cases, it is difficult to fully untangle catalytic and non-catalytic functions. In the case of EFR D849N, the mutation may also impact the non-catalytic function by altering conformational plasticity, which would explain the difference observed between EFR and EFR D849N. As you rightly suggested, HDX would be a way to address this but would still not clarify whether catalytic activity contributes to activation. We instead attempted to produce analog-sensitive EFR variants for in vivo characterization of EFR-targeted catalytic inhibition. Unfortunately, we failed to produce an analog-sensitive variant for which we could show ATP-analog binding. To address your concern, we inserted a section on limitations of the study.

      (8) Fig. 2B,C, supplement 3 C,D. Has it been assessed if the different EFR versions were expressed to similar protein levels and still localized to the PM?

      Localization of the mutant receptors has not been explicitly evaluated by confocal microscopy. However, the selected mutant EFRF761H is shown to accumulate in stable Arabidopsis lines (Figure 3 – Supplement 1C), and BAK1 could be coIPed by all EFR variants upon elf18-treatment (Figure 3 B), indicating plasma membrane localization.

      (9) How the active-like conformation of EFR is in turn activating BAK1 is poorly characterized, but appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question. I tried to come up with an experimental plan to test if indeed the kinase activity of BAK1 and not of EFR is essential for signal propagation, but this is a complex issue. You would need to be able to mimic an activated form of EFR (which you can), to make sure its inactive (possibly, see above) and likewise to engineer a catalytically inactive form of BAK1 in an active-like state (difficult). As such a decisive experiment is difficult to implement, I would suggest to discuss different possible interpretations of the existing data and alternative scenarios in the discussion section of the manuscript.

      We addressed your concern whether BAK1 kinase activity is essential for signal propagation by pairing EFRF761H and BAK1D416N (Figure 4 Supplement 2 C), which fails to induce signaling. In this case, EFRF761H is in its activated conformation but cannot activate downstream signaling. We also attempted to address your concern with an in vitro kinase assay pairing EFR and BAK1D416N and using a range of concentrations of the substrate BIK1D202N. We observed that the catalytic activity of BAK1, but not of EFR, was essential for BIK1 phosphorylation. However, this experiment does not address whether activated EFR can efficiently propagate signaling in the absence of BAK1 catalytic activity. In the limitations of the study section, we now discuss the catalytic importance of EFR for signaling activation.

      Author response image 1.

      BIK1 trans-phosphorylation depends on BAK1 catalytic activity. Increasing concentrations of BIK1 D202N were used as substrate for Rap-induced dimers of EFR-BAK1, EFR D849N-BAK1, and EFR-BAK1 D416N respectively. BIK1 trans-phosphorylation depended on the catalytic activity of BAK1. Proteins were purified from E. coli λPP cells. Three experiments yielded similar results of which a representative is shown here.

      Reviewer #2:

      All of my suggestions are minor.

      Figure 1B, I think it would be more useful to readers to explain the amino acid in the D-N change, rather than just call it D-to-N? Also, please label the bands on the stained gel; the shift on FKBP-BRI1 and FKBP-EFR are noticeable on the Coomassie stain.

      We implemented your suggestions.

      Figure 1-Supplement 1. There is still a signal in pS612 BAK1 (it states 'also failed to induce BAK1 S612 phosphorylation' in the text, which is not quite correct). Also, could mention the gel shift seen in BAK1, which appears absent in Y836F.

      We corrected the text which now states: “To test whether the requirement for Y836 phosphorylation is similar, we immunoprecipitated EFR-GFP and EFRY836F-GFP from mock- or elf18-treated seedlings and probed co-immunoprecipitated BAK1 for S612 phosphorylation. EFRY836F also obstructed the induction of BAK1 S612 phosphorylation (Figure 1 – Supplement 1), indicating that EFRY836F and EFRSSAA impair receptor complex activation.” The gel shift of BAK1 you pointed out was not observed in replications and thus we prefer not to comment on it.

      Figure 2 and 3 are full of a, b, c,d's, which I don't understand. Sorry

      We used uppercase letters to indicate subpanels and lowercase letters to indicate the results of the statistical testing. In the figure caption, we have clarified that the lowercase letters refer to statistical comparisons.

      Figure 2 A. If each point on the x-axis is one amino acid, I think it would again be useful to name the amino acids that the gold or purple or blue colored lines extend through.

      Each point represents a peptide; peptides are sorted by the position of their starting amino acid, from N-terminus to C-terminus. We have now added HDX plots for individual peptides that correspond to the highlighted region in subpanel A.

      Figure Supplement 1 is very small for what it is trying to show, even on the printed page. If this residue were to be phosphorylated, what would happen to the H-bond?

      We suppose that VIa-Tyr phosphorylation would break the H-bond and cause displacement of the αC-β4 loop. Recent studies, published after our submission, highlight the importance of this loop for substrate coordination and ATP binding. Thus, phosphorylating VIa-Tyr and displacing this loop may render the kinase rather unproductive. We have expanded the discussion to include this point.

      Figure 2B: Tyr 836 is not present in any of the alignments in Figure 2A. This should be rectified, because the text talks about the similarity to Tyr 156 in PKA.

      We have adjusted the alignments such that they now contain the VIa-Tyr residues of EFR and PKA.

      Figure 4D. Is there any particular reason that these Blots are so hard to compare or FKBP and BAK1?

      We assume this refers to Figure 4 – Supplement 2 D. FKBP-EFR and FRB-BAK1 are both approximately the size of RuBisCO, the most abundant protein in plant protein samples, which overlaps with the FKBP- and FRB-tagged kinases. Thus, it is difficult to detect these proteins.

      Reviewer #3:

      (1) The paper reporting the allosteric activation mechanism of EGFR should be cited.

      Will be included.

      (2)The authors showed that "Rap addition increased BIK1 D202N phosphorylation when the BRI1 or EFR kinase domains were dimerized with BAK1, but no such effect was observed with FLS2". Please explain why FLS2 failed to enhance BIK1 transphosphorylation by Rap treatment?

      Even though BIK1 is a reported downstream signaling component of FLS2/BAK1, it might not be the most relevant one; related RLCKs, such as PBL1, might be better substrates for dimerized FLS2/BAK1. We have not tested this, however. Alternatively, the purified FLS2 kinase domain might be labile and unfold quickly even though it was kept on ice until the start of the assay, or the N-terminal FKBP-tag may disrupt function. As the reason for our observation is not clear, we have removed the FLS2 in vitro dimerization experiments from the manuscript.

      (3) Based solely on the data presented in Figure 1, it can be concluded that EFR's kinase activity is not required to facilitate BIK1 transphosphorylation. Therefore, the title of Figure 1, "EFR Allosterically Activates BAK1," may be inappropriate.

      We have changed the figure title to: “EFR facilitates BIK1 trans-phosphorylation by BAK1 non-catalytically.”

      (4) In Figure 1- Supplement 1, I could not find any bands in anti-GFP and anti-BAK1 pS612 of input. Please redo it.

      Indeed, we could not detect protein in the input samples of this experiment. BAK1 S612 phosphorylation is an activation mark and not necessarily expected to be abundant enough for detection in input samples. EFR-GFP, however, is usually detected in input samples and is reported in Macho et al. 2014, the manuscript from which these lines originate. Why EFR-GFP is not detected in this set of experiments is unclear but, in our opinion, does not detract from the conclusions drawn, since similar amounts of EFR-GFP are pulled down across all samples.

      (5) For Figure 2A, please mark the structure represented by each color directly in the figure.

      We have made the suggested change.

      (6) Please modify "EFRF761/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation" to "EFRF761H/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation".

      Thank you for spotting this. We changed it.

      (7) The HDX-MS analysis demonstrated that the EFR (Y836F) mutation inhibits the formation of the active-like conformation. Conversely, the EFR (F761H) mutation serves as a potent intragenic suppressor, significantly stabilizing the active-like conformation. Confirming through HDX-MS conformational testing that the EFR (Y836F F761H) double mutation does not hinder the formation of the active-like EFR kinase conformation would greatly strengthen the conclusions of the article.

      Response: We agree that this would be beneficial. We attempted the experiment but failed to produce enough protein for HDX-MS analysis. We now state this in an additional section of the paper (“Limitations of the study”).

    1. Author response:

      eLife assessment

      This study investigates associations between retrotransposon element expression and methylation with age and inflammation, using multiple public datasets. The study is valuable because a systematic analysis of retrotransposon element expression during human aging has been lacking. However, the data provided are incomplete due to the sole reliance on microarray expression data for the core analysis of the paper.

      Both reviewers found this study to be important. We selected the microarray datasets of human blood used in a comprehensive study of ageing published in Nature Communications (DOI: 10.1038/ncomms9570). We only included the datasets specifically collected for ageing studies. Therefore, the large RNA-seq cohorts for cancer, cardiovascular, and neurological diseases were not relevant to this study and cannot be included.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tsai and Seymen et al. investigate associations between RTE expression and methylation and age and inflammation, using multiple public datasets. The concept of the study is in principle interesting, as a systematic analysis of RTE expression during human aging is lacking.

      We thank the reviewer for the positive comment.

      Unfortunately, the reliance on expression microarray data, used to perform the core analysis of the paper places much of the study on shaky ground. The findings of the study would not be sufficiently supported until the authors validate them with more suitable methods.

      In our discussion section in the manuscript, we have clarified that “we are aware of the limitations imposed by using microarray in this study, particularly the low number of intergenic probes in the expression microarray data. Our study can be enriched with the advent of large RNA-seq cohorts for aging studies in the future.” However, the application of microarray for RTE expression analysis was introduced previously. In fact, in a manuscript published by Reichmann et al. (DOI: 10.1371/journal.pcbi.1002486) which was cited 76 times, the authors showed and experimentally verified that cryptic repetitive element probes present in Illumina and Affymetrix gene expression microarray platforms can accurately and sensitively monitor repetitive element expression data. Inspired by this methodological manuscript with reasonable acceptance by other researchers, we trusted that the RTE microarray probes could accurately quantify RTE expression at class and family levels.

      Strengths:

      This is a very important biological problem.

      Weaknesses:

      RNA microarray probes are obviously biased to genes, and thus quantifying transposon analysis based on them seems dubious. Based on how arrays are designed there should at least be partial (perhaps outdated evidence) that the probe sites overlap a protein-coding or non-coding RNA.

      We disagree with the reviewer that quantifying transposon expression based on microarray data is dubious. As previously shown by Reichmann et al., the quantification is reliable as long as the probes do not overlap with annotated genes and they are in the correct orientation to detect sense repetitive element transcripts. Reichmann et al. identified 1,400 repetitive element probes in version 1.0, version 1.1 and version 2.0 of the Illumina Mouse WG-6 Beadchips by comparing the genomic locations of the probes with the RepeatMasked regions of the mouse genome. We applied the same criteria to Illumina Human HT-12 V3 (29431 probes) and V4 (33963 probes) to identify the RTE-specific probes.
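
      The probe-selection criteria described above can be illustrated with a minimal sketch. It is not the authors' or Reichmann et al.'s actual pipeline: the `Interval` type, the `select_rte_probes` function, the coordinates, and the assumption that a probe's strand field denotes the strand of the transcript it detects are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval; coordinates are 0-based, half-open (assumed)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-": strand of the feature or of the detected transcript


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def select_rte_probes(probes, repeats, exons):
    """Keep probes that (1) overlap no annotated exon and
    (2) overlap a repeat-masked element on the same strand (sense orientation)."""
    selected = []
    for name, probe in probes.items():
        if any(overlaps(probe, exon) for exon in exons):
            continue  # criterion 1: discard probes overlapping annotated genes
        if any(overlaps(probe, rep) and probe.strand == rep.strand for rep in repeats):
            selected.append(name)  # criterion 2: sense overlap with a repeat
    return selected


# Hypothetical toy annotation, not real probe or genome coordinates.
probes = {
    "p1": Interval("chr1", 100, 150, "+"),  # intergenic, sense to a repeat
    "p2": Interval("chr1", 500, 550, "+"),  # overlaps an exon
    "p3": Interval("chr1", 900, 950, "-"),  # antisense to the repeat
}
repeats = [
    Interval("chr1", 90, 200, "+"),
    Interval("chr1", 480, 600, "+"),
    Interval("chr1", 880, 1000, "+"),
]
exons = [Interval("chr1", 520, 700, "+")]

selected = select_rte_probes(probes, repeats, exons)
print(selected)
```

      In this toy example only the first probe passes both filters. A real analysis would read probe and RepeatMasker coordinates from annotation files and would likely use an interval index rather than the quadratic scan shown here.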

      The authors state they only used intergenic probes, but based on supplementary files, almost half of RTE probes are not intergenic but intronic (n=106 out of 264).

      All our identified RTE probes overlap with intergenic regions. However, due to their repetitive nature, some probes overlap with intronic regions, too. We can replace "intergenic" with "noncoding" in our revision to show that they do not overlap with the exons of protein-coding genes. However, we do not rule out the possibility that some of our detected RTE probes might overlap noncoding RNAs. In fact, the border between coding and non-coding genomes has recently become very fuzzy with new annotations of the genome. RTE RNAs can be easily considered as non-coding RNAs if we challenge our junk DNA view.

      This is further complicated by the fact that not all this small subset of probes is available in all analyzed datasets. For example, 232 probes were used for the MESA dataset but only 80 for the GTP dataset. Thus, RTE expression is quantified with a set of probes which is extremely likely to be highly affected by non-RTE transcripts and that is also different across the studied datasets. Differences in the subsets of probes could very well explain the large differences between datasets in multiple of the analyses performed by the authors, such as in Figure 2a, or 3a. It is nonetheless possible that the quantification of RTE expression performed by the authors is truly interpretable as RTE expression, but this must be validated with more data from RNA-seq. Above all, microarray data should not be the main type of data used in the type of analysis performed by the authors.

      In this study, we did not compare MESA with GTP etc. We have analysed each dataset separately based on the available data for that dataset. Therefore, sacrificing one analysis because of the lack of information from the other does not make sense. We would do that if we were after comparing different datasets. Moreover, the datasets are not comparable because they were produced from different blood cell types.

      Reviewer #2 (Public Review):

      Summary:

      Yi-Ting Tsai and colleagues conducted a systematic analysis of the correlation between the expression of retrotransposable elements (RTEs) and aging, using publicly available transcriptional and methylome microarray datasets of blood cells from large human cohorts, as well as single-cell transcriptomics. Although DNA hypomethylation was associated with chronological age across all RTE biotypes, the authors did not find a correlation between the levels of RTE expression and chronological age. However, expression levels of LINEs and LTRs positively correlated with DNA demethylation, and inflammatory and senescence gene signatures, indicative of "biological age". Gene set variation analysis showed that the inflammatory response is enriched in the samples expressing high levels of LINEs and LTRs. In summary, the study demonstrates that RTE expression correlates with "biological" rather than "chronological" aging.

      Strengths:

      The question the authors address is both relevant and important to the fields of aging and transposon biology.

      We thank the reviewer for finding this study relevant and important.

      Weaknesses:

      The choice of methodology does not fully support the primary claims. Although microarrays can detect certain intergenic transposon sequences, the authors themselves acknowledge in the Discussion section that this method's resolution is limited. More critical considerations, however, should be addressed when interpreting the results. The coverage of transposon sequences by microarrays is not only very limited (232 unique probes) but also predetermined. This implies that any potential age-related overexpression of RTEs located outside of the microarray-associated regions, or of polymorphic intact transposons, may go undetected. Therefore, the authors should be more careful while generalising their conclusions.

      This is a bioinformatics study, and we have already admitted and discussed the limitations in the discussion section of this manuscript. All technologies have their own limitations, and this should not stop us from shedding light on scientific facts because of inadequate information. In the manuscript, we have discussed that all large and proper ageing studies were performed using microarray technology. Peters et al. (DOI: 10.1038/ncomms9570) adopted all these microarray data in their transcriptional landscape of ageing manuscript. Our study essentially applies the Reichmann et al. method to the peripheral blood-related data from the Peters et al. manuscript. Since hypomethylation due to ageing is a well-established and broad epigenetic reprogramming, it is unlikely that only a fraction of RTEs is affected by this phenomenon. Therefore, the subsampling of RTEs should not affect the result so much. Indeed, this is supported in our study by the inverse correlation between DNA methylation and RTE expression for the LINE and SINE classes, despite the limited numbers of probes for LINE and SINE expression.
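
      To make the notion of an inverse correlation between methylation and expression concrete, the following sketch computes a rank correlation on made-up values. The response does not state which correlation statistic was used; Spearman's coefficient is assumed here purely for illustration, and the function names and all numbers are hypothetical.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample values: mean methylation (beta) of a repeat class
# and mean expression score of the probes for the same class.
methylation = [0.82, 0.75, 0.70, 0.64, 0.55]
expression = [5.1, 5.6, 5.4, 6.2, 6.8]

rho = spearman(methylation, expression)
print(round(rho, 3))
```

      A negative coefficient, as in this toy data, corresponds to higher expression in samples with lower methylation. In practice a library routine with a significance test would be preferable to this hand-rolled version.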

      Additionally, for some analyses, the authors pool signals from RTEs by class or family, despite the fact that these groups include subfamilies and members with very different properties and harmful potentials. For example, while sequences of older subfamilies might be passively expressed through readthrough transcription, intact members of younger groups could be autonomously reactivated and cause inflammation. The aggregation of signals by the largest group may obscure the potential reactivation of smaller subgroups. I recommend grouping by subfamily or, if not possible due to the low expression scores, by subgroup. For example, all HERV subfamilies are from the ERVL family.

      We agree with the reviewer that different subfamilies of RTEs play different roles through their activation. However, we will lose our statistical power if we study RTE subfamilies with a few probes. Global epigenetic alteration and derepression of RTEs by ageing have been observed to be genome-wide. While our systematic analysis across RTE classes and families cannot capture alterations in subfamilies due to statistical power, it is still relevant to the research question we are addressing.

      Next, Illumina arrays might not accurately represent the true abundance of TEs due to non-specific hybridization of genomic transposons. Standard RNA preparations always contain traces of abundant genomic SINEs unless DNA elimination is specifically thorough. The problem of such noise should be addressed.

      We have checked the RNA isolation step in the MESA, GTP, and GARP manuscripts. The total RNA was isolated using the Qiagen mini kit following the manufacturer’s recommendations. The authors of these manuscripts did not mention whether they eliminated genomic DNA, but we assumed they were aware of the DNA contamination and eliminated it based on the manufacturer’s recommendations. We have searched the literature on non-specific hybridization of RTEs but could not find any evidence to support this observation. We would appreciate it if the reviewers could provide more evidence about such RTE contamination.

      Lastly, scRNAseq was conducted using 10x Genomics technology. However, quantifying transposons in 10x sequencing datasets presents major challenges due to sparse signals.

      Applying the scTE pipeline (https://www.nature.com/articles/s41467-021-21808-x), we have found that the statistical power of quantifying RTE classes (LINE, SINE, and LTR) or RTE families (L1, L2, Alu, ERVK, etc.) is as good as that for each individual gene. However, our proposed method cannot analyse RTE subfamilies, and we did not do that.

      Smart-seq single-cell technology is better suited to this particular purpose.

      We agree with the reviewer that Smart-seq provides a higher yield than 10x, but there are no Smart-seq data available for ageing studies.

      Anyway, it would be more convincing if the authors demonstrated TE expression across different clusters of immune cells using standard scRNAseq UMAP plots instead of boxplots.

      Since the number of RTE reads per cell is low, showing the expression of RTEs per cell in UMAP may not be the best statistical approach to show the difference between the aged and young groups. This is why we chose to analyse with pseudobulk and displayed differential expression using boxplot rather than UMAP for each immune cell type.
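
      The pseudobulk idea described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the field names (`donor`, `cell_type`, `counts`) and the toy read counts are hypothetical and chosen only to show how sparse per-cell RTE reads are summed into one profile per donor and cell type before any aged-versus-young comparison.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell read counts into one profile per (donor, cell_type)."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["donor"], cell["cell_type"])
        for feature, count in cell["counts"].items():
            totals[key][feature] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(features) for key, features in totals.items()}


# Hypothetical toy data: a handful of cells with very few RTE reads each.
cells = [
    {"donor": "young_1", "cell_type": "T cell", "counts": {"LINE": 1, "SINE": 0}},
    {"donor": "young_1", "cell_type": "T cell", "counts": {"LINE": 0, "SINE": 2}},
    {"donor": "aged_1", "cell_type": "T cell", "counts": {"LINE": 2, "SINE": 1}},
    {"donor": "aged_1", "cell_type": "B cell", "counts": {"LINE": 0, "SINE": 1}},
]

profiles = pseudobulk(cells)
for key, features in sorted(profiles.items()):
    print(key, features)
```

      Aggregating in this way trades single-cell resolution for counts that are large enough to compare between groups, which is the trade-off the response describes.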

      I recommend validating the data by RNAseq, even on small cohorts. Given that the connection between RTE overexpression and inflammation has been previously established, the authors should consider better integrating their observations into the existing knowledge.

      Until recently, there was no publicly available, large, non-cancerous RNA-seq cohort for ageing studies. We tried to gain access to the two RNA-seq datasets suggested by reviewer 2: Marquez et al. 2020 (phs001934.v1.p1, controlled access) and Morandini et al. 2023 (GSE193141, public access).

      Unfortunately, the Marquez et al. 2020 data are not accessible because the authors only provide the data for projects related to cardiovascular diseases. However, we did analyse the Morandini et al. 2023 data, and we can confirm that no association was observed between any class or family of RTEs and chronological ageing, which is a second strong piece of evidence supporting the statement in the manuscript. However, as expected, we found a positive correlation between RTE expression and the IFN-I signature score.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides an important finding that the local abundance of metabolites impacts the biology of the tumor microenvironment by utilizing kidney tumors from patients and adjacent normal tissues. The evidence supporting the claims of the authors is convincing although certain caveats need to be taken into consideration as the authors acknowledged in the paper. The work will be of interest to the research community working on metabolism and on kidney cancer especially.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The present study addresses how the local abundance of metabolites impacts the biology of the tumor microenvironment. The authors enroll patients harboring kidney tumors and use freshly resected tumor material for metabolic studies. Specifically, the authors separate the adjacent normal kidney tissue from the tumor material and then harvest the interstitial fluid from the normal kidney (KIF) or the tumor (TIF) for quantitative metabolomics. The plasma samples from the patient are used for comparison. Additionally, the authors also compare metabolite levels in the plasma of patients with kidney versus lung cancer (or healthy donors) to address how specific tumor types might contribute to circulating levels of metabolites. Altogether, the authors find that the metabolite levels in the KIF and TIF, although vastly different than plasma, are largely overlapping. These findings indicate that tissue of origin appears to have a stronger role in determining the local metabolic environment of tumors than the genetics or biochemistry of the tumor itself.

      Strengths:

      The biggest strength of the current study is the use of human patient-derived samples. The cohort size (~50 patients) is relatively large, which adds to the rigor of the work. The work also relies on a small pool of metabolites that can be quantitatively measured using methods developed by the authors. Focusing on a smaller metabolic pool also likely increases the signal-to-noise ratio and enables the more rigorous determination of any underlying differences. The manuscript is well-written and highlights both the significance of the findings and also acknowledges many of the caveats. The recognition of the metabolic contributions of surrounding normal tissue as the primary driver of local nutrient abundance is a novel finding in the work, which can be leveraged in future studies.

      We thank the Reviewer for their careful evaluation of the study and for their supportive comments.

      Weaknesses:

      The work has certain caveats, some of which have been already recognized by the authors. These include the use of steady-state metabolites and the possibility of cross-contamination of some TIF into the adjacent KIF. This study is also unable to distinguish the mechanisms driving the metabolic changes in KIF/TIF relative to circulating levels in plasma.

      We agree with the Reviewer that these are important caveats to consider when interpreting the results of this study.

      The relative similarity of KIF and TIF is quite surprising. However, this interpretation is presently based on a sampling of only ~100 polar metabolites and ~200 lipid molecules. It is, perhaps, possible that future technological developments that enable more comprehensive quantitative metabolic profiling might distinguish between KIF and TIF composition.

      The Reviewer raises another important point that our interpretation of KIF vs TIF is limited to the ~300 metabolites we measured. We agree it would be worthwhile quantifying more metabolites where technically feasible to further characterize similarities and differences in nutrient availability between tumor and normal tissues.

      In vitro, tissue culture is recognized to suffer from ‘non-physiological’ nutrient dependencies, which are impacted by the composition of culture media. Thus, in vivo studies remain our current gold-standard in mechanistic studies of tumor metabolism. It is presently unclear whether the findings of this work will be recapitulated in any of the kidney cancer in vivo models and thus be functionally testable.

      We thank the Reviewer for calling attention to the limitations of cell culture media in studying tumor metabolism. While both in vitro and in vivo approaches have inherent limitations, formulating culture media based on metabolite concentrations measured here and in other studies provides a tool to study the influence of nutrient availability on kidney cell or kidney cancer cell phenotypes in vitro. We also agree with the Reviewer that it will be important to determine whether the findings in our study are recapitulated in mouse models of kidney cancer, as this might enable investigation into the factors that modulate nutrient availability in this tissue context.

      Reviewer #2 (Public Review):

      The study employs quantitative metabolomic and lipidomic analyses to scrutinize tumor interstitial fluid (TIF), adjacent normal kidney interstitial fluid (KIF), and plasma samples from renal cell carcinoma (RCC) patients. The authors delve into the intricate world of renal cell carcinoma and its tumor microenvironment, shedding light on the factors that shape nutrient availability in both cancerous and adjacent normal tissues. The authors prove that non-cancer-driven tissue factors play a dominant role in shaping nutrient availability in RCC. This finding opens up new avenues for research, suggesting that the tumor microenvironment is profoundly influenced by factors beyond the presence of cancer cells. This study not only contributes valuable insights into RCC metabolism but also prompts a reevaluation of the factors governing nutrient availability in tumor microenvironments more broadly. Overall, it represents a significant step forward in our understanding of the intricate interplay between cancer and its surrounding milieu.

      We thank the Reviewer for their evaluation of our work and for their supportive comments.

      The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures. Since the authors exclusively employed samples from RCC patients and did not include kidney interstitial fluid and plasma samples from healthy individuals, we cannot accurately assess the true significance and applicability of the results until the role of cancer cells in reshaping KIF is understood. In essence, some metabolite levels in the tumor interstitial fluid did not show an increase or decrease compared to the adjacent normal kidney interstitial fluid. However, the levels of these metabolites in both TIF and KIF might be higher or lower than those in kidney interstitial fluid from healthy individuals, and the roles of these metabolites should not be overlooked. Similar concerns extend to plasma levels, emphasizing the importance of metabolites that synchronously change in RCC TIF, KIF, and plasma, whether elevated or reduced.

      We agree with the Reviewer that an important caveat in considering the study findings is that we do not have KIF values from healthy individuals. Since resection of normal kidney is not a common procedure, obtaining KIF samples from healthy patients was not possible to complement our analysis. We further agree that the metabolite levels we measured in KIF or plasma are plausibly impacted by the presence of RCC. We did compare the composition of polar metabolites in the plasma from RCC, lung cancer, and healthy patients, highlighting how cystine is affected by tumor presence and/or sample collection methodology. We also point out that factors such as diet will impact metabolites in both blood and tissues.

      Reviewer #3 (Public Review):

      In this study, the authors utilized mass spectrometry-based quantification of polar metabolites and lipids in normal and cancerous tissue interstitial fluid and plasma. This showed that nutrient availability in tumor interstitial fluid was similar to that of interstitial fluid in adjacent normal kidney tissue, but that nutrients found in both interstitial fluid compartments were different from those found in plasma. This suggests that the nutrients in kidney tissue differ from those found in blood and that nutrients found in kidney tumors are largely dictated by factors shared with normal kidney tissue. Those data could be useful as a resource to support further study and modeling of the local environment of RCC and normal kidney physiology.

      We thank the Reviewer for their time considering our paper and for their supportive comments.

      In Figures 1D and 1E, there were about 30% of polar metabolites and 25% of lipids significantly different between TIF and KIF, which could be key factors for RCC tumors. This reviewer considers that the authors should make comments on this.

      We agree with the Reviewer that the metabolites that significantly differ between TIF and KIF are of interest, particularly for those studying RCC tumor metabolism. We comment on some of the metabolites driving differences between TIF and KIF in our discussion of Figure 2, and in the revised manuscript we now include a new figure showing a heatmap that enables visualization of these metabolites (Figure 2-Supplement 1A-B).

      Recommendations for the authors:

      From the Reviewing Editor:

      Figure 2 needs to plot heatmaps for both upregulated and downregulated metabolites in TIF.

      We agree and now include heatmaps for significantly differing polar metabolites and lipids in TIF vs KIF as requested by Reviewer 3 (Figure 2-Supplement 1A-B). For completeness, we also include heatmaps for metabolites differing between healthy and RCC plasma (Figure 2-Supplement 2C) and for NSCLC and RCC plasma (Figure 2-Supplement 2D).

      There is a need to show whether the differences in these metabolites between plasma and tissue interstitial fluid are specific to RCC patients or if they are also present in normal individuals.

      Unfortunately, it has not been possible for us to collect KIF from healthy individuals. Since resection of normal kidney is not a common procedure, we have no way to obtain sufficient KIF samples from healthy patients for this measurement. We discuss this as a limitation of the study.

      Reviewer #1 (Recommendations For The Authors):

      a. The authors should provide additional details about the methodology to separate the KIF and TIF. Contaminating metabolites from surrounding tissue or the peritoneal fluids could impact interpretation and it would be helpful to understand how these challenges were addressed during tissue collection for this study. Additionally, was the collected tissue minced or otherwise dissociated? If so, could these procedures cause tissue lysis and contaminate the KIF/TIF with intracellular components?

      We thank the Reviewer for the suggestion to include more information about the sampling methodology. Care was taken to minimize cell lysis during processing: tissues were not minced, smashed, or dissociated. However, there is still a possibility of some tissue lysis that is pre-existing or occurs during the isolation procedure. We note this caveat in the text (lines 218-220) and have updated the Methods with more details of the sampling and processing of the samples.

      b. Although the authors focus on metabolites that are elevated in TIF (relative to KIF and plasma), it would be equally relevant to consider the converse. Metabolites that are reduced in TIF, either due to underproduction or overconsumption, could render the tumors auxotrophic for some critical dependencies and identify some novel metabolic vulnerabilities. In this regard, Figure 2 could have a heatmap of the top metabolites that are elevated and depleted specifically in the TIF.

      We agree with the Reviewer it is useful to include heatmaps to better display the metabolites that significantly differ between TIF and KIF and now include these in Figure 2-Supplement 1A-B.

      c. The future utilization of this knowledge would depend on our ability to model these differences. Would interstitial tissue from a normal mouse kidney or tumor-bearing mouse kidney recapitulate the same differences relative to mouse plasma?

      We agree with the Reviewer that it would be worth determining whether the findings in our study are recapitulated in mouse models of kidney cancer, which would support future investigation into the factors that modulate nutrient availability. This is an interesting question, but we did not have access to endogenously arising models of RCC, which have been a limitation for the field, and comparison of normal mouse kidney metabolite data to human metabolite data is problematic for obvious reasons. Thus, we had no choice but to discuss this as a limitation of the study.

      Reviewer #2 (Recommendations For The Authors):

      In this study, Abbott et al. investigated the metabolic profile of renal cell carcinoma (RCC) by analyzing the tumor interstitial fluid (TIF), adjacent normal kidney interstitial fluid (KIF), and plasma samples from patients. The results indicate that nutrient composition in TIF closely resembles that of KIF, suggesting that tissue-specific factors, rather than tumor-driven alterations, have a more significant impact on nutrient levels. These findings are interesting. The study is overall well-constructed, including appropriate analysis, and the manuscript is written clearly and supported by high-quality figures. However, some issues are raised which if addressed, would strengthen the paper.

      We thank the Reviewer for their suggestions to improve the paper.

      The authors found a difference in the number of metabolites when comparing TIF or KIF lipid composition with plasma. The discoveries are intriguing; however, I am keen to understand whether the differences in these metabolites between plasma and tissue interstitial fluid are specific to RCC patients or if they are also present in normal individuals. I am particularly interested in identifying which metabolites could serve as potential diagnostic markers, intervention targets, or potentially reshape the tumor microenvironment. Because, even though some metabolite levels show no difference between TIF and KIF in RCC patients, I wonder if these metabolite levels in KIF increase or decrease compared to the interstitial fluid in healthy individuals. I am intrigued by the metabolites that simultaneously increase or decrease in both TIF and KIF compared to the kidney interstitial fluid in healthy individuals.

      We agree with the Reviewer that it would be interesting to measure kidney interstitial fluid from healthy patients in order to identify metabolites that change due to the presence of an RCC tumor. As we discuss in response to the public review, this was not possible as we could not obtain material from healthy individuals for analysis. Nevertheless, we agree it warrants future study if material becomes available.

      The analysis conducted using plasma from healthy donors, as applauded by the author, is noteworthy. The author seems to have found that cystine levels do not differ between RCC patient plasma and tissue interstitial fluid. However, considering that the cystine concentration in patient plasma is approximately two-fold higher than in plasma from healthy individuals, it is likely that cystine levels in patient tissue fluid have also increased nearly two-fold compared to levels in the interstitial fluid of normal kidney tissues. This finding aligns with the discovery of elevated GSH levels in cancer cells.

      We agree with the Reviewer that a higher cystine concentration in RCC patient plasma and interstitial fluid is interesting, and also considered this in relationship to past findings including reports of elevated GSH levels in RCC. However, we think this observation is driven at least in part by the fasting status of the patients pre-surgery. This does not rule out some part being related to the presence of the tumor, as this would be consistent with elevated GSH levels as noted by the Reviewer. Future studies will be needed to further delineate the factors that impact elevated cystine levels in both interstitial fluid and plasma.

      Some minor typos, such as "HIF1􀀀-driven" should be corrected.

      We thank the Reviewer for pointing out this typo and we have corrected it in the revised manuscript.

    1. Author response:

      eLife assessment

      This study provides valuable evidence indicating that Syngap1 regulates the synaptic drive and membrane excitability of parvalbumin- and somatostatin-positive interneurons in the auditory cortex. Since haplo-insufficiency of Syngap1 has been linked to intellectual disabilities without a well-defined underlying cause, the central question of this study is timely. However, the support for the authors' conclusions is incomplete in general and some parts of the experimental evidence are inadequate. Specifically, the manuscript requires further work to properly evaluate the impact on synaptic currents, intrinsic excitability parameters, and morphological features.

      We are happy that the editors found that our study provides valuable evidence and that the central question is timely. We thank the reviewers for their detailed comments and suggestions. Below, we provide a point-by-point answer (in blue) to the specific comments and indicate the changes to the manuscript and the additional experiments we plan to perform to answer these comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study is designed to assess the role of Syngap1 in regulating the physiology of the MGE-derived PV+ and SST+ interneurons. Syngap1 is associated with some mental health disorders, and PV+ and SST+ cells are the focus of many previous and likely future reports from studies of interneuron biology, highlighting the translational and basic neuroscience relevance of the authors' work.

      Strengths of the study are using well-established electrophysiology methods and the highly controlled conditions of ex vivo brain slice experiments combined with a novel intersectional mouse line, to assess the role of Syngap1 in regulating PV+ and SST+ cell properties. The findings revealed that in the mature auditory cortex, Syngap1 haploinsufficiency decreases both the intrinsic excitability and the excitatory synaptic drive onto PV+ neurons from Layer 4. In contrast, SST+ interneurons were mostly unaffected by Syngap1 haploinsufficiency. Pharmacologically manipulating the activity of voltage-gated potassium channels of the Kv1 family suggested that these channels contributed to the decreased PV+ neuron excitability by Syngap insufficiency. These results therefore suggest that normal Syngap1 expression levels are necessary to produce normal PV+ cell intrinsic properties and excitatory synaptic drive, albeit, perhaps surprisingly, inhibitory synaptic transmission was not affected by Syngap1 haploinsufficiency.

      Since the electrophysiology experiments were performed in the adult auditory cortex, while Syngap1 expression was potentially affected since embryonic stages in the MGE, future studies should address two important points that were not tackled in the present study. First, what is the developmental time window in which Syngap1 insufficiency disrupted PV+ neuron properties? Albeit the embryonic Syngap1 deletion most likely affected PV+ neuron maturation, the properties of Syngap-insufficient PV+ neurons do not resemble those of immature PV+ neurons. Second, whereas the observation that Syngap1 haploinsufficiency affected PV+ neurons in auditory cortex layer 4 suggests auditory processing alterations, MGE-derived PV+ neurons populate every cortical area. Therefore, without information on whether Syngap1 expression levels are cortical area-specific, the data in this study would predict that by regulating PV+ neuron electrophysiology, Syngap1 normally controls circuit function in a wide range of cortical areas, and therefore a range of sensory, motor and cognitive functions. These are relatively minor weaknesses regarding interpretation of the data in the present study that the authors could discuss.

      We agree with the reviewer on the proposed open questions, which we will certainly discuss in the revised manuscript we are preparing. We do have experimental evidence suggesting that Syngap1 mRNA is expressed by PV+ and SST+ neurons in different cortical areas, during early postnatal development and in adulthood; therefore, we agree that it will be important, in future experiments, to tackle the question of when the observed phenotypes arise.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haplo-insufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel.

      Weaknesses:

      Despite the interesting and novel questions, there are significant concerns regarding the experimental design and data quality, as well as potential misinterpretations of key findings. Consequently, the current manuscript fails to contribute substantially to our understanding of SynGap1 loss mechanisms and may even provoke unnecessary controversies.

      Major issues:

      (1) One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar.

      We understand the reviewer’s perspective; indeed, we asked ourselves the very same question regarding why the sEPSC and mEPSC frequency fall within a similar range when we analysed neuron means (bar graphs). We have already recorded sEPSCs followed by mEPSCs from several PV neurons (control and cHet) and are in the process of analyzing the data. We will add this data to the revised version of the manuscript. We will also rephrase the manuscript to present multiple potential interpretations of the data.

      We hope that we have correctly interpreted the reviewer's concern. However, if the question is why sEPSC amplitude but not frequency is affected in cHet vs ctrl, then the reviewer’s comment is perhaps based on the assumption that the amplitude and frequency of miniature events should be lower for all events compared to those observed for spontaneous events. However, it is essential to note that changes in the mean amplitude of sEPSCs are primarily driven by alterations in large sEPSCs (>9-10 pA, as shown in the cumulative probability plot in Fig. 1b, right), with smaller ones being relatively unaffected. Consequently, a reduction in sEPSC amplitude may not necessarily result in a significant decrease in frequency, since the amplitudes of the affected events likely remain above the detection threshold of 3 pA. This could explain the lack of a significant change in the average inter-event interval of sEPSCs (as depicted in Fig. 1b, left).
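
      The threshold argument above can be made concrete with a small numerical sketch. The amplitudes and the 0.7 scaling factor are hypothetical and are not taken from the recordings; only the 3 pA detection threshold and the approximate 9 pA boundary for large events come from the response.

```python
DETECTION_THRESHOLD_PA = 3.0  # detection threshold stated in the response
LARGE_EVENT_PA = 9.0          # approximate lower bound of "large" sEPSCs


def count_detected(amplitudes, threshold=DETECTION_THRESHOLD_PA):
    """Number of events whose amplitude reaches the detection threshold."""
    return sum(1 for a in amplitudes if a >= threshold)


def shrink_large_events(amplitudes, factor, cutoff=LARGE_EVENT_PA):
    """Scale down only the events above the cutoff; leave small ones alone."""
    return [a * factor if a > cutoff else a for a in amplitudes]


def mean(values):
    return sum(values) / len(values)


# Hypothetical amplitudes (pA) for one recording window.
control = [4.0, 5.5, 7.0, 10.0, 14.0, 20.0]
reduced = shrink_large_events(control, factor=0.7)

print("mean amplitude:", mean(control), "->", mean(reduced))
print("detected events:", count_detected(control), "->", count_detected(reduced))
```

      In this toy case the mean amplitude falls while the number of detected events, and therefore the event frequency over a fixed window, is unchanged, because every shrunken event still exceeds the threshold.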

      If the question is whether we should see the same parameters affected by the genetic manipulation in both sEPSC and mEPSC, then another critical consideration is the involvement of the releasable pool in mEPSCs versus sEPSCs. Current knowledge suggests that activity-dependent and -independent release may not necessarily engage the same pool of vesicles or target the same postsynaptic sites. This concept has been extensively explored (reviewed in Kavalali, 2015). Consequently, while we may have traditionally interpreted activity-dependent and -independent data assuming they utilize the same pool, this is no longer accurate. The current discussion in the field revolves around understanding the mechanisms underlying such phenomena. Therefore, comparisons between sEPSCs and mEPSCs may not yield conclusive data but rather speculative interpretations. For a rigorous analysis, particularly in this context involving thousands of events, it is essential to assess these data sets (mEPSCs vs sEPSCs) separately and provide cumulative probability curves. This approach allows for a more comprehensive understanding of the underlying distributions and helps to elucidate any potential differences between the two types of events. We will rephrase the text, and as mentioned above, add additional data, to better reflect these considerations.

      (2) Another significant concern is the quality of synapse counting experiments. The authors attempted to colocalize pre- and postsynaptic markers Vglut1 and PSD95 with PV labelling. However, several issues arise. Firstly, the PV labelling seems confined to soma regions, with no visible dendrites. Given that the perisomatic region only receives a minor fraction of excitatory synapses, this labeling might not accurately represent the input coverage of PV cells. Secondly, the resolution of the images is insufficient to support clear colocalization of the synaptic markers. Thirdly, the staining patterns are peculiar, with PSD95 puncta appearing within regions clearly identified as somas by Vglut1, hinting at possible intracellular signals. Furthermore, PSD95 seems to delineate potential apical dendrites of pyramidal cells passing through the region, yet Vglut1+ partners are absent in these segments, which are expected to be the marker of these synapses here. Additionally, the cumulative density of Vglut2 and Vglut1 puncta exceeds expectations, and it's surprising that subcortical fibers labeled by Vglut2 are comparable in number to intracortical Vglut1+ axon terminals. Ideally, N(Vglut1)+N(Vglut2) should be equal or less than N(PSD95), but this is not the case here. Consequently, these results cannot be considered reliable due to these issues.

      We apologize, as it appears that the images we provided have caused confusion. The selected images represent a single focal plane of a confocal stack, which was visually centered on the PV cell somata. We chose just one confocal plane because we thought it showed more clearly the apposition of presynaptic and postsynaptic immunolabeling around the somata. In the revised version of the manuscript, we will provide higher magnification images, which will clearly show how we identified and selected the region of interest for the quantification of colocalized synaptic markers. In our confocal stacks, we can also identify PV immunolabeled dendrites and colocalized vGlut1/PSD95 or vGlut2/PSD95 puncta on them; but these do not appear in the selected images because, as explained, only one focal plane, centered on the PV cell somata, was shown.

      We acknowledge the reviewer's point that in PV+ cells the majority of excitatory inputs are formed onto dendrites; however, we focused on the somatic excitatory inputs to PV cells because, despite their lower number, they produce much stronger depolarization in PV neurons than dendritic excitatory inputs (Hu et al., 2010; Norenberg et al., 2010). Further, quantification of perisomatic putative excitatory synapses is more reliable: with PV immunostaining we can visualize the soma and larger primary dendrites, but smaller, higher-order dendrites are not always detectable. Of note, PV-positive somata receive more excitatory synapses than SST-positive and pyramidal neuron somata, as found by electron microscopy studies in the visual cortex (Hwang et al., 2021; Elabbady et al., 2024).

      Regarding the comment on the density of vGlut1 and vGlut2 puncta, the numbers appear high and similar between the two markers because we present normalized data (cHet normalized to their control values for each set of immunolabelling) to clearly represent the differences between genotypes. This information is present in the legends, but we apologize for not clearly explaining it in the methods section. We will provide a more detailed explanation of our methods in the revised manuscript.

      Briefly, immunostained sections were imaged using a Leica SP8-STED confocal microscope with a 63x (NA 1.4) objective at 1024 × 1024 pixels, z-step = 0.3 μm, stack size of ~15 μm. Images were acquired from the auditory cortex from at least 3 coronal sections per animal. All the confocal parameters were kept constant throughout the acquisition of an experiment. All images shown in the figures are from a single confocal plane. To quantify the number of vGlut1/PSD95 or vGlut2/PSD95 putative synapses, images were exported as TIFF files and analyzed using Fiji (ImageJ) software. We first manually outlined the profile of each PV cell soma (identified by PV immunolabeling). At least 4 innervated somata were selected in each confocal stack. We then used a series of custom-made macros in Fiji as previously described (Chehrazi et al., 2023). After applying background subtraction (rolling value = 10) and Gaussian blur (σ value = 2) filters, the stacks were binarized and vGlut1/PSD95 or vGlut2/PSD95 puncta were independently identified around the perimeter of a targeted soma in the focal plane with the highest soma circumference. Puncta were quantified after filtering particles for size (between 0 and 2 μm²) and circularity (between 0 and 1). Data quantification was done by investigators blind to the genotype, and data are presented normalized to control values for each experiment.
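
      The size and circularity filtering and the normalization to control values described above can be illustrated with a minimal sketch. This is not the authors' Fiji macro: the function names, the particle dictionaries, and the example numbers are hypothetical, and circularity is assumed to follow the usual 4π·area/perimeter² definition, capped at 1.

```python
import math

def circularity(area, perimeter):
    """Shape descriptor 4*pi*area / perimeter**2, capped at 1.0 (a perfect circle)."""
    if perimeter <= 0:
        return 0.0
    return min(1.0, 4.0 * math.pi * area / perimeter ** 2)

def filter_puncta(particles, max_area_um2=2.0, min_circ=0.0, max_circ=1.0):
    """Keep particles whose area (in um^2) and circularity fall inside the given bounds.

    `particles` is a hypothetical list of dicts with 'area' and 'perimeter' keys.
    """
    kept = []
    for p in particles:
        circ = circularity(p["area"], p["perimeter"])
        if 0 < p["area"] <= max_area_um2 and min_circ <= circ <= max_circ:
            kept.append(p)
    return kept

def normalize_to_control(values, control_values):
    """Express each value as a fraction of the mean of the control group."""
    control_mean = sum(control_values) / len(control_values)
    return [v / control_mean for v in values]

# Illustrative numbers only, not data from the study.
example = [{"area": 0.5, "perimeter": 3.0}, {"area": 3.0, "perimeter": 7.0}]
kept = filter_puncta(example)               # the 3.0 um^2 particle is excluded
normalized = normalize_to_control([2.0, 4.0], [1.0, 3.0])
```

      Because every value is divided by the control mean of its own experiment, normalized vGlut1 and vGlut2 counts both sit near 1 in controls regardless of their absolute densities.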

      (3) One observation from the minimal stimulation experiment was concluded by an unsupported statement. Namely, the change in the onset delay cannot be attributed to a deficit in the recruitment of PV+ cells, but it may suggest a change in the excitability of TC axons.

      We agree with the reviewer; please see our answer to the point below.

      (4) The conclusions drawn from the stimulation experiments are also disconnected from the actual data. To make conclusions about TC release, the authors should have tested release probability using established methods, such as paired-pulse changes. Instead, the only observation here is a change in the AMPA components, which remained unexplained.

      We agree with the reviewer and we will perform additional paired-pulse ratio experiments at different intervals. We will rephrase the discussion and our interpretation and potential hypothesis according to the data obtained from this new experiment.

      (5) The sampling rate of CC recordings is insufficient to resolve the temporal properties of the APs. Therefore, the phase-plots cannot be interpreted (e.g. axonal and somatic AP components are not clearly separated), raising questions about how AP threshold and peak were measured. The low sampling rate also masks the real derivative of the AP signals, making them apparently faster.

      We acknowledge that a higher sampling rate could offer a more detailed analysis of the action potential waveform. However, in the context of action potential analysis, it is acceptable to use sampling rates ranging from 10 kHz to 20 kHz (Golomb et al., 2007; Stevens et al., 2021; Zhang et al., 2023), which are considered adequate in the context of the present study. Indeed, our study aims to evaluate "relative" differences in the electrophysiological phenotype when comparing groups following a specific genetic manipulation. A sampling rate of 10 kHz is commonly employed in similar studies, including those conducted by our collaborator and co-author S. Kourrich (e.g., Kourrich and Thomas 2009, Kourrich et al., 2013), as well as others (Russo et al., 2013; Ünal et al., 2020; Chamberland et al., 2023).

      Despite being acquired at a lower sampling rate than potentially preferred by the reviewer, our data clearly demonstrate significant differences between the experimental groups, especially for parameters that are negligibly or not affected by the sampling rate used here (e.g., #spikes/input, RMP, Rin, Cm, Tm, AP amplitude, AP latency, AP rheobase).

      Regarding the phase-plots, we agree that a higher sampling rate would have resulted in smoother curves and more accurate absolute values. However, the differences were sufficiently pronounced to discern the relative variations in action potential waveforms between the experimental groups.
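
      To make the sampling-rate point concrete, the sketch below shows how a phase plot (dV/dt against V) can be computed from a sampled voltage trace. It is an illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the derivative is a simple central difference, and the 20 mV/ms threshold criterion is an assumed example value.

```python
def phase_plot(voltage_mV, sampling_rate_hz):
    """Return (V, dV/dt) pairs, with dV/dt in mV/ms, using a central difference."""
    dt_ms = 1000.0 / sampling_rate_hz       # 10 kHz gives 0.1 ms between samples
    points = []
    for i in range(1, len(voltage_mV) - 1):
        dvdt = (voltage_mV[i + 1] - voltage_mV[i - 1]) / (2.0 * dt_ms)
        points.append((voltage_mV[i], dvdt))
    return points

def threshold_from_phase_plot(points, dvdt_criterion=20.0):
    """Voltage at which dV/dt first reaches the criterion (assumed value), else None."""
    for v, dvdt in points:
        if dvdt >= dvdt_criterion:
            return v
    return None

# Illustrative linear ramp of 1 mV per sample at 10 kHz, i.e. 10 mV/ms.
ramp = [-60.0, -59.0, -58.0, -57.0]
points = phase_plot(ramp, 10_000)
```

      At 10 kHz each sample spans 0.1 ms, so the rising phase of a fast-spiking action potential is covered by only a few points; the derivative estimate is correspondingly coarse, which is why absolute values are less precise than between-group comparisons made under identical recording conditions.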

      A related issue is that the Methods section lacks essential details about the recording conditions, such as bridge balance and capacitance neutralization.

      We indeed performed bridge balance and neutralized the capacitance before starting every recording. We will add the information in the methods.

      (6) Interpretation issue: One of the most fundamental measures of cellular excitability, the rheobase, was differentially affected by cHet in BCshort and BCbroad. Yet, the authors concluded that the cHet-induced changes in the two subpopulations are common.

      We are uncertain if we have correctly interpreted the reviewer's comment. While we observed distinct impacts on the rheobase (Fig. 7d and 7i), there seems to be a common effect on the AP threshold (Fig. 7c and 7h), as interpreted and indicated in the final sentence of the results section for Figure 7 (page 12). If our response does not address the reviewer's comment adequately, we would greatly appreciate it if the reviewer could rephrase their feedback.

      (7) Design issue:

      The Kv1 blockade experiments are disconnected from the main manuscript. There is no experiment that shows the causal relationship between changes in DTX and cHet cells. It is only an interesting observation on AP halfwidth and threshold. However, how they affect rheobase, EPSCs, and other topics of the manuscript are not addressed in DTX experiments.

      Furthermore, Kv1 currents were never measured in this work, nor was the channel density tested. Thus, the DTX effects are not necessarily related to changes in PV cells, which can potentially generate controversies.

      While we acknowledge the reviewer's point that Kv1 currents and density weren't specifically tested, an important insight provided by Fig. 5 is the prolonged action potential latency. This delay is significantly influenced by slowly inactivating subthreshold potassium currents, namely the D-type K+ current. It's worth noting that D-type current is primarily mediated by members of the Kv1 family. The literature supports a role for Kv1.1-containing channels in modulating responses to near-threshold stimuli in PV cells (Wang et al., 1994; Goldberg et al., 2008; Zurita et al., 2018). However, we recognize that besides the Kv1 family, other families may also contribute to the observed changes.

      To address this concern, we will revise our interpretation. We will opt for the more accurate term "D-type K+ current" and only speculate about the involved channel family in the discussion. Our intention is to present the data we obtained, not to open unnecessary controversy. We believe that this approach, together with rephrasing the discussion as proposed, will instead foster fruitful discussions.

      (8) Writing issues:

      Abstract:

      The auditory system is not mentioned in the abstract.

      One statement in the abstract is unclear. What is meant by "targeting Kv1 family of voltage-gated potassium channels was sufficient..."? "Targeting" could refer to altered subcellular targeting of the channels, simple overexpression/deletion in the target cell population, or targeted mutation of the channel, etc. Only the final part of the Results revealed that it was none of the above; rather, these channels were blocked selectively.

      We agree with the reviewer and we will rephrase the abstract accordingly.

      Introduction:

      There is a contradiction in the introduction. The second paragraph describes in detail the distinct contribution of PV and SST neurons to auditory processing. But at the end, the authors state that "relatively few reports on PV+ and SST+ cell-intrinsic and synaptic properties in adult auditory cortex". Please be more specific about the unknown properties.

      We agree with the reviewer and we will rephrase more specifically.

      (9) The introduction emphasizes the heterogeneity of PV neurons, which certainly influences the interpretation of the results of the current manuscript. However, the initial experiments did not consider this and handled all PV cell data as a pooled population.

      In the initial experiments, we handled all PV cell data together because we wanted to be rigorous and not make assumptions/biases on the different PV cells, which in later experiments we were to distinguish based on the intrinsic properties alone. We will make this point clear in the revised manuscript.

      (10) The interpretation of the results strongly depends on unpublished work, which potentially provides the physiological and behavioral contexts about the role of GABAergic neurons in SynGap-haploinsufficiency. The authors cite their own unpublished work, without explaining the specific findings and relation to this manuscript.

      We agree with the reviewer and apologize for the lack of clarity. Our unpublished work is in revision right now. We will provide more information and update references in the revised version of this manuscript.

      (11) The introduction of Sholl analysis experiments mentions SOM staining; however, there is no such data about this cell type in the manuscript.

      We apologize for the error; we will replace SOM with SST (SOM and SST are two commonly used acronyms for somatostatin-expressing interneurons).

      Reviewer #3 (Public Review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences at both levels, although predominantly in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunction observed in Syngap1 haploinsufficiency-related intellectual disability. The subject of the work is interesting, and most of the approach is direct and quantitative, which are major strengths. There are also some weaknesses that reduce its impact for a broader field.

      (1) The choice of mice with conditional (rather than global) haploinsufficiency makes the link between the findings and Syngap1 relatively easy to interpret, which is a strength. However, it also remains unclear whether an entire network with the same mutation at a global level (affecting also excitatory neurons) would react similarly.

      The reviewer raises an interesting and pertinent open question which we will address in the discussion of the revised paper.

      (2) There are some (apparent?) inconsistencies between the text and the figures. Although the authors appear to have used a sophisticated statistical analysis, some datasets in the illustrations do not seem to match the statistical results. For example, neither Fig 1g nor Fig 3f (eNMDA) reach significance despite large differences.

      We respectfully disagree; we do not think the text and figures are inconsistent. In the cited examples, a large apparent difference in mean values does not reach significance because of the large variability in the data; further, we did not exclude any data points, because we wanted to be rigorous. In particular, for Fig. 1g, statistical analysis shows a significant increase in the inter-mEPSC interval (*p=0.027, LMM) when all events are considered (cumulative probability plots), while there is no significant difference in the inter-mEPSC interval for the inter-cell mean comparison (inset, p=0.354, LMM). The inter-cell mean comparison does not show a difference with the Mann-Whitney test either (p=0.101; the data are not normally distributed, hence the choice of the Mann-Whitney test). For Fig. 3f (eNMDA), the higher mean value for the cHet versus the control is driven by two data points which are particularly high, while the other data points overlap with the control values. The Mann-Whitney test also shows no statistical difference (p=0.174).

      In the manuscript, discussion of the data is based on the results of the LMM analysis, which takes into account both the number of cells and the number of mice from which these cells were recorded. We chose this statistical approach because it does not rely on the assumption that cells recorded from the same mouse are independent variables. In the supplemental tables, we provided the results of the statistical analysis done with both LMM and the most commonly used Mann-Whitney test (for non-normally distributed data) or t-test (for normally distributed data), for each data set.
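
      Why the nesting of cells within mice matters can be seen with a much simpler device than the LMM itself. The sketch below is not the authors' analysis and is not a mixed model: it only groups hypothetical (mouse, value) records by animal and compares the pooled mean across cells with the mean of per-mouse means. All names and numbers are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

def per_mouse_means(records):
    """Average the values of all cells recorded from the same mouse.

    `records` is a hypothetical list of (mouse_id, value) tuples, one per cell.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}

# Illustrative data: two cells from mouse m1, one cell from mouse m2.
records = [("m1", 1.0), ("m1", 3.0), ("m2", 10.0)]
mouse_means = per_mouse_means(records)                 # {'m1': 2.0, 'm2': 10.0}
pooled_mean = mean(value for _, value in records)      # treats every cell as independent
mean_of_mouse_means = mean(mouse_means.values())       # treats the mouse as the unit
```

      When animals contribute unequal numbers of cells, the two summaries diverge; a mixed model handles this by estimating between-mouse variability explicitly instead of discarding within-mouse information.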

      Also, the legend to Fig 9 indicates the presence of "a significant decrease in AP half-width from cHet in absence or presence of a-DTX", but the bar graph does not seem to show that.

      We apologize for our lack of clarity. In the legend of Fig. 9, we reported the statistical comparisons between 1) cHet mice in the absence of a-DTX and control mice and 2) cHet mice in the presence of a-DTX and control mice. We will rephrase the description of the results and the figure legend to avoid confusion.

      (3) The authors mention that the lack of differences in synaptic current kinetics is evidence against a change in subunit composition. However, in some Figures, for example, 3a, the kinetics of the recorded currents appear dramatically different. It would be important to know and compare the values of the series resistance between control and mutant animals.

      We agree with the reviewer that there appears to be a qualitative difference in eNMDA decay between conditions, although quantified eNMDA decay itself is similar between groups. We used a cutoff of 15% for the series resistance (Rs), which is significantly more stringent than the cutoffs typically used in electrophysiology, the vast majority of which are between 20 and 30%. To address this concern, we re-examined Rs and compared it between groups, and found no difference for Rs in eAMPA (13.2±0.5 in WT n=16 cells, 7 mice vs 13.7±0.3 in cHet n=14 cells, 7 mice, p=0.432, LMM) and eNMDA (12.7±0.7 in WT n=6 cells, 3 mice vs 13.8±0.7 in cHet n=6 cells, 5 mice, p=0.231, LMM). Thus, the apparent qualitative difference in eNMDA decay stems from inter-cell variability rather than inter-group differences. Notably, this discrepancy between the trace (Fig. 3a) and the data (Fig. 3f, right) is largely due to inter-cell variability, particularly in eNMDA, where a higher but non-significant decay rate is driven by a couple of very high values (Fig. 3f, right). In the revised manuscript, we will show traces that better represent our findings.

      (4) A significant unexplained variability is present in several datasets. For example, the AP threshold for PV+ includes points between -50 and -40 mV, but also values at around -20/-15 mV, which seems too depolarized to generate healthy APs (Fig 5c, Fig 7c).

      We acknowledge the variability in AP threshold data, with some APs appearing too depolarized to generate healthy spikes. However, we meticulously examined each AP that spiked at these depolarized thresholds and found that other intrinsic properties (such as Rin, Vrest, AP overshoot, etc.) all indicate that these cells are healthy. Therefore, to maintain objectivity and provide unbiased data to the community, we opted to include them in our analysis. It's worth noting that similar variability has been observed in other studies (Bengtsson Gonzales et al., 2020; Bertero et al., 2020).

      Further, we conducted a significance test on AP threshold excluding these potentially unhealthy cells and found that the significant differences persist. After removing two outliers from the cHet group with values of -16.5 and 20.6 mV, we obtain: -42.6±1.01 mV in control, n=33, 15 mice vs -36.2±1.1 mV in cHet, n=38 cells, 17 mice, ***p<0.001, LMM. Thus, whether these cells are included or excluded, our interpretations and conclusions remain unchanged.

      We would like to clarify that these data have not been corrected for the junction potential. We will add this information in the revised version.

      (5) I am unclear as to how the authors quantified colocalization between VGluts and PSD95 at the low magnification shown in Supplementary Figure 2.

      We apologize for our lack of clarity. Although the analysis was done at high resolution, the figures were focused on showing multiple PV somata receiving excitatory inputs. We will add higher magnification figures and more detailed information in the methods of the revised version. Please also see our response to reviewer #2.

      (6) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties", but this claim would seem to be directly refuted by the data of Fig 8f. In the absence of changes in either active or passive membrane properties, shouldn't the current/#AP plot remain unchanged?

      While we acknowledge the theoretical expectation that changes in intrinsic parameters should correlate with alterations in neuronal firing, the absence of differences in the parameters analyzed in this study should not overshadow the clear and significant decrease in firing rate observed in cHet SST+ cells. This decrease serves as a compelling indication of reduced intrinsic neuronal excitability. It's certainly possible that other intrinsic factors, not assessed in this study, may have contributed to this effect. However, exploring these mechanisms is beyond the scope of our current investigation. We will rephrase the discussion and add this limitation of our study in the revised version.

      (7) The plots used for the determination of AP threshold (Figs 5c, 7c, and 7h) suggest that the frequency of acquisition of current-clamp signals may not have been sufficient; this value is not included in the Methods section.

      This study utilized a sampling rate of 10 kHz, which is a standard rate for action potential analysis in the present context. We will describe more extensively the technical details in the method section of the revised manuscript we are preparing. While we acknowledge that a higher sampling rate could have enhanced the clarity of the phase plot, our recording conditions, as detailed in our response to Rev#2/comment#5, were suitable for the objectives of this study.

      Reference list

      Bengtsson Gonzales C, Hunt S, Munoz-Manchado AB, McBain CJ, Hjerling-Leffler J (2020) Intrinsic electrophysiological properties predict variability in morphology and connectivity among striatal Parvalbumin-expressing Pthlh-cells. Scientific Reports, 10, 15680. https://doi.org/10.1038/s41598-020-72588-1

      Bertero A, Zurita H, Normandin M, Apicella AJ (2020) Auditory long-range parvalbumin cortico-striatal neurons. Frontiers in Neural Circuits, 14, 45. http://doi.org/10.3389/fncir.2020.00045

      Chamberland S, Nebet ER, Valero M, Hanani M, Egger R, Larsen SB, Eyring KW, Buzsáki G, Tsien RW (2023) Brief synaptic inhibition persistently interrupts firing of fast-spiking interneurons. Neuron, 111, 1264–1281. http://doi.org/10.1016/j.neuron.2023.01.017

      Chehrazi P, Lee KKY, Lavertu-Jolin M, Abbasnejad Z, Carreño-Muñoz MI, Chattopadhyaya B, Di Cristo G (2023) The p75 neurotrophin receptor in preadolescent prefrontal parvalbumin interneurons promotes cognitive flexibility in adult mice. Biological Psychiatry, 94, 310–321. https://doi.org/10.1016/j.biopsych.2023.04.019

      Elabbady L, Seshamani S, Mu S, Mahalingam G, Schneider-Mizell C, Bodor AL, Bae JA, Brittain D, Buchanan J, Bumbarger DJ, Castro MA, Dorkenwald S, Halageri A, Jia Z, Jordan C, Kapner D, Kemnitz N, Kinn S, Lee K, Li K…Collman F (2024) Perisomatic features enable efficient and dataset wide cell-type classifications across large-scale electron microscopy volumes. bioRxiv, https://doi.org/10.1101/2022.07.20.499976

      Goldberg EM, Clark BD, Zagha E, Nahmani M, Erisir A, Rudy B (2008) K+ Channels at the axon initial segment dampen near-threshold excitability of neocortical fast-spiking GABAergic interneurons. Neuron, 58, 387–400. https://doi.org/10.1016/j.neuron.2008.03.003

      Golomb D, Donner K, Shacham L, Shlosberg D, Amitai Y, Hansel D (2007) Mechanisms of firing patterns in fast-spiking cortical interneurons. PLoS Computational Biology, 3(8), e156. http://doi.org/10.1371/journal.pcbi.0030156

      Hu H, Martina M, Jonas P (2010). Dendritic mechanisms underlying rapid synaptic activation of fast-spiking hippocampal interneurons. Science, 327, 52–58. http://doi.org/10.1126/science.1177876

      Hwang YS, Maclachlan C, Blanc J, Dubois A, Petersen CH, Knott G, Lee SH (2021). 3D ultrastructure of synaptic inputs to distinct gabaergic neurons in the mouse primary visual cortex. Cerebral Cortex, 31, 2610–2624. http://doi.org/10.1093/cercor/bhaa378

      Kavalali E (2015) The mechanisms and functions of spontaneous neurotransmitter release. Nature Reviews Neuroscience, 16, 5–16. https://doi.org/10.1038/nrn3875

      Kourrich S, Thomas MJ (2009) Similar neurons, opposite adaptations: psychostimulant experience differentially alters firing properties in accumbens core versus shell. Journal of Neuroscience, 29, 12275–12283. http://doi.org/10.1523/JNEUROSCI.3028-09.2009

      Kourrich S, Hayashi T, Chuang JY, Tsai SY, Su TP, Bonci A (2013) Dynamic interaction between sigma-1 receptor and Kv1.2 shapes neuronal and behavioral responses to cocaine. Cell, 152, 236–247. http://doi.org/10.1016/j.cell.2012.12.004

      Norenberg A, Hu H, Vida I, Bartos M, Jonas P (2010) Distinct nonuniform cable properties optimize rapid and efficient activation of fast-spiking GABAergic interneurons. Proceedings of the National Academy of Sciences, 107, 894–9. http://doi.org/10.1073/pnas.0910716107

      Stevens SR, Longley CM, Ogawa Y, Teliska LH, Arumanayagam AS, Nair S, Oses-Prieto JA, Burlingame AL, Cykowski MD, Xue M, Rasband MN (2021) Ankyrin-R regulates fast-spiking interneuron excitability through perineuronal nets and Kv3.1b K+ channels. Elife, 10, e66491. http://doi.org/10.7554/eLife.66491

      Russo G, Nieus TR, Maggi S, Taverna S (2013) Dynamics of action potential firing in electrically connected striatal fast-spiking interneurons. Frontiers in Cellular Neuroscience, 7, 209. https://doi.org/10.3389/fncel.2013.00209

      Ünal CT, Ünal B, Bolton MM (2020) Low-threshold spiking interneurons perform feedback inhibition in the lateral amygdala. Brain Structure and Function, 225, 909–923. http://doi.org/10.1007/s00429-020-02051-4

      Wang H, Kunkel DD, Schwartzkroin PA, Tempel BL (1994) Localization of Kv1.1 and Kv1.2, two K channel proteins, to synaptic terminals, somata, and dendrites in the mouse brain. The Journal of Neuroscience, 14, 4588-4599. https://doi.org/10.1523/JNEUROSCI.14-08-04588.1994

      Zhang YZ, Sapantzi S, Lin A, Doelfel SR, Connors BW, Theyel BB (2023) Activity-dependent ectopic action potentials in regular-spiking neurons of the neocortex. Frontiers in Cellular Neuroscience, 17. https://doi.org/10.3389/fncel.2023.1267687

      Zurita H, Feyen PLC, Apicella AJ (2018) Layer 5 callosal parvalbumin-expressing neurons: a distinct functional group of GABAergic neurons. Frontiers in Cellular Neuroscience, 12, 53. https://doi.org/10.3389/fncel.2018.00053

    1. Author response:

      Reviewer #1

      The first is that data on the general health of mice with single and double knockouts is not shown, nor is there any data on effects in any other tissues. This gives the impression that the only phenotype is in the male reproductive system, which would be misleading if there were phenotypes in other tissues that are not reported.

      We thank the reviewer for helpful and constructive suggestions that we plan to implement in the revision. We agree with this point and we will add a statement that the effect on the urogenital system was not the only observed phenotype, although it was the most striking histological feature that we found. We did notice some other physiological differences that we are examining in detail and determining their mechanisms, for future publications.

      Furthermore, data for the genitourinary system in single knockouts are very sparse; data are described for fertility in Figure 1H, ploidy, and cell number in Figures 2B and C, plasma testosterone and luteinizing hormone levels in Figures 5C and 5D, and morphology of testis and prostate tissue for single Cdk8 knockout in Supplementary Figure 1C (although in this case the images do not appear very comparable between control and CDK8 KO, thus perhaps wider fields should be shown), but, for example, there is no analysis of different meiotic stages or of gene expression in single knockouts. It is worth mentioning that single knockouts seem to show a corresponding upregulation of the level of the paralogue kinase, indicating that any lack of phenotypes might be due to feedback compensation, which would be an interesting finding if confirmed; this has not been mentioned.

      We agree that a description of the single KO could be beneficial, but we expect no big differences with the WT or Cre-Ert. We found neither histological differences nor changes in cell counts or ratios of cell types. Our ethical committee also has concerns about sacrificing mice without major phenotypic changes, without a well formulated hypothesis about the observed effects. We plan to add histological pictures to the next version of the article.

      We thank the reviewer for raising an important point about the paralog upregulation. Indeed, our data on primary cells (Supplementary Figure 1B) suggest the upregulation of CDK19 in CDK8KO and vice versa. We will point this out in the discussion. We plan to examine the data for the testis as soon as more tissues are available.

      The second major weakness is that the correlation between double knockout and reduced expression of genes involved in steroid hormone biosynthesis is portrayed as a causal mechanism for the phenotypes observed. While this is a possibility, there are no experiments performed to provide evidence that this is the case. Furthermore, there is no evidence showing that CDK8 and/or CDK19 are directly responsible for the transcription of the genes concerned.

      We agree with the reviewer that the effects on CDK8/CDK19/CCNC could lead to the observed transcriptional changes in multiple indirect steps. There are, however, major technical challenges in examining the binding of transcription factors in the tissue, especially in Leydig cells which are a relatively minor population. We will clarify it in the revision, and strengthen this point in the discussion.

      Finally, the authors propose that the phenotypes are independent of the kinase activity of CDK8 or CDK19 because treatment of mice for a month with an inhibitor does not recapitulate the effects of the knockout, and nor does expression of two steroidogenic genes change in cultured Leydig cells upon treatment with an inhibitor. However, there are no controls for effective target inhibition shown.

      We thank the reviewer for raising this concern, which we will address in the revision. This study used the same CDK8/19 inhibitor (SNX631-6) as the recently published study on prostate cancer (doi: 10.1172/JCI176709). That study describes the inhibitor, its target engagement in cell-free and cell-based assays, its anticancer potency, and its transcriptomic effects in vivo at the same dosage strength as in the present study, which phenocopy the effects of CDK8/19 knockdown. Additional data will be included in the revision.

      Reviewer #2

      The claim of reproductive defects in the induced double knockout of CDK8/19 resulted from the loss of CCNC via a kinase-independent mechanism is interesting but was not supported by the data presented. While the construction and analysis of the systemic induced knockout model of Cdk8 in Cdk19KO mice is not trivial, the analysis and data are weakened by the systemic effect of Cdk8 loss, making it difficult to separate the systemic effect from the local testis effect.

      We agree with the reviewer that the effects on the testis could be due to the systemic loss of CDK8 rather than to its loss specifically in the testis, and we will clarify this in the revision. We will also clarify that, although our results suggest that the effects of CDK8/19 knockout are kinase-independent and that the loss of Cyclin C is a likely explanation for this kinase independence, we do not claim that it is the mechanism.

      The analysis of male sterile phenotype is also inadequate with poor image quality, especially testis HE sections. The male reproductive tract picture is also small and difficult to evaluate.

      Unfortunately, the quality of the images deteriorated during the submission process through bioRxiv. We uploaded the high-resolution pictures for the journal, but they were probably not made available to the reviewer. We will re-send the high-resolution images.

      The mice crossing scheme is unusual as you have three mice to cross to produce genotypes, while we could understand that it is possible to produce pups of desired genotypes with different mating schemes, such a vague crossing scheme is not desirable and of poor genetics practice.

      We thank the reviewer for this suggestion. Indeed, our scheme is not a representation of the actual breeding scheme but just a brief explanation of the lineages used to obtain the triple transgenic mice. We will include the full crossing scheme in the revision.

      Also using TAM-treated wild type as control is ok, but a better control will be TAM-treated ERT2-cre; CDK8f/f or TAM-treated ERT2 Cre CDK19/19 KO, so as to minimize the impact from the well-recognized effect of TAM.

      We used TAM-treated ERT2-cre for most of the experiments, and did not observe any major histological or physiological differences with the WT+TAM. We will make sure to present them in the revision.

      While the authors proposed that the inducible loss of CDK8 in the CDK19 knockout background is responsible for spermatogenic defects, it was not clear in which cells CDK8/19 genes are expressed and which cell types might have a major role in spermatogenesis. The authors also put forward the evidence that reduction/loss of testosterone might be the main cause of spermatogenic defects, which is consistent with the expression change in genes involved in the steroidogenesis pathway in Leydig cells of the inducible double knockout. However, it is not clear how the loss of testosterone contributed to the loss of CcnC protein.

      We agree with the reviewer that the spermatogenic defects could be caused by the effects on gene expression in tissues other than Leydig cells. Nevertheless, this is our primary hypothesis since these changes resemble the effects of chemical castration in rats (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408499/), and in SCARKO mice (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3968405/).

      Our hypothesis is actually the reverse of the scenario proposed by the reviewer. We think that the loss of steroidogenic gene expression is caused by the loss of CDK8/19 and Cyclin C in Leydig cells. This, in turn, leads to a drop in testosterone levels. We will expand this explanation for clarity.

      The authors should clarify or present the data on where CDK8 and CDK19 as well as CcnC are expressed so as to help the readers understand which tissues both CDK might be functioning in and cause the loss of CcnC. It should be easier to test the hypothesis of CDK8/19 stabilizing CcnC protein using double knock-out primary cells, instead of the whole testis.

      The stabilizing effect of Cdk8/19 on CcnC has been previously discovered and reported in cell culture (doi: 10.1093/nar/gkad538), and here we have confirmed it at the level of whole tissue. Due to the limited sensitivity of single-cell sequencing (only ~5,000 transcripts are sequenced out of an average total of 500,000 transcripts per cell, so low-expressed transcripts are not sequenced in all cells), it is challenging to firmly establish CDK8/19-positive and -negative tissues from single-cell data because both transcripts are minor. This image will be included in the next version. We plan to resolve this matter using two approaches. First, we will try immunohistochemistry. If this method is not sufficiently sensitive, we will analyze published single-cell sequencing data from mouse databases and re-analyze our data. So far the former approach has been challenging for us due to the absence of anti-mouse antibodies that are specific for CDK8 and CDK19 and work on tissue sections. Neither we nor others could produce tissue-specific staining with the currently available commercial antibodies. The only published specific antibody is currently not available.
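
      The sensitivity argument above can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch, not the authors' analysis: it assumes that each transcript molecule is captured independently with the same probability (5,000 / 500,000 = 1%), and the function name and example copy numbers are hypothetical.

```python
def detection_probability(copies_per_cell, sampled=5_000, total=500_000):
    """Chance that at least one copy of a transcript is sequenced in a cell.

    Assumption (illustrative only): every transcript molecule is captured
    independently with probability sampled / total.
    """
    capture_rate = sampled / total
    # Probability that all copies are missed is (1 - rate) ** copies.
    return 1.0 - (1.0 - capture_rate) ** copies_per_cell


if __name__ == "__main__":
    # Hypothetical copy numbers, chosen only to show the trend.
    for copies in (1, 5, 20, 100):
        p = detection_probability(copies)
        print(f"{copies:>3} copies per cell: detected in about {p:.1%} of cells")
```

      Under these assumptions a transcript present at one copy per cell would be detected in roughly 1% of cells, and even 100 copies per cell would be missed in more than a third of cells, which is why a zero count for a minor transcript does not establish that a cell type is negative.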

      Since CDK8KO and CDK19KO have significantly reduced fertility compared to the wild type, it might be important to measure the sperm quantity and motility among CDK8 KO, CDK19KO, and induced DKO to evaluate spermatogenesis based on their sperm production.

      We agree that this is an interesting question. We did not do spermograms for single KOs, but we don’t think that a decreased sperm count would explain CDK8KO infertility, as vasectomized males are able to produce copulatory plugs in females whereas CDK8KO males are not, suggesting the absence of mating behavior as a reason for low fertility in the latter genotype.

      Some data for the inducible knockout efficiency of Cdk8 were presented in Supplemental Figure 1, but there is no legend for the supplemental figures, it was not clear which band represented the deletion band, and which tissues were examined. Tail or testis?

      We apologize for the accidental loss of supplementary figure legends, which will be presented in the next version. The efficiency of CDK8 KO in different tissues was previously examined by us in https://www.ncbi.nlm.nih.gov/gene/264064. The western blot in the MS represents deletion data for the testis.

      It seems that two months after the injection of Tam, all the Cdk8 were completely deleted, indicating extremely efficient deletion of Tam induction by two months post administration. Were the complete deletion of Cdk8 happening even earlier?

      The complete deletion of CDK8 occurs within a week, or even as early as 2-3 days, in culture, and within two weeks at most in vivo. We chose the two-month period to prevent the effect of tamoxifen on gene expression. We examined other time points (Figure 6) and registered the beginning of effects at 2 weeks and the maximum effect by one month.

      The authors found that Sertoli cells re-entered the cell cycle in the inducible double knockout but stopped short of careful characterization other than increased expression of cell cycle genes.

      We agree with the reviewer, and we will add Ki67 (or equivalent) staining along with Sertoli cell markers.

      Dko should be appropriately named iDKO (induced dKO).

      We will make the corresponding change.

      We performed necropsy ? not the right wording here. Colchicine-like apoptotic bodies ? what does this mean? Not clear.

      We will amend the next version to address these minor points, and we thank the reviewer for careful reading of the manuscript.

      Images throughout the manuscript suffer from poor resolution and are often blurry and hard to evaluate.

      As mentioned above, we had a problem with image quality during the submission through bioRxiv, and we will provide high-resolution images in the next version.

      To pinpoint the meiotic stage defect of iDKO, it is better to use the meiotic chromosome spread approach.

      Unfortunately, meiotic spreads would not be feasible or informative, due to a low number of surviving cells in iDKO and the fact that there were evidently no cells in stages after SYCP3+.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you very much for the careful and positive reviews of our manuscript. We have addressed each comment in the attached revised manuscript. We describe the modifications below. To avoid confusion, we've changed supplementary figure and table captions to start with "Supplement Figure" and "Supplementary Table," instead of "Figure" and "Table."

      We have modified/added:

      ● Supplementary Table S1: AUC scores for the top 10 frequent epitope types (pathogens) in the testing set of epitope split.

      ● Supplementary Table S5: AUCs of TCR-epitope binding affinity prediction models with BLOSUM62 to embed epitope sequences.

      ● Supplementary Table S6: AUCs of TCR-epitope binding affinity prediction models trained on catELMo TCR embeddings and random-initialized epitope embeddings.

      ● Supplementary Table S7: AUCs of TCR-epitope binding affinity prediction models trained on catELMo and BLOSUM62 embeddings.

      ● Supplementary Figure 4: TCR clustering performance for the top 34 abundant epitopes representing 70.55% of TCRs in our collected databases.

      ● Section Discussion.

      ● Section 4.1 Data: TCR-epitope pairs for binding affinity prediction.

      ● Section 4.4.2 Epitope-specific TCR clustering.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors described a computational method catELMo for embedding TCR CDR3 sequences into numeric vectors using a deep-learning-based approach, ELMo. The authors applied catELMo to two applications: supervised TCR-epitope binding affinity prediction and unsupervised epitope-specific TCR clustering. In both applications, the authors showed that catELMo generated significantly better binding prediction and clustering performance than other established TCR embedding methods. However, there are a few major concerns that need to be addressed.

      (1) There are other TCR CDR3 embedding methods in addition to TCRBert. The authors may consider incorporating a few more methods in the evaluation, such as TESSA (PMCID: PMC7799492), DeepTCR (PMCID: PMC7952906) and the embedding method in ATM-TCR (reference 10 in the manuscript). TESSA is also the embedding method in pMTnet, which is another TCR-epitope binding prediction method and is the reference 12 mentioned in this manuscript.

      TESSA is designed for characterizing TCR repertoires, so we initially excluded it from the comparison. Our focus was on models developed specifically for amino acid embedding rather than TCR repertoire characterization. However, to address the reviewer's inquiry, we conducted further evaluations. Since both TESSA and DeepTCR used autoencoder-based models to embed TCR sequences, we selected the one used in TESSA for evaluation in our downstream prediction task, conducting ten trials in total. It achieved an average AUC of 75.69 in TCR split and 73.3 in epitope split. Notably, catELMo significantly outperformed it, with an AUC of 96.04 in TCR split and 94.10 in epitope split.

      Regarding the embedding method in ATM-TCR, it simply uses BLOSUM as an embedding matrix which we have already compared in Section 2.1. Furthermore, we have provided the comparison results between our prediction model trained on catELMo embeddings with the state-of-the-art prediction models such as netTCR and ATM-TCR in Table 6 of the Discussion section.

      (2) The TCR training data for catELMo is obtained from the ImmunoSEQ platform, including SARS-CoV2, EBV, CMV, and other disease samples. Meanwhile, antigens related to these diseases and their associated TCRs are extensively annotated in databases VDJdb, IEDB and McPAS-TCR. The authors then utilized the curated TCR-epitope pairs from these databases to conduct the evaluations for epitope binding prediction and TCR clustering. Therefore, the training data for TCR embedding may already be implicitly tuned for better representations of the TCRs used in the evaluations. This seems to be true based on Table 4, as BERT-Base-TCR outperformed TCRBert. Could catELMo be trained on PIRD as TCRBert to demonstrate catELMo's embedding for TCRs targeting unseen diseases/epitopes?

      We would like to note that catELMo was trained exclusively on TCR sequences in an unsupervised manner, which means it has never been exposed to antigen information. We also ensured that the TCRs used in catELMo's training did not overlap with our downstream prediction data. Please refer to the section 4.1 Data where we explicitly stated, “We note that it includes no identical TCR sequences with the TCRs used for training the embedding models.”. Moreover, the performance gap (~1%) between BERT-Base-TCR and TCRBert, as observed in Table 4, is relatively small, especially when compared to the performance difference (>16%) between catELMo and TCRBert.

      To further address this concern, we conducted experiments using the same number of TCRs, 4,173,895 in total, sourced exclusively from healthy ImmunoSeq repertoires. This alternative catELMo model demonstrated a similar prediction performance (based on 10 trials) to the one reported in our paper, with an average AUC of 96.35% in TCR split and an average AUC of 94.03% in epitope split.

      We opted not to train catELMo on the PIRD dataset for several reasons. First, approximately 7.8% of the sequences in PIRD also appear in our downstream prediction data, which could be a potential source of bias. Furthermore, PIRD encompasses sequences related to diseases such as Tuberculosis, HIV, CMV, among others, which the reviewer is concerned about.

      (3) In the application of TCR-epitope binding prediction, the authors mentioned that the model for embedding epitope sequences was catELMo, but how about for other methods, such as TCRBert? Do the other methods also use catELMo-embedded epitope sequences as part of the binding prediction model, or use their own model to embed the epitope sequences? Since the manuscript focuses on TCR embedding, it would be nice for other methods to be evaluated on the same epitope embedding (maybe adjusted to the same embedded vector length).

      Furthermore, the authors found that catELMo requires less training data to achieve better performance. So one would think the other methods could not learn a reasonable epitope embedding with limited epitope data, and catELMo's better performance in binding prediction is mainly due to better epitope representation.

      Reviewers 1 and 3 raised similar concerns regarding the epitope embedding approach employed in our binding affinity prediction models. We address both comments together on page 6, where we discuss the epitope embedding strategies in detail.

      (4) In the epitope binding prediction evaluation, the authors generated the test data using TCR-epitope pairs from VDJdb, IEDB, McPAS, which may be dominated by epitopes from CMV. Could the authors show accuracy categorized by epitope types, i.e. the accuracy for TCR-CMV pair and accuracy for TCR-SARs-CoV2 separately?

      The categorized AUC scores have been added in Supplementary Table 7. We observed significant performance boosts from catELMo compared with other embedding models.

      (5) In the unsupervised TCR clustering evaluation, GIANA and TCRdist directly output the clustering result, so they should not be affected by hierarchical clustering. Why did the curves of GIANA and TCRdist change in Figure 4 when relaxing the hierarchical clustering threshold?

      For fair comparisons, we performed GIANA and TCRdist with hierarchical clustering instead of the nearest neighbor search. We have clarified it in the revised manuscript as follows.

      “Both methods are developed on the BLOSUM62 matrix and apply nearest neighbor search to cluster TCR sequences. GIANA used the CDR3 of TCRβ chain and V gene, while TCRdist predominantly experimented with CDR1, CDR2, and CDR3 from both TCRα and TCRβ chains. For fair comparisons, we perform GIANA and TCRdist only on CDR3 β chains and with hierarchical clustering instead of the nearest neighbor search.”

      (6 & 7) In the unsupervised TCR clustering evaluation, the authors examined the TCRs related to the top eight epitopes. However, there are many more epitopes curated in VDJdb, IEDB and McPAS-TCR. In real applications, the set of potential epitopes is also more complex than just eight epitopes. Could the authors evaluate the clustering result using all the TCR data from the databases? In addition to NMI, it is important to know how specific each TCR cluster is. Could the authors add the fraction of pure clusters in the results? A pure cluster means all the TCRs in the cluster bind to the same epitope, and it is a metric used in the method GIANA.

      We would like to note that there is a significant disparity in TCR binding frequencies across different epitopes in current databases. For instance, the most abundant epitope (KLGGALQAK) has approximately 13k TCRs binding to it, while 836 out of 982 epitopes are associated with fewer than 100 TCRs in our dataset. Furthermore, there are 9347 TCRs having the ability to bind multiple epitopes. In order to robustly evaluate the clustering performance, we originally selected the top eight frequent epitopes from McPAS and removed TCRs binding multiple epitopes to create a more balanced dataset.

      We acknowledge that the real-world scenario is more complex than just eight epitopes. Therefore, we conducted clustering experiments using the top most abundant epitopes whose combined cognate TCRs make up at least 70% of TCRs across three databases (34 epitopes). This is illustrated in Supplementary Figure 5. Furthermore, we extended our analysis by clustering all TCRs after filtering out those that bind to multiple epitopes, resulting in 782 unique epitopes. We found that catELMo achieved the 3rd and 2nd best performance in NMI and Purity, respectively (see Table below). These are aligned with our previous observations of the eight epitopes.

      Author response table 1.
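
      To make the purity metrics discussed above concrete, here is a minimal sketch of two common definitions: overall purity (the share of TCRs that carry the majority epitope of their cluster) and the fraction of pure clusters (clusters whose TCRs all bind one epitope). Which definition, and which handling of singleton clusters, was used for the table is not stated in this response, so the function below is an illustrative assumption rather than the authors' or GIANA's implementation.

```python
from collections import Counter, defaultdict


def cluster_purity_metrics(cluster_ids, epitope_labels):
    """Return (purity, fraction_of_pure_clusters) for a flat clustering.

    cluster_ids[i] is the cluster assigned to TCR i and epitope_labels[i]
    is the epitope that TCR i binds. Singleton clusters count as pure here,
    which is an assumption; some tools exclude them.
    """
    if len(cluster_ids) != len(epitope_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    members = defaultdict(list)
    for cluster, epitope in zip(cluster_ids, epitope_labels):
        members[cluster].append(epitope)

    majority_total = 0  # TCRs that match their cluster's majority epitope
    pure_clusters = 0   # clusters containing a single epitope label
    for labels in members.values():
        counts = Counter(labels)
        majority_total += counts.most_common(1)[0][1]
        if len(counts) == 1:
            pure_clusters += 1

    purity = majority_total / len(cluster_ids)
    pure_fraction = pure_clusters / len(members)
    return purity, pure_fraction


if __name__ == "__main__":
    # Toy labels "A", "B", "C" stand in for epitopes; they are placeholders.
    clusters = [0, 0, 0, 1, 1, 2]
    epitopes = ["A", "A", "B", "B", "B", "C"]
    print(cluster_purity_metrics(clusters, epitopes))
```

      In the toy input, cluster 0 mixes two epitopes, so only two of the three clusters are pure, while five of the six TCRs match their cluster's majority epitope.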

      Reviewer #2 (Public Review):

      In the manuscript, the authors highlighted the importance of T-cell receptor (TCR) analysis and the lack of amino acid embedding methods specific to this domain. The authors proposed a novel bi-directional context-aware amino acid embedding method, catELMo, adapted from ELMo (Embeddings from Language Models), specifically designed for TCR analysis. The model is trained on TCR sequences from seven projects in the ImmunoSEQ database, instead of the generic protein sequences. They assessed the effectiveness of the proposed method in both TCR-epitope binding affinity prediction, a supervised task, and the unsupervised TCR clustering task. The results demonstrate significant performance improvements compared to existing embedding models. The authors also aimed to provide and discuss their observations on embedding model design for TCR analysis: 1) Models specifically trained on TCR sequences have better performance than models trained on general protein sequences for the TCR-related tasks; and 2) The proposed ELMo-based method outperforms TCR embedding models with BERT-based architecture. The authors also provided a comprehensive introduction and investigation of existing amino acid embedding methods. Overall, the paper is well-written and well-organized.

      The work has originality and has potential prospects for immune response analysis and immunotherapy exploration. TCR-epitope pair binding plays a significant role in T cell regulation. Accurate prediction and analysis of TCR sequences are crucial for comprehending the biological foundations of binding mechanisms and advancing immunotherapy approaches. The proposed embedding method presents an efficient context-aware mathematical representation for TCR sequences, enabling the capture and analysis of their structural and functional characteristics. This method serves as a valuable tool for various downstream analyses and is essential for a wide range of applications. Thank you.

      Reviewer #3 (Public Review):

      Here, the authors trained catELMo, a new context-aware embedding model for TCRβ CDR3 amino acid sequences for TCR-epitope specificity and clustering tasks. This method benchmarked existing work in protein and TCR language models and investigated the role that model architecture plays in the prediction performance. The major strength of this paper is comprehensively evaluating common model architectures used, which is useful for practitioners in the field. However, some key details were missing to assess whether the benchmarking study is a fair comparison between different architectures. Major comments are as follows:

      • It is not clear why epitope sequences were also embedded using catELMo for the binding prediction task. Because catELMo is trained on TCRβ CDR3 sequences, it's not clear what benefit would come from this embedding. Were the other embedding models under comparison also applied to both the TCR and epitope sequences? It may be a fairer comparison if a single method is used to encode epitope sequence for all models under comparison, so that the performance reflects the quality of the TCR embedding only.

      In our study, we indeed used the same embedding model for both TCRs and epitopes in each prediction model, ensuring a consistent approach throughout.

      Recognizing the importance of evaluating the impact of epitope embeddings, we conducted experiments in which we used BLOSUM62 matrix to embed epitope sequences for all models. The results (Supplementary Table 5) are well aligned with the performance reported in our paper. This suggests that epitope embedding may not play as critical a role as TCR embedding in the prediction tasks. To further validate this point, we conducted two additional experiments.

      Firstly, we used catELMo to embed TCRs while employing randomly initialized embedding matrices with trainable parameters for epitope sequences. It yielded similar prediction performance as when catELMo was used for both TCR and epitope embedding (Supplementary Table 6). Secondly, we utilized BLOSUM62 to embed TCRs but employed catELMo for epitope sequence embedding, resulting in performance comparable to using BLOSUM62 for both TCRs and epitopes (Supplementary Table 4). These experiment results confirmed the limited impact of epitope embedding on downstream performance.

      We conjecture that these results may be attributed to the significant disparity in data scale between TCRs (~290k) and epitopes (less than 1k). Moreover, TCRs tend to exhibit high similarity, whereas epitopes display greater distinctiveness from one another. These features of TCRs require robust embeddings to facilitate effective separation and improve downstream performance, while epitope embedding primarily serves as a categorical encoding.

      We have included a detailed discussion of these findings in the revised manuscript to provide a comprehensive understanding of the role of epitope embeddings in TCR binding prediction.

      • The tSNE visualization in Figure 3 is helpful. It makes sense that the last hidden layer features separate well by binding labels for the better performing models. However, it would be useful to know if positive and negative TCRs for each epitope group also separate well in the original TCR embedding space. In other words, how much separation between these groups is due to the neural network vs just the embedding?

      It is important to note that we used the same downstream prediction model, a simple three-linear-layer network, for all the discussed embedding methods. We believe that the separation observed in the t-SNE visualization effectively reflects the ability of our embedding model. Also, we would like to mention that it can be hard to see a clear distinction between positive and negative TCRs in the original embedding space because embedding models were not trained on positive/negative labels. Please refer to the t-SNE of the original TCR embeddings below.

      Author response image 1.

      • To generate negative samples, the author randomly paired TCRs from healthy subjects to different epitopes. This could produce issues with false negatives if the epitopes used are common. Is there an estimate for how frequently there might be false negatives for those commonly occurring epitopes that most populations might also have been exposed to? Could there be a potential batch effect for the negative sampled TCR that confounds with the performance evaluation?

      Thank you for bringing this valid and interesting point up. Generating negative samples is non-trivial since only a limited number of non-binding TCR-epitope pairs are publicly available and experimentally validating non-binding pairs is costly [1]. Standard practices for generating negative pairs are (1) pairing epitopes with healthy TCRs [2, 3], and (2) randomly shuffling existing TCR-epitope pairs [4, 5]. We used both approaches (the former included in the main results, and the latter in the discussion). In both scenarios, catELMo embeddings consistently demonstrated superior performance.
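
      The second strategy, random shuffling of existing pairs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the manuscript: the function name, the rejection of known positive pairs, and the placeholder identifiers are all hypothetical. Any sampled pair is only presumed to be non-binding, which is the false-negative risk the reviewer raises.

```python
import random


def shuffle_negatives(positive_pairs, n_negatives, seed=0):
    """Sample presumed non-binding (tcr, epitope) pairs by re-pairing positives.

    A candidate is rejected if it is a known positive pair. Pairs that are
    returned have simply never been observed to bind (assumption).
    """
    rng = random.Random(seed)
    known = set(positive_pairs)
    tcrs = sorted({tcr for tcr, _ in known})
    epitopes = sorted({epitope for _, epitope in known})

    max_possible = len(tcrs) * len(epitopes) - len(known)
    if n_negatives > max_possible:
        raise ValueError(f"only {max_possible} distinct negative pairs exist")

    negatives = set()
    while len(negatives) < n_negatives:
        candidate = (rng.choice(tcrs), rng.choice(epitopes))
        if candidate not in known:
            negatives.add(candidate)
    return sorted(negatives)


if __name__ == "__main__":
    # Placeholder identifiers, not real sequences.
    positives = [("TCR_1", "EPI_A"), ("TCR_2", "EPI_B"), ("TCR_3", "EPI_C")]
    for pair in shuffle_negatives(positives, n_negatives=4, seed=1):
        print(pair)
```

      Changing the seed yields a different negative set, which is one way to probe the sensitivity of the evaluation to the sampled negatives.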

      We acknowledge the possibility of false negatives due to the finite-sized TCR database from which we randomly selected TCRs; however, we believe that the likelihood of such occurrences is low. Given the vast diversity of human TCR clonotypes, which can exceed 10^15 [6], the chance of randomly selecting a TCR that specifically recognizes a target epitope is relatively small.

      In order to investigate the batch effect, we generated new negative pairs using different seeds and observed consistent prediction performance across these variations. However, we agree that there could still be a potential batch effect for the negative samples due to potential data bias.

      We have discussed the limitations of generating negative samples in the revised manuscript.

      • Most of the models being compared were trained on general proteins rather than TCR sequences. This makes their comparison to catELMo questionable since it's not clear if the improvement is due to the training data or architecture. The authors partially addressed this with BERT-based models in section 2.4. This concern would be more fully addressed if the authors also trained the Doc2vec model (Yang et al, Figure 2) on TCR sequences as baseline models instead of using the original models trained on general protein sequences. This would make clear the strength of context-aware embeddings if the performance is worse than catELMo and BERT.

      We agree it is important to distinguish between the effects of training data and architecture on model performance.

      In Section 2.4, as the reviewer mentioned, we compared catELMo with BERT-based models trained on the same TCR repertoire data, demonstrating that architecture plays a significant role in improving performance. Furthermore, in Section 2.5, we compared catELMo-shallow with SeqVec, which share the same architecture but were trained on different data, highlighting the importance of data on the model performance.

      To further address the reviewer's concern, we trained a Doc2Vec model on the TCR sequences that have been used for catELMo training. We observed significantly lower prediction performance compared to catELMo, with an average AUC of 50.24% in TCR split and an average AUC of 51.02% in epitope split, making the strength of context-aware embeddings clear.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is known that, besides the TRB CDR3, the CDR1 and CDR2 on the TRBV gene and the TCR alpha chain also contribute to epitope recognition, but they were not modeled in catELMo. It would be nice for the authors to add this as a current limitation for catELMo in the Discussion section.

      We have discussed the limitation in the revised manuscript.

      “Our study focuses on modeling the TCRβ chain CDR3 region, which is known as the primary determinant of epitope binding. Other regions, such as CDR1 and CDR2 on the TRB V gene, along with the TCRα chain, may also contribute to specificity in antigen recognition. However, a limited number of available samples for those additional features can be a challenge for training embedding models. Future work may explore strategies to incorporate these regions while mitigating the challenges of working with limited samples.”

      (2) I tried to follow the instructions to train a binding affinity prediction model for TCR-epitope pairs; however, cachetools=5.3.0 could not be found when running "pip install -r requirements.txt" in the conda environment bap. Is this cachetools version only supported from Python 3.7 onwards, so that the Python 3.6.13 suggested on the GitHub repo might not work?

      This has been fixed. We have updated the README.md on our GitHub page.

      Reviewer #2 (Recommendations For The Authors):

      The article is well-constructed and well-written, and the analysis is comprehensive.

      The comments for minor issues that I have are as follows:

      (1) In the Methods section, it will be clearer if the authors interpret more on how the standard deviation is calculated in all tables. How to define the '10 trials'? Are they based on different random training and test set splits?

      '10 trials' refers to the process of splitting the dataset into training, validation, and testing sets using different seeds for each trial. Different trials have different training, validation, and testing sets. For each trial, we trained a prediction model on its training set and measured performance on its testing set. The standard deviation was calculated from the 10 measurements, estimating model performance variation across different random splits of the data.
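
      The trial procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 80/10/10 split fractions, the use of the sample standard deviation, and the names split_indices, summarize_trials, and train_and_score are assumptions introduced for the example.

```python
import random
import statistics


def split_indices(n_samples, seed, train_frac=0.8, valid_frac=0.1):
    """Shuffle sample indices with a seed and cut them into three disjoint sets.

    The split fractions are illustrative assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n_samples)
    n_valid = int(valid_frac * n_samples)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


def summarize_trials(n_samples, train_and_score, seeds=range(10)):
    """Run one trial per seed and return (mean, standard deviation) of the scores.

    train_and_score(train, valid, test) is a caller-supplied placeholder that
    trains a model on the training indices and returns its test-set score,
    for example an AUC.
    """
    scores = []
    for seed in seeds:
        train, valid, test = split_indices(n_samples, seed)
        scores.append(train_and_score(train, valid, test))
    # Sample standard deviation (n - 1 denominator); whether the manuscript
    # used this or the population form is not stated.
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Dummy scorer so the sketch runs end to end: the share of even test indices.
    def dummy_score(train, valid, test):
        return sum(1 for i in test if i % 2 == 0) / len(test)

    mean_score, sd_score = summarize_trials(1000, dummy_score)
    print(f"mean = {mean_score:.3f}, sd = {sd_score:.3f}")
```

      Because every seed produces a different partition, the reported standard deviation reflects variation across data splits as well as across model fits.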

      (2) The format of AUCs and the improvement of AUCs need to be consistent, i.e., with the percent sign.

      We have updated the format of AUCs.

      Reviewer #3 (Recommendations For The Authors):

      In addition to the recommendations in the public review, we had the following more minor questions and recommendations:

      • Could you provide some more background on the data, such as overlaps between the databases, and how the training and validation split was performed between the three databases? Also summary statistics on the length of TCR and epitope sequence data would be helpful.

      We have provided more details about data in our revision.

      • Could you comment on the runtime to train and embed using the catELMo and BERT models?

      Our training data consists of TCR sequences with relatively short lengths (averaging less than 20 amino acid residues). This characteristic significantly reduces the computational resources required compared to training large-scale language models on extensive text corpora. Using standard machines equipped with two GeForce RTX 2080 GPUs, we were able to complete the training tasks within a matter of days. After training, embedding one sequence can be accomplished in a matter of seconds.

      • Typos and wording:

      • Table 1 first row of "source": "immunoSEQ" instead of "immuneSEQ"

      This has been corrected.

      • L23 of abstract "negates the need of complex deep neural network architecture" is a little confusing because ELMo itself is a deep neural network architecture. Perhaps be more specific and add that the need is for downstream tasks.

      We have made it more specific in our abstract.

      “...negates the need for complex deep neural network architecture in downstream tasks.”

      References

      (1) Montemurro, Alessandro, et al. "NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data." Communications biology 4.1 (2021): 1060.

      (2) Jurtz, Vanessa Isabell, et al. "NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks." BioRxiv (2018): 433706.

      (3) Gielis, Sofie, et al. "Detection of enriched T cell epitope specificity in full T cell receptor sequence repertoires." Frontiers in immunology 10 (2019): 2820.

      (4) Cai, Michael, et al. "ATM-TCR: TCR-epitope binding affinity prediction using a multi-head self-attention model." Frontiers in Immunology 13 (2022): 893247.

      (5) Weber, Anna, et al. "TITAN: T-cell receptor specificity prediction with bimodal attention networks." Bioinformatics 37 (2021): i237-i244.

      (6) Lythe, Grant, et al. "How many TCR clonotypes does a body maintain?." Journal of theoretical biology 389 (2016): 214-224.

    1. Author response:

      eLife assessment

      This is an important study describing a neuromuscular junction co-culture system using human cells that the authors use to study the synaptic consequences of ALS mutations. The data supporting the system are solid and show the value of using myotubes and motor neurons from the same donor. The study will be of interest to researchers who model neuromuscular junction disorders, however, the authors could more comprehensively compare and contrast their system with previous literature describing other similar models. There are also technical weaknesses that limit the interpretation of specific findings.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors propose an improved neuro-muscle co-culture system to study ALS-related functional differences in human pluripotent stem cell lines.

      Strengths:

      A simple co-culture system with functional readout.

      We appreciate the recognition that this is a simplified co-culture system with a straight-forward functional evaluation.

      Weaknesses:

      There are concerns about the lack of novelty, rigor, and clarity in the approach. The strength of the study is undermined by its reliance on transcription factors used more than a decade ago, low myocyte activity, and inadequate validation methods, such as the lack of single-cell transcriptome analysis and detailed neuromuscular synapse characterization. The evidence presented requires substantial validation through rigorous experimental approaches and resolution of the identified concerns for the study's findings to be considered significant and reliable.

      The muscle differentiation protocol used in our work is an adaptation of Albini S, et al. Cell Rep. 2013. This protocol was selected for its efficiency in differentiating skeletal muscles from pluripotent stem cells (PSCs). Modifications from the original publication were made to the plasmids used (MYOD and BAF60C), such as the inclusion of selection genes, puromycin and blasticidin, to improve efficiency. Moreover, a criticism of the previously used overexpression system, especially overexpression of MYOD, is that it introduces artificial expression of this gene throughout muscle differentiation, when it is only supposed to be expressed early in myogenesis. Thus, the constructs used in our work are dox inducible, which enables us to control the expression of MYOD and restrict it to the first 48 hours. This protocol resulted in a highly efficient skeletal muscle differentiation, as noted in our manuscript: “The PSC-derived skeletal muscles were characterized by the presence of Desmin (DES) and Myosin Heavy Chain (MHC), and as early as day 8 of differentiation nearly 100% of the cells co-expressed these markers.” We agree with the reviewer that the myocyte activity identified in our work is lower compared to Albini et al. (2013), mostly explained by the modification we made to the method, from a 3D to a 2D culture. In Albini et al. (2013) the electrophysiological properties were assayed in skeletal myospheres (3D), which are known to improve contractility measurements. Conversely, in 2D cultures, when the contractility intensifies, the cells detach from the plate. Thus, tight regulation of cell concentration is required for optimal maturation and formation of a contractile skeletal muscle culture without premature detachment of the cells.

      We believe that single-cell or single-nuclei transcriptome analysis from the co-culture setting of two well-defined cell types might yield little value for method characterization. However, as part of a follow-up study, we are performing morphological NMJ characterization and applying single-nuclei transcriptome analysis in the fALS disease context to identify specific molecular mechanisms that result in synaptic dysfunction.

      Reviewer #2 (Public Review):

      The manuscript by Chen et al from the group of Helen Miranda aims to describe an improved neuromuscular junction (NMJ) model to study synaptic dysfunction in several cases of familial ALS. Overall, the system described in the paper appears as a valid platform to study disease phenotypes with exciting results showing specific effects of GDNF on non-SOD1 ALS patient lines. The strength of the paper lies in the use of myotubes, and motor neurons derived from the same donor. However, the current study: (1) lacks a clear comparison of the current system with numerous previously described systems; (2) is limited by the number of repeat experiments in the study and (3) has no description of the synaptic phenotype observed in the study. These major points are discussed in more detail below.

      We appreciate the recognition that “the system described in the paper appears as a valid platform to study disease phenotypes with exciting results showing specific effects of GDNF on non-SOD1 ALS patient lines” and the careful evaluation of our work. We plan to address the points raised by this reviewer in the revision.

      Major points:

      (1) In the introduction the authors state (p. 4): "Finally, recent human NMJ models have been established from PSCs by differentiating these cells into both skeletal muscles and motor neurons in 2D and 3D formats. These previous systems present a remarkable advancement to the studies of human NMJs, however, they require long NMJ formation and maturation time (40 to 60 days), which, restricts their sensitivity and scalability [42]"

      In fact, a number of studies have described various in-vitro NMJ systems, with the same timeframes for NMJ formation. For example, in studies by Osaki et al, 2018, Sci Adv; Bellmann et al, 2019, Biomat; Demestre et al, 2015, Stem Cell Res; Badu-Mensah et al, 2022, Biomat (this is just an exemplar selection of the papers); NMJ formation was observed as early as 14 d in culture, in line with or at least slightly longer than reported by Chen et al. With the exception of the study by Osaki et al, all co-culture systems cited above are 2D-based. The authors need to expand on this further or provide a quantitative assessment of why their system is better compared to previously published models.

      Indeed, previous publications have described neuromuscular junctions (NMJs) in co-cultures of iPSC-derived skeletal muscles and motor neurons. Some of the publications mentioned above did show NMJ formation within ~20 days, albeit with several caveats such as culture heterogeneity, e.g. 50% motor neuron differentiation efficiency. We agree with the reviewer that this needs to be expanded and clarified, and we will address this concern in the revision.

      (2) Further, when comparing their results with other work it is hard to claim how the current system is (p. 5) "more reproducible, and offers a 6-fold increase in scalability compared to previous models [40-43]".

      The authors need to expand on this further.

      This is an important aspect of this work, and we believe that our protocol offers a higher reproducibility due to, at least partially, the homogeneity of the starting cultures of iPSC-derived skeletal muscles and iPSC-derived motor neurons, and that the direct 2D co-culture approach is more suitable for miniaturization compared to 3D cultures or microfluidic chamber devices. Thus, we will expand on this idea in the revision.

      (3) Although mentioned, there were no examples of the modularity of the system, which of course would strengthen the paper and help to uncover ALS mechanisms of synaptic formation, for example by combining WT myotubes and fALS motor neurons (see point 4 below). The authors should show how they would adapt to 96 well plate format to showcase the scalability of the system. Based on their data on the efficacy of synaptic formation (60 per 0.7 cm² area), is further miniaturization allowed?

      We appreciate the points raised by the reviewer. The “mix-and-match” approach to co-culture wild-type and affected iPSC-derived skeletal muscles with iPSC-derived motor neurons is a main focus of our lab and an advantage of protocols like ours, where cells are differentiated independently and later co-cultured together; however, a comprehensive characterization of various mix-and-match combinations is beyond the scope of this Tools and Resources article. Since the initial submission of this manuscript, we have extensively optimized the scalability of the co-cultures from the initial 0.7 cm² to 0.32 cm² (96-well plates). Further miniaturization is also being optimized to 0.136 cm² (384-well plates). This point will be clarified in the revision.

      (4) A lot of α-bungarotoxin staining corresponds to AChR clusters that do not seem to be associated with muscle and do not form normal rings of clustering (pretzel-like) associated with the NMJ in vivo. This is seen clearly in Figure 3B and Figure 5B. Figures 3B and 5B only show low-magnification images, which makes it difficult to assess the specificity of localization of the pre-/post-synaptic markers. The authors should clearly show the morphologies of the NMJs formed in WT and fALS lines at high magnification. In addition, the authors should show co-localization images for α-bungarotoxin and myosin heavy chain to confirm the localization of the bungarotoxin signal on the myotubes.

      In addition to that, the authors report that the number of functional synapses formed on a plate varies from 30 (fALS) to 60 (Ctrl) per 10,000 neurons spread over the 0.7 cm² area (0.6%). How do the authors explain an extensive loss of α-bungarotoxin signal in Figure 5B, the majority of which likely corresponds to AChR clusters that are formed outside of neuronal connections? Such clustering can usually be observed in immature co-cultures and in vivo prior to the innervation of myotubes. One explanation could be that myotubes derived from fALS PSC are less capable of synaptic formation. Notably, a study of PSC-derived myotubes and motor neurons from PSC lines with various SOD1 mutations has already been published, but not cited by Chen et al (Badu-Mensah et al). Given the importance of those confounding factors, the authors should test cell-intrinsic (motor neuron-related) vs non-cell-intrinsic mechanisms by co-culturing healthy myotubes with fALS-derived motor neurons followed by NMJ quantification.

      The iPSC-derived skeletal muscle cultures were plated as a monolayer, and even though the α-bungarotoxin staining does not show pretzel-shaped NMJs, similar to other in vitro NMJ protocols (Badu-Mensah et al, Biomat 2023; Pereira et al., Nat Commun 2021; Uzel et al., Sci Adv 2016), α-bungarotoxin does show association with the muscles. For quantification purposes we omitted the MHC staining to decrease background; however, we will include it in the revision in response to the reviewer’s concern.

      We agree with the reviewer that the suggested approaches would yield insight into disease mechanism but are beyond the scope of this method development study. In fact, we are very excited about our follow-up study pursuing a more in-depth analysis of cell-autonomous vs non-cell-autonomous pathogenesis to understand the NMJ dysfunction in fALS. We apologize that the “Badu-Mensah et al” work was not included; this was our oversight, and the citation will be added in the revision.

      (5) The authors present the advantage of optogenetic stimulation, but they only show the proof-of-principle and never really apply it to their studies. Specifically, with regard to Figure 6, are motor units derived from fALS PSCs incapable of being optogenetically activated to the same extent as control motor units? Does the dysfunction stem from fALS motor neurons or fALS myotubes?

      We agree that these are important questions to be addressed and are actively pursuing these experiments as part of the natural follow-up investigation from the present Tools and Resources article.

      (6) Figures 6 B and C appear to be identical except for the addition of the GDNF effect on the fALS lines. This should all be put in one figure. The authors should also show whether GDNF-induced functional recovery is associated with recovery in the number of motor units or with merely synaptic function by quantifying the NMJ number in the presence of GDNF.

      We will combine Figures 6B and 6C in the revision. Our follow-up study also includes the interrogation of the mechanism through which GDNF rescues fALS NMJ dysfunction.

      (7) Figure 5 and Figure 6. The authors only use one line per fALS mutation and their corresponding isogenic controls. They state that the n=6 for these experiments represents the technical replication of the experiment. These experiments should be performed at least n=3 times starting from neuronal differentiation, and not by seeding replicate wells representing a true replication of each experiment. This would significantly strengthen their argument that their method is robust and the results are easily reproducible.

      We will clarify that the technical replicates originated from independent differentiations in the revision.

      (8) In the discussion the authors may want to mention that the lack of function of GDNF on the SOD1 lines may relate to the fact that SOD1 mutations do not lead to TDP43 pathology. Although speculative this suggests that in cases with TDP43 mutations (their data) or sporadic disease GDNF may be effective.

      We appreciate this suggestion and will highlight this as possible inclusion criteria for GDNF treatment in the discussion of our revised version of the manuscript.

      (9) Although beyond the scope of this paper, it would of course be interesting to see if sporadic forms of ALS had this same phenotype.

      We agree with the reviewer, and we hope to include iPSC-derived NMJs from sporadic ALS patients in a future study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      This work (almost didactically) demonstrates how to develop, calibrate, validate and analyze a comprehensive, spatially resolved, dynamical, multicellular model. Testable model predictions of (also non-monotonic) emergent behaviors are derived and discussed. The computational model is based on a widely-used simulation platform and shared openly such that it can be further analyzed and refined by the community.

      Weaknesses:

      While the parameter estimation approach is sophisticated, this work does not address issues of structural and practical non-identifiability (Wieland et al., 2021, DOI:10.1016/j.coisb.2021.03.005) of parameter values, given just tissue-scale summary statistics, and does not address how model predictions might change if alternative parameter combinations were used. Here, the calibrated model represents one point estimate (column "Value" in Suppl. Table 1) but there is specific uncertainty of each individual parameter value and such uncertainties need to be propagated (which is computationally expensive) to the model predictions for treatment scenarios.

      We thank the reviewer for the excellent suggestions and observations. The CaliPro parameterization technique applied puts an emphasis on finding a robust parameter space instead of a global optimum. To address structural non-identifiability, we utilized partial rank correlation coefficient with each iteration of the calibration process to ensure that the sensitivity of each parameter was relevant to model outputs. We also found that there were ranges of parameter values that would achieve passing criteria but when testing the ranges in replicate resulted in inconsistent outcomes. This led us to further narrow the parameters into a single parameter set that still had stochastic variability but did not have such large variability between replicate runs that it would be unreliable. Additional discussion on this point has been added to lines 623-628. We acknowledge that there are likely other parameter sets or model rules that would produce similar outcomes but the main purpose of the model was to utilize it to better understand the system and make new predictions, which our calibration scheme allowed us to accomplish.
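
      As an illustration of the kind of sensitivity screen mentioned above, the following is a minimal Python sketch of a partial rank correlation coefficient for a two-parameter case. It is not the authors' CaliPro pipeline: the parameter names (`secretion`, `decay`), the toy output function, and the sample size are hypothetical and chosen only to show the calculation. With more than two parameters, the same idea is usually implemented by regressing out all other ranked parameters rather than using the first-order formula below.

```python
import math
import random

def ranks(values):
    """Rank-transform a list (1 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_rank_correlation(param, other_param, output):
    """Rank correlation of `param` with `output`, controlling for `other_param`."""
    x, z, y = ranks(param), ranks(other_param), ranks(output)
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(0)
secretion = [random.random() for _ in range(300)]  # hypothetical sampled parameter
decay = [random.random() for _ in range(300)]      # hypothetical sampled parameter
# Toy monotonic model output: rises with secretion, falls with decay.
toy_output = [math.exp(2.0 * s) - 3.0 * d for s, d in zip(secretion, decay)]

prcc_secretion = partial_rank_correlation(secretion, decay, toy_output)
prcc_decay = partial_rank_correlation(decay, secretion, toy_output)
print(f"PRCC secretion: {prcc_secretion:+.2f}, PRCC decay: {prcc_decay:+.2f}")
```

      A coefficient near +1 or -1 marks a parameter whose monotonic influence on the output persists after accounting for the other parameter; values near zero suggest the output carries little information about that parameter.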

      Regarding practical non-identifiability, we acknowledge that there are some behaviors that are not captured in the model because those behaviors were not specifically captured in the calibration data. To ensure that the behaviors necessary to answer the aims of our paper were included, we used multiple different datasets and calibrated with multiple different output metrics. We believe we have identified the appropriate parameters to recapitulate the dominating mechanisms underlying muscle regeneration. We have added additional discussion on practical non-identifiability to lines 621-623.

      Suggested treatments (e.g. lines 484-486) are modeled as parameter changes of the endogenous cytokines (corresponding to genetic mutations!) whereas the administration of modified cytokines with changed parameter values would require a duplication of model components and interactions in the model such that cells interact with the superposition of endogenous and administered cytokine fields. Specifically, as the authors also aim at 'injections of exogenously delivered cytokines' (lines 578, 579) and propose altering decay rates or diffusion coefficients (Fig. 7), there needs to be a duplication of variables in the model to account for the coexistence of cytokine subtypes. One set of equations would have unaltered (endogenous) and another one have altered (exogenous or drugged) parameter values. Cells would interact with both of them.

      Our perturbations did not include delivery of exogenous cytokines and instead focused on microenvironmental changes in cytokine diffusion and decay rates or specific cytokine concentration levels. For example, the purpose of the VEGF delivery perturbation was to test how an increase in VEGF concentrations would alter regeneration outcome metrics, with the assumption that the delivered VEGF would act in the same manner as the endogenous VEGF. We have clarified the purpose of the simulations on line 410. We agree that it would be valuable to explore whether model predictions would be altered if endogenous and exogenous cytokines were represented separately; however, we did not explore this type of scenario.

      This work shows interesting emergent behavior from nonlinear cytokine interactions but the analysis does not provide insights into the underlying causes, e.g. which of the feedback loops dominates early versus late during a time course.

      Indeed, analyzing the model to fully understand the time-varying interactions between the multiple feedback loops is a challenge in and of itself, and we appreciate the opportunity to elaborate on our approach to addressing this challenge. First: the crosstalk/feedback between cytokines and the temporal nature was analyzed in the heatmap (Fig. 6) and lines 474-482. Second: the sensitivity of cytokine parameters to specific outputs was included in Table 9 and full-time course sensitivity is included in Supplemental Figure 2. Further correlation analysis was also included to demonstrate how cytokine concentrations influenced specific output metrics at various timepoints (Supplemental Fig. 3). We agree that further elaboration of these findings is required; therefore, we added lines 504-509 to discuss the specific mechanisms at play with the combined cytokine interactions. We also added more discussion (lines 637-638) regarding future work that could develop more analysis methods to further investigate the complex behaviors in the model.

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript identified relevant model parameters from a long list of biological studies. This collation of a large amount of literature into one framework has the potential to be very useful to other authors. The mathematical methods used for parameterization and validation are transparent.

      Weaknesses:

      I have a few concerns which I believe need to be addressed fully.

      My main concerns are the following:

      (1) The model is compared to experimental data in multiple results figures. However, the actual experiments used in these figures are not described. To me as a reviewer, that makes it impossible to judge whether appropriate data was chosen, or whether the model is a suitable descriptor of the chosen experiments. Enough detail needs to be provided so that these judgements can be made.

      Thank you for raising this point. We created a new table (Supplemental table 6) that describes the techniques used for each experimental measurement.

      (2) Do I understand it correctly that all simulations are done using the same initial simulation geometry? Would it be possible to test the sensitivity of the paper results to this geometry? Perhaps another histological image could be chosen as the initial condition, or alternative initial conditions could be generated in silico? If changing initial conditions is an unreasonably large request, could the authors discuss this issue in the manuscript?

      We appreciate your insightful question regarding the initial simulation geometry in our model. The initial configuration of the fibers/ECM/microvascular structures was kept consistent but the location of the necrosis was randomly placed for each simulation. Future work will include an in-depth analysis of altered histology configuration on model predictions which has been added to lines 618-621. We did a preliminary example analysis by inputting a different initial simulation geometry, which predicted similar regeneration outcomes. We have added Supplemental Figure 5 that provides the results of that example analysis.

      (3) Cytokine knockdowns are simulated by 'adjusting the diffusion and decay parameters' (line 372). Is that the correct simulation of a knockdown? How are these knockdowns achieved experimentally? Wouldn't the correct implementation of a knockdown be that the production or secretion of the cytokine is reduced? I am not sure whether it's possible to design an experimental perturbation which affects both parameters.

      We appreciate that this important question has been posed. Yes, in order to simulate the knockout conditions, the cytokine secretion was reduced/eliminated. The diffusion and decay parameters were also adjusted to ensure that the concentration within the system was reduced. Lines 391-394 were added to clarify this assumption.

      (4) The premise of the model is to identify optimal treatment strategies for muscle injury (as per the first sentence of the abstract). I am a bit surprised that the implemented experimental perturbations don't seem to address this aim. In Figure 7 of the manuscript, cytokine alterations are explored which affect muscle recovery after injury. This is great, but I don't believe the chosen alterations can be done in experimental or clinical settings. Are there drugs that affect cytokine diffusion? If not, wouldn't it be better to select perturbations that are clinically or experimentally feasible for this analysis? A strength of the model is its versatility, so it seems counterintuitive to me to not use that versatility in a way that has practical relevance. - I may well misunderstand this though, maybe the investigated parameters are indeed possible drug targets.

      Thank you for your thoughtful feedback. The first sentence (lines 32-34) of the abstract was revised to focus on beneficial microenvironmental conditions to best reflect the purpose of the model. The clinical relevance of the cytokine modifications is included in the discussion (lines 547-558) with additional information added to lines 524-526. For example, two methods to alter diffusion experimentally are: antibodies that bind directly to the cytokine to prevent it from binding to its receptor on the cell surface and plasmins that induce the release of bound cytokines.

      (5) A similar comment applies to Figure 5 and 6: Should I think of these results as experimentally testable predictions? Are any of the results surprising or new, for example in the sense that one would not have expected other cytokines to be affected as described in Figure 6?

      We appreciate the opportunity to clarify the basis for these perturbations. The perturbations included in Figure 5 were designed to mimic the conditions of a published experiment that delivered VEGF in vivo (Arsic et al. 2004, DOI:10.1016/J.YMTHE.2004.08.007). The perturbation input conditions and experimental results are included in Table 8, and Supplemental Table 6 has been added to include experimental data and a method description of the perturbation. The results of this analysis provide both validation and new predictions, because some of the outputs were measured in the experiments while others were not. The additional output metrics and timepoints that were not collected in the experiment allow for a deeper understanding of the dynamics and mechanisms leading to the changes in muscle recovery (lines 437-454). These model outputs can provide the basis for future experiments; for example, they highlight which time points would be more important to measure and even provide predicted effect sizes that could be the basis for a power analysis (lines 639-640).

      Regarding Figure 6, the published experimental outcomes of cytokine KOs are included in Table 8. The model allowed comparison of different cytokine concentrations at various timepoints when other cytokines were removed from the system due to the KO condition. The experimental results did not provide data on the impact on other cytokine concentrations but by using the model we were able to predict temporally based feedback between cytokines (lines 474-482). These cytokine values could be collected experimentally but would be time consuming and expensive. The results of these perturbations revealed the complex nature of the relationship between cytokines and how removal of one cytokine from the system has a cascading temporal impact. Lines 533-534 have been added to incorporate this into the discussion.

      (6) In figure 4, there were differences between the experiments and the model in two of the rows. Are these differences discussed anywhere in the manuscript?

      We appreciate your keen observation and the opportunity to address these differences. The model did not match experimental results for CSA output in the TNF KO and anti-inflammatory nanoparticle perturbations, or for TGF levels with the macrophage depletion. While it did align with the other experimental metrics from those studies, it is likely that there are other mechanisms at play in the experimental conditions that were not captured by simulating the downstream effects of the experimental perturbations. We have added discussion of the differences to lines 445-454.

      (7) The variation between experimental results is much higher than the variation of results in the model. For example, in Figure 3 the error bars around experimental results are an order of magnitude larger than the simulated confidence interval. Do the authors have any insights into why the model is less variable than the experimental data? Does this have to do with the chosen initial condition, i.e. do you think that the experimental variability is due to variation in the geometries of the measured samples?

      Thank you for your insightful observations and questions. The lower model variability is attributed to the larger sample size of model simulations compared to experimental subjects. Running 100 simulations narrows the confidence interval (average 2.4 and max 3.3) compared to the experiments, which typically had a sample size of less than 15. If the number of simulations had been reduced to 15, the stochasticity within the model would have resulted in a larger confidence interval (average 7.1 and max 10). There are also several possible confounding variables in the experimental protocols (e.g. variations in injury, different animal subjects for each timepoint) that are kept constant in the model simulation. We have added discussion of this point to the manuscript (lines 517-519). Future work with the model will examine how variations in conditions, such as initial muscle geometry and injury, alter regeneration outcomes and overall variability. This discussion has been incorporated into lines 640-643.
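
      The sample-size argument above can be checked with a short calculation. The Python sketch below assumes, purely for illustration, that the reported intervals are t-based 95% confidence intervals of a mean and that the run-to-run standard deviation is the same at both sample sizes; the response does not state how the intervals were computed. The critical values are standard two-sided 95% Student's t quantiles for 99 and 14 degrees of freedom.

```python
import math

# Two-sided 95% critical values of Student's t distribution.
T_CRIT_DF99 = 1.984  # n = 100 simulations
T_CRIT_DF14 = 2.145  # n = 15 simulations

def ci_half_width(std_dev, n, t_crit):
    """Half-width of a t-based confidence interval for a mean."""
    return t_crit * std_dev / math.sqrt(n)

# Factor by which the interval widens when going from 100 to 15 runs,
# assuming an unchanged standard deviation (an illustrative assumption).
widening_factor = ci_half_width(1.0, 15, T_CRIT_DF14) / ci_half_width(1.0, 100, T_CRIT_DF99)

reported_ci_100_runs = 2.4  # average interval reported for 100 simulations
predicted_ci_15_runs = reported_ci_100_runs * widening_factor
print(f"widening factor: {widening_factor:.2f}")
print(f"predicted interval at n=15: {predicted_ci_15_runs:.1f}")
```

      Under these assumptions the interval is expected to widen by a factor of roughly 2.8, which is of the same order as the reported change from 2.4 to 7.1.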

      (8) Is figure 2B described anywhere in the text? I could not find its description.

      Thank you for pointing that out. We have added a reference for Fig. 2B on line 190.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The model code seems to be available from https://simtk.org/projects/muscle_regen but that website requests member status ("This is a private project. You must be a member to view its contents.") and applying for membership could violate eLife's blind review process. So, this reviewer liked to but couldn't run the model her/himself. To eLife: Can the authors upload their model to a neutral server that reviewers and editors can access anonymously?

      The code has been made publicly available on the following sites:

      SimTK: https://simtk.org/docman/?group_id=2635

      Zenodo: https://zenodo.org/records/10403014

      GitHub: https://github.com/mh2uk/ABM-of-Muscle-Regeneration-with-MicrovascularRemodeling

      Line 121 has been updated with the new link and the additional resources were added to lines 654-657.

      (2) The muscle regeneration field typically studies 2D cross-sections and the present model can be well compared to these other 2D models but cells as stochastic and localized sources of diffusible cytokines may yield different cytokine fields in 3D vs. 2D. I would expect more broadened and smoothened cytokine fields (from sources in neighboring cross-sections) than what the 2D model predicts based on sources just within the focus cross-section. Such relations of 2D to 3D should be discussed.

      We thank the reviewer for the excellent suggestions and observations. It has been reported for other CompuCell3D models (Sego et al. 2017, DOI:10.1088/1758-5090/aa6ed4) that diffusion solutions converged to similar outcomes in 2D and 3D model configurations, with the 3D simulations incurring excessive computational cost without contributing any noticeable additional accuracy. Similarly, other cell-based ABMs that incorporate diffusion mechanisms (Marino et al. 2018, DOI:10.3390/computation6040058) have found that 2D and 3D versions of the model both predict the same mechanisms and that the 2D resolution was sufficient for determining outcomes. Lines 615-618 were added to elaborate on this topic.

      (3) Since the model (and title) focuses on "nonlinear" cytokine interactions, what would change if cytokine decay would not be linear (as modeled here) but saturated (with nonlinear Michaelis-Menten kinetics as ligand binding and endocytosis mechanisms would call for)?

      Thank you for raising an intriguing point. The model includes a combination of cytokine decay as well as ligand binding and endocytosis mechanisms that can be saturated. For a cytokine-dependent model behavior to occur, the cytokines necessary to induce that action had to reach a minimum threshold. Once that threshold was reached, that amount of the cytokine would be removed at that location to simulate ligand-receptor binding and endocytosis. These ligand binding and endocytosis mechanisms behave in a saturated way, removing a set amount when above a certain threshold or a defined ratio when under the threshold. Lines 313-315 were revised to clarify this point. There were certain concentrations of cytokines where we saw a plateau in outputs, likely as a result of reaching a saturation threshold (Supplemental Fig. 3). In future work, more robust mathematical simulation of binding kinetics of cytokines (e.g., using ODEs) could be included.
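
      The threshold-based removal rule described above can be sketched in a few lines of Python. This is an illustrative reading of the response, not the model's code: the function names and all numeric values (threshold, fixed amount, fraction, decay rate) are hypothetical.

```python
def cytokine_uptake(concentration, threshold, fixed_amount, fraction):
    """Amount removed by binding/endocytosis at one location in one step.

    At or above the threshold a fixed amount is removed (saturated uptake);
    below it, a fixed fraction of what is present is removed.
    """
    if concentration >= threshold:
        removed = fixed_amount
    else:
        removed = fraction * concentration
    return min(removed, concentration)  # never remove more than is present

def linear_decay(concentration, rate):
    """First-order decay, for comparison: removal grows with concentration."""
    return rate * concentration

# Hypothetical values chosen only to show the plateau.
for level in (2.0, 4.0, 10.0, 100.0):
    uptake = cytokine_uptake(level, threshold=5.0, fixed_amount=2.0, fraction=0.1)
    decay = linear_decay(level, rate=0.1)
    print(f"concentration {level:6.1f}: uptake {uptake:.2f}, linear decay {decay:.2f}")
```

      Above the threshold the uptake term stays constant while linear decay keeps growing, which is the qualitative saturation behavior that a Michaelis-Menten term would capture more smoothly.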

      (4) Limitations of the model should be discussed together with an outlook for model refinement. For example, fiber alignment and ECM ultrastructure may require anisotropic diffusion. Many of the rate equations could be considered with saturation parameters etc. There are so many model assumptions. Please discuss which would be the most urgent model refinements and, to achieve these, which would be the most informative next experiments to perform.

      We appreciate your thoughtful consideration of the model's limitations and the need for a comprehensive discussion on model refinements and potential future experiments. The future direction section was expanded to discuss additional possible model refinements (lines 635-643) and additional possible experiments for model validation (lines 630-634).

      (5) It is not clear how the single spatial arrangement that is used affects the model predictions. E.g. now the damaged area surrounds the lymphatic vessel but what if the opposite corner was damaged and the lymphatic vessel is deep inside the healthy area?

      Thank you for highlighting the importance of considering different spatial arrangements in the model and its potential impact on predictions. We previously tested model perturbations that included specifying the injury surrounding the lymphatic vessel versus on the side opposite the vessel. Since this paper focuses more on cytokine dynamics, we plan to include this perturbation, along with other injury alterations, in a follow-on paper. We added more context about this in the future efforts section lines 640-643.

      (6) It seems that not only parameter values but also the initial values of most of the model components are unknown. The parameter estimation strategy does not seem to include the initial (spatial) distributions of collagen and cytokines and other model components. Please discuss how other (reasonable) initial values or spatial arrangements will affect model predictions.

      We appreciate your thoughtful consideration of unknown initial values/spatial arrangements and their potential influence on predictions. Initial cytokine levels prior to injury had a low relative concentration compared to levels post injury and were assumed to be negligible. Initial spatial distributions of cytokines were not defined as spatial inputs (except in knockout simulations); instead, cytokines are secreted from cells (with baseline resident cell counts defined from the literature). The distribution of cytokines is an emergent behavior that results from the cell behaviors within the model. The collagen distribution is altered in response to clearance of necrosis by the immune cells (decreased collagen with necrosis removal) and subsequent secretion of collagen by fibroblasts. The secretion of collagen from fibroblasts was included in the parameter estimation sweep (Supplemental Table 1).

      We are working on further exploring the model sensitivity to altered spatial arrangements and have added this to the future directions section (lines 618-621), as well as provided Supplemental Figure 5 to demonstrate that model outcomes are similar with altered initial spatial arrangements.

      (7) Many details of the CC3D implementation are missing: overall lattice size, interaction neighborhood order, and "temperature" of the Metropolis algorithm. Are the typical adhesion energy terms used in the CPM Hamiltonian and if so, then how are these parameter values estimated?

      Thank you for bringing attention to the missing details regarding the CC3D implementation in our manuscript. We have included supplemental information providing greater detail for CPM implementation (Lines 808-854). We also added two additional supplemental tables for describing the requested CC3D implementation details (Supplemental Table 4) and adhesion energy terms (Supplemental Table 5).

      (8) Extending the model analysis of combinations of altered cytokine properties, which temporal schedules of administration would be of interest, and how could the timing of multiple interventions improve outcomes? Such a discussion or even analysis would further underscore the usefulness of the model.

      In response to your valuable suggestion, lines 558-562 were added to discuss the potential of using the model as a tool to perturb different cytokine combinations at varying timepoints throughout regeneration. In addition, this is also included in future work in lines 636-637.

      (9) The CPM is only weakly motivated, just one sentence on lines 142-145 which mentions diffusion in a misleading way as the CPM just provides cells with a shape and mechanical interactions. The diffusion part is a feature of the hybrid CompuCell3D framework, not the CPM.

      Thank you for bringing up this distinction. We removed the statement regarding diffusion and updated lines 143-146 to focus on CPM representation of cellular behavior and interactions. We also added a reference to supplemental text that includes additional details on CPM.

      (10) On lines 258-261 it does not become clear how the described springs can direct fibroblasts towards areas of low-density collagen ECM. Are the lambdas dependent on collagen density?

      Thank you for highlighting this area for clarification. The fibroblasts form links with low collagen density ECM and then are pulled towards those areas based on a constant lambda value. The links between the fibroblast and the ECM will only be made if the collagen is below a certain threshold. We added additional clarification to lines 260-264.
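
      The linking rule described above can be illustrated with a minimal sketch. It assumes a generic harmonic spring energy with a constant stiffness; the names `EcmSite`, `eligible_link_targets`, `link_energy`, and all numeric values are hypothetical and are not taken from the manuscript or from the CompuCell3D API.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EcmSite:
    position: Tuple[float, float]
    collagen: float  # relative collagen density at this site (hypothetical units)


def eligible_link_targets(sites: List[EcmSite], collagen_threshold: float) -> List[EcmSite]:
    """Keep only ECM sites whose collagen density is below the threshold."""
    return [s for s in sites if s.collagen < collagen_threshold]


def link_energy(cell_pos: Tuple[float, float], site: EcmSite, lam: float,
                target_length: float = 0.0) -> float:
    """Harmonic spring energy with a constant stiffness lam (assumed form)."""
    distance = math.dist(cell_pos, site.position)
    return lam * (distance - target_length) ** 2


sites = [EcmSite((3.0, 4.0), 0.2), EcmSite((1.0, 0.0), 0.9)]
targets = eligible_link_targets(sites, collagen_threshold=0.5)
energy = link_energy((0.0, 0.0), targets[0], lam=2.0)
```

      In this sketch lambda does not depend on collagen density; collagen only decides whether a link may exist, which is how the response describes the rule.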

      (11) On line 281, what does the last part in "Fibers...were regenerating but not fully apoptotic cells" mean? Maybe rephrase this.

      The last part of that line indicates that there were some fibers surrounding the main injury site that were damaged but still had healthy portions, indicating that they were impacted by the injury and are regenerating but did not become fully apoptotic like the fiber cells at the main site of injury. We rephrased this line to indicate that the nearby fibers were damaged but not fully apoptotic.

      (12) Lines 290-293 describe interactions of cells and fields with localized structures (capillaries and lymphatic vessel). Please explain in more detail how "capillary agents...transport neutrophiles and monocytes" in the CPM model formalism. Are new cells added following rules? How is spatial crowding of the lattice around capillaries affecting these rules? Moreover, how can "lymphatic vessel...drain the nearby cytokines and cells"? How is this implemented in the CPM and how is "nearby" calculated?

      We appreciate your detailed inquiry into the interactions of cells and fields with localized structures. The neutrophils and monocytes are added to the simulation at the lattice sites above capillaries (within the cell layer Fig. 2B) and undergo chemotaxis up their respective gradients. Recruited neutrophils and monocytes are randomly distributed among the healthy capillaries that do not have an immune cell at the capillary location (a modeling artifact that is a byproduct of only having one cell per lattice site). This approach helped to prevent excessive crowding at certain capillaries. Because immune cells in the simulation are sufficiently small, chemotactic gradients are sufficiently large, and the simulation space is sufficiently large, we do not see aggregation of recruited immune cells in the CPM.
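
      The recruitment rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the capillary records, their `healthy` and `occupied` flags, and the function name `pick_recruitment_sites` are hypothetical.

```python
import random
from typing import Dict, List


def pick_recruitment_sites(capillaries: List[Dict], n_recruits: int,
                           rng: random.Random) -> List[Dict]:
    """Randomly choose capillaries that may receive a new immune cell.

    Only healthy capillaries with no immune cell already at their
    location are eligible, so at most one cell is placed per capillary.
    """
    free = [c for c in capillaries if c["healthy"] and not c["occupied"]]
    n = min(n_recruits, len(free))  # cannot place more cells than free capillaries
    return rng.sample(free, n)


capillaries = [
    {"id": 0, "healthy": True, "occupied": False},
    {"id": 1, "healthy": False, "occupied": False},  # damaged capillary
    {"id": 2, "healthy": True, "occupied": True},    # immune cell already present
    {"id": 3, "healthy": True, "occupied": False},
]
chosen = pick_recruitment_sites(capillaries, n_recruits=5, rng=random.Random(0))
```

      Capping the draw at the number of free capillaries is one way to express the "one cell per lattice site" constraint mentioned in the response.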

      The lymphatic vessel takes up cytokines at lattice locations corresponding to the lymphatic vessel and will remove cells located in lattice sites neighboring the lymphatic vessel. In addition, we have included a rule in our ABM to encourage cells to migrate towards the lymphatic vessel using the CompuCell3D External Potential plugin. The influence of this rule is inversely proportional to the distance of the cells to the lymphatic vessel.
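
      A minimal sketch of a pull whose magnitude is inversely proportional to distance is given below. The function name `lymphatic_pull`, the `strength` parameter, and the 2D coordinates are assumptions for illustration; how such a vector is passed to the CompuCell3D plugin (including its sign convention) is not shown and is not taken from the manuscript.

```python
import math
from typing import Tuple


def lymphatic_pull(cell_xy: Tuple[float, float], vessel_xy: Tuple[float, float],
                   strength: float) -> Tuple[float, float]:
    """Return a vector pointing from the cell to the vessel.

    The magnitude is strength / distance, so closer cells feel a
    stronger pull. A cell sitting on the vessel gets no pull.
    """
    dx = vessel_xy[0] - cell_xy[0]
    dy = vessel_xy[1] - cell_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return (0.0, 0.0)
    magnitude = strength / distance
    # Scale the unit direction vector by the distance-dependent magnitude.
    return (magnitude * dx / distance, magnitude * dy / distance)


near = lymphatic_pull((0.0, 0.0), (2.0, 0.0), strength=4.0)
far = lymphatic_pull((0.0, 0.0), (4.0, 0.0), strength=4.0)
```

      Doubling the distance halves the pull in this sketch, which matches the inverse proportionality stated in the response.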

      We have updated lines 294-298 and 305-309 to include the above explanation.

      (13) Tables 1-4 define migration speeds as agent rules but in the typical CPM, migration speed emerges from random displacements biased by chemotaxis and other effects (like the slope of the cytokine field). How was the speed implemented as a rule while it is typically observable in the model?

      We appreciate your inquiry regarding the implementation of migration speeds. To determine the lambda parameters (Table 7) for each cell type, we tested each in a simplified control simulation with a concentration gradient for the cell to move towards. We tuned the lambda parameters within this simulation until the cell velocity output by the model aligned with the cell velocity reported in the literature for each cell type (Tables 1-4). We have added clarification on this to lines 177-180.
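
      One way to automate this kind of tuning is a bisection search, sketched below. The response does not say that an automated search was used, so this is only an illustration. It assumes that the measured speed increases monotonically with lambda; `measure_speed` stands in for running the control simulation, and the toy linear relation and all numbers are hypothetical.

```python
from typing import Callable


def tune_lambda(measure_speed: Callable[[float], float], target_speed: float,
                lo: float, hi: float, tol: float = 1e-3, max_iter: int = 60) -> float:
    """Bisection on lambda until the measured speed is within tol of the target.

    Assumes measure_speed is monotonically increasing on [lo, hi].
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        speed = measure_speed(mid)
        if abs(speed - target_speed) <= tol:
            return mid
        if speed < target_speed:
            lo = mid  # too slow: a larger lambda is needed
        else:
            hi = mid  # too fast: a smaller lambda is needed
    return 0.5 * (lo + hi)


def toy_speed(lam: float) -> float:
    """Hypothetical stand-in for the control simulation."""
    return 0.02 * lam


tuned = tune_lambda(toy_speed, target_speed=1.0, lo=0.0, hi=200.0)
```

      In a stochastic CPM the measured speed would be noisy, so in practice each evaluation would likely need averaging over several runs.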

      (14) Line 312 shows the first equation with number (5), either add eqn. (1-4) or renumber.

      We have revised the equation number.

      (15) Typos: Line 456, "expect M1 cell" should read "except M1 cell".

      Line 452, "thresholds above that diminish fibroblast response (Supplemental Fig 3)." remains unclear, please rephrase.

      Line 473, "at 28." should read "at 28 days.".

      Line 474, is "additive" correct? Was the sum of the individual effects calculated and did that match?

      Line 534, "complexity our model" should read "complexity in our model".

      We have corrected the typos and clarified line 452 (updated line 594) to indicate that the TNF-α concentration threshold results in diminished fibroblast response. We updated the terminology on line 474 (updated line 512) to indicate that there was a synergistic effect with the combined perturbation.

      (16) Table 7 defines cell target volumes with the same value as their diameter. This enforces a strange cell shape. Should there be brackets to square the value of the cell diameter, e.g. Value=(12µm)^2 ?

      The target volume parameter values were selected to reflect the relative differences in average cell diameter as reported in the literature; however, there are no parameters that directly enforce a diameter for the cells in the CPM formalism separate from the volume. We have observed that these relative cell sizes allow the ABM to effectively reproduce cell behaviors described in the literature. Single cells that are too large in the ABM would be unable to migrate far enough per time step to carry out cell behaviors, and cells that are too small in the CPM would be unstable in the simulation environment and not persist in the simulation when they should. We removed the units for the cell shape values in Table 7 since the target volume is a relative parameter and does not directly represent µm.

      (17) Table 7 gives estimated diffusion constants but they appear to be too high. Please compare them to measured values in the literature, especially for MCP-1, TNF-alpha and IL-10, or relate these to their molecular mass and compare to other molecules like FGF8 (Yu et al. 2009, DOI:10.1038/nature08391).

      We utilized a previously published estimation method (Filion et al. 2004, DOI:10.1152/ajpheart.00205.2004) for estimating cytokine diffusivity within the ECM. This method incorporates the molecular masses and accounts for the combined effects of the collagen fibers and glycosaminoglycans. The paper acknowledged that the estimated value is faster than experimentally determined values, but that this was a result of the less-dense matrix composition which is more reflective of the tissue environment we are simulating in contrast to other reported measurements which were done in different environments. Using this estimation method also allowed us to more consistently define diffusion constants versus using values from the literature (which were often not recorded) that had varied experimental conditions and techniques (such as being in zebrafish embryo Yu et al. 2009, DOI:10.1038/nature08391 as opposed to muscle tissue). This also allowed for recalculation of the diffusivity throughout the simulation as the collagen density changed within the model. Lines 318-326 were updated to help clarify the estimation method.

      (18) Many DOIs in the bibliography (Refs. 7,17,20,31,40,47...153) are wrong and do not resolve because the appended directory names are not allowed in the DOI, just with a journal's URL after resolution.

      Thank you for bringing this to our attention. The incorrect DOIs have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      (9) On line 174, the authors say "We used the CC3D feature Flip2DimRatio to control the number of times the Cellular-Potts algorithm runs per mcs." What does this mean? Isn't one monte carlo timestep one iteration of the Cellular Potts model? How does this relate to physical timescales?

      We appreciate your attention to detail and thoughtful question regarding the statement about the use of the CC3D feature Flip2DimRatio. Lines 175-177 were revised to simplify the meaning of Flip2DimRatio. That parameter alters the number of times the Cellular-Potts algorithm is run, which is the limiting factor for cell movement. The physical timescale is kept to a 15-minute timestep but a high Flip2DimRatio allows more flexibility and stability to allow the cells to move faster in one timestep.
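
      The relationship between Flip2DimRatio, copy attempts, and physical time can be sketched as below. This assumes, as we understand the CompuCell3D documentation, that the number of pixel-copy attempts per MCS equals Flip2DimRatio multiplied by the number of lattice sites; the lattice dimensions and ratio values are hypothetical, and only the 15-minute timestep comes from the response.

```python
from typing import Sequence


def copy_attempts_per_mcs(lattice_dims: Sequence[int], flip2dim_ratio: float) -> int:
    """Pixel-copy attempts in one Monte Carlo step (assumed: ratio x lattice sites)."""
    sites = 1
    for dim in lattice_dims:
        sites *= dim
    return int(flip2dim_ratio * sites)


def mcs_to_hours(mcs: int, minutes_per_mcs: float = 15.0) -> float:
    """Convert Monte Carlo steps to hours using the 15-minute timestep."""
    return mcs * minutes_per_mcs / 60.0


attempts_default = copy_attempts_per_mcs((100, 100, 1), flip2dim_ratio=1.0)
attempts_raised = copy_attempts_per_mcs((100, 100, 1), flip2dim_ratio=2.0)
one_day = mcs_to_hours(96)
```

      Raising the ratio increases the number of copy attempts per step without changing the physical time each step represents, which is why cells can cover more distance per timestep.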

      (10) Has the custom MATLAB script to process histology images into initial conditions been made available?

      The Matlab script along with CC3D code for histology initialization with documentation has been made available with the source code on the following sites:

      SimTK: https://simtk.org/docman/?group_id=2635

      Zenodo: https://zenodo.org/records/10403014

      GitHub: https://github.com/mh2uk/ABM-of-Muscle-Regeneration-with-MicrovascularRemodeling

      (11) Equation 5 is provided without a reference or derivation. Where does it come from and what does it mean?

      Thank you for highlighting the diffusion equation and seeking clarification on its origin and significance. Lines 318-326 were revised to clarify where the equation comes from. This is a previously published estimation method that we applied to calculate the diffusivity of the cytokines considering both collagen and glycosaminoglycans.

      (12) Line 326: "For CSA, experimental fold-change from pre-injury was compared with fold-change in model-simulated CSA". Does this step rely on the assumption that the fold change will not depend on the CSA? If so, is this something that is experimentally known, or otherwise, can it be confirmed by simulations?

      We appreciate the opportunity to clarify our rationale. The fold change was used as a method to normalize the model and experiment so that they could be compared on the same scale. Yes, this step relies on the assumption that fold change does not depend on pre-injury CSA. Experimentally, it is difficult to determine the impact of initial fiber morphology on an altered regeneration time course. This fold change allows us to compare percent recovery, which is a common metric used to assess muscle regeneration outcomes experimentally. Lines 340-343 were revised to clarify.
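
      The normalization can be shown with a short sketch. The CSA values below are invented for illustration and are not data from the study; the point is only that two series on different absolute scales can be compared once each is divided by its own pre-injury value.

```python
from typing import List


def fold_change(series: List[float], pre_injury: float) -> List[float]:
    """Express each time point relative to the pre-injury value."""
    return [value / pre_injury for value in series]


# Hypothetical CSA time courses on different absolute scales.
experiment_csa = [1000.0, 1500.0, 2000.0]   # e.g. square micrometers
model_csa = [50.0, 75.0, 100.0]             # e.g. lattice sites

experiment_fc = fold_change(experiment_csa, pre_injury=2000.0)
model_fc = fold_change(model_csa, pre_injury=100.0)
```

      Both series reduce to the same fold-change curve here, so percent recovery can be compared directly even though the raw units differ.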

      (13) Line 355: "The final passing criteria were set to be within 1 SD for CSA recovery and 2.5 SD for SSC and fibroblast count" Does this refer to the experimental or the simulated SD?

      The model had to fit within those experimental SDs. Lines 371-372 were edited to specify that we are referring to the experimental SD.
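
      The passing criteria can be written as a simple check against the experimental mean and SD. The tolerances (1 SD for CSA recovery, 2.5 SD for SSC and fibroblast counts) come from the quoted text; the dictionary layout, the function names, and every numeric value are hypothetical.

```python
from typing import Dict, Tuple

# Allowed deviation from the experimental mean, in experimental SDs.
TOLERANCE_SD = {"csa": 1.0, "ssc": 2.5, "fibroblast": 2.5}


def within_sd(simulated: float, exp_mean: float, exp_sd: float, n_sd: float) -> bool:
    """True if the simulated value lies within n_sd experimental SDs of the mean."""
    return abs(simulated - exp_mean) <= n_sd * exp_sd


def run_passes(simulated: Dict[str, float],
               experiment: Dict[str, Tuple[float, float]]) -> bool:
    """A run passes only if every output meets its own tolerance."""
    return all(
        within_sd(simulated[name], mean, sd, TOLERANCE_SD[name])
        for name, (mean, sd) in experiment.items()
    )


experiment = {"csa": (1.0, 0.1), "ssc": (20.0, 4.0), "fibroblast": (30.0, 6.0)}
good_run = {"csa": 0.95, "ssc": 28.0, "fibroblast": 18.0}
bad_run = {"csa": 0.80, "ssc": 20.0, "fibroblast": 30.0}
results = (run_passes(good_run, experiment), run_passes(bad_run, experiment))
```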

      (14) "Following 8 iterations of narrowing the parameter space with CaliPro, we reached a set that had fewer passing runs than the previous iteration". Wouldn't one expect fewer passing runs with any narrowing of the parameter space? Why was this chosen as the stopping criterion for further narrowing?

      We appreciate your observation regarding the statement about narrowing the parameter space with CaliPro. We started with a wide parameter space, expecting that certain parameters would give outputs that fall outside of the comparable data. So, when the parameter space was narrowed to enrich parts that give passing output, initially the number of passing simulations increased.

      Once we have narrowed the set of possible parameters into an ideal parameter space, further narrowing will cut out viable parameters, resulting in fewer passing runs. Therefore, we stopped narrowing once fewer simulations passed the criteria than had passed with the wider parameter set. Lines 375-379 have been updated to clarify this point.
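
      The stopping rule can be sketched as follows. This is not CaliPro itself; it only encodes the criterion described in the response. The pass counts are invented, and returning the iteration just before the drop is our assumption about which parameter set would be kept.

```python
from typing import List


def last_iteration_before_drop(pass_counts: List[int]) -> int:
    """Return the index of the iteration preceding the first decrease in passing runs.

    If the count never decreases, the final iteration is returned.
    """
    for i in range(1, len(pass_counts)):
        if pass_counts[i] < pass_counts[i - 1]:
            return i - 1
    return len(pass_counts) - 1


# Hypothetical passing-run counts for successive narrowing iterations.
counts = [10, 25, 40, 38]
kept = last_iteration_before_drop(counts)
```

      The counts rise while narrowing removes regions that fail, then fall once narrowing starts to remove viable parameter values, which is the behavior the response describes.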

      (15) Line 516: 'Our model could test and optimize combinations of cytokines, guiding future experiments and treatments." It is my understanding that this is communicated as a main strength of the model. Would it be possible to demonstrate that the sentence is true by using the model to make actual predictions for experiments or treatments?

      This is demonstrated by the combined cytokine alterations in Figure 7 and discussed in lines 509-513. We have also added in a suggested experiment to test the model prediction in lines 691-695.

      (16) Line 456, typo: I think 'expect' should be 'except'.

      Thank you for pointing that out. The typo has been corrected.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors collected genomic information from public sources covering 423 eukaryote genomes and around 650 prokaryote genomes. Based on pre-computed CDS annotation, they estimated the frequency of alternative splicing (AS) as a single average measure for each genome and computed correlations with this measure and other genomic properties such as genome size, percentage of coding DNA, gene and intergenic span, etc. They conclude that AS frequency increases with genome complexity in a somewhat directional trend from "lower" organisms to "higher" organisms.

      Strengths:

      The study covers a wide range of taxonomic groups, both in prokaryotes and eukaryotes.

      Weaknesses:

      The study is weak both methodologically and conceptually. Current high throughput sequencing technologies, coupled with highly heterogeneous annotation methods, can observe cases of AS with great sensitivity, and one should be extremely cautious of the biases and rates of false positives associated with these methods. These issues are not addressed in the manuscript. Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      We are aware of the bias that may exist in annotation files. Since the source of noise can be highly variable, we have assumed that most of the data has a similar bias. However, we agree with the reviewer that we could perform some analysis to test for these biases and their association with different methodologies. Thus, we will measure the uncertainty present in the data. On the one hand, we will be more explicit about the data limitations and the biases they can generate in the results. On the other hand, while analyzing the false positives in the data is out of our scope, we will perform a statistical test to detect possible biases regarding different methods of sequencing and annotation, and types of organisms (model or non-model organisms). If the test is positive, we will proceed, as far as possible, to normalize the data or to estimate a confidence interval.

      Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      Beyond taking into account the differential bias that may exist in the data, we do not consider our AS measure to be problematic. The NCBI database is one of the most reliable databases available to date and is continuously updated by the entire scientific community. So, the use of this data and the corresponding procedures for deriving the AS measure are perfectly acceptable for a comparative analysis on such a huge global scale. Furthermore, the proposal of a new genome-level measure of AS that allows comparison of species spanning the whole tree of life is part of the novelty of the study. We understand that small-scale studies require high specificity about the molecular processes involved. However, that is not the case here, where we are dealing with a large-scale problem. On the other hand, as we have previously mentioned, we agree with the reviewer on the need to analyze the degree of uncertainty in the data to better interpret the results.

      There is no mention of the possibility that AS could be largely caused by random splicing errors, a possibility that could very well fit with the manuscript's data. Instead, the authors adopt early on the view that AS is regulated and functional, generally citing outdated literature.

      There is no question that some AS events are functional, as evidenced by strongly supported studies. However, whether all AS events are functional is questionable, and the relative fractions of functional and non-functional AS are unknown. With this in mind, the authors should be more cautious in interpreting their data.

      Many studies suggest that most of the AS events observed are the result of splicing errors and are therefore neither functional nor conserved. However, we still have limited knowledge about the functionality of AS. Just because we don’t have a complete understanding of its functionality, doesn’t mean there isn’t a fundamental cause behind these events. AS is a highly dynamic process that can be associated with processes of a stochastic nature that are fundamental for phenotypic diversity and innovation. This is one of the reasons why we do not get into a discussion about the functionality of AS and consider it as a potential measure of biological innovation. Nevertheless, we agree with the reviewer’s comments, so we will add a discussion about this issue with updated literature and look at any possible misinterpretation of the results.

      The "complexity" of organisms also correlates well (negatively) with effective population size. The power of selection to eliminate (slightly) deleterious mutations or errors decreases with effective population size. The correlation observed by the authors could thus easily be explained by a non-adaptive interpretation based on simple population genetics principles.

      We appreciate the reviewer’s observation. We are well aware of M. Lynch’s theory on the role of the effective population size and its eventual correlation with genomic parameters, but we want to emphasize that our objective is not to find an adaptive or non-adaptive explanation of the evolution of AS, but rather to reveal it. Nevertheless, as the reviewer suggests, we will look at the correlation between AS and the effective population size and discuss a possible non-adaptive interpretation.

      The manuscript contains evidence that the authors might benefit from adopting a more modern view of how evolution proceeds. Sentences such as "... suggests that only sophisticated organisms optimize alternative splicing by increasing..." (L113), or "especially in highly evolved groups such as mammals" (L130), or the repeated use of "higher" and "lower" organisms need revising.

      As the reviewer suggests, we will proceed with the corresponding linguistic corrections.

      Because of the lack of controls mentioned above, and because of the absence of discussion regarding an alternative non-adaptive interpretation, the analyses presented in the manuscript are of very limited use to other researchers in the field. In conclusion, the study does not present solid conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF. When correlating the degree of alternative splicing with these three variables, they find that the different taxonomic groups have a different correlation coefficient, and identify a "progressive pattern" among metazoan groups, namely that the correlation coefficient mostly increases when moving from flowering plants to arthropods, fish, birds, and finally mammals. They conclude that therefore the amount of splicing that is performed by an organismal group could be used as a measure of its complexity.

      Weaknesses:

      While I find the analysis of alternative splicing interesting, I also find that it is a very imperfect measure of organismal complexity and that the manuscript as a whole is filled with unsupported statements. First, I think it is clear to anyone studying evolution over the tree of life that it is the complexity of gene regulation that is at the origin of much of organismal structural and behavioral complexity. Arguably, creating different isoforms out of a single ORF is just one example of complex gene regulation. However, the complexity of gene regulation is barely mentioned by the authors.

      We disagree with the reviewer that our measure of AS is imperfect. As we responded to the first reviewer, we will quantify the uncertainty in the data and correct for differential biases caused by annotation and sequencing methods. Thus, beyond correcting relevant biases in the data, we consider that our measure is adequate for a comparative analysis at a global scale. A novelty of our study is the proposal of a genome-level measure of AS that takes into account data from the entire scientific community.

      We also want to emphasize that we assume from the beginning that AS may reflect some kind of biological complexity; it is not a conclusion drawn from the results. An argument in favor of such an assumption is that AS is associated with stochastic processes that are fundamental for phenotypic diversity and innovation. Of course, we agree with the reviewer that it is not the only mechanism behind biological complexity, so we will emphasize this in the manuscript. In addition, we will be more explicit about the assumptions and objectives, and will correct any unsupported statements.

      Further, it is clear that none of their correlation coefficients actually show a simple trend (see Table 3). According to these coefficients, birds are more complex than mammals for 3 out of 4 measures.

      An evolutionary trend is broadly defined as the gradual change in some characteristic of organisms as they evolve or adapt to a specific environment. In our context, we define an evolutionary trend as the gradual change in genome composition and its association with AS across the main taxonomic groups. If we look at Figure 4 and Table 3, we can conclude that there is a progressive trend. We will be more precise about how we define an evolutionary trend and correct any possible misinterpretation of the results. On the other hand, we do not assume that mammals should be more complex than birds. First, we will emphasize that our results show that birds have the highest values of such a trend. Second, after reading the reviewer’s comments, we have decided to perform an additional analysis to correct for differences in the taxonomic group sizes, which will allow us to have more confidence in the results.

      It is also not clear why the correlation coefficient between alternative splicing ratio and genome length, gene content, and coding percentage should display such a trend, rather than the absolute value. There are only vague mechanistic arguments.

      The study analyzes the relationship of AS with genomic composition for the large taxonomic groups. We assume that significant differences in these relationships are indicators of the presence of different mechanisms of genome evolution. However, we agree with the reviewer that a correlation does not imply a causal relation, so we will be more cautious when interpreting the results.

      To quantify the relationships, we use correlation coefficients, the slopes of such correlations, and the relation of variability. Although the absolute values of AS are also illustrated in Table 4, we consider them less informative than a measure that includes how AS relates to the genomic composition. For example, we observe that plants have a different genome composition and relation with AS compared to animals, which suggests that they follow different mechanisms of genome evolution. On the other hand, we observe a trend in animals, where high values of AS are associated with a large percentage of introns and a percentage of intergenic DNA of about 50% of the genome.
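
      For readers unfamiliar with the first two quantities, the sketch below computes a Pearson correlation coefficient and a least-squares slope for one group. The data are invented and the function name `pearson_r_and_slope` is hypothetical; this is not the authors' analysis code, and it does not cover the "relation of variability" measure, whose definition is not given here.

```python
import math
from typing import List, Tuple


def pearson_r_and_slope(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (Pearson r, least-squares slope of y on x) for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope


# Hypothetical per-genome values for one taxonomic group.
genomic_variable = [1.0, 2.0, 3.0, 4.0]
as_ratio = [2.0, 4.0, 6.0, 8.0]
r, slope = pearson_r_and_slope(genomic_variable, as_ratio)
```

      A coefficient computed this way describes association within one group only; it carries no information about significance unless a p-value or confidence interval is reported alongside it.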

      Much more troubling, however, is the statement that the data supports "lineage-specific trends" (lines 299-300). Either this is just an ambiguous formulation, or the authors claim that you can see trends *within* lineages.

      We agree with the reviewer that this statement is not correct, so we will proceed to correct it.

      The latter is clearly not the case. In fact, within each lineage, there is a tremendous amount of variation, to such an extent that many of the coefficients given in Table 3 are close to meaningless. Note that no error bars or p-values are presented for the values shown in Table 3. Figure 2 shows the actual correlation, and the coefficient for flowering plants there is given as 0.151, with a p-value of 0.193. Table 3 seems to quote r=0.174 instead. It should be clear that a correlation within a lineage or species is not a sign of a trend.

      The reviewer has not correctly understood the results in Table 3. It is precisely the variation of the genome variables that we are measuring. Given the standardization of these values by the mean values, we have proceeded to compare the variability between groups, which is the result shown in Table 3. In this case, there are no associated error bars or p-values. On the other hand, we agree that a correlation is not a sign of a trend. But the relations of variability, together with the results obtained in Figure 3, are indicators of a trend. As we mentioned before, we will proceed to analyze whether the variation in the group sizes is causing a bias in the results.

      There are several wrong or unsupported statements in the manuscript. Early on, the authors state that the alternative splicing ratio (a number greater or equal to one that can be roughly understood as the number of different isoforms per ORF) "quantifies the number of different isoforms that can be transcribed using the same amount of information" (lines 51-52). But in many cases, this is incorrect, because the same sequence can represent different amounts of information depending on the context. So, if a changed context gives rise to a different alternative splice, it is because the genetic sequence has a different meaning in the changed context: the information has changed.

      We agree that some statements are not well supported, so we will proceed to revise them.

      In line 149, the authors state that "the energetic cost of having large genomes is high". No citation is given, and while such a statement seems logical, it does not have very solid support.

      We will also revise the bibliography and support our statements with updated references.

      If there was indeed a strong selective force to reduce genome size, we would not see the stunning diversity of genome sizes even within lineages. This statement is repeated (without support) several times in the manuscript, apparently in support of the idea that mammals had "no choice" to increase complexity via alternative splicing because they can't increase it by having longer genomes. I don't think this reasoning can be supported.

      We agree with the reviewer on this issue, so we will carefully revise the statements that indirectly (or directly) assume the action of selective forces on the genome composition.

      Even more problematic is the statement that "the amount of protein-coding DNA seems to be limited to a size of about 10MB" (line 219). There is no evidence whatsoever for this statement.

      In Figure 1A, we observe a one-to-one relationship between the genome size and the amount of coding DNA. However, in multicellular organisms, although the genome size increases, we observe that the amount of coding DNA does not increase by more than 10Mb, which suggests the presence of some genomic limitation. Of course, this is not an absolute or general statement, but rather a suggestion. We are only describing our results.

      The reference that is cited (Choi et al 2020) suggests that there is a maximum of 150GB in total genome size due to physiological constraints. In lines 257-258, the authors write that "plants are less restricted in terms of storing DNA sequences compared to animals" (without providing evidence or a citation).

      We will revise the bibliography and add updated references.

      I believe this statement is made due to the observation that plants tend to have large intergenic regions. But without examining the functionality of these intergenic regions (they might host long non-coding RNA stretches that are used to regulate the expression of other genes, for example) it is quite adventurous to use such a simple measure as being evidence that plants "are less restricted in terms of storing DNA sequences", whatever that even means. I do not think the authors mean that plants have better access to -80 freezers. The authors conclude that "plant's primary mechanism of genome evolution is by expanding their genome". This statement itself is empty: we know that plants are prone to whole genome duplication, but this duplication is not, as far as we understand, contributing to complexity. It is not a "primary mechanism of genome evolution".

      We will revise these statements.

      In lines 293-294, the authors claim that "alternative splicing is maximized in mammalian genomes". There is no evidence that this ratio cannot be increased. So, to conclude (on lines 302-303) that alternative splicing ratios are "a potential candidate to quantify organismal complexity" seems, based on this evidence, both far-fetched and weak at the same time.

      Our results show the highest values of AS in mammals, but we understand that the results are limited by the availability and accuracy of the data, which we will emphasize in the manuscript. As we previously mentioned, we will also proceed to analyze the uncertainty in the data and carry out the appropriate corrections.

      I am also not very comfortable with the data analysis. The authors, for example, say that they have eliminated from their analysis a number of "outlier species". They mention one: Emmer wheat because it has a genome size of 900 Mb (line 367). Since 900MB does not appear to be extreme, perhaps the authors meant to write 900 Gb. When I consulted the paper that sequenced Triticum dicoccoides, they noted that 14 chromosomes are about 10GB. Even a tetraploid species would then not be near 900Gb. But more importantly, such a study needs to state precisely which species were left out, and what the criteria are for leaving out data, lest they be accused of selecting data to fit their hypothesis.

      The reviewer is right; we meant to say 900Mb, which is approximately 7.2Gb. This was a mistake of nomenclature on our part. This value is extreme compared to the typical values, so it generates large deviations when applying measures of central tendency and dispersion. We want to obtain mean values that are representative of most of the species composing the taxonomic groups, so we find it appropriate to exclude all outlier values from the study. Nevertheless, we will specify rigorously the criteria that we have used to select the data.

      I understand that Methods are often put at the end of a manuscript, but the measures discussed here are so fundamental to the analysis that a brief description of what the different measures are (in particular, the "alternative splicing ratio") should be in the main text, even when the mathematical definition can remain in the Methods.

      We agree with the reviewer, so we will add a brief description of the genomic variables at the beginning of the Results section.

      Finally, a few words on presentation. I understand that the following comments might read differently after the authors change their presentation. This manuscript was at the border of being comprehensible. In many cases, I could discern the meaning of words and sentences in contexts but sometimes even that failed (as an example above, about "species-specific trends", illustrates). The authors introduced jargon that does not have any meaning in the English language, and they do this over and over again.

      Note that I completely agree with all the comments by the other reviewer, who alerted me to problems I did not catch, including the possible correlation with effective population size: a possible non-adaptive explanation for the results.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors collected genomic information from public sources covering 423 eukaryote genomes and around 650 prokaryote genomes. Based on pre-computed CDS annotation, they estimated the frequency of alternative splicing (AS) as a single average measure for each genome and computed correlations with this measure and other genomic properties such as genome size, percentage of coding DNA, gene and intergenic span, etc. They conclude that AS frequency increases with genome complexity in a somewhat directional trend from "lower" organisms to "higher" organisms.

      Strengths:

      The study covers a wide range of taxonomic groups, both in prokaryotes and eukaryotes.

      Weaknesses:

      The study is weak both methodologically and conceptually. Current high throughput sequencing technologies, coupled with highly heterogeneous annotation methods, can observe cases of AS with great sensitivity, and one should be extremely cautious of the biases and rates of false positives associated with these methods. These issues are not addressed in the manuscript.

      We are aware of the bias that may exist in annotation files. Since the source of noise can be highly variable, we have assumed that most of the data has a similar bias. However, we agree with the reviewer that we could perform some analysis to test for these biases and their association with different methodologies. Thus, we will measure the uncertainty present in the data. On the one hand, we will be more explicit about the data limitations and the biases they can generate in the results. On the other hand, while analyzing the false positives in the data is beyond our scope, we will perform a statistical test to detect possible biases related to different sequencing and annotation methods and to types of organisms (model or non-model organisms). If such biases are detected, we will proceed, as far as possible, to normalize the data or to estimate a confidence interval.

      Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      Beyond taking into account the differential bias that may exist in the data, we do not consider our AS measure to be problematic. The NCBI database is one of the most reliable databases available to date and is continuously updated by the entire scientific community. So, the use of this data and the corresponding procedures for deriving the AS measure are perfectly acceptable for a comparative analysis on such a huge global scale. Furthermore, the proposal of a new genome-level measure of AS that allows species spanning the whole tree of life to be compared is part of the novelty of the study. We understand that small-scale studies require high specificity about the molecular processes involved. However, that is not the case here, since we are dealing with a large-scale problem. On the other hand, as previously mentioned, we agree with the reviewer that we should analyze the degree of uncertainty in the data to better interpret the results.

      There is no mention of the possibility that AS could be largely caused by random splicing errors, a possibility that could very well fit with the manuscript's data. Instead, the authors adopt early on the view that AS is regulated and functional, generally citing outdated literature.

      There is no question that some AS events are functional, as evidenced by strongly supported studies. However, whether all AS events are functional is questionable, and the relative fractions of functional and non-functional AS are unknown. With this in mind, the authors should be more cautious in interpreting their data.

      Many studies suggest that most of the AS events observed are the result of splicing errors and are therefore neither functional nor conserved. However, we still have limited knowledge about the functionality of AS. Just because we don’t have a complete understanding of its functionality doesn’t mean there isn’t a fundamental cause behind these events. AS is a highly dynamic process that can be associated with processes of a stochastic nature that are fundamental for phenotypic diversity and innovation. This is one of the reasons why we do not enter into a discussion about the functionality of AS and instead consider it a potential measure of biological innovation. Nevertheless, we agree with the reviewer’s comments, so we will add a discussion of this issue with updated literature and check for any possible misinterpretation of the results.

      The "complexity" of organisms also correlates well (negatively) with effective population size. The power of selection to eliminate (slightly) deleterious mutations or errors decreases with effective population size. The correlation observed by the authors could thus easily be explained by a non-adaptive interpretation based on simple population genetics principles.

      We appreciate the reviewer’s observation. We are well aware of M. Lynch’s theory on the role of the effective population size and its possible correlation with genomic parameters, but we want to emphasize that our objective is not to find an adaptive or non-adaptive explanation of the evolution of AS, but rather to reveal it. Nevertheless, as the reviewer suggests, we will look at the correlation between AS and the effective population size and discuss a possible non-adaptive interpretation.

      The manuscript contains evidence that the authors might benefit from adopting a more modern view of how evolution proceeds. Sentences such as "... suggests that only sophisticated organisms optimize alternative splicing by increasing..." (L113), or "especially in highly evolved groups such as mammals" (L130), or the repeated use of "higher" and "lower" organisms need revising.

      As the reviewer suggests, we will proceed with the corresponding linguistic corrections.

      Because of the lack of controls mentioned above, and because of the absence of discussion regarding an alternative non-adaptive interpretation, the analyses presented in the manuscript are of very limited use to other researchers in the field. In conclusion, the study does not present solid conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF. When correlating the degree of alternative splicing with these three variables, they find that the different taxonomic groups have a different correlation coefficient, and identify a "progressive pattern" among metazoan groups, namely that the correlation coefficient mostly increases when moving from flowering plants to arthropods, fish, birds, and finally mammals. They conclude that therefore the amount of splicing that is performed by an organismal group could be used as a measure of its complexity.

      Weaknesses:

      While I find the analysis of alternative splicing interesting, I also find that it is a very imperfect measure of organismal complexity and that the manuscript as a whole is filled with unsupported statements. First, I think it is clear to anyone studying evolution over the tree of life that it is the complexity of gene regulation that is at the origin of much of organismal structural and behavioral complexity. Arguably, creating different isoforms out of a single ORF is just one example of complex gene regulation. However, the complexity of gene regulation is barely mentioned by the authors.

      We disagree with the reviewer that our measure of AS is imperfect. As in our response to the first reviewer, we will quantify the uncertainty in the data and correct for differential biases caused by annotation and sequencing methods. Thus, beyond correcting relevant biases in the data, we consider that our measure is adequate for a comparative analysis at a global scale. A novelty of our study is the proposal of a genome-level measure of AS that takes into account data from the entire scientific community.

      We also want to emphasize that we assume from the beginning that AS may reflect some kind of biological complexity; it is not a conclusion drawn from the results. An argument in favor of such an assumption is that AS is associated with stochastic processes that are fundamental for phenotypic diversity and innovation. Of course, we agree with the reviewer that it is not the only mechanism behind biological complexity, so we will emphasize this in the manuscript. In addition, we will be more explicit about the assumptions and objectives, and will correct any unsupported statement.

      Further, it is clear that none of their correlation coefficients actually show a simple trend (see Table 3). According to these coefficients, birds are more complex than mammals for 3 out of 4 measures.

      An evolutionary trend is broadly defined as the gradual change in some characteristic of organisms as they evolve or adapt to a specific environment. In our context, we define an evolutionary trend as the gradual change in genome composition and its association with AS across the main taxonomic groups. If we look at Figure 4 and Table 3, we can conclude that there is a progressive trend. We will be more precise about how we define an evolutionary trend and correct any possible misinterpretation of the results. On the other hand, we do not assume that mammals should be more complex than birds. First, we will emphasize that our results show that birds have the highest values of such a trend. Second, after reading the reviewer’s comments, we have decided to perform an additional analysis to correct for differences in the taxonomic group sizes, which will allow us to have more confidence in the results.

      It is also not clear why the correlation coefficient between alternative splicing ratio and genome length, gene content, and coding percentage should display such a trend, rather than the absolute value. There are only vague mechanistic arguments.

      The study analyzes the relationship of AS with genomic composition for the large taxonomic groups. We assume that significant differences in these relationships are indicators of the presence of different mechanisms of genome evolution. However, we agree with the reviewer that a correlation does not imply a causal relation, so we will be more cautious when interpreting the results.

      To quantify the relationships we use correlation coefficients, the slopes of such correlations, and the relation of variability. Although the absolute values of AS are also shown in Table 4, we consider them less informative than how AS relates to the genomic composition. For example, we observe that plants have a different genome composition and relation with AS compared to animals, which suggests that they follow different mechanisms of genome evolution. On the other hand, we observe a trend in animals, where high values of AS are associated with a large percentage of introns and a percentage of intergenic DNA of about 50% of the genome.
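
      As an illustration of the first two quantities named in this response (a correlation coefficient and the slope of the corresponding fit), a minimal Python sketch follows. The function name and the example numbers are hypothetical; this is not the authors' code, and it assumes a plain Pearson correlation with an ordinary least-squares slope.

```python
import math


def pearson_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y on x) for paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope


# Hypothetical values for one taxonomic group: a genomic variable (x)
# and an alternative splicing ratio (y). Not data from the study.
example_x = [1.0, 2.0, 3.0, 4.0]
example_y = [2.0, 4.0, 6.0, 8.0]
example_r, example_slope = pearson_and_slope(example_x, example_y)
```

      Running the same function once per taxonomic group would give one coefficient and one slope per group, which is the kind of comparison the response describes.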

      Much more troubling, however, is the statement that the data supports "lineage-specific trends" (lines 299-300). Either this is just an ambiguous formulation, or the authors claim that you can see trends within lineages.

      We agree with the reviewer that this statement is not correct, so we will proceed to correct it.

      The latter is clearly not the case. In fact, within each lineage, there is a tremendous amount of variation, to such an extent that many of the coefficients given in Table 3 are close to meaningless. Note that no error bars or p-values are presented for the values shown in Table 3. Figure 2 shows the actual correlation, and the coefficient for flowering plants there is given as 0.151, with a p-value of 0.193. Table 3 seems to quote r=0.174 instead. It should be clear that a correlation within a lineage or species is not a sign of a trend.

      The reviewer has misunderstood the results in Table 3. It is precisely the variation of the genome variables that we are measuring. Having standardized these values by the mean values, we compared the variability between groups, which is the result shown in Table 3. In this case there are no associated error bars or p-values. On the other hand, we agree that a correlation is not a sign of a trend. But the relations of variability, together with the results obtained in Figure 3, are indicators of a trend. As we mentioned before, we will analyze whether the variation in the group sizes is causing a bias in the results.

      There are several wrong or unsupported statements in the manuscript. Early on, the authors state that the alternative splicing ratio (a number greater or equal to one that can be roughly understood as the number of different isoforms per ORF) "quantifies the number of different isoforms that can be transcribed using the same amount of information" (lines 51-52). But in many cases, this is incorrect, because the same sequence can represent different amounts of information depending on the context. So, if a changed context gives rise to a different alternative splice, it is because the genetic sequence has a different meaning in the changed context: the information has changed.

      We agree that some statements are not well supported, so we will revise them.

      In line 149, the authors state that "the energetic cost of having large genomes is high". No citation is given, and while such a statement seems logical, it does not have very solid support.

      We will also revise the bibliography and support our statements with updated references.

      If there was indeed a strong selective force to reduce genome size, we would not see the stunning diversity of genome sizes even within lineages. This statement is repeated (without support) several times in the manuscript, apparently in support of the idea that mammals had "no choice" to increase complexity via alternative splicing because they can't increase it by having longer genomes. I don't think this reasoning can be supported.

      We agree with the reviewer on this issue, so we will carefully revise the statements that indirectly (or directly) assume the action of selective forces on the genome composition.

      Even more problematic is the statement that "the amount of protein-coding DNA seems to be limited to a size of about 10MB" (line 219). There is no evidence whatsoever for this statement.

      In Figure 1A we observe a one-to-one relationship between the genome size and the amount of coding DNA. However, in multicellular organisms, although the genome size increases, we observe that the amount of coding DNA does not increase by more than 10Mb, which suggests the presence of some genomic limitation. Of course, this is not an absolute or general statement, but rather a suggestion. We are only describing our results.

      The reference that is cited (Choi et al 2020) suggests that there is a maximum of 150GB in total genome size due to physiological constraints. In lines 257-258, the authors write that "plants are less restricted in terms of storing DNA sequences compared to animals" (without providing evidence or a citation).

      We will revise the bibliography and add updated references.

      I believe this statement is made due to the observation that plants tend to have large intergenic regions. But without examining the functionality of these intergenic regions (they might host long non-coding RNA stretches that are used to regulate the expression of other genes, for example) it is quite adventurous to use such a simple measure as being evidence that plants "are less restricted in terms of storing DNA sequences", whatever that even means. I do not think the authors mean that plants have better access to -80 freezers. The authors conclude that "plant's primary mechanism of genome evolution is by expanding their genome". This statement itself is empty: we know that plants are prone to whole genome duplication, but this duplication is not, as far as we understand, contributing to complexity. It is not a "primary mechanism of genome evolution".

      We will revise these statements.

      In lines 293-294, the authors claim that "alternative splicing is maximized in mammalian genomes". There is no evidence that this ratio cannot be increased. So, to conclude (on lines 302-303) that alternative splicing ratios are "a potential candidate to quantify organismal complexity" seems, based on this evidence, both far-fetched and weak at the same time.

      Our results show the highest values of AS in mammals, but we understand that the results are limited by the availability and accuracy of the data, which we will emphasize in the manuscript. As previously mentioned, we will also analyze the uncertainty in the data and carry out the appropriate corrections.

      I am also not very comfortable with the data analysis. The authors, for example, say that they have eliminated from their analysis a number of "outlier species". They mention one: Emmer wheat because it has a genome size of 900 Mb (line 367). Since 900MB does not appear to be extreme, perhaps the authors meant to write 900 Gb. When I consulted the paper that sequenced Triticum dicoccoides, they noted that 14 chromosomes are about 10GB. Even a tetraploid species would then not be near 900Gb. But more importantly, such a study needs to state precisely which species were left out, and what the criteria are for leaving out data, lest they be accused of selecting data to fit their hypothesis.

      The reviewer is right; we meant to say 900Mb, which is approximately 7.2Gb. This was a mistake of nomenclature on our part. This value is extreme compared to the typical values, so it generates large deviations when applying measures of central tendency and dispersion. We want to obtain mean values that are representative of most of the species composing the taxonomic groups, so we find it appropriate to exclude all outlier values from the study. Nevertheless, we will specify rigorously the criteria that we have used to select the data.

      I understand that Methods are often put at the end of a manuscript, but the measures discussed here are so fundamental to the analysis that a brief description of what the different measures are (in particular, the "alternative splicing ratio") should be in the main text, even when the mathematical definition can remain in the Methods.

      We agree with the reviewer, so we will add a brief description of the genomic variables at the beginning of the Results section.

      Finally, a few words on presentation. I understand that the following comments might read differently after the authors change their presentation. This manuscript was at the border of being comprehensible. In many cases, I could discern the meaning of words and sentences in contexts but sometimes even that failed (as an example above, about "species-specific trends", illustrates). The authors introduced jargon that does not have any meaning in the English language, and they do this over and over again.

      Note that I completely agree with all the comments by the other reviewer, who alerted me to problems I did not catch, including the possible correlation with effective population size: a possible non-adaptive explanation for the results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments to improve the quality of the work:

      (1) The choice of subunits to tag is really not ideal. In the available structures of the human proteasome, the C-terminus of Rpn3/PSMD3 points directly toward the ATPase pore and is likely to disrupt the structure and/or dynamics of the proteasome during proteolysis (see comments regarding controls for functionality below). Similarly, the C-terminal tail of Rpt1/PSMC2 has a key role in the opening of the 20S core particle gate for substrate translocation and processing (see 2018 Nature Communications, 9:1360 and 2018 Cell Reports 24:1301-1315), and Alpha3/PSMA4 can be substituted by a second copy of Alpha4/PSMA7 in some conditions (although tagging Alpha3/PSMA4 would admittedly provide a picture of the canonical proteasome interactome while actively excluding the interactome of the non-canonical proteasomes that form via replacement of Alpha3/PSMA4). Comparison of these cell lines with lines harboring tags on subunits that are commonly used for tagging in the field because of a lack of impacts, such as the N-terminus of Rpn1/PSMD2, the C-terminus of Rpn11/PSMD14, and the C-terminus of Beta4/PSMB2 would help instill confidence that the interactome reported largely arises from mature, functional proteasomes rather than subcomplexes, defective proteasomes, or other species that may occur due to tagging at these positions.

      We thank the reviewer for pointing this out. The original purpose of our strategy was to establish proximity labeling of proteasomes to enable applications both in cell culture and in vivo. The choice of PSMA4 and PSMC2 was dictated by previous successful tagging with GFP in mammalian cells (Salomons et al., Exp Cell Res 2010; Bingol and Schuman, Nature 2006). However, the choice of C-terminal PSMC2 might not have been optimal. HEK293 cells overexpressing PSMC2-BirA show slower growth, and the BioID data retrieve higher enrichment of assembly factors, suggesting slower assembly of this fusion protein into proteasomes. Nevertheless, we did not observe a negative impact on overall proteasome activity, and PSMC2-BirA was (at least in part) incorporated into fully assembled proteasomes, as indicated by enrichment of 20S proteins. We apologize for not making it clear that we labeled the N-terminus of PSMD3/Rpn3 and not the C-terminus (Figure 1a and S1a). Therefore, we included in Figure S1a of the revised manuscript structures of the proteasome where the tagged subunit termini are highlighted: C-terminus for PSMA4 and PSMC2 and N-terminus for PSMD3. Additionally, we would like to point out that, unlike cells expressing PSMC2-BirA, cells expressing BirA-PSMD3 did not show slower growth, and BioID data showed a more homogeneous enrichment of both 19S and 20S proteins, as compared to PSMC2-BirA (Figure 1D and 1E). However, the overall level of enrichment of proteasome subunits was not comparable to PSMA4-BirA and, therefore, we opted to focus the rest of the manuscript on the PSMA4-BirA construct.

      In support of this point, the data provided in Figure 1E in which the change in the abundances of each proteasome subunit in the tagged line vs. the BirA control line demonstrates substantial enrichment of the subcomplexes of the proteasome that are tagged in each case; this effect may represent the known feedback-mediated upregulation of new proteasome subunit synthesis that occurs when proteasomal proteolysis is impaired, or alternatively, the accumulation of subcomplexes containing the tagged subunit that cannot readily incorporate into mature proteasomes. Acknowledging this limitation in the text would be valuable to readers who are less familiar with the proteasome.

      We would like to clarify that the data shown in Figure 1E do not represent whole proteome data, but rather log2 fold changes vs. BirA* control calculated on streptavidin enrichment samples. The differences in the enrichment of the various subcomplexes between cell lines derive from the fact that the effect size of the enrichment depends not only on protein abundance in the isolated complexes, but also on the efficiency of biotinylation. The latter will be higher for proteins located in closer proximity to the bait. A similar observation was made in a recent publication (PMID: 36410438) that compared BioID and Co-IP for the same bait. When a component of the nuclear pore complex (Nup158) was analyzed by BioID, only the more proximal proteins were enriched, as compared to the whole complex in Co-IP data (Author response image 1):

      Author response image 1.

      Proteins identified in the NUP158 BioID or pulldown experiments are filled in red or light red for significance intervals A or B, respectively. The bait protein NUP158 is filled in yellow. Proteins enriched in the pulldown falling outside the SigA/B cutoff are filled in gray. NPC, nuclear pore complex. SigA, significant class A; SigB, significant class B. Reproduced from Figure 6 of PMID: 36410438.

      However, we would like to point out that despite quantitative differences between different proteasome subunits, both 19S and 20S proteins were found to be strongly enriched (typically >2 fold) in all the constructs compared to the BirA* control line (Figure 1E). This indicates that at least a fraction of each tagged subunit is incorporated into fully assembled proteasomes.

      Regarding the upregulation of proteasome subunits as a consequence of proteasome dysfunction, we did not find evidence of this, at least in the case of PSMA4. The immunoblot shown in Figure 2A and its quantification in S3A indicate no increased abundance of endogenous PSMA4 upon tetracycline induction of PSMA4-BirA*.

      (2) The use of myc as a substrate of the proteasome for demonstration that proteolysis is unaffected is perhaps not ideal. Myc is known to be degraded via both ubiquitin-dependent and ubiquitin-independent mechanisms, such that disruption of one means of degradation (e.g., ubiquitin-dependent degradation) via a given tag could potentially be compensated by another. A good example of this is that the C-terminal tagging of PSMC2/Rpt1 is likely to disrupt interaction between the core particle and the regulatory particle (as suggested in Fig. 1D); this may free up the core particle for ubiquitin-independent degradation of myc.

      Aside from using specific reporters for ubiquitin-dependent vs. independent degradation or a larger panel of known substrates, analysis of the abundance of K48-ubiquitinated proteins in the control vs. tag lines would provide additional evidence as to whether or not proteolysis is generally perturbed in the tag lines.

      We thank the reviewer for this suggestion. We have included an immunoblot analysis showing that the levels of K48 ubiquitylation (Figure S3d) are not affected by the expression of tagged PSMA4.

      (3) On pg. 8 near the bottom, the authors accidentally refer to ARMC6 as ARMC1 in one instance.

      We have corrected the mistake.

      (4) On pg. 10, the authors explain that they analyzed the interactome for all major mouse organs except the brain; although they explain in the discussion section why the brain was excluded, including this explanation on pg. 10 here instead of in the discussion might be a better place to discuss this.

      We moved the explanation from the discussion to the results part.

      Reviewer #2 (Recommendations For The Authors):

      (1) Perhaps the authors can quantify the fraction of unassembled PSMA4-BirA* from the SEC experiment (Fig. 2b) to give the readers a feeling for how large a problem this could be.

      The percentages based on Area Under the Curve calculations have been added to Figure S3b.
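
      For readers unfamiliar with this kind of quantification, the sketch below shows one common way to turn an elution profile into area-under-the-curve percentages using the trapezoidal rule. It is an assumption about the general approach, not the authors' procedure; the function names, the window boundaries, and the example profile are hypothetical.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1)
    )


def fraction_in_window(x, y, start, end):
    """Fraction of the total area contributed by samples with start <= x <= end.

    Only whole segments between sampled points inside the window are counted,
    so the window boundaries should coincide with sampled positions.
    """
    total = trapezoid_area(x, y)
    if total == 0:
        raise ValueError("total area is zero")
    inside = [(a, b) for a, b in zip(x, y) if start <= a <= end]
    if len(inside) < 2:
        return 0.0
    window_x = [a for a, _ in inside]
    window_y = [b for _, b in inside]
    return trapezoid_area(window_x, window_y) / total


# Hypothetical profile: fraction positions (x) and signal intensities (y).
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.0, 2.0, 2.0, 0.0, 0.0]
percent_in_window = 100.0 * fraction_in_window(positions, signal, 0.0, 2.0)
```

      Applied to two non-overlapping windows (for example, fractions containing assembled complexes and fractions containing unassembled protein), the two percentages describe how the signal is split between them.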

      (2) Do the authors observe any difference in the enrichment scores between proteins that are known to interact with the proteasome vs proteins that the authors can justify as "interactors of interactors" vs the completely new potential interactors? This could be an interesting way to show that the potential new interactors are not simply because of poor false positive rate calibration, but that they behave in the same way as the other populations.

      We thank the reviewer for this suggestion. We analyzed the enrichment scores for 20S proteasome subunits, known PIPs, first neighbors and the remaining enriched proteins. The remaining proteins (potential new interactors) have very similar scores as the first neighbors of known interactors. This plot has been added to Figure S3g.

      (3) Did the authors try to train a logistic model for the miniTurbo experiments, like it was done for the BirA* experiments? Perhaps combining the results of both experiments would yield higher confidence on the proteasome interactors.

      Following the reviewer’s suggestion, we applied the classifier to the dataset comparing miniTurbo and PSMA4-miniTurbo. We found a clear separation between the FPR and the TPR, with 136 protein groups enriched in PSMA4-miniTurbo. We have added the classifier and corresponding ROC curve to Figure S4f and S4g.
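
      To make the FPR/TPR terminology concrete, the following sketch computes the points of a ROC curve and its area from classifier scores and binary labels. It is a generic illustration with hypothetical names and toy data; it does not reproduce the logistic model or the reference sets used in the study.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) as the decision threshold is lowered.

    labels: 1 for true positives in the reference set, 0 for negatives.
    Items sharing the same score are added together, so ties yield one point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def roc_auc(points):
    """Trapezoidal area under a ROC curve given as (FPR, TPR) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example: higher scores should indicate true interactors.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_auc = roc_auc(roc_points(toy_scores, toy_labels))
```

      An enrichment cutoff such as FPR < 0.05 then corresponds to choosing the lowest score threshold whose ROC point still lies to the left of 0.05 on the FPR axis.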

      75 protein groups were found to be enriched for both PSMA4-BirA* and PSMA4-miniTurbo (Author response image 2), including the proteasome core particles, regulatory particles, known interactors and potential new interactors. As we focused more on the identification of substrates with PSMA4-miniTurbo, we did not pursue these overlapping protein groups further, but rather used the comparison to the mouse model to identify potential new interactors.

      Author response image 2.

      Overlap of ProteasomeID-enriched proteins (FPR < 0.05) between PSMA4-BirA* and PSMA4-miniTurbo.

      (4) Perhaps this is already known, but did the authors check if MG132 affect proteasome assembly? The authors could for example repeat their SEC experiments in the presence of MG132.

      We thank the reviewer for the suggestion. However, to our knowledge, there are no reports that MG132 affects the assembly of the proteasome. MG132 is one of the most widely used proteasome inhibitors in basic research and, as such, has been extensively characterized over the last three decades. The small peptide aldehyde acts as a substrate analogue and binds directly to the active site of the protease PSMB5/β5. We therefore think it is unlikely that MG132 interferes with the assembly of the proteasome.

      (5) Minor comment: at the bottom of page 8, the authors probably mean ARMC6 and not ARMC1.

      We have corrected the mistake.

      (6) It would be interesting to expand the analysis of the already acquired in vivo data to try to identify tissue-specific proteasome interactors. Can the authors draw a four-way Venn diagram with the interactors of each tissue?

      We thank the reviewer for this suggestion. We have generated an UpSet plot showing the overlap of ProteasomeID enriched proteins in the four tissues that gave us meaningful results (Author response image 3). In order to investigate whether the observed differences in ProteasomeID enriched proteins could be meaningful in terms of proteasome biology, we have highlighted proteins belonging to the UPS that show tissue specific enrichments. We found proteasome activators such as PSME1/PA28alpha and PSME2/PA28beta to enrich preferentially in kidney and liver, respectively, as well as multiple deubiquitinases to enrich preferentially in the heart. These differences might be related to the specific cellular composition of the different tissues, e.g., number of immune cells present, or the tissue-specific interaction of proteasomes with enzymes involved in the ubiquitin cycle. Given the rather preliminary nature of these findings, we have opted for not including this figure in the main manuscript, but rather include it only in this rebuttal letter.

      Author response image 3.

      UpSet plot showing overlap between ProteasomeID enriched proteins in different mouse organs.
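      The counting that underlies an UpSet plot of this kind can be sketched as follows. This is an illustrative sketch only: the function name `exclusive_intersections`, the tissue labels, and the protein identifiers are hypothetical placeholders, and no data from the study are reproduced. Each protein is assigned to the exact combination of tissues in which it is enriched, and the size of each combination is what the bars of the plot would display.

```python
from collections import Counter


def exclusive_intersections(sets_by_name):
    """Count items by the exact combination of sets that contain them."""
    membership = {}
    for name, items in sets_by_name.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Hypothetical placeholder hits; not results from the manuscript.
hits = {
    "tissue_A": {"P1", "P2", "P3"},
    "tissue_B": {"P1", "P4"},
    "tissue_C": {"P1", "P2"},
    "tissue_D": {"P5"},
}
counts = exclusive_intersections(hits)
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(combo), n)
```

      Counting exclusive combinations, rather than pairwise overlaps, is what lets tissue-specific hits stand apart from proteins shared by all tissues.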

      Reviewer #3 (Recommendations For The Authors):

      (1) In the first paragraph of the Introduction, the authors link cellular senescence caused by partial proteasome inhibition with the efficacy of proteasome inhibitors in cancer therapy. Although this is an interesting hypothesis, I am not aware of any direct evidence for this; rather, I believe the efficacy of bortezomib/carfilzomib in haematological malignancies is most commonly attributed to these cells having adapted to high levels of proteotoxic stress (e.g., chronic unfolded protein response activation). I would suggest rephrasing this sentence.

      We thank the reviewer for the comment and have amended the introduction.

      (2) For the initial validation experiments (e.g., Fig. 1B), have the authors checked what level of Streptavidin signal is obtained with "+ bio, - tet" ? Although I accept that the induction of PSMA4-BirA* upon doxycycline addition is clear from the anti-Flag blots, it would still be informative to ascertain what level of background labelling is obtained without induction (but in the presence of exogenous biotin).

      We tested four different conditions +/- tet and +/- biotin (24h) in PSMA4-BirA* cell lines (Author response image 4). As expected, biotinylation was most pronounced when tet and biotin were added. When biotin was omitted, streptavidin signal was the lowest regardless of the addition of tet. Compared to the -biotin conditions, a slight increase of streptavidin signal could be observed when biotin was added but tet was not added. This could be either due to the promoter leaking (PMID: 12869186) or traces of tetracycline in the FBS we used, as we did not specifically use tet-free FBS for our experiments.

      Author response image 4.

      Streptavidin-HRP immunoblot following induction of BirA fusion proteins with tetracycline (+tet) and supplementation of biotin (+bio). For the sample used as expression control tetracycline was omitted (-tet). To test background biotinylation, biotin supplementation was omitted (-bio). Immunoblot against BirA and PSMA was used to verify induction of fusion proteins, while GAPDH was used as loading control.

      (3) For the proteasome structure models in Fig. 1D, a scale bar would be useful to inform the reader of the expected 10 nm labelling radius (as the authors have done later, in Fig. 2D).

      We have added 10 nm scale bars to Figure 1d.

      (4) In the "Identification of proteasome substrates by ProteasomeID" Results subsection, I believe there is a typo where the authors refer to ARMC1 instead of ARMC6.

      We have corrected the mistake.

      (5) I think Fig. S5 was one of the most compelling in the manuscript. Given the interest in confirming on-target efficacy of targeted degradation modalities, as well as identifying potential off-target effects early-on in development, I would consider promoting this out of the supplement.

      We thank the reviewer for the comment and share the excitement about using ProteasomeID for targeted degradation screening. We have moved the data on PROTACs (Figure S5) into a new main Figure 5.

      In addition, in relation to the comment of this reviewer regarding the detection of endogenous substrates, we have now included validation for one more hit emerging from our analysis (TIGD5) and included the results in Figure 4f, 4g and S4j.

    1. Author response:

      Overall recommendations.

      A brief summary of the main reviewers' recommendations that should be prioritized is listed below. Detailed recommendations as suggested by each individual reviewer are also included.

      -Better justification of the choice of the substitutions for the mutations should be added. In addition, authors should strongly consider adding more mutations to enable mechanistic tests of the proposed model for lipid conduction.

      We will characterize more mutations to the key residues at the TM4-TM6 interface. In addition to the TM4 lysine mutations shown in the original manuscript, we will include mutations to alanine and glutamate, and justify our choice of the substitutions in the revised manuscript. Furthermore, we will also test if introducing lysine mutations in TM6 will convert the ion channels into lipid scramblases. These additional experiments will greatly strengthen our conclusion.

      -Blockers to validate the concern that the recorded currents indeed arise from TMEM16A or OSCA/TMEM63 channels should be tested. Do the pore blockers also block scramblase activity in the gating mutants?

      TMEM16A and OSCA1.2 are readily expressed on the cell surface. OSCA1.2 also has a large conductance. This is why we can record large currents even with inside-out patches. We will include the TMEM16A inhibitor Ani9 and a non-specific inhibitor of OSCA channels to further validate this. We have reported that Ani9 can inhibit a TMEM16A-derived lipid scramblase (L543K in TM4) in our previous publication (PMID: 31015464). We will test whether Ani9 can also inhibit the other TMEM16A scramblases reported in this study. We will also examine whether Gd3+ is capable of blocking lipid scrambling of the OSCA1.2 gating mutations.

      -Include details of missing experimental conditions for scramblase activity.

      We will conduct a thorough revision to include detailed experimental conditions for scramblase activity measurement.

      -Additional mutants above and below the putative lysine gate as suggested by reviewer 3 to better assess the model.

      As we explained in Response #1, we will extend our mutations around the putative activation gate.

      -Concern about whether osmolarity changes are in fact activating OSCA and TMEM63, as suggested by reviewers 1 and 3. This could be addressed by assessing scramblase activity and currents at different osmolarity levels.

      We will test the engineered OSCA1.2 scramblases in response to solutions with different osmolarity.

      Reviewer #1 (Public Review):

      Summary:

      TMEM16, OSCA/TMEM63, and TMC belong to a large superfamily of ion channels where TMEM16 members are calcium-activated lipid scramblases and chloride channels, whereas OSCA/TMEM63 and TMCs are mechanically activated ion channels. In the TMEM16 family, TMEM16F is a well-characterized calcium-activated lipid scramblase that plays an important role in processes like blood coagulation, cell death signaling, and phagocytosis. In a previous study, the group demonstrated that lysine mutation in TM4 of TMEM16A can enable the calcium-activated chloride channel to permeate phospholipids too. Based on this they hypothesize that the energy barrier for lipid scramblase in these ion channels is low, and that modification in the hydrophobic gate region by introducing a charged side chain between the TM4/6 interface in TMEM16 and OSCA/TMEM63 family can allow lipid scramblase. In this manuscript, using scramblase activity via Annexin V binding to phosphatidylserine, and electrophysiology, the authors demonstrate that lysine mutation in TM4 of TMEM16F and TMEM16A can cause constitutive lipid scramblase activity. The authors then go on to show that analogous mutations in OSCA1.2 and TMEM63A can lead to scramblase activity.

      Strengths:

      Overall, the authors introduce an interesting concept that this large superfamily can permeate ions and lipids.

      Weaknesses:

      The electrophysiology data does not entirely support their claims.

      We appreciate your positive comments. We will conduct more experiments including more electrophysiology characterizations as suggested.

      Reviewer #2 (Public Review):

      This concise and focused study by Lowry and colleagues identifies a motif in the pores of three families of channel/scramblase proteins that regulate exclusive ion permeation and lipid transport. These three ion channel families, which include the TMEM16s, the plant-expressed and stress-gated cation channel OSCA, and the mammalian homolog and mechanosensitive cation channel, TMEM63 share low sequence similarity between them and have seemingly differing functions, as anion (TMEM16s), or stress-activated cation channels (OSCA/TMEM63). The study finds that in all three families, mutating a single hydrophobic residue in the ion permeation pathway of the channels confers lipid transport through the pores of the channels, indicating that TMEM16 and the related OSCA and TMEM63 channels have a conserved potential for both ion and lipid permeation. The authors interpret the findings as revealing that these channel/scramblase proteins have a relatively low "energetic barrier for scramblase" activity. The experiments themselves seem to be done with a high level of rigor and the paper is well written. A weakness is the limited scope of the experiments which, if fixed, could open up a new line of inquiry.

      We appreciate the positive comments from the reviewer. We will conduct more experiments listed in our responses to the Overall Recommendations to improve the scope and quality of our study.

      Reviewer #3 (Public Review):

      This study was focused on the conserved mechanisms across the Transmembrane Channel/Scramblase superfamily, which includes members of the TMEM16, TMEM63/OSCA, and TMC families. The authors show that the introduction of lysine residues at the TM4-TM6 interface can disrupt gating and confer scramblase activity to non-scramblase proteins. Specifically, they show this to be true for conserved TM4 residues across TMEM16F, TMEM16A, OSCA1.2, and TMEM63A proteins. This breadth of data is a major strength of the paper and provides strong evidence for an underlying linked mechanism for ion conduction and phospholipid transport. Overall, the confocal imaging experiments, patch clamping experiments, and data analysis are performed well.

      However, there are several concerns regarding the scope of experiments supporting some claims in the paper. Although the authors propose that the TM4/TM6 interface is critical to ion conduction and phospholipid scramblase activity, in each case, there is very narrow evidence of support consisting of 1-3 lysine substitutions at specific residues on TM4. Given that the authors postulate that the introduction of a positive charge via the lysine side chain is essential to the constitutive activity of these proteins, additional mutation controls for side chain size (e.g. glutamine/methionine) or negative charge (e.g. glutamic acid), or a different positive charge (i.e. arginine) would have strengthened their argument. To more comprehensively understand the TM4/TM6 interface, mutations at locations one turn above and one turn below could be studied until there is no phenotype. In addition, the equivalent mutations on the TM6 side should be explored to rule out the effects of conformational changes that arise from mutating TM4 and to increase the strength of evidence for the importance of side-chain interactions at the TM6 interface. The experiments for OSCA1.2 osmolarity effects on gating and scramblase in Figure 4 could be improved by adding different levels of osmolarity in addition to time in the hypotonic solution.

      We appreciate the positive and constructive comments from the reviewer. As we outlined in our responses to the Overall Recommendations, we will include more mutations at the TM4 and TM6 interface to further strengthen our conclusion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors examined the role of IBTK, a substrate-binding adaptor of the CRL3 ubiquitin ligase complex, in modulating the activity of the eIF4F translation initiation complex. They find that IBTK mediates the non-degradative ubiquitination of eIF4A1 and promotes cap-dependent translational initiation, nascent protein synthesis, oncogene expression, and tumor cell growth. Correspondingly, phosphorylation of IBTK by mTORC1/S6K1 increases eIF4A1 ubiquitination and sustains oncogenic translation.

      Strengths:

      This study utilizes multiple biochemical, proteomic, functional, and cell biology assays to substantiate its results. Importantly, the work nominates IBTK as a unique substrate of mTORC1, and further validates eIF4A1 (a crucial subunit of the eIF4F complex) as a promising therapeutic target in cancer. Since IBTK interacts broadly with multiple members of the translation initiation complex, it will be interesting to examine its role in eIF2alpha-mediated ER stress as well as eIF3-mediated translation. Additionally, since IBTK exerts pro-survival effects in multiple cell types, it will be of relevance to characterize the role of IBTK in mediating increased mTORC1-mediated translation in other tumor types, thus potentially impacting their treatment with eIF4F inhibitors.

      Limitations/Weaknesses:

      The findings are mostly well supported by data, but some areas need clarification and could potentially be enhanced with further experiments:

      (1) Since eIF4A1 appears to function downstream of IBTK1, can the effects of IBTK1 KO/KD in reducing puromycin incorporation (in Fig 3A), cap-dependent luciferase reporter activity (Fig 3G), reduced oncogene expression (Fig 4A) or 2D growth/invasion assays (Fig 4) be overcome or bypassed by overexpressing eIF4A1? These could potentially be tested in future studies.

      We appreciate the reviewer for bringing up this crucial point. As per the reviewer's suggestion, we conducted experiments in which we overexpressed Myc-eIF4A1 in IBTK-KO SiHa cells. Our findings indicate that increasing eIF4A1 levels through ectopic overexpression was unable to reverse the decrease in puromycin incorporation (Fig. S3C) or in the protein expression of eIF4A1 targets (Fig. S4E) caused by IBTK ablation. These results demonstrate that the eIF4A1 dysfunction induced by IBTK ablation cannot be rescued by simply elevating eIF4A1 protein levels. Given that these results were negative, the impact of eIF4A1 overexpression on the 2D growth/invasion capacities of IBTK-KO cells was not examined further. We sincerely appreciate the reviewer's understanding regarding this matter.

      (2) The decrease in nascent protein synthesis in puromycin incorporation assays in Figure 3A suggests that the effects of IBTK KO are comparable to and additive with silvestrol. It would be of interest to examine whether silvestrol decreases nascent protein synthesis or increases stress granules in the IBTK KO cells stably expressing IBTK as well.

      We appreciate the reviewer for bringing up this crucial point. We have shown that silvestrol treatment still decreased nascent protein synthesis in IBTK-KO cells overexpressing FLAG-IBTK (Fig. S3B).

      (3) The data presented in Figure 5 regarding the role of mTORC1 in IBTK-mediated eIF4A1 ubiquitination need further clarification on several points:

      • It is not clear if the experiments in Figure 5F with Phos-tag gels are using the FLAG-IBTK deletion mutant or the peptide containing the mTOR sites as it is mentioned on line 517, page 19 "To do so, we generated an IBTK deletion mutant (900-1150 aa) spanning the potential mTORC1-regulated phosphorylation sites" This needs further clarification.

      We appreciate the reviewer for bringing up this crucial point. The IBTK deletion mutant used in Fig. 5F is FLAG-IBTK900-1150aa. We have annotated it with smaller font size in the panel (red box) in Author response image 1.

      Author response image 1.

      • It may be of benefit to repeat the Phos-tag experiments with full-length FLAG-IBTK and/or endogenous IBTK with molecular weight markers indicating the size of migrated bands.

      We appreciate the reviewer for bringing up this crucial point. We attempted to perform Phos-tag assays to detect overexpressed full-length FLAG-IBTK or endogenous IBTK. However, we were unable to transfer full-length FLAG-IBTK or endogenous IBTK onto the nitrocellulose membrane during Phos-tag WB analysis. This is likely due to the limitations of the technique. In our experience, Phos-tag gels are less efficient at detecting mobility shifts of proteins with large molecular weights. As the molecular weight of IBTK is approximately 160 kDa, it falls within this category. Considering these technical constraints, we did not include Phos-tag assay results for full-length IBTK in our study. We sincerely appreciate the reviewer's understanding regarding this matter.

      The binding of Phos-tag to phosphorylated proteins induces a mobility shift during gel electrophoresis. This shift allows phosphorylated proteins to be visualized and quantified separately from non-phosphorylated proteins. It is important to note that these mobility shifts indicate phosphorylation status rather than actual molecular weights. Pre-stained protein markers are typically used only as a reference to assess the efficiency of protein transfer onto the membrane [Ref: 1]. For these reasons, we did not add molecular weights to the WB images.

      Reference [1]. FUJIFILM Wako Pure Chemical Corporation, https://www.wako-chemicals.de/media/pdf/c7/5e/20/FUJIFILM-Wako_Phos-tag-R.pdf

      • Additionally, torin or Lambda phosphatase treatment may be used to confirm the specificity of the band in separate experiments.

      We appreciate the reviewer for bringing up this crucial point. Torin1 is a synthetic mTOR inhibitor that prevents the binding of ATP to mTOR, leading to the inactivation of both mTORC1 and mTORC2, whereas rapamycin primarily targets mTORC1 activity and may inhibit mTORC2 in certain cell types after prolonged treatment. We have identified the mTORC1/S6K1 complex as the predominant mediator of IBTK phosphorylation. Therefore, in this context, we think that rapamycin is sufficient to inactivate the mTORC1/S6K1 pathway. As shown in Fig. 5F, the phosphorylated IBTK900-1150aa was markedly decreased while the non-phosphorylated form was simultaneously increased in rapamycin-treated cells. As per the reviewer's suggestion, we treated cells overexpressing FLAG-IBTK900-1150aa with lambda phosphatase. As shown in Fig. 5G, lambda phosphatase treatment completely abolished the mobility shifts of phosphorylated FLAG-IBTK900-1150aa. Additionally, the lowest band displayed an abundant accumulation of the non-phosphorylated form of FLAG-IBTK900-1150aa. These findings confirm that the mobility shifts observed in WB analysis correspond to the phosphorylated forms of FLAG-IBTK900-1150aa.

      • Phos-tag gels with the IBTK CRISPR KO line would also help confirm that the non-phosphorylated band is indeed IBTK.

      We appreciate the reviewer for bringing up this crucial point. As stated above, we performed Phos-tag assays to detect the mobility shifts of phosphorylated FLAG-IBTK900-1150aa. An anti-FLAG antibody, not the anti-IBTK antibody, was used for WB detection. The anti-FLAG antibody does not cross-react with endogenous IBTK.

      • It is unclear why the lower, phosphorylated bands seem to be increasing (rather than decreasing) with AA starvation/ Rapa in Fig 5H.

      We appreciate the reviewer for bringing up this crucial point. We think the panel the reviewer mentioned is Fig. 5F. According to the principle of Phos-tag assays, proteins with higher phosphorylation levels have slower migration rates on SDS-PAGE, while proteins with lower phosphorylation levels have faster migration rates.

      As shown in Author response image 2, the green box indicates the most phosphorylated forms of FLAG-IBTK900-1150aa, the red box indicates the moderately phosphorylated forms of FLAG-IBTK900-1150aa, and the yellow box indicates the non-phosphorylated forms of FLAG-IBTK900-1150aa. AA starvation or Rapamycin treatment reduced the hyperphosphorylated forms of FLAG-IBTK900-1150aa (green box), while simultaneously increasing the hypophosphorylated (red box) and non-phosphorylated (yellow box) forms of FLAG-IBTK900-1150aa. Thus, we conclude that AA starvation or Rapamycin treatment leads to a marked decrease in the phosphorylation levels of FLAG-IBTK900-1150aa.

      Author response image 2.

      Reviewer #2 (Public Review):

      Summary:

      This study by Sun et al. identifies a novel role for IBTK in promoting cancer protein translation, through regulation of the translational helicase eIF4A1. Using a multifaceted approach, the authors demonstrate that IBTK interacts with and ubiquitinates eIF4A1 in a non-degradative manner, enhancing its activation downstream of mTORC1/S6K1 signaling. This represents a significant advance in elucidating the complex layers of dysregulated translational control in cancer.

      Strengths:

      A major strength of this work is the convincing biochemical evidence for a direct regulatory relationship between IBTK and eIF4A1. The authors utilize affinity purification and proximity labeling methods to comprehensively map the IBTK interactome, identifying eIF4A1 as a top hit. Importantly, they validate this interaction and the specificity for eIF4A1 over other eIF4 isoforms by co-immunoprecipitation in multiple cell lines. Building on this, they demonstrate that IBTK catalyzes non-degradative ubiquitination of eIF4A1 both in cells and in vitro through the E3 ligase activity of the CRL3-IBTK complex. Mapping IBTK phosphorylation sites and showing mTORC1/S6K1-dependent regulation provides mechanistic insight. The reduction in global translation and eIF4A1-dependent oncoproteins upon IBTK loss, along with clinical data linking IBTK to poor prognosis, support the functional importance.

      Weaknesses:

      While these data compellingly establish IBTK as a binding partner and modifier of eIF4A1, a remaining weakness is the lack of direct measurements showing IBTK regulates eIF4A1 helicase activity and translation of target mRNAs. While the effects of IBTK knockout/overexpression on bulk protein synthesis are shown, the expression of multiple eIF4A1 target oncogenes remains unchanged.

      Summary:

      Overall, this study significantly advances our understanding of how aberrant mTORC1/S6K1 signaling promotes cancer pathogenic translation via IBTK and eIF4A1. The proteomic, biochemical, and phosphorylation mapping approaches established here provide a blueprint for interrogating IBTK function. These data should galvanize future efforts to target the mTORC1/S6K1-IBTK-eIF4A1 axis as an avenue for cancer therapy, particularly in combination with eIF4A inhibitors.

      Reviewer #1 (Recommendations For The Authors):

      (1) Certain references should be provided for clarity. For example, page 15, line 418: "The C-terminal glycine-glycine (GG) amino acid residues are essential for Ub conjugation to targeted proteins".

      We appreciate the reviewer for bringing up this crucial point. We have cited two fundamental review papers on the ubiquitin system (PMID: 22524316, 9759494) in this sentence.

      (2) Please describe the properties of the ΔBTB mutant on page 15 when first describing it. What motifs does it lack and has it been described before in functional studies?

      We appreciate the reviewer for bringing up this crucial point. We added a sentence to describe the properties of the ΔBTB mutant. This mutant lacks the BTB1 and BTB2 domains (deletion of aa 554–871), which have been previously demonstrated to be essential for binding to CUL3. The original reference has been added to the revised manuscript.

      (3) In Figure 2G how do the authors explain the fact that co-expression of the Ub K-ALLR mutant, which is unable to form polyubiquitin chains, formed only a moderate reduction in IBTK-mediated eIF4A1 ubiquitination?

      We appreciate the reviewer for bringing up this crucial point. The Ub K-ALLR mutant can indeed conjugate to substrate proteins, but it cannot form chains because it lacks lysine residues, so it yields only mono-ubiquitination. Multi-mono-ubiquitination refers to the attachment of single ubiquitin molecules to multiple lysine residues on a substrate protein. It is worth noting that a poly-ubiquitinated protein and a multi-mono-ubiquitinated protein appear strikingly similar in Western blot. Our findings demonstrated that co-expression of the Ub K-ALLR mutant resulted in only a modest reduction in IBTK-mediated eIF4A1 ubiquitination (Fig. 2G), and that eIF4A1 was ubiquitinated at twelve lysine residues when co-expressed with IBTK (Fig. S2F). As such, we conclude that the CRL3-IBTK complex primarily catalyzes multi-mono-ubiquitination on eIF4A1.

      (4) In Figure 5, The identity of the seven sites in the IBTK 7ST A mutants should be specified.

      We appreciate the reviewer for bringing up this crucial point. We have specified the seven mutation sites in the IBTK-7ST A mutant (Fig. 6A).

      (5) In Figure 5, the rationale for generating antibodies only to S990/992/993, as opposed to the other mTORC1/S6K motifs should be specified.

      We appreciate the reviewer for bringing up this crucial point. Having demonstrated that IBTK can be phosphorylated (with evidence from positive Phos-tag and in vitro phosphorylation assays), we sought to directly detect changes in the phosphorylation levels using an antibody specific to IBTK phosphorylation. However, the expense of generating a separate phosphorylation-specific antibody for each of the seven sites is significant. Recognizing that S990/992/993 are three adjacent sites, we deemed it appropriate to generate a single antibody to recognize the phospho-S990/992/993 epitope. Moreover, of the seven phosphorylation sites, S992 perfectly matches the consensus motif for S6K1 phosphorylation (RXRXXS). Using this antibody, we observed a substantial decrease in the phosphorylation levels of these three adjacent Ser residues in IBTK following either AA deprivation or Rapamycin treatment (Fig. 5L). We have specified these points in the manuscript.
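      As an illustration of what matching the RXRXXS consensus means in practice, the following minimal sketch scans a protein sequence for that pattern, reading X as any residue. It is a hedged example rather than the method used in the study: the function name `find_motif_serines` and the toy peptide are hypothetical, and the peptide is not the IBTK sequence.

```python
import re

# R-x-R-x-x-S consensus quoted in the text; "x" is assumed to mean any residue.
# The lookahead lets overlapping matches be reported.
S6K1_MOTIF = re.compile(r"(?=(R.R..S))")


def find_motif_serines(sequence):
    """Return 1-based positions of the serine that closes each R-x-R-x-x-S match."""
    # m.start() is the 0-based index of the first R; the serine sits 5 residues later.
    return [m.start() + 6 for m in S6K1_MOTIF.finditer(sequence.upper())]


# Made-up toy peptide for demonstration only.
toy_peptide = "MKRARGGSAAARSRTLSPP"
serine_positions = find_motif_serines(toy_peptide)
print(serine_positions)
```

      A real analysis would run the same scan on the annotated protein sequence and compare the reported positions with the experimentally mapped sites.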

      Reviewer #2 (Recommendations For The Authors):

      The following suggestions would strengthen the study:

      (1) Directly examine the effects of IBTK modulation (knockdown/knockout/overexpression) on eIF4A1 helicase activity.

      We appreciate the reviewer for bringing up this crucial point. We agree with the reviewer's suggestion that directly evaluating IBTK's influence on eIF4A1 helicase activity would strengthen our conclusion. However, the current eIF4A1 helicase assays, as described in previous publications [Ref: 1, 2], can only be conducted using in vitro purified recombinant proteins. For instance, it is feasible to assess the varying levels of helicase activity exhibited by recombinant wild-type or mutant eIF4A1 proteins [Ref: 2]. Importantly, there is currently no reported methodology for evaluating the helicase activity of eIF4A1 in vivo, that is, in the gene knockdown, knockout, or overexpression cellular contexts mentioned by the reviewer. Therefore, we have not performed these assays, and we sincerely appreciate the reviewer's understanding in this regard.

      Reference:

      [1] Chu J, Galicia-Vázquez G, Cencic R, Mills JR, Katigbak A, Porco JA, Pelletier J. CRISPR-mediated drug-target validation reveals selective pharmacological inhibition of the RNA helicase, eIF4A. Cell reports. 2016 Jun 14;15(11):2340-7.

      [2] Chu J, Galicia-Vázquez G, Cencic R, Mills JR, Katigbak A, Porco JA, Pelletier J. CRISPR-mediated drug-target validation reveals selective pharmacological inhibition of the RNA helicase, eIF4A. Cell reports. 2016 Jun 14;15(11):2340-7.

      (2) Justify why the expression of some but not all eIF4A1 target oncogenes is affected in IBTK-depleted/overexpressing cells. This is important if IBTK should be considered as a therapeutic target. The authors should consider which of the eIF4A1 targets are most impacted by IBTK KO. This would provide a more focused therapeutic approach in the future.

      We appreciate the reviewer for bringing up this crucial point. As the reviewer has pointed out, we assessed the protein levels of ten reported eIF4A1 target genes across three cancer cell lines (Fig. 4, Fig. S4A, C). We observed that IBTK depletion led to a substantial reduction in the protein levels of most eIF4A1-regulated oncogenes, although there were some exceptions. For instance, IBTK KO in H1299 cells exerted minimal influence on the protein levels of ROCK1 (Fig. S4A). Several possible explanations might account for this observation. First, given that our list of eIF4A1 target genes was collected from previous studies conducted using distinct cell lines, it is not unexpected for different lines to exhibit subtle differences in the regulation of eIF4A1 target genes. Second, as a CRL3 adaptor, IBTK potentially performs other biological functions via ubiquitination of specific substrates; dysregulation of these could buffer the impact of IBTK KO on the protein expression of some eIF4A1 target genes. We added these comments to the Discussion section of the revised manuscript.

      (3) Expand mTOR manipulation experiments (inhibition, Raptor knockout, activation) and evaluate impacts on IBTK phosphorylation, eIF4A1 ubiquitination, and translation.

      The mTORC1 signaling pathway is constitutively active under normal culture conditions. In order to inhibit mTORC1 activation, we employed several approaches including AA starvation, Rapamycin treatment, or Raptor knockout. Our results have demonstrated that both AA starvation and rapamycin treatment led to a reduction in eIF4A1 ubiquitination (Fig. 5M). Moreover, we have included new findings in the revised manuscript, which highlight that Raptor knockout specifically decreases eIF4A1 ubiquitination (Fig. 5N). It is worth mentioning that the impacts of mTOR inhibition or activation on protein translation have been extensively investigated and documented in numerous studies. Therefore, in our study, we did not feel it necessary to examine these treatments further.

      (4) Although not absolutely necessary, it would be nice to see if some of these findings are true in other cancer cell types.

      We thank the reviewer for raising this point. We concur with the reviewer's suggestion that including data from other cancer cell types would strengthen our conclusion. While the majority of our data is derived from two cervical cancer cell lines, we have corroborated certain key findings (such as the impact of IBTK on eIF4A1 and its target gene expression) in H1299 cells (human lung cancer) (Fig. 2C, Fig. S4A, B) and in CT26 cells (murine colon adenocarcinoma) (Fig. S4C, D). Additionally, we demonstrated that IBTK promotes IFN-γ-induced PD-L1 expression and tumor immune escape in both H1299 and CT26 cells (Fig. S6A-K).

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewer comments have been helpful, and we have revised the manuscript to address the concerns of reviewer 2. In addition to text changes, we also added a negative control to Figure 1 to address concerns about photobleaching or DNA unwrapping.

      Reviewer #1:

      This manuscript presents an extremely exciting and very timely analysis of the role that the nucleosome acidic patch plays in SWR1-catalyzed histone exchange. Intriguingly, SWR1 loses activity almost completely if any of the acidic patches are absent. To my knowledge, this makes SWR1 the first remodeler with such a unique and pronounced requirement for the acidic patch. The authors demonstrate that SWR1 affinity is dramatically reduced if at least one of the acidic patches is absent, pointing to a key role of the acidic patch in SWR1 binding to the nucleosome. The authors also pinpoint a specific subunit - Swc5 - that can bind nucleosomes, engage the acidic patch, and obtain a cryo-EM structure of Swc5 bound to a nucleosome. They also identify a conserved arginine-rich motif in this subunit that is critical for nucleosome binding and histone exchange in vitro and for SWR1 function in vivo. The authors provide evidence that suggests a direct interaction between this motif and the acidic patch.

      Strengths:

      The manuscript is well-written and the experimental data are of outstanding quality and importance for the field. This manuscript significantly expands our understanding of the fundamentally important and complex process of H2A.Z deposition by SWR1 and would be of great interest to a broad readership.

      We thank the reviewer for their enthusiastic and positive comments on our work.

      Reviewer #2:

      Summary:

      In this study, Baier et al. investigated the mechanism by which SWR1C recognizes nucleosomal substrates for the deposition of H2A.Z. Their data convincingly demonstrate that the nucleosome's acidic patch plays a crucial role in the substrate recognition by SWR1C. The authors presented clear evidence showing that Swc5 is a pivotal subunit involved in the interaction between SWR1C and the acidic patch. They pared down the specific region within Swc5 responsible for this interaction. However, two central assertions of the paper are less convincing. First, the data supporting the claim that the insertion of one Z-B dimer into the canonical nucleosome can stimulate SWR1C to insert the second Z-B dimer is somewhat questionable (see below). Given that this claim contradicts previous observations made by other groups, this hypothesis needs further testing to eliminate potential artifacts. Secondly, the claim that SWR1C simultaneously recognizes the acidic patch on both sides of the nucleosome also needs further investigation, as the assay used to establish this claim lacks the sensitivity necessary to distinguish any difference between nucleosomal substrates containing one or two intact acidic patches.

      Strengths:

      As mentioned in the summary, the authors presented clear evidence demonstrating the role of Swc5 in recognition of the nucleosome acidic patch. The identification of the specific region in Swc5 responsible for this interaction is important.

      We thank the reviewer for their careful critique of our work. Below we address each major concern.

      Major comments: (1) Figure 1B: It is unclear how much of the decrease in FRET is caused by the bleaching of fluorophores. The authors should include a negative control in which Z-B dimers are omitted from the reaction. In the absence of ZB dimers, SWR1C will not exchange histones. Therefore, any decrease in FRET should represent the bleaching of fluorophores on the nucleosomal substrate, allowing normalization of the FRET signal related to A-B eviction.

      In this manuscript, as well as in our two previous publications (Singh et al., 2019; Fan et al., 2022), we have presented the results of no-enzyme controls, +/- ZB dimers, no-ATP controls, or AMP-PNP controls for our FRET-based H2A.Z deposition assay (see also Figure S3). We do not observe significant levels of photobleaching in this assay, either during ensemble measurements or in an smFRET experiment. To aid the reader, we have added the AMP-PNP data for the experiment shown in Figure 1B. The results show that there is less than a 10% decrease in FRET over 30 min, and the signal from the double acidic patch disrupted nucleosome is identical to this negative control.

      (2) Figure S3: The authors use the decrease in FRET signal as a metric of histone eviction. However, Figure S3 suggests that the FRET signal decrease could be due to DNA unwrapping. Histone exchange should not occur when SWR1C is incubated with AMP-PNP, as histone exchange requires ATP hydrolysis (10.7554/eLife.77352). And since the insertion of Z-B dimer and the eviction of A-B dimer are coupled, the decrease of FRET in the presence of AMP-PNP is unlikely due to histone eviction or exchange. Instead, the FRET decrease is likely due to DNA unwrapping (10.7554/eLife.77352). The authors should explicitly state what the loss of FRET means.

      We agree with the reviewer that loss of FRET can be due to DNA unwrapping from the nucleosome. We have previously demonstrated this activity by SWR1C in our smFRET study (Fan et al., 2022). However, DNA unwrapping is highly reversible and lasts only 1-3 seconds. Neither we nor others have observed stable unwrapping of nucleosomes by SWR1C; rather, the stable loss of FRET reports on dimer eviction. We assume the reviewer is concerned about the rather large decrease in FRET signal shown in the AMP-PNP controls for Figure S3, panels A and D. For the other 7 panels, the decrease in FRET with AMP-PNP is minimal. In fact, if we average all of the AMP-PNP data points, the rate of FRET loss is not statistically different from that of the no-enzyme control reactions (nucleosome plus ZB dimers).

      Data for panels A and D used a 77N0 nucleosomal substrate, with Cy3 labeling the linker distal dimer. This is our standard DNA fragment, and it was used in Figure 1B. The only difference between data sets is that the data shown in Fig 1B used nucleosomes reconstituted with a Cy5-labelled histone octamer, rather than the hexasome assembly method used for Fig S3. Three points are important. First, for all of these substrates, we assembled 3 independent nucleosomes, and the results are highly reproducible. Second, we performed a total of 6 experiments for the 77N0-Cy5 substrates to ensure that the rates were accurate (+/-ATP). Third, and most important, we do not see this decrease in FRET signal in the absence of SWR1C (no-enzyme control). These data were included in the data source file. Thus, it appears that there is significant SWR1C-induced nucleosome instability for these two hexasome-assembled substrates. We now note this in the legend to Figure S3. Key for this work, however, is that there is a large increase in the rate of FRET loss in the presence of ATP, and this rate is faster when a ZB dimer is present at the linker proximal location. In response to the last point, we state in the first paragraph of the results: “The dimer exchange activity of SWR1C is monitored by following the decrease in the 670 nm FRET signal due to eviction of the Cy5-labeled AB-Cy5 dimer (Figure 1A).”

      (3) Related to point 2. One way to distinguish nucleosomal DNA unwrapping from histone dimer eviction is that unwrapping is reversible, whereas A-B eviction is not. Therefore, if the authors remove AMP-PNP from the reaction chamber and a FRET signal reappears, then the initial loss of FRET was due to reversible DNA unwrapping. However, if the removal of AMP-PNP did not regain FRET, it means that the loss of FRET was likely due to A-B eviction. The authors should perform an AMP-PNP and/or ATP removal experiment to make sure the interpretation of the data is correct.

      See our response to item 2 above.

      (4) The nature of the error bars in Figure 1C is undefined; therefore, the statistical significance of the data is not interpretable.

      We apologize for not making this more explicit for each figure. The error bars report on 95% confidence intervals from at least 3 sets of experiments. This statement has been added to the legend.
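      As an illustration of what such error bars represent, the following minimal sketch computes a 95% confidence interval for the mean of a small number of replicate measurements. It assumes a two-sided Student's t interval on the mean; the response does not state which procedure was actually used, and the function name `ci95` and the example rate values are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
# Only small replicate counts (2 to 6 experiments) are covered in this sketch.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def ci95(values):
    """Return (mean, half_width) of a 95% t-based confidence interval."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch supports 2 to 6 replicates only")
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, T_CRIT_95[df] * sem


# Hypothetical exchange rates from three independent experiments.
rates = [0.10, 0.12, 0.11]
mean, half_width = ci95(rates)
print(f"mean = {mean:.3f}, 95% CI = +/- {half_width:.3f}")
```

      With only three replicates the t critical value is large (4.303), so intervals of this kind are considerably wider than the standard error alone would suggest.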

      (5) The authors claim that the SWR1C requires intact acidic patches on both sides of the nucleosomes to exchange histone. This claim was based on the experiment in Figure 1C where they showed mutation of one of two acidic patches in the nucleosomal substrate is sufficient to inhibit SWR1C-mediated histone exchange activity. However, one could argue that the sensitivity of this assay is too low to distinguish any difference between nucleosomes with one (i.e., AB/AB-apm) versus two mutated acidic patches (i.e., AB-apm/AB-apm). The lack of sensitivity of the eviction assay can be seen when Figure 1B is taken into consideration. In the gel-shift assay, the AB-apm/AB-apm nucleosome exhibited a 10% SWR1C-mediated histone exchange activity compared to WT. However, in the eviction assay, the single AB/AB-apm mutant has no detectable activity. Therefore, to test their hypothesis, the authors should use the more sensitive in-gel histone exchange assay to see if the single AB/AB-apm mutant is more or equally active compared to the double AB-apm/AB-apm mutant.

      Our pincher model is based on three independent sets of data, not just Figure 1C. First, as noted by the reviewer, we find that disruption of either acidic patch cripples the dimer exchange activity of SWR1C in the FRET-based assay. Whether the defect is identical to that of the double APM mutant nucleosome does not seem pertinent to the model. In a second set of assays, we used fluorescence polarization to quantify the binding affinity of SWR1C for wildtype nucleosomes, a double APM nucleosome, or each single APM nucleosome. Consistent with the pincher model, each single APM disruption decreases binding affinity at least 10-fold (below the sensitivity of the assay). Finally, we monitored the ability of different nucleosomes to stimulate the ATPase activity of SWR1C. Consistent with the pincher model, a single APM disruption was sufficient to eliminate nucleosome stimulation.

      (6) The authors claim that the AZ nucleosome is a better substrate than the AA nucleosome. This is a surprising result as previous studies showed that the two insertion steps of the two Z-B dimers are not cooperative (10.7554/eLife.77352 and 10.1016/J.CELREP.2019.12.006). The authors' claim was based on the eviction assay shown in Fig 1C. However, I am not sure how much variation in the eviction assay is contributed by different preparations of nucleosomes. The authors should use the in-gel assay to independently test this hypothesis.

      For all data shown in our manuscript, at least three different nucleosome preparations were used. The impact of a ZB dimer on the rates of dimer exchange was highly reproducible among different nucleosome preparations and experiments. We also see reproducible ZB stimulation for three different substrates: with ZB on the linker proximal side, on the linker distal side, and on one side of a core particle. We do not believe that our data are inconsistent with previous studies. First, the previous work referenced by the reviewer performed dimer exchange reactions with a large excess of nucleosomes over SWR1C (catalytic conditions), whereas we used single-turnover reactions. Second, our study is the first to use a homogeneous ZA heterotypic nucleosome as a substrate for SWR1C. All previous studies used a standard AA nucleosome, following the first and second rounds of dimer exchange that occur sequentially. Finally, we observe only a 20-30% increase in rate with a ZB dimer (e.g. 77N0 substrates), and such an increase was unlikely to have been detected by previous gel-based assays.

      Minor comments:

      (1) Abstract line 4: To say 'Numerous' studies have shown acidic patch impact chromatin remodeling enzymes activity may be too strong.

      Removed

      (2) Page 15, line 15: The authors claim that swc5∆ was inviable on formamide media. However, the data in Figure 8 shows cell growth in column 1 of swc5∆.

      The term ‘inviable’ has been replaced with ‘poor’ or ‘slow growth’

      (3) The authors should use standard yeast nomenclature when describing yeast genes and proteins. For example, for Figure 8 and legend, Swc5∆ was used to describe the yeast strain BY4741; MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YBR231c::kanMX4. Instead, the authors should describe the swc5∆ mutant strain as BY4741 MAT a his3∆1 leu2∆0 met15∆0 ura3∆0 swc5∆::kanMX4. Exogenous plasmid should also be indicated in italics and inside brackets, such as [SWC5-URA3] or [swc5(R219A)-URA3].

      We apologize for missing this mistake in the Figure 8 legend. We had inadvertently copied this from the EUROSCARF entry and forgot to edit it. We decided not to add all the plasmid names to the figure, as it became too cluttered. We state in the figure legend that the panels show growth of swc5 deletion strains harboring the indicated swc5 alleles on CEN/ARS plasmids.

      (4) According to Lin et al. 2017 NAR (doi: 10.1093/nar/gkx414), there is only one Swc5 subunit per SWR1C. Therefore, the pincher model proposed by the authors would suggest that there is a missing subunit that recognizes the second acidic patch. The authors should point out this fact in the discussion. However, as mentioned in Major comment 6, I am not sure if the pincer model is substantiated.

      In our discussion, we had noted that the published cryoEM structure suggested that the Swc2 subunit likely interacts with the acidic patch on the dimer that is not targeted for replacement, and we proposed that Swc5 interacts with the acidic patch on the exchanging H2A/H2B dimer. We have now made this clearer in the text.

    1. Author response:

      We thank the reviewers for the feedback on our manuscript; we are planning to address the raised concerns in the following manner:

      We will be more explicit about the novelty of this method, framing it more concretely within the scope of current research. From some of the reviewers' comments, we understand that it is not clear that our method is an extension of an existing method and model that has been extensively validated, with pre-trained models made available online. Consequently, the details of the model as well as the training cohort are only covered briefly, with reference to relevant published works on this topic. We will improve the clarity in this respect in the full responses. Nevertheless, we agree that the work would benefit from a simulation study that formally evaluates the performance of our method compared with more traditional approaches, and we will add it in our full responses. We will specifically investigate the effect of assumptions such as centile stability in healthy controls, as suggested by Reviewer 2.

      The novelty of this work lies in introducing a mathematically transparent method that uses normative models trained on cross-sectional data to evaluate studies with a longitudinal design. We emphasise strongly that this is otherwise not possible using current methods. Furthermore, by building on a pre-trained model, this method enjoys the benefits of big (cross-sectional) data (because the pre-trained model is fitted on an extensive population sample) without the need to have direct access to them, or to a ‘big’ longitudinal dataset from the cohort at hand. This is crucial in neuroimaging, where longitudinal data are much scarcer than cross-sectional data.

      We strongly disagree with the notion raised by Reviewer 1 that after the first episode cortical thickness alterations are expected to become more severe. There is now increasing evidence that: (i) trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode and (ii) that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode. Indeed, we can provide evidence for this in an independent cohort, with different analytical methodologies, where precisely this occurs (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v1, https://pubmed.ncbi.nlm.nih.gov/36805840/). In the full revision, we would be happy to provide further discussion of evidence in support of this.

      We would also like to re-emphasise that the data were processed with the utmost rigour using state-of-the-art processing pipelines, including quality control.

      We will take care to improve the flow of the manuscript, with special attention to the theoretical part and the sections highlighted by Reviewer 2.

      We agree with the challenge outlined by Reviewer 2 regarding the limitations in interpreting overall trends when the position at visit one differs between subjects. However, this is a much broader challenge and is not specific to this study. The non-random sampling of large cohort studies is problematic for nearly all studies using such cohorts, regardless of the statistical approach used. We will explicitly acknowledge these limitations in the full response.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This solid study investigates the transdifferentiation of chicken embryonic fibroblasts into muscle and fat cells in 3D to create whole-cut meat mimics. The study is important and provides a method to control muscle, fat, and collagen content within the 3D meat mimics and thus provides a new avenue for customized cultured meat production. Limitations of this study include the use of transgene for transdifferentiation and thus the creation of GMO food.

      We are grateful for the substantial effort that the editors and reviewers put into assessing our manuscript and providing insightful feedback. We have tried to address, as far as possible, all comments and criticisms. We believe that the manuscript is now significantly improved. A point-by-point response follows.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors presented here a novel 3D fibroblast culture and transdifferentiation approach for potential meat production with GelMA hydrogel.

      Strengths:

      (1) Reduced serum concentration for 3D chicken fibroblast culture and transdifferentiation is optimized.

      (2) Efficient myogenic transdifferentiation and lipogenesis as well as controlled fat deposition are achieved in the 3D GelMA.

      Weaknesses:

      (1) While the authors stated the rationale of using fibroblasts instead of myogenic/adipogenic stem cells for meat production, the authors did not comment on the drawbacks/disadvantages of genetic engineering (e.g., forced expression of MyoD) in meat production.

      We thank the reviewer for raising this important issue. We have now described this drawback in the Discussion.

      As a proof-of-concept study, we sought to explore the potential of using genome-integrated transgene tools to overexpress a transdifferentiation factor and achieve maximum muscle production. However, it is important to acknowledge that meat products derived from genetically engineered cultured cells are unlikely to gain consumer acceptance or market viability. We are currently testing non-integrating delivery methods, such as modRNAs and chemical cocktails, to induce myogenic transdifferentiation in fibroblasts. We believe these non-integrating methods would be compatible with meat production and consumer acceptance.

      Please see lines 439-445.

      “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products.”

      (2) While the authors cited one paper to state the properties and applications of GelMA hydrogel in tissue engineering and food processing, concerns/examples of the food safety with GelMA hydrogel are not discussed thoroughly.

      Thank you for pointing out this issue. We now discuss the drawbacks of using GelMA hydrogel in meat production in the main text.

      GelMA-based hydrogels have shown great potential owing to their biocompatibility and mechanical tunability. GelMA is widely used in 3D cell culture and tissue engineering for regenerative medicine, but is less common in food processing and agricultural applications. Its photo-crosslinking properties, biocompatibility, and degradability allow the material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for cultured meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Future research will carefully consider GelMA hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety-compliant cultured meat production (Bomkamp et al., 2022).

      Bomkamp, C., Skaalure, S. C., Fernando, G. F., Ben‐Arye, T., Swartz, E. W., & Specht, E. A. (2022). Scaffolding biomaterials for 3D cultivated meat: prospects and challenges. Advanced Science (Weinh), 9(3), 2102908.

      Jeong, D., Seo, J. W., Lee, H. G., Jung, W. K., Park, Y. H., & Bae, H. (2022). Efficient Myogenic/Adipogenic Transdifferentiation of Bovine Fibroblasts in a 3D Bioprinting System for Steak-Type Cultured Meat Production. Advanced Science (Weinh), 9(31), e2202877.

      Li, Y., Liu, W., Li, S., Zhang, M., Yang, F., & Wang, S. (2021). Porcine skeletal muscle tissue fabrication for cultured meat production using three-dimensional bioprinting technology. Journal of Future Foods, 1(1), 88-97.

      Park, S., Hong, Y., Park, S., Kim, W., Gwon, Y., Jang, K.-J., & Kim, J. (2023). Designing Highly Aligned Cultured Meat with Nanopatterns-Assisted Bio-Printed Fat Scaffolds. Journal of Biosystems Engineering, 48(4), 503-511.

      We discussed the drawbacks of GelMA hydrogel. Please see lines 445-457.

      “Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”

      (3) In Fig. 4C, there seems no significant difference in the Vimentin expression between Fibroblast_MyoD and Myofibroblast. The conclusion of "greatly reduced in the myogenic transdifferentiated cells" is overstated.

      Thanks for pointing out this mistake.

      We revised the wording accordingly. Vimentin expression was reduced in Fibroblast_MyoD compared with the original fibroblasts.

      Please see lines 231-233.

      “The fibroblast intermediate filament Vimentin (Tarbit et al., 2019) was abundantly expressed in the fibroblasts but reduced in the myogenic transdifferentiated cells (Figure 4C)”

      (4) The presented cell culture platform is only applied to chicken fibroblasts and should be tested in other species such as pigs and fish.

      Thank you for the suggestion.

      In this pilot cultured meat study, we utilized chicken embryonic fibroblasts. These cells were chosen for their near-immortal nature and robustness in culture, as well as their inducible myogenic capacity. In our previous experiments (Ren et al, Cell Reports, 2022, 40:111206), we tested the myogenic transdifferentiation potential of fibroblasts from mice, pigs, and chickens, and observed varying efficiencies of myogenesis. It is important to note that fibroblasts derived from different species, or even from different tissues within the same species, may exhibit significant variations in their capacities for myogenic and adipogenic transdifferentiation.

      In this proof-of-concept study, we used only one source of fibroblasts to test cultured meat production and confirmed that myogenic/adipogenic transdifferentiation can be manipulated as a feasible means to precisely control muscle, fat, and collagen content. We would expect fibroblasts of different origins to display different transdifferentiation efficiencies and thus produce various muscle/fat ratios in meat mimics. That is beyond the scope of the current study.

      Furthermore, we are also testing myogenic/adipogenic transdifferentiation of fibroblasts from pigs through non-genomic integration approaches. We believe that only non-transgenic tools are viable solutions for cultured meat production in the future. We added the species information to the Discussion.

      See lines 515-517.

      “This approach can be readily extrapolated to other species such as pigs and presents promising avenues for the large-scale production of customized and versatile meat products that may cater to varying consumer preferences.”

      Reviewer #2 (Public Review):

      The manuscript by Ma et al. tries to develop a protocol for cell-based meat production using chicken fibroblasts as three-dimensional (3D) muscle tissues with fat accumulation. The authors used genetically modified fibroblasts which can be forced to differentiate into muscle cells and formulated 3D tissues with these cells and a biphasic material (hydrogel). The degrees of muscle differentiation and lipid deposition in culture were determined by immunohistochemical, biochemical, and molecular biological evaluations. Notably, the protocol successfully achieved the process of myogenic and lipogenic stimulation in the 3D tissues.

      Overall, the study is reasonably designed and performed including adequate analysis. The manuscript is clearly written with well-supported figures. While it presents valuable results in the field of cultivated meat science and skeletal muscle biology, some critical concerns were identified. First, it is unclear whether some technical approaches were really the best choice for cell-based meat production. Next, more careful evaluations and justifications would be required to properly explain biological events in the results. These points include additional evaluations and considerations with regard to myocyte alignment and lipid accumulation in the differentiated 3D tissues. The present data are very suggestive in general, but further clarifications and arguments would properly support the findings and conclusions.

      We thank the reviewer for these comments. We have performed additional experiments and analyses to address the critical questions. We also revised the text extensively to clarify or discuss some of the concerns, such as cell alignment and the cellular distribution of intramuscular fat. We expect that the revised data and text adequately support the conclusions of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1, the authors used 1% chicken serum. Have the authors tested other lower concentrations? It will be interesting to see the lowest chicken serum concentrations in fibroblast culture and transdifferentiation;

      Thank you for your suggestion.

      Yes, we have tested lower serum concentrations, such as 1% FBS and 0.5% chicken serum. However, the cells are not in a healthy state at these low serum levels, as shown by the abnormal cell morphology and near absence of cell growth. Please see the revised Supplementary Figure S1D, to which we added the 1% FBS and 0.5% chicken serum data. Hence, 1% chicken serum is optimal in our hands. We will also test other types of specialized serum-free medium in future experiments.

      (2) In Figure 2, the authors should quantify the fold expansion of fibroblasts cultured in 3D gel after 1, 3, 5, and 9 days since this data is important for future meat manufacturing. In addition, long-term expansion (e.g., 1 month) in 3D gel should also be shown;

      Thanks for the question. We have quantified cell growth in 3D by measuring PKH26-stained cells. After the cells were implanted into the gel, they propagated exponentially from day 1 to day 9. The cell proliferation data provide a good reference for future meat manufacturing (Figure 2D). We have tried long-term expansion in 3D but failed to measure cell proliferation, because the 3D gel always collapsed after 12-15 days in cell culture for unknown reasons: either the cells grew too crowded and compromised the gel structure, or the gel matrix itself is not strong enough to last long-term. We believe the cells will grow well long-term if we provide enough 3D attachment surface, since they grow indefinitely in 2D. We will test different 3D matrices in the future.

      Please see the revised Figure 2D for the quantification of cells.

      (3) In Figure 3, please also show MyoD staining as it'll be interesting to see the expression of exogenous and endogenous MyoD expression after dox treatment. In Figure G, the hydrogel meat seems very small, please show/discuss the maximum size of hydrogel meat that may be achieved using this approach;

      Thank you for requesting this information. We performed immunostaining with anti-MyoD and anti-Flag antibodies to show the expression of all MyoD (exogenous and endogenous) and of exogenous MyoD only after dox treatment. MyoD and 3xFlag were fused in-frame in the transgene plasmid; thus, anti-Flag staining indicates exogenous MyoD expression, whereas anti-MyoD staining indicates the expression of exogenous and endogenous MyoD together.

      As shown in Figure S4, almost 100% of cells were positive for MyoD staining, and 60% of these expressed Flag. These data are consistent with our previous results (Ren et al., 2022, Cell Reports).

      Author response image 1.

      As for the size of the hydrogel-based cultured meat, we discuss the possibility of scalable production of hydrogel-based whole-cut meat mimics. Please see lines 446-449. “Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters.”

      (4) In Figure 5 and Supplementary Figure 6, please quantify the Oil-red O+ fat cells in the 2D and 3D lipogenic induction. Also in Fig. 6B, quantify the oil-red+MHC+ cells;

      Thank you for this advice. We have quantified the Oil Red O-stained images in the Results section “Stimulate the fat deposition in chicken fibroblasts in 3D” using ImageJ, and the quantification of the Oil Red O area has been added to the corresponding graphs (Figure 5C, Figure S6C and S6F).

      However, due to the unique structure of the 3D matrix, many MHC+ and Oil Red O+ double-positive cells overlap with each other across different Z-stack layers in 3D. This overlap makes it challenging to accurately position and quantify the double-positive cells as the different layers interfere with each other.

      (5) In Figure 7, please show immunostaining images of collagen and other major ECMs;

      Thank you for this question. We attempted to stain collagen networks with Picrosirius Red but were unsuccessful. Instead, we used laminin immunostaining to confirm that the ECM content in the 3D matrix increases steadily during cell culture.

      Please see Figure 7C. Lines 346-348.

      “the laminin protein content was accumulated and increased steadily during 3D culturation (Figure 7C)”

      (6) In Figure 8, please show hierarchical clustering analysis of whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI. A Venn Diagram showing the overlap and distinct gene expression among these groups is also appreciated.

      Thank you for the suggestion.

      We added a hierarchical clustering analysis of the whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI, using Euclidean distance with the ward.D clustering method. The groups formed two large clusters: 3D+FI clustered separately, whereas 3D_fibroblasts, 3D_MyoD, and 3D_MyoD+FI were more similar to one another. Please see Figure 8B.

      As the reviewer suggested, we also compared the transcriptomes of 3D_MyoD, 3D+FI, and 3D_MyoD+FI with the original 3D_fibroblasts to identify differentially expressed genes (DEGs), and then analyzed the overlapping and distinct DEGs. As shown in the Venn diagram in Figure 8D, the majority of DEGs from 3D_MyoD+FI (3D_MyoD+FI versus 3D_fibroblasts) overlap with those from 3D_MyoD and 3D+FI, indicating that 3D_MyoD+FI cells are compatible with both myogenic and adipogenic function.

      Please see the revised Figure 8.
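
      The overlap analysis behind a three-way Venn diagram of DEG lists can be sketched with plain set operations, as below. This is only an illustration under stated assumptions: the gene identifiers are hypothetical placeholders, the helper names are invented for this example, and it is not the pipeline used to produce Figure 8D.

```python
from collections import defaultdict


def venn_regions(deg_sets):
    """Map each combination of groups to the genes found in exactly those groups."""
    regions = defaultdict(set)
    all_genes = set().union(*deg_sets.values())
    for gene in all_genes:
        members = frozenset(name for name, genes in deg_sets.items() if gene in genes)
        regions[members].add(gene)
    return dict(regions)


def shared_fraction(deg_sets, group):
    """Fraction of one group's DEGs that also appear in at least one other group."""
    others = set().union(*(genes for name, genes in deg_sets.items() if name != group))
    own = deg_sets[group]
    return len(own & others) / len(own)


# Hypothetical DEG lists, each versus 3D_fibroblasts (placeholder gene IDs, not study data).
deg_sets = {
    "3D_MyoD": {"g1", "g2", "g3", "g4"},
    "3D+FI": {"g3", "g5", "g6"},
    "3D_MyoD+FI": {"g2", "g3", "g5", "g7"},
}

regions = venn_regions(deg_sets)
frac = shared_fraction(deg_sets, "3D_MyoD+FI")
```

      Each key of `regions` corresponds to one area of the Venn diagram, and `shared_fraction` gives the proportion of 3D_MyoD+FI DEGs that are shared with either single-treatment group.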

      Reviewer #2 (Recommendations For The Authors):

      In this study, the authors demonstrated a new approach for cultivated meat production using chicken fibroblasts. Specifically, the cells were cultured as 3D and induced muscle differentiation and lipid deposition. The manuscript contains a good set of data, which would be valuable to researchers in the fields of both cell-based meat and skeletal muscle biology. From the aspect of cultivated meat science, the rationale behind the idea is understandable, but it remains unclear whether the proposed approach was really the best choice to achieve their final goal. On the other hand, when we read this manuscript as a paper in skeletal muscle biology, the overall approach was not innovative enough and several uncertain issues remain. The authors should add more sufficient justifications, arguments, and discussions.

      (1) When considering their goal to produce edible meat products, the current approach has some concerns. First, there are issues with the approach used for the induction of myogenesis by MyoD transgene. This makes the end products GMO foods, which are not easily acceptable to a wide range of consumers. Next, the hydrogel was used for 3D tissue formation, but it is unclear whether this matrix type is edible, safe, and bio-comparable for cell-based meat production. The authors already discussed these points by excusing that the current work remains proof-of-concept. However, more careful considerations and justifications would be required.

      Thank you for the suggestion.

      We acknowledge that the current transgene-based myogenic induction method is not suitable for mass production of cultured meat because of GMO food concerns. We used the MyoD transgene for myogenic transdifferentiation in the first place because of its ease of genetic manipulation and maximal efficiency. We are currently testing tools that do not integrate into the genome, such as chemical cocktails and modified RNAs, for myogenic transdifferentiation.

      When it comes to the applications of hydrogel in the food industry, certain types of hybrid hydrogels, such as those made from pectin or sodium polyacrylate, are not only edible but also safe for consumption. While GelMA hydrogel is typically utilized in tissue engineering and subsequent implantation in patients for therapeutic regenerative medicine purposes, it has not been commonly employed in food processing. In this study, we cultivated cells within GelMA hydrogel due to its durability and ease of use in cell culture. Moving forward, we plan to investigate alternative types of matrices to develop cultured meat suitable for food applications.

      We have now described the GMO and hydrogel drawbacks in the Discussion. Please see lines 439-457.

      “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products. Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”

      (2) From the view of skeletal muscle biology, the approaches (MyoD overexpression, hydrogel-based 3D tissue formation, and lipogenic induction) have already been tested.

      Thank you for the insightful comments from the perspective of skeletal muscle cell biology. We fully agree that the current approaches, including MyoD overexpression, 3D cell culture, and lipogenic induction, are routine experiments in muscle cell biology. However, we want to highlight that using these classical and robust muscle cell approaches, combined with the unique advantages of fibroblast cells (easily accessible, immortalized, cost-effective, ...), provides a novel and practical avenue for cultured meat production. We have stated these points in the Discussion of the revised manuscript.

      Please see lines 511-515.

      “In conclusion, we have effectively utilized immortalized chicken fibroblasts in conjunction with classical myogenic/adipogenic transdifferentiation approaches within 3D hydrogel to establish a cultured meat model. This model allows for the precise regulation of the synthesis of key components found in conventional meat, including muscle, fat, and ECM.”

      (3) The common emphasis in this manuscript is to use the advantages of 3D culture for tissue differentiation. As the authors described, skeletal muscle is a highly aligned tissue. In this study, some results successfully demonstrated advantages in terms of myocyte alignment, maturation, and lipid deposition. However, the current results cannot address whether the entire 3D tissues maintained these advantageous characteristics or not. Because the method for 3D formation does not have any additional modifications to make the cells aligned, like micropatterning, scaffolding, or bioprinting.

      Thank you for the suggestion.

      We agree with the reviewer that skeletal muscle tissues are composed of well-organized, directional bundles of fibers, and that cell alignment would greatly affect meat tenderness and sensory properties. Alignment of the cells in the cultured meat matrix is therefore a desirable attribute. However, achieving it would require sophisticated biomaterial engineering, mainly scaffold manipulation, which is beyond the scope of this study. The hydrogel used in this study formed pores of different sizes in random directions, so we would expect the embedded cells to be entirely non-directional. Nevertheless, we found localized cell alignment in some parts of the gel matrix, which confirms cell-cell interactions; please see Figure 3D. We describe this feature in the Results. In the future, we will test whether applying physical or electrical stimulation to the matrix can improve alignment so that the muscle cells align throughout the whole matrix.

      Please see lines 186-190.

      “The separate XY axis views of the orthogonal projections at different depths (Figure 3D) and a multi-angle video (Supplementary Video 2) also showed the several myotubes were aligned together. Nevertheless, many myotubes were oriented in different directions, preventing the entire matrix from aligning in one direction.”

      (4) In the skeletal muscle, fat accumulation mainly occurs in adipocytes between myocytes. This means that "intra-" muscular fat deposition is identified. However, lipid deposition within myocytes also occurred in this preparation (Supplementary Figure 7C). This situation is not "intra-" muscular accumulation, which sounds different from what is going on in normal skeletal muscle tissues. Please explain what happened and what biological situations accounted for this. Also, the authors should clarify better how lipogenesis was induced in the 3D tissues, such as cell types (transdifferentiated myocytes, remained/un-transdifferentiated fibroblasts, or both).

      Thank you for the very insightful question. We have revised the corresponding text to further explain the intramuscular fat distribution in different cell types in culture meat.

      We fully agree with the reviewer that intramuscular fat accumulation occurs mainly in intramuscular adipocytes. However, under some pathological and physiological conditions in humans and animals, lipid droplets are also abundantly observed inside myofibers (intramyocellular lipids within the myofiber cytoplasm). For instance, high intramyocellular lipid content has been found in insulin-resistant patients and, paradoxically, in endurance-trained athletes (doi.org/10.1016/j.tem.2012.05.009), as well as in some farm animals under intensive selective breeding (doi:10.2174/1876142910901010059). In the current study, using Oil Red O staining of lipid droplets, we identified lipid deposition in both the transdifferentiated myocytes and the remaining un-transdifferentiated fibroblasts in the cultured meat. This lipid distribution pattern is comparable to the intramuscular fat storage pattern observed in some humans and animals, in which fat accumulates both in myofibers (intramyocellular lipids) and in intramuscular adipocytes (extramyocellular lipids), which reside within the muscle tissue bundle but between myofibers. We reason that the current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and the un-transdifferentiated fibroblasts. It is difficult to compare the absolute amount of lipid between these two cell types by Oil Red O staining, and it is almost impossible to separate the two cell types from the 3D meat mimics. Thus, we can only confirm that lipid deposition occurs in both transdifferentiated myocytes and un-transdifferentiated fibroblasts, without knowing which is the dominant contributor to the intramuscular fat content of the cultured meat.

      Please see lines 486-492.

      “In this study, the deposition of fat in the myotubes/myofibers facilitated the storage of significant lipid quantities in transdifferentiated muscle cells, known as intramyocellular lipids. Additionally, we observed Oil Red O staining in the remaining un-transdifferentiated fibroblasts, resembling cells of intramuscular adipocytes (extramyocellular lipids) found within muscle tissue. Hence, current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and un-transdifferentiated fibroblasts.”

    1. Author response:

      Reviewer #1 (Public Review):

      Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.

      Most published studies involve men living with HIV-1 subtype B and do not cover the hyperacute infection phase; a direct head-to-head comparison with the FRESH study is therefore difficult. However, we can cite and contrast our study with a few examples from other acute infection studies, as follows.

      (1) Peluso et al., JCI, 2020, showed that in Caucasian men (SCOPE study) with subtype B infection who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year. In our study, we showed that in chronic treated participants, genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      (2) White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the rate of decay of intact viral genomes in the first phase of decay was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      (3) A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of predominantly HIV-1 CRF01-AE subtype compared HIV reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while there was a sharp decay in these infected cells in all acutely treated individuals during the first 12 weeks of therapy. Rates of decay were not provided, and therefore a direct comparison with our data from the FRESH cohort is not possible.

      (4) A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in an acute treated (within 100 days) and chronic treated (>180 days), predominantly male subtype B cohort. In comparison to the FRESH chronic treated group, they showed that in chronic treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated. In acute treated infection, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated. The differences in the frequency of hypermutation could be explained by differences in the timing of treatment relative to infection, specifically in the acute treated groups, where FRESH participants initiated ART at a median of 1 day after infection. It is also possible that sex- or race-based differences in immunological factors that impact the reservoir play a role.

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is the site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      (5) In 2017, Hiener et al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute treated and 3 chronic treated male study participants with subtype B HIV. Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses of 6% (94% defective) in participants who initiated therapy during acute/early infection and 3% (97% defective) in chronic infection. In contrast, the frequencies in FRESH were 52% intact and 48% defective in acute treated participants, and 13% intact and 87% defective in chronic infection. These differences could be attributed to the timing of treatment initiation: in the aforementioned study, early treatment ranged from 0.6-3.4 months after infection.
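
      Because the decay rates quoted above are reported in different units (percent per year, percent per month, and log copies per month), a small conversion sketch may help when comparing them. It assumes a single constant exponential decay within each interval, which is a simplification not claimed by any of the cited studies; the function names are illustrative, and the FRESH rates apply only to the first 6 months of treatment.

```python
import math


def convert_percent_decay(rate, from_months, to_months):
    """Convert a fractional decline per `from_months` into the equivalent
    decline per `to_months`, assuming constant exponential decay."""
    remaining = (1.0 - rate) ** (to_months / from_months)
    return 1.0 - remaining


def log10_drop_to_percent(log10_drop):
    """Fractional decline corresponding to a drop of `log10_drop` log10 units."""
    return 1.0 - 10.0 ** (-log10_drop)


def half_life_months(rate_per_month):
    """Half-life in months for a constant fractional decline per month."""
    return math.log(0.5) / math.log(1.0 - rate_per_month)


# Rates quoted in the response above.
scope_intact_per_month = convert_percent_decay(0.157, from_months=12, to_months=1)
fresh_chronic_intact_per_month = 0.25
fresh_acute_intact_per_month = log10_drop_to_percent(0.31)
fresh_chronic_half_life = half_life_months(fresh_chronic_intact_per_month)
```

      Under this simplifying assumption, 15.7% per year corresponds to roughly 1.4% per month, and a decline of 0.31 log copies per month corresponds to roughly 51% per month, which places the three figures on one scale.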

      Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size.

      We will edit the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly. We will perform an analysis of area under the curve to compare viral burden in the two study groups.
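
      One common way to compute an area under the viral load curve is the trapezoidal rule applied to log10-transformed viral loads, sketched below. The time points and values are hypothetical placeholders rather than FRESH data, the function name is illustrative, and the sketch assumes all measurements are above zero (values below the assay detection limit would need separate handling).

```python
import math


def viral_load_auc(days, copies_per_ml, log_scale=True):
    """Trapezoidal area under the viral load curve.

    days: sampling times in days, in ascending order.
    copies_per_ml: viral load at each time (must be > 0 when log_scale is True).
    """
    if len(days) != len(copies_per_ml) or len(days) < 2:
        raise ValueError("need at least two matched time points")
    values = [math.log10(v) for v in copies_per_ml] if log_scale else list(copies_per_ml)
    auc = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        auc += width * (values[i] + values[i - 1]) / 2.0
    return auc


# Hypothetical participant (NOT study data): days since first positive test, RNA copies/mL.
example_days = [0, 7, 14, 28]
example_loads = [1e5, 1e4, 1e3, 1e2]
example_auc = viral_load_auc(example_days, example_loads)
```

      The per-participant values (in log10 copies/mL × days) could then be compared between the hyperacute and chronic treated groups with an appropriate two-group test.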

      Reviewer #2 (Public Review):

      Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role; however, participants in this study were of similar age (18-23), and information on comorbidities and coinfections is not available.

      Reviewer #3 (Public Review):

      The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and will amend the use of the word reservoir to only refer to the proviral DNA load after full viral suppression, i.e., during undetectable viral load.

      All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties.

      The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represent an overall analysis of all sequences available for each participant at all time points. This will be explained more clearly in the manuscript and added to the figure legend.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a follow-up study to the authors' previous eLife report about the roles of an alpha-arrestin called protein thioredoxin interacting protein (Txnip) in cone photoreceptors and in the retinal pigment epithelium. The findings are important because they provide new information about the mechanism of glucose and lactate transport to cone photoreceptors and because they may become the basis for therapies for retinal degenerative diseases.

      Strengths:

      Overall, the study is carefully done and, although the analysis is fairly comprehensive with many different versions of the protein analyzed, it is clearly enough described to follow. Figure 4 greatly facilitated my ability to follow, understand and interpret the study. The authors have appropriately addressed a few concerns about statistical significance and the relationship between their findings and previous studies of the possible roles of Txnip on GLUT1 expression and localization on the surfaces of RPE cells.

      We are delighted that Reviewer #1 is satisfied with this revised version.

      Reviewer #2 (Public Review):

      The hard work of the authors is much appreciated. With overexpression of a-arrestin Txnip in RPE, cones and the combined respectively, the authors show a potential gene agnostic treatment that can be applied to retinitis pigmentosa. Furthermore, since Txnip is related to multiple intracellular signaling pathway, this study is of value for research in the mechanism of secondary cone dystrophy as well.

      There are a few areas in which the article may be improved through further analysis and application of the data, as well as some adjustments that should be made in to clarify specific points in the article.

      Strengths

      • The follow-up study builds on innovative ground by exploring the impact of TxnipC247S and its combination with HSP90AB1 knockdown on cone survival, offering novel therapeutic pathways.

      • Testing of different Txnip deletion mutants provides a nuanced understanding of its functional domains, contributing valuable insights into the mechanism of action in RP treatment.

      • The findings regarding GLUT1 clearance and the differential effects of Txnip mutants on cone and RPE cells lay the groundwork for targeted gene therapy in RP.

      Weaknesses

      • The focus on specific mutants and overexpression systems might overlook broader implications of Txnip interactions and its variants in the wider context of retinal degeneration.

      Txnip is not expressed in WT or RP cones, as described in our previous study (Xue et al., 2021, eLife), so we could not perform loss of function assays. We thus chose overexpression, and assayed various alleles, based upon the literature, as we describe in our manuscript.

      • The study's reliance on cell count and GLUT1 expression as primary outcomes misses an opportunity to include functional assessments of vision or retinal health, which would strengthen the clinical relevance.

      In our previous study, we demonstrated that the optomotor response of Txnip-treated RP mice improved (Xue et al., 2021, eLife). Also, as described in our previous Txnip study, as well as an independent study (Xue et al., 2021, eLife; Xue et al., 2023, PNAS), ERG assays of Txnip-treated RP cones were no different than the controls. Other therapies that prolong RP cone survival and the optomotor response in our lab also failed to save the ERG, suggesting that there are other pathways that need to be addressed, e.g. the visual cycle. A combination therapy addressing multiple problems is one of our goals.

      • The paper could benefit from a deeper exploration of why certain treatments (like Best1-146 Txnip.C247S) do not lead to cone rescue and the potential for these approaches to exacerbate disease phenotypes through glucose shortages.

      This system is more complicated than we currently understand, and more work needs to be done.

      • Minor inconsistencies, such as the missing space in text references and the need for clarification on data representation (retinas vs. mice), should be addressed for clarity and accuracy.

      The missing spaces have been added.

      We described the strategy of injecting the same mouse in each eye, one eye with control and one with the experimental vector. However, the following sentence has been added to the Materials and Methods to better assist the reader:

      “In almost all experiments, other than as noted, one eye of the mouse was treated with control (AAV8-RedO-H2BGFP, 2.5 × 10⁸ vg/eye), and the other eye was treated with the experimental vector plus AAV8-RedO-H2BGFP, 2.5 × 10⁸ vg/eye.”

      • The observation of promoter leakage and potential vector tropism issues raise questions about the specificity and efficiency of the gene delivery system, necessitating further discussion and validation.

      The following sentences have been added to the Results. We do not think this phenomenon affects the practice of the experiments or the interpretation of the results in this study.

      “To enable automated cone counting and trace the infection, we co-injected an AAV (AAV8-RedO-H2BGFP-WPRE-bGHpA) encoding an allele of GFP fused to histone 2B (H2BGFP), which localized to the nucleus. As the red opsin promoter was used to express this gene, H2BGFP was seen in cone nuclei, but not in the RPE, if AAV8-RedO-H2BGFP-WPRE-bGHpA was injected alone. However, when an AAV that expressed in the RPE, i.e. AAV8-Best1-Sv40intron-(Gene)-WPRE-bGHpA, was co-injected with AAV8-RedO-H2BGFP-WPRE-bGHpA, H2BGFP was expressed in the RPE, along with expression in cones (Figure 2A). We speculate that this is due to concatenation or recombination of the two genomes, such that the H2BGFP comes under the control of the RPE promoter. This may be due to the high copy number of AAV in the RPE, as it did not happen in the reverse combination, i.e. AAV with an RPE promoter driving GFP and a cone promoter driving another gene, perhaps due to the observation that the AAV genome copy number is »10 fold lower in cones than in the RPE (Wang et al., 2020).”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper provides a straightforward mechanism of how mycobacterial cAMP level is increased under stressful conditions and shows that the increase is important for the survival of the bacterium in animal hosts. The cAMP level is increased by decreasing the expression of an enzyme that degrades cAMP.

      We thank the reviewer for these extremely encouraging comments.

      Strengths:

      The paper shows that under different stresses the response regulator PhoP represses a phosphodiesterase (PDE) that degrades cAMP specifically. Identification of PhoP as a regulator of cAMP is significant progress in understanding Mtb pathogenesis, as increase in cAMP apparently increases bacterial survival upon infection. On the practical side, reduction of cAMP by increasing PDE can be a means to attenuate the growth of the bacilli. The results have wider implications since PhoP is implicated in controlling diverse mycobacterial stress responses and many bacterial pathogens modulate host cell cAMP level. The results here are straightforward, internally consistent, and of both theoretical and applied interests. The results also open considerable future work, especially how increases in cAMP level help to increase survival of the pathogen.

      Weaknesses:

      It is not clear whether PhoP-PDE Rv0805 is the only pathway to regulate cAMP level under stress.

      Reviewer 1 (Recommendations for the authors):

      (1) L.1: "maintenance of" or 'regulating'- I thought change in cAMP level upon stress is the whole point of the paper. Also, can replace "intracellular survival" with 'survival in host macrophages' if you want to be more specific.

      We agree with the reviewer and have therefore replaced “maintenance of” with “regulating cAMP level” in the title. However, we prefer to keep “intracellular survival” rather than the more specific ‘survival in host macrophages’, as we have also included animal experiments demonstrating the in vivo effect in mouse lung and spleen.

      (2) L.26: ---requires the bacterial virulence regulator –

      The suggested change has been made to the text.

      (3) L.30: Replace "phoP locus since the" with 'PhoP since this'. (The product, not the locus, is the regulator). The same comment for l.113.

      We agree with the reviewer. The suggested changes have been made to the text.

      (4) L.31: Change represtsor to repressor.

      We are sorry for the embarrassing spelling mistake. We have rectified the mistake in the revised version.

      (5) L.32: "hydrolytically degrades" or hydrolyses? (lytic and degrade sound like tautology). Same comment for l.117.

      We agree. The suggested change has been made to the text in both places of the revised manuscript.

      (6) L.35: I would also suggest changing "intra-mycobacterial" to 'intra bacterial' because you are talking about one bacterium here. The same change is recommended in l.29.

      Following the reviewer’s recommendation, we have made the changes in the revised manuscript.

      (7) L.37: bacillus unless use of the plural form is the norm in the field.

      We agree. The suggested change has been made to the text.

      (8) L.43: Delete "intracellular" and change "intracellular" to host in l.44.

      The suggested changes have been made to the text.

      (9) L.66: --that a burst--

      We have corrected the mistake in the revised manuscript.

      (10) L.76: Receptor or receptor?

      We have corrected the mistake in the revised manuscript.

      (11) L.86: -- mechanisms of regulation of mycobacterial cAMP level. (homeostasis needs to be introduced first, and not used in the concluding statement for the first time).

      The suggested changes have been made to the text.

      (12) L.96: "essential" or 'a requirement'. (reduction is not the same as elimination)

      We understand the reviewer’s concern. However, several studies have independently established that phoPR remains an essential requirement for mycobacterial virulence.

      (13) L.97: Moreover, a mutant

      The suggested change has been made to the text.

      (14) L.113: --locus since PhoP has been –

      The suggested change has been made to the text.

      (15) L.119: mechanism or manner? (you are stating a fact, not a mechanism)

      We agree. We have now replaced ‘mechanism’ with ‘manner’ in the revised manuscript.

      (16) L.130: --lacking copies of both phoP and phoR (I am assuming you don't have two copies of each gene)

      We understand the reviewer’s concern. For clarity, we now state explicitly that the phoPR-KO mutant lacks the single copies of both the phoP and phoR genes.

      (17) L.156: Indicate why GroEL2? - cells as another cytoplasmic protein, GroEL2 was also undetectable

      In describing the secretion experiments, we now state that the mycobacterial cells did not undergo autolysis. To demonstrate this, we used cytoplasmic GroEL2 as a marker protein: the absence of detectable GroEL2 in the culture filtrates (CFs) indicates the absence of autolysis. We have modified the sentence in the revised manuscript accordingly (duplicated below):

      “Fig. 1C confirms absence of autolysis of mycobacterial cells as GroEL2, a cytoplasmic protein, was undetectable in the culture filtrates (CF).”

      (18) L.266: May delete "Together". Start with These data--, which would draw more attention to integrated view. In l.268-270, a reminder that intracellular pH is acidic in the normal course would enhance the physiological significance of the present results.

      We agree. We have made the suggested changes to the text. In view of the second comment of the reviewer, we have modified the text (duplicated below):

      “These data represent an integrated view of our results suggesting that PhoP-dependant repression of rv0805 regulates intra-mycobacterial cAMP level. In keeping with these results, activated PhoP under acidic pH conditions significantly represses rv0805, and intracellular mycobacteria most likely utilizes a higher level of cAMP to effectively mitigate stress for survival under hostile environment including acidic pH of the phagosome.”

      (19) L.272: Delete "and intracellular survival" (?) (I am assuming the survival is due to stress tolerance; also the section talks about stress only). No period in l.273.

      Following reviewer’s recommendations, the suggested changes have been made to the text.

      (20) L.295: Start the sentence thus: It appears that at least one of ---. (This would put more emphasis on the inference)

      We agree. We have now incorporated the recommended changes in the revised version.

      (21) L.301: No parenthesis.

      The parenthesis has been removed in the revised manuscript.

      (22) L.306: Together already implies these. Either delete Together (which I would prefer) or say 'Together, the results suggest that strains expressing wild type and mutant----properties, and the results are

      We agree. We have now deleted ‘Together’ in the revised manuscript.

      (23) L.311: These results support our view that higher---- (to avoid repetition of l.266)

      We agree. We have now incorporated the suggested change in the revised manuscript.

      (24) L.316: Using or with?

      We think “with” goes well with the statement.

      (25) L.329: Rephrase thus: Effect of intra-bacterial cAMP level on in vivo--

      The recommended change has been made to the text.

      (26) L.333: I would use ~, if you want to indicate about.

      We agree. We have now used ‘~’ in the revised version. Changes were incorporated in lines 328, 330 and 333 of the revised manuscript.

      (27) L.350: Change "somewhat functionally" to phenotypically?

      We thank the reviewer for this suggestion. We have changed “somewhat functionally” to “phenotypically” in the revised manuscript.

      (28) L.361: Change "is connected to" to 'regulates'.

      The suggested change has been made to the text.

      (29) L.365: ACs (to be parallel with PDEs)

      We agree. The suggested change has been made to the text.

      (30) L.366: delete "very" (let the readers decide how recent from the reference date).

      The suggested change has been made to the text.

      (31) L.382: level remained unknown before the present study.

      The recommended change has been made to the text.

      (32) L.399: add at the end of the sentence 'under stress'. Also, represent, not represents.

      The recommended changes have been made to the text.

      (33) L.560 and 571: Section headings formatted differently from the rest. Similar problem in l.900.

      We have rectified the issue and all of the section headings are now formatted in the same style.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript, the authors have presented new mechanistic details to show how intracellular cAMP levels are maintained linked to the phosphodiesterase enzyme which in turn is controlled by PhoP. Later, they showed the physiological relevance linked to altered cAMP concentrations.

      Strengths:

      Well thought out experiments. The authors carefully planned the experiments well to uncover the molecular aspects of it diligently.

      We thank the reviewer for these extremely encouraging comments.

      Weaknesses:

      Some fresh queries were made based on the author's previous responses and hope to get satisfactory answers this time.

      We provide below a point-by-point response to the fresh queries.

      (2) Line 134: please describe the complementation strain features as it is mentioned for the first time (plasmid, copy number, promoter etc.) in the manuscript. Especially under NO stress what could be the authors' justification regarding the high cAMP concentration in the complementation strain?

      As recommended by the reviewer, the details of construction of the complemented strain have been incorporated in the 'Materials and Methods' section of the revised manuscript (duplicated below): "To complement phoPR expression, pSM607 containing a 3.6-kb DNA fragment of M. tuberculosis phoPR including 200-bp phoP promoter region, a hygromycin resistance cassette, attP site and the gene encoding phage L5 integrase, as detailed earlier (Walters et al., 2006) was used to transform phoPR mutant to integrate at the L5 attB site."

      To address the reviewer's other concern, we have now included the following sentence in the 'Results' section of the revised manuscript (duplicated below): "A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress condition (Khan et al., 2022)."

      Reference: Khan et al. (2022) Convergence of two global regulators to coordinate expression of essential virulence determinants of Mycobacterium tuberculosis. eLife 2022, 11:e80965.

      New query: The complemented gene (in pSM607 plasmid) becomes a single copy after chromosomal integration, so it should ideally behave like a WT strain. How could authors still justify the high cAMP concentration under NO stress?

      We agree with the reviewer. We are unable to provide a cogent justification regarding this result. We speculate that PhoP is strikingly activated under NO stress by a non-canonical mechanism and strongly represses rv0805 expression. As a result, there is a significantly higher cAMP concentration in case of the complemented mutant under NO stress.

      (13) Line 292: There is a difference between red and green bars. Authors should do statistical analysis and then comment on whether overexpression of WT and mutant pde are different or similar, to me they are different; also, explain why the WT-Rv0805 strain is different than the phoPR-KO strain in the context of cell wall metabolism.

      As recommended by the reviewer, we have now included statistical significance of the data in the revised version, and modified the text accordingly in the manuscript.

      New query: Authors are asked to put a statistical significance test between WT-Rv0805 and WT-Rv0805M.

      We have included it in the modified figure. Also, to explain it we incorporated new text in the legend to Fig. 4C of the revised manuscript (duplicated below):

      “Note that similar to phoPR-KO, WT-Rv0805 shows a comparably higher sensitivity to CHP relative to WT bacilli. However, WT-Rv0805M expressing a mutant Rv0805, shows a significantly lower sensitivity to CHP relative to WT-Rv0805, as measured by the corresponding CFU values.”

      (14) Line 299-303: Authors should explain how the colocalization % are calculated. Also, in the figure 4D merge panel please highlight the difference.

      As suggested by the reviewer, we have now explained the methodology used to calculate percent colocalization in greater detail. Also, we have modified Figure 4D to highlight the difference between samples shown in the merge panel. Please see our response to comment #33 from Reviewer 1.

      New query: In the figure legend it should be mentioned that the white arrow indicates non-co-localization which is visibly higher in WT and WT-Rv0805M.

      We thank the reviewer for this very important suggestion. We have now included the following text in the legend to Fig. 4D of the revised manuscript.

      “White arrowheads in the merge panels indicate non-colocalization, which remains higher in WT-H37Rv and WT-Rv0805M relative to phoPR-KO or WT-Rv0805.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Through an unbiased genomewide KO screen, the authors identified loss of DBT to suppress MG132-mediated death of cultured RPE cells. Further analyses suggested that DBT reduces ubiquitinated proteins by promoting autophagy. Mechanistic studies indicated that DBT loss promotes autophagy via AMPK and its downstream ULK and mTOR signaling. Furthermore, loss of DBT suppresses polyglutamine- or TDP-43-mediated cytotoxicity and/or neurodegeneration in fly models. Finally, the authors showed that DBT proteins are increased in ALS patient tissues, compared to non-neurological controls.

      Strengths:

      The idea is novel, the evidence is mostly convincing, and the data are clean. The findings have implications for human diseases.

      Reply: We thank the reviewer for the supportive comments.

      Weaknesses:

      More experiments are needed to establish the connections between DBT and autophagy. The mechanistic studies are somewhat biased, and it's unclear whether the same mechanism (i.e., AMPK-->mTOR) can be applied to TDP-43-mediated neurodegeneration. Also, some data interpretation has to be more accurate.

      Reply: We thank the reviewer for raising these questions, and we have provided additional evidence in the revised manuscript to support the model that DBTKO can enhance autophagy and induce resistance to TDP-43-associated toxicity. This is described in greater detail below.

      (1) To provide further evidence for the connection between DBT and autophagy, we have introduced additional controls. We have included the AMPK shRNA and drug treatment controls (Fig.4D, Fig.S4B), and these results suggest that reducing the AMPK level renders DBTKO cells sensitive to MG132 toxicity. We also added the TSC1 shRNA and mTOR agonist treatment controls (Fig.5E, Fig.S4G), and the results show that increasing mTOR levels also makes the DBTKO cells sensitive to MG132.

      (2) To further confirm the roles of AMPK and mTOR in DBTKO cells, we introduced the AMPK agonist (EX229) and mTOR inhibitors (RAD001 and AZD8055) in co-treatment experiments with MG132 and then measured cell survival (Fig.S4D, S4G). The results indicate that promoting AMPK activation or inhibiting mTOR can enhance cell resistance to MG132-induced toxicity.

      (3) Additionally, we included the overexpression and rescue experiments for DBT and analyzed the AMPK-ULK1 signaling in WT RPE1 and DBTKO cells (Fig.S5D, S5E). The results indicate that the increase of DBT can significantly reduce the phosphorylation of AMPK/ULK1 and the levels of the autophagy marker LC3II. Together, these results suggest that DBT plays an important role in autophagy.

      (4) We had shown in the original version of the manuscript that DBTKO renders cells more resistant to TDP-43-associated toxicity, similar to the tolerance of MG132-induced toxicity. Here we further show that expression of TDP-43M337V enhances the phosphorylation of AMPK in the DBTKO cells (Fig. S7A), similar to the effect of the MG132 treatment. These results suggest that the resistance of DBTKO cells to MG132 or TDP-43-associated toxicity shares a similar mechanism involving activation of AMPK signaling.

      Reviewer #2 (Public Review):

      Summary:

      Hwang, Ran-Der et al utilized a CRISPR-Cas9 knockout in human retinal pigment epithelium (RPE1) cells to evaluate for suppressors of toxicity by the proteasome inhibitor MG132 and identified that knockout of dihydrolipoamide branched chain transacylase E2 (DBT) suppressed cell death. They show that DBT knockout in RPE1 cells does not alter proteasome or autophagy function at baseline. However, with MG132 treatment, they show a reduction in ubiquitinated proteins but with no change in proteasome function. Instead, they show that DBT knockout cells treated with MG132 have improved autophagy flux compared to wildtype cells treated with MG132. They show that MG132 treatment decreases ATP/ADP ratios to a greater extent in DBT knockout cells, and in accordance causes activation of AMPK. They then show downstream altered autophagy signaling in DBT knockout cells treated with MG132 compared to wild-type cells treated with MG132. Then they express the ALS mutant TDP43 M337 or expanded polyglutamine repeats to model Huntington's disease and show that knockdown of DBT improves cell survival in RPE1 cells with improved autophagic flux. They also utilize a Drosophila model and show that utilizing either a RNAi or CRISPR-Cas9 knockout of DBT improves eye pigment in TDP43M337V and polyglutamine repeat-expressing transgenic flies. Finally, they show evidence for increased DBT in postmortem spinal cord tissue from patients with ALS via both immunoblotting and immunofluorescence.

      Strengths:

      This is a mechanistic and well-designed paper that identifies DBT as a novel regulator of proteotoxicity via activating autophagy in the setting of proteasome inhibition. Major strengths include careful delineation of a mechanistic pathway to define how DBT is protective. These conclusions are largely justified, but additional experiments and information would be useful to clarify and extend these conclusions.

      Reply: We thank the reviewer for the supportive comments.

      Weaknesses:

      The large majority of the experiments are evaluating suppression of drug (MG132) toxicity in an in vitro epithelial cell line, so the generalizability to disease is unclear. Indeed, MG132 itself has been shown to modulate autophagy, and off-target effects of MG132 are not addressed. While this paper is strengthened by the inclusion of mouse-induced motor neurons, Drosophila models, and postmortem tissue, the putative mechanisms are minimally evaluated in these models.

      Also, this effect is only seen with MG132 treatment, at a dose that causes markedly impaired cell survival. In this setting, it is certainly plausible that changes in autophagy could be the result of differences in cell survival, as opposed to an underlying mechanism for cell survival. Additional controls would be useful to increase confidence that DBT knockdown is protective via modulation of autophagy.

      While the authors report increased DBT in postmortem ALS tissue as suggestive that DBT may modulate proteotoxicity in neurodegeneration, this point would be better supported with the evaluation of overexpression of DBT in their model.

      Reply: We appreciate the reviewer for raising these questions, and we have provided further evidence in the revised manuscript to support the proposed mechanism that DBTKO confers resistance to MG132-induced toxicity through activation of autophagy. This is discussed in greater detail below.

      (1) To provide further mechanistic analysis, we have included additional controls for the analysis of AMPK signaling in Fig. 4D and Fig. S4B. These results demonstrate that using drugs or shRNAs to reduce AMPK activity can decrease DBTKO survival. We have also shown that increasing AMPK activity with an activator enhances the survival of both WT and DBTKO cells under MG132 treatment (Fig. S4D), suggesting that DBTKO cells resist MG132-induced toxicity through the activation of AMPK signaling.

      (2) We have included additional controls for the analysis of mTOR signaling in Fig. 5E and Fig. S4F. The results in Fig. 5E show that reducing TSC1 using shRNAs can decrease DBTKO survival. We also added the experiments with mTOR agonist MHY1485 as a control in Fig. S4F. These results indicate that mTOR activation can promote DBTKO cells' sensitivity to MG132 toxicity. To further confirm the importance of mTOR in DBTKO-mediated resistance to MG132 toxicity, we included the mTOR inhibitors RAD001 and AZD8055 in the co-treatment experiments with MG132, and then measured cell survival (Fig. S4G). The results show that both mTOR inhibitors can enhance cell resistance to MG132-induced toxicity (Fig. S4G). These findings suggest that mTOR inhibition is required for DBTKO-mediated cell survival under MG132 treatment.

      (3) To further test the hypothesis that DBT knockdown is protective via modulation of autophagy, we have introduced the overexpression of DBT and the rescue of DBT in DBTKO cells to analyze the AMPK signaling that regulates autophagy (Fig. S5E). The results demonstrate that overexpression of DBT significantly reduced the phosphorylation of AMPK and ULK1 (Fig. S5E). In the rescue experiment, the results mirror those of the overexpression experiment, showing a significant reduction in the phosphorylation of AMPK and ULK1 (Fig. S5E). We also analyzed the autophagy marker LC3II in both the overexpression and rescue experiments, and the results indicate that increasing the DBT level specifically reduces the LC3II level (Fig. S5D). These results support the model that loss of DBT promotes the activation of autophagy.

      (4) To test the hypothesis that DBT may modulate proteotoxicity in neurodegeneration, we included the studies with TDP-43M337V and found that the expression of the mutant TDP43 enhanced the phosphorylation of AMPK in the DBTKO cells (Fig. S7A), consistent with the observations made with MG-132 treatment. Together with other findings in the manuscript, these results indicate that DBTKO can sensitize the activation of the AMPK signaling and confer the resistance to TDP-43-associated toxicity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editor’s summary:

      This paper by Castello-Serrano et al. addresses the role of lipid rafts in trafficking in the secretory pathway. By performing carefully controlled experiments with synthetic membrane proteins derived from the transmembrane region of LAT, the authors describe, model and quantify the importance of transmembrane domains in the kinetics of trafficking of a protein through the cell. Their data suggest affinity for ordered domains influences the kinetics of exit from the Golgi. Additional microscopy data suggest that lipid-driven partitioning might segregate Golgi membranes into domains. However, the relationship between the partitioning of the synthetic membrane proteins into ordered domains visualised ex vivo in GPMVs, and the domains in the TGN, remain at best correlative. Additional experiments that relate to the existence and nature of domains at the TGN are necessary to provide a direct connection between the phase partitioning capability of the transmembrane regions of membrane proteins and the sorting potential of this phenomenon.

      The authors have used the RUSH system to study the traffic of model secretory proteins containing single-pass transmembrane domains that confer defined affinities for liquid ordered (lo) phases in Giant Plasma Membrane derived Vesicles (GPMVs), out of the ER and Golgi. A native protein termed LAT partitioned into these lo-domains, unlike a synthetic model protein termed LAT-allL, which had a substituted transmembrane domain. The authors' experiments provide support for the idea that ER exit relies on motifs in the cytosolic tails, but that accelerated Golgi exit is correlated with lo domain partitioning.

      Additional experiments provided evidence for segregation of Golgi membranes into coexisting lipid-driven domains that potentially concentrate different proteins. Their inference is that lipid rafts play an important role in Golgi exit. While this is an attractive idea, the experiments described in this manuscript do not provide a convincing argument one way or the other. It does however revive the discussion about the relationship between the potential for phase partitioning and its influence on membrane traffic.

      We thank the editors and scientific reviewers for thorough evaluation of our manuscript and for positive feedback. While we agree that our experimental findings present a correlation between trafficking rates and raft affinity, in our view, the synthetic, minimal nature of the transmembrane protein constructs in question makes a strong argument for involvement of membrane domains in their trafficking. These constructs have no known sorting determinants and are unlikely to interact directly with trafficking proteins in cells, since they contain almost no extramembrane amino acids. Yet, the LATTMD traffics through Golgi similarly to the full-length LAT protein, but quite different from mutants with lower raft phase affinity. We suggest that these observations can be best rationalized by involvement of raft domains in the trafficking fates and rates of these constructs, providing strong evidence (beyond a simple correlation) for the existence and relevance of such domains.

      We have substantially revised the manuscript to address all reviewer comments, including several new experiments and analyses. These revisions have substantially improved the manuscript without changing any of the core conclusions and we are pleased to have this version considered as the “version of record” in eLife.

      Below is our point-by-point response to all reviewer comments.

      ER exit:

      The experiments conducted to identify an ER exit motif in the C-terminal domain of LAT are straightforward and convincing. This is also consistent with available literature. The authors should comment on whether the conservation of the putative COPII association motif (detailed in Fig. 2A) is significantly higher than that of other parts of the C-terminal domain.

      Thank you for this suggestion; this information has now been included as Supp Fig 2B. While there are other well-conserved residues of the LAT C-terminus, many regions have relatively low conservation. In contrast, the essential residues of the COPII association motif (P148 and A150) are completely conserved in LAT across all species analyzed.

      One cause of concern is that addition of a short cytoplasmic domain from LAT is sufficient to drive ER exit, and in its absence the synthetic constructs are all very slow. However, the argument presented that specific lo phase partitioning behaviour of the TMDs do not have a significant effect on exit from the ER is a little confusing. This is related to the choice of the allL-TMD as the 'non-lo domain' partitioning comparator. Previous data has shown that longer TMDs (23+) promote ER export (eg. Munro 91, Munro 95, Sharpe 2005). The mechanism for this is not, to my knowledge, known. One could postulate that it has something to do with the very subject of this manuscript- lipid phase partitioning. If this is the case, then a TMD length of 22 might be a poor choice of comparison. A TMD 17 Ls' long would be a more appropriate 'non-raft' cargo. It would be interesting to see a couple of experiments with a cargo like this.

      The basis for the claim that raft affinity has relatively minor influence on ER exit kinetics, especially in comparison to the effect of the putative COPII interaction motif, is in Fig 1G. We do observe some differences between constructs and they may be related to raft affinity; however, we considered these relatively minor compared to the nearly 4-fold increase in ER efflux induced by COPII motifs.

      We have modified the wording in the manuscript to avoid the impression that we have ruled out an effect of raft affinity on ER exit.

      We believe that our observations are broadly consistent with those of Munro and colleagues. In both their work and ours, long TMDs were able to exit the ER. In our experiments, this was true for several proteins with long TMDs, either as full-length or as TMD-only versions (see Fig 1G). We intentionally did not measure shorter synthetic TMDs because these would not have been comparable with the raft-preferring variants, which all require relatively long TMDs, as demonstrated in our previous work1,2. Thus, because our manuscript does not make any claims about the influence of TMD length on trafficking, we did not feel that experiments with shorter non-raft constructs would substantively influence our conclusions.

      However, to address reviewer interest, we did complete one set of experiments to test the effect of shortening the TMD on ER exit. We truncated the native LAT TMD by removing 6 residues from the C-terminal end of the TMD (LAT-TMDd6aa). This construct exited the ER similarly to all others we measured, revealing that for this set of constructs, short TMDs did not accumulate in the ER. ER exit of the truncated variant was slightly slower than the full-length LAT-TMD, but somewhat faster than the allL-TMD. These effects are consistent with our previous measurements, which showed that this shortened construct has slightly lower raft phase partitioning than the LAT-TMD but higher than allL2. While these are interesting observations, a more thorough exploration of the effect of TMD length would be required to make any strong conclusion, so we did not include these data in the final manuscript.

      Author response image 1.

      Golgi exit:

      For the LAT constructs, the kinetics of Golgi exit as shown in Fig. 3B are surprisingly slow. About half of the protein remains in the Golgi at 1 h after biotin addition. Most secretory cargo proteins would have almost completely exited the Golgi by that time, as illustrated by VSVG in Fig. S3. There is a concern that LAT may have some tendency to linger in the Golgi, presumably due to a factor independent of the transmembrane domain, and therefore cannot be viewed as a good model protein. For kinetic modeling in particular, the existence of such an additional factor would be far from ideal. A valuable control would be to examine the Golgi exit kinetics of at least one additional secretory cargo.

      We disagree that LAT is an unusual protein with respect to Golgi efflux kinetics. In our experiments, Golgi efflux of VSVG was similar to full-length LAT (t1/2 ~ 45 min), and both of these were similar to previously reported values3. Especially for the truncated (i.e. TMD) constructs, it is very unlikely that some factor independent of their TMDs affects Golgi exit, as they contain almost no amino acids outside the membrane-embedded TMD.

      Practically, it has proven somewhat challenging to produce functional RUSH-Golgi constructs. We attempted the experiment suggested by the reviewer by constructing SBP-tagged versions of several model cargo proteins, but all failed to trap in the Golgi. We speculate that the Golgin84 hook is much more sensitive to the location of the SBP on the cargo, being an integral membrane protein rather than the lumenal KDEL-streptavidin hook. This limitation can likely be overcome by engineering the cargo, but we did not feel that another control cargo protein was essential for the conclusions we presented, thus we did not pursue this direction further.

      Comments about the trafficking model

      (1) In Figure 1E, the export of LAT-TMD from the ER is fitted to a single-exponential fit that the authors say is "well described". This is unclear and there is perhaps something more complex going on. It appears that there is an initial lag phase and then similar kinetics after that - perhaps the authors can comment on this?

      This is a good observation. This effect is explainable by the mechanics of the measurement: in Figs 1 and 2, we measure not ‘fraction of protein in ER’ but ‘fraction of cells positive for ER fluorescence’. This is because the very slow ER exit of the TMD-only constructs presents a major challenge for live-cell imaging, so ER exit was quantified on a population level, by fixing cells at various time points after biotin addition and quantifying the fraction of cells with observable ER localization (rather than tracking a single cell over time).

      For fitting to the kinetic model (which attempts to describe ‘fraction in ER/Golgi’) we re-measured all constructs by live-cell imaging (see Supp Fig 5) to directly quantify relative construct abundance in the ER or Golgi. These data did not have the plateau in Fig 1E, suggesting that this is an artifact of counting “ER positive cells”, which would be expected to have a longer lag than “fraction of protein in ER”. Notably, however, t1/2 measured by both methods was similar, suggesting that the population measurement agrees well with single-cell live imaging.

      We have included all these explanations and caveats in the manuscript. We have also changed the wording from “well described” to “reasonably approximated”.
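
      To illustrate the difference between the two readouts discussed in this response, the following minimal Python sketch (an illustration under stated assumptions, not the analysis used for the manuscript) compares 'fraction of protein in ER' with 'fraction of cells positive for ER fluorescence'. It assumes first-order ER exit in each cell, a log-normal spread of exit rates between cells, and a fixed detection threshold below which a cell is scored as ER-negative. The 60 min median half-time, the spread, and the threshold are all hypothetical values chosen only for illustration.

```python
import math
import random


def protein_fraction_in_er(t, k):
    """Fraction of protein still in the ER at time t, assuming first-order exit with rate k."""
    return math.exp(-k * t)


def mean_protein_fraction_in_er(t, rates):
    """Population average of the per-cell protein fraction remaining in the ER."""
    return sum(protein_fraction_in_er(t, k) for k in rates) / len(rates)


def fraction_of_er_positive_cells(t, rates, threshold):
    """Fraction of cells whose remaining ER signal is still above the detection threshold."""
    positive = sum(1 for k in rates if protein_fraction_in_er(t, k) > threshold)
    return positive / len(rates)


random.seed(0)
median_rate = math.log(2) / 60.0  # hypothetical: median half-time of 60 min
# Hypothetical cell-to-cell variability in exit rate (log-normal, sigma = 0.3).
rates = [random.lognormvariate(math.log(median_rate), 0.3) for _ in range(2000)]
threshold = 0.5  # hypothetical: a cell counts as ER-positive above 50% of initial signal

for t in (0, 15, 30, 60, 120, 240):
    print(
        t,
        round(mean_protein_fraction_in_er(t, rates), 3),
        round(fraction_of_er_positive_cells(t, rates, threshold), 3),
    )
```

      Under these assumptions, the population readout stays close to 1 at early times while the protein fraction is already falling, which is the kind of lag expected when counting ER-positive cells rather than measuring protein abundance directly.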

      (2) The model for Golgi sorting is also complicated and controversial, and while the authors' intention to not overinterpret their data in this regard must be respected, this data is in support of the two-phase Golgi export model (Patterson et al PMID:18555781).

      The reviewers are correct, our observations and model are consistent with Patterson et al and it was a major oversight that a reference to this foundational work was not included. We have now added a discussion regarding the “two phase model” of Patterson and Lippincott-Schwartz.

      Furthermore, contrary to the statement in lines 200-202, the kinetics of VSVG exit from the Golgi (Fig. S3) are roughly linear and so are NOT consistent with the previous report by Hirschberg et al.

      Regarding kinetics of VSVG, our intention was to claim that the timescale of VSVG efflux from the Golgi was similar to previously reported in Hirschberg, i.e. t1/2 roughly between 30-60 minutes. We have clarified this in the text. Minor differences in the details between our observations and Hirschberg are likely attributable to temperature, as those measurements were done at 32°C for the tsVSVG mutant.

      Moreover, the kinetics of LAT export from the Golgi (Fig. 3B) appear quite different, more closely approximating exponential decay of the signal. These points should be described accurately and discussed.

      Regarding linear versus exponential fits, we agree that the reality of Golgi sorting and efflux is far more complicated than accounted for by either the phenomenological curve fitting in Figs 1-3 or the modeling in Fig 4. In addition to the possibility of lateral domains within Golgi stacks, there is transport between stacks, retrograde traffic, etc. The fits in Figs 1-3 are not intended to model specifics of transport, but rather to be phenomenological descriptors that allowed us to describe efflux kinetics with one parameter (i.e. t1/2). In contrast, the more refined kinetic modeling presented in Figure 4 is designed to test a mechanistic hypothesis (i.e. coexisting membrane domains in Golgi) and describes well the key features of the trafficking data.
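
      As a rough illustration of what a one-parameter phenomenological descriptor of this kind looks like, the sketch below fits a single exponential to a time course and reports t1/2 = ln(2)/k. It is a simplified stand-in, not the fitting procedure used for Figs 1-3: it uses a log-linear least-squares fit rather than nonlinear regression, and the time points and fractions are synthetic values generated from an assumed 45 min half-time.

```python
import math


def fit_single_exponential(times, fractions):
    """Fit fraction(t) = a * exp(-k * t) by linear least squares on ln(fraction).

    Returns (a, k, t_half), where t_half = ln(2) / k.
    Points with non-positive fractions are skipped because their log is undefined.
    """
    points = [(t, math.log(f)) for t, f in zip(times, fractions) if f > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx            # slope of ln(fraction) versus time equals -k
    k = -slope
    a = math.exp(mean_y - slope * mean_t)
    return a, k, math.log(2) / k


# Synthetic, noiseless example data (hypothetical): half-time of 45 min.
times = [0, 15, 30, 45, 60, 90, 120]
true_k = math.log(2) / 45.0
fractions = [math.exp(-true_k * t) for t in times]

a, k, t_half = fit_single_exponential(times, fractions)
print(f"a = {a:.3f}, k = {k:.4f} per min, t1/2 = {t_half:.1f} min")
```

      A descriptor like this summarizes each curve with a single number, but it does not distinguish between the mechanistic scenarios that the kinetic model is intended to test.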

      Relationship between membrane traffic and domain partitioning:

      (1) Phase segregation in the GPMV is dictated by thermodynamics given its composition and the measurement temperature (at low temperatures, 4°C). However, at physiological temperatures (32-37°C) at which membrane trafficking is taking place these GPMVs are not phase separated. Hence it is difficult to argue that a sorting mechanism based solely on the partitioning of the synthetic LAT-TMD constructs into lo domains detected at low temperatures in GPMVs provides a basis (or its lack) for the differential kinetics of traffic out of the Golgi (or ER). The mechanism in a living cell to form any lipid based sorting platforms naturally requires further elaboration, and by definition cannot resemble the lo domains generated in GPMVs at low temperatures.

      We thank the reviewers for bringing up this important point. GPMVs are a useful tool because they allow direct, quantitative measurements of protein partitioning between coexisting ordered and disordered phases in complex, cell-derived membranes. However, we entirely agree that GPMVs do not fully represent the native organization of the living cell plasma membrane, and we have previously discussed some of the relevant differences4,5. Despite these caveats, many studies have supported the cellular relevance of phase separation in GPMVs and the partitioning of proteins to raft domains therein 6-9. Most notably, elegant experiments from several independent labs have shown that fluorescent lipid analogs that partition to Lo domains in GPMVs also show distinct diffusive behaviors in live cells 6,7, strongly suggesting the presence of nanoscopic Lo domains in live cells. Similarly, our recent collaborative work with the lab of Sarah Veatch showed excellent agreement between raft preference in GPMVs and protein organization in living immune cells imaged by super-resolution microscopy10. Further, several labs6,7, including ours11, have reported good correlations between raft partitioning in GPMVs and detergent resistance, which is a classical (though controversial) assay for raft association.

      Based on these points, we feel that GPMVs are a useful tool for quantifying protein preference for ordered (raft) membrane domains and that this preference is a useful proxy for the raft-associated behavior of these probes in living cells. We propose that this approach allows us to overcome a major reason for the historical controversy surrounding the raft field: nonquantitative and unreliable methodologies that prevented consistent definition of which proteins are supposed to be present in lipid rafts and why. Our work directly addresses this limitation by relating quantitative raft affinity measurements in a biological membrane with a relevant and measurable cellular outcome, specifically inter-organelle trafficking rates.

      Addressing the point about phase transition temperatures in GPMVs: this is the temperature at which macroscopic domains are observed. Based on physical models of phase separation, it has been proposed that macroscopic phase separation at lower temperatures is consistent with sub-microscopic, nanoscale domains at higher temperatures8,12. These smaller domains can potentially be stabilized / functionalized by protein-protein interactions in cells13 that may not be present in GPMVs (e.g. because of lack of ATP).

      (2) The lipid compositions of each of these membranes - PM, ER and Golgi are drastically different. Each is likely to phase separate at different phase transition temperatures (if at all). The transition temperature is probably even lower for Golgi and the ER membranes compared to the PM. Hence, if the reported compositions of these compartments are to be taken at face value, the propensity to form phase separated domains at a physiological temperature will be very low. Are ordered domains even formed at the Golgi at physiological temperatures?

      It is a good point that the membrane compositions and the resulting physical properties (including any potential phase behavior) will be very different in the PM, ER, and Golgi. Whether ordered domains are present in any of these membranes in living cells remains difficult to directly visualize, especially for non-PM membranes which are not easily accessible by probes, are nanoscopic, and have complex morphologies. However, the fact that raft-preferring probes / proteins share some trafficking characteristics, while very similar non-raft mutants behave differently argues that raft affinity plays a role in subcellular traffic.

      (3) The hypothesis of 'lipid rafts' is a very specific idea, related to functional segregation, and the underlying basis for domain formation has been also hotly debated. In this article the authors conflate thermodynamic phase separation mechanisms with the potential formation of functional sorting domains, further adding to the confusion in the literature. To conclude that this segregation is indeed based on lipid environments of varying degrees of lipid order, it would probably be best to look at the heterogeneity of the various membranes directly using probes designed to measure lipid packing, and then look for colocalization of domains of different cargo with these domains.

      This is a very good suggestion, and a direction we are currently following. Unfortunately, due to the dynamic nature and small size of putative lateral membrane domains, combined with the interior of a cell being filled with lipophilic environments that overlay each other, directly imaging domains in organellar membranes with lipid packing probes remains extremely difficult with current technology (or at least with the technology available to us). We argue that the TMD probes used in this manuscript are a reasonable alternative, as they are fluorescent probes with validated selectivity for membrane compartments with different physical properties.

      Ultimately, the features of membrane domains suggested by a variety of techniques – i.e. nanometric, dynamic, relatively similar in composition to the surrounding membrane, potentially diverse/heterogeneous – make them inherently difficult to microscopically visualize. This is one reason why we believe studies like ours, which use a natural model system to directly quantify raft-associated behaviors and relate them to cellular effects (in our case, protein sorting), are a useful direction for this field.

      We believe we have been careful in our manuscript to avoid confusing language surrounding lipid rafts, phase separation, etc. Our experiments clearly show that mammalian membranes have the capacity to phase separate, that some proteins preferentially interact with more ordered domains, and that this preference is related to the subcellular trafficking fates and rates of these proteins. We have edited the manuscript to emphasize these claims and avoid the historical controversies and confusions.

      (4) In the super-resolution experiments (by SIM, where the enhancement of resolution is around two fold or less compared to optical), the authors are able to discern a segregation of the two types of Golgi-resident cargo that have different preferences for the lo-domains in GPMVs. It should be noted that TMD-allL and the LATallL end up in the late endosome after exit from the Golgi. Previous work from the Bonifacino laboratory (PMID: 28978644) has shown that proteins (such as M6PR) destined to go to the late endosome bud from a different part of the Golgi in vesicular carriers, while those that are destined for the cell surface first (including TfR) bud with tubular vesicular carriers. Thus at the resolution depicted in Fig 5, the segregation seen by the authors could be due to an alternative explanation, that these molecules are present in different areas of the Golgi for reasons different from phase partitioning. The relatively high colocalization of TfR with the GPI probe in Fig 5E is consistent with this explanation. TfR and GPI prefer different domains in the GPMV assays yet they show a high degree of colocalization and also traffic to the cell surface.

      This is a good point. Even at microscopic resolutions beyond the optical diffraction limit, we cannot make any strong claims that the segregation we observe is due to lateral lipid domains and not several reasonable alternatives, including separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids. We have explicitly included this point in the Discussion: “Our SIM imaging suggests segregation of raft from nonraft cargo in the Golgi shortly (5 min) after RUSH release (Fig 5B), but at this level of resolution, we can only report reduced colocalization, not intra-Golgi protein distributions. Moreover, segregation within a Golgi cisterna would be very difficult to distinguish from cargo moving between cisternae at different rates or exiting via Golgi-proximal vesicles.”

      We have also added a similar caveat in the Results section of the manuscript: “These observations support the hypothesis that proteins can segregate in Golgi based on their affinity for distinct membrane domains; however, it is important to emphasize that this segregation does not necessarily imply lateral lipid-driven domains within a Golgi cisterna. Reasonable alternative possibilities include separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids.”

      Finally, while probes with allL TMD do eventually end up in late endosomes (consistent with the Bonifacino lab’s findings which we include), they do so while initially transiting the PM2,11.

      Minor concerns:

      (1) Generally, the quantitation is high quality from difficult experimental data. Although a lot appears to be manual, it appears appropriately performed and interpreted. There are some claims that are made based on this quantitation, however, where there are no statistics performed. For example, figure 1B. Any quantitation with an accompanying conclusion should be subject to a statistical test. I think this is particularly important for the quality of the model fits.

      We appreciate the thoughtful feedback. The quantifications and fits were not trivial, but we believe they are important. We have added statistical significance to Figure 1B and others where it was missing.

      (2) Modulation of lipid levels in Fig 4E shows a significant change in the trafficking rate for the LAT-TMD construct and a not so significant change for the allL-TMD construct. However, these data are not convincing and appear to depend on a singular data point that appears to lower the mean value. In general, the experiment with the MZA inhibitor (Fig. 4D-F) is hard to interpret because cells will likely be sick after inhibition of sphingolipid and cholesterol synthesis. Moreover, the difference in effects for LAT-TMD and allL-TMD is marginal.

      We disagree with this interpretation. Fig 4E shows the average of three experiments and demonstrates clearly that the inhibitors change the Golgi efflux rate of LAT-TMD but not allL-TMD. This is summarized in the t1/2 quantifications of Fig 4F, which show a statistically significant change for LAT-TMD but not allL-TMD. This is not an effect of a singular data point, but rather the trend across the dataset.

      Further, the inhibitor conditions were tuned carefully to avoid cells becoming “sick”: at higher concentrations, cells did adopt unusual morphologies and began to detach from the plates. We pursued only lower concentrations, which cells survived for at least 48 hrs and without major morphological changes.

      (3) Line 173: 146-AAPSA-152 should read either 146-AAPSA-150 or 146-AAPSAPA-152, depending on what the authors intended.

      Thanks for the careful reading, we intended the former and it has been fixed.

      (4) What is the actual statistical significance in Fig. 3C and Fig. 3E? There is a single asterisk in each panel of the figure but two asterisks in the legend.

      Apologies, a single asterisk representing p<0.05 was intended. It has been fixed.

      (5) The code used to calculate the model is not accessible. It is standard practice to host well-annotated code on Github or similar, and it would be good to have this publicly available.

      We have deposited the code on a public repository (doi: 10.5281/zenodo.10478607) and added a note to the Methods.

      (1) Lorent, J. H. et al. Structural determinants and functional consequences of protein affinity for membrane rafts. Nature communications 8, 1219 (2017). PMC5663905

      (2) Diaz-Rohrer, B. B., Levental, K. R., Simons, K. & Levental, I. Membrane raft association is a determinant of plasma membrane localization. Proc Natl Acad Sci U S A 111, 8500-8505 (2014). PMC4060687

      (3) Hirschberg, K. et al. Kinetic analysis of secretory protein traffic and characterization of golgi to plasma membrane transport intermediates in living cells. J Cell Biol 143, 1485-1503 (1998). PMC2132993

      (4) Levental, K. R. & Levental, I. Giant plasma membrane vesicles: models for understanding membrane organization. Current topics in membranes 75, 25-57 (2015).

      (5) Sezgin, E. et al. Elucidating membrane structure and protein behavior using giant plasma membrane vesicles. Nat Protoc 7, 1042-1051 (2012).

      (6) Komura, N. et al. Raft-based interactions of gangliosides with a GPI-anchored receptor. Nat Chem Biol 12, 402-410 (2016).

      (7) Kinoshita, M. et al. Raft-based sphingomyelin interactions revealed by new fluorescent sphingomyelin analogs. J Cell Biol 216, 1183-1204 (2017). PMC5379944

      (8) Stone, M. B., Shelby, S. A., Nunez, M. F., Wisser, K. & Veatch, S. L. Protein sorting by lipid phase-like domains supports emergent signaling function in B lymphocyte plasma membranes. eLife 6 (2017). PMC5373823

      (9) Machta, B. B. et al. Conditions that Stabilize Membrane Domains Also Antagonize n-Alcohol Anesthesia. Biophys J 111, 537-545 (2016).

      (10) Shelby, S. A., Castello-Serrano, I., Wisser, I., Levental, I. & S., V. Membrane phase separation drives protein organization at BCR clusters. Nat Chem Biol in press (2023).

      (11) Diaz-Rohrer, B. et al. Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120, e2207461120 (2023).

      (12) Veatch, S. L. et al. Critical fluctuations in plasma membrane vesicles. ACS Chem Biol 3, 287-293 (2008).

      (13) Wang, H. Y. et al. Coupling of protein condensates to ordered lipid domains determines functional membrane organization. Science advances 9, eadf6205 (2023). PMC10132753

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Suggestions to the authors:

      • Please re-analyze findings by omitting from all Tables and Figures all data of comparators who were not randomized (BAC). I understand the difficulties of running this trial but the results of excess reduction of mortality do not allow the publication of a trial where comparators do not come from the randomized patient population.

      We wish to thank the editors and reviewers for their useful comments. Given that the study was designed with both randomised and CC participants, we cannot easily exclude the CC analysis from the paper. However, we do provide graphs for both randomised only and randomised and CC participants for the primary and secondary endpoints. The fact that the primary endpoint (CRP) results are mirrored in both instances is also informative from a trial design perspective and indicates that the effect of dornase alfa therapy on inflammation is robust enough to yield the same results with small and larger cohorts.

      We agree that there are potential drawbacks of using contemporary controls. To address these potential biases, we used CC patients recruited during the same time period at a single site using the same selection criteria as the randomised group, which minimised potential bias. Moreover, the comparison of CRP in CC-BAC participants to concurrent randomised control R-BAC patients indicated that the two groups responded to BAC treatment in the same manner (Table 2, LS means log(CRP) 3.78 vs 3.53, P=0.386), whereas the R-BAC+DA vs R-BAC group comparison yielded significant differences (Table 2, LS means log(CRP) 3.1 vs 3.59, P=0.041). These comparisons mitigate these potential problems to a large degree.
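
      To make the log-scale contrasts easier to read, the sketch below back-transforms the quoted LS means into ratios of geometric means. It assumes that log(CRP) denotes the natural logarithm, which the response does not state; the function name is hypothetical, and the output is an arithmetic illustration rather than a reanalysis of the trial data.

```python
import math

def geometric_mean_ratio(ls_mean_a, ls_mean_b):
    """Ratio of geometric means implied by two LS means on the natural-log scale."""
    return math.exp(ls_mean_a - ls_mean_b)

# LS means of log(CRP) quoted from Table 2 in the response
treated_vs_randomised_control = geometric_mean_ratio(3.1, 3.59)       # R-BAC+DA vs R-BAC
contemporary_vs_randomised_control = geometric_mean_ratio(3.78, 3.53)  # CC-BAC vs R-BAC

print(round(treated_vs_randomised_control, 2))       # 0.61
print(round(contemporary_vs_randomised_control, 2))  # 1.28
```

      Under that assumption, the treated group's geometric mean CRP is roughly 39% lower than that of the randomised controls, while the contemporary controls sit about 28% above the randomised controls, a difference reported as non-significant (P=0.386).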

      Still, to make the groups easy to distinguish, we now use the following unique nomenclature throughout the manuscript, which is clearly defined on ln. 111, and state that comparisons of treated participants were performed with both control groups separately and combined.

      R-BAC: Randomised BAC
      CC-BAC: Contemporary control BAC
      R-BAC+DA: Randomised BAC + dornase alfa
      T-BAC: R-BAC + CC-BAC

      In fact, the most important bias in our study might actually be the placebo effect, given that participants randomised to BAC did not receive a nebulized control substance. We now discuss these points in more detail in the manuscript and have modified the title by removing the reference to a randomised trial and clinical outcomes.

      • The presentation remains confusing and the manuscript should be critically revised for clarity. There is a repetition of methods (e.g. lines 176-187 repeat 160-175) and redundant results (e.g. Figure S2, Table 3).

      We apologise for the repetition. We removed the repeated text in the Exclusion criteria (lines 176-187 in the old manuscript).

      Figure S2 is not related to Table 3. Figure S2 depicts baseline characteristics, whereas Table 3 complements the graph in Figure 3A but lists the mean daily value of the primary endpoint as requested by Reviewer 1 in the first round of revision.

      At Table 4: the authors should select one method of illustration for lab results, either Table or figure, without repetitions

      We agree and have removed Table 4 leaving the graphs instead.

      • Regarding inclusion criteria, it is unclear whether high radiological suspicion is sufficient for inclusion or whether PCR based confirmation is required in all instances (differences in wording between lines 153 and 191), and under which oxygen requirements (lines 155 and 192)

      We thank the reviewer for pointing this out. Indeed, radiological suspicion was not sufficient and all participants in this study had a positive PCR test as part of their diagnosis prior to inclusion in the study. The entire eligibility section was rewritten to reflect this important point.

      • Table 1 should be merged with Table S2 and a better description of cohort baseline severity (P/F, SOFA, APACHE, organ support, number of patients in each point of the WHO severity score) and treatments should be made available.

      We thank the reviewer for this suggestion. We have now merged Table 1 and S2 and included WHO ordinal severity information in Table 1, with median, average, SD, min and max values which reflect the participant distribution. Unfortunately, although the additional requested information was recorded, it was not systematically collected for the analysis of the trial and it was not straightforward to compile at this stage.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendation for the authors):

      (1) On a few occasions, I found that the authors would introduce a concept, but provide evidence much later on. For example, in line 57, they introduced the idea that feedback timing modulates engagement of the hippocampus and striatum, but they provided the details much later on around line 99. There are a few instances like these, and the authors may want to go through the manuscript critically to bridge such gaps to improve the flow of reading.

      First, we thank the reviewer for acknowledging the contribution of our study and the methodological choices. We acknowledge the concern raised about the flow of information in the introduction. We have critically reviewed the manuscript, especially on writing style and overall structure, to ensure a smoother transition between the introduction of concepts and the provision of supporting evidence. In the case of the concept of feedback timing and memory systems, lines 46-58 first introduce the concept enhanced with evidence regarding adults, and we then pick up the concept around line 103 again to relate it to children and their brain development to motivate our research question. To further improve readability, we have included an outline of what to expect in the introduction. Specifically, we added a sentence in line 66-68 that provides an overview of the different paragraphs: “We will introduce the key parameters in reinforcement learning and then we review the existing literature on developmental trajectories in reinforcement learning as well as on hippocampus and striatum, our two brain regions of interest.”

      This should better prepare the reader for when to expect more evidence regarding the concepts introduced. We included similar “road-marker” outline sentences on the other occasions the reviewer commented on, to enhance consistency and readability.

      (2) I am curious as to how they think the 5-second delay condition maps onto real-life examples, for example in a classroom setting feedback after 5 seconds could easily be framed as immediate feedback.

      The authors may want to highlight a few illustrative examples.

      Thank you for asking about the practical implications of a 5-second delay condition, which may be very relevant to the reader. We have modified the introductory example in lines 39-41 to address the role of feedback timing in the classroom and to point out its practical relevance early on: “For example, children must learn to raise their hand before speaking during class. The teacher may reinforce this behavior immediately or with a delay, which raises the question whether feedback timing modulates their learning”.

      We have also expanded the corresponding discussion point in lines 720-728 to pick up the classroom example and to illustrate how we think timescale differences may apply: “In scenarios such as in the classroom, a teacher may comment on a child’s behavior immediately after the action or some moments later, on par with our experimental manipulation of 1 second versus 5 seconds. Within such a short range of delay in teachers’ feedback, children’s learning ability during the first years of schooling may function equally well and depend on the striatal-dependent memory system. However, we anticipate that the reliance on the hippocampus will become even more pronounced when feedback is further delayed for a longer time. Children’s capacity for learning over longer timescales relies on the hippocampal-dependent memory system, which is still under development. This knowledge could help to better structure learning according to their development.”

      (3) In the methods section, there are a few instances of task description discrepancies which make things a little bit confusing, for example, line 173 reward versus punishment, or reward versus null elsewhere e.g. line 229. In the same section, line 175, there are a few instances of typos.

      We appreciate your attention to detail in pointing out discrepancies in task descriptions and typos in the method section. We have revised the section, corrected typos, and now phrased the learning outcomes consistently as “reward” and “punishment”.

      (4) I wasn't very clear as to why the authors did not compute choice switch probability directly from raw data but implemented this as a model that makes use of a weight parameter. The former would be much easier and more straightforward for data plotting, especially for uninformed readers, i.e., people who do not have backgrounds in computational modelling.

      Thank you for asking for clarification on the calculation of switching behavior. Indeed, in the behavioral results, switching behavior was directly calculated from the raw data. We have now stressed this in the methods in lines 230-235, also by naming win-stay and lose-shift as “proportions” instead of as “probabilities”: “As a first step, we calculated learning outcomes directly from the raw data, which were learning accuracy, win-stay and lose-shift behavior as well as reaction time.

      Learning accuracy was defined as the proportion to choose the more rewarding option, while win-stay and lose-shift refer to the proportion of staying with the previously chosen option after a reward and switching to the alternative choice after receiving a punishment, respectively.”
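
      The definitions quoted above can be sketched in a few lines. This is an illustration of the stated definitions only, not the authors' analysis code: the function names, the 1 = reward / 0 = punishment coding, and the single-cue trial sequence are assumptions made for the example.

```python
def learning_accuracy(choices, better_option):
    """Proportion of trials on which the more rewarding option was chosen."""
    return sum(c == better_option for c in choices) / len(choices)

def switching_proportions(choices, outcomes):
    """Return (win_stay, lose_shift) proportions from raw trial data.

    choices  : chosen option per trial (any hashable label)
    outcomes : 1 for reward, 0 for punishment (assumed coding)
    """
    win_stay = win_total = lose_shift = lose_total = 0
    # Pair each trial with the choice made on the following trial
    for prev_choice, prev_outcome, choice in zip(choices, outcomes, choices[1:]):
        stayed = choice == prev_choice
        if prev_outcome == 1:
            win_total += 1
            win_stay += stayed
        else:
            lose_total += 1
            lose_shift += not stayed
    ws = win_stay / win_total if win_total else float("nan")
    ls = lose_shift / lose_total if lose_total else float("nan")
    return ws, ls

# Hypothetical six-trial sequence for one cue
choices = ["A", "A", "B", "B", "B", "A"]
outcomes = [1, 0, 1, 0, 1, 1]

print(learning_accuracy(choices, "A"))           # 0.5
print(switching_proportions(choices, outcomes))  # (0.6666666666666666, 0.5)
```

      Because these are simple counts over consecutive trials, they can be plotted without fitting any model, which is the distinction from the model-based switching weight that the response draws.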

      In contrast to the raw data switching behavior, the computational heuristic strategy model indeed uses a weight for a relative tendency of switching behavior. We have also stressed the advantage of the computational measure and its difference from the raw data switching behavior in lines 248-252 and believe that the reader can now clearly distinguish between the raw data and the computational results: “Note that these model-based outcomes are not identical to the win-stay and lose-shift behavior that were calculated from the raw data. The use of such model-based measure offers the advantage in discerning the underlying hidden cognitive process with greater nuance, in contrast to classical approaches that directly use raw behavioral data.”

      (5) I agree with the authors' assertion that both inverse temperature and outcome sensitivity parameters may lead to non-identifiability issues, but I was not 100% convinced about their modelling approach exclusively assessing a different family of models (inv temperature versus outcome sensitivity). Here, I would like to make one mid-way recommendation. They may want to redefine the inverse temperature term in terms of reaction time, i.e., B = exp(s + g(RT - mean(RT))) where s and g are free parameters (see Webb, 2019), and keep the outcome sensitivity parameter in the model with bounds [0,2] so that the interpretation could be % increase or decrease in actual outcome. Personally, in tasks with binary outcomes i.e. [0,1: null vs reward] I do not think outcome sensitivity parameters higher than 2 are interpretable as these assign an inflated coefficient to outcomes.

      We appreciate the mid-way recommendation regarding the modeling approach for inverse temperature and outcome sensitivity parameters. We have carefully revised our analysis approach by considering alternative modeling choices. Regarding the suggestion to redefine the inverse temperature in terms of reaction time by B = exp(s + g(RT - mean(RT))), we unfortunately were not able to identify the reference Webb (2019), nor did we find references to the suggested modeling approach. Any further information that the reviewer could provide will be greatly appreciated. Regardless, we agree that including reaction times through the implementation of drift-diffusion modeling may be beneficial. However, changing the inverse temperature model in such a way would necessitate major changes in our modeling approach, which unfortunately would result in non-convergence issues in our MCMC pipeline using Rstan. Hence, this approach goes beyond the scope of the manuscript. Nonetheless, we have decided to mention the use of a drift-diffusion model, along with other methodological considerations, as a future recommendation for disentangling outcome sensitivity from inverse temperature in lines 711-712: “Future studies might shed new light by examining neural activations at both task phases, by additionally modeling reaction times using a drift-diffusion approach, or by choosing a task design that allows independent manipulations of these phases and associated model parameters, e.g., by using different reward magnitudes during reinforcement learning, or by studying outcome sensitivity without decision-making.”

      Regarding the upper bound of outcome sensitivity, we agree that traditionally, limiting the parameter values at 2 is the choice for the parameter to be best interpretable. During model fitting, we had experienced non-convergence issues and ceiling effects in the outcome sensitivity parameter when fixing the inverse temperature at 1. The non-convergence issue was not resolved when we fixed the inverse temperature at 15.47, which was the group mean of the winning inverse temperature family. Model convergence was only achieved after increasing the outcome sensitivity upper bound to 20, with inverse temperature again fixed at 1. Since this model also performed well during parameter and model recovery, we argue that the parameter is nevertheless meaningful, despite the more extreme trial-to-trial value fluctuations under higher outcome sensitivity. We described our choice for this model in the methods section in lines 282-288: “Even though outcome sensitivity is usually restricted to an upper bound of 2 to not inflate outcomes at value update, this configuration led to ceiling effects in outcome sensitivity and non-converging model results. Further, this issue was not resolved when we fixed the inverse temperature at the group mean of 15.47 of the winning inverse temperature family model. It may be that in children, individual differences in outcome sensitivity are more pronounced, leading to more extreme values. Therefore, we decided to extend the upper bound to 20, parallel to the inverse temperature, and all our models converged with Rhat < 1.1.”
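
      The trade-off between the two parameters can be illustrated with a generic delta-rule learner and a two-option softmax. This is a sketch under stated assumptions, not the manuscript's Stan model: the update Q ← Q + α(ρ·r − Q), the zero initial values, the 1/0 outcome coding, and the function name are all illustrative choices. With values initialized at zero, Q scales linearly with ρ, so only the product β·ρ affects choice probabilities.

```python
import math

def choice_probabilities(choices, outcomes, alpha, beta, rho):
    """Probability of choosing option 1 on each trial, before its outcome is seen.

    alpha : learning rate
    beta  : inverse temperature of the softmax
    rho   : outcome sensitivity, scaling the outcome at value update
    """
    q = [0.0, 0.0]  # assumed zero initial values for both options
    probs = []
    for c, r in zip(choices, outcomes):
        # Two-option softmax written as a logistic of the value difference
        probs.append(1.0 / (1.0 + math.exp(-beta * (q[1] - q[0]))))
        # Delta-rule update of the chosen option with a scaled outcome
        q[c] += alpha * (rho * r - q[c])
    return probs

# Hypothetical trial sequence (option index, outcome 1 = reward, 0 = no reward)
choices = [0, 1, 1, 0, 1, 0, 0, 1]
outcomes = [1, 0, 1, 1, 0, 0, 1, 1]

free_beta = choice_probabilities(choices, outcomes, alpha=0.3, beta=15.47, rho=1.0)
free_rho = choice_probabilities(choices, outcomes, alpha=0.3, beta=1.0, rho=15.47)

max_diff = max(abs(a - b) for a, b in zip(free_beta, free_rho))
print(max_diff < 1e-9)  # True
```

      In this simplified setting, fixing β at 1 and letting ρ reach 15.47 reproduces the predictions of β = 15.47 with ρ = 1, which is consistent with an upper bound of 2 on ρ producing ceiling effects once β is fixed at 1.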

      (6) I think the authors reporting optimal parameters for the model is very important (line 464), but the learning rate they report under stable contingencies is much higher than LRs reported by for example Behrens et al 2007, LRs around 0.08 for the optimal learning behaviour. The authors may want to discuss why their task design calls for higher learning rates.

      Thank you for appreciating our optimal parameter analysis, and for the recommendation to discuss why our task design may call for higher optimal learning rates compared to those reported in some other studies. As largely articulated in Zhang et al (2020; primer piece by one of our co-authors), the optimal parameter combination is determined by several factors, such as the reward schedule (e.g., 75:25 vs 80:20), task design (e.g., no reversal, one reversal, vs multiple reversals) and number of trials (e.g., 80 vs 100 vs 120). Notably, in these task-related regards, our task is different from Behrens et al. (2007), which hinders a quantitative comparison among the optimal parameters in the two tasks. We have now included more details in our discussion in lines 643-656: “However, the differences in learning rate across studies have to be interpreted with caution. The differences in the task and the analysis approach may limit their comparability. Task properties such as the trial number per condition differed across studies. Our study included 32 trials per cue in each condition, while in adult studies, the trials per condition ranged from 28 to 100. Optimal learning rates in a stable learning environment were at around 0.25 for 10 to 30 trials, another study reported a lower optimal learning rate of around 0.08 for 120 trials. This may partly explain why in our case of 32 trials per condition and cue, optimal learning rates called for a relatively high optimal learning rate of 0.29, while in other studies, optimal learning rates may be lower. Regarding differences in the analysis approach, the hierarchical Bayesian estimation approach used in our study produces more reliable results in comparison to maximum likelihood estimation, which had been used in some of the previous adult studies and may have led to biased results towards extreme values.

      Taken together, our study underscores the importance of using longitudinal data to examine developmental change as well as the importance of simulation-based optimal parameters to interpret the direction of developmental change.”
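
      The idea of a simulation-based optimal learning rate can be sketched as a grid search over simulated agents. Everything specific here is an assumption for illustration: the 75:25 reward schedule, the fixed inverse temperature of 5, the grid of learning rates, the number of simulated agents, and the function names are not taken from the manuscript, so the learning rate this sketch selects should not be read as a replication of the reported optimum of 0.29.

```python
import math
import random

def simulate_accuracy(alpha, n_trials, beta=5.0, p_reward=(0.75, 0.25),
                      n_agents=500, seed=0):
    """Mean proportion of choices of the better option (index 0) in a stable task."""
    rng = random.Random(seed)  # seeded so repeated calls give identical results
    correct = 0
    for _ in range(n_agents):
        q = [0.0, 0.0]
        for _ in range(n_trials):
            # Softmax choice between two options
            p_choose_0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
            c = 0 if rng.random() < p_choose_0 else 1
            # Probabilistic binary outcome for the chosen option
            r = 1.0 if rng.random() < p_reward[c] else 0.0
            # Delta-rule value update
            q[c] += alpha * (r - q[c])
            correct += (c == 0)
    return correct / (n_agents * n_trials)

def best_learning_rate(n_trials, grid=(0.05, 0.1, 0.2, 0.3, 0.5, 0.8), **kwargs):
    """Return the grid value with the highest simulated accuracy, plus all scores."""
    scores = {alpha: simulate_accuracy(alpha, n_trials, **kwargs) for alpha in grid}
    return max(scores, key=scores.get), scores

# Compare a short task (32 trials, as in the study) with a longer one (120 trials)
for n_trials in (32, 120):
    best, scores = best_learning_rate(n_trials)
    print(n_trials, best, {a: round(s, 3) for a, s in scores.items()})
```

      Running the same grid at different trial numbers is one way to explore whether the best-performing learning rate shifts with task length, which is the comparison the response relies on when contrasting 32 trials with 120 trials.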

      (7) The authors may want to report degrees of freedom in t-tests so that it would be possible to infer the final sample size for a specific analysis, for example, line 546.

      We appreciate the recommendation to include degrees of freedom, which are now added in all t-test results, for example in line 579: “Episodic memory, as measured by individual corrected object recognition memory (hits - false alarms) of confident (“sure”) ratings, showed at trend level better memory for items shown in the delayed feedback condition (𝛽 = .009, SE = .005, t(df = 137) = 1.80, p = .074, see Figure 5A).”

      (8) I'm not sure why reductions in lose shift behaviour are framed as an improvement between 2 assessment points, e.g. line 578. It all depends on the strength of the contingency so a discussion around this point should be expanded.

      We acknowledge that a reduction in lose-shift behavior only reflects improvement under certain conditions, namely when uncertainty is low and the learning contingencies are stable, which is the case in our task. We have added Supplementary Material 4 to illustrate the optimality of win-stay and lose-shift proportions from model simulation and to confirm that children’s longitudinal development was indeed towards more optimal switching behavior. In the manuscript, we refer to these results in lines 488-490: “We further found that the average longitudinal change in win-stay and lose-shift proportion also developed towards more optimal value-based learning (Supplementary Material 4).”
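
      For readers unfamiliar with these measures, win-stay and lose-shift proportions can be computed from a sequence of choices and outcomes roughly as follows. This is a minimal sketch under assumed conventions (binary rewards, the last trial excluded because it has no following choice); the function name and data layout are hypothetical and not taken from the manuscript.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win-stay proportion, lose-shift proportion).

    choices: sequence of chosen option labels, one per trial
    rewards: sequence of 0/1 outcomes, one per trial
    The last trial is skipped because no subsequent choice follows it.
    """
    wins = win_stay = losses = lose_shift = 0
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            win_stay += stayed          # repeated the choice after a reward
        else:
            losses += 1
            lose_shift += not stayed    # switched after a non-reward
    ws = win_stay / wins if wins else float("nan")
    ls = lose_shift / losses if losses else float("nan")
    return ws, ls
```

      Whether a lower lose-shift proportion counts as more optimal depends on the contingencies: in a stable task with probabilistic feedback, occasional losses on the better option are uninformative, so switching after each of them is costly.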

      (9) If I'm not mistaken, the authors reframe a trend-level association as weak evidence. I do not think this is an accurate framing considering the association is strictly non-significant, therefore should be omitted line 585.

      We thank the reviewer for the point regarding the interpretation of a trend-level association as weak evidence. We changed our interpretation, corrected in lines 581-585: “The inclusion of poor learners in the complete dataset may have weakened this effect because their hippocampal function was worse and was not involved in learning (nor encoding), regardless of feedback timing. To summarize, there was inconclusive support for enhanced episodic memory during delayed compared to immediate feedback, calling for future study to test the postulation of a selective association between hippocampal volume and delayed feedback learning.” as well as lines 622-623: “Contrary to our expectations, episodic memory performance was not enhanced under delayed feedback compared to immediate feedback.”

      Reviewer #2 (Public Review):

      We thank the reviewer for acknowledging the strength of our study and pointing out its weaknesses.

      Weaknesses:

      There were a few things that I thought would be helpful to clarify. First, what exactly are the anatomical regions included in the striatum here?

      We appreciate the clarification question regarding the anatomical regions included in the striatum. The striatum included ventral and dorsal regions, i.e., accumbens, caudate and putamen. We have now specified the anatomical regions that were included in the striatum in lines 211-212: “We extracted the bilateral brain volumes for our regions of interest, which were striatum and hippocampus. The striatum regions included nucleus accumbens, caudate and putamen.”

      Second, it was mentioned that for the reduced dataset, object recognition memory focused on "sure" ratings. This seems like the appropriate way to do it, but it was not clear whether this was also the case for the full analyses in the main text.

      Thank you for pointing out that the use of “sure” ratings for object recognition memory was previously not mentioned for the full dataset analysis. Only “sure” ratings were included, consistently across analyses. This detail is now described in the methods in lines 332-333: “Only confident (“sure”) ratings were included in the analysis, which were 98.1 % of all given responses.”

      Third, the children's fitted parameters were far from optimal; is it known whether adults would be closer to optimal on the task?

      Thank you for your question on whether adult learning rates in the task have been reported to be more optimal than those of the children in our study. This indeed seems to be the case, and we added this point in our discussion in lines 639-643: “Adult studies that examined feedback timing during reinforcement learning reported average learning rates ranging from 0.12 to 0.34, which are much closer to the simulated optimal learning rate of 0.29 than children’s average learning rates of 0.02 and 0.05 at wave 1 and 2 in our study. Therefore, it is likely that individuals approach adult-like optimal learning rates later during adolescence.”

      The main thing I would find helpful is to better integrate the differences between the main results reported and the many additional results reported in the supplement, for example from the reduced dataset when excluding non-learners. I found it a bit challenging to keep track of all the differences with all the analyses and parameters. It might be helpful to report some results in tables side-by-side in the two different samples. And if relevant, discuss the differences or their implication in the Discussion. For example, if the patterns change when excluding the poor learners, in particular for the associations between delayed feedback and hippocampal volume, and those participants were also those less well fit by the value-based model, is that something to be concerned about and does that affect any interpretations? What was not clear to me is whether excluding the poor learners at one extreme simply weakens the general pattern, or whether there is a more qualitative difference between learners and non-learners. The discussion points to the relevance of deficits in hippocampal-dependent learning for psychopathology and understanding such a distinction may be relevant.

      We appreciate the feedback that it might seem challenging to keep track of differences between the analyses of the full and the reduced dataset. We have now gathered all the analyses for the reduced dataset in Supplementary Material 6, with side-by-side tables for comparison to the full dataset results. Whenever there were differences between the results, they were pointed out in the results section, see lines 557-560: “In the results of the reduced dataset, the hippocampal association to the delayed learning score was no longer significant, suggesting a weakened pattern when excluding poor learners (Supplementary Material 6). It is likely that the exclusion reduced the group variance for hippocampal volume and delayed learning score in the model.” and lines 579-581: “Note that in the reduced dataset, delayed feedback predicted enhanced item memory significantly (Supplementary Material 6).”

      These differences were further included in our discussion in lines 737-740 in the context of deficits in hippocampal-dependent learning and psychopathology: “Interestingly, poor learners showed relatively less value-based learning in favor of stronger simple heuristic strategies, and excluding them modulated the hippocampal-dependent associations to learning and memory in our results. More studies are needed to further clarify the relationship between hippocampus and psychopathology during cognitive and brain development.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) There appears to be a flaw in the exploration of cortical inputs. the authors never show that HFS of cortical inputs has no effect in the absence of thalamic stimulation. It appears that there is a citation showing this, but I think it would be important to show this in this study as well.

      We understand that the reviewer would like us to induce an HFS protocol on cortical input and then test if there is any change in synaptic strength in thalamic input. We have done this experiment which shows that without a footshock, high-frequency stimulation (HFS) of the cortical inputs did not induce synaptic potentiation on the thalamic pathway (Extended Data Fig. 4d).

      (2) It is somewhat confusing that the authors refer to the cortical input as driving heterosynaptic LTP, but this is not shown until Figure 4J, that after non-associative conditioning (unpaired shock and tone) HFS of the cortex can drive freezing and heterosynaptic LTP of thalamic inputs.

      We agree with the reviewer that it is in Figure 4j and Figure 5b,c that we show electrophysiological evidence for cortical input driving heterosynaptic LTP. It is only to be consistent with our terminology that we initially used behavioral evidence as the proxy for heteroLTP (Figure 3c).

      …, the authors are 'surprised' by this outcome, which appears to be what they predict.

      We removed the phrase “To our surprise”.

      (3) 'Cortex' as a stimulation site is vague. The authors have coordinates they used, it is unclear why they are not using standard anatomical nomenclature.

      We replaced “cortex” with “auditory/associative cortex”.

      (4) The authors' repeated use of homoLTP and heteroLTP to define the input that is being stimulated makes it challenging to understand the experimental detail. While I appreciate this is part of the goal, more descriptive words such as 'thalamic' and 'cortical' would make this much easier to understand.

      We agree with the reviewer that a phrase such as “an LTP protocol on thalamic and cortical inputs” would be more descriptive. We chose the words “homoLTP” and “heteroLTP” only to clarify (for the readers) the physiological relevance of these protocols. We thought that with “thalamic” and “cortical” alone, readers might miss this point. However, when we first introduce the words “homoLTP” and “heteroLTP”, we describe which stimulated pathway each refers to.

      Reviewer #2 (Public Review):

      (1) …The experimental schemes in Figs. 1 and 3 (and Fig. 4e and extended data 4a,b) show that one group of animals was subjected to retrieval in the test context at 24 h, then received HFS, which was then followed by a second retrieval session. With this design, it remains unclear what the HFS impacts when it is delivered between these two 24 h memory retrieval sessions.

      We understand that the reviewer has raised the concern that the increase in freezing we observed after the HFS protocol (e.g., Fig. 1b, the bar labeled as Wth+24hHFSth) could be caused or modulated by the recall prior to the HFS (Fig. 1a, top branch). To address this concern, in a new group of mice, 24 hours after weak conditioning, we induced the HFS protocol, followed by testing (that is, no testing prior to the HFS protocol). We observed that homoLTP was as effective in mice that were tested prior to the induction protocol as in those that were not (Fig. 1b, Extended Data Fig. 1d,e).

      It would be nice to see these data parsed out in a clean experimental design for all experiments (in Figs 1, 3, and 4), that means 4 groups with different treatments that are all tested only once at 24 h, and the appropriate statistical tests (ANOVA). This would also avoid repeating data in different panels for different pairwise comparisons (Fig 1, Fig 3, Fig 4, and extended Fig 4).

      While we understand the benefit of the reviewer’s suggestion, the current presentation of the data was done to match the flow of the text and the delivery of the information throughout the manuscript. We think it is unlikely that the retrieval test prior to the HFS impacts its effectiveness, as confirmed by homosynaptic HFS data (Extended Data Fig. 1d,e). It is beyond the scope of the current manuscript to investigate the mechanisms and manipulations related to reconsolidation and retrieval effects.

      (2) … It would be critical to know if LFPs change over 24 h in animals in which memory is not altered by HFS, and to see correlations between memory performance and LFP changes, as two animals displayed low freezing levels. … They would suggest that thalamo-LA potentiation occurs directly after learning+HFS (which could be tested) and is maintained over 24 h.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (3) The statistical analyses need to be clarified. All statements should be supported with statistical testing (e.g. extended data 5c, pg 7 stats are missing). The specific tests should be clearly stated throughout. For ANOVAs, the post-hoc tests and their outcomes should be stated. In some cases, 2-way ANOVAs were performed, but it seems there is only one independent variable, calling for one-way ANOVA.

      All the statistical analyses have been revised and the post-hoc tests performed after the ANOVAs are mentioned in the relevant figure legends.

      Reviewer #2 (Recommendations For The Authors):

      The wording "transient" and "persistent" used here in the context of memory seems a bit misleading, as only one timepoint was assessed for memory recall (24 h), at which the memory strength (freezing levels) seem to change.

      As the reviewer mentioned, we have tested memory recall only at one time point. For this reason, throughout the text we used “transient” exclusively to refer to the experience (receiving footshock) and not to the memory. We replaced “persistence” with “stabilization” where it refers to a memory (“the induction of plasticity influences the stabilization of the memory”).

      For the procedures in which the CS and US were not paired, the term "unpairing" is used (which is probably the more adequate one), but the term "non-associative conditioning" appears in the text, which seems a bit misleading, as this term may have another connotation. There is also literature that an unpairing of CS and US could lead to the formation of a safety memory to the CS, that may be disrupted by HFS stimulation.

      We replaced "non-associative" with “unpaired”.

      Validation of viral injection sites for all experiments: Only representative examples are shown, it would be nice to see all viral expression sites.

      For this manuscript, we used 155 mice. For this reason, including the injection sites for all the animals in the manuscript is not feasible. Except for the mice that were excluded (please see the exclusion criteria added in the methods), the expression pattern we observed was consistent across animals, and the images shown are therefore truly representative.

      Extended Data 1b: Please explain what N, U, W, and S behavioral groups mean. To what groups mentioned in the text (pg 2,3) do these correspond?

      The requested clarifications are implemented in the figure legend.

      Please elaborate on the following aspects of your methods and approaches:

      • Please explain if the protocol for HFS to manipulate behavior was the same as the one used for the LTP experiments (Fig 1d, Fig 4j) and was identical for homo/hetero inputs from thal and ctx?

      We used the same HFS protocol for all the HFS inductions. We included this information in the methods section.

      • Please state when the HFS was given in respect to the conditioning (what means immediately before and after?) and in which context it was given. Were animals subjected to HFS exposed to the context longer (either before or after the conditioning while receiving HFS) than the other groups? When the HFS was given in another context (for the 24 h group)- how was this controlled for?

      Requested information has been added to the methods section. The control and intervention groups were treated in the same way.

      • When were the footshocks given in the anesthetized recordings (Fig. 4j) and what was the temporal relationship to the HFS? Was the timing the same as for the HFS in the behavioral experiments?

      Requested information has been added to the methods section.

      • Please add information on how the LFP was stimulated and how the LFP- EPSP slope was determined in in vivo recordings, likewise for the whole cell recordings of EPSPs in Fig. 5d-f.

      Requested information has been added to the methods section.

      Here, the y-Axis in Fig. 5e should be corrected to EPSP slope rather than fEPSP slope if these are whole-cell recordings.

      This has been corrected.

      • Please include information if the viral injections and opto-manipulations were done bilateral or unilateral and if so in which hemisphere. Likewise, indicate where the LFP recordings were done.

      Requested information has been added to the methods section.

      • Were there any exclusion criteria for animals (e.g. insufficient viral targeting or placement of fibers and electrodes), other than the testing of the optical CS for adverse effects?

      Requested information has been added to the methods section.

      Statistics: In addition to clarifying analytical statistics, please clarify n-numbers for slice recordings (number of animals, number of slices, and number of cells if applicable).

      Requested information has been added to the methods section.

      It would be nice to scrutinize the results in extended data 4b. The freezing levels with U+24h HFS show a strong trend towards an increase, the effect size may be similar to immediate HFS Fig 4f and extended data 4a) if n was increased.

      We agree with the reviewer. To address this point, we added “HomoLTP protocol when delivered 24hrs later, produced an increase in freezing; however, the value was not statistically significant.” To show this point, we used the same scale for freezing in Extended Data Fig. 4a and b.

      In the final experiment (Fig. 5a-c), Fig. 5b seems to show results from only one animal, but behavioral results are from 4 animals (Fig 5c). It would be helpful to see the quantification of potentiation in each animal.

      The results (now with error bar) include all mice.

      Please spell out the abbreviation "STC".

      Now, it is spelled out.

      Page 8 last sentence of the discussion does not seem to fit there.

      The sentence has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors did not determine how WTh affects Th-LA synapses, as field EPSPs were recorded only after HFS. WTh was required for the effects of HFS, as HFS alone did not produce CR in naïve and/or unpaired controls. As such the effects of the WTh protocol on synaptic strength must be investigated.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (2) The authors provide some evidence that their dual opsin approach is feasible, particularly the use of sustained yellow light to block the effects of blue light on ChrimsonR. However, this validation was done using single pulses making it difficult to assess the effect of this protocol on Th input when HFS was used. Without strong evidence that the optogenetic methods used here are fault-proof, the main conclusions of this study are compromised. Why did the authors not use a protocol in which fibers were placed directly in the Ctx and Th while using soma-restricted opsins to avoid cross-contamination?

      We understand that the reviewer raises the possibility that our dual-opsin approach, although effective with single pulses, may fail in higher frequency stimulation protocols (10Hz and 85Hz). To address this concern, in a new group of mice we applied our approach to 10Hz and 85Hz stimulation protocols. We show that our approach is effective in single-pulse as well as in 10Hz and 85Hz stimulation protocols (Fig. 2d-h).

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. demonstrate that CD4+ single positive (SP) thymocytes, CD4+ recent thymic emigrants (RTE), and CD4+ T naive (Tn) cells from Cd11c-p28-flox mice, which lack IL-27p28 selectively in Cd11c+ cells, exhibit a hyper-Th1 phenotype instead of the expected hyper Th2 phenotype. Using IL-27R-deficient mice, the authors confirm that this hyper-Th1 phenotype is due to IL-27 signaling via IL-27R, rather than the effects of monomeric IL-27p28. They also crossed Cd11c-p28-flox mice with autoimmune-prone Aire-deficient mice and showed that both T cell responses and tissue pathology are enhanced, suggesting that SP, RTE, and Tn cells from Cd11c-p28-flox mice are poised to become Th1 cells in response to self-antigens. Regarding mechanism, the authors demonstrate that SP, RTE, and Tn cells from Cd11c-p28-flox mice have reduced DNA methylation at the IFN-g and Tbx21 loci, indicating 'de-repression', along with enhanced histone tri-methylation at H3K4, indicating a 'permissive' transcriptional state. They also find evidence for enhanced STAT1 activity, which is relevant given the well-established role of STAT1 in promoting Th1 responses, and surprising given IL-27 is a potent STAT1 activator. This latter finding suggests that the Th1-inhibiting property of thymic IL-27 may not be due to direct effects on the T cells themselves.

      Strengths:

      Overall the data presented are high quality and the manuscript is well-reasoned and composed. The basic finding - that thymic IL-27 production limits the Th1 potential of SP, RTE, and Tn cells - is both unexpected and well described.

      Weaknesses:

      A credible mechanistic explanation, cellular or molecular, is lacking. The authors convincingly affirm the hyper-Th1 phenotype at epigenetic level but it remains unclear whether the observed changes reflect the capacity of IL-27 to directly elicit epigenetic remodeling in developing thymocytes or knock-on effects from other cell types which, in turn, elicit the epigenetic changes (presumably via cytokines). The authors propose that increased STAT1 activity is a driving force for the epigenetic changes and resultant hyper-Th1 phenotype. That conclusion is logical given the data at hand but the alternative hypothesis - that the hyper-STAT1 response is just a downstream consequence of the hyper-Th1 phenotype - remains equally likely. Thus, while the discovery of a new anti-inflammatory function for IL-27 within the thymus is compelling, further mechanistic studies are needed to advance the finding beyond phenomenology.

      Thanks for the comments. Following the suggestions of the reviewer, further studies will be performed to test whether developing thymocytes are the direct targets of IL-27 using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras of wildtype and IL-27ra knockout cells.

      To address the potential autocrine loop in the STAT1 hyperactivation, we added IFN-γ antibody into CD4+ T cell cultures and saw no obvious impact on STAT1 phosphorylation. If deemed necessary, we could further test this possibility in vivo using Cd4-Ifng and CD11c-p28 double knockout mice.

      The detailed mechanisms underlying the hyperactivation of STAT1 remain to be determined. IL-27p28 has recently been shown to act as an antagonist of gp130-mediated signaling. In addition, structural studies have demonstrated that IL-27p28 interfaces with EBI3, as well as with the two receptor subunits IL-27Rα and gp130. Taking these findings into consideration, together with the fact that p28 and IL-27ra deficiency exhibit a similar phenotype, we speculate that deficiency in either p28 or IL-27ra makes more gp130 available to transduce signals elicited by other cytokines. We will next focus on gp130-related cytokines to search for the candidate(s) that ultimately lead to enhanced STAT1 activation in the absence of p28. Alternatively, release of EBI3 in the absence of p28 may facilitate its coupling with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest as IL-27Rα is also involved in its signaling.

      To narrow down the candidate cytokines, we will first examine the expression of IL-35 and gp130 related cytokines, including IL-6, IL-11, LIF, CT1, OSM, IL-31, CLCF1, CNTF in the thymus and thymocyte-depleted thymic stromal cells by mining public databases and by RT-PCR. Similarly, CD4+ thymocytes will be examined for the expression of receptor subunits which can couple with gp130, including IL-6R, IL-11R, LIFR, OSMRβ, IL-31Rα, CNTFRα, IL-23R, and IL-12Rβ2.

      We next will select those cytokines expressed in the thymus or thymic stromal cells with cognate receptor expression in CD4+ thymocytes and test their effect on STAT1 phosphorylation of wildtype and p28-deficient CD4+ thymocytes. If deemed necessary, double knockout mice will be engaged to rescue the hyper-Th1 phenotype.

      Reviewer #2 (Public Review):

      Summary:

      Naïve CD4 T cells in CD11c-Cre p28-floxed mice express highly elevated levels of proinflammatory IFNg and the transcription factor T-bet. This phenotype turned out to be imposed by thymic dendritic cells (DCs) during CD4SP T cell development in the thymus [PMID: 23175475]. The current study affirms these observations, first, by developmentally mapping the IFNg dysregulation to newly generated thymic CD4SP cells [PMID: 23175475], second, by demonstrating increased STAT1 activation being associated with increased T-bet expression in CD11c-Cre p28-floxed CD4 T cells [PMID: 36109504], and lastly, by confirming IL-27 as the key cytokine in this process [PMID: 27469302]. The authors further demonstrate that such dysregulated cytokine expression is specific to the Th1 cytokine IFNg, without affecting the expression of the Th2 cytokine IL-4, thus proposing a role for thymic DC-derived p28 in shaping the cytokine response of newly generated CD4 helper T cells. Mechanistically, CD4SP cells of CD11c-Cre p28-floxed mice were found to display epigenetic changes in the Ifng and Tbx21 gene loci that were consistent with increased transcriptional activities of IFNg and T-bet mRNA expression. Moreover, in autoimmune Aire-deficiency settings, CD11c-Cre p28-floxed CD4 T cells still expressed significantly increased amounts of IFNg, exacerbating the autoimmune response and disease severity. Based on these results, the investigators propose a model where thymic DC-derived IL-27 is necessary to suppress IFNg expression by CD4SP cells and thus would impose a Th2-skewed predisposition of newly generated CD4 T cells in the thymus, potentially relevant in autoimmunity.

      Strengths:

      Experiments are well-designed and executed. The conclusions are convincing and supported by the experimental results.

      Weaknesses:

      The premise of the current study is confusing as it tries to use the CD11c-p28 floxed mouse model to explain the Th2-prone immune profile of newly generated CD4SP thymocytes. Instead, it would be more helpful to (1) give full credit to the original study which already described the proinflammatory IFNg+ phenotype of CD4 T cells in CD11c-p28 floxed mice to be mediated by thymic dendritic cells [PMID: 23175475], and then, (2) build on that to explain that this study is aimed to understand the molecular basis of the original finding. In its essence, this study mostly rediscovers and reaffirms previously reported findings, but with different tools. While the mapping of epigenetic changes in the IFNg and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, these are expected results, and they only reaffirm what would be assumed from the literature. Thus, there is only incremental gain in new insights and information on the role of DC-derived IL-27 in driving the Th1 phenotype of CD4SP cells in CD11c-p28 floxed mice.

      Indeed, the present study is based on the finding of enhanced IFN-γ production by CD4+ T cells from CD11c-p28 floxed mice, which was originally reported by Zhang et al. and repeatedly cited in our manuscript. We revisited this phenomenon in the context of functional bias of newly generated CD4+ T cells and sought to reveal the mechanisms underlying the hyper-Th1 phenotype in the absence of thymic DC-derived IL-27. We showed that deletion of p28 resulted in an unexpected hyperactivation of STAT1, which was accompanied by epigenetic changes in favor of Th1 bias. However, the gap remains between p28 deficiency and STAT1 activation.

      Altogether, the major issues of this study remain unresolved:

      (1) It is still unclear why the p28-deficiency in thymic dendritic cells would result in increased STAT1 activation in CD4SP cells. Based on their in vitro experiments with blocking anti-IFNg antibodies, the authors conclude that it is unlikely that the constitutive activation of STAT1 would be a secondary effect due to autocrine IFNg production by CD4SP cells. However, this possibility should be further tested with in vivo models, such as Ifng-deficient CD11c-p28 floxed mice. Alternatively, is this an indirect effect by other IFNg producers in the thymus, such as iNKT cells? It is necessary to explain what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells in the first place.

      Thanks for the suggestions. Further studies will be performed to test the potential autocrine loop for IFN-γ production in vivo using Cd4-Ifng and CD11c-p28 double knockout mice. This model should also help to exclude the possibility of an indirect role of IFN-γ production by cells such as iNKT cells.

      As pointed out by the reviewer, a critical unanswered question is what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells. Several lines of evidence point to the possibility that p28 deficiency increases the responsiveness of developing thymocytes to STAT1-activating cytokines. Firstly, IL-27p28 has recently been shown to act as an antagonist of gp130-mediated signaling. Secondly, structural studies have demonstrated that IL-27p28 is centrally positioned in the complex formed with EBI3, as well as the two receptor subunits IL-27Rα and gp130. Thirdly, we observed a similar hyper-Th1 phenotype in the absence of either p28 or IL-27ra. Therefore, we speculate that more gp130 should be available to transduce signals elicited by other cytokines in such a scenario. We will next seek to determine the candidate cytokine(s) responsible for the enhanced STAT1 activation in the absence of p28 as outlined in the response to Reviewer 1.

      (2) It is also unclear whether CD4SP cells are the direct targets of IL-27 p28. The cell-intrinsic effects of IL-27 p28 signaling in CD4SP cells should be assessed and demonstrated, ideally by CD4SP-specific deletion of IL-27Ra, or by establishing bone marrow chimeras of IL-27Ra germline KO mice.

      Thanks for the suggestions. Further studies will be performed to test whether developing thymocytes are the direct targets of IL-27 using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras of wildtype and IL-27ra knockout cells.

    1. Author Response:

      We thank the editors for their assessment of our manuscript. We appreciate the reviewers’ thoughtful comments and plan to incorporate their feedback into a revised manuscript. We agree that incorporating an additional, more common ablation tool would be highly complementary to our Kir2.1 ablation studies. We also agree that images across timepoints should be expanded for contact analyses, connectomics data can be better leveraged, additional quantifications can be performed as suggested by the reviewers to better support claims, and that the introduction and discussion can be revised to better position our work in the context of previous studies. We also strongly agree that providing data on receptor RNA and protein expression in the GF across timepoints would be extremely informative; however, we have found that acquiring these data at the necessary resolution would require new approaches and tools that may be outside the scope of the project.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Farhat-Younis and colleagues demonstrate tumor-specific IgM's capacity to induce tumor cell death in monocyte-derived dendritic cell cultures. They subsequently designed a chimeric receptor based on high-affinity FcRI. However, the authors found that the transfection process was more efficient when either the variable light or heavy chain was transfected individually rather than the entire scFv. This scFv construct led to an endoplasmic reticulum (ER) stress response and scFv degradation. A considerable portion of the manuscript is dedicated to the negative scFv expression results. The authors pivoted to a modified FcgRI capable of transmitting IgM signals. This represents a tremendous amount of work in the development of this chimeric receptor, the critical experiment showing efficacy in vivo was not presented, and instead various in vitro assays are shown. Thus, this manuscript will markedly benefit from showing improved responses to tumors in vivo when macrophages express FcgRI-IgM.

      We deeply thank the reviewer for his thoughtful comments and overall favorable review of our manuscript.

      1) In a mouse tumor model, the authors demonstrated that monocyte-derived dendritic cells (MoDCs) treated with IgG immune complexes (ICs) were more effective at preventing tumor growth compared to those treated with IgM ICs (as shown in Figure 1B). In Figure 1C, their in vitro experiments revealed that IgM resulted in tumor cell death, as well as increased production of nitric oxide (NO) and granzyme B. How do the authors reconcile IgG IC-treated MoDCs performing better in preventing tumors in vivo than IgM IC-treated MoDCs, despite the in vitro results with IgM-ICs. The authors speculate that IgG IC-treated MoDCs might trigger T cell immunity but do not show T cell involvement.

      We apologize for not making this point clearer. We have studied this phenomenon extensively and detailed the underlying mechanism in two consecutive papers (PMID: 27812544, PMID: 25924063). Briefly, we showed that DC activated with IgM-IC undergo cell death concomitantly with their release of lytic granules and lysis of tumor cells. As a result, they do not migrate to the lymph nodes, where they would otherwise induce reactive T cell clones. In contrast, DC activated with IgG-IC do not elicit in vitro cytotoxicity but rather process the IC and present its derived antigens on MHC-II. We addressed this issue in the revised version and cited the relevant papers to clarify it further.

      (2) The authors report distinct functional consequences of MoDCs incubated with tumor-IgG complexes and tumor IgM complexes. Tumor growth was inhibited and T cell immunity induced with the former. The latter, however, elicited robust anti-tumor killing. What happens if MoDCs are incubated with both IgG and IgM complexes? If this combined treatment induces effective killing and T cell memory, would this impact the design of the chimeric receptor to include IgG responsiveness as well?

      This is a very interesting point. As mentioned above, our previous publications strongly suggest that tumor-binding IgG and IgM induce different processes in myeloid cells. Yet, since MoDC naturally express the high-affinity receptor for IgG, FcγRI, we speculated that treating tumor-bearing mice with modified monocytes, alone or in combination with tumor-binding IgG, would shed some light on this question. Indeed, such treatment elicits strong T cell immunity in these mice, and the data were added to Supplementary Data Figure S4J. That said, a complete analysis of this question is very complicated and extends beyond the scope of this work. We would like to emphasize that the purpose of this work is to highlight some of the challenges unique to genetic manipulation in myeloid cells and to suggest one alternative scaffold for integrating signaling in these cells. We do not argue that the specific solution presented here is the most potent one, and more work is required before such a treatment can be advanced to the clinic. We have added a sentence to the Discussion section that stresses this issue.

      (3) In Figure 5H, the authors demonstrate the ability of the chimeric receptor construct to deplete tumor cells in vitro. The ms would improve if the authors could show the chimeric receptor construct results in tumor cell death and/or prevention in an in vivo model. Similarly, if combined stimulation with IgG and IgM complexes enhances tumor response, this should be incorporated into the therapeutic strategy.

      This is a wonderful suggestion. To address it, we challenged C57Bl/6 mice with B16F10 melanoma and allowed the tumors to grow until they reached a palpable size of approximately 25 mm². Concomitantly, we cultured bone marrow-derived dendritic cells (BMDC) from syngeneic mice and transfected them with a linear mRNA of the alpha/mu construct. Tumor-bearing mice were then treated with alpha/mu- or sham-transduced BMDC, alone or in combination with an antibody against the melanoma antigen Trp1 (TA99). The results were added as Figure 5K and as Supplementary Figure S4H-S4I.

      Reviewer #2 (Public Review):

      Summary:

      While a significant portion of immunotherapy research has focused on the pivotal role of T cells in tumor immunity, their effectiveness may be limited by the suppressive nature of the tumor environment. On the other hand, myeloid cells are commonly found within tumors and can withstand these adverse conditions. However, these cells often adopt an immunosuppressive phenotype when infiltrating tumors. Therefore, manipulating myeloid cells could potentially enhance the anti-tumor potential of immunotherapy.

      In this manuscript, Farhat-Younis and colleagues have demonstrated that activating the IgM receptor signaling in myeloid cells induces an oxygen burst, the secretion of Granzyme B, and the lysis of adjacent tumor cells. Furthermore, they have outlined a strategy to utilize these features to generate CAR macrophages. However, they have identified a limitation: the expression of scFv in myeloid cells induces ER stress and the degradation of misfolded proteins. To address this issue, chimeric receptors were designed based on the high-affinity FcγRI for IgG. When macrophages transfected with these receptors were exposed to tumor-binding IgG, extensive tumor cell killing, and the release of reactive oxygen species and Granzyme B were observed.

      Strengths:

      In general, I consider this work to be significant, and the results are compelling. It emphasizes the specific considerations and requirements for successful manipulation in myeloid cells, which could further advance the field of cellular engineering for the benefit of immunotherapy.

      We thank the reviewer for his thoughtful comments and overall appreciation of our findings.

      Weaknesses:

      Nevertheless, there are several minor issues that should be addressed:

      (1) TCR fragments are commonly used to induce ER stress in non-immune cells. Therefore, it would be interesting to investigate whether TCR fragments can be expressed in myeloid cells and if they induce ER stress. Addressing this issue would support the notion that these cells lack the ER chaperones required for folding immunoglobulin variable chains.

      This is a wonderful suggestion. To assess this possibility, we cloned the alpha chain of an anti-Trp1 TCR and transfected it into RAW 264.7 macrophages. Importantly, we could not detect expression of this construct in macrophages, which further supports our findings with scFv in these cells. We added this result to Figure 4J and Supplementary Figure S3C.

      (2) It would be valuable to determine whether, after the degradation of scFv fragments by myeloid cells, they are presented on MHC-I and MHC-II.

      This is a very interesting point. To address it, we generated a genetic construct in which we fused the anti-CD19 scFv to a polypeptide composed of the MHC-I and MHC-II fragments of ovalbumin (Ova). Next, DC 2.4 cells were transfected with this construct, and we measured their capacity to stimulate the proliferation of CD8+ T cells from OT-I mice and CD4+ T cells from OT-II mice. DC transfected with this construct efficiently stimulated the proliferation of both T cell types, suggesting that both Ova fragments are indeed presented on MHC-I and MHC-II. Nonetheless, DC transfected with a polypeptide of the MHC-I and MHC-II Ova fragments only (with no scFv) were almost equally effective in stimulating OT-I and OT-II T cell proliferation. We added this result to Supplementary Figure S3D-S3E.

      (3) Some methodological details, such as the vaccination protocol and high-resolution microscopy procedures, are missing from the text.

      We thank the reviewer for pointing out these issues. We added the missing details to the revised version of the manuscript.

    1. Author response:

      We thank both reviewers for their feedback and for underlining the potential of our new tool and experimental approach to identify signalling molecules that can improve the in vitro derivation of specific cell types from human pluripotent stem cells. To address the reviewers' points, we plan to carry out further analyses that should solidify our conclusions. We will also edit the text to temper conclusions where appropriate.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We sincerely appreciate the reviewer’s dedication to evaluating our manuscript and raising essential considerations regarding the classification of the migration behavior we described. While the reviewer suggests that this behavior aligns with the concept of itinerancy, we contend that it represents a distinct phenomenon, albeit with similarities, as both involve the non-breeding movements of birds. We acknowledge that our manuscript did not adequately address this distinction and have considered the reviewer’s feedback. In our response, we clarify the difference between the described phenomenon and itinerancy. Our revised manuscript will include a new section in the Discussion to address this issue comprehensively.

      In the first part of the review, the reviewer emphasizes that the pattern we are describing is consistent with itinerancy. Regardless of the terminology used, we want to highlight the existence of two different types of migratory behavior, both of which involve movement in non-breeding areas.

      The first type, called itinerancy, was first described by Moreau in 1972 in “The Palaearctic-African Bird Migration Systems.” As noted by the reviewer, this behavior involves an alternation of stopovers and movements between different short-term non-breeding residency areas. Such movements usually occur in response to food scarcity in one part of the non-breeding range, causing birds to move to another part of the same range. These movements typically cover distances of 10 to 100 kilometers but are neither continuous nor directional. Moreau (1972) defined itinerancy as prolonged stopovers, normally lasting several months, primarily in tropical regions. He noted observations of certain species disappearing from his study areas in sub-Saharan Africa in December and others appearing, suggesting they may have multiple home ranges during the non-breeding season. Subsequent research, as mentioned by the reviewer, has confirmed itinerancy in many species, particularly among Palaearctic-African migrants in sub-Saharan Africa. In particular, the Montagu’s Harrier has been extensively studied in this regard. The reviewer rightly points out that our study does not include recent findings on this species. In our revised version, we will include references to recent studies, such as those by Trierweiler et al. (2013, Journal of Animal Ecology, 82:107-120) and Schlaich et al. (2023, Ardea, 111:321-342), which show that Montagu’s Harrier has an average of 3-4 home ranges separated by approximately 200 kilometers. These studies suggest that the species spends approximately 1.5 months at each site, with the most extended period typically observed at the last site before migrating to the breeding grounds.

      In the second type, birds undertake a post-breeding migration, arrive in their non-breeding range, and then gradually move in a particular direction throughout the season. This continuous directional movement covers considerable distances and continues throughout the non-breeding period. In our study, this movement covered about 1000 km, comparable to the total migration distance of Rough-legged Buzzards of about 1500 km. As observed in our research, these movements are influenced by external factors such as snow cover. In such cases, the progression of snow cover in a south-westerly direction during winter can prevent birds from finding food, forcing them to continue migrating in the same direction. In essence, this movement represents a prolonged phase of the migration process but at a slower pace. Similar behavior has been documented in buzzards, as reported by Strandberg et al. (2009, Ibis 151:200-206). Although several transmitters in their study stopped working in mid-winter, the authors observed a phenomenon they termed ‘prolonged autumn migration.’

      In the second part of the review, the reviewer questions the need to distinguish between the two behaviors we have discussed. However, we believe these behaviors differ in their structure (with the first being intermittent and often non-directional, whereas the second is continuous and directional) and in their causes (with the first being driven by seasonal food resource cycles and the second by advancing snow cover). We therefore argue that it is worth distinguishing between them. To differentiate these forms of non-breeding movement, we propose to use ‘itinerancy’ for the first type, as described initially by Moreau in 1972, and introduce a separate term for the second behavior. Although ‘slow directional itinerancy’ could be considered, we find it too cumbersome.

      Moreover, ‘itinerancy’ in the literature refers not only to non-breeding movements but also to the use of different nesting sites, e.g., Lislevand et al. (2020, Journal of Avian Biology: e02595), reinforcing its association with movements between multiple sites within habitats. We, therefore, propose that the second behavior be given a distinct name. We acknowledge the reviewer’s point that we did not adequately address this distinction in the Discussion and plan to include a separate section in our paper’s revised version.

      In the third part of his review, the reviewer suggests an alternative title. Another reviewer, Dr Theunis Piersma, suggested the current title during the first round of reviewing, and we have chosen his version.

      In the fourth part of the review, the reviewer questions whether it is appropriate to discuss the conservation aspect of this study. This type of non-breeding movement raises concerns about accurately determining non-breeding ranges and population dynamics for species that exhibit this behavior. We believe that accurate determination of range and population dynamics is critical to conservation efforts. While this may be less important for species breeding in Europe and migrating to Africa, for which monitoring breeding territories is more feasible, it’s essential for Arctic and sub-Arctic breeding species. Large-scale surveys in these regions have historically been challenging and have become even more so with the end of Arctic cooperation following Russia’s war with Ukraine (Koivurova, Shibata, 2023). For North America and Europe, non-breeding abundance is typically estimated once per season in mid-winter. In North America, these are the so-called Christmas counts (which take place once at the end of December), and in Europe, they are the IWC counts mentioned by the reviewer (as follows from their official website - “The IWC requires a single count at each site, which should be repeated each year. The exact dates vary slightly from region to region, but take place in January or February”). Because of such a single count in mid-winter, non-breeding habitats occupied in autumn and spring will be listed as ‘uncommon’ at best, while south-western habitats where birds are only present in mid-winter will be listed as ‘common.’ However, the situation will be reversed if we consider the time birds spend in these habitats.

      The reviewer also highlights the introduction’s unconventional structure and information redundancy at the beginning. We have chosen this structure and provided basic explanations to improve readability for a wider audience, given eLife’s readership. At the same time, we will certainly take the reviewers’ feedback into account in the revised version. We plan to include the references to modern itinerancy research mentioned above and to add a section on itinerancy to the Discussion.

      We appreciate the reviewer’s input and sincerely thank them for their time and effort in reviewing our paper. While we may not fully agree on the classification of the behavior we describe, we value the opportunity to engage in discussion and believe that presenting arguments and counterarguments to the reader is beneficial to scientific progress.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I much enjoyed reading this manuscript, that is, once I understood what it is about. Titles like "Conserving bird populations in the Anthropocene: the significance of non-breeding movements" are a claim to so-called relevance, they have NOTHING to do with the content of the paper, so once I understood that this paper was about the "Quick quick slow: the foxtrot migration of rough-legged buzzards is a response to habitat and snow" (an alternative title), it was becoming very interesting. So the start of the abstract as well as the introduction is very tedious, as clearly much trouble is taken here to establish reputability. In my eyes this is unnecessary: eLife should be interested in publishing such a wonderful description of such a wonderful migrant in a study that comes to grips with limiting factors on a continental scale!

      We sincerely appreciate your time and effort in reviewing our manuscript. Thank you for your appreciation of our study.

      We agree that the focus of the article should be changed from conservation to migration patterns. We have rewritten the Introduction and Discussion as suggested. We have added the application of this pattern including conservation at the end of the Discussion by completely changing Figure 5. We have also changed the title to the suggested one.

      Not sure that the first paragraph statements that seek to downplay what we know about wintering vs breeding areas are valid (although I see what purpose they serve). Migratory shorebirds have extensively been studied in the nonbreeding areas, for example, including movement aspects (see, as just one example, Verhoeven, M.A., Loonstra, A.H.J., McBride, A.D., Both, C., Senner, N.R. & Piersma, T. (2020) Migration route, stopping sites, and non breeding destinations of adult Black tailed Godwits breeding in southwest Fryslân, The Netherlands. Journal of Ornithology 162, 61-76) and there are very impressive studies on the winter biology of migrants across large scale (for example in Zwarts' Living on the Edge book on the Sahel wetlands). Think also about geese and swans and about seabirds!

      We have rewritten the first paragraph and it now talks about patterns of migratory behavior. We have also rewritten the second paragraph, now it is devoted to studies of movements in the non-breeding period. We explain how our pattern differs from those already studied and give references to the papers you mentioned.

      Directional movements in nonbreeding areas as a function of food (in this case locusts) have really beautifully been described by Almut Schlaich et al in JAnimEcol for Montagu's harriers.

      We have added Montagu's harrier example in the second paragraph of the Introduction and the Discussion. We have added a reference to Schlaich and to Garcia and Arroyo, who suggested that Montagu's harriers have long directional migrations during the non-breeding period.

      Once the paper starts talking buzzards, and the analyses of the wonderful data, all is fine. It is a very competent analysis with a description of a cool pattern.

      Thank you for your appreciation of our study. We hope the revised version is better and clearer.

      However, I would say that it is all a question of spatial scale. The buzzards here respond to changes in food availability, but there is not an animal that doesn't. The question is how far they have to move for an adequate response: in some birds movements of 100s of meters may be enough, and then anything to the scale of rough-legged buzzards.

      In the new version of the manuscript, we emphasize that this is a large distance (about 1000 km), comparable to the distance of the fall and spring migrations (about 1400 km) in lines 70-72 of the Introduction and 379-383 of the Discussion.

      And actually, several of the shorebirds I know best also do a foxtrot, such as red knots and bar-tailed godwits moulting in the Wadden Sea, then spending a few months in the UK estuaries, before returning to the Wadden Sea before the long migrations to Arctic breeding grounds. The publication of the rough-legged buzzard story may help researchers to summarize patterns such as this too. My problem with this paper is the framing. A story on the how and why of these continental movements in response to snow and other habitat features would be a grand contribution. Drop Anthropocene, and rethink whether foxtrot should be introduced as a hypothesis or a summary of cool descriptions. I prefer the latter, and recommend eLife to go with that too, rather than encourage "disconnected frames that seek 'respectability'". Good luck, theunis piersma

      We thank the reviewer again for his valuable comments and suggestions. We have changed the framing to the suggested one and removed the Anthropocene from the article.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and effort you have taken to review our manuscript. We have carefully considered all of your comments, including both public and author comments, and provided detailed responses to each of them below. In addition, we would like to address the most important public comments.

      We agree with the suggestion to shift the focus of the article from conservation to migration patterns. Accordingly, we have rewritten both the Introduction and Discussion sections to focus on migration behavior rather than conservation.

      However, we respectfully disagree with the suggestion that the migration patterns we describe are synonymous with itinerancy. We acknowledge that our original presentation may have been unclear and may have hindered full understanding. In the revised version, we provide a detailed analysis of migratory behavior in the Introduction that describes how our pattern differs from itinerancy. We also revisit this distinction in the Discussion section. We have also carefully revised Figure 1 to improve clarity and avoid potential misunderstandings.

      Regarding the applicability of the described migration pattern, we acknowledge that the Rough-legged Buzzard is not listed as an endangered species. However, we believe that our findings have practical implications. We have moved our discussion of this issue to the end of the Discussion section and have completely revised Figure 5. While the overall population of Rough-legged Buzzards is not declining, certain regions within its range are experiencing declines. We show that this decline does not warrant listing the species as endangered. Instead, it may represent a redistribution within the non-breeding range - a shift in range dynamics. We use the example of the Rough-legged Buzzard to illustrate this concept and emphasize the importance of considering such dynamics when assessing the conservation status of species in the future.

      We also acknowledge that the hypothesis of this form of behavior has been proposed previously for Montagu's Harrier, and we have included this information in the revised manuscript. In addition, we agree that the focus on the Anthropocene is unnecessary in this context and have therefore removed it.

      We believe that these revisions significantly improve the clarity and robustness of the manuscript, and we are grateful for your insightful comments and suggestions.

      As a general comment, please note that including line numbers (as it is the standard in any manuscript submission) would facilitate reviewers providing more detailed comments on the text.

      We apologize for this oversight and have added line numbers to our revised manuscript.

      Dataset: unclear what is the frequency of GPS transmissions. Furthermore, information on relative tag mass for the tracked individuals should be reported.

      We have included this information in our manuscript (L 157-163). We also refer to the study in which this dataset was first used and described in detail (L 164).

      Data pre-processing: more details are needed here. What data have been removed if the bird died? The entire track of the individual? Only the data classified in the last section of the track? The section also reports on an 'iterative procedure' for annotating tracks, which is only vaguely described. A piecewise regression is mentioned, but no details are provided, not even on what is the dependent variable (I assume it should be latitude?).

      Regarding the deaths: we removed only the data recorded when the bird was already dead. We have corrected the text to make this clear (L 170).

      Regarding the iterative procedure: we have added a detailed description on lines 175-188.

      Data analysis: several potential issues here:

      (1) Unclear why sex was not included in all mixed models. I think it should be included.

      Our dataset contains 35 females and eight males. This ratio does not allow us to include sex in all models and adequately assess the influence of this factor. At the same time, because adult females disperse farther than males in some raptor species, we conducted a separate analysis of the dependence of migration distance on sex (Table S8) and found no evidence for this in our species. We have written a separate paragraph about this. This paragraph can be found on lines 356-360 of the new manuscript.

      (2) Unclear what is the rationale of describing habitat use during migration; is it only to show that it is a largely unsuitable habitat for the species? But is a formal analysis required then? Wouldn't be enough to simply describe this?

      Habitat use and snow cover determine the two main phases (quick and slow) of the pattern we describe. We believe that habitat analysis is appropriate in this case and that a simple description would be uninformative and would not support our conclusions.

      (3) Analysis of snow cover: such a 'what if' analysis is fine but it seems to be a rather indirect assessment of the effect of snow cover on movement patterns. Can a more direct test be envisaged relating e.g. daily movement patterns to concomitant snow cover? This should be rather straightforward. The effectiveness of this method rests on among-year differences in snow cover and timing of snowfall. A further possibility would be to demonstrate habitat selection within the entire non-breeding home range of an individual in relation snow cover. Such an analysis would imply associating presence-absence of snow to every location within the non-breeding range and testing whether the proportion of locations with snow is lower than the proportion of snow of random locations within the entire non-breeding home range (95% KDE) for every individual (e.g. by setting a 1/10 ratio presence to random locations).

      The proposed analysis would let us assess whether the Rough-legged Buzzard selects areas with the lowest snow cover, but it would not let us follow the dynamics and would therefore give a misleading overall picture. This is especially true in the spring months. In March-April, Rough-legged Buzzards move northeast and occupy an area that is not the most snow-free part of the range; at this time, areas to the southwest have less snow (this can be seen in Figure 4b). If we performed the proposed analysis, the control points for this period would lie both to the north (where there is more snow) and to the south (where there is less snow) of the real locations, and the result would be no apparent difference in snow cover.
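      The averaging argument above can be illustrated with a toy calculation. This is only a sketch: the snow-cover fractions and distances are hypothetical values chosen for illustration, not data from the study, and the names `used_snow` and `control_points` are our own. It pairs one bird location with ten control points, following the 1/10 ratio suggested by the reviewer, spread symmetrically along a southwest-to-northeast snow gradient.

```python
from statistics import fmean

# Hypothetical snow-cover fraction at one real (used) spring location.
used_snow = 0.6

# Hypothetical control points: key = offset in km along the SW-NE axis
# (negative = southwest of the bird, positive = northeast),
# value = snow-cover fraction at that point.
control_points = {
    -500: 0.20, -400: 0.28, -300: 0.36, -200: 0.44, -100: 0.52,
    100: 0.68, 200: 0.76, 300: 0.84, 400: 0.92, 500: 1.00,
}

# Pooled comparison, as in a used-versus-available test.
available_snow = fmean(control_points.values())
difference = used_snow - available_snow

# The same control points split by direction.
southwest_mean = fmean(v for k, v in control_points.items() if k < 0)
northeast_mean = fmean(v for k, v in control_points.items() if k > 0)

print(f"used - available (pooled): {difference:+.2f}")
print(f"southwest mean: {southwest_mean:.2f}, northeast mean: {northeast_mean:.2f}")
```

      In this constructed case the pooled difference is zero even though snow cover clearly differs to the southwest and to the northeast of the bird, which is the sense in which a pooled comparison could mask a directional pattern.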

      A step-selection analysis could be used, as we did in our previous work (Curk et al 2020 Sci Rep) with the same Rough-legged Buzzards (but during migration, not winter). But this would give us only a qualitative picture, not a quantitative one: that Rough-legged Buzzards move away from snow (in the fall) and follow the progression of snowmelt (in the spring).

      At the same time, our analysis gives a complete picture of snow cover dynamics in different parts of the non-breeding range. This allows us to see that if Rough-legged Buzzards remained at their fall migration endpoint without moving southwest, they would encounter 14.4% more snow cover (99.5% vs. 85.1%). Although this difference may seem small (14.4%), it holds significance for rodent-hunting birds, distinguishing between complete and patchy snow cover. Simultaneously, if Rough-legged Buzzards immediately flew to the southwest and stayed there throughout winter, they would experience 25.7% less snow cover (57.3% vs. 31.6%). Despite a greater difference than in the first case, it doesn't compel them to adopt this strategy, as it represents the difference between various degrees of landscape openness from snow cover.

      We write about this in the new manuscript on lines 385-394.
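      For readers checking the figures quoted above, the two differences are differences between percentages, that is, percentage points. The short sketch below only reproduces that arithmetic from the four values given in the response; the dictionary keys and the helper name `percentage_point_gap` are our own labels.

```python
# Mean snow-cover percentages quoted in the response, as (higher, lower) pairs.
scenarios = {
    "remaining at the fall endpoint (99.5 vs. 85.1)": (99.5, 85.1),
    "wintering entirely in the southwest (57.3 vs. 31.6)": (57.3, 31.6),
}


def percentage_point_gap(higher: float, lower: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(higher - lower, 1)


gaps = {label: percentage_point_gap(*pair) for label, pair in scenarios.items()}

for label, gap in gaps.items():
    print(f"{label}: {gap} percentage points")
```

      Both quoted differences (14.4 and 25.7) are therefore absolute gaps in snow cover rather than relative changes.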

      Results: it is unclear whether the reported dispersion measures are SDs or SEs. Please provide details.

      For the date and coordinates of the start and end of the different phases of migration, we specified the mean, SD, and sample size. We wrote this in line 277. For the values of the parameters of the different phases of the migration (duration, distance, speed, and direction), we used the mean, the standard error of the mean, and the confidence interval (obtained using the ‘emmeans’ package). We have indicated this in lines 302-303 and the captions of Table 1 (L 315) and Figure 2 (L 293-294). For the values of habitat and snow cover experienced by the Rough-legged Buzzards, we used the mean and the standard error of the mean. We reported this on lines 322 and 337 and in Figures 3 (L 332-333) and 4 (L 355-356).
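      Because the reviewer's question turns on the difference between an SD and an SE, a minimal sketch of the two quantities may help. It uses plain descriptive formulas on hypothetical values and a normal approximation (z = 1.96) for the interval; it does not reproduce the model-based estimates that the authors obtained with 'emmeans', and the function name `summarize` and the sample values are assumptions made for illustration.

```python
import math
import statistics


def summarize(values, z=1.96):
    """Return n, mean, sample SD, SE of the mean, and an approximate 95% CI."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    half_width = z * sem            # normal approximation, not a t-interval
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "ci": (mean - half_width, mean + half_width),
    }


# Hypothetical phase durations in days (not data from the study).
durations_days = [10.0, 12.0, 14.0, 16.0]
summary = summarize(durations_days)

print(f"n = {summary['n']}, mean = {summary['mean']:.1f}")
print(f"SD = {summary['sd']:.2f}, SE = {summary['sem']:.2f}")
print(f"approx. 95% CI = ({summary['ci'][0]:.2f}, {summary['ci'][1]:.2f})")
```

      The SD describes the spread among individuals, whereas the SE shrinks with the square root of the sample size and describes the uncertainty of the mean, which is why the two should be labeled explicitly wherever they are reported.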

      Discussion: in general, it should be reshaped taking into account the comments. It is overlong, speculative and quite naive in several passages. Entire sections can be safely removed (I think it can be reduced by half without any loss of information). I provide some examples of the issues I have spotted below. For instance, the entire paragraph starting with 'Understanding....' is not clear to me. What do you mean by 'prohibited management' options? Without examples, this seems a rather general text, based on unclear premises when related to the specific of this study. Some statements are vague, derive from unsubstantiated claims, and unclear. E.g. "Despite their scarcity in these habitats, forests appear to hold significant importance for Rough-legged buzzards for nocturnal safety". I could not find any day-night analysis showing that they actually roost in forests during nighttime. Being a tundra species, it may well be possible that rough-legged buzzards perceive forests as very dangerous habitats and that they prefer instead to roost in open habitats. Analysing habitat use during day and night during the non-breeding period may be of help to clarify this. Furthermore, considering the fast migration periods, what is the flight speed during day and night above forests? Do these birds also migrate at night or do they roost during the night? Perhaps a figure visualizing day and night track segments could be of help (or an analysis of day vs. night flight speed) (there are several R packages to annotate tracks in relation to day and night). This is an example of another problematic statement: "The progression of snow cover in the wintering range of Rough-legged buzzards plays a significant role in their winter migration pattern." The manuscript does not contain any clear demonstration of this, as I wrote in my previous comments. Without such evidence, you must considerably tone down such assertions. 
      But since providing a direct link is certainly possible, I think that additional analyses would clearly strengthen your take-home message.

      The paragraph starting with "The quantification of environmental changes that could prove fatal to bird species presents yet another challenge for conservation efforts in an era of rapid global change." is quite odd. Take the following statement "For instance, the presence of small patches of woodland in the winter range might appear crucial to the survival of the Rough-legged buzzard. Elimination of these seemingly minor elements of vegetation cover through management actions could have dire consequences for the species.". It is based on the assumption that minor vegetation elements play a key role in the ecology of the species, without any evidence supporting this. Does it have any sense? I could safely say exactly the opposite and I would believe it might even be more substantiated.

      We agree with these comments.

      We have completely rewritten this section. As suggested, we have shortened it by removing statements that were not supported by the research. We have completely removed the statements about "prohibited management". We have also removed the statement that "forests appear to be of significant importance to Rough-legged buzzards for nocturnal safety" and everything associated with that statement, e.g. the statement about "small elements of vegetation cover", etc. We do believe that this statement is true in substance, but we also agree that it is not supported by the results and requires separate analysis. At the same time, we believe that this is a topic for a separate study and would be redundant here. Therefore, we leave it for a separate publication.

      Conclusion paragraph: I believe this severely overstates the conservation importance of this study. That the results have "crucial implications for conservation efforts in the Anthropocene, where rapidly changing environmental factors can severely impact bird migration" seems completely untenable to me. What is the evidence for such crucial implications? For instance, these results may suggest that climate change, because global warming is predicted to reduce snow cover in the non-breeding areas, might well be beneficial for populations of this species, by reducing non-breeding energy expenditure and improving non-breeding survival. I think statements like these are simply not necessary, and that the study should be more focused on the actual results and evidence provided.

      We have completely rewritten this section. We removed the reference to the Anthropocene and focused on migratory behavior and migration patterns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      In this manuscript, the authors set out to understand how different TLR4 agonists trigger Myddosome assembly and seek to examine how the potent LPS agonist induces a heightened TLR4 response. A strength of the study is that the authors employ a novel light sheet imaging modality coupled to nanopipette delivery of TLR4 ligands. The authors use this technological innovation to resolve the dynamics of Myddosome formation within the whole cell volume of macrophage cell lines expressing MyD88-YFP. The main finding is that the kinetics of Myddosome formation are slower for the weaker agonist Abeta than for LPS. However, Abeta amyloids resulted in the formation of larger MyD88-YFP puncta that persisted for longer. The authors suggest the slower kinetics of formation and larger puncta size reflect how Abeta amyloids are a less efficient TLR4 agonist. Many Toll-like receptors are now known to recognize endogenously produced danger signals and microbially derived molecules. This work is the first to compare the signaling kinetics of endogenous versus microbially derived TLR agonists.

      Strengths:

      A key strength of this work is the technological achievement of imaging Myddosomes within the entire cell volume and using a nanopipette to administer ligands directly to single cells. The authors also combine this light sheet microscopy with STORM imaging to gain a super-resolved view of the assembly of Myddosomes. These findings suggest that Myddosomes formed in response to Abeta have a more irregular morphology. We conclude that these technological achievements are significant in improving our understanding of the dynamics of TLR4 signaling in response to diverse agonists. Given the limited literature on the molecular dynamics of innate immune signal transduction, this study is an important addition to the field.

      Weaknesses:

      One limitation of the paper is that a suitable explanation for how larger Myddosomes would contribute to an attenuated downstream signaling response is lacking. Do the larger clusters of nucleated MyD88 polymers reflect inefficiency in assembling fully formed Myddosomes that contain IRAK4/2? Could the MyD88-GFP puncta be stained with antibodies against IRAK4 (or IRAK2) to determine the frequency and probability with which the two ligands stimulate signal transduction beyond MyD88 assembly?

      A second weakness is the discussion. The authors should explore other explanations for the observed differences in Myddosome formation between TLR4 agonists. For example, could the observed delay in Myddosome assembly in response to Abeta be due to different binding affinity or kinetics to TLR4? Can this be ruled out?

      We thank the reviewer for these comments.

      To address the first comment, we have added a section on the limitations of the current study and suggested that future work could use IRAK4 or IRAK2 staining to identify functional Myddosomes, as well as working with cells in which Myddosome expression is at physiological levels, which may reduce the formation of larger Myddosomes.

      The reviewer is correct that the difference in delay time for Myddosome formation could be due to slow formation of a TLR4 dimer or slow binding to the TLR4 dimer, rather than the time taken to assemble the Myddosome after TLR4 dimerisation and binding, since we have only measured the delay time for Myddosome formation when triggered by LPS or Aβ aggregates. This delay time involves dimerisation of TLR4 and binding of LPS or Aβ aggregates to the TLR4 dimer, followed by Myddosome formation. These other processes might contribute to the difference in delay time that we observed between LPS and Aβ aggregates. It is worth noting that in our experiments we deliver the LPS or Aβ aggregates directly onto the surface for 5 seconds and that we previously showed the presence of preformed TLR4 dimers on the cell surface (Latty et al., 2018). The affinity of Aβ aggregates for TLR4 is not known, but LPS has a high affinity for TLR4, estimated at ∼3 nM for lipid A–TLR4-MD-2 (Akashi et al., 2003). However, even with this high affinity, which implies fast binding, direct delivery onto the surface, and the presence of preformed TLR4 dimers on the cell surface, it took 80 s to observe Myddosome formation. This indicates that Myddosome formation is the slow step for LPS triggering. This is also likely to be the case for Aβ aggregates, since pM concentrations of aggregates can trigger TLR4 signalling (Hughes et al., 2020), indicating high affinity. However, it is not possible to rule out a contribution of a difference in affinity to the observed difference in delay time without measuring the affinity directly.

      We have added both these points to a new paragraph on the limitations of the study in the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers are concerned that our lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/ or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that i) influence TM insertion (V267T) and ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V267T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V267T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V27T could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/ or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/ or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remains contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro- a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/ or mechanical unfolding have also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/ or mechanical stability should necessarily correspond to their resistance to cotranslational and/ or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial. 
Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests that introducing polar side chains near the center of a nascent TMD should almost invariably reduce the efficiency of topogenesis. To confirm this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1), at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble.

      In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/ or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this may not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/ or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data required further discussion and documentation which we have provided in the revised version of the manuscript as is described in the following.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the distribution of surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations that express collections of variants that deviate from the expression of W107A to an extent that is significant enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample; the WT immunostaining intensity was 9.5-fold over background by comparison. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We have included these and other signal-to-noise metrics for each experiment in the Results section of the revised manuscript.

      Beyond considerations related to intensity, we also previously noticed the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form a scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R² = 0.95, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated between variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR. We have included these discussion points in the Results section as well as scatter plots for replicate variant intensities within all three genetic backgrounds in Figure S3 of the revised manuscript.

      Author response image 2.
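
      The replicate comparison described above can be sketched as follows. This is a minimal illustration only: the intensity values are invented placeholders rather than data from the study, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-variant intensity values from two replicates (placeholders).
replicate_1 = [120.0, 95.0, 210.0, 60.0, 150.0]
replicate_2 = [115.0, 101.0, 198.0, 66.0, 158.0]

r = pearson_r(replicate_1, replicate_2)
r_squared = r ** 2
print(f"Pearson R^2 between replicates: {r_squared:.3f}")
```

      An R² close to 1 indicates that variants rank and scale consistently between replicates, whereas measurements dominated by noise would give a value near 0.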

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/ degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results we believe it is virtually certain that the secondary mutations characterized herein would be likely to form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work: stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as the initial stability of the WT/ reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation like W107A as a point of reference.
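
      The dependence of epistasis on both ΔΔG and the starting ΔG can be illustrated with a toy calculation. The sketch below rests on assumptions that are not taken from the manuscript: a two-state folding equilibrium, ΔΔG values that add in the double mutant, an additive expectation on the fraction-folded scale (the sign of epistasis can differ under other definitions, such as a log scale), a temperature of 37 °C, and energies chosen purely for illustration.

```python
from math import exp

RT = 1.987e-3 * 310.15  # kcal/mol at 37 degrees C (assumed temperature)

def fraction_folded(dg_fold):
    """Two-state fraction folded; dg_fold < 0 means the native state is favored."""
    return 1.0 / (1.0 + exp(dg_fold / RT))

def epistasis(dg_ref, ddg_a, ddg_b):
    """Observed minus additive-expected fraction folded for the double mutant.

    Assumes the two ddG values simply add in the double mutant.
    """
    f_ref = fraction_folded(dg_ref)
    f_a = fraction_folded(dg_ref + ddg_a)
    f_b = fraction_folded(dg_ref + ddg_b)
    f_ab = fraction_folded(dg_ref + ddg_a + ddg_b)
    return f_ab - (f_a + f_b - f_ref)

# Hypothetical energies (kcal/mol), chosen only to illustrate three regimes.
stable = epistasis(-6.0, 1.0, 1.0)    # very stable reference, mild mutations
marginal = epistasis(-2.0, 1.0, 1.0)  # marginally stable reference
severe = epistasis(-2.0, 4.0, 1.0)    # one mutation already unfolds the protein

print(f"stable: {stable:+.4f}  marginal: {marginal:+.4f}  severe: {severe:+.4f}")
```

      Under these assumptions the same pair of ΔΔG values gives almost no epistasis in a very stable reference and negative epistasis in a marginally stable one, while pairing a mild mutation with a severely destabilizing one gives positive epistasis because little folded protein remains to be lost.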

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree. Our findings suggest different mutations may not behave similarly, which we believe is a key finding of this work. We have emphasized this point in the Discussion section of the revised manuscript as follows:

      “These findings suggest the folding-mediated epistasis is likely to vary among different classes of destabilizing mutations in a manner that should also depend on folding efficiency and/ or the mechanism(s) of misfolding in the cell.”

      Some statistical aspects of the study could be improved:

      (1) It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and will include scatters for each genetic background like the one shown above in Figure S3 of the revised version of the manuscript.

      (2) The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We utilized paired Wilcoxon-Signed Rank Tests to evaluate the statistical significance of these observations and modified the description of these findings in the revised version of the results section as follows:

      “Variants bearing mutations within the C-terminal regions including ICL3, TMD6, and TMD7 fare consistently worse in the V276T background relative to WT (paired Wilcoxon-Signed Rank Test p-values of 0.0001, 0.02, and 0.005, respectively) (Fig. 4 B & E). Given that V276T perturbs the cotranslational membrane integration of TMD6 (Fig. S1, Table S1), this directional bias potentially suggests that the apparent interactions between these mutations manifest during the late stages of cotranslational folding. In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form ICL2, TMD4, and ECL2 (paired Wilcoxon-Signed Rank Test p-values of 0.0005, 0.0001, and 0.004, respectively) (Fig. 4 C & F).”

      (3) The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants" needs to be backed by relevant data and statistics, or at least a reference.

      We thank the reviewer for this reasonable suggestion. In the revised manuscript, we included the results of a Fisher’s Exact Test that confirms the statistical significance of this observation and modified the Results section to reflect this as follows:

      “Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD, Fisher’s Exact Test p = 0.0019). These findings suggest random mutations form epistatic interactions in the context of unstable mGnRHR variants in a manner that depends on the specific folding defect (V276T vs. W107A) and topological context.”
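      The enrichment test quoted above can be sketched as a two-sided Fisher's exact test on a 2x2 table. The counts below are hypothetical: they are back-calculated from the rounded percentages (about 64 of 98 and about 113 of 251), and the response does not state how the table was constructed, so the resulting p-value is not expected to reproduce the reported one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


# Assumed layout: rows = (the 98-variant subset, the remaining 153 variants),
# columns = (TMD, non-TMD). Counts are approximations, not the authors' data.
subset_tmd, subset_other = 64, 34
rest_tmd, rest_other = 49, 104
print(fisher_exact_two_sided(subset_tmd, subset_other, rest_tmd, rest_other))
```

      Fisher's exact test suits this question because it compares two proportions from count data rather than paired measurements.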

      Reviewer #1 (Recommendations for the Authors):

      As far as this reviewer is aware, the effect of the V276T variant on MP insertion has not been measured directly; its position corresponds to T277 in TMD6 of human GnRHR that has been measured for TM insertion, but given the clear lack of conservation (threonine vs valine) the mutation in TM6 could potentially have a different impact on the mouse homologue. Please clarify what the predicted delta TM for insertion is between human and mouse GnRHR. Moreover, I would argue that single TM insertion by tethering to Lep is insufficient to understand MP insertion/folding, as neighbouring TM helices could help to drive TM6 insertion. Have ER microsome experiments for mouse GnRHR also been carried out in the context of neighbouring helices?

      We included measurements (and predictions) of the impact of the V276T substitution on the translocon-mediated membrane integration of the mouse TMD6 in the context of a chimeric Lep protein (see Fig. S1 & Table S1). Our results reveal that this substitution decreases the efficiency of TMD6 membrane integration by ~10%. Though imperfect, this prevailing biochemical assay remains popular for a variety of theoretical and technical reasons. Importantly, extensive experimental testing of this system has shown that these measurements report apparent equilibrium constants that are well-described by two-state equilibrium partitioning models (see DOIs 10.1038/nature03216 and 10.1038/nature06387). This observation provides a reasonable rationale to interpret these measurements using energetic models as we have in this work (see Table S1). From a technical perspective, the Lep system is also advantageous due to the fact that this protein is generally well expressed in the context of in vitro translation systems containing native membranes, which generally ensures a consistent signal to noise and dynamic range for membrane integration measurements. Nevertheless, the reviewers are correct that membrane integration efficiencies are likely distinct in the context of the native mGnRHR protein. For these reasons, we attempted to develop a glycosylation-based topology reporter prior to the posting and submission of this manuscript. However, all GnRHR reporters we tested were poorly expressed in vitro and the resulting 35S-labeled proteins only generated faint smears on our phosphorimaging screens that could not be interpreted. For these reasons, we chose to rely on the Lep measurements for these investigations.
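      To make the two-state interpretation concrete, the sketch below converts a membrane integration efficiency into an apparent equilibrium constant and an apparent free energy. The function name, the temperature, and the example efficiencies are assumptions chosen for illustration; they are not the values in Table S1, and the "~10%" decrease is treated here as ten percentage points.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def apparent_dg(fraction_integrated, temp_k=310.15):
    """Apparent free energy of membrane integration (kcal/mol).

    Two-state partitioning: K_app = f / (1 - f), dG_app = -RT ln(K_app).
    temp_k is an assumed temperature, not one stated in the source.
    """
    if not 0.0 < fraction_integrated < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    k_app = fraction_integrated / (1.0 - fraction_integrated)
    return -R_KCAL * temp_k * math.log(k_app)


# Hypothetical efficiencies for a reference and a substituted segment.
dg_reference = apparent_dg(0.60)
dg_variant = apparent_dg(0.50)
ddg = dg_variant - dg_reference  # positive means less favorable integration
print(round(dg_reference, 3), round(dg_variant, 3), round(ddg, 3))
```

      Because the free energy depends on the logarithm of the ratio, the same ten-point drop corresponds to a larger energetic change when the efficiency is close to 0 or 1 than when it is near 50%.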

      The lack of a more relevant topological reporter is one of many challenges we faced in our investigations of this unstable, poorly behaved protein. We share the reviewer’s frustrations concerning the speculative aspects of this work. Nevertheless, there is increasing appreciation for the fact that our perspectives on protein biophysics have been skewed by our continuing choice to focus on the relatively small set of model proteins that are compatible with our favored methodologies (doi: 10.1016/j.tibs.2013.05.001). We humbly suggest this work represents an example of how we can gain a deeper understanding of the limits of biochemical systems when we instead choose to study the unsavory bits of cellular proteomes. But this choice requires a willingness to make some reasonable assumptions and to lean on energetic/structural modeling from time to time. Despite this limitation, we believe there is still tremendous value in this compromise.

      What is the experimental evidence the W107A variant affects the protein structure? Has its melting temperature with and without inverse agonist binding for WT vs the W107A variant been measured, for example? Even heat-FSEC of detergent-solubilised membranes would be informative to know how unstable the W107A variant is. If it is very unstable in detergent, then it could be that recovery mutants are going to be unlikely as you are already starting with a poor construct showing poor folding/localisation.

      We again understand the rationale for this concern, but do not believe that thermal melting measurements are likely to report the same sorts of conformational transitions involved in cellular misfolding. Heating up a protein to the point in which membranes (or micelles) are disrupted and the proteins begin to form insoluble aggregates is a distinct physical process from those that occur during co- and post-translational folding within intact ER membranes at physiological temperatures (discussed further in the Response to the Reviews). Indeed, as the reviewer points out below, there seems to be little evidence that secretion is linked to thermal stability or various other metrics that others have attempted to optimize for the sake of purification and/or structural characterization. Thus, we believe it would be just as speculative to suggest thermal aggregation represents a relevant metric for the propensity of membrane proteins to fold in the cell. The physical interpretation of membrane protein misfolding reactions remains contentious in our field due to the key fact that the denatured states of helical membrane proteins remain highly structured in a manner that is hard to generalize beyond the fact that the denatured states retain α-helical secondary structure (doi: 10.1146/annurev-biophys-051013-022926). This is in stark contrast to soluble proteins, where random coil reference states have proven to be generally useful for energetic interpretations of protein stability. For reference, our lab is currently working to leverage epistatic measurements like this to map the prevailing physiological denatured states of an integral membrane protein. Our current findings suggest that non-native electrostatic interactions form in the context of misfolded states. We hope that more information on the structural aspects of these states will help us to develop and interpret meaningful folding measurements within the membrane.

      Even in cases when quantitative folding measurements can be achieved, their relevance remains actively debated. As a point of reference, the corresponding author of this work previously worked on the stability and misfolding of another human α-helical membrane protein (PMP22). Like GnRHR, PMP22 is prone to misfolding in the secretory pathway and is associated with dozens of pathogenic mutations that cause protein misfolding. To understand how the thermodynamic stability of this protein is linked to secretion, the corresponding author purified PMP22, reconstituted it into n-Dodecyl-phosphocholine (DPC) micelles, and measured its resistance to denaturation by an anionic denaturing detergent (Lauryl Sarcosine, LS). The results were initially perplexing due to the fact that equilibrium unfolding curves manifested as an exponential decay (rather than a sigmoid) and relaxation kinetics appeared to be dominated by the rate constant for unfolding (doi: 10.1021/bi301635f). Unfortunately, these data could not be fit with existing folding models due to the lack of a folded protein baseline and the absence of a folding arm in the chevron plot. We eventually found that a full sigmoidal unfolding transition and refolding kinetics could be measured upon addition of 15% (v/v) glycerol. Our measurements revealed that the free energy of unfolding in DPC micelles was 0 kcal/mol (without glycerol). This shocking lack of WT stability made it impossible to directly measure the effects of destabilizing mutations that enhance misfolding: you can’t measure the unfolding of a protein that is already unfolded. We ultimately had to instead infer the energetic effects of such mutations from the thermodynamic coupling between cofactor binding and folding (doi: 10.1021/jacs.5b03743). Finally, after demonstrating the resulting ΔΔGs correlated with both cellular trafficking and disease phenotype, we still faced justified scrutiny about the relevance of these measurements due to the fact that they were carried out in micelles. For these reasons, we do not feel that additional biophysical measurements will add much to this work until more is understood about the nature of misfolding reactions in the membrane and how to effectively recapitulate it in vitro. We also note that PMP22 is secreted with 20% efficiency in mammalian cell lines, which is 20-fold more efficient than human GnRHR under similar conditions (doi: 10.1016/j.celrep.2021.110046). Thus, we suspect equilibrium unfolding measurements are likely out of reach using previously described measurements.

      Our greatest evidence suggesting W107A destabilizes the protein has to do with the fact that it deletes a highly conserved structural contact and that this structural modification kills its secretion. The fact that this mutation clearly reduces the escape of GnRHR from ER quality control is a classic indicator of misfolding that represents the cell’s way of telling us that the mutation compromises the folding of the nascent protein in some way or another. Precisely how this mutation remodels the conformational ensemble of nascent GnRHR and how this relates to the free energy difference between the native and non-native portions of its conformational ensemble under cellular conditions is a much more challenging question that lies beyond the scope of this investigation (and likely beyond the scope of what’s currently possible). Indeed, there is an entire field dedicated to understanding such questions. Nevertheless, the difference in the epistatic interactions formed by W107A and V276T is at the very least consistent with our speculative interpretation that these two mutations vary in their misfolding mechanism and/or in the extent to which they destabilize the protein. For these reasons, we feel the main conclusions of this manuscript are well-justified.

      Please clarify if the protein is glycosylated or not and, if it is, how would this requirement affect the conclusions of your analysis?

      As we noted in the Response to the Reviewers, which also constitutes a published portion of the final manuscript, this protein is indeed glycosylated. We were well aware of this aspect of the protein since the inception of this project and do not think this changes our interpretation at all. Most membrane proteins are glycosylated, and several groups have demonstrated in various ways that the secretion efficiency of glycoproteins is proportional to certain stability metrics for secreted soluble proteins and membrane proteins alike. Generally, mutations that enhance misfolding do not change the propensity of the nascent chain to undergo N-linked glycosylation, which occurs during translation before protein synthesis and/or folding is complete. Misfolded proteins typically carry lower weight glycans, which reflects their failure to advance from the ER to the Golgi, where N-linked glycans are modified and O-linked glycans are added. From our perspective, glycosyl modifications just ensure that nascent proteins are engaged by calnexin and other lectin chaperones involved in QC. They do not decouple folding from secretion efficiency. In the case of PMP22 (described above), we found that removal of its glycosylation site allows the nascent protein to bypass the lectin chaperones in a manner that enhances its plasma membrane expression eight-fold (doi: 10.1016/j.jbc.2021.100719). Similar to WT, the expression of several misfolded PMP22 variants also significantly increases upon removal of the glycosylation site. Nevertheless, their expression is still significantly lower than the un-glycosylated WT protein, and the expression patterns of the mutants relative to WT were quite similar across this panel of un-glycosylated proteins. Thus, while glycosylation certainly impacts secretion, it does not change its dependence on folding efficiency within the ER. There are many layers of partially redundant QC within the ER, and it seems that folding imposes a key bottleneck to secretion regardless of which QC proteins are involved. For these reasons, we do not think glycosylation (or other PTMs) should factor into our interpretation of these results.

      One caveat with the study is that there is a poor understanding of the factors that decide if the protein should be trafficked to the PM or not. Even secretory proteins not going through the calnexin/calreticulin cycle (as they have no N-linked glycans) might still get stuck in the ER, despite the fact they are functional. Could this be a technical issue of heterologous expression overloading the Sec system?

      While we agree that there is much to be learned about this topic, we disagree with the notion that our understanding of folding and secretion is insufficient to generally interpret the molecular basis of the observed trends. In collaboration with various other groups, the corresponding author of this paper has shown for several other proteins that the stability of the native topology and the native tertiary structure can constrain secretion efficiency (see dois: 10.1021/jacs.8b08243, 10.1021/jacs.5b03743, and 10.1016/j.jbc.2021.100423). Moreover, the Balch and Kelly groups demonstrated many years ago that relatively simple models for the coupling between folding and chaperone binding can recapitulate the observed effects of mutations on the secretion efficiency of various proteins (doi: 10.1016/j.cell.2007.10.025). Given a wide body of prevailing knowledge in this area, we believe it is entirely reasonable to assume that the conformational effects of these mutations have a dominant effect on plasma membrane expression.

      Whether or not some of the proteins retained in the ER are folded and/or functional is an interesting question, but is outside the scope of this work. Various lines of evidence concerning approaches to rescue misfolded membrane proteins suggest many of these variants are likely to retain residual function once they escape the ER, which may suggest there are pockets of foldable/folded proteins within the ER. But it seems generally clear that the efficiency of folding in the ER bottlenecks secretion regardless of whether or not the ER contains some fraction of folded/functional protein. We note that it is certainly possible, if not likely, that secretion efficiency is higher at lower expression levels (doi: 10.1074/jbc.AC120.014940). However, the mutational scanning platform used in this work was designed such that all variants are expressed from an identical promoter at the same location within the genome. Thus, for the purposes of these investigations, we believe it is entirely fair to draw “apples-to-apples” comparisons of their relative effects on plasma membrane expression.

      Please see Frances Arnold's paper on this point and their mutagenesis library of the channelrhodopsin (https://www.pnas.org/doi/10.1073/pnas.1700269114), which further found that 20% of mutations improved WT trafficking. Some general comparisons to this paper might be informative.

      We agree that it may be interesting to compare the results from this paper to those in our own. Indeed, we find that 20% of the point mutations characterized herein also enhance the expression of WT mGnRHR, as mentioned in the Results section. However, we think it might be a bit premature to suggest this is a more general trend in light of the fact that the channelrhodopsins engineered in those studies were not of eukaryotic origin and have likely resulted from distinct evolutionary constraints. We ultimately decided against adding more on this to our already lengthy discussion in order to maintain focus on the mechanisms of epistasis.

      Chris Tate and others have shown that there is a high frequency of finding stabilising point mutations in GPCRs and this is the premise of the StaR technology used to thermostabilise GPCRs in the presence of different ligands, i.e. agonists vs inverse agonists. As far as I am aware, there is a poor correlation between expression levels and thermostability (measured by ligand binding to detergent-solubilised membranes). As such, it is possible that some of the mutants might be more stable than WT even though they have lower levels of PME.

      We believe the disconnect between thermostability and expression precisely speaks to our main point about the suitability of current membrane protein folding assays for the questions we address herein. The degradative activity of ER quality control has not necessarily selected for proteins that are resistant to thermal degradation and/or are suitable for macromolecular crystallography. For this reason, it is often not so difficult to engineer proteins with enhanced thermal stability. We do not believe this disconnect signals that quality control is insensitive to protein folding and stability, but rather that it is more likely to recognize conformational defects that are distinct from those involved in thermal degradation and/or aggregation. Indeed, recent work from the Fluman group, which builds on a wider body of previous observations, has shown that the exposure of polar groups within the membrane is a key factor that recruits degradation machinery (doi: 10.1101/2023.12.12.571171). It is hard to imagine that these sorts of conformational defects are the same as those involved in thermal aggregation.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe that by focusing more on the epistasis with V276T, and less on W107A, the paper could be strengthened significantly.

      We appreciate this sentiment. But we believe the comparison of these two mutants really drives home the point that destabilizing mutations are not equivalent with respect to the epistatic interactions they form.

      (2) In the abstract - please define the term epistasis in a simple way, to make it accessible to a general audience. For example - negative epistasis means that... this should be explicitly explained.

      We thank the reviewer for this suggestion. To meet eLife formatting, we had to cut down the abstract significantly. We simplified this as best we could in the following statement:

      “Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations.”

      We also define positive and negative epistasis in the results section as follows:

      “Positive Ɛ values denote double mutants that have greater PME than would be expected based on the effects of single mutants. Negative Ɛ values denote double mutants that have lower PME than would be expected based on the effects of single mutants. Pairs of mutations with Ɛ values near zero have additive effects on PME.”

      (3) The title is quite complex and might deter readers from outside the protein evolution field. Consider simplifying it.

      We thank the reviewer for this suggestion. We have simplified the title to the following:

      “Divergent Folding-Mediated Epistasis Among Unstable Membrane Protein Variants”

      (4) The paper could benefit from a simple figure explaining the different stages of membrane protein folding (stages 1+2) to make it more accessible to readers from outside the membrane protein field.

      This is a great suggestion. We incorporated a new schematic in the revised manuscript that outlines the nature of these processes (see Fig. 1A in the revised manuscript).

      (5) For the FACS-Seq experiment - it was not clear to me if and when all cells are pooled together. For example - are the 3 libraries mixed together already at the point of transfection, or are the transfected cells pooled together at any point before sorting? This could have some implications on batch effects and should, therefore, be explicitly mentioned in the main text.

      We thank the reviewer for this suggestion. We modified the description of the DNA library assembly to emphasize that the mutations were generated in the context of three mixed plasmid pools, which were then transfected into the cells and sorted independently:

      “We then generated a mixed array of mutagenic oligonucleotides that collectively encode this series of substitutions (Table S3) and used nicking mutagenesis to introduce these mutations into the V276T, W107A, and WT mGnRHR cDNAs (Medina-Cucurella et al., 2019), which produced three mixed plasmid pools.”

      (6) The following description in the text is quite confusing. It would be better to simplify it considerably or remove it: "scores (Ɛ) were then determined by taking the log of the double mutant fitness value divided by the difference between the single mutant fitness values (see Methods)."

      We thank the reviewer for this valuable feedback and have simplified the text as follows:

      “To compare epistatic trends in these libraries, we calculated epistasis scores (Ɛ) for the interactions that these 251 mutations form with V276T and W107A by comparing their relative effects on PME of the WT, V276T, and W107A variants using a previously described epistasis model (product model, see Methods) (Olson et al. 2014).”
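      As a minimal sketch of what a product-model epistasis score looks like, the code below compares the observed double-mutant PME with the value expected if the two single-mutant effects multiplied. The function name, the base-10 logarithm, and the intensity values are assumptions for illustration; the exact expression the authors used is the one given in their Methods.

```python
import math


def epistasis_score(pme_wt, pme_background, pme_single, pme_double):
    """Product-model epistasis score (assumed log10 form).

    pme_background: PME of the background mutant alone (e.g. V276T or W107A)
    pme_single:     PME of the secondary mutation alone (in WT)
    pme_double:     PME of the secondary mutation in the background mutant
    """
    # Express each variant relative to WT.
    w_background = pme_background / pme_wt
    w_single = pme_single / pme_wt
    w_double = pme_double / pme_wt
    # Under the product model the expected double-mutant value is the
    # product of the two single-mutant values.
    expected = w_background * w_single
    return math.log10(w_double / expected)


# Hypothetical surface immunostaining intensities (arbitrary units).
print(epistasis_score(100.0, 50.0, 80.0, 40.0))  # additive on the log scale
print(epistasis_score(100.0, 50.0, 80.0, 60.0))  # better than expected
print(epistasis_score(100.0, 50.0, 80.0, 20.0))  # worse than expected
```

      With this convention a score near zero means the two mutations act independently, a positive score means the double mutant is expressed better than expected, and a negative score means it is expressed worse.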

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for pointing out the technical novelty of this work.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the intriguing aspect of our findings. Indeed, the emergence of a novel form of CA1 population synchrony presents exciting implications for hippocampal memory research and beyond.

      While we acknowledge the deviations from conventional electrophysiological recordings, we respectfully contend that these differences do not necessarily imply methodological flaws. All experiments and analyses were conducted with meticulous adherence to established standards in the field.

      Regarding the observed variations in average firing rates, it is important to note the well-documented heterogeneity in CA1 pyramidal neuron firing rates, spanning from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies (Mizuseki et al., 2013). Our exclusion criteria for neurons with low estimated firing rates may have inadvertently biased the selection towards more active neurons. Moreover, prior research has indicated that average firing rates tend to increase during exposure to novel environments (Karlsson et al., 2008), and among deep-layer CA1 pyramidal neurons (Mizuseki et al., 2011). Given our recording setup in a highly novel environment and the predominance of deep CA1 pyramidal neurons in our sample, the observed higher average firing rates could be influenced by these factors. Considering these points, our mean firing rates (3.2 Hz) are reasonable estimations compared to previously reported values obtained from electrophysiological recordings (2.1 Hz in McHugh et al., 1996 and 2.4-2.6 Hz in Buzsaki et al., 2003).

      Regarding concerns about potential cell toxicity, previous studies have shown that Voltron expression and illumination do not significantly alter membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, and spike width (see Abdelfattah 2019, Science, Supplementary Figure 11 and 12). In our recordings, imaged neurons exhibit preserved membrane and dendritic morphology during and after experiments (Author response image 1), supporting the absence of significant toxicity.

      Author response image 1.

      Voltron-expressing neurons exhibit preserved membrane and dendritic morphology. (A) Images of two-photon z-stack maximum intensity projection showing Voltron-expressing neurons taken after voltage imaging experiments in vivo. (B) Post-hoc histological images of neurons being voltage-imaged.

      Regarding spike detection, we use validated algorithms (Abdelfattah et al., 2019 and 2023) to ensure robust and reliable detection of spikes. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This way, a slow fluorescence increase will not be detected as a spike, even if its amplitude is large. We benchmarked the detection algorithm in computer simulation. The sensitivity and specificity of the algorithm exceed 98% at the signal-to-noise ratio of our recordings. While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figures 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease rather than an increase in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and robust manner.
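      The general idea of separating fast spikes from slow subthreshold changes can be sketched as follows. This is not the validated algorithm cited above: it swaps in a simple moving-average subtraction as the high-pass step and a median-absolute-deviation threshold, and the function names, window length, threshold multiplier, and synthetic trace are all assumptions. The trace is assumed to be sign-corrected so that spikes point upward.

```python
import math
from statistics import median


def high_pass(trace, window):
    """Crude high-pass filter: subtract a centered moving average."""
    half = window // 2
    filtered = []
    for i in range(len(trace)):
        lo, hi = max(0, i - half), min(len(trace), i + half + 1)
        filtered.append(trace[i] - sum(trace[lo:hi]) / (hi - lo))
    return filtered


def detect_spikes(trace, window=21, n_mads=5.0):
    """Return indices of local maxima exceeding a robust noise threshold."""
    if window < 3:
        raise ValueError("window must be at least 3 samples")
    filtered = high_pass(trace, window)
    med = median(filtered)
    mad = median([abs(v - med) for v in filtered])
    # 1.4826 * MAD approximates one standard deviation for Gaussian noise.
    threshold = med + n_mads * 1.4826 * mad
    half = window // 2
    # Skip the edges, where the moving-average window is truncated.
    return [i for i in range(half, len(filtered) - half)
            if filtered[i] > threshold
            and filtered[i] >= filtered[i - 1]
            and filtered[i] > filtered[i + 1]]


# Synthetic trace: a large slow oscillation, small deterministic "noise",
# and three brief spikes that are much smaller than the slow component.
trace = [5.0 * math.sin(2 * math.pi * i / 400)
         + 0.05 * (((i * 37) % 11) - 5) / 5
         for i in range(800)]
for spike_index in (100, 350, 600):
    trace[spike_index] += 1.0
print(detect_spikes(trace))
```

      The slow component is five times larger than the spikes, yet it is removed by the filtering step before thresholding, which is the property the response describes.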

      To further strengthen these points, we will include the following in the revision:

      (1) Histological images of recorded neurons during and after experiments.

      (2) Further details regarding the validation of spike detection algorithms.

      (3) Analysis of publicly available electrophysiological datasets.

      (4) Discussion regarding the reasons behind the novelty of some of our findings compared to previous observations.

      In conclusion, we assert that our experimental and analysis approach upholds rigorous standards. We remain committed to reconciling our findings with previous observations and welcome further scrutiny and engagement from the scientific community to explore the intriguing implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead, occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our observations.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with fewer than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the size of the dataset. Despite this limitation, it is important to note that synchronous ensembles beyond what could be expected from chance (jittering) were detected in all examined data. In the revision, we plan to add more data, including data from subsequent visits, to further strengthen our findings.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x the neurons simultaneously recorded compared to the present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior, again not agreeing with much of the previous electrophysiological recordings.

      We understand the reviewer’s concern. We will examine publicly available electrophysiology datasets to gain further insights into any similarities and differences to our findings. Based on these results, we will discuss why these events have not been previously observed/reported.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      We thank the reviewer for this constructive suggestion. We will acquire more datasets from subsequent visits to gain further insights into these synchronous events.

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for this constructive suggestion. We did demonstrate a frequency shift to a lower frequency in the synchrony-associated theta during immobility than during locomotion (see Fig. 4B, the red vs. blue curves). We will enlarge this panel and specifically refer to it in the corresponding discussion paragraph.

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for this constructive suggestion. We do have images of brain slices post-recordings (Author response image 2). Imaged neurons are clearly located in the deep CA1 pyramidal layer. We will add these images and quantification in the revised manuscript.

      Author response image 2.

      Imaged neurons are located in the deep pyramidal layer of the dorsal hippocampal CA1 region.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We genuinely appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without undermining the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work:

      (1) Experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s consideration regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe that our findings still offer valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases with recording locations and depth, we find that the occurrence and amplitudes of theta oscillations are generally coordinated across hemispheres (Buzsaki et al., Neurosci., 2003). Therefore, the presence of prominent contralateral LFP theta around the times of synchronous ensembles in our study (see Figure 4A of the manuscript) strongly supports our conclusion regarding their association with theta oscillations, despite the collection of LFP from the opposite hemisphere.

      In addition, in our manuscript, we specifically mentioned that the “preferred phases” varied from session to session, likely due to the variability of recording locations (see Lines 254-256). Therefore, we think that the reviewer’s concern regarding theta phase variability has already been addressed in the present manuscript.

      Regarding ripple oscillations, while we recognize that they can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%, see Szabo et al., Neuron, 2022; Buzsaki et al., Neurosci., 2003). Therefore, using contralateral LFP to infer ripple occurrence on the ipsilateral side has been a common practice in the field, employed by many studies published in respectable journals (Szabo et al., Neuron, 2022; Terada et al., Nature, 2021; Dudok et al., Neuron, 2021; Geiller et al., Neuron, 2020). Furthermore, we observed that 446 synchronous ensembles during immobility did not co-occur with contralateral ripples, and the remaining 313 ensembles occurred during locomotion, when ripples rarely occur. Therefore, our conclusion that synchronous ensembles are not associated with ripple oscillations is supported by the data.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Indeed, complex bursts or different behavioral conditions can lead to differences in spike counts that could potentially affect the detection of synchronous ensembles. However, our jittering procedure (see Lines 121-132) is designed to control for the variation of spike counts. Importantly, although the jittered spike trains contain the same spike count variations, we found 7.8-fold more synchronous events in our data than in the jitter controls (see Figure 1G of the manuscript), indicating that these factors cannot account for the observed synchrony.
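
      As an illustration only (this is not the authors' code), a jitter control of the kind described above can be sketched as follows. The bin width, jitter width, minimum number of co-active cells, and the toy spike trains are all hypothetical values chosen for the example; the manuscript's actual procedure is the one given in its methods.

```python
import random

def count_sync_events(trains, bin_ms=25.0, min_cells=3):
    """Count time bins in which at least `min_cells` different cells spike."""
    cells_in_bin = {}
    for cell_id, spikes in enumerate(trains):
        for t in spikes:
            cells_in_bin.setdefault(int(t // bin_ms), set()).add(cell_id)
    return sum(1 for cells in cells_in_bin.values() if len(cells) >= min_cells)

def jitter(trains, width_ms, rng):
    """Shift each spike by an independent uniform offset in [-width_ms, +width_ms].

    Every cell keeps its exact spike count; only fine timing is disrupted.
    """
    return [[t + rng.uniform(-width_ms, width_ms) for t in spikes] for spikes in trains]

def observed_vs_jittered(trains, n_surrogates=200, width_ms=75.0, seed=0):
    """Return the observed event count and the mean count over jittered surrogates."""
    rng = random.Random(seed)
    observed = count_sync_events(trains)
    surrogate = [count_sync_events(jitter(trains, width_ms, rng))
                 for _ in range(n_surrogates)]
    return observed, sum(surrogate) / n_surrogates

# Toy data (hypothetical): 5 cells that co-fire within 2 ms, once per second.
toy_trains = [[1000.0 * k + 10.0 + 0.5 * cell for k in range(1, 21)]
              for cell in range(5)]
observed, jitter_mean = observed_vs_jittered(toy_trains)
```

      Because jittering leaves each cell's spike count untouched, any excess of observed events over the surrogate mean reflects fine timing rather than firing-rate differences between conditions.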

      To explicitly demonstrate that complex bursts cannot account for the observed synchrony, we have performed additional analysis to remove all latter spikes in bursts and only count the single and the first spikes of bursts. Importantly, we found that this procedure did not change the rate and size of synchronous ensembles, nor did it significantly alter the grand-average CCG (see Author response image 3). The results of this analysis explicitly rule out a significant effect of complex spikes on the analysis of synchronous ensembles.
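
      The burst-removal step can be sketched in a few lines. This is a minimal example under a stated assumption: a burst is taken to be a run of spikes separated by inter-spike intervals of at most `max_isi_ms`, and the 10 ms default is a hypothetical value, not the criterion used in the manuscript.

```python
def first_spikes_only(spike_times_ms, max_isi_ms=10.0):
    """Keep isolated spikes and the first spike of each burst.

    A spike is dropped when it follows the previous spike (kept or not)
    by at most `max_isi_ms`, so a whole chain of short intervals collapses
    to its first spike.
    """
    kept = []
    previous = None
    for t in sorted(spike_times_ms):
        if previous is None or t - previous > max_isi_ms:
            kept.append(t)
        previous = t
    return kept

# Two bursts (starting at 100 ms and 500 ms) and one isolated spike at 300 ms.
example = first_spikes_only([100.0, 104.0, 109.0, 300.0, 500.0, 503.0])
```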

      Author response image 3.

      Population synchrony remains after the removal of spikes in bursts. (A) The grand-average cross correlogram (CCG) was calculated using spike trains without latter spikes in bursts. The gray line represents the mean grand average CCG between reference cells and randomly selected cells from different sessions. (B) Pairwise comparison of the event rates of population synchrony between spike trains containing all spikes and spike trains without latter spikes in bursts. Bar heights indicate group means (n=10 segments, p=0.036, Wilcoxon signed-rank test). (C) Histogram of the ensemble sizes as percentages of cells participating in the synchronous ensembles.

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, theta phase locking has been shown to “reduce” population synchrony in a previous study (Mizuseki et al., 2014, Phil. Trans. R. Soc. B.). Thus, the presence of theta phase locking cannot be taken as a simple alternative explanation of the synchronous ensembles.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we have performed a new analysis to randomize the specific theta cycles in which neurons spike, while keeping the spike phases constant. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony, or whether spike co-occurrence in specific cycles is required. The grand-average CCG shows a much smaller peak compared to the original peak (Author response image 4A). Moreover, synchronous event rates show a 4.5-fold decrease in the randomized data compared to the original event rates (Author response image 4B). Thus, the new analysis reveals theta phase locking alone cannot account for the population synchrony.
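
      The cycle-randomization control can be sketched as below: each spike keeps its theta phase but is moved to a randomly chosen theta cycle. This is an illustrative sketch, not the authors' implementation. It assumes that theta cycle boundaries are already available as a sorted list of start times (`cycle_starts`, a hypothetical name), that phase advances linearly within a cycle, and that every spike falls inside the covered interval.

```python
import bisect
import math
import random

def cycle_and_phase(t, cycle_starts):
    """Return (cycle index, phase in radians) for a spike at time t."""
    i = bisect.bisect_right(cycle_starts, t) - 1
    start, end = cycle_starts[i], cycle_starts[i + 1]
    return i, 2.0 * math.pi * (t - start) / (end - start)

def shuffle_theta_cycles(spikes, cycle_starts, rng):
    """Move each spike to a random theta cycle while preserving its phase."""
    n_cycles = len(cycle_starts) - 1
    shuffled = []
    for t in spikes:
        _, phase = cycle_and_phase(t, cycle_starts)
        j = rng.randrange(n_cycles)
        start, end = cycle_starts[j], cycle_starts[j + 1]
        shuffled.append(start + phase / (2.0 * math.pi) * (end - start))
    return sorted(shuffled)

# Toy example: 40 theta cycles of 125 ms (8 Hz); spikes always 30 ms into a cycle.
cycle_starts = [125.0 * i for i in range(41)]
spikes = [125.0 * k + 30.0 for k in (0, 5, 9, 20)]
shuffled = shuffle_theta_cycles(spikes, cycle_starts, random.Random(0))
```

      Applied independently to every cell, this keeps each cell's phase-locking intact while breaking the coincidence of spikes within the same cycle, which is the comparison the response describes.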

      Author response image 4.

      Drastic reduction of population synchrony by randomizing spikes to other theta cycles while preserving the phases. (A) The grand-average cross correlogram (CCG) was calculated using original spike trains (black) and randomized spike trains where theta phases of the spikes are kept the same but spike timings were randomly moved to other theta cycles (red). (B) Pairwise comparison of the event rates of population synchrony between the original spike trains and randomized spike trains (n=10 segments, p=0.002, Wilcoxon signed-rank test). Bar heights indicate group means. ** p<0.01
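
      For context on the statistic in this caption: with n = 10 pairs and no ties or zero differences, the exact two-sided Wilcoxon signed-rank test cannot return a p-value below 2/2^10 (about 0.002), a value reached only when all ten differences share the same sign. The reported p = 0.002 is consistent with that case, although the individual differences are not given here. The sketch below enumerates the exact null distribution; the input values are hypothetical.

```python
from itertools import product

def wilcoxon_exact_p(differences):
    """Exact two-sided signed-rank p-value.

    Assumes no zero differences and no tied absolute values.
    """
    n = len(differences)
    order = sorted(range(n), key=lambda i: abs(differences[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = n * (n + 1) // 2
    w_plus = sum(r for r, d in zip(ranks, differences) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return extreme / 2 ** n

# Hypothetical data: ten paired differences, all in the same direction.
p_min = wilcoxon_exact_p([1.0, 2.5, 0.7, 3.1, 1.9, 2.2, 0.4, 1.3, 2.8, 0.9])
```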

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens, or even hundreds of milliseconds is common for synchrony terminology in CA1 pyramidal neurons (see Csicsvari et al., Neuron, 2000; Harris et al., Science, 2003; Malvache et al., Science, 2016; Yagi et al., Cell Reports, 2023). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it matches the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe that this timescale is relevant and in line with established practices in the field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Connelly and colleagues provide convincing genetic evidence that importation from mainland Tanzania is a major source of Plasmodium falciparum lineages currently circulating in Zanzibar. This study also reveals ongoing local malaria transmission and occasional near-clonal outbreaks in Zanzibar. Overall, this research highlights the role of human movements in maintaining residual malaria transmission in an area targeted for intensive control interventions over the past decades and provides valuable information for epidemiologists and public health professionals.

      Reviewer #1 (Public Review):

      Zanzibar archipelago is close to achieving malaria elimination, but despite the implementation of effective control measures, there is still a low-level seasonal malaria transmission. This could be due to the frequent importation of malaria from mainland Tanzania and Kenya, reservoirs of asymptomatic infections, and competent vectors. To investigate population structure and gene flow of P. falciparum in Zanzibar and mainland Tanzania, they used 178 samples from mainland Tanzania and 213 from Zanzibar that were previously sequenced using molecular inversion probes (MIPs) panels targeting single nucleotide polymorphisms (SNPs). They performed Principal Component Analysis (PCA) and identity by descent (IBD) analysis to assess genetic relatedness between isolates. Parasites from coastal mainland Tanzania contribute to the genetic diversity in the parasite population in Zanzibar. Despite this, there is a pattern of isolation by distance and microstructure within the archipelago, and evidence of local sharing of highly related strains sustaining malaria transmission in Zanzibar that are important targets for interventions such as mass drug administration and vector control, in addition to measures against imported malaria.

      Strengths:

      This study presents important samples to understand population structure and gene flow between mainland Tanzania and Zanzibar, especially from the rural Bagamoyo District, where malaria transmission persists and there is a major port of entry to Zanzibar. In addition, this study includes a larger set of SNPs, providing more robustness for analyses such as PCA and IBD. Therefore, the conclusions of this paper are well supported by data.

      Weaknesses:

      Some points need to be clarified:

      (1) SNPs in linkage disequilibrium (LD) can introduce bias in PCA and IBD analysis. Were SNPs in LD filtered out prior to these analyses?

      Thank you for this point. We did not filter SNPs in LD prior to this analysis. In the PCA analysis in Figure 1, we did restrict to a single isolate among those that were clonal (high IBD values) to prevent bias in the PCA. In general, in the absence of selective forces, linkage disequilibrium extends only over small distances (<5-10 kb). This is much less than the average spacing of the markers in the panel. With minimal LD, the conclusions drawn on relative levels and connections at high IBD are unlikely to be confounded by any effects of disequilibrium.

      (2) Many IBD algorithms do not handle polyclonal infections well, despite an increasing number of algorithms that are able to handle polyclonal infections and multiallelic SNPs. How were polyclonal samples handled for IBD analysis?

      Thank you for this point. We added lines 157-161 to clarify. This section now reads:

      “To investigate genetic relatedness of parasites across regions, identity by descent (IBD) estimates were assessed using the within sample major alleles (coercing samples to monoclonal by calling the dominant allele at each locus) and estimated utilizing a maximum likelihood approach using the inbreeding_mle function from the MIPanalyzer package (Verity et al., 2020). This approach has previously been validated as a conservative estimate of IBD (Verity et al., 2020).”

      Please see the supplement in (Verity et al., 2020) for an extensive simulation study that validates this approach.
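
      The coercion step quoted above (calling the dominant allele at each locus) can be illustrated with a short sketch. This is not the MIPanalyzer implementation and does not reproduce the maximum likelihood IBD estimate; it only shows how within-sample allele frequencies might be reduced to major-allele calls. The tie rule at exactly 0.5 and the use of `None` for missing loci are assumptions made for the example.

```python
def major_allele_calls(wsaf):
    """Coerce within-sample non-reference allele frequencies to major-allele calls.

    Returns 1 where the non-reference allele dominates, 0 where the reference
    allele dominates, and None where the locus is missing. A frequency of
    exactly 0.5 is called as reference here, which is an arbitrary choice.
    """
    calls = []
    for freq in wsaf:
        if freq is None:
            calls.append(None)
        else:
            calls.append(1 if freq > 0.5 else 0)
    return calls

# A polyclonal sample: mixed calls at the second and third loci.
calls = major_allele_calls([0.0, 0.2, 0.9, None, 1.0])
```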

      Reviewer #1 (Recommendations For The Authors):

      (3) I think Supplementary Figures 8 and 9 are more visually informative than Figure 2.

      Thank you for your comment. We performed the analysis in Figure 2 to show how IBD varies between different regions and that it is higher within a region than between regions.

      Reviewer #2 (Public Review):

      This manuscript describes P. falciparum population structure in Zanzibar and mainland Tanzania. 282 samples were typed using molecular inversion probes. The manuscript is overall well-written and shows a clear population structure. It follows a similar manuscript published earlier this year, which typed a similar number of samples collected mostly in the same sites around the same time. The current manuscript extends this work by including a large number of samples from coastal Tanzania, and by including clinical samples, allowing for a comparison with asymptomatic samples.

      The two studies made overall very similar findings, including strong small-scale population structure, related infections on Zanzibar and the mainland, near-clonal expansion on Pemba, and frequency of markers of drug resistance. Despite these similarities, the previous study is mentioned a single time in the discussion (in contrast, the previous research from the authors of the current study is more thoroughly discussed). The authors missed an opportunity here to highlight the similar findings of the two studies.

      Thank you for your insights. We appreciated the level of detail in your review, which strengthened our work. We have added sentences on lines 292-295, which now read:

      “A recent study investigating population structure in Zanzibar also found local population microstructure in Pemba (Holzschuh et al., 2023). Further, both studies found near-clonal parasites within the same district, Micheweni, and found population microstructure over Zanzibar.”

      Strengths:

      The overall results show a clear pattern of population structure. The finding of highly related infections detected in close proximity shows local transmission and can possibly be leveraged for targeted control.

      Weaknesses:

      A number of points need clarification:

      (1) It is overall quite challenging to keep track of the number of samples analyzed. I believe the number of samples used to study population structure was 282 (line 141), thus this number should be included in the abstract rather than 391. It is unclear where the number 232 on line 205 comes from; I failed to deduce this number from supplementary table 1.

      Thank you for this point. We have included 282 instead of 391 in the abstract. We added a statement in the results at lines 203-205 to clarify this point, which now reads:

      “PCA analysis of 232 coastal Tanzanian and Zanzibari isolates, after pruning 51 samples with an IBD of greater than 0.9 to one representative sample, demonstrates little population differentiation (Figure 1A).”
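
      One way to prune highly related isolates to a single representative is to group samples that are linked by any pairwise IBD above the threshold and keep one sample per group. The sketch below does this with a union-find structure. It is an assumption about how such pruning could be done, not a description of the authors' code; the sample names and IBD values are invented, and keeping the alphabetically first sample in each group is an arbitrary rule.

```python
def prune_related(sample_ids, pairwise_ibd, threshold=0.9):
    """Keep one representative per group of samples linked by IBD > threshold."""
    parent = {s: s for s in sample_ids}

    def find(x):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ibd in pairwise_ibd.items():
        if ibd > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # The smaller ID becomes the root, so it ends up as the representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return sorted({find(s) for s in sample_ids})

# Hypothetical example: A, B and C are near-clonal; D and E are unrelated.
kept = prune_related(
    ["A", "B", "C", "D", "E"],
    {("A", "B"): 0.95, ("B", "C"): 0.92, ("D", "E"): 0.30},
)
```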

      (2) Also, Table 1 and Supplementary Table 1 should be swapped. It is more important for the reader to know the number of samples included in the analysis (as given in Supplementary Table 1) than the number collected. Possibly, the two tables could be combined in a clever way.

      Thank you for this advice. Rather than switch to another table altogether, we appended two columns to the original table to better portray the information (see Table 1).

      Methods

      (3) The authors took the somewhat unusual decision to apply K-means clustering to GPS coordinates to determine how to combine their data into a cluster. There is an obvious cluster on Pemba islands and three clusters on Unguja. Based on the map, I assume that one of these three clusters is mostly urban, while the other two are more rural. It would be helpful to have a bit more information about that in the methods. See also comments on maps in Figures 1 and 2 below.

      Cluster 3 is a mix of rural and urban areas, while clusters 2, 4 and 5 are mostly rural. This analysis was performed to see how IBD changes in relation to local context within different regions in Zanzibar, showing that IBD is higher within a locale than between locales.

      (4) Following this point, in Supplemental Figure 5 I fail to see an inflection point at K=4. If there is one, it will be so weak that it is hardly informative. I think selecting 4 clusters in Zanzibar is fine, but the justification based on this figure is unclear.

      The K-means clustering experiment was used to cluster a continuous space of geographic coordinates in order to compare genetic relatedness in different regions. We selected this inflection point based on the elbow plot and chose the number of clusters so as to obtain sufficient subsections of Zanzibar to compare genetic relatedness. This point is added to the methods at lines 174-178, which now reads:

      “The K-means clustering experiment was used to cluster a continuous space of geographic coordinates in order to compare genetic relatedness in different regions. We selected K = 4 as the inflection point based on the elbow plot (Supplemental Figure 5) and based the number to obtain sufficient subsections of Zanzibar to compare genetic relatedness.”
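
      To make the elbow criterion concrete, the sketch below runs a plain k-means (Lloyd's algorithm) on two-dimensional coordinates and returns the within-cluster sum of squares (WCSS) for a given K. It is an illustration under stated assumptions, not the authors' analysis: the points are synthetic, distances are Euclidean on the raw coordinates, and a deterministic farthest-point initialization is used so that the example is reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_wcss(points, k, n_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        new_centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Synthetic coordinates: four tight groups of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (0, 10), (10, 0), (10, 10)]
          for dx, dy in offsets]
elbow_curve = {k: kmeans_wcss(points, k) for k in range(1, 7)}
```

      Plotting `elbow_curve` against K gives the elbow plot; the inflection is the K beyond which WCSS stops dropping sharply. In real data the bend can be weak, as the reviewer notes, in which case the choice of K also rests on practical considerations such as the number of subsections needed for comparison.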

      (5) For the drug resistance loci, it is stated that "we further removed SNPs with less than 0.005 population frequency." Was the denominator for this analysis the entire population, or were Zanzibar and mainland samples assessed separately? If the latter, as for all markers <200 samples were typed per site, there could not be a meaningful way of applying this threshold. Given data were available for 200-300 samples for each marker, does this simply mean that each SNP needed to be present twice?

      Population frequency is calculated as the average within-sample allele frequency across all individuals in the population, which is an unbiased estimator. Within-sample allele frequency can range from 0 to 1. Thus, in a population of 100 samples, if only one sample carries an allele and it is at 0.1 within-sample frequency, the population allele frequency would be 0.1/100 = 0.001. This allele is removed by the 0.005 threshold even though it corresponds to a prevalence of 1/100 = 0.01. This filtering is applied prior to any final summary frequency or prevalence calculations (see MIP variant Calling and Filtering section in the methods). This protects against errors occurring only at low frequency.
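
      The arithmetic in this reply can be checked with a short sketch. The function names are hypothetical, and the data reproduce the example in the text (one of 100 samples carrying the allele at a within-sample frequency of 0.1).

```python
def population_frequency(wsaf_values):
    """Mean within-sample allele frequency across all samples."""
    return sum(wsaf_values) / len(wsaf_values)

def prevalence(wsaf_values):
    """Fraction of samples in which the allele is present at any frequency."""
    return sum(1 for f in wsaf_values if f > 0) / len(wsaf_values)

def passes_filter(wsaf_values, min_population_frequency=0.005):
    """True when the SNP is retained by the population-frequency threshold."""
    return population_frequency(wsaf_values) >= min_population_frequency

wsaf = [0.1] + [0.0] * 99
freq = population_frequency(wsaf)
prev = prevalence(wsaf)
kept = passes_filter(wsaf)
```

      Here the population frequency is 0.001 and the prevalence is 0.01, so the SNP is dropped by the 0.005 threshold even though one sample in a hundred carries it.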

      Discussion:

      (6) I was a bit surprised to read the following statement, given Zanzibar is one of the few places that has an effective reactive case detection program in place: "Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020)." I think the current RACD program should be mentioned and referenced. A number of studies have investigated this program.

      Thank you for this point. We have added additional context and clarification on lines 275-280, which now reads:

      “Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020). Currently, a reactive case detection program within index case households is being implemented, but local transmission continues and further investigation into how best to control this is warranted (Mkali et al. 2023).”

      (7) The discussion states that "In Zanzibar, we see this both within and between shehias, suggesting that parasite gene flow occurs over both short and long distances." I think the term 'long distances' should be better defined. Figure 4 shows that highly related infections rarely span beyond 20-30 km. In many epidemiological studies, this would still be considered short distances.

      Thank you for this point. We have edited the text at lines 287-288 to indicate that highly related parasites mainly occur at the range of 20-30km, which now reads:

      “In Zanzibar, highly related parasites mainly occur at the range of 20-30km.”

      (8) Lines 330-331: "Polymorphisms associated with artemisinin resistance did not appear in this population." Do you refer to background mutations here? Otherwise, the sentence seems to repeat lines 324. Please clarify.

      We are referring to the list of Pfk13 polymorphisms stated in the Methods from lines 146-148. We added clarifying text on lines 326-329:

      “Although polymorphisms associated with artemisinin resistance did not appear in this population, continued surveillance is warranted given emergence of these mutations in East Africa and reports of rare resistance mutations on the coast consistent with spread of emerging Pfk13 mutations (Moser et al., 2021).”

      (9) Line 344: The opinion paper by Bousema et al. in 2012 was followed by a field trial in Kenya (Bousema et al, 2016) that found that targeting hotspots did NOT have an impact beyond the actual hotspot. This (and other) more recent finding needs to be considered when arguing for hotspot-targeted interventions in Zanzibar.

      We added a clarification on this point on lines 335-345, which now reads:

      “A recent study identified “hotspot” shehias, defined as areas with comparatively higher malaria transmission than other shehias, near the port of Zanzibar town and in northern Pemba (Bisanzio et al., 2023). These regions overlapped with shehias in this study with high levels of IBD, especially in northern Pemba (Figure 4). These areas of substructure represent parasites that differentiated in relative isolation and are thus important locales to target intervention to interrupt local transmission (Bousema et al., 2012). While a field cluster-randomized control trial in Kenya targeting these hotspots did not confer much reduction of malaria outside of the hotspot (Bousema et al. 2016), if areas are isolated pockets, which genetic differentiation can help determine, targeted interventions in these areas are likely needed, potentially through both mass drug administration and vector control (Morris et al., 2018; Okell et al., 2011). Such strategies and measures preventing imported malaria could accelerate progress towards zero malaria in Zanzibar.”

      Figures and Tables:

      (10) Table 2: Why not enter '0' if a mutation was not detected? 'ND' is somewhat confusing, as the prevalence is indeed 0%.

      Thank you for this point. We now report zero where a mutation was not detected and also provide confidence intervals for more detail.

      (11) Figure 1: Panel A is very hard to read. I don't think there is a meaningful way to display a 3D-panel in 2D. Two panels showing PC1 vs. PC2 and PC1 vs. PC3 would be better. I also believe the legend 'PC2' is placed in the wrong position (along the Y-axis of panel 2).

      Supplementary Figure 2B suffers from the same issue.

      Thank you for your comment. A revised Figure 1 and Supplemental Figure 2 are included, where there are separate plots for PC1 vs. PC2 and PC1 vs. PC3.

      (12) The maps for Figures 1 and 2 don't correspond. Assuming Kati represents cluster 4 in Figure 2, the name is put in the wrong position. If the grouping of shehias is different between the Figures, please add an explanation of why this is.

      Thank you for this point. The districts with at least 5 samples present are plotted in the map in Figure 1B. In Figure 2, a totally separate analysis was performed, where all shehias were clustered into separate groups with k-means and the IBD values were compared between these clusters. These maps are not supposed to match, as they are separate analyses. Figure 1B is at the district level and Figure 2 is clustering shehias throughout Zanzibar.

      The figure legend of Figure 1B on lines 410-414 now reads:

      “B) A Discriminant Analysis of Principal Components (DAPC) was performed utilizing isolates with unique pseudohaplotypes, pruning highly related isolates to a single representative infection. Districts were included with at least 5 isolates remaining to have sufficient samples for the DAPC. For plotting the inset map, the district coordinates (e.g. Mainland, Kati, etc.) are calculated from the averages of the shehia centroids within each district.”

      The figure legend of Figure 2 on lines 417-425 now reads:

      “Figure 2. Coastal Tanzania and Zanzibari parasites have more highly related pairs within their given region than between regions. K-means clustering of shehia coordinates was performed using geographic coordinates all shehias present from the sample population to generate 5 clusters (colored boxes). All shehias were included to assay pairwise IBD between differences throughout Zanzibar. Pairwise comparisons of within cluster IBD (column 1 of IBD distribution plots) and between cluster IBD (column 2-5 of IBD distribution plots) was done for all clusters. In general, within cluster IBD had more pairwise comparisons containing high IBD identity.”

      (13) Figure 2: In the main panel, please clarify what the lines indicate (median and quartiles?). It is very difficult to see anything except the outliers. I wonder whether another way of displaying these data would be clearer. Maybe a table with medians and confidence intervals would be better (or that data could be added to the plots). The current plots might be misleading as they are dominated by outliers.

      Thank you for this point; it greatly improved the figure. We now use a beeswarm plot, which shows all pairwise IBD values within each comparison group.

      (14) In the insert, the cluster number should not only be given as a color code but also added to the map. The current version will be impossible to read for people with color vision impairment, and it is confusing for any reader as the numbers don't appear to follow any logic (e.g. north to south).

      Thank you very much for these considerations. We changed the color coding to a color-blind-friendly palette and renamed the clusters to more informative names: Pemba, Unguja North (Unguja_N), Unguja Central (Unguja_C), Unguja South (Unguja_S) and mainland Tanzania (Mainland).

      (15) The legend for Figure 3 is difficult to follow. I do not understand what the difference in binning was in panels A and B compared to C.

      Thank you for this point. We have edited the legend to reflect these changes. The legend for Figure 3 on lines 427-433 now reads:

      “Figure 3. Isolation by distance is shown between all Zanzibari parasites (A), only Unguja parasites (B) and only Pemba parasites (C). Samples were analyzed based on geographic location, Zanzibar (N=136) (A), Unguja (N=105) (B) or Pemba (N=31) (C) and greater circle (GC) distances between pairs of parasite isolates were calculated based on shehia centroid coordinates. These distances were binned at 4km increments out to 12 km. IBD beyond 12km is shown in Supplemental Figure 8. The maximum GC distance for all of Zanzibar was 135km, 58km on Unguja and 12km on Pemba. The mean IBD and 95% CI is plotted for each bin.”
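
      The distance binning in this legend can be sketched as follows: compute the great-circle distance between two centroids with the haversine formula, assign each pair to a 4 km bin out to 12 km, and average IBD within each bin. This is a minimal example, not the authors' code. The Earth radius of 6371 km, the bin edge convention (lower edge inclusive, 12 km included in the last bin), and the example pairs are assumptions.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))

def bin_index(distance_km, width_km=4.0, max_km=12.0):
    """Return 0, 1 or 2 for the bins [0,4), [4,8), [8,12]; None beyond max_km."""
    if distance_km > max_km:
        return None
    last_bin = int(max_km // width_km) - 1
    return min(int(distance_km // width_km), last_bin)

def mean_ibd_by_bin(pairs):
    """pairs: iterable of (distance_km, ibd). Returns {bin index: mean IBD}."""
    values = {}
    for distance_km, ibd in pairs:
        b = bin_index(distance_km)
        if b is not None:
            values.setdefault(b, []).append(ibd)
    return {b: sum(v) / len(v) for b, v in sorted(values.items())}

# Hypothetical pairs of (distance in km, pairwise IBD).
binned = mean_ibd_by_bin([(1.0, 0.4), (2.0, 0.2), (5.0, 0.1), (20.0, 0.9)])
```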

      (16) Font sizes for panel C differ, and it is not aligned with the other panels.

      Thank you for pointing this out. Figure 3 and Supplemental Figure 10 are adjusted with matching formatting for each plot.

      (17) Why is Kusini included in Supplemental Figure 4, but not in Figure 1?

      In Supplemental Figure 4, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection. That is why there are additional isolates in Kusini. The legend for Supplemental Figure 4 now reads:

      “Supplemental Figure 4. PCA with highly related samples shows population stratification radiating from coastal Mainland to Zanzibar. PCA of 282 total samples was performed using whole sample allele frequency (A) and DAPC was performed after retaining samples with unique pseudohaplotypes in districts that had 5 or more samples present (B). As opposed to Figure 1, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection.”

      (18) Supplemental Figures 6 and 7: What does the width of the line indicate?

      The sentence below was added to the figure legends of Supplemental Figures 6 and 7 and the legends of each network plot were increased in size:

      “The width of each line represents higher magnitudes of IBD between pairs.”

      (19) What was the motivation not to put these lines on the map, as in Figure 4A? This might make it easier to interpret the data.

      Thank you for this comment. For Supplemental Figures 8 and 9, we omitted the lines representing lower pairwise IBD in order to draw the reader's attention to the highly related pairs between and within shehias.

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a rather long paragraph (lines 300-323) on COI of asymptomatic infections and their genetic structure. Given that the current study did not investigate most of the hypotheses raised there (e.g. immunity, expression of variant genes), and the overall limited number of asymptomatic samples typed, this part of the discussion feels long and often speculative.

      Thank you for your perspective. The key sections highlighted in this comment, regarding immunity and expression of variant genes, were shortened. This section on lines 300-303 now reads:

      “Asymptomatic parasitemia has been shown to be common in falciparum malaria around the globe and has been shown to have increasing importance in Zanzibar (Lindblade et al., 2013; Morris et al., 2015). What underlies the biology and prevalence of asymptomatic parasitemia in very low transmission settings where anti-parasite immunity is not expected to be prevalent remains unclear (Björkman & Morris, 2020).”

      (2) As a detail, line 304 mentions "few previous studies" but only one is cited. Are there studies that investigated this and found opposite results?

      Thank you for this comment. We added additional studies that did not find an association between clinical disease and COI. These changes are on lines 303-308, which now reads:

      “Similar to a few previous studies, we found that asymptomatic infections had a higher COI than symptomatic infections across both the coastal mainland and Zanzibar parasite populations (Collins et al., 2022; Kimenyi et al., 2022; Sarah-Matio et al., 2022). Other studies have found lower COI in severe vs. mild malaria cases (Robert et al., 1996) or no significant difference between COI based on clinical status (Earland et al. 2019; Lagnika et al. 2022; Conway et al. 1991; Kun et al. 1998; Tanabe et al. 2015)”

      (3) Table 2: Percentages need to be checked. To take one of several examples, for Pfk13-K189N a frequency of 0.019 for the mutant allele is given among 137 samples. 2/137 equals to 0.015, and 3/137 to 0.022. 0.019 cannot be achieved. The same is true for several other markers. Possibly, it can be explained by the presence of polyclonal infections. If so, it should be clarified what the total of clones sequenced was, and whether the prevalence is calculated with the number of samples or number of clones as the denominator.

      Thank you for this point. We mistakenly reported allele frequency instead of prevalence. An updated Table 2 is now in the manuscript. The method for calculating the prevalence is now at lines 148-151:

      “Prevalence was calculated separately in Zanzibar or mainland Tanzania for each polymorphism by the number of samples with alternative genotype calls for this polymorphism over the total number of samples genotyped and an exact 95% confidence interval was calculated using the Pearson-Klopper method for each prevalence.”
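
      The reviewer's arithmetic and the exact interval named in the revised methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and the exact (Clopper-Pearson) interval is found here by bisection on the binomial distribution, using only the standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iterations=100):
    """Bisection on [0, 1] for an increasing function g; returns p with g(p) ~ target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prevalence(k, n):
    """Prevalence = samples carrying the alternative genotype / samples genotyped."""
    return k / n


def exact_ci(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k positives out of n samples."""
    # Lower bound: p at which P(X >= k) equals alpha / 2 (0 when k == 0).
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: p at which P(X <= k) equals alpha / 2 (1 when k == n).
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


# With samples as the denominator, no whole number of samples out of 137
# rounds to 0.019, which is the inconsistency the reviewer pointed out.
achievable = {round(prevalence(k, 137), 3) for k in range(138)}
print(0.019 in achievable)  # False
print(round(prevalence(2, 137), 3), round(prevalence(3, 137), 3))  # 0.015 0.022
print(exact_ci(2, 137))
```

      Using the number of samples as the denominator makes every reported prevalence a ratio of whole numbers, so readers can check the values in Table 2 directly.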

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Granados-Aparici et al. investigate somatic-germline interactions in female mice. Mammalian oocytes are nurtured in multi-cellular ovarian follicles and communication with surrounding somatic cells is critical for oocyte development. This study focused on transzonal projections (TZP) extending from granulosa cells to the surface of oocytes and documented the importance of SMAD4, a TGF-β mediator, in regulating the TZPs. They propose a model in which individual TZPs contact the surface of the oocyte and stably attach if there is sufficient N-cadherin. In SMAD4-depleted cells, there is insufficient N-cadherin to stabilize the attachment. The TZP continues to elongate but eventually retracts. Their model is well supported by their experimental evidence and the manuscript is both well-formulated and written.

      Reviewer #2 (Public Review):

      Summary:

      This study proposed a new mechanism by which the TGF-beta signaling pathway promotes contacts between oocytes and the surrounding somatic cells in mice, by regulating the numbers of transzonal projections (TZPs).

      Strengths:

      The conditional Smad4 knockout and three-dimensional observation of transzonal projections are solid and sufficiently support the major conclusions.

      Weaknesses:

      The physiological significance of SMAD4-dependent formation of transzonal projection networks is not assessed in this study.

      Previous studies have shown that physical contact and gap junctional communication with the granulosa cells is essential for normal oocyte development. A recent study has also shown that depleting Myo10 in granulosa cells reduces the number of TZPs and leads to abnormalities in oocyte and embryo development. Thus, the importance of TZPs is well-established. These findings, which were insufficiently brought out in the Introduction of the original manuscript, are now presented more clearly (Introduction, 2nd paragraph). We recognize that these reports do not directly test a role for SMAD4-dependent TZPs. Unfortunately, it is beyond our technical capacity to obtain embryos following meiotic maturation and fertilization of oocytes that have grown in vitro, which would be necessary for us to fully test the physiological role of SMAD4-dependent TZPs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors switch from Amhr2-cre to ER-cre to increase the number of GFP-positive granulosa cells in 12 d/o ovaries. To avoid disruption of FSH secretion by SMAD4, they use an in vitro model that requires 6 days in GEO culture (1 d tamoxifen + 5 d). Could it be that Amhr2-cre didn't work because most follicles would not have reached the atretic preantral stage in 12 d/o ovaries? Did the authors consider 6 days in vitro GEO culture to determine if Amhr2-cre would be efficient and avoid exposure to tamoxifen?

      Please see below.

      When is Amhr2 expressed?

      Previous studies (Jorgez et al, 2004; Pangas et al, 2006) report that Amhr2 is expressed in growing follicles that have progressed beyond a single layer of granulosa cells (often defined as secondary and primary follicles, respectively). As shown in Fig. 1C, we did not observe evidence of widespread Cre activity in multilayer follicles. At least two factors may contribute to the relatively weak Cre activity that we observed. One possibility is that, on the genetic background of our mice, Amhr2 is expressed relatively late during follicular growth. Thus, we might have observed more GFP-positive granulosa cells in antral or pre-ovulatory follicles. Because the granulosa cells of these late-stage follicles would already have produced many TZPs, the number of new TZPs generated in wild-type but not SMAD4-depleted cells after Amhr2 activation would be a relatively small proportion of the total population. This would make it more difficult to detect a reduction in TZP number in the absence of SMAD4.

      A second point is that we used prepuberal mice whereas Jorgez et al examined Amhr2 expression in ovaries of adult mice. Pangas et al evaluated both prepuberal and adult females. It may be that Amhr2 is expressed earlier or more strongly in granulosa cells of adult mice. The suggestion to culture complexes obtained from mice on the Amhr2-Cre background, which might allow widespread expression of Cre without the need for tamoxifen, is an excellent idea. If there is considerable heterogeneity among cells in the timing of Amhr2-Cre activity, though, this may further cloud efforts to uncover the role of SMAD4 in the production or stability of TZPs, as noted above.

      (2) Did most of the GEO cultured in vitro reach the antral follicle stage after 6 days?

      Since GOCs were treated with collagenase, the thecal layer was removed. Therefore, development of an antrum does not occur. We observed that, in some cases, the oocyte was extruded from the granulosa cell mass. These abnormal complexes were discarded.

      (3) Was the development/diameter of the oocyte in the GEO comparable to the oocyte growing in vivo?

      We did not compare the diameter of the oocytes grown in vitro to those grown in vivo. Thus, we cannot say whether the oocytes grown in vitro reached the same size as those grown in vivo. We did, however, compare the diameter of the oocytes in the wt and ko groups and observed no difference (Figure 2). This indicates that depletion of SMAD4 in the granulosa cells does not impair oocyte growth. Importantly for our studies, it excludes the possibility that the reduction in TZP-number is simply due to a smaller surface area of the oocyte.

      (4) SMAD4 depletion in granulosa cells disrupts steroidogenesis leading to increased progesterone levels and precocious luteinization of granulosa cells (Pangas et al., 2006). Did the authors determine the expression level of luteal markers of granulosa cells in the in vitro GEO culture Smad4 knockout model? Are their observations direct effects of the absence of SMAD4?

      This is an excellent point. We checked our previously performed RNA-seq analysis of the wild-type and knockout granulosa cells, but found no difference in the quantities of Cyp11a1, Sfrp4, Star or Ptgfr. This is now described in the Discussion (4th paragraph). One potentially important difference between our study and that of Pangas et al (2006) is that they observed premature luteinization when prepuberal (3-week old) mice were injected with the FSH analogue, equine serum gonadotropin, whereas we studied granulosa-oocyte complexes cultured in vitro. This could underlie the apparent differences with respect to luteinization.

      (5) Could the reduced number of TZPs in ER-cre+; Smad4fl/fl GOCs be explained by luteinization?

      This interesting and logical possibility is related to the previous point. In other words, luteinization could be considered as a default pathway of differentiation that is suppressed by SMAD signaling. It is possible that luteinized cells are unable to generate or maintain TZPs. This model offers a potential mechanistic basis for our observation, and we now raise it in the Discussion (3rd paragraph).

      Reviewer #3 (Recommendations For The Authors):

      The expression and localization of N-cadherin should be observed in Smad4 and control granulosa cell-oocyte complexes.

      We agree that this would be an excellent approach to confirm the decreased expression of N-cadherin in the granulosa cells that was observed by immunoblotting. We were confronted by two challenges, however. First, we were unable to consistently obtain strong staining of granulosa cell membranes in the inner layers of multilayer granulosa-oocyte complexes. Other antibodies are able to stain structures at the oocyte surface, indicating that antibodies are not physically blocked from penetrating the complex. More likely, the anti-N-cadherin does not bind its target strongly enough to generate a robust signal that can be detected through multiple overlying layers of cells. Second, whereas for immunoblotting we collect all granulosa cells from culture complexes, for immunofluorescence we are only able to examine those that remain in the complex. This means that, for immunofluorescence, we essentially but unavoidably select against cells that are only loosely attached – as would be expected for N-cadherin-deficient cells – to their neighbours. Given these challenges, we believe that the immunoblotting approach, which produced highly reproducible results over six biological replicates (Fig. 6), is the most reliable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents useful findings regarding the role of formin-like 2 in mouse oocyte meiosis. The submitted data are supported by incomplete analyses, and in some cases, the conclusions are overstated. If these concerns are addressed, this paper would be of interest to reproductive biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The presented study focuses on the role of formin-like 2 (FMNL2) in oocyte meiosis. The authors assessed FMNL2 expression and localization in different meiotic stages and subsequently, by using siRNA, investigated the role of FMNL2 in spindle migration, polar body extrusion, and distribution of mitochondria and endoplasmic reticulum (ER) in mouse oocytes.

      Strengths:

      Novelty in assessing the role of formin-like 2 in oocyte meiosis.

      Weaknesses:

      Methods are not properly described.

      Overstating presented data.

      It is not clear what statistical tests were used.

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section are not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage.

      Thank you for your comment. We regret the oversight of omitting critical information in the manuscript. In the revised manuscript, we have included essential details such as mouse strains, culture media, oocyte stages and statistical methods in the Materials and Methods section. Please find our detailed responses in the “Recommendations for the authors” part.

      Reviewer #2 (Public Review):

      Summary:

      This research involves conducting experiments to determine the role of Fmnl2 during oocyte meiosis I.

      Strengths:

      Identifying the role of Fmnl2 during oocyte meiosis I is significant.

      Weaknesses:

      The quantitative analysis and the used approach to perturb FMNL2 function are currently incomplete and would benefit from more confirmatory approaches and rigorous analysis.

      (1) Most of the results are expected. The new finding here is that FMNL2 regulates cytoplasmic F-actin in mouse oocytes, which is also expected given the role of FMNL2 in other cell types. Given that FMNL2 regulates cytoplasmic F-actin, it is very expected to see all the observed phenotypes. It is already established that F-actin is required for spindle migration to the oocyte cortex, extruding a small polar body and normal organelle distribution and functions.

      Thank you for your comment. In recent decades, the Arp2/3 complex (Nat Cell Biol 2011), Formin2 (Nat Cell Biol 2002, Nat Commun 2020), and Spire (Curr Biol 2011) have been reported as three key factors involved in this process. These factors regulate actin filaments in different ways. However, how they interact with each other during these subcellular events was still not fully clear. Our current study identified that FMNL2 plays a critical role in coordinating these molecules for actin assembly in oocytes. Our findings demonstrate that FMNL2 interacts with both the Arp2/3 complex and Formin2 to facilitate actin-based meiotic spindle migration. Additionally, we discovered a novel role for FMNL2 in determining the distribution and function of the endoplasmic reticulum and mitochondria, which may in turn influence meiotic spindle migration in oocytes. Our results not only uncover novel functions of FMNL2-mediated actin in organelle distribution, but also extend our understanding of the molecular basis for the unique meiotic spindle migration in oocyte meiosis.

      (2) The authors used Fmnl2 cRNA to rescue the effect of siRNA-mediated knockdown of Fmnl2. It is not clear how this works. It is expected that the siRNA will also target the exogenous cRNA construct (which should have the same sequence as endogenous Fmnl2) especially when both of them were injected at the same time. Is this construct mutated to be resistant to the siRNA?

      Thank you for your question. We regret any misunderstanding caused by the unclear description in our manuscript. In the rescue experiments, we first injected FMNL2 siRNA into oocytes and then microinjected FMNL2 mRNA 18-20 hours later. In our previous experiments, we verified by Western blotting that endogenous FMNL2 is effectively suppressed 18-20 hours after the microinjection of FMNL2 siRNA. Additionally, we observed a significant increase in exogenous FMNL2 protein expression 2 hours after the injection of FMNL2 mRNA. We believe that the exogenous FMNL2 can compensate for the decrease caused by FMNL2 knockdown; this approach has been adopted in many oocyte studies.

      (3) The authors used only one approach to knockdown FMNL2 which is by siRNA. Using an additional approach to inhibit FMNL2 would be beneficial to confirm that the effect of siRNA-mediated knockdown of FMNL2 is specific.

      Thank you for your question. Yes, specificity is always a concern for siRNA or morpholino microinjection because of off-target effects. Owing to technical limitations, we could not generate a knockout model, and there are no known inhibitors that specifically target FMNL2. To address this, we performed the rescue study with exogenous mRNA to confirm the effective knockdown of FMNL2. These measures provide reassurance regarding the credibility of the experimental outcomes, and this is also the standard way to control for off-target effects of siRNA or morpholino.

      Reviewer #3 (Public Review):

      Summary:

      The authors focus on the role of formin-like protein 2 in the mouse oocyte, which could play an important role in actin filament dynamics. The cytoskeleton is known to influence a number of cellular processes from transcription to cytokinesis. The results show that downregulation of FMNL2 affects spindle migration with resulting abnormalities in cytokinesis in oocyte meiosis I.

      Weaknesses:

      The overall description of methods and figures is overall dismissively poor. The description of the sample types and number of replicate experiments is impossible to interpret throughout, and the quantitative analysis methods are not adequately described. The number of data points presented is unconvincing and unlikely to support the conclusions. On the basis of the data presented, the conclusions appear to be preliminary, overstated, and therefore unconvincing.

      Thank you for your comment. We regret the oversight of omitting critical information in the manuscript. In the revised manuscript, we have incorporated your suggestions for modification, particularly regarding the Materials and Methods section. Please see the detailed revision and responses in the “Recommendations for the authors” part.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section is not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage. My specific comments are listed below.

      (1) Information about statistical tests that were used needs to be provided for all quantification experiments.

      Thank you for your suggestion. Based on your suggestions, we revised the statistical analysis description in the Materials and Methods section. Additionally, we also included a description of the statistical methods in the legends of the relevant result figures.

      (2) I recommend replacing the plunger plots, used in most quantification data, with alternatives allowing evaluation of the distribution of the data (dot plots, box plots, whisker plots).

      Thank you for your suggestion. Following your suggestion, we replaced the plunger plots in Fig 2C, D, H, I and Fig3 B, C with dot plots.

      (3) Can the authors provide information about particular time points when were individual oocyte stages (GVBD, meiosis I, and meiosis II) harvested/used for immunofluorescence protein detection, western blotting, microinjection, and ER and mitochondria staining? Were the time points always the same in all presented experiments and experimental vs control group? If not, this needs to be clarified.

      Thank you for your suggestion. We used oocytes in the metaphase I (MI) stage for the statistical analysis of spindle migration, actin filament aggregation, endoplasmic reticulum localization, and mitochondrial localization. In the Western blot analysis, GV stage oocytes were utilized to evaluate the efficiency of knockdown and rescue experiments. The protein expression levels of Arp2, Formin2, INF2, Cofilin, Grp78, and Chop in different treatment groups were detected using MI-stage oocytes. In the revised version, we provided all the detailed information about the stages.

      (4) Figure 1B: Can the authors comment on why a representative image of an MII oocyte stained with FMNL2-Ab is missing? I recommend including this in the figure to give a complete comparison of overexpressed and endogenous FMNL2 localization in oocyte meiosis.

      Thank you for your suggestion. In the revised manuscript, we added immunostaining images of FMNL2 antibody in MII stage oocytes.

      (5) Figure 1C: The figure legend says, "FMNL2 and actin overlapped in cortex and spindle surrounding". In MI oocytes, there is usually no accumulated actin signal around the spindle, which is also true in the presented images, so there cannot be overlapping with the FMNL2 signal. The interpretation should be changed.

      We apologize for this inappropriate description that was used, and we deleted this sentence.

      (6) Figure 2B: What were the parameters of the "large" and "normal" polar bodies for performing the analysis?

      Thank you for your question. To assess the size of the polar body, we compared the diameter of the polar body with that of the oocyte. If the diameter of the polar body was less than 1/3 of the oocyte's diameter, we categorized it as a normal-sized polar body. Conversely, if the polar body's diameter exceeded 1/3 of the oocyte's diameter, we categorized it as a large polar body. We have included these details in the Results section of the manuscript.
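
      The size criterion described above can be written as a short Python sketch. The function name and the example diameters are hypothetical, and the response does not say how a ratio of exactly 1/3 was scored; the sketch assumes such a case counts as large.

```python
def classify_polar_body(pb_diameter, oocyte_diameter):
    """Classify a polar body by its diameter relative to the oocyte diameter.

    Both diameters must be in the same unit (for example, micrometres).
    """
    if pb_diameter <= 0 or oocyte_diameter <= 0:
        raise ValueError("diameters must be positive")
    ratio = pb_diameter / oocyte_diameter
    # Below 1/3 of the oocyte diameter: normal-sized polar body.
    # Assumption: a ratio of exactly 1/3 is treated as large.
    return "normal" if ratio < 1 / 3 else "large"


# Hypothetical measurements, for illustration only.
print(classify_polar_body(20, 80))  # normal
print(classify_polar_body(40, 80))  # large
```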

      (7) Figure 2F: Can the authors comment on what can be the second band in the rescue group?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. Compared with endogenous FMNL2, the exogenous protein is larger because of the added EGFP tag (approximately 27 kDa). Hence, in the Western blot bands of the rescue group, the upper band represents the expression of exogenous FMNL2-EGFP, while the lower band corresponds to the expression of endogenous FMNL2. We have provided annotations in the revised Figure 2F to clarify this.

      (8) Can the authors comment on the variability of PBE between 2C and 2H in the FMNL2-KD groups? In panel C, the PBE in the KD group was 59.5 ± 2.82%; in panel H, the PBE in the KD group was 48.34 ± 4.2%, and in the rescue group, the PBE was 62.62 ± 3.6%. The rescue group has a similar PBE rate as the KD group in panel C. How consistent was the FMNL2 knockdown across individual replicates? Can the authors provide more details on how the rescue experiment was performed?

      Thank you for your question. We believe that the difference in PBE between the FMNL2-KD groups in Figures 2C and 2H was due to the number of microinjections and the duration of in vitro arrest. The results shown in Figure 2C depict the outcome of a single injection of FMNL2 siRNA into GV stage oocytes, followed by 18 hours of in vitro arrest; the results shown in Figure 2H include an additional injection of FMNL2-EGFP mRNA followed by another 2 hours of arrest. Both the second round of microinjection and the extended period of in vitro arrest affect oocyte maturation rates.

      (9) Figure 2J and K: What groups were compared together? The used statistic needs to be properly described.

      Thank you for your question. The FMNL2-KD, FMNL3-KD, and FMNL2+3-KD groups were each compared to the Control group; therefore, a t-test was used for the analysis. We have provided explanations in the revised manuscript.

      (10) Figure 4B and C: Can the authors provide representative images without oversaturated actin signal?

      Thank you for your question. In the analysis of oocyte F-actin, F-actin is divided into cortical actin and cytoplasmic actin. Because of the contrast during imaging, the strong cortical actin signal interferes with the detection of cytoplasmic actin. It is therefore necessary to increase the scanning index, which overexposes the cortical actin signal but allows better observation of the cytoplasmic signal.

      (11) Figure 4G + 5H: Can the authors comment on why they used as a housekeeping gene actin instead of tubulin, which was used in the rest of the WB experiments?

      Thank you for your question. In most of the western blot experiments conducted in this study, we used tubulin as a housekeeping gene. However, because of delays in antibody delivery, we also used GAPDH and actin in some experiments. These housekeeping genes were all valid for the study.

      (12) Based on what parameters was ER considered normally or abnormally distributed, and what stages of oocytes were assessed?

      Thank you for your question. In this study, we used oocytes at the MI stage for the analysis of ER localization. At the MI stage, the ER localizes around the spindle, which is regarded as the typical localization pattern. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as showing aberrant positioning. We included relevant descriptions in the revised version of the manuscript.

      (13) Figure 5H: As a housekeeping gene was used actin - the quantification is labeled as a Grp78 to tubulin ratio.

      Thank you for pointing out the error. This is a label mistake and we corrected it.

      (14) Information about how JC-1 staining was done needs to be provided.

      Thank you for your careful reading. We included a description of JC-1 staining in the Materials and Methods section.

      (15) Line 231-232: "As shown in Figure 4A" - the text doesn't correspond to the figure.

      Thank you for pointing out the error. We revised this mistake in the revised manuscript by correcting "Fig3A" to "Fig4A."

      (16) Line 265: there is probably a missing word "Formin2".

      Thank you. We corrected the error and made the necessary changes in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      (1) Quantification and analysis:

      • Fig. 3B: The rate of spindle migration should be quantified based on the distance from the spindle to the cortex. Also, the orientation of the spindle (Z-position) needs to be taken into consideration.

      • Fig. 5C, D: It is unclear how the rate of ER distribution was calculated.

      • Western blot: In many experiments (such as Fig. 5H), the bands are saturated which will prevent accurate intensity measurements and quantifications.

      For spindle migration, we specifically focused on spindles exhibiting a distinctive spindle-like shape with clear bipolarity to eliminate any statistical discrepancies potentially caused by variations in Z-axis alignment. Our criterion for determining successful migration was based on the contact between the spindle pole and the cortical region of the oocyte. Therefore, we think that the rate reflects the phenotype better than the distance.

      Regarding the examination of ER localization, Reviewer 1 also raised this issue. We used oocytes at the MI stage in this study. At the MI stage, the ER localizes around the spindle. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as showing aberrant positioning. We included relevant descriptions in the revised version of the manuscript.

      For the bands of the western blot results, during the experimental procedure we typically capture multiple images at different exposure levels (3-5 images). In the revised manuscript, we have replaced the inappropriate images with more suitable ones.

      (2) Given that all Immunoprecipitation experiments in this manuscript were performed on the whole ovary which contains more somatic cells than oocytes, the results do not necessarily reflect meiotic oocytes. Please consider this possibility during the interpretation.

      Thank you for your suggestion. Yes, we agree with you. In the revised manuscript, we made appropriate modifications to the relevant descriptions.

      (3) 351-365: The conclusion that Arp2/3 compensates for the decreased formin 2 in FMNL2 knockdown oocytes is a bit unconvincing. 1- In mouse oocytes, it is already known that Arp2/3 and formin 2 regulate different pools of F-actin nucleation. 2- The authors found an increase in Arp2/3 in FMNL2 knockdown oocytes compared to control oocytes without any change in cortical F-actin. Given that Arp2/3 is primarily promoting cortical F-actin, it is expected to see an increase in cortical F-actin in FMNL2 knockdown oocytes, which was not the case.

      Thank you for your question. Yes, previous studies showed that Formin2 localizes to the cytoplasm of oocytes and accumulates around the spindle, where it facilitates cytoplasmic actin assembly, whereas Arp2/3 is primarily responsible for actin assembly at the cortex of oocytes. In invasive cells, FMNL2 is mainly localized at the leading edge of the cell and at lamellipodia and filopodia tips, where it improves cell migration in an actin-based manner (Curr Biol 2012). We showed that FMNL2 localized both at the spindle periphery and at the cortex, but depletion of FMNL2 did not affect cortical actin intensity. We think that FMNL2 and Arp2/3 both contribute to cortical actin dynamics: when FMNL2 decreased, ARP2 increased to compensate, which maintained the cortical actin level. In the revised manuscript, we have made modifications to avoid excessive extrapolation from our results, ensuring that our conclusions are presented in a more objective manner.

      (4) Lines 195-197: The spindle is initially formed soon after the GVBD, so there is no spindle during GVBD. Also, I can't see oocytes at anaphase I or telophase I in this figure. Please revise.

      Thank you for your suggestion. We apologize for the inappropriate descriptions that were used. In the revised manuscript, we have made modifications to the respective descriptions in the Results part.

      (5) Fig. 2E: It seems that the control oocyte is abnormal with mild cytokinesis defects. Please replace or delete it since this information is already included in Fig. 3A.

      Thank you for your suggestion. Based on our observations, during the extrusion of the first polar body in oocytes, there is a temporary occurrence of cellular morphological fragmentation due to cortical reorganization (11h in control oocyte from Fig 2E). However, after the extrusion of the first polar body, the oocyte morphology returns to normal. Figure 2E illustrates the meiotic division process of oocytes, while Figure 3A primarily focuses on the process of oocyte spindle migration. We think that it is better to retain both to present our results.

      Reviewer #3 (Recommendations for The Authors):

      In the case of the observed phenotype, the stage of GV is important. The phenotypes presented also occur in meiotic or developmentally incompetent oocytes. In addition, the images of GV oocytes appear as NSN, which also show the KD phenotype in Figs. 2 and 3.

      Thank you for your concern. As the oocyte grows, the proportion of SN-type oocytes gradually increases. When the oocyte diameter reaches 70-80 μm, the proportion of SN oocytes is approximately 52.7% (Mol Reprod Dev. 1995). In our study, oocytes in both the control and knockdown groups were collected at a diameter of around 80 μm, which is considered fully grown, and were predominantly in the SN phase. Since the collection period and size of the oocytes were consistent, we are confident that the differences observed between the control and knockdown groups in the phenotype analysis are solid and reliable.

      MII is absent in Fig. 1B.

      In the revised manuscript, we added immunostaining images of FMNL2 in MII stage oocytes.

      The result of KD is not convincing. Also, discuss whether the heterozygous effect of Fmnl2 deletion affects reproductive fitness.

      Thank you for your concern. Owing to limitations in setting up a knockout model, we employed siRNA to knock down FMNL2 expression. To address the risk of off-target effects, we performed a rescue experiment with exogenous mRNA, which we believe resolves this issue. When designing siRNA sequences, we ensured their specificity for binding to FMNL2 mRNA only, and we assessed the levels of FMNL2 and FMNL3 mRNA in oocytes after injection of FMNL2 siRNA. The results showed that, compared to the control group, the expression of FMNL2 mRNA decreased by approximately 70% 18 hours after FMNL2 siRNA injection, while the level of FMNL3 mRNA did not decrease.

      Fig. 2F rescue experiment with double bands. What bands are seen here? Did the authors inject tagged or untagged FMNL2? Or does endogenous FMNL2 appear higher in the sample after KD?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. As a result, the exogenous protein is larger than endogenous FMNL2 by approximately 27 kDa, the size added by the EGFP tag. Hence, in the Western blot bands of the rescue group, the upper band represents the expression of exogenous FMNL2-EGFP, while the lower band corresponds to the expression of endogenous FMNL2. We provided annotations in the revised Figure 2F to clarify this.

      Variability in mitochondria and ER distribution patterns is also known in healthy and developing oocytes, although the authors described only a single phenotype.

      Thank you for your concern. Yes, mitochondria and ER show dynamic localization at different stages of oocyte maturation. However, in this study we used MI-stage oocytes for the analysis of ER and mitochondria localization, and at the MI stage both the ER and mitochondria localize around the spindle. This pattern is considered the normal localization. Several studies showed that dispersed or clustered localization contributes to maturation defects. We included relevant descriptions in the revised manuscript.

      What exactly is meant by input in the IP experiments? Why is the target missing in the input sample?

      Thank you for your question. We subjected the input samples to electrophoresis on a single channel, and all the analyzed proteins demonstrated normal expression, thereby confirming the viability of the input sample. However, upon simultaneous exposure with the IP samples, we observed a lack of clear signal for certain proteins in the input group. This is due to the excessive signal intensity resulting from protein enrichment in the IP group, which left the proteins in the input group underexposed.

      Explain the rationale for using, actin or tubulin as loading or normalization controls in the study focusing on the cytoskeleton.

      Thank you for your question. Actin and tubulin are both widely used as controls because of their stable expression. Actin has α-actin and β-actin isoforms. Formins and the Arp2/3 complex regulate the polymerization of α-actin and β-actin into F-actin, not the expression of the isoforms. In our study, F-actin (the functional form) was examined. Likewise, α-tubulin and β-tubulin are two subtypes of tubulin that interact with each other to form stable α/β-tubulin heterodimers, and changes in cytoskeleton dynamics do not change the expression of α/β-tubulin. Therefore, β-actin and α-tubulin could be used as normalization controls.

      Fig. 6E shows only , but the legend says *.

      Thank you for pointing out the error. We have corrected the mistake in the revised manuscript.

      Spindle positioning appears to differ between control and KD. Does this affect the quantification of Fig. 6F? Adequate nomenclature should be used here.

      Thank you for your question. Yes, spindle positioning was affected by FMNL2 depletion. However, oocytes with a central spindle and those with a cortical spindle all belong to the MI stage, and JC1 is not related to this difference. To avoid misunderstanding, we replaced the representative images and the corresponding description in Figure 6F.

      The description of the methods and legends should be significantly improved.

      Thank you for your suggestion. Reviewers 1 and 2 raised a similar concern. We enriched the description of the methods and legends in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable new insights into HIV-associated nephropathy (HIVAN) kidney phenotype in the Tg26 transgenic mouse model and delineates the kidney cell types that express HIV genes and are injured in these HIV-transgenic mice. A series of compelling experiments demonstrated that PKR inhibition can ameliorate HIVAN with reversal of mitochondrial dysfunction (mainly confined to endothelial cells), a prominent feature shared in other kidney diseases. Although there are concerns regarding the specificity of C16 to PKR inhibition, as well as with the in situ hybridization studies, the data suggests that inhibition of PKR and mitochondrial dysfunction has potential clinical significance for HIVAN.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      HIV-associated nephropathy (HIVAN) is a rapidly progressing form of kidney disease that manifests secondary to untreated HIV infection, and is predominantly seen in individuals of African descent. Tg26 mice carrying an HIV transgene lacking gag and pol exhibit high levels of albuminuria and rapid decline in renal function that recapitulates many features of HIVAN in humans. HIVAN is seen predominantly in individuals carrying two copies of missense variants in the APOL1 gene, and the authors have previously shown that APOL1 risk variant mRNA induces activity of the double-strand RNA sensor kinase PKR. Because of the tight association between the APOL1 risk genotype and HIVAN, the authors hypothesized that PKR activation may mediate renal injury in Tg26 mice and tested this hypothesis by treating mice with a commonly used PKR inhibitory compound called C16. Treatment with C16 substantially attenuated renal damage in the Tg26 model as measured by urinary albumin/creatinine ratio, urinary NGAL/creatinine ratio, and improvement in histology. The authors then performed bulk and single-nucleus RNAseq on kidneys from mice from different treatment groups to identify pathways and patterns of cell injury associated with HIV transgene expression as well as to determine the mechanistic basis for the effect of C16 treatment. They show that proximal tubule nuclei from Tg26 mice appear to have more mitochondrial transcripts which was reversed by C16 treatment and suggest that this may provide evidence of mitochondrial dysfunction in this model. They explore this hypothesis by showing there is a decrease in the expression of nuclear-encoded genes and proteins involved in oxidative phosphorylation as well as a decrease in respiratory capacity via functional assessment of respiration in tubule and glomerular preparations from these mouse kidneys. All of these changes were reversed by C16 treatment. 
      The authors propose the existence of a novel injured proximal tubule cell-type characterized by the leak of mitochondrial transcripts into the nucleus (PT-Mito). Analysis of HIV transgene expression showed high level expression in podocytes, consistent with the pronounced albuminuria that characterizes this model and HIVAN, but transcripts were also detected in tubular and endothelial cells. Because of the absence of mitochondrial transcripts in the podocytes, the authors speculate that glomerular mitochondrial dysfunction in this model is driven by damage to glomerular endothelial cells.

      Strengths:

      The strengths of this study include the comprehensive transcriptional analysis of the Tg26 model, including an evaluation of HIV transgene expression, which has not been previously reported. This data highlights that HIV transcripts are expressed in a subset of podocytes, consistent with the highly proteinuric disease seen in mice and humans. However, transcripts were also seen in other tubular cells, notably intercalated cells, principal cells and injured proximal tubule cells. Though the podocyte expression makes sense, the relevance of the tubular expression to human disease is still an open question.

      The data in support of mitochondrial dysfunction are also robust and rely on combined evidence from downregulation of transcripts involved in oxidative phosphorylation, decreases in complex I and II as determined by immunoblot, and assessments of respiratory capacity in tubular and glomerular preparations. These data are largely consistent with other preclinical renal injury models reported in the literature as well as previous, less thorough assessments in the Tg26 model.

      Weaknesses:

      The key weakness of the study lies in the use of a PKR inhibitor with questionable specificity. C16 has been reported to inhibit numerous other kinases including cyclin CDKs and GSK3α and -β, and this means that the conclusions of this study with respect to the role of PKR are highly questionable. The rationale for the dose used was not provided (and is lower than used in other publications with C16), and in the absence of drug exposure data and assessment of target engagement, it is difficult to ascertain whether substantial inhibition of PKR was achieved.

      A second key weakness lies in the identification of the PT-Mito cell cluster. Though the authors provide some rationale for the identification of this specific cell type, it seems equally plausible the cells merely reflect a high background capture of mitochondria in a subset of droplets. The IHC analysis that was provided is not convincing enough to support the claim and more careful high resolution imaging and in situ hybridization (with appropriate quantitation) will be needed to provide substantive support for the presence of a proximal tubule cell type with mitochondrial transcripts that are trafficked to the nucleus.

      We appreciate the reviewer’s thoughtful summary.

      With regard to the non-specificity of C16, we added a description and supporting references to the Discussion, as suggested by the reviewer. Of note, the C16 doses that we used were also used previously (Okamoto, CommBiol, 2018). Importantly, newly-added immunofluorescence images using a phospho-PKR specific antibody showed PKR inhibition (Supplemental Figure 1).

      Identification of the PT-Mito cluster in tissues was challenging, mainly due to the absence of known marker genes for the newly-identified cluster. Finally, we added in situ hybridization images, with a negative control probe, to show specificity of the target probes.

      Reviewer #2 (Public Review):

      Summary:

      Numerous studies by the authors and other groups have demonstrated an important role for HIV gene expression in kidney cells in promoting progressive chronic kidney disease, especially HIV-associated nephropathy. The authors had previously demonstrated a role for protein kinase R (PKR) in a non-HIV transgenic model of kidney disease (Okamoto, Commun Bio, 2021). In this study, the authors used innovative techniques including bulk and single nuclear RNAseq to demonstrate that mice expressing a replication-incompetent HIV transgene have prominent dysregulation of mitochondrial gene expression and activation of PKR and that treatment of these mice with a small molecule PKR inhibitor ameliorated the kidney disease phenotype in HIV-transgenic mice. They also identified STAT3 as a key upstream regulator of kidney injury in this model, which is consistent with previously published studies. Other important advances include identifying the kidney cell types that express the HIV transgene and have dysregulation of cellular pathways.

      Strengths:

      Major strengths of the study include the use of a wide variety of state-of-the-art molecular techniques to generate important new data on the pathogenesis of kidney injury in this commonly used model of kidney disease and the identification of PKR as a potential druggable target for the treatment of HIV-induced kidney disease. The authors also identify a potential novel cell type within the kidney characterized by high expression of mitochondrial genes.

      Weaknesses:

      Though the HIV-transgenic model used in these studies results in a phenotype that is very similar to HIV-associated nephropathy in humans, the model has several limitations that may prevent direct translation to human disease, including the fact that mice lack several genetic factors that are important contributors to HIV and kidney pathogenesis in humans. Additional studies are therefore needed to confirm these findings in human kidney disease.

      We appreciate the succinct summary of the present work. We agree that the findings from the HIV Tg26 mouse model warrant additional investigation in human kidney disease samples. Further studies will be needed to confirm whether the mechanisms presented here are operative in human HIVAN or other RNA virus-associated kidney diseases.

      Reviewer #1 (Recommendations For The Authors)

      The specificity of the C16 tool has been called into question in 3 publications - Chen et al, 2008, PMID: 19046382; Lopez-Grancha et al, 2021, PMID: 34531308; and Cusak et al, 2023, PMID: 36400288. Lopez-Grancha et al have reported a novel, more selective PKR inhibitor with good pharmacological properties that might enable a more robust test of the PKR hypothesis. Regardless, compound exposures and target engagement (i.e. by monitoring phosphorylation of PKR targets such as eIF2α) should accompany these studies. Alternatively, it may be easier to probe the role of PKR in Tg26 pathogenicity by crossing the Tg26 line to a PKR knockout mouse.

      In response, we have added a description and references about the possibility of non-specificity of C16 to the Discussion as a limitation, as suggested (page 21).

      “Third, we acknowledge possibility of a non-specific effect of C16 as an inhibitor of PKR.66-68”

      Further, we added immunohistochemistry images of pPKR on kidney tissue as shown in Supplemental Figure 1A-D. Images showed PKR activation in Tg26 tubular cells, which was inhibited by C16 treatment.

      Author response image 1.

      Immunofluorescent images showing pPKR. (A-D) Immunofluorescent images showed PKR activation by detecting pPKR in Tg26 mouse kidney. pPKR was inhibited by C16 treatments.

      The suggested PKR knockout mouse experiment is an excellent idea for future work, but we believe it is outside the scope of the current manuscript.

      To enhance the evidentiary base for the PT-Mito cell type, it would be interesting to know whether these cells can also be found in human datasets like KPMP, though this might require reprocessing the original snRNAseq data. Further in situ hybridization in both mouse and human samples using fluorescent rather than colorimetric approaches should yield a more compelling dataset to provide evidence for this cell type. These approaches would also allow for more precise quantification of the PT-Mito cells compared to the population of proximal tubule cells. Again, the default assumption here should be that the mitochondrial transcripts represent a contamination, and the purpose of these additional experiments is to definitively rule out that explanation.

      Authors: First, as suggested, we carried out additional analyses. We examined a publicly available human kidney snRNA-seq dataset (GSE131882) and found in it the same PT-Mito cluster, as shown in Supplemental Figure 6. The PT-Mito cluster was located in close proximity to the PT cluster in a UMAP plot. We added this finding to the Results as follows (page 12):

      “We also confirmed the existence of similar PT-Mito cluster in published human kidney single-nuclear RNA-seq data47 by the re-analysis of the original data. (Supplemental Figure 6A-C).”

      Author response image 2.

      PT-Mito cluster detection in publicly available human kidney single-nuclear RNA-seq data (GSE131882). (A) UMAP plot of human kidney single-nuclear RNA-seq data shows 16 clusters. Clusters 1 and 4 are proximal tubule (PT) clusters, and cluster 7 is the PT-Mito cluster. (B) Dot plot shows expression of PT marker genes and PT-Mito marker genes obtained from the current manuscript data. PT-Mito markers including MT-CO1 and MT-CO2 had high expression in cluster 7. (C) UMAP plot shows all six samples are contributing to all cell clusters.

      Second, as suggested, we also included negative control data from in situ hybridization studies (Supplementary Figure 5A, 5B), which shows that the signals in Figure 4B, 4C are true signals.

      Author response image 3.

      Additional in situ hybridization images. (A) In situ hybridization images probing dapB (negative control probe) showed no signals. (B) In situ hybridization images probing Ppib (positive control probe) showed strong signals.

      Reviewer #2 (Recommendations For The Authors)

      (1) The supplementary data file seems to have been uploaded twice but the supplementary methods were not available which would have been helpful when assessing some methods such as using PodoCount to count podocytes.

      We acknowledge that we inadvertently failed to upload the Supplementary Methods section; thank you for pointing this out. The supplementary methods are now provided in the revised submission, including detailed methods about PodoCount. Corresponding descriptions are as follows:

      “Estimation of glomerular podocyte count

      PodoCount5, a computational tool for whole slide podocyte estimation from digitized histologic sections, was used to detect, enumerate, and characterize podocyte nuclear profiles in the glomeruli of immunohistochemically labeled (IHC-labeled) murine kidney sections. Formalin-fixed, paraffin embedded tissues (2 µm thickness) were IHC-labeled for p57kip2, a marker of podocyte terminal differentiation (ab75974, Abcam, Cambridge, UK), and detected with horse radish peroxidase (RU-HRP1000, Diagnostic BioSystems, Pleasanton, CA) and diaminobenzidine chromogen substrate (BSB0018A, Bio SB, Santa Barbara, CA). A periodic acid-Schiff post-stain was applied without hematoxylin counterstain. The tool uses a combination of stain deconvolution, digital image processing, and feature engineering to compute histologic podometrics6 with correction for section thickness7. In this study, PodoCount was used to assess mean glomerular podocyte count per mouse.”

      (2) In the abstract, the authors give the impression that they know definitively the sequence of HIV gene expression, cytoskeletal dysregulation, dedifferentiation, then loss from glomeruli. Since they could only examine cells that were present in glomeruli, they can't definitively say much about the cells that were lost from glomeruli.

      As suggested, we deleted the following text: “and were lost from glomeruli tuft”

      (3) The authors state that 56,976 cells were used for snRNAseq studies. Was the number of cells similar for each of the 8 mice (from 4 different groups)?

      In response, we have created a new table summarizing the numbers of nuclei from each sample (i.e. each mouse) and added it as Supplemental Figure 2D, as follows:

      Author response table 1.

      Pre-processing of single-nuclear RNA-seq data. Breakdown of nuclei numbers from each sample showed comparable numbers of nuclei analyzed.

      (4) Please provide information on the assay that was used to measure creatinine since some methods can be unreliable in mice

      This is now provided in the revised submission, including creatinine measurement methods (LC-MS/MS) on page 3 of Supplementary Material:

      “Mouse chemistry measurements

      Plasma creatinine was measured by isotope dilution LC-MS/MS at The University of Alabama at Birmingham O’Brien Center Core C (Birmingham, AL).”

      (5) The authors state that PKR (Eif2ak2) was expressed in all nephron segments. However, it appears on visual inspection of the UMAP in Fig S2B that the percentage of cells expressing Eif2ak2 was low. What percent of cells expressed Eif2ak2 and, if it was a low percentage, what is the authors' hypothesis for how expression in a small percentage of cells led to the kidney phenotype?

      Supplemental Figure 2B (now 3B) does show modest expression of Eif2ak2, approximately 10%. The technique may lack sensitivity to detect low gene expression and even low gene expression may be sufficient to cause phenotypic change.
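      The "percent of cells expressing" figure discussed above is, in most single-nucleus workflows, simply the share of nuclei with at least one detected transcript for the gene. The sketch below illustrates that calculation only; the function name, the threshold, and the example counts are hypothetical and are not taken from the authors' pipeline or data.

```python
def fraction_expressing(counts, threshold=0):
    """Return the fraction of nuclei whose count for one gene exceeds `threshold`.

    `counts` is a sequence of per-nucleus transcript (UMI) counts for a single gene.
    """
    counts = list(counts)
    if not counts:
        raise ValueError("counts must contain at least one nucleus")
    detected = sum(1 for c in counts if c > threshold)
    return detected / len(counts)


# Hypothetical per-nucleus counts for one gene (illustrative only, not study data).
example_counts = [0, 0, 1, 0, 0, 0, 0, 3, 0, 0]
print(f"{fraction_expressing(example_counts):.0%} of nuclei have a detectable count")
```

      A fraction computed this way is a lower bound on true expression, because nuclei with low transcript abundance can register a count of zero.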

      (6a) In figure 4B and C, it is not clear what genotype/treatment group is shown.

      The legend for Figure 4B and 4C has been modified to state that the group shown was wild-type mice:

      (B, C) In situ hybridization of mt-Co1 and mt-Atp6 genes showed signals inside nuclei of WT mice.

      (6b) Also, if these ISH images are from Tg26 mice, it would be helpful to do ISH in mice with/without C16 treatment.

      These images of ISH for these two genes are from wild-type mice, as now stated in the revised legend. Our purpose was to show that these mitochondrial-encoded gene transcripts (mt-Co1 and mt-Atp6) are transported to nuclei from the cytoplasm. We believe it is not necessary to do ISH in Tg26 mice because these genes are not disease-specific.

      (6c) Also, only 3-6% of cells express these "PT-mito" markers by snRNAseq, but it appears that far more are expressed by ISH, raising concerns for nonspecific binding of the ISH probe.

      (6d) Also, nonsense controls should be included to demonstrate the specificity of the ISH data.

      First (comment 6c), the PT-mito cluster does not have specific markers, to our knowledge. Second (comment 6d), to address the concern for non-specific binding of the ISH probes, we have now added additional ISH images, together with a negative control probe (C. elegans gene dapB) and a positive control probe (mouse Ppib), as shown in Supplementary Figure 5A and 5B, respectively.

      Author response image 4.

      Additional in situ hybridization images. (A) In situ hybridization images probing dapB (negative control probe) showed no signals. (B) In situ hybridization images probing Ppib (positive control probe) showed strong signals.

      (7) The authors state that "mitochondrial dysfunction was most pronounced in the PT-Mito cluster" but in Figure 4D, the oxidative phosphorylation activation Z score was most down in the PT-inj (injured PT cells) and the PT-Mito cells were the fourth-most downregulated cell type.

      We appreciate the careful reading and agree with the reviewer’s comment. In the revision, we have deleted “most” from this description.

      (8) In Fig 4F, please state what "Cp expression" means.

      We have spelled out ceruloplasmin (Cp).

      (9) It is not clear in immunohistochemistry images in Fig 5F where the p-stat3 was detected due to the hematoxylin counterstain which may have obscured subtle nuclear staining. Also, some of the strongest staining appears to be in peritubular capillaries, instead of tubular and glomerular epithelial cells.

      We have added arrows to help readers see where p-Stat3 was detected: as faintly brown, distinct cytoplasmic granules in injured tubular cells in Tg26 mice (panel F), as opposed to diffuse tubular cytoplasmic color in wild-type mice (panel E).

      Author response image 5.

      (10) For the studies of mitochondrial oxygen consumption (Fig 6), it would be helpful to also provide data on the effect of C16 in wild-type kidneys, in case C16 somehow causes a primary increase in mitochondrial oxygen consumption rather than preventing HIV-induced loss in kidney cells from HIV-transgenic mice.

      We did not include Seahorse data regarding oxygen consumption from WT mice treated with C16, as C16 did not affect either renal function or transcriptomes in WT mice, in contrast to the Tg26 mice (Figure 1A-G).

      (11) The authors emphasize that podocytes had the highest expression of HIV genes (Fig 7). However, it appears that <2% of podocytes expressed HIV genes. How do the authors explain the severe renal phenotype given the relatively small number of cells expressing the HIV transgene? Also, did the same cells express all/most of the HIV transcripts, or did some cells express some HIV transcripts? For instance, since the authors state that vpr and nef have the most important role in kidney injury, were the same cells that expressed nef also expressing Vpr?

      We know that snRNA-seq cannot detect the whole transcriptome in each cell, due to the well-known drop-out effect characteristic of the method. Several factors may contribute to this drop-out effect, including stochastic patterns of gene expression, low RNA amounts and inefficient mRNA capture (Qiu, Nature Comm, 2020; Ran, Bioinformatics, 2020).

      Our interpretation is that HIV gene-expressing podocytes had higher expression of HIV genes, but this does not mean that other kidney cells entirely lack HIV gene expression. With regard to co-expression of other HIV transcripts, nef and vpr were more often co-expressed, as shown in Figure 7J. Vpr was expressed in nef-positive podocytes and not detected in nef-negative podocytes.
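      The drop-out argument above can be made concrete with a deliberately simplified model, which is an assumption for illustration and not the authors' analysis: if each transcript copy in a nucleus is captured independently with probability p, a gene present at n copies is detected with probability 1 - (1 - p)^n. The function name and the numbers below are hypothetical.

```python
def detection_probability(n_transcripts, capture_efficiency):
    """Probability that at least one of `n_transcripts` copies is captured.

    Assumes each copy is captured independently with the same probability
    (a simplification; real capture is not perfectly independent).
    """
    if n_transcripts < 0:
        raise ValueError("n_transcripts must be non-negative")
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must be between 0 and 1")
    return 1.0 - (1.0 - capture_efficiency) ** n_transcripts


# Hypothetical capture efficiency of 10% per transcript copy.
for copies in (1, 5, 20):
    p_detect = detection_probability(copies, 0.10)
    print(f"{copies:>2} copies -> detected in {p_detect:.1%} of nuclei")
```

      Under this model, a gene expressed at only a few copies per nucleus is missed in most nuclei that actually express it, so the observed share of positive cells understates the true share.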

      (12) In figure 8, the authors emphasize the dysregulation of genes involved in cell-cell interaction, particularly PDGF-D. They show some data for the effect of C16 in this system in Fig 8 but it would be helpful if they can state the effect in the text of the Results section.

      We have added text in the Results describing activating interactions in Tg26 mice that were reduced by C16 exposure, as follows (page 18):

      “For example, platelet derived growth factor D (PDGF-D) was upregulated in PT-Inj in Tg26 mice and was downregulated by C16 treatment (Figure 8D). Further, PDGF-D may interact with PDGFR-B in fibroblasts.”

    1. Author response:

      We extend our sincere gratitude to the editor and three reviewers for their invaluable feedback, which not only included positive comments but also provided constructive suggestions for enhancing the quality of our manuscript.

      Of potential interest to you is our forthcoming investigation into vaccine efficacy, where we will compare the effectiveness of our live-attenuated vaccine with an mRNA-based alternative.

      Moreover, we acknowledge and fully endorse the recommendation to elucidate why immunization with our live-attenuated vaccine confers protection against viral challenge, even in the absence of sufficient neutralizing antibodies. As pointed out by the reviewers, this phenomenon may be attributed to mucosal immunity. Consequently, we have outlined plans to investigate whether the attenuated live vaccine elicits mucosal immunity as part of our ongoing research.

      We are currently working to gather the necessary data to address these inquiries comprehensively, and are aiming to resubmit our manuscript at the earliest opportunity.

      Reviewer #1: We sincerely appreciate the insightful comments provided by Reviewer #1. In response to this feedback, we will conduct a comparative analysis of efficacy between our live-attenuated vaccine and an mRNA-based alternative. Furthermore, we will thoroughly examine and delineate the advantages and limitations of our live-attenuated vaccine in our discussion.

      Reviewer #2: We express our sincere appreciation to Reviewer #2 for invaluable suggestions. In light of the insightful observation concerning the weakness of our study, related to the poor assessment/evaluation of the induction of mucosal immunity by our vaccine candidate, we have resolved to undertake a comprehensive analysis in this regard.

      Furthermore, we will take into account this reviewer's recommendation to compare BK2102 results with those of an mRNA vaccine. We are currently in the process of planning additional experiments to thoroughly address this aspect.

      Reviewer #3: We are very grateful to Reviewer #3 for the positive feedback and invaluable suggestions. In order to further explore the immune mechanisms underlying the protection against the Omicron variant in the absence of detectable neutralizing antibodies, we are currently devising plans for experiments focused on evaluating mucosal immunity.

      Moreover, in accordance with Reviewer #3's suggestion, we are considering the incorporation of an ELISPOT assay experiment. However, we acknowledge uncertainties regarding the feasibility of establishing an experimental system for this purpose.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies.

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful, and the data generally support the conclusions.

      Strengths:

      Hi-ExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions.

      Weaknesses:

      (1) Based on the exceedingly small volume of solution used to form the hydrogel in the well, there may be many unexpanded cells in the well and possibly underneath the expanded hydrogel at the end of this. How would this affect the image acquisition, analysis, and interpretation of HiExM data?

      The hydrogel footprint covers approximately 5% of the surface within an individual well and only cells within this area are embedded in the polymerized hydrogel for subsequent processing steps. Cells that are outside of this footprint are not incorporated into the gel, meaning that these cells are digested by Proteinase K and subsequently washed away by the excess water exchange in the gel swelling step. Note that different cell types may require higher or lower concentrations of Proteinase K to adequately digest cells for expansion while maintaining fluorescence signal. Given the compatibility of HiExM with 96-well plates, this titration can be performed rapidly in a single experiment. Although cells outside of the hydrogel footprint are removed prior to imaging, we do occasionally observe Hoechst signal that appears to be underneath the gels. We believe this signal is likely from excess DNA from digested cells that was not fully washed out in the gel swelling step. This signal is both spatially and morphologically distinct from the nuclear signal of intact cells and it does not affect image acquisition, analysis, or data interpretation.

      (2) It is unclear why the expansion factor is so variable between plates (e.g., Figure 2H). This should be discussed in more detail.

      The variability in expansion factor across plates can likely be attributed to the small volume (~250 nL) deposited by the device posts. Small variations in gel volume could affect polymerization more strongly in HiExM than in standard ExM gels. For example, because HiExM gels are ~1000x smaller than standard expansion gel preparations, they have a larger air-liquid interface relative to their volume and are therefore more sensitive to evaporation. Evaporation in HiExM gels increases monomer and crosslinker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that variance is slightly increased between plates. These differences will be discussed in the revised manuscript.
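
      The evaporation argument can be illustrated with rough numbers. The sketch below is not part of the authors' analysis: it assumes an idealized gel whose shape does not change with scale (so the surface-to-volume ratio scales with the cube root of the volume ratio) and assumes that only water evaporates. The function names and the 10% loss used in the usage note are illustrative assumptions, not measured values.

```python
def surface_to_volume_gain(volume_ratio: float) -> float:
    """Factor by which the surface-to-volume ratio grows when a gel of
    fixed shape is made `volume_ratio` times smaller in volume."""
    if volume_ratio <= 0:
        raise ValueError("volume_ratio must be positive")
    # Linear size scales as V**(1/3); surface/volume scales as 1/length.
    return volume_ratio ** (1.0 / 3.0)


def concentration_after_evaporation(c0: float, fraction_lost: float) -> float:
    """Solute concentration after a fraction of the solvent volume is lost,
    assuming the solute (monomer or crosslinker) does not evaporate."""
    if not 0.0 <= fraction_lost < 1.0:
        raise ValueError("fraction_lost must be in [0, 1)")
    return c0 / (1.0 - fraction_lost)
```

      Under these assumptions, a gel that is 1000x smaller in volume has roughly a 10x larger surface-to-volume ratio, and `concentration_after_evaporation(1.0, 0.1)` shows that a hypothetical 10% volume loss raises every solute concentration by about 11%.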

      (3) The authors claim that CF dyes are more resistant to bleaching than other dyes. However, in Figure S3, it appears that half of the CF dyes tested still show bleaching, and no data is shown supporting the claim that Alexa dyes bleach. It would be helpful to include data supporting the claim that Alexa dyes bleach more than CF dyes and the claim that CF dyes in general are resistant to bleaching should be modified to more accurately reflect the data shown.

      We did not show data using Alexa dyes because these fluorophores are highly sensitive to photobleaching using Irgacure and thus we could not obtain images. In contrast, some CF dyes are more robust to bleaching in HiExM including CF488A, CF568, and CF633 dyes. We have recently adapted our protocol to PhotoExM chemistry which is compatible with a wider range of fluorophores as described by Günay et al. (2023) and as shown in current Fig. S11.

      (4) Related to the above point, it appears that Figure S11 may be missing the figure legend. This makes it hard to understand how HiExM can use other photo-inducible polymerization methods and dyes other than CF dyes.

      The following figure legend will be included in the revised manuscript. Fig. S11: Example of a cell expanded in HiExM using Photo-ExM gel chemistry. Photo-ExM does not require an anoxic environment for gel deposition and polymerization, improving ease of use of HiExM. Mitochondria were stained with an Alexa 647 conjugated secondary antibody, indicating that HiExM is compatible with additional fluorophores when combined with Photo-ExM.

      (5) The use of automated high-content imaging is impressive. However, it is unclear to me how the increased search space across the extended planar area and focal depths in expanded samples is overcome. It would be helpful to explain this automated imaging strategy in more detail.

      We imaged plates on the Opera Phenix using the PreciScan Acquisition Software in Harmony. In brief, each well is imaged at 5x magnification in the Hoechst channel to capture the full well at low resolution. Hoechst is used for this step given its signal brightness, ubiquity across established staining protocols, and spectral independence from most fluorophores commonly conjugated to secondary antibodies. Using this information, the microscope detects regions of interest (nuclei) based on criteria including size, brightness, and circularity. Finally, the positional information for each region is stored, and the microscope automatically images those regions at 63x magnification. The working distance for the objective used in this study is 600 µm, which is sufficient to capture the entirety of expanded cells in the Z direction. This strategy minimizes off-target imaging and allows robust image acquisition even in cultures with lower seeding density. A detailed description of the automated imaging strategy will be included in the revised manuscript.
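
      The two-pass logic described above (low-magnification pre-scan, filtering of candidate nuclei, then targeted high-magnification imaging) can be sketched generically. This is not the Harmony or PreciScan API: the `Candidate` type, the `select_targets` function, and all thresholds are hypothetical, and the sketch assumes that segmentation of the pre-scan has already produced per-object measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One object segmented from the low-magnification Hoechst pre-scan."""
    x_um: float           # stage position of the object centroid
    y_um: float
    area_um2: float
    perimeter_um: float
    mean_intensity: float


def circularity(area: float, perimeter: float) -> float:
    """4*pi*A / P**2: equals 1 for a perfect circle, lower for irregular shapes."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / (perimeter ** 2)


def select_targets(candidates, min_area, max_area, min_intensity, min_circularity):
    """Return stage positions of candidates that pass the size, brightness,
    and circularity filters; these would be revisited at high magnification."""
    targets = []
    for c in candidates:
        if not (min_area <= c.area_um2 <= max_area):
            continue
        if c.mean_intensity < min_intensity:
            continue
        if circularity(c.area_um2, c.perimeter_um) < min_circularity:
            continue
        targets.append((c.x_um, c.y_um))
    return targets
```

      Storing only the positions that pass the filter is what keeps the high-magnification pass from scanning the full well.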

      (6) The general method of imaging pre- and post-expansion is not entirely clear to me. For example, on page 5 the authors state that pre-expansion imaging was done at the center of each gel. Is pre-expansion imaging done after the initial gel polymerization? If so, this would assume that the gelation itself has no effect on cell size and shape if these gelled but not yet expanded cells are used as the reference for calculating expansion factor and isotropy.

      Pre-expansion imaging is performed after staining is complete, but prior to the application of AcX, which is the first step of the HiExM protocol. Following staining and imaging, plates can be sealed with paraffin and stored at 4˚C for up to a week prior to starting the expansion protocol. We typically image 61 fields of view at the center of the well plate (where the gel will be deposited) to obtain sufficient pre-expansion images as shown in Figure 2b (left). After pre-expansion imaging, we perform the HiExM protocol followed by image acquisition. We then tile all the images, as shown in Figure 2b, and compare tiled images from the same well pre- and post-expansion to manually identify the same cells. Comparisons of the pre- and post-expansion images of the same cell are then used to calculate expansion factor and isotropy measurements as described. This detailed description will be included in the revised manuscript.
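
      One generic way to turn matched pre- and post-expansion images into an expansion factor is to compare distances between the same landmarks before and after expansion. The sketch below illustrates that idea only; it is not the authors' measurement pipeline, and the function names, the landmark representation, and the use of the spread of ratios as a crude isotropy indicator are all assumptions.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pairwise_distances(points):
    """Euclidean distances between all pairs of (x, y) landmarks."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def expansion_factor(pre_points, post_points):
    """Mean and spread of post/pre distance ratios for matched landmarks.

    pre_points[i] and post_points[i] must be the same landmark identified
    in the pre- and post-expansion images. A small spread suggests that
    the expansion is close to isotropic.
    """
    if len(pre_points) != len(post_points) or len(pre_points) < 2:
        raise ValueError("need at least two matched landmarks")
    pre = pairwise_distances(pre_points)
    post = pairwise_distances(post_points)
    ratios = [b / a for a, b in zip(pre, post) if a > 0]
    if not ratios:
        raise ValueError("all pre-expansion landmarks coincide")
    return mean(ratios), pstdev(ratios)
```

      Because the ratio uses distances rather than absolute positions, it is insensitive to translation or rotation of the gel between the two imaging sessions.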

      (7) In the dox experiments, are only 4 expanded nuclei analyzed? It is unclear in the Figure 3 legend what the replicates are because for the unexpanded cells, it says the number of nuclei but for expanded it only says n=4. If only 4 nuclei are analyzed, this does not play to the strengths of HiExM by having high throughput.

      We performed the DOX titration assay across four different well plates (i.e. n=4). For each condition, the total number of nuclei measured was 56, 71, 64, 92, and 62 for DMSO, 1nM, 10nM, 100nM, and 1µM, respectively. For SEM calculations, we included the number of technical replicates to avoid underestimating error. We have revised the Figure 3 legend to better reflect the experimental details.

      (8) I am not sure if the analysis of dox-treated cells is accurate for the overall phenotype because only a single slice at the midplane is analyzed. It would be helpful to show, at least in one or two example cases, that this trend of changing edge intensity occurs across the whole 3D nucleus.

      We will repeat our analysis on a subset of images using multiple optical sections for each nucleus reported. These new data will be included in the revised manuscript.

      (9) It would be helpful to provide an actual benchmark of imaging speed or throughput to support the claims on page 8 that HiExM can be combined with autonomous imaging to capture thousands of cells a day. What is the highest throughput you have achieved so far?

      The parameters that dictate imaging speed in HiExM include exposure time, z-stack height, and number of channels. Depending on the signal intensity for a given channel, exposure times vary from 200ms to 1000ms. For z-stack height, we found that imaging 65 sections with 1µm spacing allowed for robust identification of each region of interest in the 5x pre-scan. As an example, collecting images for a full well plate (e.g., 20 images per well with 4 channels) requires approximately 24 hours of autonomous image acquisition using the Opera Phenix. Depending on cell size, this yields imaging data for between 1200 cells (1 cell per field of view) and 6000 cells (5 cells per field of view). Different autonomous imagers, as well as improved staining techniques that increase signal-to-noise, can be expected to significantly decrease the exposure time and reduce the number of z-sections needed for each region.
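
      As a rough plausibility check on these numbers, the sketch below estimates exposure-only acquisition time. It rests on several assumptions that the response does not state: that a 65-section stack is acquired per field and per channel, that every channel uses the same exposure, and that the plate contains 1200 fields in total (inferred from the figure of 1200 cells at one cell per field of view). Stage movement, autofocus, and data transfer are ignored, so this is not a model of the actual instrument.

```python
def exposure_hours(fields: int, channels: int, z_sections: int, exposure_s: float) -> float:
    """Total camera exposure time in hours, ignoring all instrument overhead."""
    return fields * channels * z_sections * exposure_s / 3600.0


def cells_imaged(fields: int, cells_per_field: int) -> int:
    """Number of cells captured if every field holds the same number of cells."""
    return fields * cells_per_field
```

      With the shortest quoted exposure (200 ms), `exposure_hours(1200, 4, 65, 0.2)` gives about 17 hours, the same order of magnitude as the approximately 24 hours reported; `cells_imaged(1200, 1)` and `cells_imaged(1200, 5)` reproduce the quoted range of 1200 to 6000 cells.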

      Reviewer #2 (Public Review):

      Summary:

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit the expansion of the gel. A device was engineered that can spot a small droplet of hydrogel solution and keep it in place as it polymerizes. It occupies only a small portion of space at the center of each well, the gel can expand into all directions, and imaging and staining can proceed by liquid handling robots and an automated microscope.

      Strengths:

      In contrast to Reference 8, the authors' system is compatible with standard 96 well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal.

      Weaknesses:

      The assay they chose to demonstrate what high-throughput ExM could be useful for, is not very convincing. But for this reviewer that is not important.

      We appreciate this reviewer’s point. We believe the data provide an example of the power of HiExM for collecting thousands of nanoscale images that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.). The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this experiment was to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM.

      Reviewer #3 (Public Review):

      Summary:

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand the toroidal gel within each well.

      Strengths:

      This configuration eliminates the need for transferring gels to other dishes or wells, thereby enhancing the throughput and reproducibility of parallel expansion microscopy. This methodological uniqueness indicates the applicability of HiExM in detecting subtle cellular changes on a large scale.

      Weaknesses:

      To demonstrate the potential utility of HiExM in cell phenotyping, drug studies, and toxicology investigations, the authors treated hiPS-derived cardiomyocytes with a low dose of doxycycline (dox) and quantitatively assessed changes in nuclear morphology. However, this reviewer is not fully convinced of the validity of this specific application. Furthermore, some data about the effect of expansion require reconsideration.

      The application we chose was intended as a proof of concept. We believe the data provide an example of the power of HiExM for collecting thousands of nanoscale images that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.). The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this experiment was to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM.

      The variability in expansion factor across plates can likely be attributed to the small volume (~250 nL) deposited by the device posts. Small variations in gel volume could affect polymerization more strongly in HiExM than in standard ExM gels. For example, because HiExM gels are ~1000x smaller than standard expansion gel preparations, they have a larger air-liquid interface relative to their volume and are therefore more sensitive to evaporation. Evaporation in HiExM gels increases monomer and crosslinker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that variance is slightly increased between plates. These differences will be discussed in the revised manuscript.

    1. Author response:

      eLife assessment

      This study presents valuable information on the mechanism of how birnavirus VP3 protein interacts with PI3P in early endosomes. Evidence supporting the proposed two-stage mechanism is incomplete and would benefit from additional supporting experiments, and additional experimentation would also address concerns about data consistency.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zanetti et al. use biophysical and cellular assays to investigate the interaction of the birnavirus VP3 protein with the early endosome lipid PI3P. The major novel finding is that the association of the VP3 protein with an anionic lipid (PI3P) appears to be important for viral replication, as evidenced through a cellular assay on FFUs.

      Strengths:

      Supports previously published claims that VP3 may associate with early endosomes and bind to PI3P-containing membranes. The claim that mutating a single residue (R200) critically affects early endosome binding and that the same mutation also inhibits viral replication suggests a very important role for this binding in the viral life cycle.

      Weaknesses:

      The manuscript is relatively narrowly focused: one bimolecular interaction between a host cell lipid and one protein of an unusual avian virus (VP3-PI3P). Aspects of this interaction have been described previously. Additional data would strengthen claims about the specificity and some technical issues should be addressed. Many of the core claims would benefit from additional experimental support to improve consistency.

      We focused our efforts on the characterization of the molecular interaction between the birnaviral protein VP3 and the anionic lipid PI3P, which is found in the host cell. This decision was motivated by our previous research, which made use of cell biology and virology techniques to demonstrate that VP3 facilitates the formation of the viral replication machinery on the cytosolic leaflet of early endosomes due to its inherent endosome-targeting capability (J Virol. 2018 May 14;92(11):e01964-17). Additionally, our previous findings indicated that PI3P, present in early endosomal membranes, is a critical host factor enabling VP3's association with these membranes, thereby promoting viral replication (J Virol. 2021 Feb 24;95(6):e02313-20). Consequently, an in-depth characterization of the VP3/PI3P interaction was necessary and motivated the present work. We plan to incorporate specific recommendations to further substantiate our assertions in the revised version of our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Birnavirus replication factories form alongside early endosomes (EEs) in the host cell cytoplasm. Previous work from the Delgui lab has shown that the VP3 protein of the birnavirus strain infectious bursal disease virus (IBDV) interacts with phosphatidylinositol-3-phosphate (PI3P) within the EE membrane (Gimenez et al., 2018, 2020). Here, Zanetti et al. extend this previous work by biochemically mapping the specific determinants within IBDV VP3 that are required for PI3P binding in vitro, and they employ in silico simulations to propose a biophysical model for VP3-PI3P interactions.

      Strengths:

      The manuscript is generally well-written, and much of the data is rigorous and solid. The results provide deep knowledge into how birnaviruses might nucleate factories in association with EEs. The combination of approaches (biochemical, imaging, and computational) employed to investigate VP3-PI3P interactions is deemed a strength.

      Weaknesses:

      (1) Concerns about the sources, sizes, and amounts of recombinant proteins used for co-flotation: Figures 1A, 1B, 1G, and 4A show the results of co-flotation experiments in which recombinant proteins (control His-FYVE v. either full length or mutant His VP3) were either found to be associated with membranes (top) or non-associated (bottom). However, in some experiments, the total amounts of protein in the top + bottom fractions do not appear to be consistent in control v. experimental conditions. For instance, the Figure 4A western blot of His-2xFYVE following co-flotation with PI3P+ membranes shows almost no detectable protein in either top or bottom fractions.

      Liposome-based methods, such as the co-flotation assay, are well established and preferred for studying protein-phosphoinositide interactions because the phosphoinositides are incorporated in a membrane, the composition of which can mimic cellular membranes. Additionally, by modifying the phosphoinositide incorporated in the liposomes, this technique allows for determining the specificity of the protein binding. However, this approach is rather qualitative, meaning that, after density gradient separation, the protein is found in the top fractions (bound to liposomes) or in the bottom fractions (not bound to liposomes), and our quantifications aim to show the difference in the bound fraction between liposome populations with or without PI3P. Given the setting of the co-flotation assays, each protein-liposome system [2xFYVE-PI3P(-), 2xFYVE-PI3P(+), VP3-PI3P(-), or VP3-PI3P(+)] is assessed separately, and even if the conditions are homogeneous, it is not surprising to observe differences in the protein level between them. Nevertheless, our revised version of the manuscript will include membranes with more similar band intensities.
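
      A bound fraction of the kind referred to above is commonly computed by normalizing the top-fraction signal to the total signal in the same lane. The sketch below illustrates that normalization only; the response does not specify the quantification used, and the function name and the band intensities in the usage note are hypothetical.

```python
def bound_fraction(top_intensity: float, bottom_intensity: float) -> float:
    """Fraction of protein recovered in the top (liposome-bound) fraction.

    Normalizing to the lane total makes the value independent of how much
    protein was loaded or detected in that particular lane.
    """
    if top_intensity < 0 or bottom_intensity < 0:
        raise ValueError("band intensities must be non-negative")
    total = top_intensity + bottom_intensity
    if total <= 0:
        raise ValueError("no detectable protein in either fraction")
    return top_intensity / total
```

      For example, lanes with band intensities of (30, 10) and (3, 1) both give a bound fraction of 0.75, even though the second lane contains ten times less protein. A lane with almost no signal in either fraction still yields an unreliable ratio, which is why blots with comparable total intensities remain preferable.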

      Reading the paper, it was difficult to understand which source of protein was used for each experiment (i.e., E. coli or baculovirus-expressed), and this information is contradicted in several places (see lines 358-359 v. 383-384). Also, both the control protein and the His-VP3-FL proteins show up as several bands in the western blots, but they don't appear to be consistent with the sizes of the proteins stated on lines 383-384. For example, line 383 states that His-VP3-FL is ~43 kDa, but the blots show triplet bands that are all below the 35 kDa marker (Figures 1B and 1G). Mass spectrometry information is shown in the supplemental data (describing the different bands for His-VP3-FL) but this is not mentioned in the actual manuscript, causing confusion. Finally, the results appear to differ throughout the paper (see Figures 1B v. 1G and 1A v. 4A).

      We used two sources of recombinant VP3: baculovirus and Escherichia coli. Initially, we opted for the baculovirus system based on evidence from previous studies that it was suitable for ectopic expression of VP3. Subsequently, we successfully produced VP3 using Escherichia coli and chose to transition to this system due to several technical advantages. Moreover, mass spectrometry analysis did not reveal any post-translational modifications that may have favored retaining the baculoviral system. We confirmed that VP3, produced in either system, exhibited similar behavior in our co-flotation assays. We will clarify all this in the revised version of our manuscript.

      (2) Possible "other" effects of the R200D mutation on the VP3 protein. The authors performed mutagenesis to identify which residues within patch 2 on VP3 are important for association with PI3P. They found that a VP3 mutant with an engineered R200D change (i) did not associate with PI3P membranes in co-flotation assays, and (ii) did not co-localize with EE markers in transfected cells. Moreover, this mutation resulted in the loss of IBDV viability in reverse genetics studies. The authors interpret these results to indicate that this residue is important for "mediating VP3-PI3P interaction" (line 211) and that this interaction is essential for viral replication. However, it seems possible that this mutation abrogated other aspects of VP3 function (e.g., dimerization or other protein/RNA interactions) aside from or in addition to PI3P binding. Such possibilities are not mentioned by the authors.

      The arginine at position 200 of VP3 is not located in any of the protein regions associated with its other known functions. VP3 has a dimerization domain located in the second helical domain, where different amino acids across the three helices form a total of 81 interprotomeric close contacts; however, R200 is not involved in these contacts (Structure. 2008 Jan;16(1):29-37). VP3 also has an oligomerization domain mapped within the 42 C-terminal residues of the polypeptide, i.e., the segment of the protein composed of the residues at positions 216-257 (J Virol. 2003 Jun;77(11):6438–6449). VP3’s ability to bind RNA is mediated by a region of positively charged amino acids, identified as P1, which includes K99, R102, K105, and K106 (PLoS One. 2012;7(9):e45957). Furthermore, our findings indicate that the R200D mutant retains a folding pattern similar to the wild-type protein, as shown in Figure 4B. Together, these observations lead us to conclude that the loss of replication capacity of R200D viruses results from impaired, or even lost, VP3-PI3P interaction.

      (3) Interpretations from computational simulations. The authors performed computational simulations on the VP3 structure to infer how the protein might interact with membranes. Such computational approaches are powerful hypothesis-generating tools. However, additional biochemical evidence beyond what is presented would be required to support the authors' claims that they "unveiled a two-stage modular mechanism" for VP3-PI3P interactions (see lines 55-59). Moreover, given the biochemical data presented for R200D VP3, it was surprising that the authors did not perform computational simulations on this mutant. The inclusion of such an experiment would help tie together the in vitro and in silico data and strengthen the manuscript.

      We acknowledge that the language used may have overstated the "unveiling" of the two-stage binding mechanism for VP3 on membranes containing PI3P. We intended to propose, rather than confirm, this mechanism, largely based on our coarse-grained simulations. Accordingly, we will revise the manuscript to temper our claims and frame them more appropriately. Regarding the absence of computer simulations for the R200D VP3 mutant, these were indeed conducted, and the results are detailed in Figure 14 of the supplementary material. We realize this was not adequately emphasized in the main manuscript, an oversight we will correct in the revised version.

      Reviewer #3 (Public Review):

      Summary:

      Infectious bursal disease virus (IBDV) is a birnavirus and an important avian pathogen. Interestingly, IBDV appears to be a unique dsRNA virus in that it uses early endosomes for RNA replication, a strategy that is more common for +ssRNA viruses such as SARS-CoV-2.

      This work builds on previous studies showing that IBDV VP3 interacts with PIP3 during virus replication. The authors provide further biophysical evidence for the interaction and map the interacting domain on VP3.

      Strengths: Detailed characterization of the interaction between VP3 and PIP3 identified R200D mutation as critical for the interaction. Cryo-EM data show that VP3 leads to membrane deformation.

      Weaknesses:

      The work does not directly show that the identified R200 residues are directly involved in VP3-early endosome recruitment during infection. The majority of work is done with transfected VP3 protein (or in vitro) and not in virus-infected cells. Additional controls such as the use of PIP3 antagonizing drugs in infected cells together with a colocalization study of VP3 with early endosomes would strengthen the study. In addition, it would be advisable to include a control for cryo-EM using liposomes that do not contain PIP3 but are incubated with HIS-VP3-FL. This would allow ruling out any unspecific binding that might not be detected on WB.

      The authors also do not propose how their findings could be translated into drug development that could be applied to protect poultry during an outbreak. The title of the manuscript is broad and would improve with rewording so that it captures what the authors achieved.

      In previous work from our group, we demonstrated the crucial role of the VP3 P2 region in targeting the early endosomal membranes and in viral replication, including the use of PI3K inhibitors to deplete PI3P, showing that both the control RFP-2xFYVE and VP3 lost their ability to associate with the early endosomal membranes (J Virol. 2018 May 14;92(11):e01964-17; J Virol. 2021 Feb 24;95(6):e02313-20). In the present work, to further characterize the role of R200 in binding to early endosomes and in viral replication, we show that: i) the transfected VP3 R200D protein loses the ability to bind to early endosomes in immunofluorescence assays (Figure 2E and Figure 3); ii) the recombinant VP3 R200D protein loses the ability to bind to PI3P(+) liposomes in co-flotation assays (Figure 4A); and iii) the mutant virus R200D loses replication capacity (Figure 4C).

      Regarding the cryo-EM comment: we will include images obtained with PI3P(-) liposomes in the revised version of our manuscript.

      We will also modify the title of the manuscript.

      Regarding the question of how our findings could be translated into drug development, indeed, VP3-PI3P binding constitutes a good target for drugs that counteract infectious bursal disease. However, we did not mention this idea in the manuscript, first because it is somewhat speculative and second because infected farms do not implement any specific treatment. The control is based on vaccination. We will mention these aspects of the infection in the revised version of our manuscript.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you once again for your patience and guidance through this revision process. I would like to add an important aspect to our previous discussion regarding the identification and impact of potential contaminants in our study.

      In recent years, advanced tools such as SCRuB (recently published in Nature Biotechnology, DOI:10.1038/s41587-023-01696-w) and the widely-used tool decontam have been developed to address the issue of contaminants in metagenomic studies. These tools primarily operate based on sequence similarity, identifying potential contaminants by marking and removing those found in only a minority of samples or those that display patterns indicative of laboratory contamination.

      As the reviewer rightly pointed out, contaminants are often rare species that appear in very few samples. Our study, focusing on high-abundance species in the vaginal microbiome, is less susceptible to the influences of such rare contaminants. This approach aligns with the methodology employed by leading research groups in the field, such as Professor Jacques Ravel's lab. Their decision not to use blank controls in several of their studies on the female reproductive tract microbiome likely stems from a similar understanding — that the impact of rare contaminants is minimal on the study's conclusions, especially when high-abundance species are the main focus.

      We believe that the methodologies and tools currently available for contaminant identification and removal, while highly effective for their intended purpose, reinforce our decision to focus on high-abundance species. This focus minimizes the potential impact of rare contaminants on our study's conclusions. In light of this, our study's methodology remains robust and well-suited for achieving our research objectives.

      In our revised manuscript, we will include a discussion of these points, further clarifying our approach and the rationale behind our methodological choices. We hope that this additional information will address the concerns raised and provide a clearer understanding of the context and reliability of our findings.

      Thank you for considering these additional points. We look forward to your feedback on our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments. We were pleased that they thought our study was "well crafted and written", "important", and that it provides a "valuable resource for researchers studying color vision". They also expressed several constructive criticisms, concerning – among other things – the lack of details regarding experimental procedures and analysis, the challenge in relating retinal data to cortical recordings, and consistency of results across animals. In response to the reviewers’ comments and following their suggestions, we performed additional analyses, and substantially revised the paper:

      We added a section in the Discussion about "Limitations of the stimulus paradigm". In addition, we added a new Suppl. Figure that illustrates the effect of deconvolution of calcium traces on our results and clarified in the text why we use deconvolved signals for all analyses. The new Suppl. Figure also shows an additional analysis with a more conservative threshold of neuron exclusion.

      We now clarify how retinal signaling relates to our cortical results and rewrote the text to be more conservative regarding our conclusions.

      In addition, we added a new Suppl. Figure showing the key analyses from Figures 2 and 4 separately for each animal. We now mention consistency across animals in the Results section and clearly state which analyses were performed on data pooled across animals.

      We are positive that these additions address the issues raised by the reviewers. Please find our point-by-point replies to all comments below.

      eLife assessment

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions and details about some procedures are incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and resolution of some technical issues.

      We thank the reviewers for appreciating our manuscript and their thoughtful comments.

      Referee 1 (Remarks to the Author):

      Summary:

      In this study, Franke et al. explore and characterize the color response properties across the primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake-behaving 2P imaging to define the spectral response properties of visual interneurons in Layer 2/3. They find that opponent responses are more prominent at photopic light levels, and diversity in color opponent responses exists across the visual field, with green ON/UV OFF responses being more strongly represented in the upper visual field. This is argued to be relevant for detecting certain features that are more salient when the chromatic space is used, possibly due to noise reductions.

      Strengths:

      The work is well crafted and written and provides a thorough characterization that reveals an uncharacterized diversity of visual properties in V1. I find this characterization important because it reveals how strongly chromatic information can modulate the response properties in V1. In the upper visual field, 25% of the cells differentially relay chromatic information, and one may wonder how this information will be integrated and subsequently used to aid vision beyond the detection of color per se. I personally like the last paragraph of the discussion that highlights this fact.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses: One major point highlighted in this paper is the fact that Green ON/UV OFF responses are not generated in the retina. But glancing through the literature, I saw this is not necessarily true. Fig 1. of Joesch and Meister, a paper cited, shows this can be the case. Thus, I would not emphasize that this wasn’t present in the retina. This is a minor point, but even if the retina could not generate these signals, I would be surprised if the diversity of responses would only arise through feed-forward excitation, given the intricacies of cortical connectivity. Thus, I would argue that the argument holds for most of the responses seen in V1; they need to be further processed by cortical circuitries.

      We thank the reviewer for this comment. When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited from the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      This takes me to my second point, defining center and surround. The center spot is 37.5 deg of visual angle, more than 1 mm of the retinal surface. That means that for all retinal cells, at least half and most likely all of their surround will also be activated. Although 37.5 deg is roughly the receptive field size previously determined for V1 neurons, the one-to-one comparison with retinal recordings, particularly with their center/surround properties, is difficult. This should be discussed. I assume that the authors tried a similar approach with sparse or dense checker white noise stimuli. If so, it would be interesting if there were better ways of defining the properties of V1 neurons on their complex/simple receptive field properties to define how much of their responses are due to an activation of the true "center" or a coactivation of the surround. Interestingly, at least some of the cells (Fig. 1d, cells 2 and 5) don’t have a surround. Could it be that in these cases, the "center" and "surround" are being excited together? How much would the overall statistics change if one used a full-field flicker stimulus instead of a center/surround stimulus? How stable are the results if the center/surround flicker stimulus is shifted? These results won’t change the fact that chromatic coding is present in the VC and that there are clear differences depending on their position, but they might change the interpretation. Thus, I would encourage you to test these differences and discuss them.

      Thanks for this comment. We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degrees of visual angle in diameter, which is slightly larger than center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we took the following steps:

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.

      Nevertheless, we cannot exclude that the stimulus was misaligned for a subset of the recorded neurons used for analysis. We agree with the reviewer that such misalignment might have contributed to cells not having surround STAs, due to simultaneous activation of antagonistic center and surround RF components by the surround stimulus. While a full-field stimulus would get rid of the misalignment problem, it would not allow us to study color tuning in center and surround RF components separately. Instead, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is out of the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we now explicitly mention the steps we perform to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion. We believe these changes will help the reader to interpret our results.

      Referee 2 (Remarks to the Author):

      Summary:

      Franke et al. characterize the representation of color in the primary visual cortex of mice and how it changes across the visual field, with a particular focus on how this may influence the ability to detect aerial predators. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet were presented in random combinations. Using a clustering approach, a set of functional cell-types were identified based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have varying spatial distributions in V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths:

      The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses:

      While the study presents solid evidence, a few weaknesses exist, including the size of the dataset, clarity regarding details of data included in each step of the analysis, and discussion of caveats of the work. The results presented here are based on recordings of 3 mice. While the number of neurons recorded is reasonably large (n > 3000), an analysis that tests for consistency across animals is missing. Related to this, it is unclear how many neurons at each stage of the analysis come from the 3 different mice (except for Suppl. Fig 4).

      Thank you for this comment. We apologize that the original manuscript did not clearly indicate the consistency of our results across animals. We have revised the manuscript in the following ways:

      We have added an additional Suppl. Figure, which shows the variability of the data within and across animals (Suppl. Fig. 4). Specifically, we show the distribution of color and luminance selectivity for (i) center and surround components of V1 RFs and (ii) for upper and lower visual field. This data is used for all analyses shown in Figures 2-4. The figure legend of this figure also states the number of neurons per animal.

      We now clearly state in the Results section that all analyses in the main figures were performed by pooling data across animals, and refer to the Suppl. Figures for consistency across animals.

      We believe these changes help the reader to interpret our results.

      Finally, the paper would greatly benefit from a more in-depth discussion of the caveats related to the conclusions drawn at each stage of the analysis. This is particularly relevant regarding the caveats related to using spike-triggered averages to assess the response preferences of ON-OFF neurons, and the conclusions drawn about the contribution of retinal color opponency.

      Thanks. We substantially revised the text to discuss caveats and limitations of the approach. For example, we added a section to the Discussion called "Limitations of the stimulus paradigm". In addition, we clarified how retinal signals relate to cortical ones and phrased our conclusions more conservatively.

      The authors provide solid evidence to support an asymmetric distribution of color opponent cells in V1 and a reduced color contrast representation in lower light levels. Some statements would benefit from more direct evidence such as the integration of upstream visual signals for color opponency in V1.

      Based on the comments from Reviewer 1, we have rephrased the statements regarding the integration of upstream visual signals for color opponency in V1. We think these revisions increase the clarity of the results and help the reader with interpretation.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      Thanks! We thank the reviewer again for the helpful comments.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. Several technical concerns limit how clearly the data support the conclusions. If these issues can be fixed, the paper would make a valuable contribution to how color is coded in mouse V1.

      We thank the reviewer for the helpful comments.

      Analysis: The central tool used to analyze the data is a "spike triggered average" of the responses to randomly varying stimuli. There are several steps in this analysis that are not documented, and hence evaluating how well it works is difficult. Central to this is that the paper does not measure spikes. Instead, measured calcium traces are converted to estimated spike rates, which are then used to estimate STAs. There are no raw calcium traces shown, and the approach to estimate spike rates is not described in any detail. Confirming the accuracy of these steps is essential for a reader to be able to evaluate the paper. Further, it is not clear why the linear filters connecting the recorded calcium traces and the stimulus cannot be estimated directly, without the intermediate step of estimating spike rates.

      Thank you for this comment. We have used the genetically encoded calcium sensor GCaMP6s in our recordings. This sensor is a very sensitive GCaMP6 variant, but also one with slow kinetics. To remove the effect of the slow sensor kinetics from recorded calcium responses, the recorded traces are commonly deconvolved with the impulse function of the sensor to obtain the deconvolved calcium traces. We now include this reasoning in the Results section. To illustrate the effect of the deconvolution, we added a new Suppl. Figure (Suppl. Fig. 2) showing raw calcium and deconvolved traces, and the STAs estimated from both types of traces. This illustrates that the results regarding neuronal color preferences are consistent across raw and deconvolved calcium traces.

      We agree with the reviewer that the term STA might be confusing. We have replaced it with the term "event-triggered average" (ETA). In addition, we have replaced the phrase "estimated spike rate" with "deconvolved calcium trace" throughout the manuscript because, unlike a spike rate (spikes per second), the unit of the deconvolved traces is not interpretable. In the revised version, we now clarify in the Methods section that we estimate the ETAs based on deconvolved calcium traces, which are correlated with, and an approximation of, spike rate.
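
      To make the term concrete, the following minimal Python sketch shows one common way to compute an event-triggered average from a deconvolved trace. It is an illustration under stated assumptions, not the pipeline used in the study: the function name, the single stimulus channel, the number of lags and the synthetic data are all hypothetical.

```python
import numpy as np

def event_triggered_average(stimulus, response, n_lags):
    """Response-weighted average of the stimulus history before each frame.

    stimulus: 1D array, one stimulus value per frame (hypothetical single channel)
    response: 1D array, deconvolved calcium trace per frame (arbitrary units, >= 0)
    n_lags:   number of frames of stimulus history to keep
    Returns an array of length n_lags; the last entry is the current frame.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    response = np.asarray(response, dtype=float)
    eta = np.zeros(n_lags)
    total = 0.0
    for t in range(n_lags - 1, len(stimulus)):
        window = stimulus[t - n_lags + 1 : t + 1]  # stimulus preceding frame t
        eta += response[t] * window                # weight by response strength
        total += response[t]
    return eta / total if total > 0 else eta

# Synthetic check (assumed data): the toy "cell" responds to a positive
# stimulus value with a delay of two frames.
rng = np.random.default_rng(0)
stim = rng.choice([-1.0, 1.0], size=5000)
resp = np.zeros_like(stim)
resp[2:] = np.clip(stim[:-2], 0, None)
eta = event_triggered_average(stim, resp, n_lags=10)
peak_lag_index = int(np.argmax(eta))  # position of the preferred delay
```

      In this toy case the average peaks at the entry two frames before the current frame, because the weights are simply the non-negative trace values rather than spike counts.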

      A further issue about the STAs is that the inclusion criterion (correlation of predicted vs measured responses of 0.25) is pretty forgiving. It would be helpful to see a distribution of those correlation values, and some control analyses to check whether the STA is providing a sufficiently accurate measure to support the results (e.g. do the central results hold for the cells with the highest correlations).

      We thank the reviewer for this comment. To exclude noisy neurons from analysis, we used the following procedure:

      For each of the four stimulus conditions (center and surround for green and UV stimuli), kernel quality was measured by comparing the variance of the STA with the variance of the baseline, defined as the first 500 ms of the STA. Only cells whose UV or green center STA had at least 10 times the variance of its baseline were considered for further analysis.

      We have added the distribution of quality values to a new Suppl. Figure (Suppl. Fig. 2d,e). We now also show the percentage of neurons above threshold, given different quality thresholds. Finally, we have repeated the analysis shown in Figure 2 for a much more conservative threshold, including only the top 25% of neurons (Suppl. Fig. 2e,f). We now mention this new analysis in the Methods and Results section.
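
      The inclusion criterion described above can be sketched in Python as follows. This is a hedged illustration only: the function names, the 30 Hz frame rate and the synthetic kernels are assumptions, and taking the variance over the whole kernel (rather than only its post-baseline part) is one possible reading of the description.

```python
import numpy as np

def kernel_quality(kernel, frame_rate_hz, baseline_ms=500.0):
    """Ratio of the kernel variance to the variance of its baseline segment.

    The baseline is the first `baseline_ms` of the kernel. `frame_rate_hz`
    is an assumed sampling rate used to convert milliseconds to frames.
    """
    kernel = np.asarray(kernel, dtype=float)
    n_baseline = int(round(baseline_ms / 1000.0 * frame_rate_hz))
    if n_baseline < 2 or n_baseline >= len(kernel):
        raise ValueError("baseline must span at least 2 frames and be shorter than the kernel")
    baseline_var = np.var(kernel[:n_baseline])
    if baseline_var == 0:
        return float("inf")
    return float(np.var(kernel) / baseline_var)

def passes_inclusion(center_green, center_uv, frame_rate_hz, threshold=10.0):
    """Keep a cell if its green OR its UV center kernel reaches the threshold."""
    return (kernel_quality(center_green, frame_rate_hz) >= threshold
            or kernel_quality(center_uv, frame_rate_hz) >= threshold)

# Synthetic kernels (assumed data): a flat, noisy kernel and one with a clear bump.
t = np.arange(60)
flat_kernel = 0.01 * np.sin(t)
bump_kernel = flat_kernel + np.exp(-0.5 * ((t - 45) / 4.0) ** 2)

keep_cell = passes_inclusion(bump_kernel, flat_kernel, frame_rate_hz=30.0)
drop_cell = passes_inclusion(flat_kernel, flat_kernel, frame_rate_hz=30.0)
```

      Raising `threshold` in this sketch corresponds to the more conservative selection mentioned above, where only the best-quality neurons are retained.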

      Limitations of stimulus choice: The paper relies on responses to a large (37.5 degree diameter) modulated spot and surrounding region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells. As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs the surround region). The impact of these issues on the conclusions is considered briefly at the start of the results but needs to be evaluated in considerably more detail. This is particularly true for retinal ganglion cells given the size of their receptive fields (see also next point).

      We agree with the reviewer that the centering of the stimulus is critical and apologize if this point was not discussed sufficiently. To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degrees of visual angle in diameter, which is slightly larger than center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we have used several experimental and analysis steps and controls (see also second comment of Reviewer 1):

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      We now mention those clearly in the Results section and added the limitations of our approach to the Discussion section.

      Comparison with retina: A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously-collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. This issue may be handled by the analysis presented in the paper, but if so it needs to be described more clearly. The paper from which the retina data is taken argues that rod-cone chromatic opponency originates largely in the outer retina. This mechanism would be expected to be shared across retinal outputs. Thus it is not clear how the Green-On/UV-Off vs Green-Off/UV-On asymmetry could originate. This should be discussed.

      We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited from the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      Residual chromatic cells at low mesopic light levels: The presence of chromatically tuned cells at the lowest light level probed is surprising. The authors describe these conditions as rod-dominated, in which case chromatic tuning should not be possible. This again is discussed only briefly. It either reflects the presence of an unexpected pathway that amplifies weak cone signals under low mesopic conditions such that they can create spectral opponency or something amiss in the calibrations or analysis. Data collected at still lower light levels would help resolve this.

      Thank you for this comment. We call the lowest light level "low mesopic" and "rod-dominated" because the spectral contrast of V1 center responses in posterior recording fields is green-shifted for this light level (Fig. 3a). This is only expected if responses in the UV-cone dominant ventral retina are predominantly driven by rod photoreceptors. We now explain this rationale in the Results section. In addition, we mention in the Discussion that future studies are required to test whether cone signals need to be amplified for low light levels. While we agree with the reviewer that it would be exciting to use even lower light levels during recordings, we believe this is out of the scope of the current study due to the technical challenges involved in achieving scotopic stimulation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      As you will see, the main changes in the revised manuscript pertain to the structure and content of the introduction. Specifically, we have tried to more clearly introduce our paradigm, the rationale behind the paradigm, why it is different from learning paradigms, and why we study “relief”.

      In this rebuttal letter, we will go over the reviewers’ comments one-by-one and highlight how we have adapted our manuscript accordingly. However, because one concern was raised by all reviewers, we will start with an in-depth discussion of this concern.

      The shared concern pertained to the validity of the EVA task as a model to study threat omission responses. Specifically, all reviewers questioned the effectiveness of our so-called “inaccurate”, “false” or “ruse” instructions in triggering an equivalent level of shock expectancy, and relatedly, how this effectiveness was affected by dynamic learning over the course of the task.

      We want to thank the reviewers for raising this important issue. Indeed, it is a vital part of our design and it therefore deserves considerable attention. It is now clear to us that in the previous version of the manuscript we may have focused too little on why we moved away from a learning paradigm, how we made sure that the instructions were successful at raising the necessary expectations, and how the instructions were affected by learning. We believe this has resulted in some misunderstandings, which consequently may have cast doubts on our results. In the following sections, we will go into these issues.

      The rationale behind our instructed design

      The main aim of our study was to investigate brain responses to unexpected omissions of threat in greater detail by examining their similarity to the reward prediction error axioms (Caplin & Dean, 2008), and exploring the link with subjective relief. Specifically, we hypothesized that omission-related responses should be dependent on the probability and the intensity of the expected-but-omitted aversive event (i.e., electrical stimulation), meaning that the response should be larger when the expected stimulation was stronger and more expected, and that fully predicted outcomes should not trigger a difference in responding.

      To this end, we required that participants had varying levels of threat probability and intensity predictions, and that these predictions would most of the time be violated. Although we fully agree with the reviewers that fear conditioning and extinction paradigms can provide an excellent way to track the teaching properties of prediction error responses (i.e., how they are used to update expectancies on future trials), we argued that they are less suited to create the varying probability and intensity-related conditions we required (see Willems & Vervliet, 2021). Specifically, in a standard conditioning task participants generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intraindividual variability in the prediction error responses. This precludes an in-depth analysis of the probability-related effects. Furthermore, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, intensity-related effects cannot be tested. Finally, because CS-US contingencies change over the course of a fear conditioning and extinction study (e.g. from acquisition to extinction), there is never complete certainty about when the US will (not) follow. This precludes a direct comparison of fully predicted outcomes.

      Another added value of studying responses to the prediction error at threat omission outside a learning context is that it can offer a way to disentangle responses to the violation of threat expectancy from those of subsequent expectancy updating.

      Also note that Rutledge and colleagues (2010), who were the first to show that human fMRI responses in the Nucleus Accumbens comply with the reward prediction error axioms, also did not use learning experiences to induce expectancy. In that sense, we argued it was not necessary to adopt a learning paradigm to study threat omission responses.

      Adaptations in the revised manuscript: We included two new paragraphs in the introduction of the revised manuscript to elaborate on why we opted not to use a learning paradigm in the present study (lines 90-112).

      “However, is a correlation with the theoretical PE over time sufficient for neural activations/relief to be classified as a PE-signal? In the context of reward, Caplin and colleagues proposed three necessary and sufficient criteria all PE-signals should comply to, independent of the exact operationalizations of expectancy and reward (the so-called axiomatic approach24,25; which has also been applied to aversive PE26–28). Specifically, the magnitude of a PE signal should: (1) be positively related to the magnitude of the reward (larger rewards trigger larger PEs); (2) be negatively related to likelihood of the reward (more probable rewards trigger smaller PEs); and (3) not differentiate between fully predicted outcomes of different magnitudes (if there is no error in prediction, there should be no difference in the PE signal).”

      “It is evident that fear conditioning and extinction paradigms have been invaluable for studying the role of the threat omission PE within a learning context. However, these paradigms are not tailored to create the varying intensity and probability-related conditions that are required to evaluate the threat omission PE in the light of the PE axioms. First, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, the magnitude-related axiom cannot be tested. Second, in conditioning tasks people generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intra-individual variability in the PE responses. Moreover, because of the relatively low signal to noise ratio in fMRI measures, fear extinction studies often pool across trials to compare omission-related activity between early and late extinction16, which further reduces the necessary variability to properly evaluate the probability axiom. Third, because CS-US contingencies change over the course of the task (e.g. from acquisition to extinction), there is never complete certainty about whether the US will (not) follow. This precludes a direct comparison of fully predicted outcomes. Finally, within a learning context, it remains unclear whether PE-related responses are in fact responses to the violation of expectancy itself, or whether they are the result of subsequent expectancy updating.”

      Can verbal instructions be used to raise the expectancy of shock?

      The most straightforward way to obtain sufficient variability in both probability and intensity-related predictions is by directly providing participants with instructions on the probability and intensity of the electrical stimulation. In a previous behavioral study, we have shown that omission responses (self-reported relief and omission SCR) indeed varied with these instructions (Willems & Vervliet, 2021). In addition, the manipulation checks that are reported in the supplemental material provided further support that the verbal instructions were effective at raising the associated expectancy of stimulation. Specifically, participants recollected having received more stimulations after higher probability instructions (see Supplemental Figure 2). Furthermore, we found that anticipatory SCR, which we used as a proxy of fearful expectation, increased with increasing probability and intensity (see Supplemental Figure 3). This suggests that it is not necessary to have expectation based on previous experience if we want to evaluate threat omission responses in the light of the prediction error axioms.

      Adaptations in the revised manuscript: We more clearly referred to the manipulation checks that are presented in the supplementary material in the results section of the main paper (lines 135-141).

      “The verbal instructions were effective at raising the expectation of receiving the electrical stimulation in line with the provided probability and intensity levels. Anticipatory SCR, which we used as a proxy of fearful expectation, increased as a function of the probability and intensity instructions (see Supplementary Figure 3). Accordingly, post-experimental questions revealed that by the end of the experiment participants recollected having received more stimulations after higher probability instructions, and were willing to exert more effort to prevent stronger hypothetical stimulations (see Supplementary Figure 2).”

      How did the inconsistency between the instructed and experienced probability impact our results?

      All reviewers questioned how the inconsistency between the instructed and experienced probability might have impacted the probability-related results. However, judging from the way the comments were framed, it seems that part of the concern was based on a misunderstanding of the design we employed. Specifically, reviewer 1 mentions that “To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; I.e., 25% of shocks are omitted regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, 0%.”, and reviewer 3 states that “... the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.” We want to emphasize that this was not what we did, and if it were true, we fully agree with the reviewers that it would have caused serious trust- and learning-related issues, given that it would be immediately evident to participants that probability instructions were false. It is clear that under such circumstances, dynamic learning would be a big issue.

      However, in our task 0% and 100% instructions were always accurate. This means that participants never received a stimulation following 0% instructions and always received the stimulation of the given intensity on the 100% instructions (see Supplemental Figure 1 for an overview of the trial types). Only for the 25%, 50% and 75% trials was an equal reinforcement rate (25%) maintained, meaning that the stimulation followed in 25% of the trials, irrespective of whether a 25%, 50% or 75% instruction was given. The reason for this was that we wanted to maximize and balance the number of omission trials across the different probability levels, while also keeping the total number of presentations per probability instruction constant. We reasoned that equating the reinforcement rate across the 25%, 50% and 75% instructions should not be detrimental, because (1) in these trials there was always the possibility that a stimulation would follow; and (2) we instructed the participants that each trial is independent of the previous ones, which should have discouraged them from actively counting the number of shocks in order to predict future shocks.

      Adaptations in the revised manuscript: We have tried to further clarify the design in several sections of the manuscript, including the introduction (lines 121-125), results (line 220) and methods (lines 478-484) sections:

      Adaptation in the Introduction section: “Specifically, participants received trial-by-trial instructions about the probability (0%, 25%, 50%, 75% and 100%) and intensity (weak, moderate, strong) of a potentially painful upcoming electrical stimulation, time-locked by a countdown clock (see Fig.1A). While stimulations were always delivered on 100% trials and never on 0% trials, most of the other trials (25%-75%) did not contain the expected stimulation and hence provoked an omission PE.”

      Adaptation in the Results section: “Indeed, the provided instructions did not map exactly onto the actually experienced probabilities, but were all followed by stimulation in 25% of the trials (except for the 0% trials and the 100% trials).”

      Adaptation in the Methods section: “Since we were mainly interested in how omissions of threat are processed, we wanted to maximize and balance the number of omission trials across the different probability and intensity levels, while also keeping the total number of presentations per probability and intensity instruction constant. Therefore, we crossed all non-0% probability levels (25, 50, 75, 100) with all intensity levels (weak, moderate, strong) (12 trials). The three 100% trials were always followed by the stimulation of the instructed intensity, while stimulations were omitted in the remaining nine trials. Six additional trials were intermixed in each run: Three 0% omission trials with the information that no electrical stimulation would follow (akin to 0% Probability information, but without any Intensity information as it does not apply); and three trials from the Probability x Intensity matrix that were followed by electrical stimulation (across the four runs, each Probability x Intensity combination was paired at least once, and at most twice with the electrical stimulation).”
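
      The trial structure of a single run, as described in the Methods excerpt above, can be tallied with a short Python sketch. This is only an illustration: the function names and the particular three reinforced Probability x Intensity cells are hypothetical choices, and the trials are listed in a fixed order rather than intermixed.

```python
from itertools import product

PROBABILITIES = [25, 50, 75, 100]           # non-0% instructed probabilities
INTENSITIES = ["weak", "moderate", "strong"]

def build_run(reinforced_cells):
    """Build the 18 trials of one run.

    reinforced_cells: three (probability, intensity) pairs from the 25/50/75
    levels that are followed by stimulation in this run (hypothetical choice).
    """
    trials = []
    # 12 crossed trials: stimulation only after 100% instructions
    for prob, intensity in product(PROBABILITIES, INTENSITIES):
        trials.append({"probability": prob, "intensity": intensity,
                       "stimulation": prob == 100})
    # 3 trials with 0% instructions, never followed by stimulation
    for _ in range(3):
        trials.append({"probability": 0, "intensity": None, "stimulation": False})
    # 3 additional reinforced trials from the uncertain probability levels
    for prob, intensity in reinforced_cells:
        if prob not in (25, 50, 75):
            raise ValueError("reinforced trials must use 25%, 50% or 75% instructions")
        trials.append({"probability": prob, "intensity": intensity,
                       "stimulation": True})
    return trials

def reinforcement_rate(trials, levels=(25, 50, 75)):
    """Fraction of trials at the given instructed levels followed by stimulation."""
    uncertain = [t for t in trials if t["probability"] in levels]
    return sum(t["stimulation"] for t in uncertain) / len(uncertain)

run = build_run([(25, "weak"), (50, "moderate"), (75, "strong")])
n_trials = len(run)
rate = reinforcement_rate(run)
n_omissions = sum(1 for t in run
                  if t["probability"] in (25, 50, 75) and not t["stimulation"])
```

      Under these assumptions a run contains 18 trials, of which 12 carry a 25%, 50% or 75% instruction; 3 of those 12 are reinforced, which gives the 25% reinforcement rate, and the remaining 9 are omission trials.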

      Could the incongruence between the instructed and experienced reinforcement rate have detrimental effects on the probability effect? We agree with reviewer 2 that it is possible that the inconsistency between instructed and experienced reinforcement rates could have rendered the exact probability information less informative to participants, which might have resulted in them paying less attention to the probability information whenever the probability was not 0% or 100%. This might to some extent explain the relatively larger difference in responding between 0% and 25% to 75% trials, but the relatively smaller differences between the 25% to 75% trials.

      However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but is inherent to “uncertain” probabilities.

      We added a description of these reasons to the supplementary materials in a supplementary note (supplementary note 4; lines 97-129 in supplementary materials), and added a reference to this note in the methods section (lines 488-490).

      “Supplementary Note 4: “Accurate” probability instructions do not alter the Probability-effect

      A question that was raised by the reviewers was whether the inconsistency between the probability instruction and the experienced reinforcement rate could have detrimental effects on the Probability-related results; especially because the effect of Probability was smaller when only including non-0% trials.

      However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but is inherent to “uncertain” probabilities.

      First, in a previously unpublished pilot study, we provided participants with “accurate” probability instructions, meaning that the instruction corresponded to the actual reinforcement rate (e.g., 75% instructions were followed by a stimulation in 75% of the trials etc.). In line with the present results and our previous behavioral study (Willems & Vervliet, 2021), the results of this pilot (N = 20) showed that the difference in the reported relief between the different probability levels was largest when comparing 0% and the rest (25%, 50% and 75%). Furthermore, the overall effect size of Probability (excluding 0%) matched the one of our previous behavioral study (Willems & Vervliet, 2021): ηp2 ≈ 0.50.”

      Author response image 1.

      Main effect of Probability including 0%: F(1.74, 31.23) = 53.94, p < .001, ηp2 = 0.75

      Main effect of Probability excluding 0%: F(1.50, 28.43) = 21.03, p < .001, ηp2 = 0.53
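
      As a quick consistency check, the reported effect sizes can be recovered from the F-values and degrees of freedom with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below only illustrates that arithmetic; the function name is hypothetical and nothing here is taken from the original analysis scripts.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the two main effects of Probability reported above.
print(round(partial_eta_squared(53.94, 1.74, 31.23), 2))  # 0.75
print(round(partial_eta_squared(21.03, 1.50, 28.43), 2))  # 0.53
```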

      Second, other published studies that used CSs with varying reinforcement rates (with or without explicit written instructions about the reinforcement rates) also showed that the difference in expectations, anticipatory SCR or omission SCR was largest when comparing the CS0% to the other CSs of varying reinforcement rates (Grings & Sukoneck, 1971; Öhman et al., 1973; Ojala et al., 2022).

      Together, this suggests that when there is a possibility of stimulation, any additional difference in probability will have a smaller effect on the omission responses, irrespective of whether the underlying reinforcement rate is accurate or not.

      Adaptation to methods section: “Note that, based on previous research, we did not expect the inconsistency between the instructed and perceived reinforcement rate to have a negative effect on the Probability manipulation (see Supplementary Note 4).”

      Did dynamic learning impact the believability of the instructions?

      Although we tried to minimize learning in our paradigm by providing instructions that trials are independent of one another, we agree with the reviewers that this cannot preclude all learning. Any remaining learning effects should present themselves by downweighting the effect of the probability instructions over time. We controlled for this time-effect by including a “run” regressor in our analyses. Results of the Run regressor for subjective relief and omission-related SCR are presented in Supplemental Figure 5. These figures show that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This indicates that even though some learning might have taken place, the main manipulations of probability and intensity were still present until the end of the task.

      Adaptations in the revised manuscript: In the results section of the main paper (lines 159-162), we now refer more clearly to the results of the Block regressor, which are presented in the supplementary material.

      Note that while there was a general drop in reported relief pleasantness and omission SCR over time, the effects of Probability and Intensity remained present until the last run (see Supplementary Figure 5). This further confirms that probability and intensity manipulations were effective until the end of the task.

      In the following sections of the rebuttal letter, we will go over the rest of the comments and our responses one by one.

      Reviewer #1 (Public Review):

      Summary:

      Willems and colleagues test whether unexpected shock omissions are associated with reward-related prediction errors by using an axiomatic approach to investigate brain activation in response to unexpected shock omission. Using an elegant design that parametrically varies shock expectancy through verbal instructions, they see a variety of responses in reward-related networks, only some of which adhere to the axioms necessary for prediction error. In addition, there were associations between omission-related responses and subjective relief. They also use machine learning to predict relief-related pleasantness, and find that none of the a priori "reward" regions were predictive of relief, which is an interesting finding that can be validated and pursued in future work.

      Strengths:

      The authors pre-registered their approach and the analyses are sound. In particular, the axiomatic approach tests whether a given region can truly be called a reward prediction error. Although several a priori regions of interest satisfied a subset of axioms, no ROI satisfied all three axioms, and the authors were candid about this. A second strength was their use of machine learning to identify a relief-related classifier. Interestingly, none of the ROIs that have been traditionally implicated in reward prediction error reliably predicted relief, which opens important questions for future research.

      Weaknesses:

      To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; i.e. 25% of shocks are omitted, regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, or 0%. Given previous findings on interactions between verbal instruction and experiential learning (Doll et al., 2009; Li et al., 2011; Atlas et al., 2016), it seems problematic a) to treat the instructions as veridical and b) average responses over time. Based on this prior work, it seems reasonable to assume that participants would learn to downweight the instructions over time through learning (particularly in the 100% and 0% cases); this would be the purpose of prediction errors as a teaching signal. The authors do recognize this and perform a subset analysis in the 21 participants who showed parametric increases in anticipatory SCR as a function of instructed shock probability, which strengthened findings in the VTA/SN; however given that one-third of participants (n=10) did not show parametric SCR in response to instructions, it seems like some learning did occur. As prediction error is so important to such learning, a weakness of the paper is that conclusions about prediction error might differ if dynamic learning were taken into account.

      We thank the reviewer for raising this important concern. We believe we replied to all the issues raised in the general reply above.

      Lastly, I think that findings in threat-sensitive regions such as the anterior insula and amygdala may not be adequately captured in the title or abstract which strictly refers to the "human reward system"; more nuance would also be warranted.

      We fully agree with this comment and have changed the title and abstract accordingly.

      Adaptations in the revised manuscript: We adapted the title of the manuscript.

      “Omissions of Threat Trigger Subjective Relief and Prediction Error-Like Signaling in the Human Reward and Salience Systems”

      Adaptations in the revised manuscript: We adapted the abstract (lines 27-29).

      “In line with recent animal data, we showed that the unexpected omission of (painful) electrical stimulation triggers activations within key regions of the reward and salience pathways and that these activations correlate with the pleasantness of the reported relief.”

      Reviewer #2 (Public Review):

      The question of whether the neural mechanisms for reward and punishment learning are similar has been a constant debate over the last two decades. Numerous studies have shown that the midbrain dopamine neurons respond to both negative and salient stimuli, some of which can't be well accounted for by the classic RL theory (Delgado et al., 2007). Other research even proposed that aversive learning can be viewed as reward learning, by treating the omission of aversive stimuli as a negative PE (Seymour et al., 2004).

      Although the current study took an axiomatic approach to search for the PE encoding brain regions, which I like, I have major concerns regarding their experimental design and hence the results they obtained. My biggest concern comes from the false description of their task to the participants. To increase the number of "valid" trials for data analysis, the instructed and actual probabilities were different. Under such a circumstance, testing axiom 2 seems completely artificial. How does the experimenter know that the participants truly believe that the 75% is more probable than, say, the 25% stimulation? The potential confusion of the subjects may explain why the SCR and relief report were rather flat across the instructed probability range, and some of the canonical PE encoding regions showed a rather mixed activity pattern across different probabilities. Also for the post-hoc selection criteria, why pick the larger SCR in the 75% compared to the 25% instructions? How would the results change if other criteria were used?

      We thank the reviewer for raising this important concern. We believe the general reply above covers most of the issues raised in this comment. Concerning the post-hoc selection criteria, we used 25% < 75% as the criterion because it was a fairly “lenient” criterion in the sense that it looked only at the effects of interest (i.e., did anticipatory SCR increase with increasing instructed probability?). However, even when the criterion was stricter (e.g., selecting participants only if their anticipatory SCR monotonically increased with each increase in instructed probability 0% < 25% < 50% < 75% < 100%, N = 11 participants), the probability effect (ωp2 = 0.08), but not the intensity effect, for the VTA/SN remained.
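
      The difference between the lenient and the stricter selection criterion can be made concrete with a small sketch. It assumes that each participant is summarized by one mean anticipatory SCR per instructed probability level; the function names, participant labels and values are made up for illustration and are not data from the study.

```python
LEVELS = [0, 25, 50, 75, 100]  # instructed probability levels (%)


def passes_lenient(mean_scr):
    """Lenient criterion: anticipatory SCR is larger for 75% than for 25%."""
    return mean_scr[25] < mean_scr[75]


def passes_strict(mean_scr):
    """Strict criterion: SCR increases monotonically across all five levels."""
    values = [mean_scr[level] for level in LEVELS]
    return all(lower < higher for lower, higher in zip(values, values[1:]))


# Hypothetical per-participant mean anticipatory SCR (arbitrary units).
participants = {
    "sub-01": {0: 0.10, 25: 0.20, 50: 0.30, 75: 0.40, 100: 0.50},
    "sub-02": {0: 0.10, 25: 0.30, 50: 0.20, 75: 0.40, 100: 0.50},
    "sub-03": {0: 0.20, 25: 0.40, 50: 0.30, 75: 0.30, 100: 0.50},
}

lenient = [name for name, scr in participants.items() if passes_lenient(scr)]
strict = [name for name, scr in participants.items() if passes_strict(scr)]
print(lenient)  # ['sub-01', 'sub-02']
print(strict)   # ['sub-01']
```

      Every participant who passes the strict criterion also passes the lenient one, which is why the stricter selection yields the smaller subsample.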

      To test axiom 3, which was to compare the 100% stimulation to the 0% stimulation conditions, how did the actual shock delivery affect the fMRI contrast result? It would be more reasonable if this analysis could control for the shock delivery, which itself could contaminate the fMRI signal, with extra confound that subjects may engage certain behavioral strategies to "prepare for" the aversive outcome in the 100% stimulation condition. Therefore, I agree with the authors that this contrast may not be a good way to test axiom 3, not only because of the arguments made in the discussion but also the technical complexities involved in the contrast.

      We thank the reviewer for addressing this additional confound. It was indeed impossible to control for the delivery of shock since the delivery of the shock was always present on the 100% trials (and thus completely overlapped with the contrast of interest). We added this limitation to our discussion in the manuscript. In addition, we have also added a suggestion for a contrast that can test the “no surprise equivalence” criterion.

      Adaptations in the revised manuscript: We adapted lines 358-364.

      “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”

      Reviewer #3 (Public Review):

      We thank the reviewer for their comments. Overall, based on the reviewer’s comments, we noticed that there was an imbalance between a focus on “relief” in the introduction and the rest of the manuscript and preregistration. We believe this focus raised the expectation that all outcome measures were interpreted in terms of the relief emotion. However, this was not what we did nor what we preregistered. We therefore restructured the introduction to reduce the focus on relief.

      Adaptations in the revised manuscript: We restructured the introduction of the manuscript. Specifically, after our opening sentence: “We experience a pleasurable relief when an expected threat stays away¹” we only introduce the role of relief for our research in lines 79-89.

      “Interestingly, unexpected omissions of threat not only trigger neural activations that resemble a reward PE, they are also accompanied by a pleasurable emotional experience: relief. Because these feelings of relief coincide with the PE at threat omission, relief has been proposed to be an emotional correlate of the threat omission PE. Indeed, emerging evidence has shown that subjective experiences of relief follow the same time-course as theoretical PE during fear extinction. Participants in fear extinction experiments report high levels of relief pleasantness during early US omissions (when the omission was unexpected and the theoretical PE was high) and decreasing relief pleasantness over later omissions (when the omission was expected and the theoretical PE was low)²²,²³. Accordingly, preliminary fMRI evidence has shown that the pleasantness of this relief is correlated to activations in the NAC at the time of threat omission. In that sense, studying relief may offer important insights in the mechanism driving safety learning.”

      Summary:

      The authors conducted a human fMRI study investigating the omission of expected electrical shocks with varying probabilities. Participants were informed of the probability of shock and shock intensity trial-by-trial. The time point corresponding to the absence of the expected shock (with varying probability) was framed as a prediction error producing the cognitive state of relief/pleasure for the participant. fMRI activity in the VTA/SN and ventral putamen corresponded to the surprising omission of a high probability shock. Participants' subjective relief at having not been shocked correlated with activity in brain regions typically associated with reward-prediction errors. The overall conclusion of the manuscript was that the absence of an expected aversive outcome in human fMRI looks like a reward-prediction error seen in other studies that use positive outcomes.

      Strengths:

      Overall, I found this to be a well-written human neuroimaging study investigating an often overlooked question on the role of aversive prediction errors, and how they may differ from reward-related prediction errors. The paper is well-written and the fMRI methods seem mostly rigorous and solid.

      Weaknesses:

      I did have some confusion over the use of the term "prediction-error" however as it is being used in this task. There is certainly an expectancy violation when participants are told there is a high probability of shock, and it doesn't occur. Yet, there is no relevant learning or updating, and participants are explicitly told that each trial is independent and the outcome (or lack thereof) does not affect the chances of getting the shock on another trial with the same instructed outcome probability. Prediction errors are primarily used in the context of a learning model (reinforcement learning, etc.), but without a need to learn, the utility of that signal is unclear.

      We operationalized “prediction error” as the response to the error in prediction or the violation of expectancy at the time of threat omission. In that sense, prediction error and expectancy violation (which is more commonly used in clinical research and psychotherapy; Craske et al., 2014) are synonymous. While prediction errors (or expectancy violations) are predominantly studied in learning situations, the definition in itself does not specify how the “expectancy” or “prediction” arises: whether it was through learning based on previous experience or through mere instruction. Our rationale for moving away from a conditioning study in the present manuscript is discussed in our general reply above.

      We agree with the reviewer that studying prediction errors outside a learning context limits the ecological validity of the task. However, we do believe there is also a strength to this approach. Specifically, the omission-related responses we measure are less confounded by subsequent learning (or updating of the wrongful expectation). Any difference between our results and prediction error responses in learning situations can therefore point to this exact difference in paradigm, and can thus identify responses that are specific to learning situations.

      An overarching question posed by the researchers is whether relief from not receiving a shock is a reward. They take as neural evidence activity in regions usually associated with reward prediction errors, like the VTA/SN. This seems to be a strong case of reverse inference. The evidence may have been stronger had the authors compared activity to a reward prediction error, for example using a similar task but with reward outcomes. As it stands, the neural evidence that the absence of shock is actually "pleasurable" is limited, albeit there is a subjective report asking subjects if they felt relief.

      We thank the reviewer for cautioning us and letting us critically reflect on our interpretation. We agree that it is important not to be overly enthusiastic when interpreting fMRI results and not to carelessly attribute psychological functions to mere activations. Therefore, we will elaborate on the precautions we took to minimize detrimental reverse inference.

      First, prior to analyzing our results, we preregistered clear hypotheses that were based on previous research, in addition to clear predictions, regions of interest and a testing approach on OSF. With our study, we wanted to investigate whether unexpected omissions of threat: (1) triggered activations in the VTA/SN, putamen, NAc and vmPFC (as has previously been shown in animal and human studies); (2) represented PE signals; and (3) were related to self-reported relief, which has also been shown to follow a PE time-curve in fear extinction (Vervliet et al., 2017). Based on previous research, we selected three criteria that all PE signals should comply with. This means that if omission-related activations were to represent true PE signals, they should comply with these criteria. However, we agree that it would go too far to conclude, based on our research, that relief is a reward, or even that the omission-related activations represent only PE signals. While we found support for most of our hypotheses, this does not preclude alternative explanations. In fact, in the discussion, we acknowledge this and also discuss alternative explanations, such as responding to the salience (lines 395-397; “One potential explanation is therefore that the deactivation resulted from a switch from default mode to salience network, triggered by the salience of the unexpected threat omission or by the salience of the experienced stimulation.”), or anticipation (lines 425-426; “... we cannot conclusively dismiss the alternative interpretation that we assessed (part of) expectancy instead”).

      Second, we have deliberately opted to use only descriptive labels such as omission-related activations when discussing fMRI results. Only when discussing how the activations were related to self-reported relief do we talk about relief-related activations.

      I have some other comments, and I elaborate on those above comments, below:

      (1) A major assumption in the paper is that the unexpected absence of danger constitutes a pleasurable event, as stated in the opening sentence of the abstract. This may sometimes be the case, but it is not universal across contexts or people. For instance, for pathological fears, any relief derived from exposure may be short-lived (the dog didn't bite me this time, but that doesn't mean it won't next time or that all dogs are safe). And even if the subjective feeling one gets is temporary relief at that moment when the expected aversive event is not delivered, I believe there is an overall conflation between the concepts of relief and pleasure throughout the manuscript. Overall, the manuscript seems to be framed on the assumption that "aversive expectations can transform neutral outcomes into pleasurable events," but this is situationally dependent and is not a common psychological construct as far as I am aware.

      We thank the reviewer for their comment. We have restructured the introduction because we agree with the reviewer that the introduction might have set false expectations concerning our interpretation of the results. The statements related to relief have been toned down in the revised manuscript.

      Still, we want to note that the initial opening statement “unexpected absence of danger constitutes the pleasurable emotion relief” was based on a commonly used definition of relief that states that relief refers to “the emotion that is triggered by the absence of expected or previously experienced negative stimulation” (Deutsch, 2015). Both aspects, that it is elicited by the absence of an otherwise expected aversive event and that it is pleasurable in nature, have received considerable empirical support in emotion and fear conditioning research (Deutsch et al., 2015; Leknes et al., 2011; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021).

      That said, the notion that the feeling of relief is linked to the (reward) prediction error underlying the learning of safety is included in several theoretical papers in order to explain the commonly observed dopaminergic response at the time of threat omission (both in animals and humans; Bouton et al., 2020; Kalisch et al., 2019; Pittig et al., 2020).

      Together, these studies indicate that the definition of relief, and its potential role in threat omission-driven learning is – at least in our research field – established. Still, we felt that more direct research linking feelings of relief to omission-related brain responses was warranted.

      One of the main reasons why we specifically focus on the “pleasantness” of the relief is to assess the hedonic impact of the threat omission, as has been done in previous studies by our lab and others (Leknes et al., 2011; Leng et al., 2022; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021). Nevertheless, we agree with the reviewer that the relief we measure is a short-lived emotional state that is subject to individual differences (as are all emotions).

      (2) The authors allude to this limitation, but I think it is critical. Specifically, the study takes a rather simplistic approach to prediction errors. It treats the instructed probability as the subjects' expectancy level and treats the prediction error as omission related activity to this instructed probability. There is no modeling, and any dynamic parameters affected by learning are unaccounted for in this design. That is, subjects are informed that each trial is independently determined and so there is no learning: "the presence/absence of stimulations on previous trials could not predict the presence/absence of stimulation on future trials." Prediction errors are central to learning. It is unclear if the "relief" subjects feel on not getting a shock on a high-probability trial is in any way analogous to a prediction error, because there is no reason to update your representation on future trials if they are all truly independent. The construct validity of the design is in question.

      (3) Related to the above point, even if subjects veered away from learning by the instruction that each trial is independent, the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.

      We thank the reviewer for raising these concerns. We believe that the general reply above covers the issues raised in points 2 and 3.

      (4) Bouton has described very well how the absence of expected threat during extinction can create a feeling of ambiguity and uncertainty regarding the signal value of the CS. This in large part explains the contextual dependence of extinction and the "return of fear" that is so prominent even in psychologically healthy participants. The relief people feel when not receiving an expected shock would seem to have little bearing on changing the long-term value of the CS. In any event, the authors do talk about conditioning (CS-US) in the paper, but this is not a typical conditioning study, as there is no learning.

      We fully agree with the reviewer that our study is not a typical conditioning study. Nevertheless, because our research mostly builds on recent advances in the fear extinction domain, we felt it was necessary to introduce the fear extinction procedure and related findings. In the context of fear extinction learning, we have previously shown that relief is an emotional correlate of the prediction error driving acquisition of the novel safety memory (CSnoUS; Papalini et al., 2021; Vervliet et al., 2017). The ambiguity Bouton describes is the result of the extinguished CS holding multiple meanings once the safety memory is acquired. Does it signal danger or safety? We agree with Bouton that the meaning of the CS for any new encounter will depend on the context, and the passage of time, but also on the initial strength of the safety acquisition (which is dependent on the size of the prediction error, and hence the amount of relief; Craske et al., 2014). However, it was not our objective to directly study the relation of relief to subsequent CS value, and our design is not tailored to do so post hoc.

      (5) In Figure 2 A-D, the omission responses are plotted on trials with varying levels of probability. However, it seems to be missing omission responses in 0% trials in these brain regions. As depicted, it is an incomplete view of activity across the different trial types of increasing threat probability.

      We thank the reviewer for pointing out this lack of clarity. The betas that are presented in the figures represent the ROI averages from each non-0% vs 0% contrast (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.

      Adaptations in the revised manuscript: We have adapted the figure captions of figures 2 and 3.

      “The extracted beta-estimates in figures A-D represent the ROI averages from each non0% > 0% contrast (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.”

      (6) If I understand Figure 2 panels E-H, these are plotting responses to the shock versus no-shock (when no-shock was expected). It is unclear why this would be especially informative, as it would just be showing activity associated with shocks versus no-shocks. If the goal was to use this as a way to compare positive and negative prediction errors, the shock would induce widespread activity that is not necessarily reflective of a prediction error. It is simply a response to a shock. Comparing activity to shocks delivered after varying levels of probability (e.g., a shock delivered at 25% expectancy, versus 75%, versus 100%) would seem to be a much better test of a prediction error signal than shock versus no-shock.

      We thank the reviewer for this comment. The purpose of this preregistered contrast was to test whether fully predicted outcomes elicited equivalent activations in our ROIs (corresponding to the third prediction error axiom). Specifically, if a region represents a pure prediction error signal, the 100% (fully predicted shocks) > 0% (fully predicted shock omissions) contrast should be nonsignificant, and follow-up Bayes Factors would further provide evidence in favor of this null-hypothesis.

      We agree with the reviewer that the delivery of the stimulation triggers widespread activations in our regions of interest that confounded this contrast. However, given that it was a preregistered test for the prediction error axioms, we cannot remove it from the manuscript. Instead, we have argued in the discussion that future studies that want to take an axiomatic stance should consider alternative tests to examine this axiom.

      Adaptations in the revised manuscript: We adapted lines 358-364.

      “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”

      Also note that our task did not lend itself to an in-depth analysis of aversive (worse-than-expected) prediction error signals, given that there was only one stimulation trial for each probability x intensity level (see Supplemental Figure 1). The most informative contrast that can inform us about aversive prediction error signals contrasts all non-100% stimulation trials with all 100% stimulation trials. The results of this contrast are presented in Supplemental Figure 16 and Supplemental Table 11 for completeness.

      (7) I was unclear what the results in Figure 3 E-H were showing that was unique from panels A-D, or where it was described. The images looked redundant from the images in A-D. I see that they come from different contrasts (non0% > 0%; 100% > 0%), but I was unclear why that was included.

      We thank the reviewer for this comment. Our answer is related to that of the previous comment. Figure 3 presents the results of the axiomatic tests within the secondary ROIs we extracted from a wider secondary mask based on the non0%>0% contrast.

      (8) As mentioned earlier, there is a tendency to imply that subjects felt relief because there was activity in "the reward pathway."

      We thank the reviewer for their comment, but we respectfully disagree. Subjective relief was explicitly probed when the instructed stimulations stayed away. In the manuscript we only talk about “relief” when discussing these subjective reports. We found that participants reported higher levels of relief-pleasantness following omissions of stronger and more probable threat. This was an observation that matches our predictions and replicates our previous behavioral study (Willems & Vervliet, 2021).

      The fMRI evidence is treated separately from the “pleasantness” of the relief. Specifically, we refrain from calling the threat omission-related neural responses “relief-activity” as this would indeed imply that the activation would only be attributed to this psychological function. Instead, we talked about omission-related activity, and we assessed whether it complied to the prediction error criteria as specified by the axiomatic approach.

      Only afterwards, because we hypothesized that omission-related fMRI activation and self-reported relief-pleasantness were related, and because we found a similar response pattern for both measures, we examined how relief and omission-related fMRI activations within our ROIs were related on a trial-by-trial basis. To this end, we entered relief-pleasantness ratings as a parametric modulator to the omission regressor.

      By no means do we want to reduce an emotional experience (relief) to fMRI activations in isolated regions in the brain. We agree with the reviewer that this would be far too reductionist. We therefore also ran a pre-registered LASSO-PCR analysis in order to identify whether a whole-brain pattern of activations can predict subjective relief (independent from the exact instructions we gave, and independent of our a priori ROIs). This analysis used trial-by-trial patterns of activation across all voxels in the brain as the predictor and self-reported relief as the outcome variable. It is therefore completely data-driven and can be seen as a preregistered exploratory analysis that is intended to inform future studies.

      (9) From the methods, it wasn't entirely clear where there is jitter in the course of a trial. This centers on the question of possible collinearity in the task design between the cue and the outcome. The authors note there is "no multicollinearity between anticipation and omission regressors in the first-level GLMs," but how was this quantified? The issue is of course that the activity coded as omission may be from the anticipation of the expected outcome.

      We thank the reviewer for pointing out this unclarity. Jitter was introduced in all parts of the trial: i.e., the duration of the inter-trial interval (4-7s), countdown clock (3-7s), and omission window (4-8s) were all jittered (see fig. 1A and methods section, lines 499-507). We added an additional line to the method section.

      Adaptations in the revised manuscript: We added an additional line to the methods section to further clarify the jittering (lines 498-500).

      “The scale remained on the screen for 8 seconds or until the participant responded, followed by an intertrial interval between 4 and 7 seconds during which only a fixation cross was shown. Note that all phases in the trial were jittered (i.e., duration countdown clock, duration outcome window, duration intertrial interval).”
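      The jittering described above can be sketched as follows. This is a minimal illustration only: the three ranges are the ones quoted in the response, but the uniform sampling, the function names, and the seed are assumptions, since the manuscript excerpt does not state how the durations were drawn.

```python
import random

# Jitter ranges in seconds, as quoted in the response above.
JITTER_RANGES = {
    "intertrial_interval": (4.0, 7.0),
    "countdown_clock": (3.0, 7.0),
    "omission_window": (4.0, 8.0),
}


def sample_trial_durations(rng):
    """Draw one duration per trial phase (uniform sampling is an assumption)."""
    return {phase: rng.uniform(lo, hi) for phase, (lo, hi) in JITTER_RANGES.items()}


def make_schedule(n_trials, seed=0):
    """Build a reproducible list of jittered phase durations, one dict per trial."""
    rng = random.Random(seed)
    return [sample_trial_durations(rng) for _ in range(n_trials)]


if __name__ == "__main__":
    for trial in make_schedule(n_trials=5):
        print({phase: round(duration, 2) for phase, duration in trial.items()})
```

      Because each phase is drawn independently, the onset of the outcome window is not a fixed offset from the cue, which is what allows anticipation and omission responses to be modelled with separate regressors.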

      Multicollinearity between the omission and anticipation regressors was assessed by calculating the variance inflation factor (VIF) of omission and anticipation regressors in the first level GLM models that were used for the parametric modulation analyses.

      Adaptations in the revised manuscript: We replaced the VIF abbreviation with “variance inflation factor” (line 423-424).

      “Nevertheless, there was no multicollinearity between anticipation and omission regressors in the first-level GLMs (Variance Inflation Factor, VIF < 4), making it unlikely that the omission responses purely represented anticipation.”
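      For readers unfamiliar with the statistic, the variance inflation factor of regressor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing regressor j on the remaining regressors. The sketch below covers only the simplest case of two regressors, where R^2 equals the squared Pearson correlation. The vectors are made-up toy values, not the study's design matrices, and the function names are hypothetical; the cut-off of 4 is the one quoted above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(x, y):
    """VIF for the two-regressor case: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


if __name__ == "__main__":
    # Toy regressors (hypothetical values, for illustration only).
    anticipation = [1.0, 2.0, 3.0, 4.0]
    omission_uncorrelated = [1.0, -1.0, -1.0, 1.0]
    omission_correlated = [1.0, 2.0, 3.0, 5.0]

    for label, omission in [("uncorrelated", omission_uncorrelated),
                            ("correlated", omission_correlated)]:
        vif = vif_two_regressors(anticipation, omission)
        print(label, round(vif, 2), "below cut-off" if vif < 4 else "above cut-off")
```

      A first-level GLM with more than two regressors would require a full regression of each regressor on all others; the two-regressor shortcut is used here only to keep the example self-contained.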

      (10) I did not fully understand what the LASSO-PCR model using relief ratings added. This result was not discussed in much depth, and seems to show a host of clusters throughout the brain contributing positively or negatively to the model. Altogether, I would recommend highlighting what this analysis is uniquely contributing to the interpretation of the findings.

      The main added value of this analysis is that it uses a different approach altogether. The (mass univariate) parametric modulation analysis estimated in each voxel (and each ROI) whether the activity in this voxel/ROI covaried with the reported relief, so a significant activation only indicated that this voxel was related to relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network across the brain, and which regions contributed most to the prediction of relief. The multivariate LASSO-PCR analysis approach we took attempts to overcome this limitation by examining whether a whole-brain pattern can predict relief. Because we use the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data-driven and is intended to inform future studies. In addition, the LASSO-PCR model was cross-validated using five-fold cross-validation, which is also a difference (and a strength) compared to the mass univariate GLM approach.

      One interesting finding that only became evident when we combined univariate and multivariate approaches is that, although the parametric modulation analysis showed that omission-related fMRI responses in the ROIs were modulated by the reported relief, none of these ROIs contributed significantly to the prediction of relief based on the identified signature. Instead, some of the contributing clusters fell within other valuation and error-processing regions (e.g. lateral OFC, mid cingulate, caudate nucleus). This suggests that regions other than our a priori ROIs may have been especially important for the subjective experience of relief, at least in this task. However, all these clusters were small and require further validation in out-of-sample participants. More research is necessary to test the generalizability and validity of the relief signature to new individuals and tasks, and to compare the signature with other existing signature models (e.g., signature of pain, fear, reward, pleasure). However, this was beyond the scope of the present study.

      Adaptations in the revised manuscript: We altered the explanation of the LASSO-PCR approach in the results section (lines 286-295) and the discussion (lines 399-402)

      Adaptations in the Results section: “The (mass univariate) parametric modulation analysis showed that omission-related fMRI activity in our primary and secondary ROIs correlated with the pleasantness of the relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network of activation across the brain, and which regions contributed most to the prediction of relief. To overcome these limitations, we trained a (multivariate) LASSO-PCR model (Least Absolute Shrinkage and Selection Operator-Regularized Principal Component Regression) in order to identify whether a spatially distributed pattern of brain responses can predict the perceived pleasantness of the relief (or “neural signature” of relief)31. Because we used the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data driven and can thus identify which clusters contribute most to the relief prediction.”

      Adaptations in the Discussion section: “In addition to examining the PE-properties of neural omission responses in our a priori ROIs, we trained a LASSO-PCR model to establish a signature pattern of relief. One interesting finding that only became evident when we compared the univariate and multivariate approach was that none of our a priori ROIs appeared to be an important contributor to the multivariate neural signature, even though all of them (except NAc) were significantly modulated by relief in the univariate analysis.”
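      The five-fold cross-validation mentioned in the response above can be illustrated with a small sketch. It shows only the fold logic: the placeholder model is a one-predictor least-squares line, not LASSO-PCR, and the data are synthetic. How the folds were actually assigned in the study (for example by trial or by participant) is not stated in this excerpt, so the random shuffling here is an assumption, as are all function names.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds held-out sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def fit_line(xs, ys):
    """Placeholder model: ordinary least squares with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_predictions(xs, ys, n_folds=5, seed=0):
    """Predict each sample from a model fitted without that sample's fold."""
    predictions = [None] * len(xs)
    for test_idx in five_fold_indices(len(xs), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    return predictions


if __name__ == "__main__":
    # Synthetic, noise-free data (hypothetical): outcome = 2 * predictor + 1.
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 for x in xs]
    preds = cross_validated_predictions(xs, ys)
    print(max(abs(p - y) for p, y in zip(preds, ys)))
```

      The point of the design is that every prediction comes from a model that never saw the corresponding sample, so the agreement between predicted and reported values estimates out-of-fold performance rather than in-sample fit.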

      In addition to the public peer review, the reviewers provided some recommendations on how to further improve our manuscript. We reply to these recommendations below.

      Reviewer #1 (Recommendations For The Authors):

      Given that you do have trial-level estimates from the classifier analysis, it would be very informative to use learning models and examine responses trial-by-trial to test whether there are prediction errors that vary over time as a function of learning.

      We thank the reviewer for the suggestion. However, based on the results of the run-regressor, we do not anticipate large learning effects in our paradigm. As we mentioned in our responses above, we controlled for time-related drops in omission-responding by including a “run” regressor in our analyses. Results of this regressor for subjective relief and omission-related SCR showed that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This suggests that even though some learning might have taken place, its effect was likely small and did not abolish our manipulations of probability and intensity. In any case, we cannot use the LASSO-PCR signature model to investigate learning, as this model uses the trial-level brain pattern at the time of US omission to estimate the associated level of relief. These estimates can therefore not be used to examine learning effects.

      Reviewer #2 (Recommendations For The Authors):

      The LASSO-PCR model feels rather disconnected from the rest of the paper and does not add much to the main theme. I would suggest to remove this part from the paper.

      We thank the reviewer for this suggestion. However, the LASSO-PCR analysis was preregistered. We therefore cannot remove it from the manuscript. We hope to have clarified its added value in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have revised the manuscript mainly in the following aspects: (1) the data on electrophysiological and behavioral responses of larvae and adults to trehalose have been added, and the related figures and text have been modified accordingly; (2) photos of the taste organs of larvae and adults indicating the position of the recorded sensilla have been added; (3) the potential off-target effects of GR knock-out on the expression of other GRs have been carefully explained and the relevant text revised; (4) the abstract has been revised to present the findings more technically in a limited number of words; (5) some experimental details in Materials and Methods and some new references have been added; (6) a new figure (Figure 8) summarizing the main findings of the study has been added.

      In the following, we respond to the reviewers’ comments and suggestions one by one. We hope that our answers will satisfy you and the three reviewers. We would also be very happy to receive further valuable advice from you.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The process of taste perception is significantly more intricate and complex in Lepidopteran insects. This investigation provides valuable insights into the role of Gustatory receptors and their dynamics in the sensation of sucrose, which serves as a crucial feeding cue for insects. The article highlights the differential sensitivity of Grs to sucrose and their involvement in feeding and insect behavior.

      Strengths:

      To support the notion of the differential specificity of Gr to sucrose, this study employed electrophysiology, ectopic expression of Grs in Xenopus, genome editing, and behavioral studies on insects. This investigation offers a fundamental understanding of the gustation process in lepidopteran insects and its regulation of feeding and other gustation-related physiological responses. This study holds significant importance in advancing our comprehension of lepidopteran insect biology, gustation, and feeding behavior.

      Thank you for your recognition of our research.

      Weaknesses:

      While this manuscript demonstrates technical proficiency, there exists an opportunity for additional refinement to optimize comprehensibility for the intended audience. Several crucial sugars have been overlooked in the context of electrophysiology studies and should be incorporated. Furthermore, it is imperative to consider the potential off-target effects of Gr knock-out on other Gr expressions. This investigation focuses exclusively on Gr6 and Gr10, while neglecting a comprehensive narrative regarding other Grs involved in sucrose sensation.

      We accept the reviewer's suggestion. Because trehalose is a main sugar in insect blood, and it is converted by insects after feeding on plant sugars, we have added new data on the electrophysiological and behavioral responses of larvae and adults of Helicoverpa armigera to trehalose (see Figure 1-2, Figure 1-figure supplement 1, Figure 2-figure supplement 1). The eight sugars now tested include 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose), which were chosen because they are mainly present in host-plants of H. armigera and/or representative in the structure and source of sugars.

      We fully agree with the reviewer and have taken the potential off-target effects of the CRISPR/Cas9 knockout of Gr6 and Gr10 on the expression of other GRs into consideration. To predict the potential off-target sites of the Gr6 and Gr10 sgRNAs used to establish homozygous mutants, we first used the online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to search the genome of the wild-type cotton bollworm, with the number of mismatches set to less than or equal to 3. We found that the Gr10 sgRNA had no potential off-target site and that the Gr6 sgRNA had only one. We therefore designed primers according to the sequence of the potential off-target site of the Gr6 sgRNA, conducted PCR using genomic DNA of the homozygous mutant as a template, performed Sanger sequencing on the PCR products, and found that the potential off-target site of the Gr6 sgRNA did not differ from that of the wild type. In particular, because the sgRNAs of Gr6 and Gr10 might produce off-target effects on other sugar receptor genes of H. armigera, we conducted the same off-target site analysis with the designed sgRNAs on each of the other eight sugar receptor genes, and found no off-target sites on these receptor genes (see Line 254-256).
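      The idea behind a mismatch-limited off-target search can be illustrated with a naive sketch. This is not Cas-OFFinder's implementation: it scans only the forward strand, ignores the PAM requirement, and uses invented sequences and hypothetical function names. Only the mismatch limit of 3 is taken from the response above.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_candidate_sites(genome, guide, max_mismatches=3):
    """Return (position, site, mismatches) for every window within the limit."""
    k = len(guide)
    hits = []
    for pos in range(len(genome) - k + 1):
        site = genome[pos:pos + k]
        mismatches = hamming(site, guide)
        if mismatches <= max_mismatches:
            hits.append((pos, site, mismatches))
    return hits


if __name__ == "__main__":
    # Invented toy sequences (hypothetical, not H. armigera data).
    guide = "ACGTACGTAC"
    genome = "TTTT" + "ACGTACGTAC" + "GGGG" + "ACGTTCGAAC" + "CCCC"
    for pos, site, mismatches in find_candidate_sites(genome, guide):
        label = "on-target" if mismatches == 0 else "candidate off-target"
        print(pos, site, mismatches, label)
```

      Any window reported with one or more mismatches is only a candidate; as in the response, such sites would still need to be amplified and sequenced in the mutant line to check whether they were actually edited.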

      Reviewer #2 (Public Review):

      Summary:

      To identify sugar receptors and assess the capacity of these genes the authors first set out to identify behavioral responses in larvae and adults as well as physiological response. They used phylogenetics and gene expression (RNAseq) to identify candidates for sugar reception. Using first an in vitro oocyte system they assess the responses to distinct sugars. A subsequent genetic analysis shows that the Gr10 and Gr6 genes provide stage specific functions in sugar perception.

      Strengths:

      A clear strength of the manuscript is the breadth of techniques employed allowing a comprehensive study in a non-canonical model species.

      Thank you for your recognition of our research.

      Weaknesses:

      There are no major weaknesses in the study for the current state of knowledge in this species. Since it is much basic work to establish a broader knowledge, context with other modalities remains unknown. It might have been possible to probe certain contexts known from the fruit fly, which would have strengthened the manuscript.

      Thank you so much for your suggestion. According to this suggestion, we further added some sentences probing sugar sensing and behaviors of fruit fly larvae in the Introduction and discussion sections (Line 68-71 in Introduction section, Line 395-399 in Discussion section).

      Reviewer #3 (Public Review):

      In this study, the authors combine electrophysiology, behavioural analyses, and genetic editing techniques on the cotton bollworm to identify the molecular basis of sugar sensing in this species.

      The larval and adult forms of this species feed on different plant parts. Larvae primarily consume leaves, which have relatively lower sugar concentrations, while adults feed on nectar, rich in sugar. Through a series of experiments (electrophysiological recordings from both larval and adult sensilla, qPCR expression analysis of identified GRs from these sensilla, response profiles of these GRs to various sugars via heterologous expression in Xenopus oocytes, and evaluations of CRISPR mutants based on these parameters), the authors discovered that larvae and adults employ distinct GRs for sugar sensing. While the larva uses the highly sensitive GR10, the adult uses the less sensitive and broadly tuned GR6. This differential use of GRs is in keeping with their behavioral ecology.

      The data are cohesive and consistently align across the methodologies employed. They are also well presented and the manuscript is clearly written.

      Recommendations for the authors:

      While appreciating the quality of the work and its presentation, we have a few comments for the authors, should they wish to consider them, that would significantly improve the presentation of the work.

      Title: Could the authors please revisit their title to better reflect the main finding of their work?

      The title has been changed into “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      Text: There are a few comments related to the text, and these are listed below:

      (1) Could the authors place their work in the context of what's known about sugar sensing in Drosophila larva and adult?

      In the Introduction section, we added the status of research on sugar perception in Drosophila larvae, pointing out "No external sugar-sensing mechanism in Drosophila larvae has yet been characterized." (Line 70-71); in the Discussion section, the research progress of sugar sensing in Drosophila adults and larvae was also summarized (Line 397-399).

      (2) For each results section, could the authors please include a sentence or two that interprets the data in the context of previously presented data?

      We accept the reviewer's suggestion. To make the manuscript easier for readers to follow, we have included, at the beginning of each part of the Results, a sentence that interprets the preceding data, while avoiding duplication.

      (3) Could the authors please provide details of the generation and screening of the CRISPR mutants?

      We have added more details on mutant establishment and screening in the Materials and Methods section (Line 722-726, 729-732).

      Figures: Could the authors please include images and schematics wherever possible? For example, a schematic depicting the position of the sense organs and one summarising the main findings of the studies.

      In Figure 1 we added the photo of each taste organ, on which the recorded sensilla were indicated. We also added a new figure, Figure 8, summarizing the main findings of the study.

      Choice of Sugars: Could the authors please justify their choice of sugars they have used in the analyses?

      In the first paragraph of the Results section of the article, we further explain the reasons for using the sugars in the study. “We first investigated the electrophysiological responses of the lateral and medial sensilla styloconica in the larval maxillary galea to eight sugars. These sugars were chosen because they are mostly found in host-plants of H. armigera or are representative in the structure and source of sugars.”

      In addition to this, there are several specific comments in the detailed reviewers comments below, which the authors could consider responding to.

      Reviewer #1 (Recommendations For The Authors):

      The article titled "Sucrose taste receptors exhibit dissimilarities between larval and adult stages of a moth" by Shuai-Shuai Zhang and colleagues provides an intriguing analysis. The authors have conducted a meticulously planned and executed study. However, I do have some inquiries.

      (1) What precisely does the term "differ" signify in the title? It can be expounded upon in terms of differing in expression or sensitivity. The title could benefit from being more informative. The authors should appropriately specify the insect species in the title of the paper. This would make it more comprehensible to readers. Merely mentioning the term "moth" does not provide any information about the model organism. Hence, it would be preferable to mention Helicoverpa armigera instead of using the generic term "moth" in the title.

      Thank you for your suggestions. We considered it better to emphasize that the receptors for sucrose are different, and we have accepted the suggestion of adding the name of the animal. The title has been changed into “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      (2) The abstract is written in a simple and easily understandable manner, but it overlooks important findings from a technical standpoint.

      We have added some key experimental techniques to the Abstract to illustrate the important findings.

      (3) Almost all herbivorous insects are said to consume plants and utilize sucrose as a stimulus for feeding, as stated by the authors. Sucrose, glucose, and fructose sugar are among the commonly observed stimulants for feeding in numerous insects. It would be appropriate to incorporate not only sucrose but also glucose and fructose as feeding stimulants for almost all herbivorous insects.

      Thank you for your suggestion. Sucrose is the major sugar in plants, and its concentration varies greatly from tissue to tissue, while the concentration of the hexose sugars is much lower and the concentration does not change much. In Line 48, we state that sucrose, glucose, and fructose are feeding stimuli for herbivorous insects. From the previous studies, it seems that sucrose is the strongest, followed by fructose, and finally glucose. The cotton bollworm larvae showed no electrophysiological and behavioral response to glucose.

      (4) The reason why trehalose is not considered in the electrophysiology analysis is unclear. Given that trehalose is a major sugar in insects and plants, it would be intriguing to include it in the analysis.

      We have accepted the reviewer's suggestion, and supplemented the electrophysiological responses of taste organs in larvae and adults of Helicoverpa armigera to trehalose (Figure 1, Figure 1-Figure Supplement 1), and also tested the behavioral responses of the larvae and adults to trehalose (Figure 2, Figure 2-Figure Supplement 1). Therefore, all the related figures have been changed.

      (5) The author's intention regarding the co-receptor relationship between Gr5 and Gr6 (line 211) is unclear. If this is indeed the case, then the reason for considering Gr5 in further studies remains uncertain.

      We have changed the sentence as follows: “Since Gr5 was highly expressed with Gr6 in the proboscis and tarsi (Figure 3D-3E, Figure 3—figure supplement 1), we suspected that Gr5 and Gr6 might be expressed in the same cells, and then tested the response profile of their co-expression in oocytes.”

      (6) The homologous nature of Grs is emphasized by the authors. It is not specified how the author ensured that the guide RNA targeting Gr6 or Gr10 did not result in off-target effects on other Grs.

      Thank you so much for your suggestion. We have rewritten the relevant paragraph (Line 238-251), detailing our tests and the results on the potential off-target effects of knocking out GRs by CRISPR/Cas9: “In order to predict the potential off-target sites of sgRNA of Gr6 and Gr10, we used online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to blast the genome of H. armigera, and the mismatch number was set to less than or equal to 3. According to the predicted results, the Gr10 sgRNA had no potential off-target region but Gr6 sgRNA had one. Therefore, we amplified and sequenced the potential off-target region of Gr6-/- and found there was no frameshift or premature stop codon in the region compared to WT (Figure 5—figure supplement 2). It is worth mentioning that there was no potential off-target region of Gr6 and Gr10 sgRNA in other sugar receptor genes of H. armigera, Gr4, Gr5, Gr7, Gr8, Gr9, Gr11 and Gr12. We further found there was no difference in the response to xylose of the medial sensilla styloconica among WT, Gr10-/- and Gr6-/- (Figure 5—figure supplement 2). Furthermore, WT, Gr10-/- and Gr6-/- did not show differences in the larval body weight, adult lifespan, and number of eggs laid per female (Figure 5—figure supplement 2). All these results suggest that no off-target effects occurred in the study.”

      (7) Is it possible that knocking out Gr10 is not compensated for by the overexpression of Gr6 or other sucrose sensing Grs? Similarly, would the vice versa scenario hold true?

      In the Discussion section, we have added some sentences to discuss this issue: “From our results, knocking out Gr10 or Gr6 is unlikely to be compensated by overexpression of other sugar GRs. One of our recent studies showed that Orco knockout had no significant effect on the expression of most OR, IR and GR genes in adult antennae of H. armigera, but some genes were up- or down-regulated (Fan et al., 2022).”

      (8) What was the rationale for selecting nine candidate GR genes for expression analysis?

      Based on the reviewer's suggestion, we expanded the relevant paragraphs to illustrate the rationale for selecting nine candidate GR genes for expression analysis: “To reveal the molecular basis of sugar reception in the taste sensilla of H. armigera, we first analyzed the putative sugar gustatory receptor genes based on the reported gene sequences of GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al., 2015; Pearce et al., 2017; Xu et al., 2017). Nine putative sugar GR genes, Gr4–12 were identified, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Line 155-161)

      (9) What is the potential reason for the difference between the major larval sugar receptors of Drosophila and Lepidopterans?

      The difference between the major larval sugar receptors of Drosophila and Lepidopterans is probably due to differences in the food their larvae feed on. Fruit fly larvae feed on rotten fruit, the main sugar of which is fructose. The larvae of Lepidoptera mainly feed on plants, and the main sugar is sucrose. In the Discussion section, we have added a sentence “This is most likely due to fruit fly larvae feeding on rotten fruits, which contain fructose as the main sugar.” (Line 399-401)

      (10) There is a disparity in GRs, specifically GR5 and GR6, between the female antenna, proboscis, and tarsi. What could be the possible justification and significance of this?

      Thank you so much for this question. We have added a sentence in the Discussion section, “In this study, the expression patterns of 9 sugar GRs in three taste organs of adult H. armigera show that there is a disparity in GRs, specifically GR5 and GR6, between the female antenna, tarsi and proboscis, which may be an evolutionary adaptation reflecting subtle differentiation in the function of these taste organs in adult foraging. Antennae and tarsi play a role in the exploration of potential sugar sources, while the proboscis plays a more precise role in the final decision to feed.” (Line 433-438)

      (11) I suggest that a visual representation illustrating the positioning of GSNs, particularly the lateral and medial sensilla, in both larva and adult stages would enhance the correlation with the results.

      In Figure 1 we added the photo of each taste organ and the position of the recorded sensilla, and also added a new figure, Figure 8 summarizing the main findings of the studies.

      (12) Further experiments can be conducted to elucidate the precise molecular mechanisms, particularly the downstream effects of GRs, in order to establish the specificity of GRs more convincingly.

      Thank you so much for your suggestion. We have discussed further experiments in the Discussion section, “To elucidate the precise molecular mechanisms of sugar reception in H. armigera, it is necessary to compare a series of single, double and even multiple Gr knock-out lines and investigate the downstream effects of the GRs.” (Line 363-369)

      (13) Figure 6 caption: In Figure 6 (D to I), the percentage of PER is depicted. There is redundancy in the Y-axis title (Percentage of PER) and the legend. This appears to be repetitive. I suggest that it would be better to include the Y-axis title only in Figure D or in Figures D and G.

      We accept the suggestion. Figure 7 (not Figure 6) has been revised accordingly.

      (14) In Figures 6A and 6C, there is inconsistency in the colors used for WT, Gr6, and Gr10. This could potentially confuse the reader. I recommend using the same colors in both figures instead of using a blue color. Please specify how the authors calculated the feeding area in Figure 6.

      We accept the reviewer's suggestion and have changed the colors in Figure 7A, B. We have also added the detailed method for calculating the feeding area (Line 541-545).

      (15) In Two-choice tests, why did the authors use 0.01% Tween 80? Please provide comments on this.

      We used 0.01% Tween 80 to reduce the surface tension and increase the malleability of the solution. We have given a detailed explanation in the Methods section and cited the reference (Line 538-540).

      (16) It would be valuable if the authors could comment on the prospects of this study, considering that GRs play a vital role in controlling behavior and developmental pathways. What are the potential consequences of blocking or disrupting these receptors in terms of behavioral and developmental phenotypic deformities? Could this potentially lead to increased insect mortality?

      Thank you so much for your suggestions. In the last paragraph of the Discussion section, we have added the following perspectives, “Knockout of Gr10 or Gr6 led to a significant decrease in sugar sensitivity and food preference of the larvae and adults of H. armigera, respectively, which is bound to bring adverse consequences to survival and reproduction of the insects. Therefore, studying the molecular mechanisms underlying sugar perception in phytophagous insects may provide new insights into the behavioral ecology of this important and highly diverse group of insects, and measures blocking or disrupting sugar receptors could also have applications to control agricultural pests and improve crop yields worldwide” (Line 449-456).

      Reviewer #2 (Recommendations for The Authors):

      There are a few comments, that I feel would be beneficial to be addressed.

      • The authors used 7 different sugars for their experimental approach. While I agree that this is a sufficiently large collection for a study, I was wondering why they specifically chose these sugars; an explanatory section might be helpful for a reader to follow the reasoning.

      Following Reviewer 1's suggestion, we added trehalose, bringing the number of sugars tested to 8. Trehalose is a main sugar in insect blood; insects produce it after feeding on plant sugars. The 8 sugars were chosen because they are present in host plants of H. armigera or are representative in terms of sugar structure and source. They comprise 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose).

      • It might be beneficial to provide some broader overview on the gustatory system in the cotton bollworm, particularly at the larval stage since this may not be common knowledge. Along these lines eg. the complexity of sensilla types, organs and overall number (or estimation) of neurons might be good to know, a graphical representation of the sense organs might be informative.

      In the Introduction section, we now give a more specific description of the sugar-sensitive GSNs in the taste system of H. armigera larvae and adults, and cite the corresponding references.

      • Concerning phylogeny of GRs, it might be relevant to know how complete the genome information is and some more general background on GR diversity in the cotton bollworm.

      We agree. Accordingly, we obtained the putative sugar GRs from the previously published genome (Pearce et al. 2017) and the related annotation of GRs (Jiang et al. 2015, Xu et al. 2012). We have explained this in more detail in the new version of the manuscript: “We first analyzed the putative sugar gustatory receptor genes based on the genome data of H. armigera (Pearce et al. 2017), the reported gene sequences of sugar GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al. 2015, Xu et al. 2012). All nine putative sugar GR genes in H. armigera, Gr4–12 were validated, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Lines 155-161).

      • Generation of mutants based on CRISPR is intriguing and a powerful step. While the techniques are well described in the method section, there is no information concerning efficiency or broader feasibility of the approach. I feel it would be quite interesting to learn about how feasible or laborious the approach is to generate mutants (e.g. number of initial injected eggs, the resulting F0 offspring, number of back-crosses, number of screened F1s ....).

      In the Materials and Methods section, we have added specific success rates for each step in the process of building the two mutants (Line 722-726, 729-732).

      Reviewer #3 (Recommendations For The Authors):

      I want to congratulate the authors on this very nice study and have only minor comments for them.

      (1) It would be very nice to include pictures of the larva and adult of H. armigera. It would also help to have schematics of where the sensilla they are recording from are.

      We have added photos of the four taste organs, on which the recorded sensilla are indicated (Figure 1), and pictures of the larva and adult, on which the stimulation sites are indicated (Figure 2).

      (2) A schematic summarising their findings, including the relevance to the animal's behavioural ecology, will greatly improve interpretations for the broader audience.

      A schematic summarizing the findings has been added.

      (3) The manner in which PIs are represented in figure 2A, B (among others) is confusing. Can the authors please plot the PI and not the feeding area? From the PI values listed beside the plot, it actually suggests that the larvae don't really show a preference. Could the authors please comment on this?

      Yes, sucrose has a significant stimulating effect on larval feeding, but the effect is not as large as predicted from the sensitivity of the sensillum. The main reasons are as follows: (1) many factors affect larval feeding, and sucrose is only one of them; (2) because the substrate leaf discs also contain sugar, the effect of the newly added sucrose may be reduced. After careful consideration, we think it is better to display the feeding area and PI together so that readers have a complete understanding of the data.

      (4) The heterologous expression experiments suggest that co-expression of GR6 with either GR10 or GR5 somehow suppress the response of the GR6 alone to fucose. Am I reading the data correctly? Why would this be? Perhaps the authors could discuss this. In this context, it would help to reproduce all the GR6 data together.

      Your interpretation is reasonable to a certain extent. Co-injection may indeed have led Gr10 or Gr5 to inhibit the response of Gr6. However, another possibility is that the amount of Gr6 sRNA was diluted by co-injection of two GRs, resulting in a reduced response of Gr6 to fucose.

      (5) In general, for each results section, it would help to have a sentence or two that interprets the data in the context of previously presented data. This would help the reader digest the data and interpret it as they read along. Currently, the authors summarise the observations and leave all the interpretation to the discussion section.

      We accept the suggestion. In each part of the results, we have added a sentence to explain the above data, which will help readers to clarify the context of the research more easily.

      (6) Is the GR6 data in 4C not lined up correctly?

      Yes, it is right.

      (7) Line 228 suggests that the mutants were validating with qPCRs - I don't see that data.

      The mutants were not validated with qPCR. We used conventional PCR at the mRNA level to verify that the relevant sequences were indeed deleted in the mutants.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #3 (Public Review):

      Kang, Huang, and colleagues have provided new data to address concerns regarding confirmation of LRRK1 and LRRK2 deletion in their mouse model and the functional impact of the modest loss of TH+ neurons observed in the substantia nigra of their double KO mice. In the revised manuscript, the new data around the characterization of the germline-deleted LRRK1 and LRRK2 mice add confidence that LRRK1 and LRRK2 can be deleted using the genetic approach. They have also added new text to the discussion to try and address some of the comments and questions raised regarding how LRRK1/2 loss may impact cell survival and the implications of this work for PD-linked variants in LRRK2 and therapeutic approaches targeting LRRK2.

      The new data provides additional support for the author's claims. I have provided below some suggestions for clarification/additions to the text that can be addressed without additional experiments.

      (1) The authors added additional text highlighting that more studies are warranted in mice where LRRK1/2 are deleted in other CNS cell types (microglia/astrocytes) to understand cell extrinsic drivers of the autophagy deficits observed in their previous work. It still remains unclear how loss of LRRK1/2 leads to increased apoptosis and gliosis in dopaminergic neurons in a cell-intrinsic manner, and, as suggested in the original review, it would be helpful to add some text to the discussion speculating on potential mechanisms by which this might occur.

      (2) Revisions have been made to the discussion to clarify their rationale around how variants in LRRK2 associated with PD may be loss-of-function to support the relevance of this mouse model to phenotypes observed in PD. However, as written, the argument that PD-linked variants are loss-of-function is based on the fact that the double KO mice have a mild loss of TH+ neurons while the transgenic mice overexpressing PD-linked LRRK2 variants often do not and that early characterization of kinase activity was done in vitro are relatively weak. Given that the majority of evidence generated by many labs in the field supports a gain-of-function mechanism, the discussion should be further tempered to better highlight the uncertainty around this (rather than strongly arguing for a loss-of-function effect). This could include the mention of increased Rab phosphorylation observed in cellular and animal models and opposing consequences on lysosomal function observed in cellular studies in KO and pathogenic variant expressing cells. Further, a reference to the Whiffen et al. 2020 paper mentioned by another reviewer should be included in the discussion for completeness.

      We thank the reviewer for the comments. The discussion has been further revised and expanded to explain the cell extrinsic microglial response to pathophysiological changes in DA neurons of cDKO mice and propose future studies of single-cell RNA-sequencing to identify molecular changes within DA neurons of cDKO mice that may drive their apoptotic death during aging.

      We also added paragraphs summarizing existing experimental evidence for the toxic gain-of-function mechanism (biochemical data of increased kinase activity but the lack of evidence for the elevated pRabs and the altered pLRRK2 driving dopaminergic neurodegeneration) and for the loss-of-function mechanism (genetic data of relevant physiological roles) as well as the relationships between LRRK1 and LRRK2 (functional homologues sharing functional domains and overlapping roles in dopaminergic neuron survival) and how dominantly inherited missense mutations can confer a loss of function mechanism (impairing its function in cis and inhibiting wild-type protein function in trans). We also provided a brief summary and discussion of the Whiffen et al. 2020 paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study identifies a family of solute transporters in the enteric protist, Blastocystis, that may mediate the transport of glycolytic intermediates across the mitochondrial membrane. The study builds on previous observations suggesting that Blastocystis (and other Stramenopiles) are unusual in having a compartmentalized glycolytic pathway with enzymes involved in upper and lower glycolysis being located in the cytosol and mitochondria, respectively. In this study, the authors identified two putative Stramenopile metabolite transporters that are related to plant di/tricarboxylic acid transporters that might mediate the transport of glycolytic intermediates across the mitochondrial membrane. These GIC-transporters were localized to the Blastocystis mitochondrion using specific rabbit antibodies and shown to bind several glycolytic intermediates (including GAP, DHAP, and PEP) based on thermostability shift assays. Direct evidence for transport activity was obtained by reconstituting native proteins in proteoliposomes and measuring the uptake of 14C-malate or 35S-sulphate against unlabelled substrates. This assay showed that GIC-2 transported DHAP, GAP, and PEP. However, significant transport activity was not observed for bGIC-2. Overall, the study provides strong, but not conclusive evidence that bGIC-1 is involved in transporting glycolytic intermediates across the inner membrane of the mitochondria, while the function of GIC-2 remains unclear, despite exhibiting the same metabolite binding properties as bGIC-2 in thermostability assays.

      Strengths:

      Overall, the findings are of interest in the context of understanding the diversity of core metabolic pathways in evolutionarily diverse eukaryotes, as well as the process by which cytosolic glycolysis evolved in most eukaryotes. The experiments are carefully performed and clearly described.

      We thank the reviewer for their constructive comments. We note that bGIC-2 is the identified glycolytic intermediate transporter, not bGIC-1.

      Weaknesses:

      The main weakness of the study is the lack of direct evidence that either bGIC-1 and/or bGIC-2 are active in vivo. While it is appreciated that the genetic tools for disrupting GIC genes in Blastocystis are limited/lacking, are there opportunities to ectopically express or delete these genes in other Stramenopiles, such as Phaeodactylum tricornutum, to demonstrate function in vivo?

      Here, we have identified a transport protein, unique to stramenopiles, which is present in mitochondria of Blastocystis and can bind and transport glycolytic intermediates. We agree that it would have been desirable to confirm that they function as glycolytic intermediate transporters in vivo. However, the reviewer is correct in saying that the genetic tools for disrupting GIC genes in Blastocystis in vivo are not available. While the reviewer mentions the possibility of performing these analyses in Phaeodactylum tricornutum, it is important to note that this species possesses aerobic mitochondria and that the pay-off phase of glycolysis is present in both the mitochondrial matrix and the cytosol. Consequently, any data obtained from this species might not be conclusive and would also not be relevant to the glycolytic metabolism in Blastocystis, the subject of this study.

      The authors demonstrate that both bGIC-1 and bGIC-2 are targeted to the mitochondrion, based on immunofluorescence studies. However, the precise localization and topology of these carriers in the inner or outer membrane are not defined. The conclusions of the study would be strengthened if the authors could show that one/both transporters are present in the inner membrane using protease protection experiments following differential solubilization of the outer and inner mitochondrial membranes.

      The protein is a member of the mitochondrial carrier family, which are extremely hydrophobic membrane proteins. Those with an established transport function are known to localise consistently to the mitochondrial inner membrane, which is impermeable to charged molecules, whereas the outer membrane is porous through VDAC. Furthermore, when the carriers are overproduced in Saccharomyces cerevisiae, the protein is found in the enriched mitochondrial fraction, adding further support to the idea that they are localised to the inner membrane, as the outer membrane has a limited surface area.

      It is not clear why hetero-exchange reactions were not performed for bGIC-1 (only for bGIC-2).

      Unfortunately, bGIC-1 did not display transport activity when tested in [14C]-malate/malate, [35S]-sulphate/sulphate or [33P]-phosphate/phosphate homo-exchange reactions, as shown in Figure 6 (Figure 5 in the revised manuscript). Phosphoenolpyruvate and dihydroxyacetone phosphate are not available in a radiolabelled form and glyceraldehyde-3-phosphate is prohibitively expensive, so we were unable to test glycolytic intermediates directly in homo-exchange reactions. Hetero-exchange reactions, as performed in Figure 5 (Figure 6 in the revised manuscript) for bGIC-2, are conclusive, as accumulation of the radiolabelled substrate inside the proteoliposomes can only occur when the internal substrate is exported. It seems that Blastocystis has multiple copies of the gene, some of which may encode dysfunctional carriers and could be pseudogenes.

      The summary slide depicted in Fig 7 is somewhat simplified and inaccurate. First, the authors show that TPI is located in the mitochondria in this study, while in the summary figure, TPI is shown to be present in both the cytosol and mitochondrial matrix. A cytosolic localization for TPI provides a functional rationale for having a triose-P carrier in the inner membrane - however, this is not supported by the data shown here. Second, if bGIC1/2 uses PEP as a counter ion to import GA3P and DHAP into the mitochondrion, as proposed in Fig 7, the lower glycolytic pathway would be effectively truncated at PEP, removing substrate for pyruvate kinase and formation of pyruvate/ATP. Third, the authors suggest that DHAP may have other functions in the mitochondria although these are not shown in the figure.

      Figure 7 presents a schematic comparison of the localisation of glycolysis in humans and Blastocystis, specifically focused on the transport steps of either pyruvate (humans) or glycolytic intermediates (Blastocystis) into the mitochondrial matrix. Most of the metabolism of Blastocystis has been inferred from the presence or absence of genes, encoding for particular enzymes, with the exception of the unusual glycolytic pathway. We feel that overcomplicating this schematic figure would detract from the main message of this analysis. Although the transport data show that PEP, another glycolytic intermediate, is transported, we agree with the reviewer that PEP export cannot be rationalised in the context of our current understanding of the metabolism, and we have changed the figure accordingly.

      We have not suggested that DHAP has other functions in mitochondria; on line 230, we state that ‘we have not found any evidence for the presence of dihydroxyacetone phosphate inside mitochondria in the literature. It is possible that it is not transported under physiological conditions in competition with dicarboxylates or other substrates.’

      Reviewer #2 (Public Review):

      In this manuscript, the authors set out to identify transporters that must exist in Stramenopiles due to the fact that the second half of glycolysis appears to be conducted in the mitochondria. They hypothesize that a Stramenopile-specific clade of transporters related to the dicarboxylate carriers is likely the relevant family and then go on to test two proteins from Blastocystis due to the infectious disease relevance of this organism. They show rather convincingly that these two proteins are expressed and are localized to the mitochondria in the native organism. The purified proteins bind to glycolytic intermediates and one of them, GIC-2, transports several glycolytic intermediates in vitro. This is a very solid and well-executed study that clearly demonstrates that bGIC-2 can transport glycolytic intermediates.

      We thank the reviewer for their positive comments on the manuscript, and their careful analyses of the presented data.

      (1) The major weakness is that the authors aren't able to show that this protein actually has this function in the native organism. This could be impossible due to the lack of genetic tools in Blastocystis, but it leaves us without absolute confidence that bGIC-2 is the important glycolytic intermediate mitochondrial transporter (or even that it has this function in vivo).

      Unfortunately, genetic manipulation in Blastocystis is currently not feasible and thus we cannot conduct a comparative metabolic study with the appropriate controls. The gold standard for identification is to prove the function with purified protein directly, which we have done here by using binding studies and transport assays.

      (2) It's atypical that the figures and figure panels don't really follow the order of their citation in the text. It's not a big deal, but mildly annoying to have to skip around in the figures (e.g. Figure 3D-E are described in the same paragraph as Figure 5). In addition, to facilitate the flow and a proper understanding I would encourage a reordering between figures 5D and 6 since Figure 6 is needed to understand the results shown in panel 5D, which may lead to confusion.

      We agree with the reviewer and have reordered the figures, switching Figure 5 and 6, which makes the manuscript easier to follow.

      (3) My impression is that the authors under-emphasize the fact that the hDIC also binds (and is stabilized by) glycolytic intermediates (G3P and 3PG). In the opinion of this reviewer, this might change the interpretation about the uniqueness of the bGIC proteins. They act on additional glycolytic intermediates, but it's not unique.

      The reviewer is correct that hDIC is stabilized by both G3P and 3PG, but neither is transported, as shown in Figure 5B (Figure 6B in the revised manuscript). It is not uncommon for compounds to bind to some extent without being transported, as they share certain structural and chemical features with the substrates, which results in stabilisation in thermostability analyses. For example, GTP stabilises the ADP/ATP carrier in thermostability analyses to some extent (Majd et al, 2018), although it is not a transported substrate of the carrier (King et al, 2020). Although thermostability assays are very useful for screening potential substrates, it is always necessary to carry out transport assays, which are the gold standard for transporter identification.

      Reviewer #3 (Public Review):

      Summary:

      Unlike most eukaryotes, Blastocystis has a branched glycolysis pathway, which is split between the cytoplasm and the mitochondrial matrix. An outstanding question was how the glycolytic intermediates generated in the 'preparatory' phase are transported into the mitochondrial matrix for the 'pay off' phase. Here, the authors use bioinformatic analysis to identify two candidate solute carrier genes, bGIC-1, and bGIC-2, and use biochemical and biophysical methods to characterise their substrate specificity and transport properties. The authors demonstrate that bGIC-2 can transport dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, 3-phosphoglycerate, and phosphoenolpyruvate, establishing this protein as the 'missing link' connecting the two split branches of glycolysis in this branch of single-celled eukaryotes. The authors also present their data on bGIC-1, which suggests a role in anion transport and bOGC, which is a close functional homologue of the human oxoglutarate carrier (hOGC, SLC25A11) and human dicarboxylate carrier (hDIC, SLC25A10).

      Strengths:

      The results are presented in a clear and logical arrangement, which nicely leads the reader through the process of gene identification and subsequent ligand screening and functional reconstitution. The results are compelling and well supported - the thermal stabilisation data is supported by the exchange studies. Caveats, where apparent, are discussed and rational explanations are given.

      We thank the reviewer for their positive and constructive comments on the manuscript.

      Weaknesses:

      The study does not contain any significant weaknesses in my view. I would like to see the authors include the initial rate plots used in the main figures (possibly as insets), so we can observe the data points used for these calculations. It would also have been interesting to include the AlphaFold models for bGIC-1 and bGIC-2 and a discussion/rationalisation for the substrate specificity discussed in the study.

      We have shown uptake curves in both Figure 3 and Figure 6 (Figure 5 in the revised manuscript) to illustrate the typical uptake curves recorded by our robot, and we also show how we calculate the initial rates. We feel that including uptake curves for each compound for each carrier (96 uptake curves in total) would make Figure 5 (Figure 6 in the revised manuscript) extremely complicated.
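      As an illustration of how an initial rate can be read off an uptake curve, the sketch below fits a least-squares line through the first few time points and returns its slope. This is only one common approach and is an assumption here: the response does not state the fitting procedure used, and the function name, the number of points, and the data values are hypothetical.

```python
def initial_rate(times_s, uptake, n_points=4):
    """Estimate an initial uptake rate as the least-squares slope
    through the first `n_points` of an uptake curve.

    Assumption: the early part of the curve is approximately linear.
    This is an illustrative sketch, not the procedure used in the study.
    """
    if len(times_s) != len(uptake):
        raise ValueError("times_s and uptake must have the same length")
    if n_points < 2 or len(times_s) < n_points:
        raise ValueError("need at least two points within the chosen window")

    t = times_s[:n_points]
    y = uptake[:n_points]
    t_mean = sum(t) / n_points
    y_mean = sum(y) / n_points

    # Ordinary least-squares slope: Sxy / Sxx
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("time points in the window must not all be equal")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


# Hypothetical curve: linear at first, then approaching a plateau.
times = [0, 10, 20, 30, 60, 120]      # seconds
counts = [0, 20, 40, 60, 90, 100]     # arbitrary uptake units
rate = initial_rate(times, counts)    # slope over the first four points
```

      Restricting the fit to the early time points keeps the estimate from being pulled down by the plateau; the window size is a judgment call that depends on how quickly the curve saturates.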

      It would also have been interesting to include the AlphaFold models for bGIC-1 and bGIC-2 and a discussion/rationalisation for the substrate specificity discussed in the study.

      Whilst AlphaFold is an important step forward in the prediction of protein structures, it is not accurate enough at this time to be used for the rationalisation of the substrate specificity. For instance, there are significant structural differences between the predicted AlphaFold structure of the human uncoupling protein (https://alphafold.ebi.ac.uk/entry/P25874), by and large based on the mitochondrial ADP/ATP carrier, and the experimentally determined structure, especially for the central cavity where the substrate recognition takes place (Jones et al, 2023; Kang & Chen, 2023). More importantly, it is believed that the optimal binding of the substrate takes place in the occluded state (Klingenberg, 2007; Springett et al, 2017), for which we have no structure.

      References

      Jones SA, Gogoi P, Ruprecht JJ, King MS, Lee Y, Zögg T, Pardon E, Chand D, Steimle S, Copeman DM et al (2023) Structural basis of purine nucleotide inhibition of human uncoupling protein 1. Sci Adv 9: eadh4251

      Kang Y, Chen L (2023) Structural basis for the binding of DNP and purine nucleotides onto UCP1. Nature 620: 226-231

      King MS, Tavoulari S, Mavridou V, King AC, Mifsud J, Kunji ERS (2020) A single cysteine residue in the translocation pathway of the mitosomal ADP/ATP carrier from Cryptosporidium parvum confers a broad nucleotide specificity. Int J Mol Sci 21: 8971

      Klingenberg M (2007) Transport viewed as a catalytic process. Biochimie 89: 1042-1048

      Majd H, King MS, Palmer SM, Smith AC, Elbourne LD, Paulsen IT, Sharples D, Henderson PJ, Kunji ER (2018) Screening of candidate substrates and coupling ions of transporters by thermostability shift assays. eLife 7: e38821

      Springett R, King MS, Crichton PG, Kunji ERS (2017) Modelling the free energy profile of the mitochondrial ADP/ATP carrier. Biochim Biophys Acta 1858: 906-914

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a detailed study of a nearly complete Entomophthora muscae genome assembly and annotation, along with comparative analyses among related and non-related entomopathogenic fungi. The genome is one of the largest fungal genomes sequenced, and the authors document the proliferation and evolution of transposons and the presence/absence of related genetic machinery to explore how this may have occurred. There has also been an expansion in gene number, which appears to contain many "novel" genes unique to E. muscae. Functionally, the authors were interested in CAZymes, proteases, circadian clock related genes (due to entomopathogenicity/ host manipulation), other insect pathogen-specific genes, and secondary metabolites. There are many interesting findings including expansions in trehalases, unique insulinase, and another peptidase, and some evidence for RIP in Entomophthoralean fungi. The authors performed a separate study examining E. muscae species complex and related strains. Specifically, morphological traits were measured for strains and then compared to the 28S+ITS-based phylogeny, showing little informativeness of these morpho characters with high levels of overlap.

      This work represents a big leap forward in the genomics of non-Dikarya fungi and large fungal genomes. Most of the gene homologs have been studied in species that diverged hundreds of millions of years ago, and therefore using standard comparative genomic approaches is not trivial and still relatively little is known. This paper provides many new hypotheses and potential avenues of research about fungal genome size expansion, entomopathogenesis in zygomycetes, and cellular functions like RIP and circadian mechanisms.

      Strengths:

      There are many strengths to this study. It represents a massive amount of work and a very thorough functional analysis of the gene content in these fungi (which are largely unsequenced and definitely understudied). Too often comparative genomic work will focus on one aspect and leave the reader wondering about all the other ways genome(s) are unique or different from others. This study really dove in and explored the relevant aspects of the E. muscae genome.

      The authors used both a priori and emergent properties to shape their analyses (by searching for specific genes of interest and by analyzing genes underrepresented, expanded, or unique to their chosen taxa), enabling a detailed review of the genomic architecture and content. Specifically, I'm impressed by the analysis of missing genes (pFAMs) in E. muscae, none of which are enriched in relatives, suggesting this fungus is really different not by gene loss, but by its gene expansions.

      Analyzing species-level boundaries and the data underlying those (genetic or morphological) is not something frequently presented in comparative genomic studies, however, here it is a welcome addition as the target species of the study is part of a species complex where morphology can be misleading and genetic data is infrequently collected in conjunction with the morphological data.

      Thank you for your careful reading of our work. We’re glad that you identified these areas as strengths.

      Weaknesses:

      The conclusions of this paper are mostly well supported by data, but a few points should be clarified.

      In the analysis of Orthogroups (OGs), the claim in the text is that E. muscae "has genes in multi-species OGs no more frequently than Entomophaga maimaiga. (Fig. 3F)" I don't see that in 3F. But maybe I'm really missing something.

      Thank you for catching this. You were, in fact, not missing anything at all. There was a mismatch between the data plotted in F and G and how the caption described these data. We very much apologize for the confusion that this must have caused. We have corrected these plots and also made changes to improve interpretability (see below).

      Also related, based on what is written in the text of the OG section, I think portions of Figure 3G are incorrect/ duplicated. First, a general question, related to the first two portions of the graph. How do "Genes assigned to an OG" and "Genes not assigned to an OG" not equal 100% for each species? The graph as currently visualized does not show that. Then I think the bars in portion 3 "Genes in species-specific OG" are wrong (because in the text it says "N. thromboides had just 16.3%" species-specific OGs, but the graph clearly shows that bar at around 50%). I think portion 3 is just a duplicate of the bars in portion 4 - they look exactly the same - and in addition, as stated in the text portion 4 "Potentially species-specific genes" should be the simple addition of the bars in portion 2 and portion 3 for each species.

      As mentioned above, we sincerely regret the error made in the plot and for the confusion that this caused. F now reflects the percentage of orthogroups (OGs) that possess at least one representative from the indicated species (left) and the percentage of OGs that are species-specific (only possess genes from one species; right). The latter is a subset of the former. G now reflects the percentage of annotated genes that were assigned an OG, per species, as well as the inverse of this - genes that were not assigned to any OG. These should, and now do, sum to 100%. The “Within species-specific OG” data summed with the “Not assigned OG” data yields the “Potentially species-specific data” in the rightmost column.
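      The accounting described for panels F and G can be checked with a short sketch. The function names, orthogroup labels, species labels, and gene counts below are hypothetical and are not taken from the manuscript; only the relationships between the quantities follow the description above.

```python
def og_level_percentages(og_to_species, species):
    """Panel F style summary for one species.

    Returns (percent of OGs with at least one gene from `species`,
             percent of OGs containing genes from `species` only).
    The second quantity is a subset of the first.
    """
    total = len(og_to_species)
    if total == 0:
        raise ValueError("no orthogroups supplied")
    with_rep = sum(1 for members in og_to_species.values() if species in members)
    specific = sum(1 for members in og_to_species.values() if members == {species})
    return 100.0 * with_rep / total, 100.0 * specific / total


def gene_level_percentages(total_genes, genes_in_og, genes_in_species_specific_og):
    """Panel G style summary for one species, as percentages of annotated genes."""
    if total_genes == 0 or not (
        0 <= genes_in_species_specific_og <= genes_in_og <= total_genes
    ):
        raise ValueError("inconsistent gene counts")
    not_assigned = total_genes - genes_in_og

    def pct(n):
        return 100.0 * n / total_genes

    return {
        "assigned_og": pct(genes_in_og),
        "not_assigned_og": pct(not_assigned),
        "within_species_specific_og": pct(genes_in_species_specific_og),
        # Sum of the two categories that could reflect species-specific genes.
        "potentially_species_specific": pct(
            genes_in_species_specific_og + not_assigned
        ),
    }


# Hypothetical toy data.
ogs = {"OG1": {"A", "B"}, "OG2": {"A"}, "OG3": {"B"}, "OG4": {"A", "B", "C"}}
panel_f = og_level_percentages(ogs, "A")
panel_g = gene_level_percentages(
    total_genes=1000, genes_in_og=800, genes_in_species_specific_og=150
)
```

      With this bookkeeping, the assigned and unassigned percentages sum to 100% by construction, which is the consistency check the reviewer asked for.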

      In the introduction, there is a name for the phenomenon of "clinging to or biting the tops of plants," it's called summit disease. And just for some context for the readers, summit disease is well-documented in many of these taxa in the older literature, but it is often ignored in modern studies - even though it is a fascinating effect seen in many insect hosts, caused by many, many fungi, nematodes (!), etc. This phenomenon has evolved many times. Nice discussions of this in Evans 1989 and Roy et al. 2006 (both of whom cite much of the older literature).

      You’re right. We have now clarified that this behavior is called “summit disease” and referenced the suggested articles, along with a more recent review.

      Reviewer #2 (Public Review):

      In their study, Stajich and co-authors present a new 1.03 Gb genome assembly for an isolate of the fungal insect parasite Entomophthora muscae (Entomophthoromycota phylum, isolated from Drosophila hydei). Many species of the Entomophthoromycota phylum are specialised insect pathogens with relatively large genomes for fungi, with interesting yet largely unexplored biology. The authors compare their new E. muscae assembly to those of other species in the Entomophthorales order and also more generally to other fungi. For that, they first focus on repetitive DNA (transposons) and show that Ty3 LTRs are highly abundant in the E. muscae genome and contribute to ~40% of the species' genome, a feature that is shared by closely related species in the Entomophthorales. Next, the authors describe the major differences in protein content between species in the genus, focusing on functional domains, namely protein families (pfam), carbohydrate-active enzymes, and peptidases. They highlight several protein families that are overrepresented/underrepresented in the E. muscae genome and other Entomophthorales genomes. The authors also highlight differences in components of the circadian rhythm, which might be relevant to the biology of these insect-infecting fungi. To gain further insights into E. muscae specificities, the authors identify orthologous proteins among four Entomophthorales species. Consistently with a larger genome and protein set in E. muscae, they find that 21% of the 17,111 orthogroups are specific to the species. To finish, the authors examine the consistency between methods for species delineation in the genus using molecular (ITS + 28S) or morphological data (# of nuclei per conidia + conidia size) and highlight major incongruences between the two.

      Although most of the methods applied in the frame of this study are appropriate with the scripts made available, I believe there are some major discrepancies in the datasets that are compared which could undermine most of the results/conclusions. More precisely, most of the results are based on the comparison of protein family content between four Entomophthorales species. As the authors mention on page 5, genome (transcriptome) assembly and further annotation procedures can strongly influence gene discovery. Here, the authors re-annotated two assemblies using their own methods and recovered between 30 and 60% more genes than in the original dataset, but if I understand it correctly, they perform all downstream comparative analyses using the original annotations. Given the focus on E. muscae and the small sample size (four genomes compared), I believe performing the comparisons on the newly annotated assemblies would be more rigorous for making any claim on gene family variation.

      Thank you for this comment. While we did compare gene model predictions for two of these assemblies to assess whether this difference could account for discrepancies in gene counts, completely reannotating all non-E. muscae datasets was outside the scope of this study. In our opinion, the total number of predicted genes in a genome is not the best representation of differences, since splitting or fusing gene models can inflate apparent differences; the orthology and domain counts are a more accurate assessment of the content. It’s possible that annotation differences may have inflated some gene family counts; however, we note that similar domain trends were observed in the species closest to E. muscae, Entomophaga maimaiga, suggesting that these differences were not sufficient to prevent us from detecting real biological signals. We look forward to continued improvement of our genome through additional sequencing and more clarity on the total gene content of E. muscae.

      The authors also investigate the putative impact of repeat-induced point mutation on the architecture of the large Entomophthorales genomes (for three of the eight species in Figure 1) and report low RIP-like dinucleotide signatures despite the presence of RID1 (a gene involved in the RIP process in Neurospora crassa) and RNAi machinery. They base their analysis on the presence of specific PFAM domains across the proteome of the three Entomophthorales species. In the case of RID1, the authors searched for a DNA methyltransferase domain (PF00145), however other proteins than RID1 bear such functional domain (DNMT family) so that in the current analysis it is impossible to say if the authors are actually looking at RID1 homologs (probably not, RID1 is monophyletic to the Ascomycota I believe). Similar comments apply to the analysis of components of the RNAi machinery. A more reliable alternative to the PFAM analysis would be to work with full protein sequences in addition to the functional domains.

      While we understand this concern regarding domain vs. full-length protein, the advantage of the domain search is that HMM-based searches are sensitive enough to detect more distantly related homologs. Entomophthoralean fungi are distantly related to the ascomycetes in which these mechanisms have been characterized, so we chose a broader search approach that may identify proteins that have a similar domain structure but are not necessarily homologs. These searches are presented in the manuscript as preliminary, but worth further investigation. However, our RID-based analysis did not identify convincing homologs for RID1 in the entomophthoralean fungi included in our investigation, and we reported low homology (i.e., 12-14%) between our orthogroup of interest and RID1. We have further edited this section to clarify our understanding that these candidates are not RID1 homologs. We had hoped to avoid this implication, but we felt this investigation and null result were worth reporting.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific points:

      Results:

      "1.03 Gb genome consisting of 7,810 contigs (N50 = 301.1 kb). Additional... resulted in a final contig count of 7,810 (N50 = 329.6 kb)" So you started and ended with the same contig count but a different N50? Is this a typo?

      Yes, this was a typo. Thank you for bringing this to our attention.
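      For context on the statistic in question, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The minimal sketch below uses made-up contig lengths, not values from the assembly, to show why merging contigs typically lowers the contig count and raises N50 together.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given an iterable of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly before and after joining two contigs (4 + 2 -> 6).
before = [4, 4, 2]
after = [6, 4]
n50_before = n50(before)
n50_after = n50(after)
```

      In this toy case the contig count drops from three to two while N50 rises, which is the usual pattern after additional scaffolding or gap closing.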

      Figure 1D.

      The colors of Complete1x and Complete2x are too similar to tell them apart.

      The colors have been made more distinct.

      Figure 4B.

      I know C. rosea has been found from insects before, but it's mostly a mycoparasite and occasionally an endophyte, and has bioactivity against a lot of things. I just saw that it's listed as an entomopathogen, and I was surprised. Anyway, leave it as is if you want to, but it's definitely better studied and better known (Google Scholar) as a mycoparasite.

      Thanks for this comment. For the sake of including a more diverse representation of entomopathogenic fungi, we have opted to leave this as is.

      Full references (from the public comment)

      Evans, H.C., 1989. Mycopathogens of insects of epigeal and aerial habitats. Insect-fungus interactions, pp.205-238.

      Roy, H.E., Steinkraus, D.C., Eilenberg, J., Hajek, A.E. and Pell, J.K., 2006. Bizarre interactions and endgames: entomopathogenic fungi and their arthropod hosts. Annu. Rev. Entomol., 51, pp.331-357.

      Reviewer #2 (Recommendations For The Authors):

      I believe the manuscript could largely benefit from restructuring the results section to enhance clarity. The results section reads like a lot of descriptive back and forth, so that the reader lacks a clear rationale. The absence of a consistent dataset used for the different comparisons made all along the manuscript makes it hard to follow.

      Minor comments:

      (No line numbers were available so I refer to page numbers).

      p1

      • not sure about the use of "allied" to describe other fungal species in the title and after (sister species?).

      We didn’t want to use the word sister because not all of these species could be considered sister.

      • Genomic defence against transposable elements rather than "anti"?

      We have rephrased to genomic defense.

      p3

      • Extra parenthesis at Bronski et al.

      This is now corrected.

      • What does newly-available mean here?

      We mean recent. A lot of the datasets we used were very new, and we wanted to emphasize that point.

      • The back and forth between genomes and transcriptomes makes it hard to follow, would clarify from the beginning (in addition to the sequencing method - short vs long-read assemblies as in Figure 1B) or perhaps use a consistent dataset for all subsequent comparative analysis in the Entomophthorales.

      We have denoted our transcriptomic datasets in Fig 1C using parentheses.

      p5

      • Perhaps clarify that class II DNA transposons can also "copy" (single-strand excisions can be repaired by the host machinery).

      We have now included mention of “copy” as well as “jump” mechanisms of Class II transposons per your suggestion.

      p6

      • "beginning roughly concurrently", not clear what "began".

      This is now corrected.

      • "control" rather than "protect against"?

      We’ve changed “protect against” to “counter”.

      • I believe RIP has only been observed (experimentally) in a handful of fungal species, all from the Ascomycota phylum.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      • "RID1 contains two DNA_methylase domains", RID1 has one methyltransferase domain according to the reference Freitag et al, 2002.

      Thank you for drawing this to our attention. It is true RID1 has one methyltransferase region; however, the sequence deposited by Freitag et al, 2002 (AAM27408) is predicted by HMMer to have two adjacent Pfam DNA_methylase domains (i.e., PF00145). In this exploratory analysis, we tried to leverage this characteristic to identify candidate proteins of interest. We have reworded this section to clarify this.

      p8

      • Here and after I would use more informative titles for each paragraph.

      With the exception of the headings for Pfam, CAZy and MEROPs analyses, we believe the other headings are informative. We appreciate this comment, but opt to leave the heading titles as is.

      • I would suggest presenting the orthology analysis before the more in-depth protein family domain search.

      We leveraged the OG analysis mostly as a way to identify potentially unique genes in E. muscae, so we think the current order makes the most sense.

      p10

      • Figures 3F and G are confusing. The legend for Figure 3F mentions "OGs with >= 2 species" while the figure shows "multi-species OGs", and reads as redundant with the "species-specific" OGs. For the "OGs within species" do I understand it correctly that it represents the number of genes assigned to OGs for each species? If yes, the numbers are in contradiction with Figure 3G. And in Figure 3G shouldn't the sum of "genes assigned in OGs" and "genes not assigned in OGs" add up to 100? I'm probably missing something here, but I would clarify what the different sets of orthogroups are in the figure and in the text (perhaps adopting a pangenome-like nomenclature).

      Thanks for this comment. This legend, unfortunately, reflected an earlier version of the figure and was overlooked prior to submission. We have since amended this and sincerely apologize for the error on our part.

      p12

      • The whole first paragraph reads more like it should be part of an introduction/discussion.

      We’ve moved some of this paragraph to the discussion but left the background information necessary for the reader to understand why we were looking for homologs of wc and frq.

      p13

      • The last paragraph reads like discussion.

      We have revised this paragraph so it now reads: “Because E. muscae is an obligate insect-pathogen only living inside live flies, we investigate the presence of canonical entomopathogenic enzymes in the genome. We find that E. muscae appear to have an expanded group of acid-trehalases compared to other entomopathogenic and non-entomopathogenic Entomophthorales (Fig. 4A), which correlates with the primary sugar in insect blood (hemolymph) being trehalose (Thompson, 2003). The obligate insect-pathogenic lifestyle is also evident when comparing the repertoire of lipases, subtilisin-like serine proteases, trypsins, and chitinases in our focal species versus Zoopagomycota and Ascomycota fungi that are not obligate insect pathogens (Fig. 4B). Sordariomycetes within Ascomycota contains the other major transition to insect-pathogenicity within the kingdom Fungi (Araújo and Hughes, 2016). Based on our comparison of gene numbers, Entomophthorales possess more enzymes suitable for cuticle penetration than Sordariomycetes (Fig. 4B). In contrast, insect-pathogenic fungi within Hypocreales possess a more diverse secondary metabolite biosynthesis machinery as evidenced by the absence of polyketide synthase (PKS) and indole pathways in Entomophthorales (Fig. 4C).”

      p15 and 16

      • This all reads as redundant with the previous protein family domain analysis. I would try to merge them.

      Thank you for this comment, however we have opted to maintain the current structure.

      p18

      • In the first sentence, I'm not sure about what was performed here.

      This has been reworded to clarify.

      p20

      • Regarding the assembly, do I understand it correctly that a nuclear genome can be partially haploid / diploid?

      Thanks for your comment. The genome itself is, of course, some integer multiple of n, but based on BUSCO scores our assembly doesn’t appear to have completely collapsed into a haploid genome. We think it makes more sense here to say “partially haploid” than “partially diploid” so have altered this.

      p21

      • RIP has only been observed in a couple of Ascomycetes. RIP-like genomic signatures (GC bias) have been observed elsewhere.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      p23

      • Interesting that the peptidase A2B domain is found uniquely in E. muscae genome and is associated with Ty3 activity. Does the domain often overlap with annotated Ty3 in E. muscae genome? Or how come the domain is not present in other sister species with large genomes full of Ty3 transposons? Could it relate to a new active transposon in E. muscae specifically?

      Thanks for this comment. The domain-based analysis was only performed on the predicted transcriptome of the genome assembly, which does not include the repeat elements (e.g., Ty3). It could be that this peptidase reflects a new active transposon that’s specific to E. muscae, which would certainly be very interesting. We’ve now included this idea in the discussion.

      p26

      • In the case of fungal genomes, I would not advise masking the assembly for repeated sequences prior to gene annotation (in particular given the current focus on protein family variation).

      Thank you for this comment; however, we disagree with this assertion, as a typical approach for genome annotation in fungi and other eukaryotic genomes is to soft-mask transposable elements before performing gene prediction to avoid over-prediction. While there could be alternative approaches that compare masked and unmasked assemblies, this is a recommended protocol for underlying tools like Augustus (10.1002/cpbi.57) and in general descriptions of genome annotation (10.1002/0471250953.bi0401s52). In our experience, the false positive rate of genes predicted through TE regions is likely to be more of a problem than false negatives from missed genes. Further, it seems appropriate to use a consistent approach to annotation when including genomes from other sources (e.g., Joint Genome Institute annotated genomes), which also use repeat masking before annotation; consistent methods matter most when generating datasets for comparative analyses. It is outside the scope of this project to reannotate all genomes with and without repeat masking.

      p27

      • Interrupted sentence at "Classification of DNA and LTR .. by similarity The".

      This was an unnecessary partial phrase, as the information on classification of elements via RepBase was given a few sentences above it. The phrase has been removed.

      p28

      • Enriched/depleted rather than "significantly different"?

      Thank you for this comment, however we have opted to maintain the current phrasing.

    1. Author response:

      Reviewer #2 (Public Review):

      In this study, the authors report that both mice and human patients carrying function-disrupting mutations in the OFD1 gene exhibited ectopic brown adipose tissue formation in the malformed tongue. The OFD1 gene is located on the X-chromosome and encodes a protein product required for the formation and function of the primary cilium, which is required for cells to properly receive and activate several signaling pathways, particularly the hedgehog signaling pathway. Loss of OFD1 function causes prenatal lethality of male fetuses and mosaic disruption of tissues in females due to random inactivation of the X-chromosome carrying either the mutant or wildtype allele. Using cell type-specific gene inactivation and genetic lineage labeling, the manuscript shows that the ectopic brown adipose tissue in the mutant tongue was not derived from cranial neural crest cells (CNCCs). Additional genetic and embryological studies led to the conclusion that loss of Ofd1 function in the CNCC cells in the embryonic hypoglossal cord, via which the tongue myoblast precursor cells migrate from anterior somites to the tongue primordia, caused disruption of cell-cell interactions between the CNCCs and migrating muscle precursor cells, resulting in altered differentiation of those myoblast precursor cells into brown adipocytes. The authors provided data that disruption of Smo in a subset of CNCCs also resulted in ectopic adipose tissue formation in the tongue, indicating that this phenotype in the Ofd1 mutant mice was likely caused by disruption of hedgehog signaling in CNCCs. However, no experimental evidence is provided to support a major conclusion of the manuscript regarding altered differentiation of the tongue myoblast precursor cells into brown adipocytes in the Ofd1 mutant mice. 
Since it is well established that hedgehog signaling in the CNCCs is required for them to direct tongue myoblast cell migration as well as for tongue muscle differentiation/organization after the myoblasts arrived in the tongue primordia, the finding of tongue muscle defects in the Ofd1 mutant mice is not surprising. However, if proven true that disruption of Ofd1 function in CNCCs caused tongue myoblast precursor cells to alter their fate and differentiate into brown adipocytes, it would be an interesting new finding. Further identification of the signals produced by the Ofd1 mutant CNCCs for directing the cell fate switch will be a highly significant new advance in understanding the cellular and molecular mechanisms regulating tongue morphogenesis.

      We have added substantial new in vitro and in vivo data, which we hope are sufficient to support our conclusion. It is extremely difficult to identify the signals produced by the Ofd1 mutant CNCCs that direct the cell fate switch of mesodermal cells after activation of Hh signaling in CNCCs. Instead, our new findings raise the possibility that Hh signaling in mesodermal cells, as well as in CNCCs, is important for their differentiation; this has been added to the revised paper. However, we think that pursuing these questions further is beyond the scope of this study.

      Reviewer #3 (Public Review):

      The authors observed phenotypes of ciliopathy model mice and they seem to coincide with those in human patients. They used mutants in which cilial function genes are deleted in cranial neural crest cells, and found the mutants exhibit abnormal cell differentiation in both neural crest- and mesoderm-lineage cells. The finding clearly shows the importance of tissue/cell interaction. The authors mainly observed the mouse in which Ofd1 gene that is coded on the X chromosome is deleted, therefore, Ofd1fl/WT;Wnt1Cre(HET) mice show that about one-fourth of neural crest cells can exhibit Ofd1 function whereas Ofd1fl;Wnt1Cre (HM) shows null Ofd1 function and show severer phenotypes than HET.

      Ectopic brown adipose tissue in the tongue is derived from mesoderm, and the authors tried to show that the hypoglossal cord failed to obtain myogenic lineage after entering branchial arches in HET and HM due to lack of communication with neural crest cells. For ectopic bone formation, they found that it is due to the lack of Hedgehog signaling in neural crest cells, which was consistent with the reports in the Smofl/fl;Wnt1-Cre (Xu et al., 2019) and Ift88fl/fl;Wnt1Cre (Kitamura et al. 2020). The ectopic bone is connected to the original mandibular bone. The authors attribute the ectopic bone formation to the migration of mandibular bone neural crest cells into the tongue-forming area.

      For the poor tongue frenum formation, the authors found the importance of cell migration from the lateral sides of the branchial arch to the midline and its formation relies on non-canonical Wnt signaling. The authors observed similar phenotypes in the human patients as those in the mutants. The adipose tissue in the tongue area is normally found in the salivary gland region and intermuscular space, and it is intriguing to find the brown adipose tissue anterior to the cervical area in which the most anterior brown adipose tissue develops. qRT-PCR indicates that some of the marker genes are expressed in the laser micro-dissected sections of the ectopic brown adipose tissue. However, histology does not show the typical brown adipose tissue feature. In addition, brown adipose tissue is normally recognized in the sixth pharyngeal region as the cervical brown tissue from around E14.5 (Schulz and Tseng 2013), not E12 as the authors observe. Although the mutants develop under abnormal conditions, is it possible to say they are brown adipose tissue? The point has to be further investigated with more marker expression by immunohistochemical detection and other methods. Since the mutants seem to show impaired midline formation (which is consistent with the condition of human ciliopathy), is it possible to hypothesize that the adipose-like tissue is derived from the mesoderm of posterior branchial arch levels if the tissue is brown adipose tissue?

      Immunohistochemistry data have been added as new Figures S4 and S5.

      We agree with the reviewer’s comment. The histology of the ectopic adipose tissue in Ofd1 cKO mice differs slightly from typical images of brown adipose tissue. The molecular characteristics of the ectopic adipose tissue in the Ofd1 mutant tongue are similar to those of low-thermogenic adipocytes, whose histological features are known to differ from those of typical brown adipose tissue and resemble those of the ectopic adipose tissue in Ofd1 mutant mice. This has been mentioned in the Results section.

      If the ectopic adipose tissue in the mutant tongue were derived from the cervical brown adipose tissue through mis-migration, the cervical brown adipose tissue in Ofd1 mutants should be shrunken or connected to the ectopic adipose tissue in the mutant tongue. However, we could not detect any significant changes in the cervical brown adipose tissue, or any connection between the cervical brown adipose tissue and the tongue adipose tissue, in Ofd1 mutant mice. We therefore think that the ectopic adipose tissue in the mutant tongue is unlikely to be derived from cervical brown adipose tissue. These points have been added to the Results section.

      Cranial neural crest cells start migrating around E8.0 and reach their destination by E9.5. The authors show the lack of neural crest cells in the midline, the fluorescence is absent from the midline in HM, however, they studied it in the E11 mandible (Fig. 4E), almost more than two days after neural crest migration completes. Since the mandibular arch seems to form at the beginning in the mutants, is there a failure in allocating the neural crest and mesoderm at the beginning of the mandibular arch formation?

      It is difficult to determine how much migration is affected in the mutant mice. Therefore, the sentence describing migration has been deleted from the revised paper.

      The authors tried to disturb the interaction between the hypoglossal cord and neural crest cells by making incisions in the dorsal area of the branchial arches. That area contains both neural crest and mesoderm but not the hypoglossal cord-derived mesoderm. The hypoglossal cord passed through the posterior edge of the caudal (6th) pharyngeal arch, along the lateral side of the pericardium towards the anterior, ventral to branchial arches, and then inside the 2nd and 1st branchial arches (Adachi et al., 2018). It expresses Pax3 before entering the branchial arches, then Myf5 in the branchial arches. It seems that the migration of the hypoglossal cord does not require interaction with neural crest cells but it has to be confirmed as well as neural crest migration into the branchial arches from the beginning. Although the hypoglossal cord migrates mostly in mesoderm-derived mesenchyme, we cannot exclude the possibility that hypoglossal cord migration is affected.

      The cutting region in the original Figure 2Q was not accurate; it has been corrected in the new Figure 3Q. We agree with the reviewer’s comment that “we cannot exclude the possibility that hypoglossal cord migration is affected”. However, it is difficult to determine how much migration is affected in the mutant mice. Therefore, the sentence describing migration has been deleted from the revised paper.

      The lack of Myf5 expression in Ofd1fl;Wnt1Cre (HM) was explained as a failure in the differentiation of the hypoglossal cord into myoblasts on entrance into the branchial arches. Most of the cervical brown adipose tissue is derived from either Myf5- or Pax3- expressing lineage (Sanchez-Gurmaches and Guertin, 2014). Although the authors suggest that brown adipose cells are fate-changed mesoderm in the branchial arches, how do they explain the association with Myf5- or Pax3- expression?

      As the reviewer mentioned, the cervical brown adipose tissue is derived from either a Myf5- or a Pax3-expressing lineage. However, these cells lose Myf5 or Pax3 expression when they differentiate into brown adipocytes. Although the ectopic adipose tissue in the Ofd1 mutant tongue showed Pax3 expression at an early stage, the cells likely lose Pax3 expression soon after. Alternatively, the ectopic adipocytes may retain Pax3 expression if they are abnormal adipocytes; in that case, it would not be surprising for the expression pattern of ectopic adipocytes in Ofd1 mutants to differ from that of normal brown adipose tissue. Because several scenarios remain possible, we do not address this point in the text.

      In addition, the cervical brown tissue is supposed to be derived from the branchial arch mesoderm (Mo et al., 2017). Is the formation of the cervical brown tissue affected in the Ofd1fl/WT;Wnt1Cre(HET) or Ofd1fl;Wnt1Cre (HM) if dysfunction of neural crest cells results in the cell fate change of mesoderm?

      We could not detect any significant morphological changes in the cervical brown adipose tissue of Ofd1 mutant mice. Ectopic adipose tissue in Ofd1 cKO mice was found from E11.5, whereas cervical adipose tissue forms from E14.5. We think that dysfunction of CNCCs at E14.5 does not affect the mesodermal cells that form the cervical adipose tissue.

      For the tongue frenum development, it is hard to understand the hypothesis that its formation is unlikely to be associated with midline formation. Although Lgr5 and Tbx22 are not expressed in the midline, the defect in midline formation could cause unnecessary interaction between the right and left tissues.

      We agree with the reviewer’s comment. The sentences have been changed in the new manuscript.

      Tissue morphogenesis takes place in three dimensions, which were not considered in the data, especially in the labeling experiments. When the authors labelled the cells, which cells in which area were labelled? In the textbook, tongue formation is a result of the fusion of the midline processes derived from the branchial arches, therefore, it is important to identify which cells in which area are labelled.

      Lgr5 and Tbx22 in situ hybridization data have been added as new Figures 10-S1D and -S1E, since we labelled cells within the Lgr5 and Tbx22 expression domains. Sections of explants with DiD injection, before and after culture, have been added as new Figures 10-S1F and -S1G; these show that DiD-labelled cells were located within the Lgr5 and Tbx22 expression domains before culture and in the tongue frenum region after culture.

      The weakest point is that the authors demonstrate many interesting phenotypes but fail to show the mechanism of altered cell differentiation and direct evidence of the tissue origin of ectopic brown tissue. Without the data, suggestion from the authors' argument is weak, which is reflected in the conclusion of the abstract.

      We have added substantial new in vitro and in vivo data, which we hope are sufficient to support our conclusion.

    1. Author response:

      Reviewer #2 (Public Review):

      (1) Some changes to statistical analyses are needed in this study.

      Fig. 1B, 1D, 2A, 3E, and 3F report the QL.d phenotype as a percentage of animals scored that were defective in migration. The methods make it clear this data is categorical rather than quantitative. Therefore, a t-test or any test designed for quantitative data is not appropriate. I suggest that the authors should investigate using a chi-squared or Fisher's exact test.

      For the reasons mentioned above, the calculation of standard deviation (as shown in error bars) is also not appropriate for Fig. 1B, 1D, 2A, 3E, and 3F. Of course, it is excellent that the authors scored multiple trials. For experiments with mutants, I suggest the authors might combine these trials or show separate results of each trial. For experiments using RNAi (Fig. 1B), each trial should be plotted separately because RNAi effectiveness can vary. If there is not enough space to show multiple trials, then I would ask that a representative trial be shown in the main figure and additional trials in a supplement.

      We thank the reviewer for pointing out the statistical mistake. For all figures assessing the QL.d migration phenotype (Fig. 1B, 1D, 2A, 4A (former 3E), 4D (former 3F) and Fig. 1 – figure supplement 1, Fig. 2 – figure supplement 1, Fig. 4 – figure supplement 2), statistical significance was evaluated using Fisher’s exact test. For RNAi experiments (Fig. 1B), results from a representative experiment are shown, and two additional trials are shown in Figure 1 – figure supplement 1. For experiments with mutants, results from separate trials were pooled and are presented in the main figures.
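      As an illustration of why Fisher’s exact test suits this kind of categorical scoring, the sketch below computes a two-sided p-value for a 2x2 table of defective versus normal animals in two strains. It is a minimal standard-library implementation written for this note, not the authors’ analysis code, and the animal counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. control, mutant); columns are outcomes
    (e.g. defective, normal). Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical scoring: 2 of 100 control animals and 45 of 100 mutant
# animals show the migration defect.
p = fisher_exact_two_sided(2, 98, 45, 55)
```

      The test works directly on counts, so no standard deviation or assumption of normally distributed percentages is needed.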

      In Fig. 1, 2, 3, and 5, it is not specified whether/how p-values were adjusted for multiple tests.

      We have applied Bonferroni correction for multiple testing in all Figures where it was relevant (Fig. 1, 2, 4, 5 and 6 and in their supplements) and this is now stated in all corresponding Figure legends.
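      The Bonferroni correction mentioned above multiplies each p-value by the number of comparisons and caps the result at 1. A minimal sketch follows; the function name and the p-values are hypothetical and not taken from the figures.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values, capped at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
```

      An adjusted value is then compared against the usual significance threshold, which is equivalent to comparing each raw p-value against the threshold divided by the number of tests.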

      (2) I felt the authors' interpretation of the sel-5 mutant phenotypes in EXC, and the genetic interactions with Wnt signaling mutants, might be improved. The authors show convincing data that the sel-5 mutants display a shortened EXC outgrowth phenotype. Conversely, mutants with reduced Wnt signaling, such as the lin-17 or lin-44 mutants, displayed lengthened EXC outgrowth. The authors show that in double mutants, loss of sel-5 partially suppressed the EXC overgrowth defects of lin-17 or lin-44 mutants (Fig. 5). In my opinion, this data is consistent with a model where SEL-5 acts to inhibit Wnt signaling in EXC. An inhibitory role in a Wnt-receiving cell would be consistent with the known activity for human AAK1 in promoting negative feedback and endocytosis of LRP6. Interestingly, the authors mention in their discussion that a mutant of plr-1, which acts in the internalization of Frizzled receptors, has a shortened EXC phenotype similar to that of sel-5 mutants. These observations all seem consistent with an inhibitory role, yet the authors do not state this as their conclusion. A clarification of their interpretation is needed.

      We thank the reviewer for this feedback. Indeed, the above interpretation of the excretory cell migration data is plausible; however, we think that several lines of evidence argue against this possibility. First, measurements of the posterior canal length during L1/L2 larval stages show that LIN-44/LIN-17 signalling is not required for the early stages of excretory canal outgrowth, unlike SEL-5/VPS-29 (Fig. 5E, 6D). This suggests that SEL-5 and VPS-29 are required earlier than LIN-44 and LIN-17 and therefore should not act at the level of Wnt receptor internalization. Second, our new data with more mutant combinations revealed canal shortening in cwn-1; cfz-2 and cwn-2; cfz-2 mutants. This would rather suggest a positive role for SEL-5 and VPS-29 in Wnt pathway regulation. Either SEL-5/VPS-29 employ two different mechanisms of Wnt pathway regulation or, alternatively, they act prior to any Wnt-dependent step in the excretory canal outgrowth. The observed partial rescue of the lin-17 or lin-44 overgrowth defect by sel-5 could then be explained, for example, by a reduced speed of canal outgrowth in sel-5 mutants. Based on the new findings about CWN-1, CWN-2 and CFZ-2 involvement, we have also modified our model, now presented in Fig.7.

      For changes to the Results section, see Response to Reviewer 1, point 4b. The Discussion part has been substantially rewritten and is presented below:

      LINE 428 “Our analysis of single Wnt and Frizzled mutants revealed that while loss of cwn-2 or cfz-2 expression resulted in a very mild shortening of the excretory canal, loss of lin-44 or lin-17 led to profound canal overgrowth (summarized in Fig. 7A). These findings suggested that two independent Wnt pathways could be employed to establish proper excretory canal length – one promoting canal extension and one generating the stop signal for growth termination. Further analyses of double mutants and other Wnt signalling components revealed that the extension-promoting pathway includes cwn-1 in addition to cwn-2 and cfz-2, while the stop-signal pathway encompasses lin-44, lin-17, dsh-1, mig-5 and mig-14. A similar repulsive role of LIN-44/LIN-17 complex has been described in the case of a posterior axon of C. elegans GABAergic DD6 motor neuron (Maro et al., 2009) or PLM, ALN and PLN neurons (Zheng et al., 2015). Loss of lin-44 or lin-17 expression promoted outgrowth of the posterior neurites of these neurons implicating that in wild type animals, LIN-44 serves as a repulsive cue. On the other hand, cwn-2 and cfz-2 were shown to positively regulate the posterior neurite outgrowth of RMED/V neurons with cwn-2 acting as an attractive cue (Song et al., 2010). The role of two other Wnt signalling components, egl-20 and mig-1, is less clear. No effect (mig-1) or only very mild overgrowth defect (egl-20) is observed in single mutants. However, both egl-20 and mig-1 significantly rescue the overgrowth phenotype of lin-17 mutants, while at the same time, mig-1 can suppress the shortening of canals in cfz-2 mutants. EGL-20-producing cells are localized around the rectum (Whangbo et al., 1999; Harterink et al., 2011), exactly where the excretory canals stop, while LIN-44 is expressed more posteriorly (Herman et al., 1995; Harterink et al., 2011). 
A possible explanation could thus be that while LIN-44 provides a general posterior repulsive signal, EGL-20 fine-tunes the exact stopping position of the growing canal. The role of different Wnts and Frizzleds in excretory canal outgrowth is summarized in Fig. 7B. Further investigation will be required to decipher the exact way how SEL-5 and the retromer crosstalk with Wnt signalling during excretory cell outgrowth. It is clear though that more than one mechanism is likely involved. First, sel-5 vps-29 mutants display canal shortening similarly to cwn-1; cfz-2 or cwn-2; cfz-2 suggesting a positive regulatory role. Mutants in lin-17 and lin-44 display canal overgrowth, yet sel-5 is partially able to suppress this phenotype. This would imply a negative regulatory role of sel-5 and be in agreement with the role of AAK1 in Wnt pathway regulation (Agajanian et al., 2019). However, sel-5 and vps-29 are required already during the initial larval outgrowth while the LIN-44/LIN-17 signal is required later. The observed rescue might thus also be explained by a delayed growth of the canal and not by a direct impact of sel-5 and vps-29 on LIN-44 or LIN-17 levels or localization.”

    1. Author response:

      Reviewer #2 (Public Review):

      The manuscript entitled "Multimodal HLA-I genotypes regulation by human cytomegalovirus US10 and resulting surface patterning" by Gerke et al describes the biochemical analysis of US10-mediated down regulation of HLA-I molecules. The authors systematically examine the surface expression of different HLA-I alleles in cells expressing US10 and interactions of US10 with HLA-I and antigen presentation machinery. Further, studies examining genotypic and allotypic differences during expression of US10/US11 transcripts suggest a different allelic class I downregulation. In general, the authors have included data supporting the major claims. Yet, the conclusions and findings of the study only marginally advance the overall understanding of HCMV viral evasion and the mechanism of US10 function.

      Strengths:

      The studies are well characterized and the studies utilize diverse HLA-I and HCMV viral molecules. The biochemistry is excellent and is of high quality. Importantly, the study describes HLA-I allelic specific HCMV down regulation at the cell surface and molecular levels.

      Weaknesses:

      (1) The authors use over expressive language such as "strong binding" that does not have a quantitative value and it is relative to the specific assay with only small differences among the factors.

      We have changed the language to avoid non-quantitative expressions.

      (2) The US10 binding to the HLA-I did not correlate with class I surface levels suggesting that binding to the APC machinery (Figure 1); hence, why does the binding of US10 to the APC define its mechanism of action.

      We hypothesized that, since binding to HLA-I allomorphs did not correlate with surface expression, further factors could be involved in regulation. Since the PLC (APC machinery) plays a major role in HLA-I expression, it was relevant to investigate this. The new data underline the importance of the PLC for US10-mediated HLA-I regulation.

      (3) The innovative and significant aspects of the study are limited. The study does not delineate the US10 mechanism of action or show data in which US10-mediated MHC class I down regulation impacts adaptive or innate immune function.

      These remarks are important. We want to emphasize the variable impact of US10 on HLA-I. To our knowledge previous studies have not uncovered genotype-dependent effects on HLA-I as distinct as those observed with US10, indicating that US10 may exploit aspects of HLA-I that are yet to be fully elucidated. Therefore, confirming these findings is crucial for our study. The quantitative analysis of the HeLa HLA-I ligandome in US10-expressing cells strongly supports this conclusion. The precise quantification of HLA-I peptide ligands was made possible through collaboration with Dr. Andreas Schlosser from Würzburg, Germany, who possesses profound expertise in this specific method. Thus, in our opinion, this revision has enabled us to advance innovation and, importantly, enhance the significance of our study.

    1. Author response:

      Reviewer #3 (Public Review):

      Software UX design is not a trivial task and a point-and-click interface may become difficult to use or misleading when such design is not very well crafted. While Phantasus is a laudable effort to bring some of the out-of-the box transcriptomics workflows closer to the broader community of point-and-click users, there are a number of shortcomings that the authors may want to consider improving.

      Thank you for such an in-depth review. We really appreciate this feedback and have tried to address all of the concerns in the new version of Phantasus.

      Here I list the ones I found running Phantasus locally through the available Bioconductor package:

      (1) The feature of loading in one click one of the thousands of available GEO datasets is great. However, one important use of any such interfaces is the possibility for the users to analyze his/her own data. One of the standard formats for storing tables of RNA-seq counts are CSV files. However, if we try to upload from the computer a CSV file with expression data, such as the counts stored in the file GSE120660_PCamerge_hg38.csv.gz from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120660, a first problem is that the system does not recognize that the CSV file is compressed. A second problem is that it does not recognize that values are separated by commas, the very original CSV format, giving a cryptic error "columnVector is undefined". If we transform the CSV format into tab-separated values (TSV) format, then it works, but this constitutes already a first barrier for the target user of Phantasus.

      Thank you for highlighting this issue of file format support. We acknowledge how common CSV and CSV.gz files are in gene expression analysis. In response, we have updated our data loading procedure to support these file formats. Moreover, the most recent version of our web application is able to recognize gzip-compressed files in any of the supported table formats: GCT, TSV, CSV and XLSX.
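
      The kind of format detection described in this response can be sketched as below. This is a minimal illustration, not the Phantasus implementation: the function name and the delimiter heuristic (compare tab and comma counts in the header line) are assumptions made for the example, and only the plain-text CSV and TSV cases are covered.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_expression_table(raw: bytes):
    """Parse a comma- or tab-separated table that may be gzip-compressed."""
    # Detect compression from the content itself, not from the file name.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    lines = text.splitlines()
    header = lines[0] if lines else ""
    # Assumed heuristic: whichever separator is more frequent in the header wins.
    delimiter = "\t" if header.count("\t") > header.count(",") else ","
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same small table is read identically in all four variants.
csv_bytes = b"gene,s1,s2\nENSG1,10,0\nENSG2,3,7\n"
tsv_bytes = csv_bytes.replace(b",", b"\t")
rows = read_expression_table(gzip.compress(csv_bytes))
```

      Checking the magic bytes rather than the `.gz` extension means a renamed or extension-less upload is still handled correctly.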

      (2) Many RNA-seq processing pipelines use Ensembl annotations, which for the purpose of downstream interpretation of the analysis, need to be translated into HUGO gene symbols. When I try to annotate the rows to translate the Ensembl gene identifiers, I get the error

      "There is no AnnotationDB on server. Ask administrator to put AnnotationDB sqlite databases in cacheDir/annotationdb folder"

      Thank you for revealing this issue. Indeed, locally installed instances of Phantasus might lose some functionality in the absence of some auxiliary files. For example, gene annotation mapping is unavailable without annotation databases. Previously, the user had to perform additional setup steps to unlock a few features, which might be confusing and unclear. To overcome this, we have significantly revised the installation procedure. The newly added ‘setupPhantasus’ function is able to create all necessary configuration files and provides an interactive dialog with the user that helps to load all necessary data files from our official cache mirror (https://alserglab.wsutl.edu/files/phantasus/minimal-cache/). Docker-based installation follows the same approach; however, it is configured to install everything by default. Thus, with the help of the new installation procedure, a locally installed Phantasus now has the whole functionality available at the official mirrors. The comprehensive installation description is now available at https://ctlab.github.io/phantasus-doc/installation.

      (3) When trying to normalize the RNA-seq counts, there are no standard options such as within-library (RPKM, FPKM) or between-library (TMM) normalization procedures.

      Appreciating your feedback, we've expanded the available normalization options in the updated version of Phantasus. We added support for TMM normalization as suggested by the edgeR package and voom normalization from the limma package. However, certain strategies like RPKM/FPKM or TPM rely on gene-specific effective lengths, which are challenging to infer without protocol and alignment details. As Phantasus operates on gene expression matrices and doesn't execute alignment steps, the implementation of these normalizations seems infeasible. On the other hand, if the user has a matrix with FPKM or TPM gene values (for example from a core facility), such a matrix can be loaded into Phantasus and used for the analysis.
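
      The distinction drawn in this response can be made concrete with a small sketch (illustrative only, with made-up counts and gene lengths): a library-size normalization such as counts per million needs nothing beyond the count matrix, whereas FPKM additionally needs a length for every gene. Nominal lengths in base pairs stand in here for the effective lengths mentioned above.

```python
def cpm(counts):
    """Counts per million for one sample; only the counts are required."""
    library_size = sum(counts)
    return [c * 1e6 / library_size for c in counts]


def fpkm(counts, lengths_bp):
    """FPKM for one sample; a per-gene length (in bp) is also required."""
    library_size = sum(counts)
    # count / (length in kb * library size in millions)
    return [c * 1e9 / (library_size * length)
            for c, length in zip(counts, lengths_bp)]


counts = [100, 300, 600]          # hypothetical counts for three genes
lengths_bp = [1000, 2000, 500]    # hypothetical gene lengths

cpm_values = cpm(counts)
fpkm_values = fpkm(counts, lengths_bp)
```

      Without the `lengths_bp` argument the second function cannot be evaluated, which is the practical obstacle for a tool that only receives a gene-by-sample matrix.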

      If I take log2(1+x) a new tab is created with the normalized data, but it's not easy to realize what happened because the tab has the same name as the previous one and while the colors of the heatmap changed to reflect the new scale of the data, this is quite subtle. This may cause an inexperienced user to apply the same normalization step again on the normalized data. Ideally, the interface should lead the user through a pipeline, reducing unnecessary degrees of freedom associated with each step.

      Thank you for your comment. Indeed, our approach of creating a new tab for each alteration to the expression values while preserving the name might be a source of confusion for a user. On the other hand, generating informative tab names without overwhelming users with too much detail is also challenging. As a compromise, we have an option for the user to manually rename the tab. Still, we agree that this remains an area for improvement. We also consider it to be part of a larger issue: for example, the loaded data can already be log-scaled, so that even one round of log-scale transformation in Phantasus would be incorrect. Accordingly, we are exploring ways to address this issue in the future by adding automated checks for the tools or, as you suggested, implementing stricter pipelines.
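
      One possible shape for such an automated check is sketched below. It is a hypothetical heuristic, not something Phantasus is stated to implement: the function names and the threshold of 50 are assumptions, and a dataset of very low raw counts would be misclassified.

```python
import math


def log2_plus_one(values):
    """Apply the log2(1 + x) transformation discussed above."""
    return [math.log2(1 + v) for v in values]


def looks_log_scaled(values, max_threshold=50.0):
    """Guess whether expression values are already on a log scale.

    Assumed heuristic: raw counts or intensities usually contain values far
    above 50, while log2-scaled values rarely do; negative values also point
    to transformed data.
    """
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return False
    return min(finite) < 0 or max(finite) < max_threshold


raw_values = [0, 15, 20000]                # hypothetical raw counts
once = log2_plus_one(raw_values)
warn_before_second_log = looks_log_scaled(once)
```

      A check of this kind could trigger a warning before a second transformation is applied, leaving the final decision to the user.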

      (4) Phantasus allows one to filter out lowly-expressed genes by averaging expression of genes across samples and discarding/selecting genes using some cutoff value on that average. This strategy is fine, but to make an informed decision on that cutoff it would be useful to see a density plot of those averages that would allow one to identify the modes of low and high expression and decide the cutoff value that separates them.

      Thank you for the suggestion. Indeed a density plot might help users to make informed decisions during gene filtration. We have added such a plot into the ‘Plot/Chart’ tool as a ‘histogram’ chart type.

      It would be also nice to have an interface to the filterByExpr() function from the edgeR package, which provides more control on how to filter out lowly-expressed genes.

      Thank you for proposing the inclusion of an interface for the filterByExpr() function from the edgeR package. In the recent update we have incorporated filterByExpr() as part of the voom normalization tool. For now, for simplicity, we have decided to keep only the default parameter values. However, we will explore the addition of the dedicated filtering tool in the future.

      (5) When attempting a differential expression (DE) analysis, a popup window appears saying:

      "Your dataset is filtered. Limma will apply to unfiltered dataset. Consider using New Heat Map tool."

      One of the main purposes of filtering lowly-expressed genes is mainly to conduct a DE analysis afterwards, so it does not make sense that the tool says that such an analysis will be done on the unfiltered dataset. The reference to the "New Heat Map tool" is vague and unclear where should the user look for that other tool, without any further information or link.

      Thank you for highlighting this issue. We agree that the message in the popup window and the default action were confusing. In response to your feedback, we've updated the default behavior of our DE tools to automatically use the filtered data in a new tab. Additionally, we've clarified the warning message to ensure a better understanding of this process.

      (6) The DE analysis only allows for a two-sample group comparison, which is an important limitation in the question we may want to address. The construction of more complex designs could be graphically aided by using the ExploreModelMatrix Bioconductor package (Soneson et al, F1000Research, 2020).

      Indeed, the ability to create complex designs and various comparisons is important for many applications for gene expression analysis. Accordingly, in the latest Phantasus version, we've introduced an advanced design feature for the DE analysis, enabling the utilization of multiple column annotations for the design matrix. Combined with the existing ability to create new annotations, this update facilitates the setup of diverse design matrices. While at the moment we do not allow setting a complex contrast, we hope that the current interface will cover most of the differential expression use cases.

      (7) When trying to perform a pathway analysis with FGSEA, I get the following error:

      "Couldn't load FGSEA meta information. Please try again in a moment. Error: cannot open the connection In call: file(file, "rt")"

      We expect this issue to be resolved now that we have implemented a more streamlined setup process. Among other things, the new approach aims to eliminate the unexpected absence of metafiles in local installations. The latest Phantasus package version explicitly prompts the user to load the necessary additional files automatically during the initial run, reducing the chances of an invalid setup.

      Finally, there have been already some efforts to approach R and Bioconductor transcriptomics pipelines to point-and-click users, such as iSEE (Rue-Albrecht et al, 2018) and GeneTonic (Marini et al, 2021) but they are not compared or at least cited in the present work.

      Indeed, our comparison was focused on tools that offer non-programmatic functionalities for gene expression data analysis. While tools like iSEE and GeneTonic are adept at visualizing data and offer extensive capabilities, they do necessitate additional data preparation using R, which distinguishes them from the specific scope of tools we assessed.

      One nice feature of these two tools that I missed in Phantasus is the possibility of generating the R code that produces the analysis performed through the interface. This is important to provide a way to ensure the reproducibility of the analyses performed.

      The ability to generate R code within tools like these indeed aids in ensuring analysis reproducibility. We have previously attempted to implement this functionality in Phantasus; however, it proved hard to do in a useful fashion due to potentially complex interactions between the user and the client-side part of Phantasus. Nevertheless, we acknowledge the significance of such a feature and aim to introduce it in the future.

    1. Author Response:

      We thank the reviewers for careful reading, acknowledging the strength of our manuscript, and pointing out its weakness, which we will address in the revised version as described below.

      (1) We will supplement our analysis with finer statistical testing and analysis, such as cross-validation and a more detailed analysis of the relation between the inferred model and the intrinsic timescales of the system. For the effect of the drug TIMP-1 on the animal, we will first explore the possibility of assessing the results using a multifactor ANOVA test, with the caveat that the distribution of interactions is not Gaussian. We will further test the effect of different group sizes on the significance of our results by considering subgroups of animals in the drug group, and compare the statistics between the (subsampled) drug group and the control group.

      (2) Our manuscript is similar to that of Shemesh et al. in that we both analyze socially interacting mice by constructing maximum entropy models (MEM) of the co-localization patterns of mice. The difference is in the setup and the number of mice (4 mice in Shemesh et al., 10-15 in our work), as we outlined in the manuscript. To further supplement our current argument about the difference of our results in the Discussion section, we will learn a MEM up to triplet interactions for our Eco-HAB mice data, and compare it to our current MEM up to pairwise interactions using test-set validation or the Bayesian information criterion (BIC).
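
      For readers unfamiliar with the criterion, the BIC comparison mentioned here can be sketched as follows. This is a generic illustration, not the authors' analysis: the log-likelihoods, parameter counts and sample size are made-up numbers, and counting one interaction term per pair or triplet of mice is a simplifying assumption about the model's parameterization.

```python
import math
from math import comb


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def n_interaction_terms(n_mice, order):
    """Number of distinct groups of `order` mice (pairs: 2, triplets: 3)."""
    return comb(n_mice, order)


n_mice = 15
pairs = n_interaction_terms(n_mice, 2)      # 105 possible pairs
triplets = n_interaction_terms(n_mice, 3)   # 455 possible triplets

# Made-up fit results, used only to show how the comparison works.
n_samples = 10_000
bic_pairwise = bic(log_likelihood=-52_000.0, n_params=n_mice + pairs,
                   n_samples=n_samples)
bic_triplet = bic(log_likelihood=-51_500.0, n_params=n_mice + pairs + triplets,
                  n_samples=n_samples)
prefer_triplet = bic_triplet < bic_pairwise
```

      Because the number of triplets grows much faster with group size than the number of pairs, the triplet model must improve the likelihood substantially before the BIC penalty is outweighed.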

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for a careful review of the manuscript and for their comments, which we address below.

      Reviewer #1:

      (1) …the authors could examine division in a population of cells with only one centrosome. Seeing some restoration of mitotic progression in the absence of SAC-dependent delays would suggest that even one centrosome with uninhibited Eg5 is sufficient to negate SAC-dependent delays, and would limit models for what exactly centrosomes contribute.

      We agree that the one-centrosome question (i.e. whether cells with a single centriole, and therefore a single centrosome, have the same SAC dependence) would be interesting to address. It is known that cells with a single centriole generated through centrinone treatment also have elongated mitoses, like cells lacking centrioles (see Chinen et al. 2021, compare Fig 2C to Fig 2D). We have tried this experiment in RPE-1 cells with preliminary results confirming that there is a mitotic delay. It is not known whether this delay requires SAC activity, and we hope to address that in future work. In addition, we note that we show in Fig. 4b-c that cells with the normal centrosome number but with a single focus of microtubules due to Eg5 inhibition were also sensitive to MPS1 inhibition. This suggests that centrosome presence alone cannot overcome the requirement for SAC activity; rather, the centrosomes need to be able to separate in a timely fashion.

      Reviewer #2:

      (1) An example is how to interpret the effect of Aurora B inhibition, which does not block acentrosomal cell division. If Aurora B is required for SAC activity, it suggests this effect of MPS1 may be a function other than SAC. Given the complexity of the SAC, it would be informative to test other SAC components. Instead, the authors conclude that the mitotic delay caused by MPS is required for acentrosomal cell division. I don't think they have ruled out, or even addressed other functions of MPS1.

      We agree that it is possible that functions of the MPS1 kinase other than those involved in the SAC could be important. Although we have not directly tested other SAC components, we did “mimic” SAC activity by delaying anaphase onset using APC/C inhibition while also inhibiting MPS1 (Fig. 2b-b’’). The fact that this restored division suggests that it is the SAC function of MPS1 kinase activity that is relevant to this delay. 

      (2) The authors find that when both the APC and MPS1 are inhibited, the cells eventually divide. These results are intriguing, but hard to interpret. The authors suggest that the failure to divide in MPS1-inhibited cells is because they enter anaphase, and then must back out. This is hard to understand and there is not data supporting some kind of aborted anaphase. Is the division observed with double inhibition some sort of bypass of the block caused by MPS1 inhibition alone? It is not clear why inhibition of APC causes increased cell division when MPS1 is inhibited.

      As described in the response to 1), we believe that reinstating the delay to anaphase onset by APC/C inhibition provided the time needed to establish a functional bipolar spindle even in the absence of the SAC, and that cells eventually overcome the proTAME block and proceed through mitosis, as observed in control cells in our experiments. We note that we chose concentrations of proTAME specifically for each cell line (RPE-1 and U2OS) that would result only in a temporary block, following on the work of Lara-Gonzalez and Taylor (2012), who reported similar findings for HeLa cells.

      (3) The authors characterize MTOC formation in these cells, which is also interesting. MTOCs are established after NEB in acentrosomal cells. Indeed, forming these MTOCs is probably a key mechanism for how these cells complete a division, like mouse oocytes.

      We agree that the observed intermediates of MTOCs are interesting and likely crucial to the mechanism of cell division in acentrosomal somatic cells. We are investigating further the differences and similarities between somatic cell MTOC formation in the absence of centrosomes and the naturally-occurring form of that process in oocytes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for providing feedback and suggestions for our manuscript.

      In response to reviewers comments we changed several main Figures and added new tables and supplementary figures. We also made edits to the Discussion.

      Reviewer #1 (Public Review):

      Weaknesses:

      Limited data is shown on the let-7afdLOF mice. Does this mouse respond to nCB similarly to the let-7bc2LOF mouse?

      In the revised manuscript, we have added a baseline lung phenotypic assessment for the let-7afdLOF mice up to 6 months of age within Figure 4-figure supplement 1. The data supports our original statement and observation that let-7afdLOF mice do not exhibit lung pathology, inflammation, or changes in T cell subsets at baseline. Our view is that the current manuscript addresses the importance of the let-7bc2-cluster in experimental emphysema, and that the let-7afd-cluster mice are used to validate Rorc as a direct target of let-7. In the future, new grant funding will make it possible to ascertain whether absence of the let-7afd-cluster also sensitizes mice to experimentally induced emphysema.

      Because the authors validate their findings from a previously published RNA-seq dataset in subjects with and without emphysema, the authors should include patient demographics from the data presented in Figure 1C-D.

      We thank the reviewers for their recommendation. To address this, the revised manuscript contains a new Supplementary Table 1 with the human subject demographic information that corresponds to Figure 1D.

      To validate their mouse models, the absence of Let-7 or enhanced Let-7 expression needs to be shown in isolated T cells from exposed mice.

      In the case of let-7bc2-cluster, we have included Figure 2-figure supplement 2 which shows pri-let7bc2 expression assessed by qPCR from selected CD8+ lung T cells of control and let-7bc2LOF mice exposed to PBS vehicle or nCB. The let-7g GOF model used in our studies has been validated for the induction of let-7g in thymic and peripheral T cells and elicitation of gain-of-function phenotypes (Pobezinskaya et al. 2019; Angelou et al. 2020; Wells et al. 2023).

      In Figure 3, the authors are missing the unexposed let-7bc2LOF group from all panels.

      We emphasize that our exhaustive characterization of control and let-7bc2LOF mice in absence of challenge showed no phenotype. The baseline data was collectively shown in Figure 2-figure supplement 1.

      Why did the authors choose to overexpress Let-7g? The rationale is not clear.

      We concur that ideal GOF experiments can be carried out with let-7b or let-7c. Unfortunately, let-7b/c2 transgenic mice are not currently available, so we elected to use the well characterized let-7g T cell GOF mouse model (Pobezinskaya et al. 2019; Angelou et al. 2020; Wells et al. 2023). Furthermore, it is worth noting that the binding/seed sequence of let-7g is identical to let-7a/b/c and other members. Nonetheless, we have edited our Discussion section to reflect this as a potential caveat that can confound the utilization of this let-7GOF mouse model.

      The purity of the CD4+ and CD8+ T cells is not shown and the full gating strategy should be included.

      In the revision, we included the flow gating strategy and display the representative population with purities in Supplementary Figure 1 of the revised manuscript.

      Reviewer #2 (Public Review):

      Weaknesses:

      The functional analyses are unusually focused on IL-17 producing CD8 T cells, but it is not made clear whether these cells are an important player in emphysema pathogenesis in the nCB and CS models. The data shown reveal that they are far less numerous than IL-17-producing CD4 T cells. It is also notable that the Figure 1 expression data from human subjects used sorted CD4+ T cells. And as the author mentioned, prior work on let-7 showed that it regulated Th17 (CD4) responses.

      As we showed that the let-7bc2LOF had enhanced the Tc17 cell population without any significant impact on Th17 cells, we elected to focus our analysis on this population. Furthermore, the connection of let-7 with the generation of a Tc17 inflammatory response is a novel finding, which has so far remained unappreciated in the field and opens new lines of inquiry.

      Compared with Let7bc2 deletion, Let7afd deletion had a much larger effect on IL17 production by CD8 T cells in vitro, and it also had a larger effect on RORgt expression in untreated mice in vivo, especially in the lung. It would be valuable to more thoroughly characterize the let7afd mice. RORgt expression should be shown in the in vitro assays. In the results, the authors state that let7afdLOF mice "did not exhibit lung histopathology nor inflammatory changes" up to 6 months of age. Similarly, it is stated in the conclusion that "the let-7afdLOF mice ... did not exhibit changes in Tc17/Th17 subpopulations" in vivo. All these data should be shown, and if no baseline changes are apparent, then I also recommend challenging these mice with nCB and/or cigarette smoke.

      We concur that additional phenotypic characterization on the let-7afdLOF mice will contribute valuable information in the future. Reviewer 1 had a similar comment. As described above in response to Reviewer 1, we added comprehensive phenotypic analysis of let-7afdLOF mice within Figure 4-figure supplement 1 in the revised manuscript. The new data indicates that there is no overt lung pathology in the let-7afdLOF mice despite the subtle induction of RORγt expression in T cells. Furthermore, we have now included flow cytometric analysis of RORγt expression from in vitro polarized Tc0 and Tc17 cells from let-7afdLOF mice within revised Figure 5H.

      This brings up the larger issue of redundancy among the let-7 family members and genomic clusters. This should be discussed, including some explanation of the relative expression of each mature family member in T cells, and how that maps to the clusters studied here (and those that were not investigated). It would also be helpful to explain the relationship between mouse Let7bc2 and human Let7a3b, since Let7bc2 is the primary focus of emphysema experiments in this manuscript. This is especially important because the study of individual let-7 clusters is the core novelty of this body of work, as described in the first paragraph of the discussion. The regulation of let-7 expression has been reported before and its functional role has been investigated with a variety of tools.

      We appreciate the interest and suggestion to expand the discussion on the let-7 family and their expression regulation. To address these points, we included additional references and expanded the Discussion section of the revised manuscript.

      Let7g overexpression caused a marked reduction in Rorgt expression in T cells at baseline and in the setting of nCB challenge, and it reduced the frequency of IL17+ producing CD8 T cells in the lung to baseline levels. Yet there was no change in the MLI measurement of histopathology. Is this a robust result? The responses in the experiment shown in Fig. 6C-D are quite muted compared to those shown in Figure 2. The latter also shows a larger number of replicates, and it is unclear whether the data in 6D include measurement from all of the mice tested (e.g. pooled from 2 small experiments) or only mice from one experiment.

      We appreciate the reviewer's inquiry into the data presented in Figure 6C-D. The data is representative of a single experiment, and the number of experiments has been added to the revised Figure 6 legend. We note that all let-7GOF and associated control mice in Figure 6 are exposed to doxycycline as part of the let7g induction model, whereas mice in Figure 2 are not. It has been previously reported that doxycycline, a member of the tetracycline family of molecules, has anti-inflammatory properties (Di Caprio et al. 2015), which we speculate could account for the differences in the magnitude of emphysemic response.

      Reviewer #3 (Public Review):

      Weaknesses:

      The authors show no change in frequencies of Treg cells in let-7bc2LOF mice exposed to nCB. Do these Treg cells also express higher levels of RORgt and IL-17? The major question that was not addressed in this study is how let-7 expression is regulated in emphysema. The other recommendation is that the authors include the sequences of the let-7 mimic oligos used in the luciferase assay.

      We did not have the opportunity to address whether RORγt is in fact also upregulated in Treg cells. It remains unclear what upstream mechanisms drive the downregulation of the let-7 clusters in T cells with exposure to smoke/nCB. However, we agree that this is an important question, and we have therefore updated the Discussion section of the manuscript by including several citations that could explain how let-7 clusters become repressed in a coordinated fashion. Regarding the last point, the sequence of the duplex used in the luciferase assay corresponds to the canonical mature let-7b in NCBI and has been added to Supplementary Table 3.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that "Recent evidence suggests the let-7 family is downregulated in patients with COPD, however, how they cause emphysema remains unclear." This should be reworded. Its downregulation in disease does not necessarily indicate that let-7 causes emphysema. Also, recommend rewording "Overall, our findings shed light on the let-7/RORγt axis as a braking and driving regulatory circuit in the generation of Tc17 cells..." What does it mean to be a "braking and driving" circuit? These terms seem contradictory.

      We recognize that the sentences were not phrased clearly. We have rephrased these statements as “Recent evidence suggests the let-7 miRNA family is downregulated in patients with COPD, however, whether this repression conveys a functional consequence in emphysema pathology has not been elucidated.” and “Overall, our findings shed light on the let-7/RORγt axis with let-7 acting as a molecular brake in the generation of Tc17 cells…”

      Experimental details are needed for the human miRNA expression studies. Too little information is provided in the methods section, and the article cited there (Yuan et al 2020) is not listed in the bibliography.

      We expanded the Materials and Methods section for the collection, isolation, and qPCR analysis of human subject lung T cells. We have corrected the bibliography and added the missing citation.

      The claim of novelty for miRNA-mediated silencing of Rorc in the discussion section is unnecessary and incorrect (https://pubmed.ncbi.nlm.nih.gov/23359619).

      Thank you for bringing the publication to our attention. Close inspection of this publication indicates that the authors did not experimentally validate Rorc as a direct target of let-7 itself. In addition, the work was limited to immortalized in vitro cell cultures. We amended the sentence in the Discussion section to highlight the novelty of our findings, which is the demonstration of Rorc as an in vivo target of let-7 in T cells.

      Citations

      Angelou, Constance C., Alexandria C. Wells, Jyothi Vijayaraghavan, Carey E. Dougan, Rebecca Lawlor, Elizabeth Iverson, Vanja Lazarevic, et al. 2020. “Differentiation of Pathogenic Th17 Cells Is Negatively Regulated by Let-7 MicroRNAs in a Mouse Model of Multiple Sclerosis.” Frontiers in Immunology 10: 3125. https://doi.org/10.3389/fimmu.2019.03125.

      Di Caprio, Roberta, Serena Lembo, Luisa Di Costanzo, Anna Balato, and Giuseppe Monfrecola. 2015. “Anti-Inflammatory Properties of Low and High Doxycycline Doses: An in Vitro Study.” Mediators of Inflammation 2015: 329418. https://doi.org/10.1155/2015/329418.

      Pobezinskaya, Elena L., Alexandria C. Wells, Constance C. Angelou, Eric Fagerberg, Esengul Aral, Elizabeth Iverson, Motoko Y. Kimura, and Leonid A. Pobezinsky. 2019. “Survival of Naïve T Cells Requires the Expression of Let-7 miRNAs.” Frontiers in Immunology 10 (May). https://doi.org/10.3389/fimmu.2019.00955.

      Wells, Alexandria C., Kaito A. Hioki, Constance C. Angelou, Adam C. Lynch, Xueting Liang, Daniel J. Ryan, Iris Thesmar, et al. 2023. “Let-7 Enhances Murine Anti-Tumor CD8 T Cell Responses by Promoting Memory and Antagonizing Terminal Differentiation.” Nature Communications 14 (1): 5585. https://doi.org/10.1038/s41467-023-40959-7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) The description of the wing phenotype that results from combinations of wingless and delex alleles at the bottom of page 4 (figure 1) is quite confusing. Are the trans-hets suppressed to wt or enhanced? The images in the Fig look enhanced.

      We thank the reviewer for this thoughtful observation regarding the wing phenotype description in combination with wg and dx alleles. We understand the confusion and appreciate the opportunity to clarify.

      In response to the concern raised, the trans-heterozygotes are indeed enhanced rather than suppressed to wild type. We acknowledge that the description could have been clearer. We have revised the relevant section to explicitly state that trans-heterozygotes exhibit an enhanced wing phenotype in the updated version of the manuscript.

      (2) Use of Cut as a Wg readout in Fig1 is problematic since it is also a Notch target. Perhaps a more direct measure of Arm activity would be a better choice here, e.g., naked-lacZ.

      We appreciate the reviewer’s insightful comment regarding the use of Cut as a Wg readout. The point about Cut being a Notch target raises a valid concern. To address this issue and provide a more direct measurement of Arm activity, we agree that incorporating a specific Arm readout, such as naked-lacZ, would be a more suitable choice.

      We will incorporate this valuable feedback into our future research endeavors to augment the comprehensiveness of our study.

      (3) The dx allele effects on Sens and Vg in Fig 2C appear greater at two points along the DV margin (arrows). Do these match the expression pattern of dx mRNA?

      We thank the reviewer for this thoughtful observation. We understand that the effect of the dx LOF allele on Sens and Vg seems more pronounced at two specific points along the D/V margin. To our understanding, Dx shows a homogeneous expression pattern throughout the wing disc, as has been reported earlier (Busseau et al., 1994; Mukherjee et al., 2005).

      (4) It really looks to my eye that dx loss lowers Wg expression in source cells in Fig 2. To confirm the model that Dx controls the spread of Wg protein, it would be ideal to rule out txnal effects with a wg-lacZ reporter.

      We thank the reviewer for raising this important point. In the revised version of the manuscript, we have introduced Wg-lacZ staining for both the Wg-lacZ/+ and dx152/Y; Wg-lacZ/+ combinations in Figure 2. This additional information eliminates the possibility of Deltex influencing Wg transcriptional regulation in source cells, thus reinforcing our hypothesis that the reduction of Deltex leads to a decline in Wg protein levels in the source cells, given Dx's essential role in Wingless gradient formation.

      (5) The drop in DV Wg and expansion of Vg domain in dx mutants seem paradoxical but could be explained by accelerated Wg spread and uptake. This could be tested by depleting the dally-like glypican that promotes long-range Wg diffusion in dx mutants, and seeing if this restores Wg levels at the DV margin.

      This is indeed a very thoughtful comment and we thank the reviewer for this insightful suggestion for further exploration. We believe that depleting dally-like glypican in dx mutants could possibly restore Wg levels at the DV margin.

      We recognize the importance of this experiment in providing a more comprehensive understanding of the underlying mechanisms, and we will place major emphasis on incorporating this suggestion in our future research.

      (6) The authors describe the effect of Dx over-expression as "reducing" the Wg gradient when they actually mean "flattening". Please be careful with this word choice as they mean different things.

      We thank the reviewer for the insightful feedback. The suggested modifications have been incorporated into the revised version of the manuscript.

      (7) The combined effects of Rab5dn and Dx o/e on Wg protein loc/levels are interesting but need to be followed up by testing whether the endogenous Dx/Rab5 show genetic interactions in control of Wg protein levels/localization.

      We acknowledge the reviewer's comment and in addressing it, we wish to highlight that the over-expression of Dx with endogenous Rab5 or Rab7 does not affect Wg protein levels or localization. We have mentioned the supporting data for this control in Figure 5(G, H).

      (8) The ability of MG132 to restore Arm levels in en-Dx discs is very promising. However, MG132 will also block Arm degradation by the Slmb-APC destruction complex, so this result could be non-specific. Tests of whether Dx drives poly-ub of Arm, and how much Dx is redundant to Slmb in this role, would be needed to solidify the authors' conclusion.

      We thank the reviewer for this insightful comment. We understand that the concern about MG132 blocking Arm degradation by Slmb-APC destruction complex adds an important layer of complexity to the interpretation of the results. We agree with the reviewer's comment that conducting these experiments will indeed offer valuable insight into the specificity of MG132 effects and further strengthen our conclusion.

      We are interested to see how future experiments addressing the points raised by the reviewer will shape our understanding of the intricate mechanisms involved in Wg signaling and Arm/β-catenin degradation. Once again, we thank the reviewer for the thoughtful engagement with the research, and the comments will undoubtedly stimulate further investigation and discussion in this area.

      Reviewer #2 (Recommendations For The Authors):

      The work really needs more experiments to further provide a mechanistic understanding and distinguish between direct and indirect action (via Notch signaling) on Wingless, but instead switches in the second half to a second interaction with β-catenin, leaving the conclusions of the first part hanging. More mechanistic information on the cell biology of how Deltex might affect wingless endocytic trafficking directly would be beneficial, for example involving some cell culture experiments where the action of deltex on Notch and wingless could be more clearly separated and a more detailed study of the consequences on wingless trafficking could be explored.

      Wingless is secreted into an extracellular compartment and so won't be accessible for a direct interaction with cytoplasmic deltex. Therefore are the authors proposing Deltex interacts with a membrane-bound wingless receptor such as frizzled in order to mediate its effects? These avenues could be explored further experimentally to derive a more mechanistic conclusion.

      The colocalisation images are not high resolution and colocalisation is not quantified, and no differences ( +/- Deltex) in wingless subcellular localisation, which would aid mechanistic interpretation, are shown.

      We thank the reviewer for the insightful feedback on our work. We appreciate the suggestion for more experiments to provide a mechanistic understanding and to distinguish between direct and indirect actions of Notch on Wingless signaling. We acknowledge the importance of clarifying these aspects and agree that further experiments could help separate the effects of Deltex on Notch and Wingless signaling, allowing for a more detailed examination of their respective trafficking and ubiquitination mechanisms.

      We will consider your valuable input in our future research efforts to enhance the comprehensiveness of our study.

      Other specific points

      Figure 2: Narrowing and broadening of different marker gene expression patterns in dx mutants needs to be quantified so that variation is taken into account and the numbers of wings imaged should be clearly stated.

      We greatly appreciate this valuable suggestion from the reviewer. As a response, we have incorporated quantification data to address the observed variations. We have also provided information regarding the number of wing discs that were imaged for the purpose of quantification.

      Figure 3: The number of discs imaged in total should be mentioned

      We express our appreciation to the reviewer for the input. We have taken their comment into account and have subsequently included details regarding the number of discs imaged in the figure legend section of the manuscript.

      Figure 6: There is no description of (E5-E6) in the figure legend. F1 to F5 eye size phenotypes require quantification.

      We are grateful to the reviewers for bringing this to our attention. In response, we have included a description of E5-E6 in the figure legend. Also, as per the reviewer’s suggestions, we have incorporated the quantification data of the eye size phenotype.

      Discussion

      Links between Notch and wingless pathway should be more comprehensively discussed, including previous work that has previously linked Notch/Deltex to β-catenin degradation e.g.

      Acar et al. Sci Rep 2021 Apr 27;11(1):9096. doi: 10.1038/s41598-021-88618-5

      Hayward et al. Development 2005 Apr;132(8):1819-30. doi: 10.1242/dev.01724;

      Kwon et al Nat Cell Biol 2011 Aug 14;13(10):1244-51. doi: 10.1038/ncb2313.

      Sanders et al. PLoS Biol 2009 Aug;7(8):e1000169. doi:10.1371/journal.pbio.1000169. Epub 2009 Aug 11.

      The links between endocytic trafficking and wingless gradient formation could also be further discussed eg.

      Marois et al. Development 2006 Jan;133(2):307-17. doi: 10.1242/dev.02197. Epub 2005 Dec 14

      Yamazaki et al Nat Cell Biol 2016 Apr;18(4):451-7. doi: 10.1038/ncb3325. Epub 2016 Mar 14.

      We appreciate the reviewer's valuable suggestions and we have now included these references in the discussion section of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The data strongly suggest that iron depletion in urine leads to conditional essentiality of some genes. It would be informative to test the single gene deletions (Figure 3G) for growth in urine supplemented with iron, to determine how many of those genes support growth in urine due to iron limitation.

      We appreciate this suggestion. We have now included this suggested experiment as a new panel (Figure 5G).

      (2) Line 641. The authors raise the intriguing possibility that some mutants can "cheat" by benefitting from the surrounding cells that are phenotypically wild-type. Growing a fepA deletion strain in urine, either alone or mixed with wild-type cells, would address this question. Given that other mutants may be similarly "masked", it is important to know whether this phenomenon occurs.

      We thank the reviewer for this suggestion but believe that this would be very difficult to ascertain in K. pneumoniae as several redundant iron uptake systems exist. This would require significantly more time to construct sequential/combinatorial iron-uptake mutants to exactly determine this “cheating” and “masking” phenomenon and such work is beyond the scope of the current study.

      (3) In cases where there are disparities between studies, e.g., for genes inferred to be essential for serum resistance, it would be informative to test individual deletions for genes described as essential in only one study.

      We thank the reviewer for this suggestion, and we agree that deleting conditionally essential genes (i.e. serum resistance) could help identify discrepancies in methodology with other studies but this is beyond the scope of this study. Furthermore, we do not have these other strains readily available to us and importing these strains into Australia is challenging due to the strict import/quarantine laws.

      Reviewer #1 (Recommendations For The Authors)

      (4) Line 529. Why was 50 chosen as the read count threshold?

      This was chosen as the minimum threshold needed to exclude essential genes from the comparative analysis, as these can contribute false-positive results where a change from, for example, 2 to 5 reads between conditions is considered a >2-fold change. We have updated the manuscript text to highlight this: “were removed from downstream analysis to exclude confounding essential genes and minimize the effect of stochastic mutant loss” (line 539).
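
      The reasoning behind a read-count floor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and counts are invented, the helper `filter_low_count_genes` is hypothetical, and the rule used here (drop a gene only when it is below the threshold in both conditions) is our assumption, since the exact filtering rule is not given in the response. Only the threshold of 50 and the 2-to-5-reads example come from the text above.

```python
import math

READ_COUNT_THRESHOLD = 50  # threshold quoted in the response


def log2_fold_change(control_reads, test_reads):
    """Log2 ratio of test to control read counts (both must be > 0)."""
    return math.log2(test_reads / control_reads)


def filter_low_count_genes(control, test, threshold=READ_COUNT_THRESHOLD):
    """Keep genes that reach the threshold in at least one condition.

    Assumption: a gene is dropped only if it is below the threshold in
    BOTH conditions, so genuinely depleted mutants are still compared.
    """
    return sorted(
        gene
        for gene in control
        if max(control[gene], test.get(gene, 0)) >= threshold
    )


# Hypothetical read counts per gene (control vs. test condition).
control = {"geneA": 2, "geneB": 400, "geneC": 60}
test = {"geneA": 5, "geneB": 390, "geneC": 15}

kept = filter_low_count_genes(control, test)
print(kept)
print(log2_fold_change(control["geneA"], test["geneA"]))
```

      In this toy input, geneA moves from 2 to 5 reads, which is more than a 2-fold change on almost no data, and it is the only gene removed by the filter.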

      (5) The titles for Figure 5 and Figure 6 appear to be switched.

      Thank you, we have now corrected this error.

      (6) Line 381. "Forty-six of these regions contain potential open reading frames that could encode proteins". How is a potential ORF defined?

      This was based on submitting the selected 145 bp regions to BLASTx using default parameters and listing the top hit (if one was found). We have now edited the manuscript text to make this clearer (line 394).

      (7) Two previous TnSeq studies looking at Escherichia coli and Vibrio cholerae suggest that H-NS can prevent transposon insertion, leading to false positive essentiality calls. Is there any evidence of this phenomenon here? A/T content could be used as a proxy for H-NS occupancy.

      We thank the reviewer for this point and also agree that H-NS or other DNA-binding proteins could indeed lead to false-positive essentiality calls using TraDIS. Based on this, we have now included a sentence in the conclusion section mentioning this methodological caveat (line 631). We believe that A/T content could potentially be used as a proxy for H-NS occupancy.
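
      As a rough illustration of the A/T-content proxy raised by the reviewer, the sketch below computes the A/T fraction in sliding windows along a sequence. The function names, window size, and toy sequence are assumptions made for the example; the sketch does not define what A/T fraction would indicate H-NS occupancy, and it is not taken from the authors' analysis.

```python
def at_fraction(sequence):
    """Fraction of A and T bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def sliding_at_content(sequence, window, step):
    """Return (start, A/T fraction) for each full window along the sequence."""
    return [
        (start, at_fraction(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy_sequence = "AAAATTTTGGGGCCCC"
profile = sliding_at_content(toy_sequence, window=8, step=4)
print(profile)
```

      Windows with a high A/T fraction that also lack transposon insertions would be candidates for a closer look before being called essential.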

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors may wish to reformat the manuscript by decanting a number of panels and figures as supplementary material. These include the panels related to the description of TraDIS (for example Fig 1D, 1E, 1F. 1G, Fig 2A, Fig 3C, 3D, 3E, 3F, Fig 5C, Fig 6D). This is a well-established method.

      We thank the reviewer for this suggestion but believe that these panels make the methodology and resulting insertion plots easier to follow and allow other researchers, of varying expertise, to better understand this functional genetic screening technique.

      (2) The authors need to indicate how relevant the strain they have probed is. Is it a good reference strain of the KpI group?

      This is a great suggestion and we have now included a new figure illustrating the genetic context and relatedness of K. pneumoniae ECL8 within the KpI phylogroup (New Figure 3).

      (3) The authors need to provide an extensive comparison between the data obtained and those reported testing other Klebsiella strains. A Table identifying the common and different genes, as well as a figure, may suffice. I would encourage authors to compare also their data against E. coli and Salmonella. For example, igaA seems to be not essential in Klebsiella although data indicates it is in Salmonella.

      We thank the reviewer for their comment and appreciate that our data could be extended and compared to other relevant Enterobacteriaceae members. However, we believe this is beyond the scope of this study as the focus is more on K. pneumoniae.

      (4) None of the mutants tested further are complemented. Without these experiments, it cannot be rigorously claimed that these loci play any role in the phenotypes investigated.

      We agree that complementation is an important tenet for attributing mutant phenotypes to specific gene loci. In this case, wbbY has already been complemented, and we believe complementation for an already known molecular mechanism would be redundant. Please refer to our response in point 6.

      We complemented isolated transposon mutants hns7::Tn5 and hns18::Tn5 with a mid-copy, IPTG-inducible plasmid. We observed a slight increase in serum susceptibility but not full rescue of the WT phenotype (i.e. serum susceptibility). We suspect that the imperfect rescue of the serum-resistance phenotype observed could be due to the expression levels and copy number of the complementing hns plasmid used. As hns is a known global regulator, its possible pleiotropic role is complex, as many aspects of stress response, metabolism or capsule could be affected in Klebsiella (doi.org/10.1186/1471-2180-6-72, doi.org/10.3389/fcimb.2016.00013). We have now included in the text our efforts in complementation and have included a new supplementary figure (Figure S11).

      (5) The contribution of siderophores to survival in urine is not conclusively established. Authors may wish to test the transcription of relevant genes, and to assess whether the expression is fur dependent in urine. Also, authors may wish to identify the main siderophore needed for survival in urine by probing a number of mutants; this will allow us to assess whether there is a degree of selection and redundancy.

      We thank the reviewer for their comment and agree siderophore uptake is important. We have now included an additional panel (Figure 5G) interrogating the importance of iron-uptake genes for growth in urine, which is iron limited. We do appreciate that further experiments looking into the Fur regulon and siderophore biosynthesis would be interesting but believe this is outside the scope of this study.

      (6) The role of wbbY is intriguing, pointing towards the importance of high molecular weight O-polysaccharide. In this mutant background, the authors need to assess whether the expression of the capsule, and ECA is affected. Authors need also to complement the mutant. Which is the mechanism conferring resistance?

      We thank the reviewer for their comment and would like to mention that wbbY has already been shown to play a role in LPS profile/biosynthesis and serum resistance (10.3389/fmicb.2014.00608). Furthermore, BLAST analysis shows that the wbbY genes of NTUH-K2044 (the strain used in the aforementioned study) and ECL8 share 100% sequence identity, and the two strains also share the same LPS operon structure. Hence, we do not find it pertinent to complement this mutant, as we believe its molecular mechanism has already been established. We have now highlighted more prominently in the text the results of that study and how our screen was robust enough to also identify this gene for serum resistance.

      (7) hns and gnd mutants most likely will have their capsule affected. The authors need to assess whether this is the case. Which is the mechanism conferring resistance?

      As mentioned in point 6, we believe that the serum resistance phenotype is attributable to the LPS phenotype. Previous studies have indicated that hns and gnd mutants would likely have differences in capsule, but because hns is pleiotropic and gnd is intercalated in or adjacent to the LPS/O-antigen biosynthesis locus, it would be difficult to delineate exactly which cell-surface structure is involved.

      (8) The conclusion section can be shortened significantly as much of the text is a repetition of the results/discussion section.

      We thank the reviewer for their suggestion and have made edits to limit repetition in the conclusion section.

      Reviewer #3 (Public Review):

      Below I include several comments regarding potential weaknesses in the methodology used:

      • The study was done with biological duplicates. In vitro studies usually require 3 samples for performing statistically robust analysis. Thus, are two duplicates enough to reach reproducible results? This is important because many genes are analyzed which could lead to false positives. That said, I acknowledge that genes that were confirmed through targeted mutagenesis led to similar phenotypic results. However, what about all those genes with higher p and q values that were not confirmed? Will those differences be real or represent false positives? Could this explain the differences obtained between this and other studies?

      We thank the reviewer for their comment and apologize for the confusion; data were pooled only for the statistical analysis of gene essentiality. Here, two technical replicates of the input library were sequenced and the number of insertions per gene quantified (insertion index scores). These replicates had a correlation coefficient of r² = 0.955, and the insertions-per-gene data were pooled to give total insertion index scores to predict gene essentiality. For conditional analyses (growth in urine or serum), replicate data were not combined. As mentioned previously, differences between this and other studies could also be attributed to inherent genomic differences or to differences in experimental methodology, computational approaches, or the stringency of analysis used to categorize these genes.

      • Two approaches are performed to investigate genes required for K. pneumoniae resistance to serum. In the first approach, the resistance to complement in serum is investigated. And here a total of 356 genes were identified to be relevant. In contrast, when genes required for overall resistance to serum are studied, only 52 genes seem to be involved. In principle, one would expect to see more genes required for overall resistance to serum and within them identify the genes required for resistance to complement. So this result is unexpected. In addition, it seems unlikely that 356 genes are involved in resistance to complement. Thus, is it possible false positives account for some of the results obtained?

      We thank the reviewer for their comment and do believe false positives may account for some of the identified genes. Regarding the large contrast in the number of genes, we believe this is due to the methodology, as alluded to in our conclusion section. For overall resistance to serum, we used a longer time point (180 min exposure), where fewer surviving mutants are recovered and hence fewer genes overall will be identified, whereas shorter killing windows (i.e. complement-mediated killing, 90 min exposure) will identify more.

      Reviewer #3 (Recommendations For The Authors):

      • In Figure 4 it is shown that genes important for growth in urine include several that are required for enterobactin uptake. Moreover, an in vitro experiment shows that the complementation of urine with iron increases K. pneumoniae growth. It would have been informative to do a competition experiment between the WT and Fep mutants in urine supplemented with iron. This could demonstrate that the genes identified are only necessary for conditions in which iron is in limiting concentrations and confirm that the defect of the mutants is not due to other characteristics of urine.

      We appreciate this suggestion. We have now included a new panel (Figure 5G) addressing the supplementation of iron in urine for these select mutants.

      • Considering the results section, the title for Figure 6 seems to be more appropriate for Figure 5.

      Thank you, this has now been corrected.

      Other points:

      • Line 44: treat instead of treating

      Thank you, this has now been corrected.

      • Line 63: found that only 3 genes played a role instead of "found only 3 genes played a role"

      Thank you, this has now been corrected.

      • Line 105: is there any reason for only using males? Since UTIs are frequent in women? Why not use urine from women volunteers?

      Due to the accessibility of willing volunteers and human ethics application processes, only male samples were available. We are currently undertaking further studies to understand how male and female urine influences growth of uropathogens.

      • Line 105: since the urine was filter-sterilized, maybe the authors can comment that another point that is missing in urine - and that it may be important to study - will be the presence of the urine microbiome and how this affects growth of K. pneumoniae.

      We again thank the reviewer for this comment and have now edited the manuscript to discuss how the absence of the urine microbiome could affect growth (line 659). As an aside, our lab is interested in future studies examining the role of commensal/microbiome co-interactions in essentiality/pathogenesis using TraDIS.

      • Line 116: I understand that the 8 healthy volunteers combined males and females

      Thank you, we have now edited this methods line to make this clearer.

      • Line 120: incubate in serum 90 min and 180 RPM shaking: any reasons for using these conditions, any reference supporting these conditions?

      Thank you for pointing this out, we were mirroring a previous K. pneumoniae serum-resistance study (doi.org/10.1128/iai.00043-).

      • Line 156: space after the dot.

      Thank you, we have now corrected this in the manuscript.

      • Line 164: resulting reads were mapped to the K. pneumoniae: what are the parameters used for mapping (e.g. % of identity...)?

      Thank you for bringing this to our attention; we have now included in our manuscript that we used the default parameters of BWA-MEM for mapping, including minimum seed length (default -k = 20 bp exact match).

      • Line 180: it will be good to upload to a repository the In-house scripts used or indicate the link beside the reference for those scripts.

      Our scripts are derived from the pioneering TraDIS study (doi: 10.1101/gr.097097.109). We are currently still optimizing our scripts and intend to upload these to be publicly available. However, in the meantime we are more than happy to share them with other parties upon request.

      • Line 191: why were genes classified as 12 times more likely to be situated in the left mode? Any particular reason for using this threshold?

      We opted for a more-stringent threshold for classifying essential genes, in keeping with previous and comparable studies (doi.org/10.1371/journal.pgen.1003834).
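
      The "12 times more likely" rule can be read as a likelihood-ratio cutoff between the two modes of the insertion index distribution. The sketch below is a hedged illustration only: it assumes an exponential density for the left (essential) mode and a gamma density for the right (non-essential) mode, with invented parameters, and it applies the same 12-fold cutoff in both directions. None of these choices, nor the names `exponential_pdf`, `gamma_pdf`, and `classify_gene`, are confirmed by the response.

```python
import math


def exponential_pdf(x, rate):
    """Density of an exponential distribution (assumed left, essential mode)."""
    return rate * math.exp(-rate * x)


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution (assumed right, non-essential mode)."""
    if x <= 0:
        return 0.0
    return (x ** (shape - 1) * math.exp(-x / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def classify_gene(iis, rate, shape, scale, fold=12.0):
    """Call a gene from its insertion index score using a fold likelihood cutoff."""
    left = exponential_pdf(iis, rate)
    right = gamma_pdf(iis, shape, scale)
    if left >= fold * right:
        return "essential"
    if right >= fold * left:
        return "non-essential"
    return "unclear"


# Hypothetical fitted parameters, chosen only for illustration.
RATE, SHAPE, SCALE = 100.0, 5.0, 0.02

for score in (0.001, 0.03, 0.1):
    print(score, classify_gene(score, RATE, SHAPE, SCALE))
```

      Raising the fold value shrinks both confident classes and enlarges the "unclear" band, which is the sense in which a 12-fold cutoff is more stringent than a lower one.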

      • Line 209: do you mean Q-value of <0.05 instead of >0.05 ? How is this Q value is calculated, and which specific tests are applied?

      Thank you for pointing out this Q-value error; we have now corrected this in the manuscript. These values were generated using the biotradis tradis_comparison.R script, which uses the edgeR package. For further reading please see DOI: 10.1093/bioinformatics/btp616. The Q-values are P values corrected for multiple testing by the Benjamini-Hochberg method.
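
      For readers unfamiliar with the Benjamini-Hochberg adjustment, the following plain-Python sketch shows how Q-values are derived from a list of P values. It is a textbook illustration, not the edgeR or biotradis code; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q-values), in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic Q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        adjusted = p_values[idx] * m / rank
        running_min = min(running_min, adjusted)
        q_values[idx] = running_min
    return q_values


# Hypothetical P values for four genes.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

      A gene would then be reported as significant when its Q-value falls below the chosen cutoff, for example 0.05.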

      • Line 212: again, which type of test is used? What about the urine growth analysis? The same type of tests were applied?

      Thank you for bringing this to our attention; we have now indicated in the referenced methods section which package was used for which dataset (i.e. urine or serum). Line 212 refers to our use of the AlbaTraDIS package, which builds on the biotradis toolkit, to identify gene commonalities/differences in the selected growth conditions, again with correction for multiple testing by the Benjamini-Hochberg method. For further reading, please refer to DOI: 10.1371/journal.pcbi.1007980.

      • Line 226: do the authors mean Sanger sequencing instead of SangerSanger sequencing?

      Thank you, we have now corrected this in the manuscript.

      • Line 239: does the WT strain contain another marker for differentiating this strain from the mutant? Or is the calculation of the number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics? The former will be a more accurate method.

      The calculation was based on the latter approach: “number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics”. We have now updated the methods section to make this clearer.
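
      The subtraction described here can be written out as a short sketch. The function name `wt_cfu` and the example counts are hypothetical, and clamping a negative result to zero is our assumption about how plating noise might be handled, not something stated by the authors.

```python
def wt_cfu(total_cfu, resistant_cfu):
    """Estimate wild-type CFUs as total CFUs minus antibiotic-resistant CFUs.

    total_cfu: colonies counted on media without antibiotic (WT + mutant).
    resistant_cfu: colonies counted on media with antibiotic (mutant only).
    """
    if total_cfu < 0 or resistant_cfu < 0:
        raise ValueError("CFU counts must be non-negative")
    # Plating noise can push the mutant count above the total; clamp at zero.
    return max(total_cfu - resistant_cfu, 0)


# Hypothetical counts (CFU/mL) from one mixed culture.
estimated_wt = wt_cfu(total_cfu=1.0e6, resistant_cfu=4.0e5)
print(estimated_wt)
```

      Because the wild-type count is inferred from two independent platings, their errors add, which is why the reviewer regards a separate marker on the WT strain as the more accurate method.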

      • Line 266: can you indicate approximately how many CFUs you have in this OD?

      Thank you, we have now also indicated an approximate CFU for this mentioned OD600 (OD600 1 = 7 × 10⁸ cells).

      • Line 309: besides indicating Figure 1D please indicate here Dataset S1 (the table where one can see the list of essential and non-essential genes). This table is shown afterwards but I think it will be more appropriate to show it at the beginning of the section.

      Thank you, we have now taken on this recommendation and have now edited the manuscript to also indicate Dataset S1 earlier.

      • Table 3. regarding the comparison of essential genes between different strains. I think it will be more clear if a Venn diagram was drawn including only genes that have homologs in all the studied strains (i.e. defining the core genome essentially).

      We would like to thank the reviewer for suggesting a Venn diagram and have now removed Table 3, which has been replaced with a new Figure 3.

      • Line 461: replicates were combined for downstream analyses? But are replicates combined for doing the statistical analysis? If so, how is the statistical analysis performed? How is it taken into account the potential variability in the abundance in each library? An r of 0.9 is high but not perfect.

      Technical replicates of the sequenced input library were combined following identification of a correlation coefficient of r² = 0.955, for the calculation of insertion index scores used in gene essentiality analysis. While r² = 0.955 is not perfect, discrepancies here can be attributed to higher variance in insertion index scores when sampling small genes, as these are represented by fewer insertions and the stochastic absence of a single insertion event has a greater effect on the overall IIS. Replicate data were not pooled for statistical analysis of mutant fitness (growth in urine and serum).
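
      The point about small genes can be illustrated with a short sketch. It assumes the usual definition of the insertion index score (unique insertions divided by gene length in base pairs); the function names and the gene sizes and insertion counts are hypothetical and are not data from the study.

```python
def insertion_index(n_insertions, gene_length_bp):
    """Insertion index score: unique insertions per base pair of gene (assumed definition)."""
    return n_insertions / gene_length_bp


def relative_change_if_one_lost(n_insertions, gene_length_bp):
    """Fractional drop in IIS when a single insertion is missed by chance."""
    before = insertion_index(n_insertions, gene_length_bp)
    after = insertion_index(n_insertions - 1, gene_length_bp)
    return (before - after) / before


# Two hypothetical genes with the same insertion density (0.02 per bp).
small_gene = relative_change_if_one_lost(6, 300)     # 300 bp, 6 insertions
large_gene = relative_change_if_one_lost(60, 3000)   # 3000 bp, 60 insertions
```

      With equal insertion density, missing one insertion shifts the score of the small gene ten times more than that of the large gene, which is consistent with the higher replicate-to-replicate variance described above.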

      • Line 487: is there any control strain containing the kanamycin gene in a part of the genome that does not affect the growth of K. pneumoniae? This could be used to show that having the kanamycin gene does not provide any defect in urine growth.

      We thank the reviewer for this suggestion but argue that introduction of the kanamycin gene at each unique locus may result in various levels of gene fitness that would be incomparable to a single control strain. Instead, we culture the ECL8 mutant library in urine and ensure that its kinetics are comparable to the wildtype. As the library contains thousands of kanamycin cassettes uniquely positioned across most of the genome with no observable growth defect, we do not anticipate the presence or expression of the cassette to have an appreciable impact.

      • Line 569: in the methodology it was indicated that control cells were incubated in PBS for the same amount of time. I think this is an important control that is not cited in the results section. Please can you indicate?

      We apologise for this misunderstanding due to how the methodology was written. The experiment did not sequence the PBS-incubated samples, as these were used solely as a check for viability of the K. pneumoniae ECL8 stock solution.

      • Line 597: "Mutants in igaA are enriched in our experiments". Can you show this data?

      We have now included this as a supplementary figure (Figure S11A).

      • Line 615: when doing this calculation, I guess the authors take into account only genes that are also present in the other strains.

      That is correct, we were aiming to highlight the high conservation of “essential genes” among all the selected strains.

      • Line 627: why surprisingly? Because is too low. Then indicate.

      Thank you, we have now edited this sentence to indicate that.

      • Figure 4: please, for clarity, can you indicate the meaning of the colors in the figure itself besides indicating it in the figure legend?

      Thank you, we have now included a color legend in these figure panels for clarity.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      Summary:

      The receptor tyrosine kinase Anaplastic Lymphoma Kinase (ALK) in humans is nervous system expressed and plays an important role as an oncogene. A number of groups have been studying ALK signalling in flies to gain mechanistic insight into its various roles. In flies, ALK plays a critical role in development, particularly embryonic development and axon targeting. In addition, ALK was also shown to regulate adult functions including sleep and memory. In this manuscript, Sukumar et al., used a suite of molecular techniques to identify downstream targets of ALK signalling. They first used targeted DamID, a technique that involves fusing a DNA methylase to RNA polymerase II, so that GATC sites in close proximity to PolII binding sites are marked. They performed these experiments in wild type and ALK loss of function mutants (using an Alk dominant negative ALkDN), to identify Alk responsive loci. Comparing these loci with a larval single cell RNAseq dataset identified neuroendocrine cells as an important site of Alk action. They further combined these TaDa hits with data from RNA seq in Alk Loss and Gain of Function manipulations to identify a single novel target of Alk signalling - a neuropeptide precursor they named Sparkly (Spar) for its expression pattern. They generated a mutant allele of Spar, raised an antibody against Spar, and characterised its expression pattern and mutant behavioural phenotypes including defects in sleep and circadian function.

      Strengths:

      The molecular biology experiments using TaDa and RNAseq were elegant and very convincing. The authors identified a novel gene they named Spar. They also generated a mutant allele of Spar (using CrisprCas technology) and raised an antibody against Spar. These experiments are lovely, and the reagents will be useful to the community. The paper is also well written, and the figures are very nicely laid out making the manuscript a pleasure to read.

      We thank the reviewer for this analysis.

      Weaknesses:

      The manuscript has improved substantially in the revision. Yet, some concerns remain around the genetics and behavioural analysis which is incomplete and confusing. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele and of RNAi knockdown of Spar. The RNAi knockdown is a welcome addition. However, the authors only show one parental control the GAL4 / +, but leave out the other parental control i.e. the UAS RNAi / + e.g. in Fig. 9. It is important to show both parental controls.

      We would like to express our gratitude for your insightful comments and feedback on our manuscript. We acknowledge the concerns raised regarding the genetics and behavioural analysis, and we appreciate the opportunity to address these issues. We have added the reciprocal UAS Spar-RNAi control in addition to the GAL4/+ control and we have incorporated both controls in the revised Figure 9, Figure 9 Supplementary Figure 1 and Figure 9 Supplementary Figure 2. Figure legends have been modified accordingly.

      Further, the sleep and circadian characterisation could be substantially improved. It is unclear how sleep was calculated - what program was used or what the criteria to define a sleep bout was.

      The data were analysed using an Excel macro, as outlined in the study by Berlandi et al. (2017) (PMID: 28912696). As previously indicated in the methodology, sleep is characterized as 5 minutes of inactivity. The raw data acquired from the TriKinetics DAM system were input into an Excel spreadsheet, and the parameters, encompassing sleep and activity, were computed for each day of the trial as an average derived from the data of all living animals at that time. Subsequently, these parameters were plotted over the course of the experiment. We have further detailed this part in the methods section to avoid confusion (Page 32 of revised MS).

      In the legend for Fig 8c, it says sleep was shown as "percentage of time flies spend sleeping measured every 5min across a 24h time span". Sleep in flies is (usually) defined as at least 5 min of inactivity. With this definition, I'm not sure how one can calculate the % time asleep in a 5 min bin! Typically people use 30min or 60min bins.

      We thank the reviewer for bringing this to our attention. As previously stated, in our experiments, sleep is defined as 5 minutes of inactivity. We have now modified the wording in the figure legend (Figure 8, Page 41), which was previously misleading.
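
      The sleep definition discussed here (at least 5 minutes of inactivity) and the 30-minute binning mentioned by the reviewer can be sketched as below. This is a hedged illustration, not the Excel macro of Berlandi et al.; it assumes activity counts recorded in 1-minute bins, and the function names and the example recording are hypothetical.

```python
def sleep_bouts(counts_per_min, min_bout=5):
    """Return (start_minute, length) for each run of >= min_bout zero-count minutes."""
    bouts = []
    run_start = None
    for t, count in enumerate(counts_per_min):
        if count == 0:
            if run_start is None:
                run_start = t  # a new stretch of inactivity begins
        else:
            if run_start is not None and t - run_start >= min_bout:
                bouts.append((run_start, t - run_start))
            run_start = None
    # Close a run of inactivity that reaches the end of the recording.
    n = len(counts_per_min)
    if run_start is not None and n - run_start >= min_bout:
        bouts.append((run_start, n - run_start))
    return bouts


def sleep_per_bin(counts_per_min, bin_minutes=30, min_bout=5):
    """Minutes of sleep in each consecutive bin of bin_minutes."""
    asleep = [False] * len(counts_per_min)
    for start, length in sleep_bouts(counts_per_min, min_bout):
        for t in range(start, start + length):
            asleep[t] = True
    return [sum(asleep[i:i + bin_minutes])
            for i in range(0, len(asleep), bin_minutes)]


# Hypothetical 60-minute recording of beam crossings per minute (illustrative only).
counts = [2] + [0] * 6 + [1] + [0] * 3 + [3] + [0] * 18 + [4] * 30
bouts = sleep_bouts(counts)
binned = sleep_per_bin(counts)
```

      In this toy recording the 3-minute pause is not counted as sleep because it is shorter than the 5-minute criterion, so the first 30-minute bin contains two bouts and the second contains none.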

      The sleep numbers for controls also seem off to me e.g. in Fig. 8H and H' average sleep / day is ~100. Is this minutes of sleep? 100 min / day is far too low, is it a typo? The same applies to Figure 8, figure supplement 2. Other places e.g. Fig 8 figure supplement 1, avg sleep is around 1000 min / day.

      The numbers for sleep bouts are also too low to me e.g. in Fig 9 number of sleep bouts avg around 4, and in Fig. 8 figure supplement 2 they average 1 sleep bout. There are several free software packages to analyse sleep data (e.g. Sleep Mat, PMID 35998317, or SCAMP). I would recommend that the authors reanalyse their data using one of these standard packages that are used routinely in the field. That should help resolve many issues.

      We thank the reviewer for pointing this out. There was indeed a typo “missing 0”, resulting in 0 values as only 3 days of raw data were chosen for the analysis of the average sleep in the mentioned figures. We have corrected this mistake in all figures.

      The circadian anticipatory activity analyses could also be improved. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827). This is typically computed as the ratio of activity in the 3hrs preceding light transition to activity in the 6hrs preceding light transition. The programs referenced above should help with this.

      For consistency, we used the same Excel macro (Berlandi et al, 2017) (PMID: 28912696) and followed the methodology of Harrisingh et al. (PMID: 18003827) to assess the anticipatory activity. We selected the activity in the 6 h period before lights on and defined it as a.m. anticipation, and the activity in the 6 h period preceding lights off and defined it as p.m. anticipation (Figure 8 f-g).
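
      The ratio described by the reviewer (activity in the 3 h before a light transition divided by activity in the 6 h before it) can be sketched as follows. This is only an illustration of that ratio, assuming activity has already been summed into hourly bins; the function name and the example values are hypothetical and do not reproduce the authors' macro or data.

```python
def anticipation_index(hourly_activity, transition_hour):
    """Activity in the 3 h before a light transition divided by activity in the 6 h before it."""
    if transition_hour < 6:
        raise ValueError("need at least 6 h of data before the transition")
    last_six = hourly_activity[transition_hour - 6:transition_hour]
    total = sum(last_six)
    if total == 0:
        return float("nan")  # no activity at all in the window
    return sum(last_six[-3:]) / total


# Hypothetical hourly counts; the transition occurs at hour 12 (illustrative only).
ramping = [5, 5, 5, 5, 5, 5, 10, 10, 10, 20, 30, 40]
flat = [10] * 12
ramping_index = anticipation_index(ramping, 12)
flat_index = anticipation_index(flat, 12)
```

      A value of 0.5 corresponds to evenly distributed activity, and values above 0.5 indicate that activity is concentrated in the hours just before the transition.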

      Finally, in many cases I'm not sure that the appropriate statistical tests have been used e.g. in Fig 8c, 8e, 8h t-tests have been used when there are three groups in the figure. The appropriate test here would be an ANOVA, followed by post-hoc comparisons.

      We agree with the reviewer’s comments. We have re-evaluated the data in Figure 8 b, c, e, h and h’ and Figure 8 Supplement 2 and 4 using a One-Way ANOVA followed by Tukey post-hoc test and we have indicated this in all legends.
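
      As a reminder of what the one-way ANOVA computes for three groups, the F statistic can be sketched in pure Python as below. The function name and the three groups are hypothetical and are not data from the study; obtaining the P value and the Tukey post-hoc comparisons additionally requires the F and studentized range distributions, which in practice come from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three hypothetical groups of three measurements each (illustrative only).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

      A single F test across all groups avoids the inflated false-positive rate of running separate t-tests on each pair, which is the concern raised by the reviewer.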

    1. Author response:

      We kindly thank the senior editor, the reviewing editor, and the esteemed reviewers for their invaluable insights in enhancing our manuscript. The assessment and feedback, particularly on the role of directly released bacterial ATP versus OMV-delivered bacterial ATP and its role on neutrophils, addressing study limitations, and discussing our models, are highly appreciated.

      The points you raised let us critically rethink our approach, our results, and our conclusions. Furthermore, they gave us the chance to elaborate on some critical aspects that you mentioned. With your help, we will make clarifications throughout the manuscript, and we will add the data about neutrophil numbers in the different organs (reviewer #1, weaknesses #3).

      Reviewer #1 (Public Review):

      Summary:

      • Extracellular ATP represents a danger-associated molecular pattern associated with tissue damage and can also act in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect possibly conditioning the outcome of sepsis, that is, the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth-dependent and strain-specific manner. However, whether this bacteria-derived ATP plays a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant strains of E. coli have been used first to correlate bacterial ATP release with growth and then with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To extrapolate the function of bacterial ATP from the systemic response to microorganisms, the authors exploited bacterial OMVs either loaded or not with ATP to investigate the systemic effects devoid of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      • A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      • As pointed out in the limitations of the study whether ATP-loaded OMVs provide a mechanistic proof of the pathogenetic role of bacteria-derived ATP independently of live microorganisms in sepsis is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment and might activate purinergic receptors, which then leads to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. Also, the role of OMV-derived bacterial ATP should be assessed locally as well as the importance of directly released vs. OMV-mediated bacterial ATP dissected locally. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph will be added to the manuscript, in which we discuss this particular issue.

      • Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the mentioned discrepancy. What we propose is that bacterial ATP exhibits different functions that are dependent on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and impacts on neutrophil function. We agree that, in particular, in the peritoneal cavity, both effects may play a role. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophils are decreased locally because of directly released bacterial ATP, which prevents efficient infection control and, therefore, impairs sepsis survival. In addition, these fewer neutrophils might even be dysregulated by the engulfment of bacterial ATP delivered via OMV, which leads to an upregulated and possibly aberrant degranulation process worsening local and remote tissue damage. We agree that in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or neutrophil cell lines as well as degranulation assays.

      • Are the neutrophils counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity where we have both, directly released bacterial ATP and OMV-derived bacterial ATP. We assessed such putative difference, however, for the systemic organs and the blood, where we did not find any differences in neutrophil numbers. We will include the figure in the revised manuscript as Figure 6-figure supplement 3C.

      Author response image 1.

      • A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that there are several open questions that remain to be elucidated, in particular, to differentiate the local role of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Such future research avenues might be to investigate the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles are relevant to understand the different mechanisms of how bacterial ATP alters sepsis. However, such experiments should be ideally performed in systems where the source and the delivery of ATP can be modulated locally.

      • Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lung and the respiratory system are among the clinically most important organs affected during sepsis resulting in a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      • In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      • The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper implies a concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary depending on the individual patients. It would be beneficial if the authors could specify whether they've verified the existence of unculturable species capable of secreting high levels of Extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly either to the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus or E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We validated this well documented existing evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, the modulation of ATP generation or loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically secreted by many cells of the host in active and passive manners in the case of any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses between germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of Extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of Extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequential predomination of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). Also, it has been shown that purinergic signaling modulates infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. After introducing the general principles and fundaments of bacterial ATP with our studies, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than Extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to utilize them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived Extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well-established with contradictory results (Csóka et al., 2015; Ledderose et al., 2016). This conflicting data was the rationale to test the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis when bacteria invade the sterile compartment and before efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that Extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes of immune cells within the peritoneal cavity caused by Extracellular ATP released from bacterial death or by OMVs?

      This is a relevant question that was also asked by reviewer #1, and we answered it in detail above (weaknesses comment #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that locally, the role of directly released bacterial ATP (extracellular) predominates over OMV-derived bacterial ATP. Furthermore, the mechanisms between directly released and OMV-derived bacterial ATP (within OMV, engulfed and transported to the endolysosomal compartment) are different, and especially extracellular ATP has been described to lead to apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were only used for the bacteria cultures in Figure 1, Figure 2, and Figure 3, which achieved similar results and the standard deviation remained very small, indicating its robustness. In the in vitro experiments in Figure 5 we used a sample size of 6 (three biological replicates measured in technical duplicates), since we saw bigger deviations in our measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), 10.1128/iai.00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1), Article 1. https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewing Editor's comments:

      There appear to be several mistakes/missing details in the additional statistical analyses reported in their response to Reviewer #1’s comments:

      (1) Detecting differentially expressed genes (DEGs):

      Reviewer #1 suggested adding an interaction term between sex and environment (ethnicity) in identifying DEGs. The authors performed ANCOVA analysis with sex and ethnicity as covariates (but not the interaction) and found sex explained more variance. This is not what the reviewer asked for, and the results do not help identify DEGs.

      We understand the reviewer’s suggestion to identify DEGs using a sex × ethnicity interaction. However, despite a careful search of the literature, we could not find an appropriate tool for such an analysis. It should be noted that the interaction analysis between sex and environment was only designed to study genotype data rather than gene expression data. Besides, considering that we have already included multiple covariates in our DEG detection, adding an interaction term between sex and environment (ethnicity) would make the formulation too complex to resolve using current tools. As an alternative, we built a linear regression model in the revision to test how much sex explains in DEG detection (see details below). We would appreciate it if the reviewer could point us to any available tools or previous studies that conduct interaction analysis for DEG identification.

      (2) Overlap between DEGs and genes under positive selection in Tibetans (TSNGs)

      The authors claimed that the overlaps are significantly enriched in the "sex-combined" set (p=0.048) and the "male-only" set (p=9e-4), but it seems that the authors calculated the p-values incorrectly. Based on the histogram shown in Fig 3R (left panel), at least 750 out of 10,000 permutations led to 4 genes in overlap and there are additional permutations with 5 or more genes in overlap, so the p-value for the sex-combined set cannot be 0.048. In addition, the permutation procedure is somewhat questionable: it is unclear whether randomly sampling 192 genes from the human genome is a reasonable choice without matching for relevant gene features.

      As we explained in the response to Reviewer-1, we agree with the reviewer’s point that random sampling of genes in permutation should be extracted from genes expressed in each tissue rather than the entire genome. Based on this updated random sampling procedure, we redid the analysis, and our previous conclusions remain unchanged.

      (3) Polygenic adaptation signal based on eQTL information:

      The PolyGraph method is designed for highly polygenic traits with causal variants spread across the genome. However, the genetic architecture of the expression of a gene is much less polygenic, with at most a few cis-eQTLs per gene, so the PolyGraph model does not apply to the expression of individual genes. On the other hand, eQTLs for different genes are associated with different "traits", so they cannot simply be aggregated together for PolyGraph analysis. Based on the Methods description, it is unclear how the authors ran the PolyGraph analysis on eQTLs in practice and whether this practice is appropriate for detecting a polygenic adaptation signal on gene expression.

      We understand the reviewer’s concern about the polygenic adaptation analysis. In this study, we tested whether the estimated polygenic scores from eQTLs (estimated using sums of allele frequencies at independent eQTLs weighted by their effect sizes) were significantly enriched in Tibetans compared to other populations. A detailed description of the polygenic test is provided in the response to Reviewer-1.

      Reviewer #1 (Public Review):

      The revised manuscript now presents 1) a permutation-based test for the significance of the overlap between DEGs and genes with positive selection signals in Tibetans, and 2) a polygenic adaptation test for the eQTLs. I detail my suggestions below:

      Major Comments

      (1) My previous concern regarding the DEG analysis remains unresolved. The authors agreed in their response that the difference between the male- and female-specific DEGs is insufficient to explain the difference between sex-combined and sex-specific DEGs (Figure S6). However, the results section still states the opposite pattern between males and females as a decisive reason for the difference (p. 9, lines 236-239). Again, I would recommend that the authors test alternative ways of analysis to boost statistical power for DEG detection other than simply splitting the data into males and females and performing the analysis in each subset. For example, the authors may consider utilizing gene-by-environment interaction analysis schemes, here with biological sex as an environmental factor.

      To evaluate the effect of sex on gene expression in each layer, we adopted two strategies: 1) calculating the variance in the expression data explained by sex; 2) evaluating the statistical significance of the association between sex and the expression data.

      Firstly, we observed a significantly higher variance explained by sex than by ethnicity in six layers of the placenta (see details in our previous response to reviewers).

      Then, we fitted a linear regression model to test whether sex affects gene expression. For each gene, a linear regression model was fitted using the R glm function with sex as the covariate: glm(gene expression ~ sex). We discovered 5,865 genes significantly associated with sex, and most of them were located on the sex chromosomes. We observed that 62.63% of genes overlapped with those showing opposite differential directions between the sex-combined and the sex-specific analyses.
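
      For illustration only, the sketch below shows how a per-gene model can carry the sex × ethnicity interaction term raised in the reviewer’s comment. It is written in Python rather than R, it is not the analysis reported here, and the function name, group labels, and toy values are all hypothetical. It relies on one known fact: in a saturated 2 × 2 model (expression ~ sex + ethnicity + sex:ethnicity with dummy coding), the interaction coefficient equals the difference-in-differences of the four cell means.

```python
from statistics import mean


def interaction_effect(expression, sex, ethnicity):
    """Estimate the sex x ethnicity interaction for one gene.

    In a saturated 2x2 design with dummy coding, the interaction
    coefficient is the difference-in-differences of the cell means:
    (Tibetan - Han in males) - (Tibetan - Han in females).
    Labels "M"/"F" and "Tibetan"/"Han" are hypothetical.
    """
    cells = {}
    for value, s, e in zip(expression, sex, ethnicity):
        cells.setdefault((s, e), []).append(value)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    male_diff = cell_mean[("M", "Tibetan")] - cell_mean[("M", "Han")]
    female_diff = cell_mean[("F", "Tibetan")] - cell_mean[("F", "Han")]
    return male_diff - female_diff


# Toy gene (made-up values): higher in Tibetan males, lower in Tibetan females.
expression = [5.0, 5.2, 3.0, 3.2, 3.1, 2.9, 5.1, 4.9]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
ethnicity = ["Tibetan", "Tibetan", "Han", "Han",
             "Tibetan", "Tibetan", "Han", "Han"]
effect = interaction_effect(expression, sex, ethnicity)
```

      In this toy gene the population difference has opposite signs in the two sexes, so it largely cancels in a pooled comparison while the interaction estimate is large. A real analysis would also need a standard error, additional covariates, and multiple-testing correction across genes.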

      Considering that the opposite direction of DEGs is likely only one of the explanations for the discrepancy between the sex-combined and the sex-specific DEGs, and that there might be alternative mechanisms for this phenomenon, we have toned down the description of this point in the revised manuscript:

      “Considering 62.63% of DEGs (248/396) with an opposite direction of between-population expression divergence in males and females, respectively (Figure S6), we reckon that there might be other factors such as sample size or cell composition affecting the identification of DEGs, which could cancel out the differences in the sex-combined analysis.” (Page 9)

      (2) Multiple testing schemes are still sub-optimal in some cases. Most importantly, for the p-values in the WGCNA analysis (p. 11), the authors corrected for the number of traits (n=12) after adjusting for the correlation between them. However, they did not mention whether they accounted for the number of modules they tested at all (n=136 and 161 for males and females, respectively). Whether they account for the number of modules will make a substantial difference in the significance threshold. Please incorporate and describe a proper multiple testing scheme for this analysis.

      We understand the reviewer’s point. Indeed, for multiple testing schemes, we considered both the number of traits and the number of modules. For the number of modules, multiple testing correction is already embedded in WGCNA, as described in the published studies (Li et al. 2018; Zeng et al. 2023).
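
      To make the scale of the issue concrete, the following Python sketch computes a plain Bonferroni threshold over every module-trait pair. This is an assumption for illustration, not the scheme used in the manuscript: it treats all 12 traits as independent (the effective number after adjusting for their correlation is not stated here), and the function name and alpha = 0.05 are hypothetical.

```python
def bonferroni_threshold(alpha, n_modules, n_traits):
    """Per-test significance threshold when every module-trait pair is tested."""
    return alpha / (n_modules * n_traits)


ALPHA = 0.05   # assumed family-wise error rate
N_TRAITS = 12  # number of traits, treated as independent (an upper bound)

traits_only = ALPHA / N_TRAITS                                # about 4.2e-3
male_threshold = bonferroni_threshold(ALPHA, 136, N_TRAITS)   # 136 modules in males
female_threshold = bonferroni_threshold(ALPHA, 161, N_TRAITS) # 161 modules in females
```

      Under these assumptions, accounting for the modules tightens the threshold by a factor equal to the number of modules (136 or 161) relative to correcting for traits alone.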

      (3) Evidence for natural selection on the observed DEG pattern is still weak and not properly described.

      (1) For the overlap between DEGs and TSNGs, the authors introduced a permutation-based test, but used a total set of genes in the human genome as a comparison set (p. 25, lines 699-700). I believe that the authors should sample random sets of genes from those already expressed in each tissue to make a fair comparison.

      We agree with the reviewer’s point that random sampling of genes in permutation should be extracted from genes expressed in each tissue, which is a fair comparison between the observed and the simulated counts of the overlapped genes.

      Therefore, for each permutation, we randomly extracted 192 genes from all the placenta-expressed genes identified from the seven layers (17,284 genes in total), overlapped them with the DEGs of the three sets (female + male, female only, and male only), and counted the number of overlapping genes. After 10,000 permutations, we constructed a null distribution for each set, and found that the overlaps between DEGs and TSNGs were significantly enriched in the “sex-combined” set (p-value = 0.0123) and the “male-only” set (p-value < 1e-4), but not in the “female-only” set (p-value = 0.0572) (Figure R1). This result suggests that the observed DEGs are significantly enriched in TSNGs when compared with randomly sampled gene sets, especially for the DEGs from the “male-only” set.
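
      The permutation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the reported p-values: the gene identifiers, the DEG set, the observed overlap of 12, and the function name are hypothetical, and the p-value is taken as the fraction of permutations whose overlap is at least as large as the observed one.

```python
import random


def overlap_p_value(background, degs, n_draw, observed, n_perm=10_000, seed=0):
    """One-sided empirical p-value for the overlap between a random gene set and DEGs.

    Each permutation draws `n_draw` genes without replacement from
    `background` and counts how many fall in `degs`. The p-value is the
    fraction of permutations with an overlap >= `observed`.
    """
    rng = random.Random(seed)
    deg_set = set(degs)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, n_draw)
        overlap = sum(1 for gene in sample if gene in deg_set)
        if overlap >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical inputs: 17,284 expressed genes, 400 of them DEGs,
# 192 genes drawn per permutation, and a made-up observed overlap of 12.
background = [f"gene{i}" for i in range(17_284)]
degs = background[:400]
p_value = overlap_p_value(background, degs, n_draw=192, observed=12, n_perm=2_000)
```

      Because the tail is counted with `>=`, every permutation that reaches the observed count contributes to the p-value. With zero hits this estimator returns exactly 0, which is usually reported as p < 1/n_perm.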

      Author response image 1.

      The distribution of 10,000 permutation tests of counts of the overlapped genes between 192 TSNGs and the DEGs randomly selected from the expressed genes in the placenta. The red-dashed lines indicate the observed values based on the randomly selected DEGs.

      (2) The entire PolyGraph analysis for polygenic adaptation is poorly described. The current version of the Methods does not clarify i) for which genes the eQTLs are discovered, ii) how the authors performed the eQTL analysis, iii) how the authors polarized the effect, and iv) how they set up a comparison between the eQTLs and the others.

      Considering that the RNA-seq data of the placenta mostly represent the transcriptomes of the newborns, according to our analysis of the maternal-fetal composition of each dissected layer, we conducted eQTL analysis using the fetal genotypes and the placental tissue gene expression data (TPM) with the R package MatrixEQTL (https://github.com/andreyshabalin/MatrixEQTL), taking altitude and maternal age as covariates. We took a window of 1 Mb upstream and 1 Mb downstream of each SNP to select the genes or expression probes to test. Associations between these SNP–gene combinations were calculated using a linear model. This tool can distinguish local (cis-) and distant (trans-) eQTLs. We performed separate corrections for multiple testing.
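
      As a rough illustration of the window rule, the Python sketch below pairs each SNP with genes lying within 1 Mb on either side on the same chromosome. It does not reproduce MatrixEQTL’s internals; the data layout, identifiers, coordinates, and function name are hypothetical.

```python
CIS_WINDOW = 1_000_000  # 1 Mb upstream and 1 Mb downstream of each SNP


def cis_pairs(snps, genes, window=CIS_WINDOW):
    """Return (snp_id, gene_id) pairs to test as local (cis) associations.

    `snps` maps snp_id -> (chromosome, position).
    `genes` maps gene_id -> (chromosome, start, end).
    A gene qualifies when its interval overlaps
    [position - window, position + window] on the same chromosome.
    """
    pairs = []
    for snp_id, (snp_chrom, snp_pos) in snps.items():
        for gene_id, (gene_chrom, start, end) in genes.items():
            if gene_chrom != snp_chrom:
                continue
            if start <= snp_pos + window and end >= snp_pos - window:
                pairs.append((snp_id, gene_id))
    return pairs


# Hypothetical coordinates for illustration.
snps = {"rs_a": ("chr1", 5_000_000), "rs_b": ("chr2", 1_000_000)}
genes = {
    "GENE1": ("chr1", 5_900_000, 5_950_000),  # 0.9 Mb from rs_a: tested
    "GENE2": ("chr1", 7_000_000, 7_100_000),  # 2 Mb from rs_a: skipped
    "GENE3": ("chr2", 100_000, 200_000),      # 0.8 Mb from rs_b: tested
}
pairs = cis_pairs(snps, genes)
```

      The nested loop is quadratic and only suitable for small examples; genome-scale data would need sorted positions or an interval index.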

      Finally, we detected 5,251 eQTLs (involving 319 eGenes), covering the SNPs significantly associated with gene expression (p-value < 5e-8). To identify the signatures of polygenic selection in Tibetans using eQTL information, we removed those SNPs in linkage disequilibrium (r² > 0.2 in the 1000 Genomes Project) and obtained 176 independent eQTLs as input into PolyGraph (Racimo et al. 2018). The QB (Racimo et al. 2018) and QX (Berg and Coop 2014) frameworks are used in PolyGraph to determine whether the estimated polygenic scores exhibit more variance among populations than the null expectation under genetic drift, by retrieving the summary statistics from the eQTL set.

      In this study, we focused on testing whether the estimated polygenic scores from eQTLs (estimated using sums of allele frequencies at independent eQTLs weighted by their effect sizes) were significantly enriched in Tibetans compared to other populations. The significance was evaluated by comparing to 10,000 sets of the control SNPs. Each set of control SNPs was randomly drawn from the genomic SNPs, and contained an equal number of SNPs as the eQTLs matched one-to-one by minor allele frequency.
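
      The scoring and comparison described above can be sketched in Python as follows. This is a simplified illustration, not PolyGraph’s QB/QX implementation: the function names, the 0.05-wide MAF bins, the two-sided comparison, and the use of a single frequency per SNP for both matching and scoring are all assumptions made for the example.

```python
import random


def polygenic_score(freqs, effects):
    """Sum of allele frequencies weighted by effect sizes."""
    return sum(p * beta for p, beta in zip(freqs, effects))


def maf_bin(freq, width=0.05):
    """Index of the minor-allele-frequency bin for an allele frequency."""
    maf = min(freq, 1 - freq)
    return int(maf / width)


def draw_matched_controls(eqtl_freqs, genome_freqs, rng, width=0.05):
    """Draw one control SNP frequency per eQTL from the same MAF bin.

    Controls are drawn with replacement; a bin with no genomic SNP
    raises KeyError.
    """
    by_bin = {}
    for f in genome_freqs:
        by_bin.setdefault(maf_bin(f, width), []).append(f)
    return [rng.choice(by_bin[maf_bin(f, width)]) for f in eqtl_freqs]


def empirical_p(observed, control_scores):
    """Fraction of control scores at least as extreme as the observed score."""
    extreme = sum(1 for s in control_scores if abs(s) >= abs(observed))
    return extreme / len(control_scores)


# Hypothetical toy data: two eQTLs and five genomic SNPs.
rng = random.Random(1)
eqtl_freqs, effects = [0.12, 0.9], [1.0, -2.0]
genome_freqs = [0.11, 0.13, 0.08, 0.92, 0.4]
observed = polygenic_score(eqtl_freqs, effects)
control_scores = [
    polygenic_score(draw_matched_controls(eqtl_freqs, genome_freqs, rng), effects)
    for _ in range(1_000)
]
p_value = empirical_p(observed, control_scores)
```

      Each control set reuses the eQTL effect sizes, so only the allele frequencies differ between the observed and control scores.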

      The PolyGraph result showed that Tibetans have a clear signature of polygenic selection on gene expression (Bonferroni-corrected p-value = 0.003, Figure S12). In other words, the frequencies of alleles associated with gene expression (up-regulation or down-regulation) were specifically enriched in Tibetans, a signal of positive selection.

      Minor comments

      (1) In Figure S1, the amount of variance explained by PC1 and PC2 needs to be corrected. PC1 explains less variance than PC2 (0.11 vs 0.68%).

      It was a typographical error that mixed up the variances of PC1 and PC2. We have corrected it in the revised version.

      (2) In the section "Sex-biased expression divergence ..." (p. 8), the authors are using the term "gender" instead of sex. Considering that they are talking about the biological sex of each infant, I believe that sex is a more appropriate term to be used than gender.

      Following the reviewer’s suggestion, we rephrased “gender” as “sex” in the revised manuscript to describe the biological differences between females and males.

      Reviewer #3 (Public Review):

      More than 80 million people live at high altitude. This impacts health outcomes, including those related to pregnancy. Longer-lived populations at high altitudes, such as the Tibetan and Andean populations, show partial protection against the negative health effects of high altitude. The paper by Yue sought to determine the mechanisms by which the placenta of Tibetans may have adapted to minimise the negative effect of high altitude on fetal growth outcomes. It compared placentas from pregnancies from Tibetans to those from the Han Chinese. It employed RNAseq profiling of different regions of the placenta and fetal membranes, with some follow-up of histological changes in umbilical cord structure and placental structure. The study also explored the contribution of fetal sex in these phenotypic outcomes.

      A key strength of the study is the large sample sizes for the RNAseq analysis, the analysis of different parts of the placenta and fetal membranes, and the assessment of fetal sex differences.

      A main weakness is that this study, and its conclusions, largely rely on transcriptomic changes informed by RNAseq. Changes in genes and pathways identified through bioinformatic analysis were not verified by alternate methods, such as by western blotting, which would add weight to the strength of the data and its interpretations. There is also a lack of description of patient characteristics, so the reader is unable to make their own judgments on how placental changes may link to pregnancy outcomes. Another weakness is that the histological analyses were performed on n=5 per group and were rudimentary in nature.

      For the three weaknesses raised by the reviewer, here are our responses:

      (1) Considering that our conclusions largely rely on the transcriptomic data, we agree with the reviewer that more experiments are needed to validate the results from our transcriptomic data. However, this study mainly aimed to provide a transcriptomic landscape of the high-altitude placenta and to characterize the gene-expression differences between native Tibetans and Han migrants. Exploring the molecular mechanism is not the main task of this study, and more validation experiments are warranted in the future.

      (2) For the lack of description of patient characteristics, actually, we provided three-level results on the placental changes of Tibetans: macroscopic phenotypes (higher placental weight and volume), histological phenotypes (larger umbilical vein walls and umbilical artery intima and media; lower syncytial knots/villi ratios) and transcriptomic phenotypes (DEG and differential modules). Combined with the previous studies, these placenta changes suggest a better reproductive outcome. For example, the placenta volume shows a significantly positive correlation with birth weight (R = 0.31, p-value = 2.5e-16), therefore, the larger placenta volume of Tibetans is beneficial to fetal development at high altitude. In addition, the larger umbilical vein wall and umbilical artery intima and media of Tibetans can explain their adaptation in preventing preeclampsia.

      (3) For the sample size of the histological analyses, we understand the reviewer’s concern that 5 vs. 5 samples is not very large for histological analyses. This is because it was difficult to collect high-altitude Han placenta samples; we obtained only 13 Han samples, from which we selected 5 infant-sex-matched samples.

      Minor point:

      I feel the authors have responded well to the other reviewer comments. However, I am disappointed that the authors did not address my comment related to the validation of their RNAseq data. In particular, they failed to add new data that verifies and supports their RNAseq findings on pathways affected. This is imperative as their conclusions are based solely on the RNAseq analysis. The only other comment I have is that they should add a description of all abbreviations, including those in the supplementary information (like Table S12).

      Regarding experimental validation of the transcriptome, we understand the reviewer’s concern. However, as we mentioned before, this study mainly aimed to provide a transcriptomic landscape of the high-altitude placenta; exploring the molecular mechanism is not the main task of this study, and more validation experiments are warranted in the future. We have toned down the description of the power of the transcriptomic data to explain the biological differences, and called for further functional validation in the future:

      “the transcriptome data is insufficient to explain the underlying molecular mechanisms of genetic adaptation in Tibetans. Future single-cell transcriptome analysis and functional validations of the candidate genes are warranted to reveal the responsible cell types and the molecular pathways.” (highlighted in Page 20)

      Regarding abbreviations, following the reviewer’s suggestion, we added descriptions of all abbreviations used in this study in the corresponding positions (Table S1 and S12).

      References

      Berg JJ, and Coop G (2014). A population genetic signal of polygenic adaptation. PLoS Genet 10(8): e1004412.

      Li J, et al. (2018). Application of Weighted Gene Co-expression Network Analysis for Data from Paired Design. Sci Rep 8(1): 622.

      Racimo F, Berg JJ, and Pickrell JK (2018). Detecting Polygenic Adaptation in Admixture Graphs. Genetics 208(4): 1565-1584.

      Zeng JF, et al. (2023). Functional investigation and two-sample Mendelian randomization study of neuropathic pain hub genes obtained by WGCNA analysis. Frontiers in Neuroscience 17.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors):

      (1) Since the data suggests that the degradation of Mecp2 is a crucial event in the exit from quiescence, gaining a better understanding of the underlying mechanism would improve the significance of the study. In this regard, the authors should take advantage of the serum stimulated degradation of Mecp2 (Fig. 3D) to identify the signaling pathway(s) required for the degradation.

      Thank you for this suggestion. To decipher the molecular mechanisms underlying Mecp2-regulated quiescence exit, we performed RNA-seq combined with ChIP-seq to identify the Mecp2-dependent transcriptome genome-wide during the early stage of liver regeneration (Figure S6C). There were 2658 Mecp2 direct target genes, of which 537 were PHx-activated and 2121 were PHx-repressed genes (Figure 6A). GO analysis showed that PHx-activated Mecp2 targets were highly enriched in proliferation-associated biological processes such as ribosome biogenesis, rRNA metabolic process, ncRNA metabolic process, and regulation of transcription by RNA polymerase I, whereas PHx-repressed Mecp2 targets were associated with several metabolic processes including carboxylic acid catabolic process, cellular amino acid metabolic process, fatty acid metabolic process and steroid metabolic process (Figure 6B). These results suggest that Mecp2 plays a negative regulatory role during quiescence exit by activating metabolism-associated genes while repressing proliferation-associated genes in quiescent cells.

      Given the more rapid decay of Mecp2 at the protein level than at the mRNA level during the quiescence-proliferation transition, we speculated that Mecp2 is subject to posttranslational regulation. This hypothesis was supported by treatment with the proteasome inhibitor MG132, which attenuated the reduction of Mecp2 in quiescent cells after S.R. (Figure S5A). To identify the signaling pathways that regulate Mecp2 degradation during the G0/G1 transition, we performed immunoprecipitation followed by mass spectrometry (IP-MS) using an Mecp2 antibody in quiescent 3T3 cells treated with or without S.R. (Figure S5B). A total of 647 proteins were identified as putative Mecp2 interactors. We were particularly interested in the proteins involved in the proteasome-mediated ubiquitin-dependent protein catabolic process, which was one of the enriched Gene Ontology (GO) terms in the Mecp2 interactome (Table S1).

      (2) The authors suggest that Mecp2 downregulation accelerates the induction of pRb, which serves as a key marker for G0/G1 transition. However, their data only show increased magnitudes of the expression in Mecp2 downregulated cells at the timepoints when samples were collected (Figs. 2B and 4B). In the in vitro experiments, the authors should investigate earlier timepoints to demonstrate that induction of pRB during the quiescence exit occurs earlier in Mecp2 deficient cells compared to control cells. Likewise, a later induction of pRB in Mecp2 overexpression cells, in comparison to normal cells, should be demonstrated.

      Thank you for these valuable suggestions. Accordingly, we collected samples of cells re-entering the cell cycle at 30, 60, 90, and 120 minutes post-S.R. We examined pRb expression and found that phosphorylation of retinoblastoma protein (pRb) at Ser807/811 occurs earlier (about 90 minutes) in Mecp2-deficient cells compared to control cells (Figure S4C). Compared to the EV, Mecp2 OE resulted in delayed induction of pRb (about 60 minutes) upon S.R. (Figure S4D). These data indicate that enhanced reduction of Mecp2 stimulates exit from quiescence.

      (3) There are three well-known phosphorylation sites in Mecp2, including S80, S229, and S423. As protein ubiquitination and degradation are often triggered by phosphorylation, it would be interesting to examine whether phosphorylation at these sites of Mecp2 is required for its downregulation during quiescence exit. This can be achieved using non-phosphorylate mutants of Mecp2.

      This is a very good question. Indeed, the 26S ubiquitin-proteasome system (26S UPS) is responsible for the breakdown of MeCP2 (PMID: 28394263, 28973632). In 2009, bona fide PEST (enriched in proline, glutamic acid, serine, and threonine) domains were identified in MeCP2, and these are highly conserved across vertebrate evolution (PMID: 19319913). Consensus sequences enriched in PEST residues have been found to predispose the proteins containing them to rapid proteolytic degradation (PMID: 8755249, 2876518). In addition, phosphorylation within PEST motifs precedes ubiquitination of proteins (PMID: 15229225). One of the best-characterized sites of MeCP2 phosphorylation (S80) (PMID: 19225110), as well as one of the identified ubiquitination sites (K82/K99) (PMID: 22615490), both fall within one of these regions. It is also noteworthy that most of the MeCP2 phosphorylation sites were found in close proximity to potential ubiquitylation sites. For example, Rett syndrome missense mutations affect three (K82R, K135A, K256S) of the ubiquitination sites (PMID: 25165434), and S80 (within one of the PEST sequences) and K82 have been shown to be phosphorylated and ubiquitinated.

      Based on the above discussion, we propose the hypothesis that MeCP2 turnover during cell cycle re-entry is achieved by an initial phosphorylation signal (phosphorylation at S80, S229, or S421) that triggers the ubiquitination of a nearby lysine residue. We hope to resolve these issues and present the findings in future work. Thanks again for your professional suggestions.

      (4) It would be interesting if the authors could also examine the effect of altered expression of Mecp2 on the maintenance of quiescence. For example, whether the downregulation of Mecp2 sensitizes quiescent cells for entry of the cell cycle in response to serum stimulation or delays withdrawal from the cell cycle upon serum starvation or contact inhibition.

      Thank you for your suggestions. Cell cycle synchronization was induced by serum deprivation. When nutrients are exhausted, altered expression of Mecp2 has no statistically significant influence on the maintenance of quiescence, as analyzed by flow cytometry (Figure 4D and H). This suggests that altered expression of Mecp2 alone may not be sufficient for cell cycle exit. In the presence of growth factors or nutrients, loss of MeCP2 only accelerates the rate of cell cycle re-entry.

      Minor points:

      For Figs. 2D, 2H, and 2L, it would be more intuitive if the percentage of changes in liver index rather than the relative index values were used. Also, the values listed in the figures should start from time zero after partial hepatectomy rather than pre-surgery.

      Liver weight changes correspondingly with body weight. The liver index (the ratio of regenerated liver weight to body weight) is tightly regulated and depends on the metabolic demands of the organism. During liver regeneration, reestablishment of liver volume after resection is regulated by the functional needs of the organism. Using the percentage of regenerated liver weight/body weight as a liver growth index can therefore reflect regenerative function. We agree with the suggestion on data presentation, and the values listed in the figures have been modified in the revised version.

      Reviewer #2 (Recommendations for The Authors):

      My concerns are as follows:

      (1) The authors note that the decrease in Mecp2 protein levels was more pronounced than the decrease in mRNA levels, suggesting the presence of post-translational regulation of Mecp2 during the early stages of G0 exit. Could the decrease in MeCP2 levels be related to autophagy flux?

      Thank you for your valuable comments. We compared cell extracts from untreated and chloroquine-treated cells (to block lysosomal degradation). Chloroquine did not cause any accumulation of MeCP2 (Figure S5B). These results suggest that autophagy is not involved in the decrease of MeCP2 protein.

      (2) In addition to Cyclin D1, how were other cell cycle-related proteins (cyclin A, cyclin B, and cyclin E) changed when MeCP2 was lost during cell cycle re-entry? Protein expression should be examined by western blot.

      We appreciate your valuable suggestions. The expression of the cell cycle-related proteins cyclin A2, cyclin B1 and cyclin E1 was evaluated by Western blotting. The expression of cyclin A2, cyclin B1 and cyclin E1 was enhanced by knockdown of MeCP2 (Figure 4B). Conversely, over-expression of MeCP2 repressed the expression of cyclin A2, cyclin B1 and cyclin E1 (Figure 4F).

      (3) By combining MeCP2 ChIP-seq and RNA-seq of genes regulated by MeCP2, the authors uncovered the dual role of Mecp2 in preventing quiescence exit by targeting Rara and Nr1h3. All they show are the Q-PCR results. The authors should show the protein level of Rara and Nr1h3 when MeCP2 was lost during cell cycle re-entry.

      Thank you for your advice. In Figure 7C, the knockdown efficiency of Rara and Nr1h3 was checked by Western blot analysis.

      (4) The authors performed lentiviral and AAV-mediated gene knockdown to target Rara and Nr1h3 in Cells and Mecp2-cKO livers, respectively. The Knockdown efficacy should be verified by western blots (Fig 7 C and F).

      In Figure 7F, the knockdown efficiency of Rara and Nr1h3 was verified by Western blot analysis.

      (5) The other major concern is regarding the lack of quantitative assessments of MeCP2 WB results (Fig 2, Fig 4, and Fig 7).

      Thank you for this suggestion. We added supplementary figures for Figure 2B, 2F and 2J to show the quantification of the MeCP2 protein membrane signal in liver regeneration, and Fig S4A and 4B show the quantification of the MeCP2 protein signal in the NIH3T3 cell cycle re-entry model.

      (6) In the Figure legends of Fig 4 B and Fig 4F, the authors should delete the statistical descriptions, as there are no statistical results. In Fig 5F, Fig 5J, Fig 6D, Fig 7D and Fig7H, there are no statistical results of p < 0.01, p < 0.05 or *p < 0.0001, respectively. The authors should check the description in the figure legends. In Fig S2C, the level of significance should be annotated.

      We would like to express our heartfelt thanks for your thorough reading of our manuscript. We have made corrections to make the manuscript clearer and more accurate. The level of significance has been annotated in Fig S2C.

      (7) In Fig S4A, there are no WB results of Cyclin D1 and pRb, the authors should check the description.

      Thank you for pointing this out. We have deleted the confusing statements in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their constructive criticism. Based on these suggestions, we have thoroughly reworked the manuscript. More specifically, but not limited to:

      (1) We have corrected the mistakes mentioned by the reviewers on a point-by-point basis.

      (2) We have provided additional experimental evidence to explain the rationale behind selecting five miRNAs for q-PCR validation. Furthermore, we have elaborated on the reasons for focusing primarily on research related to cartilage.

      (3) In response to concerns regarding overinterpretation in the manuscript, we have made more precise descriptions and revisions. Furthermore, we have added some details in our methods, including the addition of results showing the conservation of miR-199b-5p sequences between human and mouse species.

      (4) We have provided additional details on the experiments, including the process for predicting target genes, timing of chondrocyte culture and other experimental operations.

      (5) Finally, we have made additional revisions to the details of the figures to avoid any distortions and enhance the precision of the language.

      Below please find our responses to the reviewers’ comments on a point-by-point basis. You can also track the changes in the modified manuscript. We believe that the manuscript has been substantially improved by this revision.

      eLife assessment

      The manuscript provides interesting evidence that miR-199b-5p regulates osteoarthritis and as such it may be considered as a potential therapeutic target. This finding may be useful to further advance the field.

      Thank you for your positive comments.

      Although the study is considered potentially clinically relevant, the evidence provided was deemed insufficient and incomplete to support the conclusions drawn by the authors.

      Thank you for your critical comments and constructive advice. We have responded point by point to the reviewers’ questions and thoroughly reworked our manuscript. We hope the revised manuscript meets the criteria for publication in eLife.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors observed that miR-199b-5p is elevated in osteoarthritis (OA) patients. They also found that overexpression of miR-199b-5p induced OA-like pathological changes in normal mice and inhibiting miR-199b-5p alleviated symptoms in knee OA mice. They concluded that miR-199b-5p is not only a potential micro-target for knee OA but also provides a potential strategy for the future identification of new molecular drugs.

      Thanks for your comment.

      Strengths:

      The data are generated from both human patients and animal models.

      Thanks for the positive comment.

      Weaknesses:

      The data presented in this manuscript is not solid enough to support their conclusions. There are several questions that need to be addressed to improve the quality of this study.

      The following questions need to be addressed to improve the quality of the study.

      (1) Exosomes were characterized by electron microscopy and western blot analysis (for CD9, CD63, and CD81). However, Figure S1 only showed WB results for two samples, there are no positive or negative controls, and the WB figure is unclear.

      Thank you for your suggestion. We acknowledge that a comprehensive identification of extracellular vesicles should include both positive and negative samples. However, in some of the initial studies we referenced, positive and negative controls were not mentioned (refs. 1, 2). In our study, we identified extracellular vesicles using a combination of electron microscopy, nanoparticle tracking analysis, and detection of exosome markers. We agree that negative samples would make our results more convincing, and we will include a negative control group in our future experiments. Additionally, we have provided clearer images in the revised version (supplemental Fig. 1A).

      References

      (1) Ying W, Riopel M, Bandyopadhyay G, et al. Adipose Tissue Macrophage-Derived Exosomal miRNAs Can Modulate In Vivo and In Vitro Insulin Sensitivity. Cell. 2017;171(2).

      (2) Fang T, Lv H, Lv G, et al. Tumor-derived exosomal miR-1247-3p induces cancer-associated fibroblast activation to foster lung metastasis of liver cancer. Nature Communications. 2018;9(1):191.

      (2) The sequencing of miRNAs in serum exosomes showed that 88 miRNAs were upregulated and 89 miRNAs were downregulated in KOA patients compared with the control group based on fold change > 1.5 and p < 0.05. Figure 2 legend did not clearly elucidate what those represent and why the authors chose those five miRNAs to further validate although they did mention it with several words in line 108 'based on the p-value and exosomal'.

      In fact, in addition to the healthy control and knee osteoarthritis (KOA) patient groups, our study included two further groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks). After comparing these four groups, we found that 11 miRNAs (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 miRNAs (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 miRNAs are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 of them, namely the 5 miRNAs validated in this article. We have included this result in supplementary Fig. 5 (Fig. S5).

      Author response image 1.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.
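      The four-group selection described in this response can be read as a pair of set operations. The sketch below is illustrative only: the function name, argument names, and toy miRNA labels are hypothetical, and it assumes each group comparison has already been reduced to a set of miRNA names.

```python
def select_reversed_mirnas(up_in_koa, down_in_koa,
                           up_after_acupuncture, down_after_acupuncture,
                           changed_in_waiting):
    """Return miRNAs whose KOA-associated change is reversed by acupuncture.

    All arguments are sets of miRNA names (hypothetical inputs).
    A miRNA is kept only if it did not change in the waiting group.
    """
    # Up in KOA, down after acupuncture, unchanged while waiting.
    reversed_up = (up_in_koa & down_after_acupuncture) - changed_in_waiting
    # Down in KOA, up after acupuncture, unchanged while waiting.
    reversed_down = (down_in_koa & up_after_acupuncture) - changed_in_waiting
    return reversed_up, reversed_down


# Toy labels, not the study's data.
reversed_up, reversed_down = select_reversed_mirnas(
    up_in_koa={"miR-a", "miR-b", "miR-c"},
    down_in_koa={"miR-x", "miR-y"},
    up_after_acupuncture={"miR-y"},
    down_after_acupuncture={"miR-a", "miR-b"},
    changed_in_waiting={"miR-b"},
)
print(sorted(reversed_up), sorted(reversed_down))
```

      In this toy input, miR-b is excluded because it also changed in the waiting group, which mirrors the "no change in the waiting treatment group" criterion.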

      (3) In Figure 3 legend and methods, the authors did not mention how they performed the cell viability assay. What cell had been used? How long were they treated and all the details? Other figure legends have the same problem without detailed information.

      Thank you for your suggestions. In Figure 3, cell viability was determined using the CCK-8 assay. We used second-generation chondrocytes for this analysis. The chondrocytes were obtained from mice aged 3-5 days. The cartilage tissue was extracted and digested with collagenase, and the cells were then cultured in complete medium. A detailed description of the cell viability assay, the cell culture procedures, the specific timing, and the treatment of the cells can be found in our revised manuscript (pages 14-15, lines 304-313).

      In addition, we have thoroughly revised all figure legends to explain the relevant content more clearly.

      (4) The authors claimed that Gcnt2 and Fzd6 are two target genes of miR-199b-5p. However, there is no convincing evidence such as western blot to support their bioinformatics prediction.

      In the current study, we first identified six potential target genes by intersecting the predicted targets obtained from six bioinformatics websites. Subsequently, q-PCR was used to test all six genes, revealing two genes with significant changes, namely Fzd6 and Gcnt2. We then predicted the binding sites of these genes and validated them through luciferase assays. Moreover, we examined the expression of these two potential targets in human KOA samples using a human database and found them to be expressed specifically in the samples. These results suggest that Fzd6 and Gcnt2 are potential target genes for KOA. However, we did not perform western blot assays to verify these results. Based on your suggestions, we have further discussed this limitation of our study and proposed future research strategies.

      (5) To verify the binding site on 3'UTR of two potential targets, the authors designed a mouse sequence for luciferase assay, but not sure if it is the same when using a human sequence.

      Thank you for your great advice. We carried out a comparative analysis of sequence conservation between human and mouse and found that the binding site on the 3'UTR matches the human sequence very well. The sequence conservation between hsa_miR-199b-5p and mmu_miR-199b-5p was as high as 95.65%. We have added the methods and results to the revised manuscript (page 9, lines 181-184; page 17, lines 361-365; supplemental Fig. 6).

      In detail: First, the sequence information of mmu_miRNA-199b-5p was used to locate the human homologous sequence in the UCSC database. The homologous sequence was found to be located in the human genome at chr9:128244721-128244830 (supplemental Fig. 6A). Based on this positional information and the source gene, a further comparison was conducted in miRBase to identify the nearest miRNA at that position in the human genome. We found that hsa_miR-199b-5p is positionally conserved and located at chr9:128244721-128244830 (supplemental Fig. 6B). The sequence of hsa_miR-199b-5p was obtained from the miRBase database (supplemental Fig. 6C), and a comparative analysis was performed between the human and mouse sequences (supplemental Fig. 6D). Besides being positionally conserved, hsa_miR-199b-5p and mmu_miR-199b-5p showed sequence conservation as high as 95.65%, indicating good sequence conservation.

      Author response image 2.

      (A) Using the sequence information of mmu_miRNA-199b-5p, we located the position of its human homologous sequence in the UCSC database. (B) Based on the positional information and the source gene, we further aligned this position with the closest miRNA in miRBase. (C) We compared the sequences of hsa_miR-199b-5p and mmu_miR-199b-5p. (D) Conservation analysis was performed to compare the sequence conservation of miR-199b-5p.
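      One possible reading of the 95.65% figure, offered here as an assumption rather than something stated in the response, is 22 identical positions out of the 23 nucleotides of the mature miRNA (22/23 ≈ 0.9565). The sketch below computes ungapped percent identity for two equal-length sequences. The human sequence is the one quoted later in this response; the second sequence is a hypothetical single-substitution variant used only to exercise the function and is not the actual mmu-miR-199b-5p sequence.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# hsa-miR-199b-5p (MIMAT0000263), as quoted in this response.
HSA_MIR_199B_5P = "CCCAGUGUUUAGACUAUCUGUUC"

# Hypothetical variant with one substitution at the last position.
# This is NOT the real mouse sequence; it only illustrates the arithmetic.
HYPOTHETICAL_VARIANT = HSA_MIR_199B_5P[:-1] + "A"

identity = percent_identity(HSA_MIR_199B_5P, HYPOTHETICAL_VARIANT)
print(f"{identity:.2f}%")
```

      A real cross-species comparison would normally use an alignment tool that handles gaps; the ungapped comparison here is a deliberate simplification.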

      Reviewer #2 (Public Review):

      Summary:

      The authors identified miR-199b-5p as a potential OA target gene using serum exosomal small RNA-seq from human healthy and OA patients. Their RNA-seq results were further compared with publicly available datasets to validate their finding of miR-199b-5p. In vitro chondrocyte culture with miR-199b-5p mimic/inhibitor and in vivo animal models were used to evaluate the function of miR-199b-5p in OA. The possible genes that were potentially regulated by miR-199b-5p were also predicted (i.e., Fzd6 and Gcnt2) and then validated by using Luciferase assays.

      We greatly appreciate Reviewer #2's constructive comments.

      Strengths:

      (1) Strong in vivo animal models including pain tests.

      (2) Validates the binding of miR-199b-5p with Fzd6 and binding of miR-199b-5p with Gcnt2.

      Thanks for the positive comments.

      Weaknesses:

      (1) The authors may overinterpret their results. The current work shows the possible bindings between miR-199b-5p and Fzd6 as well as bindings between miR-199b-5p and Gcnt2. However, whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p.

      In this study, we employed a comprehensive approach by integrating data from six bioinformatics databases to identify potential target genes for miR-199b-5p. Subsequent qPCR analysis revealed significant changes in two genes, Fzd6 and Gcnt2. We then utilized luciferase assays to validate the predicted binding sites and confirmed the interaction between miR-199b-5p and these genes. Additionally, we examined the expression profiles of these potential target genes in human KOA samples using a human database, which unveiled distinct expression patterns.

      While our findings suggest that Fzd6 and Gcnt2 may serve as potential target genes for miR-199b-5p, we acknowledge the necessity for further experimental validation and in-depth functional characterization. Building upon your insightful recommendations, we have thoroughly addressed the research limitations and proposed potential research strategies for future investigations in our discussion. (page11,line227-231)

      (2) In vitro chondrocyte experiments were conducted in a 2D manner, which led to chondrocyte de-differentiation and thus may not represent the chondrocyte response to the treatments.

      We acknowledge that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example in the study by Liu et al. (ref. 3). In addition, the second-generation primary mouse chondrocytes we used in the current study did not exhibit a markedly dedifferentiated morphology. Therefore, considering the experimental conditions in our lab, we used second-generation cultured primary mouse chondrocytes throughout the cell experiments. To show the reliability of the cells, we have provided more images in supplementary Fig. 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript (page 11, lines 237-240).

      Author response image 3.

      Primary mouse chondrocytes that we cultured (P1) and the second-generation cells (P2) that we used in the subsequent experiments.

      Reference that used 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (3) There is a lack of description for bioinformatic analysis.

      We apologize for this oversight. We have added the relevant descriptions and details (page 14, lines 299-303).

      (4) There are several errors in figure labeling.

      We have revised the figure labels (Fig. 3, Fig. 4, Fig. 5 and Fig. 7).

      Recommendations for the authors:

      We appreciate the reviewers' feedback, which we believe has significantly contributed to the refinement of our manuscript. We agree that the reviewers' suggestions are valuable and appropriate, and we are confident that our revisions have strengthened the quality and impact of our study.

      Reviewer #2 (Recommendations For The Authors):

      I would like to thank the authors for investigating the functional role of miR-199b-5p in knee OA. While this study has the potential to provide valuable knowledge to the fields of miRNAs and joint diseases, significant improvements in several areas are required.

      We appreciate your constructive comments, and we have made a substantial improvement to the manuscript. We thank all the reviewers for their advice as well as their criticisms.

      Major concerns:

      (1) According to the Authors, miR-199b-5p is identified by the results from their own miRNA-sequencing as well as comparison with other publicly available datasets (both synovium and cartilage datasets). It is unclear to me why the synovium dataset was used here as it appears that the entire manuscript was mainly focused on chondrocytes.

      Thank you for your question. Cartilage degradation is the initial pathological change in knee osteoarthritis (KOA), and it subsequently leads to other pathological changes such as synovial inflammation (ref. 4). These factors are interrelated, and current research on KOA encompasses cartilage, synovium, and systemic inflammation, among others. Therefore, when we identified a large number of dysregulated miRNAs in extracellular vesicles isolated from serum, it was crucial to determine whether these dysregulated miRNAs were also altered in cartilage or synovium. To address this, we compared our findings with publicly available databases and found a higher overlap with the cartilage cell dataset, including miRNA-199b. Consequently, we decided to focus our subsequent investigations on cartilage.

      Reference

      (4) Hunter D, Bierma-Zeinstra S. Osteoarthritis. Lancet (London, England). 2019;393(10182):1745-1759.

      (2) Also, 169 of 177 differentially expressed exosome miRNAs were intersected with differentially expressed miRNAs from OA cartilage datasets. It is surprising that, of the 5 miRNAs selected for further qRT-PCR validation, 3 were not in the exosome miRNA dataset (i.e., hsa-mir-1296-5p, hsa-mir-15b-3p, and hsa-mir-338-3p; page 5, line 109 and Fig. 1B). Wouldn't it be more essential to validate miRNAs that are differentially expressed in both the exosome and cartilage datasets? Furthermore, in the Authors' exosome miRNA dataset, only 5 out of 15 KOA patients actually exhibited up-regulated miR-199b-5p vs. healthy controls. Please elaborate on how the target was determined.

      In fact, in addition to the healthy control and knee osteoarthritis (KOA) patient groups, our study included two further groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks). After comparing these four groups, we found that 11 miRNAs (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 miRNAs (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 miRNAs are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 of them, namely the 5 miRNAs validated in this article. We have included this result in supplementary Fig. 5 (Fig. S5).

      Author response image 4.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.

      (3) There is also a lack of description for bioinformatic analysis regarding how miRNA sequencing datasets were analyzed. What R/python packages or algorithms were used? What were the QC criteria?

      We apologize for any confusion caused. We have now included a clear description of the method employed, and R was utilized for this data analysis (revised in Page14, Line301-305). To ensure consistency, we compared our findings with publicly available human serum data from the database (GSE105027) using a fold change threshold of > 1.5 and a significance level of p < 0.05. In the cartilage data (GSE175961), we observed a list of miRNAs with shared expression patterns, yet the precise differential values could not be determined.
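      The thresholds quoted in this response (fold change > 1.5, p < 0.05) can be illustrated with a short filter. This is a sketch under stated assumptions, not the authors' R pipeline: the record type and function name are hypothetical, fold change is taken to be on a linear scale (KOA over control), and downregulation is assumed to be the symmetric case, fold change < 1/1.5.

```python
from dataclasses import dataclass


@dataclass
class MirnaStat:
    """Hypothetical per-miRNA summary of a differential expression test."""
    name: str
    fold_change: float  # linear scale, KOA / control (assumption)
    p_value: float


def classify(stats, fc_cutoff=1.5, p_cutoff=0.05):
    """Split miRNAs into up- and downregulated lists using both cutoffs."""
    up, down = [], []
    for s in stats:
        if s.p_value >= p_cutoff:
            continue  # not significant at the chosen level
        if s.fold_change > fc_cutoff:
            up.append(s.name)
        elif s.fold_change < 1.0 / fc_cutoff:
            down.append(s.name)  # symmetric cutoff, an assumption
    return up, down


# Toy values, not the study's data.
toy = [
    MirnaStat("miR-a", 2.0, 0.01),   # up
    MirnaStat("miR-b", 0.5, 0.03),   # down
    MirnaStat("miR-c", 3.0, 0.20),   # large change but not significant
    MirnaStat("miR-d", 1.2, 0.001),  # significant but small change
]
up, down = classify(toy)
print(up, down)
```

      Whether p-values were adjusted for multiple testing is not stated in the response, so the sketch applies the cutoff to whatever p-value is supplied.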

      (4) Another major concern is the chondrocyte culture method. Chondrocytes should be cultured in a 3D manner (i.e., a 3D pellet culture system or a micro mass culture method). 2D cultured chondrocytes tend to de-differentiate into MSC-like cells and thus lose their chondrocyte phenotype. This is evident from Fig. 3B and C. Cells started to spread out and only a few cells were positive for COL2A1 with a deep brown staining color. Thus, the results from the in vitro studies may not be representative of chondrocyte response to the treatments.

      We acknowledge that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example in the study by Liu et al. (ref. 3). In addition, the second-generation primary mouse chondrocytes we used in the current study did not exhibit a markedly dedifferentiated morphology. Therefore, considering the experimental conditions in our lab, we used second-generation cultured primary mouse chondrocytes throughout the cell experiments. To show the reliability of the cells, we have provided more images in supplementary Fig. 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript (page 11, lines 237-240).

      Author response image 5.

      Primary mouse chondrocytes that we cultured (P1) and the second-generation cells (P2) that we used in the subsequent experiments.

      Reference that used 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (5) Page 7, lines 148-149: "The cartilage of mice injected with the miR-199b-5p mimic was slightly degraded (p=0.02) (Fig. 4E, F)". However, there was no significance between the groups found in Fig. 4F. Also, from the histological images of Fig. 4E, it looks like mice with inhibitor injection had more cartilage damage than miR-199b-5p mimic.

      We apologize for any confusion caused. Figures 4E and 4F show Safranin Fast Green staining of the joint after the administration of miR-199b-5p inhibitor and mimic under physiological conditions. As you can see, there is minimal difference between these four images, and the difference is not statistically significant. However, in Figures 5E and 5F, the MIA-induced KOA model was used, and noticeable differences can be observed after the administration of the inhibitor and mimic. In the revised version, we have emphasized that Figures 4E and 4F represent the results under physiological conditions, not under the MIA-induced model (page 7, lines 146-151).

      (6) Page 7, lines 149-150: "Additionally, the articular surface showed insect erosion (Fig. 4G)." It is also unclear how micro-CT analysis will be able to demonstrate the erosion of cartilage. Or the authors actually indicate the trochlear groove. However, this could also be observed in the control group and the results were not quantified. It is also unclear if the cross-section images of micro-CT shown here are helpful at all without any further explanation in the manuscript.

      Figure 4G shows the control, vehicle control, inhibitor, and mimic groups, while Figure 5G shows the model, model+vehicle control, model+inhibitor, and model+mimic groups. In Figure 4G, the mimic group showed the most obvious erosion, while the inhibitor group did not exhibit this phenomenon (ref. 5). In Figure 5G, the model group and the model+mimic group exhibited the most pronounced erosion, while the model+inhibitor group showed the best recovery. To highlight the pathological changes, we have marked the typical locations of erosion with red arrows in the images for easier comparison (Fig. 4G; Fig. 5G). We have also made corresponding changes to the text of the manuscript (page 7, lines 150-151; page 8, lines 160-161). In addition, the 3D reconstruction of micro-CT is based on the synthesis of these cross-sectional images.

      References

      (5) Tao Y, Wang Z, Wang L, et al. Downregulation of miR-106b attenuates inflammatory responses and joint damage in collagen-induced arthritis. Rheumatology (Oxford, England). 2017;56(10):1804-1813.

      (7) Page 17, line 309-310: "Before model establishment and at 3, 7, 10, 14, 21, and 28 days after model establishment." Please re-write this as this is not clear regarding the experimental procedure.

      Thank you. We have rewritten the sentence as follows: Baseline testing of behavioral pain thresholds was conducted prior to model establishment, followed by behavioral pain threshold testing on days 3, 7, 10, 14, 21, and 28 after model establishment. (page 15, lines 322-324)

      (8) Fig. 5A. The M + inhibitor and Model images are not at the same plane as M + mimic and M + RNAnc images.

      Thank you. We have modified the figure.

      (9) Fig. 5B. There are two lines both with circle markers (Control and M+inhibitor). Please correct.

      We have corrected this.

      (10) Fig. 5F. Missing * sign.

      We have added the * sign.

      (11) Please elaborate on how the potential binding sites between miR-199b-5p and Gcnt2 and between miR-199b-5p and Fzd6 were predicted.

      We apologize for any lack of clarity in the original text. In fact, we utilized targets to predict potential binding sites. Specifically, for the mouse species, we predicted that the 3'UTR of Fzd6 binds with miR-199b-5p at positions 2483-2490, 3244-3251, 3303-3309, and 3854-3860, while the 3'UTR of Gcnt2 binds with miR-199b-5p at positions 2755-2762 and 4144-4151. In the revised version, we provide a detailed description of the methodology used for predicting these sites and a fuller explanation of the results (page 16, line 352).

      Additionally, to demonstrate consistency with human binding sites, we not only predicted the binding sites of human miR-199b-5p with these two target genes but also found a high conservation of up to 95.65% between the human and mouse sequences of miR-199b-5p. We have included this information in the supplementary materials (Fig. S6). In Fig. 6E-F, we present the potential binding sites between miR-199b-5p and Gcnt2, as well as between miR-199b-5p and Fzd6. In addition, we provide the predicted binding for the human sequences to illustrate the binding sites. The predicted binding of human miR-199b-5p with FZD6 and GCNT2 showed a high degree of consistency. (The fluorescent labeling in the following text indicates the potential predicted binding sites.) (Supplement file 8)

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 8323 GenBank Accession NM_001164615

      Gene Symbol FZD6 3' UTR Length 1368

      Gene Description frizzled class receptor 6

      3' UTR Sequence: agaacattttctctcgttactcagaagcaaatttgtgttacactggaagtgacctatgcactgttttgtaagaatcactgttacattcttcttttgcacttaaagttgcattgcctactgttatactggaaaaaatagagttcaagaataatatgactcatttcacacaaaggttaatgacaacaatatacctgaaaacagaaatgtgcaggttaataatatttttttaatagtgtgggaggacagagttagaggaatcttccttttctatttatgaagattctactcttggtaagagtattttaagatgtactatgctattttacttttttgatataaaatcaagatatttctttgctgaagtatttaaatcttatccttgtatctttttatacatatttgaaaataagcttatatgtatttgaacttttttgaaatcctattcaagtatttttatcatgctattgtgatattttagcactttggtagcttttacactgaatttctaagaaaattgtaaaatagtcttcttttatactgtaaaaaaagatataccaaaaagtcttataataggaatttaactttaaaaacccacttattgataccttaccatctaaaatgtgtgatttttatagtctcgttttaggaatttcacagatctaaattatgtaactgaaataaggtgcttactcaaagagtgtccactattgattgtattatgctgctcactgatccttctgcatatttaaaataaaatgtcctaaagggttagtagacaaaatgttagtcttttgtatattaggccaagtgcaattgacttcccttttttaatgtttcatgaccacccattgattgtattataaccacttacagttgcttatattttttgttttaacttttgttttttaacatttagaatattacattttgtattatacagtacctttctcagacattttgtagaattcatttcggcagctcactaggattttgctgaacattaaaaagtgtgatagcgatattagtgccaatcaaatggaaaaaaggtagttttaataaacaagacacaacgtttttatacaacatactttaaaatattaaggagttttcttaattttgtttcctattaagtattattctttgggcaagattttctgatgcttttgattttctctcaatttagcatttgcttttggtttttttctctatttagcattctgttaaggcacaaaaactatgtactgtatgggaaatgttgtaaatattaccttttccacattttaaacagacaactttgaatacaaaaactttgttttgtgtgatcttttcattaataaaattatctttgtataagaaaaaaaaaaaaaa

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 2651 GenBank Accession NM_001491

      Gene Symbol GCNT2 3' UTR Length 2780

      Gene Description glucosaminyl (N-acetyl) transferase 2 (I blood group)

      3' UTR Sequence: gctattcatgagctactcatgactgaagggaaactgcagctgggaagaggagcctgtttttgtgagagacttttgccttcgtaatgttaaccgtttcaggaccacgtttatagcttcaggacctggctacgtaattatacttaaaatatccactggacactgtgaaatacactaacaggatggctgggtagagcaatctgggcactttggccaattttagtcttgctgtttcttgatgctcacctctatattagtttattgttaggatcaatgataaatttaaatgacctcagatctttgcaccagatactcatcatatacaaatgttttagtaaaaaagagaattgtagataatactgtctaggaaaataagaattaggtttctttgaagaaggaatcttttataacaccttaacagtcaccactgtgctcaaccagacagatagtgaaacagctttctgggtaattcaccaatttcctttaaaacataagctacctgaatggagaatacatcttgtttctgagtttcaacactagcatttttggcttactcatggacaaagttctgtatatagtataaagtcattaacaagaaacaggatatgctttaagacagaattcactgtctgttgcttcagtaaaaggacctcggggaataaaacatttctctcttatatgccagaatgtaggctggtccctatgtcatgtcttccattaagaacactaaaaagtccttgcaagaatggagatatgcattcaagagaggtgctatcacatagatctagtctgaagtctggaacactttcctcttctatgacccctctctccccagtattatcttacttgcaaaatggagaccaaattctatcctgtgaggcttttaattgcaccatagtatgctctgagtagctttacactgcctggtactgatagtagtggctcgatttttaagagccttcaattgtagatgaacatctctgttatttatccctcattcatccatccgttcattcattcagccttcaatcaacatctcttgagtgtctattatgtacaggacatgtactgagacaaaaaggaaacataagagctttttcactctaaaaatcttggcaataatgtcaacaccagaaagcctcctctggagaatcttacagagtgattgtagtttaatacaggaacacacagggctgtgtagcatgataccaggcccaggagatcagtaattacaaattaagggttaaatcagagattattcaacagagagggagaaaggaggagacagagggaggacctgttgtgttccagccattctggtattcctttatgtatctaatttcattcaaacctcacaacagtcttgtgaggcccttatataattactcccattttgcagatgaagtaactgaggcttagaaaggttaatagcaccggggaacaatttctctgggtgagaattgggactctgttgctggtcttctcagttcatttcctgaggtggatttactgagagaaggtgaaataaagccatatttagtataccagagaaggtagattttaagaatggtctcagtgttaatactgagaaaaagtcctgtcagttcagaaaaaatgtgaagtctactttagtattcctgtaatactaaaccgttgagtttctaaatatttatttattctaacaaaaagcaattactacaaatggatgacacatttaatgaacacaattttattttttttctgtaactgtgcttgttgaatgtcaatcatatttaaagggaatgactttgaagtaaaaccttttttcttgctactgaaaaaaatggagttgttttgggtggtaaagtgttaaggaatagggacagctggtcacacaaggaactcttgaaggccacatgtgaaaacctgtcacttgcacagaggccagtcccactaaggtgaccagagtgggctccaagcacaaactgccattggctatag
atgggactgtgtccccccaaaattcatgtgttggagccttaaccctcaatgtgatggtatttgagatggggcctttggtaagggaagtttagatgaggtcacgagggtaggaccctcatgatgggatgagtccccttacaagacctctggcttgggccgggcgtggtggctcacacctgtaatcccaacactttgggaggccaaggcaggtagatcacttgatgccaggagttccagaccaggctggccgacatggtgaaaccccatctctactaaaaaatataaaaattagccgggctttgtggcatgtgcctgtaatcccagctatttggcaggctgaggcatgagaatcgcttgaacccaggaggtggaggttacagtgagctgagagtgccccactgcactccagcctgggtgacagagcgagactttgtcccaaaacaaaataggtgaggggatagcgaatgcactcagggtcagcagtggagtttaaaaattgtctcttttcaacttatttaaatgacagcacctgagaagaggaaccgttttacactggatgtttctcatgtagaacaagaaatctttctggaattgatgtttacatgtctgttgttggtcatctctcctgtgtcttaaatactttaatgttggaagagcatagtgtttgggctagtgggtttctgacagcccatgggaatgccctgaaactactgtatctgatgtttgttttcgatgaggttccatgttttgttttcttgggaataaattaatatattgttttccaaaaaaaaaaaaaaaaaaaa
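      The response does not describe the algorithm behind the prediction, so the following is only a generic illustration of one common criterion: a perfect match in the 3' UTR to the reverse complement of the miRNA seed (nucleotides 2-8). The function names are hypothetical, and the short UTR fragment is copied from the FZD6 3' UTR sequence listed above.

```python
def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the DNA-alphabet site complementary to the miRNA seed.

    `start` and `end` are 1-based, inclusive positions in the miRNA.
    """
    seed = mirna.upper()[start - 1:end]
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    reverse_complement = "".join(complement[base] for base in reversed(seed))
    return reverse_complement.replace("U", "T")


def find_seed_matches(utr: str, mirna: str) -> list:
    """Return 1-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    positions = []
    index = utr.find(site)
    while index != -1:
        positions.append(index + 1)
        index = utr.find(site, index + 1)
    return positions


# hsa-miR-199b-5p (MIMAT0000263), as quoted in this response.
MIRNA = "CCCAGUGUUUAGACUAUCUGUUC"

# Short fragment taken from the FZD6 3' UTR listed above.
fragment = "ttacactggaag"
print(seed_site(MIRNA), find_seed_matches(fragment, MIRNA))
```

      Dedicated prediction tools also weigh site context and conservation, so a bare seed scan like this will generally return more candidate sites than such tools report.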

      (12) Page 10-11, Line 222-223: "Our findings indicate that miR-199b-5p plays a crucial role in KOA by targeting Fzd6 and Gcnt2". This is an overstatement. The current work shows the possible bindings of miR-199b-5p and Fzd6 as well as bindings of miR-199b-5p and Gcnt2. Whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p. Thus, please tone down this statement and the title of the manuscript.

      We agree with your assessment of our conclusion. We have therefore deleted the overstated sentences and toned down the conclusion of the manuscript (the title; page 8, line 179; page 11, lines 227-228).

      (13) The Schematic figure (the last figure). Please remove osteophyte as this was not quantified in the study.

      We modified the schematic figure accordingly.

      Minor concerns:

      (1) Most figures were distorted.

      We have provided new versions of the figures to avoid distortion.

      (2) Providing GO term numbers in Fig. 1C is not very helpful. Maybe show the GO term and corresponding numbers in the manuscript (Page 4, lines 79 - 82).

      Thank you for your advice. We have added notes to the manuscript that explain the biological concept corresponding to each GO term number (page 4, lines 77-89; page 22, lines 515-532).

      (3) What were M-0.5 and M-1 in Fig. 2D? Different MIA concentrations?

      Yes, these are different MIA concentrations, which we illustrate in the legend. (Page 23, line 535-536)

      (4) Please follow the nomenclature of the gene symbol. For example, Fig. 3E-P should be mouse genes (?).

      We have modified the relevant gene symbols.

      (5) Page 3, line 59. Not all chondrocytes are pathogenic cells in OA.

      We apologize for the mistake; it has now been corrected. (Page 3, line 59)

      (6) Typo. Page 3, line 55.

      We have corrected the typo.

      (7) Page 4, line 78. These are differentially expressed miRNAs, not genes.

      We have revised this wording. (Page 4, lines 75-76)

      I wish the authors all the best with their continued work in this area.

      Thank you for your wishes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy. Leveraging new strains with sagA deletion/complementation constructs, the investigators reveal that sagA is non-essential, with sagA deletion leading to a marked growth defect due to impaired cell division, and sagA being necessary for the immunogenic and anti-tumor effects of E. faecium. In aggregate, the study utilizes compelling methods to provide both fundamental new insights into E. faecium biology and host interactions and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      We thank the Reviewers for their positive feedback on our manuscript. We also appreciate their helpful comments/critiques and have revised the manuscript as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Klupt, Fam, Zhang, Hang, and colleagues present a novel study examining the function of sagA in E. faecium, including impacts on growth, peptidoglycan cleavage, cell separation, antibiotic sensitivity, NOD2 activation, and modulation of cancer immunotherapy. This manuscript represents a substantial advance over their prior work, where they found that sagA-expressing strains (including naturally-expressing strains and versions of non-expressing strains forced to overexpress sagA) were superior in activating NOD2 and improving cancer immunotherapy. Prior to the current study, an examination of sagA mutant E. faecium was not possible and sagA was thought to be an essential gene.

      The study is overall very carefully performed with appropriate controls and experimental checks, including confirmation of similar densities of ΔsagA throughout. Results are overall interpreted cautiously and appropriately.

      I have only two comments that I think addressing would strengthen what is already an excellent manuscript.

      In the experiments depicted in Figure 3, the authors should clarify the quantification of peptidoglycans from cellular material vs supernatants. It should also be clarified whether sagA needs to be expressed endogenously within E. faecium, and whether ambient endopeptidases (perhaps expressed by other nearby bacteria or recombinant enzymes added) can enzymatically work on ΔsagA cell wall products to produce NOD2 ligands.

      We mentioned in the main text that peptidoglycan was isolated from bacterial sacculi and digested with mutanolysin for LC-MS analysis. We have now also included “mutanolysin-digested” sacculi in the Figure 3 legend.

      We have added the following text “We next evaluated live bacterial cultures with mammalian cells to determine their ability to activate the peptidoglycan pattern recognition receptor NOD2” and “our analysis of these bacterial strains” to indicate live cultures were evaluated for NOD2 activation.

      We have also added the following text “Our results also demonstrated that while many enzymes are required for the biosynthesis and remodeling of peptidoglycan in E. faecium, SagA is essential for generating NOD2 activating muropeptides ex vivo.”

      In the murine experiments depicted in Figure 4, because the bacterial intervention is being performed continuously in the drinking water, the investigators have not distinguished between colonization vs continuous oral dosing of the mice with peptidoglycans. While I do not think additional experimentation is required to distinguish the individual contributions of these 2 components in their therapeutic intervention, I do think the interpretation of their results should include this perspective.

      We have added the following text “We note that by continuous oral administration in the drinking water, live E. faecium and soluble muropeptides that are released into the media during bacterial growth may both contribute to NOD2 activation in vivo.” and revised the following text “Nonetheless, these results demonstrate SagA is not essential for E. faecium colonization, but required for promoting the ICI antitumor activity through NOD2 in vivo.”

      Reviewer #2 (Public Review):

      Summary:

      The gut microbiome contributes to variation in the efficacy of immune checkpoint blockade in cancer therapy; however, the mechanisms responsible remain unclear. Klupt et al. build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy, leveraging novel strains with sagA deleted and complemented. They find that sagA is non-essential, but sagA deletion leads to a marked growth defect due to impaired cell division. Furthermore, sagA is necessary for the immunogenic and anti-tumor effects of E. faecium. Together, this study utilizes compelling methods to provide fundamental new insights into E. faecium biology and host interactions, and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      Strengths:

      Klupt et al. provide a well-written manuscript with clear and compelling main and supplemental figures. The methods used are state-of-the-art, including various imaging modalities, bacterial genetics, mass spectrometry, sequencing, flow cytometry, and mouse models of immunotherapy response. Overall, the data supports the conclusions, which are a valuable addition to the literature.

      Weaknesses:

      Only minor revision recommendations were noted.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General comments - the number/type of replicates and statistics are missing from some of the figure panels. Please be sure to add these throughout - all main figure panels should have replicates. I've also noted some specific cases below.

      Abstract - sagA is non-essential, need to edit text at "essential functions".

      This change has been made.

      "small number of mutations" - specify how many in the text.

      We revised the text. “Small number” is changed to “11”.

      "under control of its native promoter" - what was the plasmid copy number? It looks clearly overexpressed in Figure 1d despite using a native promoter, although it's a bit hard to know for sure without a loading control.

      pAM401 has a p15A origin of replication; therefore, the plasmid copy number is ~20-30 copies (Lutz R. et al. Nucleic Acids Res. 1997). Total protein was visualized by Stain-Free™ imaging technology (BioRad), serves as the protein loading control, and has been relabeled accordingly.

      "decrease levels of small muropeptides" - the asterisks are missing from Figure 3a.

      Green asterisks for peaks 2, 3, 7 and purple asterisks for peaks 13, 14 were added.

      The use of "Com 15 WT" in the figures is confusing - just replace it with "wt" and specify the strain in the text. Presumably, all of the strains are on the Com 15 background.

      “Com15 WT” was replaced with “WT” in the figures and main text.

      Change 1d to 1b so that the panels are in order (reading left to right and then top to bottom).

      Figure 1 legend is missing a number of replicates and statistics for 1a.

      The number of replicates was added.

      Figure 1b - it's unclear to me what to look at here, could add arrows indicating the feature of interest and expand the relevant text.

      Arrows pointing to cell clusters were added.

      Figure 1d - what is "stain free"? It would be preferable to show a loading control using an antibody against a constitutive protein to allow for normalization of the loading control.

      Stain-Free Imaging technology (BioRad) uses a trihalo compound contained in the gel to make proteins fluorescent directly in the gel after a short photoactivation, allowing immediate visualization of proteins at any point during electrophoresis and western blotting. Stain-Free total protein measurement serves as a reliable loading control comparable to Coomassie Blue staining. This has been relabeled as “Total protein” in the Figure, and Stain-Free imaging technology is noted in the legend.

      ED Figure 1 - representative of how many biological replicates?

      Legends are updated.

      ED Figure 2a - I would replace this with a table, it's not necessary to show the strip images. Also, please specify the number of replicates per group.

      Additional Extended Data Table 2 was added.

      ED Figure 2b - This data was not that convincing since the sagA KO has a marked growth defect and the time points are cut off too soon to know if growth would occur later. The MIC definition is potentially misleading. Should specify a % growth cutoff (i.e. <10% of vehicle control) and the metric used (carrying capacity or AUC). Then assign MIC to the tested concentration, not a range. The empty vector also seems to impact MIC, which is concerning and complicates the interpretation. Specify the number of replicates and add statistics. Given these various concerns, I might suggest removing this figure, as it doesn't really add much to the story.

      We appreciate this comment from the Reviewer, but believe this data is helpful for the paper and have included longer time points for the growth data. The definition of MIC for ED Fig. 2b has been included in the legend.
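
      The reviewer's suggested MIC definition (a fixed percent-growth cutoff relative to the vehicle control, assigned to a single tested concentration) can be sketched as follows. This is a minimal illustration, not the definition given in the ED Fig. 2b legend: the function name, the 10% default cutoff, and the example values are hypothetical, and the growth metric is assumed to be one number per concentration (for instance carrying capacity or AUC).

```python
def mic_from_growth(growth_by_conc, vehicle_growth, cutoff=0.10):
    """Return the lowest tested concentration whose growth is below
    cutoff * vehicle_growth, or None if no tested concentration qualifies.

    growth_by_conc: dict mapping concentration -> growth metric
                    (e.g. carrying capacity or AUC; assumed, not from the paper)
    vehicle_growth: the same metric measured for the vehicle control
    """
    if vehicle_growth <= 0:
        raise ValueError("vehicle_growth must be positive")
    inhibitory = [
        conc
        for conc, growth in growth_by_conc.items()
        if growth / vehicle_growth < cutoff
    ]
    return min(inhibitory) if inhibitory else None


# Hypothetical readings: concentration (µg/mL) -> final OD600
example = {0.5: 0.82, 1.0: 0.40, 2.0: 0.05, 4.0: 0.03}
print(mic_from_growth(example, vehicle_growth=0.90))
```

      Reporting one tested concentration rather than a range keeps the value comparable across strains with different growth rates, provided the same metric and cutoff are applied to every strain.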

      Figure 2 - specify the type of replicate. Number of cells? Number of slices? Number of independent cultures?

      For Cryo-ET experiments, single bacterial cultures were prepared. The numbers of cells and slices used for analysis are indicated in the legend. Legends are updated.

      Figure 4e - missing the water group, was it measured?

      Water (αPD-L1) group was not included in immune profiling of tumor infiltrating lymphocytes (TILs) experiment, as we have previously demonstrated limited impact on ICI anti-tumor activity and T cell activation in this setting (Griffin M et al Science 2021).

      Figure 4d - is this media specific to your strains? If not, qPCR may be a better method using strain-specific primers.

      Yes, HiCrome™ Enterococcus faecium agar plates (HIMEDIA 1580) are selective for Enterococcus species; moreover, the agar is chromogenic, allowing E. faecium to be identified as yellow colonies among other Enterococcus species.

    1. Author response:

      We are planning to extend our results of the Jurkat model system to primary T cells, as requested by the referees and eLife’s Senior Editor. This will involve the inclusion of new figures, including super-resolution/STED images to reinforce our results and to satisfy the referees’ points. In addition, we will improve and/or replace all the mentioned images to solve the raised caveats, including further quantification and analyses.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript entitled 'The domesticated transposon protein L1TD1 associates with its ancestor L1 ORF1p to promote LINE-1 retrotransposition', Kavaklıoğlu and colleagues delve into the role of L1TD1, an RNA binding protein (RBP) derived from a LINE1 transposon. L1TD1 proves crucial for maintaining pluripotency in embryonic stem cells and is linked to cancer progression in germ cell tumors, yet its precise molecular function remains elusive. Here, the authors uncover an intriguing interaction between L1TD1 and its ancestral LINE-1 retrotransposon.

      The authors delete the DNA methyltransferase DNMT1 in a haploid human cell line (HAP1), inducing widespread DNA hypo-methylation. This hypomethylation prompts abnormal expression of L1TD1. To scrutinize L1TD1's function in a DNMT1 knock-out setting, the authors create DNMT1/L1TD1 double knock-out cell lines (DKO). Curiously, while the loss of global DNA methylation doesn't impede proliferation, additional depletion of L1TD1 leads to DNA damage and apoptosis.

      To unravel the molecular mechanism underpinning L1TD1's protective role in the absence of DNA methylation, the authors dissect L1TD1 complexes in terms of protein and RNA composition. They unveil an association with the LINE-1 transposon protein L1-ORF1 and LINE-1 transcripts, among others.

      Surprisingly, the authors note fewer LINE-1 retro-transposition events in DKO cells than in DNMT1 KO alone.

      Strengths:

      The authors present compelling data suggesting the interplay of a transposon-derived human RNA binding protein with its ancestral transposable element. Their findings spur interesting questions for cancer types, where LINE1 and L1TD1 are aberrantly expressed.

      Weaknesses:

      Suggestions for refinement:

      The initial experiment, inducing global hypo-methylation by eliminating DNMT1 in HAP1 cells, is intriguing and warrants a more detailed description. How many genes experience misregulation or aberrant expression? What phenotypic changes occur in these cells?

      The transcriptome analysis of DNMT1 KO cells showed hundreds of deregulated genes upon DNMT1 ablation. As expected, the majority were up-regulated and gene ontology analysis revealed that among the strongest up-regulated genes were gene clusters with functions in “regulation of transcription from RNA polymerase II promoter” and “cell differentiation” and genes encoding proteins with KRAB domains. In addition, the de novo methyltransferases DNMT3A and DNMT3B were up-regulated in DNMT1 KO cells suggesting the set-up of compensatory mechanisms in these cells. We will include this data set in the revised version of the manuscript.

      Why did the authors focus on L1TD1? Providing some of this data would be helpful to understand the rationale behind the thorough analysis of L1TD1.

      We have previously discovered that conditional deletion of the maintenance DNA methyltransferase DNMT1 in the murine epidermis results not only in the up-regulation of mobile elements, such as IAPs, but also in the induced expression of L1TD1 (Beck et al, 2021; Suppl. Table 1 and Author response image 1). Similarly, L1TD1 expression was induced by treatment of primary human keratinocytes or squamous cell carcinoma cells with the DNMT inhibitor aza-deoxycytidine (Author response image 2 and 3). These findings are in accordance with the observation that inhibition of DNA methyltransferase activity by aza-deoxycytidine in human non-small cell lung cancer cells (NSCLCs) results in upregulation of L1TD1 (Altenberger et al, 2017). Our interest in L1TD1 was further fueled by reports on a potential function of L1TD1 as a prognostic tumor marker. We will include this information in the revised manuscript.

      Author response image 1.

      RT-qPCR of L1TD1 expression in cultured murine control and Dnmt1 Δ/Δker keratinocytes. mRNA levels of L1td1 were analyzed in keratinocytes isolated at P5 from conditional Dnmt1 knockout mice (Beck et al., 2021). Hprt expression was used for normalization of mRNA levels and wildtype control was set to 1. Data represent means ±s.d. with n=4. **P < 0.01 (paired t-test).

      Author response image 2.

      RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. **P < 0.01 (paired t-test).

      Author response image 3.

      Induced L1TD1 expression upon DNMT inhibition in squamous cell carcinoma cell lines SCC9 and SCCO12. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours, 48 hours or 6 days. (A) Western blot analysis of L1TD1 protein levels using beta-actin as loading control. (B) Indirect immunofluorescence microscopy analysis of L1TD1 expression in SCC9 cells. Nuclear DNA was stained with DAPI. Scale bar: 10 µm. (C) RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. *P < 0.05, **P < 0.01 (paired t-test).

      The finding that L1TD1/DNMT1 DKO cells exhibit increased apoptosis and DNA damage but decreased L1 retro-transposition is unexpected. Considering the DNA damage associated with retro-transposition and the DNA damage and apoptosis observed in L1TD1/DNMT1 DKO cells, one would anticipate the opposite outcome. Could it be that the observation of fewer transposition-positive colonies stems from the demise of the most transposition-positive colonies? Further exploration of this phenomenon would be intriguing.

      This is an important point and we were aware of this potential problem. Therefore, we calibrated the retrotransposition assay by transfection with a blasticidin resistance gene vector to take into account potential differences in cell viability and blasticidin sensitivity. Thus, the observed reduction in L1 retrotransposition efficiency is not an indirect effect of reduced cell viability.

      Based on previous studies with hESCs, it is likely that, in addition to its role in retrotransposition, L1TD1 has additional functions in the regulation of cell proliferation and differentiation. L1TD1 might therefore attenuate the effect of DNMT1 loss in KO cells generating an intermediate phenotype (as pointed out by Reviewer 2) and simultaneous loss of both L1TD1 and DNMT1 results in more pronounced effects on cell viability.
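
      The calibration against a blasticidin resistance vector described in this response can be pictured with a small calculation. This is a sketch under stated assumptions rather than the authors' actual procedure: it assumes the calibration is a simple ratio of colony counts, and the function names and all numbers are hypothetical.

```python
def adjusted_retrotransposition(l1_colonies, control_colonies):
    """Blasticidin-resistant colonies from the L1 reporter divided by colonies
    from the control resistance vector (assumed form of the calibration)."""
    if control_colonies <= 0:
        raise ValueError("control_colonies must be positive")
    return l1_colonies / control_colonies


def relative_efficiency(sample, reference):
    """Adjusted efficiency of a sample relative to a reference cell line.
    Each argument is a (l1_colonies, control_colonies) tuple."""
    return adjusted_retrotransposition(*sample) / adjusted_retrotransposition(*reference)


# Hypothetical colony counts: (L1 reporter, control vector)
reference_line = (50, 200)
test_line = (20, 100)
print(relative_efficiency(test_line, reference_line))
```

      Dividing by the control-vector colony count means that a cell line with lower viability or higher blasticidin sensitivity is not scored as having lower retrotransposition for that reason alone.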

      Reviewer #2 (Public Review):

      In this study, Kavaklıoğlu et al. investigated and presented evidence for the role of domesticated transposon protein L1TD1 in enabling its ancestral relative, L1 ORF1p, to retrotranspose in HAP1 human tumor cells. The authors provided insight into the molecular function of L1TD1 and shed some clarifying light on previous studies that showed somewhat contradictory outcomes surrounding L1TD1 expression. Here, L1TD1 expression was correlated with L1 activation in a hypomethylation-dependent manner, due to DNMT1 deletion in the HAP1 cell line. The authors then identified L1TD1-associated RNAs using RIP-Seq, which displays a disconnect between transcript and protein abundance (via Tandem Mass Tag multiplex mass spectrometry analysis). The one exception was for L1TD1 itself, which is consistent with a model in which the RNA transcripts associated with L1TD1 are not directly regulated at the translation level. Instead, the authors found the L1TD1 protein associated with L1-RNPs, and this interaction is associated with increased L1 retrotransposition, at least in the context of HAP1 cells. Overall, these results support a model in which L1TD1 is restrained by DNA methylation, but in the absence of this repressive mark, L1TD1 is expressed and collaborates with L1 ORF1p (either directly or through interaction with L1 RNA, which remains unclear based on current results), leading to enhanced L1 retrotransposition. These results establish the feasibility of this relationship existing in vivo in either development, disease, or both.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      [...] This study is a fundamental step towards our better understanding of the mechanisms underlying light effects on cognition and consequently optimising lighting standards.

      Strengths:

      While it is still impossible to distinguish individual hypothalamic nuclei, even with the high-resolution fMRI, the authors split the hypothalamus into five areas encompassing five groups of hypothalamic nuclei. This allowed them to reveal that different parts of the hypothalamus respond differently to an increase in illuminance. They found that higher illuminance increased the activity of the posterior part of the hypothalamus encompassing the MB and parts of the LH and TMN, while decreasing the activity of the anterior parts encompassing the SCN and another part of TMN. These findings are somewhat in line with studies in animals. It was shown that parts of the hypothalamus such as SCN, LH, and PVN receive direct retinal input in particular from ipRGCs. Also, acute chemogenetic activation of ipRGCs was shown to induce activation of LH and also increased arousal in mice.

      Weaknesses:

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin and/or other photoreceptors. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. This may be something to consider when designing the follow-up studies.

      We thank the reviewer for acknowledging the quality and interest of our work and agree with the weaknesses they pointed out.

      Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al. 2010 PNAS, Vandewalle et al. 2011 Biol. Psy.). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. Its photopic illuminance should ideally have been set similar to the low illuminance blue-enriched light condition, but it was not the case. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes indeed a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.

      The revised version of the manuscript will include a better explanation as to the choice of illuminances and spectra. The discussion will make clear that these choices limit the interpretation about the photoreceptors involved. The discussion will also point out that silent substitution could be used in the future to resolve such questions.

      Reviewer #2 (Public Review):

      [...] By shedding light on these complex interactions, this research endeavors to contribute to the foundational knowledge necessary for developing innovative therapeutic strategies aimed at enhancing cognitive function through environmental modulation.

      Strengths:

      (1) Considerable Sample Size and Detailed Analysis: The study leverages a robust sample size and conducts a thorough analysis of hypothalamic dynamics, which enhances the reliability and depth of the findings.

      (2) Use of High-Resolution Imaging: Utilizing 7 Tesla fMRI to analyze brain activity during cognitive tasks offers high-resolution insights into the differential effects of illuminance on hypothalamic activity, showcasing the methodological rigor of the study.

      (3) Novel Insights into Illuminance Effects: The manuscript reveals new understandings of how different regions of the hypothalamus respond to varying illuminance levels, contributing valuable knowledge to the field.

      (4) Exploration of Potential Therapeutic Applications: Discussing the potential therapeutic applications of light modulation based on the findings suggests practical implications and future research directions.

      Weaknesses:

      (1) Foundation for Claims about Orexin and Histamine Systems: The manuscript needs to provide a clearer theoretical or empirical foundation for claims regarding the impact of light on the orexin and histamine systems in the abstract.

      (2) Inclusion of Cortical Correlates: While focused on the hypothalamus, the manuscript may benefit from discussing the role of cortical activation in cognitive performance, suggesting an opportunity to expand the scope of the manuscript.

      (3) Details of Light Exposure Control: More detailed information about how light exposure was controlled and standardized is needed to ensure the replicability and validity of the experimental conditions.

      (4) Rationale Behind Different Exposure Protocols: To clarify methodological choices, the manuscript should include more in-depth reasoning behind using different protocols of light exposure for executive and emotional tasks.

      We thank the reviewer for recognising the interest and strength of our study. We agree that corrections and clarifications to the text were needed. We will address the weaknesses they pointed out as follows:

      (1) As detailed in the discussion, we do believe orexin and histamine are excellent candidates for mediating the results we report. As also pointed out there, however, we are in no position to know which neurons, nuclei, neurotransmitters and neuromodulators underlie the results. We will therefore remove the last sentence of the abstract as we agree our final statement in the abstract was too strong. We will carefully reconsider the discussion to avoid such overstatements.

      (2) We are unsure at this stage how to address the comment of the reviewer without considerably lengthening the manuscript with statements which can only be putative. Hypothalamus nuclei are connected to multiple cortical (and subcortical) structures. The relevance of these projections will vary with the cognitive task considered. In addition, we have not yet considered the cortex in our analyses such that truly integrating cortical structures appears premature. We will nevertheless refer to the general statement that subcortical structures (and particularly those receiving direct retinal projections) are likely to receive light illuminance signal first before passing on the light modulation to the cortical regions involved in the ongoing cognitive process.

      (3) Illuminance and spectra could not be directly measured within the MRI scanner due to the ferromagnetic nature of measurement systems. The MR coil and the associated optic fibre stand, together with the entire lighting system, were therefore placed outside of the MR room to reproduce the experimental conditions in a completely dark room. A sensor was placed 2 cm away from the mirror of the coil (mounted at eye level), i.e. where the eye of the first author of the paper would be positioned, to measure illuminance and spectra. The procedure was repeated 4 times for illuminance and twice for spectra, and measurements were averaged. This procedure does not take into account inter-individual variation in head size and orbit shape, such that the reported illuminance levels may have varied slightly across subjects. The relative differences between illuminance levels are very unlikely to vary substantially across participants, such that statistics consisting of tests for the impact of relative differences in illuminance were not affected. We will report these methodological details in the supplementary material file associated with the paper.

      (4) The comment is similar to the issue raised by reviewer 1 (and reviewer 3) so we refer to the response provided to reviewer 1 to address the final comment of reviewer 2.

      Reviewer #3 (Public Review):

      [...] The authors find evidence in support of a posterior-to-anterior gradient of increased blood flow in the hypothalamus during task performance that they later relate to performance on two different tasks. The results provide an enticing link between light levels, hypothalamic activity, and cognitive/affective function, however, clarification of some methodological choices will help to improve confidence in the findings.

      Strengths:

      The authors' focus on the hypothalamus and its relationship to light intensity is an important and understudied question in neuroscience.

      Weaknesses:

      I found it challenging to relate the authors' hypotheses, which I found to be quite compelling, to the apparatus used to test the hypotheses - namely, the use of orange light vs. different light intensities; and the specific choice of the executive and emotional tasks, which differed in key features (e.g., block-related vs. event-related designs) that were orthogonal to the psychological constructs being challenged in each task.

      Given the small size of the hypothalamus and the irregular size of the hypothalamic parcels, I wondered whether a more data-driven examination of the hypothalamic time series would have provided a more parsimonious test of their hypothesis.

      We thank the reviewer for acknowledging the originality and interest of our study. We agree that some methodological choices needed more explanations. We will address the weaknesses they pointed out as follows:

      The first comment questions the choices of the light conditions and of the tasks. Regarding light conditions, since reviewer 1 (and reviewer 2) raised a similar issue, we refer to the response provided to reviewer 1. We agree that many different tasks could have been used to test our hypotheses. Prior work of our team showed that the n-back task and emotional task we used were successful probes to demonstrate that light illuminance modulates cognitive activity, including within subcortical structures (though resolution did not allow precise isolation of nuclei or subparts). When taking the step of ultra-high field imaging, we therefore opted for these tasks, as our goal was to show that illuminance affects subcortical brain activity across cognitive domains in general and we were not interested in tasks that would test specific aspects of these domains. The fact that one task is event-related while the other consists of a block design adds, in our view, to the robustness of our finding that a similar anterior-posterior gradient of activity modulation by illuminance is present in the hypothalamus. We will update the discussion to highlight this aspect.

      As mentioned in the text, the protocol also included an auditory attentional task that could have further broadened the potential generalisability of our findings, but it was not part of the analyses as it could only include 2 illuminance levels due to time constraints.

      We agree that a data-driven approach could have constituted an alternative means to test our hypothesis. We opted for an approach that we mastered best while still allowing us to conclusively test for regional differences in activity across the hypothalamus. Examination of time series of the very same data we used will mainly confirm the results of our analyses – an anterior-posterior gradient in the impact of illuminance - and may yield slight differences in the limits of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance. While the suggested approach may have been envisaged if we had been facing negative results (i.e. no differences between subparts, potentially because subparts would not correspond to functional differences in response to illuminance change), it would now constitute a circular confirmation of our main findings (i.e. using the same data). While we truly appreciate the suggestion, we do not consider that it would constitute a more parsimonious test of our hypothesis now that we successfully applied GLM/parcellation and GLMM approaches.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bell et al. provide an exhaustive and clear description of the diversity of a new class of predicted type IV restriction systems that the authors denote as CoCoNuTs, for their characteristic presence of coiled-coil segments and nuclease tandems. Along with a comprehensive analysis that includes phylogenetics, protein structure prediction, extensive protein domain annotations, and an in-depth investigation of encoding genomic contexts, they also provide detailed hypotheses about the biological activity and molecular functions of the members of this class of predicted systems. This work is highly relevant, it underscores the wide diversity of defence systems that are used by prokaryotes and demonstrates that there are still many systems to be discovered. The work is sound and backed-up by a clear and reasonable bioinformatics approach. I do not have any major issues with the manuscript, but only some minor comments.

      Strengths:

      The analysis provided by the authors is extensive and covers the three most important aspects that can be covered computationally when analysing a new family/superfamily: phylogenetics, genomic context analysis, and protein-structure-based domain content annotation. With this, one can directly have an idea about the superfamily of the predicted system and infer their biological role. The bioinformatics approach is sound and makes use of the most current advances in the fields of protein evolution and structural bioinformatics.

      Weaknesses:

      It is not clear how coiled-coil segments were assigned if only based on AF2-predicted models or also backed by sequence analysis, as no description is provided in the methods. The structure prediction quality assessment is based solely on the average pLDDT of the obtained models (with a threshold of 80 or better). However, this is not enough, particularly when multimeric models are used. The PAE matrix should be used to evaluate relative orientations, particularly in the case where there is a prediction that parts from 2 proteins are interacting. In the case of multimers, interface quality scores, such as the ipTM or pDockQ, should also be considered and, at minimum, reported.

      A description of the coiled-coil predictions has been added to the Methods. For multimeric models, PAE matrices and ipTM+pTM scores have been included in Supplementary Data File S1.
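
      The quality measures discussed in this exchange can be summarised with a few lines of code. The following is a minimal sketch, not the authors' pipeline: the function names and the toy numbers are hypothetical, the PAE matrix is assumed to be a square list of lists ordered chain by chain, and the 0.8/0.2 weighting is the ranking confidence used by AlphaFold-Multimer, which we assume is what "ipTM+pTM" refers to.

```python
from statistics import mean


def mean_plddt(plddt):
    """Average per-residue pLDDT of a model (0-100 scale)."""
    return mean(plddt)


def mean_interchain_pae(pae, chain_lengths):
    """Average PAE (in angstroms) over residue pairs that belong to different chains.

    pae           -- square matrix (list of lists), residues ordered chain by chain
    chain_lengths -- number of residues in each chain, in the same order
    """
    # Map every residue index to the index of the chain it belongs to.
    chain_of = [c for c, n in enumerate(chain_lengths) for _ in range(n)]
    size = len(chain_of)
    values = [
        pae[i][j]
        for i in range(size)
        for j in range(size)
        if chain_of[i] != chain_of[j]
    ]
    return mean(values)


def multimer_confidence(iptm, ptm):
    """AlphaFold-Multimer ranking confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


# Toy example (hypothetical values): a two-residue chain plus a one-residue chain.
toy_pae = [
    [0.0, 1.0, 20.0],
    [1.0, 0.0, 22.0],
    [18.0, 24.0, 0.0],
]
print(mean_plddt([90.0, 80.0, 70.0]))
print(mean_interchain_pae(toy_pae, [2, 1]))
print(multimer_confidence(iptm=0.9, ptm=0.8))
```

      A high average pLDDT combined with a high inter-chain PAE would indicate well-folded subunits whose relative placement is uncertain, which is the situation the reviewer asks to be made visible.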

      Reviewer #2 (Public Review):

      Summary:

      In this work, using in-depth computational analysis, Bell et al. explore the diverse repertoire of type IV McrBC modification-dependent restriction systems. The prototypical two-component McrBC system has been structurally and functionally characterised and is known to act as a defence by restricting phage and foreign DNA containing methylated cytosines. Here, the authors find previously unanticipated complexity and versatility of these systems and focus on detailed analysis and classification of a distinct branch, the so-called CoCoNuT, named after its composition of coiled-coil structures and tandem nucleases. These CoCoNuT systems are predicted to target RNA as well as DNA and to utilise defence mechanisms with some similarity to type III CRISPR-Cas systems.

      Strengths:

      This work is enriched with a plethora of ideas and a myriad of compelling hypotheses that now await experimental verification. The study comes from the group that was amongst the first to describe, characterize, and classify CRISPR-Cas systems. By analogy, the findings described here can similarly promote ingenious experimental and conceptual research that could further drive technological advances. It could also instigate vigorous scientific debates that will ultimately benefit the community.

      Weaknesses:

      The multi-component systems described here function in the context of large oligomeric complexes. Some of the single chain AF2 predictions shown in this work are not compatible, for example, with homohexameric complex formation due to incompatible orientation of domains. The recent advances in protein structure prediction, in particular AlphaFold2 (AF2) multimer, now allow us to confidently probe potential protein-protein interactions and protein complex formation. This predictive power could be exploited here to produce a better glimpse of these multimeric protein systems. It can also provide a more sound explanation for some of the observed differences amongst different McrBC types.

      Hexameric CnuB complexes with CnuC stimulatory monomers for Type I-A, I-B, I-C, II, and III-A CoCoNuT systems have been modeled with AF2 and included in Supplementary Data File S1, albeit without the domains fused to the GTPase N-terminus (with the exception of Type I-B, which lacks the long coiled-coil domain fused to the GTPase and was modeled with its entire sequence). Attempts to model the other full-length CnuB hexamers did not lead to convincing results.

      Recommendations for the authors:

      Reviewing Editor:

      The detailed recommendations by the two reviewers will help the authors to further strengthen the manuscript, but two points seem particularly worth considering: 1. The methods are barely sketched in the manuscript, but it could be useful to detail them more closely. Particularly regarding the coiled-coil segments, which are currently just statists, useful mainly for the name of the family, more detail on their prediction, structural properties, and purpose would be very helpful. 2. Due to its encyclopedic nature, the wealth of material presented in the paper makes it hard to penetrate in one go. Any effort to make it more accessible would be very welcome. Reviewer 1 in particular has made a number of suggestions regarding the figures, which would make them provide more support for the findings described in the text.

      A description of the techniques used to identify coiled-coil segments has been added to the Methods. Our predictions ranged from near certainty in the coiled-coils detected in CnuB homologs, to shorter helices at the limit of detection in other factors. We chose to report all probable coiled-coils, as the extensive coiled-coils fused to CnuB, which are often the only domain present other than the GTPase, imply involvement in mediating complex formation by interacting with coiled-coils in other factors, particularly the other CoCoNuT factors. The suggestions made by Reviewer 1 were thoughtful and we made an effort to incorporate them.

      Reviewer #1 (Recommendations For The Authors):

      I do not have any major issues with the manuscript. I have however some minor comments, as described below.

      • The last sentence of the abstract at first reads as a fact and not a hypothesis resulting from the work described in the manuscript. After the second read, I noticed the nuances in the sentence. I would suggest a rephrasing to emphasize that the activity described is a theoretical hypothesis not backed-up by experiments.

      This sentence has been rephrased to make explicit the hypothetical nature of the statement.

      • In line 64, the authors rename DUF3578 as ADAM because indeed its function is not unknown. Did the authors consider reaching out to InterPro to add this designation to this DUF? A search in InterPro with DUF3578 results in "MrcB-like, N-terminal domain" and if a name is suggested, it may be worthwhile to take it to the InterPro team.

      We will suggest this nomenclature to InterPro.

      • I find Figure 1E hard to analyse and think it occupies too much space for the information it provides. The color scheme, the large amount of small slices, and the lack of numbers make its information content very small. I would suggest moving this to the supplementary and making it instead a bar plot. If removed from Figure 1, more space is made available for the other panels, particularly the structural superpositions, which in my opinion are much more important.

      We have removed Figure 1E from the paper as it adds little information beyond the abundance and phyletic distribution of sequenced prokaryotes, in which McrBC systems are plentiful.

      • In Figure 2, it is not clear due to the presence of many colorful "operon schemes" that the tree is for a single gene and not for the full operon segment. Highlighting the target gene in the operons or signalling it somehow would make the figure easy to understand even in the absence of the text and legend. The same applies to Supplementary Figure 1.

      The legend has been modified to show more clearly that this is a tree of McrB-like GTPases.

      • In line 146, the authors write "AlphaFold-predicted endonucelase fold" to say that a protein contains a region that AF2 predicts to fold like an endonuclease. This is a weird way of writing it and can be confusing to non-expert readers. I would suggest rephrasing for increased clarity.

      This sentence has been rephrased for greater clarity.

      • In line 167, there is a [47]. I believe this is probably due to a previous reference formatting.

      Indeed, this was a reference formatting error and has been fixed.

      • In most figures, the color palette and the use of very similar color palettes for taxonomy pie charts, genomic context composition schemes, and domain composition diagrams make it really hard to have a good understanding of the image at first. Legends are often close to each other, and it is not obvious at first which belong to what. I would suggest changing the layouts and maybe some color schemes to make it easier to extract the information that these figures want to convey.

      It seemed that Figure 4 was the most glaring example of these issues, and it has been rearranged for easier comprehension.

      • In the paragraph that starts at line 199, the authors mention an Ig-like domain that is often found at the N-terminus of Type I CoCoNuTs. Are they all related to each other? How conserved are these domains?

      These domains are all predicted to adopt a similar beta-sandwich fold and are found at the N-terminus of most CoCoNuT CnuC homologs, suggesting they are part of the same family, but we did not undertake a more detailed sequence-based analysis of these regions.

      We also find comparable domains in the CnuC/McrC-like partners of the abundant McrB-like NxD motif GTPases that are not part of CoCoNuT systems, and given the similarity of some of their predicted structures to Rho GDP-dissociation inhibitor 1, we suspect that they have coevolved as regulators of the non-canonical NxD motif GTPase type. Our CnuBC multimer models showing consistent proximity between these domains in CnuC and CnuB GTPase domains suggest this could indeed be the case. We plan to explore these findings further in a forthcoming publication.

      • In line 210, the authors write "suggesting a role in overcrowding-induced stress response". Why so? In all other cases, the authors justify their hypothesis, which I really appreciated, but not here.

      A supplementary note justifying this hypothesis has been added to Supplementary Data File S1.

      • At the end of the paragraph that starts in line 264, the authors mention that they constructed AF2 multimeric models to predict if 2 proteins would interact. However, no quality scores were provided, particularly the PAE matrix. This would allow for a better judgement of this prediction, and I would suggest adding the PAE matrix as another panel in the figure where the 3D model of the complex is displayed.

      The PAE matrix and ipTM+pTM scores for this and other multimer models have been added to Supplementary Data File S1. For this model in particular, the surface charge distribution of the model has been presented to support the role of the domains that have a higher PAE in RNA binding.

      • In line 306, "(supplementary data)" refers to what part of the file?

      This file has been renamed Supplementary Table S3 and referenced as such.

      • In line 464, the authors suggest that ShdA could interact with CoCoNuTs. Why not model the complex as done for other cases? what would co-folding suggest?

      As we were not able to convincingly model full-length CnuB hexamers with N-terminal coiled-coils, we did not attempt modeling of this hypothetical complex with another protein with a long coiled-coil, but it remains an interesting possibility.

      • In line 528, why and how were some genes additionally analyzed with HHPred?

      Justification for this analysis has been added to the Methods, but briefly, these genes were additionally analyzed if there were no BLAST hits or to confirm the hits that were obtained.

      • In the first section of the methods, the first and second (particularly the second) paragraphs are extremely long. I would suggest breaking them to facilitate reading.

      This change has been made.

      • In line 545, what do the authors mean by "the alignment (...) were analyzed with HHPred"?

      A more detailed description of this step has been added to the Methods.

      • The authors provide the models they produced as well as extensive supplementary tables that make their data reusable, but they do not provide the code for the automated steps, such as excising target sequence sections out of multiple sequence alignments.

      The code used for these steps has been in use in our group at the NCBI for many years. It will be difficult to utilize outside of the NCBI software environment, but for full disclosure, we have included a zipped repository with the scripts and custom-code dependencies, although there are external dependencies as well such as FastTree and BLAST. In brief, it involves PSI-BLAST detection of regions with the most significant homology to one of a set of provided alignments (seals-2-master/bin/wrappers/cog_psicognitor). In this case, the reference alignments of McrB-like GTPases and DUF2357 were generated manually using HHpred to analyze alignments of clustered PSI-BLAST results. This step provided an output of coordinates defining domain footprints in each query sequence, which were then combined and/or extended using scripts based on manual analysis of many examples with HHpred (footprint_finders/get_GTPase_frags.py and footprint_finders/get_DUF2357_frags.py), then these coordinates were used to excise such regions from the query amino acid sequence with a final script (seals-2-master/bin/misc/fa2frag).
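
      The last two steps of this workflow (combining or extending domain footprints, then excising the corresponding regions) can be illustrated generically. The sketch below is not the NCBI code and does not reproduce the behaviour of get_GTPase_frags.py, get_DUF2357_frags.py, or fa2frag; the function names, the `max_gap` parameter, and the toy sequence are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def merge_footprints(footprints, max_gap=0):
    """Merge 1-based inclusive (start, end) footprints on one sequence.

    Two footprints are merged when they overlap, are directly adjacent,
    or are separated by at most max_gap residues.
    """
    merged = []
    for start, end in sorted(footprints):
        if merged and start <= merged[-1][1] + max_gap + 1:
            # Extend the previous footprint instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def excise(sequence, footprints):
    """Return the subsequences covered by 1-based inclusive footprints."""
    return [sequence[start - 1:end] for start, end in footprints]


# Toy example (hypothetical sequence and coordinates).
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVK"
hits = [(5, 12), (10, 20), (40, 48)]
regions = merge_footprints(hits)
for (start, end), fragment in zip(regions, excise(protein, regions)):
    print(start, end, fragment)
```

      Raising `max_gap` bridges short unaligned stretches between neighbouring hits, which is one simple way to "combine and/or extend" footprints; the actual rules used by the authors were derived from manual HHpred analysis and are not encoded here.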

      Reviewer #2 (Recommendations For The Authors):

      (1) Page 4, line 77 - 'PUA superfamily domains' could be more appropriate to use instead of "EVE superfamily".

      While this statement could perhaps be applied to PUA superfamily domains, our previous work we refer to, which strongly supports the assertion, was restricted to the EVE-like domains and we prefer to retain the original language.

      (2) Page 5. lines 128-130 - AF2 multimer prediction model could provide a more sound explanation for these differences.

      Our AF2 multimer predictions added in this revision indeed show that the NxD motif McrB-like CoCoNuT GTPases interact with their respective McrC-like partners such that an immunoglobulin-like beta-sandwich domain, fused to the N-termini of the McrC homologs and similar to Rho GDP-dissociation inhibitor 1, has the potential to physically interact with the GTPase variants. However, we did not probe this in greater detail, as it is beyond the scope of this already highly complex article, but we plan to study it in the future.

      (3) Page 8, line 252 - The surface charge distribution of CnuH OB fold domain looks very different from SmpB (pdb3iyr). In fact, the regions that are in contact with RNA in SmpB are highly acidic in CoCoNuT CnuH. Although it looks likely that this domain is involved in RNA binding, the mode of interaction should be very different.

      We did not detect a strong similarity between the CnuH SmpB-like SPB domain and PDB 3IYR, but when we compare the surface charge distribution of PDB 1WJX and the SPB domain, while there is a significant area that is positively charged in 1WJX that is negatively charged in SPB, there is much that overlaps with the same charge in both domains.

      The similarity between SmpB and the SPB domain is significant, but definitely not exact. An important question for future studies is: If the domains are indeed related due to an ancient fusion of SmpB to an ancestor of CnuH, would this degree of divergence be expected?

      In other words, can we say anything about how the function of a stand-alone tmRNA-binding protein could evolve after being fused to a complex predicted RNA helicase with other predicted RNA binding domains already present? Experimental validation will ultimately be necessary to resolve these kinds of questions, but for now, it may be safe to say that the presence of this domain, especially in conjunction with the neighboring RelE-like RTL domain and UPF1-like helicase domain, signals a likely interaction with the A-site of the ribosome, and perhaps restriction of aberrant/viral mRNA.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work provides a valuable contribution and assessment of what it means to replicate a null study finding, and what are the appropriate methods for doing so (apart from a rote p-value assessment). Through a convincing re-analysis of results from the Reproducibility Project: Cancer Biology using frequentist equivalence testing and Bayes factors, the authors demonstrate that even when reducing 'replicability success' to a single criterion, how precisely replication is measured may yield differing results. Less focus is directed to appropriate replication of non-null findings.

      Reviewer #1 (Public Review):

      Summary:

      The goal of Pawel et al. is to provide a more rigorous and quantitative approach for judging whether or not an initial null finding (conventionally with p ≥ 0.05) has been replicated by a second similarly null finding. They discuss important objections to relying on the qualitative significant/non-significant dichotomy to make this judgment. They present two complementary methods (one frequentist and the other Bayesian) which provide a superior quantitative framework for assessing the replicability of null findings.

      Strengths:

      Clear presentation; illuminating examples drawn from the well-known Reproducibility Project: Cancer Biology data set; R-code that implements suggested analyses. Using both methods as suggested provides a superior procedure for judging the replicability of null findings.

      Weaknesses:

      The proposed frequentist and the Bayesian methods both rely on binary assessments of an original finding and its replication. I'm not sure if this is a weakness or is inherent to making binary decisions based on continuous data.

      For the frequentist method, a null finding is considered replicated if the original and replication 90% confidence intervals for the effects both fall within the equivalence range. According to this approach, a null finding would be considered replicated if the p-values of both equivalence tests (original and replication) were, say, 0.049, whereas it would not be considered replicated if, for example, the equivalence test of the original study had a p-value of 0.051 and the replication had a p-value of 0.001. Intuitively, the evidence for replication would seem to be stronger in the second instance. The recommended Bayesian approach similarly relies on a dichotomy (e.g., Bayes factor > 1).

      Thanks for the suggestions, we now emphasize more strongly in the “Methods for assessing replicability of null results” and “Conclusions” sections that both TOST p-values and Bayes factors are quantitative measures of evidence that do not require dichotomization into “success” or “failure”.
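
      To make the quantitative reading concrete, a TOST p-value can be computed directly from an effect estimate and its standard error. The sketch below is illustrative only and is not the R code accompanying the paper: it assumes a normally distributed estimate with known standard error, a symmetric equivalence range [-margin, +margin], and hypothetical input values; the function name is our own.

```python
from statistics import NormalDist


def tost_p_value(estimate, se, margin):
    """Two one-sided tests (TOST) p-value under a normal approximation.

    Tests H0: |effect| >= margin against H1: |effect| < margin.
    The TOST p-value is the larger of the two one-sided p-values.
    """
    z = NormalDist()
    p_upper = z.cdf((estimate - margin) / se)        # H0: effect >= +margin
    p_lower = 1.0 - z.cdf((estimate + margin) / se)  # H0: effect <= -margin
    return max(p_upper, p_lower)


# Hypothetical original and replication estimates (standardized mean differences).
p_original = tost_p_value(estimate=0.10, se=0.35, margin=0.74)
p_replication = tost_p_value(estimate=0.05, se=0.12, margin=0.74)
print(p_original, p_replication)
```

      Reporting both p-values side by side, rather than only whether each falls below 0.05, preserves the information that the reviewer's 0.051 versus 0.001 example shows is lost by dichotomization.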

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      Strengths:

      The study uses reliable and shareable/open data to demonstrate its findings, sharing as well the code for statistical analysis. The study provides sensitivity analysis for different scenarios of equivalence margin and alpha level, as well as for different scenarios of standard deviations for the prior of Bayes factors and different thresholds to consider. All analysis and code of the work is open and can be replicated. As well, the study demonstrates on a case-by-case basis how the different criteria can diverge, regarding one sample of a field of science: preclinical cancer biology. It also explains clearly what Bayes factors and equivalence tests are.

      Weaknesses:

      It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Other comments:

      • Introduction: The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      • Overall picture vs. case-by-case scenario: An interesting finding is that the authors observe that in most cases, there is no substantial evidence for either the absence or the presence of an effect, as evidenced by the equivalence tests. Thus, using both suggested criteria results in a picture similar to the one initially raised by the paper itself. The work done by the authors highlights additional criteria that can be used to further analyze replication success on a case-by-case basis, and I believe that this is where the paper's main contributions lie. Despite not changing the overall picture much, I agree that the p-value criterion by itself does not distinguish between (1) a situation where the original study had low statistical power, resulting in a highly inconclusive non-significant result that does not provide evidence for the absence of an effect and (2) a scenario where the original study was adequately powered, and a non-significant result may indeed provide some evidence for the absence of an effect when analyzed with appropriate methods. Equivalence testing and Bayesian factor approaches are valuable tools in both cases.

      Regarding the 0.05 threshold, the choice of the prior distribution for the SMD under the alternative H1 is debatable, and this also applies to the equivalence margin. Sensitivity analyses, as highlighted by the authors, are helpful in these scenarios.

      Thank you for the thorough review and constructive feedback. We have added an additional “Appendix C: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for the RPP and EPRP null results.
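
      The dependence of the Bayes factor on the prior for the SMD can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the authors' implementation: the estimate is treated as normally distributed with known standard error, H0 fixes the effect at zero, H1 places a normal prior on the effect, and the prior standard deviations and input values shown are hypothetical.

```python
from statistics import NormalDist


def bf01(estimate, se, prior_sd, prior_mean=0.0):
    """Bayes factor for H0: effect = 0 versus H1: effect ~ N(prior_mean, prior_sd^2).

    Under H0 the estimate is N(0, se^2); under H1 its marginal distribution
    is N(prior_mean, se^2 + prior_sd^2). Values above 1 favour H0.
    """
    density_h0 = NormalDist(0.0, se).pdf(estimate)
    density_h1 = NormalDist(prior_mean, (se**2 + prior_sd**2) ** 0.5).pdf(estimate)
    return density_h0 / density_h1


# Sensitivity of BF01 to the prior standard deviation (hypothetical estimate).
for prior_sd in (0.5, 1.0, 2.0, 4.0):
    print(prior_sd, bf01(estimate=0.1, se=0.3, prior_sd=prior_sd))
```

      For an estimate close to zero, wider priors yield larger BF01 values, which is why a sensitivity analysis over the prior standard deviation is informative.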

      Reviewer #3 (Public Review):

      Summary:

      The paper points out that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. Also, it cannot be considered a "replication success". The main point of the paper is rather obvious. It may be that both studies are underpowered, in which case their non-significance does not prove anything. The absence of evidence is not evidence of absence! On the other hand, statistical significance is a confusing concept for many, so some extra clarification is always welcome.

      One might wonder if the problem that the paper addresses is really a big issue. The authors point to the "Reproducibility Project: Cancer Biology" (RPCB, Errington et al., 2021). They criticize Errington et al. because they "explicitly defined null results in both the original and the replication study as a criterion for replication success." This is true in a literal sense, but it is also a little bit uncharitable. Errington et al. assessed replication success of "null results" with respect to 5 criteria, just one of which was statistical (non-)significance.

      It is very hard to decide if a replication was "successful" or not. After all, the original significant result could have been a false positive, and the original null-result a false negative. In light of these difficulties, I found the paper of Errington et al. quite balanced and thoughtful. Replication has been called "the cornerstone of science" but it turns out that it's actually very difficult to define "replication success". I find the paper of Pawel, Heyard, Micheloud, and Held to be a useful addition to the discussion.

      Strengths:

      This is a clearly written paper that is a useful addition to the important discussion of what constitutes a successful replication.

      Weaknesses:

      To me, it seems rather obvious that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. I'm not sure how often this mistake is made.

      Thanks for the feedback. We do not have systematic data on how often the mistake of confusing absence of evidence with evidence of absence has been made in the replication context, but we do know that it has been made in at least three prominent large-scale replication projects (the RPP, RPEP, RPCB). We therefore believe that there is a need for our article.

      Moreover, we agree that the RPCB provided a nuanced assessment of replication success using five different criteria for the original null results. We emphasize this now more in the “Introduction” section. However, we do not consider our article as “a little bit uncharitable” to the RPCB, as we discuss all other criteria used in the RPCB and note that our intent is not to diminish the important contributions of the RPCB, but rather to build on their work and provide constructive recommendations for future researchers. Furthermore, in response to comments made by Reviewer #2, we have added an additional “Appendix B: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for null results from two other replication projects, where the same issue arises.

      Reviewer #1 (Recommendations For The Authors):

      The authors may wish to address the dichotomy issue I raise above, either in the analysis or in the discussion.

      Thank you, we now emphasize that Bayes factors and TOST p-values do not need to be dichotomized but can be interpreted as quantitative measures of evidence, both in the “Methods for assessing replicability of null results” and the “Conclusions” sections.

      Reviewer #2 (Recommendations For The Authors):

      Given that, here follow additional suggestions that the authors should consider in light of the manuscript's word count limit, to avoid confusing the paper's main idea:

      2) Referencing: Could you reference the three interesting cases among the 15 RPCB null results (specifically, the three effects from the original paper #48) where the Bayes factor differs qualitatively from the equivalence test?

      We now explicitly cite the original and replication study from paper #48.

      3) Equivalence testing: As the authors state, only 4 out of the 15 study pairs are able to establish replication success at the 5% level, in the sense that both the original and the replication 90% confidence intervals fall within the equivalence range. Among these 4, two (Paper #48, Exp #2, Effect #5 and Paper #48, Exp #2, Effect #6) were initially positive with very low p-values, one (Paper #48, Exp #2, Effect #4) had an initial p of 0.06 and was very precisely estimated, and the only one in which equivalence testing provides a clearer picture of replication success is Paper #41, Exp #2, Effect #1, which had an initial p-value of 0.54 and a replication p-value of 0.05. In this latter case (or in all these ones), one might question whether the "liberal" equivalence range of Δ = 0.74 is the most appropriate. As the authors state, "The post-hoc specification of equivalence margins is controversial."

      We agree that the post hoc choice of equivalence ranges is a controversial issue. The margins define an equivalence region where effect sizes are considered practically negligible, and we agree that in many contexts SMD = 0.74 is a large effect size that is not practically negligible. We therefore present sensitivity analyses for a wide range of margins. However, we do not think that the choice of this margin is more controversial for the mentioned studies with low p-values than for other studies with greater p-values, since the question of whether a margin plausibly encodes practically negligible effect sizes is not related to the observed p-value of a study. Nevertheless, for the new analyses of the RPP and EPRP data in Appendix B, we have added additional sensitivity analyses showing how the individual TOST p-values and Bayes factors vary as a function of the margin and the prior standard deviation. We think that these analyses provide readers with an even more transparent picture regarding the implications of the choice of these parameters than the “project-wise” sensitivity analyses in Appendix A.
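
      One way to see how conclusions depend on the margin is to ask, for each study, how wide the equivalence range must be before equivalence is established. The sketch below assumes a normal approximation with known standard error, under which a TOST at level alpha succeeds exactly when the (1 - 2·alpha) confidence interval lies inside the margin; the function names and input values are hypothetical and this is not the analysis code of the paper.

```python
from statistics import NormalDist


def smallest_equivalence_margin(estimate, se, alpha=0.05):
    """Smallest symmetric margin at which TOST is significant at level alpha.

    Equals the largest absolute bound of the (1 - 2*alpha) confidence interval.
    """
    return abs(estimate) + NormalDist().inv_cdf(1.0 - alpha) * se


def smallest_margin_for_pair(original, replication, alpha=0.05):
    """Margin required for both studies, each given as (estimate, se)."""
    return max(
        smallest_equivalence_margin(*original, alpha=alpha),
        smallest_equivalence_margin(*replication, alpha=alpha),
    )


# Hypothetical original and replication (estimate, standard error) pairs.
print(smallest_margin_for_pair(original=(0.20, 0.40), replication=(0.05, 0.10)))
```

      Comparing the required margin with effect sizes that are considered practically negligible in the field makes the consequences of a post hoc margin choice explicit for each study pair.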

      4) Bayes factor suggestions: For the Bayes factor approach, it would be interesting to discuss examples where the BF differs slightly. This is likely to occur in scenarios where sample sizes differ significantly between the original study and replication. For example, in Paper #48, Exp #2 and Effect #4, the initial p is 0.06, but the BF is 8.1. In the replication, the BF dramatically drops to < 1/1000, as does the p-value. The initial evidence of 8.1 indicates some evidence for the absence of an effect, but not strong evidence ("strong evidence for H0"), whereas a p-value of 0.06 does not lead to such a conclusion; instead, it favors H1. It would be interesting if the authors discussed other similar cases in the paper. It's worth noting that in Paper #5, Exp #1, Effect #3, the replication p-value is 0.99, while the BF01 is 2.4, almost indicating "moderate" evidence for H0, even though the p-value is inconclusive.

      We agree that some of the examples nicely illustrate conceptual differences between p-values and Bayes factors, e.g., how they take into account sample size and effect size. As methodologists, we find these aspects interesting ourselves, but we think that emphasizing them is beyond the scope of the paper and would distract eLife readers from the main messages.

      Concerning the conceptual differences between Bayes factors and TOST p-values, we already discuss a case where there are qualitative differences in more detail (original paper #48). We added another discussion of this phenomenon in the Appendix C as it also occurs for the replication of Ranganath and Nosek (2008) that was part of the RPP.

      5) p-values, magnitude and precision: It's noteworthy to emphasize, if the authors decide to discuss this, that the p-value is influenced by both the effect's magnitude and its precision, so in Paper #9, Exp #2, Effect #6, BF01 = 4.1 has a higher p-value than a BF01 = 2.3 in its replication. However, there are cases where both p-values and BF agree. For example, in Paper #15, Exp #2, Effect #2, both the original and replication studies have similar sample sizes, and as the p-value decreases from p = 0.95 to p = 0.23, BF01 decreases from 5.1 ("moderate evidence for H0") to 1.3 (region of "Absence of evidence"), moving away from H0 in both cases. This also occurs in Paper #24, Exp #3, Effect #6.

      We appreciate the suggestions but, as explained before, think that the message of our paper is better understood without additional discussion of more general differences between p-values and Bayes factors.

      6) The grey zone: Given the above topic, it is important to highlight that in the "Absence of evidence grey zone" for the null hypothesis, for example, in Paper #5, Exp #1, Effect #3 with a p = 0.99 and a BF01 = 2.4 in the replication, BF and p-values reach similar conclusions. It's interesting to note, as the authors emphasize, that Dawson et al. (2011), Exp #2, Effect #2 is an interesting example, as the p-value decreases, favoring H1, likely due to the effect's magnitude, even with a small sample size (n = 3 in both original and replications). Bayes factors are very close to one due to the small sample sizes, as discussed by the authors.

      We appreciate the constructive comments. We think that the two examples from Dawson et al. (2011) and Goetz et al. (2011) already nicely illustrate absence of evidence and evidence of absence, respectively, and therefore decided not to discuss additional examples in detail, to avoid redundancy.

      7) Using meta-analytical results (?): For papers from RPCB, comparing the initial study with the meta-analytical results using Bayes factor and equivalence testing approaches (thus, increasing the sample size of the analysis, but creating dependency of results since the initial study would affect the meta-analytical one) could change the conclusions. This would be interesting to explore in initial studies that are replicated by much larger ones, such as: Paper #9, Exp #2, Effect #6; Goetz et al. (2011), Exp #1, Effect #1; Paper #28, Exp #3, Effect #3; Paper #41, Exp #2, Effect #1; and Paper #47, Exp #1, Effect #5.

      Thank you for the suggestion. We considered adding meta-analytic TOST p-values and Bayes factors before, but decided that Figure 3 and the results section are already quite technical, so adding more analyses may confuse more than help. Nevertheless, these meta-analytic approaches are discussed in the “Conclusions” section.

      8) Other samples of fields of science: It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Thank you for the excellent suggestion. We added an Appendix B where the null results from the RPP and RPEP are analyzed with our proposed approaches. The results are also discussed in the “Results” and “Conclusions” sections.

      9) Other approaches: I am curious about the potential impact of using an approach based on equivalence testing (as described in https://arxiv.org/abs/2308.09112). It would be valuable if the authors could run such analyses or reference the mentioned work.

      Thank you. We were unaware of this preprint. It seems related to the framework proposed by Stahel W. A. (2021) New relevance and significance measures to replace p-values. PLoS ONE 16(6): e0252991. https://doi.org/10.1371/journal.pone.0252991

      We now cite both papers in the discussion.

      10) Additional evidence: There is another study in which replications of initially p > 0.05 studies with p > 0.05 replications were also considered as replication successes. You can find it here: https://www.medrxiv.org/content/10.1101/2022.05.31.22275810v2. Although it involves a small sample of initially p > 0.05 studies with already large sample sizes, the work is currently under consideration for publication in PLOS ONE, and all data and materials can be accessed through OSF (links provided in the work).

      Thank you for sharing this interesting study with us. We feel that it is beyond the scope of the paper to include further analyses as there are already analyses of the RPCB, RPP, and RPEP null results. However, we will keep this study in mind for future analysis, especially since all data are openly available.

      11) Additional evidence 02: Ongoing replication projects, such as the Brazilian Reproducibility Initiative (BRI) and The Sports Replication Centre (https://ssreplicationcentre.com/), continue to generate valuable data. BRI is nearing completion of its results, and it promises interesting data for analyzing replication success using p-values, equivalence regions, and Bayes factor approaches.

      We now cite these two initiatives as examples of ongoing replication projects in the introduction. As with your previous point, we think that it is beyond the scope of the paper to include further analyses as there are already analyses of the RPCB, RPP, and RPEP null results.

      Reviewer #3 (Recommendations For The Authors):

      I have no specific recommendations for the authors.

      Thank you for the constructive review.

      Reviewing Editor (Recommendations For the Authors):

      I recognize that it was suggested to the authors by the previous Reviewing Editor to reduce the amount of statistical material to be made more suitable for a non-statistical audience, and so what I am about to say contradicts advice you were given before. But, with this revised version, I actually found it difficult to understand the particulars of the construction of the Bayes Factors and would have appreciated a few more sentences on the underlying models that fed into the calculations. In my opinion, the provided citations (e.g., Dienes Z. 2014. Using Bayes to get the most out of non-significant results) did not provide sufficient background to warrant a lack of more technical presentation here.

      Thank you for the feedback. We added a new “Appendix C: Technical details on Bayes factors” that provides technical details on the models, priors, and calculations underlying the Bayes factors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Bendzunas, Byrne et al. explore two highly topical areas of protein kinase regulation in this manuscript. Firstly, the idea that Cys modification could regulate kinase activity. The senior authors have published some standout papers exploring this idea of late, and the current work adds to the picture of how active site Cys might have been favoured in evolution to serve critical regulatory functions. Second, BRSK1/2 are understudied kinases listed as part of the "dark kinome" so any knowledge of their underlying regulation is of critical importance to advancing the field.

      Strengths:

      In this study, the authors pinpoint highly conserved, but BRSK-specific, Cys residues as key players in kinase regulation. There is a delicate balance between equating what happens in vitro with recombinant proteins relative to what the functional consequence of Cys mutation might be in cells or organisms, but the authors are very clear with the caveats relating to these connections in their descriptions and discussion. Accordingly, by extension, they present a very sound biochemical case for how Cys modification might influence kinase activity in cellular environs.

      Weaknesses:

      I have very few critiques for this study, and my major points are barely major.

      Major points

      (1) My sense is that the influence of Cys mutation on dimerization is going to be one of the first queries readers consider as they read the work. It would be, in my opinion, useful to bring forward the dimer section in the manuscript.

      We agree that the influence of Cys on BRSK dimerization is a topic of significant interest. Our primary focus was to explore oxidative regulation of the understudied BRSK kinases as they contain a conserved T-loop Cys, and we have previously demonstrated that equivalent residues at this position in related kinases were critical drivers of oxidative modulation of catalytic activity. We have demonstrated here that BRSK1 & 2 are similarly regulated by redox and this is due to oxidative modification of the T+2 Cys, in addition to Cys residues that are conserved amongst related ARKs as well as BRSK-specific Cys. Although we also provide evidence for limited redox-sensitive higher order BRSK species (dimers) in our in vitro analysis, these represent a small population of the total BRSK protein pool (this was validated by SEC-MALS analysis). As such, we do not have strong evidence to suggest that these limited dimers significantly contribute to the pronounced inhibition of BRSK1 & 2 in the presence of oxidizing agents, and instead believe that other biochemical mechanisms likely drive this response. This may result from oxidized Cys altering the conformation of the activation loop. Indeed, the formation of an intramolecular disulfide within the T-loop of BRSK1 & 2, which we detected by MS, is one such regulatory modification. It is noteworthy that intramolecular disulfide bonds within the T-loop of AKT and MELK have already been shown to induce an inactive state in the kinase, and we posit a similar mechanism for BRSKs.

      While we recognize the potential importance of dimerization in this context, our current data from in vitro and cell-based assays do not provide substantial evidence to assert dimerization as a primary regulatory mechanism. Hence, we maintained a more conservative stance in our manuscript, discussing dimerization in later sections where it naturally followed from the initial findings. That being said, we acknowledge the potential significance of dimerization in the regulation of the BRSK T-loop cysteine. We believe this aspect merits further investigation and could indeed be the focus of a follow-up study.

      (2) Relatedly, the effect of Cys mutation on the dimerization properties of preparations of recombinant protein is not very clear as it stands. Some SEC traces would be helpful; these could be included in the supplement.

      In order to determine whether our recombinant BRSK proteins (and T-loop mutants) existed as monomers or dimers, we performed SDS-PAGE under reducing and non-reducing conditions (Fig 7). This unambiguously revealed that a monomer was the prominent species, with little evidence of dimers under these experimental conditions (even in the presence of oxidizing agents). Although we cannot discount a regulatory role for BRSK dimers in other physiological contexts, we could not produce sufficient evidence to suggest that multimerization played a substantial role in modifying BRSK kinase activity in our assays. We note that our in vitro analysis was performed using truncated forms of the protein, and as such it is entirely possible that regions of the protein that flank the kinase domain may serve additional regulatory functions that may include higher order BRSK conformations. In this regard, although we have not included SEC traces of our recombinant proteins, we have included analytical SEC-MALS of the truncated proteins (Supplementary Figure 6) which we believe to be more informative. We have also now included additional SEC-MALS data for BRSK2 C176A and C183A (Supplementary Figure 6d and e), which supports our findings in Fig 7, demonstrating the presence of limited dimer species under non-reducing conditions.

      (3) Is there any knowledge of Cys mutants in disease for BRSK1/2?

      We have conducted an extensive search across several databases: COSMIC (Catalogue of Somatic Mutations in Cancer), ProKinO (Protein Kinase Ontology), and TCGA (The Cancer Genome Atlas). These databases are well-regarded for their comprehensive and detailed records of mutations related to cancer and protein kinases. Our analysis using the COSMIC and TCGA databases focused on identifying any reported instances of Cys mutations in BRSK1/2 that are implicated in cancer. Additionally, we utilized the ProKinO database to explore the broader landscape of protein kinase mutations, including any potential disease associations of Cys mutations in BRSK1/2. However, we found no evidence to indicate the presence of Cys mutations in BRSK1/2 that are associated with cancer or disease. This lack of association in the current literature and database records suggests that, as of our latest search, Cys mutations in BRSK1/2 have not been reported as significant contributors to pathogenesis.

      (4) In bar charts, I'd recommend plotting data points. Plus, it is crucial to report in the legend what error measure is shown, the number of replicates, and the statistical method used in any tests.

      We have added the data points to the bar charts and included statistical methods in figure legends.

      (5) In Figure 5b, the GAPDH loading control doesn't look quite right.

      The blot has been repeated and updated.

      (6) In Figure 7 there is no indication of what mode of detection was used for these gels.

      We have updated the figure legend to confirm that the detection method was western blot.

      (7) Recombinant proteins - more detail should be included on how they were prepared. Was there a reducing agent present during purification? Where did they elute off SEC... consistent with a monomer or higher order species?

      We have added ‘produced in the absence of reducing agents unless stated otherwise’ in the methods section to improve clarity. Although we have not added additional sentences to describe the elution profile of the BRSK proteins by SEC during purification, we believe that the inclusion of analytical SEC-MALS data is sufficient evidence that the proteins are largely monomeric under non-reducing conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this study by Bendzunas et al, the authors show that the formation of intra-molecular disulfide bonds involving a pair of Cys residues near the catalytic HRD motif and a highly conserved T-Loop Cys with a BRSK-specific Cys at an unusual CPE motif at the end of the activation segment function as repressive regulatory mechanisms in BRSK1 and 2. They observed that mutation of the CPE-Cys only, contrary to the double mutation of the pair, increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family. Understanding the molecular mechanisms underlying kinase regulation by redox-active Cys residues is fundamental as it appears to be widespread in signaling proteins and provides new opportunities to develop specific covalent compounds for the targeted modulation of protein kinases.

      The authors demonstrate that intramolecular cysteine disulfide bonding between conserved cysteines can function as a repressing mechanism as indicated by the effect of DTT and the consequent increase in activity by BRSK1 and 2 (WT). The cause-effect relationship of why mutation of the CPE-Cys only increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells is not clear to me. The explanation given by the authors based on molecular modeling and molecular dynamics simulations is that oxidation of the CPE-Cys (that will favor disulfide bonding) destabilizes a conserved salt bridge network critical for allosteric activation. However, no functional evidence of the impact of the salt-bridge network is provided. If you mutate the two main Cys-pairs (aE-CHRD and A-loop T+2-CPE), you lose the effect of DTT, as the disulfide pairs cannot be formed and hence no repression takes place. However, when looking at individual residues, I do not understand why mutating the CPE only results in the opposite effect, unless it is independent of its connection with the T+2 residue on the A-loop.

      Strengths:

      This is an important and interesting study providing new knowledge in the protein kinase field with important therapeutic implications for the rational design and development of next-generation inhibitors.

      Weaknesses:

      There are several issues with the figures that this reviewer considers should be addressed.

      Reviewer #1 (Recommendations for The Authors):

      Major points

      Page 26 - the discussion could be more concise. There's an element of recapping the results, which should be avoided.

      Regarding the conciseness of the discussion section, we have thoroughly revised it to ensure a more succinct presentation, deliberately avoiding the recapitulation of results. The revised discussion now focuses on interpreting the findings and their implications, steering clear of redundancy with the results section.

      Figure 1b seems to be mislabeled/annotated. I recommend checking whether the figure legends match more broadly. Figure 1 appears to be incorrectly cited throughout the results.

      Thank you for pointing out the discrepancies in the labeling and citation of Figure 1b. We have carefully reviewed and corrected these issues to ensure that all figure labels, legends, and citations accurately reflect the corresponding data and illustrations. We appreciate your attention to detail and the opportunity to improve the clarity and accuracy of our presentation.

      Figure 6 - please include a color-coding key in the figure. Further support for these simulations could be provided by supplementary movies or plots of the interaction. Figure 4 colour palette should be adjusted for the spheres in the Richardson diagrams to have greater distinction.

      As suggested, we have amended the colour palette in Figure 4 to improve conformity throughout the figure.

      Minor points

      Figure 2 - it'd be helpful to know what the percentage coverage of peptides is.

      We have updated the figure legend to include peptide coverage for both proteins.

      Some typos - Supp 2 legend "Domians".

      Fixed

      Figure 6 legend - analyzed by needs a space;

      Fixed

      Fig 8 legend schematic misspelled.

      Fixed

      Broadly, if you Google T-loop you get a pot pourri of enzyme answers. Why not just use Activation loop?

      The choice of "T-loop" over "Activation loop" in our manuscript was made to maintain consistency with other literature in the field, and in particular our previous paper “Aurora A regulation by reversible cysteine oxidation reveals evolutionarily conserved redox control of Ser/Thr protein kinase activity” where we refer to the activation loop cysteine as T-loop + 2. We acknowledge the varied enzyme contexts in which "T-loop" is used and agree on the importance of clarity. To address this, we made an explicit note in the manuscript that the "T-loop" is also referred to as the "Activation loop", ensuring readers are aware of the interchangeable use of these terms. Additionally, this nomenclature facilitates a more straightforward designation of cysteine residues within the loop (T+2 Cysteine). We believe this approach balances adherence to established conventions with the need for clarity and precision in our descriptions.

      Methods - what is LR cloning. Requires some definition. Some manufacturer detail is missing in methods, and referring to prior work is not sufficient to empower readers to replicate.

      We agree, and have added the following to the methods section:

      “BRSK1 and 2 were sub-cloned into pDEST vectors (to encode the expression of N-terminal Flag or HA tagged proteins) using the Gateway LR Clonase II system (Invitrogen) according to the manufacturer’s instructions. pENTR BRSK1/2 clones were obtained in the form of Gateway-compatible donor vectors from Dr Ben Major (Washington University in St. Louis). The Gateway LR Clonase II enzyme mix mediates recombination between the attL sites on the Entry clone and the attR sites on the destination vector. All cloned BRSK1/2 genes were fully sequenced prior to use.”

      Page 7 - optimal settings should be reported. How were pTau signals quantified and normalised?

      We have added the following to the methods section:

      “A two-color Western blot detection method employing infrared fluorescence was used to measure the ratio of Tau phospho serine 262 to total Tau. Total GFP Tau was detected using a mouse anti GFP antibody and visualized at 680 nm using goat anti mouse IRDye 680, while phospho-Tau was detected using a Tau phospho serine 262 specific antibody and visualized at 800 nm using goat anti rabbit IRDye 800. Imaging was performed using a LI-COR Odyssey CLx with scan control settings set to 169 μm, medium quality, and 0.0 mm distance. Quantification was performed using LI-COR Image Studio on the raw image files. The phospho-Tau to total Tau ratio was determined as the ratio of the fluorescence intensities measured at 800 nm (pTau) to those at 680 nm (total Tau).”
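      The ratio calculation described in the methods text above can be sketched as follows. This is a minimal illustration only: the lane names and intensity values are hypothetical, and the additional step of expressing each ratio relative to a reference lane (here WT) is an assumption that is not stated in the response.

```python
def ptau_ratio(intensity_800, intensity_680):
    """Ratio of the 800 nm signal (pTau) to the 680 nm signal (total Tau)."""
    if intensity_680 <= 0:
        raise ValueError("total Tau (680 nm) intensity must be positive")
    return intensity_800 / intensity_680


def relative_to_reference(ratios, reference):
    """Express each lane's ratio relative to a chosen reference lane (assumed step)."""
    ref_value = ratios[reference]
    return {lane: value / ref_value for lane, value in ratios.items()}


if __name__ == "__main__":
    # Hypothetical raw intensities per lane: (800 nm pTau, 680 nm total Tau).
    lanes = {
        "WT": (1200.0, 4000.0),
        "mutant_A": (2100.0, 3500.0),
        "kinase_dead": (150.0, 3800.0),
    }
    ratios = {lane: ptau_ratio(i800, i680) for lane, (i800, i680) in lanes.items()}
    for lane, value in relative_to_reference(ratios, "WT").items():
        print(lane, round(value, 2))
```

      Dividing by the total Tau signal from the same lane corrects for differences in Tau expression and loading between lanes, which is the usual motivation for two-color detection.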

      In the Figure 6g-j legend, the salt bridge is incorrectly annotated as E185-R248 rather than 258.

      Fixed

      Lines 393-395 provide a repeat statement on BRSKs phosphorylating Tau (from 388-389).

      We have removed the repetition and reworded the opening lines of the results section to improve the overall flow of the manuscript.

      Supp. Figure 1 is difficult to view - would it be possible to increase the size of the phylogenetic analysis?

      We thank the reviewer for this observation. We have rotated (90°) and expanded the figure so that it can be more clearly viewed.

      Supp. Figure 2 - BRSK1/2 incorrectly spelled.

      Fixed

      Please check the alignment of labels in Supp. Figure 3e.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1, current panel b is not mentioned/described in the figure legend and as a consequence, the rest of the panels in the legends do not fit the content of the figure.

      Reviewer 1 also noted this error, and we have amended the manuscript accordingly.

      What is the rationale for using the HEK293T cells as the main experimental/cellular system? Are there cell lines that express both proteins endogenously so that the authors can recapitulate the results obtained from ectopic overexpression?

      The selection of HEK-293T cells was driven by their well-established utility in overexpression studies, which make them ideal for the investigation of protein interactions and redox regulation. This cell line's robust transfection efficiency and well-characterized biology provide a reliable platform for dissecting the molecular mechanisms underlying the redox regulation of proteins. Furthermore, the use of HEK-293T cells aligns with the broader scientific practice, serving as a common ground for comparability with existing literature in the field of BRSK1/2 signaling, protein regulation and interaction studies.

      The application of HEK-293T cells as a model system in our study serves as a foundational step towards eventually elucidating the functions of BRSK1/2 in neuronal cells, where these kinases are predominantly expressed and play critical roles. Given the fact that BRSKs are classed as ‘understudied’ kinases, the choice of a HEK-293T co-overexpression system allowed us to analyze the direct effects of BRSK kinase activity (using phosphorylation of Tau as a readout) in a cellular context and in a more controlled manner. This approach not only aids in the establishment of a baseline understanding of the redox regulation of BRSK1/2, but also sets the stage for subsequent investigations in more physiologically relevant neuronal models.

      In current panel d, could the authors recapitulate the same experimental conditions as in current panel c?

      Figure 1 panel c shows that both BRSK1 and 2 are reversibly inhibited by oxidizing agents such as H2O2, whilst panels d and e show the concentration dependent activation and inhibition of the BRSKs with increasing concentrations of DTT and H2O2 respectively. The experimental conditions were identical, other than changing amounts of reducing and oxidizing agents, and used the same peptide coupled assays. Data for all experiments were originally collected in ‘real time’ as depicted in Fig 1c (increase in substrate phosphorylation over time). However, to aid interpretation of the data, we elected to present the latter two panels as dose response curves by calculating the change in the rate of enzyme activity (shown as pmol phosphate incorporated into the peptide substrate per min) for each condition. To aid the reader, we now include an additional supplementary figure (new supplementary figure 2) depicting BRSK1 and 2 dependent phosphorylation of the peptide substrate in the presence of different concentrations of DTT and H2O2 in a real time (kinetic) assay. The new data shown is a subset of the unprocessed data that was used to calculate the rates of BRSK activity in Fig 1d & e.
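      The conversion from real-time traces to dose-response points described above can be sketched as a slope calculation. This is a minimal illustration assuming that each rate is the least-squares slope of phosphate incorporated against time over a linear portion of the trace; the fitting procedure actually used is not stated in the response, and the time points and values below are hypothetical.

```python
def initial_rate(times_min, pmol_phosphate):
    """Least-squares slope of product (pmol) against time (min), i.e. pmol/min."""
    n = len(times_min)
    if n < 2 or n != len(pmol_phosphate):
        raise ValueError("need at least two matched time/product points")
    mean_t = sum(times_min) / n
    mean_y = sum(pmol_phosphate) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(times_min, pmol_phosphate))
    return sxy / sxx


if __name__ == "__main__":
    times = [0.0, 5.0, 10.0, 15.0, 20.0]
    # Hypothetical kinetic traces keyed by DTT concentration (mM).
    traces = {
        0.0: [0.0, 0.4, 0.9, 1.3, 1.8],
        1.0: [0.0, 2.1, 4.0, 6.2, 8.1],
        10.0: [0.0, 4.9, 10.2, 14.8, 20.1],
    }
    # One rate per condition gives one point on the dose-response curve.
    for dtt_mM, trace in sorted(traces.items()):
        print(dtt_mM, round(initial_rate(times, trace), 3))
```

      Each kinetic trace collapses to a single rate, so a panel of traces at increasing DTT or H2O2 concentrations becomes one dose-response curve.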

      Why did the authors use full-length constructs in these experiments and did not in e.g. Figure 2 where they used KD constructs instead?

      In the initial experiments, illustrated in Figure 1, we employed full-length protein constructs to establish a proof of concept, demonstrating the overall behavior and interactions of the proteins in their full-length form. This confirmed that BRSK1 & 2, which both contain a conserved T+2 Cys residue that is frequently prognostic for redox sensitivity in related kinases, displayed a near-obligate requirement for reducing agents to promote kinase activity.

      Subsequently, in Figure 2, our focus shifted towards delineating the specific regions within the proteins that are critical for redox regulation. By using constructs that encompass only the kinase domain, we aimed to demonstrate that the redox-sensitive regulation of these proteins is predominantly mediated by specific cysteine residues located within the kinase domain itself. This strategic use of the kinase domain of the protein allowed for a more targeted investigation. Furthermore, in our hands these truncated forms of the protein were more stable at higher concentrations, enabling more detailed characterization of the proteins by DSF and SEC-MALS. We predict that the flanking disordered regions of the full-length protein (as predicted by AlphaFold) contribute to this effect.

      (2) In Figure 2, did the authors try to do LC/MS-MS in the same experimental conditions as in Figure 1 (e.g. buffer minus/plus DTT, H2O2, H2O2 + DTT)?

      We would like to clarify that the mass spectrometry experiments were conducted exclusively on proteins purified under native (non-reducing) conditions. We did not extend the LC/MS-MS analyses to include proteins treated with various buffer conditions such as minus/plus DTT, H2O2, or H2O2 + DTT as used in the experiments depicted in Figure 1. Given that we could readily detect disulfides in the absence of oxidizing agents, we did not see the benefit of additional treatment conditions as peroxide treatment of protein samples can frequently complicate interpretation of MS data. However, it should be noted that prior to MS analysis, tryptic peptides were subjected to a 50:50 split, with one half alkylated in the presence of DTT (as described in the methods section) to eliminate disulfides and other transiently oxidized Cys forms. Comparative analysis between reduced and non-reduced tryptic peptides improved our confidence when assigning disulfide bonds (which were eliminated in identical peptides in the presence of DTT).

      On panel b, why did the authors show AlphaFold predictions and not empirical structural information (e.g. X-ray, EM)?

      The AlphaFold models were primarily utilized to map the general locations of redox-sensitive cysteine pairs within the proteins of interest. Although we have access to the crystal structure of mouse BRSK2, it does not fully capture the active conformation seen in the AlphaFold model of the human version. The use of AlphaFold models for human proteins in this study aids in consistently tracking residue numbering across the manuscript, offering a useful framework for understanding the spatial arrangement of these critical cysteine pairs in their potentially active-like states. This approach facilitates our analysis and discussion by providing a reference for the structural context of these residues in the human proteins.

      What was the rationale for using the KD construct and not the FL as in Figure 1?

      The rationale to use the kinase domain was primarily based on the significantly lower confidence in the structural predictions for regions outside the kinase domain (KD). Our experimental focus was to investigate the role of conserved cysteine residues within the kinase domain, which are critical for the protein's function and regulation. This targeted approach allowed us to concentrate our analyses on the most functionally relevant and structurally defined portion of the protein, thereby enhancing the precision and relevance of our findings. As is frequently the case, truncated forms of the protein, consisting only of the kinase domain, are much more stable than their full length counterparts and are therefore more amenable to in vitro biochemical analysis. In our hands this was true for both BRSK1 and 2, and as such much of the data collected here was generated using kinase-domain (KD) constructs. Simulations using the KD structures are therefore much more representative of our original experimental setup.

      The BRSK1 KD construct appears to be rather inactive and not responsive to DTT treatment. Could the authors comment on the differences observed with the FL construct of Figure 1?

      It is important to note that BRSK1, in general, exhibits lower intrinsic activity compared to BRSK2. This reduced activity could be attributed to a range of factors, including the need for activation by upstream kinases such as LKB1, as well as potential post-translational modifications (PTMs) that may be absent in the bacterially expressed KD construct. The full-length forms of the protein were purified from Sf21 cells, and as such may have additional modifications that are lacking in the bacterially derived KD counterparts. We also cannot discount additional regulatory roles of the regions that flank the KD, and these may contribute in part to the modest discrepancy observed between constructs. Despite these differences, it is crucial to emphasize that both the KD and FL constructs of BRSK1 are regulated by DTT, indicating a conserved redox-dependent activation for both of the related BRSK proteins.

      (3) In Figure 4, on panel A wouldn´t the authors expect that mutating on the pairs e.g. C198A in BSK1 would have the same effect as mutating the C191 from the T+2 site? Did they try mutating individual sites of the aE/CHRD pair? The same will apply to BSK2

      We appreciate the insightful comment. It's important to clarify that the redox regulation of these proteins is influenced not solely by the formation of disulfide bonds but also by the oxidation state of individual cysteine residues, particularly the T+2 Cys. This nuanced mechanism of regulation allows for a diverse range of functional outcomes based on the specific cysteine involved and its state of oxidation. This aspect forms a key finding of our paper, highlighting the complexity of redox regulation beyond mere disulfide bond formation. For example, AURA kinase activity is regulated by oxidation of a single T+2 Cys (Cys290, equivalent to Cys191 and Cys176 of BRSK1 and 2 respectively), but this regulation can be supplemented through artificial incorporation of a secondary Cys at the DFG+2 position (Byrne et al., 2020). This targeted genetic modification or AURA mirrors equivalent regulatory disulfide-forming Cys pairs that naturally occur in kinases such as AKT and MELK, and which provide an extra layer of regulatory fine tuning (and a possible protective role to prevent deleterious over oxidation) to the T+2 Cys. We surmise that the CPE Cys is also an accessory regulatory element to the T+2 Cys in BRSK1 +2, which is the dominant driver of BRSK redox sensitivity (as judged by the fact that CPE Cys mutants are still potently regulated by redox [Fig 4]), by locking it in an inactive disulfide configuration.

      In our preliminary analysis of BRSK1, we observed that mutations of individual sites within the aE/CHRD pair were similarly detrimental to kinase activity as a tandem mutation (see Author response image 1). As discussed in the manuscript, we think that these Cys may serve important structural regulatory functions and opted to focus on co-mutations of the aE/CHRD pair for the remainder of our investigation.

      Author response image 1.

      In vitro kinase assays showing rates of in vitro peptide phosphorylation by WT and Cys-to-Ala (aE/CHRD residues) variants of BRSK1 after activation by LKB1.

      In panels C and D, the same experimental conditions should have been measured as in A and B.

      Panels A and B were designed to demonstrate the enzymatic activity and the response to DTT treatment to establish the baseline redox regulation of the kinase and a panel of Cys-to-Ala mutant variants. In contrast, panels C and D were specifically focused on rescue experiments with mutants that showed a significant effect under the conditions tested in A and B. These panels were intended to further explore the role of redox regulation in modulating the activity of these mutants, particularly those that retained some level of activity or exhibited a notable response to redox changes.

      The rationale for this experimental design was to prioritize the investigation of mutants, such as those at the T+2 and CPE cysteine sites, which provided the most insight into the redox-dependent modulation of kinase activity. Other mutants, which resulted in inactivation, were deprioritized in this context as they offered limited additional information regarding the redox regulation mechanism. This focused approach allowed us to delve deeper into understanding how specific cysteine residues contribute to the redox-sensitive control of kinase function, aligning with the overall objective of elucidating the nuanced roles of redox regulation in kinase activity.

      (4) In figure 5: Why did the authors use reduced Glutathione instead of DTT? The authors should have recapitulated the same experimental conditions as in Figure 4 and not focused only on the T+2 or the CPE single mutants but using the double and the aE/CHRD mutants as well, as internal controls and validation of the enzymatic assays using the modified peptide

      Regarding the use of reduced glutathione (GSH) instead of DTT in Figure 5, we chose GSH for its well characterized biological relevance as an antioxidant in cellular responses to oxidative stress. Furthermore, while DTT has been widely used in experimental setups, it is also potentially cytotoxic at high concentrations.

      Addressing the point on experimental consistency with Figure 4, we appreciate the suggestion and indeed had already conducted such experiments (Previously Supp Fig 3, now changed to current Supp Fig 4). These experiments include analyses of BRSK mutant activity in a HEK-293T model. However, we chose not to focus on inactivating mutants (such as the aE/CHRD mutants which had depleted expression levels possibly as a consequence of compromised structural integrity) or pursue the generation of double mutant CMV plasmids, as these were deemed unlikely to add significant insights into the core narrative of our study. Our focus remained on the mutants that yielded the most informative results regarding the redox regulation mechanisms in the in vitro setting, ensuring a clear and impactful presentation of our findings.

      A time course evaluation of the reducing or oxidizing reagents should have been performed. In WT samples in the presence of GSH, and also in the case of the CPE mutant, would we expect an increase in the levels of Tau phosphorylation as a readout of BRSK1/2 activity?

      We acknowledge the importance of such analyses in understanding the dynamic nature of redox regulation on kinase activity and have included a time course (Supp Fig 2 e-g). These results confirm a depletion of Tau phosphorylation over time in response to peroxide generated by the enzyme glucose oxidase.

      (5) In Figure 6, did the authors look at the functional impact of the residues that interact with the T+2 and CPE motifs, e.g. T174 and the E185-R258 tether?

      Our primary focus was on the salt bridges, as this is a key regulatory structural feature that is conserved across many kinases. Regarding the additional interactions mentioned, we have thoroughly evaluated their roles and dynamics through molecular dynamics (MD) simulations but did not find any results of significant relevance to warrant inclusion.

      (6) In Figure 7: Did the authors look at the oligomerization state of the BRSK1/2 multimers under non-reducing conditions? Were they also observed in the case of the FL constructs? What was the stoichiometry?

      Our current work indicates that the kinase domain of BRSK1-2 primarily exists in a monomeric state, with some evidence of dimerization or multimer formation under specific conditions. Our SEC-MALS (Supp Fig 6) and SDS-PAGE analyses (Figure 7) clearly demonstrate that monomers are overwhelmingly the dominant species under non-reducing conditions (>90%). We also conclude that these limited oligomeric species can be removed by inclusion of reducing agents such as DTT (Figure 7), which may suggest a role for one or more Cys residues. Notably, removal of the T+2 Cys was insufficient to prevent multimerization.

      We were unable to obtain reliable SEC-MALS data for the full-length forms of the protein, likely due to the presence of disordered regions that flank the kinase domain which results in a highly heterodispersed and unstable preparation (at the concentrations required for SEC-MALS). Although we are therefore unable to comment on the stoichiometry of FL BRSK dimers, we can detect BRSK1 and 2 hetero- and homo-complexes in HEK-293T cells by IP, which supports the existence of limited BRSK1 & 2 dimers (Supp Fig 6a). However, we were unable to detect intermolecular disulfide bonds by MS, although this does not necessarily preclude their existence. The physiological role of BRSK multimerization (if any) and establishing specifically which Cys residues drive this phenomenon is of significant interest to our future investigations.

    1. Author response:

      We thank the reviewers for their attention to our study and for their fair and reasonable assessment of the strengths and weaknesses of our work. We believe the reviewers adequately captured both the potential implications of our work as well as its major current limitations. As both reviewers noted, we believe the work presented in this manuscript is an exciting first step in adapting minibinders as antigen sensors for synthetic receptors but many questions remain before these new tools can be widely adopted. We hope that this work will catalyze others to try minibinders as potential antigen sensors when developing novel synthetic receptors, and we hope that future work will more thoroughly test a wide range of linkers to better optimize antigen sensor function across synthetic receptors.

      In our future work, we intend to evaluate a greater diversity of minibinders across different relevant therapeutic targets. We are working to test both existing minibinders as well as generate novel minibinders using deep-learning-based de novo protein design methods. We further hope to explore additional linker modifications, especially focusing on modifications that will allow minibinder coupled-synthetic receptors to escape the glycocalyx of engineered cells. We hope to share findings on these topics in either an update to this manuscript or in future manuscripts, depending on the results of our studies in progress.

      Finally, reviewers noted a mismatch in the data displayed in Figure 5A and 5C, whereby LCB-CAR-expressing cells induced higher lysis in Figure 5C than in Figure 5A. This is due to Figure 5C showing only 24 hours of incubation between effector and target cells, as opposed to the 72 hours of incubation that is quantitated in 5A. These mismatched timepoints were selected because linker-dependent differences in lysis were most readily apparent at 24 hours and were negligible at 72 hours. The full time course of lysis for this experiment can be seen in Supplemental Figure 2D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thorough review of and overall positive comments on our manuscript. We have revised the manuscript to address most of the concerns raised. Below is a point-by-point response to the reviewers’ comments outlining these changes.

      The novelty of the study is compromised due to the recently published structure of unliganded P-Rex1 (Chang et al. 2022). The unliganded and IP4-bound structures of P-Rex1 appear virtually identical, however, no clear comparison is presented in the manuscript. In the same paper, a very similar model of P-Rex1 activation upon binding to PIP3 membranes and Gbeta/gamma is presented.

      This comparison has been added as Supplemental Figure 5. Although similar models of activation are presented in our manuscript and in that of Chang et al. 2022, our model is extended to incorporate inhibition by IP4 and other aspects of regulation not previously incorporated, shown in both schematic form (Figure 6B) and including supporting data (Figure 6A). We also point out that in the work by Chang et al. they used domain insertions to stabilize the structure, and here we present the native protein structure. It turns out that they look similar, but our work reduces concerns over possible engineering artifacts. Finally, our model is further informed by HDX-MS measurements of the enzyme bound to PIP3 in liposomes (Figure 6A and Supplemental figure 8), which reveal the regions of the protein subject to higher dynamics and are consistent with a more fully extended conformation.

      The authors demonstrate that IP4 binding to P-Rex1 results in catalytic inhibition and increased protection of autoinhibitory interfaces, as judged by HDX. The relevance of this in a cellular setting is not clear and is not experimentally demonstrated. Further, mechanistically, it is not clear whether the biochemical inhibition by IP4 of PIP3 activated P-Rex1 is due to competition of IP4 with activating PIP3 binding to the PH domain of P-Rex1, or due to stabilizing the autoinhibited conformation, or both.

      We feel that both occur. IP4 and PIP3 bind to the same site of the PH domain, thus they must be competitive at the very least. We also show that IP4 stabilizes the autoinhibited conformation (based on both our cryo-EM and HDX-MS data). Because PIP3 does not activate either DH/PH or DH/PH-DEP1 (nor does IP4 inhibit, see Sup. Fig. 1), it is not possible for us to tell with this suite of experiments how much the inhibition is due to competition versus stabilization of the autoinhibited conformation.

      It is difficult to judge the error in the HDX experiments presented in Sup. data 1 and 2. In the method section, it is stated that the results represent the average from two samples. How is the SD error calculated in Fig.1B-C?

      To clarify, the following passages have been revised:

      Figure 1 legend – “Graphs show the exchange over time for select regions in the P-Rex1 (B) PH domain and (C) an IP4P region that was disordered in the P-Rex1–Gβγ structure. Shown is the average of two experiments with error bars representing the mean ± standard deviation.” Methods section – “Each sample was analyzed twice by HDX-MS, and the data shown in graphs represent the average of these experiments. For each peptide, the average of all five time points was calculated and used to plot the difference data onto the coordinates.”
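
      To make the "mean ± standard deviation of two experiments" concrete, the sketch below computes both quantities for one peptide at one time point. It assumes the sample (n − 1) standard deviation, which for two replicates reduces to |x1 − x2| / √2; the response does not state which estimator was used, and the uptake values are invented for illustration only.

```python
import math
import statistics


def mean_and_sd(replicates):
    """Return (mean, sample standard deviation) for a list of replicate values."""
    return statistics.mean(replicates), statistics.stdev(replicates)


# Hypothetical fractional deuterium uptake from two HDX-MS runs (not real data).
uptake = [0.42, 0.48]
mean, sd = mean_and_sd(uptake)

# With exactly two replicates the sample SD equals |x1 - x2| / sqrt(2).
shortcut = abs(uptake[0] - uptake[1]) / math.sqrt(2)
print(f"mean = {mean:.3f}, SD = {sd:.4f}, |x1 - x2|/sqrt(2) = {shortcut:.4f}")
```

      With only two replicates the error bar is determined entirely by the difference between the two runs, which is why it says little about outliers.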

      As mentioned, from the explanations in the manuscript it is difficult to judge the differences between the unliganded and the IP4 bound structure. A superposition, pointing to the main differences, would help. Are there any additional interactions observed that could explain a more stable autoinhibitory conformation?

      Added as Supplemental Figure 5. Although there are global shifts in some of the domains, the overall structures are similar to one another. Due to the moderate resolution of both structures (~4.2 Å), accurate placement of sidechains is difficult, in some places more than others. Because of this, we cannot pinpoint many specific sidechain interactions with certainty. There are no obvious interactions observed in our IP4 bound structure compared to that of 7SYF that would explain a more stable autoinhibited conformation, and thus the evidence comes primarily from the HDX-MS data.

      The cellular significance of IP4 regulation is not clear. Finding a way to manipulate intracellular IP4 levels and showing that this affects P-Rex1 cellular activity would greatly increase the significance of this finding.

      We agree that this would be an informative experiment, but not one that we currently have the means to perform.

      From the presented data it is not clear if inhibition by IP4 is due to competition with PIP3 or due to the proposed stabilization of P-Rex1 autoinhibition. Performing a study as shown in Fig.1D, but with the DH/PH construct could resolve this question.

      First, please see our response to the similar concern from Reviewer 1 above. It is not possible for us to test the DH/PH construct and assess if there is direct competition with PIP3. To emphasize this point (and to correct the error that we never made a call to Sup. Fig. 1C in the original manuscript), we added the following lines to the first paragraph of the Results.

      “Negatively charged liposomes (containing PC/PS), including those that also contain PIP3, unexpectedly inhibit the GEF activity of the DH/PH-DEP1 and DH/PH fragments (Sup. Fig. 1C). Because full-length P-Rex1 is not affected by PC/PS liposomes, this suggests that the observed inhibition represents a non-productive interaction of the DH/PH-DEP1 and DH/PH fragments with negatively charged surfaces in our assay. The lack of activation of DH/PH-DEP1 by PIP3 prevents us from testing whether IP4 can inhibit via direct competition with PIP3.”

      If I understand correctly, the data shown in Supplementary Data 1 and 2 are averages of 2 measurements, which makes it difficult to judge real signals from outliers. Perhaps, rather than showing the average, the results from the two experiments could be shown. Also, please explain how the SD error is calculated in Fig.1B-C if the data points indeed are averages of 2 measurements.

      We are sorry for the confusion. The data shown in Sup. Data 1 and 2 are not averages of two experiments. The Methods section has therefore been modified to read: “Each image in Supplemental Data 1 and 2 shows one experiment (rainbow plots) or a difference analysis from those experiments (red to blue plots). Only one of the two sets of experiments performed for each condition (+/- liposomes or +/- IP4) is shown here.” As described above, text has been added to clarify the SD error calculated in Fig. 1B and 1C.

      The authors claim that the data presented in Fig 4B suggests that the salt bridge formed by K207 and E251 is important for autoinhibition. If so, the authors should explain why the K207C mutant is not activated.

      Multiple reviewers had problems with this panel, and we now recognize that we misinterpreted the data, which did not help with this. Because this data is largely just supportive of our structure and SAXS data, Figure 4 was moved to the Supplement and this section of the results now reads:

      “Flexibility of the hinge in the α6-αN helix of the DH/PH module is important for autoinhibition.

      One of our initial goals in this project was to determine a high-resolution structure of the autoinhibited DH/PH-DEP1 core by X-ray crystallography. To this end, we started with the DH/PH-DEP1 A170K variant, which was more inhibited than wild-type but still dynamic, and then introduced S235C/M244C and K207C/E251C double mutants to completely constrain the hinge in the α6-αN helix via disulfide bond formation in a redox sensitive manner. Single cysteine variants K207C and M244C were generated as controls. The S235C/M244C variant performed as expected, decreasing the activity of the A170K variant to nearly background in the oxidized but not the reduced state (Supplemental Fig. 4). However, the M244C single mutant exhibited similar effects, suggesting that it forms disulfide bonds with cysteine(s) other than S235C. Indeed, the side chains of Cys200 and Cys234 are very close to that of M244C. The K207C/E251C mutant was similar to S235C/M244C under oxidized conditions, but ~15-fold more active (similar to WT DH/PH levels, see Fig. 3C) under reducing conditions. The K207C variant, on the other hand, exhibited higher activity than A170K on its own under oxidizing conditions, but similar activity to all the variants except K207C/E251C when reduced. These results suggest that K207C/E251C in a reduced state and K207C in an oxidized state favor a configuration where the DEP1 domain is less able to engage the DH domain and maintain the kinked state. The mechanism for this is not known. Regardless, these data show that perturbation of contacts between the kinked segments of the α6-αN helix can have profound consequences on the activity of the DH/PH-DEP1 core.”

      In the low-resolution cryo-EM study, it is mentioned that only a few classes exhibit the extra density that ultimately corresponds to autoinhibited P-Rex1. If so, is this also the case in the high-resolution study and how many of the most populated classes contribute to the autoinhibited structure? It would be informative for the reader to provide this information.

      Indeed, only a small subset of the particles are in the autoinhibited conformation in the Krios data set, similar to the Glacios. How many classes these particles partition to depends on how many classes are requested during 2D classification and how many “garbage” particles are present at the different stages of particle stack cleaning during 2D classification. Also, because of the preferred orientation problem, many of the particles in this conformation segregate together during 2D classification. Therefore, in addition to the information shown in Sup. Fig. 2, we think a more informative metric to answer the reviewer’s question is the number of particles at the start of data processing compared to at the end, which is shown in Table 1.

      Page 10, line 217: "The kink .... is important for autoinhibition". It seems unlikely that there is no kink in the activated state. Perhaps it should say something like "Mobility in the kink is important ..."

      Agreed. In fact, the SAXS data we reported on the DH/PH module in Ravala et al. (2020) are most consistent with a DH/PH that exhibits both extended and condensed conformations in solution.

      Fig. 4A: It would help to label helices alpha6 and alphaN.

      These helices have now been labeled.

      Page 11, lines 223 and 228 are contradictory: In line 223 it is stated that K207C/E251C exhibit reduced GEF activity, while on line 228 it says this has little effect under non-reducing conditions.

      We thank the reviewer for this catch. We have modified the text to make it self-consistent.

      In Fig.5B, it would help if the authors mention in the legend that a trans-well migration assay was used, in order to know what the increase in stained cells signifies.

      The legend has been modified to include this information.

      The previous work by Chang et al., 2022 (PMID: 35864164) found that the final DH domain α6 formed the hinge helix (the kink in this manuscript), which undergoes a significant conformational change between closed and opened conformations of P-Rex1. Could the authors discuss the state of the kink in the presence of IP4 and in the P-Rex1 variants A170K and L177E?

      We have now included an alignment of our structure in the presence of IP4 with the Chang et al., 2022 structure (Supplemental Figure 5). There is very little difference in the kink region. Because the A170K variant exhibits reduced GEF activity and a smaller Dmax, it could be speculated that the kink might be further stabilized as compared to wild-type. The L177E variant exhibited activity similar to that of DH/PH alone, implying a relief of the kink. This interpretation is supported by our SAXS analysis of A170K and L177E in Fig. 3.

      I am a bit confused about the set of experiments with the intended DH-DEP1 interface disruptive mutation A170K, which later turned out to enhance P-Rex1 activity inhibition. The authors explained that the DH K170 salt bridges with DEP1 Glu411 stabilize the DH-DEP1 interaction. Next, the authors used P-Rex1 A170K mutant as the backbone for the introduction of disulfide bonds to block the closed configuration of the DH-PH hinge region by creating some mutants S235C/M244C and K207C/E251C. The first intended C235-C244 disulfide bond did not show any effect on the GEF activity because C235 is so close to the native C234 for a potential disulfide bond. I would recommend putting the data of S235C/M244C into a supplemental figure. Also, I am wondering if the GEF activity measurements in Fig 4B could be performed in the presence or absence of IP4 to see whether the IP4-induced autoinhibition form is distinct from the natural autoinhibitory once the kink was unblocked by reducing agent DTT.

      The confusion was warranted by our poor analysis of this data, rectified as discussed above.

      With regards to experiments plus/minus IP4, due to the absence of the IP4P domain, IP4 had no inhibitory effect on the activity of DH/PH or DH/PH-DEP1 (Supplemental Figure 1A and 1B) and as such this experiment would not likely be informative (or at best very hard to interpret).

      For the IP4 versus PIP3 activity assays, the authors indicated that P-Rex1 inhibition is dependent on the Inositol 3-phosphate. Have the authors tested and could they test with either Ins (1,3,4)P3 or Ins(1,3,5)P3?

      In these assays (Figure 1D), we show that inhibition does not occur with Ins(1,4,5)P3. Based on previous structures of IP4 bound to the PH domain and supporting biochemical assays (Cash et al., 2016, Structure), the 3- and 4-phosphates are the most highly coordinated and the next most thermostabilizing headgroup other than IP4 was Ins(1,3,4)P3. Therefore, we would anticipate that Ins(1,3,4)P3 might stabilize the autoinhibited state, perhaps at higher concentrations, but we have not directly tested this.

      The authors should provide the electron density maps of the P-REX1-IP4 complex in the supplemental figure and highlight the maps for two key interactions between DEP1 and DH and between PH and IP4P 4-helix bundle subdomain.

      The Coulomb potential map of this complex is shown in Figure 2A. Due to the moderate resolution of the reconstruction, side chain details cannot be unambiguously modeled at these interfaces, which is why we do not highlight any observed, specific interactions between sidechains.

      The manuscript was written very well and there is only one typing error in the legend of Supplemental Figure 1.

      Thank you for this catch.

      Details of EM density at significant domain interfaces and at the IP4 binding site should be provided as supplementary material.

      Beyond our comment about interfaces above, we have now provided the map representing the bound IP4 as Figure 4B.

      Line 123: It is difficult to discern in Figure 2A the "severe bend" in the helix that connects the DH and PH domains. It was not apparent (to me, at least) where this helix is located until eventually encountering Figure 4. It would be helpful to highlight or label (maybe with an asterisk) the bend site in Fig 2A.

      This has been labeled in Figure 2A.

      Line 125-126: likewise, It would be helpful to the reader to highlight the GTPase binding site in the DH domain.

      This has been labeled in Figure 2A.

      Line 159. Consider adding a supplementary figure showing a superposition of the two pREX-1 regulatory interfaces in the present structure and in 7SYF.

      A superposition of the two structures has now been added as Supplemental Figure 5. Because both structures are of moderate resolution, it is difficult to place side chains with a high degree of certainty. Thus, we did not think it wise to draw conclusions from comparisons between the details of these interfaces.

      Is the positioning of IP4 dictated by the EM density, prior knowledge from high-resolution structures, or both? A rendering of the EM density over the stick model as a supplementary figure would be helpful.

      This was modeled based on both. This image has now been added as Figure 4B.

      It should be emphasized that the jackknife model is similar to the hinge model proposed by Chang et al (2022).

      Mention of similarity between our model and the model proposed by Chang et al., 2022 occurs twice in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1:

      Thank you for the careful reading and the positive evaluation of our manuscript. As you mentioned, the present study tried to address the question of how the lost genomic functions could be compensated by evolutionary adaptation, indicating the potential mechanism of "constructive" rather than "destructive" evolution. Thank you for the instructive comments that helped us to improve the manuscript. We sincerely hope the revised manuscript and the following point-by-point response address your concerns.

      • Line 80 "Growth Fitness" is this growth rate?

      Yes. The sentence was revised as follows.

      (L87-88) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper).”

      • Line 94 a more nuanced understanding of r/K selection theory, allows for trade-ups between R and K, as well as trade-offs. This may explain why you did not see a trade-off between growth and carrying capacity in this study. See this paper https://doi.org/10.1038/s41396-023-01543-5. Overall, your evos lineages evolved higher growth rates and lower carrying capacity (Figures 1B, C, E). If selection was driving the evolution of higher growth rates, it may have been that there was no selective pressure to maintain high carrying capacity. This means that the evolutionary change you observed in carrying capacity may have been neutral "drift" of the carrying capacity trait, during selection for growth rate, not because of a trade-off between R and K. This is especially likely since carrying capacity declined during evolution. Unless the authors have convincing evidence for a tradeoff, I suggest they remove this claim.

      • Line 96 the authors introduce a previous result where they use colony size to measure growth rate, this finding needs to be properly introduced and explained so that we can understand the context of the conclusion.

      • Line 97 This sentence "the collapse of the trade-off law likely resulted from genome reduction." I am not sure how the authors can draw this conclusion, what is the evidence supporting that the genome size reduction causes the breakdown of the tradeoff between R and K (if there was a tradeoff)?

      Thank you for the reference information and the thoughtful comments. The recommended paper was newly cited, and the description of the trade-off collapse was deleted. Accordingly, the corresponding paragraph was rewritten as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somewhat consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome-reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection 30,31, was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32-34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      • Line 103 Genome mutations. The authors claim that there are no mutations in parallel but I see that there is a 1199 base pair deletion in eight of the nine evo strains (Table S3). I would like the author to mention this and I'm actually curious about why the authors don't consider this parallel evolution.

      Thank you for your careful reading. According to your comment, we added a brief description of the 1199-bp deletion detected in the Evos as follows.

      (L119-122) “The number of mutations largely varied among the nine Evos, from two to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      • Line 297 Please describe the media in full here - this is an important detail for the evolution experiment. Very frustrating to go to reference 13 and find another reference, but no details of the method. Looked online for the M63 growth media and the carbon source is not specified. This is critical for working out what selection pressures might have driven the genetic and transcriptional changes that you have measured. For example, the parallel genetic change in 8/9 populations is a deletion of insH and tdcD (according to Table S3). This is acetate kinase, essential for the final step in the overflow metabolism of glucose into acetate. If you have a very low glucose concentration, then it could be that there was selection to avoid fermentation and devote all the pyruvate that results from glycolysis into the TCA cycle (which is more efficient than fermentation in terms of ATP produced per pyruvate).

      Sorry for the missing information on the medium composition, which was additionally described in the Materials and Methods. The glucose concentration in M63 was 22 mM, which was supposed to be enough for bacterial growth. Thank you for your intriguing thinking about linking the medium component to the genome mutation-mediated metabolic changes. As there was no experimental result regarding the biological function of gene mutation in the present study, please allow us to address this issue in our future work.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM iron(II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      • Line 115. I do not understand this argument "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Is this a significant enrichment compared to the expectation, i.e. the number of essential genes in the genome? This enrichment needs to be tested with a Hypergeometric test or something similar.

      • Also, "As the essential genes were known to be more conserved than nonessential ones, the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." I do not think that there is enough evidence to support this claim, and it should be removed.

      Sorry for the unclear description. Yes, the mutations were significantly enriched in the essential genes (11 out of 49 mutated genes) compared to the essential genes in the whole genome (286 out of 3290 genes). The improper description linking the mutation in essential genes to the fitness increase was removed, and an additional explanation of the ratio of essential genes was supplied as follows.

      (L139-143) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable.”
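
      A contingency-table test of this kind can be sketched as follows. This is only an illustration, not the authors' analysis: the 2x2 layout (mutated versus remaining genes, essential versus nonessential), the absence of a continuity correction, and the function name `chi_square_2x2` are all assumptions, so the p value it returns need not reproduce the reported p=0.008.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts quoted in the response: 11 of 49 mutated genes are essential,
# and 286 of 3290 genes in the reduced genome are essential.
mutated_total, mutated_essential = 49, 11
genome_total, genome_essential = 3290, 286

# Assumed layout: mutated genes versus the remaining (non-mutated) genes.
a = mutated_essential                          # mutated, essential
b = mutated_total - mutated_essential          # mutated, nonessential
c = genome_essential - mutated_essential       # not mutated, essential
d = (genome_total - mutated_total) - c         # not mutated, nonessential

stat, p_value = chi_square_2x2(a, b, c, d)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

      Other reasonable layouts (for example, comparing the 49 mutated genes against all 3290 genes as a separate group, or applying Yates' correction) give different statistics, which is why the table construction should be stated alongside the p value.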

      • Line 124 Regarding the mutation simulations, I do not understand how the observed data were compared to the simulated data, and how conclusions were drawn. Can the authors please explain the motivation for carrying out this analysis, and clearly explain the conclusions?

      Random simulation was additionally explained in the Materials and Methods and the conclusion of the random simulation was revised in the Results, as follows.

      (L392-401) “The mutation simulation was performed with Python in the following steps. A total of 65 mutations were randomly generated on the reduced genome, and the distances from the mutated genomic locations to the nearest genomic scars caused by genome reduction were calculated. Subsequently, Welch's t-test was performed to evaluate whether the distances calculated from the random mutations were significantly longer or shorter than those calculated from the mutations that occurred in Evos. The random simulation, distance calculation, and statistic test were performed 1,000 times, which resulted in 1,000 p values. Finally, the mean of p values (μp) was calculated, and a 95% reliable region was applied. It was used to evaluate whether the 65 mutations in the Evos were significantly close to the genomic scars, i.e., the locational bias.”

      (L148-157) “Random simulation was performed to verify whether there was any bias or hotspot in the genomic location for mutation accumulation due to the genome reduction. A total of 65 mutations were randomly generated on the reduced genome (Fig. 2B), and the genomic distances from the mutations to the nearest genome reduction-mediated scars were calculated. Welch's t-test was performed to evaluate whether the genomic distances calculated from random mutations significantly differed from those from the mutations accumulated in the Evos. As the mean of p values (1,000 times of random simulations) was insignificant (Fig. 2C, μp > 0.05), the mutations fixed on the reduced genome were either closer or farther to the genomic scars, indicating there was no locational bias for mutation accumulation caused by genome reduction.”
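
      The procedure described in these two passages can be sketched roughly as below. The scar and mutation coordinates are random placeholders (the real coordinates are not given in this response), the chromosome is assumed to be circular, the number of scars is arbitrary, and the p value of Welch's t statistic is taken from a normal approximation to keep the sketch within the standard library. All function names are hypothetical.

```python
import math
import random
import statistics

GENOME_LENGTH = 3_900_000   # approximate size of the reduced genome (bp)
N_MUTATIONS = 65            # mutations per simulated set, as in the text
N_SIMULATIONS = 1000        # number of random repetitions, as in the text

def nearest_scar_distance(position, scars, genome_length):
    """Shortest distance from a position to any scar on a circular chromosome."""
    best = genome_length
    for s in scars:
        d = abs(position - s)
        best = min(best, d, genome_length - d)
    return best

def welch_t_test(x, y):
    """Welch's unequal-variance t statistic with a two-sided p value from a
    normal approximation (an assumption; a t distribution would be exact)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = math.sqrt(vx / len(x) + vy / len(y))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, p

def simulate(observed_positions, scars, rng):
    """p values from repeated comparisons of random versus observed distances."""
    observed = [nearest_scar_distance(p, scars, GENOME_LENGTH)
                for p in observed_positions]
    p_values = []
    for _ in range(N_SIMULATIONS):
        random_positions = [rng.randrange(GENOME_LENGTH) for _ in range(N_MUTATIONS)]
        simulated = [nearest_scar_distance(p, scars, GENOME_LENGTH)
                     for p in random_positions]
        p_values.append(welch_t_test(simulated, observed)[1])
    return p_values

# Placeholder inputs: NOT the real scar or mutation coordinates.
rng = random.Random(0)
scars = sorted(rng.randrange(GENOME_LENGTH) for _ in range(30))
observed_positions = [rng.randrange(GENOME_LENGTH) for _ in range(N_MUTATIONS)]

p_values = simulate(observed_positions, scars, rng)
mean_p = statistics.mean(p_values)
print(f"mean p over {N_SIMULATIONS} simulations = {mean_p:.3f}")
```

      In this reading, a mean p value above 0.05 would be taken as no detectable difference between the observed and random distances, which is the sense in which the response reports no locational bias.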

      • Line 140 The authors should give some background here - explain the idea underlying chromosomal periodicity of the transcriptome, to help the reader understand this analysis.

      • Line 142 Here and elsewhere, when referring to a method, do not just give the citation, but also refer to the methods section or relevant supplementary material.

      The analytical process (references and methods) was described in the Materials and Methods, and the reason we performed the chromosomal periodicity was added in the Results as follows.

      (L165-172) “As the E. coli chromosome was structured, whether the genome reduction caused the changes in its architecture, which led to the differentiated transcriptome reorganization in the Evos, was investigated. The chromosomal periodicity of gene expression was analyzed to determine the structural feature of genome-wide pattern, as previously described 28,38. The analytical results showed that the transcriptomes of all Evos presented a common six-period with statistical significance, equivalent to those of the wild-type and ancestral reduced genomes (Fig. 3A, Table S4).”

      • Line 151 "The expression levels of the mutated genes were higher than those of the remaining genes (Figure 3B)"- did this depend on the type of mutation? There were quite a few early stops in genes, were these also more likely to be expressed? And how about the transcriptional regulators, can you see evidence of their downstream impact?

      Sorry, we didn't investigate the detailed regulatory mechanisms of the 49 mutated genes, which was considered outside the scope of the present study. Fig. 3B was a statistical comparison between 3225 and 49 genes. It didn't mean that every mutated gene was expressed at a higher level than the others. The following sentences were added to address your concern.

      (L181-185) “As the regulatory mechanisms or the gene functions were supposed to be disturbed by the mutations, the expression levels of individual genes might have been either up- or down-regulated. Nevertheless, the overall expression levels of all mutated genes tended to be increased. One of the reasons was assumed to be the mutation essentiality, which remained to be experimentally verified.”

      • Line 199 onward. The authors used WGCNA to analyze the gene expression data of evolved organisms. They identified distinct gene modules in the reduced genome, and through further analysis, they found that specific modules were strongly associated with key biological traits like growth fitness, gene expression changes, and mutation rates. Did the authors expect that there was variation in mutation rate across their populations? Is variation from 3-16 mutations that they observed beyond the expectation for the wt mutation rate? The genetic causes of mutation rate variation are well understood, but I could not see any dinB, mutT,Y, rad, or pol genes among the discovered mutations. I would like the authors to justify the claim that there was mutation rate variation in the evolved populations.

      Thank you for the intriguing thinking. We don't think the mutation rates varied significantly across the nine populations, as no mutation occurred in the MMR genes, as you noticed. Our previous study showed that the spontaneous mutation rate of the reduced genome was higher than that of the wild-type genome (Nishimura et al., 2017, mBio). As nonsynonymous mutations were not detected in all nine Evos, the spontaneous mutation rate couldn't be calculated (because it should be evaluated according to the ratio of nonsynonymous and synonymous single-nucleotide substitutions in molecular evolution). Therefore, the mutation rate could not be discussed in the present study. The following sentence was added for a better understanding of the gene modules.

      (L242-245) “These modules M2, M10 and M16 might be considered as the hotspots for the genes responsible for growth fitness, transcriptional reorganization, and mutation accumulation of the reduced genome in evolution, respectively.”

      • Line 254 I get the idea of all roads leading to Rome, which is very fitting. However, describing the various evolutionary strategies and homeostatic and variable consequence does not sound correct - although I am not sure exactly what is meant here. Looking at Figure 7, I will call strategy I "parallel evolution", that is following the same or similar genetic pathways to adaptation and strategy ii I would call divergent evolution. I am not sure what strategy iii is. I don't want the authors to use the terms parallel and divergent if that's not what they mean. My request here would be that the authors clearly describe these strategies, but then show how their results fit in with the results, and if possible, fit with the naming conventions, of evolutionary biology.

      Thank you for your kind consideration and excellent suggestion. It's our pleasure to adopt your idea in our study. The evolutionary strategies were renamed according to your recommendation. Both the main text and Fig. 7 were revised as follows.

      (L285-293) “Common mutations22,44 or identical genetic functions45 were reported in the experimental evolution with different reduced genomes, commonly known as parallel evolution (Fig. 7, i). In addition, as not all mutations contribute to the evolved fitness 22,45, another strategy for varied phenotypes was known as divergent evolution (Fig. 7, ii). The present study accentuated the variety of mutations fixed during evolution. Considering the high essentiality of the mutated genes (Table S3), most or all mutations were assumed to benefit the fitness increase, partially demonstrated previously 20. Nevertheless, the evolved transcriptomes presented a homeostatic architecture, revealing the divergent to convergent evolutionary strategy (Fig. 7, iii).”

      Author response image 1.

      • Line 327 Growth rates/fitness. I don't think this should be called growth fitness- a rate is being calculated. I would like the authors to explain how the times were chosen - do the three points have to be during the log phase? Can you also explain what you mean by choosing three ri that have the largest mean and minor variance?

      Sorry for the confusing term usage. The fitness assay was changed to the growth assay. Choosing three ri that have the largest mean and minor variance was to avoid the occasional large values (blue circle), as shown in the following figure. In addition, the details of the growth analysis can be found at https://doi.org/10.3791/56197 (ref. 59), where the video of experimental manipulation, protocol, and data analysis is deposited. The following sentence was added in accordance.

      Author response image 2.

      (L369-371) “The growth rate was determined as the average of three consecutive ri, showing the largest mean and minor variance to avoid the unreliable calculation caused by the occasionally occurring values. The details of the experimental and analytical processes can be found at https://doi.org/10.3791/56197.”
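
      One way to read this selection rule is sketched below. The definition of r_i as the log ratio of consecutive OD600 reads divided by the time interval, the coefficient-of-variation threshold used to express "minor variance", and the synthetic growth curve are all assumptions made for illustration; the published protocol at the DOI above remains the authoritative description.

```python
import math
import statistics

def interval_rates(times_h, od_values):
    """r_i between consecutive reads: ln(OD[i+1] / OD[i]) / (t[i+1] - t[i])."""
    return [
        math.log(od_values[i + 1] / od_values[i]) / (times_h[i + 1] - times_h[i])
        for i in range(len(od_values) - 1)
    ]

def growth_rate(times_h, od_values, max_cv=0.1):
    """Mean of the three consecutive r_i with the largest mean, restricted to
    windows whose coefficient of variation is at most max_cv (assumed cutoff)."""
    rates = interval_rates(times_h, od_values)
    best = None
    for i in range(len(rates) - 2):
        window = rates[i:i + 3]
        mean = statistics.mean(window)
        if mean <= 0:
            continue
        cv = statistics.stdev(window) / mean
        if cv <= max_cv and (best is None or mean > best):
            best = mean
    return best

# Synthetic curve read every 15 minutes; the second interval carries an
# artificially large r_i that mimics an occasional unreliable value.
dt = 0.25
true_rates = [0.1, 2.5, -0.4, 0.58, 0.60, 0.62, 0.4, 0.2, 0.05]
times_h = [i * dt for i in range(len(true_rates) + 1)]
od_values = [0.002]
for r in true_rates:
    od_values.append(od_values[-1] * math.exp(r * dt))

rate = growth_rate(times_h, od_values)
print(f"growth rate = {rate:.3f} per hour")
```

      Without the variance filter, the window containing the spurious 2.5 per hour value would have the largest mean; with it, the stable stretch around 0.6 per hour is selected instead.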

      • Line 403 Chromosomal periodicity analysis. The windows chosen for smoothing (100kb) seem big. Large windows make sense for some things - for example looking at how transcription relates to DNA replication timing, which is a whole-genome scale trend. However, here the authors are looking for the differences after evolution, which will be local trends dependent on specific genes and transcription factors. 100kb of the genome would carry on the order of one hundred genes and might be too coarse-grained to see differences between evos lineages.

      Thank you for the advice. We agree that the present analysis focused on the global trend of gene expression. Varying the sizes may lead to different patterns. Additional analysis was performed according to your comment. The results showed that changes in window size (1, 10, 50, 100, and 200 kb) didn't alter the periodicity of the reduced genome, which agreed with the previous study on a different reduced genome MDS42 of a conserved periodicity (Ying et al., 2013, BMC Genomics). The following sentence was added in the Materials and Methods.

      (L460-461) “Note that altering the moving average did not change the max peak.”
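
      The kind of robustness check described here can be illustrated on synthetic data. In the sketch below, a noisy signal with six cycles per chromosome stands in for the expression profile, a circular moving average stands in for the smoothing step, and a plain discrete Fourier transform stands in for the authors' periodicity analysis, whose details are given in the cited references rather than in this response. Bin size, noise level, and function names are assumptions.

```python
import cmath
import math
import random

def circular_moving_average(values, window):
    """Smooth values on a circular chromosome using `window` consecutive bins."""
    n = len(values)
    half = window // 2
    return [
        sum(values[(i - half + k) % n] for k in range(window)) / window
        for i in range(n)
    ]

def dominant_cycle_count(values, max_k=20):
    """Cycles per chromosome (k >= 1) with the largest Fourier amplitude."""
    n = len(values)
    mean = sum(values) / n
    best_k, best_amp = 0, -1.0
    for k in range(1, max_k + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(values))
        if abs(coeff) > best_amp:
            best_k, best_amp = k, abs(coeff)
    return best_k

# Synthetic profile: 3900 bins of 1 kb, six cycles plus Gaussian noise.
rng = random.Random(1)
n_bins = 3900
expression = [math.sin(2 * math.pi * 6 * i / n_bins) + rng.gauss(0.0, 1.0)
              for i in range(n_bins)]

# Window sizes in kb, matching those listed in the response.
peaks = {w: dominant_cycle_count(circular_moving_average(expression, w))
         for w in (1, 10, 50, 100, 200)}
print(peaks)
```

      A window much shorter than the period attenuates the periodic component only slightly, so the position of the maximum peak is expected to stay put, in line with the added sentence.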

      • Figures - the figures look great. Figure 7 needs a legend.

      Thank you. The following legend was added.

      (L774-777) “Three evolutionary strategies are proposed. Pink and blue arrowed lines indicate experimental evolution and genome reduction, respectively. The size of the open cycles represents the genome size. Black and grey indicate the ancestor and evolved genomes, respectively.”

      Response to Reviewer #2:

      Thank you for reviewing our manuscript and for your fruitful comments. We agree that our study leaned towards elaborating observed findings rather than explaining the detailed biological mechanisms. We focused on the genome-wide biological features rather than the specific biological functions. The underlying mechanisms indeed remained unknown, leaving the questions as you commented. We didn't perform the fitness assay on reconstituted (single and combinatorial) mutants because the research purpose was not to clarify the regulatory or metabolic mechanisms. It's why the RNA-Seq analysis provided the findings on genome-wide patterns and chromosomal view, which were supposed to be biologically valuable. We did understand your comments and complaints that the conclusions were biologically meaningless, as ALE studies that found the specific gene regulation or improved pathway was the preferred story in common, which was not the flow of the present study.

      For this reason, our revision may not address all these concerns. Considering your comments, we tried our best to revise the manuscript. The changes made were highlighted. We sincerely hope the revision and the following point-to-point response are acceptable.

      Major remarks:

      (1) The authors outlined the significance of ALE in genome-reduced organisms and important findings from published literature throughout the Introduction section. The description in L65-69, which I believe pertains to the motivation of this study, seems vague and insufficient to convey the novelty or necessity of this study i.e. it is difficult to grasp what aspects of genome-reduced biology that this manuscript intends to focus/find/address.

      Sorry for the unclear writing. The sentences were rewritten for clarity as follows.

      (L64-70) “Although the reduced growth rate caused by genome reduction could be recovered by experimental evolution, it remains unclear whether such an evolutionary improvement in growth fitness was a general feature of the reduced genome and how the genome-wide changes occurred to match the growth fitness increase. In the present study, we performed the experimental evolution with a reduced genome in multiple lineages and analyzed the evolutionary changes of the genome and transcriptome.”

      (2) What is the rationale behind the lineage selection described in Figure S1 legend "Only one of the four overnight cultures in the exponential growth phase (OD600 = 0.01~0.1) was chosen for the following serial transfer, highlighted in red."?

      The four wells (cultures of different initial cell concentrations) were measured every day, and only the well that showed OD600=0.01~0.1 (red) was transferred with four different dilution rates (e.g., 10, 100, 1000, and 10000 dilution rates). It resulted in four wells of different initial cell concentrations. Multiple dilutions ensured that at least one of the wells would show an OD600 within the range of 0.01 to 0.1 after the overnight culture. They were then used for the next serial transfer. Fig. S1 provides the details of the experimental records. The experimental evolution was strictly controlled within the exponential phase, quite different from the commonly conducted ALE that transferred a single culture at a fixed dilution rate. Serial transfer with multiple dilution rates was previously applied in our evolution experiments and well described in Nishimura et al., 2017, mBio; Lu et al., 2022, Comm Biol; Kurokawa et al., 2022, Front Microbiol, etc. The following sentence was added in the Materials and Methods.

      (L344-345) “Multiple dilutions changing in order promised at least one of the wells within the exponential growth phase after the overnight culture.”
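
      The logic of the multiple-dilution transfer can be illustrated with a small calculation. The sketch assumes purely exponential growth between transfers, a 24-hour interval, and hypothetical growth rates; only the dilution rates and the OD600 window come from the response.

```python
import math

DILUTIONS = (10, 100, 1000, 10000)   # example dilution rates from the response
OD_MIN, OD_MAX = 0.01, 0.1           # target OD600 window after overnight culture

def predicted_od(source_od, dilution, growth_rate_per_h, hours):
    """OD600 expected after exponential growth from a diluted inoculum."""
    return source_od / dilution * math.exp(growth_rate_per_h * hours)

def wells_in_window(source_od, growth_rate_per_h, hours, dilutions=DILUTIONS):
    """Dilutions whose predicted overnight OD600 falls inside the target window."""
    return [d for d in dilutions
            if OD_MIN <= predicted_od(source_od, d, growth_rate_per_h, hours) <= OD_MAX]

# Hypothetical lineage at OD600 = 0.05, transferred every 24 hours.
for rate in (0.2, 0.3, 0.4):
    print(rate, wells_in_window(0.05, rate, 24))
```

      Because the dilutions are spaced tenfold apart and the target window is itself tenfold wide, at least one well lands in the window under these assumptions, provided the lowest and highest dilutions bracket it. As the growth rate increases over the course of evolution, the well that qualifies shifts toward higher dilutions.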

      (3) The measured growth rate of the end-point 'F2 lineage' shown in Figure S2 seemed comparable to the rest of the lineages (A1 to H2), but the growth rate of 'F2' illustrated in Figure 1B indicates otherwise (L83-84). What is the reason for the incongruence between the two datasets?

      Sorry for the unclear description. The growth rates shown in Fig. S2 were obtained during the evolution experiment using the daily transfer's initial and final OD600 values. The growth rates shown in Fig. 1B were obtained from the final population (Evos) growth assay and calculated from the growth curves (biological replication, N=4). Fig. 1B shows the precisely evaluated growth rates, and Fig. S2 shows the evolutionary changes in growth rates. Accordingly, the following sentence was added to the Results.

      (L84-87) “As the growth increases were calculated according to the initial and final records, the exponential growth rates of the ancestor and evolved populations were obtained according to the growth curves for a precise evaluation of the evolutionary changes in growth.”

      (4) Are the differences in growth rate statistically significant in Figure 1B?

      Eight out of nine Evos were significant, except F2. The sentences were rewritten and associated with the revised Fig. 1B, indicating significance.

      (L87-90) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper). However, the magnitudes of growth improvement were considerably varied, and the evolutionary dynamics of the nine lineages were somehow divergent (Fig. S2).”

      (5) The evolved lineages showed a decrease in their maximal optical densities (OD600) compared to the ancestral strain (L85-86). ALE could accompany changes in cell size and morphologies, (doi: 10.1038/s41586-023-06288-x; 10.1128/AEM.01120-17), which may render OD600 relatively inaccurate for cell density comparison. I suggest using CFU/mL metrics for the sake of a fair comparison between Anc and Evo.

      The methods evaluating the carrying capacity (i.e., cell density, population size, etc.) do not change the results. Even using CFU is unfair for the living cells that cannot form colonies and unfair if the cell size changes. Optical density (OD600) provides us with the temporal changes of cell growth in a 15-minute interval, which results in an exact evaluation of the growth rate in the exponential phase. CFU is poor at recording temporal changes in population size, which tends to result in an inappropriate growth rate. Taken together, we believe that our method was reasonable and reliable. We hope you can accept the different way of study.

      (6) Please provide evidence in support of the statement in L115-119. i.e. statistical analysis supporting that the observed ratio of essential genes in the mutant pool is not random.

      The statistical test was performed, and the following sentence was added.

      (L139-141) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008).”

      (7) The assumption that "mutation abundance would correlate to fitness improvement" described in L120-122: "The large variety in genome mutations and no correlation of mutation abundance to fitness improvement strongly suggested that no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome" is not easy to digest, in the sense that (i) the effect of multiple beneficial mutations are not necessarily summative, but are riddled with various epistatic interactions (doi: 10.1016/j.mec.2023.e00227); (ii) neutral hitchhikers are of common presence (you could easily find reference on this one); (iii) hypermutators that accumulate greater number of mutations in a given time are not always the eventual winners in competition games (doi: 10.1126/science.1056421). In this sense, the notion that "mutation abundance correlates to fitness improvement" in L120-122 seems flawed (for your perusal, doi: 10.1186/gb-2009-10-10-r118).

      Sorry for the improper description and confusing writing, and thank you for the fruitful knowledge on molecular evolution. The sentence was deleted, and the following one was added.

      (L145-146) “Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (8) Could it be possible that the large variation in genome mutations in independent lineages results from a highly rugged fitness landscape characterized by multiple fitness optima (doi: 10.1073/pnas.1507916112)? If this is the case, I disagree with the notion in L121-122 "that no mutations were specifically responsible or crucially essential" It does seem to me that, for example, the mutations in evo A2 are specifically responsible and essential for the fitness improvement of evo A2 in the evolutionary condition (M63 medium). Fitness assessment of individual (or combinatorial) mutants reconstituted in the Ancestral background would be a bonus.

      Thank you for the intriguing thinking. The sentence was deleted. Please allow us to adapt your comment to the manuscript as follows.

      (L143-145) “The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38.”

      (9) L121-122: "...no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome". Strictly speaking, the authors should provide a reference case of wild-type E. coli ALE in order to reach definitive conclusions that the observed mutation events are exclusive to the genome-reduced strain. It is strongly recommended that the authors perform comparative analysis with an ALEed non-genome-reduced control for a more definitive characterization of the evolutionary biology in a genome-reduced organism, as it was done for "JCVI-syn3.0B vs non-minimal M. mycoides" (doi: 10.1038/s41586-023-06288-x) and "E. coli eMS57 vs MG1655" (doi: 10.1038/s41467-019-08888-6).

      The improper description was deleted in response to comments 7 and 8. The mentioned references were cited in the manuscript (refs 21 and 23). Thank you for the experimental advice. We are sorry that the comparison of wild-type and reduced genomes was not in the scope of the present study and will probably be reported soon in our future work.

      (10) L146-148: "The homeostatic periodicity was consistent with our previous findings that the chromosomal periodicity of the transcriptome was independent of genomic or environmental variation" A Previous study also suggested that the amplitudes of the periodic transcriptomes were significantly correlated with the growth rates (doi: 10.1093/dnares/dsaa018). Growth rates of 8/9 Evos were higher compared to Anc, while that of Evo F2 remained similar. Please comment on the changes in amplitudes of the periodic transcriptomes between Anc and each Evo.

      Thank you for the suggestion. The correlation between the growth rates and the amplitudes of chromosomal periodicity was statistically insignificant (p>0.05). It might be a result of the limited data points. Compared with the only nine data points in the present study, the previous study analyzed hundreds of transcriptomes associated with the corresponding growth rates, which are suitable for statistical evaluation. In addition, the changes in growth rates were more significant in the previous study than in the present study, which might influence the significance. It's why we did not discuss the periodic amplitude.

      (11) Please elaborate on L159-161: "It strongly suggested the essentiality mutation for homeostatic transcriptome architecture happened in the reduced genome.".

      Sorry for the improper description. The sentence was rewritten as follows.

      (L191-193) “The essentiality of the mutations might have participated in maintaining the homeostatic transcriptome architecture of the reduced genome.”

      (12) Is FPKM a valid metric for between-sample comparison? The growing consensus in the community adopts Transcripts Per Kilobase Million (TPM) for comparing gene expression levels between different samples (Figure 3B; L372-379).

      Sorry for the unclear description. The FPKM indicated here was globally normalized, statistically equivalent to TPM. The following sentence was added to the Materials and Methods.

      (L421-422) “The resulting normalized FPKM values were statistically equivalent to TPM.”
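
      For reference, the standard relation between the two units is that TPM is FPKM rescaled so that the values of a sample sum to one million. The sketch below shows only this generic conversion; the authors' own global normalization is not detailed in this response, and the input values are hypothetical.

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale the FPKM values of one sample so they sum to one million."""
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return [v / total * 1_000_000 for v in fpkm_values]

fpkm = [120.0, 30.0, 0.0, 850.0]   # hypothetical values for four genes
tpm = fpkm_to_tpm(fpkm)
print(tpm)
```

      The conversion preserves the ratios between genes within a sample and fixes the per-sample total, which is what makes values comparable between samples.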

      (13) Please provide % mapped frequency of mutations in Table S3.

      They were all 100%. The partially fixed mutations were excluded in the present study. The following sentence was added to the caption of Table S3.

      (Supplementary file, p 9) “Note that the entire population held the mutations, i.e., 100% frequency in DNA sequencing.”

      (14) To my knowledge, M63 medium contains glucose and glycerol as carbon sources. The manuscript would benefit from discussing the elements that impose selection pressure in the M63 culture condition.

      Sorry for the missing information on M63, which contains 22 mM glucose as the only carbon source. The medium composition was added in the Materials and Methods, as follows.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      (15) The RNA-Seq datasets for Evo strains seemed equally heterogenous, just as their mutation profiles. However, the missing element in their analysis is the directionality of gene expression changes. I wonder what sort of biological significance can be derived from grouping expression changes based solely on DEGs, without considering the magnitude and the direction (up- and down-regulation) of changes? RNA-seq analysis in its current form seems superficial to derive biologically meaningful interpretations.

      We agree that most studies often discuss the direction of transcriptional changes. The present study aimed to capture a global view of the magnitude of transcriptome reorganization. Thus, the analyses focused on the overall features, such as the abundance of DEGs, instead of the details of the changes, e.g., the up- and down-regulation of DEGs. The biological meaning of the DEGs' overview was how significantly the genome-wide gene expression fluctuated, which might be short of an in-depth view of individual gene expression. The following sentence was added to indicate the limitation of the present analysis.

      (L199-202) “Instead of an in-depth survey on the directional changes of the DEGs, the abundance and functional enrichment of DEGs were investigated to achieve an overview of how significant the genome-wide fluctuation in gene expression, which ignored the details of individual genes.”

      Minor remarks

      (1) L41: brackets italicized "(E. coli)".

      It was fixed as follows.

      (L40) “… Escherichia coli (E. coli) cells …”

      (2) Figure S1. It is suggested that the x-axis of ALE monitor be set to 'generations' or 'cumulative generations', rather than 'days'.

      Thank you for the suggestion. Fig. S1 describes the experimental procedure, so "day" was used. Fig. S2 presents the evolutionary process, so "generation" was used, as you recommended here.

      (3) I found it difficult to digest through L61-64. Although it is not within the job scope of reviewers to comment on the language style, I must point out that the manuscript would benefit from professional language editing services.

      Sorry for the unclear writing. The sentences were revised as follows.

      (L60-64) “Previous studies have identified conserved features in transcriptome reorganization, despite significant disruption to gene expression patterns resulting from either genome reduction or experimental evolution 27-29. The findings indicated that experimental evolution might reinstate growth rates that have been disrupted by genome reduction to maintain homeostasis in growing cells.”

      (4) Duplicate references (No. 21, 42).

      Sorry for the mistake. It was fixed (leaving ref. 21).

      (5) Inconsistency in L105-106: "from two to 13".

      "From two to 13" was adopted from the language editing. It was changed as follows.

      (L119) “… from 2 to 13, …”

      Response to Reviewer #3:

      Thank you for reviewing our manuscript and for the helpful comments, which improved the strength of the manuscript. The recommended statistical analyses that essentially supported the statements in the manuscript were performed, whereas those that would yield new results within the scope of further studies were not conducted. The changes made in the revision were highlighted. We sincerely hope the revised manuscript and the following point-to-point response meet your concerns. You will find all your suggested statistical tests in our future work, which will report an extensive study on the experimental evolution of an assortment of reduced genomes.

      (1) Line 106 - "As 36 out of 45 SNPs were nonsynonymous, the mutated genes might benefit the fitness increase." This argument can be strengthened. For example, the null expectation of nonsynonymous SNPs should be discussed. Is the number of observed nonsynonymous SNPs significantly higher than the expected one?

      (2) Line 107 - "In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase." Instead of just listing examples, a regression analysis can be added.

      Yes, it's significant. Random mutations lead to ~33% nonsynonymous SNPs in a rough estimation. Additionally, a regression would be unreliable because there's no statistically significant relationship between the number of mutations and the magnitude of fitness increase. Accordingly, the corresponding sentences were revised with additional statistical tests.

      (L123-129) “As 36 out of 45 SNPs were nonsynonymous, which was highly significant compared to random mutations (p < 0.01), the mutated genes might benefit fitness increase. In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase. There was no significant correlation between the number of mutations and the growth rate in a statistical view (p > 0.1). Even from an individual close-up viewpoint, the abundance of mutations poorly explained the fitness increase.”
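
      A comparison of this kind could be carried out, for example, with a one-sided exact binomial test. The response does not name the test that was used, so the choice of test is an assumption; the null fraction of 0.33 is the authors' rough estimate and is treated here purely as an input that the sketch does not verify.

```python
import math

def binomial_upper_tail(k, n, p):
    """Exact one-sided probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

observed_nonsyn = 36   # nonsynonymous SNPs reported
total_snps = 45        # total SNPs reported
null_fraction = 0.33   # authors' rough estimate for random mutations (input only)

p_value = binomial_upper_tail(observed_nonsyn, total_snps, null_fraction)
print(f"P(X >= {observed_nonsyn}) = {p_value:.2e}")
```

      The outcome depends strongly on the null fraction, so the expected proportion of nonsynonymous changes should be justified for the genome in question before the p value is interpreted.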

      (3) Line 114 - "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Here, the information mentioned in line 153 ("the ratio of essential to all genes (302 out of 3,290) in the reduced genome.") can be used. Then a statistical test for a contingency table can be used.

      (4) Line 117 - "the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." What is the expected number of fixed mutations in essential genes vs non-essential genes? Is the observed number statistically significantly higher?

      Sorry for the improper and insufficient information on the essential genes. Yes, it's significant. The statistical test was additionally performed. The corresponding part was revised as follows.

      (L134-146) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome. The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable. The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38. Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (5) The authors mentioned no overlapping in the single mutation level. Is that statistically significant? The authors can bring up what the no-overlap probability is given that there are in total x number of fixed mutations observed (either theory or simulation is good).

      Sorry, we are confused by this comment. It's unclear to us why a statistical simulation is needed. Firstly, the mutations were experimentally observed. That no overlapping mutated genes were detected was an experimental fact, not a computational prediction. We are sorry if our finding was over-interpreted as an evolutionary rule, which would indeed require statistical testing of its reliability. We didn't conclude that the evolution had no overlapping mutations. Secondly, considering that 65 random mutations occurred in a ~3.9 Mb sequence, a statistical test would be meaningful only if overlapping mutations had been found experimentally. How often random mutations overlap among parallel evolutionary lineages as the number of lineages increases is an interesting question, but it seems to be beyond the scope of the present study. We are happy to include the analysis in our ongoing study on the experimental evolution of reduced genomes.
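
      The no-overlap probability raised in comment (5) can be approximated with a birthday-problem calculation. The sketch below assumes that all 65 mutations are single-site events placed uniformly and independently on a 3.9 Mb genome, which is a strong simplification (real mutations are not uniform, and deletions span many sites); the function name is hypothetical.

```python
import math

def prob_no_shared_site(n_mutations, genome_size):
    """Probability that n uniformly placed point mutations all hit distinct sites."""
    # Product of (1 - i / genome_size) for i = 0 .. n-1, accumulated in log space.
    log_p = sum(math.log1p(-i / genome_size) for i in range(n_mutations))
    return math.exp(log_p)

# 65 mutations and ~3.9 Mb are the figures quoted in the response.
p_no_overlap = prob_no_shared_site(65, 3_900_000)
print(f"P(no shared site) = {p_no_overlap:.6f}")
```

      Under these assumptions an absence of shared sites is the expected outcome. The same function could be called with a different `genome_size` for coarser units, though whether uniform placement is appropriate there is a separate question.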

      (6) The authors mentioned no overlapping in the single mutation level. How about at the genetic level? Some fixed mutations occur in the same coding gene. Is there any gene with a significantly enriched number of mutations?

      No mutations were fixed in the same gene of biological function, as shown in Table S3. If we say the coding region, the only exception is the IS sequences, well known as the transposable sequences without genetic function. The following description was added.

      (L119-122) “The number of mutations largely varied among the nine Evos, from 2 to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      (7) Line 151-156- It seems like the authors argue that the expression level differences can be just explained by the percentage of essential genes that get fixed mutations. One further step for the argument could be to compare the expression level of essential genes with vs without fixed mutations. Also, the authors can compare the expression level of non-essential genes with vs without fixed mutations. And the authors can report whether the differences in expression level became insignificant after the control of the essentiality.

      It's our pleasure that the essentiality intrigued you. Thank you for the analytical suggestion, which is exciting and valuable for our studies. As only 11 essential genes were detected here and "Mutation in essentiality" was an indication but not the conclusion of the present study, we would like to apply the recommended analysis to the datasets of our ongoing study to demonstrate this statement. Thank you again for your fruitful analytical advice.

      (8) Line 169- "The number of DEGs partially overlapped among the Evos declined significantly along with the increased lineages of Evos (Figure 4B). " There is a lack of statistical significance here while the word "significantly" is used. One statistical test that can be done is to use re-sampling/simulation to generate a null expectation of the overlapping numbers given the DEGs for each Evo line and the total number of genes in the genome. The observed number can then be compared to the distribution of the simulated numbers.

      Sorry for the inappropriate usage of the term. Whether it's statistically significant didn't matter here. The word "significant" was deleted as follows.

      (L205-206) “The number of DEGs partially overlapped among the Evos declined along with the increased lineages of Evos (Fig. 4B).”
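
      The re-sampling approach suggested in comment (8) could look like the sketch below. It draws random gene sets of the same sizes as each lineage's DEG list and counts how many genes are shared by all of them, giving a null distribution for the overlap. The DEG counts used here are hypothetical placeholders; only the genome total of 3,290 genes comes from the text, and the function names are illustrative.

```python
import random

def simulate_shared_degs(n_genes, deg_counts, n_sim=1000, seed=0):
    """Null distribution of the number of genes shared by all lineages."""
    rng = random.Random(seed)
    genes = range(n_genes)
    shared = []
    for _ in range(n_sim):
        common = None
        for k in deg_counts:
            # Draw k distinct genes at random for this lineage.
            drawn = set(rng.sample(genes, k))
            common = drawn if common is None else common & drawn
        shared.append(len(common))
    return shared

def empirical_p_at_least(null_values, observed):
    """One-sided empirical p-value with the usual add-one correction."""
    hits = sum(v >= observed for v in null_values)
    return (hits + 1) / (len(null_values) + 1)

# Hypothetical DEG counts for three lineages (not taken from the study).
hypothetical_deg_counts = [400, 350, 500]
null = simulate_shared_degs(3290, hypothetical_deg_counts, n_sim=200, seed=1)
print(sum(null) / len(null))
```

      An observed overlap would then be compared against `null`, for example with `empirical_p_at_least(null, observed)`.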

      (9) Line 177-179- "In comparison, 1,226 DEGs were induced by genome reduction. The common DEGs of genome reduction and evolution varied from 168 to 540, fewer than half of the DEGs responsible for genome reduction in all Evos" Is the overlapping number significantly lower than the expectation? The hypergeometric test can be used for testing the overlap between two gene sets.

      There is no prior expectation for how many common DEGs would be reasonable. Not every experimentally obtained number needs to be tested for statistical significance, although such testing is commonly essential in computational and data science.
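
      For reference, the hypergeometric test mentioned in comment (9) can be sketched as below. The genome total (3,290 genes), the 1,226 genome-reduction DEGs, and the overlap of 168 are quoted in the text; the number of evolution DEGs for a single lineage (800) is a hypothetical value chosen only for illustration, so the resulting probability says nothing about the actual data.

```python
from math import comb

def hypergeom_cdf(k, population, successes, draws):
    """P(X <= k) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` belong to the reference set."""
    lo = max(0, draws - (population - successes))
    hi = min(k, draws, successes)
    total = comb(population, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lo, hi + 1)
    )
    return numerator / total

population = 3290          # genes in the reduced genome (from the text)
reduction_degs = 1226      # DEGs induced by genome reduction (from the text)
evolution_degs = 800       # hypothetical DEG count for one Evo
observed_overlap = 168     # smallest overlap quoted in the text

# Lower tail: is the overlap smaller than expected by chance?
p_lower = hypergeom_cdf(observed_overlap, population, reduction_degs, evolution_degs)
print(p_lower)
```

      The lower tail is used because the reviewer asks whether the overlap is smaller than expected; an enrichment question would use the upper tail instead.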

      (10) The authors should give more information about the ancestral line used at the beginning of experimental evolution. I guess it is one of the KHK collection lines, but I can not find more details. There are many genome-reduced lines. Why is this certain one picked?

      Sorry for the insufficient information on the reduced genome used for the experimental evolution. The following descriptions were added in the Results and the Materials and Methods, respectively.

      (L75-79) “The E. coli strain carrying a reduced genome, derived from the wild-type genome W3110, showed a significant decline in its growth rate in the minimal medium compared to the wild-type strain 13. To improve the genome reduction-mediated decreased growth rate, the serial transfer of the genome-reduced strain was performed with multiple dilution rates to keep the bacterial growth within the exponential phase (Fig. S1), as described 17,20.”

      (L331-334) “The reduced genome has been constructed by multiple deletions of large genomic fragments 58, which led to an approximately 21% smaller size than its parent wild-type genome W3110.”

      (11) How was the saturated density in Figure 1 actually determined? In particular, the fitness assay of growth curves is 48h. But it seems like the experimental evolution is done for ~24 h cycles. If the Evos never experienced a situation like a stationary phase between 24-48h, and if the author reported the saturated density 48 h in Figure 1, the explanation of the lower saturated density can be just relaxation from selection and may have nothing to do with the increase of growth rate.

      Sorry for the unclear description. Yes, you are right. The evolution was performed within the exponential growth phase (keeping cell division constant), which means the Evos never experienced the stationary phase (saturation). The final evolved populations were subjected to the growth assay to obtain the entire growth curves for calculating the growth rate and the saturated density. Whether the decreased saturated density and the increased growth rate were in a trade-off relationship remained unclear. The corresponding paragraph was revised as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somehow consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection30,31 was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32 33,34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      (12) What annotation of essentiality was used in this paper? In particular, the essentiality can be different in the reduced genome background compared to the WT background.

      Sorry for the unclear definition of the essential genes. They are strictly limited to the 302 essential genes experimentally determined in the wild-type E. coli strain. Detailed information can be found at the following website: https://shigen.nig.ac.jp/ecoli/pec/genes.jsp. We agree that the essentiality could differ between the WT and reduced genomes. Identifying the essential genes in the reduced genome would be an exhaustive undertaking. The information on the essential genes defined in the present study was added as follows.

      (L134-139) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome.”

      (13) The fixed mutations in essential genes are probably not rarely observed in experimental evolution. For example, fixed mutations related to RNA polymerase can be frequently seen when evolving to stressful environments. I think the author can discuss this more and elaborate more on whether they think these mutations in essential genes are important in adaptation or not.

      Thank you for your careful reading and the suggestion. As you mentioned, we noticed that the mutations in RNA polymerases (rpoA, rpoB, and rpoD) were identified in three Evos. As they were not shared across all Evos, we didn't discuss the contribution of these mutations to evolution. Instead of the individual functions of the mutated essential gene functions, we focused on the enriched gene functions related to the transcriptome reorganization because they were the common feature observed across all Evos and linked to the whole metabolic or regulatory pathways, which are supposed to be more biologically reasonable and interpretable. The following sentence was added to clarify our thinking.

      (L268-273) “In particular, mutations in the essential genes, such as RNA polymerases (rpoA, rpoB, rpoD) identified in three Evos (Table S3), were supposed to participate in the global regulation for improved growth. Nevertheless, the considerable variation in the fixed mutations without overlaps among the nine Evos (Table 1) implied no common mutagenetic strategy for the evolutionary improvement of growth fitness.”

      (14) In experimental evolution to new environments, several previous studies also show that long-term experimental evolution in transcriptome is not consistent with or even reverts the short-term response; short-term responses were rather considered as an emergency plan. They seem to echo what the authors found in this manuscript. I think the author can refer to some of those studies more and make a more thorough discussion on short-term vs long-term responses in evolution.

      Thank you for the advice. It's unclear to us what the short-term and long-term responses referred to mentioned in this comment. The "Response" is usually used as the phenotypic or transcriptional changes within a few hours after environmental fluctuation, generally non-genetic (no mutation). In comparison, long-term or short-term experimental "Evolution" is associated with genetic changes (mutations). Concerning the Evolution (not the Response), the long-term experimental evolution (>10,000 generations) was performed only with the wild-type genome, and the short-term experimental evolution (500~2,000 generations) was more often conducted with both wild-type and reduced genomes, to our knowledge. Previous landmark studies have intensively discussed comparing the wild-type and reduced genomes. Our study was restricted to the reduced genome, which was constructed differently from those reduced genomes used in the reported studies. The experimental evolution of the reduced genomes has been performed in the presence of additional additives, e.g., antibiotics, alternative carbon sources, etc. That is, neither the genomic backgrounds nor the evolutionary conditions were comparable. Comparison of nothing common seems to be unproductive. We sincerely hope the recommended topics can be applied in our future work.

      Some minor suggestions

      • Figures S3 & Table S2 need an explanation of the abbreviations of gene categories.

      Sorry for the missing information. Figure S3 and Table S3 were revised to include the names of gene categories. The figure is pasted below for quick reference.

      Author response image 3.

      • I hope the authors can re-consider the title; "Diversity for commonality" does not make much sense to me. For example, it can be simply just "Diversity and commonality."

      Thank you for the suggestion. The title was simplified as follows.

      (L1) “Experimental evolution for the recovery of growth loss due to genome reduction.”

      • It is not easy for me to locate and distinguish the RNA-seq vs DNA-seq files in DRA013662 at DDBJ. Could you make some notes on what RNA-seq actually are, vs what DNA-seq files actually are?

      Sorry for the mistakes in the DRA number of DNA-seq. DNA-seq and RNA-seq were deposited separately with the accession IDs of DRA013661 and DRA013662, respectively. The following correction was made in the revision.

      (L382-383) “The raw datasets of DNA-seq were deposited in the DDBJ Sequence Read Archive under the accession number DRA013661.”

    1. Author response:

      eLife assessment

      In this valuable study, Kumar et al., provide evidence suggesting that the p130Cas drives the formation of condensates that sprout from focal adhesions to cytoplasm and suppress translation. Pending further substantiation, this study was found to be likely to provide previously unappreciated insights into the mechanisms linking focal adhesions to the regulation of protein synthesis and was thus considered to be of broad general interest. However, the evidence supporting the proposed model was incomplete; additional evidence is warranted to substantiate the relationship between p130Cas condensates and mRNA translation and establish corresponding functional consequences.

      We thank the eLife editorial team for their positive assessment of the broad significance of our manuscript. We fully agree that the functional consequences need to be explored in more detail. We feel that many of the criticisms are valid points that are not easily addressed via available tools and thus should be considered limitations of present approaches. We hope that readers appreciate that identification of a new class of liquid-liquid phase separations calls for much more work to fully explore their characteristics, regulation and function, which will likely advance many areas of cell biology and perhaps even medicine.

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrated the phenomenon of p130Cas, a protein primarily localized at focal adhesions, and its formation of condensates. They identified the constituents within the condensates, which include other focal adhesion proteins, paxillin, and RNAs. Furthermore, they proposed a link between p130Cas condensates and translation.

      Strengths:

      Adhesion components undergo rapid exchange with the cytoplasm for some unclear biological functions. Given that p130Cas is recognized as a prominent mechanical focal adhesion component, investigating its role in condensate formation, particularly its impact on the translation process, is intriguing and significant.

      We thank the reviewer for recognizing the functional significance of the work.

      Weaknesses:

      The authors identified the disordered region of p130Cas and investigated the formation of p130Cas condensate. They attempted to demonstrate that p130Cas condensates inhibit translation, but the results did not fully support this assertion. There are several comments below:

      (1) Despite isolating p130Cas-GFP protein using GFP-trap beads, the authors cannot conclusively eliminate the possibility of isolating p130Cas from focal adhesions. While the characterization of the GFP-tagged pulls can reveal the proteins and RNAs associated with p130Cas, they need to clarify their intramolecular mechanism of localization within p130Cas droplets. Whether the protein condensates retain their liquid phase or these GFP-p130Cas pulls represent protein aggregate remains uncertain.

      We agree, the isolation from cell lysates does not distinguish between focal adhesions and cytoplasmic LLPS. We note that p130Cas in focal adhesions also appears to be in LLPS. But there are no methods available to isolate them separately. We acknowledge this is a limitation of the study.

      (2) The authors utilized hexanediol and ammonium acetate to highlight the phenomenon of p130Cas condensates. Although hexanediol is an inhibitor for hydrophobic interactions and ammonium acetate is a salt, a more thorough explanation of the intramolecular mechanisms underlying p130Cas protein-protein interaction is required. Additionally, given that the size of p130Cas condensates can exceed 100 µm², classification is needed to differentiate between p130Cas condensates and protein aggregation.

      Ammonium acetate, which works by promoting hydrophobic interactions and weak van der Waals forces, has been widely used in phase separation studies to change ionic strength without altering intracellular pH. Conversely, hexanediol weakens the hydrophobic/van der Waals interactions that commonly mediate phase separation of IDRs. In the case of p130Cas, the multiple tyrosines within the scaffolding domain are obvious targets. If the reviewer is asking us to resolve the detailed hydrophobic interactions within the scaffolding domain, this is far beyond the scope of the current paper.

      Protein aggregates are defined by their characteristics (e.g., irreversibility, departure from a spherical shape), not by size. Older, larger droplets remain circular and show slower but still measurable rates of exchange. Moreover, droplets are essentially absent after trypsinizing and replating cells. All these results argue against aggregates.

      (3) The connection between p130Cas condensates and translation inhibition appears tenuous. The data only suggests a correlation between p130Cas expression and translation inhibition. Further evidence is required to bolster this hypothesis.

      The optogenetic experiment shows that triggering LLPS by dimerizing p130Cas results in inhibition of translation. This is a causal not a correlative experiment. The reviewer may be thinking that dimerizing p130Cas could stimulate focal adhesion signaling, activating FAK or a src family kinase or other signals. However, none of these signals has been linked to inhibition of cell growth or migration. Thus, we agree that this is a limitation but consider it a low probability mechanism.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Kumar et al., report on a previously unappreciated mechanism of translational regulation whereby p130Cas induces LLPS condensates that then traffic out from focal adhesion into the cytoplasm to modulate mRNA translation. Specifically, the authors employed EGFP-tagged p130Cas constructs, endogenous p130Cas, and p130Cas knockouts and mutants in cell-based systems. These experiments in conjunction with various imaging techniques revealed that p130Cas drives assembly of LLPS condensates in a manner that is largely independent of tyrosine phosphorylation. This was followed by in vitro EGFP-tagged p130Cas-dependent induction of LLPS condensates and determination of their composition by mass spectrometry, which revealed enrichment of proteins involved in RNA metabolism in the condensates. The authors excluded the plausibility that p130Cas-containing condensates co-localize with stress granules or p-bodies. Next, the authors determined mRNA compendium of p130Cas-containing condensates which revealed that they are enriched in transcripts encoding proteins implicated in cell cycle progression, survival, and cell-cell communication. These findings were followed by the authors demonstrating that p130Cas-containing condensates may be implicated in the suppression of protein synthesis using puromycylation assay. Altogether, it was found that this study significantly advances the knowledge pertinent to the understanding of molecular underpinnings of the role of p130Cas and more broadly focal adhesions on cellular function, and to this end, it is likely that this report will be of interest to a broad range of scientists from a wide spectrum of biomedical disciplines including cell, molecular, developmental and cancer biologists.

      Strengths:

      Altogether, this study was found to be of potentially broad interest inasmuch as it delineates a hitherto unappreciated link between p130Cas, LLPS, and regulation of mRNA translation. More broadly, this report provides unique molecular insights into the previously unappreciated mechanisms of the role of focal adhesions in regulating protein synthesis. Overall, it was thought that the provided data sufficiently supported most of the authors' conclusions. It was also thought that this study incorporates an appropriate balance of imaging, cell and molecular biology, and biochemical techniques, whereby the methodology was found to be largely appropriate.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Two major weaknesses of the study were noted. The first issue is related to the experiments establishing the role of p130Cas-driven condensates in translational suppression, whereby it remained unclear whether these effects are affecting global mRNA translation or are specific to the mRNAs contained in the condensates. Moreover, some of the results in this section (e.g., experiments using cycloheximide) may be open to alternative interpretation. The second issue is the apparent lack of functional studies, and although the authors speculate that the described mechanism is likely to mediate the effects of focal adhesions on e.g., quiescence, experimental testing of this tenet was lacking.

      We appreciate the reviewer’s insights. Assessing translational inhibition for specific genes rather than global measurement of translation is an important direction for future work.

      Regarding the cycloheximide experiments, we are unsure what the reviewer means. We used it as a control for puromycin labeling but this is a very standard approach. It seems more likely that the question concerns Fig 5G, where we used it to sequester mRNAs on ribosomes to deplete from other pools. In this case, p130cas condensates decrease after 2 minutes. The reviewer may be suggesting that this effect could be due to blocked translation per se and loss of short-lived proteins. We acknowledge that this is possible but given the very rapid effect (2 min), we think it unlikely.

      Lastly, we agree with the reviewer that further functional studies in quiescence or senescence are warranted; however, these are extensive, open-ended studies and we will not be able to include them as part of the current paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the authors investigate the transcriptional landscape of tuberculous meningitis, revealing important molecular differences contributed by HIV co-infection. Whilst some of the evidence presented is compelling, the bioinformatics analysis is limited to a descriptive narrative of gene-level functional annotations, which are somewhat basic and fail to define aspects of biology very precisely. Whilst the work will be of broad interest to the infectious disease community, validation of the data is critical for future utility.

      We appreciate eLife’s positive assessment, although we challenge the conclusion that we ‘fail to define aspects of biology very precisely’. Our stated objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis, and the eLife assessment affirms we have investigated ‘the transcriptional landscape of tuberculous meningitis’. Defining aspects of the biology more precisely will require another study with a different design and methods.

      Reviewer #1 (Public Review):

      Summary:

      Tuberculous meningitis (TBM) is one of the most severe forms of extrapulmonary TB. TBM is especially prevalent in people who are immunocompromised (e.g. HIV-positive). Delays in diagnosis and treatment could lead to severe disease or mortality. In this study, the authors performed the largest-ever host whole blood transcriptomics analysis on a cohort of 606 Vietnamese participants. The results indicated that TBM mortality is associated with increased neutrophil activation and decreased T and B cell activation pathways. Furthermore, increased angiogenesis was also observed in HIV-positive patients who died from TBM, whereas activated TNF signaling and down-regulated extracellular matrix organisation were seen in the HIV-negative group. Despite similarities in transcriptional profiles between PTB and TBM compared to healthy controls, inflammatory genes were more active in HIV-positive TBM. Finally, 4 hub genes (MCEMP1, NELL2, ZNF354C, and CD4) were identified as strong predictors of death from TBM.

      Strengths:

      This is a really impressive piece of work, both in terms of the size of the cohort which took years of effort to recruit, sample, and analyse, and also the meticulous bioinformatics performed. The biggest advantage of obtaining a whole blood signature is that it allows an easier translational development into a test that can be used in the clinic with a minimally invasive sample. Furthermore, the data from this study has also revealed important insights into the mechanisms associated with mortality and the differences in pathogenesis between HIV-positive and HIV-negative patients, which would have diagnostic and therapeutic implications.

      Weaknesses:

      The data on blood neutrophil count is really intriguing and seems to provide a very powerful yet easy-to-measure method to differentiate survival vs. death in TBM patients. It would be quite useful in this case to perform predictive analysis to see if neutrophil count alone, or in combination with gene signature, can predict (or better predict) mortality, as it would be far easier for clinical implementation than the RNA-based method. Moreover, genes associated with increased neutrophil activation and decreased T cell activation both have significantly higher enrichment scores in TBM (Figure 9) and in mortality (Figure 8). While I understand the basis of selecting hub genes in the significant modules, they often do not represent these biological pathways (at least not directly associated in most cases). If genes were selected based on these biologically relevant pathways, would they have better predictive values?

      We conducted a sensitivity analysis including blood neutrophil count as a potential predictor in the multivariate Cox elastic-net regression model for important predictor selection (Table S14). In this analysis, all six selected important predictors (genes and clinical risk factors) identified in the original analysis (Table S13) were also selected, together with blood neutrophil count. Additionally, we evaluated the predictive value of blood neutrophil count alone, which demonstrated poor performance, with an optimism-corrected AUC of 0.63 for all TBM, 0.67 for HIV-negative TBM, and 0.70 for HIV-positive TBM. Even when combined with the identified gene signatures, blood neutrophil count did not improve the overall performance of the predictive model (optimism-corrected AUC of 0.79 for all TBM, 0.76 for HIV-negative TBM, and 0.80 for HIV-positive TBM). These results indicate that the identified hub genes exhibit better predictive values than blood neutrophil count alone or in combination. These findings have been incorporated into our manuscript results.

      To test whether pathway representative genes have better predictive values than hub genes, we included all these genes in the analysis for important predictor selection. Pathway representative genes comprised ANXA3 and CXCR2 representing neutrophil activation and IL1b representing acute inflammatory response. We observed that all hub genes (MCEMP1, NELL2, ZNF354C, and CD4) consistently emerged as the most important genes with the highest selection in the models, compared to the rest, in both the HIV-negative TBM and HIV-positive TBM cohorts. Additionally, these identified hub genes were still selected when testing together with other hub genes representing relevant biological pathways associated with TBM mortality, such as CYSTM1 involved in neutrophil activation, TRAF5 involved in NF-kappa B signaling pathway, CD28 and TESPA1 involved in T cell receptor signaling. These results show that selected genes based on known biologically relevant pathways did not give better predictive values than the identified hub genes in the significant modules.
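
      As a reminder of what the AUC values quoted above measure, the sketch below computes a plain AUC as the probability that a randomly chosen patient who died has a higher predictor value than a randomly chosen survivor. It is a simplified illustration under stated assumptions: it ignores censoring, does not apply the optimism correction used in the response, and the example values are hypothetical rather than study data.

```python
def auc(scores_events, scores_nonevents):
    """Probability that an event case scores higher than a non-event case.
    Ties contribute one half."""
    wins = 0.0
    for e in scores_events:
        for s in scores_nonevents:
            if e > s:
                wins += 1.0
            elif e == s:
                wins += 0.5
    return wins / (len(scores_events) * len(scores_nonevents))

# Hypothetical predictor values (e.g. a neutrophil count or a gene score).
died = [9.1, 7.4, 8.8, 6.0]
survived = [5.2, 6.0, 7.9, 4.3, 5.5]
print(auc(died, survived))
```

      An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which values such as 0.63 and 0.79 are read.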

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the analysis of blood transcriptomic data from patients with TB meningitis, with and without HIV infection, with some comparison to those of patients with pulmonary tuberculosis and healthy volunteers. The objectives were to describe the comparative biological differences represented by the blood transcriptome in TBM associated with HIV co-infection or survival/mortality outcomes and to identify a blood transcriptional signature to predict these outcomes. The authors report an association between mortality and increased levels of acute inflammation and neutrophil activation, but decreased levels of adaptive immunity and T/B cell activation. They propose a 4-gene prognostic signature to predict mortality.

      Strengths:

      Biological evaluations of blood transcriptomes in TB meningitis and their relationship to outcomes have not been extensively reported previously.

      The size of the data set is a major strength and is likely to be used extensively for secondary analyses in this field of research.

      Weaknesses:

      The bioinformatic analysis is limited to a descriptive narrative of gene-level functional annotations curated in GO and KEGG databases. This analysis cannot be used to make causal inferences. In addition, the functional annotations are limited to 'high-level' terms that fail to define biology very precisely. At best, they require independent validation for a given context. As a result, the conclusions are not adequately substantiated. The identification of a prognostic blood transcriptomic signature uses an unusual discovery approach that leverages weighted gene network analysis that underpins the bioinformatic analyses. However, the main problem is that authors seem to use all the data for discovery and do not undertake any true external validation of their gene signature. As a result, the proposed gene signature is likely to be overfitted to these data and not generalisable. Even this does not achieve significantly better prognostic discrimination than the existing clinical scoring.

      As explained in response to the eLife assessment, our objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis. We agree that ‘This analysis cannot be used to make causal inferences’: that would require a different study design and approaches. The proposed gene signature has higher AUC values than the existing clinical model alone or in combination with clinical risk factors (Table 4). We agree that independent validation of the gene signature will be a crucial next step for future utility. We have performed qPCR in another sample set, and have added these results in the revision (Table 4 and supplementary figure S8).

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional comments most of which are relatively minor:

      (1) Can the authors please clarify if all the PTB cases are also HIV-negative?

      This has been added to the methods section.

      (2) For Table 1, can the authors please list the total number of patients with microbiologically confirmed TB regardless of the methods used? And for the two TBM groups, was the positive microbiology based on CSF findings?

      The total number of patients with microbiologically confirmed TB is presented in Table 2 under the definite TBM group, defined as TB microbiologically confirmed using microscopy, culture, and Xpert testing of cerebrospinal fluid (CSF) samples. We have updated the note in Table 2 to clarify the definition.

      (3) How was the discovery and validation set selected? Was it based on randomisation?

      We randomly split the TBM data into two datasets, a discovery cohort (n=142) and a validation cohort (n=139), to ensure the reproducibility of the data analysis. We have described this in the methods section.
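
      A random split of this kind can be sketched as follows. This is an illustration only, not the authors' code: the function name `split_cohort`, the seed, and the placeholder sample identifiers are assumptions; only the cohort sizes (142 and 139) come from the response above.

```python
import random

def split_cohort(sample_ids, n_discovery, seed=0):
    """Randomly partition sample IDs into discovery and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_discovery], shuffled[n_discovery:]

# Hypothetical identifiers standing in for the 281 TBM samples.
samples = [f"TBM_{i:03d}" for i in range(281)]
discovery, validation = split_cohort(samples, n_discovery=142)
```

      Fixing the seed makes the allocation auditable, and the two sets are disjoint by construction.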

      (4) Line 107 can be better clarified by stating that the overall 3-month mortality rate is 21.7% for TBM regardless of HIV status.

      Thank you, we have restated this sentence in the results section.

      (5) The authors stated that samples were collected at enrolment when patients would have received less than 6 days of anti-tubercular treatment. Is there information on the median and IQR on the number of days that the patients would have received Rx, especially between the groups? Did the authors control for this variable when analysing for DEGs?

      One of the criteria for enrolling participants in the LAST-ACT and ACT-HIV trials is that they must have received fewer than 6 consecutive days of two or more drugs active against M. tuberculosis. However, the number of days of treatment that patients had received was not recorded, so we could not control for this variable in the differential expression analysis. This has been clarified further in the methods section: ‘The samples were taken at enrollment, when patients could not have received more than 6 consecutive days of two or more drugs active against M. tuberculosis.’

      (6) I am a little bit concerned with the reads mapping accuracy (57%) to the human genome, which is fairly low. Did the authors investigate the reasons behind this low accuracy?

      Thank you. It was indeed a typo. We have corrected it in the results section.

      (7) On Tables S2-S4, can the authors please clarify what the last column (labelled as "B") shows?

      Tables S2-S4 have now been renumbered S3-S5. We have updated the legends of these tables to clarify the meaning of the last column.

      Reviewer #2 (Recommendations For The Authors):

      If the authors wish to revise their manuscript, I suggest the following amendments:

      (1) Provide a consort diagram for the selection of samples included in the present analysis (from parent study cohorts), allocation to test and validation splits for bioinformatics analysis, and outcomes.

      We have provided our consort diagram in supplementary Figure S10.

      (2) Provide details of inclusion criteria for pulmonary TB cohort, and how samples from this cohort were selected for inclusion in the present analysis. Please clarify whether this cohort excluded HIV-positive participants by design or by chance.

      The inclusion criteria for the pulmonary TB cohort were described in the methods section. Due to the very low prevalence of HIV in this prospective observational study, HIV-positive participants were excluded. We have clarified in the amended manuscript that the pulmonary TB cohort only included HIV-negative participants.

      (3) Baseline characteristics of HIV-positive participants (Table 1) should include CD4 count, HIV viral load, and whether anti-retroviral therapy was naïve or experienced.

      We have included pre-treatment CD4 cell count, information on anti-retroviral therapy, and HIV viral load data in Table 1, and have described this information in the results section.

      (4) I note that the TBM samples were derived from RCTs of adjunctive steroid therapy, but not stratified in the present analysis by treatment arm allocation. Clearly, this may affect the survival/mortality outcomes that are the central focus of this manuscript. Therefore, they should be included in the models for differential gene expression analysis and prognostic signature discovery. To do so, the authors may need to wait until they are able to unblind the trial metadata.

      With permission from the trial investigators, we were able to adjust the analyses for treatment with corticosteroids. The investigators remained blind to the allocation and we have not reported any direct effects of corticosteroids on outcome – such an analysis can only be done once the LAST-ACT trial has been reported (which won’t be until the end of 2024). Treatment outcome and effect were kept blinded by extracting only the fold change difference between survival and death from the linear regression model, in which gene expression was the outcome and survival and treatment were covariates.
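
      The adjustment described above can be sketched as an ordinary least-squares fit in which only the survival coefficient is read out. This is a minimal illustration on synthetic, noise-free data, not the authors' pipeline: the helper names `solve` and `fit_ols`, the variable names, and all values are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic example: log2 expression = 5 + 1.0 * death + 0.5 * treated.
death = [0, 0, 1, 1, 0, 1, 0, 1]
treated = [0, 1, 0, 1, 1, 0, 0, 1]
log2_expr = [5 + 1.0 * d + 0.5 * t for d, t in zip(death, treated)]

# Design matrix columns: intercept, death indicator, treatment indicator.
design = [[1.0, d, t] for d, t in zip(death, treated)]
intercept, beta_death, beta_treated = fit_ols(design, log2_expr)

# Only beta_death (the log2 fold change, death vs survival, adjusted for
# treatment) would be reported; beta_treated is estimated but not inspected.
```

      Because treatment enters the model as a covariate, the survival coefficient is adjusted for it even though the treatment coefficient itself is never reported.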

      (5) I understood from the methods (lines 460-461) that batch correction of the RNAseq data was necessary. However, it is not clear how the samples were batched. PCA of the transcriptomes before and after batch correction with batch and study group labels should be provided. I would also advocate for a sensitivity analysis to check the robustness of the main findings without batch correction. I assume Fig2A represents batch-corrected data, but this is not clear.

      We have now added information to the methods section about the RNA sequencing batches and the batch correction approach, and have clarified that the analyses and data visualizations used batch-corrected data. We have also updated the results related to batch correction in Fig. 2A and Supplementary Figure S9.

      (6) I would encourage the authors to include a differential gene expression analysis to directly compare the transcriptome of TBM to that of pulmonary TB. I think it would add additional value to their focus on describing the transcriptome in TBM.

      We thank the reviewer for this suggestion. Conducting a differential gene expression analysis to compare the transcriptome of TBM with that of PTB is beyond the scope of this manuscript, and we will examine this question separately.

      (7) I don't really understand the purpose of splitting their data set into test and validation for the purposes of showing that WGCNA analysis is mostly reproduced in the two halves of the data. I would advocate that they scrap this approach to maximise the statistical power of their analysis in the descriptive work.

      As mentioned in response to reviewer #1 in question #3, the purpose of splitting the data is to ensure the reproducibility of the data analysis, as suggested by Langfelder et al. (PMID: 21283776). This approach served two purposes: (i) to affirm the existence of functional modules in an independent cohort and (ii) to validate the association of the modules of interest or their hub genes with survival outcomes.

      (8) The authors should soften the confidence in their interpretation of the GO/KEGG annotations of WGCNA modules. At least, they should include a paragraph that explicitly details the limitations of their analyses, including (i) the accuracy GO/KEGG annotations are not validated in this context (if at all), (ii) that none of the data can be used to make causal inferences and (iii) that peripheral blood assessments that are obviously impacted by changes in cellular composition of peripheral blood do not necessarily reflect immunopathogenesis at the site of disease - in fact if circulating cells are being recruited to the site of disease or other immune compartments, then quite the opposite interpretations may be true.

      We appreciate the reviewer's comment. (i) In our analysis, we first confirmed the existence of Weighted Gene Co-expression Network Analysis (WGCNA) modules in the discovery cohort and validated the association of these modules with mortality outcomes in the validation cohort. We then applied GO/KEGG annotations to define the biological functions involved in the WGCNA modules. Finally, we performed Qusage analysis to directly test the association of the top-hit pathways of each WGCNA module with mortality outcomes (see supplementary S6). This approach helped to identify and validate modules and biological pathways associated with TBM mortality in this context, avoiding potential false positives in the GO/KEGG annotations of WGCNA modules. (ii) We agree with the assessment that 'This analysis cannot be used to make causal inferences,' as that would require a different study design and approach. (iii) The focus of this study is to investigate the pathogenesis of TBM in the systemic immune system. We have highlighted this focus in the title and the aim of the manuscript.

      (9) For the prognostic signature discovery and validation, I strongly recommend the authors include more robust validation. For example, to undertake an 80:20 split for sequential discovery (for feature selection and derivation of a prognostic model), followed by validation of a 'locked' model in data that made no contribution to discovery. In two separate sensitivity analyses. I also suggest they split their dataset (i) by treatment allocation in the RCT and (ii) by HIV status. In addition, their method for feature selection has to be clearer- precisely how they select hub genes from their WGCNA analysis as candidate predictors is not explained. Since this is such a prominent output of their manuscript, the results of this analysis should really be included in the main manuscript, and all performance metrics for discrimination should include confidence intervals.

      Employing an 80:20 split for training and testing models is a good approach to internal validation. However, we addressed the risk of overestimating the performance of the prognostic model using the bootstrap sampling approach proposed by Steyerberg et al. (PMID: 11470385), which has been shown to provide stable estimates with low bias. The overall model performance for discrimination reported in our manuscript was corrected for “optimism” to ensure internal validity. This adjustment was made using 1000 bootstrap resamples, which accounted for estimation uncertainty. As such, there is no need to present confidence intervals for these metrics.
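
      The optimism correction described above can be sketched as follows. This is a minimal illustration of the general bootstrap procedure, not the authors' implementation: the toy difference-in-means model, the function names, the synthetic data, and the use of 200 resamples (the response reports 1000) are all assumptions.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(X, y):
    """Toy model: each feature weight is its mean in events minus non-events."""
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == 0]
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(len(X[0]))]

def predict(w, X):
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # both classes are needed to fit and score
        Xb = [X[i] for i in idx]
        w = fit(Xb, yb)  # refit the model on the bootstrap sample
        # Optimism: performance on the bootstrap sample minus on the original data.
        optimism.append(auc(predict(w, Xb), yb) - auc(predict(w, X), y))
    return apparent, apparent - sum(optimism) / n_boot

# Synthetic data: one informative feature and four noise features.
data_rng = random.Random(1)
y = [1 if i % 4 == 0 else 0 for i in range(80)]
X = [[data_rng.gauss(0.8 * yi, 1.0)] + [data_rng.gauss(0.0, 1.0) for _ in range(4)]
     for yi in y]
apparent, corrected = optimism_corrected_auc(X, y)
```

      The corrected value is a point estimate of out-of-sample discrimination; the spread of the bootstrap replicates could additionally be summarized if an interval is wanted.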

      Moreover, in our revision, to confirm prognostic signatures independently, we have evaluated the predictive value of identified gene signatures using qPCR in another set of samples. The results have been added in Table 4, supplementary Figure S8 and the results section.

      For the reasons given above (comment 4), we are unable to split our dataset by treatment allocation in this analysis. But as described, we have adjusted the analysis for corticosteroid treatment. Once the primary results of the LAST ACT trial have been published, we will examine the impact of corticosteroids on TBM pathophysiology and outcomes, seeking to better understand the mechanisms by which steroids have their therapeutic effects.

      Given the differences in pathogenesis and immune response associated with HIV coinfection, we stratified our analysis by HIV status. As the reviewer suggested, we have provided additional details in the methods section regarding the selection of hub genes from the associated WGCNA modules and the feature selection process for predictive modeling.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We extend our sincere gratitude for the invaluable comments provided by the reviewers and yourself, along with the constructive suggestions to enhance the quality of our manuscript. In response to this invaluable feedback, we have diligently revised and resubmitted our paper as an article, introducing five primary figures, seven supplementary figures, and two supplementary data files. Importantly, this work represents a significant contribution to the field, presenting novel findings for the first time without any prior publication.

      Within the enclosed document, we have provided a comprehensive response to the editor and reviewer comments, addressing each point meticulously and specifically. We extend our heartfelt thanks to the reviewers and yourself for your diligent examination of our manuscript and for offering insightful recommendations.

      In our latest revision, we have taken great care to address every comment, ensuring that we clarify the manuscript and provide robust evidence where required. We have meticulously highlighted the modifications within the manuscript in yellow for your convenience, while also including the modifications made in response to each specific comment. The primary focus of these revisions was to provide additional context regarding the relationship between PARP-1 and mono-methylated histones. Substantial modifications were made to our discussion section to address this point.

      Another concern raised was regarding the discrepancy in the relationship of PR-SET7 and PARP-1 between our study and the recent study by Estève et al. (PMID: 36434141). We have revised the results and discussion sections to discuss this concern.

      To address Reviewer 2’s concern about the potential indirect role of PARP1 in the regulation of some metabolic genes, despite its direct binding to loci coding for metabolic genes, we revised the discussion section to highlight this possibility.

      Enclosed, you will find a detailed, point-by-point response to each of the editor’s and reviewers' comments, showcasing our commitment to addressing their concerns with precision.

      We firmly believe that our revisions successfully resolve all the concerns raised by the editor and the reviewers, and we are confident that this improved version of our manuscript contributes significantly to the scientific discourse. Once again, we thank you for considering our work, and please feel free to contact me if you require any additional information.

      In the revised manuscript, most of the concerns raised by the reviewers have been addressed satisfactorily. However, as suggested by reviewer#2, it would have been more significant, if the PARP1-mediated reading of global mono-methylation of histone could be addressed. At least the mechanisms of selectivity of PARP1 need further convincing discussion.

      We thank the editor for their valuable comments. We have extended our discussion section to discuss in more detail the relationship between PARP1 and mono-methylated histones. In our refined Discussion section, we have endeavored to articulate more clearly how PARP-1 may be selectively recruited to active chromatin domains through its interaction with mono-methylated histone marks. We propose a model wherein PARP-1 actively participates in the turnover process, contributing to the maintenance of an active chromatin environment. This mechanism entails PARP-1 selectively binding to mono-methylated active histone marks associated with highly transcribed genes. Upon activation, PARP-1 undergoes automodification, leading to its release from chromatin and facilitating the reassembly of nucleosomes carrying the mono-methylated marks. Subsequently, the enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) cleaves pADPr, enabling the restoration of PARP-1's binding affinity to mono-methylated active histone marks. This proposed hypothesis is consistent with existing research across various model organisms and aligns with the known association of PARP-1 with highly expressed genes, as well as its role in mediating nucleosome dynamics and assembly.

      Our modified Discussion section unfolds as follows:

      "Finally, highly transcribed genes have been reported to present a high turnover of mono-methylated modifications, maintaining a state of low methylation (50). Moreover, our previous study revealed that PARP1 preferentially binds to highly active genes (34).  Consequently, our findings suggest an active involvement of PARP-1 in the turnover process to maintain an active chromatin environment. This proposed mechanism unfolds in the following steps: 1) PARP-1 selectively binds to mono-methylated active histone marks associated with highly transcribed genes. 2) Upon activation, PARP-1 undergoes automodification and subsequently disengages from chromatin, facilitating the reassembly of nucleosomes carrying the mono-methylated marks. 3) The enzymatic action of Poly(ADP)-ribose glycohydrolase (PARG) cleaves pADPr, restoring PARP-1's binding affinity to mono-methylated active histone marks. This proposed hypothesis is consistent with existing research conducted across various model organisms, including mice, Drosophila, and Humans (7, 24, 30, 51-53). Notably, previous studies have consistently demonstrated that PARP-1 predominantly associates with highly expressed genes and plays a crucial role in mediating nucleosome dynamics and assembly. Thus, our proposed model provides a molecular framework that may contribute to understanding the relationship between PARP-1 and the epigenetic regulation of gene expression."

      We trust that these revisions effectively address the editor’s comment and enhance the overall strength and clarity of our manuscript.

      Furthermore, recent developments in the area are omitted, as an important publication hasn't been discussed anywhere in the work (PMID: 36434141).

      We appreciate the editor's thorough review of our revised manuscript and the responses to the previous reviewer's comments. To address this important concern, we have carefully investigated the levels of PR-SET7 in parp1 hypomorphic conditions.

      Supplemental Fig. S4 and S5 demonstrate that in the absence of Parp1, there were no significant changes in PR-SET7 RNA or protein levels, respectively. This finding supports the conclusion that Parp1 is not directly involved in the regulation of PR-SET7 in Drosophila, in contrast with the findings of Estève et al. (PMID: 36434141). This discrepancy may arise from differing relationships between PARP-1 and PR-SET7, which could cooperate in the context of Drosophila development while playing antagonistic roles in specific cell lines or under particular conditions.

      We have updated the Results section to explicitly mention this observation:

      "Interestingly, in the absence of PARP-1, neither PR-SET7 RNA nor protein levels were affected (Supplemental Fig.S4-5), indicating that PARP-1 is not directly implicated in the regulation of pr-set7. This finding contrasts with recent evidence demonstrating PARP1-induced degradation of PR-SET7/SET8 in human cells (16)."

      Furthermore, we have modified the discussion section to address this discrepancy:

      "A recent study demonstrated that in human cells overexpressing PARP-1, PR-SET7/SET8 is degraded, whereas depletion of PARP-1 leads to an increase in PR-SET7/SET8 levels (16). However, in our study involving parp-1 mutant in Drosophila third-instar larvae revealed a nuanced scenario: we detected a minor but not significant reduction in both PR-SET7 RNA and protein levels (Supplemental Fig.S4 and S5). This outcome stands in stark contrast to the previous study's findings. The discrepancy could be due to the distinct experimental approaches used: the previous research focused on mammalian cells and in vitro experiments, whereas our study examined the functions of PARP-1 in whole Drosophila third-instar larvae during development. Consequently, while PARP-1 may cooperate with PR-SET7 in the context of Drosophila development, it could exhibit antagonistic roles against PR-SET7 in specific cell lines and under certain biological or developmental conditions."

      We believe that these modifications effectively address the raised concern and provide a more comprehensive understanding of the relationship between PARP1 and PR-SET7 in our study. We hope these clarifications enhance the overall robustness and clarity of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study from Bamgbose et al. identifies a new and important interaction between H4K20me and Parp1 that regulates inducible genes during development and heat stress. The authors present convincing experiments that form a mostly complete manuscript that significantly contributes to our understanding of how Parp1 associates with target genes to regulate their expression.

      Strengths:

      The authors present 3 compelling experiments to support the interaction between Parp1 and H4K20me, including:

      (1) PR-Set7 mutants remove all H4K20me and phenocopy Parp mutant developmental arrest and defective heat shock protein induction.

      (2) PR-Set7 mutants have dramatically reduced Parp1 association with chromatin and reduced poly-ADP ribosylation.

      (3) Parp1 directly binds H4K20me in vitro.

      Weaknesses:

      (1) The RNAseq analysis of Parp1/PR-Set7 mutants is reasonable, but there is a caveat to the authors' conclusion (Line 251): "our results indicate H4K20me1 may be required for PARP-1 binding to preferentially repress metabolic genes and activate genes involved in neuron development at co-enriched genes." An alternative possibility is that many of the gene expression changes are indirect consequences of altered development induced by Parp1 or PR-Set7 mutants. For example, Parp1 could activate a transcription factor that represses metabolic genes. The authors counter this model by stating that Parp1 directly binds to "repressed" metabolic genes. While this argument supports their model, it does not rule out the competing indirect transcription factor model. Therefore, they should still mention the competing model as a possibility.

      We appreciate Reviewer 2's insightful comments during both rounds of revision, which have significantly enriched the quality of our manuscript. The binding of PARP1 to loci encoding metabolic genes indeed suggests a direct role of PARP1 in their regulation. However, we acknowledge Reviewer 2's point that some of these targets might be regulated indirectly, with PARP1 potentially modulating the expression of intermediary transcription factors.

      To address this possibility, we have revised the discussion section of our manuscript accordingly:

      "Remarkably, our observations indicate a notable affinity of PARP-1 for binding to the gene bodies of these metabolic genes (34), suggesting a direct involvement of PARP1 in their regulation. Nonetheless, it remains plausible that certain genes may be indirectly regulated by PARP1 through intermediary transcription factors."

      We trust that this modification adequately addresses Reviewer 2's concern.

      (2) The section on inducibility of heat shock genes is interesting but missing an important control that might significantly alter the authors' conclusions. Hsp23 and Hsp83 (group B genes) are transcribed without heat shock, which likely explains why they have H4K20me without heat shock. The authors made the reasonable hypothesis that this H4K20me would recruit Parp-1 upon heat shock (line 270). However, they observed a decrease of H4K20me upon heat shock, which led them to conclude that "H4K20me may not be necessary for Parp1 binding/activation" (line 275). However, their RNA expression data (Fig4A) argue that both Parp1 and H4K20me are important for activation. An alternative possibility is that group B genes indeed recruit Parp1 (through H4K20me) upon heat shock, but then Parp1 promotes H3/H4 dissociation from group B genes. If Parp1 depletes H4, it will also deplete H4K20me1. To address this possibility, the authors should also do a ChIP for total H4 and plot both the raw signal of H4K20me1 and total H4 as well as the ratio of these signals. The authors could also note that Group A genes may similarly recruit Parp1 and deplete H3/H4 but with different kinetics than Group B genes because their basal state lacks H4K20me/Parp1. To test this possibility, the authors could measure Parp association, H4K20 methylation, and H4 depletion at more time points after heat shock at both classes of genes.
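
      The normalization suggested above can be illustrated with a short sketch. The function name `normalize_to_h4` and the numbers are made up for illustration and are not data from the study; the point is only that if H4K20me1 and total H4 fall in proportion, the ratio stays constant, which would be consistent with histone loss rather than loss of the methyl mark.

```python
def normalize_to_h4(h4k20me1, total_h4):
    """Return the H4K20me1 signal per unit of total H4 at each time point."""
    return [m / h for m, h in zip(h4k20me1, total_h4)]

# Made-up ChIP signals (arbitrary units) before and after heat shock.
h4k20me1 = [2.0, 1.0]
total_h4 = [4.0, 2.0]
ratios = normalize_to_h4(h4k20me1, total_h4)
```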

      We sincerely appreciate Reviewer 2 for their insightful comment on our manuscript. Your hypothesis regarding the potential induction of H3/H4 dissociation from group B genes by PARP-1, leading to a reduction in H4K20me1, offers a thought-provoking perspective. However, our findings suggest an alternative interpretation.

      Our data indicate that while H4K20me1 is indeed present under normal conditions at group B genes, its reduction following heat shock does not seem to impede PARP-1's role in transcriptional activation (Fig. 4A, C, and E). Instead, we propose that this decrease in H4K20me1 might signify a regulatory shift in chromatin structure, facilitating transcriptional activation during heat shock, with PARP-1 playing an independent facilitating role. Moreover, existing studies have highlighted the dual role of H4K20me1, acting as a promoter of transcription elongation in certain contexts and as a repressor in others.

      The elevated enrichment of H4K20me1 in group B genes under normal conditions may indeed indicate a repressive state that requires alleviation for transcriptional activation. Additionally, we cannot discount the possibility of unique regulatory functions associated with PR-SET7, extending beyond its recognized role as a histone methylase. Non-catalytic activities and potential interactions with non-histone substrates might contribute to the nuanced control exerted by PR-SET7 on group B genes during heat stress.

      Furthermore, our exploration of pr-set720 and ParpC03256 mutants reveals distinct roles for PARP-1 and H4K20me1 in modulating gene expression (Fig 3E). This reinforces the notion that the interplay between PR-SET7 and PARP-1 involves a multifaceted regulatory mechanism.

      To address these points, we have revised the discussion section of our manuscript accordingly:

      "Another plausible explanation could be that the recruitment of PARP-1 to group B genes loci promotes H4 dissociation and then leads to a reduction of H4K20me1. However, our findings suggest an alternative interpretation: the decrease in H4K20me1 at group B genes during heat shock does not seem to impede PARP-1's role in transcriptional activation, (Fig.4A, C and E). Rather than disrupting PARP-1 function, we propose that this reduction in H4K20me1 may signify a regulatory shift in chromatin structure, priming these genes for transcriptional activation during heat shock, with PARP-1 playing an independent facilitating role. Moreover, existing studies have highlighted the dual role of H4K20me1, acting as a promoter of transcription elongation in certain contexts and as a repressor in others (13, 26, 39, 40, 42-46). The elevated enrichment of H4K20me1 in group B genes under normal conditions may indicate a repressive state that requires alleviation for transcriptional activation. Additionally, we cannot discount the possibility of unique regulatory functions associated with PR-SET7, extending beyond its recognized role as a histone methylase. Non-catalytic activities and potential interactions with non-histone substrates might contribute to the nuanced control exerted by PR-SET7 on group B genes during heat stress (47, 48). Furthermore, our exploration of pr-set720 and parp-1C03256 mutants reveals distinct roles for PARP-1 and H4K20me1 in modulating gene expression (Fig 3E). This reinforces the notion that the interplay between PR-SET7 and PARP-1 involves a multifaceted regulatory mechanism. Understanding the intricate relationship between these molecular players is crucial for elucidating the complexities of gene expression modulation under heat stress conditions."

      We believe that this modification enhances the clarity of our conclusions and adequately addresses Reviewer 2's concerns regarding the intricate relationship between PARP-1, H4K20me1, and PR-SET7 in transcriptional regulation under heat stress conditions.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The endocannabinoid system (ECS) components are dysregulated within the lesion microenvironment and systemic circulation of endometriosis patients. Using endometriosis mouse models and genetic loss of function approaches, Lingegowda et al. report that canonical ECS receptors, CNR1 and CNR2, are required for disease initiation, progression, and T-cell dysfunction.

      Strengths:

      The study uses genetic approaches to establish in vivo causal relationships between dysregulated ECS and endometriosis pathogenesis. The experimental design incorporates bulk RNAseq approaches, as well as imaging mass spectrometry to characterize the mouse lesions. The identification of immune-related and T-cell-specific changes in the lesion microenvironment of CNR1 and CNR2 knockout (KO) mice represents a significant advance.

      Weaknesses:

      Although the mouse phenotypic analyses involve a detailed molecular characterization of the lesion microenvironment using genomic approaches, detailed measurements of lesion size/burden and histopathology would provide a better understanding of how CNR1 or CNR2 loss contributes to endometriosis initiation and progression. The cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although this aspect of the approach is recognized as a major limitation, global CNR1 and CNR2 KO may affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or lead to preexisting alterations in host or donor tissues, which could affect lesion establishment and development in the surgically induced, syngeneic mouse model of endometriosis.

      We appreciate the reviewer's thoughtful and constructive feedback. We agree that additional measurements of lesion size/burden and histopathology would provide valuable insights into the specific contributions of CNR1 and CNR2 to endometriosis progression. However, the focus of this study was on assessing alterations in the complex immune microenvironment due to the absence of CNR1 and CNR2, given their close involvement in regulating immune cell populations. We plan to incorporate these measurements in future studies to strengthen the understanding of disease pathogenesis. Regarding the potential effects of global knockout, the reviewer raises a valid concern. To address this, we will explore cell- and/or tissue-specific knockout models in future experiments to better isolate the direct effects of CNR1 and CNR2 on the disease process, while minimizing potential confounding factors from systemic alterations.

      Reviewer #2 (Public Review):

      Summary:

      The endocannabinoid system (ECS) regulates many critical functions, including reproductive function. Recent evidence indicates that dysregulated ECS contributes to endometriosis pathophysiology and the microenvironment. Therefore, the authors further examined the dysregulated ECS and its mechanisms in endometriosis lesion establishment and progression using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. The authors presented differential gene expression and altered pathways, especially those related to the adaptive immune response, in CNR1 and CNR2 ko lesions. Interestingly, the T-cell population was dramatically reduced in the peritoneal cavity of mice lacking CNR2, along with a loss of proliferative activity of CD4+ T helper cells. Imaging mass cytometry analysis provided spatial profiling of cell populations and potential relationships among immune cells and other cell types. This study provided fundamental knowledge of the endocannabinoid system in endometriosis pathophysiology.

      Strengths:

      Dysregulated ECS and its mechanisms in endometriosis pathogenesis were assessed using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. Not only endometriotic lesions, but also peritoneal exudate (and splenic) cells were analyzed to understand the specific local disease environment under the dysregulated ECS.

      The transcriptional profiles and pathways, immune cell profiles, and spatial profiles of cell populations support an altered immune cell population and disrupted immune cell functions in endometriosis pathogenesis via dysregulation of the ECS.

      In line 386: Role of CNR2 in T cells. The finding that CD3+ T cells are nearly absent in the peritoneal cavity of CNR2 ko mice is intriguing.

      The interpretation of the results is well-described in the Discussion.

      Weaknesses:

      The study was terminated and characterized 7 days after EM induction surgery without the details for selecting the time point to perform the experiments.

      The authors also mentioned that altered eutopic endometrium contributes to the establishment and progression of endometriosis. This reviewer agrees with lines 324-325. If so, DEGs are likely identified between eutopic endometrium (with/without endometriosis lesion induction) and ectopic lesions. It would be nice to see the data (even though using publicly available data sets).

      Figure 7 CDEF. The results of the statistical analyses and analyzed sample numbers should be added. Lines 444-450 cannot be reviewed without them.

      This reviewer agrees with lines 498-500. In contrast, retrograded menstrual debris is not decidualized. The section could be modified to avoid misunderstanding.

      We would like to thank the reviewer for the insightful comments and suggestions, and for acknowledging the importance of the work presented in this manuscript.

      Regarding the 7-day time point, we have provided a rationale in lines 479-481, but agree that it isn’t sufficient, and hence we have provided additional details on the selection of the 7-day time point for the experiments in the Methods section (Mouse model of EM). We have also noted the suggestion on providing a comparison of differentially expressed genes in the eutopic endometrium vs ectopic lesions. Since there are publications comparing the eutopic vs ectopic gene expression patterns (PMIDs: 33868805 and 18818281), including a study exploring the ECS genes in the endometrium throughout different menstrual cycles (PMID: 35672435), we believe additional analysis using the same dataset may not yield new information. However, we see the value in the reviewer’s comment, and we will look at the gene expression patterns in the uterine vs endometriosis-like lesions in our future studies with tissue- or cell-specific CNR1 and CNR2 knockout models to understand the functional relevance of the ECS in endometriosis initiation.

      Since the IMC study was exploratory for proof of concept, we did not have enough biological replicates for meaningful statistical validation (n = 2-3). We have clarified this information in the methods, results, and figure legends for appropriately representing the limitations of the current setup.

      Finally, we appreciate the feedback on the section discussing retrograded menstrual debris. Even though the menstrual debris may not be decidualized, some endometriotic lesions have the ability to decidualize based on their response to estrogen and progesterone in a cycling manner (PMID: 26450609), similar to the endometrium in the uterine cavity. We have clarified this in the revised MS.

    1. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      Casas-Tinto et al. present convincing data that injury of the adult Drosophila CNS triggers transdifferentiation of glial cells and even the generation of neurons from glial cells. This observation opens up the possibility of getting a handle on the molecular basis of neuronal and glial generation in the vertebrate CNS after traumatic injury caused by stroke or crush injury. The authors use an array of sophisticated tools to follow the development of glial cells at the injury site in very young and mature adults. The results in mature adults reveal a remarkable plasticity in the fly CNS and dispel the notion that repair after injury may be only possible in nerve cords which are still developing. The observation of so-called VC cells which do not express the glial marker repo could point to the generation of neurons by former glial cells.

      Conclusion:

      The authors present an interesting story that is technically sound and could form the basis for an in-depth analysis of the molecular mechanism driving repair after brain injury in Drosophila and vertebrates.

      Strengths:

      The evidence for transdifferentiation of glial cells is convincing. In addition, the injury to the adult CNS shows an inherent plasticity of the mature ventral nerve cord which is unexpected.

      Weaknesses:

      Traumatic brain injury in Drosophila has been previously reported to trigger mitosis of glial cells and generation of neural stem cells in the larval CNS and the adult brain hemispheres. Therefore this report adds to but does not significantly change our current understanding. The origin and identity of VC cells is unclear.

      The Reviewer correctly points out that traumatic brain injury has been reported to trigger the generation of neural stem cells. However, according to previous reports, those cells were quiescent Dpn+ neuroblasts. We now report that already differentiated adult neuropil glia transdifferentiate into neurons, which is a new mechanism not previously reported.

      We agree with the reviewer regarding the identity of VC neurons, although according to the results of the G-TRACE experiments their origin is clear: they originate from neuropil glia (i.e. astrocyte-like glia and ensheathing glia). We will use a battery of antibodies previously reported to identify specific subtypes of neurons to identify these newly generated neurons.

      Reviewer #2:

      Summary:

      Casas-Tinto et al. provide new insight into glial plasticity using a crush injury paradigm in the ventral nerve cord (VNC) of adult Drosophila. The authors find that both astrocyte-like glia (ALG) and ensheathing glia (EG) divide under homeostatic conditions in the adult VNC and identify ALG as the glial population that specifically ramps up proliferation in response to injury, whereas the number of EGs decreases following the insult. Using lineage-tracing tools, the authors interestingly observe the interconversion of glial subtypes, especially of EGs into ALGs, which occurs independent of injury and is dependent on the availability of the transcription factor Prospero in EGs, adding to the plasticity observed in the system. Finally, when tracing the progeny of differentiated glia, Casas-Tinto and colleagues detect cells of neuronal identity and provide evidence that such glia-derived neurogenesis is specifically favored following ventral nerve cord injury, which puts forward a remarkable way in which glia can respond to neuronal damage.

      Numerous experiments have been carried out in 7-day-old flies, showing that the observed plasticity is not due to residual developmental remodeling or a still immature VNC.

      By elegantly combining different genetic tools, the authors show glial divisions with mitotic-dependent tracing and find that the number of generated glia is refined by apoptosis later on.

      The work identifies Prospero in glia as an important coordinator of glial cell fate, from development to the adult context, which draws further attention to the upstream regulatory mechanisms.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Weaknesses:

      Although the authors do use a variety of methods to show glial proliferation, the EdU data (Figure 1B) could be more informative by displaying images of non-injured animals and providing quantifications or the mention of these numbers based on results previously acquired in the system.

      We appreciate the Reviewer’s comment. We believed that adding images of non-injured animals did not add new information, as we already quantified the increase of glial proliferation upon injury in Losada-Perez et al. 2021. Besides, the purpose of this experiment was to determine whether the dividing cells were astrocyte-like glia, rather than to count the dividing cells. Comparing independent experiments could be tricky, but if we compare the quantifications of G2-M glia (repo>fly-Fucci) done in Losada-Perez et al. 2021 (fig 1C) with the quantifications of G2-M neuropil glia done in this work (fig 1C), we can see that the numbers are comparable.

      The experiments relying on the FUCCI cell cycle reporter suggested considerable baseline proliferation for EGs and ALGs, but when using an independent method (Twin Spot MARCM), mitotic marking was only detected for ALGs. This discrepancy could be addressed by assessing the co-localization of the different glia subsets using the identified driver lines with mitotic markers such as PH3.

      In our understanding, this discrepancy could be explained by the magnitude of proliferation. The lower proliferation rate of EG (as indicated by the fly-Fucci experiments), combined with the incomplete efficiency of MARCM clone induction, considerably reduces the chances of finding EG MARCM clones. PH3 is a mitotic marker, but it is also found in apoptotic cells (Kim and Park 2012. DOI: 10.1371/journal.pone.0044307); however, we can do the suggested experiment and quantify the results.

      The data in Figure 1C would be more convincing in combination with images of the FUCCI Reporter as it can provide further information on the location and proportion of glia that enter the cell cycle versus the fraction that remains quiescent.

      We will add the suggested images.

      The analyses of inter-glia conversion in Figure 3 are complicated by the fact that Prospero RNAi is used to suppress EG-to-ALG conversion while Prospero also serves as the marker to establish ALG nature. Clarification of whether the GFP+ cells still expressed Pros or were classified as NP-like GFP cells is required here.

      As described in the text, Pros is a marker for ALG and the results suggest that Prospero expression is required for the EG to ALG transition. We will clarify these concepts in the text accordingly. In figure 3 we showed images of NP-like cells originating from EG that are Prospero+, thereby supporting the transdifferentiation from EG to ALG.

      The conclusion that ALG and EG glial cells can give rise to cells of neuronal lineage is based on glial lineage information (GFP+ cells from glial G-trace) and staining for the neuronal marker Elav. The use of other neuronal markers apart from Elav or morphological features would provide a more compelling case that GFP+ cells are mature neurons.

      We completely agree with the reviewer's observation regarding the identity of VC neurons. We will try to determine the identity of these cells using previously described antibodies that label neuronal populations. We would also appreciate any suggestions regarding the antibodies we can use.

      Although the text discusses the contexts in which glial plasticity is observed or increased upon injury, the figures are less clear regarding this aspect. A more systematic comparison of injured VNCs versus homeostatic conditions, combined with clear labelling of the injury area, would facilitate the understanding of the panels.

      We appreciate the Reviewer’s observation. We will carefully check all figures in order to increase their clarity.

      Context/Discussion

      The study finds that glia in the ventral cord of flies have latent neurogenic potential. Such observations have not been made regarding glia in the fly brain, where injury is reported to drive glial divisions or the proliferation of undifferentiated progenitor cells with neurogenic potential.

      Discussing this different strategy for cell replacement adopted by glia in the VNC and pointing out differences to other modes seems fascinating. Highlighting differences in the reactiveness of glia in the VNC compared to the brain also seems highly relevant as they may point to different properties to repair damage.

      Based on the assays employed, the study points to a significant amount of glial "identity" changes or interconversions, which is surprising under homeostatic conditions. The significance of this "baseline" plasticity remains undiscussed, although glia unarguably show extensive adaptations during nervous system development.

      It would be interesting to know if the "interconversion" of glia is determined by the needs in the tissue or would shift in the context of selective ablation/suppression of a glial type.

      We deeply appreciate the Reviewer’s enthusiasm on this subject; it is indeed fascinating. We kept the discussion brief in order to fit the eLife Short Report requirements, but the specific conditions that trigger glial interconversion are of great interest to us. Compromising EG or ALG viability and evaluating the behaviour of the remaining glial cells is of great interest for developmental biology and regeneration, but the precise scenario in which to develop these experiments is not well defined. In this report, we aim to reproduce an injury in the Drosophila brain, and this model should serve to analyze cellular behaviours. The scenario where we deplete one specific subpopulation of glial cells is conceptually attractive, but far beyond the scope of this report.

      Reviewer #3:

      In this manuscript, Casas-Tintó et al. explore the role of glial cells in the response to a neurodegenerative injury in the adult brain. They used Drosophila melanogaster as a model organism and found that glial cells are able to generate new neurons through the mechanism of transdifferentiation in response to injury.

      This paper provides a new mechanism in regeneration and gives an understanding of the role of glial cells in the process.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Huang and colleagues explored the role of iron in bacterial therapy for cancer. Using proteomics, they revealed the upregulation of bacterial genes that uptake iron, and reasoned that such regulation is an adaptation to the iron-deficient tumor microenvironment. Logically, they engineered E. coli strains with enhanced iron-uptake efficiency, and showed that these strains, together with iron scavengers, suppress tumor growth in a mouse model. Lastly, they reported that the tumor suppression by IroA-E. coli provides immunological memory via CD8+ T cells. In general, I find the findings in the manuscript novel and the evidence convincing.

      (1) Although the genetic and proteomic data are convincing, would it be possible to directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) tumor microenvironment? This will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the reviewer’s comment regarding the precise quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing an Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. To circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the iron-scavenging capability of IroA-E. coli against nutritional immunity.

      (2) Related to 1, the experiment to study the synergistic effect of CDG and VLX600 (lines 139-175) is very nice and promising, but one flaw here is a lack of the measurement of iron concentration. Therefore, a possible explanation could be that CDG acts in another manner, unrelated to iron uptake, that synergizes with VLX600's function to deplete iron from cancer cells. Here, a direct measurement of iron concentration will show the effect of CDG on iron uptake, thus complementing the missing link.

      We appreciate the reviewer’s comment and would like to point the reviewer to our results in Figure S3, which shows that the expression of CDG enhances bacterial survival in the presence of LCN2 proteins, which reflects the competitive relationship between CDG and enterobactin for LCN2 proteins as previously shown by Li et al. [Nat Commun 6:8330, 2015]. We regret to inform the reviewer that direct measurement of iron concentration was attempted to no avail due to the limited sensitivity of iron-detecting assays. We do acknowledge that CDG may exert different effects in addition to enhancing iron uptake, particularly the potentiation of the STING pathway. We pointed out such an effect in Fig 2c, which shows enhanced macrophage stimulation by the CDG-expressing bacteria. We would like to emphasize, however, that a primary objective of the experiment is to show that the manipulation of nutritional immunity for promoting anticancer bacterial therapy can be achieved by combining bacteria with the iron chelator VLX600. The multifaceted effects of CDG prompted us to focus on IroA-E. coli in subsequent experiments to examine the role of nutritional immunity in bacterial therapy. We have updated the associated text to better convey our experimental design principle.

      Lines 250-268: Although statistically significant, I would recommend the authors characterize the CD8+ T cells a little more, as the mechanism now seems quite elusive. What signals or memories do CD8+ T cells acquire after IroA-E. coli treatment to confer their long-term immunogenicity?

      We apologize for the overinterpretation of the immune memory response in our previous manuscript and appreciate the reviewer’s recommendation to further characterize CD8+ T cells post-IroA-E. coli treatment. Our findings, which show robust tumor inhibition in rechallenge studies, indicate establishment of anticancer adaptive immune responses. As the scope of the present work is aimed at demonstrating the value of engineered bacteria for overcoming nutritional immunity, expounding on the memory phenotypes of the resulting cellular immunity is beyond the scope of the study. We do acknowledge that our initial writing overextended our claims and have revised the manuscript accordingly. The revised manuscript highlights induction of anticancer adaptive immunity, attributable to CD8+ T cells, following the bacterial therapy.

      (3) Perhaps this goes beyond the scope of the current manuscript, but how broadly applicable is the observed iron-transport phenomenon in other tumor models? I would recommend the authors to either experimentally test it in another model or at least discuss this question.

      We highly appreciate the reviewer’s suggestion regarding the generalizability of the iron-transport phenomenon in diverse tumor models. To address this, we extended our investigations beyond the initial model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate the superiority of IroA-E. coli over WT bacteria in tumor inhibition. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide strong evidence that bacteria, such as E. coli, compete with tumor cells for iron resources and consequently reduce tumor growth. When sequestration between LCN2 and bacterobactin is blocked by upregulating CDG (DGC-E. coli) or salmochelin (IroA-E. coli), E. coli increase iron uptake from the tumor microenvironment (TME) and restrict iron availability for tumor cells. Long-term remission in IroA-E. coli treated mice is associated with enhanced CD8+ T cell activity. Additionally, systemic delivery of IroA-E. coli shows a synergistic effect with chemotherapy reagent oxaliplatin to reduce tumor growth.

      Strengths:

      It is important to identify the iron-related crosstalk between E. coli and TME. Blocking lcn2-bacterobactin sequestration by different strategies consistently reduces tumor growth.

      Weaknesses:

      As engineered E. coli upregulate their function to uptake iron, they may increase the likelihood of escaping from nutritional immunity (LCN2 becomes insensitive to sequester iron from the bacteria). Would this raise the chance of developing sepsis? Do the authors think that it is safe to administer these engineered bacteria in mice or humans?

      We appreciate the reviewer’s comment on the safety evaluation of the iron-scavenging bacteria. To address the concern, we assessed the potential risk of sepsis development by measuring the bacterial burden and performing whole blood cell analyses following intravenous injection of the engineered bacteria. As illustrated in Figures 3k and 3l, our findings indicate that the administration of these engineered bacteria does not elevate the risk of sepsis. The blood cell analysis suggests that mice treated with the bacteria eventually return to baseline levels comparable to untreated mice, supporting the safety of this approach in our experimental models.

      Reviewer #3 (Public Review):

      Summary:

      Based on their observation that tumor has an iron-deficient microenvironment, and the assumption that nutritional immunity is important in bacteria-mediated tumor modulation, the authors postulate that manipulation of iron homeostasis can affect tumor growth. They show that iron chelation and engineered DGC-E. coli have synergistic effects on tumor growth suppression. Using engineered IroA-E. coli that presumably have more resistance to LCN2, they show improved tumor suppression and survival rate. They also conclude that the IroA-E. coli treated mice develop immunological memory, as they are resistant to repeat tumor injections, and these effects are mediated by CD8+ T cells. Finally, they show synergistic effects of IroA-E. coli and oxaliplatin in tumor suppression, which may have important clinical implications.

      Strengths:

      This paper uses straightforward in vitro and in vivo techniques to examine a specific and important question of nutritional immunity in bacteria-mediated tumor therapy. They are successful in showing that manipulation of iron regulation during nutritional immunity does affect the virulence of the bacteria, and in turn the tumor. These findings open future avenues of investigation, including the use of different bacteria, different delivery systems for therapeutics, and different tumor types.

      Weaknesses:

      • There is no discussion of the cancer type and why this cancer type was chosen. Colon cancer is not one of the more prominently studied cancer types for LCN2 activity. While this is a proof-of-concept paper, there should be some recognition of the potential different effects on different tumor types. For example, this model is dependent on significant LCN production, and different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type? For example, breast cancer aggressiveness has been shown to be influenced by FPN levels and labile iron pools.

      We highly appreciate the reviewer’s insightful comment on the varying LCN2 activities across different tumor types. In light of the reviewer’s suggestion, we extended our investigations beyond the initial colon cancer model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate that IroA-E. coli consistently outperforms WT bacteria in tumor inhibition. We acknowledge the reviewer’s comment regarding LCN2 being more prominently examined in breast cancer and have highlighted this aspect in the revised manuscript. For colon and melanoma cancers, several reports have pointed out the correlation of LCN2 expression and the aggressiveness of these cancers [Int J Cancer. 2021 Oct 1;149(7):1495-1511][Nat Cancer. 2023 Mar;4(3):401-418], albeit to a lesser extent. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments. The manuscript has been revised to reflect the reviewer’s insightful comment.

      • Are the effects on tumor suppression assumed to be from E. coli virulence, i.e. Does the higher number of bacteria result in increased immune-mediated tumor suppression? Or are the effects partially from iron status in the tumor cells and the TME?

      We appreciate the reviewer’s question regarding the therapeutic mechanism of IroA-E. coli. Bacterial therapy exerts its anticancer action through several different mechanisms, including bacterial virulence, nutrient and ecological competition, and immune stimulation. Decoupling one mechanism from another would be technically challenging and beyond the scope of the present work. With the objective of demonstrating that iron-scavenging bacteria can elevate anticancer activity by circumventing nutritional immunity, we highlight our data in Fig. S6, which shows that IroA-E. coli administration resulted in higher bacterial colonization within solid tumors compared to WT-E. coli on Day 15. This increased bacterial presence supports our iron-scavenging bacteria design, and we highlight a few anticancer mechanisms mediated by the engineered bacteria. Firstly, as shown in Fig. 4d, IroA-E. coli induces an elevated iron stress response in tumor cells, as the treated tumor cells show increased expression of transferrin receptors. Secondly, our experiments involving CD8+ T cell depletion indicate that IroA-E. coli establishes a more robust anticancer CD8+ T cell response than WT bacteria. Both immune-mediated responses and alterations in iron status within the tumor microenvironment are demonstrated to contribute to the enhanced anticancer activity of IroA-E. coli in the present study.

      • If the effects are iron-related, could the authors provide some quantification of iron status in tumor cells and/or the TME? Could the proteomic data be queried for this data?

      We appreciate the reviewer’s query regarding the quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing an Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. Consequently, to circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the iron-scavenging capability of IroA-E. coli against nutritional immunity.

      Reviewing Editor:

      The authors provide compelling technically sound evidence that bacteria, such as E. coli, can be engineered to sequester iron to potentially compete with tumor cells for iron resources and consequently reduce tumor growth. Long-term remission in IroA-E.coli treated mice is associated with enhanced CD8+ T cell activity and a synergistic effect with chemotherapy reagent oxaliplatin is observed to reduce tumor growth. The following additional assessments are needed to fully evaluate the current work for completeness; please see individual reviews for further details.

      We appreciate the editor’s positive comment.

      (1) The premise is one of translation yet the authors have not demonstrated that manipulating bacteria to sequester iron does not provide a potential for sepsis or other evidence that this does not increase the competitiveness of bacteria relative to the host. Only tumor volume was provided rather than animal survival and cause of death, but bacterial virulence is enhanced including the possibility of septic demise. Alternatively, the authors postulate that tumor volume is decreased due to iron sequestration, but they do not directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) tumor microenvironment. These important endpoints will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the editor’s comment and have added substantial data to support the translational potential of the iron-scavenging bacteria. In particular, we added evidence that the iron-scavenging bacteria do not increase the risk of sepsis (Fig. 3k, l), evidence of increased bacterial competitiveness and survival in tumors (Fig. S6), and evidence of the iron-scavenging bacteria’s superior anticancer ability and survival benefit across 3 different tumor models (Fig. 3e-j; Fig. S5). While direct measurement of iron concentration in the tumor environment is technically difficult due to the challenge in differentiating Fe2+ and Fe3+ by available techniques, we added a colorimetric CAS assay to demonstrate that the iron-scavenging bacteria can utilize Fe more effectively than WT bacteria in the presence of LCN2 (Fig. 3b). These results substantiate the translational relevance of the engineered bacteria.

      (2) There is no discussion of the cancer type and why this cancer type was chosen. If the current tumor modulation system is dependent on LCN2 activity, there would need to be some recognition that different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type?

      We appreciate the comment and have added relevant text and citations describing the clinical relevance of LCN2 expression in the tumor types used in the study (breast cancer, melanoma, and colon cancer). Elevated LCN2 has been associated with higher aggressiveness for all three cancer types.

      (3) To demonstrate long-term anti-cancer memory was established through enhancement of CD8+ T cell activity (Fig 5c), the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IronA mice since CD8+ T cells may play a role in tumor suppression regardless of whether or not iron regulation is being manipulated. It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We acknowledge that our prior writing may have overstated our claim on immunological memory. Our intention is to show that upon treatment and tumor eradication by iron-scavenging bacteria, adaptive immunity mediated by CD8 T cells can be elicited. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression. We have modified our text to reflect our intended message.

      Reviewer #1 (Recommendations For The Authors):

      All the figures seem to be in low resolution and pixelated. Please upload high-resolution ones.

      We have updated figures to high-resolution ones.

      Reviewer #2 (Recommendations For The Authors):

      Some specific comments towards experiments:

      (1) For Fig 2 f/ Fig 3f/ Fig 5d/Fig6c, the survival rate is based on the tumor volume (the mouse was considered dead when the tumor volume exceeded 1,500 mm3). Did the mice die from the experiment (how many from each group)? If it only reflects the tumor size, do these figures deliver the same information as the tumor growth figure?

      We appreciate the reviewer’s comment. The survival rate is indeed based on tumor volume, and we used a cutoff of 1500 mm3. No death event was observed before the tumors reached 1500 mm3. Although the survival figures overlap with some of the information conveyed by the tumor volume tracking, they offer additional temporal resolution of tumor progression. Presenting both tumor volume and survival tracking is a commonly adopted way to depict tumor progression. We have added the protocol for survival monitoring to the Materials and Method section.
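
      As a minimal illustration of how a volume-based endpoint turns tumor growth curves into a survival curve, the Python sketch below applies the 1500 mm3 cutoff stated above. The measurement days, the volumes, and the function names are hypothetical and are not taken from the study; a mouse whose tumor never exceeds the cutoff is treated as censored.

```python
CUTOFF_MM3 = 1500.0  # endpoint volume stated in the response


def event_day(days, volumes, cutoff=CUTOFF_MM3):
    """Return the first day on which tumor volume exceeds the cutoff.

    Returns None if the cutoff is never exceeded (the mouse is censored).
    """
    for day, vol in zip(days, volumes):
        if vol > cutoff:
            return day
    return None


def fraction_surviving(event_days, day):
    """Fraction of mice that have not reached the endpoint by `day`."""
    alive = sum(1 for e in event_days if e is None or e > day)
    return alive / len(event_days)


# Hypothetical measurements, for illustration only.
days = [0, 4, 8, 12, 16]
mice = [
    [50, 200, 700, 1600, 2100],  # exceeds the cutoff on day 12
    [50, 150, 400, 900, 1550],   # exceeds the cutoff on day 16
    [50, 100, 80, 20, 0],        # regresses; never reaches the endpoint
]
events = [event_day(days, v) for v in mice]
curve = [(d, fraction_surviving(events, d)) for d in days]
```

      In this toy case the survival curve adds the time at which each animal crosses the endpoint, which is the temporal information that a mean tumor-volume plot does not show directly.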

      (2) Fig 3a, not sure if entE is a good negative control for this experiment. Neg. Ctrl should maintain its CFU/ml at a certain level regardless of Lcn2 conc. However, entE conc. is at 100 CFU/ml throughout the experiment suggesting there is no entE in media or if it is supersensitive to Lcn2 that bacteria die at the dose of 0.1nM?

      We appreciate the reviewer’s comment. The △entE-E. coli was indeed observed to be highly sensitive to LCN2. We included the control to highlight the competitive relationship between entE and LCN2 for iron chelation, which has been reported previously in the literature [Biometals 32, 453–467 (2019)].

      (3) Fig 4, the authors harvested bacteria from the tumor by centrifuging homogenized samples at different speeds. Internal controls confirming sample purity (positive for bacteria and negative for cells for panels a,b,c; or vice versa for panel d) may be necessary. This comment may also apply to samples from Fig 1.

      We acknowledge the reviewer’s concern and would like to point out that the proteomic analysis was performed using a highly cited protocol that provides reference and normalization standards for E. coli proteins [Mol Cell Proteomics. 2014 Sep; 13(9): 2513–2526]. The reference is cited in the Materials and Method section associated with the proteomic analysis.

      (4) To demonstrate long-term anti-cancer memory was established through enhancement of CD8+ T cell activity, the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IronA mice.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We apologize for overstating our claim in the previous manuscript draft.

      Minor suggestions:

      (1) Please include the tumor re-challenge experiment in the method section.

      The re-challenge experiment has been added to the method section as instructed.

      (2) Please cite others' and your previous work. E.g. line 281, 282, line 306-307.

      We have added the citations as instructed.

      (3) Line 448, BL21 is bacteria, not cells.

      We have made the correction accordingly.

      Reviewer #3 (Recommendations For The Authors):

      • The authors postulate that IroA-E. coli is more potent than DGC-E. coli in resisting LCN2 activity, and that this potency is the cause of the increased tumor suppression of this engineered strain. If so, Fig 3a should include DGC-E. coli for direct comparison.

      We thank the reviewer for the comment and would like to clarify that we intended to construct IroA-E. coli as a more specific iron-scavenging strategy, which can aid the discussion of nutritional immunity and minimize confounding factors from the immune-stimulatory effect of CDG. We have modified our text to clarify our stance.

      • The data refers to the effects of WT bacteria-mediated tumor suppression, e.g. Figure 3e shows that even WT bacteria have a significant suppressive effect on tumor growth. Could the authors provide background on what is known about the mechanism of this tumor suppression, outside of tumor targeting and engineerability? They only reference "immune system stimulation."

      We appreciate the reviewer’s comment and would like to refer the reviewer to our recently published article [Lim et al., EMBO Molecular Medicine 2024; DOI: 10.1038/s44321-023-00022-w], which shows that in addition to immune system stimulation, WT bacteria can also be perceived as an invading species in the tumor that can exert differential selective pressure against cancer cells. Competition for nutrients is highlighted as a major contributor to containing tumor growth. In fact, the nutrient competition that we observed in the prior article inspired the design of the iron-scavenging bacteria towards overcoming nutritional immunity. We have cited this recently published article in the revised manuscript to enrich the background.

      • The authors claim that there is immunologic memory because of tumor resistance in re-challenged mice after IroA-E. coli treatment (Fig 5c). It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We have modified our claims to highlight that tumor eradication by iron-scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to suggest that the adaptive immunity stemmed from IroA-E. coli only; rather, we intend to build upon current literature that has reported CD8+ T cell elicitation by bacterial therapy. The IroA-E. coli is shown to enhance adaptive immunity. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression.

      • The authors claim that CD8+ T cells are mechanistically important in the effects of iron status manipulation in E. coli-mediated tumor suppression (Fig 5). In order to show this, it seems that Fig 5c should include WT-E. coli and WT-E. coli+CD8 ab groups, as it may be that CD8+ T cells play a role in tumor suppression regardless of whether or not iron regulation is being manipulated.

      We apologize for the confusion from our prior writing. We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to convey that CD8+ T cells are mechanistically important in the effects of iron status manipulation.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the editorial team and reviewers for their continued contributions to improve our work.

      Below we have addressed the final recommendations to the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I asked previously why the suppression depth should vary based on the contrast change speed. I now understand that the authors expect this variation from a working model based on neural adaptation (lines 274-277 and 809-820). I suggest the authors specify this prediction also on lines 473-479, where there is room for improved clarity (the words/phrases 'impact,' 'be sensitive to,' and 'covary' are non-directional).

      We have now specified this prediction to improve clarity:

      Line 475 – 486

      “In the context of the tCFS method, the steady increases and decreases in the target’s actual strength (i.e., its contrast) should, respectively, boost its emergence from suppression (bCFS) and facilitate its reversion to suppression (reCFS) as it competes against the mask. Whether construed as a consequence of neural adaptation or error signal, we surmise that these cycling state transitions defining suppression depth should be sensitive to the rate of contrast change of the monocular target. Specifically, the slower the contrast change, the greater the amount of accrued adaptation, which will contract the range between breakthrough and suppression thresholds according to an adapting reciprocal inhibition model. For fast contrast change, there will be less accrual of adaptation meaning that the range between breakthrough and suppression thresholds will exhibit less contraction. Expressed in operational terms, the depth of suppression should be positively related to the rate of target change. Experiment 3 tested this supposition using three rates of contrast change.”

      Line 108: 'By comparing the thresholds for a target to transition into (reCFS) and out of awareness (bCFS)'-are 'into' and 'out of' reversed?

      They were, thank you, these have now been corrected.

      Lines 696-698 read, 'Figure 3 shows that polar patterns tend to emerge from suppression at slightly lower contrasts than do gratings.' In the same paragraph, lines 716-171 read, 'Figure 3 shows that bCFS and reCFS thresholds are very similar for all image categories.' There is a statistically significant effect of category in these results; meanwhile, the differences among categories are arguably small. Which side do the authors intend to emphasize? Are the readers meant to interpret this as a glass-half-full, half-empty situation?

      We have now revised this paragraph. We emphasise that the small differences do not support ‘preferential processing’ of the magnitude that would be expected from category specific neural CRFs.

      From Line 702

      “Next we turn to another question raised about our conclusion concerning invariant depth of suppression. If a certain image type had overall lower bCFS and reCFS contrast thresholds relative to another image type (despite equivalent suppression depth), would that imply the former image enjoyed “preferential processing” relative to the latter? And, what would determine the differences in bCFS and reCFS thresholds? Figure 3 shows that polar patterns tend to emerge from suppression at slightly lower contrasts than do gratings and that polar patterns, once dominant, tend to maintain dominance to lower contrasts than do gratings and this happens even though the rate of contrast change is identical for both types of stimuli. But while rate of contrast change is identical, the neural responses to those contrast changes may not be the same: neural responses to changing contrast will depend on the neural contrast response functions (CRFs) of the cells responding to each of those two types of stimuli, where the CRF defines the relationship between neural response and stimulus contrast. CRFs rise monotonically with contrast and typically exhibit a steeply rising initial response as stimulus contrast rises from low to moderate values, followed by a reduced growth rate for higher contrasts. CRFs can vary in how steeply they rise and at what contrast they achieve half-max response. CRFs for neurons in mid-level vision areas such as V4 and FFA (which respond well to polar stimuli and faces, respectively) are generally steeper and shifted towards lower contrasts than CRFs for neurons in primary visual cortex (which respond well to gratings). Therefore, the effective strength of the contrast changes in our tCFS procedure will depend on the shape and position of the underlying CRF, an idea we develop in more detail in Supplementary Appendix 1, comparing the case of V1 and V4 CRFs. 
      Interestingly, the comparison of V1 and V4 CRFs yields two predictions: (i) that V4 CRFs should produce much lower bCFS and reCFS thresholds than V1 CRFs, and (ii) that V4 CRFs should produce much more suppression than V1 CRFs. Our data do not support either prediction: bCFS and reCFS thresholds for the polar shape are not ‘much lower’ than those for gratings (Fig. 3) and neither is there ‘much more’ suppression depth for the polar form. There is no room in these results to support the claim that certain images are special and receive “preferential processing” or processing outside of awareness. Instead, the similar data patterns for all image types are most parsimoniously explained by a single mechanism processing all images (see Appendix 1), although there are many other kinds of images still to be tested in tCFS and exceptions may yet be found. As a first step in exploring this idea, one could use standard psychophysical techniques (e.g., Ling & Carrasco, 2006) to derive CRFs for different categories of patterns and then measure suppression depth associated with those patterns using tCFS.”
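
      The CRF properties described in the quoted passage (a monotonic rise, a half-max contrast, and variable steepness) can be illustrated with the Naka-Rushton function, a commonly used descriptive form for contrast responses. This is only a sketch under stated assumptions: the choice of functional form and every parameter value below are illustrative and are not taken from the manuscript or its Appendix 1. The "V1-like" and "V4-like" labels simply denote a shallower curve with a higher half-max contrast and a steeper curve shifted towards lower contrasts.

```python
def naka_rushton(c, r_max=1.0, c50=0.2, n=2.0):
    """Response to contrast c (0..1): r_max * c^n / (c^n + c50^n)."""
    return r_max * c ** n / (c ** n + c50 ** n)


def contrast_for_response(r, r_max=1.0, c50=0.2, n=2.0):
    """Invert the function: contrast needed to reach response r (0 < r < r_max)."""
    return c50 * (r / (r_max - r)) ** (1.0 / n)


# Hypothetical parameter sets, chosen only to illustrate the argument.
V1_LIKE = dict(r_max=1.0, c50=0.3, n=2.0)  # shallower, higher half-max contrast
V4_LIKE = dict(r_max=1.0, c50=0.1, n=3.0)  # steeper, shifted to lower contrasts

# Contrast each curve needs to reach the same criterion response.
criterion = 0.3
c_v1 = contrast_for_response(criterion, **V1_LIKE)
c_v4 = contrast_for_response(criterion, **V4_LIKE)
```

      Under these assumed parameters the V4-like curve reaches the criterion response at a lower contrast than the V1-like curve, which is the sense in which a left-shifted CRF would predict lower thresholds.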

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers praised multiple aspects of our study. Reviewer 1 noted that “the work aligns well with current research trends and will greatly interest researchers in the field.” Reviewer 2 highlighted the unique capability of our imaging approach, which “allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry.” Reviewer 3 commented that “the experiments are beautifully executed” and “are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before.”

      In addition to the positive feedback, the reviewers also provided useful criticisms and suggestions, some of which may not be fully addressed in a single study. For instance, questions regarding whether dopamine axons encode the valence or specific identity of the stimuli, or the most salient aspects of the environment, remain open. At the same time, as all the reviewers agreed, our report on the diversity of dopamine axonal responses using a novel imaging design introduces significant new insights to the neuroscience community. Following the reviewers’ recommendations, we have refrained from making interpretations that could be perceived as overinterpretation, such as concluding that “dopamine axons are involved in aversive processing.” This has necessitated extensive revisions, including modifying the title of our manuscript to make clear that the novelty of our work is revealing ‘functional diversity’ using our new imaging approach.

      Below, we respond to the reviewers’ comments point by point.

      eLife assessment

      This valuable study shows that distinct midbrain dopaminergic axons in the medial prefrontal cortex respond to aversive and rewarding stimuli and suggest that they are biased toward aversive processing. The use of innovative microprism based two-photon calcium imaging to study single axon heterogeneity is solid, although the experimental design could be optimized to distinguish aversive valence from stimulus salience and identity in this dopamine projection. This work will be of interest to neuroscientists working on neuromodulatory systems, cortical function and decision making.

      Reviewer #1

      Summary:

      In this manuscript, Abe and colleagues employ in vivo 2-photon calcium imaging of dopaminergic axons in the mPFC. The study reveals that these axons primarily respond to unconditioned aversive stimuli (US) and enhance their responses to initially-neutral stimuli after classical association learning. The manuscript is well-structured and presents results clearly. The utilization of a refined prism-based imaging technique, though not entirely novel, is well-implemented. The study's significance lies in its contribution to the existing literature by offering single-axon resolution functional insights, supplementing prior bulk measurements of calcium or dopamine release. Given the current focus on neuromodulator neuron heterogeneity, the work aligns well with current research trends and will greatly interest researchers in the field.

      However, I would like to highlight that the authors could further enhance their manuscript by addressing study limitations more comprehensively and by providing essential details to ensure the reproducibility of their research. In light of this, I have a number of comments and suggestions that, if incorporated, would significantly contribute to the manuscript's value to the field.

      Strengths:

      • Descriptive.

      • Utilization of a well-optimized prism-based imaging method.

      • Provides valuable single-axon resolution functional observations, filling a gap in existing literature.

      • Timely contribution to the study of neuromodulator neuron heterogeneity.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      (1) It's important to fully discuss the fact that the measurements were carried out only on superficial layers (30-100um), while major dopamine projections target deep layers of the mPFC as discussed in the cited literature (Vander Weele et al., 2018) and as illustrated in FigS1B,C. This limitation should be explicitly acknowledged and discussed in the manuscript, especially given the potential functional heterogeneity among dopamine neurons in different layers. This potential across-layer heterogeneity could also be the cause of discrepancy among past recording studies with different measurement modalities. Also, mentioning technical limitations would be informative. For example: how deep the authors can perform 2p-imaging through the prism? was the "30-100um" maximum depth the authors could get?

      Thank you for pointing out this important issue about layer differences.

      It is possible that the mesocortical pathway has layer-specific channels, with some neurons targeting supragranular layers and others targeting infragranular ones. Alternatively, it is also plausible that the axons of the same neurons branch into both superficial and deep layers. This is a critical issue that has not been investigated in anatomical studies and will require single-cell labeling of dopamine neurons (Matsuda et al 2009 and Aransay et al 2015). We now discuss this issue in the Discussion.

      As for the imaging depth of 30–100 µm, we were unable to visualize deeper axons in a live view mode. Our imaging system has already been optimized to detect weak signals (e.g., we have employed an excitation wavelength of 980 nm, dispersion compensation, and a hybrid photodetector). It is possible that future studies using improved imaging approaches may be able to visualize deeper layers. Importantly, sparse axons in the supragranular layers are advantageous in detecting weak signals; dense labeling of axons would increase the background fluorescence relative to signals. We now reference this layer issue in the Results and Discussion sections.

      (2) In the introduction, it seems that the authors intended to refer to Poulin et al. 2018 regarding molecular/anatomical heterogeneity of dopamine neurons, but they inadvertently cited Poulin et al. 2016 (a general review on scRNAseq). Additionally, the statement that "dopamine neurons that project to the PFC show unique genetic profiles (line 85)" requires clarification, as Poulin et al. 2018 did not specifically establish this point. Instead, they found at least the Vglut2/Cck+ population projects into mPFC, and they did not reject the possibility of other subclasses projecting to mPFC. Rather, they observed denser innervation with DAT-cre, suggesting that non-Vglut2/Cck populations would also project to mPFC. Discuss the potential molecular heterogeneity among mPFC dopamine axons in light of the sampling limitation mentioned earlier.

      We thank the reviewer for pointing this out. Genetic profiles of PFC-projecting DA neurons are still being investigated, so describing them as “unique” was misleading. We have edited the Introduction accordingly, and now discuss this issue in detail in the Discussion.

      (3) I find the data presented in Figure 2 to be odd. Firstly, the latency of shock responses in the representative axons (right panels of G, H) is consistently very long - nearly 500ms. It raises a query whether this is a biological phenomenon or if it stems from a potential technical artifact, possibly arising from an issue in synchronization between the 2-photon imaging and stimulus presentation. My reservations are compounded by the notable absence of comprehensive information concerning the synchronization of the experimental system in the method section.

      The synchronization of the stimulus and data acquisition is accomplished at a sub-millisecond resolution. We use a custom-made MATLAB program that sends TTL commands to standard imaging software (ThorImage or ScanImage) and a stimulator for electrical shocks. All events are recorded as analogue inputs to a different DAQ to ensure synchronization. We have provided additional details regarding the configuration in the Methods section.

      We consider that the long latency of shock response is biological. For instance, a similar long latency was found after electrical shock in a photometry imaging study (Kim, …, Deisseroth, 2016).

      Secondly, there appear to be irregularities in Panel J. While the authors indicate that "Significant axons were classified as either reward-preferring (cyan) or aversive-preferring (magenta), based on whether the axons are above or below the unity line of the reward/aversive scatter plot (Line 566)," a cyan dot slightly but clearly deviates above the unity line (around coordinates (x, y) = (20, 21)). This needs clarification. Lastly, when categorizing axons for analysis of conditioning data in Fig3 (not Fig2), the authors stated "The color-coded classification (cyan/magenta) was based on k-means clustering, using the responses before classical conditioning (Figure 2J)". I do not understand why the authors used different classification methods for two almost identical datasets.

      We thank the reviewer for pointing out these insufficient descriptions. We classified the axons using k-means clustering, and the separation of the two clusters happened to roughly coincide with the unity line of the reward/aversive scatter plot in Fig 2J. In other words, we did not use the unity line to classify the data points (which is why the color separation of the histogram is not at 45 degrees). We have clarified this point in the Methods section.
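
      The difference between a k-means split and a unity-line split can be shown with a small Python sketch. It runs plain Lloyd's k-means with two clusters on hypothetical (reward response, aversive response) pairs and compares the result with a rule that labels a point by whether it lies above the unity line. The data values, the starting centroids, the axis assignment, and the function name are assumptions made for illustration; they do not reproduce the analysis in the manuscript.

```python
def kmeans_2d(points, centroids, n_iter=50):
    """Plain Lloyd's k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical (reward, aversive) response pairs; x = reward, y = aversive.
points = [(30, 5), (25, 8), (20, 21), (22, 10), (5, 40), (8, 35), (10, 45), (6, 30)]

# Cluster 0 starts near a reward-dominated point, cluster 1 near an aversive one.
kmeans_labels, _ = kmeans_2d(points, centroids=[(30, 5), (5, 40)])

# Unity-line rule: label 1 if the aversive response exceeds the reward response.
unity_labels = [1 if y > x else 0 for x, y in points]

disagreements = [i for i, (a, b) in enumerate(zip(kmeans_labels, unity_labels)) if a != b]
```

      In this toy data set the point (20, 21) lies just above the unity line but is assigned to the reward-dominated cluster, so a cluster boundary need not coincide with the 45-degree line.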

      (4) In connection with Point 3, conducting separate statistical analyses for aversive and rewarding stimuli would offer a fairer approach. This could potentially reveal a subset of axons that display responses to both aversive and appetitive stimuli, aligning more accurately with the true underlying dynamics. Moreover, the characterization of Figure 2J as a bimodal distribution while disregarding the presence of axons responsive to both aversive and appetitive cues seems somewhat arbitrary and circular logic. A more inclusive consideration of this dual-responsive population could contribute to a more comprehensive interpretation.

      We also attempted k-means clustering with additional dimensions (e.g., temporal domains as shown in Fig. 3I, J), but no additional clusters were evident. We note that the lack of other clusters does not exclude the possibility of their existence, which may only become apparent with a substantial increase in the number of samples. In the current report, we present the clusters that were the easiest/simplest for us to identify.

      Additionally, we have revised our manuscript to reflect that many axons respond to both reward and aversive stimuli, and that aversive-preferring axons do not exclusively respond to the aversive stimulus.

      (5) The contrast in initialization to novel cues between aversive and appetitive axons mirrors findings in other areas, such as the tail-of-striatum (TS) and ventral striatum (VS) projecting dopamine neurons (Menegas et al., 2017, not 2018). You might consider citing this very relevant study and discussing potential collateral projections between mPFC and TS or VS.

      Thank you for pointing this out. We have now included Menegas et al., 2017, and also discuss the possibility of collaterals to these areas. In addition, we also referred to Azcorra et al., 2023 - this was published after our initial submission.

      (6) The use of correlation values (here >0.65) to group ROIs into axons is common but should be justified based on axon density in the FOV and imaging quality. It's important to present the distribution of correlation values and demonstrate the consistency of results with varying cut-off values. Also, provide insights into the reliability of aversive/appetitive classifications for individual ROIs with high correlations. Importantly, if you do the statistical testing and aversive/appetitive classifications for individual ROIs with above-threshold high correlation (to be grouped into the same axon), do they always fall into the same category? How many false positives/false negatives are observed?


      "Our results remained similar for different correlation threshold values (Line 556)" (data not shown) is obsolete.

      We have conducted additional analyses using correlation thresholds of 0.5 and 0.3, which resulted in a smaller number of axon terminals. In essence, the relationship between reward responses and aversive responses remained very similar to Fig. 2J, K.

      Author response image 1.
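
      To make the effect of the correlation cut-off concrete, the Python sketch below groups ROIs into putative axons by linking any two ROIs whose Pearson correlation exceeds a threshold and taking connected components. The linking rule, the toy traces, and the function names are assumptions for illustration; the response does not specify the grouping algorithm beyond the use of a correlation threshold.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length traces."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def group_rois(traces, threshold):
    """Link ROIs with correlation above threshold; return connected groups."""
    parent = list(range(len(traces)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if pearson(traces[i], traces[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(traces)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical fluorescence traces for four ROIs.
traces = [
    [0, 1, 0, 2, 0, 3, 0, 1],  # ROI 0
    [0, 2, 0, 4, 0, 6, 0, 2],  # ROI 1: scaled copy of ROI 0
    [1, 0, 0, 2, 1, 1, 0, 0],  # ROI 2: moderately correlated with ROI 0
    [3, 0, 2, 0, 3, 0, 2, 1],  # ROI 3: anticorrelated with the others
]

groups_by_threshold = {t: group_rois(traces, t) for t in (0.65, 0.5, 0.3)}
```

      With these toy traces the two stricter thresholds yield three groups and the 0.3 threshold merges ROI 2 into the first group, giving two. Lowering the cut-off therefore reduces the number of putative axons, in line with the smaller number reported above.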

      Reviewer #2 (Public Review):

      Summary:

      This study aims to address existing differences in the literature regarding the extent of reward versus aversive dopamine signaling in the prefrontal cortex. To do so, the authors chose to present mice with both a reward and an aversive stimulus during different trials each day. The authors used high spatial resolution two-photon calcium imaging of individual dopaminergic axons in the medial PFC to characterize the response of these axons to determine the selectivity of responses in unique axons. They also paired the reward (water) and an aversive stimulus (tail shock) with auditory tones and recorded across 12 days of associative learning.

      The authors find that some axons respond to both reward and aversive unconditioned stimuli, but overall, there is a strong preference to respond to aversive stimuli consistent with expectations from prior studies that used other recording methods. The authors find that both of their two auditory stimuli initially drive responses in axons, but that with training axons develop more selective responses for the shock-associated tone, indicating that associative learning led to changes in these axons' responses. Finally, the authors use anticipatory behaviors during the conditioned stimuli and facial expressions to determine stimulus discrimination and relate dopamine axon signals to this behavioral evidence of discrimination. This study takes advantage of cutting-edge imaging approaches to resolve the extent to which dopamine axons in PFC respond to appetitive or aversive stimuli. They conclude that there is a strong bias to respond to the aversive tail shock in most axons and a weaker, sparser representation of water reward.

      Strengths:

      The strength of this study is the imaging approach that allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry which provide a measure of the average population activity. The use of appetitive and aversive stimuli to probe responses across individual axons is another strength.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      A weakness of this study is the design of the associative conditioning paradigm. The use of only a single reward and single aversive stimulus makes it difficult to know whether these results are specific to the valence of the stimuli versus the specific identity of the stimuli. Further, the reward presentations are more numerous than the aversive trials making it unclear how much novelty and habituation account for results. Moreover, the training seems somewhat limited by the low number of trials and did not result in strong associative conditioning. The lack of omission responses reported may reflect weak associative conditioning. Finally, the study provides a small advance in our understanding of dopamine signaling in the PFC and lacks evidence for if and what might be the consequence of these axonal responses on PFC dopamine concentrations and PFC neuron activity.

      We thank the reviewer for the suggestions.

      We agree that interpreting the response change during classical conditioning is not straightforward. Although the reward and aversive stimuli we employed are commonly used in the field, future studies with more sophisticated paradigms will be necessary to address whether dopamine axons encode the valence of the stimuli, the specific identity of the stimuli, or novelty and habituation. In our current manuscript, we refrain from concluding that distinct groups of neurons encode different valences. In fact, many axons respond to both stimuli, at different ratios. We have removed descriptions that may suggest exclusive coding of reward or aversive processing. Additionally, we have extensively discussed possible interpretations.

      In terms of the strength of the conditioning association, behavioral results indicated that the learning plateaued – anticipatory behaviors did not increase during the last two phases when the conditioned span was divided into six phases (Figure 3–figure supplement 1).

      Our goal in the current manuscript is to provide new insight into the functional diversity of dopamine axons in the mPFC. Investigating the impact of dopamine axons on local dopamine concentration and neural activity in the mPFC is important but falls beyond the scope of our current study. In particular, given the functional diversity of dopamine axons, interpreting bulk optogenetic or chemogenetic axonal manipulation experiments would not be straightforward. As suggested, measuring the dopamine concentration through two-photon imaging of dopamine sensors and monitoring the activity of dopamine recipient neurons (e.g., D1R- or D2R-expressing neurons) is a promising approach that we plan to undertake in the near future.

      Reviewer #3 (Public Review):

      Summary:

      The authors image dopamine axons in medial prefrontal cortex (mPFC) using microprism-mediated two-photon calcium imaging. They image these axons as mice learn that two auditory cues predict two distinct outcomes, tailshock or water delivery. They find that some axons show a preference for encoding of the shock and some show a preference for encoding of water. The authors report a greater number of dopamine axons in mPFC that respond to shock. Across time, the shock-preferring axons begin to respond preferentially to the cue predicting shock, while there is a less pronounced increase in the water-responsive axons that acquire a response to the water-predictive cue (these axons also increase non-significantly to the shock-predictive cue). These data lead the authors to argue that dopamine axons in mPFC preferentially encode aversive stimuli.

      Strengths:

      The experiments are beautifully executed and the authors have mastered an impressively complex technique. Specifically, they are able to image and track individual dopamine axons in mPFC across days of learning. This technique is used the way it should be: the authors isolate distinct dopamine axons in mPFC and characterize their encoding preferences and how this evolves across learning of cue-shock and cue-water contingencies. Thus, these experiments are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before. This is timely and important.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      The overarching conclusion of the paper is that dopamine axons preferentially encode aversive stimuli. This is prevalent in the title, abstract, and throughout the manuscript. This conclusion is fundamentally confounded. As the authors point out themselves, the axonal response to stimuli is sensitive to outcome magnitude (Supp Fig 3). That is, if you increase the magnitude of water or shock that is delivered, you increase the change in fluorescence that is seen in the axons. Unsurprisingly, the change in fluorescence seen in response to shock is considerably higher than that seen in response to water reward.

      We agree that the interpretation of our results is not straightforward. Our current manuscript now focuses on our strength, which is reporting the functional diversity of dopamine axons. Therefore, we avoid using the word ‘encode’ when describing the response.

      We believe that our results could reconcile the apparent discrepancy as to why some previous studies reported only aversive responses while others reported reward responses. In particular, if the reward volume were very small, the reward response could go undetected.

      Further, when the mice are first given unexpected water delivery and have not yet experienced the aversive stimuli, over 40% of the axons respond [yet just a few lines below the authors write: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards", which seems inconsistent with their own data].

      We always recorded the reward and aversive response together, which might have confused the reviewer. Therefore, there is no inconsistency in our data. We have clarified our methods and reasoning accordingly.

      Given these aspects of the data, it could be the case that the dopamine axons in mPFC encode different types of information and delegate preferential processing to the most salient outcome across time.

      This is certainly an exciting interpretation, so we have included it in our discussion. Meanwhile, ‘the most salient outcome’ alone cannot fully capture the diverse response patterns of the dopaminergic axons, particularly reward-preferring axons. We discuss our findings in more detail in the revised manuscript.

      The use of two similar-sounding tones (9 kHz and 12 kHz) for the reward- and aversive-predicting cues is likely to enhance this, as it requires a fine-grained distinction between the two cues in order to learn effectively. There is considerable literature on mPFC function across species that would support such a view. Specifically, theories of mPFC function (in particular prelimbic cortex, which is where the axon images are mostly taken) generally center around resolution of conflict in what to respond to, learn about, and attend to. That is, mPFC is important for devoting the most resources (learning, behavior) to the most relevant outcomes in the environment. These data, then, provide a mechanism for this to occur in mPFC. That is, dopamine axons signal to the mPFC the most salient aspects of the environment, which should be preferentially learned about and responded to. This is also consistent with the absence of a negative prediction error during omission: the dopamine axons show increases in responses during receipt of unexpected outcomes, but do not encode negative errors. This supports a role for this projection in helping to allocate resources to the most salient outcomes and their predictors, and not learning per se. Below are just a few references from the rich literature on mPFC function (some consider rodent mPFC analogous to DLPFC, some mPFC), which advocate for a role for this region in allocating attention and cognitive resources to the most relevant stimuli, and do not indicate preferential processing of aversive stimuli.

      Distinguishing between 9 kHz and 12 kHz sound tones may not be that difficult, considering anticipatory licking and running are differentially manifested. In addition, previous studies have shown that mice can distinguish between two sound tones when they are separated by 7% (de Hoz and Nelken 2014). Nonetheless, we agree with the attractive interpretation that “the mPFC devotes the most resources (learning, behavior) to the most relevant outcomes in the environment” and that dopamine is a mechanism for this. Therefore, we discuss this interpretation in the revised text.

      References:

      (1) Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual review of neuroscience, 24(1), 167-202.

      (2) Bissonette, G. B., Powell, E. M., & Roesch, M. R. (2013). Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behavioural brain research, 250, 91-101.

      (3) Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, 18(1), 193-222.

      (4) Sharpe, M. J., Stalnaker, T., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2019). An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual review of psychology, 70, 53-76.

      (5) Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. Science, 306(5695), 443-447.

      (6) Nee, D. E., Kastner, S., & Brown, J. W. (2011). Functional heterogeneity of conflict, error, task-switching, and unexpectedness effects within medial prefrontal cortex. Neuroimage, 54(1), 528-540.

      (7) Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by monkey medial frontal cortex. Nature neuroscience, 10(2), 240-248.

      Reviewer #1 (Recommendations For The Authors):

      Specific Suggestions and Questions on the Methods Section:

      In general, the methods part is not well documented and sometimes confusing. Thus, as it stands, it hinders reproducible research. Specific suggestions/questions are listed in the following section.

      (1) Broussard et al. 2018 introduced axon-GCaMP6 instead of axon-jGCaMP8m. The authors should provide details about the source of this material. If it was custom-made, a description of the subcloning process would be appreciated. Additionally, consider depositing sequence information or preferably the plasmid itself. Furthermore, the introduction of the jGCaMP8 series by Zhang, Rozsa, et al. 2023 should be acknowledged and referenced in your manuscript.

      We thank the reviewer for pointing this out. We have now included details on how we prepared the axon-jGCaMP8m, which was based on plasmids available at Addgene. Additionally, we have deposited our construct at Addgene (https://www.addgene.org/216533/). We have also cited Janelia’s report on jGCaMP8 (Zhang et al., 2023).

      (2) The authors should elaborate on the approach taken for experimental synchronization. Specifically, how was the alignment achieved between 2-photon imaging, treadmill recordings, aversive/appetitive stimuli, and videography? It would be important to document the details of the software and hardware components employed for generating TTLs that trigger the pump, stimulator, cameras, etc.

      We have now included a more detailed explanation of the timing control. We use a custom-made MATLAB program that sends TTL square waves and analogue waves via a single National Instruments board (USB-6229) to control two-photon image acquisition, behavior camera image acquisition, water syringe movement, current flow from a stimulator, and sound presentation. We also continuously recorded at 30 kHz, via a separate National Instruments board (PCIe-6363), the frame timing of two-photon imaging, the frame timing of the behavior camera, copies of the command waves (sent to the syringe pump, the stimulator, and the speaker), and signals from the treadmill corresponding to running speed.
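
      For readers who want to see how recordings of this kind are typically aligned, the following is a minimal Python sketch and not the MATLAB program described above. It assumes that the frame clock and a stimulus command were digitized on the same 30 kHz board as 0 V/5 V traces; the function names, the 2.5 V threshold, and the synthetic traces are hypothetical.

```python
import numpy as np

FS = 30_000  # sampling rate of the recording board in Hz (value taken from the text)

def rising_edges(ttl, threshold=2.5):
    """Return sample indices at which a TTL trace crosses the threshold upward."""
    high = np.asarray(ttl) > threshold
    return np.flatnonzero(~high[:-1] & high[1:]) + 1

def event_to_frame(event_samples, frame_samples):
    """Index of the last frame that started at or before each event (-1 if none)."""
    return np.searchsorted(frame_samples, event_samples, side="right") - 1

# Synthetic 1 s recording: a 30 Hz frame clock and one stimulus command pulse.
n = np.arange(FS)
frame_ttl = np.where((n % 1000) >= 500, 5.0, 0.0)            # rising edges at 500, 1500, ...
stim_ttl = np.where((n >= 15_200) & (n < 15_500), 5.0, 0.0)  # pulse starting at sample 15200

frame_samples = rising_edges(frame_ttl)
stim_samples = rising_edges(stim_ttl)
stim_frames = event_to_frame(stim_samples, frame_samples)

# Delay between the start of the matched frame and the stimulus, in seconds.
latency_s = (stim_samples - frame_samples[stim_frames]) / FS
```

      Because every signal shares one sample clock in this scheme, alignment reduces to comparing sample indices, and no cross-device clock correction is needed.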

      (3) The information regarding the cameras utilized in the study presents some confusion. In one instance, you mention, "To monitor licking behavior, the face of each mouse was filmed with a camera at 60 Hz (CM3-U3-13Y3M-CS, FLIR)" (Line 488). However, there's also a reference to filming facial expressions using an infrared web camera (Line 613). Could you clarify whether the FLIR camera (which is an industrial CMOS camera, not a webcam) is referred to as a webcam? Alternatively, if it's a different camera being discussed, please provide product details, including pixel numbers and frame rate for clarity.

      We thank the reviewer for pointing this out. This was a mistake on our end. The camera used in the current project was a CM3-U3-13Y3M-CS, not a web camera. We have now corrected this.

      (4) Please provide more information about the methodology employed for lick detection. Specifically, did the authors solely rely on videography for this purpose? If so, why was an electrical (or capacitive) detector not used? It would provide greater accuracy in detecting licking.

      Lick detection was performed offline based on videography, using DeepLabCut. As licking occurs at a frequency of ~6.5 Hz (Xu, …, O’Connor Nature Neurosci, 2022), the movement can be detected at a frame rate of 60 Hz. Initially, we used both a lick sensor and videography. However, we favored videography because it could potentially provide non-binary information.
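
      The sampling argument can be checked with a few lines of Python. The sketch below is illustrative only: it assumes a per-frame trace (here called `tongue_visible_prob`, a hypothetical name) giving the likelihood that the tongue is visible, as a pose-estimation tool might output, and the threshold and refractory values are arbitrary choices rather than parameters from the study.

```python
import numpy as np

FRAME_RATE_HZ = 60.0  # camera frame rate (from the text)
LICK_RATE_HZ = 6.5    # approximate licking frequency (from the text)

# Roughly 9 frames per lick cycle, comfortably above the 2-samples-per-cycle minimum.
frames_per_lick = FRAME_RATE_HZ / LICK_RATE_HZ
nyquist_ok = FRAME_RATE_HZ > 2 * LICK_RATE_HZ

def lick_onsets(tongue_visible_prob, threshold=0.9, refractory_frames=4):
    """Frames where the trace crosses the threshold upward, skipping onsets
    closer than `refractory_frames` to the previously accepted onset."""
    above = np.asarray(tongue_visible_prob) > threshold
    candidates = np.flatnonzero(~above[:-1] & above[1:]) + 1
    kept = []
    for frame in candidates:
        if not kept or frame - kept[-1] >= refractory_frames:
            kept.append(int(frame))
    return np.array(kept, dtype=int)

# Synthetic one-second trace with two licks about 9 frames apart.
trace = np.zeros(60)
trace[10:14] = 0.99
trace[19:23] = 0.99
onsets = lick_onsets(trace)
```

      A continuous trace of this kind also retains graded information, such as how long the tongue is visible, that a binary contact sensor would discard.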

      Other Minor Points:

      (5) Ensure consistency in the citation format; Vander Weele et al. 2018 and Weele et al. 2019 share the same first author.

      Thank you for pointing this out. EndNote processes the first author’s name differently depending on the journal. We fixed the error manually. The first paper (2018) is an original research paper, and the second one (2019) is a review about how dopamine modulates aversive processing in the mPFC. We cited the second one in three instances where we mentioned review papers.

      (6) The distinction between "dashed vs dotted lines" in Figure 3K and 3M appears to be very confusing. Please consider providing a clearer visualization/labeling to mitigate this confusion.

      We have now changed the line styles.

      (7) Additionally plotting mean polar angles of aversive/appetitive axons as vectors in the Cartesian scatter plots (2J, 3I,J) would make interpretation easier.

      We have now made this change to Figures 2, 3, 4.
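
      As an illustration of what a mean polar angle means for such scatter plots, here is a minimal Python sketch. It assumes each axon is a point whose x coordinate is its reward response and whose y coordinate is its aversive response; this axis assignment and the function name are assumptions, not details taken from the figures.

```python
import numpy as np

def mean_polar_vector(reward_resp, aversive_resp):
    """Circular mean angle (radians) and mean resultant length of per-axon angles."""
    theta = np.arctan2(aversive_resp, reward_resp)  # polar angle of each axon
    c = np.cos(theta).mean()
    s = np.sin(theta).mean()
    return np.arctan2(s, c), np.hypot(c, s)

# Two synthetic axons: one purely reward-responsive, one purely aversive-responsive.
angle, length = mean_polar_vector(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

      Averaging unit vectors instead of raw angles avoids wrap-around artifacts near ±π, and the resultant length indicates how tightly the angles cluster.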

      (8) Data and codes should be shared in a public database. This is important for reproducible research and we believe that "available from the corresponding author upon reasonable request" is outdated language.

      We have uploaded the data to GitHub, https://github.com/pharmedku/2024-elife-da-axon.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors don't show which mouse each axon's data come from, making it hard to know whether differences arise from inter-mouse differences or from differences between axons. The best way to address this point is to show plots similar to Figure 2J & K but broken down by mouse, to show whether each mouse had evidence of these two clusters.

      We have now made this change to Figure 2-figure supplement 3.

      (2) Line 166: Should this sentence point to panels 2F, G, H rather than 2I which doesn't show a shock response?

      We thank the reviewer for pointing this out. We have fixed the incorrect labels.

      (3) Line 195: The population-level bias to aversive stimuli was shown previously using photometry, so it is not justified to say "for the first time" regarding this statement.

      We have adjusted this sentence so that the claim of “for the first time” is not associated with the population-level bias.

      (4) The paper lacks a discussion of the potential role that novelty plays in the amplitude of the responses, given that tail shocks occur less often than rewards. Is the amplitude of the first reward of the day larger than subsequent rewards? Would tail shock responses decay if they occurred in sequential trials?

      Following the reviewer's suggestion, we conducted a comparison of individual axonal responses to both conditioned and unconditioned stimuli across the first trial and subsequent trials. Our findings reveal a notable trend: aversive-preferring axons exhibited attenuation in response to CSreward, yet enhancement in response to CSaversive. Conversely, the response of these axons to USreward was attenuated, with no significant change observed for USaversive. In contrast, reward-preferring axons displayed an invariable activity pattern from the initial trial, highlighting the functional diversity present within dopamine axons. This analysis has been integrated into Figure 3-figure supplement 4 and is elaborated upon in the Discussion section.

      (5) Fix typo in Figure 1 - supplement 1. Shift

      We have now corrected this. Thank you.

      (6) The methods section needs information about trial numbers. Please indicate how many trials were presented to each mouse per day.

      We have now added the information about trial numbers to the Methods section.

      Reviewer #3 (Recommendations For The Authors):

      In line with the public review, my recommendation is for the authors to remain as objective about their data as possible. There are many points in the manuscript where the authors seem to directly contradict their own data. For example, they first detail that dopamine axons respond to unexpected water rewards. Indeed, they find that there are 40% of dopamine axons that respond in this way. Then, a few paragraphs later they state: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards". As detailed above, I do not think these data support an idea that dopamine axons in mPFC preferentially encode aversive outcomes. If the authors wanted to examine a role for mPFC in preferential encoding of aversive stimuli, you would first have to equate the outcomes by magnitude and then compare how the axons acquire preferences across time. Alternatively, a prediction of a more general process that I detail above would predict that you could give mice two rewards that differ in magnitude (e.g., lots of food vs. small water) and you would see the same results that the authors have seen here (i.e., a preference for the food, which is the larger and more salient outcome). Without other tests of how dopamine axons in mPFC respond to situations like this, I don't think any conclusion around mPFC in favoring aversive stimuli can be made.

      As suggested, we have made the current manuscript as objective as possible, removing interpretation aspects regarding what dopamine axons encode and emphasizing their functional diversity. In particular, we removed the word ‘encode’ when describing the response of dopamine axons.

      Although it may have appeared unclear, there was no contradiction within our data regarding the response to reward and aversive stimuli. We have now improved the readability of the Results and Methods sections. Concerning the interpretation of what exactly the mPFC dopamine axons encode, we have rewritten the discussion to be as objective about our data as possible, as suggested. We also have edited our title and abstract accordingly. Meanwhile, we wish to emphasize that our reward and aversive stimuli are standard paradigms commonly used in the field. We believe, and all the reviewers agreed, that reporting the diversity of dopamine axonal responses with a novel imaging design constitutes new insight for the neuroscience community. Therefore, we have decided to leave the introduction of new behavioral tasks for future studies and instead expanded our discussion.

      As mentioned, I think the experiments are executed really well and the technological aspects of the authors' methods are impressive. However, there are also some aspects of the data presentation that could be improved. Some of the graphs took a considerable amount of effort to unpack. For example, Figure 4 is hard going. Is there a way to better illustrate the main points that this figure wants to convey? Some of this might be helped by a more complete description in the figure captions about what the data are showing. It would also be great to see how the response of dopamine axons changes across trials within a session to the shock- and water-predictive cues. Supp Figure 1 should be in the main text with standard error and analyses across time. Clarifying these aspects of the data would make the paper more relevant and accessible to the field.

      We thank the reviewer for pointing out that the legend of Figure 4 was incomplete. We have fixed it, along with improving the presentation of the figure. We have also prepared a new figure (Figure 3–figure supplement 4) to compare CSaversive and CSreward signals for the first and rest of the trials within daily sessions, revealing further functional diversity in dopamine axons. We have decided to keep Figure 1–figure supplement 2 as a figure supplement with an additional analysis, as another reviewer pointed out that the design is not completely new. Furthermore, as eLife readers can easily access figure supplements, we believe it is appropriate to maintain it in this way.

      Minor points:

      (1) What is the control period for the omission test? Was omission conducted for the shock?

      The control period for reward omission is a 2-second period just before the CS onset. We did not include shock omission, because a sufficient number of trials (> 6 trials) for the rare omission condition could not be achieved within a single day.
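
      A control period of this kind is commonly used as a per-trial baseline. The Python sketch below shows one way to subtract the mean of the 2 s preceding CS onset from a fluorescence trace; the imaging frame rate is not stated here, so it is left as a parameter, and the function name is hypothetical.

```python
import numpy as np

def baseline_subtracted(trace, cs_onset_idx, frame_rate_hz, baseline_s=2.0):
    """Subtract the mean of the `baseline_s` seconds just before CS onset."""
    trace = np.asarray(trace, dtype=float)
    n_base = int(round(baseline_s * frame_rate_hz))
    start = cs_onset_idx - n_base
    if start < 0:
        raise ValueError("trace does not contain a full baseline window")
    return trace - trace[start:cs_onset_idx].mean()

# Synthetic trial at an assumed 30 frames/s: flat baseline, then a step at CS onset.
trial = np.concatenate([np.full(60, 1.0), np.full(30, 3.0)])
corrected = baseline_subtracted(trial, cs_onset_idx=60, frame_rate_hz=30.0)
```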

      (2) The authors should mention how similar the tones were that predicted water and shock.

      According to de Hoz and Nelken (2014), a frequency difference of 4–7% is enough for mice to discriminate between tones. In addition, anticipatory licking and running confirmed that the mice could discriminate between the frequencies. We have now included this information in the Discussion.
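
      The separation between the two cues can be put on the same scale as the cited threshold with a short calculation. This Python sketch uses only the 9 kHz and 12 kHz values given in the review; expressing the difference relative to the lower tone is an assumption about how the percentage is defined.

```python
import math

def tone_separation(f_low_hz, f_high_hz):
    """Relative frequency difference (fraction of the lower tone) and interval in octaves."""
    relative = (f_high_hz - f_low_hz) / f_low_hz
    octaves = math.log2(f_high_hz / f_low_hz)
    return relative, octaves

relative, octaves = tone_separation(9_000, 12_000)
exceeds_7_percent = relative > 0.07
```

      With these values the cues differ by about one third of the lower frequency (roughly 0.4 octave), several times the 4–7% figure quoted above.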

      (3) I realize the viral approach used in the current studies may not allow for an idea of where in VTA the dopamine neurons that project to mPFC are located. Are there data in the literature that speak to this? This is particularly important as we now know that there is considerable heterogeneity in dopamine neuronal responses, which is often captured by differences in medial/lateral position within VTA.

      Some studies have suggested that mesocortical dopamine neurons are located in the medial posterior VTA (e.g., Lammel et al., 2008). However, in mouse anterograde tracing, it is not possible to spatially confine the injection of conventional viruses/tracers. We now refer to Lammel et al., 2008 in the Introduction.

    1. Author response:

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we will shortly prepare a revised version of this paper. Intended changes to the revised manuscript are marked up in bold font in the detailed responses below, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”. We acknowledge that of course it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at this point in time, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”. The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript. Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, once it becomes computationally tractable, will have to include the substrate as a possible protonation site; for the present, we will amend our discussion to alert the reader to the possibility that the substrate could be an intermediate in proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve, and thus for now only a semi-quantitative conclusion may be obtainable.

      • “errors with the chosen constant pH MD method for membrane proteins”. We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such we will add a cautionary note to our paper. We will also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we will promote this validation, which was in the supplementary figures, into the main text in the revised version). We therefore remain reasonably confident in the results presented with regard to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”. We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD and metadynamics for path generation, and find this improvement again for PepT2 in this study. We will address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”. In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate form a minimal set of sorts within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We will make our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.
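
      To make the notion of a pKa shift from constant-pH simulations concrete, the Python sketch below fits deprotonated fractions at several pH values to the Hill form of the Henderson–Hasselbalch equation. It is a generic illustration using synthetic numbers, not the analysis pipeline or results of this study; the function name and the example pKa values are hypothetical.

```python
import numpy as np

def fit_pka(ph_values, frac_deprotonated):
    """Fit log10(f / (1 - f)) = n * (pH - pKa) by least squares; return (pKa, n)."""
    ph = np.asarray(ph_values, dtype=float)
    f = np.asarray(frac_deprotonated, dtype=float)
    usable = (f > 0.0) & (f < 1.0)  # the log-odds is undefined at exactly 0 or 1
    log_odds = np.log10(f[usable] / (1.0 - f[usable]))
    slope, intercept = np.polyfit(ph[usable], log_odds, 1)
    return -intercept / slope, slope

def hill_curve(ph, pka, n=1.0):
    """Deprotonated fraction predicted by the Hill equation."""
    return 1.0 / (1.0 + 10.0 ** (n * (pka - np.asarray(ph, dtype=float))))

# Synthetic titrations for one residue in two hypothetical conditions.
ph_grid = np.arange(5.0, 10.0, 0.5)
pka_apo, _ = fit_pka(ph_grid, hill_curve(ph_grid, 7.2))
pka_holo, _ = fit_pka(ph_grid, hill_curve(ph_grid, 6.4))
pka_shift = pka_holo - pka_apo
```

      Because a shift is the difference of two fitted values, a systematic error common to both conditions cancels to first order, which is the error-cancellation argument made in the third point above.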

      As for the additional suggested changes in presentation, we will provide the requested details on the CpHMD analysis. Furthermore, we will use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we will opt not to do so in figures S10 and S15, which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions in single reference plots, respectively, to avoid overcrowding those necessarily busy figures.) We are also changing the colour schemes of these plots in our revision to improve accessibility.
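
      For orientation, a 1D reprojection of a 2D PMF is a Boltzmann-weighted integration over the discarded coordinate. The Python sketch below shows this operation and one simple way to attach error bars, namely the spread across independent estimates of the 2D PMF. It assumes uniform bins, a temperature of 300 K, and energies in kJ/mol; it is an illustration, not the procedure behind the figures mentioned above.

```python
import numpy as np

KT = 2.494  # kJ/mol at an assumed temperature of 300 K

def project_1d(pmf_2d, kT=KT):
    """Integrate out axis 1 of a 2D PMF: F(x) = -kT ln sum_y exp(-F(x, y) / kT).
    With uniform bins the bin width only adds a constant, removed by the final shift."""
    pmf_2d = np.asarray(pmf_2d, dtype=float)
    f_min = pmf_2d.min()  # subtracting the minimum keeps the exponentials well scaled
    f_1d = f_min - kT * np.log(np.exp(-(pmf_2d - f_min) / kT).sum(axis=1))
    return f_1d - f_1d.min()

def projection_with_error(pmf_blocks, kT=KT):
    """Mean and sample standard deviation of 1D projections across PMF estimates."""
    projections = np.array([project_1d(block, kT) for block in pmf_blocks])
    return projections.mean(axis=0), projections.std(axis=0, ddof=1)

# Synthetic separable PMF F(x, y) = fx(x) + fy(y); its projection must recover fx.
fx = np.array([0.0, 5.0, 2.0])
fy = np.array([0.0, 1.0, 3.0])
pmf = fx[:, None] + fy[None, :]
profile = project_1d(pmf)
mean_profile, err = projection_with_error([pmf, pmf + 0.5])
```

      Each projection is shifted to a minimum of zero before averaging, so constant offsets between estimates do not inflate the error bars.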

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be as a higher-dimensional free energy landscape obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regard to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer:

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)). However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.”

      Given that even sampling the proton transfer step itself in an essentially static protein conformation would push the boundaries of what has been achieved in the field, we believe that, considering the current state of the art, a fully coupled investigation of large-scale conformational changes and proton-transfer reactions is not yet feasible in a practical time frame. We also note this limitation already when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”.

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we will expand on our discussion of the reasoning behind employing a non-reactive approach and the limitations that this imposes on what questions can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we will make this clear in the appropriate figure captions in our revision.
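
      The arbitrariness of such an alignment follows from each PMF being defined only up to an additive constant. The Python sketch below shows the kind of superposition described: every profile is shifted so that its minimum within a chosen reference basin is zero. The basin limits, function name, and synthetic profiles are hypothetical, and the shift carries no information about the relative stability of the different protonation states.

```python
import numpy as np

def align_at_basin(cv, pmf, basin_lo, basin_hi):
    """Shift a PMF so that its minimum inside [basin_lo, basin_hi] equals zero."""
    cv = np.asarray(cv, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    in_basin = (cv >= basin_lo) & (cv <= basin_hi)
    if not in_basin.any():
        raise ValueError("no grid points fall inside the reference basin")
    return pmf - pmf[in_basin].min()

# Two synthetic profiles of identical shape that differ only by a constant offset.
cv_grid = np.linspace(0.0, 1.0, 11)
pmf_a = 10.0 * (cv_grid - 0.8) ** 2 + 3.0
pmf_b = pmf_a + 7.0
aligned_a = align_at_basin(cv_grid, pmf_a, 0.7, 0.9)
aligned_b = align_at_basin(cv_grid, pmf_b, 0.7, 0.9)
```

      After alignment the two profiles coincide, so only differences in shape, such as barrier heights relative to the reference basin, can be read from the overlay.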

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not take up or release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the current version indicate explicitly that this may involve the substrate. We will make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We will make note of this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).”

      We do not assume that it is our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is additionally “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We will discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way:

      In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity is different at the E56 deprotonated vs protonated protein. This is currently Figure S20, though in the revised version we will move this piece of validation into a new panel of Figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
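
      The link between the two binding affinities and a pKa shift can be made concrete with a short worked example. The sketch below is not taken from the paper; it only assumes a closed thermodynamic cycle, in which the difference between the binding free energies at the deprotonated and protonated residue equals the change in protonation free energy on binding. The function name and the numerical inputs are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def pka_shift_from_binding(dg_bind_deprot, dg_bind_prot, temperature=298.15):
    """Return pKa(holo) - pKa(apo) implied by a closed thermodynamic cycle.

    Inputs are binding free energies in kcal/mol (more negative = tighter).
    """
    rt_ln10 = R_KCAL * temperature * math.log(10.0)
    return (dg_bind_deprot - dg_bind_prot) / rt_ln10


# Hypothetical values for illustration only (not results from the study)
shift = pka_shift_from_binding(dg_bind_deprot=-6.0, dg_bind_prot=-8.0)
print(f"Implied pKa shift on binding: {shift:+.2f} units")
```

      In this convention, a substrate that binds more tightly to the protonated form gives a positive shift, i.e. binding raises the pKa of the residue.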

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we found few protonation-state transitions taking place, and the system often got stuck in protonation state–local conformation coupled minima (which need to interconvert through rearrangements of the salt bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state, and Author response image 2 shows how noisy attempts at pKa estimation from this data turn out to be, necessitating the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1.

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo-simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate ion pairs with hsPepT2 through its N- & C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which – as by way of response to the comment above – we will acknowledge explicitly in revision. This potential limitation for the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt bridge network in an apo-like state. The consequence would be that while the direction of shift in E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable for obtaining better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protein protonatable residue, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes 10s to 100s of ns in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We will discuss such considerations in the revision.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths:

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1 ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses:

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this and denote it with question marks in the mechanistic overview we give in Figure 8, and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (i.e., full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state to lie at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD, where in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and will add details to the latter sentence in revision to help clarify better the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct as lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). We recover from those the effect of substrate binding on the E56 protonation state on both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 of the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we will add more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.
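
      The two ways of summarising the same data described above can be illustrated with a minimal sketch. It is not the analysis script used in the study: the chunk values are made up, and reading "beyond the combined error bars" as "the means differ by more than the sum of the two standard deviations" is an assumption made here for illustration.

```python
from statistics import mean, stdev


def replicate_summary(chunks_by_replicate):
    """Mean and sample SD of the per-replicate mean pKa values."""
    replicate_means = [mean(chunks) for chunks in chunks_by_replicate]
    return mean(replicate_means), stdev(replicate_means)


def beyond_combined_error(summary_a, summary_b):
    """True if the two means differ by more than the sum of their SDs."""
    (mean_a, sd_a), (mean_b, sd_b) = summary_a, summary_b
    return abs(mean_a - mean_b) > sd_a + sd_b


def pooled_spread(chunks_by_replicate):
    """Sample SD of all chunk estimates pooled across replicates."""
    pooled = [value for chunks in chunks_by_replicate for value in chunks]
    return stdev(pooled)


# Made-up chunk-wise pKa estimates: three replicates per condition
apo = [[6.1, 6.4, 5.9], [6.3, 6.0, 6.2], [6.0, 6.5, 6.1]]
holo = [[7.4, 7.9, 7.6], [7.7, 7.5, 7.8], [7.3, 7.8, 7.6]]

apo_summary = replicate_summary(apo)
holo_summary = replicate_summary(holo)
print("distinct beyond combined error bars:",
      beyond_combined_error(apo_summary, holo_summary))
print("replicate SD (apo):", round(apo_summary[1], 3),
      "| pooled chunk SD (apo):", round(pooled_spread(apo), 3))
```

      With these toy numbers the pooled chunk spread is several times larger than the replicate-level standard deviation, which is the situation that motivates checking the conclusion against the pooled histograms as well.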

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.

      Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiate the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      The reviewer is right to point out that the statement and Figure S3 as they stand do not adequately support our decision to exclude the K64-D317 salt-bridge in our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, does indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we will include the appropriate time-series of the salt bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We will also remake the data for this plot as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are however almost identical, and our conclusions remain.
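
      The difference between a pooled occupancy and a time-resolved one can be shown with a small sketch. It uses synthetic distance traces rather than the simulation data, and the 4.0 Å cutoff for counting the salt bridge as formed is an assumption chosen for illustration.

```python
def occupancy(distances, cutoff=4.0):
    """Fraction of frames in which the distance is at or below the cutoff."""
    return sum(d <= cutoff for d in distances) / len(distances)


def block_occupancy(distances, n_blocks, cutoff=4.0):
    """Occupancy in consecutive, equally sized blocks of the trajectory."""
    size = len(distances) // n_blocks
    return [occupancy(distances[i * size:(i + 1) * size], cutoff)
            for i in range(n_blocks)]


# Two synthetic traces (in Angstrom) with the same overall occupancy:
# one forms the contact late and keeps it, the other flickers throughout.
late_forming = [8.0] * 30 + [3.0] * 70
flickering = ([3.0] * 7 + [8.0] * 3) * 10

for name, trace in (("late-forming", late_forming), ("flickering", flickering)):
    print(name, occupancy(trace), block_occupancy(trace, n_blocks=5))
```

      Both traces report the same pooled occupancy, but only the block-wise view separates a contact that forms late and then persists from one that is transient throughout, which is why the time series are needed in addition to the violin plot.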

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We will revise the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we will also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.
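
      How little statistical weight triplicate counts can carry is easy to check with a worked example. The sketch below is an illustration added here, not an analysis from the paper: it treats each replicate as an independent open/closed outcome and applies a two-sided Fisher exact test, written out with the standard library only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x "opened" runs in the first condition
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: condition; columns: (gate opened, gate stayed closed)
p_one_of_three = fisher_exact_two_sided(1, 2, 0, 3)   # 1/3 vs 0/3 replicates
p_three_of_three = fisher_exact_two_sided(3, 0, 0, 3)  # 3/3 vs 0/3 replicates
print(p_one_of_three, p_three_of_three)
```

      Under these assumptions, 1/3 versus 0/3 is indistinguishable from chance, and even a 3/3 versus 0/3 split would not reach the conventional 0.05 threshold, which is consistent with treating the unbiased runs as hypothesis-generating only.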

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work.

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling.

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Method section and Figure S8), before moving to 2D-umbrella sampling (Figure 3). Here, we think, the reviewer’s point is pertinent. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end-states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regards to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and combination of several MEMENTO paths in 2D-PMFs comes in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we will replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      (1) Given the low trial numbers, and the point of sequential vs clustered reactivation mentioned in the public review, it would be reassuring to see an additional sanity check demonstrating that future items that are currently not on-screen can be decoded with confidence, and if so, when in time the peak reactivation occurs. For example, the authors could show separately the decoding accuracy for near and far items in Fig. 5A, instead of plotting only the difference between them.

      We have now added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have also replaced Figure 5B with the new figure, as we think it provides more information, and have moved the previous Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      (2) The non-sequential reactivation analyses often use a time window of peak decodability, and it was not entirely clear to me what data this time window is determined on, e.g., was it determined based on all future reactivations irrespective of graph distance? This should be clarified in the methods.

      Thank you for raising this. We now clarify this in the relevant section to read: “First, we calculated a time point of interest by computing the peak probability estimate of decoders across all trials, i.e., the average probability for each timepoint of all trials (except previous onscreen items) of all distances, which is equivalent to the peak of the differential reactivation analysis”
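
      The selection of the time point of interest described above amounts to averaging across trials and taking the maximum. The following is a minimal sketch of that step with made-up probabilities; the function name, the time axis, and the assumption that previous on-screen items have already been removed from the input are illustrative choices, not details from the manuscript.

```python
def peak_timepoint(trial_probs, times):
    """Return the time of the highest trial-averaged decoded probability.

    trial_probs: one list of probabilities per trial, one value per time point.
    """
    n_trials = len(trial_probs)
    mean_curve = [sum(trial[i] for trial in trial_probs) / n_trials
                  for i in range(len(times))]
    peak_index = max(range(len(times)), key=lambda i: mean_curve[i])
    return times[peak_index], mean_curve


# Made-up decoded probabilities for three trials at five time points (ms)
times_ms = [200, 210, 220, 230, 240]
trials = [
    [0.10, 0.12, 0.20, 0.15, 0.11],
    [0.09, 0.11, 0.18, 0.19, 0.10],
    [0.11, 0.10, 0.22, 0.14, 0.12],
]
peak_time, mean_curve = peak_timepoint(trials, times_ms)
print("time point of interest (ms):", peak_time)
```

      Because the peak is computed across all distances at once, it is chosen independently of the near versus distant contrast that is tested afterwards.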

      (3) Fig 4 shows evidence for forward and backward sequential reactivation, suggesting that both forward and backward replay peak at a lag of 40-50msec. It would be helpful if this counterintuitive finding could be picked up in the discussion, explaining how plausible it is, physiologically, to find forward and backward replay at the same lag, and whether this could be an artifact of the TDLM method.

      This is an important point and we agree that it appears counterintuitive. However, we would highlight that this exact time range has been reported in previous studies, though never for both forward and backward replay. We now include a discussion of this finding. The section now reads:

      “[…] Even though we primarily focused on the mean sequenceness scores across time lags, there appears to be a (non-significant) peak at 40-60 milliseconds. While simultaneous forward and backward replay is theoretically possible, we acknowledge that it is somewhat surprising and, given our paradigm, could relate to other factors such as autocorrelations (Liu, Dolan, et al., 2021).”

      (4) It is reported that participants with below 30% decoding accuracy are excluded from the main analyses. It would be helpful if the manuscript included very specific information about this exclusion, e.g., was the criterion established based on the localizer cross-validated data, the temporal generalisation to the cued item (Fig. 2), or only based on peak decodability of the future sequence items? If the latter, is it applied based on near or far reactivations, or both?

      We now clarify this point to include more specific information, which reads:

      “[…] Therefore, we decided a priori that participants with a peak decoding accuracy of below 30% would be excluded from the analysis (nine participants in all) as obtained from the cross-validation of localizer trials”

      (5) Regarding the low amount of data for the reactivation analysis, the manuscript should be explicit about the number of trials available for each participant. For example, Supplemental Fig. 1 could provide this information directly, rather than the proportion of excluded trials.

      We have adapted the plot in the supplement to show the absolute number of rejected epochs per participant, in addition to the ratio.

      (6) More generally, the supplements could include more detailed information in the legends.

      We agree and have added more extensive explanation of the plots in the supplement legends.

      (7) The choice of comparing the 2 nearest with all other future items in the clustered reactivation analysis should be better motivated, e.g., was this based on the Wimmer et al. (2020) study?

      We have added our motivation for taking the two nearest items and contrasting them with the items further away. The paragraph reads:

      “[…] We chose to combine the following two items for two reasons: First, this doubled the number of included trials; secondly, using this approach the number of trials for each category (“near” and “distant”) was more balanced. […]”

      Reviewer 2

      (1) Focus exclusively on retrieval data (and here just on the current image trials).

      If I understand correctly, you focus all your analyses (behavioural as well as MEG analyses) on retrieval data only and here just on the current image trials. I am surprised by that since I see some shortcomings due to that. These shortcomings can likely be addressed by including the learning data (and predecessor image trials) in your analyses.

      a) Number of trials: During each block, you presented each of the twelve edges once. During retrieval, participants then did one "single testing session block". Does that mean that all your results are based on max. 12 trials? Given that participants remembered, on average, 80% this means even fewer trials, i.e., 9-10 trials?

      This is correct and a limitation of the paper. However, while we used only correct trials for the reactivation analysis, the sequential analysis was conducted using all trials disregarding the response behaviour. To retain comparability with previous studies we mainly focused on data from after a consolidation phase. Nevertheless, despite the trial limitation we consider the results are robust and worth reporting. Additionally, based on the suggestion of the referee, we now include results from learning blocks (see below).

      b) Extend the behavioural and replay/reactivation analysis to predecessor images.

      Why do you restrict your analyses to the current image trials? Especially given that you have such a low trial number for your analyses, I was wondering why you did not include the predecessor trials (except the non-deterministic trials, like the zebra and the foot according to Figure 2B) as well.

      We agree it would be great to increase power by adding the predecessor images (excluding the ambiguous trials) to the current image cue analysis. We did not do so initially because we considered that the underlying retrieval processes of these trial types are not the same, i.e. the trials cannot simply be combined. Nevertheless, we have performed the suggested analysis to check whether it increases our power. We found that the reactivation effect is robust and significant at the same time point of 220-230 ms. However, the effect size actually decreased: While before, peak differential reactivation was at 0.13, it is now at 0.07. This in fact makes conceptual sense. We suspect that the two processes that are elicited by showing a single cue and by showing a second, related, cue are distinct insofar as the predecessor image acts as a primer for the current image, potentially changing the time course/speed of retrieval. Given our concerns that the two processes are not actually the same, we consider it important to avoid mixing these data.

      We have added a statement to the manuscript discussing this point. The section reads:

      “Note that we only included data from the current image cue, and not from the predecessor image cue, as we assume the retrieval processes differ and should not be concatenated.”

      c) Extend the behavioural and replay/reactivation analysis to learning trials.

      Similar to point 1b, why did you not include learning trials in your analyses?

      Including (correct and incorrect) learning trials has the advantage that you do not have to exclude 7 participants due to ceiling performance (100%).

      Further, you could actually test the hypothesis that you outline in your discussion: "This implies that there may be a switch from sequential replay to clustered reactivation corresponding to when learned material can be accessed simultaneously without interference." Accordingly, you would expect to see more replay (and less "clustered" reactivation) in the first learning blocks compared to retrieval (after the rest period).

      Tracking reactivation and replay over the course of learning is a great idea. We have given a lot of thought as to how to integrate these findings but have not found a satisfying solution. The analysis of the learning data turned out to be quite tricky: We decided that each participant should perform as many blocks as necessary to reach at least 80% accuracy (with an upper limit of six and a lower bound of two, see Supplement figure 4). Indeed, some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). With the benefit of hindsight, we realise our design means that different blocks are not directly comparable between participants. In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation as memory traces become consolidated/stronger. However, it is unclear when replay should emerge and when precisely a switch to clustered reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper.

      Nevertheless, to provide some insight into the learning process, and to see how consolidation impacts differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track processes on a block basis, it does offer potential (albeit limited) insight into the hypothesis we outline in the discussion.

      For reactivation, we see a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less clear, as we do not know over how many learning blocks replay is expected.

      We calculated individual trajectories of how reactivation and replay change from learning to retrieval and related these to performance. Indeed, we see that an increase in reactivation is nominally associated with higher learning performance, while an increase in replay strength is associated with lower performance (both non-significant). However, due to the above-mentioned reasons we think it would be premature to add this weak evidence to the paper.

      To mitigate problems of experimental design in relation to this question, we are currently implementing a follow-up study, where we aim to normalize the learning process across participants and index how replay/reactivation changes over the course of learning and after consolidation.

      We have added plots showing clustered reactivation and sequential replay measures during learning (Figure 5D and Supplement 8).

      The added section(s) now read:

      “To provide greater detail on how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures across learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the same time point we found significant in the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D). […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), due to experimental design features our data do not enable us to test for an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      d) Introduction (last paragraph): "We examined the relationship of graph learning to reactivation and replay in a task where participants learned a ..." If all your behavioural analyses are based on retrieval performance, I think that you do not investigate graph learning (since you exclusively focus the analyses on retrieving the graph structure). However, relating the graph learning performance and replay/reactivation activity during learning trials (i.e., during graph learning) to retrieval trials might be interesting but beyond the scope of this paper.

      We agree. We have changed the wording to be more accurate. Indeed, we do not examine graph learning but instead examine retrieval from a graph, after graph learning. The mentioned sentence now reads:

      “[…] relationship of retrieval from a learned graph structure to reactivation [...]”

      e) It is sometimes difficult to follow what phase of the experiment you refer to since you use the terms retrieval and test synonymously. Not a huge problem at all but maybe you want to stick to one term throughout the whole paper.

      Thank you for pointing this out. We have now adapted the manuscript to exclusively refer to “retrieval” and not to “test”.

      (2) Is your reactivation clustered?

      In Figure 5A, you compare the reactivation strength of the two items following the cue image (i.e., current image trials) with items further away on the graph. I do not completely understand why your results are evidence for clustered reactivation in contrast to replay.

      First, it would be interesting to see the reactivation of near vs. distant items before taking the difference (time course of item probabilities).

      (copied answer from response to Reviewer 1, as the same remark was raised)

      We have added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have chosen to replace Figure 5B with the new figure as we think that it offers more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      Second, could it still be that the first item is reactivated before the second item? By averaging across both items, it is not apparent what the temporal courses of probabilities of both items look like (and whether they follow a sequential pattern). Additionally, the Gaussian smoothing kernel across the time dimension might diminish sequential reactivation and favour clustered reactivation. (In the manuscript, what does a Gaussian smoothing kernel of σ = 1 refer to?). Could you please explain in more detail why you assume non-sequential clustered reactivation here and substantiate this with additional analyses?

      We apologise for the unclear description. Note that the Gaussian kernel is in fact only used for the reactivation analysis and not the replay analysis, so any small temporal successions would have been picked up by the sequential analysis. We now clarify this in the respective section of the sequential analysis and also explain the parameter σ = 1 in the reactivation analysis section. The paragraphs now read:

      “[…] As input for the sequential analysis, we used the raw probabilities of the ten classifiers corresponding to the stimuli. [...]

      […] Therefore, to address this we applied a Gaussian smoothing kernel (using scipy.ndimage.gaussian_filter with the default parameter of σ=1 which corresponds approximately to taking the surrounding timesteps in both directions with the following weighting: current time step: 40%, ±1 step: 25%, ±2 steps: 5%, ±3 steps: 0.5%) [...]”
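      As a rough check of the weighting quoted above, the following minimal sketch computes the normalized weights of a discrete Gaussian kernel with σ = 1. It does not call SciPy itself; the helper name `gaussian_kernel_weights` is hypothetical, and the truncation at four standard deviations is an assumption intended to mirror SciPy's documented default of `truncate=4.0`, not a detail taken from the manuscript.

```python
import math


def gaussian_kernel_weights(sigma=1.0, truncate=4.0):
    """Return normalized Gaussian weights keyed by time-step offset.

    Assumption: the kernel is cut off at `truncate` standard deviations
    and renormalized so that the weights sum to one.
    """
    radius = int(truncate * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(raw)
    return {k: w / total for k, w in zip(offsets, raw)}


weights = gaussian_kernel_weights(sigma=1.0)
for step in range(4):
    # Weight applied to the sample `step` time steps before or after the current one.
    print(f"offset ±{step}: {100 * weights[step]:.1f}%")
```

      Under these assumptions the weights come out at roughly 40%, 24%, 5% and 0.4% for offsets 0 to ±3, close to the rounded figures given in the quoted passage.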

      (3) Replay and/or clustered reactivation?

      The relationship between the sequential forward replay, differential reactivation, and graph reactivation analysis is not really apparent. Wimmer et al. demonstrated that high performers show clustered reactivation rather than sequential reactivation. However, you did not differentiate in your differential reactivation analysis between high vs. low performers. (You point out in the discussion that this is due to a low number of low performers.)

      We agree that a split into high vs low performers would have been preferable for our analysis. However, there is one major obstacle that made us opt for a correlational analysis instead: We employed criteria learning, rendering a categorical grouping conceptually biased. Even though not all participants reached the criterion of 80%, our sample did not naturally split between high and low performers but was biased towards higher performance, leaving the groups uneven. The median performance was 83% (mean ~81%), with six of our subjects (~1/4 of included participants) having this exact performance. This makes a median or mean split difficult, as either binning assignment choice would strongly affect the results. We have added a limitations section in which we extensively discuss this shortcoming and our reasoning for not performing a median split as in Wimmer et al. (2020). The section now reads:

      “There are some limitations to our study, most of which originate from a suboptimal study design. [...], as we performed criteria learning, a sub-group analysis as in Wimmer et al., (2020) was not feasible, as median performance in our sample would have been 83% (mean 81%), with six participants exactly at that threshold. [...]”

      It might be worth trying to bring the analysis together, for example by comparing sequential forward replay and differential reactivation at the beginning of graph learning (when performance is low) vs. retrieval (when performance is high).

      Thank you for the suggestion to include the learning segments, which we think improves the paper quite substantially. However, analysis of the learning data turned out to be quite tricky: We had decided that each participant should perform as many blocks as necessary to reach at least 80% accuracy (with an upper limit of six and a lower bound of two, see Supplement figure 4). Some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). In hindsight, this is an unfortunate design feature in relation to learning, as it means different blocks are not directly comparable between participants.

      In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation, as memory traces get consolidated/stronger. However, it is unclear when replay would emerge and when the switch to reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper at all.

      Nevertheless, to give some insight into the learning process and to see how consolidation affects differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track measures of interest on a block basis, it gives some (albeit limited) insight into the hypothesis outlined in our discussion.

      For reactivation, we see a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less obvious, potentially due to the fact that we do not know across how many learning blocks replay is to be expected.

      The added section(s) now read:

      “To examine how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures during learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the time point we found significant during the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D).

      […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), our data does not enable us to show an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      Additionally, the main research question is not that clear to me. Based on the introduction, I thought the focus was on replay vs. clustered reactivation and high vs. low performance (which I think is really interesting). However, the title is more about reactivation strength and graph distance within cognitive maps. Are these two research questions related? And if so, how?

      We agree we need to be clearer on this point. We have added two sentences to the introduction, which should address this point. The section now reads:

      “[…] In particular, the question remains how the brain keeps track of graph distances for successful recall and whether the previously found difference between high and low performers also holds true within a more complex graph learning context.”

      (4) Learning the graph structure.

      I was wondering whether you have any behavioural measures to show that participants actually learn the graph structure (instead of just pairs or triplets of objects). For example, do you see that participants chose the distractor image that was closer to the target more frequently than the distractor image that was further away (close vs. distal target comparison)? It should be random at the beginning of learning but might become more biased towards the close target.

      Thanks, this is an excellent suggestion. Our analysis indeed shows that people take the near lure more often than the far lure in later blocks, while it is random in the first block.

      Nevertheless, we have decided to put these data into the supplement and reference it in the text. This is because analysis of the learning blocks is challenging and biased in general. Each participant had a different number of learning blocks based on their learning rate, and this makes it difficult to compare learning across participants. We have tried our best to accommodate and explain these difficulties in the figure legend. Nevertheless, we thank the referee for guidance here and this analysis indeed provides further evidence that participants learned the actual graph structure.

      The added section reads

      “Additionally, we have included an analysis showing how wrong answers participants provided were random in the first block and biased towards closer graph nodes in later blocks. This is consistent with participants actually learning the underlying graph structure as opposed to independent triplets (see figure and legend of Supplement 6 for details).”

      (5) Minor comments

      a) "Replay analysis relies on a successive detection of stimuli where the chance of detection exponentially decreases with each step (e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting the replay event). " Could you explain in more detail why 30% is a good threshold then?

      Thank you. We have further clarified the section. As we are working mainly with probabilities, it is useful to keep in mind that accuracy is a class metric that only provides a rough estimate of classifier ability. Alternatively, something like a Top-3-Accuracy would be preferable, but also slightly silly in the context of 10 classes.

      Nevertheless, subtle changes in probability estimates are present and can be picked up by the methods we employ. Therefore, the 30% is a rough lower bound that was decided based on pilot data showing that clean MEG data from attentive participants can usually reach this threshold. The section now reads:

      “(e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting a replay event). However, one needs to bear in mind that accuracy is a “winner-takes-all” metric indicating whether the top choice also has the highest probability, disregarding subtle, relative changes in assigned probability. As the methods used in this analysis are performed on probability estimates and not class labels, one can expect that the 30% is a rough lower bound and that the actual sensitivity within the analysis will be higher. Additionally, based on pilot data, we found that attentive participants were able to reach 30% decodability, allowing us to use decodability as a data quality check.”
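      The arithmetic behind the 30% and 9% figures can be sketched as follows. This is only an illustration: it assumes that each step of a sequence is detected independently with the same probability, and the function name `chance_of_detecting_sequence` is hypothetical rather than part of the authors' analysis code.

```python
def chance_of_detecting_sequence(p_single, n_steps):
    """Chance of detecting every one of `n_steps` successive stimuli.

    Assumption: detections are independent, each succeeding with
    probability `p_single`, so the joint chance shrinks exponentially.
    """
    return p_single ** n_steps


for n_steps in (1, 2, 3):
    chance = chance_of_detecting_sequence(0.30, n_steps)
    print(f"{n_steps} successive stimuli: {100 * chance:.1f}%")
```

      With a per-stimulus chance of 30%, two successive detections give the 9% mentioned in the text, and a third step would reduce this to under 3%.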

      b) Could you make explicit how your decoders were designed? Especially given that you added null data, did you train individual decoders for one class vs. all other classes (n = 9 + null data) or one class vs. null data?

      We added detail to the decoder training. The section now reads

      “Decoders were trained using a one-vs-all approach, which means that for each class, a separate classifier was trained using positive examples (target class) and negative examples (all other classes) plus null examples (data from before stimulus presentation, see below). In detail, null data was.”
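      The one-vs-all scheme described above can be illustrated by how the training targets for each classifier might be assembled. This is a minimal sketch under stated assumptions, not the authors' implementation: the function `one_vs_all_targets` is hypothetical, the class names are placeholders, and null examples are simply appended as additional negatives for every classifier.

```python
def one_vs_all_targets(labels, n_null):
    """Build one binary target vector per class.

    For each class, stimulus trials of that class are positives (1),
    trials of all other classes are negatives (0), and `n_null`
    pre-stimulus null examples are appended as further negatives.
    """
    targets = {}
    for cls in sorted(set(labels)):
        stimulus_targets = [1 if label == cls else 0 for label in labels]
        targets[cls] = stimulus_targets + [0] * n_null
    return targets


# Placeholder trial labels; a real data set would have ten classes.
labels = ["apple", "zebra", "foot", "apple"]
targets = one_vs_all_targets(labels, n_null=2)
for cls, target in targets.items():
    print(cls, target)
```

      Each resulting target vector would then be paired with the same feature matrix (stimulus trials followed by null examples) to fit one binary classifier per class.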

      c) Why did you choose a ratio of 1:2 for your null data?

      Our choice of a higher ratio was based upon previous publications reporting better sensitivity of TDLM with higher ratios, as spatial sensor correlations decrease. Nevertheless, this choice was not well investigated beforehand. We have added more information on this to the manuscript.

      d) You could think about putting the questionnaire results into the supplement if they are sanity checks.

      We have added the questionnaire results. However, due to the size of the tables, we have decided to add them as Excel files to the supplementary files of the code repository. We have mentioned the existence of these files in the publication.

      e) Figure 2. There is a typo in D: It says "Precessor Image" instead of "Predecessor Image".

      Fixed typo in figure.

      f) You write "Trials for the localizer task were created from -0.1 to 0.5 seconds relative to visual stimulus onset to train the decoders and for the retrieval task, from 0 to 1.5 seconds after onset of the second visual cue image." But the Figure legend 3D starts at -0.1 seconds for the retrieval test.

      We have now clarified this. For the classifier cross-validation, the transfer sanity check and the clustered analysis we used trials from -0.1 to 0.5 seconds, whereas for the sequenceness analysis of the retrieval, we used trials from 0 to 1.5 seconds.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study advances our understanding of how past and future information is jointly considered in visual working memory by studying gaze biases in a memory task that dissociates the locations during encoding and memory tests. The evidence supporting the conclusions is convincing, with state-of-the-art gaze analyses that build on a recent series of experiments introduced by the authors. This work, with further improvements incorporating the existing literature, will be of broad interest to vision scientists interested in the interplay of vision, eye movements, and memory.

      We thank the Editors and the Reviewers for their enthusiasm and appreciation of our task, our findings, and our article. We also wish to thank the Reviewers for their constructive comments that we have embraced to improve our article. Please find below our point-by-point responses to this valuable feedback, where we also state relevant revisions that we have made to our article.

      In addition, please note that we have now also made our data and code publicly available.

      Reviewer 1, Comments:

      In this study, the authors offer a fresh perspective on how visual working memory operates. They delve into the link between anticipating future events and retaining previous visual information in memory. To achieve this, the authors build upon their recent series of experiments that investigated the interplay between gaze biases and visual working memory. In this study, they introduce an innovative twist to their fundamental task. Specifically, they disentangle the location where information is initially stored from the location where it will be tested in the future. Participants are tasked with learning a novel rule that dictates how the initial storage location relates to the eventual test location. The authors leverage participants' gaze patterns as an indicator of memory selection. Intriguingly, they observe that microsaccades are directed toward both the past encoding location and the anticipated future test location. This observation is noteworthy for several reasons. Firstly, participants' gaze is biased towards the past encoding location, even though that location lacks relevance to the memory test. Secondly, there's a simultaneous occurrence of an increased gaze bias towards both the past and future locations. To explore this temporal aspect further, the authors conduct a compelling analysis that reveals the joint consideration of past and future locations during memory maintenance. Notably, microsaccades biased towards the future test location also exhibit a bias towards the past encoding location. In summary, the authors present an innovative perspective on the adaptable nature of visual working memory. They illustrate how information relevant to the future is integrated with past information to guide behavior.

      Thank you for your enthusiasm for our article and findings as well as for your constructive suggestions for additional analyses that we respond to in detail below.

      This short manuscript presents one experiment with straightforward analyses, clear visualizations, and a convincing interpretation. For their analysis, the authors focus on a single time window in the experimental trial (i.e., 0-1000 ms after retro cue onset). While this time window is most straightforward for the purpose of their study, other time windows are similarly interesting for characterizing the joint consideration of past and future information in memory. First, assessing the gaze biases in the delay period following the cue offset would allow the authors to determine whether the gaze bias towards the future location is sustained throughout the entire interval before the memory test onset. Presumably, the gaze bias towards the past location may not resurface during this delay period, but it is unclear how the bias towards the future location develops in that time window. Also, the disappearance of the retro cue constitutes a visual transient that may leave traces on the gaze biases which speaks again for assessing gaze biases also in the delay period following the cue offset.

      Thank you for raising this important point. We initially focused on the time window during the cue given that our central focus was on gaze-biases associated with mnemonic item selection. By zooming in on this window, we could best visualize our main effects of interest: the joint selection (in time) of past and future memory attributes.

      At the same time, we fully agree that examining the gaze biases over a more extended time window yields a more comprehensive view of our data. To this end, we have now also extended our analysis to include a wider time range that includes the period between cue offset (1000 ms after cue onset) and test onset (1500 ms after cue onset). We present these data below. Because we believe our future readers are likely to be interested in this as well, we have now added this complementary visualization as Supplementary Figure 4 (while preserving the focus in our main figure on the critical mnemonic selection period of interest).

      Author response image 1.

      Supplementary Figure 4. Gaze biases in extended time window as a complement to Figure 1 and Supplementary Figure 2. This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset, the gaze bias towards the future location persists (panel a) and that while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus (panel b).

      This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset (consistent with our prior reports of this bias), the gaze bias towards the future location persists. Moreover, as revealed by the data in panel b above, while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus.

      We now also call out these additional findings and figure in our article:

      Page 2 (Results): “Gaze biases in both axes were driven predominantly by microsaccades (Supplementary Fig. 2) and occurred similarly in horizontal-to-vertical and vertical-to-horizontal trials (Supplementary Fig. 3). Moreover, while the past bias was relatively transient, the future bias continued to increase in anticipation of the test stimulus and increasingly incorporated eye-movements beyond the microsaccade range (see Supplementary Fig. 4 for a more extended time range)”.

      Moreover, assessing the gaze bias before retro-cue onset allows the authors to further characterize the observed gaze biases in their study. More specifically, the authors could determine whether the future location is considered already during memory encoding and the subsequent delay period (i.e., before the onset of the retro cue). In a trial, participants encode two oriented gratings presented at opposite locations. The future rule indicates the test locations relative to the encoding locations. In their example (Figure 1a), the test locations are shifted clockwise relative to the encoding location. Thus, there are two pairs of relevant locations (each pair consists of one stimulus location and one potential test location) facing each other at opposite locations and therefore forming an axis (in the illustration the axis would go from bottom left to top right). As the future rule is already known to the participants before trial onset it is possible that participants use that information already during encoding. This could be tested by assessing whether more microsaccades are directed along the relevant axis as compared to the orthogonal axis. The authors should assess whether such a gaze bias exists already before retro cue onset and discuss the theoretical consequences for their main conclusions (e.g., is the future location only jointly used if the test location is implicitly revealed by the retro cue).

      Thank you – this is another interesting point. We fully agree that additional analysis looking at the period prior to retrocue onset may also prove informative. In accordance with the suggested analysis, we have therefore now also analysed the distribution of saccade directions (including in the period from encoding to retrocue) as a function of the future rule (presented below, and now also included as Supplementary Fig. 5). Complementary recent work from our lab has shown how microsaccade directions can align to the axis of memory contents during retention (see de Vries & van Ede, eNeuro, 2024). Based on this finding, one may predict that if participants retain the items in a remapped fashion, their microsaccades may align with the axis of the future rule, and this could potentially already happen prior to cue onset.

      These complementary analyses show that saccade directions are predominantly influenced by the encoding locations rather than the test locations, as seen most clearly by the saccade distribution plots in the middle row of the figure below. To obtain time-courses, we categorized saccades as occurring along the axis of the future rule or along the orthogonal axis (bottom row of the figure below). Like the distribution plots, these time course plots also did not reveal any sign of a bias along the axis of the future rule itself.

      Importantly, note how this does not argue against our main findings of joint selection of past and future memory attributes, as for that central analysis we focused on saccade biases that were specific to the selected memory item, whereas the analyses we present below focus on biases in the axes in which both memory items are defined; not only the cued/selected memory item.

      Author response image 2.

      Supplementary Figure 5. Distribution of saccade directions relative to the future rule from encoding onset. (Top panel) The spatial layouts in the four future rules. (Middle panel) Polar distributions of saccades during 0 to 1500 ms after encoding onset (i.e., the period between encoding onset and cue onset). The purple quadrants represent the axis of the future rule and the grey quadrants the orthogonal axis. (Bottom panel) Time courses of saccades along the above two axes. We did not observe any sign of a bias along the axis of the future rule itself.
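      The categorization of saccades along the axis of the future rule versus the orthogonal axis can be sketched as below. This is an illustrative reading of the quadrant scheme in the figure, not the authors' code: the function `saccade_axis` is hypothetical, directions are assumed to be given in degrees, and each axis is assumed to cover two opposing 90-degree quadrants centred on it.

```python
def saccade_axis(direction_deg, rule_axis_deg):
    """Assign a saccade to the future-rule axis or to the orthogonal axis.

    An axis has no polarity, so a saccade pointing along the axis or in
    the exact opposite direction counts equally. Saccades within 45
    degrees of the rule axis are labelled 'rule', all others 'orthogonal'.
    """
    diff = (direction_deg - rule_axis_deg) % 180.0
    if diff > 90.0:
        diff = 180.0 - diff  # angular distance to the nearest end of the axis
    return "rule" if diff < 45.0 else "orthogonal"


# Example: a rule axis running from bottom left to top right (45 degrees).
for direction in (45.0, 225.0, 80.0, 135.0):
    print(direction, saccade_axis(direction, rule_axis_deg=45.0))
```

      Counting the two labels per time bin would then give time courses comparable in spirit to those in the bottom row of the figure.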

      We agree that these additional results are important to bring forward when we interpret our findings. Accordingly, we now mention these findings at the relevant section in our Discussion:

      Page 5 (Discussion): “First, memory contents could have directly been remapped (cf. 4,24–26) to their future-relevant location. However, in this case, one may have expected to exclusively find a future-directed gaze bias, unlike what we observed. Moreover, using a complementary analysis of saccade directions along the axis of the future rule (cf. 24), we found no direct evidence for remapping in the period between encoding and cue (Supplementary Fig. 5)”.

      Reviewer 2, Comments:

      The manuscript by Liu et al. reports a task that is designed to examine the extent to which "past" and "future" information is encoded in working memory that combines a retro cue with rules that indicate the location of an upcoming test probe. An analysis of microsaccades on a fine temporal scale shows the extent to which shifts of attention track the location of the encoded item (past) and the location of the future item (test probe). The locations of the encoded grating and of the test probe were always on orthogonal axes (horizontal, vertical) so that biases in microsaccades could be used to track shifts of attention to one or the other axis (or mixtures of the two). The overall goal here was then to (1) create a methodology that could tease apart memory for the past and future, respectively, (2) to look at the time course of attention to past/future, and (3) to test the extent to which microsaccades might jointly encode past and future memoranda. Finally, some remarks are made about the plausibility of various accounts of working memory encoding/maintenance based on the examination of these time courses.

      Strengths:

      This research has several notable strengths. It has a clear statement of its aims, is lucidly presented, and uses a clever experimental design that neatly orthogonalizes "past" and "future" as operationalized by the authors. Figure 1b-d shows fairly clearly that saccade directions have an early peak (around 300ms) for the past and a "ramping" up of saccades moving in the forward direction. This seems to be a nice demonstration that the method can measure shifts of attention at a fine temporal resolution and differentiate past from future-oriented saccades due to the orthogonal cue approach. The second analysis, shown in Figure 2, reveals a dependency in saccade direction such that saccades toward the probe future were more likely also to be toward the encoded location than away from the encoded direction. This suggests saccades are jointly biased by both locations "in memory".

      Thank you for your overall appreciation of our work and for highlighting the above strengths. We also thank you for your constructive comments and call for clarifications that we respond to below.

      Weaknesses:

      (1) The "central contribution" (as the authors characterize it) is that "the brain simultaneously retains the copy of both past and future-relevant locations in working memory, and (re)activates each during mnemonic selection", and that: "... while it is not surprising that the future location is considered, it is far less trivial that both past and future attributes would be retained and (re)activated together. This is our central contribution." However, to succeed at the task, participants must retain the content (grating orientation, past) and probe location (future) in working memory during the delay period. It is true that the location of the grating is functionally irrelevant once the cue is shown, but if we assume that features of a visual object are bound in memory, it is not surprising that location information of the encoded object would bias processing as indicated by microsaccades. Here the authors claim that joint representation of past and future is "far less trivial"; this needs to be evaluated from the standpoint of prior empirical data on memory decay in such circumstances, or some reference to the time-course of the "unbinding" of features in an encoded object.

      Thank you. We agree that our participants have to use the future rule – as otherwise they do not know to which test stimulus they should respond. This was a deliberate decision when designing the task. Critically, however, this does not require (nor imply) that participants have to incorporate and apply the rule to both memory items already prior to the selection cue. It is at least as conceivable that participants would initially retain the two items at their encoded (past) locations, then wait for the cue to select the target memory item, and only then consider the future location associated with the target memory item. After all, in every trial, there is only 1 relevant future location: the one associated with the cued memory item. The time-resolved nature of our gaze markers argues against such a scenario, by virtue of our observation of the joint (simultaneous) consideration of past and future memory attributes (as opposed to selection of past-before-future). These temporal dynamics are central to the insights provided by our study.

      In our view, it is thus not obvious that the rule would be applied at encoding. In this sense, we do not assume that the future location is part of both memory objects from encoding, but rather ask whether this is the case – and, if so, whether the future location takes over the role of the past location, or whether past and future locations are retained jointly.

      Our statements regarding what is “trivial” and what is “less trivial” regard exactly this point: it is trivial that the future is considered (after all, our task demanded it). However, it is less trivial that (1) the future location was already available at the time of initial item selection (as reflected in the simultaneous engagement of past and future locations), and (2) that in presence of the future location, the past location was still also present in the observed gaze biases.

      Having said that, we agree that an interesting possibility is that participants remap both memory items to their future-relevant locations ahead of the cue, but that the past location is not yet fully “unbound” by the time of the cue. This may trigger a gaze bias not only to the new future location but also to the “sticky” (unbound) past location. We now acknowledge this possibility in our discussion (also in response to comment 3 below) where we also suggest how future work may be able to tap into this:

      Page 6 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      (2) The authors refer to "future" and "past" information in working memory and this makes sense at a surface level. However, once the retrocue is revealed, the "rule" is retrieved from long-term memory, and the feature (e.g. right/left, top/bottom) is maintained in memory like any other item representation. Consider the classic test of digit span. The digits are presented and then recalled. Are the digits of the past or future? The authors might say that one cannot know, because past and future are perfectly confounded. An alternative view is that some information in working memory is relevant and some is irrelevant. In the digit span task, all the digits are relevant. Relevant information is relevant precisely because it is thought to be necessary in the future. Irrelevant information is irrelevant precisely because it is not thought to be needed in the immediate future. In the current study, the orientation of the grating is relevant, but its location is irrelevant; and the location of the test probe is also relevant.

      Thank you for this stimulating reflection. We agree that in our set-up, past location is technically “task-irrelevant” while future location is certainly “task-relevant”. At the same time, the engagement of the past location suggests to us that the brain uses past location for the selection – presumably because the brain uses spatial location to help individuate/separate the items, even if encoded locations are never asked about. Therefore, whether something is relevant or irrelevant ultimately depends on how one defines relevance (past location may be relevant/useful for the brain even if technically irrelevant from the perspective of the task). In comparison, the use of “past” and “future” may be less ambiguous.

      It is also worth noting how we interpret our findings in relation to demands on visual working memory, inspired by dynamic situations whereby visual stimuli may be last seen at one location but expected to re-appear at another, such as a bird disappearing behind a building (the example in our introduction). Thus, past for us does not refer to the memory item per se (like in the digit span analogue) but, rather, quite specifically to the past location of a dynamic visual stimulus in memory (which, in our experiment, was operationalised by the future rule, for convenience).

      (3) It is not clear how the authors interpret the "joint representation" of past and future. Put aside "future" and "past" for a moment. If there are two elements in memory, both of which are associated with spatial bindings, the attentional focus might be a spatial average of the associated spatial indices. One might also view this as an interference effect, such that the encoded location attracts spatial attention since it has not been fully deleted/removed from working memory. Again, for the impact of the encoded location to be exactly zero after the retrieval cue would require zero interference or instantaneous decay of the bound location information. It would be helpful for the authors to expand their discussion to further explain how the results fit within a broader theoretical framework and how they fit with empirical data on how quickly an irrelevant feature of an object can be deleted from working memory.

      Thank you also for this point (that is related to the two points above). As we stated in our reply to comment 1 above, we agree that one possibility is that the past location is merely “sticky” and pulls the task-relevant future bias toward the past location. If so, our time courses suggest that such “pulling” occurs only until approximately 600 ms after cue onset, as the past bias is only transient. An alternative interpretation is that the past location may not be merely a residual irrelevant trace, but actually be useful and used by the brain.

      For example, the encoded (past) item locations provide a coordinate system in which to individuate/separate the two memory items. While the future locations also provide such a coordinate system, the brain may benefit from holding onto both coordinate systems at the same time, rendering our observation of joint selection in both frames. Indeed, in a recent VR experiment in which we had participants (rather than the items) rotate, we also found evidence for the joint use of two spatial frames, even if neither was technically required for the upcoming task (see Draschkow, Nobre, van Ede, Nature Human Behaviour, 2022). Though highly speculative at this stage, such reliance on multiple spatial frames may make our memories more robust to decay and/or interference. Moreover, while past location was never explicitly probed in our task, in daily life the past location may sometimes (unexpectedly) become relevant, hence it may be useful to hold onto it, just in case. Thus, considering the past location merely as an “irrelevant feature” (that takes time to delete) may not do sufficient justice to the potential roles of retaining past locations of dynamic visual objects held in working memory.

      As also stated in response to comment 1 above, we now added these relevant considerations to our Discussion:

      Page 5 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      Reviewer 3, Comments:

      This study utilizes saccade metrics to explore what the authors term the "past and future" of working memory. The study features an original design: in each trial, two pairs of stimuli are presented, first a vertical pair and then a horizontal one. Between these two pairs comes the cue that points the participant to one target of the first pair and another of the second pair. The task is to compare the two cued targets. The design is novel and original but it can be split into two known tasks - the first is a classic working memory task (a post-cue informs participants which of two memorized items is the target), which the authors have used before; and the second is a classic spatial attention task (a pre-cue signal that attention should be oriented left or right), which was used by numerous other studies in the past. The combination of these two tasks in one design is novel and important, as it enables the examination of the dynamics and overlapping processes of these tasks, and this has a lot of merit. However, each task separately is not new. There are quite a few studies on working memory and microsaccades and many on spatial attention and microsaccades. I am concerned that the interpretation of "past vs. future" could mislead readers to think that this is a new field of research, when in fact it is the (nice) extension of an existing one. Since there are so many studies that examined pre-cues and post-cues relative to microsaccades, I expected the interpretation here to rely more heavily on the existing knowledge base in this field. I believe this would have provided a better context of these findings, which are not only on "past" vs. "future" but also on "working memory" vs. "spatial attention".

      Thank you for considering our findings novel and important, while at the same time reminding us of the parallels to prior tasks studying spatial attention in perception and working memory. We fully agree that our task likely engages both attention to the (past) memory item as well as spatial attention to the upcoming (future) test stimulus. At the same time, there is a critical difference in spatial attention for the future in our task compared with ample prior tasks engaging spatial cueing of attention for perception. In our task, the cue never directly cues the future location. Rather, it exclusively cues the relevant memory item. It is the memory item that is associated with the relevant future location, according to the future rule. This integration of the rule-based future location into the memory representation is distinct from classical spatial-attention tasks in which attention is cued directly to a specific location via, for example, a spatial cue such as an arrow.

      Thus, if we wish to think about our task as engaging cueing of spatial attention for perception, we have to at least also invoke the process of cueing the relevant location via the appropriate memory item. We feel it is more parsimonious to think of this as attending to both the past and future location of a dynamic visual object in working memory.

      If we return to our opening example, when we see a bird disappear behind a building, we can keep in working memory where we last saw it, while anticipating where it will re-appear to guide our external spatial attention. Here too, spatial attention is fully dependent on working-memory content (the bird itself) – mirroring the dynamic setting in our study. Thus, we believe our findings contribute a fresh perspective, while of course also extending established fields. We now contextualize our finding within the literature and clarify our unique contribution in our revised manuscript:

      Page 5 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

      Reviewer 2, Recommendations:

      It would be helpful to set up predictions based on existing working memory models. Otherwise, the claim that the joint coding of past/future is "not trivial" is simply asserted, rather than contradicting an existing model or prior empirical results. If the non-trivial aspect is simply the ability to demonstrate the joint coding empirical through a good experimental design, make it clear that this is the contribution. For example, it may be that prevailing models predict exactly this finding, but nobody has been able to demonstrate it cleanly, as the authors do here. So the non-triviality is not that the result contradicts working memory models, but rather relates to the methodological difficulty of revealing such an effect.

      Thank you for your recommendation. First, please see our point-by-point responses to the individual comments above, where we also state relevant changes that we have made to our article, and where we clarify what we meant by “non trivial”. As we currently also state in our introduction, our work took as a starting point the framework that working memory is inherently about the past while being for the future (cf. van Ede & Nobre, Annual Review of Psychology, 2023). By virtue of our unique task design, we were able to empirically demonstrate that visual contents in working memory are selected via both their past and their future-relevant locations – with past and future memory attributes being engaged together in time. By “not trivial” we merely intend to make clear that there are viable alternatives to the findings we observed. For example, past could have been replaced by the future, or it could have been that item selection (through its past location) was required before its future-relevant location could be considered (i.e. past-before-future, rather than joint selection as we reported). We outline these alternatives in the second paragraph of our Discussion:

      Page 5 (Discussion): “Our finding of joint utilisation of past and future memory attributes emerged from at least two alternative scenarios of how the brain may deal with dynamic everyday working memory demands in which memory content is encoded at one location but needed at another.

      First, [….]”

      Our work was not motivated by a particular theoretical debate and did not aim to challenge ongoing debates in the working-memory literature, such as: slot vs. resource, active vs. silent coding, decay vs. interference, and so on. To our knowledge, none of these debates makes specific claims about the retention and selection of past and future visual memory attributes – despite this being an important question for understanding working memory in dynamic everyday settings, as we hoped to make clear by our opening example.

      Reviewer 3, Recommendations:

      I recommend that the present findings be more clearly interpreted in the context of previous findings on working memory and attention. The task design includes two components - the first (post-cue) is a classic working memory task and the second (the pre-cue) is a classic spatial attention design. Both components were thoroughly studied in the past and this previous knowledge should be better integrated into the present conclusions. I specifically feel uncomfortable with the interpretation of past vs. future. I find this framework to be misleading because it reads like this paper is on a topic that is completely new and never studied before, when in fact this is a study on the interaction between working memory and spatial attention. I recommend the authors minimize this past-future framing or be more explicit in explaining how this new framework relates to the more common terminology in the field and make sure that the findings are not presented in a vacuum, as another contribution to the vibrant field that they are part of.

      Thank you for these recommendations. Please also see our point-by-point responses to the individual comments above. There, we explained our logic behind using the terminology of past vs. future (in addition, see also our response to point 2 of reviewer 2). There, we also stated relevant changes that we have made to our manuscript to explain how our findings complement – but are also distinct from – prior tasks that used pre-cues to direct spatial attention to an upcoming stimulus. As we explained above, in our task, the cue itself never contained information about the upcoming test location. Rather, the upcoming test location was a property of the memory item (given the future rule). Hence, we referred to this as a “future attribute” of the cued memory item, rather than as the “cued location” for external spatial attention. Still, we agree the future bias likely (also) reflects spatial allocation to the upcoming test array, and we explicitly acknowledge this in our discussion. For example:

      Page 5 (Discussion): “This signal may reflect either of two situations: the selection of a future-copy of the cued memory content or anticipatory attention to the anticipated location of its associated test-stimulus. Either way, by the nature of our experimental design, this future signal should be considered a content-specific memory attribute for two reasons. First, the two memory contents were always associated with opposite testing locations, hence the observed bias to the relevant future location must be attributed specifically to the cued memory content. Second, we cued which memory item would become tested based on its colour, but the to-be-tested location was dependent on the item’s encoding location, regardless of its colour. Hence, consideration of the item’s future-relevant location must have been mediated by selecting the memory item itself, as it could not have proceeded via cue colour directly.”

      Page 6 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

    1. Author response:

      Factual error in the eLife assessment to be corrected:

      In the eLife assessment, "ribosomal protein H59" should be changed to "helix 59 of the 28S ribosomal RNA" to make this factually correct.

      Provisional author response

      We thank the reviewers for their thorough and thoughtful readings of the manuscript. Our responses to the four suggestions made in their public reviews are below.

      Reviewer #1 (Public Review):

      Major points:

      (1) The identification of RAMP4 is a pivotal discovery in this paper. The sophisticated AlphaFold prediction, de novo model building of RAMP4's RBD domain, and sequence analyses provide strong evidence supporting the inclusion of RAMP4 in the ribosome-translocon complex structure.

      However, it is crucial to ensure the presence of RAMP4 in the purified sample. Particularly, a validation step such as western blotting for RAMP4 in the purified samples would strengthen the assertion that the ribosome-translocon complex indeed contains RAMP4. This is especially important given the purification steps involving stringent membrane solubilization and affinity column pull-down.

      As suggested, we will revise the manuscript to include Western blots showing that RAMP4 is retained at secretory translocons (and not multipass translocons) after solubilisation, affinity purification, and recovery of ribosome-translocon complexes.

      (2) Despite the comprehensive analyses conducted by the authors, it is challenging to accept the assertion that the extra density observed in TRAP class 1 corresponds to calnexin. The additional density in TRAP class 1 appears to be less well-resolved, and the evidence for assigning it as calnexin is insufficient. The extra density there can be any proteins that bind to TRAP. It is recommended that the authors examine the density on the ER lumen side. An investigation into whether calnexin's N-globular domain and P-domain are present in the ER lumen in TRAP class 1 would provide a clearer understanding.

      We agree that the Calnexin assignment is less confident than the other assignments in this manuscript, and that further support would be ideal. We have exhaustively searched our maps for any unexplained density connected with the putative Calnexin TMD, and have found none. This is consistent with Calnexin's lumenal domain being flexibly linked to its TMD, and thus would not be resolved in a ribosome-aligned reconstruction.

      Our assignment of this TMD to Calnexin was based on existing biochemical data (referenced in the paper) favouring this as the best working hypothesis by far: Calnexin is TRAP’s only abundant co-purifying factor, and their interaction is sensitive to point mutations in the Calnexin TMD. Recognising that this is not conclusive, we will ensure that the text and figures consistently describe this assignment as provisional or putative.

      (3) In the section titled 'TRAP competes and cooperates with different translocon subunits,' the authors present a compelling explanation for why TRAP delta defects can lead to congenital disorders of glycosylation. To enhance this explanation, it would be valuable if the authors could provide additional analyses based on mutations mentioned in the references. Specifically, examining whether these mutations align with the TRAP delta-OSTA structure models would strengthen the link between TRAP delta defects and the observed congenital disorders of glycosylation.

      We agree that mapping disease-causing point mutants to the TRAP delta structure could be potentially informative. Unfortunately, the referenced TRAP delta disease mutants act by simply impairing TRAP delta expression, and thus admit no such fine-grained analyses. However, sequence conservation is our next best guide to mutant function. We note in the text that the contact site charges on TRAP delta and RPN2 are conserved, and that the closest-juxtaposed interaction pair (K117 on TRAPδ and D386 on RPN2) is also the most conserved.

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript contains numerous novel structural analyses and discusses their potential functional implications. While all findings are exciting, the highlight is the discovery of RAMP4/SERP1 near the Sec61 lateral gate. Overall, the strength is the thorough and extensive structural analysis of the different high-resolution RTC classes as well as the expert bioinformatic evolutionary analysis.

      Weaknesses:

      A minor downside of the manuscript is the sheer volume of analyses and mechanistic hypotheses, which makes it sometimes difficult to follow. The authors might consider offloading some analyses based on weaker evidence to the supplement to maximize impact.

      We agree that the manuscript is long, and we will seek ways to streamline it in revision while avoiding the undesirable side effect of making important findings undiscoverable via literature searches (an unfortunate consequence of many supplemental data). Indeed, we chose eLife for its flexibility regarding article length and suitability for extended and detailed analyses.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      We are grateful for the overall positive feedback from the reviewer.

      We agree with the reviewer that our data showing cellular co-localization between PRC1 and BIN1 require further investigation in future studies. However, we are confident that, in its current form, our manuscript already presents multiple lines of evidence for the role of BIN1 in mitotic processes. We would like to emphasize that PRC1 is not the sole BIN1 partner that connects it to mitotic processes; it is only one of more than a dozen that we identified in our study. Furthermore, the mitotic connection with BIN1 is not entirely novel, as BIN1 levels fluctuate mildly during the cell cycle, similar to other proteins involved in the regulation of the cell cycle (Santos et al., 2015), and because DNM2 is also a well-accepted actor during mitosis (Thompson et al., 2002).

      The less marked co-localization between BIN1 and PRC1 compared to the strong co-localization between BIN1 and DNM2 can be a consequence of their weaker affinity and their partial binding. Yet, this does not necessarily imply that stronger interactions have more biological significance. For example, weaker affinities can be compensated by local concentrations to achieve an even higher degree of cellular complexes than of strongly binding interactions that are separated within the cell. Furthermore, even the degree of complex formation cannot be used intuitively to estimate the biological significance of a complex because complexes can trigger very important biological processes even at very low abundances, e.g. by catalyzing enzymatic reactions. Deciding what is and what is not “biologically significant” among the identified interactions remains to be answered in the future, once we are able to overview complex biological processes in a holistic manner.
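      The point that a weak interaction at high local concentration can yield more complex than a strong interaction between dilute partners can be illustrated with the standard 1:1 binding isotherm. The following is only a minimal sketch: the function name and all concentrations and dissociation constants are hypothetical values chosen for illustration, not measurements from the study.

```python
def fraction_bound(ligand_conc_uM: float, kd_uM: float) -> float:
    """Fraction of a protein in complex for simple 1:1 binding,
    assuming the ligand is in excess: [L] / ([L] + Kd)."""
    return ligand_conc_uM / (ligand_conc_uM + kd_uM)


# Hypothetical weak interaction (Kd = 100 uM) whose partners are
# co-concentrated locally (1000 uM effective ligand concentration).
weak_but_concentrated = fraction_bound(ligand_conc_uM=1000.0, kd_uM=100.0)

# Hypothetical strong interaction (Kd = 1 uM) whose partners are
# separated in the cell (0.1 uM effective ligand concentration).
strong_but_dilute = fraction_bound(ligand_conc_uM=0.1, kd_uM=1.0)

print(f"weak, concentrated: {weak_but_concentrated:.2f}")
print(f"strong, dilute:     {strong_but_dilute:.2f}")
```

      Under these assumed numbers, the weaker interaction forms roughly ten times more complex than the stronger one, which is the qualitative behaviour described above.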

      In the revised version, we implemented minor changes to further clarify the raised points.

      Reviewer #2:

      We thank the reviewer for the careful assessment and we are pleased to see the positive enthusiasm regarding our affinity interactomic strategy.

      The reviewer points out that affinities were only measured with a single technique, which is relatively unproven. While it is true that our work uses two techniques building on the same holdup concept, we rather believe that this approach is well-proven. The original holdup method was described almost 20 years ago and since then, it has been used in more than 10 publications for quantitative interactomics. Over the years, at least five distinct generations of the assay were developed, all building on the expertise of the preceding one. In the past, we extensively proved that the resulting affinities show excellent agreement with affinities measured with other methods, such as fluorescence polarization, isothermal titration calorimetry, or surface plasmon resonance (for example in Vincentelli et al. Nat. Meth. 2015; Gogl et al. 2020 Structure; Gogl et al. 2022 Nat.Com.). However, it is true that the most recent variation of this method family, called native holdup, is a fairly new approach published just a bit more than a year ago and this is only the third work that utilizes this method. Yet, in our original work describing the method, we demonstrated good agreement with the results of previous holdup experiments, as well as with orthogonal affinity measurements (Zambo et al. 2022).

      Importantly, the reviewer raises concerns regarding the number of replicates used in our study, as well as the reliability of our methodology. We are glad for such a comment as it allows us to explain our motives behind experimental design which is most often left out from scientific works to save space and keep focus on results. The reason why we use technical replicates instead of the typical biological replicates lies in the nature of the holdup assay. In a typical interactomic assay, such as immunoprecipitation, a lot of variables can perturb the outcome of the measurement, such as bait immobilization, or captured prey leakage during washing steps. The output of such an experiment is a list of statistically significant partners and to minimize these variabilities, biological replicates are used. In the case of a native holdup approach, a panel of an equal amount of resins, all saturated with different baits or controls, is mixed with an equal amount of cell extract, taken from a single tube, and after a brief incubation, the supernatant of this mixture is analyzed. The output of such an experiment is a list of relative concentrations of prey and to maximize its accuracy, we use technical replicates. Using an ideal analytical method, such as fluorescence, it is not necessary to use technical replicates to reach accurate results. For example, the general accuracy of a holdup experiment coupled with a robust analytical approach can be seen clearly in our fragmentomic holdup data shown in Figure 7C where mutant domains that do not have any impact on the interactome show extreme agreement in affinities. Unfortunately, mass spectrometry is less accurate as an analytical method, hence we use technical triplicates to compensate for this. Finally, in the case of BIN1, an independent nHU measurement was also performed using a less capable mass spectrometer. 
Not counting the 117 partners of BIN1 that were detected in only one of these proteomic measurements, 29 partners were identified as common significant partners in both of these measurements, showing nearly identical affinities with a mean standard deviation between measured pKapp values of 0.18, meaning that the obtained dissociation constants are within a <2.5-fold range with >95% probability. There were also 61 BIN1 partners that were detected in both proteomic measurements but were only identified as a significant interaction partner in one of these experiments. Yet many of them show binding in both assays, although they were found to be not significant in one of them. For example, CDC20 shows 66% depletion in one assay (significant binding) while it shows 54% depletion in the other (not significant binding), and CKAP2 shows 58% depletion in one assay (significant binding) while it shows 41% depletion in the other (not significant binding). We hope that these examples show that statistical significance in nHU experiments signifies how certain we are in a particular affinity measurement rather than the accuracy of the affinity measurement itself. While there are true discrepancies between some of the affinity measurements in these experiments, which could be clarified with more experimental replicates, the raw data presented in our work clearly demonstrate the strength and robustness of a fully quantitative interactomic assay.
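      The arithmetic behind these statements can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis pipeline: it assumes that pKapp errors are normally distributed (so that 95% of values fall within 1.96 standard deviations), and that the apparent dissociation constant follows the simple hyperbolic relation Kd,app = [bait] x (1 - BI) / BI, where BI is the depleted fraction and the bait is in excess over the prey. The function names are hypothetical.

```python
import math


def fold_range_95(sd_pk: float) -> float:
    """Fold deviation in Kd covered by 1.96 standard deviations of pK,
    assuming normally distributed errors on the log10 scale."""
    return 10 ** (1.96 * sd_pk)


def delta_pk_from_depletion(bi_a: float, bi_b: float) -> float:
    """Difference in apparent pKd between two assays with depleted
    fractions bi_a and bi_b, assuming Kd,app = [bait] * (1 - BI) / BI.
    The bait concentration cancels when both assays use the same bait."""
    kd_ratio = ((1 - bi_b) / bi_b) / ((1 - bi_a) / bi_a)
    return math.log10(kd_ratio)


fold = fold_range_95(0.18)                    # SD of pKapp quoted in the text
cdc20 = delta_pk_from_depletion(0.66, 0.54)   # CDC20: 66% vs 54% depletion
ckap2 = delta_pk_from_depletion(0.58, 0.41)   # CKAP2: 58% vs 41% depletion

print(f"95% fold range for SD 0.18: {fold:.2f}")
print(f"CDC20 delta pKapp: {cdc20:.2f}")
print(f"CKAP2 delta pKapp: {ckap2:.2f}")
```

      Under these assumptions, a standard deviation of 0.18 corresponds to a fold deviation of about 2.25, consistent with the quoted <2.5-fold range, and the CDC20 and CKAP2 depletion values differ by only about 0.2 to 0.3 pKapp units between the two measurements even though only one of each pair passed the significance threshold.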

      In the revised version, we clarified the number of replicates in the text, in the figure legends, and included some of this discussion in the method section.

      The reviewer had some very useful comments regarding affinity differences between short fragments and full-length proteins. The comment possibly contains a typo, as we find that full-length proteins typically interact with higher, not weaker, affinities compared to short PxxP motif fragments in isolation. The reviewer also comments that we explain this difference with cooperativity. In a previous preprint version, which the reviewer may have seen, this was indeed the case, but we have since realized that we did not have sufficient evidence supporting this model and therefore did not discuss it in detail in the last version submitted to eLife. To clarify this, we included more discussion about the observed differences in the affinities between fragments and full-length proteins, but since we have limited data to make solid conclusions, we do not go into details about underlying models.

      Instead of cooperativity, the reviewer suggests that the observed differences may originate from additional residues that were not included in our peptides. Indeed, many similar experiments fail because of suboptimal peptide library design. Our peptide library was constructed as 15-mer, xxxxxxPxxPxxxxx motifs and we do not see a strong contribution of residues at the far end of these peptides. Specificity logo reconstructions are expected to identify all key residues that participate in SH3 domain binding, and based on this, all key residues of the identified motifs can be included in shorter 10-mer, xxxPxxPxxx motifs. Therefore, it is unlikely that residues outside our peptide regions will greatly contribute to the site-specific interactions of SH3 domains. It is however possible that other sites, that are sequentially far away from the studied PxxP motifs, are also capable of binding to SH3 through a different surface, but in light of the small size of an isolated SH3 domain, we believe it is very unlikely. It is also possible that BIN1 could also interact with other types of SH3 binding motifs that were not included in our peptide library. We think a more likely explanation is some sort of cooperativity. Cooperativity, or rather synergism between different sites can be easily explained in typical situations, such as in the case of a bimolecular interaction that is mediated by two independent sites. In such an event, once one site is bound, the second binding event will likely also occur because of the high effective local concentration of the binding sites. However, cooperativity can also form in atypical conditions and a molecular explanation for these events is rather elusive. As BIN1 contains a single SH3 domain, its binding to targets containing more binding sites can be challenging to interpret. 
If these sites are part of a greater Pro-rich region, such as in the case of DNM2, it is possible that the entire region adopts a fuzzy, malleable, yet PPII-like helical conformation. Once the SH3 domain is recruited to this helical region, it can freely translocate within this region via lateral diffusion and it will pause on optimal PxxP motifs. As an alternative to this sliding mechanism, a diffusion-limited cooperative binding can also occur. If the two motifs are not part of the same Pro-rich region, but are relatively close in space, such as in the case of ITCH or PRC1, once a BIN1 molecule dissociates from one site, it has a higher chance to rebind to the second site due to higher local concentrations. Such an event can more likely occur if a transient, but relatively stable encounter complex exists between the two molecules, from which complex formation can occur at both sites (A+B↔AB*; AB*↔ABsite1; AB*↔ABsite2). However, this large effective local concentration in this encounter complex is only temporary because diffusion rapidly diminishes it, although weak electrostatic interactions can increase the lifetime of such encounter complexes. In contrast, the large effective local concentration in conventional multivalent binding is time-independent and only determined by the geometry of the complex. Finally, it may also occur that our empirical bait concentration estimation for immobilized biotinylated proteins is less accurate than the concentration estimation of peptide baits because we approximate this value based on peptide baits. For this technical reason, which was discussed in detail in the original paper describing the nHU approach, we are carefully using apparent affinities for nHU experiments. Nevertheless, even without accurate bait concentrations, our nHU experiment provides precise relative affinities and, thus, partner ranking. 
Either of the mechanisms underlying the interactions we study would be difficult to further explore experimentally, especially at the proteomic level.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      The data is poorly dealt with, and the figures are shown poorly. For example, Figure 2A is not even shown totally.

      We apologize for any difficulties that the reviewer encountered while attempting to view the figures. We have confirmed that all figures, including all panels of Figure 2, display correctly on the HTML and PDF versions of the article hosted at bioRxiv. The HTML and PDF versions generated by eLife also appear to contain all figures and panels in their entirety.

      Reviewer #2 (Recommendations For The Authors):

      Please refer to the public review for possible revisions.

      We thank Reviewer #2 for the summary and thoughtful comments provided in the Public Review. We note the point of possible revision from the Public Review: “It can be informative to directly demonstrate DPYD promoter-enhancer interactions. However, the genetic variants support the integration of regulatory activities.” In Figure 4, we provide evidence for direct promoter-enhancer interaction through the use of 3C. We furthermore demonstrate that these interactions are dependent upon genotype at rs4294451, as stated by the reviewer. We have highlighted the promoter-enhancer interaction in the revised manuscript, lines 323-325. The role of genotype in this interaction is also specifically discussed in lines 378-381.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Gap junction channels establish gated intercellular conduits that allow the diffusion of solutes between two cells. Hexameric connexin26 (Cx26) hemichannels are closed under basal conditions and open in response to CO2. In contrast, when forming a dodecameric gap junction, channels are open under basal conditions and close with increased CO2 levels. Previous experiments have implicated Cx26 residue K125 in the gating mechanism by CO2, which is thought to become carbamylated by CO2. Carbamylation is a labile post-translational modification that confers negative charge to the K125 side chain. How the introduction of a negative charge at K125 causes a change in gating is unclear, but it has been proposed that carbamylated K125 forms a salt bridge with the side chain at R104, causing a conformational change in the channel. It is also unclear how overall gating is controlled by changes in CO2, since there is significant variability between structures of gap-junction channels and the cytoplasmic domain is generally poorly resolved. Structures of WT Cx26 gap-junction channels determined in the presence of various concentrations of CO2 have suggested that the cytoplasmic N-terminus changes conformation depending on the concentration of the gas, occluding the pore when CO2 levels are high.

      In the present manuscript, Deborah H. Brotherton and collaborators use an intercellular dye-transfer assay to show that Cx26 gap-junction channels containing the K125E mutation, which mimics carbamylation caused by CO2, are constitutively closed even at CO2 concentrations where WT channels are open. Several cryo-EM structures of WT and mutant Cx26 gap junction channels were determined at various conditions and using classification procedures that extracted more than one structural class from some of the datasets. Together, the features on each of the different structures are generally consistent with previously obtained structures at different CO2 concentrations and support the mechanism that is proposed in the manuscript. The most populated class for K125E channels determined at high CO2 shows a pore that is constricted by the N-terminus, and a cytoplasmic region that was better resolved than in WT channels, suggesting increased stability. The K125E structure closely resembles one of the two major classes obtained for WT channels at high CO2. These findings support the hypothesis that the K125E mutation biases channels towards the closed state, while WT channels are in an equilibrium between open and closed states even in the presence of high CO2. Consistently, a structure of K125E obtained in the absence of CO2 appeared to also represent a closed state but at lower resolution, suggesting that CO2 has other effects on the channel beyond carbamylation of K125 that also contribute to stabilizing the closed state. Structures determined for K125R channels, which are constitutively open because arginine cannot be carbamylated, and would be predicted to represent open states, yielded apparently inconclusive results.

      A non-protein density was found to be trapped inside the pore in all structures obtained using both DDM and LMNG detergents, suggesting that the density represents a lipid rather than a detergent molecule. It is thought that the lipid could contribute to the process of gating, but this remains speculative. The cytoplasmic region in the tentatively closed structural class of the WT channel obtained using LMNG was better resolved. An additional portion of the cytoplasmic face could be resolved by focusing classification on a single subunit, which had a conformation that resembled the AlphaFold prediction. However, this single-subunit conformation was incompatible with a C6-symmetric arrangement. Together, the results suggest that the identified states of the channel represent open states and closed states resulting from interaction with CO2. Therefore, the observed conformational changes illuminate a possible structural mechanism for channel gating in response to CO2.

      Some of the discussion involving comparisons with structures of other gap junction channels is relatively hard to follow as currently written, especially for a general readership. Also, no additional functional experiments are carried out to test any of the hypotheses arising from the data. However, structures were determined in multiple conditions, with results that were consistent with the main hypothesis of the manuscript. No discussion is provided, even if speculative, to explain the difference in behavior between hemichannels and gap junction channels. Also, no attempt was made to measure the dimensions of the pore, which is relevant because of the importance of identifying if the structures indeed represent open or closed states of the channel.

      We have considerably revised the manuscript in an attempt to make it more tractable. We respond to the individual comments below.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Brotherton et al. describes a structural study of connexin-26 (Cx26) gap junction channel mutant K125E, which is designed to mimic the CO2-inhibited form of the channel. In the wild-type Cx26, exposure to CO2 is presumed to close the channel through carbamylation of the residue K125. The authors mutated K125 to a negatively charged residue to mimic this effect, and they observed by cryo-EM analysis of the mutated channel that the pore of the channel is constricted. The authors were able to observe conformations of the channel with resolved density for the cytoplasmic loop (in which K125 is located). Based on the observed conformations and on the position of the N-terminal helix, which is involved in channel gating and in controlling the size of the pore, the authors propose the mechanisms of Cx26 regulation.

      Strengths:

      This is a very interesting and timely study, and the observations provide a lot of new information on connexin channel regulation. The authors use state-of-the-art cryo-EM analysis and 3D classification approaches to tease out the conformations of the channel that can be interpreted as "inhibited", with important implications for our understanding of how the conformations of the connexin channels are controlled.

      Weaknesses:

      My fundamental question regarding the premise of this study is: to what extent can K125 carbamylation be recapitulated by a simple K125E mutation? Lysine has a large side chain, and its carbamylation would make it even slightly larger. While the authors make a compelling case for E125-induced conformational changes focusing primarily on the negative charge, I wonder whether they considered the extent to which their observation with this mutant may translate to the carbamylated lysine in the wild-type Cx26, considering not only the charge but also the size of the modified side-chain.

      This is an important point. We agree that the difference in size will have a different effect on the structure. For kinases, aspartate or glutamate are often used as mimics of phosphorylated serine or threonine and these will have the same issues. The fact that we cannot resolve the relevant side-chains in the density may be indicative that the mutation doesn’t give the whole story. It may be able to shift the equilibrium towards the closed conformation, but not stably trap the molecule in that conformation. We include a comment to this effect in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The mechanism underlying the well-documented CO2-regulated activity of connexin 26 (Cx26) remains poorly understood. This is largely due to the labile nature of CO2-mediated carbamylation, making it challenging to visualize the effects of this reversible posttranslational modification. This paper by Brotherton et al. aims to address this gap by providing structural insights through cryo-EM structures of a carbamylation-mimetic mutant of the gap junction protein.

      Strengths:

      The combination of the mutation, elevated PCO2, and the use of LMNG detergent resulted in high-resolution maps that revealed, for the first time, the structure of the cytoplasmic loop between transmembrane helix (TM) 2 and 3.

      Weaknesses:

      The presented maps merely reinforce their previous findings, wherein wildtype Cx26 favored a closed conformation in the presence of high PCO2. While the structure of the TM2-TM3 loop may suggest a mechanism for stabilizing the closed conformation, no experimental data was provided to support this mechanism. Additionally, the cryo-EM maps were not effectively presented, making it difficult for readers to grasp the message.

      We have extensively revised the manuscript so that the novelty of this study is more apparent. There are three major points:

      (1) The carbamylation mimetic pushes the conformation towards the closed conformation. Previously we just showed that CO2 pushes the conformation towards this conformation. Though we could show this was not due to pH, and could speculate this was due to carbamylation as suggested by previous mutagenesis studies, our data did not provide any mechanism whereby Lys125 was involved.

      (2) In going from the open to closed conformations, not only is a conformational change in TM2 involved, as we saw previously, but also a conformational change in TM1, the linker to the N-terminus and the cytoplasmic loop. Thus there is a clear connection between Lys125 and the conformation of the pore-closing N-terminus.

      (3) We observe for the first time in any connexin structure, density for the cytoplasmic loop. Since this loop is important in regulation, knowing how it might influence the positions of the transmembrane helices is important information if we are to understand how connexins can be regulated.

      Reviewing Editor:

      The reviewers have agreed on a list of suggested revisions that would improve the eLife assessment if implemented, which are as follows:

      (1) For completeness, Figure 1 could be supplied with an example of how the experiment would look in the presence of CO2 - for the wild-type and for the K125E mutant. Presumably for the wild-type this has been done previously in exactly this assay format, but this control would be an important part of characterization for the mutant. Page 4, lines 105-106; "unsurprisingly, Cx26K125E gap junctions remain closed at a PCO2 of 55 mmHg." The data should be presented in the manuscript.

      We have now included the data with a PCO2 of 55 mmHg. This is now Figure 4 in our revised manuscript.

      (2) Would AlphaFold predictions show any interpretable differences in the E125 mutant, compared to the K125 (the wild-type)?

      We tried this in response to the reviewer’s suggestion. We did not see any interpretable differences. In general AlphaFold is not recognised as giving meaningful information around point mutations.

      (3) The K125R mutant appears to be a more effective control for extracting significant features from the K125E maps. Given that the use of a buffer containing high PCO2 is essential for obtaining high-resolution maps, wildtype Cx26 is unsuitable as an appropriate control. The K125R map, obtained at a high resolution (2.1Å), supports its suitability as a robust control.

      Though we are unsure what the referee is referring to here, we have rewritten this section and now compare against the K125R map (figure 5a) as well as that derived from the wild-type protein. The important point is that the K125E mutant causes a structural change that is consistent with the closure of the gap junctions that we observe in the dye-transfer assays.

      (4) Likewise, the rationale for using wildtype Cx26 maps obtained in DDM is unclear. Wildtype Cx26 seems to yield much better cryo-EM maps in LMNG. We suggest focusing the manuscript on the higher-quality maps, and providing supporting information from the DDM maps to discuss consistency between observations and the likely possibility that the nonprotein density in the pore is lipid and not detergent.

      The rationale for comparing the mutants against the wt Cx26 maps obtained in DDM was because the mutants were also solubilised in DDM. However, taking the lead from the referees’ comments, we have now rewritten the manuscript so that we first focus on the data we obtain from protein solubilised in LMNG. We feel this makes our message much clearer.

      (5) In general, the rationale for utilizing cryo-EM maps with the entire selected particles is unclear. Although the overall resolutions may slightly improve in this approach, the regions of interest, such as the N-terminus and the cytoplasmic loop, appear to be better ordered after further classifications. The paper would be more comprehensible if it focuses solely on the classes representing the pore-constricting N-terminus (PCN) and the pore-open flexible N-terminus (POFN) conformations. Also, the nomenclatures used in the manuscript, such as "WT90-Class1", "K125E90-1", "LMNG90-class1", "LMNG90-mon-pcn" are confusing.

      LMNG90s are also wildtype; K125E-90-1 is in Class1 for this mutant and is similar to WT90-Class2, which represents the PCN conformation. More consistent and intuitive nomenclatures would be helpful.

      We agree with the referees’ comments. This should now be clearer with our rewritten manuscript where we have simplified this considerably. We now call the conformations NConst (N-terminus defined and constricting the pore) and NFlex (N-terminus not visible) and keep this consistent throughout.

      (6) A potential salt bridge between the carbamylated K125 and R104 is proposed to account for the prevalence of Class-1 (i.e., PCN) in the majority of cryo-EM particles. However, the side chain densities are not well-defined, suggesting that such an interaction may not be strong enough to trap Cx26 in a closed conformation. Furthermore, the absence of experimental data to support this mechanism makes it unclear how likely this mechanism may be. Combining simple mutagenesis, such as R104E, with a dye transfer assay could offer support for this mechanism. Are there any published experimental results that could help address this question without the need for additional experimental work? Alternatively, as acknowledged in the discussion, this mechanism may be deemed as an "over-simplification." What is an alternative mechanism?

      R104 has been mutated to alanine in gap junctions and tested in a dye transfer assay, as now mentioned in the text (Nijar et al, J Physiol 2021), supporting this role. In hemichannels, R104 has been mutated to both alanine and glutamate and tested through dye loading assays (Meigh et al, eLife 2013). Also in hemichannels, R104 and K125 have been mutated to cysteines, allowing them to be cross-linked through a disulphide bond. This mutant responds to a change in redox potential in a similar way to which the wild type protein responds to CO2 (Meigh et al, Open Biol 2015). Therefore, there is no doubt that the residues are important for the mechanism, and the salt-bridge interaction seems a plausible mechanism to reconcile the mutagenesis data; however, we cannot be sure that there are not other interactions involved that are necessary for closure. This information has now been included in the text.

      (7) The cryo-EM maps presented in the manuscript propose that gap junctions are constitutively open under normal PCO2 as the flexible N-terminus clears the solute permeation pathway in the middle of the channel. However, hemichannels appear to be closed under normal PCO2. It is puzzling how gap junctions can open when hemichannels are closed under normal PCO2 conditions. If this question has been addressed in previous studies, the underlying mechanism should be explicitly described in the introduction. If it remains an open question, differences in the opening mechanisms between hemichannels and gap junctions should be investigated.

      We suspect this is due to the difference in flexibility of gap junctions relative to hemichannels. However, a discussion of this is beyond this paper and would be complete speculation based on hemichannel structures of other connexins, performed in different buffering systems. There are no high resolution structures of Cx26 hemichannels.

      (8) A mystery density likely representing a lipid is abruptly introduced, but the significance of this discovery is unclear. It is hard to place the lipid on Figure S6 in the wider context of everything else that is discussed in the text. It would be helpful for readers if a figure were provided to show where the density is located in relation to all the other regions that are extensively discussed in the text.

      In the revised text this section has been completely rewritten. We have now included a more informative view in a new figure (Figure 1 – figure supplement 3).

      (9) Including and displaying even tentative pore-diameter measurements for the different states - this would be helpful for readers and provide a more direct visual cue as to the difference between open and closed states.

      We have purposely avoided giving precise measurements of the pore diameter, since this depends on how we model the N-terminus. The first three residues are difficult to model into the density without causing steric clashes with the neighbouring subunits.

      (10) Given that no additional experiments for channel function were carried out, it would be useful to provide a more detailed discussion of additional mutagenesis results from the literature that are related to the experimental results presented.

      We have amplified this in the discussion (see answer to point 6).

      The reviewers also agreed that improvements in the presentation of the data would strengthen the manuscript. Here is a summary list of suggestions by reviewers aimed at helping improve how the data is presented:

      (1) Why is the pipette bright green in the top image, but rather weakly green in the bottom image in Figure 1 - is this the case for all images?

      (Now figure 4) This depends on whether the pipette was in the focal plane of view or not. The important point of these images is the difference in intensity of the donor vs the recipient cell. The graphs in figure 4c illustrate clearly the difference between the wild-type and the mutant gap junctions.

      (2) In figures 2-5, labels would help a lot in understanding what is shown - while the legends do provide the information on what is presented, it would help the reader to see the models/maps with labels directly in the panel. For example, Figure 2a/b - just indicating "WT90 Cx26" in pink and "K125E90" in blue directly in the panel would reduce the work for the reader.

      We have extensively modified the labels in the figures to address this issue.

      (3) Figure 4 - magenta and pink are fairly close, and to avoid confusion it might be useful to use a different color selection. This is especially true when structures are overlayed, as in this figure - the presentation becomes rather complicated, so the less confusion the color code can introduce, the better.

      (Now Figure 2) We have now changed pink to blue.

      (4) Figure 5 - a remarkably under-labelled figure.

      Now added labels.

      (5) Figure 6 - it would be interesting to add a comparison to Cx32 here as well for completeness, since the structure has been published in the meantime.

      Cx32 has now been included.

      (6) Figure 7 - please add equivalent labels on both sides of the model, left and right. Add the connecting lines for all of the tubes TM helices - this will help trace the structural elements shown. The legend does not quite explain the colors.

      We have modified the figure as suggested and explained the colours in the legend.

      (8) Fig.1 legend; Unclear what mCherry fluorescence represents. State that Cx26 was expressed as a translational fusion with mCherry.

      Now figure 4. We have now written “Montages each showing bright field DIC image of HeLa cells with mCherry fluorescence corresponding to the Cx26K125E-mCherry fusion superimposed (leftmost image) and the permeation of NBDG from the recorded cell to coupled cells.”

      (9) Fig. 3 b); Show R104 in the figure. Also E129-R98/R99 interaction is hard to acknowledge from the figure. It seems that the side chain density of E129 is not strong enough to support the modeled orientation.

      This is now Figure 1c. While the density in this region is sufficient to be confident of the main chain, we agree that the side chain density for the E129-R98/R99 interaction is not sufficiently clear to draw attention to and have removed the associated comment from the figure legend. The density is focussed on the linker between TM1 and the N-terminus and the KVRIEG motif. We prefer to omit R104, in order to keep the focus on this region. As described in the manuscript, the density for the R104 side chain is poor.

      (10) Fig. 3 c); Label the N-terminus and KVRIEG motif in the figure.

      Now Figure 1b. We have labelled the N-terminus. The KVRIEG motif is not visible in this map.

      (11) Page 9, lines 246-248; Restate, "We note, however, density near to Lys125, between Ser19 in the TM1-N-term linker, Tyr212 of TM4 and Tyr97 on TM3 of the neighbouring subunit, which we have been unable to explain with our modelling."

      We have reworded this.

      (12) Page 14, line 399; Patch clamp recording is not included in the manuscript.

      Patch clamp recordings were used to introduce dye into the donor cell.

      (13) On the same Figure 2, clashes are mentioned but these are hard to appreciate in any of the figures shown. Perhaps would be useful to include an inset showing this.

      We have modified Figure 2b slightly and added an explanation to highlight the clash. It is slightly confusing because the residues involved belong to neighbouring subunits.

      (14) The discussion related to Figure 6 is very hard to follow for readers who are not familiar with the context of abbreviations included on the figure labels. This figure could be improved to allow a general readership to identify more clearly each of the features and structural differences that are discussed in the text.

      We have extensively changed the text and updated the labels on the figure to make it much easier for the reader to follow.

      Below, you can find the individual reviews by each of the three reviewers.

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 2d-e, the text discusses differences between K125E 90-1 and WT 90-class2 (7QEW), yet the figure compares K125E with 7QEQ. I suggest including a figure panel with a comparison between the two structures discussed in the manuscript text.

      This has been changed in the revised manuscript.

      Other comments have been addressed above.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      The reviewers’ thoughtful comments have helped us make the manuscript both more comprehensive and clearer. Thank you for your time and effort. We know that this is a long and technical paper. In our responses we refer to three documents:

      • Original: the first original submission

      • Revision: the revised document (02 MillardFranklinHerzog2023 v2.pdf)

      • Difference: a document that shows the changes made to text (but not figures or tables) from the original to revision (03 MillardFranklinHerzog2023 diff.pdf).

      Reviewer #1 (Recommendations For The Authors):

      (1) In general, the paper is well written and addresses important questions of muscle mechanics and muscle modeling. In the current version, the model limitations are briefly summarized in the abstract. However, the discussion needs a more complete description of limitations as well as a discussion of types of data (in vivo, ex vivo, single fiber, whole muscle, MTU, etc.) that can be modeled using this approach.

      Please see the response to comment 23 for more details of the limitations that have been added to the revised document.

      (2) The choice of a model with several tendon parameters for simulating single muscle fiber experiments is not well justified.

      A rigid-tendon model with a slack length of zero was, in fact, used for these simulations for both the VEXAT and Hill models. In case this is still not clear: a rigid-tendon model of zero length is equivalent to no tendon at all. The text that first mentions the tendon model has now been modified to make it clearer that the parameters of the model were set to be consistent with no tendon at all:

      Please see the following text:

      Original:

      • page 17, column 1, line 28 ”... rigid tendon of zero length,”

      • page 17, column 1, line 51 ”... rigid tendon of zero length.”

      Revision:

      • page 19, column 1, line 19 ”... we used a rigid-tendon of zero length (equivalent to ignoring the tendon)”

      • page 19, column 1, line 38 ”... coupled with a rigid-tendon of zero-length.”

      Difference:

      • page 21, column 1, line 19 ”... we used a rigid-tendon... ”

      • page 21, column 1, line 45 ”... rigid-tendon of zero length ...”

      (3) A table that clarifies how all model parameters were estimated needs to be included in the main part of the manuscript.

      Two tables have been added to the manuscript that detail the parameters of the elastic-tendon cat soleus model (in the main body of the text) and the rabbit psoas fibril model (in an appendix). Each table includes:

      • A plain language parameter name

      • The mathematical symbol for the parameter

      • The value and unit of the parameter

      • A coded reference to the data source that indicates both the experimental animal and how the data was used to evaluate the parameter.

      Please see the following text:

      Revision:

      • page 11

      • page 42

      Difference:

      • page 11

      • page 46

      (4) The supplemental information is not properly referenced in the main text. There are a number of smaller issues that also need to be addressed.

      Thank you for your attention to detail. The following problems related to Appendix referencing have been fixed:

      • Appendices are now parenthetically referenced at the end of a sentence. However, a few references to figures (that are contained within an Appendix) still appear in the body of the sentence since moving these figure references makes the text difficult to understand.

      • All Appendices are now referenced in the main body of the text.

      (5) Abstract, line 6: While it is commonly assumed that the short range stiffness of muscle is due to cross bridges, Rack & Westbury (1974) noted that it occurs over a distance of 25-35 nm, and that many cross-bridges must be stretched even farther than this distance (their p. 348 middle). It seems unlikely that cross-bridges alone can actually account for the short-range stiffness.

      There are three parts to our response to this comment:

      (a) Rack & Westbury’s definition of short-range-stiffness and unrealistic cross-bridge stretches

      (b) Rack & Westbury’s definition of short-range-stiffness vs. linear-time-invariant system theory

      (c) Updates to the paper

      a. Rack & Westbury’s definition of short-range-stiffness and unrealistic cross-bridge stretches.

      As you note, on page 348, Rack and Westbury write that ”If the short range stiffness is to be explained in terms of extension of cross-bridges, then many of them must be extended further than the 25-35 nm mentioned above.” Having re-read the paper, it’s not clear how these three factors are being treated in the 25-35 nm estimate:

      • the elasticity of the tendon and aponeurosis,

      • the elasticity of actin and myosin filaments,

      • and the cycling rate of the cross-bridges.

      Obviously the elasticity of the tendon, aponeurosis, actin, and myosin filaments will reduce the estimated amount of crossbridge strain during Rack and Westbury’s experiments. A potentially larger factor is the cycling rate of each cross-bridge. If each crossbridge cycles faster than 11 Hz (the maximum frequency Rack and Westbury used), then no single crossbridge would stretch by 25-35 nm. So why didn’t Rack and Westbury consider the cycling rate of crossbridges?

      Rack and Westbury reasoned that a perfectly elastic work loop would necessarily mean that all crossbridges stayed attached: as soon as a crossbridge cycles it would release its stored elastic energy and the work loop would no longer be elastic. Since Rack and Westbury measured some nearly perfect elastic work loops (the smallest loops in Fig. 2, 3, and 4), I guess they assumed crossbridges remained attached during the 25-35 nm crossbridge stretch estimate. However, even Rack and Westbury note that none of the work loops they measured were perfectly elastic and so there is room to entertain the idea that crossbridges are cycling.

      Fortunately, for this discussion, crossbridge cycling rates have been measured.

      In-vitro measurements by Uyeda et al. show that crossbridges are cycling at 30 Hz when moving at 0.5-1.2 lengths/s. At this rate, there would be enough time for a single crossbridge to cycle nearly 2.72 times for every cycle of the 11 Hz sinusoidal perturbations, reducing its expected strain from 25-35 nm down to 9.2-12.9 nm. This effect becomes even more pronounced if crossbridge cycling rate is used to explain the difference in sliding velocity between Uyeda et al.’s in-vitro data (0.5-1.2 lengths/s) and the maximum contraction velocity of an in-situ cat soleus (4.65 lengths/s, Scott et al.).
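
      The arithmetic behind this estimate can be checked in a few lines. The sketch below assumes, as the paragraph above does, that the strain carried by a single crossbridge scales inversely with the number of times it cycles during one perturbation period. The variable names are illustrative and are not taken from the paper's code repository.

```python
# Values quoted in the response text.
cycling_rate_hz = 30.0      # crossbridge cycling rate (Uyeda et al.)
perturbation_hz = 11.0      # maximum frequency used by Rack and Westbury
stretch_nm = (25.0, 35.0)   # crossbridge stretch estimate, in nanometres

# Number of crossbridge cycles that fit into one perturbation cycle.
cycles_per_perturbation = cycling_rate_hz / perturbation_hz

# Assumption: strain per crossbridge is divided by the number of cycles.
reduced_strain_nm = tuple(s / cycles_per_perturbation for s in stretch_nm)

print(f"cycles per perturbation: {cycles_per_perturbation:.2f}")
print("reduced strain (nm):", ", ".join(f"{s:.1f}" for s in reduced_strain_nm))
```

      The result stays in nanometres, the unit of the original 25-35 nm estimate, and lands at roughly 9 to 13 nm.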

      b. Rack & Westbury’s definition of short-range-stiffness vs. linear-time-invariant system theory

      Rack and Westbury defined short-range-stiffness to describe a specific kind of force response of the muscle to cyclical length changes:

      • muscle force is linear with length change,

      • and independent of velocity.

      Rack and Westbury’s definition therefore fails when viscous forces become noticeable, because viscous forces are velocity dependent.

      On line 6 of the abstract the term ‘short-range-stiffness’ is not used because Rack and Westbury’s definition is too narrow for our purposes. Instead we are using the more general approach of approximating muscle as a linear-time-invariant (LTI) system, where it is assumed that

      • the response of the system is linear

      • and time invariant.

      To unpack that a little, a muscle is considered in the ‘short-range’ in our work if it meets the criteria of a linear time-invariant (LTI) system:

      • the force response of muscle can be accurately described as a linear function of its length and velocity (its state)

      • and its response is not a function of time (which means constant stimulation, and no fatigue).

      In contrast to Rack and Westbury’s definition, the ‘short-range’ in linear systems theory is general enough to accommodate both elastic and viscous forces. In physical terms, small for an LTI approximation of muscle is larger than the short-range defined by Rack and Westbury: an LTI system can include velocity dependence, while short-range-stiffness ends when velocity dependence begins.
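
      To make the distinction concrete, the following sketch treats the force response as a linear function of length change and velocity (a spring in parallel with a damper) and checks the two properties that define a linear system, superposition and scaling. The stiffness and damping values are arbitrary placeholders, not parameters from the paper, and the function names are hypothetical.

```python
import math

K = 2.0   # stiffness, placeholder value (force per unit length)
B = 0.05  # damping, placeholder value (force per unit velocity)

def force(dl, v):
    """Linear, time-invariant force response: spring plus damper."""
    return K * dl + B * v

def sinusoid(amplitude, freq_hz, t):
    """Length change and velocity of a sinusoidal perturbation at time t."""
    w = 2.0 * math.pi * freq_hz
    return amplitude * math.sin(w * t), amplitude * w * math.cos(w * t)

t = 0.013
dl1, v1 = sinusoid(0.01, 15.0, t)
dl2, v2 = sinusoid(0.02, 35.0, t)

# Superposition: the response to a sum of inputs is the sum of the responses.
superposition_holds = math.isclose(
    force(dl1 + dl2, v1 + v2), force(dl1, v1) + force(dl2, v2)
)

# Scaling: tripling the input triples the output.
scaling_holds = math.isclose(force(3.0 * dl1, 3.0 * v1), 3.0 * force(dl1, v1))

print(superposition_holds, scaling_holds)
```

      Setting B to zero recovers a purely elastic response, which is the velocity-independent case that Rack and Westbury’s short-range-stiffness describes.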

      c. Updates to the paper

      To make the differences between Rack and Westbury’s ‘short-range-stiffness’ and LTI system theory clearer:

      • We have removed all occurrences of ‘short-range’ that were associated with Kirsch et al. and have replaced this phrase with ‘small’.

      • On the first mention of Kirsch’s work we have made the wording more specific

      Revision:

      • page 1, column 1, lines 4,5

      • page 1, column 2, lines 14-21 ”Under constant activation ...”

      Difference: page 1, column 2, line 19-26

      • page 1, column 1, lines 4,5

      • page 1, column 2, lines 20-27 ”Under constant activation ...”

      • A footnote has been added to contrast the definition of ‘small’ in the context of a linear time-invariant system to ‘short-range’ in the context of Rack and Westbury’s definition of short-range-stiffness.

      Revision: page 1, column 2, bottom

      Difference: page 1, column 2, bottom

      • In addition, we have added a brief overview of LTI system theory to make the analysis and results more easily understood:

      Revision: Figure 4 paragraph beginning on page 10, column 2, line 15 ”As long as ...”

      Difference: Figure 4 paragraph beginning on page 12, column 1, line 46 ”As long as ...”

      (6) Page 3, lines 6-8: It also seems unlikely that 25% of cross-bridges are attached at one time (Howard, 1997) even for supramaximal isometric stimulation. The number should be less than 20%. What would the ratio of load path stiffness be for low force movements such as changing the direction of a frictionless manipulandum or slow walking? The range of relative stiffnesses is of more interest than the upper limit.

      We have made the following updates to address this comment:

      • A 20% duty cycle now defines the upper bound stiffness of the actin-myosin load path.

      • We have also evaluated the lower bound actin-myosin stiffness when a single crossbridge is attached.

      • The stiffness of titin from Kellermayer et al. has been digitized at a length of 2 µm and 4 µm to more accurately capture the length dependence of titin’s stiffness.

      • We have added a new figure (Figure 14) to make it easier to compare the range of actin-myosin stiffness to titin-actin stiffness.

      • The text in the main body of the paper and the Appendix has been updated.

      • The script ’main ActinMyosinAndTitinStiffness.m’ used to perform the calculations and generate the figure is now a part of the code repository.

      Please see the following text:

      Revision

      • The paragraph beginning at page 2, column 2, line 45 ”The addition of a titin element ...”

      • Appendix A

      • Figure 14 (in Appendix A)

      Difference

      • The paragraph beginning at page 3, column 1, line 6: ”The addition of a titin element ...”

      • Appendix A

      • Figure 14 (in Appendix A)

      (7) Page 5, line 12: A word seems to be missing here, ”...together to further...”.

      Thank you for your attention to detail. The sentence has been corrected.

      Please see the following text:

      • Revision: page 4, column 2, line 40 ”... into a single ...”

      • Difference: page 5, column 1, line 18

      (8) Page 5, line 24-27: These ”theories” are not mutually exclusive, and it is misleading to suggest they are. There is evidence for binding of titin to actin at multiple locations and there is no reason why evidence supporting one binding location must detract from the evidence supporting other binding locations.

      The text has been modified to make it clear to readers that the different titin-actin binding locations are not mutually exclusive. Please see the following text:

      • Revision: page 5, column 1, lines 17-19, the sentence beginning ”As previously mentioned, ...”

      • Difference: page 5, column 1, lines 41-44

      (9) Page 5, lines 48-51: Should cite Kellermayer and Granzier (1996) not Kellermayer et al. (1997).

      The reference to ‘Kellermayer et al.’ has been changed to ‘Kellermayer and Granzier’. The comment that the year of the reference should be changed from (1997) to (1996) is confusing: the 1996 paper is being referenced.

      For further details please see:

      • Revision: page 5, column 1, lines 39-40

      • Difference: page 5, column 2, line 19-22

      (10) Also, Dutta et al. (2018) should be cited as further showing that N2A titin by itself slows actin motility on myosin.

      Thank you for the suggestion. The sentence has been modified to include Dutta et al.:

      For further details please see:

      • Revision: page 5, column 1, line 40

      • Difference: page 5, column 2, line 19-22

      (11) Figure 2 legend and elsewhere: it is odd to say that experiments used ”a cat soleus” when more than one cat soleus was used. Change to ”cat soleus”. See also page 15, line 15.

      Thank you for your attention to detail. All occurrences of ‘a cat soleus’ have been changed, with some sentence revision, to ‘cat soleus’.

      (12) Page 6, line 10: It is not clear why an MTU was used to simulate single muscle fiber experiments. What is the justification for choosing this particular model? Also, the choice of model might explain why the version with stiff tendon performs better than the version with an elastic tendon, but this is never mentioned. Why not use a muscle model with no tendon (e.g., Wakeling et al., 2021 J. Biomech.)?

      Please see the response to comment 2.

      (13) Millard et al.’s activation dynamics model also fails to capture the length-dependence of activation dynamics (Shue and Crago, 1998; Sandercock and Heckman, 1997), which should be noted in the discussion along with other limitations.

      An additional limitations paragraph is in the revised manuscript that addresses this comment specifically. However, we have used Stephenson and Wendt as a reference for the shift in peak isometric force that comes with submaximal activation. In addition, we also reference Chow and Darling for the property that the maximum shortening velocity is reduced with submaximal activations.

      • Revision: page 22, column 1, line 41 ”Finally, the VEXAT model ...”

      • Difference: page 24, column 2, line 12 ”Finally, the VEXAT model ...”

      In addition, please see the response to comment 23.

      (14) Page 6, line 22: ”An underbar...”.

      Thank you for your attention to detail, this correction has been made.

      (14) Page 7, lines 27-32: This and other issues should be described in the Discussion under a heading of model limitations.

      Please see the response to comment 23.

      (15) Page 7, lines 43-44: Numerous papers from the last author’s laboratory contradict the claim that there is no force enhancement on the ascending limb by demonstrating that force enhancement does occur on the ascending limb (see e.g., Leonard & Herzog 2002, Peterson et al., 2004 and several papers from the Rassier laboratory).

      Thank you for your attention to detail. This statement is in error and has been removed. To improve this section of the paper, a paragraph has been added to briefly mention the experimental observations of residual force enhancement before proceeding to explain how this phenomenon is represented by the model.

      Please see the following text:

      Revision:

      • the paragraph starting on page 7, column 2, line 43 ”When active muscle is lengthened, ...”

      • and the following paragraph starting on page 8, column 1, line 3 “To develop RFE, ”

      Difference:

      • the paragraph starting on page 8, column 2, line 15

      • and the following paragraph starting on page 9, column 1, line 6

      (17) Figure 3 legend and elsewhere: The authors use Prado et al. (2005) to determine several titin parameters, however the simulations seem to focus on cat soleus, but Prado et al.’s paper is on rabbits. More clarity is needed about which specific results from which species and muscles were used to parameterize the model.

      The new parameter table includes coded entries to indicate the literature source for experimental data, the animal it came from, and how the data was used. For example, the ‘ECM fraction’ has a source of ‘R[57]’ to show that the data came from rabbits from reference 57. For further details, please see the response to comment #3.

      Please see the following text:

      • Revision: page 11, column 2, table section H: ‘ECM fraction’.

      • Difference: page 11, column 2, table section H: ‘ECM fraction’.

      To address this comment in a little more detail, we have had to use Prado et al. (2005) to give us estimates for only one parameter: P, the fraction of the passive force-length relation that is due to titin. Prado et al.’s measurements relating to P are unique to our knowledge: these are the only measurements we have to estimate P in any muscle, cat soleus or otherwise. Here we use the average of the values for P across the 5 muscles measured by Prado et al. as a plausible default value for all of our simulations.

      (18) Figure 4 seems unnecessary.

      Figure 4 has been removed.

      (19) Page 10, lines 17-18: provide the abbreviation (VAF) here with the definition (variance accounted for).

      Thank you for your attention to detail. The abbreviation has been added.

      Please see these parts of the manuscripts for details:

      • Revision: page 12, column 2, line 13

      • Difference: page 13, column 2, line 32

      (20) Page 11, lines 2-3: Here and elsewhere, it is clear that some model parameters have been optimized to fit the model. The main paper should include a table that lists all model parameters and how they were chosen or optimized, including but not limited to the information in Table 1 of the supplemental information section.

      See response to comment 3.

      (20) Page 17, lines 45 -49: Again, a substantial number of ad hoc adjustments to the model appear to be required. These should be described in the Discussion under limitations, and accounted for in the parameters table. See also legends to Fig. 12 and 13, page 19, lines 23-26.

      Please see the response to comment #3: a coded entry now appears to indicate the data source, the animal used in the experiment, and the method used to process the data. This includes entries for parameters which were estimated ‘E’ so that the model produced acceptable results in the simulations presented. In addition, the new discussion paragraph includes a number of sentences that use the adjustment to the active-titin-damping coefficient as an opening to discuss the limitations of the VEXAT’s titin-actin bond model and the circumstances under which the model’s parameters would need to be adjusted.

      Please see responses to comments 3 and 23 for additional details. In addition, please see the specific discussion text mentioning the change to $\beta_o^{PEVK}$:

      • Revision: page 22, column 1, line 30 ”In Sec. 3.3 we had ...”

      • Difference: page 24, column 1, line 49

      (22) Page 20, lines 50-11: It should be noted here that Tahir et al.’s (2018) model has both series and parallel elastic elements, provided by superposition of rotation (series) and translation (parallel) of a pulley.

      While it is true that Tahir et al.’s (2018) model has series and parallel elements, as do the other models mentioned, these models do not have the correct structure to yield a gain and phase response that mimics biological muscle. The text that I originally wrote attempted to explain this without going into the details. As you note, this explanation leaves something to be desired. The original text commenting on the models of Forcinito et al., Tahir et al., Haeufle et al., and Günther et al. has been updated to be more specific. Please see the parts of the following manuscripts for details:

      • Revision: page 22, column 2, line 20, the paragraph beginning ”The models of Forcinito ...”

      • Difference: page 24, column 2, line 44

      (23) Discussion: This section should include a description of model limitations, including the relatively large number of ad hoc modifications and how many parameters must be found by optimization in practice. The authors should discuss what types of data are most compatible for use with the model (ex vivo, in vivo, single fiber, whole muscle, MTU), requirements for applying the model to different types of data, and impediments to using the model on different types of data.

      An additional limitations paragraph has been added to the discussion.

      Please see the following text:

      • Revision: the paragraph beginning on page 22, column 1, line 11 ”Both the viscoelastic ...”

      • Difference: the paragraph beginning on page 24, column 1, line 27.

      Reviewer #2 (Recommendations For The Authors):

      (1) If it is possible to compare the output of this model to other more contemporary models which incorporate titin but are also simple enough to implement in whole-body simulation (such as the winding filament model), this would seem to greatly strengthen the paper.

      That’s an excellent idea, though beyond the scope of this already lengthy paper. Even though the Hill model we evaluated is a bit old it is widely used, and so, many readers will be interested in seeing the benchmark results. As benchmarking work is both difficult to fund and undertake, we do hope that others will evaluate their own models using the code and data we have provided.

      (2) I’m a little unclear on the basis for the transition between short- and mid-range length changes, both in reality and in the model. And also about the range of strains that qualify as ”short”. It seems like there is potential for short range stiffness, although I would have thought more in the range of 1-2% strains than >3%, to be due to currently attached crossbridges. There is clear evidence that active titin is responsible for the low stiffness at very large strains that exceed actin-myosin overlap. But I am not clear on how a transitional stiffness on the descending limb of the force-length relationship is implemented in the model, and what aspect of physiology this is replicating. It may be helpful to clarify this further and indicate where in the model this stiffness arises.

      This question has several parts to it which I will paraphrase here:

      A Short-range stiffness acts over smaller strains than 3.8%. How is short-range defined?

      B Where is the transition made between short-range and mid-range force response, both in reality and in the model. Also how does this change on the descending limb?

      C What components in the model contribute to the stiffness of the CE?

      A. Short-range stiffness acts over smaller strains than 3.8%. How is short-range defined?

      The response to Reviewer 1’s comment # 5 directly addresses this question.

      B. Where is the transition made between short-range and mid-range force response, both in reality and in the model? Also how does this change on the descending limb?

      We are going to rephrase the question because of changes in terminology that we have made in response to Reviewer 1’s comment #5.

      (i) What is the basis for the transition between the muscle behaving like an LTI system? Both in reality, and in the model. (ii) What happens outside the LTI range? (iii) Also how does this change on the descending limb?

      We will address this question one part at a time:

      (i) What is the basis for the transition between the muscle behaving like an LTI system? Both in reality, and in the model.

      A system’s response can be approximated as a linear-time-invariant (LTI) system as long as it is time-invariant, and its output can be expressed as a linear function of its input. In the context of Kirsch et al.’s experiment, the ‘system’ is the muscle, the ‘input’ is the time series of length data, and the ‘output’ is the time series of force data. Due to the requirement for time-invariance, two experimental conditions must be met to approximate muscle as an LTI system:

      • the nominal length of the muscle stays constant over long periods of time,

      • and the nominal activation of the muscle stays constant.

      These conditions were met by default in Kirsch et al.’s experiment, and also in our simulations of this experiment. The one remaining condition to assess is whether or not the muscle’s response is linear.

      To evaluate whether the muscle’s force is a linear function of the length change, Kirsch et al. evaluated $(C_{xy})^2$, the coherence squared between the length and force time-series data. Even though the mathematical underpinnings of $(C_{xy})^2$ are complicated, the interpretation of $(C_{xy})^2$ is simple: muscle can be accurately approximated as a linear system if $(C_{xy})^2$ is close to 1, but the accuracy of this approximation becomes poor as $(C_{xy})^2$ approaches 0. Kirsch et al. used $(C_{xy})^2$ to identify a bandwidth in which the response of the muscle to the 1-3.8% $\ell_o^M$ length changes was sufficiently linear for analysis: a lower bound of 4 Hz was identified using $(C_{xy})^2$ and the bandwidth of the input signal (15 Hz, 35 Hz, or 90 Hz) set the upper bound. In Fig. 3 of Kirsch et al. the $(C_{xy})^2$ at 4 Hz has a value of at least 0.67 for the 15 Hz and 90 Hz signals. To minimize error in our analysis and yet be consistent with Kirsch et al., we analyze the bandwidth common to both $(C_{xy})^2 \geq 0.67$ and Kirsch et al.’s defined range. Though the bandwidth defined by the criterion $(C_{xy})^2 \geq 0.67$ is usually larger than the one defined by Kirsch et al., there are some exceptions where the lower frequency bound of the models is higher than 4 Hz (now reported in Tables 4D and 5D).
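
      The idea of selecting a bandwidth with a coherence-squared threshold can be sketched with a small, dependency-free example. This is not the authors’ implementation (their version is in the MATLAB scripts of the code repository): it is a minimal segment-averaged estimate, and the sample rate, segment length, stiffness, damping, and the squared-length surrogate for a non-linear response are all assumptions chosen for illustration.

```python
import cmath
import math
import random

def dft_bin(segment, k):
    """Discrete Fourier transform of `segment` evaluated at bin k."""
    n = len(segment)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(segment))

def coherence_squared(x, y, seg_len, k):
    """Segment-averaged magnitude-squared coherence between x and y at bin k."""
    pxx = pyy = 0.0
    pxy = 0j
    for start in range(0, len(x) - seg_len + 1, seg_len):
        xk = dft_bin(x[start:start + seg_len], k)
        yk = dft_bin(y[start:start + seg_len], k)
        pxx += abs(xk) ** 2
        pyy += abs(yk) ** 2
        pxy += xk.conjugate() * yk
    return abs(pxy) ** 2 / (pxx * pyy)

FS = 1000.0       # sample rate in Hz (assumed)
SEG_LEN = 250     # 250 samples at 1000 Hz gives a 4 Hz frequency resolution
THRESHOLD = 0.67  # coherence-squared threshold quoted in the text

random.seed(0)
length = [random.gauss(0.0, 1.0) for _ in range(16 * SEG_LEN)]
velocity = [0.0] + [(b - a) * FS for a, b in zip(length, length[1:])]

# Linear response: placeholder stiffness (2.0) and damping (5e-4).
force_linear = [2.0 * x + 5e-4 * v for x, v in zip(length, velocity)]
# Deliberately non-linear surrogate: force depends on length squared.
force_nonlinear = [x * x for x in length]

def linear_band(force):
    """Frequencies (Hz) at which the coherence squared meets the threshold."""
    return [k * FS / SEG_LEN for k in range(1, 11)
            if coherence_squared(length, force, SEG_LEN, k) >= THRESHOLD]

print("linear response:", linear_band(force_linear))
print("non-linear surrogate:", linear_band(force_nonlinear))
```

      Averaging over several segments is essential here: with a single segment the estimate equals 1 at every frequency regardless of how the two signals are related.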

      (ii) What happens outside the LTI range?

      When a muscle’s output cannot be considered that of an LTI system, it means either that its length or activation is time-varying, or that the relationship between length and force is no longer linear. In short, the muscle is behaving as one would normally expect: in a time-varying and non-linear manner. The wonderful part of Kirsch et al.’s work is that they found a surprisingly large region in the frequency domain where muscle behaves linearly and can be analyzed using the powerful tools of linear systems and signals.

      (iii) Also how does this change on the descending limb?

      Since the nominal length of Kirsch et al.’s experiments is $\ell_o^M$, it is not clear how the results of the perturbation experiments will change if the nominal length is moved firmly to the descending limb. However, we can see how the stiffness and damping values will change by examining Figures 9C and 9D, which show the calculated stiffness and damping of the VEXAT and Hill models as $\ell^M$ is lengthened from $\ell_o^M$ down the descending limb: the stiffness and damping of the VEXAT model do not change much, while the Hill model’s stiffness changes sign and its damping coefficient changes a lot. What cannot be seen from Figures 9C and 9D is how the bandwidth over which the models are considered linear changes.
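
      A sign change in stiffness is what one would expect from any model whose active force follows a force-length curve directly, because the slope of that curve is negative on the descending limb. The sketch below illustrates this with a hypothetical bell-shaped curve and a finite-difference slope. The curve shape, its width, and the function names are assumptions for illustration only and do not reproduce the Hill model evaluated in the paper, which also includes passive and velocity-dependent terms.

```python
import math

def active_force_length(l_norm, width=0.45):
    """Hypothetical bell-shaped active force-length curve, peak of 1 at l_norm = 1."""
    return math.exp(-((l_norm - 1.0) / width) ** 2)

def active_stiffness(l_norm, activation=1.0, h=1e-6):
    """Central-difference slope of the active isometric force with respect to length."""
    upper = active_force_length(l_norm + h)
    lower = active_force_length(l_norm - h)
    return activation * (upper - lower) / (2.0 * h)

# Ascending limb, optimal length, descending limb (normalized lengths).
for l_norm in (0.8, 1.0, 1.2):
    print(l_norm, active_stiffness(l_norm))
```

      The slope is positive below the optimal length, close to zero at the optimal length, and negative above it.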

      We have made a number of updates to the text to more clearly communicate these details of our response to part (i):

      • Text has been edited so that the terms ’short-range stiffness’ and ’small’ from Rack and Westbury’s work are not confused with ’stiffness’ and ’small’ from the LTI system analysis. Please see our response to comment # 5 for details.

      • We have added text to the main body of the paper to explain how the coherence squared metric was used to select a bandwidth in which the response of the system is approximately linear:

      – Revision: the paragraph that starts on page 11, column 1, line 3 ”Kirsch et al. used system identification ...”

      – Difference: page 13, column 2, line 1

      – Coherence is defined in Appendix D

      – Coherence is now also included in the example script ‘main SystemIdentificationExample.m’

      • The bandwidth over which model output can be considered linear (coherence squared > 0.67) has been added to Tables 4 and 5

      – Revision: see Table 4D, and Table 5D in Appendix E

      – Difference: see Table 4D, and Table 5D in Appendix E

      • Figures 6 and 16 are now annotated where the plotted signal does not meet the linearity requirement of Cxy > 0.67.
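
      As a rough illustration of how a coherence-squared threshold picks out a bandwidth, the sketch below estimates coherence by averaging windowed spectra over segments and flags the frequencies that exceed 0.67. It is not the code in the repository: the function name, the sampling rate, the white-noise input, the noise level, and the stiffness and damping values are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np


def coherence_squared(x, y, fs, nperseg=256):
    """Magnitude-squared coherence from averaged, Hann-windowed segments."""
    window = np.hanning(nperseg)
    n_segments = len(x) // nperseg
    pxx = np.zeros(nperseg // 2 + 1)
    pyy = np.zeros(nperseg // 2 + 1)
    pxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    for i in range(n_segments):
        seg = slice(i * nperseg, (i + 1) * nperseg)
        x_f = np.fft.rfft(window * x[seg])
        y_f = np.fft.rfft(window * y[seg])
        pxx += np.abs(x_f) ** 2          # input auto-spectrum
        pyy += np.abs(y_f) ** 2          # output auto-spectrum
        pxy += np.conj(x_f) * y_f        # cross-spectrum
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.abs(pxy) ** 2 / (pxx * pyy)


rng = np.random.default_rng(0)
fs = 1000.0                      # assumed sampling rate (Hz)
n = 64 * 256                     # 64 segments of 256 samples
x = rng.standard_normal(n)       # stand-in for the length perturbation
k, beta = 2.0, 0.01              # hypothetical stiffness and damping

# Linear spring-damper output, with and without added measurement noise.
y_linear = k * x + beta * np.gradient(x, 1.0 / fs)
y_noisy = y_linear + 5.0 * rng.standard_normal(n)

freqs, c_linear = coherence_squared(x, y_linear, fs)
_, c_noisy = coherence_squared(x, y_noisy, fs)

# Frequencies at which the noisy response still counts as "approximately linear".
passes = c_noisy > 0.67
print(freqs[passes])
```

      In this synthetic case the noise-free spring-damper output should stay coherent with the input at every frequency, while the noisy output should pass the threshold only where the signal dominates the noise.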

      C. What components in the model contribute to the stiffness of the CE?

      There are three components that contribute to the stiffness of the CE which are pictured in Figure 1, appear in Eqn. 15, and are listed explicitly in Eqn. 76:

      (a) The XE, as represented by the afL(ℓ˜S+L˜M)k˜oX term in Eqn. 15.

      (b) The elasticity of the distal segment of titin, f2(ℓ˜2). Only f2(ℓ˜2) appears in Eqn. 15 because ℓ˜1 is a model state.

      (c) The extracellular matrix, as represented by the fECM(ℓ˜ECM)

      There is also a compressive element fKE, but it plays no role in the simulations presented in this work because it only begins to produce force at extremely short CE lengths (ℓ˜M < 0.1ℓoM).

      We have made the following changes to make these components clearer:

      Figure 1A has been updated:

      – The symbols for a spring and a damper are now defined in Figure 1A

      – The ECM now has a spring symbol. Now all springs and dampers have the correct symbol in Figure 1A.

      – The caption now explicitly lists the rigid, viscoelastic, and elastic elements in the model

      The equations for the VEXAT’s CE stiffness and damping are now compared and contrasted with the Hill model’s stiffness and damping in Sec. 3.1.

      – Revision: starting at page 14, column 2, line 1: Eqn. 28 and Eqn. 29 and surrounding text

      – Difference: page 17, column 1, line 22

      (3) This model appears to be an amalgamation of a phenomenological (force-length and force-velocity relationships) and a mechanistic (crossbridge and titin stiffness and damping) model. While this may improve predictions, and so potentially be useful, it also seems like it limits the interpretation of physiological underpinnings of any findings. It may be helpful to explore in greater detail the implications of this approach.

      We have added a limitations paragraph to the discussion which addresses this comment and can be found in:

      • Revision: the paragraph beginning on page 22, column 1, line 11 ”Both the viscoelastic ...”

      • Difference: the paragraph beginning on page 24, column 1, line 27

      (4) As a biologist, I found the interpretation of phase and gain a little difficult and it may help the reader to show in greater detail the time series data and model predictions to highlight conditions under which the models do not accurately capture the magnitude and timing of force production.

      It is important that the ideas of phase and gain are understood, especially because little information can be gleaned from the time series data directly. There is some time series data in the paper already that compares each model’s response to its spring-damper of best fit: plots of the force response of each model and its spring damper of best fit can be found in Figures 6A, 6D, 6G, 6J, 16A, 16D, 16G, and 16J in the revised manuscript. While it is clear that models with a higher VAF more closely match the spring-damper of best fit, there is not much more that can be taken from time series data: the systematic differences, particularly in phase, are just not visually apparent in the time-domain but are clear in gain and phase plots in the frequency-domain.

      To make the meaning of phase and gain plots clearer, Figure 4 (Figure 5 in the first submission) has been completely re-made and includes plots that illustrate the entire process of going from two length and force time-domain signals to gain and phase plots in the frequency-domain. Included in this figure is a visual representation of transforming a signal from the time to the frequency domain (Fig. 4B and 4C), and also an illustration of the terms gain and phase (Fig. 4D). In addition, a small example file ’main SystemIdentificationExample.m’ has been added to the matlab code repository in the elife2023 branch to accompany Appendix D, which goes through the mathematics used to transform input and output time domain signals into gain and phase plots of the input-output relation. Small updates have been made to Figures 6 and 16 in the revised paper (Figures 7 and 18 in the first submission) to make the time domain signals from the spring-damper of best fit and the model output clearer. Finally, I have re-calculated the gain and phase profiles using a more advanced numerical method that trades off some resolution in frequency for more accuracy in the magnitude. This has allowed me to make Figures 6 and 16 easier to follow because the gain and phase responses are now lines rather than a scattering of points. We hope that these additions make the interpretation of gain and phase clearer.

      Please see

      Revision:

      – Figure 4 and caption on page 12

      – The opening 2 paragraphs of Sec 3.1 starting on page 10, column 2, line 4 ”In Kirsch et al.’s ...”

      – Figure 6 & 16: spring damper and model annotation added, plotted the gain and phase as lines

      – Appendix D: Updated to include coherence and the more advanced method used to evaluate the system transfer function, gain, and phase.

      Difference:

      – Figure 4 and caption on page 12

      – The opening 2 paragraphs of Sec 3.1 starting on page 12, column 1, line 34 and ending on page 13, column 2, line 29

      – Figure 6 & 16: spring damper and model annotation added

      – Appendix D
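
      To make the terms gain and phase concrete, here is a minimal sketch for an ideal spring-damper, f = k·x + β·dx/dt, whose frequency response is k + iβω. It compares the analytic gain and phase with values recovered from sampled length and force signals at a single frequency. This is not the method of Appendix D or of the example script; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def spring_damper_gain_phase(k, beta, freq_hz):
    """Analytic gain and phase (degrees) of f = k*x + beta*dx/dt."""
    omega = 2.0 * math.pi * freq_hz
    response = complex(k, beta * omega)  # H(omega) = k + i*beta*omega
    return abs(response), math.degrees(cmath.phase(response))


def estimated_gain_phase(length, force, freq_hz, fs):
    """Gain and phase at one frequency from sampled signals (single-bin DFT)."""
    basis = [cmath.exp(-2j * math.pi * freq_hz * n / fs) for n in range(len(length))]
    length_bin = sum(x * b for x, b in zip(length, basis))
    force_bin = sum(f * b for f, b in zip(force, basis))
    response = force_bin / length_bin    # output spectrum over input spectrum
    return abs(response), math.degrees(cmath.phase(response))


# Hypothetical values, chosen only for illustration.
k, beta = 4.0, 0.02            # stiffness and damping
fs, freq_hz = 1000.0, 10.0     # sampling rate and perturbation frequency (Hz)
omega = 2.0 * math.pi * freq_hz
t = [n / fs for n in range(1000)]  # one second, a whole number of cycles

length = [math.sin(omega * ti) for ti in t]
force = [k * math.sin(omega * ti) + beta * omega * math.cos(omega * ti) for ti in t]

analytic = spring_damper_gain_phase(k, beta, freq_hz)
estimated = estimated_gain_phase(length, force, freq_hz, fs)
print(analytic, estimated)
```

      A positive phase means that the force leads the length change, which is the signature of damping; a pure spring would have zero phase at every frequency.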

      (5) The actin-myosin and actin-titin load pathways are depicted as distinct in the model. However, given titin’s position in the center of myosin and the crossbridge connections between actin and myosin, this would seem to be an oversimplification. It seems worth considering whether the separation of these pathways is justified and whether it has any effect on the conclusions or interpretation.

      We have reworked one of the discussion paragraphs to focus on how our simulations would be affected by two mechanisms (Nishikawa et al.’s winding filament theory and DuVall et al.’s titin entanglement hypothesis) that make it possible for crossbridges to do mechanical work on titin.

      • Revision: the paragraph beginning on page 21, column 2, line 42 “The active titin model ...”

      • Difference: the paragraph beginning on page 23, column 2, line 48

      References

      Nishikawa KC, Monroy JA, Uyeno TE, Yeo SH, Pai DK, Lindstedt SL. Is titin a ‘winding filament’? A new twist on muscle contraction. Proceedings of the royal society B: Biological sciences. 2012 Mar 7;279(1730):981-90.

      DuVall M, Jinha A, Schappacher-Tilp G, Leonard T, Herzog W. I-Band Titin Interaction with Myosin in the Muscle Sarcomere during Eccentric Contraction: The Titin Entanglement Hypothesis. Biophysical Journal. 2016 Feb 16;110(3):302a.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Naseri et al. present a new strategy for identifying human genetic variants with recessive effects on disease risk by the genome-wide association of phenotype with long runs-of-homozygosity (ROH). The key step of this approach is the identification of long ROH segments shared by many individuals (termed "shared ROH diplotype clusters" by the authors), which is computationally intensive for large-scale genomic data. The authors circumvented this challenge by converting the original diploid genotype data to (pseudo-)haplotype data and modifying the existing positional Burrows-Wheeler transform (PBWT) algorithms to enable an efficient search for haplotype blocks shared by many individuals. With this method, the authors identified over 1.8 million ROH diplotype clusters (each shared by at least 100 individuals) and 61 significant associations with various non-cancer diseases in the UK Biobank dataset.

      Overall, the study is well-motivated, highly innovative, and potentially impactful. Previous biobank-based studies of recessive genetic effects primarily focused on genome-wide aggregated ROH content, but this metric is a poor proxy for homozygosity of the recessive alleles at causal loci. Therefore, searching for the association between phenotype and specific variants in the homozygous state is a key next step towards discovering and understanding disease genes/alleles with recessive effects. That said, I have some concerns regarding the power and error rate of the methods, for both identification of ROH diplotype clusters and subsequent association mapping. In addition, some of the newly identified associations need further validation and careful consideration of potential artifacts (such as cryptic relatedness and environment sharing).

      1) Identification of ROH diplotype clusters.

      The practice of randomly assigning heterozygous sites to a homozygous state is expected to introduce errors, leading to both false positives and false negatives. An advantage that the authors claim for this practice is to reduce false negatives due to occasional mismatch (possibly due to genotyping error, or mutation), but it's unclear how much the false positive rate is reduced compared to traditional ROH detection algorithm. The authors also justified the "random allele drawing" practice by arguing that "the rate of false positives should be low" for long ROH segments, which is likely true but is not backed up with quantitative analysis. As a result, it is unclear whether the trade-off between reducing FNs and introducing FPs makes the practice worthwhile (compared to calling ROHs in each individual with a standard approach first followed by scanning for shared diplotypes across individuals using BWT). I would like to see a combination of back-of-envelope calculation, simulation (with genotyping errors), and analysis of empirical data that characterize the performance of the proposed method.

      In particular, I find the high number of ROH clusters in MHC alarming, and I am not convinced that this can be fully explained by a high density of SNPs and low recombination rate in this region. The authors may provide further support for their hypothesis by examining the genome-wide relationship between ROH cluster abundance and local recombination rate (or mutation rate).
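
      The "random allele drawing" step discussed in this comment can be pictured with a small sketch. Genotypes are coded here as 0, 1, or 2 copies of the alternate allele; that coding, the function names, and the toy data are assumptions made for illustration and are not taken from ROH-DICE.

```python
import random


def to_pseudo_haplotype(genotypes, rng):
    """Collapse diploid genotypes (0, 1, or 2 alternate alleles) to one allele per site.

    Homozygous sites keep their allele; each heterozygous site is assigned
    0 or 1 at random, which is where chance matches and mismatches can arise.
    """
    haplotype = []
    for g in genotypes:
        if g == 0:
            haplotype.append(0)
        elif g == 2:
            haplotype.append(1)
        elif g == 1:
            haplotype.append(rng.randint(0, 1))
        else:
            raise ValueError(f"unexpected genotype code: {g}")
    return haplotype


def match_length(a, b):
    """Number of leading sites at which two pseudo-haplotypes agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


rng = random.Random(1)
ind1 = [2, 0, 2, 2, 0, 0]   # fully homozygous run
ind2 = [2, 0, 2, 2, 0, 0]   # same run in a second individual
ind3 = [2, 0, 1, 2, 0, 0]   # heterozygous at the third site

h1, h2, h3 = (to_pseudo_haplotype(g, rng) for g in (ind1, ind2, ind3))
print(match_length(h1, h2), match_length(h1, h3))
```

      Two individuals who are homozygous for the same alleles always match, whereas the heterozygous site in the third individual matches only when the random draw happens to agree.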

      Thanks for this insightful comment. Through additional experiments, we confirmed that the excessive number of ROH clusters in the MHC region is due to the higher density of markers per centimorgan. As discussed above at Essential Revision 2, we took this opportunity to modify our code to search for clusters with the minimum length in terms of cM instead of sites. We have also provided the genetic distance for reported clusters in the MHC region with significant association (genetic length (cM) column in Tables 1 and 2). We include the following in the main text:

      “We searched for ROH clusters using a minimum target length of 0.1 cM (Figure 3–figure supplement 1). As shown in the figure, there is no excessive number of ROH clusters in chromosome 6 as was spotted using a minimum number of variant sites.”

      Methods section, ROH algorithm subsection:

      “We implemented ROH-DICE to allow direct use of genetic distances in addition to variant sites for L. The program can take minimum target length L directly in cM and detect all ROH clusters greater than or equal to the target length in cM. The program holds a genetic mapping table for all the available sites, and cPBWT was modified to work directly with the genetic length instead of the number of sites.”
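
      The difference between a minimum length in sites and a minimum length in cM can be seen in a toy example. The sketch below is not the ROH-DICE implementation; it assumes that candidate blocks are already available as (first site index, last site index) pairs and that the genetic map gives a cumulative cM position for every site. All names and numbers are hypothetical.

```python
def genetic_length_cm(block, site_cm):
    """Genetic length of a block given as (first_site_index, last_site_index)."""
    first, last = block
    return site_cm[last] - site_cm[first]


def filter_by_sites(blocks, min_sites):
    """Keep blocks that span at least min_sites variant sites."""
    return [b for b in blocks if b[1] - b[0] + 1 >= min_sites]


def filter_by_cm(blocks, site_cm, min_cm):
    """Keep blocks whose genetic length is at least min_cm."""
    return [b for b in blocks if genetic_length_cm(b, site_cm) >= min_cm]


# Hypothetical genetic map: cumulative position in cM for each variant site.
# Sites 0-4 are densely packed (many markers per cM); sites 5-6 are sparse.
site_cm = [0.00, 0.01, 0.02, 0.03, 0.04, 0.30, 0.55]
blocks = [(0, 4), (3, 5), (4, 6)]

by_sites = filter_by_sites(blocks, min_sites=5)
by_cm = filter_by_cm(blocks, site_cm, min_cm=0.1)
print(by_sites, by_cm)
```

      With a site-count threshold, only the block in the marker-dense region is kept even though it is genetically short; with a cM threshold it is dropped and the genetically longer blocks are kept instead.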

      2) Power of ROH association. Given that the authors focused on long segments only (which is a limitation of the current method), I am concerned about the power of the association mapping strategy, because only a small fraction of causal alleles are expected to be present in long, homozygous haplotypes shared by many individuals. It would be useful to perform a power analysis to estimate what fraction of true causal variants with a given effect size can be detected with the current method. To demonstrate the general utility of this method, the authors also need to characterize the condition(s) under which this method could pick up association signals missed by standard GWAS with recessive effects considered. I suspect some variants with truly additive effects can also be picked up by the ROH association, which should be discussed in the manuscript to guide the interpretation of results.

      We added a new experiment in the Results section “Evaluation of ROH clusters in simulated data” under Power of ROH-DICE in association studies. We compared the power of the ROH cluster with additive, recessive, and dominant models. Our simulation shows that using ROH clusters outperforms standard GWAS when a phenotype is associated with a set of consecutive homozygous sites. We added the following text:

      “...We calculated the p-values for both ROH clusters and all variant sites. We used a p-value cut-off of 0.05 divided by the number of tests for each phenotype to determine whether the calculated p-value was smaller than the threshold, indicating an association. For GWAS, only one variant site within the ROH cluster, contributing to the phenotype, was required. We tested for all additive, dominant, and recessive effects (Figure 1–figure supplement 3). The figure demonstrates that ROH-DICE outperforms GWAS when a phenotype is associated with a set of consecutive homozygous sites. The maximum effect size of 0.3 resulted in ROH clusters achieving a power of 100%, whereas the additive model only achieved 11%, and the dominant and recessive models achieved 52% and 70%, respectively. The GWAS with recessive effect yields the best results among the GWAS tests; however, its power is still lower than that of ROH clusters.”
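
      The cut-off described in this passage is a Bonferroni-style threshold. A minimal sketch, with hypothetical p-values and function names, might look like this:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_tests(p_values, alpha=0.05):
    """Indices of tests whose p-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values for four tests on one phenotype.
p_values = [1e-9, 0.01, 0.2, 4e-4]
threshold = bonferroni_threshold(len(p_values))
hits = significant_tests(p_values)
print(threshold, hits)
```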

      3) False positives of ROH association. GWAS is notoriously prone to confounding by population and environmental stratification. Including leading principal components in association testing alleviates this issue but is not sufficient to remove the effects of recent demographic structure and local environment (Zaidi and Mathieson 2020 eLife). Similar confounding likely applies to homozygosity mapping and should be carefully considered. For example, it is possible that individuals who share a lot of ROH diplotypes tend to be remotely related and live near each other, thus sharing similar environments. Such scenarios need to be excluded to further support the association signals.

      We acknowledge that there could be confounding factors that may affect the association results. To address this, we utilized principal component (PC) values and additional covariates while using PHESANT after our initial Chi-square tests. We also included your comments in our Discussion section:

      "We used age, gender, and genetic principal components as confounding variables in the association analysis. Genetic principal components can reduce the confounding effect brought on by population structure but it may be insufficient to completely eliminate the effects of recent demographic structure and the local environment45. For example, individuals sharing excessive ROH diplotypes may share similar environments since they are closely related and reside close to one another. Since we did not rule out related individuals, some of the reported GWAS signals may not be attributable to ROH.”

      4) Validation of significant associations. It is reassuring that some of the top associations are indirectly corroborated by significant GWAS associations between the same disease and individual SNPs present in the ROH region (Tables 1 and 2). However, more sanity checks should be done to confirm consistency in direction of effect size (e.g., risk alleles at individual SNPs should be commonly present in risk-increasing ROH segment, and vice versa) and the presence of dominance effect.

      The beta values for effect size are now included in all reported tables. All beta values for ROH-DICE are positive, indicating that carriers of these ROH diplotypes may be at increased risk of certain non-cancerous diseases. Moreover, we conducted the suggested sanity check to confirm the consistency of the direction of risk-inducing ROH diplotypes and risk alleles.

      We also computed D’ as a measure of linkage between the reported GWAS results and ROH clusters. We found that most of the GWAS results and ROH clusters are strongly correlated. However, in a few cases, D' is small or close to zero. In such cases, the reported p-value from GWAS was also insignificant, while the ROH cluster indicated a significant association. We included these points in the Results section.
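
      For readers unfamiliar with D’, the sketch below computes the standard normalized linkage disequilibrium coefficient for two biallelic loci from allele and haplotype frequencies. How the ROH cluster was coded as a locus is not stated in this response, so treating cluster membership as one biallelic locus is an assumption, as are the function name and the example frequencies. The absolute value is returned.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium |D'| for two biallelic loci.

    p_a and p_b are the frequencies of allele A and allele B;
    p_ab is the frequency of the haplotype carrying both.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


# Hypothetical frequencies: A and B always occur together.
complete = d_prime(p_ab=0.1, p_a=0.1, p_b=0.1)
# Hypothetical frequencies: A and B occur together only as often as chance predicts.
independent = d_prime(p_ab=0.01, p_a=0.1, p_b=0.1)
print(complete, independent)
```

      A value near 1 corresponds to the strongly correlated case described above, and a value near 0 to the cases where the GWAS signal and the ROH cluster are essentially unlinked.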

      Reviewer #3 (Public Review):

      A classic method to detect recessive disease variants is homozygosity mapping, where affected individuals in a pedigree are scanned for the presence of runs of homozygosity (ROH) intersecting in a given region. The method could in theory be extended to biobanks with large samples of unrelated individuals; however, no efficient method was available (to the best of my knowledge) for detecting overlapping clusters of ROH in such large samples. In this paper, the authors developed such a method based on the PBWT data structure. They applied the method to the UK biobank, finding a number of associations, some of them not discovered in single SNP associations.

      Major strengths:

      • The method is innovative and algorithmically elegant and interesting. It achieves its purpose of efficiently and accurately detecting ROH clusters overlapping in a given region. It is therefore a major methodological advance.

      • The method could be very useful for many other researchers interested in detecting recessive variants associated with any phenotype.

      • The statistical analysis of the UK biobank data is solid and the results that were highlighted are interesting and supported by the data.

      Major weaknesses:

      • The positions and IDs of the ROH clusters in the UK biobank are not available for other researchers. This means that other researchers will not be able to follow up on the results of the present paper.

      We included the SNP IDs, positions, and consensus alleles for all reported loci in the main tables. Moreover, additional information, including beta and D’ values, was added. The current information should allow researchers to follow up on the results. Supplementary File 2 contains beta and D’ values for all reported clusters.

      Supplementary File 3 contains the SNP IDs and consensus alleles for all reported clusters in Tables 1 and 2. The consensus allele denotes the allele with the highest occurrence in the reported clusters.

      • The vast majority of the discoveries were in regions already known to be associated with their respective phenotypes based on standard GWAS.

      We agree that a majority of the ROH regions are indeed consistent with GWAS. However, some regions were missed by standard GWAS (e.g. chr6:25969631-26108168, hemochromatosis). Our message is that our method is a complementary approach to standard GWAS and will not replace standard GWAS analysis. See our response to Reviewer #2 Point Six.

      • The running time seems rather long (at least for the UK biobank), and therefore it will be difficult for other researchers to extensively experiment with the method in very large datasets. That being said, the method has a linear running time, so it is already faster than a naïve algorithm.

      Thank you for your input. The algorithm used to locate matching blocks is efficient, and the run time we originally reported was the total CPU hours consumed. Since the program consumes very little memory and few resources, it can be executed simultaneously for all chromosomes. We also noticed that a significant amount of time was being spent parsing the input file and slightly modified our script to improve the parsing. We also re-ran it for all chromosomes in parallel and reported the elapsed time, which was only 18 hours and 54 minutes.

      “This was achieved by running the ROH-DICE program, with a wall clock time of 18 hours and 54 minutes where the program was executed for all chromosomes in parallel (total CPU hours of ~ 242.5 hours). The maximum residence size for each chromosome was approximately 180 MB.”

    1. Author response:

      Reviewer #1 (Public Review):

      The authors investigated the role of OBOX4 in the zygotic genome activation (ZGA) in mice. Obox4 genes form an array of duplicated genes and were identified as a candidate ZGA factor based on expression patterns during early development. The role of OBOX4 was subsequently studied in embryonic stem cells and early embryos. It was found that transcriptional activation mediated by OBOX4 has similar features as that of DUX, which was previously identified as a zygotic transcription factor involved in ZGA and a major activator of the zygotic expression program. It was, however, unexpected that Dux knock-out did not impair embryonic development. The work by Guo et al. provides several lines of evidence that OBOX4-mediated activation of gene expression considerably overlaps with that of DUX and this redundancy might explain the loss of early developmental phenotype in Dux mutants. Consistent with this model, double mutants of Obox4 and Dux show impaired development. Given the difficulties with investigating details of the genetic model in double mutants at the preimplantation embryo stage, the authors not only crossed genetic mutants, but also used (1) nuclear transfer of mutated nuclei of ESCs, which could be characterized on their own in separate experiments, and (2) antisense oligonucleotide (ASO) microinjection, which included a rescue control demonstrating that reintroducing OBOX4 is sufficient to rescue the phenotype caused by blocking both Dux and Obox4.

      This work is important for the field because it reveals functional redundancy and plasticity of the zygotic genome activation in mammals, where the mouse model stands as a remarkable example of genome activation, which massively integrated long terminal repeat (LTR)-derived enhancers from retrotransposons and now two of the key activating zygotic factors appear to be encoded by tandemly duplicated clusters of different phylogenetic age. Identification of OBOX4 as a second factor partially redundant with DUX now allows us to decipher what constitutes the essential part of the ZGA program.

      We are grateful for the reviewer’s appreciation of our work, particularly the technical difficulty of knocking out two multicopy genes and the value of the rescue experiment.

      Reviewer #2 (Public Review):

      In this study, Guo et al. screened a few homeobox transcription factors and identified that Obox4 can induce the 2-cell like state in mouse embryonic stem cells (mESCs) (Fig. 1 and 2). The authors also compared in detail how Obox4 and Dux activate 2C repeats and genes in mESCs (Fig. 3). Compared to Dux, Obox4 activates fewer 2C genes (Fig. 2). In addition, although both Obox4 and Dux bind to MERVL elements, Obox4 additionally binds to ERVK (Fig. 3). The authors then used three different approaches (i.e., SCNT-mediated KO, ASO-mediated KD, and genetic KO) to study how Obox4 and Dux regulate zygotic genome activation in embryos. Although there are some inconsistencies among different approaches, the authors were able to show that loss of both Obox4 and Dux causes more severe consequences than loss of either protein alone in embryonic development and zygotic genome activation (Fig. 4 and 5).

      Overall, this is a comprehensive study that addresses an important question that puzzles the community. However, some comparisons to the recent work by Ji et al. (PMID: 37459895) are highly recommended. Ji et al. knocked out the entire Obox cluster (including Obox4) in mice and found that Obox cluster KO causes 2-4 cell arrest without affecting Dux. That said, Obox proteins seem more critical than Dux in regulating ZGA, and Obox cluster KO cannot be compensated by Dux. Ji et al. also reported that maternal (Obox1, 2, 5, 7) and zygotic (Obox3, 4) Obox proteins redundantly regulate embryogenesis because loss of either is compatible with development. Consistent with Ji's work, Obox4 KO embryos generated in this study can develop to adulthood and are fertile. Since these two studies are highly relevant, some comparisons of Obox4 KO and Obox4/Dux DKO with the previous Obox cluster KO will greatly benefit the community.

      We thank the reviewer for appreciating the value of our study. We are aware of the high-standard work by Ji et al. and have included a comparison between our data and the data by Ji et al. in the revised manuscript. Despite repeated attempts, various crossing strategies failed to produce Obox4KO/DuxKO mating pairs that could be used to produce the large number of Obox4KO/DuxKO embryos required for in-depth transcriptome analysis. Based on the quality of the RNA-seq, we decided to perform comparative analysis using our ASO KD data and showed that Obox4 has distinct regulatory targets from those of other Obox family members, which is consistent with the phylogenetic distance within the family.

    1. Author response:

      A general comment was that this study left several key questions unanswered, in particular the causal mechanism for the reported ribosomal distributions. We have been interested in the evolution of asymmetric bacterial growth and aging for many years. However, a motivational difference is that we are more interested in the evolutionary process, and evolution by natural selection works on the phenotype. Thus, we wanted to start with the phenotype closest to fitness, appropriately defined for the conditions, and work downwards. We examined first the asymmetry of elongation rates in single cells, then gene products, and now ribosomes. As we have pointed out, our demonstration of ribosomal asymmetry shows that the phenomenon was not peculiar and unique to the gene products we examined. Rather, the asymmetry is acting higher up in the metabolic network and likely affecting all genes. We find such conceptual guidance to be important. In an ideal world, of course, we would have liked to have worked out the causal mechanisms in one swoop. In a less than ideal situation, it is a subjective decision as to where to stop. We believe that the publication of this manuscript is more than appropriate at this juncture. We work at the interface of evolutionary theory and microbiology. Our results could appeal to both fields. If we attract new researchers, progress could be accelerated. Could the delay caused by publishing only completed stories slow the rate of discovery? These questions are likely as old as science (e.g., https://telliamedrevisited.wordpress.com/2021/01/28/how-not-to-write-a-response-to-reviewers/).

      We present below our response to specific comments by reviewers. We have not added a new discussion of papers suggested by Reviewer #1 because we feel that the speculations would have been too unfocused. We were already criticized for speculation in the Discussion about a link between aggregate size and ribosomal density.

      Response to major comments by Reviewer #1.

      (a) Fig. 1 only shows 2 divisions (rather than 3 as per Rev1) to avoid an overly elaborate figure. We have added text to the figure legend that the old and new poles and daughters in the subsequent 3, 4, 5, 6, and 7 generations can be determined by following the same notations and tracking we presented for generations 1 and 2 in Fig. 1. For example, if we know the old and new poles of any of the four daughters after 2 divisions (as in Fig. 1), and allow that daughter to elongate, become a mother, and divide to produce 2 “grand-daughters”, the polarity of the grand-daughters can also be determined.
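
      The tracking rule described in (a) can be written down as a short sketch. It assumes the usual convention that each daughter inherits one pole from its mother and forms one new pole at the division plane, so that the old daughter is the one that receives the mother's old pole. The class, the lineage labels ("O" for old daughter, "N" for new daughter), and the pole-age bookkeeping are illustrative and do not reproduce the notation of Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    old_pole_age: int   # divisions since the inherited (old) pole was formed
    new_pole_age: int   # divisions since the new pole was formed
    lineage: str        # e.g. "ON" = old daughter, then new daughter


def divide(mother):
    """Return (old_daughter, new_daughter) of a dividing cell."""
    # The old daughter keeps the mother's old pole, which ages by one division.
    old_daughter = Cell(mother.old_pole_age + 1, 0, mother.lineage + "O")
    # The new daughter keeps the mother's new pole, which becomes its old pole.
    new_daughter = Cell(mother.new_pole_age + 1, 0, mother.lineage + "N")
    return old_daughter, new_daughter


def grow(founder, generations):
    """All descendants of founder after the given number of divisions."""
    cells = [founder]
    for _ in range(generations):
        cells = [daughter for cell in cells for daughter in divide(cell)]
    return cells


# Start from a cell whose poles are already known (one old pole, one new pole).
founder = Cell(old_pole_age=1, new_pole_age=0, lineage="")
cells = grow(founder, 2)
for cell in cells:
    print(cell.lineage, cell.old_pole_age)
```

      Applying the same rule repeatedly gives the polarity of every cell in later generations, for example the 128 cells reached after seven divisions.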

      (b) Because division times were normalized and analyzed as quartiles, the raw values were never used. Rather than annotating unused values, we have provided the mean division times in the Material and Methods section on normalization to provide representative values.

      (c) We did not quantify in our study the changes over generations for three reasons. First, the sample sizes for the first generations (cohorts of 1, 2, 4, and 8 cells) are statistically small. Second, and most importantly, cells on an agar pad on a microscope slide, despite being inoculated as fresh exponentially growing cells, experience a growth lag, as do all cells transferred to a new physiological condition. Thus, to be safe, we do not collect data from cohorts 1, 2, 4, and 8 to ensure that our cells are as physiologically uniform as possible. Lastly, as we noted in the Material and Methods, cells also slow down after 7 generations (128 cells). Thus, we have collected ribosome and length measurements primarily from cohorts 16, 32, 64, and 128. Measurable cells from the 128 cohort are actually rare because a colony with that many cells often starts to form double layers, which are not measurable. Most of our measurements came from the 16, 32, and 64 cohorts, in which case a time series would not be meaningful. Some of these details were not included in our manuscript but have been added to the Material and Methods (Microscopy and time-lapse movies). For these reasons we have not added a time series as requested by the reviewer.

      (d) We have added the additional figure as requested, but as a supplement rather than in the main article (Supplemental Materials Fig. S1). This figure shows the normalized density of ribosomes along the normalized length of old and new daughters. The density is plotted as a continuous variable rather than as quartiles. This figure was included in the original manuscript, but readers recommended that it be removed because all the analyses had been done with quartiles. Readers felt misled and confused.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We greatly appreciate the comments from the editor and the reviewers, based on which we have made the revisions. We have responded to all the questions and summarized the revisions below. The changes are also highlighted in the manuscript.

      Additionally, we’ve noticed a few typos in the manuscript presented on the eLife website, which were not there in our originally submitted file.

      (1) In both the “Full text” presented on the eLife website and the pdf file generated after clicking “Download”: the last FC1000 in the second paragraph of the “Extensive induction curves fitting of TetR mutants” section should be FC1000WT .

      (2) In the pdf file generated after clicking “Download”: the brackets are all incorrectly formatted in the captions of Figure 4 and Figure 3—figure supplement 6.

      eLife assessment

      The fundamental study presents a two-domain thermodynamic model for TetR which accurately predicts in vivo phenotype changes brought about as a result of various mutations. The evidence provided is solid and features the first innovative observations with a computational model that captures the structural behavior, much more than the current single-domain models.

      We appreciate the supportive comments by the editor and reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors’ earlier deep mutational scanning work observed that allosteric mutations in TetR (the tetracycline repressor) and its homologous transcriptional factors are distributed across the structure instead of along the presumed allosteric pathways as commonly expected. In addition, the loss of allosteric communication caused by those mutations was rescued by additional distributed mutations. Now the authors develop a two-domain thermodynamic model for TetR that explains these compelling data. The model is consistent with the in vivo phenotypes of the mutants with changes in parameters, which permits quantification. Taken together, their work connects intra- and inter-domain allosteric regulation that correlates with structural features. This leads the authors to suggest broader applicability to other multidomain allosteric proteins. Here the authors follow their first innovative observations with a computational model that captures the structural behavior, aiming to make it broadly applicable to multidomain proteins. Altogether, an innovative and potentially useful contribution.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      None that I see, except that I hope that in the future, if possible, the authors would follow with additional proteins to further substantiate the model and show its broad applicability. I realize however the extensive work that this would entail.

      We thank the reviewer for the supportive comments and the suggestion to extend the model to other proteins, which we indeed plan to pursue in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This combined experimental-theoretical paper introduces a novel two-domain statistical thermodynamic model (primarily Equation 1) to study allostery in generic systems but focusing here on the tetracycline repressor (TetR) family of transcription factors. This model, building on a function-centric approach, accurately captures induction data, maps mutants with precision, and reveals insights into epistasis between mutations.

      Strengths:

      The study contributes innovative modeling, successful data fitting, and valuable insights into the interconnectivity of allosteric networks, establishing a flexible and detailed framework for investigating TetR allostery. The manuscript is generally well-structured and communicates key findings effectively.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The only minor weakness I found was that I still don’t have a good sense of (a) the intuition behind and (b) the mathematical derivation of Equation 1, which is so central to the work. I would recommend that the authors provide this early on in the main text.

      We thank the reviewer for the suggestion. The full mathematical derivation of Equation 1 is given in the first section of the supplementary file. Given the length of the derivation, we think it’s better to keep it in the supplementary file rather than the main text. In the main text, the first subsection (overview of the two-domain thermodynamic model of allostery) of the Results section and the paragraph right before Equation 1 are meant to provide an intuitive understanding of the two-domain model and of the derivation of Equation 1, respectively.

      We would also like to point the reviewer to Figure 2-figure supplement 2 and Equations (12) to (18) in the supplementary file for an alternative derivation. They show that the equilibria among all molecular species containing the operator are dictated by the binding free energies, the ligand concentration, and the allosteric parameters. The probability of an unbound operator (proportional to the probability that the promoter is bound by an RNA polymerase, or the gene expression level) can thus be calculated using Equation (12), which then leads to main text Equation 1 following the derivation given there.

      Additionally, we’ve added a paragraph to the main text (lines 248-260) to aid an intuitive understanding of Equation 1.

      “The distinctive roles of the three biophysical parameters on the induction curve as stipulated in Equation 1 could be understood in an intuitive manner as well. First, the value of εD controls the intrinsic strength of binding of TetR to the operator, or the intrinsic difficulty for ligand to induce their separation. Therefore, it controls how tightly the downstream gene is regulated by TetR without ligands (reflected in leakiness) and affects the performance limit of ligands (reflected in saturation). Second, the value of εL controls how favorable ligand binding is in free energy. When εL increases, the binding of ligand at low concentrations becomes unfavorable, and the ligands cannot effectively bind to TetR to induce its separation from the operator. Therefore, the fold-change as a function of ligand concentration only starts to noticeably increase at higher ligand concentrations, resulting in larger EC50. Third, as discussed above, γ controls the level of anti-cooperativity between the ligand and operator binding of TetR, which is the basis of its allosteric regulation. In other words, γ controls how strongly ligand binding is incompatible with operator binding for TetR, hence it controls the performance limit of ligand (reflected in saturation).”

      We hope that the reviewer will find this explanation helpful.
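
      The qualitative roles described in the quoted paragraph can be illustrated with a generic four-state binding scheme. The sketch below is not the manuscript’s Equation 1; the function name `p_operator_unbound`, the dimensionless concentration `c`, the sign conventions, and all numerical values are assumptions chosen for illustration only. In this toy scheme, a more negative `eps_d` means tighter operator binding, a larger `eps_l` means weaker ligand binding, and `gamma` penalizes the state in which ligand and operator are bound at the same time.

```python
import math

def p_operator_unbound(c, eps_d, eps_l, gamma):
    """Probability that the operator is free in a toy four-state scheme.

    Energies are in units of kBT and c is a dimensionless ligand
    concentration. This is a textbook anti-cooperative binding model
    with hypothetical conventions, not the authors' Equation 1.
    """
    w_d = math.exp(-eps_d)                # operator bound, no ligand
    w_l = c * math.exp(-eps_l)            # ligand bound, operator free
    w_dl = w_d * w_l * math.exp(-gamma)   # both bound, penalized by gamma
    z = 1.0 + w_d + w_l + w_dl            # partition function over the 4 states
    return (1.0 + w_l) / z

# Without ligand, only eps_d matters (analogous to leakiness).
leak = p_operator_unbound(0.0, eps_d=-5.0, eps_l=0.0, gamma=5.0)

# At saturating ligand, the plateau depends on eps_d and gamma.
sat_weak = p_operator_unbound(1e9, eps_d=-5.0, eps_l=0.0, gamma=2.0)
sat_strong = p_operator_unbound(1e9, eps_d=-5.0, eps_l=0.0, gamma=8.0)

# At a fixed intermediate concentration, a larger eps_l lowers the response,
# i.e. the curve rises only at higher ligand concentrations.
mid_tight = p_operator_unbound(1.0, eps_d=-5.0, eps_l=0.0, gamma=5.0)
mid_loose = p_operator_unbound(1.0, eps_d=-5.0, eps_l=3.0, gamma=5.0)

print(leak, sat_weak, sat_strong, mid_tight, mid_loose)
```

      In this toy scheme the zero-ligand limit is 1/(1 + exp(-eps_d)) and the saturating limit is 1/(1 + exp(-eps_d - gamma)), which mirrors the statement that εD sets leakiness while εD and γ together set the saturation level.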

      Reviewer #3 (Public Review):

      Summary:

      Allosteric regulations are complicated in multi-domain proteins and many large-scale mutational data cannot be explained by current theoretical models, especially for those that are neither in the functional/allosteric sites nor on the allosteric pathways. This work provides a statistical thermodynamic model for a two-domain protein, in which one domain contains an effector binding site and the other domain contains a functional site. The authors build the model to explain the mutational experimental data of TetR, a transcriptional repressor protein that contains a ligand-binding and a DNA-binding domain. They incorporate three basic parameters, the energy change of the ligand and DNA binding domains before and after binding, and the coupling between the two domains to explain the free energy landscape of TetR’s conformational and binding states. They go further to quantitatively explain the in vivo expression level of the TetR-regulated gene by fitting into the induction curves of TetR mutants. The effects of most of the mutants studied could be well explained by the model. This approach can be extended to understand the allosteric regulation of other two-domain proteins, especially to explain the effects of widespread mutants not on the allosteric pathways.

      Strengths:

      The effects of mutations that are neither in the functional or allosteric sites nor in the allosteric pathways are difficult to explain and quantify. This work develops a statistical thermodynamic model to explain these complicated effects. For simple two-domain proteins, the model is quite clean and theoretically solid. For the real TetR protein that forms a dimeric structure containing two chains with each of them composed of two domains, the model can explain many of the experimental observations. The model separates intra- and inter-domain influences, which provides a novel angle to analyse allosteric effects in multi-domain proteins.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      As mentioned above, the TetR protein is not a simple two-domain protein, but forms a dimeric structure in which the DNA binding domain in each chain forms contacts with the ligand-binding domain in the other chain. In addition, the two ligand-binding domains have strong interactions. Without considering these interactions, especially those mutants that are on these interfaces, the model may be oversimplified for TetR.

      We thank the reviewer for this valid concern and acknowledge that TetR is a homodimer. However, we’ve deliberately chosen to simplify this complexity in our model for the following reasons.

      (1) In this work, we aim to build a minimalist model for two-domain allostery with only the most essential parameters for capturing experimental data. The simplicity of the model helps promote its mechanistic clarity and potential transferability to other allosteric systems.

      (2) Fewer parameters are needed in a simpler model. Our two-domain model currently uses only three biophysical parameters, which are all demonstrated to have distinct influences on the induction curve (see the main text section “System-level ramifications of the two-domain model”). This enables the inference of parameters with high precision for the mutants, and the quantification of the most essential mechanistic effects of their mutations, provided that the model is shown to accurately recapitulate the comprehensive dataset. Thus, we found it was unnecessary to add another parameter for explicitly describing inter-chain coupling, which would likely incur uncertainty in the inference of parameters due to the redundancy of their effects on induction data, and prevent the model from making faithful predictions.

      (3) From a more biological point of view, TetR is an obligate dimer, meaning that the two chains must synchronize for function, supporting the two-domain simplification of TetR for binding concerns.

      Additionally, as shown in the subsection “Inclusion of single-ligand-bound state of repressor” of section 1 of the supplementary file, incorporating the dimeric nature of TetR in our model by allowing partial ligand binding does not change the functional form of main text equation 1 in any practical sense. Therefore, considering all the factors stated above, we think that increasing the complexity of the two-domain model will only be necessary if additional data emerge to suggest the limitation of our model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an excellent work. I have only one suggestion for the authors. Interestingly, the authors also note that the epistatic interactions that they obtain are consistent with the structural features of the protein, which is not surprising. Within this framework, have the authors considered rescue mutations? Please see for example PMID: 18195360 and PMID: 15683227. If I understand right, this might further extend the applicability of their model. If so, the authors may want to add a comment to that effect.

      We thank the reviewer for the supportive comments and for pointing us to the useful references. We have added some comments to the main text regarding this point in line 332-336: “The diverse mechanistic origins of the rescuing mutations revealed here provide a rational basis for the broad distributions of such mutations. Integrating such thermodynamic analysis with structural and dynamic assessment of allosteric proteins for efficient and quantitative rescuing mutation design could present an interesting avenue for future research, particularly in the context of biomedical applications (PMID: 18195360, PMID: 15683227).”

      Reviewer #3 (Recommendations For The Authors):

      The authors should try to build a more realistic dimeric model for TetR to see if it could better explain experimental data. If it were too complicated for a revision, more discussions on the weakness of the current model should be given.

      We thank the reviewer for this valid concern and for the suggestion. The reasons for refraining from increasing the complexity of the model are fully discussed in our response to the reviewer’s public review given above. Primarily, we think that the value of a simple physical model (e.g., the paradigm Ising model in statistical physics and the classic MWC model) is two-fold. First, its mechanistic clarity and potential transferability make it a useful conceptual framework for understanding complex systems and establishing universal rules by comparing seemingly unrelated phenomena. Second, it provides useful insights into, and design principles for, specific systems if it can quantitatively capture the corresponding experimental data. Thus, given the current experimental data set, we believe it is justified to keep the two-domain model in its current form, while additional experimental data could necessitate a more complex model for TetR allostery in the future. Relevant discussions are added to the main text (lines 443-446) and section 8 of the supplementary file.

      “It’s noted that the homodimeric nature of TetR is ignored in the current two-domain model to minimize the number of parameters, and additional experimental data could necessitate a more complex model for TetR allostery in the future (see supplementary file section 8 for more discussions).”

      Minor issues:

      (1) There is an error in Figure 3A, the 13th and 14th subgraphs are the same and should be corrected.

      We thank the reviewer for capturing this error, which has been corrected in the revised manuscript.

      (2) The criteria for the selection of mutants for analysis should be clearly given. Apart from deleting mutants that are in direct contact with the ligand or DNA, how many mutants are left, and how far are they from the two sites? In line 257, what are the criteria for selecting these 15 mutants? Similarly, in line 332, what are the criteria for selecting these 8 mutants?

      We thank the reviewer for this comment. The data selection criteria are now added in section 7 of the supplementary file. The distances to the DNA operator and ligand of the 21 residues under mutational study are now added in Table 1 (Figure 3-figure supplement 9). The added materials are referenced in the main text where relevant.

      “7. Mutation selection for two-domain model analysis

      In this work, there are 24 mutants studied in total including the WT, and they contain mutations at 21 WT residues. We did not perform model parameter inference for the mutant G102D because of its flat induction curve (see the second subsection of section 2 and main text Figure 2—figure Supplement 3). Therefore, there are 23 mutants analyzed in main text Figure 5.

      Measuring the induction curve of a mutant involves a significant amount of experimental effort, and is therefore hard to extend to a large number of mutants. Nonetheless, we aim to compose a set of comprehensive induction data here for validating our two-domain model for TetR allostery. To this end, we picked 15 individual mutants in the first round of induction curve measurements, which contain mutations spanning different regions in the sequence and structure of TetR (main text Figure 3—figure Supplement 1). Such a broad distribution of mutations across LBD, DBD and the domain interface could potentially lead to diverse induction curve shapes and mutant phenotypes for validating the two-domain model. Indeed, as discussed in the main text section "Extensive induction curves fitting of TetR mutants", the diverse effects on the induction curve of mutations perturbing different allosteric parameters, as predicted by the model, are successfully observed in these 15 experimental induction curves. Additionally, 5 of the 15 mutants contain a dead-rescue mutation pair, which helps us validate the model prediction that a dead mutation could be rescued by rescuing mutations that perturb the allosteric parameters in various ways.

      Eight mutation combinations were chosen for the second round of induction curve measurement for studying epistasis, where we paired up C203V and Y132A with mutations from different regions of the TetR structure. This choice is largely based on two considerations. 1. As both C203V and Y132A greatly enhance the allosteric response of TetR, we want to probe why they cannot rescue a range of dead mutations as observed previously (PMID: 32999067). 2. C203V and Y132A are the only two mutants that show enhanced allosteric response in the first round of analysis. Combining detrimental mutations of allostery in a combined mutant could potentially lead to a nearly flat induction curve, which is less useful for inference (see the second subsection of section 2).”

      Since the number of hotspots identified by DMS is not very large, why not analyze them all?

      We thank the reviewer for this comment. There are 41 hotspot residues in TetR (PMID: 36226916), which have 41*19=779 possible single mutations. It’s unfeasible to perform induction curve measurements for all of these 779 mutants in our current experiment. However, we agree that it would be helpful if we can obtain such a dataset in an efficient way.

      In line 257, there are 15 mutants mentioned, while in Figure 5, there are 23 mutants mentioned, in Figure 3-figure supplement 1, there are 21 mutants mentioned, and in line 226 of the supplementary file, there are 24 mutants mentioned, which is very confusing. Therefore, the data selection criteria used in this article should be given.

      We thank the reviewer for this comment. The data selection criteria are now given in section 7 of the supplementary file, which should clarify this confusion.

      (3) In Figure 4 of the Exploring epistasis between mutations section, the 6 weights of the additive models corresponding to each mutation combination are different. On one hand, it seems that there are no universal laws in these experimental data. On the other hand, unique parameters of a single mutation combination were not validated in other mutation combinations, which somewhat weakened the conclusions about the potential physical significance of these additive weights.

      We thank the reviewer for this comment. We admit that a quantitative universal law for tuning the 6 weights of the additive model does not manifest in our data, which indicates the mutation-specific nature of epistatic interactions in TetR as hinted in the different rescuing mutation distributions of different dead mutations (PMCID: PMC7568325). However, clear common trends in the weight tuning of combined mutants that contain common mutations do emerge, which comply with the structural features of the protein and provide explanations as to why C203V and Y132A don’t rescue a range of dead mutations (main text section “Exploring epistasis between mutations”). Additionally, the lack of a quantitative universal rule for tuning the 6 weights in our simple model doesn’t exclude the possibility of the existence of universal law for epistasis in TetR in another functional form, a point that could be explored in the future with more extensive joint experimental and computational investigations.

      In Eq. (27) of the supplementary file, the prior distribution of inter-domain coupling γ is given as a Gaussian distribution centered at 5 kBT. Since the absolute value of γ is important, can the authors explain why the prior distribution of γ is set to this value and what happens if other values are used?

      We thank the reviewer for the question. As explained in the corresponding discussions of Eq. (27) in the supplementary file, the prior of γ is chosen to serve as a soft constraint on its possible values based on the consideration that 1. inter-domain energetics for a TetR-like protein should be on the order of a few kBT; and 2. the prior distribution should reflect the experimental observation in the literature that γ has a small probability of adopting negative values upon mutations. Given our thorough validation of the statistical model and computational algorithm (see section 3 of the supplementary file), and the high precision in the parameter fitting results using experimental data (Figure 3 and Figure 4-figure supplement 2), we conclude that 1. the physical range of parameters encoded in their chosen prior distributions agrees well with the value reflected in the experimental data; 2. the inference results are predominantly informed by the data. Thus, changing the mean of the prior distribution of γ should not affect the inference results significantly given that it remains in the physical range.

      This point is explicitly shown in the added Table 2 (Figure 3-figure supplement 10), where we compare the current Bayesian inference results with those obtained after increasing the standard deviation of the Gaussian prior of γ from 2.5 to 5 kBT. As shown in the table, most inference results stay virtually unchanged at the use of this less informative prior, which confirms that they are predominantly informed by the data. The only exceptions are the slight increase of the inferred γ values for C203V, C203V-Y132A and C203V-G102D-L146A, reflecting the intrinsic difficulty of precise inference of large γ values with our model, as is already discussed in the second subsection of section 3 of the supplementary file. However, such observations comply with the common trend of epistatic interactions involving C203V presented in the main text and don’t compromise the ability of our model to accurately capture the induction curves of mutants. Relevant discussions are now added to the second subsection of section 3 of the supplementary file (line 368-385).

      “In our experimental dataset, such inference difficulty is only observed in the case of C203V, Y132A-C203V and C203V-G102D-L146A due to their large γ and γ + εL values (see main text Figure 3, Figure 3—figure Supplement 10 and Figure 4). As shown in main text Figure 3—figure Supplement 10, the inference results for the other 20 mutants stay highly precise and virtually unchanged after increasing the standard deviation of the Gaussian prior of γ (gstdγ ) from 2.5 to 5 kBT. This demonstrates that the inference results for these mutants are strongly informed by the induction data and there is no difficulty in the precise inference of the parameter values. On the other hand, the inferred γ values (especially the upper bound of the 95% credible region) for C203V, Y132A-C203V and C203V-G102D-L146A increased with gstdγ . This is because the induction curves in these cases are not sensitive to the value of γ given that it’s large enough as discussed above. Hence, when unphysically large γ values are permitted by the prior distribution, they could enter the posterior distribution as well. Such difficulty in the precise inference of γ values for these three mutants, however, doesn’t compromise the ability of our model to accurately capture the comprehensive set of induction data (see part iv below). Additionally, the increase of the inferred γ value of C203V with a larger gstdγ complies with the results presented in main text Figure 4, which show that the effect of C203V on γ tends to be compromised when combined with mutations closer to the domain interface.”
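
      To make the effect of widening the prior concrete, the sketch below computes how much prior mass falls on negative γ and on large γ for the two standard deviations mentioned above (2.5 and 5 kBT, both with the mean of 5 kBT). It assumes a plain, untruncated Gaussian prior, which may differ from the exact form of Eq. (27); the 10 kBT threshold for "large" γ and the helper name `normal_cdf` are arbitrary choices made for illustration.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian with the given mean and std."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

PRIOR_MEAN = 5.0  # kBT, the prior center quoted in the response

# Prior probability that gamma is negative under each prior width.
p_neg_narrow = normal_cdf(0.0, PRIOR_MEAN, 2.5)
p_neg_wide = normal_cdf(0.0, PRIOR_MEAN, 5.0)

# Prior probability that gamma exceeds 10 kBT (illustrative threshold only).
p_large_narrow = 1.0 - normal_cdf(10.0, PRIOR_MEAN, 2.5)
p_large_wide = 1.0 - normal_cdf(10.0, PRIOR_MEAN, 5.0)

print(f"P(gamma < 0):  std 2.5 -> {p_neg_narrow:.4f}, std 5 -> {p_neg_wide:.4f}")
print(f"P(gamma > 10): std 2.5 -> {p_large_narrow:.4f}, std 5 -> {p_large_wide:.4f}")
```

      Under these assumptions, doubling the standard deviation raises both tail probabilities from roughly 2% to roughly 16%, which is consistent with the observation that larger γ values can enter the posterior when the induction curve itself is insensitive to γ.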

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study provides potentially fundamental insight into the function and evolution of daily rhythms. The authors investigate the function of the putative core circadian clock gene Clock in the cnidarian Nematostella vectensis. While in parts still incomplete, the evidence suggests that, in contrast to mice and fruit flies, Clock in this species is important for daily rhythms under constant conditions, but not under a rhythmic light/dark cycle, suggesting that the major role of the circadian oscillator in this species could be a stabilizing function under non-rhythmic environmental conditions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this nice study, the authors set out to investigate the role of the canonical circadian gene Clock in the rhythmic biology of the basal metazoan Nematostella vectensis, a sea anemone, which might illuminate the evolution of the Clock gene functionality. To achieve their aims the team generated a Clock knockout mutant line (Clock-/-) by CRISPR/Cas9 gene deletion and subsequent crossing. They then compared wild-type (WT) with Clock-/- animals for locomotor activity and transcriptomic changes over time in constant darkness (DD) and under light/dark cycles to establish these phenotypes under circadian control and those driven by light cycles. In addition, they used Hybridization Chain Reaction-In situ Hybridization (HCR-ISH) to demonstrate the spatial expression of Clock and a putative circadian clock-controlled gene Myh7 in whole-mounted juvenile anemones.

      The authors demonstrate that under LD both WT and Clock-/- animals were behaviourally rhythmic but under DD the mutants lost this rhythmicity, indicating that Clock is necessary for endogenous rhythms in activity. With altered LD regimes (LD6:6) they show also that Clock is light-dependent. RNAseq comparisons of rhythmic gene expression in WT and Clock-/- animals suggest that Clock KO has a profound effect on the rhythmic genome, with very little overlap in rhythmic transcripts between the two phenotypes; of the genes rhythmic in both LD and DD in WT animals (220, termed clock-controlled genes, CCGs), 85% were not rhythmic in Clock-/- animals in either light condition. In silico gene ontology (GO) analysis of CCGs reflected processes associated with circadian control. Correspondingly, those genes rhythmic in KO animals under DD (here termed neoCCGs) were not rhythmic in WT, lacked upstream E-box motifs associated with circadian regulation, and did not display any GO enrichment terms. 'Core' circadian genes (as identified in previous literature) in WT and Clock-/- animals were only rhythmic under entrainment (LD) conditions whilst Clock-/- displayed altered expression profiles under LD compared to WT. Comparing CCGs with previous studies of cycling genes in Nematostella, the authors selected a gene from 16 rhythmic transcripts. One of these, Myh7, was detectable by both RNAseq and HCR-ISH and considered a marker of the circadian clock by the authors.

      The authors claim that the study reveals insights into the evolutionary origin of circadian timing; Clock is conserved across distant groups of organisms, having a function as a positive regulator of the transcriptional translational feedback loop at the heart of daily timing, but is not a central element of the core feedback loop circadian system in this basal species. Their behavioural and transcriptomic data largely support the claims that Clock is necessary for endogenous daily activity but that the putative molecular circadian system is not self-sustained under constant darkness (this was known already for WT animals)- rather it is responsive to light cycles with altered dynamics in Clock-/- specimens in some core genes under LD. In the main, I think the authors achieved their aims and the manuscript is a solid piece of important work. The Clock-/- animal is a useful resource for examining time-keeping in a basal metazoan.

      The work described builds on other transcriptomic-based works on cnidaria, including Nematostella, and does probe into the molecular underpinnings with a loss-of-function in a gene known to be core in other circadian systems. The field of chronobiology will benefit from the evolutionary aspect of this work and the fact that it highlights the necessity to study a range of non-model species to get a fuller picture of timing systems to better appreciate the development and diversity of clocks.

      Strengths:

      The generation of a line of Clock mutant Nematostella is a very useful tool for the chronobiological community and coupled with a growing suite of tools in this species will be an asset. The experiments seem mostly well conceived and executed (NB see 'weaknesses'). The problem tackled is an interesting one and should be an important contribution to the field.

      Weaknesses:

      I think the claims about shedding light on the evolutionary origin of circadian time maintenance are a little bold. I agree that the data do point to an alternative role for Clock in this animal in light responsiveness, but this doesn't illuminate the evolution of time-keeping more broadly in my view. In addition, these are transcriptomic data and so should be caveated- they only demonstrate the expression of genes and not physiology beyond that. The time-course analysis is weakened by its low resolution, particularly for the RAIN algorithm when 4-hour intervals constrain the analysis. I accept that only 24h rhythms were selected in the analysis from this but, it might be that detail was lost - I think a preferred option would be 2 or 3-hour resolution or 2 full 24h cycles of analysis.

      The authors discount the possibility of the observed 12h rhythmicity in Clock-/- animals by exposing them to LD6:6 cycles before free-running them in DD. I suggest that LD cycles are not a particularly robust way to entrain tidal animals as far as we know. Recent papers show inundation/mechanical agitation are more reliable cues (Kwiatkowski ER, et al. Curr Biol. 2023, 2;33(10):1867-1882.e5. doi: 10.1016/j.cub.2023.03.015; Zhang L., et al Curr Biol. 2013, 23;19, 1863-1873 doi.org/10.1016/j.cub.2013.08.038.) and might be more effective in revealing endogenous 12h rhythms in the absence of 24h cues.

      Response: We removed the suggestion that we used 6:6h LD to perform tidal entrainment. We generated this ultradian light condition to address the 24h rhythmicity observed in the NvClk1-/- in 12:12h LD.

      Reviewer #2 (Public Review):

      This manuscript addresses an important question: what is the role of the gene Clock in the control of circadian rhythms in a very primitive group of animals: Cnidaria. Clock has been found to be essential for circadian rhythms in several animals, but its function outside of Bilaterian animals is unknown. The authors successfully generated a severe loss-of-function mutant in Nematostella. This is an important achievement that should help in understanding the early evolution of circadian clocks. Unfortunately, this study currently suffers from several important weaknesses. In particular, the authors do not present their work in a clear fashion, neither for a general audience nor for more expert readers, and there is a lack of attention to detail. There are also important methodological issues that weaken the study, and I have questions about the robustness of the data and their analysis. I am hoping that the authors will be able to address my concerns, as this work should prove important for the chronobiology field and beyond. I have highlighted below the most important issues, but the manuscript needs editing throughout to be accessible to a broad audience, and referencing could be improved.

      Major issues:

      (1) Why do the authors make the claim in the abstract that CLOCK function is conserved with other animals when their data suggest that it is not essential for circadian rhythms? dCLK is strictly required in Drosophila for circadian rhythms. In mammals, there are two paralogs, CLOCK and NPAS2, but without them, there are no circadian rhythms either. Note also that the recent claim of BMAL1-independent rhythms in mammals by Ray et al., quoted in the discussion to support the idea that rhythms can be observed in the absence of the positive elements of the circadian core clock, had to be corrected substantially, and its main conclusions have been disputed by both Abruzzi et al. and Ness-Cohn et al. This should be mentioned.

      Response: According to our behavioral and transcriptomic data, CLOCK function is conserved under constant light conditions. In the LD context, rhythmicity is probably maintained by the light-response pathway in Nematostella. We modified our rhythmic transcriptomic analysis, considered the contested results of Ray et al., and discussed them in the revised manuscript.

      (2) The discussion of CIPC on line 222 is hard to follow as well. How does mRNA rhythm inform the function of CIPC, and why would it function as a "dampening factor"? Given that it is "the only core clock member included in the Clock-dependent CCGs," (220) more discussion seems warranted. Discussing work done on this protein in mammals and flies might provide more insight.

      Response: The initial sentence was unclear. Furthermore, since we restricted our rhythmic analysis to genes only found rhythmic with a p<0.01 with RAIN combined with JTK, NvCipc was no longer defined as rhythmic in free running.

      (3) The behavioral arrhythmicity seen with their Clock mutation is really interesting. However, what is shown is only an averaged behavior trace and a single periodogram for the entire population. This leaves open the possibility that individual animals are poorly synchronized with each other, rather than arrhythmic. I also note that in DD there seem to be some residual rhythms, though they do not reach significance. Thus, it is also possible that at least some individual animals retain weak rhythms. The authors should analyze behavioral rhythms in individual animals to determine whether behavioral rhythmicity is really lost. This is important for the solidity of their main conclusions.

      Response: Fig. 1 has been modified. We have separated the data for WT and NvClk1-/- animals to provide clarity on the average behavior pattern for each genotype. While the LSP analysis on the population average informs us about the synchronization of the population, it is true that it does not provide insight into individual rhythmicity. To address this, we analyzed individuals in all conditions using the Discorhythm website (Carlucci et al., 2019).

      In the revised figure, we have included a comparison plot of the acrophase of 24-hour rhythmic animals between genotypes using Cosinor analysis, which is most suitable for acrophase detection. This plot indicates the number of animals detected as significantly rhythmic, providing direct visual input to the reader regarding individual rhythmicity. Additionally, we have added Table 1, which contains the Cosinor period analysis (24 and 12 hours) of individuals for all genotypes and conditions, further enhancing the clarity of our findings.
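      As an illustration of the kind of fixed-period Cosinor test referred to above, the following is a minimal sketch, not the Discorhythm implementation. The function name `cosinor_fit`, the returned keys, and the synthetic data are assumptions made for this example. It fits a mesor plus one cosine component at a chosen period by linear least squares and reports the amplitude, the acrophase (in hours), and the zero-amplitude F statistic, which would be compared against an F distribution with 2 and n-3 degrees of freedom.

```python
import numpy as np


def cosinor_fit(t_hours, y, period=24.0):
    """Single-component cosinor fit at a fixed period (illustrative sketch)."""
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period

    # Linear model: y = mesor + beta*cos(omega*t) + gamma*sin(omega*t)
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mesor, beta, gamma = coef

    amplitude = float(np.hypot(beta, gamma))
    # beta*cos(wt) + gamma*sin(wt) = A*cos(wt - phi) with phi = atan2(gamma, beta),
    # so the fitted curve peaks at t = phi / omega (modulo the period).
    acrophase_hours = float((np.arctan2(gamma, beta) / omega) % period)

    # Zero-amplitude F statistic (requires more than 3 samples).
    resid = y - X @ coef
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    n = len(y)
    f_stat = ((tss - rss) / 2.0) / (rss / (n - 3)) if rss > 0 else float("inf")

    return {
        "mesor": float(mesor),
        "amplitude": amplitude,
        "acrophase_hours": acrophase_hours,
        "f_stat": f_stat,
    }


# Synthetic, noise-free example: 3 days sampled hourly, peak at hour 8.
t = np.arange(0, 72, 1.0)
y = 10 + 3 * np.cos(2 * np.pi * (t - 8) / 24)
fit = cosinor_fit(t, y, period=24.0)
```

      Running the same function with `period=12.0` gives the corresponding 12-hour test. Because the period is fixed in advance, the fit says whether an animal is rhythmic at that period, not what its own period is.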

      (4) There is no mention in the results section of the behavior of heterozygotes. Based on supplement figure 2A, there is a clear reduction in amplitude in the heterozygous animals. Perhaps this might be because there is only half a dose of Clock, but perhaps this could be because of a dominant-negative activity of the truncated protein. There is no direct functional evidence to support the claim that the mutant allele is nonfunctional, so it is important to discuss carefully studies in other species that would support this claim, and the heterozygous behavior since it raises the possibility that the mutant allele acts as a dominant negative.

      Response: Extended Data Fig. 1 has been modified. For NvClk1+/- animals, we show the normalized locomotion of the population over time in DD, a comparison of individual normalized behavior amplitudes, the LSP of the population average, and the acrophase of the individuals that are rhythmic at 24 h. Indeed, we cannot discriminate a dominant-negative allele from a non-functional allele.

      (5) I do not understand what the bar graphs in Figure 2E and 3B represent - what does the y-axis label refer to?

      Response: Not relevant to the revised manuscript.

      (6a) I note that RAIN was used, with a p<0.05 cut-off. I believe RAIN is quite generous in calling genes rhythmic, and the p-value cut-off is also quite high. What happens if the stringency is increased, for example with a p<0.01.

      Response: We acknowledge your concern regarding the stringency of our statistical analysis. To address this, we opted to combine both RAIN and JTK methods and applied a more stringent p-value cut-off of p<0.01.
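      The combined criterion can be sketched as a simple intersection. This is a minimal illustration, assuming per-gene p-values from each method are already available as dictionaries. The function name, gene identifiers, and p-values below are hypothetical and are not taken from the study.

```python
def rhythmic_in_both(rain_p, jtk_p, alpha=0.01):
    """Return genes whose p-value is below alpha in BOTH methods (sorted)."""
    shared = rain_p.keys() & jtk_p.keys()  # genes tested by both methods
    return sorted(g for g in shared if rain_p[g] < alpha and jtk_p[g] < alpha)


# Hypothetical p-values for one condition (e.g. one genotype and light regime).
rain_p = {"geneA": 0.002, "geneB": 0.030, "geneC": 0.008, "geneD": 0.0001}
jtk_p = {"geneA": 0.004, "geneB": 0.001, "geneC": 0.020, "geneD": 0.0090}

rhythmic = rhythmic_in_both(rain_p, jtk_p, alpha=0.01)
```

      Requiring agreement between two algorithms is more conservative than either alone, since a gene called by only one method (here `geneB` and `geneC`) is dropped.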

      (6b) It would be worth choosing a few genes called rhythmic in different conditions (mutant or wild-type. LD or DD), and using qPCR to validate the RNAseq results. For example, in Figure 3D, Myh7 RNAseq data are shown, and they do not look convincing. I am surprised this would be called a circadian rhythm. In wild-type, the curve seems arrhythmic to me, with three peaks, and a rather large difference between the first and second ZT0 time point. In the Clock mutants, rhythms seem to have a 12hr period, so they should not be called rhythmic according to the material and methods, which says that only ca 24hr period mRNA rhythms were considered rhythmic. Also, the result section does not say anything about Myh7 rhythms. What do they tell us? Why were they presented at all?

      Response: Regarding the suggestion for independent verification of our RNAseq results, we agree that such validation would enhance the robustness of our findings. To address this, we chose to overlap our identified rhythmic genes under WT LD conditions with those from another transcriptomic study that shared similarities in experimental design. Notably, the majority of overlapping rhythmic genes between the studies are candidate pacemaker genes. We believe that this replication of biologically significant rhythmic genes strengthens the validity and reliability of our results (see Extended Data Fig. 2).

      Furthermore, we have decided to remove NvMhc-st (mistakenly named Myh7; only rhythmic in WT DD in the new analysis), as it does not contribute substantively to the revised version of the manuscript.

      (7) The authors should explain better why only the genes that are both rhythmic in LD and DD are considered to be clock-controlled genes (CCGs). In theory, any gene rhythmic in DD could be a CCG. However, Leach and Reitzel actually found that most genes in DD1 do not cycle the next day (DD2)? This suggests that most "rhythmic" genes might show a transient change in expression due to prolonged obscurity and/or the stress induced by the absence of a light-dark cycle, rather than being clock controlled. Is this why the authors saw genes rhythmic under both LD and DD as actual CCGs? I would suggest verifying that in DD the phase of the oscillation for each CCG is similar to that in LD. If a gene is just responding to obscurity, it might show an elevated expression at the end of the dark period of LD, and then a high level in the first hours of DD. Such an expression pattern would be very unlikely to be controlled by the circadian clock.

      Response: As we modified our transcriptomic analysis, we no longer analyze LD+DD rhythmic genes, but rather any genes that are rhythmic (RAIN and JTK p<0.01) in each condition. We therefore end up with four lists of genes, one for each experimental condition.

      (8) Since there are still rhythms in LD in Clock mutants, I wonder whether there is a paralog that could be taking Clock's place, similar to NPAS2 in mammals.

      Response: See response to (1). The only NPAS2 ortholog identified in Nematostella, NPAS3, showed marginal significance (p=0.013) with RAIN in LD WT, suggesting regulation similar to that of the candidate pacemaker genes. We therefore included it in our list of candidate pacemaker genes.

      (9) I do not follow the point the authors try to make in lines 268-272. The absence of anticipatory behavior in Drosophila Clk mutants results from disruption of the circadian molecular clock, due to the loss of Clk's circadian function. Which light-dependent function of Clock are the authors referring to, then? Also, following this, it should be kept in mind that clock mutant mice have a weakened oscillator. The effect on entrainment is secondary to the weakening of the oscillator, rather than a direct effect on the light input pathway (weaker oscillators have increased response to environmental inputs). The authors thus need to more clearly explain why they think there is a conservation of circadian and photic clock function.

      Response: Following the changes in our statistical analysis, we reframed the discussion and now directly address the circadian and photic clock functions (the latter is called the light-response pathway in the manuscript).

      Recommendations for the authors:

      We suggest the following improvements:

      (1) Please undertake a serious effort to make this work more accessible to non-marine chronobiologists. This includes better explanations, and schemes of the animal when images of staining are shown (e.g. Fig.1b) which include the labeling of relevant morphological structures mentioned in the text (like "tentacle endodermis and mesenteries" (line 132)). Similar issues for mentioned life cycle stages like "late planula stage" (line 133), "bisected physa" (line 149).

      Response: In Fig. 1b, we outlined the shape of the animal and added two arrows to locate the tentacle endodermis and mesenteries. We replaced the term "late planula stage" with "larvae" and rephrased "bisected physa" as "tissue sampling".

      Please attend to details. This includes:

      • Wrong referrals to figures (currently line 151 refers to EDF2- but should be EDF 1 instead, there is a Fig.3f mentioned in the text, but there is no such Fig.).

      Response: Fixed

      • Mentioning of ZTs when the HCR stainings were performed.

      Response: Fixed

      • Fig.1 a shows a rather incomplete and thus potentially confusing phylogenetic tree. Vertebrates have at least two Clk orthologs (NPAS2 and CLK), please include both, use an outgroup, and root the tree.

      Response: Identifying NPAS2 and CLK orthologs in all species added more confusion to the conclusion. However, we followed the suggestion to add an outgroup, using a CLK orthologous sequence identified in the sponge Amphimedon queenslandica, and rooted the tree. Thank you for the suggestion.

      • What do the y-axis labels in Figure 2E and 3B refer to exactly? Y-axis label annotations in Fig.3a,d are entirely missing- what do the numbers refer to?

      Response: not relevant in the revised manuscript

      • Fig.2D- is the Go term enrichment referring to LD or DD?

      Response: To DD. We made this clear in Fig. 5.

      • Wording: "Clock regulates genetic pathways." What is meant by "genetic pathways"? There are no "non-genetic pathways". Could one simply say: "Clock regulates a variety of transcripts".

      Response: We modified our threshold to use only p.adj<0.01, which reduced the GO term numbers. We removed “genetic pathways” and now address the specific pathways: cell-cycle and neuronal.

      The use of the term "epistatic" is confusing (line 219), i.e. that light is epistatic to Clock. In genetics, epistasis is defined as the effect of gene interactions on phenotypes. To a geneticist, this implies that there is a second gene impacting on the phenotype of the Clock mutants. Please re-word.

      Response: “light is epistatic on Clock” has been re-phrased.

      The provided Supplementary tables are not well annotated. Several of them need guess-work about what is shown. For instance, for Supplementary Table 1, the Ns are unclear, which in total can go up to almost 200 per condition-genotype, but only about 30 animals for each were tested. Thus, where do the high totals in the LSP table come from? What do the numbers of each periodicity mean? Initially one might assume it was the number of animals that showed a periodogram peak at a given periodicity, but it seems that cannot be. Maybe it counted any period bin over statistical significance? Please clarify with better descriptions and labels.

      Response: Supplementary tables are now clearly annotated on their first tabs. Regarding Fig. 1, we already addressed this point in the public review.

      Albeit not essential, it would be more reader-friendly to also add a summary table with average period and SD, power and SD, and percentage rhythmicity to the main figure.

      Response: Table 1 has been added; it contains the counts of individual animals that are rhythmic (24 h and 12 h) according to Cosinor. However, Discorhythm requires a specific period to be set. Thus, we can only provide the number of animals that are significantly rhythmic at a given period value, not an estimate of each animal's own period.

      (2) Some of the terminology is quite confusing, in particular the double meaning of the word "clock" (i.e the pacemaker and the transcription factor). This is not a specific problem to this manuscript, but it would be helpful for the readability to try to improve this.

      Could the gene/transcript/protein be spelled: clk and Clk?

      Alternatively, for clarity- how about talking about "core pacemaker genes," "CLOCK-dependent rhythmic genes" and "CLOCK-independent rhythmic genes"?

      Response:

      Clock/CLOCK > NvClk / NvCLK and the mutant is NvClk1-/-

      Core clock genes > candidate pacemaker genes.

      CLOCK-dependent CCG > this notion no longer exists in the revised manuscript.

      CLOCK-independent CCG > this notion no longer exists in the revised manuscript.

      (3) The dismissal of the 12h rhythmicity in Clock-/- animals is not really convincing and should be reconsidered. LD6:6 cycles (before free-running animals in DD) is likely a not particularly robust way to entrain tidal animals. Recent papers show inundation/mechanical agitation are more reliable cues (Kwiatkowski ER, et al. Curr Biol. 2023, 2;33(10):1867-1882.e5. doi: 10.1016/j.cub.2023.03.015; Zhang L., et al Curr Biol. 2013, 23;19, 1863-1873 doi.org/10.1016/j.cub.2013.08.038.) and might be more effective in revealing endogenous 12h rhythms in the absence of 24h cues.

      Response: We removed the proposal that LD 6:6 acts as tidal entrainment. Instead, the LD 6:6 experiment reveals the direct light dependency of the NvClk1-/- mutant.

      (4) There are significant questions raised on the validity of BMAL1-independent rhythms in mammals as suggested by the Ray et al study. See DOI: 10.1126/science.abe9230 and DOI: 10.1126/science.abf0922

      These technical comments should also be taken into account and the discussion adjusted accordingly to better reflect the ongoing discussions in the chronobiology field.

      Response: We modified our rhythmic analysis. As we could not use BHQ or adjusted p-values, which yielded very few genes, we defined genes as 24h-rhythmic if p<0.01 with two different algorithms (RAIN and JTK). We propose this compromise to reduce the risk of false positives. Furthermore, we discussed our methodology in light of the significant questions raised by the papers you cited. We thank the reviewer for this important point.

      (5) The HCR stainings for clk are not very convincing. Normally, HCR should have more dots. In principle, the logic of HCR is such that it detects individual mRNA molecules in the cell. Thus, having only one strong dot/cell like in Fig.1b doesn't make much sense.

      Response: We were the first to be surprised by this single-dot signal. We are experienced users of HCRv.3 across different species. We decided to remove the close-up (pending further investigation) but to keep the full-animal signal. According to our approach, it is a convincing signal. However, because of the dotted nature of the signal, it is not easy to make it highly visible at the scale of the full animal in the picture. We did our best to make the mRNA signal visible without altering the pattern.

      Furthermore, the controls for the HCR in situ hybridization are unclear. In the methods, there are two Clock probes described (B3 & B5) and two control probes (B1 & B3), however, in the negative control image, a combination of one Clock (B1) and one control (B3) probes is used and is unclear what "redundant detection" means in the legend of figure S2.

      Response: Considering the nature of the signal (a single dot or a few dots), we decided to use two probes with two different fluorophores. Noise is by nature random. Our hypothesis was that only overlapping fluorescent dots are true NvClk mRNA signal.

      For Control probes we used two zebrafish probes labelling hypothalamic peptides.
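      The overlap criterion described above can be illustrated with a small sketch. It assumes that dot centroids have already been detected in each fluorescence channel. The function name, the coordinates, and the distance threshold are hypothetical, and this is not the authors' image-analysis procedure.

```python
from math import dist


def colocalized_dots(channel_a, channel_b, max_distance=1.0):
    """Keep dots from channel_a that have a channel_b dot within max_distance."""
    return [
        a for a in channel_a
        if any(dist(a, b) <= max_distance for b in channel_b)
    ]


# Hypothetical dot centroids (x, y) in arbitrary units for the two fluorophores.
channel_a = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
channel_b = [(0.5, 0.0), (20.0, 20.0), (5.0, 5.5)]

true_signal = colocalized_dots(channel_a, channel_b, max_distance=1.0)
```

      Dots that appear in only one channel, such as `(10.0, 10.0)` here, are treated as noise under this criterion.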

      Based on the experience with non-Drosophila, non-mouse animal model systems the reviewers assume that non-sense mediated mRNA decay (NMD) is not strongly initiated upon Crispr-induced premature STOP-codons. If this assumption is correct it would be worth to mention it. Alternatively, it would be worth testing if Nematostella induces NMD, as this would be a great control for the HCR and the mutation itself. At which ZT was the HCR done?

      Response: We performed the HCR at ZT10 when NvClk is described to be at peak. It is now indicated in the Fig. 1b. The RNAseq detected a higher quantity of NvClk1 mRNA in the NvClk1-/- (see Fig. 4a). mRNA quantity regulation involves transcription, stabilization, and degradation. At this stage, we cannot identify which specific step is affected.

      For Fig.1c- please provide the binding site and sequence in the figure, simply include EDF 1 in the main figure.

      Response: We generated a clear indication in the new Fig.1c and EDF. 1b about the protein domains, the CRISPR binding site and the consequences on the DNA and AA sequences.

      (6) Please provide the individual trace data for the behavioral analyses either as supplementary files or as a link to an openly accessible database like DRYAD (see also comment 7 in the public review of reviewer 2). Maybe this is what is shown in Supplementary Table 1, but it is really not clear what is actually shown.

      Response: Fig. 1 has been updated and Table 1 has been added. Supplementary Table 1 contains the individual normalized locomotor data of each polyp for each genotype and light condition. Supplementary Table 2 contains the Cosinor analysis of individual rhythmic behavior, based on Supplementary Table 1.

      (7) It is not really clear if the mutation is a true loss-of-function or could also be dominant negative. While this is raised in the discussion, it should be more carefully considered. The reason why a dominant negative would be unlikely is unclear. More specifically also see comment 8) in the public review of reviewer 2.

      Response: Indeed, the results cannot tell us if it is a true loss of function, a dominant negative or non-functional allele. We addressed it in the first part of the discussion.

      (8) The pretty small overlap of rhythmic transcripts in LD and DD could reflect the true biology of a more core clock driven-process under constant conditions and a more light-driven process under LD. But still- wouldn't one expect that similar processes should be rhythmic? If not, why not?

      It would certainly add strength to the data if for one or two transcripts these results were independently verified by qPCR from an independent sampling. This could even be done for just two time points with the most extreme differences.

      Response: We appreciate the reviewer's comments and concerns regarding the overlap of rhythmic transcripts in different conditions. In response to the reviewer's query, we revised our interpretation of the transcriptomic data, acknowledging the limited overlap between light and genotype conditions in our study. This prompted us to reconsider the underlying biological processes driving rhythmic gene expression under constant conditions versus light-dark cycles.

      Regarding the suggestion for independent verification of our RNAseq results, we agree that such validation would enhance the robustness of our findings. To address this, we chose to overlap our identified rhythmic genes under WT LD conditions with those from another transcriptomic study that shared similarities in experimental design. Notably, the majority of overlapping rhythmic genes between the studies are candidate pacemaker genes. We believe that this replication of biologically significant rhythmic genes strengthens the validity and reliability of our results (see Extended Data Fig. 2).

      (9) Expression of myh7 : Checking for co-expression should be pretty straightforward by HCR. This is what this type of staining technique is really good for. Please do clk and myh7 co-staining if you want to claim co-expression. Otherwise don't make such a claim.

      Response: We agree that checking for co-expression should be straightforward by HCR. However, due to time constraints during the revision period, we are unable to conduct the double in-situ experiment. Additionally, upon careful consideration, we recognize that including myhc-st (mistakenly named myh7) staining and co-expression analysis would not significantly contribute to the main conclusions of our study. Therefore, we have decided to remove this analysis from the revised manuscript.

      (10) Missing methodological details:

      • The false discovery rate for each analysis should be included (see Hughes et al.,: "Guidelines for Genome-Scale Analysis of Biological Rhythms," 2017).

      Response: The FDR is indicated for each gene in Supplementary Table 3.

      • Fig.1f- continuous light- please provide a spectrum (If there is no good spectrophotometer available, please provide at least manufacturer information).

      Response: Unfortunately, we did not have a good spectrophotometer available at the time of the revision. We added the reference of the lamp to the Methods. We found the light spectrum provided by the supplier; however, we did not add it to the revised manuscript.

      Author response image 1.

      Spectrum of the Aquastar t8

      Also, it would be easier for the reader, if the measurements of light intensity are provided in photons, because this is what the light receptors ultimately measure.

      Response: Modified.

      • Fig.2E- please add the consensus sequence used for circadian E-box vs. E-box to the figure.

      Response: In Fig. 4c of the revised manuscript, we show which E-box motifs we extracted for our promoter analysis. We also changed our analysis: we no longer use HOMER, but instead directly extracted promoter sequences, searched for the canonical E-box (CANNTG) and the circadian E-box (CACGTG), and generated a circadian E-box enrichment output per gene promoter.
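      A direct motif scan of this kind can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the example sequence, and the per-promoter summary (the fraction of canonical E-boxes that are circadian E-boxes) are assumptions, since the response does not specify how the enrichment output is computed.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all counted.
CANONICAL_EBOX = re.compile(r"(?=CA[ACGT]{2}TG)")  # CANNTG
CIRCADIAN_EBOX = re.compile(r"(?=CACGTG)")         # CACGTG, a subset of CANNTG


def ebox_counts(promoter):
    """Count canonical and circadian E-boxes in one promoter sequence.

    Both motifs are reverse-complement palindromes, so scanning a single
    strand finds every occurrence on either strand.
    """
    seq = promoter.upper()
    n_canonical = sum(1 for _ in CANONICAL_EBOX.finditer(seq))
    n_circadian = sum(1 for _ in CIRCADIAN_EBOX.finditer(seq))
    # Assumed summary statistic: share of E-boxes that are circadian E-boxes.
    fraction = n_circadian / n_canonical if n_canonical else 0.0
    return n_canonical, n_circadian, fraction


# Hypothetical short sequence containing CACGTG, CAGCTG and CATATG.
example = "TTCACGTGAACAGCTGTTCATATGGG"
n_canonical, n_circadian, fraction = ebox_counts(example)
```

      In practice the same function would be applied to each extracted promoter (for instance the 5 kb upstream of the putative ATG), giving one value per gene.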

      (11) There has been some discussion about the evolutionary statement as stated by the authors. It appears that depending on the background of the reader, this can be misunderstood. We thus suggest to more clearly point out where the author thinks there is evolutionary conservation (a function for clk in the circadian oscillator under constant light or dark conditions) versus where there is no apparent evolutionary conservation (the situation under light-dark conditions).

      Response: In the revised manuscript we proposed a conserved function of NvCLK in constant darkness, and a light-response pathway compensating in LD conditions in the mutant.

      Please also consider the major comments 8 and 9 of the common review from reviewer 2.

      Reviewer #1 (Recommendations For The Authors):

      The hybridization chain-reaction ISH is OK but, I'm not sure I understand the control condition-this should be clarified. I would also welcome the use of Clock-/- animals in HCR as another, more direct level of control. In addition, the authors state that the Myh7 probes hybridise in anatomical regions resembling those for Clock (Fig 3e). It would be better to duplex these two probe sets with different fluors for a better representation of the relative spatial distributions of each transcript.

      Response: We agree that checking for co-expression should be straightforward by HCR. However, due to time constraints during the revision period, we are unable to conduct the double in-situ experiment. Additionally, upon careful consideration, we recognize that including myhc-st (mistakenly named myh7) staining and co-expression analysis would not significantly contribute to the main conclusions of our study. Therefore, we have decided to remove this analysis from the revised manuscript.

      We clarified in the methods the control probes design.

      Minor points:

      Figure legends do not all convey sufficient detail. For instance, Figure 1c needs a better explanation. Figure 3e- are these images both WT? Fig 3f doesn't exist and other figure text references do not align with figures and need an overhaul.

      Response: All errors have been fixed.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      (1) The authors need to introduce their model system better for a broad audience. What are the tissues/cells that express Clock at a higher level? What is their function, does this provide a potential explanation for their specific Clock expression, and how CLOCK might regulate behavior? Terms such as "tentacle endodermis and mesenteries" (line 132), "late planula stage" (line 133), "bisected physa" (line 149) would need some explanation.

      Response: We changed terms such as planula to larvae, and bisected physa to tissue samples.

      (2) Some of the terminology used is quite confusing, because of the double-meaning of the word "clock" (i.e the pacemaker and the transcription factor). The authors use terms such as "clock-controlled genes", "core clock genes", "CLOCK-dependent clock-controlled genes", "neo-clock-controlled genes". Is there any way to help the reader? Here are several suggestions: "core pacemaker genes," "CLOCK-dependent rhythmic genes" and "CLOCK-independent rhythmic genes".

      Response: All the terminology has been clarified; see previous comments.

      (3) Also in the abstract, there is mention of "hierarchal light- and Clock-signaling" (52-3) - is this related to the statement on line 219 that light is epistatic to Clock? I do not quite understand what epistatic would mean here. Who is upstream of whom? LD modifies rhythmicity in Clock mutant animals, but Clock mutations also impact rhythmicity in LD. Also, as epistasis is defined as the effect of gene interactions on phenotypes - what is the secondary gene impacting the phenotype of the Clock mutants? I am not sure the term epistatic is appropriate in the present context.

      Response: Indeed, Epistatic is a genetic term which might be unclear in this context. We removed it.

      (4) The control for the in situ hybridization is unclear. In the methods, there are two Clock probes described (B3 & B5) and two control probes (B1 & B3), however, in the negative control image, a combination of one Clock (B1) and one control (B3) probe is used, I am not sure what "redundant detection" means in the legend of figure S2. Also, the sequences of each Clock probe should be provided. It might be worth testing the Clock mutant the authors generated. Clock mRNA could be reduced due to nonsense-mediated RNA decay, since the mutation causes a premature stop codon. This would be a great additional control for the in situ hybridization. Even better would be if, by chance, the probes target the mutated sequence. The signal should then be completely lost.

      Response: HCR uses tiling probes, which means that the target transcript is covered by dozens of successive primer-like DNA sequences that enable HCRv.3 detection. We cannot design a mutant-specific probe with this technology.

      (5) I have concerns with rhythmic-expression calls, particularly as there is so little overlap between LD and DD, and that a completely different set of rhythmic genes is observed in Clock mutant and wild-type animals. I am not an expert in whole-genome expression studies, so I hope one of my colleague reviewers can weigh in.

      When describing rhythmicity analysis in the Methods, it states that Benjamini-Hochberg corrections were applied to account for multiple comparisons. However, the false discovery rate for each analysis should be included (see Hughes et al.,: "Guidelines for Genome-Scale Analysis of Biological Rhythms," 2017).

      Response: As explained before, we cannot use Benjamini-Hochberg corrections, as only a few genes (mostly oscillator genes) pass the threshold. We therefore combined two different algorithms (RAIN and JTK) with p<0.01 to confidently detect rhythmic genes while reducing the risk of false positives.
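      For readers less familiar with the correction under discussion, the following is a minimal sketch of the standard Benjamini-Hochberg adjustment. The function name and the p-values are hypothetical, and it is not code from the study. It shows why the adjusted values can be considerably larger than the raw ones when many genes are tested.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

      Each raw p-value is scaled by the number of tests divided by its rank, so with thousands of genes only those with very small raw p-values remain below a given threshold.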

      Minor issues:

      (1) Environmental inputs are not "circadian", as written in the title.

      Response: Title modified

      (2) In the abstract, the description of the Clock mutant behavioral phenotypes is hard to follow, with no mention of whether or not Clock mutant animals are behaviorally rhythmic or arrhythmic in constant conditions.

      Response: corrected

      (3) Abstract: A 6/6 h LD cycle is not a compressed tidal cycle as written in the abstract. Light is not an input to tidal rhythms.

      Response: corrected

      (4) Line 101: timeout is not a core clock gene in animals.

      Response: we removed it from the candidate pacemaker genes.

      (5) What is the evidence for the role of PAR-Zip proteins in the Nematostella clock? The reference provided does not mention those.

      Response: There are no functional data in Nematostella yet to support their role within the pacemaker. However, based on their rhythmicity in LD and their protein conservation, we included them in the candidate pacemaker gene list. The references have been corrected.

      (6) Line 125. should refer to Fig 1C when describing the Clock protein.

      Response: corrected

      (7) Line 143-4. based on the figure, the region targeted by gRNA was not "close to the 5' end" as stated, it is closer to the middle of the gene sequence as shown in Figure 1C. A more accurate description would be a region in between the PAS domains.

      Response: Indeed. We modified the figure and the text.

      (8) Line 150. The mutant allele is described as Clock1 initially, then for the rest of the paper as Clock-. Since it is not clear that the allele is a null (see major comment #8), Clock1 should be used throughout the manuscript.

      Response: the allele is named NvClk1 in the revised manuscript

      (9) Figure 2A, the second CT/ZT0 is misplaced.

      Response: Fig. 2 modified in the revised manuscript

      (10) Figure legend for 2E and 3B. "The 1000bp upstream ATG" is unclear. I guess it means that 1000bp upstream of the putative initiation codon was used.

      Response: Right, and in the revised version we analyzed 5kb upstream of the putative ATG.

      (11) Line 164. The authors write "We discovered..." , but wasn't it already known that these animals are behaviorally rhythmic?

      Response: Fixed

      (12) It would be worth mentioning in the results section the reduced amplitude of rhythms in LL compared to DD (in WT and seemingly also in Clock mutants).

      Response: Indeed, we observed a significant reduction in the mean amplitude in NvClk1-/- in DD and LL compared with WT and NvClk1-/- in LD, DD and LL. However, as rhythmicity is lost in virtually all mutants in LL and DD, we do not think these results add to the current interpretation of the gene function.

      (13) Please correct the figure numbers in the main text, there are several mistakes.

      Response: Done

      (14) Line 196, most genes in the quoted study did not cycle on day 2, so whether they are truly clock controlled is questionable.

      Response: We agree; identifying free-running cycling genes in cnidarians remains a challenge. One of the limitations of this study was the detection of genes that are rhythmic in LD and retain rhythmicity in DD. However, considering different transcriptomic studies (cited in the discussion), it seems that in the phylum Cnidaria the genes rhythmic in LD are not necessarily the ones identified as rhythmic in DD.

      (15) Line 204-206 needs to be rephrased. It is confusing.

      Response: rephrased

      (16) Line 216. Rephrase to something like: "A similar finding was made for."

      Response: rephrased

      (17) "Clock regulates genetic pathways" sounds quite odd. Do you mean it regulates preferentially specific genetic (or maybe better, molecular) pathways?

      Response: rephrased

      (18) Figure 4 and legend: Dashed lines indicating threshold are missing. Do the black and red dots represent WT and Clock-/-, as indicated in the legend, or up/down, as indicated in the figures?

      Response: Fig. 5 has been modified accordingly. Colors in the volcano plot indicate up- (black) versus down- (red) regulated genes. This is now consistent within the figure.

      (19) Legend for Extended figure 1. "Immature peptide sequence" is incorrect.

      Response: rephrased

      (20) Extended data Figure 4. What the asterisks labels is unclear.

      Response: EDF4 was modified and became EDF2, with different content. The * indicates NvClk mRNA.

      (21) Line 228. Gene "isoforms". I guess the authors mean "paralogs".

      Response: corrected.

      (22) Line 232-3/Figure 3e. Please include a comparable image of the Clk ISH to facilitate the comparison of the spatial expression pattern. In addition, where and what is the "analysis" referred to - "the spatial expression pattern of Myh7 closely resembled that of Clock, as evidenced by our analysis"?

      Response: The analysis has been removed from the revised manuscript because we currently cannot perform the double ISH.

      (23) Line 282-3. As mentioned above, it is difficult to be sure that circadian behavior is lost, if only looking at a population of animals.

      Response: Fig.1 corrected

      (24) Line 301-5. Rephrase.

      Response: Rephrased

      (25) Line 325. I am not convinced that the author can say that their mutant is amorphic. See Major comment 8.

      Response: corrected.

      (26) Line 351 "simplifying interactions with the environment". Please explain what is meant here.

      Response: this confusing sentence has been removed from the revised manuscript